flipCode - Project Earthlight Interview
Project Earthlight Interview


Project Earthlight is a 3D game developed by Andrew Wu, building upon his original computer vision research. What makes this game unique is that you control the player not with a mouse, keyboard, or joystick, but with a toy lightsabre. The player sits in front of the computer (and a camera) to take on the computer opponent in a one-on-one battle. How does this work? How will this kind of technology affect games to come? I'd like to thank Andrew Wu for taking the time to respond to some questions about his project. Check it out...


For starters, please take a moment to explain what exactly Project Earthlight is, who you are, and why you decided to take on a project of this kind.

Project Earthlight explores new ways of controlling games (and computers) using computer vision. More simply, I use a single webcam to capture 2D images and infer the 3D state of a physical game control device (a toy lightsabre) in real time.

I study CS as an undergraduate at the University of Illinois at Urbana-Champaign (home of Mosaic, Descent, and Mortal Kombat :). I began my work at the University of Central Florida last summer, where I worked as a research assistant with some well-known professors in computer vision. There I learned a lot and came up with the idea of a Virtual 3D Blackboard, which eventually became the subject of my research paper.

My paper was accepted to and published in the IEEE International Conference on Automatic Face and Gesture Recognition. The conference was held in March 2000 in Grenoble, France, where I had a poster session.

After the summer, I wanted to apply my research to a game that ran in real time. One-on-one lightsabre fighting seemed like a good idea, and it matched my research well.




How did this project's development progress, a basic summary if you will, from start to finish?

The research took the whole summer, and I spent most of my fall semester daydreaming about the project instead of paying attention in class. Actual work began last winter break and continued until early March -- about 3 months. (Note that this includes the artwork, which I had to do all by myself since I was working alone.)

I targeted early March because that's when UIUC holds the annual Engineering Open House, where I presented my work to hundreds of kids, college students, and research professors.



How does the program pick out the saber from the rest of the environment?


The lightsabre is a plain US$5 (Qui-Gon) toy lightsabre I picked up at Toys 'R' Us. All the work is done by the vision algorithms that read the webcam input. Research-wise, the most important element is that I use only one camera and can get 3D information from the 2D image. Most researchers use a stereo pair of cameras, where the disparity between the cameras allows 3D reconstruction, just as our two eyes do.

Color is used to determine the lightsabre's position. I train the program on color images of the lightsabre, which is a bright neon-green. After that, I can easily determine the ends of the lightsabre.
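To make that concrete, here is a minimal sketch (not Andrew's actual code) of one way to do it: threshold each webcam frame against a trained reference colour, then take the two blob pixels farthest apart as the blade's 2D endpoints. The reference colour, tolerance, and RGB24 frame layout are all assumptions on my part.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct Pixel { int x, y; };

// Returns true if an RGB pixel is "close enough" to the trained sabre colour.
// The reference colour and tolerance are stand-ins for whatever the real
// training step produces.
static bool isSabreColour(uint8_t r, uint8_t g, uint8_t b)
{
    const int refR = 80, refG = 230, refB = 90;   // bright neon green (assumed)
    const int tol  = 60;                          // per-channel tolerance (assumed)
    return std::abs(r - refR) < tol &&
           std::abs(g - refG) < tol &&
           std::abs(b - refB) < tol;
}

// Scan one RGB24 frame, collect sabre-coloured pixels, and return the two
// blob pixels farthest apart -- a crude estimate of the blade's 2D endpoints.
static bool findSabreEndpoints(const uint8_t* rgb, int width, int height,
                               Pixel& endA, Pixel& endB)
{
    std::vector<Pixel> blob;
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            const uint8_t* p = rgb + 3 * (y * width + x);
            if (isSabreColour(p[0], p[1], p[2]))
                blob.push_back({x, y});
        }
    if (blob.size() < 2)
        return false;

    auto dist2 = [](const Pixel& a, const Pixel& b) {
        int dx = a.x - b.x, dy = a.y - b.y;
        return dx * dx + dy * dy;
    };

    // The pixel farthest from an arbitrary blob pixel gives one end;
    // the pixel farthest from that gives the other end.
    endA = blob[0];
    for (const Pixel& p : blob)
        if (dist2(p, blob[0]) > dist2(endA, blob[0])) endA = p;
    endB = endA;
    for (const Pixel& p : blob)
        if (dist2(p, endA) > dist2(endB, endA)) endB = p;
    return true;
}
```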

Now we have the 2D positions of both ends of the lightsabre. If I stopped there, that would be just as far as the RealityFusion guys go. As far as I can tell, they only use 2D images for 2D control.

The core research I developed for a "Virtual 3-D Blackboard" notes that if you move your elbow around your shoulder, your elbow sweeps out a sphere of constant radius centred on your shoulder. If I ignore the half-space behind your body plane, then by looking at you head-on I can reconstruct the 3D position of your elbow from its 2D position on that hemisphere. We do the same for your finger around your elbow, and thus we have an approximation of the parametric state of your arm.
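The geometry behind this is just the Pythagorean theorem. As a hypothetical sketch (assuming a roughly orthographic view at the fixed working distance, which is my simplification, not a detail from the interview): if the observed 2D point lies within a circle of radius r around the joint, its depth relative to that joint is the remaining leg of the right triangle.

```cpp
#include <cmath>

// Depth of a point constrained to a hemisphere of radius r centred at
// (cx, cy), given its observed image position (x, y).  Keeps only the
// half-space in front of the body plane, as described above.
static bool depthOnHemisphere(double x, double y,
                              double cx, double cy, double r,
                              double& z)
{
    double dx = x - cx, dy = y - cy;
    double d2 = r * r - dx * dx - dy * dy;
    if (d2 < 0.0)
        return false;        // observed point lies outside the circle
    z = std::sqrt(d2);       // choose the front (towards-camera) solution
    return true;
}
```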

The key prior we enforce to make this work is that the user must face the camera at a certain distance. For a lightsabre fighting game this works perfectly: the player always faces the camera, and instead of tracking his arm we track the lightsabre, whose length is constant. Thus we have 4 degrees of freedom -- 2 translational and 2 rotational.
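A sketch of how those 4 degrees of freedom might be recovered from the two tracked endpoints. It assumes we already know which detected end is the hand end and the blade's apparent length L in pixels at the working distance; both are assumptions of mine, not details from the interview, and the angle conventions are arbitrary.

```cpp
#include <cmath>

struct SabreState {
    double x, y;        // 2 translational DOF: hand-end position in the image plane
    double yaw, pitch;  // 2 rotational DOF: blade direction (radians)
};

// Recover the 4-DOF sabre state from the two tracked 2D endpoints, using the
// same hemisphere trick: the tip's depth relative to the hand end is
// sqrt(L^2 - dx^2 - dy^2), keeping the solution in front of the player.
static bool sabreStateFromEndpoints(double handX, double handY,
                                    double tipX,  double tipY,
                                    double L, SabreState& s)
{
    double dx = tipX - handX, dy = tipY - handY;
    double d2 = L * L - dx * dx - dy * dy;
    if (d2 < 0.0)
        return false;                    // inconsistent with the assumed length
    double dz = std::sqrt(d2);           // depth component of the blade vector

    s.x     = handX;
    s.y     = handY;
    s.yaw   = std::atan2(dx, dz);        // rotation about the vertical axis
    s.pitch = std::atan2(-dy, std::sqrt(dx * dx + dz * dz)); // up/down tilt
                                         // (image y grows downward, hence -dy)
    return true;
}
```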




     Is the "translation" to screen fast enough for real-time games?

John Carmack ran into the same delay problems I ran into. Webcams are designed for video conferencing, not for gaming, so there tends to be up to a second's delay between the player's action and the webcam input reaching the program. In practice, this is not too bad for lightsabre combat. The vision algorithms themselves are quite simple and don't involve large matrices or neural nets.




How easy would it be to take what you've done and apply it to other games, or other fields altogether?


The RealityFusion guys have used computer vision technology for 2D input. My research focused on 3D input, and thus can be generalized to other gaming applications.



What do you plan to do from here with this technology? Further, what do you think computer vision research in general has in store for the gaming industry?


I'll try to work on it more, but I'm a student, so classwork sucks up a lot of my time. Eventually I may end up working in the gaming industry in some fashion.

Besides this sort of "3D gesture interface", there are other promising research areas in computer vision. John Carmack of id Software seems interested in (head/eye) pose estimation, but researchers have pointed out that our eyes are an input device, not an output device.

I'm most interested in the virtual control possibilities of computer vision. Keyboards are great for typing and mice are great for pointing, but our bodies are already well-tuned I/O devices. Computer vision is a cost-effective way of polling that device without special hardware.




A friend wants to know: why didn't you make the opponent Jar Jar Binks? :)


I don't hate Jar Jar that much :). Seriously, the opponent was based on a character named Shinomori Aoshi from the anime series Rurouni Kenshin. During the game, the computer opponent will even scream the names of his Japanese fighting techniques at you.



Any other comments you'd like to share on life, the universe, or breakfast foods?


"People get annoyed when you try to debug them." -- Larry Wall, 2nd State of the Onion.

"A + B + C = Success if, A = Hard Work, B = Hard Play, C = Keeping your mouth shut." -- Albert Einstein

Reference:
Andrew Wu, Mubarak Shah, and N. da Vitoria Lobo, "A Virtual 3D Blackboard: 3D Finger Tracking with a Single Camera," Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, 2000.




For more information about this project, visit the Project Earthlight web site.

Thanks again, Andrew!



