"Rendering Synthetic Objects Into Legacy Photographs" demonstrates a new technology for adding three-dimensional shapes to existing photographs. A human user roughly marks out the geometry of the space and its light sources, and the software then predicts how light would fall on and around the inserted object. In the accompanying video, the result looks so realistic it's almost spooky. Kevin Karsch, a computer science PhD student at the University of Illinois at Urbana-Champaign, explains how it works in the interview below. The other members of the team are Varsha Hedau, David Forsyth, and Derek Hoiem.
The Atlantic: What drew you to this project? What are your goals for this technology?
Kevin Karsch: Going into graduate school, I knew I wanted to do a mix of computer graphics and computer vision research. By computer vision, I mean teaching a computer to interpret a picture or video the same way people do, for example by being able to tell how far away things are, where light is coming from, and so on. This was a perfect project for satisfying both of my interests. My colleagues and I were able to incorporate many recent breakthroughs in these fields and apply them to address an interesting research problem.
We hope that this technique will become standard in image editing and 3D modeling software. Based on comments from several visual effects experts, this method could significantly cut down on the time artists and editors spend on this kind of work. We believe it may also be useful for home redecorating and augmented reality.
There's still a lot of work left to do, and we have many ideas for the future. Right now, our technique requires a bit of user input and has only been demonstrated on single images. We'd like to extend the method to handle video and to eliminate most, if not all, user interaction. It would also be nice to let the system insert objects taken from one picture into another picture without requiring a 3D model of the object, which our current method needs.
What are some of the challenges of inferring light and surface attributes in a flat image?
Since we're working with a single picture, we only have one view of the scene. There are likely to be many different configurations of lighting and surface properties that would produce the same image. It's currently very difficult to automatically predict even one of these configurations, much less the right one. It's well known that people can do a decent job of solving this problem, but it's still unclear exactly which perceptual cues they use to make these judgments. However, people usually cannot get quantities such as surface reflectance and light intensity exactly right. So we build on both observations: a human user provides a very rough physical configuration of the scene, and we use modern research techniques to refine that configuration until it is suitable for inserting synthetic objects.
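To make the ambiguity concrete, here is a minimal sketch that assumes a simple Lambertian (matte) shading model. This is an illustrative choice, not necessarily the model the researchers use. Under this assumption, a pixel's brightness is the surface reflectance (albedo) times the light intensity times the cosine of the angle between the surface normal and the light direction. The function name `lambertian_pixel` is hypothetical. Three quite different scenes, a dark surface under a bright slanted light, a light surface under a dimmer slanted light, and a light surface lit from directly overhead, all yield the same pixel value.

```python
import math

# Hypothetical helper, assuming a Lambertian (perfectly matte) surface lit by
# one distant light. Real photographs also involve shadows, shiny materials,
# and light bouncing between surfaces, which this toy model ignores.
def lambertian_pixel(albedo, light_intensity, normal, light_dir):
    """Brightness of one surface point: albedo * intensity * cos(angle)."""
    def unit(v):
        # Scale a 3-vector to length 1 so the dot product is a cosine.
        length = math.sqrt(sum(c * c for c in v))
        return tuple(c / length for c in v)

    n, l = unit(normal), unit(light_dir)
    # Points facing away from the light receive no direct light.
    cos_theta = max(0.0, sum(a * b for a, b in zip(n, l)))
    return albedo * light_intensity * cos_theta

normal = (0.0, 0.0, 1.0)  # a surface facing straight up

# Scene A: dark surface, bright light at 45 degrees.
a = lambertian_pixel(0.25, 4.0, normal, (0.0, 1.0, 1.0))
# Scene B: lighter surface, half as bright a light, same direction.
b = lambertian_pixel(0.5, 2.0, normal, (0.0, 1.0, 1.0))
# Scene C: lighter surface, weaker light directly overhead.
c = lambertian_pixel(0.5, math.sqrt(2.0), normal, (0.0, 0.0, 1.0))

print(round(a, 4), round(b, 4), round(c, 4))  # 0.7071 0.7071 0.7071
```

A single photograph records only the resulting brightness, so nothing in that pixel alone distinguishes scene A from B or C. This is why a rough physical configuration supplied by a person is such a useful starting point for the refinement step.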
The implications for photography are pretty amazing. Do you think this technology will change how we think about “truth” in photography?
Well, there already seems to be a great deal of doubt about photography, and this certainly won't help. After hearing about our project, several people have told me that they "will never trust a photograph again." Misuse of the technology is of course possible, but we hope it will be used to save a great deal of time and money for visual effects artists, architects, and interior designers, among others.
Where do you see this technology in ten years?
Both computer vision and computer graphics have come a long way in the past ten years, and I imagine the same will be true for the next ten. In the near future, we expect people to interact with photographs and videos as they would with a 3D scene, adding, repositioning, and relighting objects without cumbersome pixel-based tools. Designers will have the flexibility of computer graphics with the realism of photography. Many other applications could emerge as the technology becomes faster and more accurate. For example, augmented reality could become even more immersive than it already is, and redecorating a home or buying furniture could become painless. Given recent trends in computer vision and computer graphics, coupled with the growing prevalence of depth-sensing cameras (e.g. Microsoft's Kinect), I have a feeling that we will see these applications sooner rather than later.
For more background on the project, visit http://kevinkarsch.com/.