About the Project

The future has always been hard to predict, but as technology evolves ever more quickly, it can now be just as hard to imagine. To understand the next wave of innovations—such as nanotechnology in our bodies, computers that can see like humans, and the Internet of Things in the age of scarcity—Qualcomm collaborated with Atlantic Re:think, The Atlantic’s creative marketing group, to explore the emerging edge of technology through art.

Part 2

THE SPACE
WITHOUT

The emergence of better cameras and smarter computers is giving machines the ability to see like humans—recognizing what it is they’re seeing and reacting to it. These breakthroughs are empowering both people, like the visually impaired, and machines, like self-driving cars, to see the world like never before.

How Computers
See The World
For Those Who Can't

The technology behind self-driving cars helps the visually impaired engage with the world like never before.

THE SPACE WITHOUT Visualized by Jeff Nishinaka

Standing behind a podium at MIT’s Center for Brains, Minds and Machines, Amnon Shashua began a speech that wouldn’t have been out of place at the hyperbole-happy TechCrunch Disrupt conference. In front of an audience of neuroscientists and computer scientists, however, he recognized that a speech titled “Computer Vision That Is Changing Our World,” which suggests that our future is in large part being built by cameras, needed a reality check up front.

“All smartphones have cameras,” Shashua acknowledged at the event in March. “But you cannot state that what the cameras on smartphones do today will change our world.

“I am going to talk about something else related to cameras.”

Shashua is one of the world’s leading experts in the field of computer vision—the ability of computers to process what they see in the same way humans do. We already use early forms of computer vision when our smartphones identify faces and stitch together panoramic photos, but beyond that, they don’t act very much on what they see.

Paper sculptor Jeff Nishinaka, a Los Angeles-based artist, collaborated with technologists and journalists to imagine the future of cognitive technology through art.

The cameras Shashua is building, however, are what allow him to zip along Israeli highways in a customized self-driving car. In partnerships with Hyundai and Tesla, among other auto companies, they empower cars already on American roads to detect what’s around them, calculate the chances of a collision, and automatically brake if an accident appears inevitable. These smart cameras are the core technology of Mobileye, the company Shashua cofounded with the ambition of eliminating car accidents within 20 years.

Considering that 1.3 million people are killed on the world’s roads each year—37,000 in the United States alone—Mobileye is an example of what Shashua means when he says computer vision is changing the world. But seemingly aware that just one example of the technology’s abilities, however impressive, may not be enough to convince people of its potential, the man who endowed self-driving cars with sight turned the focus of his speech to those who cannot see.

“In the U.S., there are about 25 million visually impaired [people],” he said. “Corrective lenses cannot correct their disability. They cannot read anymore. They cannot negotiate in the outdoors. […] And this segment of society doesn’t have real technology to help them.”

Shashua realized that if cameras could see for cars, they ought to be able to see for people too. In 2010, he cofounded OrCam with Ziv Aviram, Mobileye’s CEO, to develop a smart camera that clips onto glasses and reads the world to the visually impaired, from restaurant menus to crosswalk signals.

But OrCam is just one of a growing gaggle of companies and researchers pushing the boundaries of artificial intelligence and finding themselves creating an ecosystem of technologies that, among other things, will soon allow the visually impaired to engage with the world like never before. As computer vision continues to evolve, cameras can now look at images and translate them into plain language descriptions. Soon, 3D mapping technology will guide people through unfamiliar environments like digital guide dogs. Meanwhile, technology companies industry-wide are in an innovation arms race to imbue smartphones with the processing power they need so anyone can access all these features by simply reaching into their pockets.

Though most of these technologies are in their beta stages, their continued evolution and eventual convergence seem likely to result in, among other feats, unprecedented independence for the visually impaired.

Back at the Center for Brains, Minds and Machines, Shashua played a clip from a conference last summer, where Devorah Shaltiel, one of OrCam’s users, took the stage. Wearing sunglasses with a camera clipped on by her right temple, she described the experience of eating lunch with a friend.

“Usually when we sit down, we would be presented with the menu,” she began. “My friend would then read the menu, place her order, and then she would read the menu to me. […] This time was very different. I had OrCam with me, and I was able to read the menu myself. I was able to place my order. […] I was able to continue my conversation with my friend, without my friend being focused on my disability.

“For the first time since losing my sight, I was able to feel like a normal person.”

We tend to overlook how complex a process it is to see things. Look around you at any moment and you’ll see an endless stream of objects, people, and events that you may or may not recognize. A coffee-stained mug on your desk. A jogger plodding down the sidewalk. The headlights of a car in the distance.

Our eyes are regularly bombarded with new and often unfamiliar information. Yet for most people, sight is a pop quiz that’s almost impossible to fail.

The technologies that will help the visually impaired aren’t quite at a human level yet, but through a sophisticated barrage of trial and error, they’re catching up. Earlier this year, when one of today’s leading seeing computers was shown an image of a man eating a sandwich on the street, it described the scene as, “A man is talking on his cellphone as another man watches.”

The mistake, silly as it may seem to us, is a major leap forward for technology. Just to get to the point where a computer could confuse a sandwich with a cellphone, the computer had to understand that a cellphone is a handheld object that people typically hold close to their faces. The process of learning those kinds of patterns—whether it’s what a cellphone is or what a face looks like—is known in the artificial intelligence community as deep machine learning, or deep learning.

It’s Time for Smartphones to Think for Themselves

How Qualcomm is
bringing humanlike cognition
to mobile devices

Since Alan Turing’s 1950 seminal paper Computing Machinery and Intelligence asked, “Can machines think?” filmmakers have fascinated us by imagining a world in which they do.

Back in the 1960s, the creators of “The Jetsons” imagined a playful world where Rosie the Robot cleaned our houses, washed our dishes, and played ball with our kids. More recently, Hollywood dreamed up the Samantha character for the movie “Her”—not a robot, but software created to do her owner’s bidding, from managing calendars to providing emotional support in darker times. Samantha even had the capacity to learn, adapt, and evolve. She was indistinguishable from a human being, other than that she happened to be software within a pocket-sized, supersmart smartphone.

Last year, Benedict Cumberbatch reintroduced Turing’s famous question to the public with “The Imitation Game.” Months later, we wrestled with it again in “Ex Machina,” which centers on an interrogation with a robot to see if it can, in fact, think for itself.

Hollywood’s visions of artificial intelligence are still, in many ways, a fantasy. But in recent years, we’ve seen the technology take early strides toward making these visions a reality. Breakthroughs in cognitive computing—an industry term for technologies such as machine learning, computer vision, and always-on sensing—are rewiring our smartphones to become capable of sensing like humans do, evolving beyond call-and-response technologies such as Siri and Cortana to a more sophisticated interplay between machines and their users.

Powered by the rapidly evolving field of cognitive computing, the devices we use every day will soon be able to see, hear, and process what’s around them—enabling them to do more for us, while requiring less from us.

At Qualcomm, through its cognitive computing research initiative, researchers are leading the field of machine learning to make these ambitions a reality. A branch of machine learning called deep learning is demonstrating state-of-the-art results in pattern-matching tasks. This makes a deep learning–based approach ideal for giving our devices humanlike perceptual pattern-matching capabilities.

With every word we speak to our devices, machine learning will help these machines better comprehend the quirks of our speech, and with every route we travel, they’ll better understand the places that matter most to us. As our devices passively gather more data from us, they’ll perfect the recommendations they make, the content they send us, and the functions they perform until they’re our own brilliant and futuristic personal assistants.

“We’re trying to basically mimic what humans do,” says Maged Zaki, director of technical marketing at Qualcomm. “We’re […] trying to give them sight, and we’re trying to give them ears and ways to sense the environment and feel and touch all that, basically all the senses that we as human beings have.”

One of the biggest challenges for Qualcomm’s team was how to harness the elaborate processing power required by deep learning and shrink it down onto a pocket-sized device.

“[Today’s] machine learning is very compute intensive,” explains Zaki. “It basically entails big servers on the cloud, and running the algorithms and training the machines days and days on the network to be able to recognize images.”

Putting forms of deep learning onto a phone requires not just a firm grasp of deep learning itself but a knack for working in tight spaces. Qualcomm’s innovation has unlocked the way to put these power- and compute-intensive features completely on a chip within a smartphone. As a result, phones will no longer need to completely rely on the cloud to outsource all their most daunting computing, which drains today’s phone batteries and pushes phones to their technical limits.

Machines that think like humans are still in their adolescence—recently, one of the most powerful artificial intelligence machines mistook Facebook CEO Mark Zuckerberg for a cardigan—but on-device machine learning will begin to push computer intelligence out of its awkward phase in the coming years. Our interactions with our devices will become far more natural: We’ll eschew keyboards in favor of commands based on voice, gesture, or vision that work reliably.

The idea of always-on devices, able to listen to us and watch our every move, can send even the most tech-savvy person into a state of paranoia. But Zaki says that locked within these very same advances in cognitive computing are the solutions to better protecting our security and privacy in an increasingly connected world. “Instead of being scared of machine learning and having so many sensors on the device, we would like to use these technologies to actually enhance privacy and security,” he says.

A phone with humanlike “awareness” would notice suspicious activity, such as malware infiltrating our contact lists or credit card information, even when we’re not using the phone, and it would alert us—or automatically stop this from happening altogether. Zaki believes that machine learning will also make security and authentication far more convenient, as phones could use background verification of our fingerprints as we type, for example. “Our vision is that authentication should be happening in the background continuously and seamlessly,” he notes.

Soon enough, our smartphones will truly be extensions of ourselves. We won’t always have to tell them what to do, as they’ll know our schedules, our desires, our needs, our anxieties. These are thinking machines that “complement us, not replace us, on everyday tasks,” says Zaki. “They’ll expand the human ability and serve as an extension of our five senses.”



Learn more about Qualcomm.

Much like teaching a child without any frame of reference, deep learning involves researchers feeding computers massive amounts of data, from which they start to form what we might call understanding. Mobileye’s cars, for example, were trained to navigate the intricacies of moving traffic. OrCam was taught to read. IBM’s Watson has been fed the world’s leading cancer research by oncologists at the Memorial Sloan Kettering Cancer Center. Meanwhile, companies such as Google, Facebook, and Amazon are exploring how to use deep learning in their quests for hyper-personalized experiences.

Deep learning allows computers to act independently on learned knowledge and get smarter as they encounter more of it. In the case of computer vision, it allows them to recognize what they’re seeing.

The sandwich-cellphone gaffe was published in a paper by researchers in Canada who displayed images to a computer and asked it to tell them what it saw in plain English or, in AI terms, natural language. They tracked the way the computer studied an image, where it looked and when, as it translated the image into a description in real time. That ability—to coherently describe things as we see them—has long been a holy grail of sorts for computer vision scientists.

A computer accurately describes images with captions.
Source: Microsoft COCO/Kelvin Xu et al.

“It amounts to mimicking the remarkable human ability to compress huge amounts of salient visual information into descriptive language,” the paper’s authors wrote.

The computer accurately described pictures of “A giraffe standing in a forest with trees in the background” and “A group of people sitting on a boat in the water.” It identified “A little girl is sitting on a bed with a teddy bear,” though she was actually on a couch. It also mistook a violin for a skateboard.

“There are analogies to be made between what the model is doing and what brains are doing,” said Richard Zemel, a computer scientist at the University of Toronto who contributed to the study. But unlike brains, deep learning computers don’t make the same mistake twice. They learn, but they never forget.

That superpower is behind a recent article in Re/code profiling Facebook’s, Google’s, and the rest of the technology industry’s growing investment in the field. “AI experts suggest that deep learning could soon be the backbone of many tech products that we use every single day,” wrote Mark Bergen and Kurt Wagner.

That vision of the technology isn’t ready for primetime yet, and its applications are still being developed on Silicon Valley research campuses. But its potential to support the visually impaired is obvious to OrCam’s Shashua.

“Assume you are out there, you are outdoors,” he said in his speech. “You have lost orientation completely. You want the system to tell me what I see. Every frame, every second, tell me what I see. I see a tree. I see a chair. I see people. […] This is something that is at the cutting edge of research today. This is something that can be done.”

Computer vision has already conquered text. It’s even outperformed humans at identifying images. But matching human sight means translating the amorphous visual field of life as it happens in real time, and that is the next frontier.

Amnon Shashua believes computer vision is changing our world.

In 2014, Google debuted something called Project Tango at its annual I/O conference in San Francisco. It promised to give “a mobile device the ability to navigate the physical world similar to how we do as humans.” Should Google deliver on this promise, it would be a major leap in the evolution of computer vision and a significant step toward fulfilling Shashua’s claim that it will change our world.

Navigating like humans requires a spatial awareness that, as with vision, we take for granted. When we walk up a staircase in an unfamiliar building, then turn a corner and enter a room for the first time, we know how to get back downstairs. Without thinking about it, we automatically mapped out the space as we passed through it, and, until recently, computers could do no such thing.

The skill is known as simultaneous localization and mapping, or SLAM. “Researchers in artificial intelligence have long been fascinated (some would say obsessed) with the problem,” wrote Erik Brynjolfsson and Andrew McAfee in their futuristic classic The Second Machine Age. In 2010, Microsoft scientists cracked the SLAM code, ushering in the flood of robotics innovation of the last several years.

Project Tango has emerged as a frontrunner in bringing SLAM technology into our day-to-day lives. Through partnerships with Nvidia and Qualcomm, Google researchers have developed tablets and smartphones with sensors that work with a camera to navigate and manipulate the space around them. Using the technology, drones can now explore the interior of buildings autonomously, while gamers can transform their rooms into virtual reality forests and their friends into floating heads.

“We never know whether [Tango and similar projects] even make viable business applications,” says Johnny Lee, who leads the Tango team at Google, “but we want to push the technology at times because you don’t know what’s possible on the other side.”

One application that’s consistently mentioned in the press around Tango is assisting the visually impaired. Though such a project hasn’t been described in detail, Google’s partnership with Walgreens and a mobile shopping company called Aisle411 offers a clue to how it might work.

As shoppers navigate the aisles of Walgreens, a Tango-powered tablet can track their movement within centimeters to identify where specific products are located within the store and on the shelves. Whether all shoppers want or need this sort of hyper-customized in-store guidance is up for debate, but for the visually impaired, it could transform their lives outside the home.

Devorah Shaltiel, the OrCam user who spoke at the conference, described the difficulties she faces at the grocery store. From even just a few feet away, a ketchup bottle would look like “a fuzzy red blob,” and reading the labels on similar-looking items—such as different types of cookies from the same brand—would be all but impossible. Today, once she finds the item she’s looking for, OrCam solves the latter problem. But combining that technology with Project Tango could help Shaltiel find the items more efficiently and independently to begin with.

For the visually impaired, independence is what matters. OrCam, which has the tagline “OrCam Gives Independence,” launched to the public last year and has a waiting list of about 10,000 people for a $3,500 device. In his speech, Shashua played emotional videos of people who have had their lives transformed simply by being able to open their own mail, read a book, and identify the value of a dollar bill they’re holding.

“We are on the right track,” Shashua said at the end of his seminar.

That track, he suggests, leads to the continued convergence of deep learning and computer vision, where devices can not only translate images into plain English, but also examine and describe the physical world around us.

While most of these technologies are in their research lab beta phases, they’ll emerge eventually. And when they do, they’re likely to do so in force—with applications that go far beyond helping the visually impaired.

“Nowadays, you wouldn’t consider buying a phone without GPS,” Google’s Lee says. “We hope to see Tango kind of reach the same level of adoption.”

“The Space Without” by Jeff Nishinaka, cut paper, 2015

THE SPACE WITHOUT

Imagining A World Where
Everything Is Communicative

All the tools of paper artist Jeff Nishinaka’s trade can fit in his back pocket. He doesn’t need more than a pencil, a pair of tweezers, an X-Acto knife, some glue, and a sheet of paper to create his trademark paper sculptures. Nishinaka has been working with paper for more than 30 years, and his work appears frequently in art galleries, major ad campaigns, and art publications.

He created “The Space Without” as a textured, elaborate, hand-cut paper representation of the neural networks through which we perceive sights, sounds, and touch. Symbols—an eyeball, a fingerprint, sound waves—are scattered around the silhouette of a head. “By not making it so literal, I think we give someone the possibility to stop and think, ‘What is this really trying to communicate?’” Nishinaka says. “Of course, we’re trying to communicate not the five senses but the more important senses that I believe Qualcomm has been developing to enhance a person’s everyday life.”

Advances in computing technology are allowing machines to think and communicate in exactly the ways Nishinaka depicts in “The Space Without,” which is why the paper sculpture embeds a colorful human brain in a larger circuit board. “The important aspect of this piece,” he says, “was that technology is connected to humanity and to enjoying nature through things that were lost either through time, accident, or age.”

Paper artist Jeff Nishinaka painstakingly cuts every individual piece of paper by hand. He estimates that “The Space Without,” with its elaborate layers and minute attention to detail, took approximately 120 hours to complete.

Technology and humanity blend together seamlessly in “The Space Without,” in part because the paper medium lends similar textures and shadows to every aspect of the piece. It’s also because Nishinaka goes to painstaking lengths to make every cut and every furrow by hand: According to his website, his hands are “his most treasured tools.” That craftsmanship is apparent in the polished detail of the cutout paper flowers that fill the human silhouette, the strands of paper that come together to form the brain, and the icons and lines that branch around the silhouette.

“The Space Without” is both natural and technological, both realistic and imaginary. For Nishinaka, that’s a perfect reflection of his artistic approach. “Being an artist, you have to be open to things that are different, open to change, open to things you think in your wildest dreams would never happen,” he says. “It’s amazing that technology is making these dreams come true.”

The final version of “The Space Without,” viewed in person, responds to the light, as shadows add emphasis and additional form to the three-dimensional sculpture.