A computer keyboard, at its most reliable, is barely noticeable to the person using it. It is designed to be felt, heard, and mostly ignored. Once you’re comfortable with the thing, you usually aren’t focused on the keys you’re pressing, or the gentle clack-clacking sound those keys make. Instead, you’re immersed in whatever is happening on the screen.
Technology has always been a go-between in this way, a means by which humans relate to their environment. In the beginning, when people couldn’t use their bodies to best complete a job, they built a tool—an axe, a hammer, a pulley—to do it for them. For millennia, machines handled increasingly complicated physical work, but humans still carried most of the burden of information processing. Within the last century, that changed dramatically.
With computers that could handle tasks like code breaking and complex calculations, the balance shifted. Machines that could handle data required more sophisticated interfaces beyond simple levers to pull or wheels to turn. Yet the same interfaces that would enable a new level of machine-human coordination on informational tasks would also remain limited by their own design.
“It created an information-processing problem,” says Gerwin Schalk, the deputy director of the National Center for Adaptive Neurotechnologies. “So in our interaction with the environment, all of the sudden, it wasn’t just humans and some tool, there was something in between—and that something was a computer.”
The problem, as Schalk sees it, is that humans and computers are both able to do far more than the interface between them allows.
“Computers are very fast, they’re extremely sophisticated, and they can process data of gargantuan complexity in just a fraction of a second,” he told me. “Humans are very good at other things. They can look at a scene and immediately know what’s going on. They can establish complex relationships. The issue that is now emerging is an issue of communication. So the underlying problem and question is, how do humans, who are extremely powerful and complex, interact with their increasingly complex and capable environments? Robots are extremely complex. Computers are extremely complex. Our cellphones are extremely complex.”
At this point in technological history, interfaces are built so that computers can do as much as possible within the limitations of a human’s sensory motor systems. Given what many people use computers for, this arrangement works out well—great, even. Most of the time, people are reading, writing text, and looking at or clicking on pictures and video. “For that, keyboards and mice—and trackpads, and to a lesser extent, voice control, which I think is still not so ubiquitous due to its relative unreliability—are still cheap, robust, and well-suited to the task,” says Kaijen Hsiao, a roboticist and the CTO of Mayfield Robotics, located just south of San Francisco. For others though, traditional interfaces aren’t enough.
“If I’m trying to explain to a computer some complex plan, intent, or perception that I have in my brain, we cannot do that,” Schalk says.
Put simply, it’s a communication issue that’s even more challenging than human-to-human communication—which is itself complex and multi-faceted. There’s always some degree of translation that happens in communicating with another person. But the extra steps required for communicating with a machine verge on prohibitively clunky.
“And when you’re trying to explain that same thing to a computer, or to a robot, you have to take this vivid imagery [from your head], and you have to translate this into syntactic and semantic speech, thereby already losing a lot of the vividness and context,” Schalk says. “And then you’re taking the speech and you’re actually translating this into finger movements, typing those sentences on a computer keyboard. It’s completely ridiculous if you think about it.”
On a practical level, for most people, this ridiculousness isn’t apparent. You have to write an email, you use your keyboard to type it. Simple.
“But if you just, on a very high level, think about how pathetic our interaction with the environment has become, compared with where it used to be, well, that’s a problem, and in fact that problem can be quantified,” Schalk says. “Any form of human communication doesn’t really [travel at] more than 50 bits per second—that’s either perceived speech or typing. So that’s basically the maximum rate at which a human can transmit information to an external piece of technology. And 50 bits per second is not just inadequate. It is completely, grossly pathetic. When you think about how many gigabytes per second a computer can process internally and what the brain can process internally, it’s a mismatch of many, many, many orders of magnitude.”
This mismatch becomes even more pronounced as machines get more sophisticated. So much so, several roboticists told me, that a failure to improve existing interfaces will ultimately stall advances in fields like machine learning and artificial intelligence. “As technologies like speech recognition, natural language processing, facial recognition, etcetera, get better, it makes sense that our communication with machines should go beyond screens and involve some of the more subtle forms of communication we use when interacting with other people,” says Kate Darling, who specializes in robot-human interaction at the Massachusetts Institute of Technology. “If we want a machine to be able to mimic states of human emotion, then having it express those through tone, movement, and other cues will be a fuller representation of its abilities.”
Such cues will have to be part of a larger fluid interaction to work best. That might mean, for instance, making sure to build subtle forms of communications for robots designed to work with a pilot in a cockpit or a surgeon in an operating room—settings where humans need to be able to predict what a robot is about to do, but still stay focused on what they’re doing themselves. “There are all these ways people are working alongside a robot, and they need to understand when a robot’s about to move,” says Missy Cummings, the head of Duke University’s Robotics Lab. “[With other humans,] we use our peripheral vision and we see slight motion, so we infer, but robots don’t have those same fluid motions. So we’re trying to figure out how to use a combination of lights and sounds, for example, to figure out how to communicate more nuanced interactions.”
In some settings, like when a person is driving and needs to pay attention to the road, voice communication is still the best interface. “Of course, the problem with that is voice-recognition systems are still not good enough,” Cummings says. “I’m not sure voice recognition systems ever will get to the place where they’re going to recognize context. And context is the art of conversation.”
There is already a huge effort underway to improve voice-based interfaces, and it’s rooted in the idea that digital assistants like Siri and devices like the Amazon Echo will take on increasingly prominent roles in people’s lives. At the same time, we’re likely to see improvements to other mediative interfaces.
This already happened to some extent with the touch screen, an interface that was long dismissed as worthless on popular gadgets because the technology really wasn’t very good. “Touch screen buttons?” one commenter wrote on the website Engadget as the iPhone was unveiled in 2007, “BAD idea. This thing will never work.” (Schalk calls the iPhone a “profound advance” in human-machine interaction, but also just a “mitigating strategy.”) So far, though, other interfaces—voice control, handwriting digitizers, motion control, and so on—haven’t really taken off.
Many technologists argue that the rise of augmented reality and virtual reality will produce the next big interface. But several engineers and scholars told me that such a leap will require technological advancement that just isn’t there yet.
For one thing, even the most sophisticated mixed-reality platforms—Microsoft’s HoloLens comes up a lot—aren’t precise enough in terms of their mapping of the real world in real time, as a user moves through it. Which means these sorts of systems are handy for projecting webpages or other virtual elements onto the walls of the room you’re in, but they’re nowhere near able to do something revolutionary enough to fundamentally change the way people think about communicating with machines.
One of the key questions for developers of these systems is to figure out to what extent—and at what times—the non-virtual world matters to people. In other words, how much of the physical world around you needs to be visible, if any of it? For a conference call, for instance, augmented reality is far preferable to virtual reality, says Blair MacIntyre, a professor in the School of Interactive Computing at the Georgia Institute of Technology. “You totally wouldn’t want just VR version of that because maybe I need to look at my notes, or type something on my computer, or just pick up my coffee cup without knocking it over.”
This is an issue MacIntyre likes to call “the beer problem,” as in the need to pause for a sip of your beer while you’re playing video games. “In VR that becomes hard,” he says, “whereas in AR that becomes a little easier.” Eventually, he says, augmented reality really will be able to track smaller objects and augment smaller pieces of the world, which will make its applications and interface more sophisticated. Displays will become more clear. Checking the status of a flight at the airport, for instance, could mean using mixed-reality to look up in one’s field of vision—rather than searching for information by smartphone or finding the physical display board in the airport.
“But I think we still need the keyboards and touch screens for the input that requires them, honestly,” he says. “Haptic feedback is super important. I’ve done the touch-typing on the HoloLens, with the virtual keyboard in midair. They don’t work well, right? Because by the time you see the visual or hear the auditory feedback of your finger hitting it, you end up consciously having to control the typing of your hand.”
Eventually, he says, the motions that have become normalized to smartphone users may translate to the augmented reality sphere. This is the sort of interface that many people associate with the film Minority Report, in which a series of hand gestures can be used to orchestrate complex computing tasks. “I have this vision of the future of HoloLens,” MacIntyre says, “where maybe I still have my phone or a small tablet I can use for very precise interaction, and then—when the visual tracking gets good enough—I’m doing pinch-zoom or drag-to-rotate in the air if it can really precisely track my fingers. But it has to be better than it is now. And I think that stuff will get better.”
Better interfaces don’t just have to work, technically, though. They also have to delight users. This was arguably one of the iPhone’s greatest triumphs: the fact that the device—sleek, original, and frankly gorgeous—made people want to interact with it. That made the iPhone feel intuitive, although an engaging interface is arguably more important than an intuitive one. There’s an entire research community devoted to gesture-based interfaces, for instance, even though the use of gestures this way isn’t really intuitive. This isn’t necessarily a good thing, Cummings, the Duke roboticist, told me. Humans are accustomed to gesturing as a way of emphasizing something, but with the exception of people who use their hands for speaking sign language, “How much do we actually do by gestures?” she says. “And then it actually increases your mental workload because you have to remember what all the different signals mean. We get led by bright, shiny objects down some rabbit holes.”
Then again, it’s not as though keyboards are particularly intuitive. They may be second-nature to many people now, but it wasn’t always this way. “Even still today, if you think about it, you’re using a horizontal pointing device with the mouse on a vertical screen,” says Michael Clamann, another roboticist at Duke. “So already there’s this weird translation. Then you’ve got touch screens, which fix that a little bit, but the resolution is limited to the width of your finger, and every time you touch the screen you’re covering up part of it. Even though we’ve gotten better, there’s still limitations with all these interfaces.”
“The problem with computers is that they are still this go-between the person and the work they’re trying to do,” he added.
How, exactly, are we supposed to take the computer out of the equation? Augmented reality, once it hits certain technological milestones, begins to solve this problem. But there are far more radical ideas, too. Back at Gerwin Schalk’s laboratory, in Albany, New York, scientists are devoted to developing interfaces for direct brain-to-computer communication. One of their main challenges is to develop better ways to measure what the brain is doing, in real time, as a way to make sense of what that means. Schalk and his colleagues have already demonstrated that deriving meaning from brain activity is theoretically possible, an idea that has astonishing implications for the future of human-machine interaction.
“If you could somehow interface with a computer directly, bypassing all the constraints of your sensory motor capacities, you could make all the feelings and perceptions and desires of a person directly accessible to technology,” he says. “You could completely eliminate this communication bottleneck and essentially create a symbiotic relationship between technology and the brain.”
There’s reason to believe that this could someday be possible. “We can already translate brain signals into fully written, complete sentences,” he said. “By looking at brain signals, we can tell you whether you speak a word or whether you imagine speaking a word. This isn’t science fiction. This is published, peer-reviewed work.”
If some new interface is soon to transmogrify the way we talk to machines, however, it’s not likely to be a direct human-brain connection. Not yet, anyway. For now, Schalk and his colleagues work with patients who, for clinical reasons, already have electrodes implanted on their brains. And even then, their work is error-prone. As Daniel Engber put it in a deeply reported Wired story about brain hacking earlier this year, Schalk’s methods are “at best, a proof of concept.” That’s essentially how Schalk described them to me, too. What he’s waiting for, he says, is better technology to allow for precise measurements of brain activity that wouldn’t require invasive brain implants. If that were to happen, he says, “we would be very close to our world being turned upside down.”
“If you could do that, this would have implications that would far exceed anything we as humans have done before,” he added. You could make a computer do your bidding, with a single thought. You could pilot a swarm of drones, with mere intentions. The potential implications are mind-blowing and not just a little creepy. “It would far exceed what humans have done with technology to society. Not only would society transform—both society and what it means to be human would change.”