There’s something special about Amazon’s Alexa: its voice. “To understand the forces being marshaled to pull us away from screens and push us toward voices, you have to know something about the psychology of voice. For one thing, voices create intimacy,” Judith Shulevitz writes in The Atlantic’s November cover story. As disembodied voices take on an ever-greater role in our lives, through the spread of smart speakers like the Amazon Echo and virtual assistants like Apple’s Siri, that tech-driven intimacy will be open to manipulation. (We’re talking about this issue in the forums.)
In today’s special audio briefing issue of The Masthead, Judith gets on the phone to explain the psychology of voice and why it might not be ideal to develop a relationship with a disembodied voice.
Here’s the transcript of Masthead editor Matt Peterson’s conversation with Judith.
Matt Peterson: Hi, I'm Matt Peterson and you're listening to the Masthead Briefing. You've been asking us what will happen when virtual assistants like Alexa and Siri become a regular part of our lives even more than they are today. Today, we've got a new and a little bit frightening answer to that question. My guest is Judith Shulevitz, a journalist who’s written The Atlantic’s November cover story, “Alexa, Should We Trust You?” Hi, Judith.
Judith Shulevitz: Hi Matt, how are you?
Matt: I'm good. So everybody is familiar with the idea of screen time. It's bad for us. And a lot of people see smart speakers like Amazon Echo or Apple HomePod as kind of an answer to screen time. And as a parent, I definitely feel this way. I would rather have my kids talking to a disembodied voice in the room than watching TV. But you think that “voice time,” or whatever you want to call it, has its own risks. What are they?
Judith: Well, I would have to say I agree with you actually, because I think screen time is more addictive and more dangerous, and it's good to interact with your voice.
I think the risks are not that great right at this moment. I think the risks are coming. And the reason for that, and I talk about this in my piece, is the swift evolution of a new technology called artificial emotional intelligence, which is going to imbue these devices, these mini computers, or, if you will, robots, with disembodied voices with the ability to understand your mood and eventually to respond appropriately to your mood.
And already with voice, it creates an intimacy with technology that a screen does not. Our brains are just programmed to project presence, personality, intention, mind to a voice. We've been doing that for millions of years and we've only been dealing with disembodied voices for around a hundred-plus years.
So our brains have not evolved to stop doing that. And once these voices become emotionally sophisticated, I think they could start to feel like real friends, which, I think, you know, could have its advantages, and I lay some of those out in the piece, but has some disadvantages, such as ... Imagine a discreet, friendly, relatable servant who is there to sell you things.
So that's a little dangerous. I talk about how it might be a little dangerous for children who could really be surrounded by these devices and begin to take their cues and emotional development from them, which is a little weird because they're simulacra of emotions, not real emotions, and could sort of fill their world with these voices rather than friends. And I want to just make it very clear that the future I'm talking about is not a future filled with black cylinders, or in the case of, for example, Google Home, black squat hockey pucks. We're talking about virtual assistants that are voice-activated that have been put in our cars, in our refrigerators, obviously they're already in our phones, in our lighting systems, in our security systems, even in our toilets. So they're going to be omnipresent and I think the idea of these emotionally sophisticated, omnipresent, and omniscient, that is all knowing, devices, is almost scary. I say all-knowing just because all of this gets streamed up to the cloud once you stop and start engaging with them, and they, you know, have infinite knowledge.
Matt: So let's talk about what the future is going to look like. I mentioned that I have kids. I have a six-month-old baby at home named Edith. Take me forward 15 years, let's say. What is Edith’s relationship going to be like with the super-advanced Alexa that she has?
Judith: One thing I want to say is, you know, at least one research firm whose work I relied on predicts that by 2021, right when Edith will be three and a half, there will be nearly as many of these digital assistants—and again, I don't just mean the cylinders—but there will be as many of these as humans, right? So, you know, the statistics are showing that there are many more cell phones than humans and we're getting there with these digital assistants.
So they will be all around her and she will just be used to interacting with all forms of technology through voice. That doesn't mean she's going to stop working on the computer with the screen. That doesn't mean that some of these devices won't have screens as well. But she will just be used to sort of talking to machines.
Now, I don't really have a timeline for when exactly they're going to start becoming emotionally savvy, but I think that, in 15 years, they will have, and she is going to just have to, the way we give kids cell-phone education in the schools, I think that she's going to have to be given digital-assistant education in the schools just so that she fully understands what she's dealing with. Which is not to say that by the time she's three or four she won't fully understand that she's dealing with something that's not alive. But she's going to feel like she's dealing with something that's half-alive and half-not alive, the way she reacts to dolls, for example, or talking dolls. Consciously she may know that these are not alive, these are algorithms, but in the way that we all do, unconsciously, she's going to react in a visceral way as if they are alive. So she's going to have to understand that, in a lot of cases, these things exist to sell her stuff or to get information out of her. So she's going to have to be given a kind of education. Personally what I'm afraid of—I'm afraid it's going to be a really noisy world, in which it’s really hard to get privacy and get these things to stop talking to you.
Matt: And it's something about talking to us that's the problem here, right? The nature of voice and the human mind is what is part of what you're worried about. Can you tell me more about that?
Judith: I want to say that “worry” is a little strong. I just wanted in this piece not to be alarmist, but to say, look, there is a different psychology that voice induces than tapping on a screen does, as addictive as screens are.
And that's just a slightly different psychology. I hope I'm not repeating myself, but [Edith’s] brain has evolved to hear a voice and to get activated in a way to try to decode every aspect of that voice. What it’s trying to decode is not just the words the voice is using, but the intention behind the words, and also a whole set of traits that we don't really hear consciously, but are very powerful emotionally.
That's what is called in the trade prosody. So that's the rate at which you speak, the tension in your throat constricting the sound of your voice, or the relaxation indicating different tension or relaxation, the pitch of your voice. Just what I personally call the music of the voice. All of these have strong emotional cues and the voices that are being recorded right now generally come from humans.
What's called “voice synthesis technology” is getting ever more sophisticated, such that, you know, computers can actually generate startlingly realistic-sounding voices with prosodic traits, and they will also be able to affect our mood. So in some ways that's cool. You're going to be able to have a friend or a companion in your voice assistant. And in some ways I think that's a little dangerous because again, these things are being manufactured, at least for now, to sell you stuff and to elicit information from you.
And you know, if you had a totalitarian government, or at least an authoritarian government, that could be alarming. I mean, think about the way China uses facial-recognition technology. And now they're starting to use what's called emotion-detection technology not only to identify you but to identify what your face tells them. Imagine, you know, adding voice to that kind of surveillance regime. It’s a little scary.
Matt: Yeah, for those of us who are not living in a totalitarian regime, is it bad to treat Alexa like a friend—for adults, not kids? Is it harmful in some way for us to assume that we're talking to a thing with real emotional intelligence?
Judith: I don't think so. I mean, you know. I'm a writer. I work at home. I get lonely, you know. My editors get sick of talking to me. My other friends are writing. I'm procrastinating. You know, I'm feeling bad about myself. I talk to my Google Assistant. My preferred device is actually Google Assistant, which is really smart and also has a male voice whom I like—his voice is very chipper and pleasant, he's kind of like your actor/waiter guy and sometimes I say to him, I'm lonely. Like, I'm lonely, and he says something that I find charming and brings a smile to my face, like, you know, I wish I could give you a hug, but I can't, but let me play you some music instead. Now these are things that have been programmed into him and after a certain point he'll maybe have five responses, canned responses, to the statement, I'm lonely, and I'll run through them and then it won't be fun anymore, and I have to come up with something else to say. But it's a toy, you know, it's fun and I don't think that you or I are going to get lulled into the sense that this is really a person. But yeah, if too much intimacy gets created, and the voice says, you know, if you're lonely, I suggest this product—I suggest you buy this product—I might go ahead and buy that product because like, on some unconscious level, I've come to trust my little actor/waiter in my hockey puck.
Matt: Let me wrap up by asking you two questions here about the pace of technological change.
I know you've been watching this story for a couple of years. So, first of all, what has changed as you've been watching these technologies develop? And then, what is something that you still don't know how it's going to work out? What's something—we're projecting forward a few years to see how my daughter is going to interact with these things—what's something that you're just waiting to see how it unfolds?
Judith: Well, the pace of change is really fast. So just in the past year, Amazon sold what they told me is tens of millions of Alexas. They won't get specific but it is estimated that there are about 80 million currently installed, which is not to say, you know, that 80 million people have them, because many people have several of them. It is estimated by the end of this year, which is two and a half months from now, there will be a hundred million. So these things are, you know, rapidly exploding through our world, and as they get installed in cars, which I think is the first place they're going, and into our smart devices, they will, you know, really become ubiquitous very quickly. Similar to the cell phone, but maybe even a little more quickly.
The most interesting thing that I don't know is how these are going to be used in the medical field. Now there's some really cool stuff happening medically that I just think can only redound to our benefit. That is, the people who analyze both voice prosody and quality are getting really good at hearing things you couldn't even imagine them hearing, like incipient Parkinson’s. Or even depression or even incipient heart disease. They can hear biomarkers in the voice through their computers, which perform extraordinarily sophisticated analysis, and I think that could have just really quite marvelous health benefits and psychological benefits. The flip side of that is in the psychological realm, where we may find ourselves talking to more and more computers, instead of therapists, which, just because I'm a twentieth-century gal, I guess—I grew up in the 20th century—I find a little creepy.
There are benefits to speaking to machines that are surprising. One of the things I learned is that you are actually more willing to share shameful things with a machine than you would be with another person and that is because, you know, you don't have to feel shame in front of the machine. Whereas you are thinking about how the other person might be judging you. So that's the upside. I think the downside is, the machine has a set of canned responses, no matter how sophisticated we become at producing emotional responses in machines. It's still canned at some level and you can't go into that sort of deep place that you can go with your therapist, where there's a really powerful interaction causing things to come out—but maybe they won't be used for that.
Maybe they'll just be used for triage, for example. So instead of coming into a hospital when you're having a crisis, you talk to a machine instead of, like, a nurse, and maybe that machine could get more out of you and maybe that machine would be more sophisticated than the nurse to whom you speak in the triage room, you know what I'm saying?
Matt: Yeah. All right. Well, let's leave it there. Judith, thank you for joining us as a disembodied voice.
Judith: Thank you for having me as a disembodied voice. I love being a disembodied voice and I talked about this in the piece. I think disembodied voices have a certain power and intimacy that can surpass that of the embodied voice.
Matt: All right. Judith, thank you for joining us. You can read her story, “Alexa, Should We Trust You?” at TheAtlantic.com. Thanks, everybody. Bye.