The Linguistics of 'YouTube Voice'

The attention-grabbing tricks that keep an audience watching, even when people are just talking at a camera


Hey guys! What’s up? It’s Julie. And today I want to talk about YouTube voice.

So the other day, I was watching this YouTube video from the PBS Idea Channel about whether Ron Weasley from Harry Potter is really a time-traveling Dumbledore (as you do), and I realized—the guy talking sounds exactly like the Vlogbrothers. The Vlogbrothers are John and Hank Green, and their combined YouTube channel, on which they post videos of themselves musing on and explaining everything from world politics to farts, has more than 2 million subscribers.

And the guy in the PBS Ronbledore video, Mike Rugnetta, was talking just like the Green brothers do. It wasn’t a matter of their accents or the sound of their voices; it was the way they were talking. The only word that came to mind was … bouncy.

I found more examples in other popular YouTube channels. Tyler Oakley does it. Franchesca Ramsey does it. Hannah Hart of My Drunk Kitchen does it (when she’s not drinking, or using weird voices). This Game of Thrones fan-theory guy does it.

But I had a hard time putting my finger on exactly what “it” was, beyond a vague sense of similarity. So I asked a linguist.

Naomi Baron is a professor of linguistics at American University who studies electronically mediated communication. She watched some videos that I sent her, and was very patient with my continued pleas of, “No, but I feel like something is going on here.” And so here, thanks to Baron, are the linguistic components of YouTube voice:

Overstressed vowels: A lot of the time, people are lazy about pronouncing certain vowels—they’re un-emphasized and neutral, and just sort of hang loosely in the middle of the mouth, making an “euh” sound, regardless of which vowel it actually is. This “euh” is called the schwa. (Hear it pronounced here.) When you make the effort to actually pronounce a vowel that is usually a schwa, that’s a way of emphasizing the word. For example: “If I say the word ‘exactly,’ you don’t really know what that first vowel is. ‘Euh,’” Baron says. “If I say ‘eh-xactly,’ you have the sound ‘eh,’ like in the word ‘bet.’”

Sneaky extra vowels between consonants: Listen to the way Rugnetta says “trapping” at 35 seconds here. “Terraping.” “I’ve added a little vowel between the t and the r,” Baron says. “It elongates the word, it adds an extra syllable to the word, it emphasizes the word. There’s a name for this: epenthetic vowel.”

Long vowels: Stretching out vowels is a common way of emphasizing words—sometimes it’s obvious, and clearly done on purpose (listen to the word “five” in this Franchesca Ramsey video). But sometimes in these videos the vowels are just sliiightly longer than normal (see what I did there?), resulting in the kind of emphasis and “bounce” I wasn’t able to put my finger on until Baron pointed it out to me. (See: every time Rugnetta says “magic,” or when Ryan Higa says “channel.”)

For that matter, long consonants as well: Especially those at the beginning of words. Take the word “fascinatingly” from this Vlogbrothers video as an example.

Aspiration: This was the part of our phone call where things got interactive.

“If you put your finger in front of your mouth, I'll teach you a very quick phonology lesson,” Baron said. I did. “Are you ready? Say ‘keep.’”


“Now say ‘geep.’”


“When you said keep, did you feel a breath of air on your finger?” she asked. (Indeed I did.) “That’s called an aspiration.” There’s usually aspiration on the K even in ordinary speech, but if you huff and puff a little more, that makes the word stand out.

For an extreme example, see how Hank Green says “couldn’t care less.” A more subtle example is how Australian vlogger Natalie Tran says “fake” and “sick.” You hear this sometimes with “p” and “t” sounds, too. Like when Charlie McDonnell says “salt.”

* * *

So it turns out the “YouTube voice” is just a variety of ways of emphasizing words, none of which are actually exclusive to YouTube—people employ these devices in speech all the time. But they generally do it to grab the listener’s attention, and when you’re just talking to a camera without much action, it takes a little more to get, and keep, that attention. All the videos I used as examples in this article come from popular YouTube accounts, with hundreds of thousands or millions of subscribers—in other words, from people who know how to engage an audience.

There are other factors at play here, too. YouTubers’ monologues often speed up and slow down, for example. “Changing of pacing—that gets your attention,” Baron says. And elongating certain words helps change up the pace. People also tend to move their heads and hands a lot in these videos, raise their eyebrows, and open their mouths wider than necessary.

Baron says she suspects that this style comes at least in part from a trend toward informality in TV newscasting. Influential “infotainment” programs like The Daily Show use some of the same linguistic styles—in this clip of Jon Stewart, I detected aspiration, and elongated consonants and vowels.

Of course, people adopt these tricks to varying degrees. But I think there is something to the newscasting comparison—I notice this “YouTube voice” the most in videos where people are just talking to the camera as themselves, with no acting, no props, no action. And in videos where people monologue for a minute, and then break away into a sketch or a scene (such as this Natalie Tran video), the tics, if they’re there, seem far less pronounced than when the person was speaking directly to the camera. It’s a “talking to the audience” voice. Another linguist, Mark Liberman of the University of Pennsylvania, who runs the Language Log blog, called it “intellectual used-car-salesman voice.”

“You get the same kind of thing in other high-energy sales pitches,” he wrote to me in an email. “I guess the purest form of this style is the carnival barker.” It’s less intense on YouTube, of course, where the audience chooses to click on a video—the speaker isn’t trying to grab the attention of people who just happen to be walking by, chewing on their cotton candy.

But it’s a style video-makers have likely picked up on, because it works. And it may be a particularly popular way of speaking at the moment.

“Things become stylish. That happens with language all the time,” Baron says. “What I think you have is an Internet platform that many people are taking to, and what they’re doing is taking models they recognize from elsewhere. [They’re] dressing up their language through particular kinds of spoken emphases, gestures, and facial expressions. What’s interesting is how similar people end up sounding, rather than sounding like themselves. In an attempt to make yourself sound special, you end up sounding like this whole genre of other people.”