
Listen to this recording. Before you go any further, just listen to it.

Late Monday night, this tweet was posted by a 20-year-old Instagram “influencer” named Cloe Feldman. It appears to be a screengrab of a poll that also appears on Feldman’s Instagram account (the Instagram version was posted after the Twitter one, even though the poll design is clearly Instagram-original; don’t ask me). On Feldman’s Instagram, yanny was in the lead as of press time with 51 percent of the vote. (If you can’t hear it, it sounds like yeah knee.) Another Instagram account, @KFCRadio, also added the poll to its Instagram story. As of press time, laurel was winning that one with 53 percent.

Having read this far, you may be outraged, and/or concerned about hearing loss. Because neither @CloeCouture nor @KFCRadio has responded to my request for comment yet (and neither has @Yanni), I can’t say anything about how or why the clip was made. But I do have a degree in linguistics, so I can hazard a guess about why this two-syllable recording is driving everyone bonkers.

When you speak, you’re producing sound waves that are shaped by the length and shape of your vocal tract, which includes your vocal folds (vocal cords is a misnomer), throat, mouth, and nose. Linguists can study these sound waves and separate them out into their component frequencies, and display them in something called a spectrogram. Here’s the spectrogram for the yanny/laurel recording:

Higher frequencies (up to 5,000 hertz, or waves per second) appear toward the top, and lower ones (down to zero) toward the bottom. The dark bands are called formants; they’re the resonant frequencies of the vocal tract, and they depend on the length and shape of your vocal tract—i.e., all the space between your vocal folds, where the sound waves begin, and your mouth and nose, where they’re released.
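For the curious, the idea of splitting a sound into its component frequencies can be sketched in a few lines of Python. This is only an illustration, not the software that produced the spectrogram in this article: the sample rate, the frame length, the two-tone test signal, and the function names are all assumptions chosen to keep the example small.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)
FRAME_LEN = 128     # samples per analysis frame (assumed)


def frame_spectrum(frame):
    """Magnitude of each frequency bin for one frame, via a naive DFT."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples, frame_len=FRAME_LEN, hop=FRAME_LEN // 2):
    """One spectrum per overlapping, Hann-windowed frame of the signal."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len) for t in range(frame_len)]
    columns = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + t] * window[t] for t in range(frame_len)]
        columns.append(frame_spectrum(frame))
    return columns


def dark_bands(spectrum, count=2):
    """Frequencies (Hz) of the strongest local peaks in one spectrum."""
    peaks = [
        k for k in range(1, len(spectrum) - 1)
        if spectrum[k] > spectrum[k - 1] and spectrum[k] > spectrum[k + 1]
    ]
    strongest = sorted(peaks, key=lambda k: spectrum[k], reverse=True)[:count]
    return sorted(k * SAMPLE_RATE / FRAME_LEN for k in strongest)


# Synthetic stand-in for a voice: steady tones at 500 Hz and 1,500 Hz.
signal = [
    math.sin(2 * math.pi * 500 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 1500 * t / SAMPLE_RATE)
    for t in range(1024)
]
columns = spectrogram(signal)
print(dark_bands(columns[0]))
```

Each entry in `columns` is one vertical slice of a spectrogram, and `dark_bands` picks out where the energy is concentrated in that slice. Real speech is far messier than two steady tones, so actual formant tracking takes more care than this.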

The length of your vocal tract depends mostly on physiology: Women’s vocal folds tend to be higher up, so their tracts are shorter. The shape is largely based on where you put your tongue, like when you place the tip of your tongue between your teeth to make a th sound. By moving your tongue around in your mouth and opening and closing your lips, you change the sounds you’re making, and the formants you see in the spectrogram.
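A rough way to see why a shorter tract means higher formants is the textbook idealization of the vocal tract as a uniform tube, closed at the vocal folds and open at the lips. That model is not something Sanker or this article relies on; it is an assumption brought in for illustration, as are the tract lengths and the speed of sound used below.

```python
# Assumed speed of sound in warm, humid air, in centimeters per second.
SPEED_OF_SOUND_CM_PER_S = 35000.0


def tube_formants(tract_length_cm, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Such a tube resonates at odd multiples of c / (4 * L).
    """
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4 * tract_length_cm)
        for n in range(1, count + 1)
    ]


# Illustrative lengths only; real vocal tracts vary and are not uniform tubes.
longer_tract = tube_formants(17.5)
shorter_tract = tube_formants(14.5)
print(longer_tract)
print(shorter_tract)
```

In this simplified model, every resonance of the shorter tube sits above the matching resonance of the longer one. Moving the tongue and lips changes the tube's shape, which is what pulls real formants away from these evenly spaced values.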

Chelsea Sanker, a phonetician at Brown University, looked at the spectrogram above to help me figure out what was going on. (For the record, when Sanker listened to the recording, she “[could] not hear it as having ls at all.” Point to yanny.)

First of all, the clip is, according to Sanker, “not prototypical” of either laurel or yanny. It’s somewhere in the middle. Sanker said the l/y discrepancy might come from the fact that the sound there isn’t velarized: The speaker’s tongue isn’t touching the back of the soft palate (the velum), as many American English speakers’ tongues do when they say an l. The middle consonant is definitely not an n, Sanker said, but you might hear one because the vowel in front of it sounds particularly nasal. People who hear laurel are hearing a syllabic l in the second syllable, which has some similarities to the vowel sound at the end of yanny. Both are sonorants, meaning you could go on singing them until you run out of air, as opposed to an obstruent like p or t.

One of the more interesting things to come out of the yanny/laurel debate was the discovery that, by changing the pitch of the recording, you could adjust what you heard. In general, people heard yanny more consistently when the pitch was lower and laurel when the pitch was higher.

This makes perfect sense. When it’s not being shifted around via computer program, the pitch of your voice depends on how thick and how tense your vocal folds are. It’s entirely independent of the formants, which are based on how long your vocal tract is and where you’re constricting it. In real life, when you raise or lower your voice, the formants remain unaffected. When vocal recordings are pitch-shifted, though, the formants are actually shifted, too. But even in shifted recordings, we’re still biased to think that the formants of low voices sound high, and the formants of high voices sound low.*
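Here is a minimal sketch of why a computer shift can drag the formants along with the pitch. It assumes the crudest possible method, dropping every other sample and playing the result at the original rate; the tools people actually used on the clip may work differently. The two tones are toy stand-ins for a pitch and a formant, and all names and values are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 8000  # playback rate in Hz, unchanged by the shift (assumed)


def strongest_frequencies(samples, count=2):
    """Frequencies (Hz) of the strongest bins in a naive DFT of the samples."""
    n = len(samples)
    mags = [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]
    strongest = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:count]
    return sorted(k * SAMPLE_RATE / n for k in strongest)


# Toy "voice": a 125 Hz component standing in for pitch and a weaker
# 1,000 Hz component standing in for a formant.
original = [
    math.sin(2 * math.pi * 125 * t / SAMPLE_RATE)
    + 0.6 * math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
    for t in range(512)
]

# Naive shift up one octave: keep every other sample, play at the same rate.
shifted = original[::2]

print(strongest_frequencies(original))
print(strongest_frequencies(shifted))
```

Both components double together, so the "formant" ends up an octave higher along with the "pitch." A human speaker raising their voice would move only the first of the two.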

When the speaker’s voice is artificially lowered, we’re inclined to hear the formants as if they’ve been raised; if the speaker’s voice is raised, we think the formants sound lower. The sounds in yanny generally have higher formants and fewer dips than the sounds in laurel—to see for yourself, here are spectrograms of my colleague Robinson Meyer saying each word:

[Spectrograms: “yanny” and “laurel”]

Plenty of things could be influencing your interpretation of yanny/laurel, including your dialect and whether you listened to the recording over a speaker or headphones. People have a tendency to try to map the sounds they hear onto real words that they’ve heard before, like laurel. But reading yanny first, since it appears on the left side of the poll, could have primed listeners to hear it over laurel.

A common misconception about linguistics is that it’s prescriptive, telling people how to speak and write. But the linguistic perspective on this whole debacle is that everyone is right. (If, like me, you’ve now listened dozens of times, you’ll know that both sides have a very good point.) We’re all just trying to classify the sound waves we’re hearing into categories we’re familiar with, no differently than we do in everyday speech.


* This article previously misstated how pitch-shifting affects the formants in audio recordings. We regret the error.
