People don’t generally speak in a monotone. Even someone who couldn’t carry a tune if it had a handle on it uses a different melody to ask a question than to make a statement, and in a sentence like “It was the first time I had ever been there,” says “been” on a higher pitch than the rest of the words.

Still, if someone speaks in a monotone in English, other English-speakers can easily understand them. But in many languages, pitch is as important as consonants and vowels for distinguishing one word from another. In English, “pay” and “bay” are different because they have different starting sounds. But imagine if “pay” said on a high pitch meant “to give money,” while “pay” said on a low pitch meant “a broad inlet of the sea where the land curves inward.” That’s what it feels like to speak what linguists call a tonal language. At least a billion and a half people worldwide do it their entire lives and think nothing of it.

Mandarin Chinese, with its four tones, is a typical example. Take the word ma. Say it the way an English-speaker would read it sitting by itself on a page, and it means “scold.” Say ma as if you were looking for your mother (ma?) and it means “rough.” If you were whining at her instead (“ma-a-a?!?”), with your voice swooping down a bit and then back up even higher, that would mean, believe it or not, “horse.” And if you say ma on a steady high pitch, as if you were singing the first syllable of “The Star-Spangled Banner” as ma instead of “oh” for some reason, then it means “mother.” That’s how almost every syllable works in Chinese.

As tone languages go, Mandarin is by no means the most complicated. The Hmong language, spoken in China, Vietnam, Laos, and Thailand, can have seven or even eight tones. It’s dazzling, really. If you say paw like a statement, it means “female.” Say it like a question and it means “to throw.” Say it up high in an impatient way and you’re saying “ball.” Say it down low as if you ran into someone in a basement and didn’t want anyone upstairs to know you were down there, and it means “thorn.” Say it in a tone between the impatient high and the down-low and it means “pancreas.” If you say paw in a creaky way—kind of like the way one might imitate an elderly person’s voice—then it means “to see,” while if you say it in a breathy, amazed way as if you were seeing a horsey in the clouds, then it means “paternal grandmother.” (For what it’s worth, maternal grandmother is tai, said on the “basement” tone.)

Tone languages are spoken all over the world, but they tend to cluster in three places: East and Southeast Asia; sub-Saharan Africa; and among the indigenous communities of Mexico. Why there and not elsewhere? One thing these regions might have in common is heat, though it’s hard to imagine how that would make people speak more melodically. Yet environment may not be entirely unrelated to the phenomenon—according to one hypothesis, tone languages are less likely to develop in dry environments because dry air deprives the vocal cords of the suppleness required to produce subtle differences in tone.

The jury is still out on that one, but even if it turns out to be true, it only gets us so far. The theory says only that tones are less likely where the air is dry; where the climate isn’t dry, there’s still no predicting whether a language will take on tones or not. So it’s easy to suppose, and fun to imagine, that people decide to “sing” language out of some kind of cultural impulse. The reality is less groovy, but just as interesting.

It’s ultimately a matter of one thing leading to another. Take the words “pay” and “bay.” It looks like the only difference between them is that they start with different letters, but there’s more to it up close. English-speakers tend to say the ay sound on a slightly lower pitch after a b than after a p, because of the different mechanics involved in saying those consonants. That is, one tends to say “pay” a little higher than one says “bay.” In daily life that’s so subtle as to be barely noticeable: What stands out is the good old difference between p and b. But p and b are very similar sounds, and sounds that are similar have a way of melting together—a Cockney English-speaker can say “bref” for breath and “fing” for thing because the f and th sounds are made close together at the front of the mouth. Suppose as time went by English-speakers started pronouncing b as p so that there was no more b sound at all?

Imagine: “Brother” is “prother,” “bat” is “pat,” “big” is “pig.” Things like that happen in languages all the time, and if it happened to English, then instead of “pay” and “bay” there would just be “pay” and “pay,” except there would still be that difference in the tone. “Pay” with a neutral tone would mean “pay,” while “pay” with a low tone would mean “bay.” The tone alone would convey the difference in meaning. This is exactly how tone languages come into being, and in some places you can even see the steps in the process. For example, there is a language called Khmu spoken in parts of Laos, Vietnam, Thailand, and China. In one dialect of Khmu, pok means “bite” and bok means “to cut down a tree.” In another dialect, though, b has become p, and all that’s left behind is the difference in the tone, as if the Cheshire Cat had left behind his smile. Thus in that dialect, pok on a high tone means “bite,” while pok on a low tone means “to cut down a tree.”

There are certain advantages to speaking tone languages. Speakers of some African languages can communicate across long distances by playing the tones on drums, and Mazatec-speakers in Mexico use whistling for the same purpose. You know those people who can hear a stray note and instantly identify its pitch, such as recognizing that a certain car horn is an A flat? They have “absolute pitch,” and there is evidence that speakers of tone languages are more likely to have it. In one experiment, Mandarin-speaking musicians were better at identifying musical pitches than English-speaking ones. The same has been found for speakers of Cantonese (which has six or even nine tones, depending on how you count) relative to English- and French-speakers.

Could a language rely completely on tones? As key as tones can be to conveying meaning, they aren’t fine-grained enough by themselves to communicate the full range of human expression; speaking only in tones would be akin to writing only in emojis. The tone-language counterpart to the all-emoji translation of Moby-Dick, for example, is Solresol, a language based solely on musical tones that a Frenchman created in the 1820s. Do-re-mi was “day,” do-re-fa was “week,” do-re-sol was “month,” and so on; mi-sol was “good,” and the reverse, sol-mi, was “bad.” Cute idea, but Solresol could no more equal the speed, nuance, and complexity of actual speech than Emoji Dick can render the magnificence of Melville’s prose.