What’s a Language, Anyway?

The realities of speech are much more complicated than the words used to describe it.

A man writes Chinese characters on a footpath using water and a brush in central Beijing. (David Gray / Reuters)

What’s the difference between a language and a dialect? Is there some kind of technical distinction, the way there is between a quasar and a pulsar, or between a rabbit and a hare? Faced with the question, linguists like to repeat the grand old observation of the linguist and Yiddishist Max Weinreich, that “a language is a dialect with an army and a navy.”

But surely the difference is deeper than a snappy aphorism suggests. The very fact that “language” and “dialect” persist as separate concepts implies that linguists can make tidy distinctions for speech varieties worldwide. But in fact, there is no objective difference between the two: Any attempt you make to impose that kind of order on reality falls apart in the face of real evidence.

And yet it’s hard not to try. An English speaker might be tempted to think, for example, that a language is basically a collection of dialects, where speakers of different dialects within the same language can all understand each other, more or less. Cockney, South African, New Yorkese, Black, Yorkshire—all of these are mutually intelligible variations on a theme. Surely, then, these are “dialects” of some one thing that can be called a “language”? English as a whole, meanwhile, looks like a “language” that stands by itself; there’s a clear boundary between it and its closest relative, Frisian, spoken in Northern Europe, which is unintelligible to an English speaker.

As such, English tempts one with a tidy dialect-language distinction based on “intelligibility”: If you can understand it without training, it’s a dialect of your own language; if you can’t, it’s a different language. But because of quirks of its history, English happens to lack very close relatives, and the intelligibility standard doesn’t apply consistently beyond it. Worldwide, some mutually understandable ways of speaking, which one might think of as “dialects” of one language, are actually treated as separate languages. At the same time, some mutually incomprehensible tongues an outsider might view as separate “languages” are thought of locally as dialects.

I have a Swedish pal I see at conferences in Denmark. When we’re out and about there, he is at no linguistic disadvantage. He casually orders food and asks directions in Swedish despite the fact that we are in a different country from his own, where supposedly a different “language”—Danish—is spoken. In fact, I’ve watched speakers of Swedish, Danish, and Norwegian conversing with each other, each in their own native tongues, as a cozy little trio over drinks. A Dane who moves to Sweden does not take Swedish lessons; she adjusts to a variation upon, and not an alternate to, her native speech. The speakers of these varieties of Scandinavian consider them distinct languages because they are spoken in distinct nations, and so be it. However, there is nothing about Swedish, Danish, and Norwegian in themselves that classifies them as “languages”; especially on the page, they resemble each other closely enough to look more like dialects of one “language.”

Meanwhile, one generally hears Mandarin, Cantonese, and Taiwanese described as “dialects” of something called Chinese. But the only single “Chinese” language that exists is on paper, in that all of its varieties have the same writing system, where each word has its own symbol that (more or less) stays the same from one Chinese “dialect” to another. Mandarin and Cantonese, for example, are more different than Spanish and Italian. I, you, and he in Mandarin are wǒ, nǐ, and tā, but in Cantonese they are, respectively, ngóh, léih, and kéuih. Dialects? A Mandarin-speaker can no more “adjust” to Cantonese than a Swede could “adjust” to German.

There are cases of the Scandinavian and the Chinese kind worldwide. A Moroccan’s colloquial “Arabic” is as different from the colloquial “Arabic” of Jordan as Czech is from Polish. In order to understand each other, a Moroccan and a Jordanian would have to communicate in Modern Standard Arabic, a version preserved roughly as it was when the Koran was written. The cultural unity of Arab nations makes the Moroccan and the Jordanian consider themselves to be speaking “kinds of Arabic,” whereas speakers of Czech and Polish think of themselves as speaking different languages. But then, while I’m on Czech, there is no such language as “Czechoslovakian”—at least in name. A Czech and a Slovak can usually converse. However, they consider themselves to speak different “languages” because of historical and cultural factors.

It turns out that it’s also impossible to determine precisely where one “language” leaves off and another begins.

An example is certain languages—um, dialects?—in Ethiopia. According to data from Sharon Rose of UC San Diego, speakers of Soddo say, for he thatched a roof, kəddənəm. (The upside-down e is pronounced a lot like the oo in foot.) Not far away, people speaking Muher say it starting with kh instead of k: khəddənəm. A further ways distant, people who speak what they call Ezha say it with an r in the place of the n: khəddərəm. In Gyeto, the same word is khətərə. Then in Endegen they start with an h instead of a kh: həttərə. Now, where we started and where we finished look like what one might call different languages: Soddo’s kəddənəm and Endegen’s həttərə seem about as distinct as French’s dimanche and Italian’s domenica, for Sunday. But in between Soddo and Endegen are several other stages—I gave only a few of them—that each differ from the previous one by just a little change, such that the speakers can converse. If those stages are “dialects,” what are they “dialects” of? Both Soddo and Endegen over on the ends?

All of them are simply dialects, even though the ones on the ends are not mutually intelligible and don’t feel like the same “language” to their speakers. Speech worked this way from village to village across Western Europe until recently, when unwritten, rural dialects started steadily disappearing. People now know this area as home to a few “languages” like Portuguese, Spanish, French, and Italian, but on the ground there once was basically a smudge of countless Romance “dialects” shading gradually into one another from Portugal to Italy. In each nation, the serendipities of history chose one “dialect” as a standard and enshrined it on the page, but in real life, the situation was much like in Ethiopia. There are hints of this history today: in Catalan in Spain, key is clau; to the north, in Occitan, it’s clau as well; but then a little further north, in obscure rural varieties called Franco-Provençal, it’s clâ; in the Romansh of the Swiss mountains it’s clav; in the northern Italian variety Piedmontese it’s ciav (pronounced “chahv”); and then in what’s known as standard Italian it’s chiave (pronounced “KYAH-vay”).

The idea of distinguishing “languages” from “dialects” is of no logical use here. As often as not, speech is simply a little different from place to place, such that a person can get along when speaking in the town a few valleys over, starts having trouble the farther away they get, and after traveling a certain distance can no longer understand a thing anyone is saying.

The only thing that can save an attempt to impose a formal definition on the terms “language” and “dialect” now is perhaps to be found in popular usage, which suggests that languages are written and standardized and have a literature, while dialects are oral, without codified rules, and have no literature. Now, a typical objection to using literature as the dividing line is that there is oral literature—the Iliad and the Odyssey likely originated as memorized poems. But even allowing that memories can retain only so much, and that perhaps it is legitimate to distinguish what Greek bards knew from, say, Russian’s written literature, there’s another problem.

Namely, it’s the implication that there is something lesser about a “dialect.” Is a dialect, on some level, unsophisticated, as if it doesn’t have a literature because it is unsuited to extended thought and abstraction? I recall an exquisite exchange I once caught between a man whom Nathan Lane could easily play, wearing an ascot and a long scarf and rather plummy of expression, and a man whom Sacha Baron Cohen would be cast as, straight-backed, earnest, and a little wary. Nathan asked Sacha what he spoke. Sacha said “Uzbek.” Nathan asked breezily, “Is that a dialect?” Sacha, almost snapping, replied, “No, it is a beautiful language.”

Despite Sacha’s defensiveness, it’s not the case that what one is taught to think of as “dialects” are somehow lowlier or simpler. As often as not, obscure, unwritten “dialects” are much more grammatically complicated than familiar “languages.” The Foreign Service Institute ranks what it calls languages in terms of their difficulty for English speakers; the hardest to learn to speak include Finnish, Georgian, Hungarian, Mongolian, Thai, and Vietnamese. However, just about any Native American, Australian Aboriginal, or indigenous African tongue would easily rank among these in terms of difficulty, and actually, many obscure tongues around the world make any language on the FSI list look like a toy. For example, in Archi, spoken in the Caucasus mountains, a verb can occur in 1,502,839 different forms—that’s more than a thousand times more forms than the number of people who even speak it (about 1,200).

Meanwhile, here in the English “language,” there are walk, walks, walked, and walking. If sophistication separated languages from dialects, Archi would have more claim to the “language” title than English.

A language, then, is indeed a dialect with an army and a navy; or, more to the point, a language is a dialect that got put up in the shop window. Yes, people can sit down in a room and decide upon a standardized version of a dialect so that large numbers of people can communicate with maximal efficiency—no more clau, clav, and ciav. But standardization doesn’t make something “better”—donning a Catholic school uniform isn’t “better” than wearing different clothes to school every day.

Or, yes, the written dialect will have its words collected in dictionaries. The Oxford English Dictionary does have more words than Archi and Endegen do; the existence of print has allowed English speakers to curate many of their words instead of letting them come and go with time. But words are only part of what makes human speech: You have to know how to put them together, and knowing how to handle Archi’s words (or Endegen’s) requires its own level of sophistication.

So, what’s the difference between a language and a dialect? In popular usage, a language is written in addition to being spoken, while a dialect is just spoken. But in the scientific sense, the world is buzzing with a cacophony of qualitatively equal “dialects,” often shading into one another like colors (and often mixing, too), all demonstrating how magnificently complicated human speech can be. If either of the terms “language” or “dialect” has any objective use, the best anyone can do is to say that there is no such thing as a “language”: Dialects are all there is. “Is it a dialect?” asks Nathan. Properly, Sacha could have answered, “Yes, a beautiful one.” And Nathan should have understood that he was speaking a “dialect” too.
