The New Work of Words

The basic unit of the sentence is doing more than it ever was before. An Object Lesson.

The word is a popular thing. Charismatic, even. Of all the parts of our language, it sticks out grandly. As such, it’s the easiest to tell stories about: The provenance of the word “entrepreneur” is also a slice of economic history, while no one knows where the word “hobo” comes from. These narratives get most people as close as they’ll come to understanding language as a concrete thing. So here’s a new story you might not know about words: They’re working harder than ever before, corralled into new forms of work at which they are doomed to fail.

You might have heard about the “word gap” that low-income children are said to have. It occurs because their parents don’t talk to them in the same way that middle and high-income American parents talk to theirs. In the 1990s, University of Kansas researchers Betty Hart and Todd R. Risley found that low-income children hear 30 million fewer words in their first four years than middle and high-income children. This difference was hypothesized to lead to disadvantages, particularly in terms of school readiness, so it received a scary name: the “word gap.”

But in reality the gap was never about words. Instead, it was about the difference in the type of interactions between caregivers and children. This quality, though not exactly ephemeral, was certainly slippery: too slippery to capture in the ways that Hart and Risley wanted. So, in order to avoid disrupting family life by inserting observers into homes, Hart and Risley handed tape recorders (it was the early 1990s, remember) to the 42 families in the study, then transcribed their verbal behavior. In those days of relatively low computational power, the analysis included a few measures of content (such as whether child-directed speech was positive or negative) but focused mostly on brute counts of words. Lo, a gap was discovered.

The gap was a clear problem, and a problem that should be fixed. Fill the word gap, the experts hypothesized, and you fix the outcomes gap. So an interventional industry was born, aiming to teach low-income parents how to talk to their children, telling them to log the words they say every day. But this year we found out, as reported in the New York Times, that some parents might be mistaking the goals of the interventions. Motoko Rich writes:

Child advocates say programs need to convey the subtlety of communication as well as simply trying to bolster word counts. “It’s not just saying, ‘You need to say this amount of words to your kids every day and then they’re going to be smart and successful,’” said Claire Lerner, director of parenting resources at Zero to Three, a nonprofit group that promotes healthy development in the early years.

This is a contradictory prescription. First, the process of satisfying “the subtlety of communication” is quantified in a medicalized dosage of “words.” But then, the word turns out to be a pale substitute for what kids actually need. It’s like measuring the health of the forest by counting trees, then planting more trees to improve that health only to discover that the undergrowth is still full of invasive species and the animals keep leaving. You’ve got more trees, but you’ve fixed nothing at all.

But what else could the researchers have measured? Well, conversational turns, for one (that is, how many times the exchange switches between participants in a conversation). After all, they were really looking at an interaction gap, not a word gap. Had they counted turns, they wouldn’t have had to conjure up an ad hoc distinction between words that come from caregivers’ mouths and words that come from radio or television speakers. These latter words somehow, intrinsically, weigh less than the “true” words of caregivers. (Meanwhile, words read from books do fill the word gap. Weighting them like spoken words is another piece of evidence that word-gap interventions are, at least implicitly, about proving the superiority of middle-class child-rearing practices.)
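
To make the difference between the two measures concrete, here is a minimal sketch that counts both words and conversational turns over the same exchange. The transcript, the speaker labels, and the function names are all invented for illustration; nothing here reproduces Hart and Risley's actual coding scheme.

```python
from itertools import groupby

# Hypothetical transcript: (speaker, utterance) pairs, invented for illustration.
transcript = [
    ("caregiver", "Look at the dog"),
    ("child", "Dog"),
    ("caregiver", "Yes, a big dog"),
    ("caregiver", "What does the dog say"),
    ("child", "Woof"),
    ("television", "And now the weather for the weekend"),
]

def count_words(transcript, speaker):
    """Brute word count: whitespace-separated tokens from one speaker."""
    return sum(len(text.split()) for who, text in transcript if who == speaker)

def count_turns(transcript, participants):
    """Count how many times the floor passes between the given participants.

    Utterances from anyone else (a television, say) are ignored, and
    consecutive utterances by the same speaker count as a single turn.
    """
    speakers = [who for who, _ in transcript if who in participants]
    runs = [who for who, _ in groupby(speakers)]
    return max(len(runs) - 1, 0)

caregiver_words = count_words(transcript, "caregiver")
television_words = count_words(transcript, "television")
turns = count_turns(transcript, {"caregiver", "child"})
```

In this sketch the television contributes words but never takes part in a turn, so the turn count leaves it out without any special weighting of whose words are "true."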

We seem to want the word to serve as diagnosis and as cure. Not only is the word meant to measure intellectual abilities, but it is also meant to fix their delinquency, as when it becomes the metric for the intervention to fix the “word gap.” But soon enough, people discover that the outcomes they desire actually lie in phenomena that are more difficult to capture and that the terms they’ve used to define the problem are insufficient. When that happens, they turn on the word, admitting that just as malnutrition isn’t diminished by the sheer volume of calories consumed, so communicative competence isn’t improved by the volume of words one knows.

*     *     *

As historians of science will tell you, scientific endeavors begin to take off as soon as they construct (or acquire) standardized ways of measuring a phenomenon. This is known as a “metrology.” The watt, the ohm, the tesla, the lumen, the sievert, the mole, the second, the katal: The list of units for various physical phenomena is long, and a global administrative infrastructure (the International System of Units) maintains their definitions.

Linguists and others who study language and communication also have a metrology, though it’s less fixed and more divided by sub-discipline: There’s the phoneme, the syllable, the feature, the morpheme, the phrase, the conversational turn, the specifier, and so forth. You might be surprised to learn that, in this metrology, the word isn’t central, mainly because linguists don’t have to treat the word as a surrogate for anything else. A word is what it is according to the language it belongs to (and word-like things can vary greatly in size and structure from language to language).

The word can be an artifact of writing. Linguists, who study speech, have tried to move past such residual effects of literacy. But, even in spoken language, the word is a derived unit, not necessarily the most basic. That job is given to morphemes, which are the fundamental units of word meaning. (To use a Linguistics 101 example, the English word “dogs” is made of two morphemes: “dog” and the plural marker “-s.”) Hart and Risley could have counted morphemes, which would have provided a better sense of the vocabularies that middle- and high-income children were bathed in. Such a measure might reflect a mastery of expression rather than recall or orthography. But the “morpheme gap” would be too hard to sell to the media.

Outside of technical fields, the word retains a halo of immanence. We follow the addition of new words like selfie and hashtag into official dictionaries. We have Urban Dictionary. Every December we hear about words of the year (declared not only in English but in German, French, and Spanish). We circulate lists about untranslatable words, words that other languages have but we don’t, old words that we should bring back, relatively recently coined words that should be retired. All this is familiar work for the word, which is also the basic unit by which we measure our language proficiencies. Scholars and hobbyists of language-learning often debate the value of bulking up one’s vocabulary as well as the minimum number of words one needs to be “fluent.”

When we think about arguments involving words, it is arguments about what a word means or doesn’t mean that come immediately to mind. See, for instance, the ongoing fracas over the true meaning of “literally.” For words themselves, these are familiar burdens. Part of the job description of word-ness is to be involved in such affairs. Increasingly, however, words are burdened with more and more of our century’s exhausting work.

The word deserves our pity. It’s become a methodological commodity. The word is a concrete, discrete thing that can be called upon as a surrogate for non-discrete, slippery phenomena. Because its surrogacy seems reasonable, the word’s concreteness is exploited; it becomes a lazy handle on the phenomenon of linguistic sophistication and communicative richness.

Our obsession with words does turn out to have consequences, but not necessarily the ones we thought. Instead, a focus on words—sometimes overzealous, sometimes necessary—often blinds us to the real (and more complex) operation of language.

For example, consider the way neuroscientists think about how language functions in the brain. You might be familiar with investigations into “where” in the brain the different languages of a multilingual person reside. This brain-imaging research depends on giving its subjects tasks with words—not with utterances or sentences or even phrases—in part because longer units produce cluttered results. You can control for a word, but what in a phrase induces the neural activity that you’re interested in?

Word tasks also create longer-lasting, more focused types of activity in the brain. These are better suited for fMRI, which measures activity at the slow speed of blood oxygenation (which spans entire seconds), rather than electricity (which takes mere milliseconds). As a result, a lot of the activity that researchers claim as language in the brain is actually about words in the brain. While experts know this limitation tacitly, the rest of us have to be slowly introduced to the idea that our geopolitical metaphors for the brain are obsolete, and that our capacity to speak Spanish and English might not be located in the same place as our Spanish and English words.

Or, think about the way computers orient us toward lexical analysis in the first place. In the digital realm, the trait of the word that’s most often exploited isn’t its concreteness, but an artifact of that concreteness: the space it takes up in storage. All that computers see of words are character strings of some specified type that are bookended by characters of a “whitespace” type. To search for “words,” the computer looks for, and gathers, the characters that aren’t white spaces or punctuation. Each gathering is a word. (That’s in English. For Lao, Thai, Japanese, Chinese, and many other languages that don’t typically mark word divisions with spaces, the machines must work a bit harder.) This gatherability is what makes crunching and sorting vast numbers of words into patterns and frequencies very easy. It’s also what makes them surrogates for things like intention or emotion that can’t (or don’t yet) exist as data.
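
The gathering described above can be sketched in a few lines. This is one simple way to do it, assuming a regular expression is an acceptable stand-in for whatever a given piece of software really uses; the function name `gather_words` and the sample strings are made up for the example.

```python
import re
from collections import Counter

def gather_words(text):
    """Gather runs of characters that are neither whitespace nor punctuation."""
    # \w+ matches letters, digits, and underscores; the optional apostrophe
    # group keeps contractions such as "don't" in one piece. This is a
    # simplification, not a full tokenizer.
    return re.findall(r"\w+(?:['’]\w+)*", text.lower())

sample = "The word is a popular thing. The word deserves our pity."
tokens = gather_words(sample)
frequencies = Counter(tokens)

# A language written without spaces defeats the same rule: the whole
# sentence comes back as a single "word".
unsegmented = gather_words("我喜欢读书")
```

Once the strings are gathered, counting them is trivial, which is the ease the paragraph describes. The last line shows why the rule needs extra help for text that does not put spaces between words.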

This is what “sentiment analysis” does: It measures the emotional valence of a corpus of words—often scraped from social media streams like Twitter’s—in order to predict things like the outcome of a presidential election. When a text analysis firm analyzed the words of most-emailed and most-favorited New York Times articles, it concluded that “message clarity” and “engagement” were the key attributes of their popularity.
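
A rough sketch of the word-level scoring behind such analysis follows. The tiny lexicon, its scores, and the sample sentences are invented for illustration; commercial systems use much larger lexicons and more elaborate models, and this is not a description of any particular firm's method.

```python
import re

# Toy valence lexicon, invented for illustration; real lexicons are far larger.
VALENCE = {"great": 1, "love": 1, "win": 1, "terrible": -1, "hate": -1, "lose": -1}

def sentiment_score(text):
    """Sum per-word valences; words missing from the lexicon count as zero."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(VALENCE.get(word, 0) for word in words)

sincere = sentiment_score("I love this candidate, a great speech")
grumbling = sentiment_score("Oh great, I just love waiting in line for hours")
```

Both sentences receive the same positive score, because the score depends only on which words appear, not on how they are meant.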

Yet, focusing on the wordness of such utterances has limits that are frequently ignored. Surely the fact that these articles speak to the anxieties of middle-class life at a certain moment (and weren’t clearly written articles about, say, dense mathematical theorems) is a more relevant variable than the emotional tenor of the individual words that comprise those articles.

Instead of analyzing sentiment, we ought to look at topicality to understand the popularity of news articles. Topicality is a function of time and context, a quality that can’t be extracted from strings of characters. Likewise with sarcasm, which the Secret Service wants social media surveillance software to detect. Of course they do: There are too many threats on too many channels to watch. “Yeah, right” will be picked up easily, but more subtle distinctions will still require human powers. It’s the same with plagiarism. The modern professor submits his or her students’ writing to anti-plagiarism websites, which perform automated matches between texts. How often are matches taken as conclusive evidence of plagiarism, despite the fact that plagiarism is a subtle conclusion to be drawn from a writer’s process and his or her intent? (And a lack of matches is not evidence of honest composition, either. I recently heard about the practice of “text-laundering,” in which plagiarizers pass a stolen text through plagiarism detection software and tweak it until it stops raising alarms.)

Then there are the trends in the digital humanities known as distant reading. Taking advantage of the computer’s slant toward word-based analysis, these techniques reimagine literary works as stacks of words. For computer-aided distant readers, literary expertise now amounts to interpreting the patterns in those stacks, for example the frequency of indefinite articles (“a”) versus definite articles (“the”) in the titles of 19th-century novels. According to its proponents, such analysis yields new insights. (For example: Counter-French Revolutionary novel titles use the definite article predominantly, indicating a commitment to the established past rather than an anticipation of the unknown future.)
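
The kind of count involved is easy to sketch. The handful of titles below are well-known 19th-century novels picked only to make the example run; they are not the corpus any distant reader actually studied, and `leading_article` is a hypothetical helper that looks only at a title's first word.

```python
from collections import Counter

# A few familiar titles, chosen for illustration only.
titles = [
    "The Heart of Midlothian",
    "A Tale of Two Cities",
    "The Woman in White",
    "The Mill on the Floss",
    "A Pair of Blue Eyes",
    "Middlemarch",
]

def leading_article(title):
    """Classify a title by the article, if any, that opens it."""
    words = title.split()
    first = words[0].lower() if words else ""
    if first == "the":
        return "definite"
    if first in ("a", "an"):
        return "indefinite"
    return "none"

counts = Counter(leading_article(title) for title in titles)
```

The tally itself takes one line. Deciding what a preference for “the” over “a” means is the part the machine does not do.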

Rather than construing literature as an experience with language, this new literary scholarship has embraced the materiality of words: shuffling, sorting, and counting them to make sense of the patterns. At the same time, the experience of that materiality has been ceded to machines, which do the work of “discovering” their aggregate meaning in terms of the words they use. Now that the readings of a text are infinitely multiple, the human reader has been rendered so wholly irrelevant that the “humanities” part of “digital humanities” no longer seems to fit.

Writing in The New Republic last spring, Adam Kirsch critiqued digital humanities approaches like those of Franco Moretti (who coined the phrase “distant reading”), Erez Aiden, and Jean-Baptiste Michel because they “aggregate data, and they reveal patterns in the data, but to know what kinds of questions to ask about the data and its patterns requires a reader who is already well-versed in literature.”

At first glance, it might appear that the word has gained some status in the world in an era where image seemed likely to predominate. But in fact, this work enlists the word in a betrayal of its homelands in the humanities, where the study of ecologies of texts, readers, and writers has resisted science-like metrologies. Until now, that is.

Just as solving climate change isn’t about closing the polar bear gap, and preventing environmental degradation isn’t about closing the tree gap, you can’t increase children’s school readiness by closing the word gap. Sure, you count bears and trees as one part of dynamic, complex phenomena that have other units to measure and compare. And maybe, when you’re done counting words, you undertake the work of counting other units. Even so, it seems inevitable that “save the word!” will become the rallying cry to preserve reading or revitalize endangered minority languages or protect face-to-face interactions in an age of cheap telecommunications. Let’s hope that when that happens, it’s only because movements require charismatic megafauna to catch the public eye, not because words scooped up, stacked, and counted are all that people can see a use for.

An ongoing series about the hidden lives of ordinary things