The Limits of Quantification, Part II

Over the past couple of days I've had a chance to see the Oxford English Corpus in action, and I'm really impressed. Covetous. The thing contains 2 billion words of text (and counting), making it by far the largest linguistic corpus in existence. All of the sources are 21st-century, and every passage is meticulously tagged as to whether it's British, American, Canadian, Australian, etc., and whether it's from news, fiction, blogs, online chat rooms, medical journals ... Naturally, the tags make it possible to pick apart usage in the different realms. If you want to see sentences containing the word "balloon" in British fiction or in American medical literature (where it's not as scarce as you might suppose, owing to "balloon angioplasty"), no problem. Click, click, hit "Enter," and the passages line up neatly on the screen.

The developers of the corpus have tried to make the text as representative a sample of contemporary English as possible. Which of course gets me thinking, What does that mean? Certainly, the developers have given a lot more thought to this question than I have. They're obviously smart, experienced, and passionate about their work - I'm not at all skeptical of them. I would love to get my hands on the corpus. But I can't help being skeptical that anything anyone could come up with could be "representative" of contemporary English. Have I zeroed in on a fundamental design problem, a fundamental problem with the nonspecialist's relationship to technology, or a fundamental problem with my state of mind?
