The Limits of Quantification, Part II

Over the past couple of days I've had a chance to see the Oxford English Corpus in action, and I'm really impressed. Covetous. The thing contains 2 billion words of text (and counting), making it by far the largest linguistic corpus in existence. All of the sources are 21st-century, and every passage is meticulously tagged as to whether it's British, American, Canadian, Australian, etc., and whether it's from news, fiction, blogs, online chat rooms, medical journals ... Naturally, the tags make it possible to pick apart usage in the different realms. If you want to see sentences containing the word "balloon" in British fiction or in American medical literature (where it's not as scarce as you might suppose, owing to "balloon angioplasty"), no problem. Click, click, hit "Enter," and the passages line up neatly on the screen.

The developers of the corpus have tried to make the text as representative a sample of contemporary English as possible. Which of course gets me thinking, What does that mean? Certainly, the developers have given a lot more thought to this question than I have. They're obviously smart, experienced, and passionate about their work; I'm not at all skeptical of them. I would love to get my hands on the corpus. But I can't help being skeptical that anything anyone could come up with could be "representative" of contemporary English. Have I zeroed in on a fundamental design problem, a fundamental problem with the nonspecialist's relationship to technology, or a fundamental problem with my state of mind?
