What is The Simpsons? It's a television show, certainly—specifically, the longest-running American sitcom of all time. It's a cultural touchstone. It's a delight. But it's also an archival collection—25 years' worth of characters, themes, stories, and scripts.
To celebrate the show's quarter-century of existence, fans are being treated to projects that capitalize on this documentary breadth. There's the marathon of the show that's been airing on the cable network FXX; the social media conversation that has accompanied the marathon; the new app, Simpsons World, that will function like a DVD box set for the show, with even more extras. But there's another Simpsons project Fox isn't responsible for: a searchable database. One that has taken every episode of The Simpsons and made it, in its way, interactive. As Homer might put it: "Mmmmmm, searchability."
As Homer could also put it, though: "Mmmmmm, digital humanities." The project is the work of Ben Schmidt, a professor of digital and intellectual history at Northeastern University. Schmidt works in the field of the digital humanities, meaning he focuses on applying computational approaches to things like books, newspapers, and other pieces of literature.
So how do you turn The Simpsons, the show, into The Simpsons, the textual corpus? You take advantage of the fact that the series' episodes (all 552 of them) have been closed-captioned. You treat the show's subtitles, essentially, as its text. Which isn't a foolproof method ("it's often very quickly done," Schmidt points out of the transcript-creation process), but it does allow for an overall, text-based reading of the show. And, because subtitles are plotted by time, they allow you to understand the episodes as they move forward, minute by minute as well as season by season. So they allow you to compare the over-time appearances of, say, Mrs. Krabappel with those of, say, Mayor Quimby. They allow you to plot the writers' relative reliance on particular catchphrases ("D'oh!," "Release the hounds!," "Ay, caramba!") over the show's evolution.
They allow you to treat The Simpsons as, effectively, a single book. A single, enormous, unapologetically four-fingered book.
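For the curious, the time-plotting idea can be sketched in a few lines of Python. This is only an illustration, not Schmidt's code: it assumes the captions are in the common SRT layout (a cue number, a timing line, then the caption text), and the function name `phrase_minutes` and the sample dialogue are made up for the example.

```python
import re
from collections import Counter

# Matches an SRT-style timing line such as "00:01:15,000 --> 00:01:17,500".
# (Assumption: the captions use this layout; the article does not say.)
TIMING = re.compile(r"(\d{2}):(\d{2}):(\d{2})[,.]\d{3}\s*-->")


def phrase_minutes(srt_text, phrase):
    """Count, per minute of running time, the caption cues that contain `phrase`."""
    counts = Counter()
    needle = phrase.lower()
    # Cues in an SRT file are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                hours, minutes, _seconds = (int(g) for g in match.groups())
                # Everything after the timing line is the caption text.
                text = " ".join(lines[i + 1:]).lower()
                if needle in text:
                    counts[hours * 60 + minutes] += 1
                break
    return counts


# Invented sample captions, for illustration only.
SAMPLE = """\
1
00:00:42,000 --> 00:00:44,000
I'm Kent Brockman.

2
00:03:10,500 --> 00:03:12,000
Time for school, kids.

3
00:03:40,000 --> 00:03:43,000
School is cancelled!

4
00:12:05,000 --> 00:12:07,000
D'oh!
"""

print(phrase_minutes(SAMPLE, "school"))
```

Run over every episode and summed by minute, counts like these are the raw material for a chart of when in an episode a given word tends to come up.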
Once Schmidt had gathered the show's subtitles, it was a quick process to convert those into a database. He and a research partner, Erez Lieberman Aiden, had already created the Bookworm project, which turned large corpora of books, scientific papers, newspapers, and legal documents into a searchable database. (It's similar in that way to Google's extensive Ngram database.) The Simpsons-specific Bookworm was built on Bookworm's pre-existing architecture, which allowed Schmidt to put it together quickly. As he told me: "I just, over an evening, threw it together."
Searching the resulting database, you get findings like this, which charts the minutes within each episode that characters talk about "school":
And like this—which suggests that, as Schmidt puts it, "'I'm Kent Brockman' seems to be overwhelmingly a gag from the opening scene":
And what were some of Schmidt's broader findings? "I think the show just got more self-referential over time," he says. "Everybody's been talking about the Family Guy shift" (meaning the shift in TV comedy toward working outside pop-culture references into its plotlines) "and the thing The Simpsons can do that Family Guy can't do is refer to something that we actually care about in its own universe. So I think there are some of those jokes."
One other finding? The Simpsons was so creative that its catchphrases often defy common spellings, not only among the show's caption-writers but also among the people who have so far searched the Simpsons database. Take Ned Flanders' signature catchphrase. "Nobody has any idea how to spell 'okeley dokely,'" Schmidt says. "It sort of defies measurement for the time being."
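One possible way around the spelling problem is fuzzy matching: instead of demanding an exact string, group together the spellings that are close enough to a reference one. The sketch below uses Python's standard `difflib` module. It is a rough illustration, not something the Simpsons database is described as doing, and the function name `similar_spellings`, the reference spelling, and the 0.7 cutoff are all arbitrary choices made for the example.

```python
from difflib import SequenceMatcher


def similar_spellings(queries, reference, cutoff=0.7):
    """Return the queries whose similarity to `reference` is at least `cutoff`."""
    ref = reference.lower()
    return [
        q for q in queries
        # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge.
        if SequenceMatcher(None, q.lower(), ref).ratio() >= cutoff
    ]


# Hypothetical search terms; the reference spelling is itself a guess.
searches = ["okely dokely", "okily dokily", "okilly dokilly", "release the hounds"]
print(similar_spellings(searches, "okily dokily"))
```

The three Flanders variants clear the cutoff and the unrelated catchphrase does not, so their counts could be pooled under a single entry.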