This week, a new Google search tool made its debut: the Books Ngram Viewer, which draws on a database of nearly 5.2 million books published in six languages between 1500 and 2008. You can use the Ngram Viewer to search for words and phrases and track how often they appear in books over a given span of years. Erez Lieberman Aiden, a junior fellow at Harvard who co-authored a research paper about the database, has described the project's goal as "[giving] an 8-year-old the ability to browse cultural trends throughout history, as recorded in books." Bloggers are marveling over their sudden ability to comb centuries of the written word.
Here's a bit of what we've learned so far:
We're Quicker to Adopt Technology and Forget Celebrities: The New York Times reports that "the researchers measured the endurance of fame, finding that written references to celebrities faded twice as quickly in the mid-20th century as they did in the early 19th." Meanwhile, "looking at inventions, they found technological advances took, on average, 66 years to be adopted by the larger culture in the early 1800s and only 27 years between 1880 and 1920."
We're a Planet of Godless Sushi-Eaters! The Wall Street Journal notes that researchers "could track changing tastes in food, noting the waning appetite for sausage, which peaks in the 1940s, and the advent of sushi, the mentions of which start to soar in the 1980s. They documented the decline of the word 'God' in the modern era, which falls sharply from its peak in the 1840s."
Pay Attention to What's Not There: "The absence of words can be just as informative as their presence," writes Ed Yong at Discover. "Tiananmen Square became massively more common in English books following 1989, but the frequency of the equivalent characters in Chinese texts remained stable. The names of the Hollywood Ten--a group of alleged Communist sympathisers--were mentioned far less often in English texts after 1947. This repression was never clearer than in Nazi Germany ... None of this is surprising in a historical context, but in the future, the corpus could help to identify victims of censorship in a rapid way, for current or recent events."
And some of the responses to the project:
This Is Astounding: Academics seem bowled over by the possibilities the new search tool presents. The Journal quotes Robert Darnton, director of the Harvard University Library, who says: "It is just stunning ... They've come up with something that is going to make an enormous difference in our understanding of history and literature." The Times quotes Steven Pinker, a Harvard linguist, who predicts that the use of tools like this will "become universal."
Just Awesome, agrees Audrey Watters at ReadWriteWeb. "This is an incredible amount of data, a boon to researchers in both the humanities and social sciences. As well as a pretty fun tool for the more casual lit-geeks and word-lovers among us."
It Does Have Blind Spots, points out Erez Lieberman Aiden. Discover quotes the scholar as saying, "Books are not representative of culture as a whole, even if our corpus contained 100% of all books ever published. Only certain types of people write books and get them published, and that subclass has changed over time, with the advent of things like public literacy." Eventually, he says, the database will have to include "newspapers, manuscripts, maps, artwork, and a myriad of other human creations."
How Useful Is This, Really? The Times quotes Louis Menand, a professor of English at Harvard and a staff writer at The New Yorker. "In general it's a great thing to have," Menand says, but cautions that "obviously some of the claims are a little exaggerated." The Times goes on to say that Menand "was also troubled that, among the [Science] paper's 13 named authors, there was not a single humanist involved. 'There's not even a historian of the book connected to the project,' Mr. Menand noted."
Goodbye, Productivity: "Allow me to introduce you to a most excellent time-wasting tool," writes The Atlantic's Alexis Madrigal. "Google estimates they've scanned and OCR'd more than 10 percent of all the books ever published, and they use (as Ed Yong pointed out in the comments) about a third of the total books in the tool. So, perhaps this isn't a perfect tool for research, but man is it fun to play with."
Good Times for Grammar Wonks: Kevin Drum at Mother Jones points to a chart showing the respective occurrences of "data is" and "data are." Says Drum: "As you can see, 'data are' reached a peak in the early 80s and then began a precipitous nosedive. By the mid-aughts, I'm delighted to report, 'data is' was nearly as widely used and looks to be on course to overtake the obnoxious 'data are' sometime in the next decade. Hooray! (As you've probably guessed, I'm a longtime proponent of 'data is' as the proper modern usage.)"