An information scientist turns Wikipedia into a data set.
History, as a field, is pretty good at tracking concrete events. We know a lot -- though not, and never, enough -- about wars and plagues and celebrations and explorations and revolutions. We know considerably less, however, about the emotional tenor of those events: the sentiment that underscores our history.
Much of that knowledge is embedded in one place, though: Wikipedia. So the researcher Kalev Leetaru, of the University of Illinois Graduate School of Library and Information Science, loaded the full text of Wikipedia's English-language articles into a computer program. He then mapped that text, identifying each location and date mentioned in the articles, as well as the links connecting them. Cross-referencing those factors with the positive or negative sentiment expressed in the text, Leetaru was able to create the sentiment-through-time visualization pictured above.
As Leetaru explains, "each location is plotted against the date referenced and cross referenced when mentioned with other locations. The sentiment of the reference is expressed from red to green to reflect negative to positive."
The average tone of Wikipedia's coverage of each year closely aligned with major global events, Leetaru found, with the most negative period in the last 1,000 years being the American Civil War, followed by World War II.
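To give a rough sense of what "average tone per year" might mean in practice, here is a minimal Python sketch. It is not Leetaru's method, which the article does not spell out, and it ignores locations entirely. The word lists, the year pattern, the function names, and the three sample sentences are all invented for illustration. They are not drawn from Wikipedia.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicons (an assumption for this sketch);
# a real study would rely on a much larger tone dictionary.
POSITIVE = {"peace", "celebration", "victory", "prosperity"}
NEGATIVE = {"war", "plague", "famine", "massacre"}

# Assumed pattern: four-digit years from 1000 through 2099.
YEAR_RE = re.compile(r"\b(1\d{3}|20\d{2})\b")


def tone(text):
    """Return (positive words - negative words) / total words."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)


def average_tone_by_year(sentences):
    """Average the tone of every sentence that mentions a given year."""
    scores = defaultdict(list)
    for sentence in sentences:
        score = tone(sentence)
        # Credit the sentence's tone to each distinct year it mentions.
        for year in set(YEAR_RE.findall(sentence)):
            scores[int(year)].append(score)
    return {year: sum(v) / len(v) for year, v in sorted(scores.items())}


# Invented sample sentences, for illustration only.
SAMPLE = [
    "The war of 1862 brought famine.",
    "A celebration of peace was held in 1919.",
    "In 1862 a plague followed the war.",
]

result = average_tone_by_year(SAMPLE)
for year, avg in result.items():
    print(year, round(avg, 3))
```

In this toy version, a year mentioned mostly alongside words like "war" and "famine" ends up with a negative average, while a year tied to "peace" and "celebration" comes out positive. Mapping those averages onto a red-to-green scale would be one plausible way to arrive at a picture like the one described here.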
For the most part, the project is an awesome data visualization in a sea of awesome data visualizations -- this one made even more powerful as a video that suggests movement through time. But it's also a nice reminder of the new modes of thinking that data-mining and visualization, as genres and practices, can encourage. As a topic for the humanities -- as a topic for text -- sentiment can be frustratingly (and sadly! and infuriatingly!) hard to track with much rigor. Sure, we can read Pepys and Austen and Franklin and Montaigne, and try to reconstruct the sentiments they indicate ... but that work, for all its value, is also imprecise and severely limited in scope. It doesn't account for the broad sweep of history.
Leetaru's digital humanities approach, on the other hand, takes the summative if not always superlative insights of Wikipedia and renders them as data. It allows for a new sense of the past -- one focused not just on events, but on the way people experienced them.