One of the tasks the human brain best performs is identifying patterns. We're so hardwired this way, researchers have found, that we sometimes invent repetitions and groupings that aren't there as a way to feel in control.
Pattern recognition is, of course, a skill computers have, too. And machines can group data at scales and with speeds unlike anything a human brain might attempt. It's what makes computers so powerful and so useful. And seeing the structural framework for patterns across vast systems of categorization can be enormously revealing, too. Which is why a seemingly small adjustment to TimesMachine—the astonishing archival trove that lets New York Times subscribers explore millions of pages of past newspapers—is actually a pretty big change.
"People are somewhat overwhelmed when presented with the entire archive like, 'Here's 11 million articles. Go find what you want,'" said Evan Sandhaus, who is the director of search, archives, and semantics at the Times. "We thought there would be a real need to sort of guide people into the archive."
One way to do that, he told me, was to add a search function that's linked to The New York Times Index, the sprawling database of references that the newspaper has published as a resource for libraries and other researchers since 1913. (Another is with the Twitter account NYTArchives, which the Times is launching today.) The index, which the Times calls "a painstakingly assembled and cross-referenced guide to virtually every article," is what made the Times the newspaper of record. Even today there is a team of about a dozen people whose only job is to read the paper and index its contents. It is "this Herculean effort to categorize everything," Sandhaus told me.
Unlike the bound volumes of the print index, TimesMachine shows you results for the term you're seeking and also suggests related terms. It is kind of like Google, but with results tied only to the index and not to search terms entered by others. In other words, it reveals some of the connections between topics you might not otherwise search for. This is meaningful not just because it's useful but because it is so intuitive. It's an archive search tool that acts the way you might expect an Internet tool to act. Which would sound sort of mundane if it weren't so rare in journalism and library science, two fields that have struggled to transform their archival practices and functionalities in the post-print era.
In recent years, though, the Times, along with institutions like the Library of Congress, has been a leader where other stewards of great archives have fallen short. In academia, many researchers still have to travel great distances to access remote collections. Too much valuable information remains confined to print, unlinked to related resources and collections. Even what has been digitized is often stored in formats that are practically unreadable. TimesMachine is built using the mechanics of online mapping, so that instead of making users wait for a single huge file (with teeny tiny text) to load, they can zoom in, tile-by-tile, on the section that interests them.
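For the technically curious, the tile-by-tile idea can be sketched in a few lines of Python. This is only an illustration of the general approach used by online maps, not the Times's actual code: the 256-pixel tile size, the zoom numbering, and the function names `max_zoom` and `tiles_for_viewport` are all assumptions made for the example.

```python
import math

TILE_SIZE = 256  # assumed tile edge in pixels; real systems may differ


def max_zoom(width, height, tile_size=TILE_SIZE):
    """Zoom level of the full-resolution scan.

    Level 0 is defined here as the level where the whole page fits in
    one tile; each further level doubles the resolution.
    """
    longest = max(width, height)
    return max(0, math.ceil(math.log2(longest / tile_size)))


def tiles_for_viewport(width, height, zoom, viewport, tile_size=TILE_SIZE):
    """Return the (column, row) tiles that overlap a viewport.

    `viewport` is (left, top, right, bottom) in full-resolution pixels.
    Only these tiles need to be downloaded, not the whole page image.
    """
    top_level = max_zoom(width, height, tile_size)
    if not 0 <= zoom <= top_level:
        raise ValueError("zoom must be between 0 and %d" % top_level)

    # Full-resolution pixels covered by one tile edge at this zoom level.
    span = tile_size * 2 ** (top_level - zoom)

    # Clip the viewport to the page.
    left, top, right, bottom = viewport
    left, top = max(0, left), max(0, top)
    right, bottom = min(width, right), min(height, bottom)
    if right <= left or bottom <= top:
        return []

    first_col, first_row = left // span, top // span
    last_col, last_row = (right - 1) // span, (bottom - 1) // span
    return [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


if __name__ == "__main__":
    # A hypothetical 4096 x 6144 pixel scan of a newspaper page.
    page_w, page_h = 4096, 6144
    needed = tiles_for_viewport(page_w, page_h, 5, (1000, 2000, 1800, 2600))
    print(len(needed), "tiles needed for this view")
```

For this made-up page, the full-resolution level would hold 384 tiles, yet a reader zoomed in on one story would fetch only the handful that overlap the screen, which is why zooming feels quick.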
This latest iteration of TimesMachine also offers a glimpse of how the Times sees its work, how it categorizes its own coverage, and how that has changed over the years. A simple search can reveal a clear sense of how Times coverage might have reflected or influenced attitudes about that search term at a given time. Consider, for instance, some of these headlines related to "robots" in the 20th century:
• Robot Can't Have Soul (1929)
• The Robots Again (1930)
• Defeat of the Robots (1944)
• The Menace of the Robots (1944)
• Electronic Robots (1949)
• Unreasonable Robots (1959)
• Robots Are Coming But Slowly (1966)
• They're Not Robots, They're Cyborgs (1969)
• Revolt of the Robots (1972)
TimesMachine is, in other words, a context machine. And it's a historical record designed to be used, not just saved. The archive now delivers groupings of search terms, related articles, and plenty of clues that hint at how those topics may have been received over time. And if you really want to get crazy, you can combine this kind of TimesMachine search with a query within Chronicle, the paper's visualization tool for terms used over the course of Times history. Here's a look at mentions of "robots" in the paper since 1860:
As comprehensive as TimesMachine is (and I've seen nothing else like it), the archive has its limitations. As with any search functionality, you still have to think critically about what kinds of terms to use. (Search suggestions for coverage of "women" are limited to terms like "women's wear" and "fashion, women's," but the terms "suffrage" and "woman suffrage" yield tens of thousands of search results.)
Really, it's most fun to peruse TimesMachine at random. Search results appear as an image gallery of the relevant newspaper pages, so you can assess stories by headline, by abstract, and by how they look on the page, glorious old-timey ads and all. (You can view stories as they appeared on the original page or switch to a PDF view that shows text only.) Over the course of about an hour of clicking my way through TimesMachine, I found dozens of articles about famous elephants, thousands of references to The Atlantic Monthly (including this 1895 gem about Martians that the Times reprinted from our magazine), and the 1914 obituary for Agnes Irwin, great-great-great-granddaughter of Benjamin Franklin and the first dean of Radcliffe College. There was a snarky 1920 piece imagining the inauguration of the first woman president. And a search for "beer" revealed, among at least 84,731 stories containing the word, one of the oldest New York Times references to the stuff. The story is about "a street affray" in October 1851 that involved men who had been to "a lager beer shop...where they drank 'schnaps' and other stimulating liquors very freely" before getting into a knife fight.
Go looking for the rare, ahem, barnyard expletive that may have slipped through the Times's notoriously prudish standards and you may wind up, as I did, at the June 23, 1972, edition of the paper. A transcript of Watergate-era White House tapes features President Nixon using a few choice curse words. (In other cases, searches for expletives turn up results by accident, matching places where words like buck and shift actually appeared in print. "Some degree of noise has crept into the data," Sandhaus says.) But that noise is actually sort of delightful. A reminder that when you go searching for something, even in the most impressive collections and using the best tools, you often find a pattern you may not have expected.