One of the things the human brain does best is identify patterns. We're so hardwired this way, researchers have found, that we sometimes invent repetitions and groupings that aren't there as a way to feel in control.
Pattern recognition is, of course, a skill computers have as well, and machines can group data at scales and speeds beyond anything a human brain might attempt. It's what makes computers so powerful and so useful. Seeing how patterns are structured across vast systems of categorization can be enormously revealing, too. Which is why a seemingly small adjustment to TimesMachine, the astonishing archival trove that lets New York Times subscribers explore millions of pages of past newspapers, is actually a pretty big change.
"People are somewhat overwhelmed when presented with the entire archive like, 'Here's 11 million articles. Go find what you want,'" said Evan Sandhaus, who is the director of search, archives, and semantics at the Times. "We thought there would be a real need to sort of guide people into the archive."
One way to do that, he told me, was to add a search function linked to The New York Times Index, the sprawling database of references that the newspaper has published as a resource for libraries and other researchers since 1913. (Another is the Twitter account NYTArchives, which the Times is launching today.) The index, which the Times calls "a painstakingly assembled and cross-referenced guide to virtually every article," is what made the Times the newspaper of record. Even today, a team of about a dozen people has the sole job of reading the paper and indexing its contents: "this Herculean effort to categorize everything," Sandhaus told me.
But unlike thumbing through the bound volumes of the print index, TimesMachine shows you results for the term you're searching and also suggests related terms. It works a bit like Google, except that the suggestions come only from the index rather than from terms other people have searched. In other words, it reveals connections between topics you might not otherwise think to look for. This matters not just because it's useful but because it is so intuitive. It's an archive search tool that behaves the way you'd expect an Internet tool to behave. That would sound sort of mundane if it weren't so rare in journalism and library science, two fields that have struggled to transform their archival practices and tools in the post-print era.
In recent years, though, the Times, along with institutions like the Library of Congress, has led where other stewards of great archives have fallen short. In academia, many researchers still have to travel great distances to reach remote collections. Too much valuable information remains confined to print, unlinked to related resources and collections. Even material that has been digitized is often stored in formats that are practically unreadable. TimesMachine is built on the same mechanics as online maps, so instead of waiting for a single huge file (with teeny tiny text) to load, users can zoom in, tile by tile, on the section that interests them.
This latest iteration of TimesMachine also offers a glimpse into how the Times sees its work, how it categorizes its own coverage, and how that has changed over the years. A simple search can give a clear sense of how Times coverage might have reflected or influenced attitudes about the search term at a given time. Consider, for instance, some of these headlines related to "robots" in the 20th century: