One of the tasks the human brain best performs is identifying patterns. We're so hardwired this way, researchers have found, that we sometimes invent repetitions and groupings that aren't there as a way to feel in control.
Pattern recognition is, of course, a skill computers have, too. And machines can group data at scales and with speeds unlike anything a human brain might attempt. It's what makes computers so powerful and so useful. And seeing the structural framework for patterns across vast systems of categorization can be enormously revealing, too. Which is why a seemingly small adjustment to TimesMachine—the astonishing archival trove that lets New York Times subscribers explore millions of pages of past newspapers—is actually a pretty big change.
"People are somewhat overwhelmed when presented with the entire archive like, 'Here's 11 million articles. Go find what you want,'" said Evan Sandhaus, the director of search, archives, and semantics at the Times. "We thought there would be a real need to sort of guide people into the archive."
One way to do that, he told me, was to add a search function that's linked to The New York Times Index, the sprawling database of references that the newspaper has published as a resource for libraries and other researchers since 1913. (Another is the Twitter account NYTArchives, which the Times is launching today.) The index, which the Times calls "a painstakingly assembled and cross-referenced guide to virtually every article," is what made the Times the newspaper of record. Even today there is a team of about a dozen people whose only job is to read the paper and index its contents, in what Sandhaus described to me as "this Herculean effort to categorize everything."