The search giant is scrapping its ambitious plan to digitize hundreds of years of old newspapers
After three years, Google announced today that it would shutter its ongoing quest to scan and archive printed newspapers. Google's News Archive, which has scanned nearly a million pages from 2,000 newspapers into an easily browsable database since 2008, was among the most ambitious attempts to record and archive newspapers in their printed form. While Journalism.net keeps a running list of digital newspaper troves around the world, the News Archive was the first major attempt to centralize digital scans of old broadsheets in a single, searchable archive.
No one is totally sure why Google chose to shut down the project. Matt McGee at Search Engine Land obtained this statement from Google:
We work closely with newspaper partners on a number of initiatives, and as part of the Google News Archives digitization program we collaborated to make older newspapers accessible and searchable online. These have included publications like the London Advertiser in 1895, L'Ami du Lecteur at the turn of the century, and the Milwaukee Sentinel from 1910 to 1995.
Users can continue to search digitized newspapers at http://news.google.com/archivesearch, but we don't plan to introduce any further features or functionality to the Google News Archives and we are no longer accepting new microfilm or digital files for processing.
The end of Google's News Archive won't be the end of digitized broadsheets. The New York Times, noticeably absent from the project, recently unveiled its TimesMachine engine, which allows readers to browse and read any issue of the Times from September 18, 1851 to December 30, 1922. But the service was an incredible resource for smaller publications that lacked the resources to undertake an extensive -- and expensive -- digitization process. Carly Carioli, a longtime editor at the Boston alt weekly The Boston Phoenix, offers a thoughtful, heartfelt eulogy for the archive:
The five-year-old News Archive project was Google's attempt to do for old newspapers what Google Books has been attempting to do for the world's libraries. As part of the project, newspapers opened their morgues to Google, which promised to scan, index, and host the digital files it made from the archives. Google and the newspapers would then share revenue on the pageviews of those archives. Google says it eventually scanned 60 million pages, covering 250 years.
Was this cool? It was kind of cool. For instance, here's 21 articles about the Sex Pistols' final US concert in '78. And here's some fresh-off-the-press news from 1860.
Some newspapers complained that Google, after quickly scanning their archives, was slow to process the scans. The Phoenix sent Google a stash of archives covering several decades; some fraction of those have made their way online.
News Archive was generally a good deal for newspapers -- especially smaller ones like ours, who couldn't afford the tens or hundreds of thousands of dollars it would have cost to digitally scan and index our archives -- and a decent bet for Google. It threaded a loophole for newspapers, who, in putting pre-internet archives online, generally would have had to sort out tricky rights issues with freelancers -- but were thought to have escaped those obligations due to the method with which Google posted the archives. (Instead of posting the articles as pure text, Google posted searchable image files of the actual newspaper pages.) Google reportedly used its Maps technology to decipher the scrawl of ancient newsprint and microfilm; but newspapers are infamously more difficult to index than books, thanks to layout complexities such as columns and jumps, which require humans or intense algorithmic juju to decode. Here's two wild guesses: the process may have turned out to be harder than Google anticipated. Or it may have turned out that the resulting pages drew far fewer eyeballs than anyone expected.
Carioli raises an interesting point about pure text: if Google can develop a substitute system to extract pure text, does this serve the same purpose as the stalled News Archive? At The Atlantic, we frequently scan articles from our extensive archives into a plain text form, formatting them in our content management system and publishing them on TheAtlantic.com. If Google's mission is to "organize the world's information and make it universally accessible and useful," does downgrading the News Archive into a text-only database still serve the same function of preserving those first drafts of history as they appeared in the pages of The Boston Phoenix, The Milwaukee Sentinel, or Le Monde?
Not necessarily. For print newspapers, layout is key: Before the Internet catalyzed the unbundling of news packages into single, shareable article pages, editors had to decide what deserved front page treatment and what would be better suited for a small corner in the Classifieds. Ad sales shaped how and where articles were presented in the paper. Stories lived and died by how many inches a broadsheet had space for. The process is still important, as my colleague Nicholas Jackson noted when the New York Times was literally forced to stop the presses after the death of Osama bin Laden, but it is less visible now that news consumption is shifting from newspapers to websites. The medium is the message, and capturing the layout of the printed paper allows us to digest a facet of the news, as it was reported, that a plain text scrape cannot.
Image: The front page of Le Monde, December 26, 1887/Google