From Colbert to Borges, and still onward from there: The fascination with completeness is as timeless as it is ingrained. In the last decade, the Internet has made the ambition of universality appear closer to realization than ever before: What is the Web, if not a vast collection, and an accessible one? But as with any new frontier, formidable challenges attend exciting possibilities, and nowhere has this been more apparent than in the work of the Digital Public Library of America, a coalition spearheading the largest effort yet to curate and make publicly available the "cultural and scientific heritage of humanity," with a focus on materials from the U.S., by harnessing the Internet's capabilities. The DPLA hopes to create a platform that will orchestrate millions of materials (books from public and university libraries, records from local historical societies, museums, and archives) into a single, user-friendly interface accessible to every American with Internet access. It will launch a prototype in April 2013. If successful, the resource has the potential to revolutionize the way information is organized and found online, to radically expand public access to knowledge, and to represent a sharp counterpoint to the model already offered by search giant Google, whose "Google Books" program is now eight years old.
THE RALLYING CALL FOR THE DPLA was circulated in the fall of 2010. Summoned by Robert Darnton, the great book historian and current director of Harvard's library system, about 40 people came together in October at the Radcliffe Institute for Advanced Study. There had been some concern that the attendees, a diverse group hailing from different parts of the library science universe, might have trouble fusing their agendas. But after a half hour, the effort found solid ground. "We were able to come up with a single sentence: 'It's a worthy effort, and we are willing to work together toward it,' " recalls DPLA chair John Palfrey, who is also director of Harvard's Berkman Center for Internet and Society, and a former professor at Harvard Law School. The spirit of unanimity had legs: A steering committee quickly formed, and the Alfred P. Sloan Foundation, a non-profit organization that supports a variety of digital information projects, offered to fund a planning process. The attendees conceived of
an open, distributed network of comprehensive online resources that would draw on the nation's living heritage from libraries, universities, archives, and museums in order to educate, inform and empower everyone in the current and future generations.
While ambitious, the project was not unprecedented. The creation of a large-scale digital library catering to public access has been attempted for decades, by a cast of characters worth noting. Aside from Google, there's the Internet Archive, a non-profit digital library based in San Francisco that sees itself as a bulwark against a modern-day version of the loss of the Library of Alexandria. Brewster Kahle, who founded the Internet Archive in 1996 and is now on the DPLA steering committee, aims to supplement this digital reserve with a physical copy of every book in existence, collected and stored in a mammoth warehouse in California; he currently has about 500,000 volumes and hopes to reach 10 million one day. His efforts are complemented by the HathiTrust ("Hathi" is the Hindi word for "elephant," an animal that, as the saying goes, never forgets), a digital preservation repository founded in 2008 that has digitized over 10 million volumes contributed by participating research institutions and libraries. The 3 billion-plus pages amount to over 8,000 tons (but weigh close to nothing online, of course). Meanwhile, national institutions like the Library of Congress have been digitizing their in-house materials for years. The DPLA is not the first player to step onto the field.
But that doesn't make it any less of a milestone. Consider these facts: The Library of Congress, the largest library in the world, added 480,000 books to its collections in the last fiscal year alone, and now boasts more than 34 million books and other print materials. Add other items like maps and manuscripts, and the collection towers at 150 million items. And then there's information stored in digital forms (e-mails, websites, even President Obama's Twitter feed), which compounds things astronomically. Speaking in digital terms, the world produced more data in 2009 than in the entire history of mankind through 2008, according to the former chief scientist at Amazon.com. In one way, this explosion and the digital platforms that support it have been a boon for librarians and archivists, who specialize in collecting information and making it available to users. But in another, it has been a scourge, rendering the goal of staying abreast of the world's intellectual output (not to mention the hardware and software needed to store and display it) more quixotic than ever. Simply to reap the accessibility benefits that the Internet so tantalizingly affords, the centuries' worth of items currently extant only in cloth and paper need to be imaged into bits and bytes: a monumental, manpower-intensive, and prohibitively expensive task. And that is to say nothing of figuring out how to cull and catalog the terabytes of information that have existed only in digital format. All of which goes to show that the problem of networking the nation's "living heritage" online has barely begun to be addressed. The problem is one of time, money, and most of all, scale. Massive scale.