Moving Towards a Physical Archive of the World's Books

Internet Archive, a digital repository, wants to collect and preserve a copy of every single book that's ever been published


My love for books comes from a story of their utter destruction.

When I was an adolescent, my father made an effort to turn me on to Ray Bradbury by screening the 1966 François Truffaut film adaptation of Fahrenheit 451, set in a dystopian future where possessing books is a criminal offense. Apart from the enduring irony of a society where firemen are dispatched to set fires rather than extinguish them, one scene has stayed with me since that initial viewing. Oskar Werner -- in the role of the former fireman and now fugitive Guy Montag -- escapes to the countryside and finds himself among a group who have memorized entire books, preserving them orally until the law against books is overturned. "Are you interested in Plato's Republic?" asks Granger, the leader of the book-loving vagabonds, pointing Montag towards a young woman. "Well, I am Plato's Republic," replies the woman. "I'll recite myself for you whenever you like."

"Now here's Wuthering Heights by Emily Brontë," Granger continues, gesturing to his fellow "Book People." "And here's The Corsair by Byron. She used to be married to a chief of police. That skinny fellow is Alice in Wonderland by Lewis Carroll. Now where's Alice Through The Looking Glass today, she should be somewhere about. Ah ... now there's The Pilgrim's Progress by John Bunyan. He ate his book so they couldn't burn it."

Bradbury's Book People are the equivalent of a literary Noah's Ark, living repositories relying on the oral tradition that preceded bound books to safeguard works of cultural significance. And they are far from fictional characters.

The Internet Archive, a non-profit digital library with the Wikipedian mission of "universal access to all knowledge," has offered free storage and access to digitized music, movies, websites and nearly three million public domain books since 1996. In May, the Archive turned its focus offline, towards the preservation of physical reading materials. The aptly named Physical Archive of the Internet Archive, a prototype facility devoted to the long-term preservation of physical records, launched last Sunday in Richmond, California. Materials are stored in 40-foot shipping containers, modified to provide secure, individually controllable environments of 50 or 60 degrees Fahrenheit and 30 percent relative humidity, and designed to keep out pests.

On the Internet Archive's blog, founder Brewster Kahle likens the Physical Archive to the Svalbard Global Seed Vault, which is seen as "an authoritative and safe version of crops we are growing." Saving physical copies of digitized books might at least be seen in a similar light as an authoritative and safe copy that may be called upon in the future:

Digital technologies are changing both how library materials are accessed and increasingly how library materials are preserved. After the Internet Archive digitizes a book from a library in order to provide free public access to people world-wide, these books go back on the shelves of the library. We noticed an increasing number of books from these libraries moving books to "off site repositories" to make space in central buildings for more meeting spaces and work spaces. These repositories have filled quickly and sometimes prompt the de-accessioning of books. A library that would prefer to not be named was found to be thinning their collections and throwing out books based on what had been digitized by Google. While we understand the need to manage physical holdings, we believe this should be done thoughtfully and well.

Two of the corporations involved in major book scanning have sawed off the bindings of modern books to speed the digitizing process. Many have a negative visceral reaction to the "butchering" of books, but is this a reasonable reaction?

While no crushing yoke of political or governmental censorship has made literature and criticism a dying medium, physical books are running into trouble in the digital age. A recent report by the Online Computer Library Center (OCLC) reaffirmed what any casual Internet observer recognizes as fact: that information-seeking behavior is constrained by the convenience of tracking down resources. A Google generation untrained in the art of deep archival research is likely to have less patience in tracking down that hard-to-find translation of Dutch jurist Hugo Grotius' De Jure Belli ac Pacis should a digitized scrape of the tome not appear in the first few pages of Google or JSTOR results. And as demand for physical books drops, so does usage: thousands of books spend hours, untouched and unread, in musty basements. Atlantic correspondent Yoni Appelbaum noted a new trend towards "deaccessioning" of newspapers -- the transferring of newsprint to microfilm -- in 2001, along with an unintended consequence:

In 1997, Columbia celebrated his 150th birthday with Joseph Pulitzer Day, complete with cake and speeches. Our library houses a significant portion of his papers, including an extensive collection of documents relating to the World.

If you go to Butler Library, you may search the shelves for hours, but you will not find the New York World. Gone are the bound volumes, preserving the faded newsprint upon which Pulitzer's fame and fortune were founded. No more are the beautiful color advertisements, the full-page illustrations in a dozen fantastic shades, and the promotional inserts on hard stock. They were "deaccessioned," in the technical jargon of the library trade, and replaced with grainy black-and-white images on microfilm. A half-million pages of newsprint were discarded for a few drawers of film in small cardboard boxes.

Columbia was hardly the only library to junk its newspapers. In fact, almost every major research library in America did the same. Along with them went nearly a million books, all destroyed in the name of preservation.

While reams of microfilm have been replaced by crowded avenues of servers, the principle remains the same: as the written word moves online and print libraries are in less demand, the economic incentives to preserve and maintain extensive collections wane.

Kahle's comparison to the Svalbard Global Seed Vault may be more indicative of the Physical Archive's real utility. While concentrated server farms may be better homes to the digitized sum of the world's cultural and literary knowledge than libraries to their physical counterparts, a single phenomenon -- an electromagnetic surge in the earth's atmosphere, a terrorist attack, a natural disaster or even a spilt coffee cup -- could instantly hasten the evaporation of our literary cloud. One need not be a bibliophile like myself to recognize the potential importance of this project: should our digitized world of knowledge and information suffer such a drastic reduction in the vein of Fahrenheit 451, it should be comforting to know that the Internet Archive has Book People in California ready to preserve and protect our literary history.

Image: The book stacks at The British Library, via SteveCadman/Flickr.