As the highly anticipated Obama Presidential Library in Chicago morphed into the Obama Presidential Center—without a place to hold the records of his administration—reactions ranged from slight confusion to rote dismissiveness. “The Obama Presidential Library That Isn’t” led the coverage in The New York Times. Philip Terzian complained in an op-ed in the Washington Examiner that what was proposed was “not, in fact, a library at all.”
Instead of the physical research library that 13 previous presidents had established as the centerpiece of their buildings, there would be a digital library, providing online access to Barack Obama’s years in office. Robert Caro, the Pulitzer Prize–winning writer and biographer of Lyndon B. Johnson, registered his concern: “I don’t want anyone deciding what’s going to be digital.”
Is a digital library a library? This is not an abstract koan for me. As the founding executive director of the Digital Public Library of America, I worked with thousands of civic-minded librarians, archivists, museum professionals, and technologists—and the nonprofit and governmental institutions that housed them—to provide broad access to tens of millions of digitized works of literature, art, audio, and video. Six years after its launch, DPLA is clearly and robustly a library, albeit one that leverages the characteristics of computer networks and the power of digital media and technology to synthesize and serve materials from multiple locations. In my current role as the dean of the library at Northeastern University, I oversee both large physical spaces and vast online resources, and can identify significant advantages to each.
The debate about the Obama library exhibits a fundamental confusion. Given its origins and composition, the Obama library is already largely digital. The vast majority of the record his presidency left behind consists not of evocative handwritten notes, printed cable transmissions, and black-and-white photographs, but email, Word documents, and JPEGs. The question now is how to leverage its digital nature to make it maximally useful and used.
I can understand where the hand-wringing about the Obama library comes from. Caro vividly remembers the moment he became a true researcher, when the editor of Newsday, Alan Hathway, revealed the simple method behind good investigative reporting: “Turn every page. Never assume anything. Turn every goddamn page.” Now finishing the fifth volume of his magisterial LBJ series, Caro has turned a lot of pages, many of them at Johnson’s presidential library in Austin, Texas. The neat rows of red boxes containing Johnson’s papers, with researchers such as Caro carefully scanning through them in the reading room, offer the canonical vision of an American presidential library and its use.
This vision is necessarily coming to an end in our age of prolific and diverse electronic communication and documents. That transformation presents an opportunity to rethink what a presidential library should be—and just as important, how it might serve multiple audiences, including historians and other researchers but also the general public, students and teachers, and the vast majority of people who cannot make a pilgrimage to what Caro has called “America’s pyramids.”
The LBJ Presidential Library includes Johnson’s papers going back to his first term as a U.S. representative in 1937, as well as the documents of Cabinet members and those of many other adjacent figures stretching over years—in all, some 45 million pages. For someone like Robert Caro to flip the requisite number of pages to get a full sense of Johnson’s life and career requires decades, judicious selection, and a lot of coffee.
But that scale pales in comparison with the record of President Obama’s White House: 1.5 billion “pages” in the initial collection, already more than 33 times the size of President Johnson’s library. I use “pages” because the Obama Foundation has noted that “95 percent of the Obama Presidential Records were created digitally and have no paper equivalents.” The email record alone for these eight years is 300 million messages, which NARA (the U.S. National Archives and Records Administration) estimates amounts to more than a billion printed pages. In addition, millions of other “pages” associated with the Obama administration are word-processing documents, spreadsheets, or PDFs, or were posted on websites, apps, and social media. Much of the photographic and video record is also born-digital. There are also 30 million actual pages on paper, which are currently stored in a suburb near Chicago. Given the likelihood that a decent portion of this paper record actually came from digital files (think about all of the printouts of PDFs, for instance), only a minuscule portion of what we have from Obama’s White House is paper-only.
This almost-entirely digital collection, and its unwieldy scale and multiple formats, should sound familiar to all of us. Over the past two decades, we have each become unwitting archivists for our own supersized collections, as we have adopted forms of communication that are prolific and easy to create, and that accumulate over time into numbers that dwarf our printed record and can easily mount into a pile of digital files that borders on shameful hoarding. I have more than 300,000 email messages going back to my first email address in the 1990s (including an eye-watering 75,000 that I have sent), and 30,000 digital photos. This is what happens when work life meets Microsoft Office and our smartphone cameras meet kids and pets.
Now multiply those levels of modern media production by the hundreds of staffers and intense communications of the White House, and you can begin to imagine the quandary of the 21st-century presidential library—and the difficulties that await a future Robert Caro or anyone else who will want to use the Obama presidential collection. While that collection contains a smattering of objects that look like they come from the 20th century, such as handwritten edits by President Obama on drafts of important speeches, there are countless more prosaic documents and communications that might be important to understand his presidency and its era more deeply, and that will help us trace, day by day, the attention given to topics such as climate change or al-Qaeda.
Whether we like it or not, this challenge of a digital record and its enormous scale requires us to use the commensurate power of digital technology to scan and sort these documents, and to provide an interface to make sense of it all. This is not to the exclusion of Caro-like page flipping, but a billion pages are beyond the efforts of even the most dogged, caffeinated researcher. Before you can do some close reading of the Obama collection, it will be necessary to do some distant reading—working with indexing, search, and analytical tools to separate the wheat from the chaff. As Leslie Johnston, the director of digital preservation for NARA, has put it, given hundreds of millions of email messages, we will have to figure out “which of those are the relevant important records and which are, ‘Please go over to the corner and get me a sandwich.’”
The presidential-library system has always been a sandwich itself, with layers of a library and an archive but also museum and educational ingredients, and often a mixture of presidential foundations (and their funding) and public agencies such as NARA and the Smithsonian (and their taxpayer dollars). Over the past two decades, the question of who does what and who pays for what has been a persistent source of tension within the system. The addition of an almost unimaginably large digital record creates a stress that makes some kind of rethinking of the institution a necessity. The digitization of tens of millions of documents is also enormously expensive, and David Ferriero, the archivist of the United States, has wisely pushed NARA to become more digitally capable and to make more of the American record available online.
That’s why the Obama Foundation is going about this the right way in thinking about the Obama presidential library as digital-first and, alongside NARA, beginning to envision what it would look like from that assumption. That includes the Foundation’s commitment to fund the digitization of the print record so that we can begin to think about a unified online collection. Some researchers and archivists have understandable concerns and see merit in the previous model for the presidential library, but we should remember that this model is itself a relatively new idea, largely created in the past half century.
Given the shift to digital, it is hard to imagine how we would be able to continue the traditional model of the presidential library. Would presidential libraries run their own servers to preserve, migrate, and emulate digital records, files, and apps so that they remain accessible into the coming decades and centuries? Or would it be better for those electronic records to be managed at dedicated technical locations, and for a virtual presidential library to provide an overarching interface for all relevant records for a presidency? The Digital Public Library of America has shown just such a model, providing access to tens of millions of digitized objects from thousands of libraries, archives, and museums, held by regional hubs. Just as the web has provided the DPLA with a method for uniting remote documents, a presidential library could take a similar form. As other collections become relevant to the Obama virtual presidential library, they can be seamlessly added through digital means even though they might reside at multiple sites.
Will we have lost something in this transition? Of course. Keeping a dedicated archival staff in close proximity to a bounded paper-based collection yields real benefits. Having a researcher who is on-site discover a key note on the back of a typescript page is also special.
However, although the analog world can foster great serendipity, it does not have a monopoly on such fortunate discoveries. Digital collections have a serendipity all their own.
By simply allowing much wider use (millions more can access a digital collection than an analog one), they enable more questions to be asked and answered, from a broader range of people. Existing presidential libraries make only a small amount of their collections available online. The Clinton Presidential Library, for instance, has digitized only 1 percent of its 78 million printed pages, as scanning is done on a much more gradual basis than the wholesale process proposed by the Obama Foundation. Anyone who cannot travel to Arkansas is thus out of luck. Even with the proliferation of smartphones and tablets, a segment of Americans is still without easy access to our online world, but one could imagine presidential digital libraries presented on computers in public libraries to help bridge that remaining divide.
Moreover, digital collections can reorder themselves on the fly with interfaces that accommodate diverse audiences. The research interface for a fifth grader should not be the same as that for a professional historian. By starting off as virtual, the Obama library has the potential to rethink how we present, in multiple ways, the vast record of the presidency to grade schoolers, amateur enthusiasts, casual browsers, and many others. Presidential libraries have always had those different audiences, but going digital-first can make this much more of a reality than a fixed physical space or the often fairly basic websites of existing libraries, all of which were designed for an age of laptops and desktop computers, now a poor baseline when most online visitors access these sites through their smartphones.
Thinking of the presidential library as a virtual collection can also enable entirely novel digital research that will greatly enrich our understanding. Northeastern University’s Network Science Institute has used the data from large electronic collections, such as email, to look at how information flows within organizations and society—how new ideas spread and how groups and alliances form. Researchers in our NUlab for Texts, Maps, and Networks have revealed how compelling bits of text become virally shared—and who sees them. One can easily imagine the Obama digital library opening itself to this kind of analysis, which will undoubtedly uncover new historical insights that are complementary to those of a Robert Caro.
Digital collections can also allow for unanticipated uses. Two decades ago, the September 11 Digital Archive saved 150,000 electronic records from that tragic day, and while the collection has been used for memorial and traditional research purposes, it has also unexpectedly opened up avenues of research into the early uses of cellphones and text-messaging, which were relatively new at that time.
In the same way that the Obama Presidential Center has invested in designing a building for its events, activities, and community, now is the time for it to design an information architecture for a digital presidential library, thinking about the structures that will enable—or hinder—the discovery and analysis of Obama’s record and the record of our country during his presidency. Years or even decades will pass, as they have for other presidential libraries, before this architecture will come to contain the full historical record. There are federal restrictions on the vetting and timed release of White House documents; other collections could be added over time, as they become of interest or relevance to the core collection and its audiences. But the basic outlines of Obama’s presidential library, and of any other in the future, will have to take account of the nature of digital records themselves—their challenges, such as their great scale, but also their possibilities, such as their great potential to share a library, democratically, with all.