Unfortunately, one of the things the library learned was that the Twitter data overwhelmed the technical resources and capacities of the institution. By 2013, the library had to admit that a single search of just the Twitter data from 2006 to 2010 could take 24 hours. Four years later, the archive still is not available to researchers.
Across the board, the reality began to sink in that these proprietary services hold volumes of data that no public institution can process. And that’s just the data itself.
What about the actual functioning of the application: What tweets are displayed to whom in what order? Every major social-networking service uses opaque algorithms to shape what data people see. Why does Facebook show you this story and not that one? No one knows, possibly not even the company’s engineers. Outsiders know basically nothing about the specific choices these algorithms make. Journalists and scholars have built up some inferences about the general features of these systems, but our understanding is severely limited. So, even if the LOC has the database of tweets, they still wouldn’t have Twitter.
In a new paper, “Stewardship in the ‘Age of Algorithms,’” Clifford Lynch, the director of the Coalition for Networked Information, argues that the paradigm for preserving digital artifacts is not up to the challenge of preserving what happens on social networks.
Over the last 40 years, archivists have begun to gather more digital objects—web pages, PDFs, databases, kinds of software. There is more data about more people than ever before, however, the cultural institutions dedicated to preserving the memory of what it was to be alive in our time, including our hours on the internet, may actually be capturing less usable information than in previous eras.
“We always used to think for historians working 100 years from now: We need to preserve the bits (the files) and emulate the computing environment to show what people saw a hundred years ago,” said Dan Cohen, a professor at Northeastern University and the former head of the Digital Public Library of America. “Save the HTML and save what a browser was and what Windows 98 was and what an Intel chip was. That was the model for preservation for a decade or more.”
Which makes sense: If you want to understand how WordPerfect, an old word processor, functioned, then you just need that software and some way of running it.
But if you want to document the experience of using Facebook five years ago or even two weeks ago ... how do you do it?
The truth is, right now, you can’t. No one (outside Facebook, at least) has preserved the functioning of the application. And worse, there is no thing that can be squirreled away for future historians to figure out. “The existing models and conceptual frameworks of preserving some kind of ‘canonical’ digital artifacts are increasingly inapplicable in a world of pervasive, unique, personalized, non-repeatable performances,” Lynch writes.