The Race to Preserve History as It Happens Online

Webpages from six historical events over the last three years have disappeared faster than they could be archived.

Hany M. SalahEldeen and Michael L. Nelson

There is a well-known thrill that comes from watching -- nearly in real time -- as big news unfolds on Twitter. Millions upon millions of people pass information around, celebrate it, mourn it, and discuss it. How will this whole process look to historians of the future? Will they be able to recreate and understand what it was like?

The question goes beyond the archiving of the tweets themselves (something the Library of Congress has taken a lead on), though that matters too. The tweets are only part of the story: the pages their links lead to also need to be preserved, and that's not happening fast enough, according to a new study (pdf) from Hany M. SalahEldeen and Michael L. Nelson at Old Dominion University in Norfolk, Virginia.

They looked at some one million tweets from six historical events over the past three years (Iranian elections, Michael Jackson's death, the H1N1 outbreak, Obama's Nobel Peace Prize, the Egyptian revolution, and the recent Syrian uprising) and found that archiving is not keeping pace with the web's fast turnover -- as time progressed, the linked webpages became increasingly unavailable. "We estimate that after a year from publishing about 11 percent of content shared in social media will be gone," they write. "After this point, we are losing roughly 0.02 percent of this content per day."
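
To get a feel for what those two figures imply, here is a rough back-of-the-envelope sketch in Python. It is not the researchers' model: the function name is made up for illustration, and it assumes the 0.02 percent daily figure means percentage points of all shared content, added steadily after the first year. The quoted estimate says nothing about how losses accumulate within the first year, so the sketch declines to guess for that period.

```python
FIRST_YEAR_LOSS = 0.11    # about 11 percent gone after one year (quoted estimate)
DAILY_LOSS_AFTER = 0.0002 # roughly 0.02 percent per day after that (quoted estimate)
DAYS_IN_YEAR = 365


def estimated_loss_fraction(days_since_publishing: int) -> float:
    """Estimate the fraction of shared content that is gone after a number of days.

    Assumption (not from the study): the daily loss is in percentage points
    of all shared content and accumulates linearly after the first year.
    """
    if days_since_publishing < DAYS_IN_YEAR:
        # The quoted figures do not describe losses within the first year.
        raise ValueError("estimate only covers one year or more after publishing")
    extra_days = days_since_publishing - DAYS_IN_YEAR
    loss = FIRST_YEAR_LOSS + DAILY_LOSS_AFTER * extra_days
    # A fraction lost can never exceed everything.
    return min(loss, 1.0)


if __name__ == "__main__":
    for years in (1, 2, 3):
        fraction = estimated_loss_fraction(years * DAYS_IN_YEAR)
        print(f"{years} year(s): about {fraction:.1%} gone")
```

Under those assumptions, the estimate works out to about 18.3 percent of linked content gone two years after it was shared (11 percent plus 365 days at 0.02 percent per day).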

Where does it all go? As Megan Garber wrote about earlier work from SalahEldeen and Nelson:

There are several different reasons why that could be the case, SalahEldeen told me, some intentional and some not. A user, for example, could have posted a video, photo, or comment and later regretted it, out of fear of government reprisal or a simple change in mood. For the same reasons, a user might delete his or her entire account on a service like YouTube. And then there's also, of course, the government's interest in repressing information and its ability to act on that desire. "In the case of the Egyptian revolution," SalahEldeen says, "there are other reasons among which the past regime brought down some of this content that might incriminate them. Or the bloggers/photographers get arrested and their accounts and contents got confiscated. Or, as a final resort, the content got old and got over-written."

Of course, any historian is used to working with holes in the record.* What's neat here is that though these webpages may be gone, we have the record of their existence in the tweets. We lose one data point, gain another one -- an echo of a lost document. (Similarly, there are historical documents that point to other documents we no longer have, but the records of their once-existence are not nearly as robust.) These Twitter traces may not fill in the gaps, but they'll give future scholars another path through which to reach our time.

* Ed. note: Sadly, this will continue to be the case until we get a time machine.