Raiders of the Lost Web, Cont'd

Editor’s Note: This article previously appeared in a different format as part of The Atlantic’s Notes section, retired in 2021.
The earliest version of via the Wayback Machine

Adrienne wrote a great feature last week about the fleeting nature of the Internet:

If a sprawling Pulitzer Prize-nominated feature in one of the nation’s oldest newspapers can disappear from the web, anything can. “There are now no passive means of preserving digital information,” said Abby Rumsey, a writer and digital historian. In other words if you want to save something online, you have to decide to save it. Ephemerality is built into the very architecture of the web, which was intended to be a messaging system, not a library.

A reader thought the piece was “beautifully put”:

So many people talk about how “when you put something online it’s there FOREVER” (often people paranoid about social media), and I think that mindset is responsible for a lack of care in this subject.

These are still the burgeoning times of the internet as far as future generations will be concerned. When you find a forum thread from even five years ago, you’ll be lucky if half the images still work and if the URLs even load. If it’s hard reading those threads after five years, I can only imagine how frustrated future humans will be.

There’s too much that can be lost by the innocent-enough decision to stop paying the annual fees for a domain name and hosting of an inactive site. An incredible number of websites and legal content disappeared from the internet, out of fear, within a week of Megaupload getting taken down, and I still mourn much of it. I always feel the urge to save what I can on my computer, like I’m nurturing a small injured animal.

She adds, “It’s a small joy every time I find an old website from the ‘90s with embedded MIDIs and tiled backgrounds and the works.” If you have a favorite example, drop me an email. We also got a message from Paul Jones, who revealed two years ago that he possessed what’s believed to be the closest copy of the original pages of the World Wide Web. Here’s a screenshot:

Here’s Jones in response to Adrienne’s piece:

Glad to see the continued attention to this problem. With support from an IBM research grant, I’m involved in using virtualization to keep older versions of software open and accessible. In fact, I’ve kept open and accessible a version of Tim Berners-Lee’s earliest public webpage from 1991. You can see it here.

But that’s not enough. We need a concerted effort on the part of the owners of important sites and the archivists who collect them to develop ways of keeping sites open, accessible, running, and functional. I hope that those of us working on digital preservation and in virtualization can provide a framework that benefits the creators, the archivists, and publics present and future.

Another reader on Adrienne’s piece:

This story makes me think of just how information from the ancient world came to us. Virtually none of our knowledge of the Classical world comes from original carbon documents written in that era. The works that survive to us have only been retained because scholars made the choice to recopy them in the Early Middle Ages. Papyrus doesn’t last, and honestly I think that even if the Library of Alexandria hadn’t been destroyed, most of its contents would have been lost to us anyways.

And yet the funny thing is that the great libraries of the truly ancient civilizations such as the Hittites have actually survived very well because their documents were written on clay tablets. These are great cultures whose existence was almost totally forgotten after their destruction, and yet we now know quite a lot about their history and customs and literature. Languages dead for thousands of years, such as Luwian, are better understood by us than, say, Frankish or Gaulish or Etruscan, which are the actual ancestors of the languages we use today.

Cultural transmission can be surprising: what’s lost for now might not always be lost, and what we remember now won’t always be remembered. And in a sense, what we choose to remember makes what we decline to remember even less likely to survive. I think it’s fair to say that the works that have passed the test of time are likely to last as long as our civilization, but that civilization might not continue to exist on the scale of hundreds of generations. After that, well, what’s the digital equivalent to a clay tablet?

Another reader picks up the comparison:

I teach world history at a high school in Shanghai, and I was surprised last week by a web page I discovered. It was a detailed discussion put up by an MIT anthropology student back in 1995, all text and hypertext. It even had references to Usenet and told you where you could get the info as an FTP download. It was a window into a lost past that I lived through, as lost in its way as the world of cuneiform tablets from Uruk or Babylon.

It made me think of a time when the Internet was still full of the sort of academic research material that was its original purpose. Today little of this remains, aside from Wikipedia, which is far less rigorous than the old stuff from universities. The net has stopped being a library and become a shopping mall, with commerce and socializing crowding out the research materials.