"From the Tech Toolbox" (September 2006)
Some products and links worth knowing about. By James Fallows
Here’s something new to worry about! What if all the material you have accumulated on your computer—financial records, e-mails, photos of the family, elaborate graphs—is destined to disappear in a few years? What would that mean, not simply for retaining useful work-and-business archives but also for capturing and perhaps later savoring the baby pictures, notes from friends, diary entries, and other tokens of passage through life?
The main concern here is not that your files might vanish all at once, in a catastrophic hard-disk failure. Like airline crashes, such events occur but are rare. I have suffered exactly one irretrievable hard-disk failure over the last quarter century. That was in 1994, when I dropped a laptop while on a long overseas trip. I’ve been more careful about making backups (and carrying computers) since then.
The deeper problem involves a paradox of the digital age. It is easier than ever to generate and store vast quantities of data, but harder to be sure that the information you want will be available later on. A single frame taken by my current digital camera occupies seven megabytes of disk storage—more than was on the entire hard drive of my first IBM PC. An hour’s worth of a TV broadcast or a movie can take more than one gigabyte, or more than 1,000 megabytes. As big as our hard drives get, we find ways to fill them. James Billington, the head of the Library of Congress, told me about a report estimating that digital information equivalent in scale to the contents of all the library’s books is produced by the world’s computers every fifteen minutes.
The convenience is obvious: the PDF file rather than the stack of papers on the desk; the instantly viewable photo rather than the wait for prints from the camera shop; the quick keyword search rather than the need to flip through pages to find the desired passage. But this marvelously handy information is strangely transient—especially the information each of us might want to store for our own purposes, as opposed to the Big Brother–style central registries of our phone calls, credit-card transactions, and similar activities.
“The best-preserved data tends to be on stone steles and cuneiform tablets,” Billington told me when I went to the library to hear about its recent attempts to solve the “digital preservation” problem. “Papyrus, vellum, parchment—all those classical modes hold up pretty well.” Chris Weston, of the library’s Office of Strategic Initiatives, recounted what he called a typical story of old-style data preservation. “Someone in upstate New York was cleaning out the attic of an old farmhouse—and there was a letter from Benedict Arnold. It had been in a cool, dry place for 200-some years. With most things on paper, unless you throw them away or actively destroy them, they’re likely to stay around.”
It’s just the reverse, of course, with digital data. Unless you go out of your way to renew and preserve it, information on a computer will disappear fairly quickly. More precisely, it will become unusable. Several related processes are involved. One is what Clay Shirky, a media scholar at New York University, has mock-portentously called “Playback Drift: The Silent Killer.” This boils down to the idea that the physical devices for storing and then retrieving digital data succeed one another so quickly that information is in constant jeopardy of being trapped in an obsolete format.
The first files I produced on a computer, in the 1970s, were stored on Radio Shack audiotape cassettes. After that, I used a computer with eight-inch floppy disks. The book I wrote twenty-five years ago using that computer still looks fine—but the interview notes for it, which I “saved” on those big old disks, I might just as well have burned. For all practical purposes, there is no way for me to get at them anymore—nor at other information that over the years I’ve lodged on 5.25-inch disks, small archival high-density tapes, some varieties of Zip drives, and other media that my current computers can’t handle. As each new and improved storage system comes out, computers generally remain compatible with the immediate past system but not with anything older. A few years ago, virtually any new PC had a built-in 3.5-inch disk drive. Now such drives are often missing or optional, as CD-ROMs and DVDs have become standard. Eventually the small disks will be obsolete and information on them will be orphaned. Any file stored more than six or eight years ago, and not transferred to something more modern in the meantime, is on its way to doom.
The old files that I still can use—and in fairness, there are a lot of them—are the ones I’ve taken the trouble to copy from an old computer to a newer one each time I’ve bought a new system. But even those files suffer from a different form of playback drift, which is the constant change in file formats. I have word-processing files that were originally created in WordStar, XyWrite, Electric Pencil, DeScribe, and other now-extinct programs. Some can be transferred into the current standard, Word, but not all—and the problem is worse for many database, note-taking, and e-mail programs. As if that were not enough, there is another silent killer: “bit rot.” Pictures fade over time, and so, in a sense, does digital information. Both hard and floppy disks store data with tiny magnetic charges. Inevitably, the charges weaken, corrupting and finally eliminating the data. CD-ROMs and DVDs store data by etching pits in a layer of dye, which can also fade. It is as if all of our books and newspapers had been printed with disappearing ink. How long does each kind of degradation take? Some people told me five years on average, some people said fifteen—but in any case, less time than you’d hope to keep those digital pictures of your wedding. Recently I came across a box full of snapshots of my mother as a child, in the 1930s. They survived without being tended; today’s counterparts will not.
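For readers who want to catch bit rot before it spreads to every copy, one common approach is to record a checksum for each file and recheck it from time to time. What follows is only a minimal sketch in Python, assuming the archive lives in an ordinary folder; the function names `build_manifest` and `find_damaged` are hypothetical, invented for this illustration, and are not part of any product mentioned in this article.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its current digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damaged(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose contents no longer match the manifest."""
    damaged = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or file_digest(target) != expected:
            damaged.append(rel_path)
    return damaged
```

A check like this can only detect a change, not undo it, and it will also flag files that were edited on purpose since the manifest was built. Repairing a damaged file still depends on having a good copy somewhere else.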
All of these problems affect institutions on a grand scale. The Library of Congress, as part of its new National Digital Information Infrastructure and Preservation Program, is investing $100 million to lead a consortium of universities and high-tech companies whose aim is to design new backup systems, convert old data in danger of being orphaned, and generally cope with the flood of high-volume, low-durability digital data. (Details about the full, ambitious program can be found at www.digitalpreservation.gov.)
When I asked James Billington whether there was a precedent for the potential large-scale loss of data the consortium is trying to head off, he said, “Of course! The library at Alexandria.” This had been the central repository of the Mediterranean world’s knowledge, including the works of the Greek philosophers and playwrights, until it disappeared, probably in a fire, in the third century. “There was no backup!” Billington said. “And people took it for granted, until it was gone. I’ll be frank, that is a major haunting thing for me about our library.” He had broader threats to libraries in mind—mainly budgetary, if citizens begin to consider libraries outdated and lose interest in funding them. But the point applied to digital preservation.
How can we be less haunted about our own computers? During a meeting with six members of the library’s preservation team, I asked what individuals should do. “Make copies!” they said in unison. One of the library’s Internet experts said that if a document was important enough, she would always print it out, as backup for the digital file. (I wish I’d printed out those old interview notes.) Another kept a disk-copying device next to her desktop—and stored the disks in a different building.
For both laptop users and mighty institutions, then, preserving digital files is an active rather than passive process. You must transfer files from old tapes or disks to new ones, as storage standards change; you must import information from one program to another whenever you change software; you must rerecord backup files at least every few years, so the information doesn’t just crumble away.
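The second of those chores, moving information out of formats that are going extinct, starts with knowing which old files you still have. The Python sketch below is one rough way to take that inventory by file extension. It rests on assumptions: the extensions in `AT_RISK_EXTENSIONS` are placeholders chosen for illustration rather than an authoritative list, many old programs used no fixed extension at all, and the function names are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Hypothetical examples only: replace with the extensions your own old programs used.
AT_RISK_EXTENSIONS = {".ws", ".xy", ".wpd"}


def count_by_extension(root: Path) -> Counter:
    """Count files under root by lowercase extension ('' for files with none)."""
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())


def files_to_migrate(root: Path, at_risk=AT_RISK_EXTENSIONS) -> list[str]:
    """List files, relative to root, whose extension is in the at-risk set."""
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in at_risk
    )
```

The inventory only tells you where the old files are. Converting them still requires a program that can read the original format.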
Among hardware backup devices, an external hard drive, like the popular Maxtor OneTouch, is the simplest to use, because it can be set to automatically mirror the contents of your computer’s hard drive. But an external drive is relatively expensive, at $300 and up, and if your main computer is affected by fire or flood, so is the nearby backup. Systems that copy your hard disk onto DVDs, tape cassettes, or media that you can take to a different location are cheaper, but they require more fussing with, and to avoid “playback drift” you have to replace them when storage standards change.
Many backup possibilities involve no hardware. With a free utility available at tinyurl.com/4yg48, I’ve turned a Gmail account into a free online storage system, and it can hold more than two gigabytes’ worth of files. For-pay systems like Xdrive, iBackup, and others let you store larger amounts of data online for $100 a year and up. With Flickr, Snapfish, Shutterfly, and similar online services, you can store and categorize your digital photos. Of course, relying on online storage requires two leaps of faith: faith in the company’s durability, that it will still have your notes and pictures two years or ten years from now; and faith in its integrity, that it won’t share your information.
My own most important backup system consists of one long-term and one everyday strategy. Every two or three years, when I buy a new desktop or laptop, I make sure that all the contents of the old machine are copied onto the new one. I may have trouble later on decoding those files that are in WordStar format, but at least I know where they are. Laplink’s PCmover, at $39.95 and up, automates this process. Then, every day, I sync up any new or altered files between my desktop and laptop machines. For this I use the popular transfer software Laplink Gold, which requires your two computers to be connected—either physically with cables, or wirelessly over a network. It costs around $100. Services like GoToMyPC and Laplink Everywhere allow you to connect to your home PC over the Internet and sync up files, for fees of around $100 to $180 per year.
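The daily step, copying only new or altered files, is simple enough to sketch. The Python below illustrates the general idea and makes no claim about how Laplink or any other product works internally. It assumes both folders are reachable as ordinary paths, it copies in one direction only, and the name `sync_newer` is hypothetical.

```python
import shutil
from pathlib import Path


def sync_newer(source: Path, destination: Path) -> list[str]:
    """Copy files that are missing from destination or newer in source.

    Returns the relative paths that were copied. Nothing is ever deleted.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = destination / rel
        if dst.exists() and dst.stat().st_mtime >= src.stat().st_mtime:
            continue  # destination copy is already as recent as the source
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves the modification time
        copied.append(rel.as_posix())
    return copied
```

A true two-way sync also has to decide what to do when the same file has changed on both machines, which this sketch leaves out.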
The main leap is recognizing that preserving data will be an ongoing semi-hygienic chore, like brushing your teeth or taking out the trash. But this ongoing chore, at least, offers hope for a happy outcome.