A new study has found that as much as 80 percent of the raw scientific data collected by researchers in the early 1990s is gone forever, mostly because no one knows where to find it.
According to a study by Timothy H. Vines et al. titled "The Availability of Research Data Declines Rapidly with Article Age," published last week in Current Biology, most raw data from scientific papers published twenty years ago is unobtainable, either because the authors have since changed their contact information and can't be reached or because the data was stored using outdated technology, like floppy disks.
A post on Smithsonian.com's Surprising Science blog explains the researchers' methodology:
"To make their estimate, [Vines'] group chose a type of data that’s been relatively consistent over time - anatomical measurements of plants and animals - and dug up between 25 and 40 papers for each odd year during the period that used this sort of data, to see if they could hunt down the raw numbers. A surprising amount of their inquiries were halted at the very first step: for 25 percent of the studies, active email addresses couldn’t be found, with defunct addresses listed on the paper itself and web searches not turning up any current ones."
According to Surprising Science, 38 percent of the researchers' data queries yielded no response. The scientists report that the likelihood of finding an existing data set falls by 17 percent each year, beginning in the third year after a paper's publication.
The finding is troubling for several reasons. First, scientific findings are validated by their reproducibility, so access to raw data is essential for testing and retesting outcomes. Vines notes that "much of the data is unique to time and place, and are thus irreplaceable, and many other data sets are expensive to regenerate." Second, as the Smithsonian points out, most of this research is funded by federal grants stipulating that all data must be available to the public, presumably for longer than a few years. Third, the loss of data makes it impossible to do broad, decades-long studies.
Vines and his team recommend that scientists be required to turn over their raw data to publications, which can then systematically archive the information.