Is There a Scientific Data Logjam?

Is the explosion of scientific information good for science? Many researchers say no, at least not as it is currently stored and organized: often in silos that have little efficient access to one another. Josh Fischman reports in the Chronicle of Higher Education:

Scientists are wasting much of the data they are creating. Worldwide computing capacity grew at 58 percent every year from 1986 to 2007, and people sent almost two quadrillion megabytes of data to one another, according to a study published on Thursday in Science. But scientists are losing a lot of the data, say researchers in a wide range of disciplines.

In 10 new articles, also published in Science, researchers in fields as diverse as paleontology and neuroscience say the lack of data libraries, insufficient support from federal research agencies, and the lack of academic credit for sharing data sets have created a situation in which money is wasted and information that could reveal better cancer treatments or the causes of climate change goes by the wayside.

This isn't the first rumbling of a possible tectonic shift in science. The editor of Wired magazine, Chris Anderson, proclaimed "The End of Theory" in 2008:

The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.

The scientific logjam is also part of a broader data deluge.

The underlying problem, though, may be that global science has produced more talented researchers than governments, corporations, and foundations can now support adequately. Science depends both on intense competitive individualism and on equally authentic generosity, and I observed plenty of both when I was in science book publishing. The history of science has many examples of men and women, like Rosalind Franklin and John Atanasoff, whose shared data and ideas helped others to glory. So fairness has to go along with transparency.

Reform of science funding and publication is a notoriously complex issue, and even standards for shared data might leave winners and losers, depending on whose present methods are ultimately favored. But there is no excuse for the disappearance of publicly funded data. Since storage even of large data sets is becoming virtually free -- think of social media sites and online email allowances -- a good beginning would be a requirement to deposit a copy of data in escrow.

The challenge of reform is that the present scientific establishment seems to be doing its job reasonably well. There are occasional scandals, but far fewer than in politics, sports, or finance. And even after all the disasters of the last of those, it seems hard to change Wall Street's ways.