Government Data Wants To Be Free

"I have a theory that 20 percent of federal data holds 80 percent of the public value."

This illustration can only be used with the Kaveh Waddell piece that originally ran in the 2/28/2015 issue of National Journal magazine. (National Journal)

The federal government spends a lot of time and energy collecting data. Hundreds of agencies sit on vast supplies of information compiled from sources such as tax returns, geologic surveys, regulatory filings, student-loan statements, and Medicare records.

(Koren Shadmi)

That information can be difficult to find, but not necessarily because it's hidden behind lock and key. Some of the information the government compiles and organizes is publicly available online as searchable data. The problem is that much of what remains has been released as lengthy documents that aren't readily searchable (think scanned paper) instead of as machine-readable datasets (think Excel spreadsheets).

The potential benefits of cleaning up existing government data, and releasing more data in machine-readable format, are vast. One beneficiary, of course, would be the business world—and it's therefore no surprise that businesses are lobbying for the change. For one thing, if the federal government accepted documents like regulatory filings as data rather than forms, firms that spend thousands a year on compliance could instead use automated processes to submit required information. Imagine a "TurboTax for everything," says Hudson Hollister, executive director of the Data Transparency Coalition, a trade association that lobbies on behalf of tech firms. Moreover, if consulting firms had better access to government data, he argues, they could in turn help the government manage itself better.

But there are also more-idealistic reasons for the government to make data easier to find and digest. For nonprofit groups like the Sunlight Foundation and the Center for Responsive Politics, open data is indispensable to the mission of keeping government accountable. Or consider a nonprofit called College Abacus, which allows students and families to compare the "net cost" of attending different colleges using information that schools are required to report to the federal government.

The Center for Open Data Enterprise, a nonprofit that launched last week, is pushing open data for both business and public-interest purposes. The aim is to play matchmaker between federal agencies and potential users of open-government data. "The way open data has been treated in the U.S. and around the world is a supply-side model: Governments put data out there and hope it's useful," says Joel Gurin, the group's founder. (The organization's projects are funded by Amazon and PricewaterhouseCoopers. Gurin, who was formerly at New York University, says he's looking into getting foundation funding as his organization grows.) By bringing businesses and nonprofits to the data suppliers, Gurin hopes to make it easier for agencies to focus on cleaning up and restructuring the information that matters most. "I have a theory that 20 percent of federal data holds 80 percent of the public value," he says. "A lot of what we're trying to do is to figure out how to help agencies find that 20 percent."

President Obama and Congress have already made moves toward prying open the government's treasure trove of data. Obama signed the Digital Accountability and Transparency Act last May, after the House and Senate both passed it unanimously. The measure requires the government to make its spending information accessible and searchable online. And in 2013, Obama signed an executive order that would make "open and machine-readable the new default for government information."

The administration's position is "terrific in principle," says Gurin, "but in practice, this is not easy for agencies to do." The many-headed federal bureaucracy runs on layers of computer systems and data policies that have accumulated throughout the years, and coaxing agencies into the 21st century is a tough task. Gurin also worries that open-data policies, which can be costly, may look like low-hanging fruit for lawmakers wielding the fiscal machete. "The biggest threat is whether budgets for data collection are going to be cut," he says.

Still, the momentum appears to be on the side of open-data advocates. Last week, the White House hired the first-ever U.S. chief data scientist, DJ Patil. In an introductory letter he published, Patil made clear that he would be pushing forward the open-data agenda.

Correction: An earlier version of this story misidentified the Center for Open Data Enterprise.