Catch My Diff: Github's New Feature Means Big Things for Open Data

Changes on the map are more important than they may appear.

On Wednesday, Github announced that maps would be “diffable”—a silly-sounding term that means much in the world of Github. It’s a small and even long-expected feature, but an important one, and one that aids Github’s role in the emerging ecosystem around open data.

First, a gloss on some terms. Github is a San Francisco-based startup whose main product, also called Github, helps developers manage different versions of a project’s code. Many developers already use a piece of software on their computers called Git to manage versions of code, and Github gives them a place to store Git’s files in the cloud and collaborate with others on them. It also provides messaging functions that sometimes supplant company email.

While it costs money to host a project on Github privately, the company provides free hosting to any open-source project: if you make your code public, hosting is free.

Github, then, is already a happy home for code projects. In the past year, it has tried to make itself friendlier and more useful for projects that use open data.

Open data, meanwhile—the effort to make information already produced by the government available to the public—is a bigger and bigger deal. Late last year, the Knight Foundation gave $250,000 to explore the creation of a U.S. Open Data Institute, an organization centered around freeing data and making it easier for people to use. Freeing, for instance, municipal restaurant health code data will allow local review apps like Yelp to display it.

And Github has made itself more amenable to hosting data. Last summer, Github began to render 3D models, geographical data, and tables. These made the site an attractive home for municipal data—like the city of Chicago’s—and also allowed any information in Excel to be placed and viewed on Github.

But Github’s strongest suit is visualizing the differences between documents. Versions of software, after all, are composed of a kind of textual data, and the site became famous for its visualizations of change. Github, too, could already show the differences between image files through a variety of animated means.
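What a diff of geographic data has to work out can be sketched in a few lines. The following is a minimal Python illustration, not Github’s implementation: it assumes two versions of a GeoJSON file in which every feature carries a hypothetical "id" property, and it reports which features were added, removed, or changed between them. The restaurant names, coordinates, and inspection grades are invented for the example.

```python
def feature(feature_id, lon, lat, grade):
    """Build one GeoJSON point feature (all values here are made up)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"id": feature_id, "grade": grade},
    }


def diff_features(old, new, key="id"):
    """Compare two GeoJSON FeatureCollections by a per-feature identifier.

    Assumes every feature has a unique value under properties[key].
    """
    old_by_id = {f["properties"][key]: f for f in old["features"]}
    new_by_id = {f["properties"][key]: f for f in new["features"]}

    added = sorted(new_by_id.keys() - old_by_id.keys())
    removed = sorted(old_by_id.keys() - new_by_id.keys())
    # A feature counts as changed if its geometry or any property differs.
    changed = sorted(
        k for k in old_by_id.keys() & new_by_id.keys()
        if old_by_id[k] != new_by_id[k]
    )
    return {"added": added, "removed": removed, "changed": changed}


old_version = {
    "type": "FeatureCollection",
    "features": [
        feature("cafe-0", -87.62, 41.87, "pass"),
        feature("cafe-1", -87.63, 41.88, "pass"),
        feature("cafe-2", -87.65, 41.90, "pass"),
    ],
}

new_version = {
    "type": "FeatureCollection",
    "features": [
        feature("cafe-1", -87.63, 41.88, "pass"),  # unchanged
        feature("cafe-2", -87.65, 41.90, "fail"),  # grade changed
        feature("cafe-3", -87.70, 41.95, "pass"),  # newly added
    ],
}

result = diff_features(old_version, new_version)
print(result)
# {'added': ['cafe-3'], 'removed': ['cafe-0'], 'changed': ['cafe-2']}
```

A visual diff would then draw those three groups in different colors on the map. The comparison here only works when features have stable identifiers, which real datasets do not always provide.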

Extending this ability to maps is the final step. In July of last year, after Github had announced map rendering but before it announced tables, the technologist Clay Johnson, speaking to Northwestern University’s Knight Lab, predicted a similar path for Github.

“The interesting stories in data aren’t in the rows, they’re in the differences between them,” he said, explaining why seeing the diffs in maps was important.

“I think,” he added, “GitHub is getting ready to make as much of a play in data as it has in code.”

Play: made. Github now even has a staff member in D.C. trying to coordinate open data efforts with government agencies. No other website is presenting itself as so friendly to open data of all types. That means the de facto home for open data on the web will be a private company, and that the influence of that company, Github, will only expand beyond its developer audience.

“If I were running a newsroom,” Johnson added, “I’d be making my team experiment with doing workflows in GitHub.”