Hacking a Universe's Worth of Data

The American Museum of Natural History's first-ever hackathon yielded results that might actually help the museum.

On a Friday night in New York City, you can find just about anything. And this past Friday, about 130 hackers gathered in the Hayden Planetarium to take part in the American Museum of Natural History’s very first hackathon.
The premise was simple: The museum handed the hackers the huge dataset it calls the Digital Universe and gave them 24 hours to make something. (Part of what made this hackathon different was the literal universe of data the hackers were given. More on that in a minute.) There were some specific challenges and categories (Education, Visualization, Tool Kit, and Wildcard), but otherwise the hackers were free to explore the data and run with it.
Hackathons are often most useful to the hackers themselves: participants come to work through ideas, meet one another, and learn new tricks and programming languages. They’re great events for community building, publicity, and experimentation, but in my experience they rarely produce lasting, useful products. This hackathon was different.
It shouldn't have been. The hackers had a lot working against them. The dataset the museum threw at them is huge: the Digital Universe combines data from dozens of different organizations into a three-dimensional atlas of the entire universe. Before digging into the data, you have to sift through pages and pages of documentation, and Partiview, the freely available software that lets people fly through the data, is difficult to use. Not only is it a lot of data, it's also really difficult data, covering satellites, stars, exoplanets, and galaxies at distances ranging from Earth orbit to billions of light-years away. Most of these hackers had no training in astronomy, and not all of them were skilled at data analysis. So they faced two overlapping challenges: They had to wrangle a huge dataset, and they had to wrangle a huge concept.
The museum offered some help: several astrophysicists, programmers, and visualization experts roamed the hackathon, offering to assist anybody who was stuck. And the setting couldn’t have been more fitting. Surrounded by asteroids and typing away beneath the huge dome of the planetarium, teams worked through the night.
On Saturday night the teams got up on stage and presented about 30 projects. Some were better than others. But a handful were truly impressive and could actually help people use and understand the vast amount of data at hand. One team built an API for the data and drew applause from the whole room. An API makes it easier for anybody to build apps on top of the data, so developers can delve into the Digital Universe without first reading pages of explanatory documentation. You've likely used an app built on an API before: services like Paper.li or Nuzzel, which generate newsletters from the accounts you follow on Twitter, are built on the Twitter API. “If you want developers to use your data, make an API,” said Surendran Mahendran, who presented the team’s project. With the Star API, anybody could query the data directly, searching for stars by size, type, and luminosity.
Other teams built new ways to explore the data, some of them in virtual reality. One common complaint was that Partiview, the open-source software that lets anybody fly through the data, wasn’t particularly user-friendly. The WebUniverse team, for example, built a way to fly through the data using just the arrow keys on your keyboard. Univrse built a system that lets teachers use the Oculus Rift to take their students on a tour through the data. With Univrse, a teacher could create a special class code and drive the whole class through the universe. “It’s like the Magic School Bus,” said the Univrse team.
Some teams built games. PlanIt, the winner of the Education category, created a virtual reality game that lets users build their own solar systems, create planets, and throw them into orbit around one another. “The choices you make determine if your system will survive a day, or a billion years,” said Geoffrey Ryan, a team member who is also a doctoral candidate in physics at New York University.
By the end of the presentations, the teams had given the museum an API, teachers all kinds of tools, and regular users like me a new way of thinking about scale and the universe. Not every project was finished, and not all of them worked quite right, but it seemed clear that these weren’t throwaway ideas meant only to hone a team’s coding skills. These were things the museum and its scientists could use not just for fun, but for outreach, and even for science.
Christina Wallace, head of the museum’s brand-new BridgeUp: STEM program, says that the hackathon had one other purpose beyond exploring data: “We wanted to quietly showcase the diversity of people who go into science and tech careers. Half our participants at the hackathon are women. All four of our judges are women; that one we didn’t even plan, they were just the best.” (BridgeUp: STEM is a program that introduces high-school girls to computer science and its applications in fields like genetics, archaeology, and paleontology.)
The museum plans to hold more hackathons using other data from its collections. And why shouldn’t it? This was more than a publicity stunt: It yielded products, concepts, and frameworks that the museum (and you) can almost certainly use in the future.