A.I. Is Getting Better at Spotting Galaxies

A computer can’t match a human’s ability to identify cosmic objects, but recent advances in machine learning are helping to close the gap.


Look at that galaxy, up there at the top of this page. Beyond its sparkling beauty, what do you see?

It’s a spiral, for starters, and not an elliptical. Its arms spin out from the ends of a straight bar rather than curling directly out of a smaller, rounder core, which is to say it’s a “barred spiral.” (The Milky Way is one, too.) You can see supergiant stars, both blue and red, scattered throughout the arms, then dark lanes filled with interstellar dust, and a nice bulge there in the middle.

Broken down into these basic shapes, this galaxy, called NGC 1300, makes for a pretty simple picture, one that your brain can easily recognize. If you looked at a few more galaxies, you would know immediately whether they were barred spirals or something different. Humans are excellent at picking up patterns, even better than computers, despite recent advances in machine learning. So do you want to look at 37 billion more images like this?

That’s the projected galaxy haul of the Large Synoptic Survey Telescope, currently under construction in Chile. Starting around 2023, the LSST camera’s 3,200 megapixels will soak up 15 terabytes of data every night, which will fill the biggest astronomical database ever built. Computer scientist Lior Shamir of Lawrence Technological University in Michigan says the amount of data defies comprehension.

“With just a mountain of images, no one will ever inspect them one by one. You can never make a discovery,” he says. “You need to convert it into something that machines can understand.”

This means trying to train computers to recognize patterns as well as we can, one of the thorniest problems in computer science. Computers are still second to humans on this, and they have much longer learning curves, says Matias Carrasco Kind, an astronomer at the University of Illinois at Urbana-Champaign.

“We can recognize faces in a big crowd, blurry objects in a picture, and notice people from behind or from the way they walk,” he says. “By just looking at a few examples, we can extrapolate much better than computers, which need a much larger training set, and more time to process. [And computers have] a harder time [thinking] ‘outside the box.’”

This is especially true when it comes to characterizing galaxies. You have to account for brightness, which varies from pixel to pixel; the galaxies’ shape and symmetry; and their orientation, including whether we’re seeing them face-on or edge-on. Humans can do this very quickly, which is why astronomers created something called the Galaxy Zoo. For the better part of a decade, thousands of citizen scientists have volunteered to classify galaxies from the Sloan Digital Sky Survey, a robotic mission that has mapped an astonishing chunk of the observable universe, including some 208 million galaxies. The Zoo website shows you a small photo of a galaxy, and you answer simple questions, like whether or not it’s a spiral. It’s oddly meditative but decidedly slow work: it’s taken years to populate the Zoo, and the volunteers are nowhere near classifying everything the SDSS has seen. Shamir estimates that at this rate, it would take human volunteers 120,000 years to classify everything that comes through the LSST.

So Shamir is trying to give computers an edge. In a new study, he and his coauthor, Evan Kuminski, fed galaxies to a machine-learning algorithm called Wndchrm, which can classify images based on the data in their pixels. Shamir, who designed it, has also used the algorithm to categorize microscope images and to distinguish a fake Jackson Pollock painting from the real thing.

Wndchrm works by turning the visual features of an image into numbers, computing 2,885 of these numerical descriptors for each galaxy image. Each one captures a characteristic such as texture, shape or edges, and together they allow the algorithm to categorize a galaxy as spiral or elliptical.
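To make the idea of “numerical descriptors” concrete, here is a toy sketch in Python. It is not Wndchrm’s code or its feature set: the four descriptors below (mean brightness, brightness spread, edge strength and rotational asymmetry) and the function name `describe` are hypothetical stand-ins, and the image is assumed to be a small grayscale grid of brightness values.

```python
import math


def describe(image):
    """Turn a grayscale image into a short list of numerical descriptors.

    `image` is a list of equal-length rows of brightness values.
    These four descriptors are hypothetical stand-ins for Wndchrm's 2,885.
    """
    h, w = len(image), len(image[0])
    pixels = [p for row in image for p in row]
    n = len(pixels)

    # Overall brightness, and how much it varies from pixel to pixel (a crude texture measure)
    mean = sum(pixels) / n
    spread = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)

    # Edge strength: average brightness jump between horizontally or vertically adjacent pixels
    jumps = [abs(image[y][x + 1] - image[y][x]) for y in range(h) for x in range(w - 1)]
    jumps += [abs(image[y + 1][x] - image[y][x]) for y in range(h - 1) for x in range(w)]
    edges = sum(jumps) / len(jumps) if jumps else 0.0

    # Asymmetry: average difference between the image and a copy rotated by 180 degrees
    asymmetry = sum(
        abs(image[y][x] - image[h - 1 - y][w - 1 - x]) for y in range(h) for x in range(w)
    ) / n

    return [mean, spread, edges, asymmetry]


# A tiny blob that is brighter on its right-hand side
print(describe([[0, 1], [0, 1]]))  # [0.5, 0.5, 0.5, 1.0]
```

Real galaxy images are far larger, and it is a trained classifier, not a person, that learns which combinations of descriptors point to a spiral or an elliptical.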

Shamir and Kuminski trained it with 300 galaxies that Shamir had classified himself. Once they were confident Wndchrm had learned the ropes, they fed it 3 million SDSS galaxies and then compared its classifications with those made by Galaxy Zoo volunteers. For that comparison they used only “superclean” SDSS galaxies, meaning many different Zoo volunteers had looked at the same images and 95 percent of them agreed on the galaxies’ physical attributes. Wndchrm gave each classification a bit of a hedge: say, 85 percent certainty it’s a spiral, 15 percent certainty it’s an elliptical. This uncertainty was built in because a galaxy’s attributes are often subjective and difficult for a computer to discern, Shamir says.

“It can be rather ambiguous what is considered an arm and what is not. Computers don’t always excel at that,” he says.
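The hedge means the classifier reports a degree of certainty for each class rather than a single verdict. The sketch below shows one simple way to produce such numbers, assuming each galaxy has already been reduced to a short list of descriptors: average the hand-labeled training examples of each class into a centroid, then weight each class by the inverse of its distance to the new galaxy. This nearest-centroid scheme and the names `train` and `classify` are illustrative choices, not necessarily how Wndchrm itself computes its certainties.

```python
import math


def train(examples):
    """Average the descriptor vectors of each class into one centroid.

    `examples` maps a class label (e.g. "spiral") to a list of descriptor vectors
    for galaxies a person has already classified by eye.
    """
    model = {}
    for label, vectors in examples.items():
        dims = len(vectors[0])
        model[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]
    return model


def classify(model, vector):
    """Return a certainty for each class, summing to 1; closer centroids get more weight."""
    weights = {}
    for label, centroid in model.items():
        distance = math.dist(vector, centroid)
        if distance == 0:
            # Exactly on a centroid: fully certain of that class
            return {other: 1.0 if other == label else 0.0 for other in model}
        weights[label] = 1.0 / distance
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}


# Hypothetical two-number descriptors for four hand-labeled galaxies
model = train({
    "spiral": [[0.0, 0.0], [2.0, 0.0]],
    "elliptical": [[10.0, 0.0], [10.0, 2.0]],
})
print(classify(model, [4.0, 0.0]))  # about 0.67 spiral, 0.33 elliptical
```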

Shamir and Kuminski had to throw away the most uncertain galaxies, rejecting any spiral classification made with less than 54 percent certainty and any elliptical classification made with less than 80 percent certainty. Once they did that, the computer matched the humans 98 percent of the time. In all, they classified about 900,000 spiral galaxies and 600,000 elliptical galaxies.
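The two cutoffs can be written as a short filter. This sketch assumes the classifier reports a spiral certainty for each galaxy, with the elliptical certainty being the remainder, as in the 85/15 example; the function name `keep_confident` and its input format are hypothetical.

```python
def keep_confident(predictions, spiral_min=0.54, elliptical_min=0.80):
    """Sort galaxies into spirals, ellipticals and rejects using per-class thresholds.

    `predictions` maps a galaxy ID to its spiral certainty (0 to 1);
    the elliptical certainty is taken to be 1 minus that value.
    """
    spirals, ellipticals, rejected = [], [], []
    for galaxy_id, p_spiral in predictions.items():
        if p_spiral >= spiral_min:
            spirals.append(galaxy_id)
        elif 1.0 - p_spiral >= elliptical_min:
            ellipticals.append(galaxy_id)
        else:
            # Too uncertain either way: set aside rather than guess
            rejected.append(galaxy_id)
    return spirals, ellipticals, rejected


# Hypothetical certainties for four galaxies
spirals, ellipticals, rejected = keep_confident(
    {"A": 0.85, "B": 0.50, "C": 0.10, "D": 0.30}
)
print(spirals, ellipticals, rejected)  # ['A'] ['C'] ['B', 'D']
```

Because the spiral cutoff sits above one-half, no galaxy can qualify as both. The stricter elliptical cutoff also makes the filter lopsided: a galaxy rated 56 percent spiral is kept, while one rated 56 percent elliptical is thrown out.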

“It doesn’t mean that the rest can’t be used, but they need to be treated more carefully,” says Carrasco Kind, who wasn’t involved with the work but says he read it with great interest. “They provide extra motivation that we are getting better at this.”

These galactic characteristics may seem simple, but astronomers can learn a lot from them. Elliptical and reddish galaxies are passive, meaning they rarely form new stars. They are comfortably in their senior years and often live in groups. Spirals and bluer galaxies are more active and youthful, forming new stars all the time, but they tend to be more isolated. By classifying tens of billions of galaxies, scientists can use statistics to analyze how they are distributed throughout space, giving us a more granular map of the cosmos and a better sense of how fast it will thin out as the universe expands.

Astronomers are especially interested in weird galaxies that don’t look like the majority. These oddballs will be too hard to find without a computer’s help, especially given the insane amount of data LSST will bring in. And they can help fill in gaps in theories about how the cosmos evolved over time.

“They are really important to science, because they carry a lot of information about the universe. Something caused them, something made them look that way,” Shamir says. “Finding that needle in the haystack is something we want to do.”