“I think a lot of people would have said that this problem is impossible to crack,” says Vosshall. “The fact that we’ve made any progress at all was surprising.”
The problem has been difficult, she says, for two reasons. First, it’s hard to know which aspects of a molecule contribute to its odor. We know that the wavelength of light determines its color—simple. But a molecule’s smell might depend on the number of carbon atoms it has, how stable it is, and the chemical branches that protrude from it. Second, scientists who study smell have understandably focused on molecules that are relevant to the food and perfume industries. That’s like trying to understand color vision by studying only red, and ignoring blue and green. “It’s a cramped space,” she says. “We tried to break out.”
She and colleague Andreas Keller began by collecting a much broader range of 480 molecules—including unfamiliar, unpleasant, and even odorless ones. They then presented these chemicals to 55 volunteers, whom they recruited through Craigslist. The participants visited the lab and worked their way through rack upon rack of glass vials, opening and inhaling. They noted whether they smelled anything at all. If so, they rated the scents along several categories. How intense or pleasant is it? How garlicky? Fishy? Fruity?
The result was one of the largest data sets on smell ever collected, which Keller and Vosshall turned over to Meyer at IBM. He runs the DREAM Challenges, in which volunteers compete to build machine learning algorithms that can make useful predictions from large sets of data. “I like to call it the piñata of science—everyone is trying to hit the problem with a different algorithm,” Meyer says. The competitors have tackled everything from improving the accuracy of mammograms to predicting how people respond to cold viruses. And between January and May 2015, 22 teams of them worked on smells.
Richard Gerkin from Arizona State University and Yuanfang Guan from the University of Michigan led the teams that produced the winning algorithms, which were constructed using data from 338 molecules, refined using 69 more, and finally tested on another 69. After the challenge, the teams shared all their results to produce even better models.
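For readers who want to see the build, refine, and test arrangement concretely, here is a minimal Python sketch of a three-way split using the sizes quoted above (338, 69, and 69). The molecule IDs are placeholders, and the random assignment is an assumption: the text does not say how molecules were allotted to each group.

```python
import random


def split_molecules(molecule_ids, n_train=338, n_refine=69, n_test=69, seed=0):
    """Shuffle the IDs and cut them into three non-overlapping groups.

    Random assignment is an assumption made for this sketch.
    """
    if n_train + n_refine + n_test != len(molecule_ids):
        raise ValueError("split sizes must add up to the number of molecules")
    shuffled = list(molecule_ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    train = shuffled[:n_train]                      # used to construct the models
    refine = shuffled[n_train:n_train + n_refine]   # used to refine them
    test = shuffled[n_train + n_refine:]            # held back for the final test
    return train, refine, test


# Placeholder IDs standing in for the real molecules.
molecule_ids = [f"mol_{i:03d}" for i in range(338 + 69 + 69)]
train, refine, test = split_molecules(molecule_ids)
print(len(train), len(refine), len(test))
```

Keeping the final group untouched until the end is what makes the test meaningful: a model cannot score well on it simply by memorizing molecules it has already seen.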
On a performance scale from 0 to 1, the models’ ability to predict a molecule’s smell came in at 0.71 for pleasantness, 0.78 for intensity, and anywhere from 0.1 to 0.7 for the other 19 descriptors like garlic, fish, fruit, sour, musky, decayed, sweaty, sweet, grass, and burnt. As the team writes, “It [is] possible to predict the perceptual qualities of virtually any molecule with an impressive degree of accuracy to reverse-engineer the smell of a molecule.”
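One way to picture a score like this is as a correlation between predicted and observed ratings. The sketch below computes a Pearson correlation in plain Python. It assumes that the performance scale is a correlation of this kind, which the text here does not spell out, and the ratings in the example are invented purely for illustration.

```python
from math import sqrt


def pearson(predicted, observed):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Co-variation of the two lists around their means.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / sqrt(var_p * var_o)


# Invented pleasantness ratings for five hypothetical molecules.
predicted = [20.0, 35.0, 50.0, 65.0, 80.0]
observed = [25.0, 30.0, 55.0, 60.0, 90.0]
score = pearson(predicted, observed)
print(round(score, 2))
```

A score of 1 would mean the predictions rise and fall in perfect step with what people reported, and a score near 0 would mean they track it hardly at all. Pearson's measure can also go negative, so a 0-to-1 scale covers only its non-negative range.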
Their scores may not sound like much, but they’re better than those from previous studies. “They’re not blockbuster in terms of how correlations go, but having done some work like this myself, I know that’s actually quite impressive,” says Jason Castro from Bates College.