Before I started playing Guess the Correlation, I didn’t expect to spend an hour of my Easter weekend obsessing over an 8-bit video game, much less one based on something that many scientists do every day. I also didn’t expect to be hypnotized by graph after graph of black dots, trying to accurately gauge the patterns they concealed, in exchange for points and a place on a leaderboard. And I definitely didn’t expect to have fun doing it.
Guess the Correlation is the brainchild of Omar Wagih, a graduate student at the European Bioinformatics Institute, and nefarious devourer of the thing I once called “my free time.” On paper, it sounds incredibly boring. In practice, it is inexplicably addictive. Try it.
Players see a stream of scatter plots: common graphs that visualize the relationship between two things, whether temperature and ice-cream sales, body weight and heart-disease risk, or time spent on this infernal game and number of friends you have. Your job is to eyeball the plots and estimate a number called R, which measures how strongly the two things are correlated. In general, R runs from -1 (a perfect negative correlation) through 0 (no correlation at all) to 1 (a perfect positive correlation); the game sticks to the positive half, from 0 to 1.
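For the curious, the R in question is most likely Pearson’s correlation coefficient: the covariance of the two variables divided by the product of their standard deviations. The game’s own formula isn’t spelled out here, so treat that as an assumption. Below is a minimal Python sketch that computes it from scratch; the temperature and ice-cream figures are made up purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (proportional to covariance)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to each variance)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("R is undefined when either variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: daily temperature (degrees C) and ice-cream sales (units)
temps = [14, 16, 20, 23, 25, 28, 31]
sales = [120, 135, 160, 210, 200, 260, 300]
print(round(pearson_r(temps, sales), 2))
```

Because the covariance is divided by both spreads, R ignores units and scale: swapping Celsius for Fahrenheit, or units sold for dollars, leaves it unchanged. That is partly why it is hard to read off a plot; the eye sees slope and spread, while R measures only how tightly the dots hug a line.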
Scientists do this all the time, judging correlations by looking at scatter plots. It’s deceptively hard, as I discovered while playing. A strong correlation, say where R is higher than 0.8, is obvious enough because the dots line up in a clean slash. Likewise, a weak correlation, where R is lower than 0.2, looks like the target sheet of a blind shooter. But there’s a large middle ground where my estimates are often hilariously off. That murky middle is why Wagih created the game in the first place.
Last December, he attended a seminar where the speaker presented a scatter plot and baldly claimed that there was a correlation. “It looked, you know, not very correlated,” says Wagih. “I thought: Should I take his word for it? Afterwards, he showed me the R and it turned out that there was a correlation, and I had underestimated the signal. I realized I was probably not alone.”
He found several sites where you could guess the R values of randomly generated scatter plots, but “they got really boring,” he says. “There’s nothing driving you to stay. That’s where I got the idea for a game.”
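How those sites, or Wagih’s game, generate their plots isn’t described here. One standard recipe, sketched below under that assumption, draws x from a standard normal distribution and builds y as a weighted mix of x and independent noise, so that the two have a chosen correlation on average. The function names and the seed are illustrative, not taken from any real site.

```python
import math
import random


def correlated_points(target_r, n, seed=None):
    """Generate n (x, y) points whose population correlation is target_r.

    x is standard normal; y = target_r * x + sqrt(1 - target_r**2) * z,
    with z independent standard-normal noise, so Var(y) = 1 and
    Cov(x, y) = target_r.
    """
    if not -1.0 <= target_r <= 1.0:
        raise ValueError("target_r must lie in [-1, 1]")
    rng = random.Random(seed)
    noise_weight = math.sqrt(1.0 - target_r ** 2)
    points = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        points.append((x, target_r * x + noise_weight * z))
    return points


def sample_r(points):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)


# A "middle ground" plot of 100 dots, the kind that is hardest to judge by eye
pts = correlated_points(0.5, n=100, seed=1)
print(round(sample_r(pts), 2))
```

With only 100 dots, the R actually drawn on screen wanders a few hundredths away from the 0.5 that was asked for, so a game built this way would presumably score guesses against the sample value the player can see rather than the target.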
The mechanics are simple, matched by a minimalist design and nostalgic music. You guess the R value for a steady stream of scatter plots. You lose lives for inaccurate guesses and regain them for accurate ones. You also earn coins for good estimates, which contribute to your final score. You can even play against a friend. And that’s it.
Wagih launched the game last December and has collected a database of everyone’s estimates. He plans to analyze that data to see if there are visual elements in scatter plots that hoodwink people, causing them to overestimate or underestimate correlations. “This has been done before but the key thing here is the large amount of data that I have,” he says. Earlier studies of this kind involved dozens of volunteers and a few thousand estimates. As of mid-March, Wagih had 170,000 registered players and a database overflowing with four million guesses.
“I want to create a more sophisticated game, something that’s a little more addictive,” says Wagih. (Oh dear god.) He’s going to add levels of difficulty that toy with the number of points on the plots or the size of the points. (Make it stop.) “I want to make it a mainstream game that you’ll play on your smartphone when you’re bored, so you don’t realize you’re sitting there guessing correlations and contributing to this research topic.” (No, please no.)
“I’ve played it myself more than I probably should have,” he adds. “I would sit next to friends and watch them type in an answer and go, no, no, that’s 0.72. They wouldn’t believe me and I’d be very close.”
Backseat-correlationing aside, his experience attests to the game’s potential as a training tool for improving a researcher’s ability to suss out correlations. “That was the primary purpose,” Wagih says. “I cross paths with these kinds of plots almost every day, either ones of my own or in papers I read. If this game can train you to subconsciously identify the structures or properties of a scatter plot that contribute to the correlation, that would be key.”
Visualizations help us to make sense of large swathes of data, but they contain their own biases that can lead us astray. Information is beautiful, but beauty is deceptive. “As a researcher, you read papers and a lot of the time, you eyeball the figures without even reading the text,” he says. “You see a plot—it could even be your own plot—and make a judgment based on it. Contrary to what people believe, they’re not very good at this. And I have the data to prove that.”