“This shows that there is information known in advance of conducting a replication that anticipates replication success,” says Nosek. What kind of information? “I'm not sure I had a clear strategy,” says Marcus Munafo from Bristol University, who was one of the better-performing traders and who has also used prediction markets to evaluate science. He paid attention to statistical power, the journals that the original studies were published in, and which branch of psychology they were part of. “Beyond that, I simply used my gut instinct for whether the original finding felt plausible.”
That's the most interesting bit, says Daniele Fanelli from Stanford University, who studies research bias and misconduct. “It opens some fascinating research questions about understanding which factors are consciously or unconsciously most informative to participants,” he says.
Nosek adds, “We may be able to use prediction markets to be more efficient in deciding what needs to be replicated, and making estimates of uncertainty about studies for which replication is not likely or possible.”
But Fanelli isn't convinced, saying that it “seems like a rather laborious process that's unlikely to be applied across the board.” Hanson has heard similar skepticism before. “We’ve had enough experiments with prediction markets over the years that these findings are not at all surprising,” he says, but “I expect that most ordinary academic psychologists will require stronger incentives than personal curiosity to participate.”
Their success in these markets would have to be tied to tangible benefits, like actual money or the likelihood of securing publications, grants, and jobs. “Imagine that one or more top journals used prediction-market chances that a paper’s main result would be confirmed as part of deciding whether to publish that paper,” he says. “The authors and their rivals would have incentives to trade in such markets, and others would be enticed to trade when they expect that trades by insiders, or their rivals alone, are likely to produce biased estimates.”
The prediction markets have uses beyond analyzing the reliability of individual studies. They also provide an interesting look at the scientific process itself. Using the final market prices and a few statistical traits, Dreber's team could backtrack through each study's history and show how its hypotheses became strengthened or weakened with every step along the way.
For example, before any of these experiments were actually done, what were the odds that they were testing hypotheses that would turn out to be true? Just 8.8 percent, it turned out. This reflects the fact that psychologists often look for phenomena that would be new and surprising.
More worryingly, after the experiments were completed, reviewed, and published, the odds that their hypotheses were true improved to just 56 percent. “So, if you read through these journals and ask, ‘Is this true or not?’ you could flip a coin!” says Dreber. “That's pretty bad, I think. People often say that if you have a p-value that's less than 0.05, there is a 95 percent probability that the hypothesis is true. That's not right. You need a high-powered replication.”
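The arithmetic behind those two figures is essentially Bayes' rule: a significant result can come either from a true hypothesis detected with some statistical power, or from a false hypothesis slipping past the significance threshold. The sketch below, in Python, shows how a prior of 8.8 percent can end up as a posterior of only about 56 percent. The 0.088 prior and the 0.05 threshold are the numbers mentioned above; the power value is an illustrative assumption chosen for the example, not a figure reported by Dreber's team.

    # Bayes' rule for "hypothesis is true, given a significant result".
    # prior and alpha are taken from the article; power is an assumed,
    # illustrative value, not one reported by the researchers.

    def posterior_prob_true(prior, power, alpha=0.05):
        true_positive = prior * power           # true hypothesis, significant result
        false_positive = (1 - prior) * alpha    # false hypothesis, significant by chance
        return true_positive / (true_positive + false_positive)

    prior = 0.088   # chance the hypothesis was true before the experiment
    power = 0.65    # assumed average power of the original studies (illustrative)

    print(round(posterior_prob_true(prior, power), 2))
    # prints 0.56 with these inputs -- nowhere near the 95 percent certainty
    # that a p-value below 0.05 is often mistaken to imply

The point of the exercise is that a significant, published result only shifts the probability from the prior toward certainty by an amount that depends on power and on how surprising the hypothesis was to begin with, which is why a high-powered replication is needed to settle the question.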