This is not a new idea. In the 1970s, social scientist Donald Campbell wrote that any metric of quality can become corrupted if people start prioritizing the metric itself over the traits it supposedly reflects. “We realized that his argument works even if individuals aren’t trying to maximize their metrics,” says Smaldino.
He and McElreath demonstrated this by creating a mathematical model in which simulated labs compete with each other and evolve—think SimAcademia. The labs choose things to study, run experiments to test their hypotheses, and try to publish their results. They vary in how much effort they expend in testing their ideas, which affects how many results they get and how reliable those results are. There’s a trade-off: more effort means truer results but fewer publications.
In the model, as in real academia, positive results are easier to publish than negative ones, and labs that publish more get more prestige, funding, and students. They also pass their practices on. With every generation, one of the oldest labs dies off, while one of the most productive ones reproduces, creating an offspring that mimics the research style of the parent. That’s the equivalent of a student from a successful team starting a lab of their own.
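To make that dynamic concrete, here is a minimal Python sketch of this kind of lab-selection model. It is not Smaldino and McElreath’s actual code; the population size, payoff probabilities, and mutation rule below are illustrative assumptions, chosen only to show how selection on productivity alone can erode effort.

```python
# A minimal sketch of a lab-selection model in the spirit of the one described
# above. This is NOT the published model; every number here (population size,
# publication probabilities, mutation size) is an illustrative assumption.
import random

N_LABS = 100          # number of competing labs
GENERATIONS = 500     # how many selection cycles to run
MUTATION_SD = 0.01    # how far an offspring's effort can drift from its parent's

def run_generation(efforts, ages, payoffs):
    """One cycle: labs publish, the oldest lab dies, the most productive lab reproduces."""
    for i, effort in enumerate(efforts):
        # More effort -> fewer experiments per cycle...
        n_experiments = max(1, int(10 * (1.0 - effort)))
        publications = 0
        for _ in range(n_experiments):
            # ...but a better chance that each positive result is real.
            # Only "positive" results get published, and sloppy methods
            # still yield publishable false positives.
            true_positive = random.random() < 0.1 * (0.5 + 0.5 * effort)
            false_positive = random.random() < 0.4 * (1.0 - effort)
            if true_positive or false_positive:
                publications += 1
        payoffs[i] += publications
        ages[i] += 1
    # Selection: the oldest lab is retired...
    dead = ages.index(max(ages))
    # ...and replaced by an offspring of the most productive lab, which
    # inherits its parent's effort level with a small mutation.
    parent = payoffs.index(max(payoffs))
    child_effort = min(1.0, max(0.0, efforts[parent] + random.gauss(0, MUTATION_SD)))
    efforts[dead], ages[dead], payoffs[dead] = child_effort, 0, 0

def main():
    efforts = [random.uniform(0.5, 1.0) for _ in range(N_LABS)]  # start fairly rigorous
    ages = [0] * N_LABS
    payoffs = [0] * N_LABS
    for gen in range(GENERATIONS):
        run_generation(efforts, ages, payoffs)
        if gen % 100 == 0:
            print(f"generation {gen}: mean effort = {sum(efforts) / N_LABS:.2f}")

if __name__ == "__main__":
    main()
```

Run it and the mean effort drifts downward generation after generation, even though no individual lab ever changes its own behavior: the sloppier labs simply publish more and leave more descendants.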
Over time, and across many simulations, the virtual labs inexorably slid towards less effort, poorer methods, and almost entirely unreliable results. And here’s the important thing: Unlike the hypothetical researcher I conjured up earlier, none of these simulated scientists were actively trying to cheat. They used no strategy, and they behaved with integrity. And yet the community still drifted towards poorer methods. What the model shows is that a world that rewards scientists for publications above all else—a world not unlike this one—naturally selects for weak science.
“The model may even be optimistic,” says Brian Nosek of the Center for Open Science, because it doesn’t account for our unfortunate tendency to justify and defend the status quo. He notes, for example, that studies in the social and biological sciences are, on average, woefully underpowered—their sample sizes are too small to reliably detect the effects they are looking for.
Low statistical power is an obvious symptom of weak research. It is easily calculated, and people have been talking about it since the 1960s. And yet, in over 50 years, it hasn’t improved at all. Indeed, “there is still active resistance to efforts to improve statistical power by scientists themselves,” says Nosek. “With desire to get it published dominating desire to get it right, researchers will defend low statistical power despite it having zero redeeming qualities for science.”
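For a back-of-the-envelope sense of what “underpowered” means, the sketch below estimates statistical power by simulation. The effect size and sample sizes are invented for the example, not figures from any study Nosek describes; the point is only that small samples rarely detect modest effects.

```python
# Estimate statistical power by simulation: how often does a study of a given
# size detect a real effect? The effect size (Cohen's d = 0.3) and the sample
# sizes below are made-up illustrations.
import numpy as np
from scipy.stats import ttest_ind

def estimated_power(n_per_group, effect_size, alpha=0.05, n_sims=5000, seed=0):
    """Fraction of simulated experiments that detect a true effect of the given size."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_group)
        treated = rng.normal(effect_size, 1.0, n_per_group)  # the effect really exists
        _, p = ttest_ind(control, treated)
        if p < alpha:
            hits += 1
    return hits / n_sims

# With 20 subjects per group, a modest real effect is detected only a small
# fraction of the time; the conventional target is 80% power.
for n in (20, 50, 175):
    print(f"n = {n:3d} per group -> power ~ {estimated_power(n, 0.3):.2f}")
```

With 20 subjects per group the simulated studies catch the effect well under a quarter of the time; it takes on the order of 175 per group to reach the conventional 80 percent target, which is why chronically small studies produce so many missed and spurious findings.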
Scientists are now grappling with the consequences of that stagnation. In many fields, including neuroscience, genetics, psychology, ecology, and biomedicine, there’s talk of a reproducibility crisis, where weak and poorly designed studies have flooded the world with doubtful findings. “We spend a lot of time complaining about the culture of science, but verbal arguments allow people to talk past each other,” says Smaldino. “A formal model allows you to be clearer about what you’re talking about.”