Education has entered the era of Big Data. The Internet is teeming with stories touting the latest groundbreaking studies on the science of learning and pedagogy. Education journalists are in a race to report these findings as they search for the magic formula that will save America's schools. But while most of this research is methodologically solid, not all of it is ready for immediate deployment in the classroom.
Jessica was reminded of this last week, after she tweeted out an interesting study on math education. Or, rather, she tweeted out what looked like an interesting study on math education, based on an abstract that someone else had tweeted out. Within minutes, dozens of critical response tweets poured in from math educators. She spent the next hour debating the merits of the study with an elementary math specialist, a fourth-grade math teacher, and a university professor of math education.
Tracy Zager, the math specialist, and the author of the forthcoming book Becoming the Math Teacher You Wish You’d Had, emailed her concerns about the indiscriminate use of education studies as gospel:
Public education has always been politicized, but we've recently jumped the shark. Catchy articles about education circulate widely, for understandable reason, but I wish education reporters would resist the impulse to over-generalize or sensationalize research findings.
While she conceded that education journalists “can’t be expected to be experts in mathematics education, or science education, or literacy education,” she emphasized that they should be held to a higher standard than the average reader. In order to do their jobs well, they should not only be able to read studies intelligently, “they should also consult sources with field-specific expertise for deeper understanding of the fields.”
After she was schooled on Twitter, Jessica called up Ashley Merryman, the author of NurtureShock: New Thinking About Children, and Top Dog: The Science of Winning and Losing. “Just because something is statistically significant does not mean it is meaningfully significant,” Merryman explained. “The big-picture problem with citing the latest research as a quick fix is that education is not an easy ship to turn around.” When journalists cite a press release describing a study without reading and exploring the study’s critical details, they often end up oversimplifying or overstating the results. Their coverage of education research could therefore inspire parents and policymakers to bring half-formed ideas into the classroom. Once that happens, said Merryman, “the time, money, and investment that has gone into that change means we are stuck with it, even if it’s later proven to be ineffective in practice.”
As readers and writers look for solutions to educational woes, here are some questions that can help lead to more informed decisions.
1. Does the study prove the right point?
It’s remarkable how often far-reaching education policy is shaped by studies that don’t really prove the benefit of the policy being implemented. The Tennessee Student Teacher Achievement Ratio (STAR) study is a great example.
In the late 1980s, researchers assigned thousands of Tennessee children in grades K-3 to either standard-sized classes (with student-teacher ratios of 22-to-1) or smaller classes (15-to-1) in the same school and then followed their reading and math performance over time. The landmark STAR study concluded that K-3 kids in smaller classes outperformed peers in larger classes. This led to massive nationwide efforts to achieve smaller class sizes.
Subsequent investigations into optimal class size have yielded more mixed findings, suggesting that the story told in STAR was not the whole story. As it turns out, the math and reading benefits experienced by the K-3 kids in Tennessee might not translate to eighth-grade writing students in Georgia, or geography students in Manhattan, or to classes taught using different educational approaches or by differently skilled teachers. A key step in interpreting a new study is to avoid extrapolating too much from it, even when it is as well conducted as STAR.
2. Could the finding be a fluke?
Small studies are notoriously fluky, and should be read skeptically. Recently Carnegie Mellon researchers looked at 24 kindergarteners and showed that those taking a science test in austere classrooms performed 13 percent better than those in a “highly decorated” setting. The authors hypothesized that distracting décor might undermine learning, and one article in the popular press quoted the researchers as saying they hoped these findings could inform guidelines about classroom décor.
While this result may seem to offer the promise of an easy 13-percent boost in students’ learning, it is critical to remember that the results might come out completely differently if the study were replicated in a different group of children, in a different school, under a different moon. In fact, a systematic review has shown that small, idiosyncratic studies are more likely to generate big findings than well-conducted larger studies. Would that 13 percent gap in student performance narrow in a larger study that controlled for more variables?
In other words, rather than base wide-reaching policy decisions on conclusions derived from 24 kindergarteners, it would seem reasonable, for now, to keep the Jane Austen posters and student art on the classroom wall.
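To see why a 24-child result deserves caution, here is a minimal Python sketch. It is not a reconstruction of the Carnegie Mellon study: the score distribution (mean 50, standard deviation 20), the split into two groups of 12, the comparison group size of 300, and the function name `share_of_big_gaps` are all assumptions made for illustration. Both simulated groups are drawn from the same distribution, so any gap between them is pure chance.

```python
import random
import statistics


def share_of_big_gaps(group_size, trials=1000, gap=0.13, seed=0):
    """Fraction of simulated studies whose two groups differ by at least
    `gap` (relative to the second group's mean) with no true effect."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Hypothetical test scores: both groups share one distribution.
        group_a = [rng.gauss(50, 20) for _ in range(group_size)]
        group_b = [rng.gauss(50, 20) for _ in range(group_size)]
        mean_a = statistics.fmean(group_a)
        mean_b = statistics.fmean(group_b)
        if abs(mean_a - mean_b) / mean_b >= gap:
            hits += 1
    return hits / trials


small = share_of_big_gaps(group_size=12)   # 24 children in total
large = share_of_big_gaps(group_size=300)  # 600 children in total
print(f"Studies showing a 13%+ gap by chance, 12 per group:  {small:.1%}")
print(f"Studies showing a 13%+ gap by chance, 300 per group: {large:.1%}")
```

Under these assumptions, chance gaps of 13 percent or more should turn up far more often in the tiny studies than in the larger ones, even though nothing real separates the groups.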
3. Does the study have enough scale and power?
Sometimes education studies get press when they find nothing. For instance, Robinson and Harris recently suggested that parental help with homework does not boost academic performance in kids. In negative studies like these, the million-dollar question is whether the study was capable of detecting a difference in the first place. Put another way, absence of evidence does not equal evidence of absence.
There are multiple ways good researchers can miss real associations. One is when a study does not have enough power to detect the association. For example, when researchers look for a rare effect in too small a group of children, they sometimes miss an effect that could be seen in a larger sample. In other cases, the findings are confounded, which means that the factor being studied is affected by some other factor that is not measured. For example, returning to Robinson and Harris, if some parents who help their kids with homework actually do the kids’ homework for them while others give their kids flawed advice that leads them astray, then parental help with homework might appear to have no benefit because the good work of parents who help effectively is cancelled out by other parents’ missteps.
It’s always a good idea to check whether a negative study had enough power and scale to find the association it sought, and to consider whether confounds might have hidden—or generated—the finding.
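The idea of statistical power can be sketched in a few lines of Python. Everything here is hypothetical rather than drawn from Robinson and Harris: the 5-point true gain, the score distribution, the group sizes, and the function name `detection_rate` are assumptions, and the significance check is a rough normal approximation with a 1.96 cutoff, which is only approximate for small groups.

```python
import math
import random
import statistics


def detection_rate(group_size, true_gain=5.0, trials=500, seed=1):
    """Fraction of simulated studies that flag a real effect as significant,
    using a rough two-sample z-style test (cutoff 1.96, about the 5% level)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    detected = 0
    for _ in range(trials):
        # Hypothetical scores: the "helped" group truly does better.
        control = [rng.gauss(50, 20) for _ in range(group_size)]
        helped = [rng.gauss(50 + true_gain, 20) for _ in range(group_size)]
        diff = statistics.fmean(helped) - statistics.fmean(control)
        std_err = math.sqrt(
            statistics.variance(control) / group_size
            + statistics.variance(helped) / group_size
        )
        if abs(diff) / std_err > 1.96:
            detected += 1
    return detected / trials


small_study = detection_rate(group_size=20)
large_study = detection_rate(group_size=500)
print(f"Real effect detected with 20 per group:  {small_study:.0%} of studies")
print(f"Real effect detected with 500 per group: {large_study:.0%} of studies")
```

In this sketch the effect is real in every simulated study, yet the small studies usually fail to find it. A null result from an underpowered study says little about whether the effect exists.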
4. Is it causation, or just correlation?
It turns out that the most important way for parents to raise successful children is to buy bookcases. Or at least this is what readers could conclude if they absorbed just the finding summarized in this Gizmodo article, and not the fourth-paragraph caveat that books in the home are likely a proxy for other facets of good parenting, such as income, emphasis on education, and parental educational attainment.
Correlation (in this case, of bookshelves at home with achievement later in life) does not necessarily indicate causation. In fact, it often does not. The rooster might believe it causes the sun to rise, but reality is more complex. Good researchers, such as the original authors of the bookcase study, cop to this possibility and explain how their results might only reflect a deeper association.
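A toy simulation shows how a hidden factor can manufacture a correlation. This is an illustrative sketch, not the bookcase study's data or method: the variable names, the single `background` score standing in for income and parental education, and the 0.1 cutoff for "similar" households are all assumptions. By construction, books have no effect on achievement here.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(2)  # fixed seed so the run is repeatable
households = []
for _ in range(20000):
    background = rng.gauss(0, 1)                # hidden family factor
    books = background + rng.gauss(0, 1)        # driven by background
    achievement = background + rng.gauss(0, 1)  # driven by background, NOT books
    households.append((background, books, achievement))

# Correlation across all households, ignoring the hidden factor.
raw = pearson([h[1] for h in households], [h[2] for h in households])

# Correlation among households with nearly identical backgrounds.
similar = [h for h in households if abs(h[0]) < 0.1]
within = pearson([h[1] for h in similar], [h[2] for h in similar])

print(f"Books vs. achievement, all households:      {raw:.2f}")
print(f"Books vs. achievement, similar backgrounds: {within:.2f}")
```

Across all simulated households, books and achievement move together. Among households with nearly the same background, the link all but disappears, which is the pattern one would expect when the books are a proxy rather than a cause.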
No research study is perfect, and all of the studies we cited above have real merit. But, by asking good questions of any research finding, parents and journalists can help bring about sounder conclusions, in life and in policy-making. It’s easy to believe catchy, tweet-able headlines or the pithy summaries of institutional press releases. But since our kids’ education ultimately depends on the effectiveness and applicability of the available research, we should ensure that our conclusions are as trustworthy and well-formed as they can possibly be.