Reader Ted writes:

As a social scientist, I agree with Megan on this: this paper is not sufficiently transparent about its data and methodology, and may contribute to some misleading inferences due to a lack of contextualization and selective use of statistics. This is not to say I know the conclusions of the article are wrong, but rather that I can't be sure from what is presented in this published paper. Megan in fact barely touches on the problems with transparency, which would have led to this paper being sent back for revisions by any leading social science journal with which I'm familiar. It surprises some (as it surprised me a few years back) that medical and health journals seem to have lower standards for statistical evidence than journals in sociology, economics, political science, and psychology.

Time won't permit me to run through all of the issues now, but a few highlights:

First, let's address the percentage versus absolute numbers debate. Both pieces of information can be useful for answering distinct questions. Sometimes we're mostly interested in compositional numbers (proportions), as when we discuss the changing ethnic or racial make-up of the U.S. In that case, it would be more useful to know not the absolute population increase among Latinos, but instead whether Latinos have increased as a proportion of the U.S. population. In other cases, we may be interested in the actual numbers, for example, how many people have contracted a new deadly disease. Often both numbers are valuable.

That's likely the case with medical bankruptcy. While we might want to know whether medical costs are a leading cause of bankruptcy, we also surely wish to know whether medical costs are causing an increase in the number of bankruptcies. (To take a simple example for illustration, many people wouldn't care as much about gun violence, even if guns were involved in 90% of violent deaths, IF there were also only 10 violent deaths in the country per year.) The size of the overall "problem" and whether it is decreasing or increasing are relevant considerations, especially for an important public policy debate. The paper should indeed have been more forthcoming about this. Had it been so, the authors would have come across as more fair-minded without necessarily minimizing the problem (there are still plenty of bankruptcies). By giving us more context, the paper's argument becomes somewhat less dire but hardly unimportant.
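
To make the arithmetic concrete, here is a minimal sketch in Python with entirely hypothetical figures (none of these numbers come from the paper): a rising medical share can coexist with a falling absolute number of medical bankruptcies.

```python
# Hypothetical illustration: share vs. absolute count of medical bankruptcies.
filings = {2001: 1_450_000, 2007: 850_000}   # total bankruptcy filings (made-up numbers)
medical_share = {2001: 0.46, 2007: 0.62}     # share attributed to medical costs (made-up numbers)

for year in (2001, 2007):
    medical_filings = filings[year] * medical_share[year]
    print(f"{year}: share = {medical_share[year]:.0%}, "
          f"medical filings = {medical_filings:,.0f}")

# A rising share can coexist with a falling absolute number of medical
# bankruptcies if total filings fall faster than the share rises.
```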

On top of that, we'd want better time trend data before inferring that the role of medical costs in bankruptcies changed between 2001 and 2007. All we know from their multivariate logistic analysis (accepting it at face value) is that the share of medical bankruptcies (by their definition) was higher in 2007 than in 2001. Okay, but was that due to some change in the nature of medical costs, medical insurance, or anything else to do with health? Or was it due to how the intervening change in national bankruptcy laws differentially affected the composition of bankruptcies? (They address this only superficially at the end of the paper.) Or perhaps due to other changes in the financial environment, such as a plummeting housing market or a change in the lending environment? We can't tell from these data, because the binary variable for year of study would capture anything and everything that was different between the years (and not otherwise controlled for). One might be able to use court data to construct a dataset of annual bankruptcy filings and then add variables to control for other factors, and in that case a better outcome variable is probably the number of bankruptcies per thousand people (or similar). This would allow researchers to monitor whether medical costs were causing actual increases in bankruptcy (not just becoming an increasing share of bankruptcies), yet still control for population growth and other causes.
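
To illustrate the distinction, here is a sketch of the two designs. The variable and file names are hypothetical and this is not the authors' actual specification, just the general form of the argument.

```python
# Sketch only: a pooled logit with a year dummy vs. a rate-based design.
import pandas as pd
import statsmodels.formula.api as smf

# df: one row per sampled bankruptcy filer, pooled across the two study years.
# Columns assumed for illustration: medical (1 = medical bankruptcy by the
# paper's definition), year2007 (1 = filed in 2007), plus basic controls.
df = pd.read_csv("filers.csv")  # hypothetical file

logit = smf.logit("medical ~ year2007 + age + income + homeowner", data=df).fit()
print(logit.summary())
# The coefficient on year2007 absorbs everything that differed between 2001 and
# 2007 and is not otherwise controlled for: the 2005 bankruptcy-law change, the
# housing and credit markets, and any genuine change in health costs or insurance.

# A rate-based alternative (again only a sketch): a panel of filings per 1,000
# residents by state and year, with explicit covariates for the competing causes.
panel = pd.read_csv("state_year_filings.csv")  # hypothetical file
rate_model = smf.ols(
    "filings_per_1000 ~ medical_cost_index + post_2005_law + housing_price_change",
    data=panel,
).fit()
print(rate_model.summary())
```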

Second, there are a number of other problems with the paper. These don't clearly invalidate the conclusions of the paper, but they do make me skeptical that we can have confidence in those conclusions from the data as presented.

Example 1: Tables and analyses don't report the number of respondents in the particular analyses, which is an issue because the baseline or reference group evidently shifts from analysis to analysis with no explanation of why. See the note to Table 2, where the key percentages on medical bankruptcy are reported; the note indicates that these percentages are based not on the full sample but only on the home-owning half of the sample. Does this make a difference? I don't know. Why was it done? I don't know; it's not explained.

Example 2: Unless I'm missing an explanation somewhere, it appears that the 2001 to 2007 comparison is based on different populations. The former was a 5-state study; the latter seems to be a national study. If that's true, then we really don't know anything about the national trend from these data (though the authors could try to shed light on it by using their 2007 data to report the trend since 2001 for those five states, which may or may not mirror the national trend).
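
The five-state check would be simple if one had the microdata. A sketch of what I mean, with placeholder state codes and hypothetical column names (the paper does not publish its data):

```python
# Sketch: compare the national 2007 medical share with the share in the
# five 2001 study states, which can then be set against the published 2001 figure.
import pandas as pd

df_2007 = pd.read_csv("survey_2007.csv")       # hypothetical national 2007 sample
five_states = ["S1", "S2", "S3", "S4", "S5"]   # placeholders for the 2001 study states

subset = df_2007[df_2007["state"].isin(five_states)]
print("National 2007 medical share:   ", df_2007["medical"].mean())
print("Five-state 2007 medical share: ", subset["medical"].mean())
# Only the five-state figure gives a like-for-like trend against 2001; whether
# it mirrors the national figure is an empirical question.
```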

Example 3: They provide a fair amount of information about their sampling and survey procedures, but not enough. One case in point is that they provide a verbal assurance that respondents to their initial mail survey "resemble" non-respondents on many basic characteristics. Given that they have the data to reach this judgment, most studies would provide more precise numbers to demonstrate that the two groups are statistically indistinguishable. We also don't learn much about what respondents were told about the study, which we would need in order to assess whether "demand" effects may have inflated reports of medical problems. The researchers may have been very careful about this, but they need to report what they did.
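
The kind of respondent versus non-respondent comparison most journals would expect is routine to produce. A sketch, using hypothetical column names for characteristics observable in the court records for both groups:

```python
# Sketch: balance checks between survey respondents and non-respondents.
import pandas as pd
from scipy import stats

records = pd.read_csv("court_records.csv")     # hypothetical file
responded = records["responded"] == 1

for col in ["assets", "liabilities", "age", "prior_filings"]:
    t, p = stats.ttest_ind(
        records.loc[responded, col],
        records.loc[~responded, col],
        equal_var=False,                        # Welch's t-test
        nan_policy="omit",
    )
    print(f"{col}: respondent mean = {records.loc[responded, col].mean():.1f}, "
          f"non-respondent mean = {records.loc[~responded, col].mean():.1f}, p = {p:.3f}")
```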

Example 4: Most seriously, the authors invite improper inferences by conducting a national survey of bankruptcy filers, rather than a national survey of all households. By doing what they did, the researchers can study how often medical costs are an issue among bankruptcy filers. But many of us, including in all likelihood the researchers and the journalists and politicians who will be talking about this study, are more interested in whether and how much medical costs are causing Americans to file for bankruptcy. If that's so, then the researchers would appear to have committed a cardinal sin of scientific research design: selecting on the dependent variable. They are examining only people for whom the problem has occurred. If we want to know whether a factor is causing that problem, then we need to collect data on instances where the problem did AND did not occur (i.e., those who did and did not file for bankruptcy). Then one can conduct a multivariate analysis to determine the impact of the variable in question (medical costs), controlling for other factors, on the outcome of interest (the likelihood of filing for bankruptcy). When you select on the dependent variable as these researchers did, you can't tell something as basic as whether Americans with high medical costs were actually any more likely to file for bankruptcy than Americans without them.
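
The design I have in mind looks roughly like this. All names are hypothetical, and this is only a sketch of the general approach, not the authors' study:

```python
# Sketch of a design that does not select on the dependent variable: sample
# households regardless of whether they filed, then model filing as the outcome.
import pandas as pd
import statsmodels.formula.api as smf

households = pd.read_csv("household_survey.csv")  # hypothetical national sample

model = smf.logit(
    "filed_bankruptcy ~ high_medical_costs + income + age + homeowner + job_loss",
    data=households,
).fit()
print(model.summary())
# With non-filers in the data, the coefficient on high_medical_costs estimates
# how much medical costs raise the probability of filing, holding the other
# covariates fixed; a filers-only sample cannot reveal this.
```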

Perhaps had the researchers fixed these and other problems, we'd discover the problem is even worse than they currently report. Or perhaps we'd discover things are actually getting better, or that there is no change. We just can't tell from the study as conducted and the data as presented, and that's a shame, because it means the paper really can't inform an important ongoing discussion about national conditions and public policy.

(By the way, this is what a real peer review looks like, and it is how these sorts of problems can be fixed (and often are) before a study is released to the public.)
