Guest post by Jim Manzi, founder and Chairman of Applied Predictive Technologies, and the author of Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics and Society.
Gabriel, your very deep post that, in passing, requested my comment was fascinating. My family thanks you for the weekend I just spent staring off into space.
You open with this:
Sampling error? Omitted variable bias? Bah, that's for first-year grad students. What I find really interesting is there are some fairly basic principles for how analysis can get really screwy but which can't be fixed by adding more control variables, increasing your sample size, or fiddling with assumptions about the distribution of the dependent variable.
I spend an enormous amount of time in my book arguing that this problem is pervasive and significant, and that exactly this triptych of remedies will fail to enable us to build models that make useful, reliable and non-obvious predictions for the effects of our interventions in human social systems. In it, I take apart some celebrated social science models for failing in this respect. But in the spirit of what's sauce for the goose is sauce for the gander, I then take apart a model that I built to estimate the effect of changing the name of a convenience store, to show how all three together can't put Humpty Dumpty back together again.
Start at the most foundational level: What is causality? I have an engineer's perspective on this. What I care about is my ability to predict the effect of my interventions better than I can without the model.
Consider two questions:
1. Does A cause B?
2. If I take action A, will it cause outcome B?
I don't care about the first, or more precisely, I might care about it, but only as scaffolding that might ultimately help me to answer the second.
For example, in your shoes story, I don't care whether the characteristic of discomfort causes shoes to be considered attractive. I care about whether, for example, taking an existing type of shoes and narrowing the toes will cause them to get more coverage in fashion magazines, sell more units or whatever.
In general, the best way to determine this is to take some comfortable shoes, narrow the toes, and then see what happens to sales. That is, to run an experiment.
There are big problems with this approach. One obvious one is that it is often impossible or impractical to run the experiment. But even if we assume that I have done exactly this experiment, I still have the problem of measuring the causal effect of the intervention. In a complicated system, like shoe stores, I have to answer the question of how many pairs I would have sold in, say, the three months after changing my design to narrow toes - I can't just assume that I would have sold the same number of wide-toed shoes that I did in the prior three months. For reasons well-known to you, and that I go through at length in the book, the best way to measure this in a complicated system is a randomized field trial (RFT) in which I randomly assign some stores to get the new shoes and others to keep selling the old shoes. In essence, random assignment allows me to roughly hold constant all of the "screwy" effects that you reference between the test and control groups.
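The logic here can be sketched in a toy simulation (all numbers and names are hypothetical, not from any real test): each store has its own baseline sales level, and random assignment spreads those store-to-store differences roughly evenly across test and control, so a simple difference in group means recovers the assumed treatment effect.

```python
import random

random.seed(0)

# Hypothetical illustration: 100 stores, each with its own baseline sales
# level -- the store-to-store "screwiness" that random assignment averages
# out between groups.
baselines = [random.gauss(1000, 200) for _ in range(100)]

TRUE_EFFECT = 50  # assumed lift in units sold from the narrow-toed design

# Randomly assign half the stores to test (narrow toes), half to control.
ids = list(range(len(baselines)))
random.shuffle(ids)
test_ids, control_ids = ids[:50], ids[50:]

# Observed sales over the trial period, with some period-specific noise.
test_sales = [baselines[i] + TRUE_EFFECT + random.gauss(0, 50) for i in test_ids]
control_sales = [baselines[i] + random.gauss(0, 50) for i in control_ids]

def mean(xs):
    return sum(xs) / len(xs)

# The estimated causal effect is the simple difference in mean sales.
estimate = mean(test_sales) - mean(control_sales)
print(f"estimated effect: {estimate:.1f}")  # close to, not exactly, the true effect
```

Note that even in this idealized setup the estimate only approximates the true effect; with few stores or large store-level variation, the noise can swamp the signal, which is one reason sample size matters for experiments even though it can't rescue the generalization problem discussed below.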
But what many cheerleaders for randomized experiments gloss over is that even if I have executed a competent experiment, it is not obvious how I turn this result into a prediction rule for the future (the problem of generalization, or external validity). Here's how I put this in an article a couple of years ago:
In medicine, for example, what we really know from a given clinical trial is that this particular list of patients who received this exact treatment delivered in these specific clinics on these dates by these doctors had these outcomes, as compared with a specific control group. But when we want to use the trial's results to guide future action, we must generalize them into a reliable predictive rule for as-yet-unseen situations. Even if the experiment was correctly executed, how do we know that our generalization is correct?
A physicist generally answers that question by assuming that predictive rules like the law of gravity apply everywhere, even in regions of the universe that have not been subject to experiments, and that gravity will not suddenly stop operating one second from now. No matter how many experiments we run, we can never escape the need for such assumptions. Even in classical therapeutic experiments, the assumption of uniform biological response is often a tolerable approximation that permits researchers to assert, say, that the polio vaccine that worked for a test population will also work for human beings beyond the test population.
But as we climb a ladder of phenomenological complexity from physics to biology to sociology, this problem of generalization becomes more severe. As I put it in Uncontrolled:
We can run a clinical trial in Norfolk, Virginia, and conclude with tolerable reliability that "Vaccine X prevents disease Y." We can't conclude that if literacy program X works in Norfolk, then it will work everywhere. The real predictive rule is usually closer to something like "Literacy program X is effective for children in urban areas, and who have the following range of incomes and prior test scores, when the following alternatives are not available in the school district, and the teachers have the following qualifications, and overall economic conditions in the district are within the following range." And by the way, even this predictive rule stops working ten years from now, when different background conditions obtain in the society.