When pollsters field a survey, they randomly call (or contact online) a representative sample of the population they’re studying. If you’re looking at the entire United States, maybe that’s 1,000 people. The magic of statistics means the researcher can be fairly confident, within a certain margin of error, that the opinions of that sample will match up with the population as a whole. If the sample is large enough—and 1,000 people almost always is—that margin of error will be minimal.
But if the pollster is particularly interested in a smaller subgroup—say, suburban housewives—they might run into trouble. What if they only contacted 50 suburban housewives during their random phone calls? That’s far fewer than 1,000. The margin of error for their responses will be a great deal higher—making it harder to accurately predict what suburban housewives really think.
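The gap between those two margins of error is easy to quantify. A minimal sketch, using the standard worst-case formula (a 95 percent confidence level and an assumed 50/50 split in responses, which maximizes the margin):

```python
import math

def margin_of_error(n, z=1.96, p=0.5):
    """Worst-case margin of error for a simple random sample of size n.

    z=1.96 corresponds to a 95% confidence level; p=0.5 is the
    response split that makes the margin as large as possible.
    """
    return z * math.sqrt(p * (1 - p) / n)

# A full national sample of 1,000 vs. a 50-person subgroup
print(f"n=1000: ±{margin_of_error(1000):.1%}")  # roughly ±3%
print(f"n=50:   ±{margin_of_error(50):.1%}")    # roughly ±14%
```

With 1,000 respondents the margin is about three points; with 50 it balloons to nearly fourteen—too wide to say much of anything about what that subgroup thinks.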
Oversampling is the solution. When pollsters launch a survey, they’ll often try to interview more people from underrepresented groups so they’ll end up with a large enough sample to draw real conclusions. Before they report the results, they’ll rebalance the sample to bring it back in line with the overall demographics of the population—negating the inflationary effect of the oversample.
Take the example in the screenshot above. If a campaign is particularly interested in how Hispanics will vote in Arizona, it might interview far more of them than random chance alone would provide, just to make sure the sample is large enough to reasonably draw conclusions about how the entire community feels. Hispanics, after all, are still a minority in the Grand Canyon State; a random sample of 1,000 people might only net 300 or so.
Perhaps they’ll aim to interview 500 instead. But once the pollster shifts back to looking at the entire state, they will dial back the proportion of Hispanics in the survey to mirror the actual demographics of Arizona.
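That dialing-back step is just arithmetic: each respondent gets a weight equal to their group’s share of the population divided by their share of the sample. A minimal sketch with hypothetical numbers (the 30 percent Hispanic population share and the candidate-support figures are illustrative assumptions, not real Arizona data):

```python
# Illustrative assumption: Arizona is ~30% Hispanic, and the pollster
# oversampled to get 500 Hispanic respondents in a 1,000-person survey.
pop_share = {"hispanic": 0.30, "other": 0.70}
sample_n = {"hispanic": 500, "other": 500}
total = sum(sample_n.values())

# weight = population share / sample share
weights = {g: pop_share[g] / (sample_n[g] / total) for g in pop_share}
# Hispanic respondents count for 0.6 of a person each; others for 1.4.

# Hypothetical candidate support within each group
support = {"hispanic": 0.65, "other": 0.45}

# Unweighted, the oversample distorts the statewide number...
unweighted = sum(support[g] * sample_n[g] for g in support) / total  # 0.55

# ...but reweighting recovers the true population mix:
# 0.65 * 0.30 + 0.45 * 0.70 = 0.51
weighted = sum(support[g] * weights[g] * sample_n[g] for g in support) / total
print(f"unweighted: {unweighted:.0%}, weighted: {weighted:.0%}")
```

The oversample buys a precise read on the Hispanic subgroup, and the weights ensure that precision doesn’t come at the cost of a distorted statewide topline.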
Oversampling is, in other words, a completely valid statistical practice that everyone uses, including Republican pollsters and probably Trump’s own campaign. If the polls are overestimating Clinton’s lead, and Trump is headed for an upset win, it’s not because of pollsters using oversampling to get more accurate results for demographic subgroups.
“If you wanted to just bias the poll, you wouldn’t waste the extra money making all these extra calls—you’d try to manipulate it from the beginning,” said Jon McHenry, a Republican pollster. “This is an added expense to the pollster with the idea of getting more information about a certain subgroup, and weighting that back so you understand the overall as well.
“Sometimes,” he added wryly, “the polls say what they say because they’re accurate.”
It’s more common to complain about how pollsters weight their surveys; The New York Times recently supplied four researchers with the same data and received five different results back, all because of differences in how the respondents were screened. Pollsters have different methods to determine whether a surveyed voter will actually cast a ballot, and that’s usually behind complaints about a given poll being too Clinton- or Trump-friendly.
But oversampling? It’s benign. The fact that a post detailing a media oversampling conspiracy has gotten more than 1 million pageviews says something about Trump supporters’ fears of losing the election, but even more about America’s statistical illiteracy.