Late last night, pro-Trump Twitter lit up with excited chatter. Donald Trump is falling fast in the polls, sliding through a month-long decline most statisticians would say is a result of him being, you know, unpopular. (And maybe this. Or this. Or this.) But one blogger had another theory: Polling organizations are deliberately interviewing more Democrats to skew the surveys toward Hillary Clinton.
This afternoon, Trump threw his support behind the idea. “When the polls are even, when they leave them alone and do them properly, I’m leading,” he said at a rally in Florida. “But you see these polls where they’re polling Democrats. How’s Trump doing? Oh, he’s down. They’re polling Democrats. The system is corrupt and it’s rigged and it’s broken.”
Let me cut this off at the head: Neither the writer nor Trump understands how polls work.
Both are citing a 2008 email from the hacked account of Clinton aide John Podesta, posted yesterday on WikiLeaks. In the email, a prominent Democratic strategy firm recommended “oversampling” certain voters when running polls, including blacks, Hispanics and young people.
Looks like a smoking gun, crowed the anonymous writer: If you need to make it look like Hillary Clinton is winning, just interview more Democrats. “And that’s how you manufacture a 12-point lead for your chosen candidate and effectively chill the vote of your opposition,” they wrote.
Setting aside for the moment that the email in question wasn’t sent during this election (nor the last one, but the one before that), the blogger and the conspiracists now piling on in support seriously misunderstand what oversampling actually is.
When pollsters field a survey, they randomly call (or contact online) a representative sample of the population they’re studying. If you’re looking at the entire United States, maybe that’s 1,000 people. The magic of statistics means the researcher can be fairly confident, within a certain margin of error, that the opinions of that sample will match up with the population as a whole. If the sample is large enough—and 1,000 people almost always is—that margin of error will be minimal.
But if the pollster is particularly interested in a smaller subgroup—say, suburban housewives—they might run into trouble. What if they only contacted 50 suburban housewives during their random phone calls? That’s a lot less than 1,000. The margin of error for their responses will be a great deal higher—making it harder to accurately predict what suburban housewives really think.
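To put rough numbers on that gap, here is a minimal sketch using the textbook margin-of-error formula for a simple random sample. It assumes a 95 percent confidence level (z = 1.96) and the worst-case 50/50 split of opinion; the function name is made up for illustration, and real pollsters adjust further for weighting and survey design.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a simple random sample of size n.

    Assumptions (not from the article): 95% confidence (z = 1.96) and
    a worst-case proportion of p = 0.5.
    """
    return z * math.sqrt(p * (1 - p) / n)

full_sample = margin_of_error(1000)  # the whole survey
subgroup = margin_of_error(50)       # the 50 suburban housewives

print(f"n=1000: +/- {full_sample * 100:.1f} points")
print(f"n=50:   +/- {subgroup * 100:.1f} points")
```

Under those assumptions, the full sample comes in at roughly plus or minus 3 points, while the 50-person subgroup is closer to plus or minus 14.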
Oversampling is the solution. When pollsters launch a survey, they’ll often try to interview more people from underrepresented groups so they’ll end up with a large enough sample to draw real conclusions. Before they report the results, they’ll rebalance the sample to bring it back in line with the overall demography of the population, negating the inflationary effect of the oversample.
Take the example in the screenshot above. If a campaign is particularly interested in how Hispanics will vote in Arizona, it might interview many more than random chance alone would offer, just to make sure the sample is large enough to reasonably draw conclusions about how the entire community feels. Hispanics, after all, are still a minority in the Grand Canyon State; a random sample of 1,000 people might only net 300 or so.
Perhaps they’ll aim to interview 500 instead. But once the pollster shifts back to looking at the entire state, they will dial back the proportion of Hispanics in the survey to mirror the actual demographics of Arizona.
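The rebalancing step can be sketched in a few lines. The 30 percent population share and the 500 Hispanic interviews follow the Arizona example; the 700 other interviews and the support rates are invented purely for illustration, and the function names are hypothetical. Actual pollsters weight on many variables at once, so treat this as a toy version of the idea rather than anyone's real method.

```python
# Share of the state's population in each group (30% follows the example above).
population_share = {"hispanic": 0.30, "non_hispanic": 0.70}

# Completed interviews after oversampling. The 700 figure is hypothetical.
interviews = {"hispanic": 500, "non_hispanic": 700}

# Hypothetical share of each group backing a given candidate.
support = {"hispanic": 0.70, "non_hispanic": 0.40}

def raw_estimate(interviews, support):
    """Topline if every interview counted equally (oversample left in)."""
    total = sum(interviews.values())
    return sum(interviews[g] * support[g] for g in interviews) / total

def weighted_estimate(interviews, support, population_share):
    """Topline after weighting each group back to its population share."""
    total = sum(interviews.values())
    # Weight = population share divided by share of the sample.
    weights = {g: population_share[g] / (interviews[g] / total) for g in interviews}
    numerator = sum(weights[g] * interviews[g] * support[g] for g in interviews)
    denominator = sum(weights[g] * interviews[g] for g in interviews)
    return numerator / denominator

print(f"unweighted: {raw_estimate(interviews, support):.1%}")
print(f"weighted:   {weighted_estimate(interviews, support, population_share):.1%}")
```

With these made-up inputs, leaving the oversample in would put the candidate at about 52.5 percent, while the weighted topline comes back down to about 49 percent, the same figure a proportionate sample would give. The extra interviews only tighten the estimate for the subgroup; they do not move the statewide number.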
Oversampling is, in other words, a completely valid statistical practice that everyone uses, including Republican pollsters and probably Trump’s own campaign. If the polls are overestimating Clinton’s lead, and Trump is headed for an upset win, it’s not because of pollsters using oversampling to get more accurate results for demographic subgroups.
“If you wanted to just bias the poll, you wouldn’t waste the extra money making all these extra calls—you’d try to manipulate it from the beginning,” said Jon McHenry, a Republican pollster. “This is an added expense to the pollster with the idea of getting more information about a certain subgroup, and weighting that back so you understand the overall as well.
“Sometimes,” he added wryly, “the polls say what they say because they’re accurate.”
It’s more common to complain about how pollsters weight their surveys; The New York Times recently supplied four researchers with the same data and received five different results back, all because of differences in how the respondents were screened. Pollsters have different methods to determine whether a surveyed voter will actually cast a ballot, and that’s usually behind complaints about a given poll being too Clinton- or Trump-friendly.
But oversampling? It’s benign. The fact that a post detailing a media oversampling conspiracy has gotten more than 1 million pageviews says something about Trump supporters’ fears of losing the election, but even more about America’s statistical illiteracy.