A month before the 2000 presidential election, George W. Bush benefited from a statistical miracle.

He started October trailing opponent Al Gore, with a CNN/USA Today/Gallup poll projecting a 51-40 split among likely voters. Then came the first presidential debate—remember, the one with the sighing? A few days later, Bush was on top, 49-41, a stunning 19-point swing.

But wait! Gallup polling also showed most voters thought Gore had won the debate. So had Bush shot ahead—by losing?

Looking back, social scientists doubt public opinion was really bobbing around like a pogo stick. Instead, they blame the filtering Gallup used to determine “likely voters”—a vexing bit of psychological prediction pollsters have never gotten quite right.

If you’re in the business of guessing the future, even the best presidential polls do no good if they’re diluted by the opinions of people who will never make it to the ballot box. Studies show unengaged respondents skew Democratic, and polls that don’t filter for registered or likely voters will reliably underestimate Republican support.

Voter screens make polls more accurate. But they’re also a source of variation and error. With several recent high-profile whiffs in the polling world—see the Kentucky governor’s race in 2015, or several House races in 2014—they’re getting another look.

Earlier this week, the Pew Research Center released a report examining several screening methods, including cutting-edge screens using computer algorithms and machine learning. Using data from the 2014 midterm elections, researchers compared how a person responded to pre-election questions against their actual 2014 voting record—a luxury pollsters didn’t have at the time.

The report’s conclusion: The old ways leave much to be desired, especially for unusual elections.

“These methods, which have been around for so long, may be losing some of their accuracy because circumstances have changed,” said Scott Keeter, a senior survey adviser at Pew Research. “Whether there has been a change in our politics in just the last two years that makes all of this less accurate is really impossible to answer at this point.”

For years, the gold standard in screening likely voters was a set of questions developed in the 1950s by Gallup statistician Paul Perry. Recognizing that simply asking people if they planned to vote wasn't enough—nearly everyone says yes—Perry posed a series of questions that factored into a final score.

Here’s Pew’s version:

  • How much thought have you given to the coming November election?
  • Have you ever voted in your precinct or election district?
  • Would you say you follow what’s going on in government and public affairs most of the time, some of the time, only now and then, or hardly at all?
  • How often would you say you vote?
  • How likely are you to vote in the general election this November?
  • In the 2012 presidential election between Barack Obama and Mitt Romney, did things come up that kept you from voting, or did you happen to vote?
  • Please rate your chance of voting in November on a scale of 10 to 1.

Tallying up these questions, pollsters would estimate the expected turnout for the election and cut the list accordingly. Only the most engaged voters would be included—or so the theory went.
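In rough terms, the mechanics look something like the minimal Python sketch below. The scoring rules, thresholds and field names are illustrative guesses for how a Perry-style screen could be tallied, not Gallup’s or Pew’s actual coding.

```python
# Illustrative Perry-Gallup-style screen: award points for "engaged" answers,
# rank respondents, and keep only enough to match the expected turnout.
# All answer categories and cutoffs here are assumptions for illustration.

def engagement_score(resp):
    """One point per engaged answer across the seven screen items."""
    score = 0
    score += resp["thought_given"] in ("quite a lot", "some")
    score += resp["voted_in_precinct_before"]
    score += resp["follows_public_affairs"] in ("most of the time", "some of the time")
    score += resp["vote_frequency"] in ("always", "nearly always")
    score += resp["likely_to_vote"] in ("definitely", "probably")
    score += resp["voted_in_2012"]
    score += resp["chance_of_voting"] >= 9   # self-rating on the 10-point scale
    return score

def likely_voters(respondents, expected_turnout=0.40):
    """Keep the most engaged respondents until the expected turnout share is filled."""
    ranked = sorted(respondents, key=engagement_score, reverse=True)
    cutoff = int(len(ranked) * expected_turnout)
    return ranked[:cutoff]
```

Everyone below the cutoff is simply dropped from the poll, which is why the accuracy of each individual question matters so much.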

But Pew’s research shows a number of these questions have dubious predictive value. For instance, pollsters will routinely drop respondents who say they have little interest in politics or haven’t followed the election. However, 55 percent of the people in Pew’s study who said they thought “only a little” about the coming election ended up voting on Election Day.

"We found in 2012 that a lot of people said, 'I have no interest in this election, but you can be damn sure I’m going to show up to vote,’“ Keeter said, recounting individual conversations with voters. "That showed to me that question might not be as useful."

Not even machine learning, the magic elixir of our time, can completely solve this problem. Keeter and his crew built several models to predict 2014 voter turnout using logistic regressions and the "random forest" algorithm, which slices and dices data to find correlations difficult for humans to pick out. The results were decent, but still not spot-on.
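The general approach is straightforward enough to sketch, even if the details of Pew’s models are not public in code form. The snippet below, using scikit-learn, shows the kind of comparison described above; the data file and feature layout are hypothetical stand-ins, and the features are assumed to be already encoded as numbers.

```python
# Sketch of predicting validated 2014 turnout from pre-election survey answers,
# comparing a logistic regression with a random forest. File and column names
# are assumptions, not Pew's actual dataset.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("panel_2014.csv")          # hypothetical: screen answers + validated vote
X = df.drop(columns=["voted_2014"])         # numeric-coded survey responses
y = df["voted_2014"]                        # 1 if the voter file shows a 2014 vote

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("random forest", RandomForestClassifier(n_estimators=500, random_state=0)),
]:
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.2f}")
```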

One surefire way to improve accuracy, Keeter says, is to match each respondent with their voting record, available via a public file that marks whether an individual voted in previous elections (though not how they cast their ballot). But that requires the cooperation of the poll respondents, who may understandably be unwilling to give their names and addresses to a stranger over the telephone.
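In practice that matching is a record-linkage problem: a respondent’s name and address have to be lined up against a state voter file. The sketch below is a deliberately simplified illustration of an exact-match join; real matching is fuzzier, and the column names are assumptions.

```python
# Simplified respondent-to-voter-file match on normalized name and address.
# Real voter-file matching uses fuzzier logic; columns here are hypothetical.
import pandas as pd

respondents = pd.read_csv("respondents.csv")   # hypothetical: name, street, zip, answers
voter_file = pd.read_csv("voter_file.csv")     # hypothetical: name, street, zip, voted_2014

def normalize(df):
    out = df.copy()
    for col in ("name", "street"):
        out[col] = out[col].str.upper().str.strip()
    return out

matched = normalize(respondents).merge(
    normalize(voter_file),
    on=["name", "street", "zip"],
    how="left",   # unmatched respondents keep NaN for voted_2014
)
print(f"Matched {matched['voted_2014'].notna().mean():.0%} of respondents to a vote record")
```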

And if a machine-learning approach takes off, another danger arises: the false confidence of the algorithm. A pollster could use a computer to make a beautiful model that accurately predicts how a voter would have acted in 2014. But would that have any relevance in the Trump-powered campaign of 2016?

Gallup has taken plenty of heat for its likely voter screens—perhaps most notably from Nate Silver, the founder of FiveThirtyEight. This time around, the polling company says it will sit out the 2016 primary horse race. But Pew’s research shows the challenge of measuring voter interest extends far beyond just one company.

Likely voter predictions dramatically improve the closer you get to an election. In the future, given polling’s other problems, that may be all we can reasonably expect.