Ted Cruz said one of the following quotes in his presidential campaign announcement. Hillary Clinton said the other in her first major speech of the campaign. Can you guess which is which?
"... imagine in 2017 a new president signing legislation repealing every word of Obamacare."
"I believe that success isn't measured by how much the wealthiest Americans have, but by how many children climb out of poverty."
For humans with some political know-how, this task is simple. The word "repeal" in the context of "Obamacare" immediately signals a Republican talking point, whereas Democrats are more likely to refer to "the wealthiest Americans" in a statement about poverty.
The question that data scientists at Quorum, a political analytics firm, sought to answer was this: Can computers use a similar process to come to the same conclusion? Could they teach a computer to predict political party from speech?
Mining the text of House and Senate floor speeches in the Congressional Record, Quorum cofounder Jonathan Marks and his team set out to see whether they could accurately predict which party each member of Congress belongs to.
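To make the prediction task concrete, here is a minimal sketch of one common way to guess party from text: a naive Bayes classifier with add-one smoothing. This is a stand-in technique, not Quorum's actual model, which the article does not describe. The function names and the four training sentences are invented for illustration and loosely echo phrases quoted in this article.

```python
import math
from collections import Counter

PUNCT = '.,;:!?"\'()'

def tokenize(text):
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def train(labeled_speeches):
    # labeled_speeches: list of (party, text) pairs (hypothetical toy input).
    word_counts, speech_counts = {}, Counter()
    for party, text in labeled_speeches:
        word_counts.setdefault(party, Counter()).update(tokenize(text))
        speech_counts[party] += 1
    return word_counts, speech_counts

def predict(text, word_counts, speech_counts):
    vocab = set().union(*word_counts.values())
    total_speeches = sum(speech_counts.values())
    scores = {}
    for party, counts in word_counts.items():
        total_words = sum(counts.values())
        # Start from the log prior, then add a smoothed log likelihood per word.
        score = math.log(speech_counts[party] / total_speeches)
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[party] = score
    return max(scores, key=scores.get)

# Invented toy data, not real floor speeches.
toy_speeches = [
    ("R", "Repeal every word of Obamacare"),
    ("R", "Bureaucrats in Washington burden small business"),
    ("D", "The wealthiest Americans should pay their fair share"),
    ("D", "Lift children out of poverty"),
]
model = train(toy_speeches)
guess_one = predict("Bureaucrats want to keep Obamacare", *model)
guess_two = predict("Children in poverty", *model)
```

With so little data the guesses only reflect which toy sentences share words with the input; a real system would need far more text and a held-out set to measure accuracy.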
"We gave the computer a large amount of text, which had been fed by Republicans and Democrats," Marks explains. "And then we asked it to identify patterns in the way that Democrats and Republicans talk that make them different."
The program looked for the words each party uses most often, but it also looked for the words each party favors far more than the other does. Both parties may say "America" often, but Republicans are much more likely to say "bureaucrats," for example.
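The distinction between frequent words and uniquely favored words can be sketched with a smoothed log-odds score, which is near zero for words both sides use equally and large in magnitude for words one side uses much more. This is an assumed scoring method chosen for illustration, not necessarily what Quorum's program computes, and the sentences below are invented.

```python
import math
from collections import Counter

def log_odds_scores(texts_a, texts_b):
    # Count word occurrences for each group (hypothetical helper name).
    counts_a = Counter(w for t in texts_a for w in t.lower().split())
    counts_b = Counter(w for t in texts_b for w in t.lower().split())
    vocab = set(counts_a) | set(counts_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    scores = {}
    for word in vocab:
        # Add-one smoothing keeps words unseen by one group from dividing by zero.
        rate_a = (counts_a[word] + 1) / (total_a + len(vocab))
        rate_b = (counts_b[word] + 1) / (total_b + len(vocab))
        scores[word] = math.log(rate_a / rate_b)
    return scores

# Invented toy sentences: both groups say "america" equally often.
group_a = ["america needs fewer bureaucrats", "bureaucrats burden america"]
group_b = ["america must fight poverty", "poverty hurts america"]

scores = log_odds_scores(group_a, group_b)
most_a = max(scores, key=scores.get)  # word most tilted toward group A
most_b = min(scores, key=scores.get)  # word most tilted toward group B
```

In this toy input "america" is equally common on both sides, so it scores zero even though it is the most frequent word overall, while "bureaucrats" and "poverty" land at opposite extremes.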
According to Marks, about 80 percent of the variation in what representatives say in Congress can be explained by party affiliation. According to his computer program, here are the words and phrases with the greatest predictive power for each party.