It is virtually impossible that the former governor's recent spike in Twitter followers resulted from normal activity on the social network.
Last week, Zach Green of 140Elect noticed some strange goings-on with Mitt Romney's Twitter account (@MittRomney). Romney's account, which had been averaging around 2,000 to 5,000 new followers a day, gained 141,000 followers in two days.
This observation prompted speculation - from Green, Slate, The Huffington Post, CNN, and many others - that the Romney campaign was buying robot followers, or perhaps (conspiratorially) that someone else was buying them to make Romney look bad.
But actual analysis of these new followers has been limited to manual observation; many do, indeed, look fake. However, high-profile users can be targets for the algorithms that run bot accounts, and some number of bogus followers is to be expected. We decided to dig into the data on these new followers to see if they differ statistically from the new followers of other accounts similar in size to Romney's. We subjected Barack Obama's account, @BarackObama, to the same analysis.
We developed a simple methodology for testing whether a set of followers is likely to be the product of natural user following behavior or bot networks. This test revealed a significant difference between the distribution of followers among the accounts in Mitt Romney's recent spike and that of similar users in our comparison. It strongly indicates that non-organic processes induced Romney's recent surge in followers. We did not find a similar pattern in Barack Obama's recent followers. The details of these findings are presented below.
First, let's get some technical details out of the way. Social networks like Twitter are composed of "nodes," which are individual users, and "edges," which are the ties between them. The number of edges, or ties, connected to a particular node is referred to as that node's degree. In a Twitter network, edges have a direction: a tie signifies that one user follows, or is followed by, another. The number of edges pointing towards a node - how many others are following that person - is called the indegree.
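To make the definition concrete, here is a minimal sketch of counting indegree from a list of directed edges. The function name, the (follower, followed) edge layout, and the toy account names are assumptions made for illustration; none of them come from Twitter or from our analysis.

```python
from collections import Counter

def indegrees(edges):
    """Count followers (indegree) per account.

    Each edge is a (follower, followed) pair, so the edge points
    from the follower toward the account being followed.
    """
    counts = Counter()
    for follower, followed in edges:
        counts[follower] += 0   # ensure accounts with no followers still appear
        counts[followed] += 1   # one more edge pointing at `followed`
    return dict(counts)

# Hypothetical toy network: three accounts follow "candidate", one follows "alice".
edges = [
    ("alice", "candidate"),
    ("bob", "candidate"),
    ("bot_1", "candidate"),
    ("bob", "alice"),
]
print(indegrees(edges))
```

In this toy network "candidate" has an indegree of 3, while "bob" and "bot_1" have an indegree of 0 because nobody follows them.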
Accounts that are likely to be bots will tend to have a small indegree. That's because Twitter bots, in general, don't come even close to passing the Turing test; when real people look at them, it is obvious that they are bots. As a result, these fake accounts tend to have zero, or at least very few, followers. Based on this, we looked at the proportion of new followers with low follower counts as a proxy for determining the proportion of the accounts that were likely to be "followers for hire." More sophisticated bot networks can use algorithms to follow each other in an attempt to mimic the indegree distribution of authentic users. But since our method tests for how distributions differ, it can detect any notable deviation from the expected distribution, not merely the over-presence of accounts with small indegree we would expect from unsophisticated bots.
Our test is based on the underlying assumption that the followers of Twitter accounts tend to display some kind of general indegree distribution. Exploratory analysis revealed that this distribution varies depending on the size of the original account. We are able to detect probable bot involvement because this distribution would be quite difficult to mimic in a bot network, so the presence of many bot followers skews this distribution.
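One common way to measure how far two distributions differ is the two-sample Kolmogorov-Smirnov distance: the largest gap between the two samples' cumulative distributions. The sketch below is an illustration of that general idea, not necessarily the test used in this analysis, and the follower counts in it are made up.

```python
from bisect import bisect_right

def ks_distance(sample_a, sample_b):
    """Largest gap between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # The gap can only change at observed values, so check each one.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)   # share of sample_a that is <= x
        cdf_b = bisect_right(b, x) / len(b)   # share of sample_b that is <= x
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap

# Made-up follower counts, for illustration only (not the article's data).
spike_followers = [0, 0, 1, 1, 2, 3, 5, 8, 40, 300]
typical_followers = [1, 4, 9, 15, 27, 33, 60, 120, 450, 2000]
print(ks_distance(spike_followers, typical_followers))
```

A distance near 0 means the two sets of followers look alike; a distance near 1 means they barely overlap. Because the measure compares whole distributions, it would also flag a sample with too few low-indegree accounts, not only one with too many.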
Using TwitterCounter's lists of most followed users, we selected, for each of Romney and Obama, the twenty accounts closest in size, with approximately the same number of followers; ten had slightly fewer followers, ten slightly more. For Romney, the twenty accounts were selected from those listed within the Eastern Standard Time zone. No such restriction was possible for Obama, since his is the 6th most followed account globally, with nearly 18 million followers.
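The selection rule can be sketched in a few lines. This assumes the ranking is already available as a list of (handle, follower count) pairs with the target account removed; the function name and the handles are hypothetical, and no TwitterCounter API is implied.

```python
def pick_comparison_accounts(accounts, target_followers, per_side=10):
    """Return the `per_side` accounts just below and just above a target size.

    accounts: list of (handle, follower_count) pairs, with the target
    account itself already excluded.
    """
    # Accounts smaller than the target, largest first, so the closest come first.
    smaller = sorted((a for a in accounts if a[1] < target_followers),
                     key=lambda a: a[1], reverse=True)[:per_side]
    # Accounts larger than the target, smallest first.
    larger = sorted((a for a in accounts if a[1] > target_followers),
                    key=lambda a: a[1])[:per_side]
    return smaller + larger

# Hypothetical ranking: ten accounts with 100, 200, ..., 1000 followers.
ranking = [(f"user_{i}", i * 100) for i in range(1, 11)]
print(pick_comparison_accounts(ranking, target_followers=550, per_side=2))
```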
We looked at the indegree distribution of the 150,000 most recent followers from each account in our sample, to see if Romney's dramatic follower spike was truly as suspect as it seemed. This allows us to see the proportion of followers that have very small indegree, and are likely to be bots. On Twitter, degree distributions for networks of a particular size tend to follow a fairly consistent pattern, although this distribution differs notably between very large networks, like Barack Obama's, and medium-size networks, like Mitt Romney's. By comparing the presidential candidates' distributions to the distributions of our set of other accounts of roughly the same size, we were able to see if either Romney's or Obama's new followers differed significantly from the typical distribution.
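The two summary figures we report for each account, the share of new followers with very few followers of their own and the median follower count, can be computed along these lines. This is a sketch with made-up numbers; the function name and the default threshold are illustrative assumptions.

```python
from statistics import median

def low_indegree_summary(follower_counts, threshold=2):
    """Share of accounts with fewer than `threshold` followers, plus the median."""
    low = sum(1 for count in follower_counts if count < threshold)
    return low / len(follower_counts), median(follower_counts)

# Made-up follower counts for ten new followers (not the article's data).
sample = [0, 0, 1, 3, 5, 5, 12, 40, 90, 700]
share, middle = low_indegree_summary(sample)
print(f"{share:.1%} have fewer than 2 followers; median is {middle}")
```

The median is used rather than the mean because a handful of very highly followed accounts, like the 700 in the toy sample, would otherwise dominate the figure.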
According to a random sample of 1000 followers from the candidates' accounts, 26.9% of Romney's 150,000 newest followers had fewer than 2 followers. For other accounts of similar size, only 9.6% of new followers had fewer than 2 followers themselves. The median number of followers for Romney's new followers was 5, whereas the median for the comparison group was 27. This represents a stark and statistically significant difference. If you are a statistics nerd, like us, you might want to know that the p-value on this was 0.0000 to four decimal places. For the rest of the world, this means that if the underlying characteristics of Romney's followers were actually the same as those of the comparison users, there would be, essentially, a zero percent chance of seeing a difference this large.
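For readers who want to see how a gap like 26.9% versus 9.6% turns into a p-value, here is a sketch using a standard two-proportion z-test. This is one conventional test, shown for illustration; it is not necessarily the test behind the figure above. The two percentages come from the text, but the comparison sample size of 1000 is an assumption made only so the example can run.

```python
import math

def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for a difference between two sample proportions."""
    # Pooled proportion under the hypothesis that both groups are the same.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

# 26.9% and 9.6% are from the article; n = 1000 for the comparison
# group is an ASSUMPTION for illustration.
z, p_value = two_proportion_z_test(0.269, 1000, 0.096, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Under these assumed sample sizes the difference is roughly ten standard errors wide, which is why the p-value is indistinguishable from zero at four decimal places.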
The graph below shows the indegree distribution of Mitt Romney's new followers compared to the control accounts. The large spike on the left-hand side of Romney's graph indicates that a large number of these new followers have very few followers, while the relative paucity on the right-hand side signals that few of these new followers are highly followed themselves. In fact, 63.7% of the accounts following Romney from the sample had 10 or fewer followers. The control groups, on the other hand, show many more followers who are, themselves, reasonably well followed.
A certain number of followers with a small indegree is to be expected; there is a set of real users without many followers (e.g. new users, or users who only use Twitter to follow people but don't tweet). However, such a high concentration of followers with small indegree relative to the comparison accounts is highly unlikely, absent funny business. There is no strong reason to believe that the legitimate users with small indegree would follow Romney with any more frequency than other accounts of the same size. In short, the degree distribution of Romney's new followers is strongly indicative of a concentration of bot or bot-like followers.
We found a notably different story when we analyzed Barack Obama's most recent followers. In fact, new followers of Obama tended to have more followers than those of comparison accounts. The median number of followers for Obama's cohort was 7, while the median number in the comparison accounts was 6. Additionally, we found no statistically significant difference between the distribution of followers among those who had recently followed Obama and those who had recently followed other accounts of the same size.
The above graph shows that, if anything, the new followers of Barack Obama tend to have slightly more followers, as the peak on the left hand side is higher for the comparison accounts and the tail slightly thicker on the right for Obama.
Regardless of the specific shapes of Obama's and Romney's distributions, the extent of deviation from the comparison accounts that Romney's followers show is strongly indicative of bot involvement. We found no such deviation among Barack Obama's followers.
According to a Newt Gingrich staffer who spoke with Gawker during the primaries, buying followers is not unheard of in political campaigns:
Newt employs a variety of agencies whose sole purpose is to procure Twitter followers for people who are shallow/insecure/unpopular enough to pay for them. As you might guess, Newt is most decidedly one of the people to which these agencies cater.
It is not clear if Romney -- or more likely one of his staffers or a consultant -- followed Newt's lead here or if last week's spike was, as some have speculated, planted to embarrass the candidate. The Romney camp has denied buying the followers. Based on the results above, the one thing that we can be fairly sure of, however, is that someone did.