It is virtually impossible that the former governor's recent spike in Twitter followers resulted from normal activity on the social network.
Last week, Zach Green of 140Elect noticed some strange goings-on with Mitt Romney's Twitter account (@MittRomney). Romney's account, which had been averaging around 2,000 to 5,000 new followers a day, gained 141,000 followers in two days.
This observation prompted speculation - from Green, Slate, The Huffington Post, CNN, and many others - that the Romney campaign was buying robot followers, or perhaps (conspiratorially) that someone else was buying them to make Romney look bad.
But actual analysis of these new followers has been limited to manual observation; many do, indeed, look fake. However, high-profile users can be targets for the algorithms that run bot accounts, and some number of bogus followers is to be expected. We decided to dig into the data on these new followers to see whether they differ statistically from the new followers of other accounts similar in size to Romney's. We subjected Barack Obama's account, @BarackObama, to the same analysis.
We developed a simple methodology for testing whether a set of followers is likely to be the product of natural user following behavior or bot networks. This test revealed a significant difference between the distribution of followers among the accounts in Mitt Romney's recent spike and that of similar users in our comparison. It strongly indicates that non-organic processes induced Romney's recent surge in followers. We did not find a similar pattern in Barack Obama's recent followers. The details of these findings are presented below.
First, let's get some technical details out of the way. Social networks like Twitter are composed of "nodes," which are individual users, and "edges," which are the ties between them. The number of edges, or ties, connected to a particular node is referred to as that node's degree. In a Twitter network, edges have a direction; the tie signifies that one user either is following or is followed by another user. The number of edges pointing towards a node - how many others are following that person - is called the indegree.
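To make the definition concrete, here is a minimal sketch of counting indegree from a list of directed ties. It is an illustration only, not the code used in this analysis; the function name `indegrees`, the `(follower, followed)` pair format, and the sample usernames are all assumptions made for the example.

```python
from collections import Counter


def indegrees(edges):
    """Count incoming edges per node.

    Each edge is a (follower, followed) pair, so the edge points
    towards the account being followed.
    """
    counts = Counter()
    for follower, followed in edges:
        counts[followed] += 1
        # Make sure accounts that nobody follows still appear, with 0.
        counts.setdefault(follower, 0)
    return counts


# Hypothetical three-user network.
edges = [("alice", "carol"), ("bob", "carol"), ("carol", "alice")]
result = indegrees(edges)
print(result["carol"], result["alice"], result["bob"])  # 2 1 0
```

Here "bob" follows someone but has no followers of his own, so his indegree is 0, which is the pattern the next paragraph associates with bots.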
Accounts that are likely to be bots will tend to have a small indegree. That's because Twitter bots, in general, don't come even close to passing the Turing test; when real people look at them, it is obvious that they are bots. As a result, these fake accounts tend to have zero, or at least very few, followers. Based on this, we looked at the proportion of new followers with low follower counts as a proxy for the proportion of accounts that were likely to be "followers for hire." More sophisticated bot networks can use algorithms to follow each other in an attempt to mimic the indegree distribution of authentic users. But since our method tests for how distributions differ, it can detect any notable deviation from the expected distribution, not merely the over-presence of accounts with small indegree that we would expect from unsophisticated bots.
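The low-follower proxy can be sketched in a few lines. This is a hedged illustration rather than the analysis code: the function name `low_indegree_share` and the cutoff of 5 followers are assumptions chosen for the example, and the follower counts are made up.

```python
def low_indegree_share(follower_counts, threshold=5):
    """Fraction of accounts that have at most `threshold` followers.

    `threshold` is a hypothetical cutoff for "low"; the article does
    not state the exact value used.
    """
    if not follower_counts:
        raise ValueError("need at least one account")
    low = sum(1 for count in follower_counts if count <= threshold)
    return low / len(follower_counts)


# Made-up follower counts for eight new followers of some account.
sample = [0, 0, 1, 3, 40, 250, 1200, 2]
print(low_indegree_share(sample))  # 0.625
```

A share this high would only be suspicious relative to what comparable accounts show, which is why the comparison group matters.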
The median number of followers among Romney's new followers was 5, whereas the median for the comparison group was 27. This is a stark and statistically significant difference: the reported p-value was 0.0000 to four decimal places (that is, p < 0.0001).
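One way to check whether a gap between two medians could plausibly arise by chance is a permutation test. The sketch below is an assumption about how such a check might look, not the specific test behind the figure above; `median_permutation_test`, its parameters, and the sample data are hypothetical.

```python
import random
from statistics import median


def median_permutation_test(sample_a, sample_b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value for the difference in medians.

    The group labels are shuffled repeatedly; the p-value is the share
    of shuffles whose median gap is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(median(sample_a) - median(sample_b))
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        gap = abs(median(pooled[:n_a]) - median(pooled[n_a:]))
        if gap >= observed:
            extreme += 1
    # Add-one correction: the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up follower counts: a low-indegree group and a typical group.
suspect = list(range(0, 20))
typical = list(range(100, 120))
p_value = median_permutation_test(suspect, typical)
print(p_value < 0.01)  # True
```

Because of the add-one correction, a result from this sketch can round to 0.0000 but is never literally zero, which is the right way to read a p-value printed that way.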
Our test is based on the underlying assumption that the followers of Twitter accounts tend to display some kind of general indegree distribution. Exploratory analysis revealed that this distribution varies depending on the size of the original account. We are able to detect probable bot involvement because this distribution would be quite difficult to mimic in a bot network, so the presence of many bot followers skews it.
Using TwitterCounter's lists of most followed users, we selected, for each of Romney and Obama, the twenty accounts closest in size, with approximately the same number of followers: ten had slightly fewer followers, ten slightly more. For Romney, the twenty accounts were selected from those listed within the Eastern Standard Time zone. No such restriction was possible for Obama, since his is the 6th most followed account globally, with nearly 18 million followers.
We looked at the indegree distribution of the 150,000 most recent followers of each account in our sample, to see if Romney's dramatic follower spike was truly as suspect as it seemed. This allowed us to see the proportion of followers that have very small indegree and are therefore likely to be bots. On Twitter, degree distributions for networks of a particular size tend to follow a fairly consistent pattern, although this distribution differs notably between very large networks, like Barack Obama's, and medium-size networks, like Mitt Romney's. By comparing the presidential candidates' distributions to the distributions of our set of other accounts of roughly the same size, we were able to see if either Romney's or Obama's new followers differed significantly from the typical distribution.