Hey, Science: Will This Post Get Shared on Twitter?

Researchers have developed a tool to predict the spread of news-related tweets.


You can use Twitter to predict swings in the stock market. You can use it to predict movies' success at the box office. You can use it to predict the spread of illness.

Turns out, you can also use Twitter to predict news articles' popularity ... on Twitter.

Over at HP Labs, the researchers Bernardo Huberman, Sitaram Asur, and Roja Bandari, using the API of the aggregator Feedzilla, collected a sample of over 40,000 articles posted to Twitter over a week-long span last August. The team then analyzed and rated the articles they'd gathered according to four factors: the news outlet that writes and first tweets the article; the information category that the article fits into; the relative emotion of the article's language; and the people and things named in the article.

What they found both confirms and flouts conventional wisdom. On the one hand, sourcing, per their analysis, is the most significant predictor of the number of tweets that an article will encourage. Similarly, stories that belong to popular topic categories (health! technology! cats!) tend to spread more readily on Twitter than stories that don't. As do stories that mention celebrities and, as the paper puts it, "a known place, person, or organization."

Not too surprising. More unexpected, though, is the researchers' finding that the emotional component of articles doesn't seem to make much difference in how, or at least in how often, they're shared. Emotional content and more "objective" content, the team discovered, seem to effect about the same amount of distribution on Twitter. Brand matters; information matters; tone, however, not so much.

Once they'd analyzed their data, the team converted their findings into an algorithm that predicts the number of tweets a given article will receive when it's posted to the social web. The model, they say -- which divides its outcomes according to a "low-tweet," "medium-tweet," or "high-tweet" classification framework -- works with an 84 percent accuracy rate.
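
For the curious, here is a rough sketch of what a three-bucket framework like that could look like in Python. To be clear, this is not the HP team's code: the cut-off values `LOW_MAX` and `MEDIUM_MAX` and the helper names `tweet_class` and `accuracy` are made up for illustration, since the class boundaries aren't given here. It only shows how tweet counts might be sorted into "low-tweet," "medium-tweet," and "high-tweet" buckets, and how an accuracy rate is computed by comparing predicted buckets against actual ones.

```python
from typing import Sequence

# Hypothetical cut-offs, chosen only for illustration.
# The real class boundaries used by the researchers are not stated in this article.
LOW_MAX = 20
MEDIUM_MAX = 100


def tweet_class(tweet_count: int) -> str:
    """Map a raw tweet count onto one of the three class labels."""
    if tweet_count <= LOW_MAX:
        return "low-tweet"
    if tweet_count <= MEDIUM_MAX:
        return "medium-tweet"
    return "high-tweet"


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of articles whose predicted class matches the actual class."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)
```

An accuracy rate of 84 percent, in these terms, would mean `accuracy(...)` returns 0.84: the predicted bucket matches the real one for 84 of every 100 articles.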

Even beyond the tantalizing possibility of accurate tweetcasting, though, the assumption Huberman and his colleagues are testing -- that the qualities of the content passed through social networks can determine, for themselves, how well that content will spread through those networks -- is an important one. We tend to focus on the structure of networks as the primary factor in how content spreads within them: If I send a tweet containing nothing but a happyface emoticon (as one does, etc.), and if Ashton Kutcher proceeds to retweet it (as he does, etc.), it would stand to reason that my little happyface emoticon might spread pretty far across the Twitterverse.

But that would be a little sad. Because my little happyface emoticon would be effectively content-free, and (with apologies to any fakefaceophiles out there) pretty uninteresting. It's nice to think that the content that's effectively the inverse-emoticon -- content that contains rich information from a trusted source -- will spread not because of gimmicks, but because it's sharable on its own terms.

And it's intriguing to consider how an algorithmic appreciation of content's implicit virality might change the way that content providers approach Twitter. As Huberman told me, "Basically, you would write a story and apply this formula, and it would tell you what to tweak" in order to get even more shares. Which could be taken too far, in the same way that SEO can be taken too far ... but which could also offer valuable data for news sites that have an interest in getting their content widely circulated.

The research is part of a broader effort at HP, Huberman says, "a very large research agenda" -- one that dives into questions of "how attention is allocated with anything in the web." It would, in a subsequent study, be fascinating to see HP's findings fleshed out in more detail. Can you break the content categories down even further, for example, or in a different way? Can relatively ineffable things like articles' humor, or smartness, or even the art that illustrates them, feed into an algorithm that might forecast their tweets' virality? Can all the little factors that transform "content" into "stories" predict the life those stories will take in the social web?

Image: HP.