Hey, Science: Will This Post Get Shared on Twitter?


Researchers have developed a tool to predict the spread of news-related tweets.


You can use Twitter to predict swings in the stock market. You can use it to predict movies' success at the box office. You can use it to predict the spread of illness.

Turns out, you can also use Twitter to predict news articles' popularity ... on Twitter.

Over at HP Labs, the researchers Bernardo Huberman, Sitaram Asur, and Roja Bandari, using the API of the aggregator Feedzilla, collected a sample of over 40,000 articles posted to Twitter over a week-long span last August. The team then analyzed and rated the articles they'd gathered according to four factors: the news outlet that writes and first tweets the article; the information category that the article fits into; the relative emotion of the article's language; and the people and things named in the article.
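To make those four factors concrete, here is a minimal sketch of how a single article might be represented as a row of numbers. The class name, the field names, and the 0-to-1 scales are all hypothetical choices for illustration; the article does not describe how the HP team actually encoded its data.

```python
from dataclasses import dataclass


@dataclass
class ArticleFeatures:
    """One article, scored on the four factors named above.

    All fields are hypothetical 0-to-1 scores invented for this sketch.
    """
    source_score: float    # how widely the outlet's past articles were tweeted
    category_score: float  # how popular the article's topic category is
    subjectivity: float    # how emotional (vs. "objective") the language is
    entity_score: float    # how prominent the named people and things are

    def as_vector(self):
        # Fixed field order so every article lines up column by column.
        return [
            self.source_score,
            self.category_score,
            self.subjectivity,
            self.entity_score,
        ]


example = ArticleFeatures(
    source_score=0.8,
    category_score=0.6,
    subjectivity=0.3,
    entity_score=0.7,
)
print(example.as_vector())
```

A table of such rows, one per article, is the kind of input a prediction model could be trained on.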

What they found both confirms and flouts conventional wisdom. On the one hand, sourcing, per their analysis, is the most significant predictor of the number of tweets that an article will encourage. Similarly, stories that belong to popular topic categories (health! technology! cats!) tend to spread more readily on Twitter than stories that don't. As do stories that mention celebrities and, as the paper puts it, "a known place, person, or organization."

Not too surprising. More unexpected, though, is the researchers' finding that the emotional component of articles doesn't seem to make much difference in how, or at least in how often, they're shared. Emotional content and more "objective" content, the team discovered, seem to get about the same amount of distribution on Twitter. Brand matters; information matters; tone, however, not so much.

Once they'd analyzed their data, the team converted their findings into an algorithm that predicts the number of tweets a given article will receive when it's posted to the social web. The model, they say -- which sorts its predictions into "low-tweet," "medium-tweet," and "high-tweet" classes -- works with an 84 percent accuracy rate.
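For readers curious what a three-class framework and an accuracy rate look like in practice, here is a rough sketch. The cutoffs (20 and 100 tweets) and the function names are assumptions made up for this example; the article does not give the thresholds the researchers used, and this is not their model.

```python
# Hypothetical cutoffs between the three classes (assumed for illustration).
LOW_MAX = 20      # up to 20 tweets counts as "low"
MEDIUM_MAX = 100  # 21 to 100 tweets counts as "medium"


def tweet_class(tweet_count):
    """Map a raw tweet count onto a low/medium/high label."""
    if tweet_count <= LOW_MAX:
        return "low"
    if tweet_count <= MEDIUM_MAX:
        return "medium"
    return "high"


def accuracy(predicted, actual):
    """Fraction of articles whose predicted class matches the real one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    if not actual:
        raise ValueError("need at least one article to score")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return matches / len(actual)


# Made-up counts: what a model guessed vs. what the articles really got.
predicted_counts = [5, 500, 50, 12]
actual_counts = [8, 350, 15, 3]

predicted_labels = [tweet_class(c) for c in predicted_counts]
actual_labels = [tweet_class(c) for c in actual_counts]
print(accuracy(predicted_labels, actual_labels))
```

Scoring by class rather than by exact count is a forgiving standard: a prediction of 5 tweets for an article that got 8 still counts as a hit, because both fall in the same class.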

Even beyond the tantalizing possibility of accurate tweetcasting, though, the assumption Huberman and his colleagues are testing -- that the qualities of the content passed through social networks can determine, for themselves, how well that content will spread through those networks -- is an important one. We tend to focus on the structure of networks as the primary factor in how content spreads within them: If I send a tweet containing nothing but a happyface emoticon (as one does, etc.), and if Ashton Kutcher proceeds to retweet it (as he does, etc.), it would stand to reason that my little happyface emoticon might spread pretty far across the Twitterverse. 

But that would be a little sad. Because my little happyface emoticon would be effectively content-free, and (with apologies to any fakefaceophiles out there) pretty uninteresting. It's nice to think that the content that's effectively the inverse-emoticon -- content that contains rich information from a trusted source -- will spread not because of gimmicks, but because it's sharable on its own terms. 

And it's intriguing to consider how an algorithmic appreciation of content's implicit virality might change the way that content providers approach Twitter. As Huberman told me, "Basically, you would write a story and apply this formula, and it would tell you what to tweak" in order to get even more shares. Which could be taken too far, in the same way that SEO can be taken too far ... but which could also offer valuable data for news sites that have an interest in getting their content widely circulated.

The research is part of a broader effort at HP, Huberman says, "a very large research agenda" -- one that dives into questions of "how attention is allocated with anything in the web." It would, in a subsequent study, be fascinating to see HP's findings fleshed out in more detail. Can you break the content categories down even further, for example, or in a different way? Can relatively ineffable things like articles' humor, or smartness, or even the art that illustrates them, feed into an algorithm that might forecast their tweets' virality? Can all the little factors that transform "content" into "stories" predict the life those stories will take in the social web? 


Image: HP.


Megan Garber is a staff writer at The Atlantic. She was formerly an assistant editor at the Nieman Journalism Lab, where she wrote about innovations in the media.
