Researchers have found a way to predict a news story's popularity -- with an astounding 84 percent accuracy.
Here, per one algorithm, could be the Platonic version of the news tweet:
Bits Blog: Apple Buddies Up With Cheaper Wireless Partners for iPhone nyti.ms/LcLviE -- The New York Times (@nytimes) June 8, 2012
If that seems a little dull for Twitter Perfection ... well, that's the point. Steadiness -- compelling news expressed in straightforward, not hyperbolic, language -- is actually a component of maximally shareable content, the algorithm suggests. And this particular tweet is also sent from a credible source, The New York Times, which makes it extra-spreadable. It's about technology, the most popular, shareable category of news story. It's engaging without being insistent. And it stars a company -- Apple -- with high name recognition.
The algorithm comes courtesy of a fascinating paper [pdf] from UCLA and Hewlett-Packard's HP Labs. The researchers Roja Bandari, Sitaram Asur, and Bernardo Huberman teamed up to try to predict the popularity -- which is to say, the spreadability -- of news articles in the social space. While previous work has relied on articles' early performance to predict their popularity over their remaining lifespan, Bandari et al. focused on predicting their popularity even before they're formulated in the first place. The researchers have developed a tool that allows people -- and, in particular, news organizations -- to calibrate their content in advance of their posting and tweeting, creating stuff that's optimized for maximum attention and impact. That tool allows for the forecasting of an article's popularity with a remarkable 84 percent accuracy -- and it has implications not just for articles, but for tweets themselves.
To develop their algorithm, the researchers hypothesized that four factors would determine an article's social success:
- The news source that creates and publishes the article
- The category of news the article belongs to (technology, health, sports)
- Whether the language in the article is emotional or objective
- Whether celebrities, famous brands, or other notable institutions are mentioned
The team then used publicly available tools like Feedzilla's API to gather a dataset of over 40,000 news articles, collected during a nine-day span in August 2011. They used Feedzilla's topic metadata to assign a category to each article (distinguishing among, say, tech stories, business stories, sports stories, and the like). And they used Stanford's Named Entity Recognizer to identify text representing a famous person or company name -- Lady Gaga, say -- and to measure the prominence of that name relative to others. What resulted was a score for each of those 40,000 articles based on the team's four factors.
The team then compared the number of retweets and shares each news article garnered over time, using the Twitter search engine Topsy. Their key metric was what they termed t-density, or the number of tweets earned by each news link.
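To make the metric concrete, here is a minimal sketch of how a tweets-per-link figure could be averaged within each news category. It is an illustration under stated assumptions, not the researchers' code: the function name `t_density_by_category`, the `(category, tweet_count)` input format, and the sample numbers are all hypothetical.

```python
from collections import defaultdict


def t_density_by_category(articles):
    """Return the average number of tweets per link for each category.

    `articles` is an iterable of (category, tweet_count) pairs, one pair
    per news link. This input shape is an assumption for illustration.
    """
    tweet_totals = defaultdict(int)
    link_counts = defaultdict(int)
    for category, tweet_count in articles:
        tweet_totals[category] += tweet_count
        link_counts[category] += 1
    return {c: tweet_totals[c] / link_counts[c] for c in tweet_totals}


# Made-up numbers for illustration only; these are not data from the study.
sample = [
    ("Technology", 120),
    ("Technology", 80),
    ("Health", 30),
    ("Health", 50),
    ("Sports", 10),
]

densities = t_density_by_category(sample)

# List categories from most to least tweeted per link.
for category, density in sorted(densities.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category}: {density:.1f} tweets per link")
```

Averaging per link, rather than summing tweets, keeps a category with many articles from looking more shareable simply because it publishes more.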
Here are their findings, broken down by news category:
As the graph makes clear, the category of news involved in the article certainly made a difference to a tweeted story's popularity: Technology was the most tweetable news area, followed by Health and the ever-shareable Fun Stuff. Also impactful was the name recognition of the people and brands mentioned in the text. You can know with some certainty that a story about Lady Gaga will do well, and you can know with even more certainty that a tech story about Lady Gaga will do well. But what led most overwhelmingly, and most predictably, to sharing was the person or organization that shared the information in the first place -- hence, the @NYTimes origin of the tweet above. A "WHOA, GUYS, HERE'S HUGE NEWS ABOUT LADY GAGA" sent from The New York Times means a lot more than the same declaration from me, or even from @LadyGaga herself. Brand, even and especially on the Internet, matters.