Whole new categories of weird noise are being introduced into the news world as a result of Google's algorithm, whatever its virtues.
If something comes out of a computer on the basis of statistics, it must be objective, right? No bias is even possible, unlike the judgment of us flawed Homo sapiens!
But... that's not actually true. Over at Nieman Journalism Lab, Nick Diakopoulos has a great story about the ways that various algorithms introduce biases that are different from the human ones, but no less real. Looking at Google News, Circa, IBM Research, and a crop of other automated information tools, here's his thesis:
It can be easy to succumb to the fallacy that, because computer algorithms are systematic, they must somehow be more "objective." But it is in fact such systematic biases that are the most insidious since they often go unnoticed and unquestioned.
Even robots have biases.
His story is well worth reading for the ways in which it shows how many algorithms are now at play in the news ecosystem and the potential they have for bending the information people receive in one way or another.
What I want to discuss, though, is how the rather simple application of a series of rigid rules can introduce new and bad behaviors on the part of human actors who realize that they can exploit the system. Whole new categories of weird noise are being introduced into the news world as a result of Google's algorithm, whatever its virtues.
Because the rules are quite rigid, e.g. newer is *always* better, different organizations try to have the newest stories about a given popular event. So, in the lead-up to the early December snowstorm here in California, the Weather Channel's website published a great preview of the storm on November 29th or 30th. I read it around the time it came out. *After* the storm on December 3rd, I went looking to see which of the predictions from the story had come true. I popped a few search terms into Google News and lo and behold, there was a December 3rd story from the Weather Channel. Excitedly, I clicked through the link and found ... the exact same preview with a timestamp that now read December 3, 2012, 9:08 AM.
Keep in mind that this now makes the story completely nonsensical. It is a preview of an event dated after that event has already passed. It's like a story dated November 7th about who might win the presidential election. A Christmas preview on December 29th.
In short, this is lunacy! At least to a human.
But to a machine, this looks like a "fresh" story with lots of keywords about the Shasta snowfall. The machine can't tell that the article is written in the future tense or that it is worse than useless now. This type of thing actively degrades the news ecosystem, and it's only happening because of the way that Google's algorithm works.
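To make the exploit concrete, here is a toy sketch in Python. It is not Google's ranking code, which is not public; the `toy_score` function, its age discount, and the sample articles are all assumptions invented for illustration. It only shows how a rule of "more keyword matches and a newer timestamp wins" rewards a preview that has simply been re-stamped.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Article:
    title: str
    body: str
    timestamp: datetime


def toy_score(article, query_terms, now):
    """Hypothetical score: keyword matches, discounted by age in days."""
    words = article.body.lower().split()
    matches = sum(words.count(term) for term in query_terms)
    age_days = max((now - article.timestamp).total_seconds() / 86400, 0.0)
    return matches / (1.0 + age_days)


# Made-up data: the same preview text, once with its original
# timestamp and once re-stamped after the storm.
now = datetime(2012, 12, 3, 12, 0)
body = "shasta snowfall will be heavy and the storm will bring more snowfall"
original = Article("Storm preview", body, datetime(2012, 11, 30, 9, 0))
restamped = Article("Storm preview", body, datetime(2012, 12, 3, 9, 8))

terms = ["shasta", "snowfall", "storm"]
ranked = sorted(
    [original, restamped],
    key=lambda a: toy_score(a, terms, now),
    reverse=True,
)
print(ranked[0].timestamp)  # the re-stamped copy sorts first
```

Nothing in this score looks at verb tense or at whether the event has already happened, which is why identical text with a newer timestamp comes out on top.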
Granted, this is the lawless variety of optimizing for Google News. But there are a lot of examples and techniques that have developed solely because of the way the algorithm works. If you really want to peer down the rabbit hole, take a look at the depth of the analysis in this series of posts on the "Top 10 Most Important Google News Ranking Factors." It was assembled by a team of people at some top publications, agencies, and SEO shops. Keep in mind that some of these optimizations benefit human beings. Punny headlines are slowly dying, and I'm OK with that. But other factors that Google is looking for -- like keyword density -- reward people who write the way that everyone else does, using the same words and using them frequently. Google also rewards specialists over generalists. If you (as author or site) publish a ton on one thing, you're more likely to move up the rankings than if you take a more horizontal view of a field (say, technology). And lastly, take a look at the Google News front page now. It's almost exclusively traditional media outlets. It's actually shocking how little I, at least, see from media entities created after 2004. There's a shocking apolitical conservatism to however Google's algorithm works.
My point in discussing these details at such length is to strengthen Diakopoulos' point about the lack of "objectivity" in algorithmic operations. Even if one could design some perfectly balanced system that had no observable bias when it began to run, the people who are producing the inputs to the system and dependent on the outputs will begin to adjust their behavior. They'll change to make themselves more legible to the machine, and those that do so best will prosper.
Simply look at what happens with wire stories, say today's on Alan Alda. The local news sites have no disincentive to publish this work, regardless of whether their audience wants to read it. Some TV station in Charlotte might run a few dozen AP stories every few hours not because they improve the news system but because there is almost no cost to doing so, and it might get picked up by Google News and drive more traffic than a week's worth of regular content. Combined, all those individual decisions end up sending a signal to Google that a story is *really important* when there is no real signal; all we have is the aggregate hopes of news publishers for a low-probability traffic spike.
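The aggregate effect can be sketched the same way. The snippet below is a made-up model, not a description of how Google News actually clusters or weights stories: `toy_importance`, the outlet names, and the story labels are hypothetical. It treats "number of distinct outlets carrying a story" as the importance signal and shows how cheap wire republication inflates it.

```python
from collections import Counter


def toy_importance(published):
    """Hypothetical signal: count of distinct outlets carrying each story."""
    outlets_per_story = {}
    for outlet, story in published:
        outlets_per_story.setdefault(story, set()).add(outlet)
    return Counter(
        {story: len(outlets) for story, outlets in outlets_per_story.items()}
    )


# Made-up data: three stations rerun the same wire story at almost no
# cost, while only one of them publishes original local reporting.
published = [
    ("station_a", "wire: alan alda"),
    ("station_b", "wire: alan alda"),
    ("station_c", "wire: alan alda"),
    ("station_a", "local: school board vote"),
]

signal = toy_importance(published)
top_story, top_count = signal.most_common(1)[0]
print(top_story, top_count)
```

In this toy setup the wire story "wins" only because copying it is free, not because any editor judged it important.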
I don't think Google News has ever taken enough responsibility for the cybernetics of the system it created. What is important is not just how the software works, but the ways it structures humans' thoughts and actions in new ways. To my eye, it created feedback loops with mostly deleterious effects not because the algorithm itself was bad, but because the service did not take the human repercussions seriously. That's not what Krishna Bharat, the product's creator, set out to do. But it's happened.