Whole new categories of weird noise are being introduced into the news world as a result of Google's algorithm, whatever its virtues.
If something comes out of a computer on the basis of statistics, it must be objective, right? No bias is even possible, unlike the judgment of us flawed Homo sapiens!
But... that's not actually true. Over at Nieman Journalism Lab, Nick Diakopoulos has a great story about the ways that various algorithms introduce biases that are different from the human ones, but no less real. Looking at Google News, Circa, IBM Research, and a crop of other automated information tools, here's his thesis:
It can be easy to succumb to the fallacy that, because computer algorithms are systematic, they must somehow be more "objective." But it is in fact such systematic biases that are the most insidious since they often go unnoticed and unquestioned.
Even robots have biases.
His story is well worth reading for its tour of how many algorithms are now at play in the news ecosystem, and of their potential to bend the information people receive in one direction or another.
What I want to discuss, though, is how the rather simple application of a series of rigid rules can introduce new and bad behaviors on the part of human actors who realize that they can exploit the system. Whole new categories of weird noise are being introduced into the news world as a result of Google's algorithm, whatever its virtues.
Because the rules are quite rigid (newer is *always* better, for example), different organizations try to have the newest story about a given popular event. So, in the lead-up to the early December snowstorm here in California, the Weather Channel's website published a great preview of the storm on November 29th or 30th. I read it around the time it came out. *After* the storm, on December 3rd, I went looking to see which of the story's predictions had come true. I popped a few search terms into Google News and, lo and behold, there was a December 3rd story from the Weather Channel. Excitedly, I clicked through the link and found ... the exact same preview, with a timestamp that now read December 3, 2012, 9:08 AM.
Keep in mind that this makes the story completely nonsensical: it is a preview of an event, dated after that event has already passed. It's like a story dated November 7th about who might win the presidential election, or a Christmas preview dated December 29th.
In short, this is lunacy! At least to a human.
But to a machine, this looks like a "fresh" story with lots of keywords about the Shasta snowfall. The machine can't tell that the article is written in the future tense or that it is worse than useless now. This type of thing actively degrades the news ecosystem, and it's only happening because of the way that Google's algorithm works.
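A toy model makes the incentive concrete. What follows is a minimal Python sketch under loudly stated assumptions, not Google's actual ranking, whose details aren't public: a hypothetical `rank` function treats any story containing the query terms as relevant, always prefers the newer timestamp, and uses keyword density only to break ties. The `Story` type, the outlet labels, and the post-storm recap are all invented for illustration.

```python
from dataclasses import dataclass, replace
from datetime import datetime


@dataclass(frozen=True)
class Story:
    outlet: str
    text: str
    published: datetime


def keyword_density(text: str, query: list[str]) -> float:
    """Share of words in the text that are query terms (a hypothetical signal)."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    terms = {q.lower() for q in query}
    return sum(w in terms for w in words) / len(words)


def rank(stories: list[Story], query: list[str]) -> list[Story]:
    """Toy rule, assumed for illustration: any story mentioning a query term
    qualifies, newer always wins, and keyword density only breaks ties.
    Nothing here can read tense or notice that a preview has gone stale."""
    relevant = [s for s in stories if keyword_density(s.text, query) > 0]
    return sorted(
        relevant,
        key=lambda s: (s.published, keyword_density(s.text, query)),
        reverse=True,
    )


# Hypothetical stories: the preview's dates mirror the anecdote; the recap is invented.
query = ["Shasta", "snow"]
preview = Story("weather site", "Shasta snow will pile up as the storm arrives",
                datetime(2012, 11, 29))
recap = Story("local paper", "Shasta snow totals after the storm",
              datetime(2012, 12, 2))
# Same text, new timestamp: the date is the only thing that changes.
restamped = replace(preview, published=datetime(2012, 12, 3, 9, 8))

print(rank([preview, recap], query)[0].outlet)    # local paper
print(rank([restamped, recap], query)[0].outlet)  # weather site
```

Nothing about the preview's text changed, yet re-stamping it pushes it above the genuine post-storm story. A real system weighs many more signals; the sketch only shows why a rigid recency rule rewards bumping the date.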
Granted, this is the lawless variety of optimizing for Google News. But many other techniques have developed solely because of the way the algorithm works. If you really want to peer down the rabbit hole, look at the depth of the analysis in this series of posts on the "Top 10 Most Important Google News Ranking Factors." The series was assembled by a team of people at some top publications, agencies, and SEO shops.
Keep in mind that some of these optimizations benefit human readers, too. Punny headlines are slowly dying, and I'm OK with that. But other factors that Google looks for, like keyword density, reward people who write the way everyone else does, using the same words and using them frequently. Google also rewards specialists over generalists. If you (as author or site) publish a ton on one thing, you're more likely to move up the rankings than if you take a more horizontal view of a field (say, technology).
And lastly, take a look at the Google News front page now. It's almost exclusively traditional media outlets. It's shocking how little I, at least, see from media entities created after 2004. There's a striking apolitical conservatism to however Google's algorithm works.