Welcome to the Internet of Thingies: 61.5% of Web Traffic Is Not Human

And here's how to build your own little traffic bot, even though you shouldn't

It happened last year for the first time: bot traffic eclipsed human traffic, according to the bot-trackers at Incapsula.

This year, Incapsula says 61.5 percent of traffic on the web is non-human. 

Now, you might think this portends the arrival of "The Internet of Things"—that ever-promised network that will connect your fridge and car to your smartphone. But it does not.

This non-human traffic is search bots, scrapers, hacking tools, and other human impersonators, little pieces of code skittering across the web. You might describe this phenomenon as The Internet of Thingies. 

Because bots are not difficult to build. In fact, building one is so simple that a journalist (who has not learned to code) can do it.

I do it with a ($300) program called UBot Studio, which is an infrastructural piece of the botting world. It lets people like me program and execute simple scripts in browsers without (really) knowing any code.

Do you need 100 Hotmail accounts? I got you.

Perhaps you'd like some set of links autotweeted? I'm there.

You want to scrape a few numbers from a government website or an online store? Easy. It'd take 10 minutes. 

Or — and this is the one that gets to me — perhaps you want to generate an extra 100,000 pageviews for some website? So simple. A programmer friend of mine put it like this, "The basics of sending fake traffic are trivial."

I'm going to tell you how here, even though I think executing such a script is highly unethical, probably fraud, and something you should not do. I'm telling you about it here because people need to understand how jawdroppingly easy it really is.

So, the goal is mimicking humans. Which means that you can't just send 100,000 visits to the same page. That'd be very suspicious. 

So you want to spread the traffic out over a bunch of target pages. But which ones? You don't want pages that no one ever visits. But you also don't want to send traffic to pages that people are paying close attention to, which tend to be the most recent ones. So, you want popular pages but not the most popular or recent pages.

Luckily, Google tends to rank the popular, recentish stories more highly. And included with UBot are two little bots that can work in tandem. The first scrapes Google's search suggestions. So it starts with the most popular A searches (Amazon, Apple, America's Cup), then the most popular B searches, etc. The second scrapes the URLs from Google search results.

So the first step in the script would be to use the most popular search suggestions to find popularish stories on the domain (say, theatlantic.com) and save all those URLs.

The first search would be "amazon site:theatlantic.com." The top 20 URLs, all of which would be Atlantic stories, would get copied into a file. Then the bot would search "apple site:theatlantic.com" and paste another 20 in. And so on and so forth until you've got 1,000. 

Now, all you've got to do is have the bot visit each story, wait for the page to load, and go on to the next URL. Just for good measure, perhaps you'd have the browser "focus" on the ads on the page to increase the site's engagement metrics.

Loop your program 100 times and you're done. And you could do the same thing whenever you wanted to. 

Of course, the bot described here would be very easy to catch, if anyone looked. You'd need to be fancier to evade detection. For example, when a browser connects to a website, it sends a little token, called a user agent, that says, "This is who I am!" And it lists the browser and the operating system, etc. Mine, for example, is, "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36"
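To see how little is in that token, here is a minimal Python sketch that pulls the platform and the Chrome version out of the string quoted above. The function name describe_user_agent and the two regular expressions are illustrative assumptions, not part of UBot or any analytics product, and real-world user-agent parsing is a good deal messier than this.

```python
import re

# The user-agent string quoted in the article.
UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36")


def describe_user_agent(ua: str) -> dict:
    """Extract the first parenthesized platform block and the Chrome version, if present."""
    platform = re.search(r"\(([^)]*)\)", ua)      # first "(...)" group
    chrome = re.search(r"Chrome/([\d.]+)", ua)    # digits and dots after "Chrome/"
    return {
        "platform": platform.group(1) if platform else None,
        "chrome_version": chrome.group(1) if chrome else None,
    }


info = describe_user_agent(UA)
print(info["platform"])        # Macintosh; Intel Mac OS X 10_8_2
print(info["chrome_version"])  # 31.0.1650.63
```

Note that the server only ever sees the text. Nothing in the string proves that the sender really is Chrome on a Mac.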

If we ran the script like this, 100,000 identical user agents would show up in the site's logs, which might be suspicious.

But the user agent-website relationship is trust-based. Any browser can say, "I'm Chrome running on a Mac." And, in fact, there are pieces of software out there that will generate "realistic" user agent messages, which UBot helpfully lets you plug in.

The hardest part would be obscuring the IP addresses of the visits. Because if 100,000 visits came from a single computer, that would be a dead giveaway it was a bot. So, you could rent a botnet, a bunch of computers that have been hacked to do the bidding of (generally) bad people.

Or you could ask some "friends" to help out via a service like JingLing, which lets people use other people on the network to send traffic to webpages from different IP addresses. You scratch my back; I'll scratch yours! 
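From the publisher's side, the two giveaways mentioned above (one user agent repeated over and over, or one IP address behind a pile of visits) are cheap to check for. Here is a minimal Python sketch, assuming each visit has already been parsed out of the server log into a dict with "ip" and "user_agent" keys. The function names, the dict layout, and the 50 percent threshold are all assumptions made for illustration, not anything Incapsula or any analytics tool is known to use.

```python
from collections import Counter


def dominant_share(values):
    """Return (most common value, its share of the total) for a list of log field values."""
    if not values:
        return None, 0.0
    value, count = Counter(values).most_common(1)[0]
    return value, count / len(values)


def looks_automated(visits, threshold=0.5):
    """Flag a batch of visits if a single IP or a single user agent exceeds `threshold` of it."""
    _, ip_share = dominant_share([v["ip"] for v in visits])
    _, ua_share = dominant_share([v["user_agent"] for v in visits])
    return ip_share > threshold or ua_share > threshold


# Hypothetical sample data (IP addresses are from a range reserved for documentation).
same_machine = [{"ip": "203.0.113.7", "user_agent": "UA-1"}] * 10
varied = [{"ip": f"203.0.113.{i}", "user_agent": f"UA-{i}"} for i in range(4)]

print(looks_automated(same_machine))  # True
print(looks_automated(varied))        # False
```

A check this crude would only catch the naive bot. Rotated user agents and borrowed IP addresses are exactly what get past it.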

But, if the botting process is done subtly, no one might think to check what was going on. Because from a publisher's perspective, how much do you really want to know? 

In the example I gave, no page has gotten more than 100 views, but you've added 100,000 views to the site as a whole. It would just seem as if there was more traffic, but it'd all be down at the bottom of the traffic reports where most people have no reason to look.
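The arithmetic behind that example is easy to check in a few lines of Python. The variable names are made up for illustration; the figures are the ones used in the steps above (20 URLs per search, 1,000 stories, 100 loops).

```python
urls_per_search = 20   # top results saved from each "site:" search
target_pages = 1000    # stories collected before stopping
loops = 100            # times the whole list is visited

searches_needed = target_pages // urls_per_search
views_per_page = loops               # each story is visited once per loop
total_views = target_pages * loops   # what shows up in the site-wide total

print(searches_needed)  # 50
print(views_per_page)   # 100
print(total_views)      # 100000
```

So no single story stands out, while the site as a whole is up by 100,000 views.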

And indeed, some reports have come out showing that people don't check. One traffic buyer told Digiday, "We worked with a major supply-side platform partner that was just wink wink, nudge nudge about it. They asked us to explain why almost all of our traffic came from one operating system and the majority had all the same user-agent string."

That is to say, someone involved in the traffic supply chain was no more sophisticated than a journalist with 10 hours of training using a publicly available piece of software. 

The point is: It's so easy to build bots that do various things that they are overrunning the human traffic on the web.

Now, to understand the human web, we have to reckon with the logic of the non-human web. It is, in part, shady traffic that allows ad networks and exchanges to flourish. And these automated ad buying platforms — while they do a lot of good, no doubt about it — also put pressure on other publishers to sell ads more cheaply. When they do that, there's less money for content, and the content quality suffers.

The ease of building bots, in other words, hurts what you read each and every day on the Internet. And it's all happening deep beneath the shiny web we know and (sometimes) love. 

Alexis C. Madrigal

Alexis Madrigal is the deputy editor of TheAtlantic.com, where he also oversees the Technology Channel. He's the author of Powering the Dream: The History and Promise of Green Technology.

The New York Observer has called Madrigal "for all intents and purposes, the perfect modern reporter." He co-founded Longshot magazine, a high-speed media experiment that garnered attention from The New York Times, The Wall Street Journal, and the BBC. While at Wired.com, he built Wired Science into one of the most popular blogs in the world. The site was nominated for best magazine blog by the MPA and best science Web site in the 2009 Webby Awards. He also co-founded Haiti ReWired, a groundbreaking community dedicated to the discussion of technology, infrastructure, and the future of Haiti.

He's spoken at Stanford, CalTech, Berkeley, SXSW, E3, and the National Renewable Energy Laboratory, and his writing was anthologized in Best Technology Writing 2010 (Yale University Press).

Madrigal is a visiting scholar at the University of California at Berkeley's Office for the History of Science and Technology. Born in Mexico City, he grew up in the exurbs north of Portland, Oregon, and now lives in Oakland.
