The Big Data Boom Is the Innovation Story of Our Time

For centuries, breakthroughs in innovation have relied on breakthroughs in collecting and measuring data. That should make us optimistic.

The data revolution has turned customers into unwitting business consultants, as our purchases and searches are tracked to improve everything from websites to delivery routes.



In the 1670s, in Delft, Netherlands, a scientist named Anton van Leeuwenhoek did something many scientists had done for 100 years before him. He built a microscope.

This microscope was different, but it was not extraordinary. Like so many inventors, he borrowed from his predecessors' ingenuity and tweaked it. But when he looked through this microscope, he found things that did seem extraordinary. He called them "animalcules," microbes in water droplets and human blood that ultimately provided the foundation for the germ theory of disease and eventually inspired a host of medicines and treatments.

Leeuwenhoek's discovery is crucial to our understanding of innovation, not only because it launched the field of microbiology, but also because it represents a fundamental theme of discovery.

Breakthroughs in innovation often rely on breakthroughs in measurement.


Today businesses can measure their activities and customer relationships with unprecedented precision. As a result, they are awash with data. This is particularly evident in the digital economy, where clickstream data give precisely targeted and real-time insights into consumer behavior.

In turn, customers are acting as unwitting business consultants for these companies. Our purchases, searches, and online activities are being tracked to improve everything from websites to delivery routes and drug manufacturing.

Anyone with access to a Web browser can get summaries of billions of keyword searches, and this information is highly predictive of present and future economic activity, such as housing purchases and prices. Mobile phones, automobiles, factory automation systems and other devices are routinely instrumented to generate streams of data on their activities, making possible an emerging field of "reality mining" to analyze this information. Manufacturers and retailers use radio-frequency identification (RFID) tags to deliver terabits of data on inventories and supplier interactions and then feed this information into analytical models to optimize and reinvent their business processes.

Much of this information is generated for free, by computers, and sits unused, at least initially. A few years after installing a large enterprise resource planning (ERP) system, companies commonly purchase a "business intelligence" module to try to make use of the flood of data they now have on their operations. As Ron Kohavi at Microsoft memorably put it, objective, fine-grained data are replacing HiPPOs (Highest Paid Person's Opinions) as the basis for decision-making at more and more companies. For example:

-- Enologix has used this approach to help Gallo vineyards accurately predict the wine ratings that Robert Parker would give to various new wines

-- UPS has mined data on truck delivery times to develop a new routing method

-- … has even developed new algorithms for matching men and women for dates

For each innovation, analysts drew on new measurement technologies to supplant human experts who relied more on intuition. However, for all its strengths, measurement has a shortcoming: on its own, it cannot determine causality. (A simple example: shoe sizes and reading scores are correlated among schoolchildren, but one does not cause the other; instead, both reflect a third variable, age.) Fortunately, science has a second powerful tool designed precisely to address questions of causality.

That tool is called experimentation.


Science has been dominated by the experimental approach for nearly 400 years. Running controlled experiments is the gold standard for sorting out cause and effect. But experimentation has historically been difficult for businesses because of its cost, slowness and inconvenience. Only recently have businesses learned to run real-time experiments on their customers. The key enabler was the Web.

Consider two "born-digital" companies, Amazon and Google. A central part of Amazon's research strategy is a program of "A-B" experiments where it develops two versions of its website and offers them to matched samples of customers. Using this method, Amazon might test a new recommendation engine for books, a new service feature, a different check-out process, or simply a different layout or design. Amazon sometimes gets sufficient data within just a few hours to see a statistically significant difference.

This ability to rapidly test ideas fundamentally changes the company's mindset and approach to innovation. Rather than agonize for months over a choice, or model hypothetical scenarios, the company simply asks its customers and gets an answer in real time.
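To decide whether the difference between two versions of a page is "statistically significant," one common choice is a two-proportion z-test on conversion rates. The sketch below assumes conversion rate is the metric being compared; it is not a description of Amazon's actual tooling, and the function name and the visitor and conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between
    version A and version B, using the pooled normal approximation."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 20,000 randomly assigned visitors saw each version.
z, p = two_proportion_z_test(conv_a=1000, n_a=20000, conv_b=1100, n_b=20000)
print(f"z = {z:.2f}, p = {p:.3f}, significant at 5%: {p < 0.05}")
```

With these invented numbers (a 5.0% versus 5.5% conversion rate), the difference clears the conventional 5% threshold. Real programs also need to fix sample sizes in advance or correct for repeatedly checking results as data arrive, which this sketch ignores.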

According to Google economist Hal Varian, the company runs on the order of 100 to 200 experiments on any given day as it tests new products and services, new algorithms and alternative designs. An iterative review process aggregates the findings and frequently leads to further rounds of more targeted experimentation.

At the same time, Google's competitors, partners, customers and third-party consultants are running their own experiments, creating a complex, interacting ecosystem that demands continuous innovation. While Google currently dominates the market for web search, it is unlikely that it would have any market share at all if it still relied on the original, unmodified PageRank algorithm that Larry Page and Sergey Brin developed in 1998.


Greg Linden, who led one set of experiments at Amazon, describes the emerging experimentation philosophy succinctly: "To find high impact experiments, you need to try a lot of things. Genius is born from a thousand failures. In each failed test, you learn something that helps you find something that will work. Constant, continuous, ubiquitous experimentation is the most important thing."

These words echo the approach of innovators since Thomas Edison, but IT has made it possible to apply that approach to a much broader class of business challenges and to significantly compress the "hypothesis-to-experiment" cycle time.

While web-based companies have been particularly aggressive in using business experiments to drive innovation, other industries are getting in on the game. Caesars Entertainment (formerly Harrah's), the hotel and casino company, transformed itself from a second-tier casino operator into an industry leader in large part because of the culture of experimentation introduced by CEO Gary Loveman.

When Loveman, an economics PhD from MIT and former Harvard Business School professor, arrived at the company, he found that it was already gathering a great deal of data about customer interactions through its existing information systems and programs such as its Total Rewards loyalty card. However, it wasn't using these data to develop improved processes, products and services. After becoming CEO, he developed strategies to continually test new promotions, price points, services, workflows, employee incentive plans and casino layouts using controlled experiments.

Widespread business experimentation has required a fundamental change in corporate culture. As Loveman puts it, "There are two things that will get you fired here: stealing from the company, or running an experiment without a properly designed control group."
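Loveman's rule about control groups can be illustrated with a simulated promotion. In this hypothetical sketch, every customer has a baseline spend and the promotion adds a fixed, known amount. Assigning the promotion by coin flip recovers roughly that amount; giving it only to customers who already spend heavily produces a wildly inflated estimate, because the comparison group differs in more than just the promotion. All names and figures are invented for illustration and do not describe Caesars' actual systems.

```python
import random
import statistics


def simulate_promotion(assign, n=20000, true_lift=5.0, seed=1):
    """Hypothetical casino data. Each customer has a baseline spend; the promotion
    adds `true_lift` dollars. Returns treated mean minus control mean."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        baseline = rng.gauss(100, 30)      # customer's usual spend
        noise = rng.gauss(0, 10)           # visit-to-visit variation
        if assign(rng, baseline):
            treated.append(baseline + true_lift + noise)
        else:
            control.append(baseline + noise)
    return statistics.mean(treated) - statistics.mean(control)


def randomized(rng, baseline):
    """Proper design: a coin flip that ignores everything about the customer."""
    return rng.random() < 0.5


def cherry_picked(rng, baseline):
    """Flawed design: only customers who already spend heavily get the offer."""
    return baseline > 120


est_random = simulate_promotion(randomized)
est_biased = simulate_promotion(cherry_picked)
print("true lift: 5.0")
print(f"with a randomized control group: {est_random:.1f}")
print(f"without a proper control group:  {est_biased:.1f}")
```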


While passive data gathering can be useful, measurement is far more valuable when coupled with conscious, active experimentation and sharing of insights. Likewise, the value of undertaking the experiments themselves is proportionately greater if the organization can capitalize on those experiments in more locations and at greater scale. In combination, these practices constitute a new kind of "R&D" that draws on the strengths of digitization to speed innovation.