With Manchester City opening up its data to the masses, the golden age of soccer analytics is set to begin.
This month Manchester City, the younger brother (and rival) of the better-known Manchester United, announced that it will release detailed data about the team for public consumption.
The club's press release noted that "the speed of growth for the discipline of performance analytics is essentially in the clubs' hands -- it is they who have bought the data at significant cost and the rest of the analytics community simply do not have access to the data at the same level ... [But while] there are many people in the analytics community right now who have the skills, desire and vision to make a difference in the performance analytics space...those people have no significant data to work with." By making this data available to the analytics community, Manchester City hopes to "encourage and inspire the next generation of analytics."
This move, while essentially unprecedented in the soccer world, fits squarely within a larger cross-sector trend: opening up data to harness the distributed human capital and innovative potential of hobbyists, enthusiasts, and geeks with pro-level skills. The track record of making data available to the wonks who want to use it bodes well for the future of soccer analytics; we may be at a watershed moment.
The push to promote innovation through openness rests on the idea that innovation is often a matter of cost, and of entry costs in particular. For a pool of potential innovators (in basically any sector), the less costly the inputs required to begin innovating, the more likely it is that potential innovators will become actual innovators. If more equipment, materials, special skills, or privileged information is required, fewer people will experiment, tinker, and discover. And the more people who experiment and try to innovate, the more likely it is that valuable innovation will result. This dynamic implies that in sectors in need of innovation, it is useful to assess the costs of entry and try to lower them.
A common explanation for the radically innovative tech scene of recent decades is that the Internet lowered barriers to market entry: basically anyone with a computer and enough time could write some killer code. Yochai Benkler, a scholar at Harvard's Berkman Center for Internet and Society, has made a career of studying how radically low barriers to entry in labor markets can change the cost structures and organization of production. This trend is nowhere more evident than in the Open Data movement. The movement, which draws its philosophical inspiration from the older Open Source movement, holds that data should be freely available to anyone, without restriction.
In discovering knowledge from datasets, the major barrier to entry is access to the data. When corporations, governments, or other organizations jealously guard their proprietary data, the number of people playing with that data, trying to discover valuable things in it, or putting it to good use will remain small. When data is made public, anyone can put it to work. In recent years governments have begun making large troves of their data publicly accessible. The U.S. government's open-data project, data.gov, for example, has given rise to more than 200 citizen-developed apps. Similarly, the city of Vancouver, an early mover in the municipal open-data space, opened up its data in 2009, spawning valuable mashups of transit data, the water grid, and common spaces.
A common adage in open-source development, known as Linus's Law, states that "given enough eyeballs, all bugs are shallow," meaning that if enough people are involved, hard problems become easier. Open data does the same for knowledge discovery and innovation. When looking for a needle in a haystack of data, it helps to have more people looking. The best way to get more people looking is to make it cheap to look.
Lowering the cost of looking, and thus enabling more people to get involved, is precisely what Manchester City has begun to do. Opening up its data promises to lower the barriers to experimenting with new, data-driven ways of understanding the game. With more eyeballs, the game's hard analytical problems may become shallow.
Normally "the only data you can get [publicly] is the really basic stuff: goals, assists, cards... [which is] nothing you can really work from," says Graham MacAree, SB Nation's soccer editor and one of the leaders in the field of public soccer analytics.