Make Way for the Soccer Geeks


With Manchester City opening up its data to the masses, the golden age of soccer analytics is set to begin.


This month Manchester City, the younger brother (and rival) of the better-known Manchester United, announced that it will release detailed data about the team for public consumption.

The club's press release noted that "the speed of growth for the discipline of performance analytics is essentially in the clubs' hands -- it is they who have bought the data at significant cost and the rest of the analytics community simply do not have access to the data at the same level ... [But while] there are many people in the analytics community right now who have the skills, desire and vision to make a difference in the performance analytics space...those people have no significant data to work with." By opening up this data and making it available to those within the analytics community Manchester City hopes to "encourage and inspire the next generation of analytics."

This move, while essentially unprecedented in the soccer world, fits clearly within larger cross-sector trends of making data open to harness the distributed human capital and innovative potential of hobbyists, enthusiasts, and geeks with pro-level skills. The history of success of making data available to the wonks who want to use it bodes well for the future of soccer analytics; we may be at a watershed moment.

The move to promote innovation through openness is premised on the idea that innovation is often about cost, and about entry costs in particular. For a pool of potential innovators (in basically any sector), the less costly the inputs required to begin innovating, the more likely it is that potential innovators will become actual innovators. If more equipment, materials, special skills or privileged information are required, fewer people will experiment, tinker, and discover. It follows that the more people are experimenting and trying to innovate, the more likely valuable innovation is to happen. This dynamic implies that in sectors in need of innovation, it is useful to assess the costs of entry and try to lower them.

A common explanation for the radically innovative tech scene of recent decades is that the Internet lowered barriers to market entry, as basically anyone with a computer and enough time could write some killer code. Yochai Benkler, a scholar at Harvard's Berkman Center for Internet and Society, has made a career of looking at how radically low barriers to entry in labor markets can change the cost structures and organizations of production. This trend is nowhere more evident than in the Open Data movement. This movement, which gets its philosophical inspiration from the older Open Source movement, holds that data should be freely available to anyone without restriction.

In knowledge discovery in datasets, the major barrier to entry is access to the data. When corporations, governments or other institutions jealously guard their proprietary data, the number of people playing with the data and trying to discover valuable things, or putting that data to good use, will remain small. When data is made public, anyone can put that data to work. In recent years governments have begun making large troves of their data publicly accessible. The U.S. government's open-data project, data.gov, for example, has begotten over 200 citizen-developed apps. Similarly, the city of Vancouver, an early mover in the municipal open-data space, opened up its data in 2009, spawning valuable mashups of transit data, the water grid, and common spaces.

A common adage in open-source development known as Linus' Law states that "given enough eyeballs, all bugs are shallow," indicating that if you can get enough people involved, hard problems become easier. This is what open data does for knowledge discovery and innovation. When looking for a needle in the haystack of data, it helps to have more people looking. The best way to get more people looking is to make it cheap to look.

Lowering the cost to look, and thus enabling more people to get involved, is precisely what Manchester City has begun to do. Opening the data up promises to lower barriers to entry for experimenting with new data-driven ways of understanding the game. With more eyeballs, this problem can become shallow.

Normally "the only data you can get [publicly] is the really basic stuff: goals, assists, cards... [which is] nothing you can really work from," says Graham MacAree, SBNation's soccer editor, and one of the leaders in the field of public soccer analytics.

According to the club, some data will be entirely available for public consumption, but the most detailed data -- "a time coded feed that lists all player action events within the game with a player, team, event type, minute and second for each action, together with the x/y/z co-ordinates for each event" -- will be sent to analysts who present a project submission that is approved by the club and its data provider, Opta, the leader in soccer data mining.
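For readers who think in code, the feed the club describes can be pictured as a list of small records, one per on-ball action. The sketch below is only an illustration built from the fields named in the press release; the class name, the field names, the helper function and the sample rows are all hypothetical, and nothing here reflects the actual file format or coordinate units that City or Opta will use.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchEvent:
    """One action in a match. Field names are assumed, not the real feed's."""
    player: str
    team: str
    event_type: str   # e.g. "pass", "dribble", "shot" (labels are made up)
    minute: int
    second: int
    x: float          # coordinates in unspecified, assumed units
    y: float
    z: float

    @property
    def elapsed_seconds(self) -> int:
        # Collapse the minute/second pair into a single time code.
        return self.minute * 60 + self.second


def events_by_type(events, team):
    """Count how many events of each type a given team recorded."""
    return Counter(e.event_type for e in events if e.team == team)


# Invented placeholder rows, purely to show the shape of the data.
SAMPLE_EVENTS = [
    MatchEvent("Player X", "Team A", "pass", 12, 30, 40.0, 25.0, 0.0),
    MatchEvent("Player Y", "Team A", "dribble", 12, 34, 55.0, 30.0, 0.0),
    MatchEvent("Player Z", "Team A", "shot", 12, 37, 88.0, 34.0, 1.2),
    MatchEvent("Player Q", "Team B", "pass", 13, 5, 20.0, 10.0, 0.0),
]

if __name__ == "__main__":
    print(events_by_type(SAMPLE_EVENTS, "Team A"))
    print(SAMPLE_EVENTS[0].elapsed_seconds)
```

The point of the sketch is that every action carries its own time and location, which is what lets an analyst ask where and when something happened rather than only how often.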

This more detailed data will be useful for experts like MacAree, a veteran of baseball's statistical revolution known as "sabermetrics" (think Moneyball), because it contains so much more information than can be gleaned from traditional soccer analysis, which has focused on individual actions in a vacuum -- that is, without context: Player X passes, Player Y dribbles, and Player Z shoots and scores.

"The most important thing for me is knowing where the ball is at all times, and where all the players are at all times," MacAree explains. "And City are proposing to release not just the what, but the where and when of the data. We're talking very much about space and time, which are very difficult to get out of the data set we've already had."

This is a foundational moment for the soccer-analytics community. The field of study, despite all the bluster about a soccer Moneyball or Jamesian moment (after the godfather of the sabermetric movement, baseball writer Bill James), has yet to progress past the equivalent of a box score. Large-scale advanced metrics are years of research away, especially because data has been so scarce. Most of the cutting-edge analytics have been painstakingly developed by hand. Until now, researchers without access to the kind of data Manchester City is making available have had to record every event in a match, watching frame by frame, then transcribe it to Excel and write the code themselves to analyze it. Single-match analyses like MacAree's radial-passing maps take more than a day of labor-intensive work to assemble.

In this data environment, researchers have little hope of coming up with testable, verifiable, predictive metrics.

"If you look at baseball, the sabermetric revolution came about because data was available before it was valuable," MacAree explains. In this environment the costs of entry to innovate were low, and Bill James, among others, was able to experiment. But "now that we know how valuable data is, there's no reason for it to be [freely] given to us... but our contribution [community analysts'] can also be valuable. And we've always been about showing that we're worth giving that data to."

This is what makes Manchester City's decision to open up, at least partially, one of its most valuable assets to the public so unusual. The club has decided to embrace the open-source nature of baseball's Jamesian revolution and bring it to soccer.

Their press release speaks directly to the analytics community, describing areas of performance analysis that City would "like to discuss with you": "We will work directly with those of you who came up with good concepts, and also connect you to others who are working in the same research area," they crow.

There is a long way to go in soccer analytics, and this is but a small first step into a larger world. City's data is only for one year; for predictive models to be valuable, they must be based on, and tested against, multiple years of data. And this type of scientific peer review, based on years of data, will only be feasible if teams and organizations follow in City's footsteps. But City's move to begin opening up its detailed data represents a strong first step in capitalizing on the power of peer production and decentralized expertise that we have seen yield meaningful results in other sectors. If the public proves that it can make something -- be it a real predictive model, or even an interesting concept -- worthy of investment with this data, it seems likely that other teams will follow City's lead.

And that's a challenge that MacAree, and others, are more than ready for.


Alexander Furnas and Gabe Lezra

Alexander Furnas is a writer based in Washington, D.C. Gabe Lezra is La Liga editor for SBNation and editor of the Real Madrid fan website Managing Madrid.
