The Real Matchup in March Madness: Fandom vs. Big Data

When athletic achievements are translated into statistics, can they be translated back to human triumphs again?

The person who wins your office March Madness pool may not be able to tell Kansas star Joel Embiid from Arizona star Nick Johnson by looking at them, but he or she might know each player's effective field-goal percentage.

In fact, the casual fan is as likely to be familiar with data about an NCAA men's basketball team's offensive efficiency as to know its team colors or whether it plays zone or man-to-man.

Basketball fandom has been transformed in recent years by the combination of the bracket and the profusion of statistics about teams. For many fans, "filling out a bracket" has practically replaced the act of watching games. The abstraction of the teams into data that's pushed through the logic of the "pool" is what the tournament is now, just as much as players' bodies moving through space in the effort to throw a ball through a hoop.

March Madness: America's most popular exercise in statistical reasoning!

Take the Huffington Post's remarkable achievement: The Predict-o-Tron. Users set a series of parameters like a school's graduation rate or (more probably) the school's defensive efficiency or pre-season AP rating. Those parameters are then fed through software that adds up your choices and makes your picks for you.
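For the curious, here is one way "adds up your choices" could work in code. This is only a rough sketch, not the Predict-o-Tron's actual method, which isn't described here: it assumes each team's stats are scaled from 0 to 1 and combined as a weighted sum. The teams, the numbers, and the names `score` and `pick_winner` are all made up for illustration.

```python
from typing import Dict

# Made-up stats for two fictional teams, each scaled to the range 0..1.
# These are illustrative values only, not real tournament data.
TEAMS: Dict[str, Dict[str, float]] = {
    "Team A": {"defensive_efficiency": 0.9, "graduation_rate": 0.4, "preseason_rank": 0.7},
    "Team B": {"defensive_efficiency": 0.6, "graduation_rate": 0.9, "preseason_rank": 0.5},
}


def score(team: str, weights: Dict[str, float]) -> float:
    """Weighted sum of a team's stats; a stat with no weight counts as zero."""
    stats = TEAMS[team]
    return sum(weights.get(name, 0.0) * value for name, value in stats.items())


def pick_winner(team_a: str, team_b: str, weights: Dict[str, float]) -> str:
    """Pick whichever team has the higher weighted score (ties go to team_a)."""
    return team_a if score(team_a, weights) >= score(team_b, weights) else team_b


# A user who cares only about defense, and one who cares only about graduation rates.
defense_fan = {"defensive_efficiency": 1.0}
academics_fan = {"graduation_rate": 1.0}
print(pick_winner("Team A", "Team B", defense_fan))
print(pick_winner("Team A", "Team B", academics_fan))
```

Slide the weights around and the picks change, which is the whole appeal: you tune the knobs and the software fills in the bracket.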

The whole thing is sitting on a vast pool of data assembled by HuffPo's deputy data editor, Jay Boice, and his team. "I'm not a huge basketball fan," Boice told me. "I'm a data fan."

The project actually began during last year's tournament, when Boice noticed so many people doing modeling and projections to help people make their picks. "But there was nothing that let you play with the different factors yourself," he said. So, over six weeks, the HuffPo data team created this (frankly) amazing app.

It does what we've all been doing, but faster and more rigorously. It is what happens when you take the logic of March Madness to its conclusion: automagic, knowledge-independent fun.

You might say that what's going on with The Predict-o-Tron is nothing new in sports fandom. You could point to rotisserie baseball leagues or fantasy football's explosive growth to show that some fans have always loved sports through data analysis. And that's a fair point.

But communications scholar Thomas Patrick Oates has argued that fantasy sports offer the thrill of "vicarious management" (which, he adds, encourages fans to "identify with the institutional regimes of the NFL (and the authorities who conduct them) rather than with the athletes").

But March Madness's statistical fans don't dream of controlling players. They are, instead, enacting what it's like to be a coder, an engineer who tunes an algorithm that figures out the world for him.

A perfectly predictable, understandable world.

Which makes it, the traditional fan in me proclaims, directly opposed to the spirit of sport.

* * *

Let me step back for a minute. I am a UCLA fan by birth and experience. My father was enrolled at UCLA during the Wooden years, the greatest sporting run in American history. I grew up going to and watching games. When we moved away from LA, I came to love walking through the enemy arenas of the Pacific Northwest, after a long car ride with my dad. The pressure of all those opposing fans and colors drove us closer together, and though we both knew the circumstances were artificial, what it did for our relationship was not. UCLA fandom, for me, is the ground on which it is easiest for my dad and me to demonstrate that we love each other as deeply as we do.

The blue and gold runs deep is all I'm saying. And I know I'm not alone. The experience is common to the point of cliche.

After our 1995 championship run, I hacked out a UCLA basketball website (a blog in today's parlance, I'd say) from the HTML 2.0 wilds. For a couple of years, if you searched Yahoo for UCLA sports, my site would have been a top result. Running it taught me about the Internet and digital media, which, you may have noticed, became my career.

A year or so after the site began, at 14, living in rural Washington State, I was invited to join a semi-secret group of UCLA boosters called the Dead Bruins Society. This is the only reference to it on the Internet, so I won't say much more. But I got to participate in conversations with people who were actually close to the basketball program.

While age and responsibility have perhaps quieted my fandom, even today, I dress my son in blue and gold onesies as often as my wife will let me, among other embarrassing consequences.

* * *

There is, I have to tell you, a statistically perfect bracket, according to the Predict-o-Tron methodology. The algorithm powering it would have correctly picked 85 percent of the games over the past four tournaments, including 94 percent of the matchups in 2012.

Now, Boice doesn't think that this bracket is truly the best possible one. "It's super overfit to the last four years of data," he said. The model is matching the features of the last four years' worth of games too closely, when some of them will inevitably turn out to be noise, not predictive signal.
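To see what "overfit" means in miniature, consider a deliberately extreme toy, not anything the HuffPo team built: a "model" that simply memorizes past results. It assumes fictional teams and invented outcomes, and the names `fit_memorizer`, `predict`, and `accuracy` are hypothetical.

```python
from typing import Dict, List, Tuple

Game = Tuple[str, str, str]  # (team_1, team_2, winner); all teams are fictional

# Invented results, for illustration only.
PAST_GAMES: List[Game] = [("A", "B", "A"), ("C", "D", "D"), ("E", "F", "E")]
NEW_GAMES: List[Game] = [("A", "D", "D"), ("B", "E", "E")]


def fit_memorizer(games: List[Game]) -> Dict[Tuple[str, str], str]:
    """'Train' by memorizing the winner of every matchup already seen."""
    return {(t1, t2): winner for t1, t2, winner in games}


def predict(model: Dict[Tuple[str, str], str], t1: str, t2: str) -> str:
    """Recall a memorized result; for an unseen matchup, guess the first team."""
    return model.get((t1, t2), t1)


def accuracy(model: Dict[Tuple[str, str], str], games: List[Game]) -> float:
    """Fraction of games whose winner the model picks correctly."""
    hits = sum(predict(model, t1, t2) == winner for t1, t2, winner in games)
    return hits / len(games)


model = fit_memorizer(PAST_GAMES)
print(accuracy(model, PAST_GAMES))  # perfect on the games it memorized
print(accuracy(model, NEW_GAMES))   # no better than its blind fallback on new ones
```

A perfect record on old games says little about the next one. A model tuned tightly to four tournaments is a milder version of the same problem.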

But still. This is the best the Predict-o-Tron can do.

I have seen this bracket, which the Huffington Post has hidden as an Easter egg in the app. It looks a lot like other brackets created by skilled people. Three double-digit seeds make it to the second round. One (Nebraska) even makes it past that and all the way to the Elite Eight. In the Final Four, there are two No. 1 seeds, a No. 2 seed, and a No. 5 seed.

But there is one glaring difference between this bracket and almost all the others I've seen: It has UCLA's second-round opponent, VCU, winning the whole damn tournament!

This would, obviously, be a disaster because UCLA would lose in the second round. And my Bruins are so hot right now.

My dad's gift as a fan is to know when a certain play was a turning point in the flow of the game or season. And after our horrific loss against Washington State in our last regular-season game, he thinks something flipped in our squad. The evidence was clear in the Pac-12 tournament: Our last three games were our best three games of the entire season.

Can the gelling of a young team be adequately captured by statistics?

I hope not.

This knowledge of your team's inner capabilities, their potential against the realities of any game or play or season record, is what makes non-statistical fandom so fun. They can, on any day, live up to your dreams.

So, over the first round (and perhaps beyond), I'm going to chart my own personal clash with stats fandom: I'm going to root and write against the Predict-o-Tron's best-guess picks.

I want a journal of what happens when the bracket turns back into bodies. Come with me: Let's root against the abstractions and the data!