How Netflix Reverse Engineered Hollywood

Yellin has some of the misplaced Hollywood feel, too. Intelligent, quick, and energetic, he feels like a producer, which makes sense, as he's been, by his own accounting, "on all sides of the movie industry." Physically, he bears a remarkable resemblance to the actor Michael Kelly, who plays Doug Stamper, chief of staff to Frank Underwood (Kevin Spacey) in Netflix's original series House of Cards.

He seems like a guy who can make things work. 

As we sit down in a conference room, I pull out my computer and begin to show off the genre generator we built. I walk him through my spreadsheets and show him all the text analysis we've done. 

Though he seems impressed by our nerdiness, he patiently explains that we've merely skimmed one end product of the entire Netflix data infrastructure. There is much more data, and a whole lot more intelligence, baked into the system than we've captured.

Here's how he told me all the pieces fit together. 

"My first goal was: tear apart content!" he said.


Todd Yellin at Netflix headquarters.

How do you systematically dismember thousands of movies using a bunch of different people who all need to have the same understanding of what a given microtag means? In 2006, Yellin holed up with a couple of engineers and spent months developing a document called "Netflix Quantum Theory," which Yellin now derides as "our pretentious name." The name refers to what Yellin used to call "quanta," the little "packets of energy" that compose each movie. He now prefers the term "microtag."

The Netflix Quantum Theory doc spelled out ways of tagging movie endings, the "social acceptability" of lead characters, and dozens of other facets of a movie. Many values are "scalar," that is to say, they go from 1 to 5. So, every movie gets a romance rating, not just the ones labeled "romantic" in the personalized genres. Every movie's ending is rated from happy to sad, passing through ambiguous. Every plot is tagged. Lead characters' jobs are tagged. Movie locations are tagged. Everything. Everyone. 
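Netflix has never published the actual schema, but from Yellin's description you can picture each movie's record as a bundle of those scalar and categorical microtags. A rough sketch, with every field name and value invented by me:

```python
# A hypothetical microtag record, loosely modeled on Yellin's description.
# Every field name and value here is invented; the real "Netflix Quantum
# Theory" schema has never been published.
movie = {
    "title": "An Example Movie",
    "romance": 4,               # scalar, 1-5: every movie gets one
    "ending": 5,                # 1 = sad, 3 = ambiguous, 5 = happy
    "social_acceptability": 2,  # how palatable the lead character is
    "plot": "heist",
    "lead_job": "detective",
    "location": "Paris",
    "era": "1970s",
}
```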

That's the data at the base of the pyramid. It is the basis for creating all the altgenres that I scraped. Netflix's engineers took the microtags and created a syntax for the genres, much of which we were able to reproduce in our generator. 
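We never saw those rules, of course, but reproducing the syntax amounts to filling slots in a template and gluing the surviving pieces together. A toy version of what our generator does, with stand-in vocabulary lists:

```python
import random

# Stand-in vocabulary lists. The real ones, derived from the microtags,
# are far longer; these few entries come from genres quoted in this piece.
ADJECTIVES = ["Feel-good", "Steamy", "Quirky", "Scary"]
REGIONS = ["Foreign", ""]           # "" lets the slot stay empty
GENRES = ["Comedies", "Mind Game Movies", "Cult Mad-Scientist Movies"]
MODIFIERS = ["for Hopeless Romantics", "from the 1970s", ""]

def make_altgenre() -> str:
    """Fill the slots of the (approximate) grammar and drop empty ones."""
    parts = [
        random.choice(ADJECTIVES),
        random.choice(REGIONS),
        random.choice(GENRES),
        random.choice(MODIFIERS),
    ]
    return " ".join(p for p in parts if p)

print(make_altgenre())  # e.g. "Feel-good Foreign Comedies for Hopeless Romantics"
```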

To me, that's the key step: It's where the human intelligence of the taggers gets combined with the machine intelligence of the algorithms. There's something in the Netflix personalized genres that you can tell is not fully human, but that is revealing in a way humans alone might not be.

For example, the adjective "feel good" gets attached to movies that have a certain set of features, most importantly a happy ending. It's not a direct tag that people attach so much as a computed movie category based on an underlying set of tags. 
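If the underlying tags look anything like my invented record above, the computation could be as simple as a threshold check. This is pure conjecture on my part:

```python
def is_feel_good(movie: dict) -> bool:
    # Conjecture: a happy enough ending, plus a likable enough lead, earns
    # the computed "Feel-good" adjective. The real rule and thresholds are
    # Netflix's secret; only "happy ending matters most" comes from Yellin.
    return movie["ending"] >= 4 and movie["social_acceptability"] >= 3
```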

The only semi-similar project I can think of is Pandora's once-lauded Music Genome Project, but what's amazing about Netflix is that its descriptions of movies are foregrounded. It's not just that Netflix can show you things you might like, but that it can tell you what kinds of things those are. It is, in its own weird way, a tool for introspection.

That distinguishes it from Netflix's old way of recommending movies, too. The company used to trumpet the fact that it could predict, more or less, how many stars you might give a movie. And so it encouraged you to rate movie after movie, so that it could take those numeric values and develop a taste profile for you.

Netflix even offered a $1 million prize to the team that could design an algorithm that improved the company's ability to predict how many stars users would give movies. It took competing teams nearly three years to improve on Netflix's own predictions by a mere 10 percent.
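The contest's yardstick was root-mean-squared error between predicted and actual star ratings. The metric itself is trivial to compute; the hard part was shrinking it:

```python
import math

def rmse(predicted: list[float], actual: list[float]) -> float:
    """Root-mean-squared error, the Netflix Prize's scoring metric."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )

# A 3.2-star prediction against a 4-star rating is a miss of 0.8; square
# the misses, average them over millions of ratings, take the root, and
# you have the number the prize teams spent years whittling down.
print(rmse([3.2, 4.5, 2.0], [4.0, 5.0, 2.0]))  # ~0.54
```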

The prize was awarded in 2009, but Netflix never actually incorporated the new models. That's in part because of the work required, but also because Netflix had decided to "go beyond the 5 stars," which is where the personalized genres come in.  

The human language of the genres helps people identify with the recommendations. "Predicting something is 3.2 stars is kind of fun if you have an engineering sensibility, but it would be more useful to talk about dysfunctional families and viral plagues. We wanted to put in more language," Yellin said. "We wanted to highlight our personalization because we pride ourselves on putting the right title in front of the right person at the right time."

And nothing highlights their personalization like throwing you a very, very specific altgenre. 

So why aren't they ultraspecific, which is to say, super long, like the gonzo genres that our generator can create?

Yellin said that the genres were limited by three main factors: 1) they only wanted to display 50 characters for various UI reasons, which eliminated most long genres; 2) there had to be a "critical mass" of content that fit the description of the genre, at least in Netflix's extended DVD catalog; and 3) they only wanted genres that made syntactic sense.
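Translated into code, those three factors are just filters applied to candidate genres before they ever reach a user. Here's a rough sketch:

```python
def passes_netflix_constraints(genre: str, catalog_count: int,
                               is_well_formed: bool) -> bool:
    """Approximate Yellin's three filters. The 50-character cap is his
    number; the critical-mass threshold is an invented stand-in."""
    MIN_TITLES = 10  # stand-in for Netflix's undisclosed "critical mass"
    return (
        len(genre) <= 50                 # 1) UI display limit
        and catalog_count >= MIN_TITLES  # 2) enough titles fit the genre
        and is_well_formed               # 3) must make syntactic sense
    )
```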

We ignore all of these constraints, and that's precisely why our generator is hilarious. In Netflix's real world, there are no genres with more than five descriptors. Four descriptors are rare, but they do show up for users: Scary Cult Mad-Scientist Movies from the 1970s. Three descriptors are more common: Feel-good Foreign Comedies for Hopeless Romantics. Two are widely used: Steamy Mind Game Movies. And, of course, there are many single-descriptor genres: Quirky Movies.
