A look at new software that could transform journalism
In a few short years, we've learned to delegate all manner of tasks to computers. For music recommendations or driving directions or academic scouring, we readily turn to our clever machines. They do it better most of the time, and with much less effort.
Now computers have proven competence—no, fluency—in yet another aspect of human life: writing. Narrative Science, a Chicago-based startup, has developed an innovative platform that writes reported articles in eerily humanlike cadence. Their early work focused on niche markets, clients with repetitive storylines and loads of numeric data—sports stories, say, or financial reports. But the underlying logic that drives the process—scan a data set, detect significance, and tell a story based on facts—is powerful and vastly applicable. Wherever there is data, Narrative Science founders say, their software can generate a prose analysis that's robust, reliable, and readable.
For example: One high-profile client, Forbes magazine, uses the platform to create what Forbes writer Lewis Dvorkin calls "computer-generated company earnings previews." Each day, the platform sorts through recent stock data to profile a company whose performance stands out. Another client is The Big Ten Network, which uses Narrative Science to create automatic sports recaps based on box scores and player data. Though these pieces lack the verve of, say, Chuck Klosterman's sportswriting, the highly customizable platform does adopt a sports fan's idiomatic shorthand: "Cincinnati was hot from long range," one Narrative Science recap runs, "hitting 9-of-23 threes for a 39 percent night from beyond the arc." Similarly, the iPhone app GameChanger, which coaches and parents use to score Little League games, has a "recap" service enabled by Narrative Science. Mark the final out and, kapow, you've got a print-ready article about the game. In theory, you could even receive recaps with a personal touch, nine innings retooled around the feats and foibles of your little tyke.
I traveled to Chicago to meet the Narrative Science founders and learn more about their work. They claim their technology will reshape our relationship to data, media, and the way we consume information, and after several hours of interviews, I believe them. The concern in some quarters is that Narrative Science, with its ability to generate reams of cheap, instantaneous content, is going to make human writers obsolete. The truth, however, is more complicated.
RISE OF THE META-JOURNALIST
Every startup has its rosy vision for the world. Mark Zuckerberg wants to make people more connected. Sergey Brin wants to make great content more findable. Kris Hammond, Narrative Science co-founder and CTO, wants to make things easier to read.
"Data is tremendously valuable," he told me. "It's unbelievably valuable. But it's not valuable as a spreadsheet of numbers. It's valuable based on the insights that you can glean from it." We're swimming in numeric data, he insists, almost drowning in it, which strikes him as odd because most people don't actually like numbers very much. Spreadsheets confound us because human beings think in stories. So, in Hammond's view, wherever there are numbers, we should have stories instead, and that's where Narrative Science comes in. "In the long run," he said, "our technology ends up being the mediator between data and the human experience."
When I ask him what this means for human writers, he points out that his work has long been a collaboration between computer scientists and journalists. In his ongoing work at Northwestern's Intelligent Information Laboratory, which he co-chairs with Narrative Science Chief Scientific Advisor Larry Birnbaum, he routinely partners with students and faculty at the university's Medill School of Journalism to form "cross-functional teams" of writers and coders. (This itself is a pioneering move, as journalists and computer scientists tend not to cross paths in scholarship or public life.) In fact, it was this dynamic interplay that led to Stats Monkey, a baseball recap platform that became the prototype for today's authoring platform.
Birnbaum and Hammond, both Yale-educated professors of computer science, have academic backgrounds in linguistic systems, and their serious interest in the science of story arc is apparent at Narrative Science. Here, because they each contribute such valuable work, writers and coders inhabit the same hierarchical plane. Programmers are crucial because they maintain and improve the robust authoring platform that is the company's foundation. This foundation is enormously powerful. "We've created a horizontal platform that's vertically agnostic, industry agnostic," CEO Stuart Frankel told me. "We can write just about any kind of content, using any kind of data." But clients not only have different rules (house style, publication tone, specialized vocabulary); they also tell different kinds of stories. That's why Narrative Science needs journalists.
When Narrative Science inks a deal with a new client, their writers begin work customizing the existing platform within a configuration layer. House style—how to format names and dates, when to italicize, and so on—is the easy part. What takes more time is establishing the facts and inferences that will conceivably be drawn from client data, as well as a "constellation" of possible story angles through which the data might be presented. In the case of baseball, this means "all the scenarios that might be derived from the raw data of a box score": the slugfest, the shutout, the pitcher's duel, the back-and-forth, postponed by rain, on and on.
In this way, Narrative Science writers don't think about specific stories as much as they outline a web of story possibilities. "They know how to configure our technology to allow them to become what are essentially meta-journalists," Frankel told me, "people who can write millions of stories opposed to a single story at a time." As the technology progresses, we may see more and more writers working on this macro level.
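To make the "web of story possibilities" concrete, here is a minimal Python sketch of how a configured set of angles might be matched against a box score. It is an illustration only, not Narrative Science's actual method: the `BoxScore` fields, the thresholds, the team names, and the sentence templates are all assumptions invented for this example.

```python
from dataclasses import dataclass


@dataclass
class BoxScore:
    """A pared-down, hypothetical box score; real ones carry far more detail."""
    home: str
    away: str
    home_runs: int
    away_runs: int
    lead_changes: int


def pick_angle(game: BoxScore) -> str:
    """Return the first story angle whose rule matches the game."""
    total = game.home_runs + game.away_runs
    # Rules run in priority order; every threshold is an illustrative guess.
    if min(game.home_runs, game.away_runs) == 0:
        return "shutout"
    if total >= 15:
        return "slugfest"
    if total <= 3:
        return "pitcher's duel"
    if game.lead_changes >= 3:
        return "back-and-forth"
    return "routine win"


# One sentence template per angle; a writer would configure many more.
TEMPLATES = {
    "shutout": "{winner} blanked {loser} {high}-{low}.",
    "slugfest": "{winner} outslugged {loser} {high}-{low}.",
    "pitcher's duel": "{winner} edged {loser} {high}-{low} in a pitcher's duel.",
    "back-and-forth": "{winner} outlasted {loser} {high}-{low} in a back-and-forth game.",
    "routine win": "{winner} beat {loser} {high}-{low}.",
}


def write_recap(game: BoxScore) -> str:
    """Fill the template for the chosen angle with the game's facts."""
    if game.home_runs == game.away_runs:
        raise ValueError("a finished baseball game cannot end in a tie")
    home_won = game.home_runs > game.away_runs
    winner, loser = (game.home, game.away) if home_won else (game.away, game.home)
    high = max(game.home_runs, game.away_runs)
    low = min(game.home_runs, game.away_runs)
    return TEMPLATES[pick_angle(game)].format(
        winner=winner, loser=loser, high=high, low=low
    )


print(write_recap(BoxScore("Hawks", "Owls", 4, 0, 0)))
```

The point of the sketch is the division of labor: the rules and templates are written once, by people, and the same configuration then produces a recap for every game fed into it.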
WE RECOIL FROM THE DULL
But using Narrative Science to write baseball games is a little like hammering a nail with an atom bomb. The platform's inference engine, Hammond says, is supported by "hardcore data analytics"—it can handle vast, truly complex information, data sets that would boggle any human mind. In this regard, the platform may one day serve as a kind of all-star assistant for human journalists.
Imagine, for instance, the prospect of deducing how Twitter users feel about the Republican presidential candidates on a particular day. A human journalist simply couldn't do it: monitoring any significant sample would be impossible. Twitter moves so fast, and at such a high volume, that it eludes us. "The problem with social media," Hammond writes on his blog, "is that there's so damned much of it."
But Narrative Science is beta-testing an initiative that can monitor all of Twitter for trends in content, using the Republican contenders (for now) as their frame. "Newt Gingrich has been consistently popular on Twitter, as he has been the top riser on the site for the last four days," the platform reported in February. "While the overall tone of the Gingrich tweets is positive, public opinion regarding the candidate and character issues is trending negatively. In particular, @MommaVickers says, 'Someone needs to put The Blood Arm's 'Suspicious Character' to a photo montage of Newt Gingrich. #pimp.'" Sure, it's a little dry. But this kind of holistic perspective, in the future, will be useful for human writers (not to mention advertisers) as we try to wrangle social media sandstorms into something we can hold.
Now consider how valuable this kind of data-combing could be for investigative journalists. In his novel The Pale King, the late David Foster Wallace argued that the era of secrecy is over: The post-Watergate government hides secrets in plain sight, obscuring them in an unchartable morass of freely available information. The result? We lose interest, civilians and journalists and activists alike.
"One of the great and terrible PR discoveries in modern democracy," the book's narrator tells us, "is that if sensitive issues of governance can be made sufficiently dull and arcane, there will be no need for officials to hide or dissemble, because no one not directly involved will pay enough attention to cause trouble. No one will pay attention because no one will be interested...we recoil from the dull."
Look no further than WikiLeaks' imposing data dumps to see this illustrated. The true significance of these revelations cannot be measured because no one has the time or resources to study them completely. In theory, Narrative Science could change that, working like a team of cheap interns to scour the dross, find the gems, and deliver insight. With bales and bales of mind-numbing government and corporate documents to sort through, Narrative Science could eventually help writers find the needle in the haystack.
It's worth mentioning that most journalists won't be able to afford Narrative Science services on their own, but outlets like The Atlantic could make the service available to their writers.
AN AUDIENCE OF ONE
Both Hammond and Frankel insisted that, while Narrative Science will certainly replace some types of human-generated writing, the stories they're most excited about are the ones journalists rarely cover. Because of readership expectations, no journalist would write a story with relevance to only one person, or a few—sports writers, for instance, don't write about Little League games in the first place. That's why the company's putting special effort into what they call "audience of one" applications—narratives that bring professional-caliber prose insight where right now we only have confusing data.
Hammond asks me to imagine a world in which medical test results provide not swaths of obscure-looking numbers, but physician-quality written notes on how you're faring (and what you can do to improve). Where your energy bill monitors usage trends and suggests ways to save power and money. Instead of simply tallying wrong answers, your kid's standardized test results make highly specific study suggestions, in language that would do an English teacher proud. Log in to check your portfolio, and you'll get an expert analysis of how your stocks are doing, with suggestions on what to trade or buy. "Any place where there are numbers," Hammond said, "and people have a hard time ingesting those numbers, is a place for us."
Hammond hopes that Narrative Science will pave the way for the small-scale or individual stories that journalists overlook. People analyze trends in large metro areas like Chicago, he says, because that's what makes sense. But the authoring platform could "take a combination of IRS data, American Community Survey data, census data, Department of Labor data, and turn that into a story for every single metro area in the country." In his view, there's no reason every tiny town couldn't have a comprehensive yearly story of who it is, where it's been, and where it's going. It's just one example of how future stories are going to get much more personalized.
This kind of micro-personalization, though, has some potentially disturbing applications. As Slate's Evgeny Morozov notes in a recent article, "automated journalism" could result in news stories appearing differently to different readers. Someone who spends time at The New York Review of Books or The Economist, Morozov argues, might be served a more challenging, sophisticated take than a chronic TMZ commenter—even if they're reading the "same" piece. This could exacerbate the confirmation bias—readers searching for stories that confirm their beliefs—that Internet browsing already makes possible.
To be clear, Narrative Science is not specifically working on news stories that appear differently to different readers. But the Internet is already moving rapidly towards a "customized" experience, so advertisers and content providers will inevitably find the platform's personalization capacity appealing. The company already helps web marketing companies understand the data they mine from us (Frankel was a senior executive at ad-serving company DoubleClick before Google snapped it up), and they (or future competitors) may find that individualized web content is too lucrative to pass up. Morozov's right to be concerned about a future in which "objective" reporting recedes as browsing history and online purchases affect the way we read about current events.
CHARTING THE SEAS OF BIG DATA
Though computer authoring will almost certainly reshape our relationship to content, Narrative Science will also have a huge impact in the growing arena of corporate data collection and management. "We're looking at a situation where every company worth its salt right now is metering and monitoring their business processes, and amassing huge databases of information," Hammond told me. Cost, production, sales, and earnings figures are scrupulously measured over an ever-broadening array of categories, with increasingly sharp detail. Frankel says the standard business mindset is to collect "as much data as possible" in order to become more competitive and profitable.
But here's the strange thing about our current moment: Though companies invest heavily in data collection, they can only work with their findings in very limited ways. Since there's a deluge of information, much of it radically new, many data collectors simply throw up their hands. "It's painful to see how much data has gone fallow," Hammond says. Gleaning understanding from these lodes of information (what Hammond calls "big data") is a primary focus for Narrative Science.
Frankel told me about one client, an unnamed fast food company, that created an expensive business analytics framework to monitor and aggregate point-of-sale data, in real time, at every franchise location. They made this information available to franchise owners, who were so overwhelmed that 90% of them never used the system at all. The company hired Narrative Science to create a reporting layer on top of all the data—and when the project rolls out, each franchisee will receive a weekly prose assessment sent directly to their inbox. "It's a story now," Frankel said, "about activity in their store. How they're doing in absolute terms, how they stack up against other locations. Most importantly, they'll learn about what products are performing well, or not as well, and what they might do [to improve]."
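A reporting layer of this kind can be pictured with a small Python sketch. It assumes a hypothetical data shape (weekly sales per product for each location) and invented store names and figures; nothing here reflects the client's real system or Narrative Science's code.

```python
def weekly_report(store: str, sales: dict[str, dict[str, float]]) -> str:
    """Turn one week of per-product sales into a short prose assessment."""
    totals = {name: sum(items.values()) for name, items in sales.items()}
    # Rank every location by total sales, best first.
    ranked = sorted(totals, key=totals.get, reverse=True)
    rank = ranked.index(store) + 1
    items = sales[store]
    best = max(items, key=items.get)
    worst = min(items, key=items.get)
    return (
        f"{store} sold ${totals[store]:,.0f} this week, "
        f"ranking {rank} of {len(ranked)} locations. "
        f"Top seller: {best}. Weakest seller: {worst}."
    )


# Hypothetical figures, in dollars, for three invented locations.
SALES = {
    "Elm Street": {"burgers": 5000, "fries": 2000, "shakes": 800},
    "Main Street": {"burgers": 4000, "fries": 2500, "shakes": 1000},
    "Lake Road": {"burgers": 6000, "fries": 2000, "shakes": 1000},
}

print(weekly_report("Elm Street", SALES))
```

Even this toy version covers the three things Frankel lists: how a store is doing in absolute terms, how it stacks up against other locations, and which products are performing well or poorly.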
The capacity to draw automated insight from high volumes of data will potentially cause a sea change in the way businesses monitor and evaluate commerce. I suspect that these advances will prove especially powerful in the realm of Internet data: as Atlantic editor Alexis Madrigal recently reported, most websites keep close tabs on viewers, spawning a robust industry of web data collectors. As personal data becomes more immediately useful and understandable, it also becomes more desirable, impelling collectors to mine more information, more aggressively. When it comes to the Internet, we may see privacy standards radically redefined.
As Narrative Science continues refining and improving their authoring platform, two future grails stand out.
First, Hammond would like to be able to train the platform to look for conclusions that haven't yet occurred to human clients. Though the platform can search for spikes and correlations—trends that might surprise clients in a "Freakonomics" sense—it can only report on story possibilities that human programmers have trained it to "see." In looking at geological data, say, the software could potentially catch a surprising link between hydrofracking and increased earthquakes—but wouldn't, unless humans had asked for an assessment of this possibility.
That's why Hammond yearns to improve the software until it can look for insights that haven't yet occurred to its creators—the Rumsfeldian "unknown unknowns" that continually elude us. "We can't do it now," he told me, "but the entire notion behind the platform is to get there." "As the system becomes smarter and smarter," Frankel predicted, "it will be able to draw on data that it analyzes to make its own conclusions." Eventually, he said, the platform will be able to "draw some conclusions without even first knowing what the subject matter is."
Second, they hope to move beyond numbers. Though humans deal in stories and narratives, computers are simply much more adept with numbers. The data sets Narrative Science works best with are primarily numeric, what's known as "structured data." Frankel told me that the platform already works with some "unstructured data": it can understand the driving "sentiment" within a Tweet or blog comment, for instance. But further developments in computer understanding of human language could blow the current technology open. When Narrative Science can scan written documents with the same comprehension it brings to number sets, its viability increases dramatically.
Taken together, these two advancements, the ability to draw unique conclusions and the ability to work with difficult unstructured data, would render an astoundingly human authoring platform: one that could spend an afternoon reading, say, all the artificial intelligence scholarship ever written, and then use it to make original claims.
FATE OF THE HUMAN WRITER
As a journalist and fiction writer, I of course wondered what all of this means for what I do. I arrived at the Chicago office prepared to have my own biases confirmed: that the human mind is a sacred mystery, that our relationship to words is unique and profound, that no automaton could ever replicate the writerly experience. But speaking with Hammond, I realized how much of the writing process (what I tend to think of as unpredictable, even baffling) can be quantified and modeled. When I write a short story, I'm doing exactly what the authoring platform does: using a wealth of data (my life experiences) to make inferences about the world, providing those inferences with an angle (or theme), then creating a suitable structure (based on possible outcomes I've internalized from reading and observing and taking creative writing classes). It's possible to give a machine a literary cadence, too: choose strong verbs, specific nouns, stay away from adverbs, and so on. I'm sure some expert grammarian could map out all the many different ways to make a sentence pleasing (certainly, the classical orators did, with their chiasmus and epanalepsis, anaphora and antistrophe).
Hammond tells me it's theoretically possible for the platform to author short stories, even a statistically "perfect" piece that uses all our critical knowledge about language and literary narrative. Such attempts have been made before—Russian musicians once wrote the "best" and "worst" songs ever, based on survey data. But I suspect that a computer's understanding of art will never quite match our own, no matter how specific our guidelines become. Malcolm Gladwell writes about this effect in Blink, noting how, for reasons that sometimes confound us, supposedly market-perfect media creations routinely tank.
Besides, the best journalism is always about people in the end—remarkable individuals and their ideas and ideals, our ongoing, ever-changing human experience. In this, Frankel agrees.
"If a story can be written by a machine from data, it's going to be. It's really just a matter of time at this point," he said. "But there are so many stories to be told that are not data-driven. That's what journalists should focus on, right?"
And we will, we'll have to, because even our simplest moments are awash in data that machines will never quantify—the way it feels to take a breath, a step, the way the sun cuts through the trees. How, then, could any machine begin to understand the ways we love and hunger and hurt? The net contributions of science and art, history and philosophy, can't parse the full complexity of a human instant, let alone a life. For as long as this is true, we'll still have a role in writing.