Searching for Shakespeare's Fingerprint

A new study claims to be able to graph authors' linguistic styles, sparking lively debate

What's the difference between Shakespeare and Dan Brown? According to a study in the New Journal of Physics, part of the difference lies in the number and frequency of the words they use. By analyzing the works of Herman Melville, Thomas Hardy, and D. H. Lawrence, the researchers argued, according to the BBC, that "counting the number of unique words as a particular author's works get longer and longer" can be used to construct a "unique word" curve that functions as a "linguistic fingerprint." Really? Some call it a breakthrough; others are skeptical. Even if it is true, the skeptics ask, what is the point of a complicated equation that confirms what humans already know? Here's the debate:

  • 'Bold Idea,' Difficult to Prove  Chris of The Lousy Linguist seizes on an idea the researchers develop from the "fingerprint" notion: that "every person has their own unique meta book" of vocabulary and preferences, from which everything he or she writes is drawn. "Unfortunately," says Chris, "it is ... almost entirely untestable. Keep in mind that this research had zero psycholinguistic component. They were just counting words on pages. I'd caution against drawing any conclusion about the human language system based solely on this work."
  • Bravo--The Computer Can Count  The Guardian's Nicholas Lezard questions the value of teaching a computer to do what "any reasonably well read person should be able to do"--namely, "tell whether a text is by Hardy, Melville, or Lawrence almost at a glance even if they haven't read it before." As one such giveaway, he points to the "inordinate frequency" of the word "fuck" in D. H. Lawrence's writings. In fact, he declares that creating an "algorithm which helps us determine who wrote what" belongs in a similar "category of futility as those scientific studies that claim to have determined the formula for female beauty or what makes a really good sandwich." Nor is Lezard impressed by the "meta book" idea, which he quotes to illustrate his point:
"This meta book is an imaginary infinite book which gives a representation of the word frequency characteristics of everything that a certain author could ever think of writing."

When I first read that I thought it was rather nicely Borgesian. But then, when I thought about it a bit more, I realised the idea was meaningless garbage. And anyone who can write a phrase like "word frequency characteristics" without being in some way ashamed of himself is really better not writing anything at all, or having anything to do with other people's writing. I suggest they read more good books.

  • Don't Be So Quick to Dismiss  James Dacey at Physics World thinks the ire of "literary purists" may be misplaced. This isn't, he says, just "another example of uncouth physicists trying to impose rigid mathematical frameworks onto works of unquantifiable beauty." Instead, the study actually confirms the uniqueness of each author's art: "For 75 years, language analysts have assumed that all literature, regardless of author, follows the same statistical pattern when viewed as a whole." Now these scientists are saying that each author has his or her own pattern.
  • Agreed--Particularly If You're an Arrogant Guardian Writer  English professor Alan Jacobs would like to challenge the aforementioned Lezard to test his assertion that "any reasonably well read person" can distinguish between authors "almost at a glance." Jacobs suspects that Lezard, if forced to pick out Charles Dickens among samples of Anthony Trollope, "might discover that styles are more elusive than he thinks. In this context I am reminded of wine tasting: experts pronounce with great confidence on the traits of various wines, but their judgments are highly inconsistent and their palates easily fooled."
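For readers wondering what "counting the number of unique words" as a text grows actually involves, here is a minimal sketch. It is an illustration only, not the researchers' code: the function name unique_word_curve, the simple letters-and-apostrophes tokenization, and the step parameter are all assumptions made for this example, and the study's own word-splitting rules and curve fitting are not reproduced.

```python
import re


def unique_word_curve(text: str, step: int = 1) -> list[tuple[int, int]]:
    """Return (words_read, distinct_words_seen) pairs as the text is read.

    Hypothetical helper for illustration: a "word" here is assumed to be a
    lowercased run of letters and apostrophes. A point is recorded every
    `step` words, plus one for the final word.
    """
    words = re.findall(r"[a-z']+", text.lower())
    seen: set[str] = set()
    curve: list[tuple[int, int]] = []
    for i, word in enumerate(words, start=1):
        seen.add(word)  # repeated words do not raise the count
        if i % step == 0 or i == len(words):
            curve.append((i, len(seen)))
    return curve


sample = "It was the best of times, it was the worst of times"
print(unique_word_curve(sample, step=4))  # [(4, 4), (8, 6), (12, 7)]
```

The number of distinct words climbs quickly at first and then flattens as familiar words recur. Plotting such pairs across an author's works gives the kind of curve the study compared; its claim, as the BBC summarized it, is that the shape of that curve differs from one author to another.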
