The Ghost's Vocabulary

How the computer listens for Shakespeare's "voiceprint"

This pattern-finding mathematics is widely used in physics and engineering, in deciphering television and radar signals, for example. Given a long list of words—not simply the "blue" and "green" of the example, but dozens more—the computer can quickly tell how Shakespeare typically balanced those words. "You might have a pattern," Valenza says, "with a lot of 'love,' very little 'hate,' and a good deal of 'woe.'" A different writer might use the same words, and even use them at the same rates as Shakespeare, but the configurations might be different. The result is that a given list of words produces a kind of voiceprint for each author.
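For readers who want to see the general idea in miniature, here is a rough sketch in Python. It is not Valenza's method, whose mathematics the article does not spell out. It assumes a far simpler stand-in: measure what share of the listed words each word takes up in a text, then compare two such profiles with cosine similarity. The function names, the three-word list, and the two sample strings are invented for illustration. Note too that a rate profile like this one would miss the case described above, in which two writers use the words at the same rates but in different configurations.

```python
import math
import re
from collections import Counter

def rate_profile(text, word_list):
    """Return each listed word's share of all listed-word occurrences in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t in word_list)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(word_list)
    return [counts[w] / total for w in word_list]

def cosine_similarity(a, b):
    """Cosine of the angle between two profiles; 1.0 means identical balance."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical word list and toy "texts" (assumptions, not real data).
WORDS = ["love", "hate", "woe"]
sample_a = "love love love woe woe hate"   # much love, little hate, some woe
sample_b = "hate hate hate love woe"       # a different balance

profile_a = rate_profile(sample_a, WORDS)
profile_b = rate_profile(sample_b, WORDS)
similarity = cosine_similarity(profile_a, profile_b)
```

A similarity near 1 would mean the two samples balance the listed words in much the same way; a lower value would mean the balance differs.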

Valenza and Elliott examined "common but not too common" words that Shakespeare used. To examine rare words, Valenza had reasoned, would be like trying to identify a voice from a whisper, and to examine common words would be to let the writer shout into your ear. The final, fifty-two-word list—with such miscellaneous entries as "about," "death," "desire," "secret," and "set"—was assembled by trial and error. It consisted of words with two key properties. In various works of Shakespeare's those words are used in patterns that yield the same voiceprint each time. And when other writers are tested, the same words yield voiceprints that are different from Shakespeare's.

The machinery in place, Valenza and Elliott began by testing Shakespeare's poetry against that of thirty other writers. Exciting results came quickly: The disputed "Shall I Die?" poem seemed not to be Shakespeare's after all. Three of the leading claimants to Shakespeare's work—Francis Bacon, Christopher Marlowe, and Sir Edward Dyer—were decisively ruled out. To Elliott's good-humored consternation, the test dealt just as harshly with the claims put forward on behalf of the Earl of Oxford. Worse was to follow. For even as this first round of tests ruled out the best-known Shakespeare candidates, it left a few surprising contenders. One possibility for the "real" Shakespeare: Queen Elizabeth I. "That did it for our chance of appearing in Science," Elliott laments. "But it vastly increased our chance of getting into the National Enquirer." (To his dismay, Elliott did find himself in Science, not as the co-author of a weighty research paper but as the subject of a skeptical news brief with the headline "Did Queen Write Shakespeare's Sonnets?")

Valenza and Elliott have since conducted more-extensive tests that have ruled out Queen Elizabeth. But the mishap highlights a risk that is shared by all the number-crunching methods. "If the glass slipper doesn't fit, it's pretty good evidence that you're not Cinderella," Elliott points out. "But if it does fit, that doesn't prove that you are."

The risk of being fooled is least for someone who combines a deep knowledge of literature with some statistical insight. Donald Foster, a professor of English at Vassar College, fits that bill. Foster's scholarship is highly regarded. Soon after "Shall I Die?" was presented to the world, for example, he wrote a long debunking essay that persuaded many readers that the poem was not Shakespeare's. In a more recent essay he consigned whole libraries of research to the scrap heap. Hundreds or thousands of articles have been written to explain the epigraph to Shake-speare's Sonnets, which begins, "To the onlie begetter of these insuing sonnets, Master W.H." Who was W.H.? Foster's solution to the mystery, which won him the Modern Language Association's Parker Prize, is that W.H. was...a typo. The publisher, who wrote the epigraph as a bit of flowery praise to honor Shakespeare, had intended to print "W.SH."

Those essays had nothing to do with statistics, but Foster has done some statistical sleuthing of his own, and he is well aware of the hazards. One scholar compared Shakespeare's plays with someone else's poems, for example, and concluded that Shakespeare used the present tense more than other writers do. Another compared Shakespeare with later writers and concluded that he used many four-letter words, whereas other writers used shorter words—forgetting that archaic words like "thou" and "hath" drive Shakespeare's average up. "There are strong and compelling reasons for avoiding this kind of research," Foster says, "because it's so difficult to anticipate all the pitfalls." But Foster himself has often given way to temptation. Like many Shakespeareans, he steers clear of the "authorship question," but he has looked into a pertinent mystery.

Shakespeare acted in his plays. But with two exceptions, we don't know what roles he took. Foster believes he has found a statistical way to gather that long-vanished knowledge. "It occurred to me," he says, "that Shakespeare may have been influenced in his writing by the parts he had memorized for performances and was reciting on a more or less daily basis." Last year Foster figured out a way to test that hunch. "The results," he says, "have been absolutely stunning."

"We started by using a concordance to type in all the words that Shakespeare used ten times or fewer," Foster says. These aren't exotic words, necessarily, just ones that don't crop up often in Shakespeare. Scholars have known for some time that these "rare" words tend to be clustered chronologically. Foster found that if two plays shared a considerable number of rare words, in the later play those words were scattered randomly among all the characters. In the earlier play, the shared words were not scattered. "In one role," Foster says, "there would be two to six times the expected number of rare words." There stood Shakespeare: the words that Shakespeare the writer had at the tip of his pen were the ones he had been reciting as Shakespeare the actor.
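The comparison Foster describes can be sketched roughly as follows. This is an illustration under stated assumptions, not his actual procedure: it assumes that the "expected number" of shared rare words for a role is proportional to that role's share of all the words spoken in the play, a baseline the article does not confirm. The function names, role labels, and toy data are all hypothetical.

```python
import re

def tokenize(text):
    """Lowercase a speech and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def rare_word_ratios(roles, shared_rare_words):
    """Map each role to observed / expected count of shared rare words.

    roles: dict of role name -> everything that role says, as one string.
    Expected count is ASSUMED proportional to the role's share of all words.
    """
    rare = set(shared_rare_words)
    sizes = {}
    hits = {}
    for role, speech in roles.items():
        tokens = tokenize(speech)
        sizes[role] = len(tokens)
        hits[role] = sum(1 for t in tokens if t in rare)
    total_size = sum(sizes.values())
    total_hits = sum(hits.values())
    ratios = {}
    for role in roles:
        expected = total_hits * sizes[role] / total_size if total_size else 0.0
        ratios[role] = hits[role] / expected if expected else 0.0
    return ratios

# Toy data (invented): "alpha", "beta", "gamma" stand in for rare words
# that an earlier play shares with a later one.
toy_roles = {
    "role_a": "alpha beta gamma plain plain plain plain plain",
    "role_b": "plain plain plain plain plain plain "
              "plain plain plain plain plain alpha",
}
ratios = rare_word_ratios(toy_roles, ["alpha", "beta", "gamma"])
likely_role = max(ratios, key=ratios.get)
```

In the toy data the shared words pile up in role_a well beyond its share of the dialogue, which is the kind of clustering Foster reports finding in a single role of the earlier play.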

If Foster is right, Shakespeare played Theseus in A Midsummer Night's Dream and "Chorus" in Henry V and Romeo and Juliet. In play after play the first character to come on stage and speak is the one that Foster's test identifies as Shakespeare: John Gower in Pericles, Bedford in Henry VI, Part I, Suffolk in Henry VI, Part II, and Warwick in Henry VI, Part III. And Foster's test picks out as Shakespeare's the two roles that we have seventeenth-century evidence he played: the ghost in Hamlet and Adam in As You Like It.

The theory can be tested in other ways. It never assigns to Shakespeare a role we know another actor took. The roles it does label as Shakespeare's all seem plausible—male characters rather than women or children. The test never runs in the wrong direction, with the unusual words scattered randomly in an early play and clustered in one role in a later play. On those occasions when Foster's test indicates that Shakespeare played two roles in a given play—Gaunt and a gardener in Richard II, for example—the characters are never onstage together. Foster's theory passes another test. When Foster looks at the rare words that Hamlet shares with Macbeth, written a few years later, those words point to the ghost in Hamlet as Shakespeare's role. And if Foster looks at rare words that Hamlet shares with a different play also written a few years later—King Lear, for example—those shared words also pick out the ghost as Shakespeare's role.

Additional evidence has been uncovered. After Hamlet, the ghost's vocabulary exerted a strong influence on Shakespeare's writing and then tapered off. But Shakespeare's plays went in and out of production. When Hamlet was revived several years after its first staging, and Shakespeare was again playing the ghost, he began again to recycle the ghost's vocabulary.

It is a strange image, a computer fingering a ghost. But it is a sign of things to come. Eventually the prejudice against computers in literary studies will give way. "The walls are sure to crumble," Ward Elliott says, "just as they did in baseball and popular music....Some high-tech Jackie Robinson will score a lot of runs, and thereafter all the teams in the league will pursue the newest touch as ardently and piously as they now shrink from it."
