Five years ago, Microsoft began scanning the collection of one of the world’s largest libraries: the British Library. Home to more than 14 million books, it’s rivaled in size only by the Library of Congress.
On Friday, we saw some of the first fruits of that digitization. The British Library released more than a million images from its books to the public domain, publishing them to Flickr Commons for anyone to use or adapt. The images come from 46,000 books from the 17th, 18th, and 19th centuries, by authors both revered (Dickens!) and forgotten.
The library’s release comes at the end of a year full of similar, and similarly massive, donations. In August, the J. Paul Getty Trust gave almost 5,000 high-quality images of art, including works by Van Gogh, Rembrandt, and Dürer, to the public domain. In the spring, the national museum of the Netherlands also released more than 125,000 images of its works for free use online.
The library is trying out something of a new tack with this release, though. While it knows the title, author, and publishing year of its books, it doesn’t know the content of the images—what they actually depict. So early next year, it says it will roll out a crowdsourcing website and ask the public for its help in identifying the content of the images.
“Our intention is to use this data to train automated classifiers that will run against the whole of the content,” Ben O’Steen, a librarian, wrote on the Library’s Digital Scholarship blog. “The data from this will be as openly licensed as is sensible (given the nature of crowdsourcing) and the code, as always, will be under an open licence.”
In this, the library follows the New York Public Library (NYPL), which has explored crowdsourcing extensively over the past few years. This year, for example, the NYPL asked Internet users to help map and identify buildings in some of the 19th-century insurance maps of New York City in its collection; it has classified its menu collection the same way in the past.
It also follows the Library of Congress, which asked Flickr users to tag its public domain images five years ago.
But crowdsourcing in the humanities has a much longer history than the Internet. Robert Binkley, an American historian, discussed it in his 1930s manifesto of sorts, the unfortunately named but exhilarating “New Tools for Men of Letters.” And crowdsourcing gave rise to a landmark of British scholarship of a stature equal, perhaps, to the British Library itself: the Oxford English Dictionary.