How? The story begins in 2008. That year, Google Books digitized a number of magazines, including Ebony, Popular Mechanics and New York. Google also digitized the oldest and longest-running journal of matters baseball-related: Baseball Digest, published since 1942 in Evanston, Illinois. A huge number of issues, July 1945 to 2008, had gone online. And the magazines were full of images of the players.
A small group of Wikipedians, dedicated to improving the project's baseball articles, discovered the trove. Their editing, plus the huge, new body of baseball knowledge, soon dramatically improved the encyclopedia. Nagaraj found that, after the digitization, Wikipedia articles on four decades of All-Stars, from 1944 to 1984, grew by about 5,200 words per article.
But his research was able to go further. Because of a small clause in copyright law, all the issues of Baseball Digest from before 1964 had fallen into the public domain. That meant that although all of the Baseball Digest articles from 1944 to 1984 were online in full on the Baseball Digest site, Wikipedia editors could only use the images from the earlier years. So Nagaraj created, from his set of All-Stars, two historical sets: a "control" group of players who first played in a game between 1964 and 1984 (and thus likely have Baseball Digest material that remains privately owned), and a "treatment" group of All-Stars who first played in the big game between 1944 and 1964.
By comparing the two groups, Nagaraj could see the direct effects of copyright on the articles in terms of length, number of images, and traffic. That first metric -- length -- proved resilient to the copyright divide. Words are easy to rescue from private ownership, and the Wikipedia authors simply rewrote the information still owned by the Digest. Articles in both groups became, on average, much longer after the digitization.
But Nagaraj found that the availability of public domain material dramatically improved the articles' images. Before the digitization, players from between '44 and '64 had an average of .183 pictures on their articles. The '64 to '84 group had about .158 pictures. But after digitization, those numbers dramatically changed: there were 1.15 pictures on each of the older group's articles -- but only .667 in the newer group. More recent players, covered by privately owned parts of Baseball Digest, had only a little more than half as many images on their pages as did old-timers.
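The arithmetic behind that comparison can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation using the four averages quoted above, not Nagaraj's actual statistical method; the "difference-in-differences" label and the function names are assumptions made here for illustration.

```python
# Average number of images per article, as quoted in the text above.
images = {
    "treatment": {"before": 0.183, "after": 1.15},   # debuted 1944-1964 (public domain)
    "control":   {"before": 0.158, "after": 0.667},  # debuted 1964-1984 (still owned)
}


def change(group: str) -> float:
    """Gain in average images per article after digitization."""
    return images[group]["after"] - images[group]["before"]


def diff_in_diff() -> float:
    """Extra gain in the public-domain group over the copyrighted group."""
    return change("treatment") - change("control")


def after_ratio() -> float:
    """Newer group's post-digitization image count relative to the older group's."""
    return images["control"]["after"] / images["treatment"]["after"]


if __name__ == "__main__":
    print(f"treatment gain: {change('treatment'):.3f}")
    print(f"control gain:   {change('control'):.3f}")
    print(f"difference:     {diff_in_diff():.3f}")
    print(f"ratio after:    {after_ratio():.2f}")
```

Both groups gained images, so the raw "after" numbers overstate the effect of copyright. Subtracting the control group's gain from the treatment group's gain isolates the part of the increase that tracks public domain status.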
And the effects of this -- of just having an image on the page -- cascaded to other metrics. "Out-of-copyright" players' pages saw a significant boost in traffic. Articles from the pre-'64 group that were already in the top 10 percent saw their hits increase more than 70 percent. Articles from that group in the least-popular 10 percent saw their traffic increase by 25 percent. Those pages were more frequently edited across the board, too. And this makes sense: Google rewards updated content, and it rewards images. The out-of-copyright players provided more of both.