Erin Battat, a historian and author of Ain’t Got No Home: America’s Great Migrations and the Making of an Interracial Left, joined the research to verify the patterns in Ancestry’s data. She noticed, for example, that Alabama saw an influx of people from South Carolina in the early 19th century. Intensive cotton cultivation had depleted the soil in South Carolina and Georgia, and in 1814 the Treaty of Fort Jackson compelled the Creek Indians to cede land in Alabama. This set off an episode of “Alabama Fever,” in which South Carolinians traveled through Georgia to settle in a new state with land open for cultivation. “That’s the kind of puzzle I was solving as a historian,” says Battat.
In the case of Geni.com’s data, the company allowed scientists from the New York Genome Center, Columbia, MIT, and Harvard to scrape crowdsourced public records that ultimately covered 43 million people, largely in North America and western Europe. The data included the single largest known family tree, with 13 million people. (And yes, that family tree included Kevin Bacon.)
The researchers were largely geneticists and computational biologists, but they also recognized the potential value of the data for historians and social scientists. So in an analysis that has not yet been peer-reviewed, posted to the preprint server bioRxiv, they looked at several variables, such as the distance men traveled before marrying (on average, farther than women did) and the genetic relatedness of couples (which decreases markedly after 1850). They also noticed that even when couples started marrying people farther away from their birth locations, they didn’t stop marrying their relatives right away. The decline in marrying relatives, the team hypothesizes, might have more to do with changing cultural taboos than with the ability to move farther via swifter transportation. (Yaniv Erlich, who led this work, declined an interview about his paper because the preprint is still under review at a journal.)
These observations are interesting, but do they reveal anything new? Jan Van Bavel at the University of Leuven writes in an email that they largely confirm earlier research in demography, which is the use of statistics to study the structure of historical populations. “But I think that is a good thing,” he writes. “First, these databases need to be validated, i.e. see if they can replicate well-known facts. If that is the case, that is reassuring to go on and use these data to answer new questions.”
One of the drawbacks of these user-generated genealogies is that they are neither a complete nor a random sample of the population. They underrepresent people who don’t have descendants, or whose descendants have no interest in genealogy and don’t contribute to these sites. “Modern demographers really want to know about the whole population,” says Philip Cohen, a demographer at the University of Maryland. “We would be very reluctant to generalize to the whole social order.” Where the data might be most useful is in studying specific subpopulations, say in a specific region, where the records are quite complete.