Last week there was some debate across the blogosphere about race and IQ (again), much of it springing from the controversy over Jason Richwine's dissertation, "IQ and Immigration Policy." You can read my thoughts here, here, and here. One helpful critique of these posts, made by Razib Khan, held that they could use more science. Razib added some of that in his own post, which was rooted in this paper: "Characterizing the Admixed African Ancestry of African Americans."
I read the paper, understood most of it, but was basically lost trying to understand the graphs. (It's true that my math and science foundation is fairly weak.) So I read it again. Still not quite getting it, I reached out to one of the authors -- geneticist Neil Risch, who directs the Institute for Human Genetics at the University of California, San Francisco. Professor Risch agreed to chat with me via e-mail. He also sent me these two papers ("The Importance of Race and Ethnic Background in Biomedical Research" and "Assessing Genetic Contributions to Phenotypic Differences Among 'Racial' and 'Ethnic' Groups"), which I found enlightening and would urge everyone to read.
I want to thank Professor Risch for his time. Our conversation is below.
Thanks for agreeing to talk with me, Professor Risch. I've been involved in a good number of conversations around race lately -- specifically regarding race and IQ. I was referred to a paper you wrote with some co-authors on the African ancestry of African Americans. My science background is not particularly strong, and I'd like to bring more science (and less humanities) to my readers on this topic.
Let's start with the dumb and simple questions first. In your paper "Characterizing the admixed African ancestry of African Americans," we see a chart (Figure 1) depicting a "Principal components analysis of Africans, U.S. Caucasians and African Americans." For the less mathematically literate among us, can you explain what we're seeing?
In reference to Figure 1, there are primarily two different types of analysis we used. Both are based on genetic information. As you have probably read, everyone has 23 pairs of chromosomes: 22 pairs of autosomes and one pair of sex chromosomes (X and Y). Most of the DNA sequence (about 3 billion nucleotides) is identical among individuals, but there are millions of locations where people can differ. In our analysis, we focused on about 450,000 of them. Modern technology now allows us to get a pretty good look at DNA sequence variation among individuals.
The two analyses we used were admixture analysis and principal components analysis (PCA). The first figure shows results of both. What we attempted to do in the bar chart (the admixture analysis) was to estimate, for each African American in our study, what proportion of their genome derived from different population groups. In this case, we were focusing on potentially different African subgroups, but did not focus on possible European subgroups. In theory, we possibly could also have done the latter, but because the proportion of ancestry in this sample that was from Europe was not large, that would be more challenging. Our ability to do this analysis depends on how much genetic differentiation there has been between the possible ancestral groups included in the analysis (e.g. the Mandenka, Yoruba, San, Mbuti and Biaka, as well as Europeans). I should be clear that these represent current-day populations, and are only possible surrogates for the actual ancestral groups.
That differentiation is depicted in the rest of that figure (the dots with population labels) - which was the result of the PCA. In PCA, you define a few variables that explain the most variation in the data (in this case it is the genetic information). It is a data reduction procedure - in other words, to reduce the information in 450,000 genetic markers to just a few variables. Here we present the first two such reduced variables that explain most of the variation. As you can see, the Europeans and Africans are very well separated on the X axis while the African subgroups are well separated on the Y axis. This separation is what allows us to estimate the proportion of ancestry in the African Americans from each of these groups. You will notice that the African Americans (purple triangles) fall on a line between the Yoruba/Mandenka and the Europeans. This indicates that they have mixed ancestry that is both African and European. The broad spread of the purple triangles along the X axis indicates that individual African Americans in this study vary quite a bit in terms of how much of their ancestry is African and how much is European. You can also see that the majority of the African ancestry is Central-West African because of the approach on the X axis towards the Yoruba/Mandenka. If there were a substantial amount of ancestry from the other groups, you would see the line going more horizontally and less to the upper right.
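For readers who, like me, find it easier to follow a procedure once they can poke at it, here is a minimal sketch of the PCA idea described above. It is not the analysis from the paper. Everything in it is made up for illustration: the two source populations, their allele frequencies, the marker count (5,000 rather than 450,000), and the shortcut of simulating a mixed individual by blending allele frequencies. The point is only to show that when PCA is run on such data, the mixed individuals land on a line between the two source groups, and their position along that line tracks their ancestry proportion.

```python
import numpy as np

rng = np.random.default_rng(0)
n_markers = 5000  # illustrative; the study used about 450,000

# Hypothetical allele frequencies for two source populations (A and B).
freq_a = rng.uniform(0.05, 0.95, n_markers)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, n_markers), 0.01, 0.99)


def sample_genotypes(freqs, n_people):
    """Draw genotypes: 0, 1 or 2 copies of an allele at each marker."""
    return rng.binomial(2, freqs, size=(n_people, len(freqs)))


pop_a = sample_genotypes(freq_a, 50)
pop_b = sample_genotypes(freq_b, 50)

# Mixed individuals: each has their own (made-up) share of ancestry from A.
# Simplifying assumption: blend the two sets of allele frequencies.
true_share_a = rng.uniform(0.6, 0.95, 50)
mixed = np.vstack(
    [sample_genotypes(p * freq_a + (1 - p) * freq_b, 1) for p in true_share_a]
)

# PCA: center each marker, then take the top two directions of variation.
genotypes = np.vstack([pop_a, pop_b, mixed]).astype(float)
centered = genotypes - genotypes.mean(axis=0)
u, s, _ = np.linalg.svd(centered, full_matrices=False)
pcs = u[:, :2] * s[:2]  # one (PC1, PC2) point per person

# Where each mixed individual sits on PC1 between the two source groups,
# rescaled so that group B is at 0 and group A is at 1.
pc1 = pcs[:, 0]
center_a = pc1[:50].mean()
center_b = pc1[50:100].mean()
estimated_share_a = (pc1[100:] - center_b) / (center_a - center_b)

correlation = np.corrcoef(estimated_share_a, true_share_a)[0, 1]
print(f"correlation between PC1 position and true ancestry share: {correlation:.2f}")
```

Plotting `pcs[:, 0]` against `pcs[:, 1]` would give a toy version of the scatter in Figure 1, with the mixed individuals strung out between the two clusters.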
The actual proportions of ancestry are then given in the bar chart. Here again, you can see the varying amount of European ancestry. It is also confirmed in the bar chart that most of the African ancestry is Central-West African.
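To make the bar chart idea concrete, here is a rough sketch of how ancestry proportions for one person could be estimated from reference populations. This is an assumption-laden toy, not the software or statistical model used in the paper: the three reference groups, their allele frequencies, and the "true" proportions are all invented, and the estimator is a crude least-squares fit that is then clipped and rescaled so the proportions are non-negative and sum to one.

```python
import numpy as np

rng = np.random.default_rng(1)
n_markers = 20000  # illustrative; the study used about 450,000

# Hypothetical allele frequencies for three reference groups (one row each).
shared = rng.uniform(0.1, 0.9, n_markers)
ref_freqs = np.clip(shared + rng.normal(0.0, 0.15, (3, n_markers)), 0.01, 0.99)

# A made-up individual with 70% / 10% / 20% ancestry from the three groups.
true_props = np.array([0.7, 0.1, 0.2])
person_freqs = true_props @ ref_freqs
genotype = rng.binomial(2, person_freqs)  # 0, 1 or 2 copies per marker


def estimate_proportions(genotype, ref_freqs):
    """Crude ancestry estimate: least squares, then clip and rescale.

    Models the expected allele dosage (genotype / 2) as a weighted mix of
    the reference allele frequencies and solves for the weights.
    """
    weights, *_ = np.linalg.lstsq(ref_freqs.T, genotype / 2.0, rcond=None)
    weights = np.clip(weights, 0.0, None)  # proportions cannot be negative
    return weights / weights.sum()         # proportions must sum to one


est_props = estimate_proportions(genotype, ref_freqs)
print("true proportions:     ", true_props)
print("estimated proportions:", np.round(est_props, 2))
```

The estimate only works because the invented reference groups differ from one another in allele frequency. That mirrors Professor Risch's point that the analysis depends on how much genetic differentiation there is between the candidate ancestral groups.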