Last week there was some debate across the blogosphere about race and IQ (again), much of it springing from the controversy over Jason Richwine's dissertation, "IQ and Immigration Policy." You can read my thoughts here, here, and here. One helpful critique of these posts, made by Razib Khan, held that they could use more science. Razib added some of that in his own post, which was rooted in this paper: "Characterizing the Admixed African Ancestry of African Americans."
I read the paper and understood most of it, but I was basically lost trying to understand the graphs. (It's true that my math and science foundation is fairly weak.) So I read it again. Still not quite getting it, I reached out to one of the authors -- geneticist Neil Risch, who directs the Institute for Human Genetics at the University of California, San Francisco. Professor Risch agreed to chat with me via e-mail. He also sent me these two papers ("The Importance of Race and Ethnic Background in Biomedical Research" and "Assessing Genetic Contributions to Phenotypic Differences Among 'Racial' and 'Ethnic' Groups"), which I found enlightening and would urge everyone to read.
I want to thank Professor Risch for his time. Our conversation is below.
Thanks for agreeing to talk with me, Professor Risch. I've been involved in a good number of conversations around race lately -- specifically regarding race and IQ. I was referred to a paper you wrote with some co-authors on the African ancestry of African Americans. My science background is not particularly strong, and I'd like to bring more science (and less humanities) to my readers on this topic.
Let's start with the dumb and simple questions first. In your paper "Characterizing the admixed African ancestry of African Americans," we see a chart (Figure 1) depicting a "Principal components analysis of Africans, U.S. Caucasians and African Americans." For the less mathematically literate among us, can you explain what we're seeing?
In reference to Figure 1, there are primarily two different types of analysis we used. Both are based on genetic information. As you have probably read, everyone has 23 pairs of chromosomes: 22 pairs of autosomes and one pair of sex chromosomes (X and Y). Most of the DNA sequence on those chromosomes (about 3 billion nucleotides) is identical among individuals, but there are millions of locations where people can differ. In our analysis, we focused on about 450,000 of them. Modern technology now allows us to get a pretty good look at DNA sequence variation among individuals.
The two analyses we used were admixture analysis and principal components analysis (PCA). The first figure shows results of both. What we attempted to do in the bar chart (the admixture analysis) was to estimate for each African American in our study, what proportion of their genome derived from different population groups. In this case, we were focusing on potentially different African subgroups, but did not focus on possible European subgroups. In theory, we possibly could also have done the latter, but because the proportion of ancestry in this sample that was from Europe was not large, that would be more challenging. Our ability to do this analysis depends on how much genetic differentiation there has been between the possible ancestral groups included in the analysis (e.g. the Mandenka, Yoruba, San, Mbuti and Biaka, as well as Europeans). I should be clear that these represent current day populations, and are only possible surrogates for the actual ancestral groups.
That differentiation is depicted in the rest of that figure (the dots with population labels) - which was the result of the PCA. In PCA, you define a few variables that explain the most variation in the data (in this case it is the genetic information). It is a data reduction procedure - in other words, to reduce the information in 450,000 genetic markers to just a few variables. Here we present the first two such reduced variables that explain most of the variation. As you can see, the Europeans and Africans are very well separated on the X axis while the African subgroups are well separated on the Y axis. This separation is what allows us to estimate the proportion of ancestry in the African Americans from each of these groups. You will notice that the African Americans (purple triangles) fall on a line between the Yoruba/Mandenka and the Europeans. This indicates that they have mixed ancestry that is both African and European. The broad spread of the purple triangles along the X axis indicates that individual African Americans in this study vary quite a bit in terms of how much of their ancestry is African and how much is European. You can also see that the majority of the African ancestry is Central-West African because of the approach on the X axis towards the Yoruba/Mandenka. If there were a substantial amount of ancestry from the other groups, you would see the line going more horizontally and less to the upper right.
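For readers who want to see the data-reduction idea in code, here is a minimal sketch in Python. It is a toy simulation, not the analysis from the paper: the two source populations "A" and "B", their allele frequencies, the sample sizes, and the 5,000 markers are all made-up assumptions. It reduces the genotype table to two principal components and checks that the simulated mixed-ancestry individuals land between the two source groups on the first component.

```python
import numpy as np

rng = np.random.default_rng(0)
n_markers = 5000    # toy stand-in for the roughly 450,000 markers mentioned above
n_per_group = 50    # hypothetical sample size per group

# Hypothetical allele frequencies for two source populations, A and B.
freq_a = rng.uniform(0.05, 0.95, n_markers)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_markers), 0.01, 0.99)


def draw_genotypes(freqs, rng):
    """Draw 0/1/2 allele counts; freqs has shape (individuals, markers)."""
    return rng.binomial(2, freqs)


pop_a = draw_genotypes(np.tile(freq_a, (n_per_group, 1)), rng)
pop_b = draw_genotypes(np.tile(freq_b, (n_per_group, 1)), rng)

# Each mixed individual gets their own share of ancestry from A (60% to 95%).
frac_a = rng.uniform(0.6, 0.95, n_per_group)
mixed_freqs = frac_a[:, None] * freq_a + (1.0 - frac_a[:, None]) * freq_b
admixed = draw_genotypes(mixed_freqs, rng)


def pca_scores(genotypes, n_components=2):
    """Project individuals onto the top principal components via SVD."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


scores = pca_scores(np.vstack([pop_a, pop_b, admixed]))
pc1 = scores[:, 0]
pc1_a = pc1[:n_per_group]
pc1_b = pc1[n_per_group:2 * n_per_group]
pc1_mixed = pc1[2 * n_per_group:]

# How closely does position on the first component track ancestry share?
corr = np.corrcoef(frac_a, pc1_mixed)[0, 1]

print("mean PC1 (A, B, mixed):", pc1_a.mean(), pc1_b.mean(), pc1_mixed.mean())
print("correlation of PC1 with share of A ancestry:", corr)
```

The sign of a principal component is arbitrary, so only the ordering of the groups and the strength of the correlation matter, not whether the numbers are positive or negative.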
The actual proportions of ancestry are then given in the bar chart. Here again, you can see the varying amount of European ancestry. It is also confirmed in the bar chart that most of the African ancestry is Central-West African.
You can also see in the bar chart that compared to the European ancestry component, there is much less variation in the different African subgroup components of ancestry -- in other words, there aren't some individuals who have much more Yoruban ancestry and others who have much more Bantu ancestry. This is why we concluded that mating patterns in African Americans probably did not strongly reflect actual origins in Africa. It is also why we concluded that, because most African Americans appear to have admixed African ancestry, looking only at a single genetic location (e.g. the Y chromosome or mtDNA, as is often done by ancestry companies) gives only a narrow picture of the entire ancestry.
Subsequent figures in the paper pretty much reinforce these conclusions.
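The admixture estimate described in the answer above can also be illustrated with a small sketch. This is a deliberately simplified stand-in, not the method used in the paper: it assumes only two source populations whose allele frequencies are already known (the names `freq_a` and `freq_b` and all the numbers are hypothetical), and it uses a plain least-squares fit, whereas real admixture software typically fits a statistical model across several populations at once.

```python
import numpy as np


def estimate_fraction(genotype, freq_a, freq_b):
    """Least-squares estimate of the share of ancestry from population A.

    Models the expected allele frequency at each marker as
    q * freq_a + (1 - q) * freq_b and solves for q.
    """
    diff = freq_a - freq_b
    q = np.sum((genotype / 2.0 - freq_b) * diff) / np.sum(diff ** 2)
    return float(np.clip(q, 0.0, 1.0))


rng = np.random.default_rng(1)
n_markers = 20000  # toy number of markers

# Hypothetical, already-known allele frequencies in the two source populations.
freq_a = rng.uniform(0.05, 0.95, n_markers)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_markers), 0.01, 0.99)

# Simulate one person whose genome is 80% from A and 20% from B.
true_fraction = 0.8
person = rng.binomial(2, true_fraction * freq_a + (1.0 - true_fraction) * freq_b)

estimate = estimate_fraction(person, freq_a, freq_b)
print("true share from A:", true_fraction, "estimated:", estimate)
```

The estimate only works because the two sets of frequencies differ. If `freq_a` and `freq_b` were nearly identical, the denominator would be close to zero and the estimate would be mostly noise, which is the point made above about needing enough genetic differentiation between the possible ancestral groups.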
OK, so that helps. A lot. Here is another question. I want to know what someone with your background thinks about the notion of "race." As a writer, I approach this through the lens of history. I imagine, because of that, I might be missing some things. I want to know, as a geneticist, whether you think of African Americans as a "race?"
I believe it is inaccurate to refer to African Americans as a race or racial group (much as it is similarly inappropriate to refer to Latinos that way) -- unless you move away from the more classical definitions of race. We try to use the term race/ethnicity. There has been a lot of debate about whether genetic variation in the human population is continuous or discrete. From my view, it is both. This is what makes it challenging to create categories.
One question pops out at me. You indicate some suspicion about referring to African-Americans as a "race," but (in some of your research) you support using "race" in collecting medical data and in disease studies. Is this a case of a definition -- though it may be imperfect, clunky and at times even misleading -- still telling us something? From what I gathered from those articles, "race" can be a proxy not just for genetic stuff, but for social phenomena too (such as access to health care). Am I seeing that right? Is it correct to say, for instance, "Yes, race is a social construct, but this does not make it meaningless"? It is still useful to look at "race," for instance, when studying sickle-cell. Perhaps some day, when we have more refined techniques, it won't be.