Race, Intelligence, and Genetics For Curious Dummies

A Q&A with a geneticist

Last week there was some debate across the blogosphere about race and IQ (again), much of it springing from the controversy over Jason Richwine's dissertation, "IQ and Immigration Policy." You can read my thoughts here, here, and here. One helpful critique of these posts, made by Razib Khan, held that they could use more science. Razib added some of that in his own post, which was rooted in this paper: "Characterizing the Admixed African Ancestry of African Americans."
I read the paper, understood most of it, but was basically lost trying to understand the graphs. (It's true that my math and science foundation is fairly weak.) So I read it again. Still not quite getting it, I reached out to one of the authors -- geneticist Neil Risch, who directs the Institute for Human Genetics at the University of California, San Francisco. Professor Risch agreed to chat with me via e-mail. He also sent me these two papers ("The Importance of Race and Ethnic Background in Biomedical Research" and "Assessing Genetic Contributions to Phenotypic Differences Among 'Racial' and 'Ethnic' Groups"), which I found enlightening and would urge everyone to read.
I want to thank Professor Risch for his time. Our conversation is below.

Thanks for agreeing to talk with me, Professor Risch. I've been involved in a good number of conversations around race lately -- specifically regarding race and IQ. I was referred to a paper you wrote with some co-authors on the African ancestry of African Americans. My science background is not particularly strong, and I'd like to bring more science (and less humanities) to my readers on this topic.
Let's start with the dumb and simple questions first. In your paper "Characterizing the admixed African ancestry of African Americans," we see a chart (Figure 1) depicting a "Principal components analysis of Africans, U.S. Caucasians and African Americans." For the less mathematically literate among us, can you explain what we're seeing?
In reference to Figure 1, there are primarily two different types of analysis we used. Both are based on genetic information. As you have probably read, everyone has 23 pairs of chromosomes: 22 pairs of autosomes and one pair of sex chromosomes (X and Y). Most of the DNA sequence (about 3 billion nucleotides) is identical among individuals, but there are millions of locations where people can differ. In our analysis, we focused on about 450,000 of them. Modern technology now allows us to get a pretty good look at DNA sequence variation among individuals.
The two analyses we used were admixture analysis  and principal components analysis (PCA).  The first figure shows results of both.  What we attempted to do in the bar chart (the admixture analysis) was to estimate for each African American in our study, what proportion of their genome derived from different population groups.  In this case, we were focusing on potentially different African subgroups, but did not focus on possible European subgroups.  In theory, we possibly could also have done the latter, but because the proportion of ancestry in this sample that was from Europe was not large, that would be more challenging.  Our ability to do this analysis depends on how much genetic differentiation there has been between the possible ancestral groups included in the analysis (e.g. the Mandenka, Yoruba, San, Mbuti and Biaka, as well as Europeans).  I should be clear that these represent current day populations, and are only possible surrogates for the actual ancestral groups.
That differentiation is depicted in the rest of that figure (the dots with population labels) - which was the result of the PCA.  In PCA, you define a few variables that explain the most variation in the data (in this case it is the genetic information).  It is a data reduction procedure - in other words, to reduce the information in 450,000 genetic markers to just a few variables.  Here we present the first two such reduced variables that explain most of the variation.  As you can see, the Europeans and Africans are very well separated on the X axis while the African subgroups are well separated on the Y axis.  This separation is what allows us to estimate the proportion of ancestry in the African Americans from each of these groups.  You will notice that the African Americans (purple triangles) fall on a line between the Yoruba/Mandenka and the Europeans.  This indicates that they have mixed ancestry that is both African and European.  The broad spread of the purple triangles along the X axis indicates that individual African Americans in this study vary quite a bit in terms of how much of their ancestry is African and how much is European. You can also see that the majority of the African ancestry is Central-West African because of the approach on the X axis towards the Yoruba/Mandenka.  If there were a substantial amount of ancestry from the other groups, you would see the line going more horizontally and less to the upper right.   
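An illustrative aside, not part of Professor Risch's answer: the PCA idea can be tried on simulated data with a short Python sketch. Nothing here comes from the study. The two source populations, their allele frequencies, the sample sizes, and names such as `freq_a` and `frac_a` are assumptions invented for illustration, and each admixed genotype is drawn from a simple blend of the two sets of frequencies.

```python
import numpy as np

rng = np.random.default_rng(0)
n_markers = 2000  # the study used about 450,000; kept small here

# Hypothetical allele frequencies for two source populations, A and B.
freq_a = rng.uniform(0.05, 0.95, n_markers)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_markers), 0.01, 0.99)

def genotypes(freqs, n_people):
    """Draw 0/1/2 allele counts at each marker for n_people individuals."""
    return rng.binomial(2, freqs, size=(n_people, len(freqs)))

pop_a = genotypes(freq_a, 50)
pop_b = genotypes(freq_b, 50)

# Admixed individuals: each has their own fraction of ancestry from A.
frac_a = rng.uniform(0.6, 0.95, 50)
mixed_freqs = frac_a[:, None] * freq_a + (1 - frac_a[:, None]) * freq_b
admixed = rng.binomial(2, mixed_freqs)

# PCA: center each marker, then take the singular value decomposition.
data = np.vstack([pop_a, pop_b, admixed]).astype(float)
centered = data - data.mean(axis=0)
u, s, _ = np.linalg.svd(centered, full_matrices=False)
pc1 = u[:, 0] * s[0]  # each person's coordinate on the first component

pc1_a, pc1_b, pc1_mixed = pc1[:50], pc1[50:100], pc1[100:]
print("mean PC1, population A:", round(pc1_a.mean(), 2))
print("mean PC1, population B:", round(pc1_b.mean(), 2))
print("mean PC1, admixed:     ", round(pc1_mixed.mean(), 2))
print("correlation of PC1 with ancestry fraction:",
      round(np.corrcoef(frac_a, pc1_mixed)[0, 1], 2))
```

In this toy setup the first component separates the two source populations, and the admixed individuals land between them, ordered roughly by their ancestry fraction. That is the same kind of pattern described for the purple triangles. The sign of a principal component is arbitrary, so only the relative positions matter.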
The actual proportions of ancestry are then given in the bar chart.  Here again, you can see the varying amount of European ancestry.  It is also confirmed in the bar chart that most of the African ancestry is Central-West African.
You can also see in the bar chart that compared to the European ancestry component, there is much less variation in the different African subgroup components of ancestry -- in other words, there aren't some individuals who have much more Yoruban ancestry and others that have much more Bantu ancestry.  This is why we concluded that it is likely that mating patterns in African Americans probably did not strongly reflect actual origins in Africa.  And that is also the reason we concluded that because most African Americans appear to have admixed African ancestry, looking only at a single genetic location (e.g. the Y chromosome or mtDNA, as often done by ancestry companies) gives only a  narrow picture of the entire ancestry.
Subsequent figures in the paper pretty much reinforce these conclusions.
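An illustrative aside, not part of Professor Risch's answer: the admixture analysis described above estimates, for each person, what proportion of the genome derives from each source group. The sketch below shows the idea in its simplest form, with one simulated individual, two hypothetical source populations, and a grid-search maximum-likelihood estimate. It is a much simpler stand-in for the software used in real studies, and every number and name in it (`freq_a`, `freq_b`, `true_fraction`, `estimate_fraction`) is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_markers = 5000

# Hypothetical reference allele frequencies for two source populations.
freq_a = rng.uniform(0.05, 0.95, n_markers)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_markers), 0.01, 0.99)

def estimate_fraction(genotype, freq_a, freq_b, grid_size=101):
    """Return the fraction of ancestry from A that best explains one genotype.

    For each candidate fraction q, the expected allele frequency at a marker
    is q * freq_a + (1 - q) * freq_b. The candidate with the highest binomial
    log-likelihood across all markers wins.
    """
    best_q, best_loglik = 0.0, -np.inf
    for q in np.linspace(0.0, 1.0, grid_size):
        p = q * freq_a + (1 - q) * freq_b
        loglik = np.sum(genotype * np.log(p) + (2 - genotype) * np.log(1 - p))
        if loglik > best_loglik:
            best_q, best_loglik = q, loglik
    return best_q

# Simulate one person with 80% of their ancestry from population A.
true_fraction = 0.8
person = rng.binomial(2, true_fraction * freq_a + (1 - true_fraction) * freq_b)

estimate = estimate_fraction(person, freq_a, freq_b)
print("true fraction:", true_fraction, "estimated:", round(estimate, 2))
```

The estimate works only because the two sets of reference frequencies differ. If the source populations had nearly identical frequencies, every candidate fraction would explain the genotype about equally well, which is why the answer stresses genetic differentiation between the possible ancestral groups.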
OK, so that helps. A lot. Here is another question. I want to know what someone with your background thinks about the notion of "race." As a writer, I approach this through the lens of history. I imagine, because of that, I might be missing some things. I want to know, as a geneticist, whether you think of African Americans as a "race?"
I believe it is inaccurate to refer to African Americans as a race or racial group (much as it is similarly inappropriate to refer to Latinos that way) -- unless you move away from the more classical definitions of race. We try to use the term race/ethnicity. There has been a lot of debate about whether genetic variation in the human population is continuous or discrete.  From my view, it is both. This is what makes it challenging to create categories.
One question pops out at me. You indicate some suspicion of referring to African Americans as a "race," but (in some of your research) you support using "race" in collecting medical data and in disease studies. Is this a case of a definition -- though it may be imperfect, clunky and at times even misleading -- still telling us something? From what I gathered from those articles, "race" can be a proxy not just for genetic stuff, but for social phenomena too (such as access to health care). Am I seeing that right? Is it correct to say, for instance, "Yes, race is a social construct, but this does not make it meaningless"? It's still useful to look at "race," for instance, when studying sickle-cell. Perhaps some day, when we have more refined techniques, it won't be.
Definitions can indeed be "clunky." I would use the phrase race/ethnicity rather than just race because in common parlance it is a better description.   I tend to think that race has been used more in terms of continental origins (Africa, East Asia, Europe, Americas).  On that basis, one would not characterize African Americans as a racial group, but rather as an ethnic group.  We sort of implied this in the Genome Biology paper.  The reason is that African Americans typically have European as well as African ancestry (and possibly other ancestries as well) and are also culturally distinct from Africans.  Sort of similar to Latinos - who from a genetic ancestry standpoint can be nearly anything.  Hence our use of race/ethnicity.
Just to opine a bit, I think part of the problem is the notion of a causal relationship -- i.e. "dark-skin" or "blackness" causing sickle-cell -- as opposed to a more geographic definition that might encompass people regardless of skin color.
Yes, exactly. Groups living in isolation from each other for long periods of time have acquired many genetic differences. The large majority of those are due to "genetic drift" -- i.e. random fluctuations in gene frequencies. That also includes many genetic variants that code for traits and diseases. But then there are some genetic variants that differ in frequency due to differential selection pressure in different environments. The best examples are for genes that confer resistance to malaria. One of those causes sickle cell disease in those who carry two copies of the mutation; those who carry one copy have sickle cell trait, which is generally benign but confers greater resistance to severe malaria infection. Mutations for sickle cell disease are found at pretty high frequency in some African populations, but are also found in parts of the Middle East and India. Beta thalassemia is another disease where carriers are afforded greater protection from malaria. This disease is more common around the Mediterranean (e.g. Greeks).
Then there is G6PD deficiency.  Mutations for that are found at increased frequency in parts of Africa, but also in the Middle East. The mutations underlying these disorders generally differ geographically, which is another indication that while the mutations are different ancestrally, they achieved high frequency in different populations for similar reasons (i.e. resistance to malaria). Another more recent example is a gene called ApoL1. There are a couple of genetic variants found in West Africans (and African Americans); when carrying two of these, there is an increased risk for kidney disease if hypertensive.  It was shown that these variants likely provide some immunity from African Sleeping Sickness (tsetse fly disease) which may have led to them becoming more common where the disease is prevalent. 
Various populations have an increased frequency of genetic diseases, which are often unique.  Probably a lot or most of it is just chance, but perhaps not all of it. Proving historical selective advantages can be pretty challenging. So, as I mentioned above, groups living in isolation developed their own genetic (and cultural) profiles. Generally, there is no cause and effect between the traits that differentiate groups. East Asians have dark hair and eat with chopsticks.  But there is no causal relationship. You can use a whole variety of different traits to place individuals into the same categories, but those traits may have nothing to do with each other etiologically.
I often hear people say that Africa has the highest genetic diversity in the world. What does that practically mean?
If you sequence the genome of an African individual (pretty much from anywhere except North Africa), you will generally find more locations in their DNA that are variable than for any non-African individual. Why is this the case? Population geneticists believe that the world outside of Africa was initially populated by humans who migrated out of Africa. The presumption is that if the number of such individuals migrating was small, then some of the genetic variation was lost in the process. As I described before, genetic drift (fluctuation in allele frequencies) can happen when a population is small. The random fluctuation means that some alleles increase in frequency and others decrease. The ones that decrease may be lost altogether. You tend to find that the amount of genetic variation decreases along the migration routes out of Africa (more or less by distance from Africa, but of course population bottlenecks can also happen anywhere along the way).
What is the impact of this? As I mentioned above, random fluctuations in allele frequencies can mean that rare alleles that create risk for a disease may increase in frequency, by chance. So some diseases may become more common. But the flip side is that some diseases may also become less common.
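An illustrative aside, not part of Professor Risch's answer: the loss of variation in a small migrating group can be simulated with a simple Wright-Fisher-style model, in which each generation's allele copies are a random binomial draw from the previous generation's frequencies. The founder sizes, starting frequency, and number of generations below are hypothetical values chosen only to make the effect visible, and `variants_remaining` is a name invented for this sketch.

```python
import numpy as np

def variants_remaining(founders, n_variants=1000, start_freq=0.05,
                       generations=20, seed=0):
    """Count variants still variable after drift in a population of fixed size.

    Every variant starts at start_freq. Each generation, the 2 * founders
    allele copies are redrawn at random from the current frequencies.
    A variant stops being variable once its frequency hits 0 or 1.
    """
    rng = np.random.default_rng(seed)
    copies = 2 * founders  # two copies of each autosome per person
    freq = np.full(n_variants, start_freq)
    for _ in range(generations):
        freq = rng.binomial(copies, freq) / copies
    return int(np.sum((freq > 0) & (freq < 1)))

small_group = variants_remaining(founders=10)
large_group = variants_remaining(founders=1000)
print("variants still variable, 10 founders:  ", small_group)
print("variants still variable, 1000 founders:", large_group)
```

Under these assumptions the small group loses most of its rare variants within a few generations, while the large group keeps nearly all of them. The same random process can also push a rare allele up in frequency, which is the point about diseases becoming more or less common by chance.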
One last question. Your paper on assessing genetic contributions to phenotype seemed skeptical that we would ever tease out a group-wide genetic component when looking at things like cognitive skills or personality disposition. Am I reading that right? Are "intelligence" and "disposition" just too complicated?
Joanna Mountain and I tried to explain this in our Nature Genetics paper on group differences. It is very challenging to assign causes to group differences. As far as genetics goes, if you have identified a particular gene which clearly influences a trait, and the frequency of that gene differs between populations, that would be pretty good evidence. But traits like "intelligence" or other behaviors (at least in the normal range), to the extent they are genetic, are "polygenic." That means no single gene has a large effect -- there are many genes involved, each with a very small effect. Such gene effects are difficult if not impossible to find. The problem in assessing group differences is the confounding between genetic and social/cultural factors. If you had individuals who are genetically one thing but socially another, you might be able to tease it apart, but that is generally not the case.
In our paper, we tried to show that a trait can appear to have high "genetic heritability" in any particular population, but the explanation for a group difference for that trait could be either entirely genetic or entirely environmental or some combination in between.
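An illustrative aside, not part of Professor Risch's answer: the point about heritability can be checked with a toy simulation. A generic trait is modeled as a genetic contribution plus an environmental contribution, and a second group is shifted away from a baseline group in one of two ways. All numbers are hypothetical, the model is deliberately simplistic (no gene-environment interaction or correlation), and `simulate_group` is a name invented for this sketch.

```python
import numpy as np

def simulate_group(rng, n, genetic_shift=0.0, environment_shift=0.0):
    """Return (mean trait, within-group heritability) for one simulated group."""
    genetic = rng.normal(genetic_shift, 2.0, n)          # genetic contribution
    environment = rng.normal(environment_shift, 1.0, n)  # environmental contribution
    trait = genetic + environment
    # Heritability here: share of within-group trait variance due to genes.
    return trait.mean(), genetic.var() / trait.var()

rng = np.random.default_rng(3)
n = 20000

baseline_mean, baseline_h2 = simulate_group(rng, n)

# Scenario 1: the second group differs only in its shared environment.
env_mean, env_h2 = simulate_group(rng, n, environment_shift=3.0)

# Scenario 2: the second group differs only in its average genetic value.
gen_mean, gen_h2 = simulate_group(rng, n, genetic_shift=3.0)

gap_env = env_mean - baseline_mean
gap_gen = gen_mean - baseline_mean
print("environmental scenario: gap", round(gap_env, 2), "heritability", round(env_h2, 2))
print("genetic scenario:       gap", round(gap_gen, 2), "heritability", round(gen_h2, 2))
```

Both scenarios produce about the same gap between group averages and about the same within-group heritability, so those two observable quantities cannot tell the scenarios apart. Only knowledge of the specific genes or environmental factors involved could.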
So, in my view, at this point, any comment about the etiology of group differences, for "intelligence" or anything else, in the absence of specific identified genes (or environmental factors, for that matter), is speculation.