In the April, 1948, issue of a publication called The Scientific Monthly an article appeared under the title "The Measurement of Mental Systems (Can Intelligence Be Measured?)." The authors, W. Allison Davis and Robert J. Havighurst, were well-known liberal educators. Davis had written a book about the plight of young Negroes in the South, called Children of Bondage, and was one of the first black professors ever to get a job at a white university; Havighurst had recently worked as a consultant to the president of Harvard, James Bryant Conant. The authors believed that intelligence tests were a fraud -- a way of wrapping the fortunate children of the middle and upper-middle classes in a mantle of scientifically demonstrated superiority. The article considerably troubled Henry Chauncey, the head of the fledgling Educational Testing Service.
Even in 1948 the debate over IQ tests, which were not yet fifty years old, had an eternal quality. The leading intelligence-testing researchers tended to be true believers who thought they had found a way to measure the one essential human ability -- what the British psychometrician Charles Spearman, in a famous 1904 article, called "the general factor," or g. Most psychometricians considered intelligence to be a substantially inherited, biologically grounded trait.
The overall results of intelligence tests have always produced a kind of photograph of the existing class structure, in which the better-off economic and ethnic groups are found to be more intelligent and the worse-off are found to be less so. In his book analyzing the results of the intelligence tests that the Army had given recruits during the First World War, for example, Carl Brigham, an early psychometrician and the father of ETS's leading test, the Scholastic Aptitude Test, reported that the highest-scoring identifiable group was Princeton students -- this at a time when, by today's standards, Princeton was a den of carousing rich boys.
One of the main findings of American intelligence testing in Brigham's day was that immigrants were less intelligent than native-born Americans, with the most recent crop of immigrants being the least intelligent ever. The highest-scoring group today is Jews, but in 1923, when most American Jews were recent immigrants, Brigham reported that "our figures ... would rather tend to disprove the popular belief that the Jew is highly intelligent...." Aggregate IQ-test results have always been held up as proof of the innate mental superiority of some ethnic groups over others; among the g men, eugenicist policies like stringent immigration restrictions and measures to discourage reproduction among people with low IQ scores have long found advocates. A taste of this view, in an especially ill-tempered form, can be had from a letter that Ben Wood, another founding father of testing and by then a crotchety old man, wrote in 1972 to someone at the Educational Testing Service:
It may be said in all soberness that professional "reliefers" and most other indigents who produce children thereby commit crimes against humanity which are fully as serious as many acts now considered felonies. They have no moral right to produce such children, and therefore should have no legal right to immunity from punishment that constructively "fits the crime," such as some form of painless sterilization of both guilty parents, which would be permanent....
By the 1920s, not long after the introduction of IQ tests, a sweeping liberal critique of them had appeared. Its three main tenets were: first, that the tests were measuring cultural conditioning rather than a biological trait; second, that there was no such thing as g -- a single human ability more important than all others -- but instead a group of human abilities; and third, that IQ tests could be misused to classify millions of young people as mentally inferior and so to deny them opportunity. To liberals, IQ tests mostly measured the taker's education and language fluency, which explained why poor people and immigrants tended to score so low.
Not all critics of the idea that intelligence is inherited and immutable were psychometrically illiterate save-the-world types. As Stephen Jay Gould pointed out in The Mismeasure of Man, several of the fathers of intelligence testing, including Lewis Terman, H. H. Goddard, Carl Brigham, and even Charles Spearman, backed away from their g enthusiasm over time. Brigham, especially, became a vituperative critic of g. From the standpoint of ETS, perhaps the most important of the g critics was Louis Leon Thurstone, of the University of Chicago. A former assistant to Thomas Edison and a founder of the Psychometric Society and the journal Psychometrika, Thurstone was so austerely dedicated to his work that he had a blackboard mounted in his house so that he could conduct seminars. Thurstone believed that human intelligence consisted of several distinct factors rather than any single one, and he helped to develop a technique called factor analysis that would enable this truth to be expressed statistically.
But at the time of the founding of ETS, in 1948, the IQ debate was dormant. IQ testing had been a major public issue in the early 1920s and would later become one again, but for some time there had been no great controversy about it. Congress passed laws severely restricting immigration in 1921 and 1924, with the result that people stopped worrying that the country was being flooded by the mentally inferior. During the Depression perhaps a third of all Americans were poor -- it was hard to blame so much hardship on feeblemindedness. And Adolf Hitler's embrace of eugenics put the movement in a bad light.
Because IQ was not an issue in the late 1940s, a curious situation obtained: the father of the SAT, Brigham, was on record as believing that there was no such thing as general intelligence, but the main promoters of the wide use of the SAT regarded it rather casually as an intelligence test. In 1937 Henry Chauncey, then an assistant dean at Harvard engaged in persuading the Ivy League schools to adopt the SAT as a scholarship test, made the following notes:
1) Much less in the way of factual or formal learning is now felt to be necessary for the successful carrying on of college work than was expected a generation ago.
2) Possibly more intellectual powers -- ability to handle problems in a chosen field of study -- are required today than formerly.
3) Intelligence tests are an aid to admissions practically unknown a generation ago.
As Chauncey remembers it now, during the 1940s Conant, who had long since approved the SAT for use in Harvard's national scholarship program, used to say he suspected that the SAT was actually an achievement test. Chauncey would argue back that it measured a combination of innate and learned qualities. It was clear to Chauncey that the more Conant thought that the SAT measured native intelligence, the happier he would be with it.
UNCOMFORTABLY SIMILAR TESTS
But now here was the article in The Scientific Monthly, attacking intelligence tests for measuring only "a very narrow range of mental activities" and for being "a strong cultural handicap for pupils of the lower socioeconomic groups." Without mentioning the SAT specifically, the article treated academic-aptitude and IQ tests as closely related -- which, in fact, they were and still are. IQ tests have always heavily stressed reading comprehension and vocabulary items like analogies and antonyms, and so does the verbal section of the SAT. Back in the early days Carl Brigham published a scale for converting intelligence-test scores to SAT scores. Paul Diederich, a contemporary of Henry Chauncey's who was for decades a researcher at ETS, expresses a view common in the field: "IQ tests are reading comprehension and vocabulary doctored up to look like reasoning. To change the SAT to an IQ you'd simply divide the score by an age measure. Basically they're the same thing."
The similarity between IQ and academic-aptitude tests can be explained in part through the testing concept called validation. Even if a test produces a reliable score -- that is, one that changes little between administrations -- it isn't much good unless it can be shown to predict something. Validation is a comparison of the test score with some other outcome. IQ testers have always thought that IQ scores are a good predictor of overall individual success and achievement, but historically it has been difficult to validate IQ convincingly against much besides performance in school. (To make a very long story short, IQ scores tend to be somewhat predictive of job performance but much more predictive of school performance.)
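Reliability and validation, the two ideas in the paragraph above, are conventionally summarized as correlation coefficients. The Python sketch below illustrates that convention under stated assumptions; it is not ETS's procedure or any particular tester's method. The helper `pearson_r` and every score in it are hypothetical, invented for the example. Reliability is approximated as the correlation between two sittings of the same test, and validity as the correlation between the first sitting and a later outcome such as grades.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two spread terms (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical scores for five imaginary test takers; invented for illustration only.
first_sitting = [450, 520, 610, 580, 700]
second_sitting = [470, 510, 600, 590, 690]
later_grades = [2.4, 3.1, 3.0, 3.5, 3.6]  # assumed outside outcome, e.g. college grades

# Reliability: does the test agree with itself across administrations?
reliability = pearson_r(first_sitting, second_sitting)
# Validity: does the test agree with something outside itself?
validity = pearson_r(first_sitting, later_grades)

print(f"reliability r = {reliability:.2f}, validity r = {validity:.2f}")
```

With these made-up numbers the test agrees with itself (about 0.99) more closely than it agrees with the outside outcome (about 0.83): a score can be highly consistent and still only moderately predictive.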
By definition, of course, an academic-aptitude test like the SAT is validated against grades in school. To the extent that IQ testers know that their work will be validated against school performance, they construct their tests to predict school performance as accurately as possible. The SAT is constructed with prediction of school performance as the sole goal. All tests validated by school grades will necessarily be quite similar.