The Bias Question

In a surprising challenge to the SAT's reputation as an unbiased measure of student learning, one researcher has argued that blacks do better than matched-ability whites on the harder questions of the SAT, something he believes their scores should reflect.

Freedle's attack on the SAT came at a time when ETS and the College Board thought the scholarly debate over the fairness of the test had been settled. The gap between blacks' and whites' performance on the SAT was clear. The blame, they thought, should be placed not on the test but on differences in family income and culture—and on K-12 school policies infected by lingering racism. Many middle-class black and Hispanic families were new to affluence and higher education, and on average, some researchers argued, were not quite as middle-class as their white neighbors. That meant that their children were still at a disadvantage in an academic setting, and only more time and better schooling would close the gap. Furthermore, high schools appeared to be putting minority students in less challenging classes out of a misplaced concern, fed by old stereotypes, that the students would not be able to handle the demands of honors and Advanced Placement or International Baccalaureate courses.

"It is not bias in the tests," Wayne Camara, the vice-president of research and development for the College Board, said in an interview with me this spring. "It is the differences in the opportunities the students have to get a quality education, the kinds of support they have in school and in the community and in the home." The College Board's president, Gaston Caperton, agrees. "If this is a bad test, I wouldn't have taken this job. Wayne wouldn't have come to work here. No college president would have used these tests, which they have for years and years, if it were a bad test. None of us would be part of it."

Freedle more or less accepted the view that the SAT was a useful measuring tool, but he believed the test had a flaw that could be corrected. He first became interested in the SAT for the same reason many people do. He saw himself as one of those whose lives were changed by higher education. His father was a tool-and-die maker. His mother was a waitress. He grew up in the Chicago area and majored in psychology and biology at Roosevelt University while working in the mailrooms of Marshall Field's and the Universal Atlas Cement Company. At Columbia, where he earned his doctorate in experimental psychology in 1964, he supported himself with typing jobs and work in the architecture-department office. He was the first person in his family to go to college, let alone graduate school.

At Columbia he became interested in how the structure of language in passages heard or read influenced thought and perception. He worked briefly at a Washington research firm after getting his doctorate. Then, in 1967, the eminent cognitive psychologist John B. Carroll lured him to ETS. Freedle enjoyed years of pure research on short-term memory, but when grants for such work became hard to get, he was happy to try his hand at more-practical projects.

He began to analyze questions on the Test of English as a Foreign Language, an exam for students from abroad who want to qualify for places at American universities. He found that by analyzing various linguistic aspects of the questions—word order, word placement, interrogative style—he could predict which ones test takers in Seoul or Shanghai or Sarajevo would find easy and which would make them chew their pencils and look at the clock.

Freedle identified seven factors that seemed to affect the difficulty of a test question. One of the most common was simple word repetition. If an answer had a word in it that also appeared in the question, more test takers chose it; thus a question was easier if it contained a word that recurred in the correct answer, and harder if it contained a word that recurred in an incorrect answer. In reading-comprehension questions another factor was where in a reading passage the key part appeared. If it was at the beginning or the end of the passage, the question was easier than if it was in the middle.

Then he decided he would apply his analysis to ETS's biggest test. He realized that if he looked at how different ethnic groups reacted to the seven factors, he might help to improve the SAT and impress his supervisors, who indicated that rooting out bias in the test was a priority.

Freedle was excited by a new technique called differential item functioning, or DIF (pronounced diff). To see which questions on the verbal section of the SAT produced different results by race, test takers were first divided into groups by score: those who had scored 200, the lowest possible; those who had scored 210; those who had scored 220; and on up to 800, the top score. Each of those scoring groups was examined to see how people of different ethnicities had done on each item of the test.
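
For readers who want to see the matching step concretely, the comparison can be sketched in a few lines of Python. This is a simplified illustration, not ETS's actual procedure: the record layout, the function name matched_score_differences, the group labels "ref" and "focal," and the sample data are all hypothetical, and a real analysis would add a formal statistical test on top of the raw differences.

```python
from collections import defaultdict


def matched_score_differences(records, item, reference, focal):
    """For one test item, compare two groups within each total-score level.

    records: iterable of dicts with keys
        "score"     - the test taker's total score (e.g. 200, 210, ..., 800)
        "group"     - a group label (hypothetical labels, chosen by the caller)
        "responses" - dict mapping item name to True (correct) or False
    Returns {score: focal_proportion_correct - reference_proportion_correct}
    for every score level where both groups are present.
    """
    # tallies[score][group] = [number correct, number of test takers]
    tallies = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for record in records:
        cell = tallies[record["score"]][record["group"]]
        if record["responses"][item]:
            cell[0] += 1
        cell[1] += 1

    differences = {}
    for score in sorted(tallies):
        by_group = tallies[score]
        # Only compare at score levels where both groups have test takers.
        if reference in by_group and focal in by_group:
            ref_correct, ref_total = by_group[reference]
            focal_correct, focal_total = by_group[focal]
            differences[score] = (focal_correct / focal_total
                                  - ref_correct / ref_total)
    return differences


# Invented sample data: four test takers at 400, one at 500.
records = [
    {"score": 400, "group": "ref", "responses": {"easy": True, "hard": False}},
    {"score": 400, "group": "ref", "responses": {"easy": True, "hard": False}},
    {"score": 400, "group": "focal", "responses": {"easy": True, "hard": True}},
    {"score": 400, "group": "focal", "responses": {"easy": False, "hard": True}},
    {"score": 500, "group": "ref", "responses": {"easy": True, "hard": True}},
]

easy_diff = matched_score_differences(records, "easy", "ref", "focal")
hard_diff = matched_score_differences(records, "hard", "ref", "focal")
```

A negative value means the reference group did better on that item among test takers with the same total score; a positive value means the focal group did. The 500 level is skipped in this invented sample because only one group is present there.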

Using DIF, Freedle began to notice some intriguing results, which turned out to have more to do with word choice than with the seven difficulty factors. At each level of ability, but particularly in the lower-scoring groups, white students on average did better than blacks on the easier items, whereas blacks on average did better than whites on the harder ones. (Whites, though, did better overall.)

That was not a result that Freedle's supervisors expected. In 1987 he handed in a draft of a report on the subject that he had done with Irene Kostin, an ETS colleague. The research-division chiefs asked for a revision. He handed in a second draft. They were still not satisfied. Some of the questions were legitimate, Freedle thought. His conclusions contradicted other research. The chiefs wanted him to look at the data from other angles. But each re-examination confirmed the initial results. By the time he was ordered to do an eleventh revision, Freedle had begun to wonder if ETS, in its scholarly way, was trying to discourage him from pursuing his rogue conclusion.

