The Bias Question

In a surprising challenge to the SAT's reputation as an unbiased measure of student learning, one researcher has argued that blacks do better than matched-ability whites on the harder questions of the SAT—something he believes their scores should reflect.

Although Freedle's analysis concentrated on the verbal section of the SAT, he argued that the difficulty-bias effect extended to the math section, and perhaps even to essay questions such as those found on AP exams. He cited research showing minority students doing better than non-Hispanic whites on harder math items, which he attributed to the fact that those items used more textbooklike language and "more abstract concepts learned strictly in the classroom." Minority students scored worse on the easier math items, just as they did on easier verbal items, because commoner words were used in those questions. Freedle said that an examination of essay test results showed a somewhat similar effect, whereby minority students scored better on harder topics than they did on easier ones.

He believed that a supplemental SAT score was the best solution. He emphasized that he was not out to destroy the most important product of ETS and the College Board. He was not proposing a completely new SAT. He liked the adjustments the College Board was making to the test, including the thirty-minute essay section that would be added in 2005. He wanted his R-SAT scores to be sent to colleges as a bonus, to help them identify students, mostly lower-income students of all races, whose SAT scores suffered because of the distance between the language of their families and neighborhoods and that of middle-class America.

He did not try to estimate how many students would benefit from this additional information, but he thought it would be enough to make the supplement worthwhile. Freedle found one African-American student (Freedle's data gave no names) whose verbal R-SAT score was 600 although his or her original verbal SAT score was only 290. "This student's gain score is 310 points—an astonishingly large reassessment of his/her scholastic skills," Freedle wrote. Potentially thousands of students would score 100 to 200 points higher on the R-SAT than on the SAT; that higher score could mean the difference between getting into a selective college and not. Such increases in their scores might also make them eligible for thousands of dollars in scholarships.

Freedle confessed that he did not have enough data to analyze the current form of the SAT, which differs in some ways from the SAT he had analyzed while still working at ETS. But he saw no reason why an R-SAT score wouldn't still benefit some minority students. Nor did he think that the SAT changes to take effect in 2005, including the removal of the analogies, would be enough to eliminate the need for an R-SAT, since the gap between easy words and hard words—or the gap between what people learn at home and what they learn at school—affected other parts of the test. He hoped that ETS would give his analysis the follow-up it deserved: a rigorous testing of its validity and predictive value. "The expense is truly minimal, the moral obligation maximal," he concluded.

When Drew Gitomer read the Review article, he felt not only that Freedle's conclusion was wrong but that his analysis was nonsense. It was based on snippets of old data and seemed to put great weight on correct answers that could be explained as random guesses. He asked some of his staff members to look at it, and then thought about what he should do. He helped the College Board post a quick Web-site response, which criticized Freedle's paper as "flawed" and "misleading," and organized his staff to produce something longer and more complete, but still he worried about Freedle's use of SAT data, which he thought might be College Board property.

Gitomer, like Freedle, was a cognitive psychologist committed to the values of science. He had arrived at ETS in 1985 with a keen interest in finding better ways of training people for complex tasks and in creating alternatives to fill-in-the-box assessment tools like the SAT. Five years later, not yet thirty-five, he won the company's annual Scientist Award. But in 1999 Gitomer agreed to be the top administrator of the research division. So when Freedle's article appeared, the job of damage control fell to him. The ETS executive had promised himself that he would not get angry on the phone. He would not discuss the many large holes he saw in the article. He just wanted to be able to answer the questions he anticipated from the College Board: Had Freedle used some of its data that he was not authorized to have? Were more such articles on the way?

The pleasantries did not take long. Gitomer got to the point. "I have read your paper in the Harvard Educational Review," he said. "Where did you get the data?"

"That's the wrong question," Freedle replied. He wanted Gitomer to see the benefits of taking his work seriously. "You should view this, and the College Board should view this, as a positive development, and I really mean it. I have solved a significant problem for you."

Gitomer could see that he was not going to get an answer, but he tried again. "Well, let me ask you, then," he said. "How else did you get the data?"

"Again, that is the wrong question," Freedle said. "I think you must use this as a positive thing." He was not going to talk about people who might have shared information with him, and Gitomer was not interested in pushing him further. He asked if Freedle had any more articles in the works. Freedle said no. Relieved, Gitomer said good-bye and hung up.

Gitomer wrote a note to himself on the top of the first page of Freedle's article: "Call Atkinson." This was a reference to Richard C. Atkinson, the president of the University of California system. Atkinson had become the fulcrum of the SAT debate. Like Freedle and Gitomer, he was a cognitive psychologist who understood how the SAT was constructed. Parts of the test he did not like. It had too many psychometric tricks (in his view, the analogies were the worst) that forced college applicants to take expensive SAT-prep courses in order to decipher what Atkinson thought were irrelevancies—for example, is "entomology to insects" more like "agriculture to cows" or "pedagogy to education"? So, wielding his power, the U.C. president had persuaded the College Board to junk the analogies, add more-advanced math, and create the writing section for the 2005 version. He did this as a way of making the test fairer for all students. And Gitomer knew that he was likely to be intrigued by any suggestion that certain questions put minority students at a disadvantage.

Freedle realized the same thing, and decided to call Atkinson himself. His call was returned by Patrick S. Hayashi, the associate president of the U.C. system. After a friendly chat Hayashi suggested that Freedle talk with Saul Geiser, the director of research in Atkinson's office. Geiser told Freedle that U.C. planned to do just the kind of serious analysis of the latest data that Freedle had hoped ETS and the College Board would do.

As it happened, Gitomer did not call Atkinson, but a few days later he spoke to Mark Wilson, a Berkeley psychometrician who had been asked by U.C. to analyze Freedle's report. Gitomer told Wilson that he welcomed the U.C. study, because Freedle had raised serious issues—"no matter how scientifically bankrupt I believe they are." Wayne Camara, of the College Board, said that he, too, was happy to cooperate. However, the College Board's Web-site response to Freedle's article seemed to imply that any study of Freedle's results would be largely a waste of time:

Let us look briefly at the data for the so-called SAT-R Section that Freedle recommends. On the difficult items that are included in the SAT-R, African-American candidates receive an average score of 22 percent out of a perfect score of 100 percent. Since there are five answer options for each question, 22 percent is only slightly above what would be expected from random guessing, namely 20 percent. White candidates do somewhat better, achieving an average score of 31 percent. The results indicate that this test is too hard for either group and would be a frustrating experience for most students. There are simply too many questions that are geared to those with a much higher level of knowledge and skill than is required of college freshmen. Extending Freedle's argument, we could substantially reduce all group differences if the test were made significantly more difficult so that all examinees would have to guess the answers to nearly all of the questions. We could then predict that each subgroup would have to have an average of 20 percent of their answers correct, based on chance.

Freedle's response to this, in a draft memo that he never sent to the College Board, was "Shame on you." He said he had done a statistical analysis of the five choices in the questions studied and found that the students' picks did not seem to be random at all; taking his study further would not be adding error to error. Some independent experts dismissed the Freedle piece. "I was unimpressed," says Robert Linn, a University of Colorado professor and a former president of the American Educational Research Association. "I don't think there is much there." But some found it valuable. Robert Calfee, a retired Stanford cognitive psychologist who now serves as the dean of the education school at U.C. Riverside, said he found Freedle's article "very convincing and to me very understandable." His analysis of the effect of different language cultures on test results, Calfee said, seemed to mirror other research on the powerful differences between formal language learned at school and informal (often called natural) language learned at home. Michael T. Brown, a professor of education at U.C. Santa Barbara, called the article "a competently performed work, thought-provoking, and sensitive with respect to the issues of equity."

So while the University of California pursues its study, Freedle hopes that he is finally getting someplace after having his most provocative work stuffed into ETS file drawers. Nearly all those involved—ETS and College Board officials, University of California researchers, high school guidance counselors and admissions officers from those schools that would be affected by a change in the SAT—are, like Freedle, practical people with a seemingly distant but still compelling goal. They want to remove barriers that limit young people's choices in life. All of them, Freedle included, acknowledge that many other things, more difficult than devising a scoring supplement to a multiple-choice test, will have to be done to make that happen.
