Students were better than trained adult observers at evaluating teachers. This wasn’t because they were smarter but because they had months to form an opinion, as opposed to 30 minutes. And there were dozens of them, as opposed to a single principal. Even if one kid had a grudge against a teacher or just blew off the survey, his response alone couldn’t sway the average.
“There are some students, knuckleheads who will just mess the survey up and not take it seriously,” Ferguson says, “but they are very rare.” Students who don’t read the questions might give the same response to every item. But when Ferguson recently examined 199,000 surveys, he found that less than one-half of 1 percent of students did so in the first 10 questions. Kids, he believes, find the questions interesting, so they tend to pay attention. And the “right” answer is not always apparent, so even kids who want to skew the results would not necessarily know how to do it.
Even young children can evaluate their teachers with relative accuracy, to Kane’s surprise. In fact, the only thing that the researchers found to better predict a teacher’s test-score gains was … past test-score gains. But in addition to being loathed by teachers, those data are fickle. A teacher could be ranked as highly effective one year according to students’ test gains and as ineffective the next, partly because of changes in class makeup that have little to do with her own performance—say, getting assigned the school’s two biggest hooligans or meanest mean girls.
Student surveys, on the other hand, are far less volatile. Kids’ answers for a given teacher remained similar, Ferguson found, from class to class and from fall to spring. And more important, the questions led to revelations that test scores did not: Above and beyond academic skills, what was it really like to spend a year in this classroom? Did you work harder in this classroom than you did anywhere else? The answers to these questions matter to a student for years to come, long after she forgets the quadratic equation.
The survey did not ask “Do you like your teacher?” or “Is your teacher nice?” This wasn’t a popularity contest. The survey mostly asked questions about what students saw, day in and day out.
Of the 36 items included in the Gates Foundation study, the five that most correlated with student learning were very straightforward:
1. Students in this class treat the teacher with respect.
2. My classmates behave the way my teacher wants them to.
3. Our class stays busy and doesn’t waste time.
4. In this class, we learn a lot almost every day.
5. In this class, we learn to correct our mistakes.
When Ferguson and Kane shared these five statements at conferences, teachers were surprised. They had typically thought it most important to care about kids, but what mattered more, according to the study, was whether teachers had control over the classroom and made it a challenging place to be. As most of us remember from our own school days, those two conditions did not always coexist: some teachers had high levels of control, but low levels of rigor.
After the initial Gates findings came out, in 2010, Ferguson’s survey gained statistical credibility. By then, the day-to-day work had been taken over by Cambridge Education, a for-profit consulting firm that helped school districts administer and analyze the survey. (Ferguson continues to receive a percentage of the profits from survey work.)
Suddenly, dozens of school districts wanted to try out the survey, either through Cambridge or on their own—partly because of federal incentives to evaluate teachers more rigorously, using multiple metrics. This past school year, Memphis became the first school system in the country to tie survey results to teachers’ annual reviews; surveys counted for 5 percent of a teacher’s evaluation. And that proportion may go up in the future. (Another 35 percent of the evaluation was tied to how much students’ test scores rose or fell, and 40 percent to classroom observations.) At the end of the year, some Memphis teachers were dismissed for low evaluation scores—but less than 2 percent of the faculty.
The New Teacher Project, a national nonprofit based in Brooklyn that recruits and trains new teachers, last school year used student surveys to evaluate 460 of its 1,006 teachers. “The advent of student feedback in teacher evaluations is among the most significant developments for education reform in the last decade,” says Timothy Daly, the organization’s president and a former teacher.
In Pittsburgh, all students took the survey last school year. The teachers union objects to any attempt to use the results in performance reviews, but education officials may do so anyway in the not-too-distant future. In Georgia, principals will consider student survey responses when they evaluate teachers this school year. In Chicago, starting in the fall of 2013, student survey results will count for 10 percent of a teacher’s evaluation.
No one knows whether the survey data will become less reliable as the stakes rise. (Memphis schools are currently studying their surveys to check for such distortions, with results expected later this year.) Kane thinks surveys should count for 20 to 30 percent of a teacher’s evaluation: enough for teachers and principals to take them seriously, but not enough to motivate teachers to pander to students or to cheat by, say, pressuring students to answer in a certain way.
Ferguson, for his part, is torn. He is wary of forcing anything on teachers—but he laments how rarely schools that try the surveys use the results in a systematic way to help teachers improve. On average over the past decade, only a third of teachers even clicked on the link sent to their e-mail inboxes to see the results. Presumably, more would click if the results affected their pay. For now, Ferguson urges schools to conduct the survey multiple times before making it count toward performance reviews.
As it happens, both Kane and Ferguson, like most university professors, are evaluated partly on student surveys. Their students’ opinions factor into salary discussions and promotion reviews, and those opinions are available to anyone enrolled in the schools where they teach. “I think most of my colleagues take it seriously—because the institution does,” Ferguson says. “Your desire not to be embarrassed definitely makes you pay attention.”
Still, Ferguson dreads reading those course evaluations. The scrutiny makes him uncomfortable, he admits, even though it can be helpful. Last year, one student suggested that he use a PowerPoint presentation so that he didn’t waste time writing material on the board. He took the advice, and it worked well. Some opinions, he flat-out ignores. “They say you didn’t talk about something,” he says, “and you know you talked about it 10 times.”
In fact, the best evidence for—and against—student surveys comes from their long history in universities. Decades of research indicate that the surveys are only as valuable as the questions they include, the care with which they are administered—and the professors’ reactions to them. Some studies have shown that students do indeed learn more in classes whose instructors get higher ratings; others have shown that professors inflate grades to get good reviews. So far, grades don’t seem to significantly influence responses to Ferguson’s survey: students who receive A’s rate teachers only about 10 percent higher than D students do, on average.
The most refreshing aspect of Ferguson’s survey might be that the results don’t change dramatically depending on students’ race or income. That is not the case with test data: nationwide, scores reliably rise (to varying degrees) depending on how white and affluent a school is. With surveys, the only effect of income may be the opposite one: Some evidence shows that kids with the most-educated parents give slightly lower scores to their teachers than their classmates do. Students’ expectations seemingly rise along with their family income (a phenomenon also seen in patient surveys in the health-care field). But overall, even in very diverse classes, kids tend to agree about what they see happening day after day.