A school superintendent allowing his staff to doctor students' answers on a set of high-stakes standardized exams has something in common with a corporate CEO holding a bundle of stock options who practices "earnings management" via bogus asset sales. Each is responding to an intense incentive system by faking success rather than producing it.
One could formulate this as a general principle: any incentive to create a result also creates an incentive to simulate the same result. The corollary is obvious: the greater the incentive, the greater the temptation. Or, as W. C. Fields put it in You Can't Cheat an Honest Man, "If a thing is worth winning, it's worth cheating for." Borrowing Fields's real name, I propose to call this generalization Dukenfield's Law of Incentive Management. Designers of control systems ignore Dukenfield's Law at their peril, and ours.
A second corollary follows directly from the first: holding the level of audit effort constant and other things equal, the reliability of a measure will decline as the importance attached to it grows. To put the same thing another way: to maintain a given level of reliability, the resources invested in verifying any performance measure need to rise roughly in proportion to the stakes involved.
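The proportionality corollary can be made concrete with a toy deterrence model. All of the functional forms and numbers below are my own illustrative assumptions, not anything in the argument itself: suppose detection probability rises with audit spending but with diminishing returns, and a would-be cheater is deterred only when the expected penalty outweighs the expected gain.

```python
def required_audit_budget(stakes, penalty, k=1.0):
    """Minimum audit budget that deters cheating, under two assumptions:

    1. Detection probability rises with audit budget B with diminishing
       returns: d = B / (B + k), where k is a cost-of-detection constant.
    2. A cheater is deterred when expected penalty >= expected gain:
       penalty * d >= stakes * (1 - d).

    Solving the two together gives B = k * stakes / penalty --
    linear in the stakes, as the corollary predicts.
    """
    return k * stakes / penalty

# Doubling the stakes doubles the audit budget needed to deter cheating.
assert required_audit_budget(200, 100) == 2 * required_audit_budget(100, 100)
```

The particular curve chosen doesn't matter much; the point the sketch illustrates is structural: whatever the detection technology, bigger prizes require proportionally bigger verification budgets to keep cheating unattractive.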
Yet audit and other counter-simulation systems are typically treated as afterthoughts in the design of incentive management systems. The school accountability movement is a good example here. There are many ways of cheating on standardized tests other than doctoring the answer keys or even using questions from the test in class exercises. Simulation strategies come in a wide range of subtleties, and no doubt all of them are being used.
Unless we're literally training children to answer examinations, all school tests are merely proxies for things we really care about. It isn't hard to find ways of producing proxy results instead of real ones, for example by drilling students in four-term verbal analogies [Apple is to pear as catfish is to: 1) cat 2) salmon 3) fish 4) seafood 5) none of the above.] The ability to solve such puzzles quickly (and not too quirkily) isn't a bad proxy measure for a certain kind of reasoning and interpretive skill, but it's hardly valuable enough to rate the hour a week it took out of my 11th-grade English class. The goal back then was to fool the SAT test to get students into good colleges, rather than fooling the state to get raises for teachers, but the strategy was the same.
Test results at the level of the school can also be influenced by managing the population tested; if all the worst students transfer to other schools, the average score will surely go up. For better or worse, expulsion has been made difficult, but there are other ways - incentive-based ways, many of them - to induce the weaker players to leave the game.
At the other extreme of subtlety, dropping art and music (or even reducing hours spent on science and history) and substituting more hours of reading can be thought of either as cheating (by degrading a set of valued characteristics that the tests don't happen to measure) or as simply responding as intended to an incentive system designed to produce literacy at virtually all costs. In such cases, discussion of the simulation risks and counter-simulation strategies will require a discussion of just what it is that the incentive system is trying to produce, and therefore what it is that the tests are intended to measure.
Testing, and counter-simulation measures to back up testing, are, of course, overhead costs of education, as opposed to direct instructional costs. Many of those most enthusiastic about testing as a management tool in public education also advocate reducing spending on overhead items, and in particular central administration, to concentrate resources on direct instruction. Those two positions may not be in contradiction—there are, obviously, other categories of overhead—but they certainly are in tension.
My point is not that high-stakes standardized testing for educational management is good or bad, but rather that any discussion of that issue without a parallel discussion of simulation strategies and counter-simulation strategies is hopelessly inadequate. At minimum, each proposed measure has to pass the benefit-cost test of being worth more than the resources required to create and maintain a monitoring system good enough to keep result-simulation down to an acceptable level.
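That benefit-cost test can be sketched as a simple screen. The figures and the assumption that monitoring cost scales with the stakes are hypothetical, chosen only to show how a measure that passes at low stakes can fail at high ones:

```python
def net_value(measure_value, stakes, monitoring_cost_per_stake=0.25):
    """Net value of adopting a performance measure: its management value
    minus the cost of monitoring needed to keep result-simulation at an
    acceptable level. Monitoring cost is assumed proportional to the
    stakes attached to the measure (all parameters hypothetical)."""
    monitoring_cost = monitoring_cost_per_stake * stakes
    return measure_value - monitoring_cost

# The same measure, worth 50, passes the screen at stakes of 100
# but fails once the stakes attached to it triple.
assert net_value(50, 100) > 0
assert net_value(50, 300) < 0
```

The design implication is that raising the stakes on an existing measure is never free: either the monitoring budget rises with them, or the measure's net value quietly goes negative.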
Criminalizing any form of cheating can increase the investigative resources available to detect it as well as the possible losses to cheaters if detected. But note that criminalization is a two-edged sword: once someone has started to cheat, the penalty for getting caught is as much a part of the incentive for concealment as the reward of successful imposture.
As a result, the rule of proportionality between monitoring resources and the benefits of cheating applies in the criminal context as well as the civil one. Applications of this principle are sometimes easier to spot at a distance than they are up close.
For example, the absurdity of assigning Mexican policemen paid 2000 pesos a month to catch criminals moving billions of dollars' worth of drugs a year is obvious to the typical U.S. Congressman. But that same Congressman sees no problem in assigning an SEC lawyer, FBI agent, or Assistant United States Attorney paid $70,000 a year to catch corporate malefactors who can walk away with $100 million from a few years' successful practice of creative bookkeeping. The sources of weakness aren't identical - to my knowledge, there's no evidence of significant corruption in Federal law enforcement - but the principle that a monitoring system has to fight in its weight class remains valid.
People and organizations respond to incentives: imperfectly, it's true, but still they respond. That makes incentive management central, rather than peripheral, to all policy and management problems. But since all incentive systems generate results-simulation, and more powerful incentive systems generate results-simulation more powerfully, counter-simulation strategy should be central, rather than peripheral, to incentive-system design. In general, we should expect the costs of monitoring to rise along with the stakes created by the incentive system.
It turns out that the maxim "If you can't measure it, you can't manage it" expresses only half the truth. To manage, one must be able not only to measure, but to measure in the face of active impression management among those measured. Any proposal for an incentive system without explicit consideration of the simulation problem and how to deal with it should be presumed non-serious. If it's true that management without accountability is just cheerleading, it's also true that creating big incentives without preventing results-simulation is just asking to be cheated.
"Trust your fellow man," says the old adage, "but always cut the cards."