The more important historically of the two trials that showed significant benefits is known as the HIP trial, after the Health Insurance Plan of Greater New York, the health-maintenance organization from which the researchers gathered participants. A classic of medical research, the HIP trial enrolled 60,696 patients from 1963 to 1966 and followed them for as long as eighteen years. In 1988 two of its principal investigators, Sam Shapiro and Philip Strax, won the Charles F. Kettering Prize for outstanding cancer research, both because the trial was one of the earliest large-scale attempts to test a preventive health measure and because it showed that periodic mammography reduced breast-cancer mortality by 30 percent over a ten-year period.
Notwithstanding the accolades awarded to this pioneering effort, it suffered from methodological weaknesses. In clinical trials, as I have said, the subjects should be randomly distributed between the test group (in this case women who received annual mammograms and clinical breast examinations) and the control group (women who received ordinary medical care from their own doctors). Experimenters always fear that they may introduce bias by inadvertently including or excluding the wrong people. The HIP trial demonstrates why they worry.
In this experiment the researchers assigned women to the test and control groups alternately in order of enlisting, a process intended to produce groups of equal size and composition. After assigning a woman to the test group the researchers made two simple determinations: Was she pregnant? Had she had breast cancer? If the answer to either of these was yes, Shapiro and Strax made the woman ineligible for the test group. They did not, of course, want to x-ray pregnant women. Nor did they think it a good idea to include women with breast cancer, because administering mammograms to women who had already been treated for the disease would not be a proper test of mammography for the purposes of the study. The test group of 30,131 women thus wound up with 434 fewer subjects than the control group of 30,565 women.
Shapiro and Strax gave the women in the test group annual mammograms and breast examinations for as long as five years. They then counted up the number in each group who died of breast cancer within ten years and compared the two figures. Because 147 died in the test group and 193 died in the control group, Shapiro and Strax concluded that mammography was beneficial. But was it? The researchers themselves attributed the "higher proportion" of the benefit, at least in the first five years, to clinical breast examination—that is, palpation. In order to be certain that the remaining benefit was due to mammography, one would have to confirm that excluding from the test group the 434 women who either were pregnant or had had breast cancer did not skew the results.
The ages of the women in the HIP study ranged from forty to sixty-four. Because few women in this age range are pregnant, most of the subjects excluded from the test group were surely dropped for having had breast cancer. In order to guarantee that the test and control groups were identical, the investigators would also have had to identify and exclude all the women in the control group who had breast cancer when the experiment began, by asking the same two questions they had asked women assigned to the test group. But they did not do this—they simply filed the names of the women in the control group, without performing any initial evaluation. (This was not unusual for control groups of the time.) As the trial progressed, the team leaders later wrote, women in the control group who "were identified through other sources as having had breast cancer diagnosed before their entry dates . . . were dropped from the investigation." In other words, the researchers tracked them by looking for their names in medical records, insurance claims, and death records; if they found evidence that breast cancer had been diagnosed in a member of the control group prior to 1963, when the study began, the scientists retroactively dropped her from the study. This meant that the researchers sometimes had to ascertain the time of diagnosis by finding very old records in scattered hospitals or asking family members about events many years in the past.
For my own research I have attempted to document medical histories, and can report that it is no easy task. Records disappear; memories are faulty; people move. More than half the subjects in the HIP study left that health-maintenance organization within fifteen years. (In one trial I myself inadvertently transmitted incorrect data to its statistical administrators: I reported that a deceased patient was alive, because a busy doctor's office taking part in the trial was unaware of the death. When this omission was discovered, I asked the National Cancer Institute to examine the body of research data I had supervised, and I was cleared of any wrongdoing.) In my experience people typically underestimate the time that has elapsed since a trip to the hospital. Informed of the real date, they say, "That long ago? I can't believe that much time has gone by!" In those instances when Shapiro and Strax relied on the memories of patients and relatives, they almost certainly retained in the control group some cancer cases that had actually been diagnosed before the trial began—inadvertently stacking the deck in favor of mammography.
A few such slips would be enough to throw off the entire experiment. Indeed, if it turned out that Shapiro and Strax had ascribed a mistakenly late date of diagnosis to as few as twenty-five women in the control group, the failure to exclude them, too, would have changed the study's conclusions. Correcting for it would cause the benefit of mammography to lose statistical significance—the touchstone of medical research. Twenty-five errors is in this case not a big number; it is equivalent to about six percent of the 434 patients excluded from the test group. To be sure, the trial may have been error-free; although they had not initially excluded from the control group women who had had breast cancer, Shapiro and Strax wrote that "confidence is warranted" that by the trial's end they had identified prior breast-cancer cases equally well in both groups. Still, the vulnerability of the conclusions to such a small error is troubling.
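The arithmetic behind this vulnerability can be sketched with a simple two-proportion z-test on the reported death counts. This is a back-of-the-envelope check, not the trial's actual statistical method, and the twenty-five-case scenario is hypothetical:

```python
from math import sqrt

def two_prop_z(deaths_a, n_a, deaths_b, n_b):
    """z-statistic for the difference between two death rates."""
    p_a, p_b = deaths_a / n_a, deaths_b / n_b
    pooled = (deaths_a + deaths_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# As reported: 147 deaths among 30,131 (test) vs. 193 among 30,565 (control)
z_reported = two_prop_z(147, 30131, 193, 30565)

# If 25 control-group deaths belonged to women whose cancer actually
# predated the trial, those women (and their deaths) should be dropped:
z_corrected = two_prop_z(147, 30131, 193 - 25, 30565 - 25)
```

At the conventional 5 percent level (z above roughly 1.96), the reported figures clear the bar and the corrected ones do not, which is the sense in which twenty-five errors could overturn the study's conclusion.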
No such methodological worries seem to afflict the second of the trials to show a statistically significant advantage from mammography: the Kopparberg study, named after the county in central Sweden where it took place. Beginning in 1977 the Kopparberg trial offered mammography to a test group of 38,562 women and ordinary medical care to a control group of 18,478 women. At the same time, the researchers performed a second big trial in another Swedish county. The test group in Kopparberg experienced a statistically significant reduction in breast-cancer mortality of 40 percent; the mortality reduction in the test group in the other county was statistically insignificant.
Strangely, though, as the researchers acknowledged, the mortality from all causes in the two test groups was "identical" to that in the control groups. In 1988 the late Petr Skrabanek, of Trinity College, Dublin, pointed out in the British Medical Journal, "Not a single life was 'saved' in a trial that included over 130,000 women" in both counties. The women who underwent mammography may have died less frequently of cancer, but the gain was offset by deaths from other causes, such as heart attack. Presumably the "extra" deaths reflect the workings of chance. But it is awkward to argue that a decrease in cancer deaths in the treatment groups is meaningful while claiming that an equally great increase in deaths from other causes is a fluke.
One of the most recent clinical trials discussed in the Cancer article took place in Canada, where a team of physicians gave annual mammograms to 44,925 women and ordinary medical care to 44,910 women. The researchers enrolled women in the trial from 1980 to 1985 and followed their progress for a minimum of seven years. The subjects were divided by age into two subgroups: those who were forty to forty-nine when they entered the trial, and those who were fifty to fifty-nine. In neither subgroup was there an overall difference in mortality from breast cancer between the treatment and control groups—mammography had no effect.
Mammography supporters immediately dismissed these negative results as obvious signs of faulty equipment, poor training, or flawed experimental technique. Typical was the reaction of Charles R. Smart, then the director of the Division of Cancer Prevention and Control at the National Cancer Institute. Without presenting any supporting evidence, Smart dismissed a preliminary report from the Canadian researchers by writing in the journal Cancer Prevention, in 1990, "The lack of a decrease in mortality [in older women] suggests problems in the quality of the mammography in this trial."
Others took the Canadian trial more seriously. In a series of steps that sowed confusion in many women, the National Cancer Institute and the American Cancer Society reviewed all the available evidence about mammography, especially for younger women. In October of 1993 the NCI reported that mammography provided no certain benefit to women under fifty but some benefit to their elders. The American Cancer Society continued to endorse routine mammography for all women over forty. The dispute sparked by the inability of the Canadian study to find any benefit from routine mammography became bitter and personal. Angry critiques poured into the journals, and the Canadian researchers defended themselves with equal vigor. Contradictory editorials abounded.
The attacks and counterattacks clearly demonstrated how hard it has been to prove unequivocally that mammography has a strong beneficial effect on women's lives. Trying to resolve the controversy, several research teams employed a technique called meta-analysis. Roughly speaking, meta-analysis involves adding together the data from many clinical trials to create a single pool of data big enough to eliminate much of the statistical uncertainty that plagues individual trials. It is accomplished by gathering all available studies and comparing them one at a time with the "null hypothesis"—in this case the hypothesis that mammography has no impact whatever on mortality from breast cancer. If the null hypothesis is true, the series of comparisons should randomly differ from zero; added together, the chance variations will cancel one another out. If the studies consistently find an impact, the comparisons will draw the total away from the null hypothesis and toward the actual effect. The great virtue of meta-analysis is that clear findings can emerge from a group of studies whose findings are scattered all over the map.
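The pooling step can be sketched as a fixed-effect, inverse-variance average of each trial's log relative risk. The first set of counts echoes the HIP figures quoted earlier; the other two are invented, for illustration only:

```python
from math import exp, log

# (deaths_test, n_test, deaths_control, n_control) per trial;
# only the first row resembles real data (the HIP counts above)
trials = [
    (147, 30131, 193, 30565),
    (50, 10000, 65, 10000),
    (30, 8000, 33, 8000),
]

# Fixed-effect pooling: weight each trial's log relative risk
# by the inverse of its approximate variance
num = den = 0.0
for dt, nt, dc, nc in trials:
    rr = (dt / nt) / (dc / nc)                 # this trial's relative risk
    var = 1 / dt - 1 / nt + 1 / dc - 1 / nc    # approx. variance of log(rr)
    weight = 1 / var
    num += weight * log(rr)
    den += weight

pooled_rr = exp(num / den)   # below 1.0 means mammography appears beneficial
```

If the null hypothesis were true, the individual log relative risks would scatter around zero and the weighted average would tend toward 1.0; a consistent effect pulls the pooled figure away from it.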
In January of last year a team led by Karla Kerlikowske, of the University of California at San Francisco, published in the Journal of the American Medical Association the results of a meta-analysis of the eight trials: mammography reduces the seven-to-nine-year mortality from breast cancer in women aged fifty to seventy-four by about 23 percent, but it has no impact on women in their forties. A second meta-analysis of mammography for women in their forties appeared three months later in Cancer (the summary referred to above; its authors included Charles R. Smart, now retired). Including somewhat more recent data from the same trials, with an average follow-up time of 10.4 years, the Cancer article concluded that mammography in fact lowered the rate of mortality from breast cancer in women aged forty to forty-nine by about 16 percent. Indeed, these researchers argued, the true benefit was likely to be greater than that. First, the technology of mammograms is constantly improving. Second, not all the women in the groups scheduled to receive mammograms actually showed up for their examinations. Finally, Smart and his associates presented an argument for eliminating from consideration the results of the Canadian trial, because it had what they regarded as worrisome problems. For instance, almost four times as many advanced cancers were diagnosed in the women who had mammograms as in the women in the control group—a disproportionately high number of very dangerous tumors, which in the critics' view makes the experiment unrepresentative. If data from the Canadian trial were discarded, the researchers calculated, mammography would lower the rate of mortality from breast cancer for women in their forties by 24 percent.
Do the meta-analyses settle the matter? Yes and no. Even if one accepts the highest values from these overviews for the risk reduction associated with mammography—23 percent for women over fifty, 24 percent for women in their forties—the figures do not mean what people think when they read headlines about them. The percentages refer to the relative risk reduction—a statistical measurement calculated by dividing the difference between the risks in the control and test groups by the risk in the control group. For example, if a clinical trial shows that a treatment cuts the risk of dying from a disease from 70 percent in the control group to 50 percent in the test group, the relative risk reduction is 70 minus 50, or 20, divided by 70, which works out to about 29 percent. This percentage sounds large, and it is of great import to medical researchers. But it has little to do with the question of interest to individual women—the absolute difference in risk between those who are screened and those who are not. Using the example I just gave, that difference would be 20 percentage points, smaller than the relative risk reduction suggests. In other words, the figure from the meta-analysis is the answer to the question "Given that I have breast cancer, how much will I have cut my risk of dying if the tumor was detected mammographically?" It is not the answer to the question "If I am a typical woman, how much will I cut my risk of dying from breast cancer by having an annual mammogram?"
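The distinction can be made concrete in a few lines, using the standard epidemiological definition in which the relative reduction is taken against the control-group risk. The second pair of numbers is invented to show how a small baseline shrinks the absolute benefit:

```python
def risk_reductions(risk_control, risk_test):
    """Return (relative, absolute) risk reduction for a trial."""
    absolute = risk_control - risk_test
    relative = absolute / risk_control
    return relative, absolute

# The hypothetical trial in the text: 70 percent vs. 50 percent
rel_big, abs_big = risk_reductions(0.70, 0.50)

# Invented screening-like numbers with a tiny baseline risk: the
# relative reduction is unchanged, but the absolute gain is minute
rel_small, abs_small = risk_reductions(0.0007, 0.0005)
```

Both scenarios show the same relative risk reduction of about 29 percent, but in the second only two women per 10,000 are actually spared—which is why a headline percentage can mislead.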
Unfortunately, in this case the absolute reduction is much smaller than the relative reduction. According to a rough calculation described by Russell Harris and Linda Leininger, of the University of North Carolina at Chapel Hill, in the Annals of Internal Medicine in April of last year, annual mammographic screening for 10,000 women aged fifty to seventy will extend the lives of only two to six of them each year. ("The many must be screened to benefit the few," Harris and Leininger remarked.) For younger women, they argued, the benefit is even more meager: annual screening of 10,000 women in their forties will extend the lives of just one or two a year. As Harris and Leininger observed, "the use of the term 'marginal' to describe this risk reduction seems justified."
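Harris and Leininger's order of magnitude can be reproduced with one line of arithmetic. The baseline mortality figure below is an assumption of mine, roughly in line with published rates for this age group, not a number taken from their paper:

```python
# Assumed annual breast-cancer mortality among women aged 50 to 70:
# on the order of 8 deaths per 10,000 women per year (my assumption)
baseline_deaths_per_10000 = 8.0
relative_risk_reduction = 0.23   # from the JAMA meta-analysis quoted above

lives_extended_per_10000 = baseline_deaths_per_10000 * relative_risk_reduction
# roughly 2 per 10,000 screened women per year, consistent with
# Harris and Leininger's estimate of two to six
```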
Even this small benefit may be more apparent than real. Almost all breast-cancer experts agree that mammography, which can diagnose smaller tumors, picks up some slow-growing cases of cancer that might otherwise never be caught. If these tumors grow sufficiently slowly, they will rarely become dangerous in a patient's lifetime. Discovering them will thus manufacture an apparent excess of "cures." Because we will detect the same number of big, dangerous, fast-growing tumors, the "cures" of slow-growing cancers will appear statistically only a number of years after diagnosis. Even if mammography had no actual effect on mortality, it would still produce a small statistical increase in survival many years down the pike. In a clinical trial the test group, with its frequent examination by mammography, would have a greater number of less-dangerous cancers diagnosed within it than would the control group—a form of length bias that would lead predictably to the modest prophylactic effect observed in the meta-analyses.
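Length bias is easy to demonstrate with a toy simulation. All the numbers here are invented; the point is only that periodic screening over-samples slow-growing tumors even when detection changes no outcome:

```python
import random

random.seed(0)

def simulate(n=20_000, screen_interval=1.0):
    """Count screen-detected tumors by type in a toy model of length bias."""
    caught_slow = caught_fast = 0
    for _ in range(n):
        slow = random.random() < 0.5         # equal incidence of each type
        window = 5.0 if slow else 0.5        # years the tumor is screen-detectable
        onset = random.uniform(0.0, 10.0)    # when that window opens
        # detected iff the next scheduled screen falls inside the window
        time_to_next_screen = screen_interval - (onset % screen_interval)
        if window >= time_to_next_screen:
            if slow:
                caught_slow += 1
            else:
                caught_fast += 1
    return caught_slow, caught_fast

caught_slow, caught_fast = simulate()
```

With these made-up parameters every slow tumor is eventually screen-detected but only about half the fast ones are, so the screened group's detected cancers skew heavily toward the indolent kind—mimicking the modest survival advantage described above without any true mortality benefit.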
Society should ensure that this effect is worth the cost of obtaining it, which includes both the direct cost of mammography itself and the indirect cost of biopsies, laboratory analyses, and the time women must take off from work for checkups. (The emotional costs of the huge number of false positives are substantial too, but cannot be reckoned by this kind of accounting.) Charles Wright and C. Barber Mueller calculated in their Lancet article what they called a "low" estimate for the cost of mammography: $1.2 million for each woman benefited. Two previous cost-benefit analyses, they noted, produced comparable figures.
Now contrast the cost of mammography with that of another widely used cancer-screening technique: the Pap smear for cervical cancer. Named after George Papanicolaou, the physician who developed it in the 1930s, the test is less expensive than mammography, simpler to perform, and far more reliable. Because the Pap smear can detect cervical cancer in its long latent stage, before the cancer invades surrounding tissue, the test is widely believed to reduce the mortality of invasive cervical cancer by 90 percent. According to a study published in 1990 in the Annals of Internal Medicine by David Eddy, of the Duke University Center for Health Policy Research and Education, screening 10,000 women with a Pap smear every three years from their twenties to their seventies would prevent about 200 of them from developing invasive cervical cancer; if each detected cervical cancer translated into an additional ten years of life, the cost to society would be approximately $150,000 per woman benefited. Eddy's calculation cannot be directly compared with that of Wright and Mueller, because it used a different methodology. Nonetheless, it is clear that Pap smears provide much more benefit than mammograms, at a small fraction of the cost.
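As a rough consistency check on the figures quoted from Eddy's analysis (the fifty-year span is my reading of "twenties to seventies," not a parameter from his study):

```python
women_screened = 10_000
cancers_prevented = 200
cost_per_woman_benefited = 150_000   # dollars, Eddy's figure

implied_total_cost = cancers_prevented * cost_per_woman_benefited
years_of_screening = 50              # twenties through seventies (assumed)
cost_per_woman_per_year = implied_total_cost / women_screened / years_of_screening
```

The implied program cost is about $30 million, some $60 per screened woman per year; and per woman benefited, the Pap smear's $150,000 is an eighth of Wright and Mueller's "low" $1.2 million estimate for mammography.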
Is mammography worth it? I would argue, with Wright and Mueller, that "the benefits of mass population screening, even in older women, are too small and the harm and cost generated too great to justify widespread implementation of screening mammography." In fact, the authors suggest that routine mammography should be recommended only for women at high risk of developing breast cancer, such as those whose mothers or sisters developed breast cancer early in life. There is little factual basis for this plausible-sounding suggestion, though.
A similar radical stance was adopted by Michael Baum, the research director of the British Institute of Cancer Research, who quit England's national breast-cancer-screening advisory board last September because nationwide mammography is "not worth doing." Having helped to set up the country's screening program, he was disturbed by claims about the effectiveness of mammography. The London Sunday Times quoted Baum as saying, "There is a political correctness about screening. I took pride in setting up the service, which is as efficient as it can be, but just because you are doing something efficiently, it doesn't mean it is worth doing."
Abandoning widespread mammography in the United States is probably not feasible. After years of effort invested in encouraging mammography, to reverse course would cause widespread confusion and anger. Alarmed by the contradictory recommendations of "experts," women would probably keep having mammograms, just to be safe—and in their shoes I might do the same. Moreover, many physicians believe that routine mammography encourages women to come in for regular checkups, and thus may play an important role in general preventive medicine.
On balance, then, I reluctantly support the status quo. When my patients come in for their mammograms, I do not try to dissuade them. But I tell them that the most optimistic interpretation of the available evidence suggests that routine mammography has only a marginal effect on a woman's chances of surviving breast cancer—and that it may have no effect at all.