To understand what the economists are arguing about, you need to understand their methods. When economists compare wages, they must begin by deciding which wages to compare. You wouldn’t prove much, one way or the other, by studying wages of Miami-area stockbrokers before and after the Mariel migration: The Marielitos did not compete with stockbrokers.
Well, who should the Marielitos be compared with?
You want to compare them to similarly situated workers. But that creates two statistical problems.
The first is: How do you define a similarly situated worker?
The second is: Remember, all this happened some time in the past. The only way we know what anybody was making in 1980 is by looking at Department of Labor samples, often collected for other reasons. Suppose we wanted to say something about Miami-area stockbrokers in 1980. The Department of Labor would have an average wage—but that average would be aggregated from a certain number of persons chosen to answer a government questionnaire. If we wanted to know about women stockbrokers, the number of answers would be smaller, and if we wanted to know about women stockbrokers in their 40s, it would be smaller again.
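To see how quickly a sample thins out as the question gets narrower, here is a minimal sketch. Everything in it is invented for illustration: the 2,000 respondents, the list of occupations, and the field names are hypothetical, not Department of Labor data. The only point is that each added filter can leave you with fewer answers to average.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical survey: 2,000 respondents with made-up attributes.
# None of these numbers come from any real government sample.
OCCUPATIONS = ["stockbroker", "clerk", "cook", "driver", "nurse"]
respondents = [
    {
        "occupation": rng.choice(OCCUPATIONS),
        "sex": rng.choice(["F", "M"]),
        "age": rng.randint(18, 64),
    }
    for _ in range(2000)
]

# Each step narrows the previous group, so the count can only fall or stay flat.
brokers = [r for r in respondents if r["occupation"] == "stockbroker"]
women_brokers = [r for r in brokers if r["sex"] == "F"]
women_brokers_40s = [r for r in women_brokers if 40 <= r["age"] <= 49]

labels = ["everyone", "stockbrokers", "women stockbrokers", "women stockbrokers in their 40s"]
counts = [len(respondents), len(brokers), len(women_brokers), len(women_brokers_40s)]

for label, n in zip(labels, counts):
    print(f"{label}: {n} respondents")
```

The exact counts depend on the made-up inputs; what carries over to real survey data is the direction, since a narrower definition never yields more respondents than a broader one.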
When Borjas did his work on the Mariel Cubans, he defined a “similarly situated worker” quite precisely. He counted only men. He counted only native-born workers. And he counted only workers who’d dropped out of high school. That meant he was looking at the wages of only about two dozen people. He tried to compensate by looking at that small control group over three different year periods … but still, a small control group times three remains a small control group.
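The reason a small group times three is still small can be sketched with the textbook formula for the standard error of an average, which shrinks with the square root of the sample size. This is an illustration under stated assumptions, not a reconstruction of anyone's actual calculation: the spread of 0.5 is a hypothetical figure, 24 stands in for "about two dozen," and the formula assumes independent observations.

```python
import math


def standard_error(std_dev: float, n: int) -> float:
    """Standard error of a sample mean, assuming n independent observations."""
    return std_dev / math.sqrt(n)


# Hypothetical spread of wages within the group (an assumed figure).
wage_std_dev = 0.5

# 24 is roughly "two dozen"; 72 is that group pooled over three periods;
# 1000 is an arbitrary large sample included for contrast.
for n in (24, 72, 1000):
    print(f"n = {n:>4}: standard error = {standard_error(wage_std_dev, n):.3f}")

# Tripling the sample shrinks the error by sqrt(3), not by 3.
improvement = standard_error(wage_std_dev, 24) / standard_error(wage_std_dev, 72)
print(f"improvement from tripling the sample: {improvement:.2f}x")
```

Under these assumptions, pooling three periods narrows the uncertainty by a factor of about 1.7 rather than 3, and if the same workers turn up in more than one period, the real gain would be smaller still.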
So that’s a problem.
But now look at what Peri and Yasenov did to make their control groups bigger. They included women. They included other recent Hispanic immigrants. And instead of counting only high school “dropouts,” they included everyone in the Department of Labor samples who had not yet finished high school—including people still currently enrolled in school!
That generated a big sample all right, but a big, worthless sample.
Men and women have different labor-market experiences. Would you expect an influx of men without high-school diplomas to affect the wages of nannies?
Adding other immigrants to the control group was also distorting in work meant to discern the effects of immigration on wages. It might, conceivably, have meant comparing some people who were driving wages down with other people who were also driving wages down.
And as for treating people who have not yet completed high school as the equivalent of high-school dropouts—that’s the most intensely dubious comparison of all.
Data mining is indeed bad. But this kind of data dredging seems far, far worse. Yet data dredging on an industrial scale seems to be the only way to rescue the Card paper from the withering criticism Borjas has offered. That’s not very reassuring from an academic point of view. And if the most important immigration-doesn’t-hurt-the-unskilled research of the past quarter-century must be rejected as hopelessly contaminated by its own sampling errors, then what is left? It’s famously said that economic science represents the triumph of pure reason over common sense. But in this case, what has triumphed over common sense is not reason, but massaged and manipulated data.