Megan McArdle

Megan McArdle is a senior editor for The Atlantic who writes about business and economics. She has worked at three start-ups, a consulting firm, an investment bank, a disaster-recovery firm at Ground Zero, and The Economist.

Megan was born and raised on the Upper West Side of Manhattan, and yes, she does enjoy her lattes, as well as the occasional extra-dry skim-milk cappuccino. Her checkered work history includes three start-ups, four years as a technology project manager for a boutique consulting firm, a summer as an associate at an investment bank, and a year spent as sort of an executive copy girl for one of the disaster-recovery firms at Ground Zero … all before the age of 30.

While working at Ground Zero, Megan started Live From the WTC, a blog focused on economics, business, and cooking. She may or may not have been the first major economics blogger, depending on whether we are allowed to throw outlying variables such as Brad DeLong out of the set. From there it was but a few steps down the slippery slope to freelance journalism. She has worked in various capacities for The Economist, where she wrote about economics and oversaw the founding of Free Exchange, the magazine's economics blog. She has also maintained her own blog, Asymmetrical Information, which moved to The Atlantic, along with its owner, in August 2007.

Megan holds a bachelor's degree in English literature from the University of Pennsylvania and an M.B.A. from the University of Chicago. After a lifetime as a New Yorker, she now resides in northwest Washington, D.C., where she is still trying to figure out what one does with an apartment larger than 400 square feet.

How to lie (to yourself) with statistics

By Megan McArdle
Feb 26 2008, 6:18 AM ET

William Briggs has a nice piece on how easy it is to delude yourself into thinking you've found a connection between two factors:

To show you how easy it is to mislead yourself with stepwise procedures, I did the following simulation. I generated 100 observations for y and 50 x’s (each with 100 observations, of course). All of the observations were just made-up numbers, none giving any information about the others. There are no relationships between the x’s and y. The computer, then, should tell me that the best model is no model at all.

But here is what it found: the stepwise procedure gave me a best combination model with 7 of the original 50 x’s. Only 4 of those x’s met the usual criterion for being kept in a model (explained below), so my final model is this one:

Variable    p-value    Pr(beta_x > 0 | data)
x7          0.0053     0.991
x21         0.046      0.976
x27         0.00045    0.996
x43         0.0063     0.996

In classical statistics, an explanatory variable is kept in the model if it has a p-value < 0.05. In Bayesian statistics, an explanatory variable is kept in the model when the probability of that variable (well, of its coefficient being non-zero) is larger than, say, 0.90. Don't worry if you don't understand what any of that means. Just know this: this model would pass any test, classical or modern, as being good. The model even had an adjusted R² of 0.26, which is considered excellent in many fields, such as marketing or sociology. (R² is a number between 0 and 1; higher numbers are better.)

Nobody, or very few people, would notice that this model is completely made up. The reason is that, in real life, each of these x’s would have a name attached to it. If, for example, y was the amount spent on travel in a year, then some x’s might be x7 = “married or not”, x21 = “number of kids”, and so on. It is just too easy to concoct a reasonable story after the fact to say, “Of course, x7 should be in the model: after all, married people take vacations differently than do single people.” You might even go on to publish a paper in the Journal of Hospitality Trends showing “statistically significant” relationships between being married and money spent on travel.

And you would be believed.

I wouldn’t believe you, however, until you showed me how your model performed on a set of new data, say from next year’s travel figures. But this is so rarely done that I have yet to run across an example of it. When was the last time anybody read an article in a sociological, psychological, etc., journal in which truly independent data is used to show how a previously built model performed well or failed? If any of my readers have seen this, please drop me a note: you will have made the equivalent of a cryptozoological find.

Incidentally, generating these spurious models is effortless. I didn’t go through hundreds of simulations to find one that looked especially misleading. I did just one simulation. Using this stepwise procedure practically guarantees that you will find a “statistically significant” yet spurious model.
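To make Briggs' experiment concrete, here is a minimal Python sketch that repeats it using only the standard library. It is a reconstruction under stated assumptions, not his code: Briggs doesn't say which stepwise variant he ran, so this uses plain forward selection with a p < 0.05 entry rule, approximates the t-test p-values with a normal distribution (reasonable at n = 100), and the helper names (`fit_ols`, `forward_select`, `out_of_sample_r2`, `noise`) are hypothetical. The last step is the test Briggs asks for: scoring the fitted model on fresh, independent data.

```python
import math
import random
from statistics import NormalDist


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [rv - f * cv for rv, cv in zip(a[r], a[col])]
    return [row[n:] for row in a]


def fit_ols(cols, y):
    """Least-squares fit of y on an intercept plus the given columns.

    Returns (beta, pvalues): beta[0] is the intercept, and pvalues are
    two-sided p-values for the slopes only. ASSUMPTION: a normal
    approximation stands in for the t distribution.
    """
    n = len(y)
    rows = [[1.0] + [c[i] for c in cols] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(ra * ba for ra, ba in zip(r, beta)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    std = NormalDist()
    pvalues = [2 * (1 - std.cdf(abs(beta[a]) / math.sqrt(sigma2 * inv[a][a])))
               for a in range(1, k)]
    return beta, pvalues


def forward_select(xs, y, alpha=0.05):
    """Forward stepwise selection: repeatedly add the candidate whose p-value
    is smallest, as long as it is below alpha; stop when none qualifies."""
    chosen = []
    while True:
        best, best_p = None, alpha
        for j in range(len(xs)):
            if j in chosen:
                continue
            _, pvalues = fit_ols([xs[c] for c in chosen + [j]], y)
            if pvalues[-1] < best_p:  # p-value of the newly added candidate
                best, best_p = j, pvalues[-1]
        if best is None:
            return chosen
        chosen.append(best)


def out_of_sample_r2(beta, cols, y):
    """R^2 of an already-fitted model on new data; a negative value means it
    predicts worse than simply using the new data's mean."""
    preds = [beta[0] + sum(b * c[i] for b, c in zip(beta[1:], cols)) for i in range(len(y))]
    mean_y = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


def noise(n, rng):
    """n made-up numbers carrying no information about anything."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary seed; results vary from seed to seed
    n, p = 100, 50
    y = noise(n, rng)                       # pure-noise response
    xs = [noise(n, rng) for _ in range(p)]  # 50 pure-noise predictors
    picked = forward_select(xs, y)
    beta, pvalues = fit_ols([xs[j] for j in picked], y)
    for j, pv in zip(picked, pvalues):
        print(f"x{j + 1}: p = {pv:.4f}")
    # "Next year's travel figures": fresh noise, scored with the coefficients above.
    new_y = noise(n, rng)
    new_xs = [noise(n, rng) for _ in range(p)]
    print("out-of-sample R^2:", out_of_sample_r2(beta, [new_xs[j] for j in picked], new_y))
```

Which x’s get picked, and how many, depends on the seed. On pure noise, the out-of-sample R² should usually land near or below zero, which is the failure that in-sample p-values cannot reveal.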

This sort of thing is why we're barraged with studies showing that almost everything will kill you... no, wait! They'll make you live forever!
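A quick back-of-the-envelope calculation shows why screening many candidate factors almost guarantees a false hit. As a simplifying assumption (Briggs' stepwise search does not actually run independent tests), suppose 50 independent tests of variables with no real effect, each at the 0.05 level:

```latex
\begin{aligned}
\mathbb{E}[\text{false positives}] &= 50 \times 0.05 = 2.5,\\
\Pr(\text{at least one false positive}) &= 1 - (1 - 0.05)^{50} \approx 1 - 0.077 = 0.923.
\end{aligned}
```

So a search over 50 worthless factors would, on average, flag two or three of them as "significant," and would flag at least one more than nine times in ten.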
