# 5 Statistics Problems That Will Change The Way You See The World

Even a rudimentary look at probability can change how you interpret data. Simple thought experiments can show the different ways a misunderstanding of statistics distorts how we perceive the world.

We've selected five classic problems, solved in unconventional ways, that show how data can mislead and how the story on the surface can take people in the wrong direction.

(1) THE MONTY HALL PROBLEM

Say you're on a game show where there are three doors. Behind two of the doors, there are goats. Behind one of the doors, there is a brand new car.

The host says that once you pick a door, he'll open one of the doors you didn't pick to reveal a goat. Then, you have the option of either staying with your door or switching to the last unopened door.

Do you switch or stay?

This is actually based on a real game show, and the result has been the source of controversy for years.

Essentially, when you first make your selection, you have a one-in-three chance of picking the door with the car behind it. Because the host always reveals a goat, switching raises the probability that you'll win the car to two in three.

Said another way: A player whose strategy is to always switch will only lose when the door they initially selected has a car behind it. A contestant who selects either of the two doors with a goat behind it and then switches will always get the car.

Here's a final way to look at it, provided the contestant selected Door #1:

| Door 1 | Door 2 | Door 3 | Result if Stay #1 | Result if Switch |
| --- | --- | --- | --- | --- |
| Car | Goat | Goat | Car | Goat |
| Goat | Car | Goat | Goat | Car |
| Goat | Goat | Car | Goat | Car |
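The two-in-three figure can also be checked by brute force. The following is a minimal simulation sketch, not part of the original puzzle write-up; the function names `play` and `win_rate`, the trial count, and the fixed seed are all illustrative assumptions.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one remaining unopened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Fraction of simulated rounds won under an always-stay or always-switch strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print(f"stay:   {win_rate(switch=False):.3f}")
    print(f"switch: {win_rate(switch=True):.3f}")
```

With enough trials, the stay rate should settle near one in three and the switch rate near two in three.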

Source: The Straight Dope

(2) THE BIRTHDAY PROBLEM

You run an office that employs 23 people. What is the probability that at least two of your employees have the same birthday? For the purposes of the problem, ignore February 29.

Once the population of an office hits 366 people, it's a certainty that two people in your office have the same birthday, since there are only 365 possible days of birth.

Still, assuming that each birth date (except February 29) is equally likely, it turns out that once your office has 57 people in it there is a 99% chance that at least two of them share a birthday. When there are 23 people, that probability is just over 50%.

Here's why. Instead of calculating the probability that at least two people share a birthday, calculate the probability of the complement: that no two people share a birthday. Since exactly one of these two scenarios must occur, the first probability plus the second probability has to equal 1.

Here's how we figure this out, then.

Select two people in the office. The probability of the second person not sharing a birthday with the first is 364/365. The probability of the third person not sharing a birthday with the first or second is 363/365. Going through the office and multiplying these together, we see this:

365/365 x 364/365 x 363/365 x 362/365 x ... x 343/365 = 0.4927.

So, the probability that no two people in an office of 23 share a birthday is 0.4927, or 49.3%. That means that the probability that at least two people in the office share a birthday is 1 - 0.4927 = 0.5073, or 50.7%.
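The product above is easy to reproduce in a few lines. This is a small sketch under the same assumptions as the text (365 equally likely birthdays, February 29 ignored); the function name `prob_shared_birthday` is ours, not from the source.

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday."""
    if n > days:
        return 1.0  # pigeonhole: more people than possible birthdays
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for n in (23, 57, 366):
        print(n, round(prob_shared_birthday(n), 4))
```

Calling it with 23 should give roughly 0.5073, matching the figure worked out by hand, and with 57 it should come out just above 0.99.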

Source: Better Explained

(3) GAMBLER'S RUIN

A gambler has a certain amount of money ("B") and is playing a game of chance with some win probability less than 1. He stakes a fixed fraction, 1/N, of his bankroll, where N is a positive whole number. Every time he wins, he raises his stake to 1/N of his new bankroll. The gambler doesn't reduce his stake when he loses.

When B = \$1000 and N = 4, for example, he'll gamble \$250 each time going forward. Should he win, he'll raise it again. Should he lose, he'll keep his stake at \$250.

If he keeps at it, what are his expected winnings?

When it comes down to it, if our gambler bets 1/N of his bankroll after each win and then maintains that amount as he loses, he is always N losing bets in a row away from bankruptcy after any win.

Assuming that the player keeps on playing and there is some chance that the player can lose -- we are gambling, after all -- then the player remains N losing bets away from a broken bank each time. A streak of N straight losses has a nonzero chance of starting on any bet, so if he plays long enough it will eventually happen and he will lose everything.
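To see how this plays out, here is a rough simulation sketch of the strategy described above. It assumes an even-money payout (a win pays exactly the stake) and a win probability of 0.5, neither of which is specified in the problem; the function name `rounds_until_ruin` and the `max_rounds` cap are likewise illustrative.

```python
import random


def rounds_until_ruin(bankroll=1000.0, n=4, p_win=0.5, max_rounds=10_000, rng=None):
    """Return the round on which the gambler goes broke, or None if he lasts max_rounds."""
    rng = rng or random.Random()
    stake = bankroll / n
    for round_no in range(1, max_rounds + 1):
        if rng.random() < p_win:
            bankroll += stake      # assumption: a win pays even money
            stake = bankroll / n   # raise the stake to 1/N of the new bankroll
        else:
            bankroll -= stake      # the stake is NOT reduced after a loss
        # The bankroll is always a whole multiple of the stake, so anything
        # below half a stake means there is nothing left to bet.
        if bankroll < stake / 2:
            return round_no
    return None


if __name__ == "__main__":
    rng = random.Random(1)
    results = [rounds_until_ruin(rng=rng) for _ in range(1000)]
    ruined = [r for r in results if r is not None]
    print(f"ruined: {len(ruined)} of {len(results)}")
    print(f"median rounds to ruin: {sorted(ruined)[len(ruined) // 2]}")
```

Raising `p_win` makes the gambler last longer on average, but as long as it stays below 1 a run of N straight losses remains possible on every bet.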

If our gambler sounds like something of an idiot, know that this is actually a rather common betting strategy. Casinos also endorse it by ensuring that players are stocked with mostly high-denomination chips as they go on winning streaks in order to encourage higher bets.

Even more, consider the ante in a game of poker, which is a similar system designed to accelerate a winner.

Source: University of California San Diego

(4) ABRAHAM WALD'S MEMO

Abraham is tasked with reviewing damaged planes coming back from sorties over Germany in the Second World War. He has to review the damage of the planes to see which areas must be protected even more.

Abraham finds that the fuselage and fuel system of returned planes are much more likely to be damaged by bullets or flak than the engines. What should he recommend to his superiors?

ANSWER: PROTECT THE PARTS THAT DON'T HAVE DAMAGE

Abraham Wald, a member of the Statistical Research Group at the time, saw this problem and made an unconventional suggestion that saved countless lives.

Don't armor the places that sustained the most damage on planes that came back. By virtue of the fact that these planes came back, these parts of the planes can sustain damage.

If an essential part of the plane comes back consistently undamaged, like the engines in the previous example, that's probably because all the planes with shot-up engines don't make it back.
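The selection effect can be illustrated with a toy simulation. The numbers below are entirely hypothetical and are not Wald's data: every area is assumed to be hit equally often, and the loss probabilities in `P_LOST_IF_HIT` are made up purely to show the mechanism.

```python
import random

# Hypothetical values for illustration only (not from Wald's memos).
AREAS = ["fuselage", "fuel system", "engine"]
P_LOST_IF_HIT = {"fuselage": 0.1, "fuel system": 0.2, "engine": 0.8}


def simulate(n_planes=100_000, seed=0):
    """Count hits per area on all planes and on the planes that make it back."""
    rng = random.Random(seed)
    hits_all = {area: 0 for area in AREAS}
    hits_returned = {area: 0 for area in AREAS}
    for _ in range(n_planes):
        area = rng.choice(AREAS)  # each plane takes one hit, uniformly at random
        hits_all[area] += 1
        if rng.random() >= P_LOST_IF_HIT[area]:  # the plane survives the hit
            hits_returned[area] += 1
    return hits_all, hits_returned


if __name__ == "__main__":
    hits_all, hits_returned = simulate()
    for area in AREAS:
        print(f"{area:12s} hit: {hits_all[area]:6d}  seen on returned planes: {hits_returned[area]:6d}")
```

Even though engines are hit as often as anything else in this toy model, engine damage is rare among the planes an inspector actually gets to see.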

Wald's memos on this situation -- in addition to being a remarkable historical statistical document -- shed additional light on the statistics developed during the Second World War that would go on to found the field of Operations Research.

Source: Marc Mangel, Francisco Samaniego

(5) SIMPSON'S PARADOX

A kidney study is looking at how well two different drug treatments (A and B) work on small and large kidney stones. Here is the success rate that was found:

• Small Stones, Treatment A: 93%, 81 out of 87 trials successful
• Small Stones, Treatment B: 87%, 234 out of 270 trials successful
• Large Stones, Treatment A: 73%, 192 out of 263 trials successful
• Large Stones, Treatment B: 69%, 55 out of 80 trials successful.

Which is the better treatment, A or B?

Even though Treatment A had higher success rates for both small and large stones, when all of the trials are pooled together Treatment B has the higher overall success rate:

• All stones, Treatment A: 78%, 273 of 350 trials successful
• All stones, Treatment B: 83%, 289 of 350 trials successful.
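The reversal can be verified directly from the counts listed above. This is a short sketch; the dictionary layout and the helper names `rate` and `pooled` are our own choices for illustration.

```python
# (successes, trials) for each treatment and stone size, as listed above
results = {
    ("A", "small"): (81, 87),
    ("B", "small"): (234, 270),
    ("A", "large"): (192, 263),
    ("B", "large"): (55, 80),
}


def rate(successes, trials):
    """Success rate as a fraction between 0 and 1."""
    return successes / trials


def pooled(treatment):
    """Total (successes, trials) for one treatment across both stone sizes."""
    successes = sum(s for (t, _), (s, _n) in results.items() if t == treatment)
    trials = sum(n for (t, _), (_s, n) in results.items() if t == treatment)
    return successes, trials


if __name__ == "__main__":
    for (treatment, size), (s, n) in results.items():
        print(f"{size:5s} stones, treatment {treatment}: {rate(s, n):.0%} ({s}/{n})")
    for treatment in ("A", "B"):
        s, n = pooled(treatment)
        print(f"all stones,   treatment {treatment}: {rate(s, n):.0%} ({s}/{n})")
```

Note how unevenly the cases are split: Treatment A was tried on 263 large stones but only 87 small ones, while Treatment B was tried on 270 small stones and only 80 large ones.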

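This is an excellent example of Simpson's Paradox, where a trend that appears in each separate group can reverse when the groups are combined. Here, Treatment A was used mostly on large stones, which are harder to treat, while Treatment B was used mostly on small stones, so the pooled totals compare two very different mixes of cases.

In short, the story told by the smaller groups and the story told by the whole sample can point in opposite directions.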

Source: Stephen Julious