Ominous thunder, bouts of mad cackling, and pristine white lab coats? Looks like someone has been running an experiment around here. Once the deed is done, though, and all the protesting villagers go home, we still need to analyze the results.
Consider this experiment: we took 10 cats, snuck up behind them, and scared them to see how high they would jump. Don't worry; even if we scare the life out of them, they still have 8 lives left. We used two different ways of scaring them. One was the classic "BOO," while the other was to break out into yodeling.
We want to know if there is a difference between these two methods, or if they are equally effective. In statistical terms, is there a statistically significant difference between the average jump heights of the two groups?
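The key word is "difference," as in subtraction. We can make a null hypothesis that there is no difference between the groups: (Avg. 1 – Avg. 2) = 0. Another way to put that is that the two population averages are equal: Avg. 1 = Avg. 2. (The sample averages will almost never match exactly; the question is whether the gap between them is bigger than chance alone would produce.)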
Calling this a null hypothesis makes us think about creating a sampling distribution. Then we could compare our result to the distribution to see how typical it is. Now wouldn't that be nice?
The problem, though, is that we don't have a pre-defined idea, like coin flipping, that we can use to simulate the results under the null hypothesis. Are we doomed to never knowing the best way to spook a cat? No way. We'll use the data we already have to create a sampling distribution instead.
Boo (inches) | Yodel (inches) |
7 | 4 |
8 | 2 |
11 | 6 |
9 | 7 |
14 | 11 |
The average height jumped by the "Boo" group is 9.8 inches, while the cats that were yodeled at only jumped 6 inches on average. The difference is 3.8 inches. That seems like a lot, especially for cats, but we have to remember that these results come from sampling two larger populations. One is the "cats scared by saying Boo" population, and the other is the "cats scared by yodeling" population.
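To double-check the arithmetic, here is a minimal Python sketch. The variable names (`boo`, `yodel`, `observed_diff`) are our own inventions for illustration; the numbers come straight from the table.

```python
from statistics import mean

# Jump heights in inches, copied from the table above.
boo = [7, 8, 11, 9, 14]
yodel = [4, 2, 6, 7, 11]

# Observed difference between the two group averages (Boo minus Yodel).
observed_diff = mean(boo) - mean(yodel)

print(f"Boo average:   {mean(boo)} inches")
print(f"Yodel average: {mean(yodel)} inches")
print(f"Difference:    {observed_diff:.1f} inches")
```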
Okay, that was a weird sentence, even for us. The point is, if we took multiple samples from each population, we'd get slightly different measurements each time. What if there is no real difference between the populations, though? Our observed difference might not be unusual in that case.
But what about this: what if there is only one population, "cats that are scared"? We wouldn't have two groups of cats with 5 individuals each, but one big group of 10 cats. That's a lot less scary than a group of 10 big cats.
If we're only sampling from one population, that is the same as saying that there's no difference between the two groups. Under this assumption, any of our 10 cats could have ended up in either group purely by chance. We'll use this to find a sampling distribution.
The name of the game is resampling. We'll shuffle our 10 data points around, putting them into the two groups, Boo and Yodeling, at random. Then we'll calculate the difference between the averages, just like we did with our actual results. Then we'll reshuffle—that is to say, resample—again and again, until we have a distribution of differences to compare to.
Jump height (inches) | Group |
4 | Boo |
9 | Boo |
11 | Boo |
6 | Boo |
8 | Boo |
14 | Yodeling |
7 | Yodeling |
7 | Yodeling |
2 | Yodeling |
11 | Yodeling |
Here's an example of what the data looks like after resampling once. The two groups are the same sizes as before, 5 and 5. The averages are now 7.6 and 8.2 inches, for a difference of -0.6. The average of the yodeling group is a little larger this time. One down, only 199 more to go.
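A single reshuffle could be sketched in Python roughly like this. The helper names (`mean_difference`, `reshuffle_once`) and the seed are hypothetical choices for illustration, not part of the original experiment. The first check reproduces the reshuffled table above; the second draws a fresh random split.

```python
import random
from statistics import mean

# All 10 jump heights (inches), pooled into one "scared cats" group.
heights = [7, 8, 11, 9, 14, 4, 2, 6, 7, 11]

def mean_difference(group_a, group_b):
    """Average of group_a minus average of group_b."""
    return mean(group_a) - mean(group_b)

def reshuffle_once(values, rng, group_size=5):
    """Shuffle a copy of the pooled values and split it into two groups."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled[:group_size], shuffled[group_size:]

# The reshuffle shown in the table above.
example_boo = [4, 9, 11, 6, 8]
example_yodel = [14, 7, 7, 2, 11]
print(round(mean_difference(example_boo, example_yodel), 1))

# A fresh random reshuffle (the seed is arbitrary, just for repeatability).
rng = random.Random(0)
new_boo, new_yodel = reshuffle_once(heights, rng)
print(new_boo, new_yodel, round(mean_difference(new_boo, new_yodel), 1))
```

Shuffling a copy leaves the original list untouched, so the same pooled data can be reshuffled as many times as needed.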
Here's our sampling distribution. Our observed difference, of 3.8 inches, is towards the edge of the graph. Only 9 of the 200 resampled differences were as large or larger. That's a P-value of 9/200 = 0.045, which falls below our 0.05 significance level. We reject the null hypothesis of no difference between the groups.
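The whole procedure can be sketched in a few lines of Python. This is a rough illustration under our own assumptions, not the exact code behind the numbers above: the function name `resampled_differences`, the seed, and the small floating-point tolerance are all made up for the example.

```python
import random
from statistics import mean

boo = [7, 8, 11, 9, 14]
yodel = [4, 2, 6, 7, 11]
observed_diff = mean(boo) - mean(yodel)  # 3.8 inches

def resampled_differences(group_a, group_b, n_resamples, rng):
    """Reshuffle the pooled data n_resamples times and return
    the difference in group averages from each reshuffle."""
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    diffs = []
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diffs.append(mean(pooled[:size_a]) - mean(pooled[size_a:]))
    return diffs

rng = random.Random(42)  # arbitrary seed, assumed here for repeatability
diffs = resampled_differences(boo, yodel, 200, rng)

# Count reshuffles whose difference is as large as or larger than the
# observed one (the tiny tolerance guards against floating-point ties).
count_extreme = sum(d >= observed_diff - 1e-9 for d in diffs)
p_value = count_extreme / len(diffs)

print(f"{count_extreme} of {len(diffs)} reshuffles, P-value = {p_value:.3f}")
```

Because the shuffles are random, the count changes from run to run and need not match the 9 out of 200 reported above. Using more reshuffles gives a steadier estimate.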
Why do cats jump higher when someone shouts "Boo" behind them than when someone yodels? We don't know; the results of our experiment don't tell us that. We only know that they do jump higher, based on our results. Maybe they're just afraid of ghosts, but not of the Swiss?