Evaluating An Estimate - At A Glance

Welcome to Election Day here in Shmoop Nation. We're not voting on something so cliché as the President or members of Congress. Nope, we're voting for dogcatcher. That's a very important position here at Shmoop. #totallynotsarcasm

Our two choices are Marty and Shmarty. We thought nobody cared about the race and that everyone would just pick one of them at random. Once the votes were counted, though, Marty walked away with a full 60% of the vote. Were we completely off-base about how much people cared about who was dogcatcher?

We can use a sampling distribution to investigate. We won't hold a couple hundred elections and see how the results vary. People would start giving write-in votes for Twilight Sparkle by the second revote anyway.

Instead, we'll use simulation. It works like this. If people voted at random, there would be a 50% chance for each person to vote for Marty. We can simulate that 50% probability by flipping a coin and saying each head is a vote for Marty. We'll flip 30 coins, one for each person who voted (we're a small nation, okay). We'll record the result of each flip, then find the proportion of heads we get.

That will be a single simulation of our election. We want a distribution of results, though, so we'll flip 30 coins again and again and again, until we've done it 200 times. That will give us a distribution of 200 election results that we can then compare to our single, real-world result.

Our coin-flipping hand would get tired pretty quickly here, so we're going to use a computer to run the simulation for us.
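
Here's a rough sketch of what that computer simulation could look like in Python (using NumPy; the seed and variable names are our own picks for illustration, not anything official):

```python
# A minimal sketch of the election simulation, assuming Python with NumPy.
import numpy as np

rng = np.random.default_rng(seed=42)   # fixed seed so a rerun gives the same flips

n_voters = 30     # one "coin flip" per voter
n_trials = 200    # how many simulated elections we run

# Each row is one simulated election: 30 fair-coin flips,
# where heads (True) counts as a vote for Marty.
flips = rng.random((n_trials, n_voters)) < 0.5

# The proportion of votes Marty gets in each simulated election.
proportions = flips.mean(axis=1)

print(proportions[:5])   # peek at the first few simulated results
```

Each entry in proportions is one simulated election, and each one becomes a dot on the graph below.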

[Graph: a dot plot of the 200 simulated election results, one dot per trial.]

Each dot on the graph represents a single set of 30 "coin flips." Sometimes Marty got exactly half of the votes. Sometimes he got more, sometimes less. The values close to 0.5 are more likely than those farther away, though. Where does his actual result of 0.6 fall?

Out of the 200 trials, 32 of them produced a proportion of 0.6 or larger for Marty. That's 16% of the trials. We can think of this as the probability of getting a result like ours, under the assumptions of the sampling distribution. We call this probability the P-value. It's a probability value. Clever, right?
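
In code, that count is a one-liner (continuing the same sketch; your exact number will wobble a bit, since every batch of 200 simulated elections comes out slightly different):

```python
import numpy as np

rng = np.random.default_rng(seed=42)
proportions = (rng.random((200, 30)) < 0.5).mean(axis=1)

# P-value: the fraction of simulated elections where Marty's share
# was at least as big as his real-world 60%.
p_value = np.mean(proportions >= 0.6)
print(p_value)   # in the same ballpark as the 0.16 above; each batch differs a bit
```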

Our P-value is kind of small, but not super tiny or anything like that. The 60% vote seems to be consistent with the assumption behind the sampling distribution (that people voted for dogcatcher at random).

We just said that the vote is "consistent" with the true proportion being 0.5, but we didn't say that people actually did vote at random. We won't ever say it, either, because we just don't know. A true voting proportion of 0.55 is also consistent with our results, and so is one of exactly 0.6. Instead, we say we "failed to reject" the assumption of random voting.

What would make us reject 0.5 as inconsistent with the actual vote? A generous contribution from Marty's campaign might convince some people, but not us. We would need the results of the election to be really unlikely for us to be convinced; we would need a really small P-value.

How small? A common cutoff is when 5% or fewer of the trials are as extreme as, or more extreme than, our sample data. That 5% is the significance level for the analysis. It's like the Gatekeeper to the P-value's Keymaster. It's sometimes called the alpha level, because we use the Greek letter α to represent it.

For our sampling distribution, our result becomes really unlikely when the sampled proportion (which was 0.6 in our case) is farther than about 0.18 from 0.5. That ±0.18 around the sampling distribution's mean, which holds about 95% of the distribution, is known as the margin of error. The range it marks off around the center (0.5 ± 0.18, or roughly 0.32 to 0.68) is what's usually called a confidence interval.
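
One way to read that ±0.18 off the simulation is to find where the middle 95% of the trial results land (as always with simulations, the exact cutoffs shift a little from run to run):

```python
import numpy as np

rng = np.random.default_rng(seed=42)
proportions = (rng.random((200, 30)) < 0.5).mean(axis=1)

# The middle 95% of the simulated proportions.
low, high = np.percentile(proportions, [2.5, 97.5])
print(low, high)            # roughly 0.32 to 0.68
print((high - low) / 2)     # roughly 0.18 -- the margin of error
```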

Marty getting 60% of the vote does not fall outside the margin of error. However, if 70% of the people had voted for him, then we would have been able to reject the assumption of random voting. If he had gotten 100% of the vote, we would sail past non-random voting into the realm of shenanigans.

If 70% of the nation had voted for Marty, then we could "reject the null hypothesis." The null hypothesis is the assumption we made to create our sampling distribution. We reject the null hypothesis when our P-value is smaller than our significance level.
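
The decision itself is completely mechanical. Here's a tiny illustrative helper; the 0.02 for a 70% landslide is just a hypothetical stand-in for whatever P-value your own simulation would spit out:

```python
ALPHA = 0.05   # our significance level

def verdict(p_value, alpha=ALPHA):
    # Reject the null hypothesis only when the P-value is smaller than alpha.
    return "reject the null hypothesis" if p_value < alpha else "fail to reject"

print(verdict(0.16))   # Marty's actual 60% result: fail to reject
print(verdict(0.02))   # a hypothetical 70% landslide: reject the null hypothesis
```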

Rejecting the null hypothesis is nice, but we can't turn around and accept 70% as the true parameter either. Remember, there are a lot of values consistent with that result too. Values like "littering is bad" and "always eat your vegetables."

We could, in fact, create a sampling distribution for the assumption that 70% of Shmoop Nation would want to vote for Marty. It's hard to find a coin that lands on heads 70% of the time, but a computer can do it easily.

That sampling distribution will have its own range of plausible values (i.e., its own margin of error). The margin of error will actually be about ±0.18 as well. What a coinkydink.
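
In fact, a 70% "coin" is a one-number change from the earlier sketch (and, as usual, the exact endpoints will differ a bit between runs):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Same simulation as before, but now each "coin" lands heads 70% of the time.
proportions_70 = (rng.random((200, 30)) < 0.7).mean(axis=1)

low, high = np.percentile(proportions_70, [2.5, 97.5])
print(low, high)            # the range of plausible results around 0.7
print((high - low) / 2)     # a margin of error in the same ballpark as before
```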

And this is why we can't accept our estimate as being the truth, even when we reject the null hypothesis. There will always be some margin of error surrounding our estimates. It's like a cloud of uncertainty that follows us around everywhere we go. At least it doesn't smell.