High School: Statistics and Probability


Interpreting Categorical and Quantitative Data HSS-ID.C.9

9. Distinguish between correlation and causation.

One of the most common mistakes made when analyzing statistics is confusing correlation and causation, or, more specifically, assuming that correlation implies causation. What happens when we assume something? We don't know, but it's rarely good.

Students should already know that correlation measures the strength and direction of the association between two variables. In particular, they should know that the linear correlation coefficient r ranges from -1 to 1. A value near 1 suggests a strong positive linear correlation, a value near -1 suggests a strong negative one (r = 1 or r = -1 means the points lie exactly on a line), and a value near 0 suggests little or no linear correlation. At the very least, they should be aware that this coefficient exists in the world.
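To make r a little less mysterious, here is a minimal Python sketch of the standard (Pearson) formula, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The function name `pearson_r` and the sample data are our own, chosen for illustration. The last example is worth a look: y is completely determined by x there, yet r comes out to 0, because r only measures *linear* association.

```python
import math

def pearson_r(xs, ys):
    """Linear (Pearson) correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together: sum of products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable never changes")
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 1 for x in xs]))   # points on a rising line: prints 1.0
print(pearson_r(xs, [10 - 3 * x for x in xs]))  # points on a falling line: prints -1.0

# A perfect but curved relationship (y = x squared) on symmetric x values
sym = [-2, -1, 0, 1, 2]
print(pearson_r(sym, [x ** 2 for x in sym]))    # prints 0.0
```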

As strong as that r guy might be (and his black belt speaks for itself), he's still just a correlation coefficient. While r can show that two variables are strongly correlated, it can't prove that one causes the other.

For instance, let's say we find that there's a strong positive linear correlation between the age of a tree and how many apples it produces. In fact, this correlation is so strong that r = 0.99. Does that mean the age of the tree causes more apples to grow?

Maybe, right? After all, it makes sense that the older a tree gets, the more fruit it will bear, and that explains why r has the strength of the Incredible Hulk. So we can say that a tree's age will cause more apples to grow, right?

Wrong.

While the two variables are strongly correlated, we can't conclude that one causes the other, because there may be a zillion other factors we haven't considered (statisticians call these lurking or confounding variables). What about rainfall? Did the farmer use fertilizer? Did he prune the trees? What were the summer and winter temperatures? Any one of these factors may have influenced the number of apples.

Basically, students should have it drilled into their heads that it takes a lot more than a strong correlation to prove causation.

The key thing here is that correlation does not imply causation. We'll say it again, because it's that important: Correlation does not imply causation. It's crucial that students understand and remember that. In fact, they should have it tattooed on their foreheads so they never forget it. Correlation does not imply causation. Got it?

Drills

  1. Fill in the blank: Correlation does not _______ causation.

    Correct Answer:

    imply

    Answer Explanation:

    Were you awake when you read this standard? Because if you were, then there's no chance you'd miss this question. Yes, exactly. Correlation does not imply causation.


  2. What is the definition of correlation?

    Correct Answer:

    Measure of the strength of a linear relationship between two variables

    Answer Explanation:

    The correlation coefficient measures the strength of the linear relationship between two variables. It never suggests causation, and since all the other answer choices mention "cause" or "causation," that narrows it down pretty quickly.


  3. Which of the following values for r suggests that one variable causes another?

    Correct Answer:

    None of the above

    Answer Explanation:

    It doesn't matter how strong or weak the correlation is; a value of r will never suggest that one variable causes the other. When all else fails, remember that r alone can't prove, or even suggest, causation.


  4. Which of these values for r validates a strong positive correlation?

    Correct Answer:

    0.9

    Answer Explanation:

    As a common rule of thumb, a correlation counts as strong when the coefficient is greater than 0.8 (or less than -0.8). So it looks like 0.9 is our gym-going, six-pack-toting winner. Rock on, little r, rock on. And note that it indicates a strong positive correlation, not causation.


  5. Which of these values for r suggests a negative correlation?

    Correct Answer:

    -0.7

    Answer Explanation:

    A negative correlation is one where r is (you guessed it) negative, or less than zero. It shouldn't be a surprise that our answer is -0.7. But again, notice that it suggests a negative correlation, not causation.


  6. Which of these values for r suggests no linear correlation?

    Correct Answer:

    0

    Answer Explanation:

    No linear correlation at all? The goose-egg score for r says exactly that: no linear correlation, and not a word about causation.


  7. Which of these values for r suggests a negative causation?

    Correct Answer:

    None of the above

    Answer Explanation:

    What's the last word in the question, again? Oh, right. Causation. Remember, no value of r can suggest causation in any way, no matter how positive, negative, strong, or weak. So we're going with "None of the above." Final answer.


  8. What does an r value of -0.89 suggest about two variables?

    Correct Answer:

    As the independent variable increases, the dependent variable decreases

    Answer Explanation:

    Any answer claiming that one variable causes the other can't be right, because of what your tattoo says: Correlation does not imply causation. However, correlation does mean that as the independent variable changes, the dependent variable tends to change in a predictable, roughly linear way. In this case, r is negative and fairly close to -1, so as the independent variable increases, the dependent variable tends to decrease.


  9. What does an r value of 0.05 suggest about two variables?

    Correct Answer:

    The variables are not correlated and x does not cause y

    Answer Explanation:

    This correlation coefficient is so weak it can barely hold its head up, so there's no evidence that the variables are linearly correlated. And, of course, we cannot prove anything about causation from the correlation coefficient. But you already knew that.


  10. Why does correlation not imply causation?

    Correct Answer:

    Because we must take into account all possible variables when proving causation

    Answer Explanation:

    If you missed this one, you may want to reread this entire thing. Correlation does not imply causation, and the reason is that we must take into account all possible variables before we can prove causation. While correlation indicates some sort of connection, causation is a much more direct relationship. If we want to prove that two variables are causally related, we need to make sure that no other variable affects this relationship. The other answer choices either mix up the two terms or contradict one another, so they can't be right.
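The drills keep running through the same checklist for a value of r: what sign it has, whether it's strong, and (never) whether it implies causation. Here is a minimal Python sketch of that checklist. The function name `describe_r` is our own, and the 0.8 cutoff is the rule of thumb from the drills, a convention rather than a law.

```python
def describe_r(r):
    """Summarize what a correlation coefficient r does (and does not) tell us."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none (no linear correlation)"
    return {
        "direction": direction,
        # Rule of thumb used in the drills: |r| > 0.8 counts as strong
        "strong": abs(r) > 0.8,
        # No value of r, however large or small, establishes causation
        "implies_causation": False,
    }

# The r values that appear in the drills
for r in [0.9, -0.7, 0, -0.89, 0.05]:
    print(r, describe_r(r))
```

Notice that `implies_causation` is hard-coded to `False`: there is no input for which r alone could flip it.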

