Scatter Plots & Correlation at a Glance

Scatter plots are an awesome way to display two-variable data (that is, data with only two variables) and make predictions based on the data. These types of plots show individual data values, as opposed to histograms and box-and-whisker plots, which summarize the data.

Here's a scatter plot of the amount of money Mateo earned each week working at his father's store:

[Figure: scatter plot of Mateo's weekly earnings]

The weeks are plotted on the x-axis, and the amount of money he earned for that week is plotted on the y-axis. In general, the independent variable (the variable that isn't influenced by the other one) is plotted on the x-axis, and the dependent variable (the one that is affected by the independent variable) is plotted on the y-axis.

Using this plot, we can see that in week 2 Mateo earned about $125, and in week 18 he earned about $165. More important is the trend of the data. For example, with this dataset, it is clear that Mateo is earning more each week. Maybe his father is giving him more hours per week or more responsibilities.

Correlation

With scatter plots we often talk about how the variables relate to each other. This is called correlation. There are three types of correlation: positive, negative, and none (no correlation).

Positive Correlation: as one variable increases, so does the other. Height and shoe size are an example; as height increases, shoe size tends to increase too.
Negative Correlation: as one variable increases, the other decreases. Time spent studying and time spent on video games are negatively correlated; as your time studying increases, time spent on video games decreases.
No Correlation: there is no apparent relationship between the variables. Video game scores and shoe size appear to have no correlation; as one increases, the other one is not affected.

Mateo's scatter plot has a pretty strong positive correlation; as the weeks increase, his paycheck does too.
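The three types of correlation can also be checked with a number. The sketch below uses the Pearson correlation coefficient, a standard statistic that this lesson does not cover, and made-up data (the weekly pay values are hypothetical, not read from Mateo's plot). The 0.3 cutoff for calling something "no correlation" is an arbitrary assumption for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together, and how much each varies alone.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def describe(r, threshold=0.3):
    """Label r as positive, negative, or none (threshold is an assumed cutoff)."""
    if r > threshold:
        return "positive"
    if r < -threshold:
        return "negative"
    return "none"

# Hypothetical data, invented for this sketch.
weeks = [1, 2, 3, 4, 5, 6]
pay = [120, 125, 123, 131, 134, 140]
hours_studying = [1, 2, 3, 4, 5]
hours_gaming = [5, 4, 4, 2, 1]

print(describe(pearson_r(weeks, pay)))                    # positive
print(describe(pearson_r(hours_studying, hours_gaming)))  # negative
```

A value of r close to 1 or -1 means the points sit close to a straight line; a value near 0 means there is little or no linear relationship.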

Line of Best Fit

We use a "line of best fit" to make predictions based on past data. There are many complicated statistical formulas we could use to find this line, but for now, we will just estimate it. The line we draw through the points on the graph just needs to look like it fits the trend of the data. When drawing the line, make sure it stays close to most of the data points. If there is a point that is much higher or lower than the rest (an outlier), the line doesn't need to pass through it.
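For the curious, one of those statistical formulas is ordinary least squares. The sketch below is one possible way to compute it, not the method this lesson uses (the lesson estimates the line by eye). The function name `best_fit_line` and the data are hypothetical.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how much y changes, on average, for each one-unit change in x.
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical weekly pay, invented for this sketch.
weeks = [1, 2, 3, 4, 5, 6]
pay = [120, 125, 123, 131, 134, 140]

slope, intercept = best_fit_line(weeks, pay)
print(f"pay = {slope:.2f} * week + {intercept:.2f}")
```

A line drawn by eye will rarely match these numbers exactly, but it should land close to them if it follows the trend of the points.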

[Figure: Mateo's weekly earnings with a line of best fit]

Using this line, we can predict how much money Mateo will earn in his 20th week of work (assuming he continues this pattern).

[Figure: Mateo's weekly earnings, approximation using the line of best fit]

Based on this line, Mateo will earn approximately $157 in week 20.
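Once a line of best fit is written as an equation, a prediction is just a matter of plugging in the x-value. In the sketch below, the slope and intercept are assumed values chosen only so that the result agrees with the $157 estimate; the actual line in the figure may be different.

```python
# Assumed line of best fit: pay = SLOPE * week + INTERCEPT.
# Both numbers are hypothetical, picked to match the $157 estimate.
SLOPE = 2.0        # assumed dollars gained per week
INTERCEPT = 117.0  # assumed pay where the line crosses week 0

def predict_pay(week):
    """Predict weekly pay by reading the y-value of the line at this week."""
    return SLOPE * week + INTERCEPT

print(predict_pay(20))  # 157.0
```

The prediction only holds if Mateo's pay keeps following the same pattern past the weeks shown on the plot.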



Example 1

This is a scatter plot showing the amount of sleep needed per day by age.
As you can see, as you grow older, you need less sleep (but still probably more than you’re currently getting...). What type of correlation is shown here?
This is a negative correlation. As we move along the x-axis toward the greater numbers, the points move down, which means the y-values are decreasing.
Estimate a line of best fit.


Example 2

These two scatter plots show the average income for adults based on the number of years of education completed (2006 data). 16 years of education means graduating from college. 21 years means landing a Ph.D. What type of correlation does each graph represent?
Both graphs are positively correlated. As years of education increase, so does income.
Draw a line of best fit for each graph. Then, estimate and compare the earnings for each gender with 11 years of education completed.
Based on these plots it looks like a female who completes 11 years of school can expect to earn around $14,000/year while a male can expect to earn around $23,000/year.
These graphs show two important things. First, more education is associated with a higher income in general. Second, there is a gender gap in income. While women have begun to close this gap, there is more work to do.


Example 3

What if females with 23 years of education have an average income of $80,000? How does including this point on the scatter plot change the trend of the line of best fit?
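One way to explore a question like this is to fit a line with and without the extra point and compare the slopes. The sketch below uses least squares and made-up numbers (the education and income values are hypothetical, not read from the graphs in Example 2), so the real answer depends on where the new point sits relative to the plotted data.

```python
def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x

# Hypothetical data: years of education and income in thousands of dollars.
years = [8, 12, 16, 21]
income_k = [10, 20, 35, 50]

slope_without, _ = best_fit_line(years, income_k)

# Add the new point from the question: 23 years, $80,000.
slope_with, _ = best_fit_line(years + [23], income_k + [80])

print(f"slope without the new point: {slope_without:.2f}")
print(f"slope with the new point:    {slope_with:.2f}")
```

With these invented numbers, the new point sits above the old trend at the far right of the plot, so it pulls the right end of the line upward and makes the line steeper.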


Exercise 1

Classify each pair of variables as positively, negatively, or not correlated.

The amount of time spent studying and your grade point average.


Hint

as time spent studying increases, GPA increases

Exercise 2

Classify each pair of variables as positively, negatively, or not correlated.

Shoe size and the number of pairs of shoes one owns.


Hint

as shoe size increases, what happens to pairs of shoes?

Use this scatter plot to answer questions 3 – 5

Frankie and Lucy are planning on selling a new iPhone app. This is a scatter plot estimation of how many apps they can sell at different prices. A line of best fit is drawn in for you.

[Figure: scatter plot of iPhone app price and number of apps sold, with a line of best fit]

Exercise 3

What kind of correlation is shown in this graph (positive, negative, or no correlation)?


Hint

as price increases, quantity sold decreases

Exercise 4

If Frankie and Lucy price the application at $2.50, how many can they expect to sell, and how much money would they get from the sales?


Hint

multiply the quantity sold by the price

Exercise 5

If they price the application at $3.00, how many can they expect to sell, and how much money would they get from the sales?


Hint

multiply the quantity sold by the price
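The arithmetic in the hint can be sketched in a couple of lines. The price and quantity below are hypothetical and are not read from Frankie and Lucy's graph; swap in the quantity you read off the line of best fit for the price in the exercise.

```python
def revenue(price, quantity):
    """Money from sales: the price of each app times the number sold."""
    return price * quantity

# Hypothetical reading: at $2.00 per app, suppose the line predicts 100 sales.
print(revenue(2.00, 100))  # 200.0
```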
