Scatter plots are an awesome way to display two-variable data (that is, data with only two variables) and make predictions based on the data. These types of plots show individual data values, as opposed to histograms and box-and-whisker plots, which summarize the data in groups.
Here's a scatter plot of the amount of money Mateo earned each week working at his father's store:
The weeks are plotted on the x-axis, and the amount of money he earned for that week is plotted on the y-axis. In general, the independent variable (the variable that isn't influenced by the other one) is plotted on the x-axis, and the dependent variable (the one that is affected by the independent variable) is plotted on the y-axis.
Using this plot, we can see that in week 2 Mateo earned about $125, and in week 18 he earned about $165. More important is the trend of the data. For example, with this dataset, it is clear that Mateo is earning more each week. Maybe his father is giving him more hours per week or more responsibilities.
Correlation
With scatter plots we often talk about how the variables relate to each other. This is called correlation. There are three types of correlation: positive, negative, and none (no correlation).
- Positive Correlation: as one variable increases, so does the other. Height and shoe size are an example; as a person's height increases, their shoe size tends to increase too.
- Negative Correlation: as one variable increases, the other decreases. Time spent studying and time spent on video games are negatively correlated; as your time studying increases, time spent on video games decreases.
- No Correlation: there is no apparent relationship between the variables. Video game scores and shoe size appear to have no correlation; as one increases, the other one is not affected.
Mateo's scatter plot has a pretty strong positive correlation; as the weeks go by, his paycheck increases too.
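To put a number on the idea of correlation, statisticians often use the Pearson correlation coefficient, r. It ranges from -1 (perfect negative correlation) through 0 (no linear correlation) to 1 (perfect positive correlation). Here is a minimal sketch of that calculation; the function name `pearson_r` and the weekly pay values are made up for illustration and are not read from Mateo's plot.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of each point's distance from the two means
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared distances from the mean, for each variable
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / sqrt(spread_x * spread_y)

# Hypothetical data (NOT Mateo's actual earnings)
weeks = [1, 2, 3, 4, 5, 6]
pay = [120, 125, 123, 130, 134, 133]

r = pearson_r(weeks, pay)
print(round(r, 2))  # a value close to 1 means a strong positive correlation
```

A value of r near 1 matches what we would call a strong positive correlation when looking at the plot, and a value near -1 matches a strong negative correlation.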
Line of Best Fit
We use a "line of best fit" to make predictions based on past data. There are many complicated statistical formulas we could use to find this line, but for now, we will just estimate it. The line we draw through the points on the graph just needs to look like it fits the trend of the data. When drawing the line, you want to make sure that it fits most of the data. If there is a point that is much higher or lower than the rest (an outlier), the line doesn't need to pass through it.
Using this line, we can predict how much money Mateo will earn in his 20th week of work (assuming he continues this pattern).
Based on this line, Mateo will earn approximately $157 in week 20.
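For readers curious about one of the statistical formulas mentioned above, here is a rough sketch of the least-squares method, a common way to calculate a line of best fit instead of estimating it by eye. The function names and the data are hypothetical and chosen only for illustration, so the prediction it produces will not match the estimate from Mateo's plot.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how much y changes, on average, for each one-unit change in x
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    # The least-squares line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Read a y-value off the line for a given x."""
    return slope * x + intercept

# Hypothetical data (NOT Mateo's actual earnings)
weeks = [1, 2, 3, 4, 5]
pay = [100, 104, 105, 110, 111]

slope, intercept = least_squares_line(weeks, pay)
predicted_pay = predict(8, slope, intercept)  # predicted earnings for week 8
print(round(slope, 2), round(intercept, 2), round(predicted_pay, 2))
```

Predicting a value beyond the last data point, as in week 8 here, only makes sense if the pattern continues, which is the same assumption made for Mateo's week 20.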
Example 1
This is a scatter plot showing the amount of sleep needed per day by age.
Example 2
These two scatter plots show the average income for adults based on the number of years of education completed (2006 data). 16 years of education means graduating from college. 21 years means earning a Ph.D. What type of correlation does each graph represent?
Example 3
What if females with 23 years of education have an average income of $80,000? How does including this point on the scatter plot change the trend of the line of best fit?
Exercise 1
Classify this pair of variables as positively, negatively, or not correlated: the amount of time spent studying and your grade point average.
Exercise 2
Classify this pair of variables as positively, negatively, or not correlated: shoe size and the number of pairs of shoes one owns.
Use this scatter plot to answer questions 3–5.
Frankie and Lucy are planning on selling a new iPhone app. This is a scatter plot estimation of how many apps they can sell at different prices. A line of best fit is drawn in for you.
Exercise 3
What kind of correlation is shown in this graph (positive, negative, or no correlation)?
Exercise 4
If Frankie and Lucy price the application at $2.50, how many can they expect to sell, and how much money would they get from the sales?
Exercise 5
If they price the application at $3.00, how many can they expect to sell, and how much money would they get from the sales?
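The arithmetic behind questions like these has two steps: read the expected number of sales off the line of best fit, then multiply by the price to get the money earned. Here is a small sketch of those two steps. The slope and intercept below are invented for illustration and are not taken from Frankie and Lucy's graph, so use the actual plot to answer the exercises.

```python
def expected_sales(price, slope, intercept):
    """Read the expected number of apps sold off a line of best fit."""
    return slope * price + intercept

def revenue(price, units_sold):
    """Money earned = price per app x number of apps sold."""
    return price * units_sold

# Hypothetical line of best fit (NOT the one in the exercise graph):
# each $1 increase in price lowers expected sales by 40 apps.
slope = -40.0
intercept = 200.0

price = 2.00
units = expected_sales(price, slope, intercept)
money = revenue(price, units)
print(units, money)
```

Notice that a negative slope means a higher price sells fewer apps, so the price that earns the most money is not necessarily the highest one.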