Analyzing Data

All right, we've got ourselves a pretty good-lookin' graph or some pretty amazing data. Now what?

Now it's time for analysis. Analyzing data is, like, the most exciting part of being a scientist. Okay, okay, maybe blowing stuff up is the most exciting part, but analyzing data is a close second. Figuring out what those data mean is the best way to answer our experimental question, because that answer isn't gonna be in the back of the book.

Show Us Your Stats

How do we go about analyzing data? One way is to look at their stats. Unfortunately, our experiment won't come with a baseball card listing its RBIs for the season on the back. So we're going to have to get out our calculators and do a little math. Don't cry, we'll hold hands the whole way.

Let's start by getting our numbers organized. It helps to put them in order from smallest to largest. Then we can bust out the calculator and figure out the mean, median, mode, and range.

Here's a recap on how to find each value:

  • Mean: the total of all values divided by the number of values
  • Median: the middle number when the data are in numerical order
  • Mode: the number that repeats the most
  • Range: the biggest number minus the smallest number
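
Prefer to make the computer do the arithmetic? Here's a minimal Python sketch using the built-in statistics module; the data values are made up purely for illustration.

```python
# A quick sketch of crunching all four numbers in Python.
# The data values below are made up just for illustration.
import statistics

data = [4, 8, 8, 5, 3, 9, 8, 6]
data.sort()  # smallest to largest, just like we said

value_range = max(data) - min(data)    # biggest number minus smallest number
mode = statistics.mode(data)           # the number that repeats the most
median = statistics.median(data)       # the middle number (or average of the middle two)
mean = statistics.mean(data)           # total of all values divided by how many there are

print(f"Range: {value_range}, Mode: {mode}, Median: {median}, Mean: {mean}")
```

To use it on real results, just swap our own numbers into that data list.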

Outliers (Not Another S.E. Hinton Novel)

When we start analyzing our data, we might notice that we have a few data points that don't quite fit in with the rest of the crowd. We call these outliers. They're pretty easy to identify, because they're avoiding the rest of our data points like the plague.


[Image: See that red dot? That's Beyoncé. Those green dots are the rest of us. (Source)]

Where do these rebel data come from? Well, sometimes they're just errors that get made when we conduct our experiment. Maybe we read that beaker wrong, or forgot to calibrate our scale, or stared at our notebook wondering, "Was that number we jotted down a six or an eight?" It happens to the best of us.

Outliers don't always come from error, though. Sometimes we just happen to measure something out of the ordinary. Like that one hair on our sister's chin that has somehow grown three inches longer than the rest of the hairs on her chin. Oops, we weren't supposed to tell anyone.

Major outliers can really skew our data analysis, so it's important to handle them appropriately. We know what you're thinking. "Can't we just get rid of them and pretend they never happened?" Getting rid of outliers is seriously frowned upon in the scientific world, so we can't slip them in the shredder. First of all, it's important to show other scientists all of the results we got (the good, the bad, and the ugly). Second of all, that outlier could actually be the discovery of something new that we didn't even know was related to our study. And we know if we discover something new, they name it after us, so there's that.

A general exception to the "Keep All the Outliers" rule is if we know for a fact the outlier was not a true measurement of whatever we were trying to measure. For example, suppose we're recording the weight of adult elephants, and we find that one of our data points is fifteen pounds. Maybe it was written down wrong, or the scale was calibrated incorrectly, but we don't remember seeing any adult elephants the size of house cats walking around, so there's clearly been an error here.

Not only is this data point not what we were trying to measure, but it's also going to radically skew our data when we go to analyze it. Of course, it's a good idea to mention in our report that we got this data point and give our reasons for discarding it so no one calls "cheater, cheater, elephant eater" on us. If you're not sure when your outliers should stay and when they should go, check out this summary of outlier etiquette.
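
If we're wondering how anyone decides what counts as "way out there" in the first place, here's a minimal sketch of one common rule of thumb, the 1.5 × IQR rule. That rule is just our pick for illustration (the etiquette guide covers others), and the elephant weights below are invented.

```python
# A sketch of flagging outliers with the 1.5 * IQR rule of thumb.
# The elephant weights (in pounds) are made up; the 15 is our "house cat" data point.
import statistics

weights = [8200, 9100, 8800, 9500, 8700, 15, 9000]

q1, q2, q3 = statistics.quantiles(weights, n=4)  # the three quartile cut points
iqr = q3 - q1                                    # interquartile range
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

outliers = [w for w in weights if w < low_fence or w > high_fence]
print("Suspicious data points:", outliers)  # the 15-pound "elephant" shows up here
```

Anything outside those fences gets flagged for a closer look, not an automatic trip to the shredder.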

What's Trending?

Another way to analyze our data is to scour that graph we just whipped up for trends. We're not talking about neon leggings and scrunchies, here. We're talking about what the graph is doing. Are the data points increasing or decreasing? Is there a relationship between the independent and dependent variables? Did changing the independent variable change the dependent variable? Was there an overall change over time?

Looking at these data point trends will show us if there's a positive correlation, negative correlation, or no correlation at all. Heard the word "correlation," but don't know what it means? No problem, here's a dollop of Shmoop to get you up to speed.

Want another example? Well, we've got that too.


[Image: four example scatter plots, Graphs 1 through 4, showing different correlations. (Source)]

Let's look at Graph 1, the one with the green chicken pox. This graph's data points are distributed randomly, so we would say there is no correlation between the variables here.

Graph 2 is smiling at us; how sweet. This is a nonlinear relationship: the variables are related, just not in a straight-line way.

Graph 3 shows us data points that are all fairly close together near the best-fit line, which points upward. This means we have a strong positive correlation: as one variable increases, so does the other.

Graph 4 is doing the same thing, but its best-fit line points downward. This means we have a strong negative correlation: as one variable increases, the other decreases.
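
If we'd rather put a number on those pictures than eyeball them, the Pearson correlation coefficient r is a common choice: it runs from -1 (strong negative) through 0 (no linear correlation) to +1 (strong positive). Here's a minimal sketch that computes it by hand, with invented data.

```python
# A sketch: computing Pearson's correlation coefficient r from scratch.
# r close to +1 -> strong positive correlation (Graph 3)
# r close to -1 -> strong negative correlation (Graph 4)
# r close to  0 -> no linear correlation (Graph 1)
# The x and y values below are invented for illustration.
import math

x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]  # roughly doubles as x goes up

def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in xs))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ys))
    return cov / (spread_x * spread_y)

print(f"r = {pearson_r(x, y):.2f}")  # close to +1, so a strong positive correlation
```

Fair warning: r only measures straight-line relationships, so smiley Graph 2 could score near zero even though its variables are clearly related.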

When we're looking at data, it's important to remember that correlation doesn't necessarily mean causation. In other words, just because there's a correlation (or relationship) between two variables doesn't mean one caused the other.

Let's take Sporty McShmoopball for example. He's got a lucky jersey that he wears every time his team plays. He thinks his jersey is lucky because there is a positive correlation between the times he's worn it and the times his team has won. If it has something to do with lack of laundering or a player's sweat, we don't want to know.

However, does wearing his lucky jersey actually cause Sporty's team to win? Are the players physically unable to catch the ball or score a goal without him wearing it? The answer is no. While it may seem that the jersey is responsible for that championship trophy, the truth is that Sporty's team would still be hoisting it above their heads even if his jersey was mid-spin cycle. So maybe he'll think about giving it a rinse every now and again.

Okay, back to our data points. We talked about identifying trends, but what if our graph has the data chicken pox? What if those points don't really show an identifiable trend? It's possible that the data weren't collected properly or that there weren't enough trials to show a trend. Either way, we may need to do a little sleuthing to find out what makes our graph tick.

Other People's Graphs

It's also important to keep in mind that we won't always be looking at a graph that we whipped together ourselves. A big part of science is checking out the work of other scientists, which means we might be looking at a graph of data we've never seen before. Don't panic, just start with the basics. First we introduce ourselves to the graph, exchange Instagram handles, and check the axes to see what the graph is measuring and what units it's using. Then start looking for patterns, trends, and correlations, just like we would with our own graph.

Once we've channeled our inner graph whisperer and figured out what that graph is trying to tell us, we can use it as evidence. That's right, we can point to that graph and say, "That is how I know this." Or you know, brag about it on Facebook. Analyzing data can also introduce more questions for us to study or let us know that our hypothesis might need a little tweaking. Data are always trying to tell us something—we just have to listen.

Common Mistakes

One common mistake scientists make when analyzing graphs is assuming correlation means causation. Just because murder rates increase as ice cream sales go up does not mean ice cream causes more murders. To actually show causation, we need a well-designed experiment. Otherwise, we're just speculating.

Brain Snack

Did you know pirates are the cause of global warming? Arrrgg, look at these here data to decide fer yerself if it be a true causation or some blimey corrrrrelation.