What's the best way to collect the data for a sample? A butterfly net probably isn't the way to go, unless we're actually sampling butterflies. Putting up a flyer asking the data to come to us won't work. Well, if we need volunteers for a medical study, then it might work. Really, there are a lot of ways to collect data, and which one we use will depend on what we're collecting and why.
Survey Says
We've already talked about sample surveys a bit. That's when we collect a sample from a population for the purpose of estimating a parameter. Are your classmates planning a revolt against school? Run a poll and find out. It might be best if you stay home that day.
Have we mentioned how important random sampling is for sample surveys? It's not (just) that we're getting senile in our old age. Random sampling is so important that we'll mention it twice, and look old doing it. If a sample survey isn't a random sample, then we can't use it to estimate a population parameter. It's especially sad because that's its only job.
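To see what "random sample" means in practice, here's a minimal sketch in Python. The school of 1,000 students, the 300 would-be rebels, and the function name estimate_proportion are all made up for illustration; the only real ingredient is the standard library's random module.

```python
import random

def estimate_proportion(population, sample_size, seed=None):
    """Estimate the share of True values using a simple random sample."""
    rng = random.Random(seed)
    # Every individual has the same chance of being picked, without replacement.
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size

# Hypothetical population: 1,000 students, 300 of whom are planning a revolt.
population = [True] * 300 + [False] * 700

# Poll 100 randomly chosen students and use the sample to estimate the parameter.
estimate = estimate_proportion(population, sample_size=100, seed=1)
print(f"Estimated share planning a revolt: {estimate:.2f}")
```

The estimate will bounce around from sample to sample, but because the picks are random, it has no built-in lean in either direction. Polling only our friends wouldn't come with that guarantee.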
To The Lab
Sometimes, describing a population isn't enough. We want to test some crazy idea we have, like "Can pigs fly after taking an 8-week online course?" That's when we run an experiment.
To start, we randomly put individuals into two groups. Many experiments use a control group, which doesn't get the treatment being tested, and an experimental group, which does. One set of pigs will take the online pilot's course, while the other pigs will take a course on something else. Maybe we'll teach those pigs some African History. Then we'll record the results of the two groups and compare them.
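Random assignment is easy to sketch in code. The version below is one simple way to do it, not the only one: shuffle the individuals, then split the shuffled list in half. The 20 pigs and the function name assign_groups are invented for the example.

```python
import random

def assign_groups(individuals, seed=None):
    """Randomly split individuals into an experimental and a control group."""
    rng = random.Random(seed)
    shuffled = list(individuals)  # copy, so the original list is left alone
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical roster of 20 pigs.
pigs = [f"pig_{i}" for i in range(1, 21)]

# The experimental group gets the pilot's course; the control group gets history.
experimental, control = assign_groups(pigs, seed=42)
print("Pilot's course:", experimental)
print("African History:", control)
```

Because chance decides who lands where, we can't (even accidentally) stack the pilot's course with the most aerodynamic pigs.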
Experiments are great because they let us test cause-and-effect relationships. Does taking an online class on flying turn pigs into better pilots? It would be hard to figure this out without an experiment. It will probably be hard to figure out with an experiment, too. We're going to break a lot of planes before we're done.
Something that can't be done in our pig experiment, but that shows up in a lot of experiments, is double-blinding. When running the pig experiment, we the researchers knew which pigs were in the control group and which were in the experimental group. The pigs knew which group they were in as well. At least, we think they knew.
An example of a double-blind experiment is a drug trial where neither the doctors nor the patients know who is getting the real drug and who is getting a placebo. A doctor might act differently with someone in the control group than with someone in the experimental group. Or people might stop taking their pills if they know they're getting a placebo. In the land of the double blind, the doctor's assistants are king: that's because they're the ones who keep track of which individuals are in each group. Double-blinding helps reduce unconscious bias.
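The assistants' bookkeeping can be sketched as two separate records: a secret key that says who gets what, and a blinded view that doctors and patients actually see. This is only an illustration under made-up assumptions (eight patients, kit labels like "kit-001", a function we've called blind_assignments); real trials use more careful procedures.

```python
import random

def blind_assignments(patient_ids, seed=None):
    """Return (secret_key, blinded_view) for a two-group drug trial."""
    rng = random.Random(seed)
    ids = list(patient_ids)
    rng.shuffle(ids)
    half = len(ids) // 2

    # Secret key: only the assistants get to see this.
    secret_key = {pid: "drug" for pid in ids[:half]}
    secret_key.update({pid: "placebo" for pid in ids[half:]})

    # Blinded view: kit labels are numbered in patient order, so the label
    # carries no hint about which group a patient is in.
    blinded_view = {
        pid: f"kit-{n:03d}" for n, pid in enumerate(sorted(ids), start=1)
    }
    return secret_key, blinded_view

# Hypothetical trial with eight patients.
patients = [f"patient_{i}" for i in range(1, 9)]
secret_key, blinded_view = blind_assignments(patients, seed=7)
print(blinded_view)  # what doctors and patients see
```

The secret key only comes out of the drawer once all the results are recorded.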
Go Outside and Collect Data
There are times when we can't collect a random sample. On a Saturday morning, for instance. We'd rather just sleep in. Other times we even have actual reasons why we can't. If we're studying the health effects of smoking, we can't ethically assign some people to a smoking group. That might have flown in the 1930s, but not today.
In those cases, we can conduct an observational study instead. We'll take people who already belong to the group we're interested in, and we'll compare them to a control group. While the control group can be randomly selected, the group we're interested in can't be. Not to sound like a broken record, but no random sampling: no dice.
This means our sample probably won't be a representative sample. That's why we avoid observational studies like the plague, unless we have no choice. Then we'll only do it if we drink plenty of OJ and wash our hands obsessively, to avoid catching the observational study. Or the plague. One of those.