Going through our archives (but not too far back), we came across this great post from November 22, 2013, “What’s The First Thing You Tell Students About Statistics?” that we wanted to share in case you missed it the first time around.
I’m looking forward to teaching my first master’s-level course in a lo-o-ng time next week. Since this may be the first course students take in their master’s program, the question I’m faced with is,
“What would you tell someone at the very beginning of learning about statistics?”
I’m starting with this:
Bias = bad
Bias is to statisticians as sin is to preachers. We’re against it.
Bias is SYSTEMATIC error. While it is generally impossible to avoid error, in an unbiased study, error will be random.
Random = good
If error is random, we would be equally likely to err in one direction as the other, and so, on the average, would get the correct result. For example, if I were evaluating fighters to decide if they really did have brain damage as a result of being hit in the head too many times, in some borderline cases I might incorrectly decide the fighter was fine when, in fact, there was some minimal brain damage. In other cases, I might decide the person had damage, when he or she was just somewhat on the low side of the bell curve in terms of functioning brain cells. On the average, though, those errors should balance out and I should get the correct conclusion.
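To make the difference between random and systematic error concrete, here is a minimal simulation sketch in Python. Everything in it is made up for illustration: the "true" score of 100, the noise level, the 5-point bias, and the helper name `simulate_measurements` are all assumptions, not data from any real study.

```python
import random
import statistics


def simulate_measurements(true_value, n, noise_sd, bias=0.0, seed=0):
    """Return n noisy measurements of true_value.

    Random error comes from rng.gauss (centered on zero).
    Systematic error is the constant `bias` added to every measurement.
    """
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]


TRUE_SCORE = 100.0  # hypothetical true value we are trying to measure

# Same random noise in both cases; only the second has a systematic shift.
unbiased = simulate_measurements(TRUE_SCORE, n=10_000, noise_sd=15)
biased = simulate_measurements(TRUE_SCORE, n=10_000, noise_sd=15, bias=5.0)

unbiased_mean = statistics.mean(unbiased)
biased_mean = statistics.mean(biased)

print(f"true value:         {TRUE_SCORE:.2f}")
print(f"mean, random error: {unbiased_mean:.2f}")
print(f"mean, biased:       {biased_mean:.2f}")
```

With purely random error, the individual measurements are off in both directions, yet the mean lands close to the true value. With bias added, collecting more measurements does not help: the mean settles close to the wrong number.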
Random assignment is good because it means that people are equally likely to be assigned to one group versus another, so it is likely to control for confounding variables. What are confounding variables? Those are factors related to both your predictors/risk factors and your outcome variables, which can distort the relationship you find between them. For example, people residing in nursing homes (my predictor) may be more likely to die (my outcome) but that might be because they are older or in poorer health (confounding variables).
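Here is a small sketch of why random assignment tends to balance a confounder such as age. The ages are invented (drawn uniformly between 60 and 95), and the "non-random" case is deliberately extreme: it simply puts the oldest half of the people in one group.

```python
import random
import statistics

rng = random.Random(42)

# Made-up ages for 2,000 hypothetical people.
ages = [rng.uniform(60, 95) for _ in range(2_000)]

# Random assignment: shuffle a copy, then split it in half.
shuffled = ages[:]
rng.shuffle(shuffled)
group_a, group_b = shuffled[:1_000], shuffled[1_000:]
random_gap = abs(statistics.mean(group_a) - statistics.mean(group_b))

# Non-random assignment: the oldest half all end up in the same group.
ordered = sorted(ages)
younger, older = ordered[:1_000], ordered[1_000:]
sorted_gap = statistics.mean(older) - statistics.mean(younger)

print(f"age gap with random assignment:     {random_gap:.2f} years")
print(f"age gap with non-random assignment: {sorted_gap:.2f} years")
```

Under random assignment the two groups end up with nearly the same average age, so age cannot explain a difference in outcomes between them. Under the non-random split, any difference in outcomes is tangled up with a large difference in age.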
Random selection is good because it means that everyone in the population has an equal chance to be selected, which means that, if you have a large enough sample, your sample is likely to be representative.
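The same idea can be sketched for random selection. The population below is simulated (100,000 made-up scores with a mean near 50), and the biased selection rule, "only people scoring above 55 can be sampled," is a hypothetical stand-in for any process that makes some people more likely to be picked than others.

```python
import random
import statistics

rng = random.Random(7)

# Simulated population of scores; in real life we would not see all of it.
population = [rng.gauss(50, 10) for _ in range(100_000)]
pop_mean = statistics.mean(population)

# Random selection: every member has the same chance of being picked.
random_sample = rng.sample(population, 1_000)

# Biased selection: only members scoring above 55 can be picked.
eligible = [x for x in population if x > 55]
biased_sample = rng.sample(eligible, 1_000)

random_error = abs(statistics.mean(random_sample) - pop_mean)
biased_error = abs(statistics.mean(biased_sample) - pop_mean)

print(f"population mean:          {pop_mean:.2f}")
print(f"error with random sample: {random_error:.2f}")
print(f"error with biased sample: {biased_error:.2f}")
```

Both samples have the same size, and both produce a perfectly computable mean. Only the random one is close to the population mean.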
What’s a sample? What’s a population? What’s representative?
Well, we’ll get into that shortly.
But, speaking of random, I thought the most important thing to begin with was not how to find a mean or standard deviation but that bias is bad, because if you have bias, you are worse off after you have found the mean than before you knew how to compute it. Before, you didn’t have any information: you didn’t know the mean, and you knew you didn’t know it.
With bias, you still don’t know the mean, but you think you do. You’ve actually gone backwards.
Think about it.