It’s the night before the election and statisticians are safe in their beds as visions of the Central Limit Theorem dance in their heads.
Eight years ago, when President Obama was running against John McCain, I was discussing this in a statistics course I was teaching and told the students I was sure Obama would win. When they asked why I seemed so certain when many of the polls showed Obama ahead by only a relatively small percentage, I explained that it was the lesson we had just completed.
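The central limit theorem states that the sampling distribution of the mean of independent random samples will be approximately normal when the samples are large enough, whatever the shape of the population distribution (provided its variance is finite).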
This theorem assumes three things:
- Your samples are relatively large
- The samples are independent (that is, you didn’t just ask the same people 4 times, or ask them and then ask their best friends)
- The samples are taken at random. So, asking only Republicans or only people who attend your church is not allowed. Every person in the population should have an equal chance of being selected.
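To see those three assumptions at work, here is a minimal simulation sketch. Everything in it is my own illustration rather than anything from the polls: the skewed exponential population, the sample size of 100, the 2,000 samples and the seed are all assumptions chosen for convenience. The point is only that the means of large, independent, random samples pile up around the population mean, with a spread close to the population SD divided by the square root of the sample size.

```python
import random
import statistics


def sample_means(n_samples, sample_size, rng):
    """Draw n_samples independent random samples and return each sample's mean."""
    means = []
    for _ in range(n_samples):
        # Assumed population: Exponential(1), strongly skewed, mean 1 and SD 1
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


rng = random.Random(2016)  # fixed seed so the run is repeatable
means = sample_means(n_samples=2000, sample_size=100, rng=rng)

center = statistics.fmean(means)   # should sit near the population mean of 1
spread = statistics.stdev(means)   # should sit near 1 / sqrt(100) = 0.1

# Under a normal curve, roughly 95% of sample means fall within 2 standard errors
standard_error = 1.0 / 100 ** 0.5
within_two_se = sum(abs(m - 1.0) <= 2 * standard_error for m in means) / len(means)

print(f"center of sample means: {center:.3f}")
print(f"spread of sample means: {spread:.3f}")
print(f"share within 2 standard errors: {within_two_se:.3f}")
```

Even though no single draw from this population looks remotely normal, the sample means behave as if they came from a bell curve centered on the population mean.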
What does this have to do with elections?
Well, first of all, we DO have a bunch of samples asking people how they plan to vote. These samples are called polls.
Let’s say you take 100 polls and 97 of them say that Hillary Clinton is going to win by 1 to 3 points and three of them say Trump is going to win by 1 to 2 points. Based on that, you could conclude that the election is really close and that either candidate is equally likely to win.
I would conclude somewhat differently: the popular vote is close, and Hillary Clinton is very likely to win by 1 to 3 percentage points, because the population mean is the center of the distribution of the sample means.
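The claim that the population mean sits at the center of the sample means can be sketched with simulated polls. The numbers here are invented for illustration and are not real 2016 polling data: I assume a two-way race with a true 51/49 split (a 2-point lead), 100 polls, 1,000 respondents each, and perfectly random, independent sampling.

```python
import random
import statistics

TRUE_CLINTON_SHARE = 0.51  # hypothetical two-way share, i.e. a true 2-point lead
POLL_SIZE = 1000           # assumed respondents per poll
N_POLLS = 100              # assumed number of independent polls


def poll_margin(rng):
    """Simulate one random poll and return Clinton's lead in percentage points."""
    clinton = sum(rng.random() < TRUE_CLINTON_SHARE for _ in range(POLL_SIZE))
    share = clinton / POLL_SIZE
    # In a two-way race the lead is share minus (1 - share)
    return 100 * (2 * share - 1)


rng = random.Random(8)  # fixed seed so the run is repeatable
margins = [poll_margin(rng) for _ in range(N_POLLS)]

average_margin = statistics.fmean(margins)
clinton_ahead = sum(m > 0 for m in margins)

print(f"average lead across polls: {average_margin:.2f} points")
print(f"polls showing Clinton ahead: {clinton_ahead} of {N_POLLS}")
```

Under these assumptions any one poll can miss by a few points, but the average across all of the polls lands close to the true 2-point lead. That is why a pile of polls leaning the same way says more than "it's a coin flip."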
Is there an exception to this theorem?
Well, look above. Have any of the assumptions been violated? In particular, are certain people more likely to be left out of your poll? A random sample assumes that, for example, when you call someone’s house and they don’t answer, they are equally likely to be a Republican or a Democrat. If all of the people you are able to reach are Democrats, that is going to bias your sample and your results are going to be wrong.
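Here is a rough sketch of what a violated random-sampling assumption does. The response rates are made up purely for illustration: I assume a population split exactly 50/50, where Democrats pick up the phone 60% of the time and Republicans 40% of the time.

```python
import random
import statistics


def biased_poll(rng, n_calls=2000, dem_share=0.50, dem_answer=0.60, rep_answer=0.40):
    """Return the Democratic share among the people who actually answered."""
    dem = rep = 0
    for _ in range(n_calls):
        is_dem = rng.random() < dem_share
        # Hypothetical nonresponse: the chance of answering depends on party
        answered = rng.random() < (dem_answer if is_dem else rep_answer)
        if answered:
            if is_dem:
                dem += 1
            else:
                rep += 1
    return dem / (dem + rep)


rng = random.Random(13)  # fixed seed so the run is repeatable
shares = [biased_poll(rng) for _ in range(50)]
average_share = statistics.fmean(shares)

print(f"true Democratic share: 0.50")
print(f"average share across 50 biased polls: {average_share:.3f}")
```

With these made-up rates the polls settle near 60% Democratic even though the population is evenly split. Averaging more polls only makes the wrong answer more precise, which is why the question of who gets left out matters so much.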
I have seen little evidence that any such bias exists, and what data have been reported seem to indicate that the Latino vote was underestimated, and that vote probably won’t break for Trump.
Another assumption of every statistical technique, and hence one that usually goes unstated, is that your data are reliable: in this case, that people didn’t lie to you because they were too embarrassed to tell you that they were voting for Trump.
I think if you are going to pose an argument like that, you need some data to back it up, and I have not seen any.
So, it’s 1 a.m. and I have to turn in but there you have it – the Central Limit Theorem says Hillary for President.