Posted this as a comment at Just One Minute, and then decided to put it here as well. “Kimji” is a commenter on JOM.
Kimji dropped this one on the other thread:
If there are so many of them, they aren’t outliers. The high ones aren’t outliers any more than the low ones are.
Which is a demonstration that that stats class went right over its head.
Let’s throw in a little statistical fact-checking. Note, by the way, that the numbers here aren’t intended to be taken as exact values for any particular poll; they’re just there as concrete examples to lay out the statistical principles involved.
1) No, if the high ones are outliers then the low ones probably are. However, the notion of “outlier” really doesn’t mean much in this situation, because
2) the samples are radically different. Because of that, the idea that some of the polls are outliers — samples of a random variable that happen to come out far from the mean — is flawed.
3) So is the “Real Clear Politics” average. Averaging the means only works if the samples are similar enough that their variances are similar and their means fall within a “reasonable” interval of one another. Right now some of the polls being averaged are so far apart that the mean of one is seven or eight or nine standard deviations away from the mean of another.
An easy way to look at it is this: the “margin of error” number is really a statement that the real value you’d get from a real election has a 19 in 20 chance of being within the margin of the poll’s reported value. When, as RCP is doing, you average polls as far apart as these are, you are basically saying that the vote will be 55/41±3 Obama/McCain AND 49/47±3 Obama/McCain AT THE SAME TIME.
Which is obvious nonsense.
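To make the margin-of-error arithmetic concrete, here is a minimal Python sketch. It assumes the textbook simple-random-sample formula for a 95 percent margin (1.96 standard errors), treats the two illustrative Obama shares above (55 and 49, each ±3) as independent polls, and applies Cochran's Q, a standard heterogeneity check from meta-analysis, to ask whether the two look like samples of the same quantity. The function names are hypothetical, and real polls use weighting and design effects that this ignores.

```python
import math

Z_95 = 1.96          # normal critical value for a two-sided 95% ("19 in 20") interval
CHI2_1DF_95 = 3.841  # 95th percentile of the chi-square distribution, 1 degree of freedom


def margin_of_error(p, n, z=Z_95):
    """Margin of error in percentage points for a share p (0 to 1) from a simple random sample of n."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


def interval(value, moe):
    """The (low, high) range implied by a reported value and its margin of error."""
    return (value - moe, value + moe)


def overlap_width(a, b):
    """Width of the overlap of two intervals; 0 means they share at most an endpoint."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def cochran_q(values, moes, z=Z_95):
    """Inverse-variance pooled mean and Cochran's Q heterogeneity statistic.

    Each poll's standard error is recovered from its reported margin as moe / z.
    """
    weights = [1 / (m / z) ** 2 for m in moes]  # 1 / SE^2
    pooled = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    q = sum(w * (v - pooled) ** 2 for w, v in zip(weights, values))
    return pooled, q


# A 50/50 race sampled at n = 1,000 gives the familiar "plus or minus 3".
print(round(margin_of_error(0.5, 1000), 1))  # 3.1

# The two illustrative Obama shares, each reported as +/-3.
high, low = interval(55, 3), interval(49, 3)
print(high, low, overlap_width(high, low))  # (52, 58) (46, 52) 0.0

# With two polls, Q has 1 degree of freedom; Q above 3.841 says the spread
# between them is larger than sampling error alone would explain.
pooled, q = cochran_q([55, 49], [3, 3])
print(round(pooled, 1), round(q, 2), q > CHI2_1DF_95)  # 52.0 7.68 True
```

The pooled 52 is what a straight average reports; the Q test is one standard way of asking first whether the polls are even measuring the same thing. It ignores house effects and weighting, so treat it as an illustration of the principle rather than a model of how RCP builds its average.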
4) That doesn’t mean we can’t learn from the polls, it just means that we have to understand what they’re telling us.
What they’re telling us is that there’s a whole mess of radically different assumptions about the makeup of the people who actually get to the polls. Reasonably enough, the higher the proportion of Democrats to Republicans, the better the poll shows for Obama; the lower that proportion, the better it shows for McCain. Here’s how to read that: IF the actual turnout is, say, 40 percent D and 28 percent R, THEN the expected vote will come out, say, 51/44.
So don’t average the polls; read them as different statements entirely, based on different assumptions.
5) Now, let’s bring in what Jay Cost and DJ Drummond have been saying: it would be extremely unusual for party identification to change that much. Certainly not impossible, but very very unusual.
6) Factor in the ACORN stuff: we know they said they’d registered 1.4 million new voters. It’s been reported that the real number is somewhere between a third and a tenth of that. Say it’s a third, about 500,000. Of those, how many are fraudulent? We don’t know, but the fraction has been pretty high in every reported example.
Both of these would tend to skew the electorate model (that is, the relative proportions of D, R, and I) incorrectly toward D.
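The IF/THEN reading from point 4, and the way a D-heavy electorate model moves the result, can be sketched as a turnout-weighted average: each candidate’s topline is the sum, over party groups, of that group’s share of the electorate times the candidate’s support within the group. Everything here is an assumption for illustration: the function name `topline`, the within-party support figures, and the two party mixes are made up and not fitted to any real poll. Support within each group is held fixed, so the only thing that changes between the two scenarios is the D/R/I mix.

```python
def topline(mix, support):
    """Turnout-weighted candidate shares under one (hypothetical) electorate model.

    mix:     party group -> share of the people who actually vote (must sum to 1)
    support: party group -> {candidate: share of that group's vote}
    """
    if abs(sum(mix.values()) - 1) > 1e-9:
        raise ValueError("electorate shares must sum to 1")
    candidates = next(iter(support.values()))  # candidate names, taken from the first group
    return {c: sum(mix[g] * support[g][c] for g in mix) for c in candidates}


# Hypothetical support within each group, held fixed in both scenarios.
support = {
    "D": {"Obama": 0.89, "McCain": 0.08},
    "R": {"Obama": 0.08, "McCain": 0.89},
    "I": {"Obama": 0.48, "McCain": 0.44},
}

# Two hypothetical electorate models that differ only in party mix.
models = {
    "40D/28R/32I": {"D": 0.40, "R": 0.28, "I": 0.32},
    "37D/33R/30I": {"D": 0.37, "R": 0.33, "I": 0.30},
}

for name, mix in models.items():
    shares = topline(mix, support)
    print(name, {c: round(100 * s, 1) for c, s in shares.items()})
# 40D/28R/32I {'Obama': 53.2, 'McCain': 42.2}
# 37D/33R/30I {'Obama': 50.0, 'McCain': 45.5}
```

No voter changes their mind between the two lines; only the assumed turnout changes, and the Obama lead moves from about 11 points to about 4.4.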
The upshot of it all: ignore the RCP average. It’s meaningless, literally based on a self-contradictory assumption.
Read the different polls as: IF the electorate is really balanced as this poll says, THEN the results are likely to be so-and-so.
And if you care about the outcome of the election, do everything you can to make sure that people who think like you are voting. Legally.
2 Comments
Wow. That’s a truly arcane diorama there, Chaco.
Do you think the ‘Bradley Effect’ has any relevance in this election?
You know, the closeted bigots who cannot shake racism, but don’t want to appear politically incorrect, even to a pollster they will never see again?
I could point you to some good stats texts, Leo.
As far as the “Bradley effect” goes, given your way of describing it, yeah, I do think you and your co-religionists could well be convincing people that they don’t want to tell a stranger who knows their phone number who they might vote for.