> Caught just a piece of a news story on the radio a couple of days ago,
> where they were talking about the difficulties pollsters face trying to
> get meaningful projections in this close US presidential race.
There was even one case where two organizations (I think ABC and the Washington Post) used the same data and reached different conclusions.
> One pollster was saying that he knew of cases where two polls differed
> by 15 points in the same state, and he seemed to be implying that was
> connected to the closeness of the race.
>
> This started me wondering what sort of model could explain that.
>
> We're trying to estimate p = the fraction of voters who will vote for
> Gore. We do this by measuring p-hat, the fraction of people polled who say
> they'll vote for Gore. Mostly I guess p-hat is assumed to be normal with
> mean p and variance depending on sample size.
>
> Obviously the sample has to be truly random and unbiased in some sense
> for this to work, and presumably the polling companies have techniques
> they use to try to eliminate bias. This guy seemed to be implying that
> the nature of the race could either be introducing biases, or increasing
> the variance of p-hat in some other way.
>
> What could be going on here?
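As a back-of-the-envelope check (my sketch, not from the original post): under the simple binomial model described above, the standard deviation of p-hat is sqrt(p(1-p)/n), so pure sampling error in a typical n = 1000 poll of a near-50/50 race is only about ±3 points at 95% confidence. Two polls 15 points apart in the same state cannot be explained by sampling variance alone, which is the puzzle.

```python
import math

def sampling_std(p, n):
    """Standard deviation of p-hat under simple random sampling:
    sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Typical poll: n = 1000 respondents, race near 50/50.
p, n = 0.5, 1000
sd = sampling_std(p, n)      # about 0.0158
moe = 1.96 * sd              # 95% margin of error, about 0.031 (3.1 points)
```

Even the difference between two independent such polls has a standard deviation of only sqrt(2) * 1.58 ≈ 2.2 points, so a 15-point gap is far outside what the model allows.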
Several things are going on here.
1. Less than half of those contacted choose to answer the question. Thus, there is a real question as to whether those who answer and those who don't come from the same underlying population. If they don't, that would explain at least some election-day surprises.
2. In addition, there are issues with the polling methodology itself. I am under the impression that the purely statistical parts are done pretty much by the book: stratified sampling, sample-size calculation, and so on. However, there is still room for art, that is, for individual interpretation.
a) How are the questions phrased? "Who do you prefer?" should get a lot fewer "undecided" replies than "Who are you going to vote for?" One of the experts in the piece I heard said that pollsters change the phrasing to get changes in the polls. (After all, they get paid for producing the numbers. If the numbers aren't changing, why pay for a new poll?)
b) The "qualification questions" (asked to determine whether the respondent is a "likely voter") leave the definition of "likely voter" up to the interpretation of the pollster. That is why ABC and the Washington Post got different numbers from the same data. For example, suppose you ask two questions: are you planning to vote this time, and did you vote last time? Then you can declare a respondent "likely" if they plan to vote this time, if they voted last time, if either is true, or if both are. That's four different classification schemes. Of course, there's a lot of overlap between them, but the choice introduces a source of variation in results, and a significant one (quoted without attribution).
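To see how much the screen choice alone can move the headline number, here is a toy example (the respondent data is made up purely for illustration) applying the four "likely voter" schemes to one fixed set of answers:

```python
# Hypothetical respondents: (plans_to_vote, voted_last_time, prefers_gore)
respondents = [
    (True,  True,  True),
    (True,  False, True),
    (False, True,  False),
    (True,  True,  False),
    (False, False, True),
    (True,  False, False),
    (False, True,  True),
    (True,  True,  True),
]

def gore_share(screen):
    """Gore's share among respondents passing the given likely-voter screen."""
    likely = [r for r in respondents if screen(r)]
    return sum(1 for r in likely if r[2]) / len(likely)

screens = {
    "plans this time": lambda r: r[0],
    "voted last time": lambda r: r[1],
    "either":          lambda r: r[0] or r[1],
    "both":            lambda r: r[0] and r[1],
}

for name, screen in screens.items():
    print(f"{name:16s} {gore_share(screen):.2f}")
```

With these eight (invented) respondents, Gore's share ranges from 4/7 ≈ 0.57 under the "either" screen up to 2/3 ≈ 0.67 under the "both" screen, all from the same raw data, which is the ABC vs. Washington Post situation in miniature.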
So, the bottom line is that the difficulties lie in the non-statistical parts of the process.