OK, thanks David. There's a lot more teaching going on here than learning, but I'm trying to soak up what I can.
I'm hearing that the ratio of my residual deviance to residual degrees of freedom indicates what Rich described as "the 'lack of fit' that you have is an overly strict standard for what you are fitting", or that, as you first tried to tell me, my data may not be Poisson-conformant, even if they are count data. Fair enough.
So, I stepped away from the Poisson glm. I briefly tried the quasipoisson, but as the AIC is not defined for that family, I can't lean on the backward stepwise process I was hoping to depend on. A good thing, you might say? (I'm still curious when it would be acceptable to Hiawatha to use the step routines that are available.) Anyhow, I tried the gaussian with a log transform of the DV, and with a sqrt transform of the DV.
Not that I trust it any more, but to my eye the log transform yielded a visually more normal distribution of my 31 measurements.
Can I compare the AIC of two models with different families and transforms of the DV as guidance to the better transform? With the log transform I achieve a final AIC of 61, whereas with the Poisson glm it was 179. That would be a vote for the log(DV) gaussian glm.
...
    Null deviance: 16.1573  on 30  degrees of freedom
Residual deviance:  8.3488  on 25  degrees of freedom
AIC: 61.306
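As far as I understand it, AIC values are only comparable when the likelihoods describe the same response on the same scale, so the 61 for log(DV) and the 179 for the Poisson fit probably can't be set side by side as they stand. One common adjustment adds the Jacobian of the transform back onto the Gaussian AIC. Here is a rough sketch in R on simulated data standing in for mine; x, y and the model names are made up for illustration and are not my mining data.

```r
# Simulated stand-in data (hypothetical; 31 rows, like my data set)
set.seed(1)
n <- 31
x <- runif(n)
y <- rpois(n, lambda = exp(2 + 1.5 * x)) + 1   # +1 keeps log(y) finite

m_pois <- glm(y ~ x, family = poisson)
m_log  <- glm(log(y) ~ x, family = gaussian)
m_sqrt <- glm(sqrt(y) ~ x, family = gaussian)

# Put the Gaussian AICs back on the scale of y.
# If z = g(y), then logLik_y = logLik_z + sum(log(abs(g'(y)))), so
#   log:  g'(y) = 1/y            -> AIC_y = AIC_z + 2 * sum(log(y))
#   sqrt: g'(y) = 1/(2*sqrt(y))  -> AIC_y = AIC_z + 2 * sum(log(2 * sqrt(y)))
aic_log_y  <- AIC(m_log)  + 2 * sum(log(y))
aic_sqrt_y <- AIC(m_sqrt) + 2 * sum(log(2 * sqrt(y)))

print(c(poisson = AIC(m_pois), log = aic_log_y, sqrt = aic_sqrt_y))
```

Even after the adjustment, the Gaussian fits give a density and the Poisson fit a probability mass, so I'd treat the comparison against the Poisson AIC as a rough guide only. The log versus sqrt comparison seems to be on firmer ground.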
Comparing the values for the other tests I'd run (see original post!), for this new model I get
Correlation of predictions to observations: cor(mining$InvertRich,exp(predict.glm(mlmir))) = 0.678 (cf 0.773)
Goodness of fit: 1-pchisq(mlmir$deviance,mlmir$df.residual) = 0.999 (cf 5e-05)
Likelihood: logLik(mlmir) = -23.6 (cf -82.5)
Likelihood ratio: pchisq(-2*log(exp(logLik(NullModel)/logLik(mlmir)),(NullModel$df.residual-mlmir$df.residual)) = 0.998 (cf 1.0)
Wald test: waldtest(mlmir) Pr(>F) = 0.004 (cf 4.6e-6)
So again a mixed bag of results. There's much less deviance relative to the remaining degrees of freedom; does that suggest no further problem with overdispersion? The correlation of predictions to observations is pretty good, but not as good as it was before. The goodness of fit test is at the other end of the scale now (i.e. the good end). The likelihood is better. The likelihood ratio is marginally better but still appalling, and the Wald test shows it's an OK model, but not as outstanding as previously.
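Having looked at my likelihood ratio line again, I suspect part of the mixed bag is my own doing. The conventional form, as I understand it, is twice the difference in log-likelihoods referred to the upper tail of a chi-square. If my 0.998 is really a lower-tail probability, the upper tail would be about 0.002, which sits much closer to the Wald result. Here is a sketch on simulated data (the variable and model names are hypothetical, not my actual objects):

```r
# Simulated stand-in data (hypothetical, not the mining data)
set.seed(2)
n <- 31
x <- runif(n)
y <- exp(1 + 0.8 * x + rnorm(n, sd = 0.5))

null_model <- glm(log(y) ~ 1, family = gaussian)
full_model <- glm(log(y) ~ x, family = gaussian)

# Likelihood ratio statistic: 2 * (logLik of full model - logLik of null model)
lr_stat <- as.numeric(2 * (logLik(full_model) - logLik(null_model)))
lr_df   <- null_model$df.residual - full_model$df.residual

# Upper tail gives the p-value; the lower tail is 1 minus this
lr_p <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# For a gaussian glm, residual deviance / residual df is just the
# estimated residual variance that summary() reports as the dispersion
disp <- full_model$deviance / full_model$df.residual

print(c(LR = lr_stat, df = lr_df, p = lr_p, dispersion = disp))
```

If that last point is right, then for the gaussian family the deviance-to-df ratio can't flag overdispersion the way it does for a Poisson fit, and my 1-pchisq goodness of fit figure on the unscaled deviance probably isn't telling me much either.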
I truly appreciate the time that you all have spent in discussing the issues with me. I didn't expect to have my hand held through the whole process, and hopefully I'll retain some of the learnings that have been handed to me along the way. Thanks for your patience and generosity.
On Jan 18, 1:40 am, David Duffy <dav...@orpheus.qimr.edu.au> wrote:
> Erogo <eogood...@gmail.com> wrote:
> > Granted, every model is wrong. No data drawn from the real world is
> > truly Poisson
>
> You might recall your G.O.F. P-value of 1e-5. Dividing the residual
> deviance by residual d.f.s gives you the "scale factor" or dispersion,
> which can be used in fitting a quasi-Poisson model qv. Alternatively it is
> possible that including interaction terms might soak some of this up.
> A GLMM with one random effect per count is another way to deal with this:
> look at the Poisson example in Breslow and Clayton 1993, and read some of http://glmm.wikidot.com/
>
> In my experience, inappropriate Poisson regression can lead to ridiculous
> estimates, but that has been with bigger datasets.
>
> Cheers, David Duffy.