Maybe you can comment on the validity of my approach.
I've got a multi-year time series data set, which I have reduced to daily resolution by averaging. The data are environmental (meteorological and oceanographic). I'm hoping to isolate the effects of various meteorological drivers on the oceanographic response(s).
Because consecutive samples are not independent (the series are autocorrelated), I avoided standard linear regression and was stumped for some time. Is the alternative approach below better? Is it good enough to be acceptable?
I fitted an annual signal to each variable, using gam() from the mgcv package in R. Where the GAM fitted a smooth unimodal signal that explained a decent amount of the total variation, I carried forward the residuals of that model as the "detrended" variable. I did this for both the drivers and the response(s).
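A minimal sketch of this detrending step, on simulated data rather than the real series. The data frame `dat`, the columns `doy` and `temp`, and the basis size `k = 10` are hypothetical, and the cyclic basis (`bs = "cc"`) is an assumption: it is one common way to make the annual signal join up at the year boundary, not necessarily what was used here.

```r
library(mgcv)

set.seed(1)
# Simulated stand-in for one daily variable over about six years (hypothetical data)
n    <- 2191
doy  <- ((seq_len(n) - 1) %% 365) + 1      # day of year, ignoring leap days for simplicity
temp <- 10 + 5 * sin(2 * pi * doy / 365) + rnorm(n)
dat  <- data.frame(doy = doy, temp = temp)

# Cyclic cubic smooth, so the fitted annual signal matches at the start and end of the year
fit <- gam(temp ~ s(doy, bs = "cc", k = 10), data = dat,
           knots = list(doy = c(0.5, 365.5)))

dev_expl <- summary(fit)$dev.expl          # share of variation explained by the annual signal
dat$temp_detr <- residuals(fit, type = "response")   # "detrended" series carried forward
```

The same call would be repeated for each driver and each response, keeping the residuals only where `dev_expl` suggests a meaningful seasonal signal.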
I then fitted standard multiple linear regressions, using the detrended drivers (GAM residuals) to predict the detrended response (GAM residuals).
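Removing the seasonal cycle does not by itself guarantee independent errors, so it may be worth checking the residuals of the linear model. The sketch below uses simulated AR(1) data; the names `wind_detr`, `sst_detr` and `t` are hypothetical, and the `gls()` fit with `corAR1` from the nlme package is shown as one possible alternative, not as the method used in this post.

```r
library(nlme)

set.seed(2)
n <- 500
# Hypothetical detrended driver and response; the noise is AR(1) to mimic daily data
wind_detr <- as.numeric(arima.sim(list(ar = 0.6), n = n))
noise     <- as.numeric(arima.sim(list(ar = 0.7), n = n))
sst_detr  <- 0.5 * wind_detr + noise
d <- data.frame(sst_detr, wind_detr, t = seq_len(n))

# Ordinary regression on the detrended series
ols <- lm(sst_detr ~ wind_detr, data = d)
r1  <- acf(residuals(ols), plot = FALSE)$acf[2]   # lag-1 autocorrelation of the residuals

# Same mean model with AR(1) errors, so the standard errors allow for serial dependence
ar1 <- gls(sst_detr ~ wind_detr, data = d, correlation = corAR1(form = ~ t))

se_ols <- summary(ols)$coefficients["wind_detr", "Std. Error"]
se_ar1 <- sqrt(diag(vcov(ar1)))["wind_detr"]
```

If `r1` is far from zero, the `lm()` standard errors and p-values are unreliable even when the coefficient estimates look sensible; comparing `se_ols` with `se_ar1` shows how much the two models disagree.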
These detrended linear models gave R^2 values of roughly 0.5 for half a dozen models (predicting oceanographic properties closest to the sea surface), down to 0.13 for predictions at greater depth. I can increase the R^2 from 0.13 to 0.2 if I allow lagged drivers in the model.
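Allowing lags can be sketched as a scan over candidate lags of one driver. Everything here is simulated and hypothetical (the names `drv` and `resp`, the true lag of 3 days, and the 0 to 10 day search range). Note that choosing the best of several lags on the same data tends to inflate the in-sample R^2.

```r
set.seed(3)
n        <- 1000
true_lag <- 3
drv      <- rnorm(n)
# Response depends on the driver three days earlier (simulated, hypothetical)
resp <- c(rep(NA, true_lag), 0.6 * drv[1:(n - true_lag)]) + rnorm(n, sd = 0.8)

# R^2 of a simple regression for each candidate lag
lag_r2 <- sapply(0:10, function(k) {
  x <- c(rep(NA, k), drv[seq_len(n - k)])   # driver shifted k days into the past
  summary(lm(resp ~ x))$r.squared           # lm() drops rows containing NA
})
names(lag_r2) <- 0:10

best_lag <- as.integer(names(which.max(lag_r2)))
```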
If I then predict the seasonal fluctuation using my GAM, and add these predictions to the predictions of the non-seasonal, detrended (residual) values from the linear models, I get correlations of 0.95 with the measured data. (Note that I did not separate training and test data, and n = 2191.)
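Because training and test data were not separated, the 0.95 is an in-sample figure. One way to check it is a blocked hold-out, sketched below on simulated data: fit on the early years and predict a contiguous final year. The names `dat`, `drv` and `resp` and the split point are hypothetical, and the simulated driver has no seasonal cycle, so it is not detrended here.

```r
library(mgcv)

set.seed(4)
n    <- 2191
doy  <- ((seq_len(n) - 1) %% 365) + 1
drv  <- rnorm(n)
resp <- 10 + 5 * sin(2 * pi * doy / 365) + 0.8 * drv + rnorm(n)
dat  <- data.frame(doy, drv, resp)

train <- dat[1:1826, ]     # roughly the first five years
test  <- dat[1827:n, ]     # final year held out as one contiguous block

# Seasonal model and residual regression fitted on the training block only
g <- gam(resp ~ s(doy, bs = "cc"), data = train,
         knots = list(doy = c(0.5, 365.5)))
train$resp_detr <- residuals(g, type = "response")
m <- lm(resp_detr ~ drv, data = train)

# Out-of-sample prediction: seasonal part plus non-seasonal part
seasonal_only <- as.numeric(predict(g, newdata = test))
combined      <- seasonal_only + as.numeric(predict(m, newdata = test))

cor_seasonal <- cor(seasonal_only, test$resp)
cor_combined <- cor(combined, test$resp)
```

Comparing `cor_combined` with `cor_seasonal` shows how much of the overall correlation comes from the annual cycle alone and how much the drivers add.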