Drexel dragonThe Math ForumDonate to the Math Forum



Search All of the Math Forum:

Views expressed in these public forums are not endorsed by Drexel University or The Math Forum.


Math Forum » Discussions » sci.math.* » sci.stat.math.independent

Topic: How to fit this generalised linear model?
Replies: 2   Last Post: Oct 23, 2002 10:03 AM

Advanced Search

Back to Topic List Back to Topic List Jump to Tree View Jump to Tree View   Messages: [ Previous | Next ]
Gaj Vidmar

Posts: 21
Registered: 12/13/04
How to fit this generalised linear model?
Posted: Oct 15, 2002 3:08 PM
  Click to see the message monospaced in plain text Plain Text   Click to reply to this topic Reply

First of all, please excuse me if this post reappears in a couple
of hours - I've sent it via the news server I use at work, which
sometimes posts messages with huge delay, or doesn't post them
at all, and by waiting till tomorow to see what happens I would miss
a whole working day of US-time - and since I would like to
get help as soon as possible, I'm posting it via another server.
The rush is also the excuse for posting to two sci.stat newsgroups.
----------
I would greatly appreciate help on the following matter -
preferably from someone familiar with the topic of relative survival,
but the issue (and my ignorance) is rather general so that anyone
dealing with GLM might be able to help.

We are trying to familiarise ourselves with the Hakulinen approach
to modelling relative survival and first we would like to replicate
the results of Prof. Hakulinen (who held a course on the topic at
our institute last year) and associates. Unfortunately, all their
publications and analyses relate to code in GLIM, which we have no
access to; but we have SPSS (soon we should upgrade to ver. 11 which,
I believe, handles such models), Stata (a colleague, actually a
colleaguess :) is mastering it) and of course R (which I am
starting to struggle with), so we would like to be able to implement
the regression approach to relative survival using one of these
packages. (preferably Stata :)

To use the model, the patients are first grouped into strata, one
stratum for each combination of predictors of interest (say:
age group[4 values], gender[2], period of diagnosis[2] and follow-up
year[1..5]). A life table is then estimated for each stratum (k),
whereby the number of deaths (d_ki) in a given follow-up interval (i)
among the l'_ki patients at risk is counted and assumed to be
binomially distributed; p_ki is defined as the interval-specific
observed survival rate while p*_ki denotes the expected survival rate
(obtained from mortality tables for general population). The additive
hazards model, also known as the relatve survival model, can be
expressed as

ln ( -ln ( p_ki/p*_ki)) = x_vector beta_vector

The manual says: "This implies a generalised linear model, the outcome
being l'_ki - d_ki (the number of patients surviving the interval),
the error structure binomial with denominator l'_ki and the link
function complementary log-log combined with a division by p*_ki."

Now, for the dataset of interest (the one on melanoma refered to in
http://www.cancerregistry.fi/surv2/manual.pdf) we have obtained p_ki
and p*_ki for each stratum and we would like to fit the model,
hoping to get results identical to, e.g., Table 11.2 in the
abovementioned manual). The colleague figured that in Stata the right
procedure should be cloglog, while in R I imagine it would be glm with
family=binomial; but exactly how to arrange the data??? And exactly
what syntax to use??? - It would be sooo nice if someone told me this!

We have constructed the dataset with 80 cases (1 per stratum:
4x2x2x5=80). I understand things to the point that if we use the ratio
p_ki/p*_ki for each stratum as the dependent variable we can not feed
such grouped data into the program (Stata or R, whichever), because if
nothing else the GLM routine should somehow be told what l'_ki
corresponds to each case. Reading help on Stata's cloglog function I saw
the weighting option, which includes weighf to weight by frequency, so
I imagine we might construct two cases for each stratum, one with
y=1 and weight=l'_ki (p_ki/p*_ki) and one with y=0 and
weight=l'_ki (1-(p_ki/p*_ki)), and then use cloglog with dependent
variable y, independents group, gender, period of diagnosis and
follow-up year, and weightf by weight option. Is this correct???
Or should we now produce l'_ki (p_ki/p*_ki) cases with y=1 and
l'_ki (1-(p_ki/p*_ki)) cases with y=0, all covariates being equal,
from each stratum??? And then run which command???

Thanking deeply for any help,

Assist. Gaj Vidmar, Biostatistician
University of Ljubljana, Faculty of Medicine
Institute of Biomedical Informatics
http://www.mf.uni-lj.si/ibmi-english/index2.html







Point your RSS reader here for a feed of the latest messages in this topic.

[Privacy Policy] [Terms of Use]

© Drexel University 1994-2014. All Rights Reserved.
The Math Forum is a research and educational enterprise of the Drexel University School of Education.