Hello. I am working on my MS thesis, and a committee member and I are disagreeing about how I have selected the best model. Based on my readings and the LatentGold LCA manual, I think this is reasonable. I would appreciate anyone agreeing or disagreeing with this method and letting me know why (time is of the essence for me to complete my thesis). Thanks! Danielle
Quick overview of project: I sought to determine if the symptoms of individuals experiencing a major depressive episode (MDE), with a diagnosis of recurrent unipolar or bipolar I disorder, would cluster into meaningful and clinically interpretable subgroups of depression. Outpatients (N = 293; Unipolar n = 195, Bipolar n = 98) who met DSM-IV diagnostic criteria for an MDE were assessed at intake to treatment with the 25-item Hamilton Rating Scale for Depression (HRS-D-25). I applied latent class analysis (LCA) to the 25 HRS-D items in the unipolar depressed sample and then in the unipolar sample combined with the bipolar I depressed sample.
Data analysis plan: I entered the HRS-D-25 items as coded, ordinal variables (rather than dichotomized absent/present). I applied LCA to a 195-row x 25-column data matrix and a 293-row x 25-column data matrix with Latent Gold 4.0. For each model, gender was entered as an active covariate.
I first fit a one-class solution followed by a two-class solution, and so on, until I reached a seven-class solution, or the "best" solution. I stopped the analyses at seven classes because I felt that more than seven classes would provide little clinical or practical significance. I defined the best solution by the following criteria: (1) the solution fit the data significantly better than the previous solution and (2) the estimated parsimony of the model was associated with a significant p-value.
First, I estimated the parsimony of each possible model solution using the bootstrap p-value test statistic. I used the bootstrap p-value because my sample size is small and the chi-squared approximation may be problematic. Then, I sought to determine if each level of a restricted model (a solution with a greater number of classes) was a significant improvement over the less restricted model (a solution with a smaller number of classes). I thought that if the bootstrap -2LL Diff test statistic had a significant p-value, then the less restricted model was more parsimonious than the more restricted model. For instance, if the bootstrap -2LL Diff statistic is not significant, then a 3-class solution may be accepted as a good fit and as a more parsimonious model than a 2-class solution. Conversely, if the bootstrap -2LL Diff p-value is significant, then a 2-class solution should be accepted as the more parsimonious model. Next, I used entropy R-squared to evaluate how well the selected model permitted predictions of class membership based on the observed indicator variables.
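To make the two statistics above concrete, here is a minimal Python sketch. It is not Latent Gold code or output. It assumes the bootstrap p-value is the share of bootstrap replicates whose -2LL difference is at least as large as the observed difference, and that entropy R-squared is the proportional reduction in classification entropy once the indicators are observed. The function names and the toy inputs are hypothetical and only illustrate the arithmetic.

```python
import math


def bootstrap_p_value(observed_diff, bootstrap_diffs):
    """Share of bootstrap -2LL differences >= the observed -2LL difference."""
    if not bootstrap_diffs:
        raise ValueError("need at least one bootstrap replicate")
    exceed = sum(1 for d in bootstrap_diffs if d >= observed_diff)
    return exceed / len(bootstrap_diffs)


def _entropy(probs):
    """Shannon entropy in nats, treating 0 * log(0) as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_r_squared(posteriors):
    """Entropy-based R-squared from posterior class-membership probabilities.

    posteriors: one row per case, one column per class, each row summing to 1.
    Assumed definition: 1 - (mean posterior entropy) / (entropy of class sizes).
    """
    n_cases = len(posteriors)
    n_classes = len(posteriors[0])
    # Estimated class sizes are the column means of the posteriors.
    class_sizes = [
        sum(row[c] for row in posteriors) / n_cases for c in range(n_classes)
    ]
    total_error = _entropy(class_sizes)
    if total_error == 0:
        raise ValueError("undefined when all cases fall in a single class")
    conditional_error = sum(_entropy(row) for row in posteriors) / n_cases
    return 1 - conditional_error / total_error


# Toy inputs (made up for illustration, not my thesis data).
toy_p = bootstrap_p_value(5.0, [1.0, 2.0, 6.0, 7.0])
toy_r2 = entropy_r_squared([[0.9, 0.1], [0.2, 0.8], [0.95, 0.05]])
```

Under this definition, values near 1 mean the posteriors assign cases to classes almost unambiguously, and values near 0 mean the indicators add little beyond the class sizes.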
Results: Based on the Bootstrap -2LL Diff statistic, a six-class solution was the most parsimonious solution and was able to adequately predict class membership (entropy R2 = 0.85). The Bootstrap -2LL Diff statistic between a 6- and 7-class solution was significant. This suggests that the less restricted 6-class model was preferable to the more restricted 7-class model. In the combined unipolar-bipolar sample, a 5-class solution was the most parsimonious solution and was able to adequately predict class membership based on the HRS-D-25 items