


Re: help with NaN removal from dataset class using group mean value
Posted: Feb 2, 2012 6:19 PM


Long Live Peter !!
This is exactly what I was looking for... THANKS A TON !!! MATLAB Center is just awesome....
Keep Rocking !!
Shatrughan
Peter Perkins <Peter.Remove.Perkins.This@mathworks.com> wrote in message <jgf3kl$mgq$1@newscl01ah.mathworks.com>...
> On 2/2/2012 3:13 PM, Shatrughan wrote:
> > Hello,
> > I have a question about removing NaN with nanmean of the "group" value. I am
> > working with dataset class (since it is a large amount of data).
>
> Shatrughan, this turns out to be very easy, though perhaps not obvious.
> Start with these data:
>
> >> d
> d =
>     grp    var
>     'A'      5
>     'A'      5
>     'A'      5
>     'A'      5
>     'A'    NaN
>     'A'      5
>     'A'    NaN
>     'A'      5
>     'A'      5
>     'A'      5
>     'B'      6
>     'B'      6
>     'B'    NaN
>     'B'      6
>     'B'      6
>     'C'      7
>     'C'      7
>     'C'    NaN
>     'C'      7
>
> Note that here I've set things up so that the "grp" variable in d is a
> cell array of strings, not a column of chars. If you have a column of
> chars, then just do this:
>
> d.grp = cellstr(d.grp);
>
> Converting to a nominal or ordinal variable would also work.
>
> Now call the dataset's grpstats method to compute the means of the
> variable "var" within each group, automatically leaving out the NaNs:
>
> >> gmeans = grpstats(d,'grp')
> gmeans =
>          grp    GroupCount    mean_var
>     A    'A'    10            5
>     B    'B'     5            6
>     C    'C'     4            7
>
> "mean_var" is kind of a funny name, but "var" was what you called that
> variable, and it was autogenerated as "the mean of the variable 'var'".
>
> Now all you have to do is find the NaNs in "var", and replace them with
> the group means:
>
> >> nans = isnan(d.var);
> >> grps = d.grp(nans)
> grps =
>     'A'
>     'A'
>     'B'
>     'C'
> >> d.var(nans) = gmeans.mean_var(grps)
> d =
>     grp    var
>     'A'    5
>     'A'    5
>     'A'    5
>     'A'    5
>     'A'    5
>     'A'    5
>     'A'    5
>     'A'    5
>     'A'    5
>     'A'    5
>     'B'    6
>     'B'    6
>     'B'    6
>     'B'    6
>     'B'    6
>     'C'    7
>     'C'    7
>     'C'    7
>     'C'    7
>
> That last step _looks_ like it couldn't possibly work. But with a
> dataset array it does, and here's why:
>
> grps is a subset of d.grp, and so it is a cell array of strings. gmeans
> is a dataset array with observation names 'A', 'B', and 'C' (you can
> see those at the left margin of the display). And so
> gmeans.mean_var(grps) indexes into the "mean_var" variable by matching
> up what's in grps with the observation names that "mean_var" inherits
> from the dataset array it lives in.
>
> Hope this helps.
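
For anyone who wants to try the steps quoted above from scratch, here is a minimal sketch that rebuilds the example data and runs them as one script. It assumes the Statistics Toolbox dataset class is available. The workspace names grp and x are placeholders chosen here for illustration (x is used so that the workspace variable does not shadow the built-in var function); everything else follows the quoted reply.

```matlab
% Rebuild the example data from the quoted reply: 19 rows, 4 of them NaN.
% "grp" and "x" are placeholder workspace names, not from the original post.
grp = [repmat({'A'},10,1); repmat({'B'},5,1); repmat({'C'},4,1)];
x   = [5 5 5 5 NaN 5 NaN 5 5 5, 6 6 NaN 6 6, 7 7 NaN 7]';

% Dataset array with a cell-array-of-strings grouping variable.
d = dataset({grp,'grp'}, {x,'var'});

% Group means of "var"; NaNs are left out of the means.
% The result has observation names 'A', 'B', 'C'.
gmeans = grpstats(d,'grp');

% Find the NaNs and look up each one's group label.
nans = isnan(d.var);
grps = d.grp(nans);

% Index mean_var by observation name to get one group mean per NaN row.
d.var(nans) = gmeans.mean_var(grps);
```

After this runs, d.var should contain no NaNs, and each filled-in value should equal the mean of the non-NaN values in its group.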



