Topic: help with NaN removal from dataset class using group mean value
Replies: 3
Last Post: Jul 15, 2013 8:22 AM




Re: help with NaN removal from dataset class using group mean value
Posted: Feb 2, 2012 5:44 PM


On 2/2/2012 3:13 PM, Shatrughan wrote:
> Hello,
> I have question for removing NaN with nanmean of the "group" value. I am
> working with dataset class (since it is a large amount of data).

Shatrughan, this turns out to be very easy, though perhaps not obvious. Start with these data:
>> d
d =
    grp    var
    'A'      5
    'A'      5
    'A'      5
    'A'      5
    'A'    NaN
    'A'      5
    'A'    NaN
    'A'      5
    'A'      5
    'A'      5
    'B'      6
    'B'      6
    'B'    NaN
    'B'      6
    'B'      6
    'C'      7
    'C'      7
    'C'    NaN
    'C'      7
Note that here I've set things up so that the "grp" variable in d is a cell array of strings, not a column of chars. If you have a column of chars, then just do this:
d.grp = cellstr(d.grp);
Converting to a nominal or ordinal variable would also work.
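For anyone who wants to reproduce the display above, here is a minimal sketch of one way to build d. The values are copied from the display; the helper names labels and vals are only for illustration, and this assumes the Statistics Toolbox dataset class is available.

```matlab
% Build the 19-row example dataset array shown above.
% "labels" and "vals" are illustrative names, not from the original post.
labels = [repmat({'A'},10,1); repmat({'B'},5,1); repmat({'C'},4,1)];
vals   = [5 5 5 5 NaN 5 NaN 5 5 5, 6 6 NaN 6 6, 7 7 NaN 7]';

% The {data,'name'} form names each dataset variable explicitly.
d = dataset({labels,'grp'}, {vals,'var'});

% If grp had instead been stored as a column of chars, this would
% turn it into the cell array of strings used in the rest of the post:
% d.grp = cellstr(d.grp);
```

The explicit {data,'name'} pairs keep the workspace free of a variable called var, which would otherwise shadow the built-in var function.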
Now call the dataset's grpstats method to compute the means of the variable "var" within each group, automatically leaving out the NaNs:
>> gmeans = grpstats(d,'grp')
gmeans =
         grp    GroupCount    mean_var
    A    'A'    10            5
    B    'B'     5            6
    C    'C'     4            7
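As a sanity check on the claim that the NaNs are left out of the means, the sketch below recomputes the mean for group A by hand and compares it with what grpstats returns. It rebuilds the example data so that it runs on its own; the names labels, vals, inA and handA are illustrative only.

```matlab
% Rebuild the example data (values copied from the display in this post).
labels = [repmat({'A'},10,1); repmat({'B'},5,1); repmat({'C'},4,1)];
vals   = [5 5 5 5 NaN 5 NaN 5 5 5, 6 6 NaN 6 6, 7 7 NaN 7]';
d      = dataset({labels,'grp'}, {vals,'var'});

% Group means via grpstats; with no statistic specified it computes the mean.
gmeans = grpstats(d, 'grp');

% Hand computation for group A: select its rows, drop the NaNs, average.
inA   = strcmp(d.grp, 'A');
handA = mean(d.var(inA & ~isnan(d.var)));

% Both should agree if grpstats ignored the NaNs.
agrees = (handA == gmeans.mean_var(1));
```

Note that GroupCount still counts every row in the group, including the rows where var is NaN.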
"mean_var" is kind of a funny name, but "var" was what you called that variable, and it was autogenerated as "the mean of the variable 'var'".
Now all you have to do is find the NaNs in "var", and replace them with the group means:
>> nans = isnan(d.var);
>> grps = d.grp(nans)
grps =
    'A'
    'A'
    'B'
    'C'
>> d.var(nans) = gmeans.mean_var(grps)
d =
    grp    var
    'A'    5
    'A'    5
    'A'    5
    'A'    5
    'A'    5
    'A'    5
    'A'    5
    'A'    5
    'A'    5
    'A'    5
    'B'    6
    'B'    6
    'B'    6
    'B'    6
    'B'    6
    'C'    7
    'C'    7
    'C'    7
    'C'    7
That last step _looks_ like it couldn't possibly work. But with a dataset array it does, and here's why:
grps is a subset of d.grp, and so it is a cell array of strings. gmeans is a dataset array with observation names 'A', 'B', and 'C'; you can see those at the left margin of the display. And so gmeans.mean_var(grps) indexes into the "mean_var" variable by matching up what's in grps with the observation names that "mean_var" inherits from the dataset array it lives in. The result has one group mean per NaN, in the same order as the NaNs appear in d.var, which is exactly what the assignment needs.
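If the name-based indexing feels too magical, the same lookup can be spelled out with ismember. This is only a sketch of an equivalent approach, not what the dataset class does internally; it matches against the grp column of gmeans rather than its observation names, and the names labels, vals, found and loc are illustrative.

```matlab
% Rebuild the example data (values copied from the display in this post).
labels = [repmat({'A'},10,1); repmat({'B'},5,1); repmat({'C'},4,1)];
vals   = [5 5 5 5 NaN 5 NaN 5 5 5, 6 6 NaN 6 6, 7 7 NaN 7]';
d      = dataset({labels,'grp'}, {vals,'var'});
gmeans = grpstats(d, 'grp');

% Find the missing values and the group label attached to each one.
nans = isnan(d.var);
grps = d.grp(nans);

% For each label, find its row position in the table of group means.
[found, loc] = ismember(grps, gmeans.grp);

% Every label should have a matching group; stop here if one does not.
assert(all(found), 'A NaN row belongs to a group with no computed mean.');

% Replace each NaN with the mean of its own group.
d.var(nans) = gmeans.mean_var(loc);
```

The explicit version is longer, but it makes the row-matching step visible and fails loudly if a label is missing.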
Hope this helps.



