Topic: help with NaN removal from dataset class using group mean value
Replies: 3   Last Post: Jul 15, 2013 8:22 AM

Peter Perkins

Re: help with NaN removal from dataset class using group mean value
Posted: Feb 2, 2012 5:44 PM

On 2/2/2012 3:13 PM, Shatrughan wrote:
> Hello,
> I have a question about replacing NaNs with the nanmean of the "group" value. I am
> working with the dataset class (since it is a large amount of data).


Shatrughan, this turns out to be very easy, though perhaps not obvious.
Start with these data:

>> d
d =
    grp    var
    'A'      5
    'A'      5
    'A'      5
    'A'      5
    'A'    NaN
    'A'      5
    'A'    NaN
    'A'      5
    'A'      5
    'A'      5
    'B'      6
    'B'      6
    'B'    NaN
    'B'      6
    'B'      6
    'C'      7
    'C'      7
    'C'    NaN
    'C'      7

Note that here I've set things up so that the "grp" variable in d is a
cell array of strings, not a column of chars. If you have a column of
chars, then just do this:

d.grp = cellstr(d.grp);

Converting to a nominal or ordinal variable would also work.
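
In case it helps to reproduce this, here is one way the example data could be built. This is only a sketch: the workspace names "grp", "vals", "c" and "cs" are my own placeholders, and dataset arrays need the Statistics Toolbox.

```matlab
% Group labels as a cell array of strings: 10 x 'A', 5 x 'B', 4 x 'C'.
grp  = [repmat({'A'},10,1); repmat({'B'},5,1); repmat({'C'},4,1)];

% Values with one or two NaNs in each group, matching the display above.
vals = [5 5 5 5 NaN 5 NaN 5 5 5  6 6 NaN 6 6  7 7 NaN 7]';

% Name the dataset variables explicitly so that no workspace variable
% has to be called "var" (which would shadow the var function).
d = dataset({grp,'grp'}, {vals,'var'});

% If the labels start out as a column of chars, cellstr converts them.
c  = ['A';'A';'B'];      % 3-by-1 char array
cs = cellstr(c);         % 3-by-1 cell array of strings
```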

Now call the dataset's grpstats method to compute the means of the
variable "var" within each group, automatically leaving out the NaNs:

>> gmeans = grpstats(d,'grp')
gmeans =
         grp    GroupCount    mean_var
    A    'A'    10            5
    B    'B'     5            6
    C    'C'     4            7

"mean_var" is kind of a funny name, but "var" was what you called that
variable, and the new name was autogenerated as "the mean of the variable 'var'".
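
If you want to convince yourself that the NaNs really are left out, a quick cross-check along these lines should do it. Again just a sketch: it rebuilds the example data with placeholder names ("vals", "inA", "meanA") and compares one group against nanmean by hand.

```matlab
% Same 19-row example as above (dataset arrays need the Statistics Toolbox).
grp  = [repmat({'A'},10,1); repmat({'B'},5,1); repmat({'C'},4,1)];
vals = [5 5 5 5 NaN 5 NaN 5 5 5  6 6 NaN 6 6  7 7 NaN 7]';
d = dataset({grp,'grp'}, {vals,'var'});

% Group means; with no statistic specified, grpstats computes the mean.
gmeans = grpstats(d, 'grp');

% Hand check for group 'A': nanmean skips the two NaNs in that group.
inA   = strcmp(d.grp, 'A');
meanA = nanmean(d.var(inA));

% The hand-computed value should agree with the 'A' row of gmeans.
agrees = (meanA == gmeans.mean_var(strcmp(gmeans.grp, 'A')));
```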

Now all you have to do is find the NaNs in "var", and replace them with
the group means:

>> nans = isnan(d.var);
>> grps = d.grp(nans)

grps =
    'A'
    'A'
    'B'
    'C'
>> d.var(nans) = gmeans.mean_var(grps)
d =
    grp    var
    'A'    5
    'A'    5
    'A'    5
    'A'    5
    'A'    5
    'A'    5
    'A'    5
    'A'    5
    'A'    5
    'A'    5
    'B'    6
    'B'    6
    'B'    6
    'B'    6
    'B'    6
    'C'    7
    'C'    7
    'C'    7
    'C'    7

That last step _looks_ like it couldn't possibly work. But with a
dataset array it does, and here's why:

grps is a subset of d.grp, and so it is a cell array of strings. gmeans
is a dataset array with observation names 'A', 'B', and 'C' -- you can
see those at the left margin of the display. And so
gmeans.mean_var(grps) indexes into the "mean_var" variable by matching
up what's in grps with the observation names that "mean_var" inherits
from the dataset array it lives in.
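
If the name-based indexing feels too magical, the same lookup can be spelled out with ismember. This is an equivalent sketch rather than what the dataset array literally does internally, and "vals", "tf", "loc" and "fill" are placeholder names of mine.

```matlab
% Rebuild the example and the group means (Statistics Toolbox required).
grp  = [repmat({'A'},10,1); repmat({'B'},5,1); repmat({'C'},4,1)];
vals = [5 5 5 5 NaN 5 NaN 5 5 5  6 6 NaN 6 6  7 7 NaN 7]';
d = dataset({grp,'grp'}, {vals,'var'});
gmeans = grpstats(d, 'grp');

% Rows that need filling, and the group label of each of those rows.
nans = isnan(d.var);
grps = d.grp(nans);

% Explicit lookup: for each label in grps, find the matching row of gmeans.
[tf, loc] = ismember(grps, gmeans.grp);

% Pick out the corresponding group means and write them over the NaNs.
fill = gmeans.mean_var(loc);
d.var(nans) = fill;
```

Here loc plays the role that the observation names play in gmeans.mean_var(grps): it says which row of gmeans each NaN should be filled from.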

Hope this helps.


