Math Forum » Discussions » Software » comp.soft-sys.matlab

Topic: help with NaN removal from dataset class using group mean value
Replies: 3   Last Post: Jul 15, 2013 8:22 AM

Shatrughan

Posts: 18
Registered: 3/17/11
Re: help with NaN removal from dataset class using group mean value
Posted: Feb 2, 2012 6:19 PM

Long Live Peter !!

This is exactly what I was looking for... THANKS A TON !!!
MATLAB Central is just awesome....

Keep Rocking !!
Shatrughan


Peter Perkins <Peter.Remove.Perkins.This@mathworks.com> wrote in message <jgf3kl$mgq$1@newscl01ah.mathworks.com>...
> On 2/2/2012 3:13 PM, Shatrughan wrote:
> > Hello,
> > I have a question about replacing NaNs with the nanmean of their "group" value. I am
> > working with the dataset class (since it is a large amount of data).

>
> Shatrughan, this turns out to be very easy, though perhaps not obvious.
> Start with these data:
>

> >> d
> d =
> grp var
> 'A' 5
> 'A' 5
> 'A' 5
> 'A' 5
> 'A' NaN
> 'A' 5
> 'A' NaN
> 'A' 5
> 'A' 5
> 'A' 5
> 'B' 6
> 'B' 6
> 'B' NaN
> 'B' 6
> 'B' 6
> 'C' 7
> 'C' 7
> 'C' NaN
> 'C' 7
>
> Note that here I've set things up so that the "grp" variable in d is a
> cell array of strings, not a column of chars. If you have a column of
> chars, then just do this:
>
> d.grp = cellstr(d.grp);
>
> Converting to a nominal or ordinal variable would also work.
>
> Now call the dataset's grpstats method to compute the means of the
> variable "var" within each group, automatically leaving out the NaNs:
>

> >> gmeans = grpstats(d,'grp')
> gmeans =
> grp GroupCount mean_var
> A 'A' 10 5
> B 'B' 5 6
> C 'C' 4 7
>
> "mean_var" is kind of a funny name, but "var" was what you called that
> variable, and it was autogenerated as "the mean of the variable 'var'".
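As a rough illustration outside MATLAB, here is a Python sketch of what that grpstats call reports, not the toolbox's implementation. The function name group_means is hypothetical; it assumes the group labels and values arrive as two parallel lists, with math.nan standing in for MATLAB's NaN. As in the display above, the count covers every row in a group, while the mean skips the NaN entries.

```python
import math

def group_means(groups, values):
    """Return {group: (row_count, mean)} for parallel lists of labels and values.

    row_count counts every row in the group (NaN or not); mean averages
    only the non-NaN values, like nanmean. An all-NaN group gets math.nan.
    """
    totals = {}  # group -> [row_count, sum of non-NaN values, number of non-NaN values]
    for g, v in zip(groups, values):
        t = totals.setdefault(g, [0, 0.0, 0])
        t[0] += 1
        if not math.isnan(v):
            t[1] += v
            t[2] += 1
    return {g: (rows, s / n if n else math.nan)
            for g, (rows, s, n) in totals.items()}

# Same data as the dataset d shown above.
nan = math.nan
grp = ["A"] * 10 + ["B"] * 5 + ["C"] * 4
var = [5, 5, 5, 5, nan, 5, nan, 5, 5, 5,
       6, 6, nan, 6, 6,
       7, 7, nan, 7]
print(group_means(grp, var))  # {'A': (10, 5.0), 'B': (5, 6.0), 'C': (4, 7.0)}
```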
>
> Now all you have to do is find the NaNs in "var", and replace them with
> the group means:
>

> >> nans = isnan(d.var);
> >> grps = d.grp(nans)

> grps =
> 'A'
> 'A'
> 'B'
> 'C'

> >> d.var(nans) = gmeans.mean_var(grps)
> d =
> grp var
> 'A' 5
> 'A' 5
> 'A' 5
> 'A' 5
> 'A' 5
> 'A' 5
> 'A' 5
> 'A' 5
> 'A' 5
> 'A' 5
> 'B' 6
> 'B' 6
> 'B' 6
> 'B' 6
> 'B' 6
> 'C' 7
> 'C' 7
> 'C' 7
> 'C' 7
>
> That last step _looks_ like it couldn't possibly work. But with a
> dataset array it does, and here's why:
>
> grps is a subset of d.grp, and so it is a cell array of strings. gmeans
> is a dataset array with observation names 'A', 'B', and 'C' -- you can
> see those at the left margin of the display. And so
> gmeans.mean_var(grps) indexes into the "mean_var" variable by matching
> up what's in grps with the observation names that "mean_var" inherits
> from the dataset array it lives in.
>
> Hope this helps.
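The whole recipe in Peter's reply (compute NaN-free group means, find the NaNs, look each one up by group name) can also be sketched in Python. This is an analogue under stated assumptions, not the dataset class's behavior: the function name fill_nans_with_group_mean is hypothetical, the data are two parallel lists, and a plain dict keyed by group name stands in for the observation names on gmeans. A group whose values are all NaN has no mean, so its NaNs are left in place.

```python
import math

def fill_nans_with_group_mean(groups, values):
    """Return a copy of values with each NaN replaced by its group's nanmean."""
    # Pass 1: per-group sum and count of the non-NaN values only.
    sums, counts = {}, {}
    for g, v in zip(groups, values):
        if not math.isnan(v):
            sums[g] = sums.get(g, 0.0) + v
            counts[g] = counts.get(g, 0) + 1
    # Group means keyed by group name; this dict plays the role of the
    # observation names 'A', 'B', 'C' on gmeans.
    means = {g: sums[g] / counts[g] for g in sums}
    # Pass 2: keep non-NaN values; replace each NaN by looking up its
    # group's mean by name. An all-NaN group has no entry and stays NaN.
    return [means.get(g, math.nan) if math.isnan(v) else v
            for g, v in zip(groups, values)]

nan = math.nan
grp = ["A"] * 10 + ["B"] * 5 + ["C"] * 4
var = [5, 5, 5, 5, nan, 5, nan, 5, 5, 5,
       6, 6, nan, 6, 6,
       7, 7, nan, 7]
print(fill_nans_with_group_mean(grp, var))
```

Building the lookup table once and then indexing it by name mirrors the idea behind gmeans.mean_var(grps): the fill step needs no explicit loop over groups.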





© Drexel University 1994-2014. All Rights Reserved.
The Math Forum is a research and educational enterprise of the Drexel University School of Education.