Topic: help with NaN removal from dataset class using group mean value
Replies: 3   Last Post: Jul 15, 2013 8:22 AM

 Shatrughan Posts: 18 Registered: 3/17/11
Re: help with NaN removal from dataset class using group mean value
Posted: Feb 2, 2012 6:19 PM

Long Live Peter !!

This is exactly what I was looking for... THANKS A TON !!!
MATLAB Center is just awesome....

Keep Rocking !!
Shatrughan

Peter Perkins <Peter.Remove.Perkins.This@mathworks.com> wrote in message <jgf3kl$mgq$1@newscl01ah.mathworks.com>...
> On 2/2/2012 3:13 PM, Shatrughan wrote:
> > Hello,
> > I have a question about replacing NaNs with the nanmean of the "group" value. I am
> > working with the dataset class (since it is a large amount of data).

>
> Shatrughan, this turns out to be very easy, though perhaps not obvious.
>

> >> d
> d =
> grp var
> 'A' 5
> 'A' 5
> 'A' 5
> 'A' 5
> 'A' NaN
> 'A' 5
> 'A' NaN
> 'A' 5
> 'A' 5
> 'A' 5
> 'B' 6
> 'B' 6
> 'B' NaN
> 'B' 6
> 'B' 6
> 'C' 7
> 'C' 7
> 'C' NaN
> 'C' 7
>
> Note that here I've set things up so that the "grp" variable in d is a
> cell array of strings, not a column of chars. If you have a column of
> chars, then just do this:
>
> d.grp = cellstr(d.grp);
>
> Converting to a nominal or ordinal variable would also work.
>
> Now call the dataset's grpstats method to compute the means of the
> variable "var" within each group, automatically leaving out the NaNs:
>

> >> gmeans = grpstats(d,'grp')
> gmeans =
>          grp    GroupCount    mean_var
>     A    'A'    10            5
>     B    'B'     5            6
>     C    'C'     4            7
>
> "mean_var" is kind of a funny name, but "var" was what you called that
> variable, and it was autogenerated as "the mean of the variable 'var'".
>
> Now all you have to do is find the NaNs in "var", and replace them with
> the group means:
>

> >> nans = isnan(d.var);
> >> grps = d.grp(nans)

> grps =
> 'A'
> 'A'
> 'B'
> 'C'

> >> d.var(nans) = gmeans.mean_var(grps)
> d =
> grp var
> 'A' 5
> 'A' 5
> 'A' 5
> 'A' 5
> 'A' 5
> 'A' 5
> 'A' 5
> 'A' 5
> 'A' 5
> 'A' 5
> 'B' 6
> 'B' 6
> 'B' 6
> 'B' 6
> 'B' 6
> 'C' 7
> 'C' 7
> 'C' 7
> 'C' 7
>
> That last step _looks_ like it couldn't possibly work. But with a
> dataset array it does, and here's why:
>
> grps is a subset of d.grp, and so it is a cell array of strings. gmeans
> is a dataset array with observation names 'A', 'B', and 'C' -- you can
> see those at the left margin of the display. And so
> gmeans.mean_var(grps) indexes into the "mean_var" variable by matching
> up what's in grps with the observation names that "mean_var" inherits
> from the dataset array it lives in.
>
> Hope this helps.
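
For anyone who wants to run the steps quoted above end to end, here is a minimal sketch. It assumes the Statistics Toolbox is installed (for dataset and grpstats); the workspace names g and x are only illustrative and do not come from the original post. The data are the same 19 rows shown in the example.

```matlab
% Sketch only: assumes the Statistics Toolbox (dataset, grpstats) is installed.
% g and x are illustrative workspace names, not from the original post.
g = [repmat({'A'},10,1); repmat({'B'},5,1); repmat({'C'},4,1)];
x = [5 5 5 5 NaN 5 NaN 5 5 5, 6 6 NaN 6 6, 7 7 NaN 7]';

% Build the dataset array with the variable names used in the post.
d = dataset({g,'grp'}, {x,'var'});

% Group means of "var"; grpstats leaves the NaNs out of each mean and
% labels the rows of the result with the group names.
gmeans = grpstats(d,'grp');

% Locate the NaNs and look up each one's group mean by observation name.
nans = isnan(d.var);
grps = d.grp(nans);
d.var(nans) = gmeans.mean_var(grps);

% Sanity check: no NaNs should remain.
assert(~any(isnan(d.var)));
```

If "grp" is stored as a char column rather than a cell array of strings, convert it with cellstr first, as noted in the quoted reply, so that the name-based lookup has strings to match against.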
