Search All of the Math Forum:

Views expressed in these public forums are not endorsed by NCTM or The Math Forum.

Notice: We are no longer accepting new posts, but the forums will continue to be readable.

Topic: how to find unique rows and calculate the average.
Replies: 9   Last Post: Apr 12, 2013 6:31 PM

 Messages: [ Previous | Next ]
 dpb Posts: 9,850 Registered: 6/7/07
Re: how to find unique rows and calculate the average.
Posted: Apr 9, 2013 10:54 PM

On 4/9/2013 4:32 PM, Alan Hitch wrote:
> I have a matrix (2563x19) with text and numeric characters. The first
>
> e.g. [Site Marshtype Date Species Throw Season Year UniqueID Total Pass1
> Pass2 Pass3 Pass4 Pass5 Pass6 Pass7 Pass8 Pass9 Pass10
> 3BAB Brackish 27-Oct-04 PAPU 2 Fall 2004 3BAB38287PAPU 35 25 9 0 1 0 0 0
> 0 0 0
> 3BAB Brackish 27-Oct-04 POLA 2 Fall 2004 3BAB38287POLA 79 46 19 9 5 0 0
> 0 0 0 0
> 3BAB Brackish 27-Oct-04 GAAF 2 Fall 2004 3BAB38287GAAF 64 45 13 6 0 0 0
> 0 0 0 0
> 3BAB Brackish 27-Oct-04 CYVA 2 Fall 2004 3BAB38287CYVA 11 2 4 2 3 0 0 0
> 0 0 0
> 3BAB Brackish 27-Oct-04 LUPA 2 Fall 2004 3BAB38287LUPA 20 11 2 4 2 1 0 0
> 0 0 0]
>
> I would like to get average values (Total) of particular rows that have
> identical column values (column of interest is UniqueID) and create a
> new matrix with the calculated mean and associated UniqueID. For each
> UniqueID there are between 2 and 3 values.
>
> Like this:
>
> [3BAB38287PAPU 234
> 3BAB38287POLA 456
> 3BAB38287GAAF 35]
>
> Could someone help me with this? I have tried a lot of things and
> nothing seems to be working.

Well, not seeing what you tried can't try to fix it, but one
straightforward approach...

>> type 'alan.dat'

3BAB Brackish 27-Oct-04 PAPU 2 Fall 2004 3BAB38287PAPU 35 25 9 0 1 0 0
0 0 0 0
3BAB Brackish 27-Oct-04 POLA 2 Fall 2004 3BAB38287POLA 79 46 19 9 5 0 0
0 0 0 0
3BAB Brackish 27-Oct-04 GAAF 2 Fall 2004 3BAB38287GAAF 64 45 13 6 0 0 0
0 0 0 0
3BAB Brackish 27-Oct-04 CYVA 2 Fall 2004 3BAB38287CYVA 11 2 4 2 3 0 0
0 0 0 0
3BAB Brackish 27-Oct-04 LUPA 2 Fall 2004 3BAB38287LUPA 20 11 2 4 2 1 0
0 0 0 0

>> fid=fopen('alan.dat','rt');
>> C=textscan(fid,[repmat('%*s',1,7) '%s'

repmat('%d',1,11)],'collectoutput',1);
>> fid=fclose(fid);
>> u=unique(C{1});
>> for i=1:length(u),m=mean(C{2}(strcmp(C{1},u(i)),1));disp([u(i) m]),end

'3BAB38287CYVA' [11]
'3BAB38287GAAF' [64]
'3BAB38287LUPA' [20]
'3BAB38287PAPU' [35]
'3BAB38287POLA' [79]
>>

Note I didn't worry about the header and for simplicity of demo just
threw away all the data columns up to the uniqeID one. Then used
'collectoutput' to put rest in a single cell.

Then we found the unique IDs that are in the file (of which there is
only one of each in the sample and ran a loop over it to find the mean
by using a logical vector returned by strcmp as the row index into the
data. I chose the first column arbitrarily.

Salt to suit for your case, of course...

--

Date Subject Author
4/9/13 Guest
4/9/13 dpb
4/10/13 Alan Hitch
4/10/13 dpb
4/10/13 dpb
4/11/13 Alan Hitch
4/11/13 dpb
4/12/13 dpb
4/10/13 Peter Perkins
4/11/13 Bruno Luong