Math Forum » Discussions » Software » comp.soft-sys.matlab

Topic: Downsampling very large text file
Replies: 7   Last Post: Apr 18, 2013 9:02 AM

Derek Goring

Posts: 3,919
Registered: 12/7/04
Re: Downsampling very large text file
Posted: Apr 10, 2013 4:26 PM

On Thursday, April 11, 2013 8:13:32 AM UTC+12, dpb wrote:
> On 4/10/2013 1:13 PM, bram wrote:
> > Hi all,
> >
> > For an experiment I measured vast amounts of data because of a high
> > sample rate. I've taken out the important parts; now I need to
> > downsample the data to get a general trend.
> >
> > The data is in a text file: it has 8 columns and so many rows that the
> > text files are around 50 GB. Columns are separated by tabs, and the
> > first 84 rows are sensor properties.
> >
> > I'm trying to read 200 000 rows, average them into a single value,
> > write that to a new array, then take the next ones... and so on.
> >
> > Can you guys help me along? I've used quite some MATLAB but never
> > analysed such data files.
> >
> > I've made a start with textscan, which seems to be working nicely, but
> > I'm having trouble making proper for-loops in combination with the
> > ~feof test.
>
> Should be a piece o' cake... here's a sample with a very short file, but
> the idea's the same:
>
> NtoAvg = 4;                                % how many records per average
> m = zeros(SomeLargeNumber, nColumnsInYourFile);  % preallocate results
> fmt = repmat('%f', 1, nColumnsInYourFile);       % one %f per column
> i = 0;
> fid = fopen('yourfile.dat', 'rt');
> while ~feof(fid)
>   C = textscan(fid, fmt, NtoAvg, 'collectoutput', 1, 'delimiter', '\t');
>   i = i + 1;
>   m(i,:) = mean(C{:});
> end
> fid = fclose(fid);
>
> Note that the above works when the last set has fewer than NtoAvg
> records; it will just average over a smaller number. I'm presuming this
> won't matter.
>
> Obviously you'll need logic to check whether you need more room and grow
> the array, etc., but the idea is as simple as the above.
>
> NB that feof() isn't [yet] true after the last record is read if the
> number of records in the file happens to be an exact multiple of NtoAvg,
> because end-of-file isn't flagged until the next read. In that (I think
> unlikely) case you'll get a record of NaN because the cell array will be
> empty. You can use a test on isempty() to stop that from happening; it
> probably wouldn't be noticeable in runtime, but I didn't include it
> above. If wanted, it would look something like:
>
> ...
> C = textscan(fid, '%f %f', NtoAvg, 'collectoutput', 1);
> if ~isempty(C{:})
>   i = i + 1;
>   m(i,:) = mean(C{:});
> end
> ...
>
> --
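The chunked-averaging loop dpb sketches is language-agnostic: read a fixed number of rows, average each column, append, repeat, and let a short final chunk average over however many rows it has. A minimal standard-library Python sketch of the same idea (the file path, chunk size, and header count are hypothetical, matching bram's description):

```python
def chunked_means(path, n_to_avg, n_skip=0):
    """Average every n_to_avg rows of a tab-delimited numeric file.

    The first n_skip rows (e.g. the 84 sensor-property rows) are
    discarded.  Returns one averaged row per chunk; a final short
    chunk is averaged over the rows it actually has, as in the
    MATLAB version above.
    """
    means = []
    with open(path) as f:
        for _ in range(n_skip):            # skip header rows
            next(f)
        chunk = []
        for line in f:
            chunk.append([float(x) for x in line.split('\t')])
            if len(chunk) == n_to_avg:     # full chunk: average, reset
                means.append([sum(c) / len(chunk) for c in zip(*chunk)])
                chunk = []
        if chunk:                          # leftover partial chunk
            means.append([sum(c) / len(chunk) for c in zip(*chunk)])
    return means
```

Because only one chunk is held in memory at a time, this handles a 50 GB file in constant memory, just like the textscan loop.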

I'd just add to dpb's excellent suggestions that you could determine the number of lines in your file first by running this little Perl routine:

while (<>) {};
print $.,"\n";

Put these two lines in a file called, say, countlines.pl, then call it from MATLAB (perl() returns its output as a string, so convert it):

nlines = str2double(perl('countlines.pl', 'yourfile.dat'));

With this info, you can use a for loop instead of a while loop.
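If Perl isn't handy, the same count can be produced by any scripting language that reads and discards each line; a Python sketch of the equivalent (file name hypothetical):

```python
def count_lines(path):
    # Same trick as the Perl one-liner: read and discard every line,
    # then report how many were seen.  Binary mode avoids any decoding
    # cost on a large numeric file.
    with open(path, 'rb') as f:
        return sum(1 for _ in f)
```

On most Unix systems `wc -l yourfile.dat` gives the same number without writing any code.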






© Drexel University 1994-2014. All Rights Reserved.
The Math Forum is a research and educational enterprise of the Drexel University School of Education.