

Math Forum » Discussions » Software » comp.soft-sys.matlab

Topic: Downsampling very large text file
Replies: 7   Last Post: Apr 18, 2013 9:02 AM

dpb

Posts: 8,231
Registered: 6/7/07
Re: Downsampling very large text file
Posted: Apr 10, 2013 4:42 PM

On 4/10/2013 3:26 PM, TideMan wrote:
...

> I'd just add to dpb's excellent suggestions that you could determine
> the number of lines in your file first by running this little perl
> routine:
> while (<>) {};
> print $.,"\n";
>
> Put these two lines in a file called, say, countlines.pl then:
> nlines=perl('countlines.pl','yourfile.dat');
> With this info, you can use a for loop instead of a while loop.


Yeah, and running the loop over floor(nlines/NtoAvg) iterations will also
eliminate the problem of the last batch being smaller than the rest...
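A minimal sketch of that chunked-averaging loop, assuming TideMan's countlines.pl helper above and hypothetical names (NtoAvg, yourfile.dat, a single column of %f data):

```matlab
% Count lines via the perl helper (perl() returns a string), then
% average the file in fixed-size chunks with a for loop.
nlines  = str2double(perl('countlines.pl', 'yourfile.dat'));
NtoAvg  = 1000;                        % decimation ratio (assumed)
nchunks = floor(nlines/NtoAvg);        % full chunks only; short tail skipped

fid = fopen('yourfile.dat', 'r');
avg = zeros(nchunks, 1);
for i = 1:nchunks
    c = textscan(fid, '%f', NtoAvg);   % read the next NtoAvg values
    avg(i) = mean(c{1});
end
fclose(fid);
```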

I meant to add to the OP but forgot: in the future, if you generate files
of this size, write them as unformatted stream records instead of
formatted text if at all possible. Both the writing and the
post-processing will go _much_ faster (albeit w/ the requirement of a
viewer to inspect the data rather than just looking at it at the command
line, but realistically, what can you do visually w/ more than the first
few records anyway???).

I would, in fact, suggest that the routine above should write its output
to file in unformatted form, and possibly also, while scanning thru the
file for the averaging, rewrite the raw data as unformatted as well, so
that if you decide later to use a different decimation ratio you can do
so much more quickly.

The only difference is that instead of textscan() and text records,
you'll be using fread()/fwrite() and counting bytes/words instead. A
secondary benefit is that his 50 GB file will probably be reduced by at
least an order of magnitude.
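A sketch of that one-time conversion, under the same assumptions as before (single column of doubles, hypothetical file names):

```matlab
% One-time pass: formatted text --> unformatted (binary) stream.
fin  = fopen('yourfile.dat', 'r');
fout = fopen('yourfile.bin', 'w');
while ~feof(fin)
    c = textscan(fin, '%f', 1e6);    % parse a block of values from text
    fwrite(fout, c{1}, 'double');    % write them as raw 8-byte doubles
end
fclose(fin);
fclose(fout);

% Every later pass then reads raw doubles directly -- no text parsing.
fid = fopen('yourfile.bin', 'r');
x   = fread(fid, 1e6, 'double');     % grab a block, count by words
fclose(fid);
```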

The only real gotcha in this is that you'll need the header to be
encapsulated in the file so that you can retrieve it, but that's simple
enough -- you just have to remember to do it.
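One simple way to encapsulate the header is a length-prefixed record ahead of the data -- a sketch, with an assumed header string and dummy data:

```matlab
% Write: header length first, then the header bytes, then the data.
hdr  = 'time  temp  pressure';          % assumed header text
data = randn(10, 1);                    % dummy data for illustration
fid  = fopen('yourfile.bin', 'w');
fwrite(fid, numel(hdr), 'uint32');      % header length in bytes
fwrite(fid, hdr, 'char');               % the header itself
fwrite(fid, data, 'double');            % then the raw data
fclose(fid);

% Read back: recover the header, then the data follows.
fid  = fopen('yourfile.bin', 'r');
n    = fread(fid, 1, 'uint32');
hdr  = char(fread(fid, n, 'char')');
data = fread(fid, inf, 'double');
fclose(fid);
```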

--