Math Forum » Discussions » Software » comp.soft-sys.matlab

Topic: Downsampling very large text file
Replies: 7   Last Post: Apr 18, 2013 9:02 AM

bram

Posts: 4
Registered: 4/10/13
Re: Downsampling very large text file
Posted: Apr 15, 2013 3:35 AM

dpb <none@non.net> wrote in message <kk4ir3$1fc$1@speranza.aioe.org>...
> On 4/10/2013 3:26 PM, TideMan wrote:
> ...
>

> > I'd just add to dpb's excellent suggestions that you could determine
> > the number of lines in your file first by running this little perl
> > routine:
> > while (<>) {};
> > print $.,"\n";
> >
> > Put these two lines in a file called, say, countlines.pl then:
> > nlines=perl('countlines.pl','yourfile.dat');
> > With this info, you can use a for loop instead of a while loop.

>
> Yeah, looping over floor(nlines/NtoAvg) batches will also eliminate the
> problem of the last batch being smaller than the rest...
>
> I meant to add, but forgot to tell the OP: in the future, if you make
> files of this size, write them as unformatted stream records instead of
> formatted text if at all possible. Both writing and post-processing will
> go _much_ faster (albeit with the requirement of a reader program to
> inspect the data rather than viewing it at the command line -- but
> realistically, what can you do visually with more than the first few
> records anyway?).
>
> I would, in fact, suggest that the routine above write its output to
> file in unformatted form, and possibly also rewrite the raw data as
> unformatted while scanning through the file for the averaging, so that
> if you decide later to use different decimation ratios you can do so
> much more quickly.
>
> The only difference is that instead of textscan() and records, you will
> be using fread()/fwrite() and counting bytes/words. A secondary benefit
> is that the 50 GB file will probably shrink by at least an order of
> magnitude.
>
> The only real gotcha is that the header needs to be encapsulated so
> that it can be retrieved later, but that's simple enough -- you just
> have to remember to do it.
>
> --
>
>
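[For concreteness, a minimal MATLAB sketch of the line-counting plus batched-averaging approach TideMan and dpb describe above. It assumes a single column of whitespace-delimited numeric data; fname and NtoAvg are hypothetical names, and real code would want fopen/error checking.]

```matlab
% Count lines in pure MATLAB (the perl one-liner above works too)
fid = fopen(fname, 'rt');
nlines = 0;
while ~feof(fid)
    fgetl(fid);
    nlines = nlines + 1;
end
frewind(fid);                        % rewind to reread the data

NtoAvg   = 1000;                     % decimation ratio (assumed)
nBatches = floor(nlines/NtoAvg);     % drops the short final batch
avgs     = zeros(nBatches, 1);
for k = 1:nBatches
    c = textscan(fid, '%f', NtoAvg); % read exactly one batch of values
    avgs(k) = mean(c{1});
end
fclose(fid);
```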

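[Likewise, a rough sketch of the unformatted-stream rewrite dpb recommends, with the header length-prefixed so it can be recovered later. fnameTxt, fnameBin, headerLines, and chunk are assumed names; this is an illustration, not tested against the OP's file.]

```matlab
% Convert the formatted text file to an unformatted binary stream,
% encapsulating the text header with a byte-count prefix.
fin  = fopen(fnameTxt, 'rt');
fout = fopen(fnameBin, 'wb');

hdr = '';
for k = 1:headerLines                        % assumed known header size
    hdr = [hdr fgetl(fin) sprintf('\n')];
end
fwrite(fout, numel(hdr), 'uint32');          % header length in bytes
fwrite(fout, hdr, 'char');                   % header text

chunk = 1e6;                                 % values per read (assumed)
while ~feof(fin)
    c = textscan(fin, '%f', chunk);
    if isempty(c{1}), break, end
    fwrite(fout, c{1}, 'double');            % 8 bytes/value vs. 15+ chars
end
fclose(fin); fclose(fout);
```

Reading it back is the mirror image: fread one 'uint32' for the header length, fread that many 'char' bytes, then fread the rest as 'double'.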

Thanks a lot for the quick replies, guys!

I will put the scripts to the test this week and report back on their functionality and performance ;)

Cheers, bram


