The Math Forum



Math Forum » Discussions » Software » comp.soft-sys.matlab


Topic: Downsampling very large text file
Replies: 7   Last Post: Apr 18, 2013 9:02 AM



Posts: 4
Registered: 4/10/13
Re: Downsampling very large text file
Posted: Apr 15, 2013 3:35 AM

dpb <> wrote in message <kk4ir3$1fc$>...
> On 4/10/2013 3:26 PM, TideMan wrote:
> ...

> > I'd just add to dpb's excellent suggestions that you could determine
> > the number of lines in your file first by running this little perl
> > routine:
> > while (<>) {};
> > print $.,"\n";
> >
> > Put these two lines in a file called, say, then:
> > nlines=perl('','yourfile.dat');
> > With this info, you can use a for loop instead of a while loop.

> Yeah, running over floor(nlines/NtoAvg) will also eliminate the last
> batch being smaller than the rest...
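A minimal MATLAB-side sketch of this approach, putting the Perl line count and the fixed-length for loop together. The script name `countlines.pl`, the data file name, `NtoAvg`, and the single-column numeric layout are all illustrative assumptions, not anything from the original posts:

```matlab
% Sketch: count lines with the two-line Perl helper, then average
% every NtoAvg lines in a for loop of known length.
% countlines.pl (hypothetical name) contains:
%   while (<>) {};
%   print $.,"\n";
nlines = str2double(perl('countlines.pl', 'yourfile.dat'));

NtoAvg  = 1000;                          % decimation factor (example value)
nblocks = floor(nlines/NtoAvg);          % drops the short final batch
avg     = zeros(nblocks, 1);             % preallocate the result

fid = fopen('yourfile.dat', 'r');
for k = 1:nblocks
    block  = textscan(fid, '%f', NtoAvg);  % read one batch of values
    avg(k) = mean(block{1});
end
fclose(fid);
```

Preallocating `avg` and looping `nblocks` times is the payoff of knowing `nlines` up front: no growing arrays and no partial last batch to special-case.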
> I meant to add for the OP but forgot: in the future, if you make files
> of this size, write them as unformatted stream records instead of
> formatted if at all possible. Both writing and post-processing will go
> _much_ faster (albeit with the requirement of a viewer to inspect the
> data rather than doing so at the command line, but realistically what
> can you do visually with more than the first few records anyway?).
> I would, in fact, suggest that the routine above write its output to
> file in unformatted form, and possibly also rewrite the raw data as
> unformatted while scanning through the file for the averaging, so that
> if you decide later to use different decimation ratios you can do so
> much more quickly.
> The only difference is that instead of textscan() and records, you will
> be using fread()/fwrite() and counting bytes/words. A secondary benefit
> is that the 50 GB file will probably be reduced by at least an order of
> magnitude.
> The only real gotcha is that the header will need to be encapsulated so
> that it can be retrieved, but that's simple enough; you just have to
> remember to do it.
> --
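As a sketch of that fread()/fwrite() idea: convert the formatted text file to a raw binary stream once, then run any later decimation passes against the binary copy. File names, the chunk size, the `double` precision, and the single-column layout are assumptions here, and header encapsulation is omitted for brevity:

```matlab
% One-time conversion: parse the text file in large chunks and write
% the values out as raw 8-byte doubles.
fin  = fopen('yourfile.dat', 'r');
fout = fopen('yourfile.bin', 'w');
while ~feof(fin)
    chunk = textscan(fin, '%f', 1e6);    % parse text in big chunks
    fwrite(fout, chunk{1}, 'double');    % write unformatted stream
end
fclose(fin);
fclose(fout);

% Later decimation passes read the binary copy directly, counting
% values instead of parsing text.
NtoAvg = 1000;                           % decimation factor (example)
fid = fopen('yourfile.bin', 'r');
avg = [];
while true
    batch = fread(fid, NtoAvg, 'double');
    if numel(batch) < NtoAvg, break; end % discard the short final batch
    avg(end+1, 1) = mean(batch);         %#ok<AGROW>
end
fclose(fid);
```

Since each stored value is a fixed 8 bytes, trying a different decimation ratio later is just a matter of re-reading the binary file with a different `NtoAvg`; no re-parsing of 50 GB of text is needed.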

Thanks a lot for the quick replies, guys!

I will put the scripts to the test this week and report back on their functionality and performance ;)

Cheers, Bram



© The Math Forum at NCTM 1994-2018. All Rights Reserved.