The Math Forum



Newsgroup: comp.soft-sys.math.mathematica


Topic: Obtaining Random Line from a File
Replies: 9   Last Post: Feb 21, 2013 5:46 AM


David Bailey

Re: Obtaining Random Line from a File
Posted: Feb 19, 2013 6:52 PM

On 19/02/2013 06:09, Ramiro wrote:
> Thank you so much for the reply. My files are 50MB each; I don't think ReadList would work for my purposes, because it would be too slow. I am actually doing an MCMC simulation with (hopefully, if I have time) millions of iterations, and in each one I need to read a random line from one of many files, so this reading has to happen as quickly as possible. Any suggestions? Each line is pretty much the same length.
> Thanks,
> Ramiro

OK, let's establish two points:

1) Are the records in the files of a fixed length?

2) When you say you want an 'arbitrary line', I am assuming that you
calculate a number N and then want the N'th line of the file. If you
really don't care which line you choose, use Ramiro's method (above).

If your files are not guaranteed to have equal-length records, there is
obviously a problem, as I explained before, because you have to read all
N-1 lines to establish which is the N'th. One option, therefore, might be
to pre-process your files to make fixed-length records by padding with
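
The post is written in Mathematica terms; purely as an illustration, here is a minimal Python sketch of the padding pre-process. It assumes single-byte (e.g. ASCII) text and pads with spaces, since the original does not say which pad character to use; `pad_to_fixed_records` is a hypothetical name.

```python
def pad_to_fixed_records(src_path: str, dst_path: str, pad: bytes = b" ") -> int:
    """Copy src_path to dst_path with every line padded to the same byte length.

    Returns the record length: the widest line's width plus one byte for "\n".
    Assumes single-byte (e.g. ASCII) text; space padding is an assumption.
    """
    # Pass 1: find the widest line, ignoring any "\n" or "\r\n" terminator.
    with open(src_path, "rb") as src:
        width = max((len(line.rstrip(b"\r\n")) for line in src), default=0)
    # Pass 2: right-pad each line to `width` and end it with a single "\n",
    # so every record has exactly width + 1 bytes whatever the source platform.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            dst.write(line.rstrip(b"\r\n").ljust(width, pad) + b"\n")
    return width + 1
```

Writing a plain `\n` terminator sidesteps the Windows two-byte line ending; when a record is read back, the trailing padding has to be stripped, which also removes any spaces that genuinely ended a line.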

Once you have fixed-record-length files, you can open them with
BinaryFormat->True and use SetStreamPosition to set the stream to the
byte position where your record starts, and then read the relevant number
of bytes. Unless you are using extended characters, you can convert
these to characters with FromCharacterCode.
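
For comparison, the same seek-and-read idea in Python might look like this: `f.seek` plays the role of SetStreamPosition and `f.read` reads a fixed number of bytes. It is a sketch assuming 1-based record numbers, a known record length that includes the line terminator, space padding, and ASCII text; `read_record` and `random_record` are hypothetical names.

```python
import random

def read_record(f, n: int, record_len: int) -> str:
    """Return record n (1-based) from a binary file of fixed-length records."""
    # Skip the n-1 preceding records: this seek is the analogue of SetStreamPosition.
    f.seek((n - 1) * record_len)
    raw = f.read(record_len)
    if len(raw) != record_len:
        raise IndexError(f"record {n} is past the end of the file")
    # Drop the line terminator and any space padding, then decode as ASCII
    # (the analogue of FromCharacterCode for plain single-byte text).
    return raw.rstrip(b"\r\n ").decode("ascii")

def random_record(f, record_len: int, rng: random.Random) -> str:
    """Pick one record uniformly at random without scanning the file."""
    f.seek(0, 2)  # whence=2: jump to end of file so tell() gives its size
    count = f.tell() // record_len
    return read_record(f, rng.randint(1, count), record_len)
```

In an MCMC loop you would open each file once with `open(path, "rb")` and keep the handles for the whole run, so each iteration costs one seek and one read regardless of how large the file is.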

This should be VERY fast, because the cost of each access is not
proportional to the size of the file (once all the files have been

If the records are variable length but contain some identification such
as a line number, another option would be to pull out a line as Ramiro
suggested, but then use a binary chop (binary search) procedure to zero in
on the line of interest.
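
A binary chop on byte offsets could be sketched in Python as follows. It assumes a hypothetical record layout in which each line starts with its integer line number followed by a tab, the numbers increase through the file, and the text is ASCII with `\n` or `\r\n` endings; `find_line` and its helpers are illustrative names, not part of any library.

```python
import math
from typing import Optional

def _line_at_or_after(f, pos: int) -> bytes:
    """Return the first complete line starting at byte offset >= pos (b"" at EOF)."""
    if pos > 0:
        # Back up one byte and discard the rest of the line containing pos - 1.
        # If pos is already a line start, this consumes only the preceding newline.
        f.seek(pos - 1)
        f.readline()
    else:
        f.seek(0)
    return f.readline()

def _line_id(line: bytes) -> float:
    """Parse the hypothetical tab-separated leading number; EOF sorts after all lines."""
    return math.inf if not line else int(line.split(b"\t", 1)[0])

def find_line(f, target: int) -> Optional[str]:
    """Binary chop on byte offsets for the line whose leading number is target."""
    f.seek(0, 2)
    lo, hi = 0, f.tell()
    # Find the smallest offset whose next line has a number >= target;
    # this predicate is monotone because line numbers increase through the file.
    while lo < hi:
        mid = (lo + hi) // 2
        if _line_id(_line_at_or_after(f, mid)) < target:
            lo = mid + 1
        else:
            hi = mid
    line = _line_at_or_after(f, lo)
    if line and _line_id(line) == target:
        return line.rstrip(b"\r\n").decode("ascii")
    return None
```

Each probe reads only about one line, so a lookup needs on the order of log2(file size in bytes) short reads rather than a scan of the N-1 preceding lines.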

Hint: You may want to look at the processed file with a hex editor to
make sure the record length is as you expect. Remember that Windows uses
two characters (CR LF) per line ending!
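
As a programmatic complement to the hex editor, this Python sketch tallies the byte length of every line, terminator included, so a file with Windows `\r\n` endings shows records one byte longer than the same text with `\n` endings. `record_lengths` is a hypothetical name.

```python
from collections import Counter

def record_lengths(path: str) -> Counter:
    """Count how many lines have each byte length, terminator included."""
    # Binary mode leaves "\r\n" untouched, so a Windows line ending counts as 2 bytes.
    with open(path, "rb") as f:
        return Counter(len(line) for line in f)
```

A correctly padded file should produce a single key, and that key is the record length to multiply by N-1 when seeking; a final line without a terminator would appear as a separate, shorter entry.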

David Bailey



© The Math Forum at NCTM 1994-2018. All Rights Reserved.