The Math Forum

Topic: Obtaining Random Line from a File
Replies: 9   Last Post: Feb 21, 2013 5:46 AM

Albert Retey

Re: Obtaining Random Line from a File
Posted: Feb 19, 2013 6:52 PM

> Thank you so much for the reply. My files are 50MB each, I don't
> think ReadList would work for my purposes, it would be too slow. I
> am actually doing an MCMC simulation, doing (hopefully if I have
> time) millions of iterations and in each one I need to read a random
> line from one of many files, thus requiring this reading to happen as
> quickly as possible. Any suggestions? Each line is pretty much the
> same length.

For that specific use case I see two possible ways to proceed:

1) Pick a random position in the file, search for the line break before
that position and the line break after it, and read the line in between.
The drawbacks of this approach are that the probability of picking a
line is proportional to its length, and that you have to search for the
line start and end on every read. If all lines have the same length
there is no problem; if they differ, it depends on whether your MC
simulation will suffer from a non-uniform distribution or not...

2) Build an index of line starts. Instead of reading the full content
you could scan through the file just once, searching for the positions
where new lines start, and build a list of file positions whose length
equals the number of lines. Then you would choose one of these positions
(e.g. with RandomChoice), seek to that position and read one line. I
guess that this is probably the best alternative. To speed up building
the index you might want to read the file in chunks of several lines
instead of line by line. You could have a look at e.g. this
mathematica.stackexchange answer
for an example of how to do that.
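
A minimal sketch of the index approach in 2), to make the steps concrete.
The names buildLineIndex and readRandomLine and the file name "data.txt"
are made up for illustration. The index is built line by line, not with
the faster chunked scan mentioned above, and the sketch assumes the file
does not change between building the index and reading from it.

```mathematica
(* Scan the file once and record the stream position at which each line starts. *)
buildLineIndex[file_String] := Module[{stream, pos, starts},
  stream = OpenRead[file];
  starts = Reap[
     While[True,
      pos = StreamPosition[stream];
      If[Read[stream, String] === EndOfFile, Break[]];
      Sow[pos]]][[2]];
  Close[stream];
  Flatten[starts]]

(* Seek to a randomly chosen line start and read exactly one line. *)
readRandomLine[stream_InputStream, index_List] := (
  SetStreamPosition[stream, RandomChoice[index]];
  Read[stream, String])

(* Usage: build the index once, keep the stream open across iterations. *)
index = buildLineIndex["data.txt"];   (* hypothetical file name *)
stream = OpenRead["data.txt"];
line = readRandomLine[stream, index];
Close[stream];
```

Since every entry of the index is equally likely under RandomChoice, each
line is picked with the same probability regardless of its length. For
millions of iterations you would open each file once and reuse the stream
and the index, rather than reopening the file for every read.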

Whichever way you choose, you will need the low-level functions for file
access: search the documentation for OpenRead, StreamPosition,
SetStreamPosition, Read, ReadList and Close for details...
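
For comparison, here is a rough sketch of approach 1) using the same
low-level functions. Note that it is a simplified variant, not exactly
what is described above: instead of searching backwards for the previous
line break, it discards the rest of the line it lands in and returns the
following line, wrapping around to the first line at the end of the file.
The name readLineNearRandomPosition and the file name "data.txt" are made
up for illustration, and the sketch assumes a single-byte text encoding,
a file with at least two lines, and no empty lines.

```mathematica
(* Variant of approach 1: jump to a random byte offset, discard the rest of
   the line we landed in, and return the following line (wrapping around to
   the first line when the end of the file is reached). *)
readLineNearRandomPosition[stream_InputStream, size_Integer] :=
 Module[{line},
  SetStreamPosition[stream, RandomInteger[{0, size - 1}]];
  Read[stream, String];            (* discard the partial line *)
  line = Read[stream, String];     (* the next complete line *)
  If[line === EndOfFile,
   SetStreamPosition[stream, 0];
   line = Read[stream, String]];
  line]

(* Usage: determine the file size once and keep the stream open. *)
file = "data.txt";                 (* hypothetical file name *)
size = FileByteCount[file];
stream = OpenRead[file];
line = readLineNearRandomPosition[stream, size];
Close[stream];
```

With this variant the probability of returning a line is proportional to
the length of the line before it, so the non-uniformity caveat from 1)
still applies; it only disappears when all lines have the same length.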


