Date: Feb 19, 2013 6:52 PM
Author: Albert Retey
Subject: Re: Obtaining Random LIne from A file
> Thank you so much for the reply. My files are 50MB each, I don't
> think ReadList would work for my purposes, it would be too slow. I
> am actually doing an MCMC simulation, doing (hopefully if I have
> time) millions of iterations and in each one I need to read a random
> line from one of many files, thus requiring this reading to happen as
> quickly as possible. Any suggestions? Each line is pretty much the
> same length.
For that specific use case I see two possible ways to proceed:
1) pick a random position in the file, search for the line break before
that position and the line break after it, and read that line. The
drawbacks of that approach are that the probability of picking a line is
proportional to its length and that you have to search for the line start
and end for each line you read. If all lines have the same length there is
no problem; if they differ, it depends on whether your MC
simulation will suffer from a non-uniform distribution or not...
2) build an index of line starts. Instead of reading the full content
you could just scan through the file once, searching for the positions where
new lines start, and build a list of file positions with length equal to
the number of lines. Then you'd have to choose one of these positions
(e.g. with RandomChoice), seek to that position and read one line. I guess
that this is probably the best alternative. To speed up the building of
the index you might want to read the file in chunks of several lines
instead of line by line. You could have a look at e.g. this
for an example on how to do that.
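
Approach 2 can be sketched roughly as follows. This is a language-neutral illustration written in Python, not Mathematica code; the function names build_line_index and read_random_line and the chunk_size parameter are made up for this example. It assumes lines are terminated by "\n" and that the file is opened in binary mode so that offsets are plain byte positions. In Mathematica the same steps would be done with the stream functions named in this post.

```python
import random


def build_line_index(path, chunk_size=1 << 20):
    """Scan the file once, in chunks, and return the byte offset of each line start."""
    offsets = [0]          # the first line starts at offset 0
    pos = 0                # absolute offset of the current chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            start = 0
            while True:
                nl = chunk.find(b"\n", start)
                if nl == -1:
                    break
                # a new line starts right after every newline byte
                offsets.append(pos + nl + 1)
                start = nl + 1
            pos += len(chunk)
    # if the file ends with a newline (or is empty), the last offset points
    # at end-of-file and does not start a real line
    if offsets[-1] >= pos:
        offsets.pop()
    return offsets


def read_random_line(f, offsets, rng=random):
    """Seek to a uniformly chosen line start and read that one line."""
    f.seek(rng.choice(offsets))
    return f.readline().rstrip(b"\r\n")
```

The index is built once per file; inside the simulation loop you would keep the file open and only call read_random_line, so each iteration costs one seek and one short read. Every line is picked with equal probability regardless of its length.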
Whichever way you choose, you will need the low-level functions for file
access: search for OpenRead, StreamPosition, SetStreamPosition, Read,
ReadList and Close in the documentation for details about these...
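
For comparison, approach 1 needs only seek and read operations and no index. The following is again a Python sketch rather than Mathematica code, with hypothetical names (line_containing, read_line_at_random_offset, window); it assumes "\n" line endings, a non-empty file opened in binary mode, and, as noted above, it picks lines with probability proportional to their length.

```python
import os
import random


def line_containing(f, pos, window=256):
    """Return the line that contains byte offset pos (without its line ending)."""
    start = pos
    # walk backwards in small blocks until a newline before pos is found
    while start > 0:
        lo = max(0, start - window)
        f.seek(lo)
        block = f.read(start - lo)
        nl = block.rfind(b"\n")
        if nl != -1:
            start = lo + nl + 1   # the line starts right after that newline
            break
        start = lo                # no newline in this block, keep going back
    f.seek(start)
    return f.readline().rstrip(b"\r\n")


def read_line_at_random_offset(f, rng=random, window=256):
    """Pick a random byte offset and return the line it falls into."""
    size = os.fstat(f.fileno()).st_size
    return line_containing(f, rng.randrange(size), window)
```

If all lines have nearly the same length, choosing window a little larger than one line means the backward search usually finishes after a single read.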