Topic: Obtaining Random LIne from A file
Replies: 9   Last Post: Feb 21, 2013 5:46 AM

Albert Retey

Re: Obtaining Random LIne from A file
Posted: Feb 19, 2013 6:52 PM

Hi,

> Thank you so much for the reply. My files are 50MB each, I don't
> think ReadList would work for my purposes, it would be too slow. I
> am actually doing an MCMC simulation, doing (hopefully if I have
> time) millions of iterations and in each one I need to read a random
> line from one of many files, thus requiring this reading to happen as
> quickly as possible. Any suggestions? Each line is pretty much the
> same length.


For that specific use case I see two possible ways to proceed:

1) Pick a random position in the file, search for the line break before
that position and the line break after it, and read the line in between.
The drawbacks of that approach are that the probability of picking a line
is proportional to its length and that you have to search for the line
start and end for every line you read. If all lines have the same length
there is no problem; if they differ, it depends on whether your MC
simulation will suffer from a non-uniform distribution or not...
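
A minimal sketch of 1), in case it helps. It assumes a single-byte
encoding, "\n" line breaks and no empty lines. The helper name
randomLineByOffset and the demo file are made up for illustration, and
I'd treat it as a starting point rather than tuned code:

```mathematica
(* Sketch of approach 1. Assumes single-byte text, "\n" line breaks, no empty lines. *)
(* randomLineByOffset is a made-up helper name, not a built-in function. *)
randomLineByOffset[stream_InputStream, size_Integer] :=
 Module[{pos = RandomInteger[{0, size - 1}]},
  (* step backwards until the character just before pos is a line break *)
  While[pos > 0 &&
    (SetStreamPosition[stream, pos - 1];
     Read[stream, Character] =!= "\n"),
   pos--];
  (* pos is now the start of the line containing the random position *)
  SetStreamPosition[stream, pos];
  Read[stream, String]]

(* small demo file, only here to make the sketch self-contained *)
file = FileNameJoin[{$TemporaryDirectory, "random-line-demo.txt"}];
out = OpenWrite[file];
WriteString[out, "alpha\nbeta\ngamma\n"];
Close[out];

(* keep the stream open across iterations and close it at the end *)
stream = OpenRead[file];
size = FileByteCount[file];
samples = Table[randomLineByOffset[stream, size], {10}];
Close[stream];
```

Stepping back one character at a time is simple but costs one seek per
character, so for long lines it may pay off to read a larger block
backwards instead.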

2) Build an index of line starts. Instead of reading the full content
you could just scan through the file once, searching for the positions
where new lines start, and build a list of file positions with length
equal to the number of lines. Then you'd have to choose one of these
positions (e.g. with RandomChoice), seek to that position and read one
line. I guess that this is probably the best alternative. To speed up the
building of the index you might want to read the file in chunks of
several lines instead of line by line. You could have a look at e.g. this
mathematica.stackexchange answer
http://mathematica.stackexchange.com/a/15216/169
for an example of how to do that.
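
A minimal sketch of 2), reading line by line to build the index. The
names buildLineIndex and readLineAt and the demo file are made up for
illustration. It assumes that Read[stream, String] leaves the stream
just past the line break, so that StreamPosition then gives the start of
the next line, and that the file has no empty lines:

```mathematica
(* Sketch of approach 2; buildLineIndex and readLineAt are made-up names. *)
(* Assumes Read[stream, String] leaves the stream just past the line break. *)
buildLineIndex[file_String] :=
 Module[{stream = OpenRead[file], starts},
  starts = First[Last[Reap[
      Sow[0];  (* the first line starts at position 0 *)
      While[Read[stream, String] =!= EndOfFile,
       Sow[StreamPosition[stream]]]]]];
  Close[stream];
  (* the last recorded position is the end of the file, not a line start *)
  Most[starts]]

(* seek to a known line start and read that one line *)
readLineAt[stream_InputStream, pos_Integer] :=
 (SetStreamPosition[stream, pos]; Read[stream, String])

(* small demo file, only here to make the sketch self-contained *)
file = FileNameJoin[{$TemporaryDirectory, "line-index-demo.txt"}];
out = OpenWrite[file];
WriteString[out, "alpha\nbeta\ngamma\n"];
Close[out];

index = buildLineIndex[file];   (* built once per file *)
stream = OpenRead[file];        (* kept open for all iterations *)
samples = Table[readLineAt[stream, RandomChoice[index]], {10}];
Close[stream];
```

Every line is picked with the same probability here, whatever its
length, and each iteration costs one seek and one read.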

Whichever way you choose, you will need the low level functions for file
access: search for OpenRead, StreamPosition, SetStreamPosition, Read,
ReadList and Close in the documentation for details about these...
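
As an illustration of ReadList on an open stream, here is a rough sketch
of building the same kind of line-start index in chunks of several
lines. The name buildLineIndexChunked, the default chunk size and the
demo file are made up. The offsets inside a chunk are computed from the
string lengths, so this assumes one byte per character, a one-byte line
break and no empty lines:

```mathematica
(* Sketch only: buildLineIndexChunked is a made-up name. *)
(* Assumes one byte per character, "\n" line breaks, no empty lines. *)
buildLineIndexChunked[file_String, chunk_Integer: 1000] :=
 Module[{stream = OpenRead[file], start, lines, starts},
  starts = Last[Reap[
     While[
      start = StreamPosition[stream];           (* where this chunk begins *)
      lines = ReadList[stream, String, chunk];  (* up to chunk lines *)
      lines =!= {},
      (* line starts = chunk start + running total of (length + 1) *)
      Sow[start +
        Most[Prepend[Accumulate[StringLength[lines] + 1], 0]]]]]];
  Close[stream];
  Flatten[starts]]

(* small demo file, only here to make the sketch self-contained *)
file = FileNameJoin[{$TemporaryDirectory, "chunk-index-demo.txt"}];
out = OpenWrite[file];
WriteString[out, "alpha\nbeta\ngamma\n"];
Close[out];

index = buildLineIndexChunked[file, 2];
```

Whether this is actually faster than going line by line depends on the
file and the chunk size, so it is worth timing on one of the real files.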

hth,

albert






