Math Forum » Discussions » Software » comp.soft-sys.math.mathematica

Topic: Obtaining Random LIne from A file
Replies: 9   Last Post: Feb 21, 2013 5:46 AM

Kevin J. McCann

Re: Obtaining Random LIne from A file
Posted: Feb 19, 2013 6:51 PM

If you plan to do this millions of times, then your only hope is to load
the file(s) into memory, e.g. with ReadList. If you do a disk access for
each line, you will be waiting for quite a while. Memory is cheap.

Kevin
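
A minimal sketch of the load-into-memory approach described above, assuming the file fits comfortably in RAM. The file name and the helper names randomLine and randomExpression are hypothetical, chosen only for illustration:

```mathematica
(* Hypothetical file name; the thread does not give the real one. *)
file = "hugefile.txt";

(* Read the whole file once; each element of lines is one line as a string. *)
lines = ReadList[file, String];

(* Each draw is now an in-memory lookup with no disk access. *)
randomLine[] := RandomChoice[lines]

(* If every line holds a Mathematica expression, as the use of
   Read[str, Expression] in the original post suggests, convert only
   the line that was drawn. *)
randomExpression[] := ToExpression[RandomChoice[lines]]
```

With many files, one such list per file could be kept in memory, so that the one-time read cost is spread over all the iterations.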

On 2/19/2013 1:09 AM, Ramiro wrote:
> Thank you so much for the reply. My files are 50MB each, so I don't think ReadList would work for my purposes; it would be too slow. I am actually running an MCMC simulation with (hopefully, if I have time) millions of iterations, and in each one I need to read a random line from one of many files, so the read has to happen as quickly as possible. Any suggestions? Each line is pretty much the same length.
>
> Thanks,
> Ramiro
>
> On Sunday, February 17, 2013 4:08:27 AM UTC-5, David Bailey wrote:
>> On 16/02/2013 06:07, Ramiro Barrantes wrote:
>>> Hello,
>>>
>>> I would like to get a random line from a file. I know this can be done
>>> with Mathematica, but I am experimenting with sed to see if it is
>>> faster. Say I want to get line 1000.
>>>
>>> In Mathematica it would be:
>>>
>>> <<"! sed -n 1000p filename.txt"
>>>
>>> However, I am trying to put the filename in a variable, say
>>>
>>> filename="hugefile.txt"
>>>
>>> cmd="! sed -n 1000p "<>filename
>>> <<cmd
>>>
>>> which does not work.
>>>
>>> How can I do this?
>>>
>>> Lastly, I am getting a random line using Mathematica by doing:
>>>
>>> getRandomLine[file_, n_] :=
>>>  Block[{i = RandomInteger[{1, n}], str = OpenRead[file], res},
>>>   Skip[str, "String", i];
>>>   res = Read[str, Expression];
>>>   Close[str];
>>>   res[[2]]
>>>  ]
>>>
>>> However, it is very slow, so I was going to try with sed. Any suggestions?
>>>
>>> Thanks in advance,
>>> Ramiro
>>
>> I would stick with Mathematica to do this job! How big is the file
>> (number of lines and number of bytes)? If it will fit inside Mathematica
>> comfortably, I'd see how it works to read it all in as a list of strings
>> and pick the one you want:
>>
>> xx=ReadList["C:\\some file",String];//Timing
>>
>> Then you have an array of strings, and you can select what you want
>> directly.
>>
>> Remember, the basic problem with reading at an arbitrary position in a
>> text file is that if the line lengths are not the same, any algorithm
>> has to read every line before the one you want! If you create this file,
>> you should consider packing the lines to make them all the same length -
>> then you could access what you want very efficiently (but with a little
>> more coding!)
>>
>> David Bailey
>> http://www.dbaileyconsultancy.co.uk

>
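
On the question quoted above about putting the command in a variable: the likely cause is that the << shorthand treats the token after it as a literal file name, so <<cmd tries to read a file called "cmd" rather than using the value of the variable. A sketch of one way around this, assuming a Unix-like system with sed on the path and a file name without spaces or shell metacharacters:

```mathematica
(* Hypothetical file name, as in the original question. *)
filename = "hugefile.txt";

(* Build the command string; a leading "!" tells Mathematica to run
   the rest as an external command and read its output. *)
cmd = "!sed -n 1000p " <> filename;

(* The full form Get[...] evaluates its argument, so the variable works.
   The output of sed is then read and evaluated as Mathematica input. *)
Get[cmd]

(* To get the line as a raw string instead of evaluating it: *)
ReadList[cmd, String]
```

Either call still starts a new sed process and scans the file from the top each time, so it is unlikely to be fast enough for millions of iterations.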
>
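
David Bailey's fixed-length suggestion quoted above could be sketched roughly as follows. This assumes every line occupies exactly recLen bytes including the newline and that the file uses single-byte characters. The name getFixedLine, the file name, and the value of recLen are hypothetical:

```mathematica
(* Jump straight to a random record of an already opened stream.
   recLen is the record length in bytes, newline included;
   n is the number of records in the file. *)
getFixedLine[str_InputStream, recLen_Integer, n_Integer] :=
 Module[{i = RandomInteger[{1, n}]},
  SetStreamPosition[str, (i - 1) recLen]; (* byte offset of line i *)
  Read[str, String]                       (* read up to the newline *)
 ]

(* Hypothetical usage: open the file once, draw many lines, then close. *)
recLen = 81;                      (* assumed: 80 characters plus newline *)
file = "fixedfile.txt";           (* assumed file with padded lines *)
n = Quotient[FileByteCount[file], recLen];
str = OpenRead[file];
line = getFixedLine[str, recLen, n];
Close[str];
```

The stream is opened once outside the function, so each draw costs one seek and one short read rather than opening the file and skipping through all the preceding lines.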




