Date: Feb 19, 2013 6:51 PM
Author: Kevin J. McCann
Subject: Re: Obtaining Random Line from a File

If you plan to do this millions of times, then your only hope is to load 
the file(s) into memory, e.g. with ReadList. If you do a disk access for
each line, you will be waiting for quite a while. Memory is cheap.
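
A minimal sketch of that approach, not a definitive implementation. The file name, the demo contents, and the helper name getRandomLine[] are made up for illustration; the small throwaway file only stands in for one of the real data files so that the code runs as written.

```mathematica
(* Create a small throwaway file so the sketch runs as-is;
   in practice this would be one of the real data files. *)
file = FileNameJoin[{$TemporaryDirectory, "random_line_demo.txt"}];
out = OpenWrite[file];
Do[WriteString[out, "line ", k, "\n"], {k, 1000}];
Close[out];

(* One disk read, up front: a list with one string per line. *)
lines = ReadList[file, String];

(* Every later draw is a pure in-memory lookup, with no disk access. *)
getRandomLine[] := RandomChoice[lines]

getRandomLine[]
```

With several files, the same idea would presumably be to keep one such list per file and pick the list first, then the line.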

Kevin

On 2/19/2013 1:09 AM, Ramiro wrote:
> Thank you so much for the reply. My files are 50MB each, so I don't think ReadList would work for my purposes; it would be too slow. I am actually running an MCMC simulation with (hopefully, if I have time) millions of iterations, and in each one I need to read a random line from one of many files, so the read has to be as fast as possible. Any suggestions? Each line is pretty much the same length.
>
> Thanks,
> Ramiro
>
> On Sunday, February 17, 2013 4:08:27 AM UTC-5, David Bailey wrote:

>> On 16/02/2013 06:07, Ramiro Barrantes wrote:
>>> Hello,
>>>
>>> I would like to get a random line from a file. I know this can be done
>>> with Mathematica, but I am playing with using sed to see if it goes
>>> faster. Say I want to get line 1000.
>>>
>>> In Mathematica it would be:
>>>
>>> <<"! sed -n 1000p filename.txt"
>>>
>>> However, I am trying to put the filename as a variable, say
>>>
>>> filename="hugefile.txt"
>>>
>>> cmd="! sed -n 1000p "<>filename
>>> <<cmd
>>>
>>> does not work.
>>>
>>> How can I do this?
>>>
>>> Lastly, I am getting a random line using Mathematica by doing:
>>>
>>> getRandomLine[file_, n_] :=
>>>   Block[{i = RandomInteger[{1, n}], str = OpenRead[file], res},
>>>     Skip[str, "String", i];
>>>     res = Read[str, Expression];
>>>     Close[str];
>>>     res[[2]]
>>>   ]
>>>
>>> However, it is very slow, so I was going to try with sed. Any suggestions?
>>>
>>> Thanks in advance,
>>> Ramiro
>>>
>>
>> I would stick with Mathematica to do this job! How big is the file
>> (number of lines and number of bytes)? If it will fit inside Mathematica
>> comfortably, I'd see how it works to read it all in as a list of strings
>> and pick the one you want:
>>
>> xx=ReadList["C:\\some file",String];//Timing
>>
>> Then you have an array of strings, and you can select what you want
>> directly.
>>
>> Remember, the basic problem with reading at an arbitrary position in a
>> text file is that if the line lengths are not the same, any algorithm
>> has to read every line before the one you want! If you create this file,
>> you should consider padding the lines to make them all the same length -
>> then you could access what you want very efficiently (but with a little
>> more coding!)
>>
>> David Bailey
>> http://www.dbaileyconsultancy.co.uk

>
>
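
The fixed-length-line idea from David's reply quoted above can be sketched roughly as follows. This is only an illustration under a strong assumption: every line occupies exactly the same number of bytes ("pretty much the same length" is not enough). The demo file and the names lineAt and randomLineFromDisk are hypothetical.

```mathematica
(* Throwaway demo file in which every line has the same length
   (zero-padded numbers); a stand-in for real fixed-width data. *)
file = FileNameJoin[{$TemporaryDirectory, "fixed_width_demo.txt"}];
nLines = 1000;
out = OpenWrite[file];
Do[WriteString[out, IntegerString[k, 10, 6], "\n"], {k, nLines}];
Close[out];

(* Bytes per record, line terminator included. Deriving it from the
   file size avoids assuming a particular newline convention. *)
recLen = FileByteCount[file]/nLines;

(* Keep the stream open across draws and seek straight to line k,
   so no earlier lines have to be read. *)
in = OpenRead[file];
lineAt[k_Integer] := (SetStreamPosition[in, (k - 1) recLen]; Read[in, String])
randomLineFromDisk[] := lineAt[RandomInteger[{1, nLines}]]

randomLineFromDisk[]

(* Call Close[in] once the simulation has finished. *)
```

This still touches the disk on every draw, so it is likely to be slower than holding the lines in memory; it mainly helps when the files are too large to load.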