Date: Feb 18, 2013 5:59 AM
Author: Albert Retey
Subject: Re: Obtaining Random Line from a File
On 17.02.2013 10:08, David Bailey wrote:

> On 16/02/2013 06:07, Ramiro Barrantes wrote:

>> Hello,

>>

>> I would like to get a random line from a file. I know this can be done

>> with Mathematica, but I am experimenting with sed to see if it is

>> faster. Say I want to get line 1000.

>>

>> In Mathematica it would be:

>>

>> <<"! sed -n 1000p filename.txt"

>>

>> However, I am trying to put the filename as a variable, say

>>

>> filename="hugefile.txt"

>>

>> cmd="! sed -n 1000p "<>filename

>> <<cmd

>>

>> does not work.

>>

>> How can I do this?

>>

>> Lastly, I am getting a random line using Mathematica by doing:

>>

>> getRandomLine[file_, n_] :=

>> Block[{i = RandomInteger[{1, n}], str = OpenRead[file], res},

>> Skip[str, "String", i];

>> res = Read[str, Expression];

>> Close[str];

>> res[[2]]

>> ]

>>

>> However, it is very slow, so I was going to try with sed. Any suggestions?

>>

>> Thanks in advance,

>> Ramiro

>>

>>

> I would stick with Mathematica to do this job! How big is the file

> (number of lines and number of bytes)? If it will fit inside Mathematica

> comfortably, I'd see how it works to read it all in as a list of strings

> and pick the one you want:

>

> xx=ReadList["C:\\some file",String];//Timing

>

> Then you have an array of strings, and you can select what you want

> directly.

>

> Remember, the basic problem with reading at an arbitrary position in a

> text file, is that if the line lengths are not the same, any algorithm

> has to read every line before the one you want!

If he just wants an arbitrary line, that's not true: choosing a random

position in the file and searching for, e.g., the previous and next

line breaks would also pick a random line. Of course, the probability of

choosing longer lines would be larger than that of shorter lines, but it

isn't clear from the question whether that would be a problem for what

the OP is trying to do...
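
A minimal sketch of the random-position idea, not a tuned implementation. The name randomLineByOffset is made up for illustration, and the code assumes a non-empty file with "\n" line endings and a single-byte encoding, so that byte offsets and characters coincide. It only searches backwards for the previous line break and then lets Read pick up the rest of the line:

```mathematica
(* hypothetical helper: returns the line that contains a random byte offset;
   longer lines are chosen with proportionally higher probability *)
randomLineByOffset[file_String] :=
 Module[{size = FileByteCount[file], str, pos, line},
  str = OpenRead[file];
  (* random byte offset, counted from 0 *)
  pos = RandomInteger[{0, size - 1}];
  (* step backwards until the byte just before pos is a line break,
     or until the start of the file is reached *)
  While[pos > 0 &&
    (SetStreamPosition[str, pos - 1]; Read[str, Character]) =!= "\n",
   pos--];
  (* pos is now the first byte of the line; read up to the next line break *)
  SetStreamPosition[str, pos];
  line = Read[str, String];
  Close[str];
  line]
```

Stepping back one byte at a time is the simplest thing to write down; for very long lines, reading a larger chunk and searching it for the last line break would probably be faster.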

> If you create this file,

> you should consider packing the lines to make them all the same length -

> then you could access what you want very efficiently (but with a little

> more coding!)

... and slightly (?) higher memory requirements...
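
For illustration, a sketch of how the fixed-length variant could be read, assuming the file has already been padded so that every line is exactly width bytes followed by a single "\n". The names fixedWidthLine and randomFixedWidthLine are made up, and width is something you would have to know from how the file was written:

```mathematica
(* assumption: each line occupies width + 1 bytes (content plus "\n"),
   so line k starts at byte offset (k - 1) (width + 1) *)
fixedWidthLine[file_String, width_Integer, k_Integer] :=
 Module[{str = OpenRead[file], line},
  (* jump straight to the start of line k, no scanning needed *)
  SetStreamPosition[str, (k - 1) (width + 1)];
  line = Read[str, String];
  Close[str];
  line]

(* every line is equally likely here, unlike with a random byte offset *)
randomFixedWidthLine[file_String, width_Integer] :=
 fixedWidthLine[file, width,
  RandomInteger[{1, Quotient[FileByteCount[file], width + 1]}]]
```

The returned string still contains the padding, so it would have to be stripped again in whatever way matches how it was added.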

hth,

albert