Re: Obtaining Random Line from a File
Posted: Feb 18, 2013 5:59 AM
On 17.02.2013 10:08, David Bailey wrote:
> On 16/02/2013 06:07, Ramiro Barrantes wrote:
>> Hello,
>>
>> I would like to get a random line from a file, I know this can be done
>> with Mathematica but I am playing with using sed to see if it goes
>> faster, say I want to get line 1000
>>
>> In mathematica it would be:
>>
>> <<"! sed -n p1000 filename.txt"
>>
>> However, I am trying to put the filename as a variable, say
>>
>> filename="hugefile.txt"
>>
>> cmd="! sed -n p1000 "<>filename
>> <<cmd
>>
>> does not work.
>>
>> How can I do this?
>>
>> Lastly, I am getting a randomline using mathematica doing:
>>
>> getRandomLine[file_, n_] :=
>> Block[{i = RandomInteger[{1, n}], str = OpenRead[file], res},
>> Skip[str, "String", i];
>> res = Read[str, Expression];
>> Close[str];
>> res[[2]]
>> ]
>>
>> However, it is very slow so I was going to try with sed. Any suggestions?
>>
>> Thanks in advance,
>> Ramiro
>>
>
> I would stick with Mathematica to do this job! How big is the file
> (number of lines and number of bytes)? If it will fit inside Mathematica
> comfortably, I'd see how it works to read it all in as a list of strings
> and pick the one you want:
>
> xx=ReadList["C:\\some file",String];//Timing
>
> Then you have an array of strings, and you can select what you want
> directly.
>
> Remember, the basic problem with reading at an arbitrary position in a
> text file, is that if the line lengths are not the same, any algorithm
> has to read every line before the one you want!
If he just wants an arbitrary line, that's not true: choosing a position in the file at random and then searching for, e.g., the previous and the next line break would also pick a random line. Of course, longer lines would be chosen with higher probability than shorter ones, but it isn't clear from the question whether that would be a problem for what the OP is trying to do...
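A rough sketch of that idea, in case it helps. The name randomLineByOffset is made up for illustration, and I am assuming Unix line endings (a single byte 10 between lines) and a single-byte character encoding; treat it as a starting point rather than tested production code:

```mathematica
(* randomLineByOffset is a hypothetical helper, not a built-in.          *)
(* Assumes "\n" line endings; with "\r\n" a trailing "\r" would remain.  *)
(* Assumes a single-byte encoding (e.g. ASCII) when decoding the bytes.  *)
randomLineByOffset[file_String] :=
 Module[{size = FileByteCount[file], str, start, bytes = {}, b},
  If[size == 0,
   $Failed,
   str = OpenRead[file, BinaryFormat -> True];
   (* pick a random 0-based byte offset somewhere in the file *)
   start = RandomInteger[{0, size - 1}];
   (* step back until the byte before start is a newline (10),
      or until the beginning of the file is reached *)
   While[start > 0 &&
     (SetStreamPosition[str, start - 1]; BinaryRead[str] =!= 10),
    start--];
   (* read forward from the line start up to the next newline or EOF *)
   SetStreamPosition[str, start];
   While[(b = BinaryRead[str]) =!= EndOfFile && b =!= 10,
    AppendTo[bytes, b]];
   Close[str];
   FromCharacterCode[bytes]]]
```

This only touches the bytes of the selected line, so the cost does not depend on where in the file the line sits. Each line is picked with probability proportional to its length in bytes (including its newline), which is exactly the bias mentioned above.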
> If you create this file,
> you should consider packing the lines to make them all the same length -
> then you could access what you want very efficiently (but with a little
> more coding!)
... and slightly (?) higher memory requirements...
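For completeness, a sketch of what the reading side could look like with fixed-length lines. The names fixedWidthLine and randomFixedWidthLine are made up, and I am assuming every line was padded with spaces to exactly width bytes, is followed by a single "\n", and uses a single-byte encoding:

```mathematica
(* fixedWidthLine is a hypothetical helper; width is the padded line     *)
(* length in bytes, not counting the single "\n" that ends each record.  *)
fixedWidthLine[file_String, k_Integer, width_Integer] :=
 Module[{str = OpenRead[file], line},
  (* record k (1-based) starts at byte offset (k - 1) (width + 1) *)
  SetStreamPosition[str, (k - 1) (width + 1)];
  line = Read[str, String];
  Close[str];
  (* drop the padding; note this also strips leading whitespace *)
  StringTrim[line]]

(* pick a random record; the record count follows from the file size *)
randomFixedWidthLine[file_String, width_Integer] :=
 fixedWidthLine[file,
  RandomInteger[{1, Quotient[FileByteCount[file], width + 1]}], width]
```

Here every line is equally likely, and no line count has to be passed in, at the price of a file that is as large as the number of lines times the longest line.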
hth,
albert