Re: Obtaining Random Line from a file
Posted: Feb 19, 2013 6:51 PM
If you plan to do this millions of times, then your only hope is to load the file(s) into memory, e.g. with ReadList. If you do a disk access for each line, you will be waiting for quite a while. Memory is cheap.
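Here is a minimal sketch of the load-once approach, assuming plain-text files with one record per line. `files`, `allLines` and `randomLine` are hypothetical names, and the small demo file stands in for the real 50 MB files. The full read happens once. After that, each MCMC iteration is only a list lookup.

```mathematica
(* Demo only: write a small throwaway file; "files" is a hypothetical
   stand-in for your own list of data files. *)
demo = FileNameJoin[{$TemporaryDirectory, "demo-lines.txt"}];
out = OpenWrite[demo];
Do[WriteString[out, "line ", k, "\n"], {k, 1000}];
Close[out];

files = {demo};

(* One-off cost: each file becomes a list of strings, one per line. *)
allLines = ReadList[#, String] & /@ files;

(* Per-iteration cost: pick a file, then a line in it; no disk access.
   This is uniform over files, then uniform over lines within the file. *)
randomLine[] := RandomChoice[RandomChoice[allLines]]

randomLine[]
```

If the files differ in length and every line should be equally likely, draw from `Join @@ allLines` instead. If the lines have to be parsed (as `Read[..., Expression]` does), it is cheaper to apply `ToExpression` once at load time than on every draw.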
Kevin
On 2/19/2013 1:09 AM, Ramiro wrote:
> Thank you so much for the reply. My files are 50 MB each, so I don't
> think ReadList would work for my purposes; it would be too slow. I am
> actually doing an MCMC simulation with (hopefully, if I have time)
> millions of iterations, and in each one I need to read a random line
> from one of many files, so this reading has to happen as quickly as
> possible. Any suggestions? Each line is pretty much the same length.
>
> Thanks,
> Ramiro
>
> On Sunday, February 17, 2013 4:08:27 AM UTC-5, David Bailey wrote:
>> On 16/02/2013 06:07, Ramiro Barrantes wrote:
>>> Hello,
>>>
>>> I would like to get a random line from a file. I know this can be done
>>> with Mathematica, but I am experimenting with sed to see if it is
>>> faster. Say I want to get line 1000.
>>>
>>> In Mathematica it would be:
>>>
>>> <<"! sed -n 1000p filename.txt"
>>>
>>> However, I am trying to put the filename in a variable, say
>>>
>>> filename="hugefile.txt"
>>>
>>> cmd="! sed -n 1000p "<>filename
>>> <<cmd
>>>
>>> does not work.
>>>
>>> How can I do this?
>>>
>>> Lastly, I am getting a random line in Mathematica by doing:
>>>
>>> getRandomLine[file_, n_] :=
>>>  Module[{i = RandomInteger[{0, n - 1}], str = OpenRead[file], res},
>>>   Skip[str, String, i];  (* skip i lines, so lines 1..n are reachable *)
>>>   res = Read[str, Expression];
>>>   Close[str];
>>>   res[[2]]
>>>  ]
>>>
>>> However, it is very slow, so I was going to try sed. Any suggestions?
>>>
>>> Thanks in advance,
>>> Ramiro
>>
>> I would stick with Mathematica to do this job! How big is the file
>> (number of lines and number of bytes)? If it will fit inside Mathematica
>> comfortably, I'd see how it works to read it all in as a list of strings
>> and pick the one you want:
>>
>> xx=ReadList["C:\\some file",String];//Timing
>>
>> Then you have an array of strings, and you can select what you want
>> directly.
>>
>> Remember, the basic problem with reading at an arbitrary position in a
>> text file is that if the line lengths are not the same, any algorithm
>> has to read every line before the one you want! If you create this file,
>> you should consider packing the lines to make them all the same length;
>> then you could access what you want very efficiently (but with a little
>> more coding!)
>>
>> David Bailey
>> http://www.dbaileyconsultancy.co.uk
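Here is a sketch of the fixed-length idea David describes, which Ramiro's roughly equal line lengths make practical. It assumes ASCII text. A one-off pass pads every line with trailing spaces to a common width. After that, `SetStreamPosition` can jump straight to any record without reading the lines before it. `padFile` and `readRecord` are hypothetical helper names, not built-ins, and the demo writes throwaway files to `$TemporaryDirectory`.

```mathematica
(* Sketch, assuming ASCII text; padFile and readRecord are hypothetical. *)

(* One-off preprocessing: pad each line with trailing spaces so that every
   record, including its "\n", is exactly w + 1 bytes long. *)
padFile[in_String, out_String] := Module[{lines, w, s},
  lines = ReadList[in, String];
  w = Max[StringLength /@ lines];
  s = OpenWrite[out, BinaryFormat -> True];  (* no newline translation *)
  Scan[WriteString[s, StringJoin[#, Table[" ", {w - StringLength[#]}]], "\n"] &,
   lines];
  Close[s];
  {Length[lines], w + 1}  (* {number of records, bytes per record} *)
 ]

(* Seek directly to record k (1-based); cost does not grow with k. *)
readRecord[s_InputStream, k_Integer, recLen_Integer] := (
  SetStreamPosition[s, (k - 1) recLen];
  StringReplace[Read[s, String], Whitespace ~~ EndOfString -> ""]  (* strip padding *)
 )

(* Demo on a throwaway file whose lines have varying lengths. *)
raw = FileNameJoin[{$TemporaryDirectory, "demo-raw.txt"}];
padded = FileNameJoin[{$TemporaryDirectory, "demo-padded.txt"}];
o = OpenWrite[raw];
Do[WriteString[o, "record ", k^2, "\n"], {k, 1000}];
Close[o];

{n, recLen} = padFile[raw, padded];
s = OpenRead[padded, BinaryFormat -> True];  (* open once, reuse every iteration *)
readRecord[s, RandomInteger[{1, n}], recLen]
(* call Close[s] when the simulation is finished *)
```

Keeping one stream open for the whole run avoids an `OpenRead`/`Close` pair on every iteration, which the original `getRandomLine` pays. Stripping the padding also removes any trailing spaces the original lines had, so this assumes those carry no meaning.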