Math Forum » Discussions » Software » comp.soft-sys.matlab

Topic: Issues with parsing huge file
Replies: 0  

Patrick Diviacco

Posted: Mar 28, 2011 12:12 PM

I'm parsing a document and writing pairs like these to disk:
0 vs 1, true
0 vs 2, false
0 vs 3, true
1 vs 2, true
1 vs 3, false
..
and so on.

Afterwards, I balance the true and false rows for each instance by removing random lines (true rows if there are more of them than false rows, and vice versa), and I end up with a file such as this one:
0 vs 1 true
0 vs 2 false
1 vs 2 true
1 vs 3 true
1 vs 4 false
1 vs 5 false

The falses usually far outnumber the trues, so in the previous example I could keep only 1 false for instance 0 and only 2 falses for instance 1.
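To make the balancing rule concrete, here is a minimal sketch of it for the rows of a single instance. It is written in Python rather than MATLAB, and the function name `balance_rows`, the `(pair, label)` tuple layout, and the seeded random generator are all assumptions made for illustration, not part of my actual script.

```python
import random

def balance_rows(rows, rng=None):
    """Downsample the larger class so trues and falses are equal in number.

    rows: list of (pair, label) tuples for ONE instance; label is a bool.
    Returns the kept rows in their original order.
    """
    if rng is None:
        rng = random.Random()
    true_idx = [i for i, (_, label) in enumerate(rows) if label]
    false_idx = [i for i, (_, label) in enumerate(rows) if not label]
    # Keep as many rows of each class as the smaller class has.
    k = min(len(true_idx), len(false_idx))
    keep = set(rng.sample(true_idx, k)) | set(rng.sample(false_idx, k))
    return [row for i, row in enumerate(rows) if i in keep]

# Hypothetical rows for instance 1: 2 trues and 3 falses.
rows_1 = [("1 vs 2", True), ("1 vs 3", True), ("1 vs 4", False),
          ("1 vs 5", False), ("1 vs 6", False)]
balanced = balance_rows(rows_1, random.Random(0))
```

With this rule, an instance that has no trues at all (or no falses) loses all of its rows, because the smaller class has size 0.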

I'm doing this in 2 steps: first parsing, then balancing.


Now, my issue is that the unbalanced file is too big: more than 1GB, and most of its rows are going to be removed by the balancing step.

My question is: can I balance the rows while parsing?

My guess is no, because I don't know which items will arrive next, and I can't delete any row until all rows for a specific instance have been seen.
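One case where balancing during parsing might still work is when the parser emits all rows of an instance consecutively, as in the examples above (all "0 vs ..." rows, then all "1 vs ..." rows). Under that assumption only one instance has to be held in memory at a time. The following is a rough Python sketch of that idea, not MATLAB; `parse_line` and `balance_stream` are hypothetical names, and the first number on each line is assumed to identify the instance.

```python
import itertools
import random

def parse_line(line):
    """Split '0 vs 1, true' (or '0 vs 1 true') into (instance, text, label)."""
    text = line.strip()
    instance = text.split()[0]
    label = text.replace(",", " ").split()[-1].lower() == "true"
    return instance, text, label

def balance_stream(lines, rng=None):
    """Yield balanced rows while buffering only one instance at a time.

    ASSUMPTION: all rows of an instance arrive consecutively.
    """
    if rng is None:
        rng = random.Random()
    parsed = (parse_line(l) for l in lines if l.strip())
    for _, group in itertools.groupby(parsed, key=lambda r: r[0]):
        rows = list(group)  # every row of the current instance
        trues = [i for i, r in enumerate(rows) if r[2]]
        falses = [i for i, r in enumerate(rows) if not r[2]]
        k = min(len(trues), len(falses))
        keep = set(rng.sample(trues, k)) | set(rng.sample(falses, k))
        for i, r in enumerate(rows):
            if i in keep:
                yield r[1]

sample = ["0 vs 1, true", "0 vs 2, false", "0 vs 3, true",
          "1 vs 2, true", "1 vs 3, false"]
out = list(balance_stream(sample, random.Random(0)))
```

Here `lines` could be any iterable, for example the parser's own output, and each yielded row could be written straight to disk, so the unbalanced file would never need to exist. If rows of the same instance can arrive interleaved with other instances, this sketch does not apply and the concern above still stands.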

I hope this is clear.
Thanks



