The Math Forum


Topic: Pull out specific numbers from unstructured text file
Replies: 13   Last Post: Aug 8, 2014 6:51 AM

Re: Pull out specific numbers from unstructured text file
Posted: Feb 9, 2013 4:50 PM

On 2/9/2013 1:12 PM, Stan wrote:
> ^^^^^Okay I think I don't understand Lines 5,7,8 in your shortcut code:
> Line 1: > fid=fopen(....,'rt');
> Line 2: > l=' ';
> Line 3: > while 1
> Line 4: > l=fgetl(fid);
> Line 5: > if strfind(l,'Nmoves')>0,break,end
> Line 6: > end
> Line 7: > Nmoves=sscanf(l,'Nmoves=%d');
> Line 8: > Nrequired=fscanf(fid,'Nrequired=%d');
> Line 9: > fid=fclose(fid);
> My explanation is:
> while 1
> .
> .
> .
> end
> This is for lines 4-6 and this reads the file. If fgetl encounters the
> end-of-file indicator, it returns -1. So, as long as it returns 1 (i.e.
> anywhere before the end of the file), this statement is saying the while
> loop should perform the actions inside the if statement.

Not quite--the '1' in the WHILE construct is a constant and never
changes, so the loop condition is always true. Only finding the string
'Nmoves' somewhere in the file will break the loop.

For the WHILE condition to have any effect it would have to test the
variable l as returned by fgetl(). I chose not to do that 'cuz I
presumed you'd only use this on an appropriate file, and it would take
reading the first line outside the loop or otherwise initializing l
before the loop begins. An alternative that would be a little cleaner
in case the string isn't in the file would be to use while ~feof(fid),
which would at least die gracefully on the EOF.
> My explanation for line 5:
> If 'Nmoves' is found in the string l (where l is the contents of the
> file that have been read up to that point) then stop reading at that line.

Essentially--it breaks the loop having found the desired string and
therefore the first line to parse (on the assumption the string pattern
only exists in the line desired, or at least that line is its first
occurrence). At that point 'l' holds the content of only the line just
read, not everything read so far--the strfind() simply scans that
content and returns the starting position(s) of any match, or an empty
array if there is none.
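
To see what the test in line 5 is actually comparing, here is a small sketch at the command line. The sample string is made up for illustration.

```matlab
l = 'Total Nmoves=12';            % hypothetical line content
idx  = strfind(l,'Nmoves');       % start index of the match
none = strfind(l,'Nrequired');    % empty array when there is no match

% "if strfind(...)>0" works because an empty comparison counts as false;
% ~isempty(...) states the same intent more explicitly.
hit  = ~isempty(idx);
miss = ~isempty(none);
```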

> My explanations for lines 7 and 8:
> 7: Scan l for 'Nmoves=%d'.
> 8. Scan fid for 'Nrequired=%d'.

Well, depends on what you mean by "scan" -- they both do input
conversion, matching the formatting string according to the rules
for it. The rule for a literal string is to match that string in the
input and essentially ignore those matching characters. %d is to
convert a field as a decimal number. sscanf() works from a string
variable ('l' in this case, which we filled w/ the desired line from the
file previously, so now we're getting the desired value into a variable)
while fscanf() takes input from the file which has been connected via
fopen() and associated w/ a valid file handle (fid is just a convenient
variable name for that).
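
A short sketch of the literal-plus-%d rule using sscanf() on strings. The sample strings are assumptions for illustration.

```matlab
l = 'Nmoves=12';                              % hypothetical record
Nmoves = sscanf(l,'Nmoves=%d');               % literal is matched and skipped, %d converts

% When the literal part does not match the input, nothing is converted
% and the result is empty.
miss = sscanf('Nrequired=7','Nmoves=%d');
```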

> Questions:
> In line 8, why did you change from l to fid?

Because we need to scan another line, and that's done w/ one source code
line directly from the file via fscanf(), whereas we had used fgetl() to
suck up each record in its entirety while searching for the target
first line. By your file, the next line was the location of the next
value wanted, so there was no need for any more searching to find
another randomly placed record--it was given to be the next.

> What is the connection between line 5 and lines 7,8?
> How does it know, after line 5 (i.e. after reaching the end of the line
> containing Nmoves), that it needs to search for the next two lines?

You described the file format and said the next line after the one
containing "Nmoves" was the next desired field to be parsed.

You still don't seem to grasp that fgetl() reads an entire record up to
and including the \n (newline) and returns it, minus the newline, in the
character variable 'l', and that the first sscanf() is parsing that
string--nothing else has happened in the file at that point (after the
sscanf(), that is). _THEN_, we went back to the file and got as much of
the next record as required to get the next variable by the use of
fscanf().

fscanf(), however, unlike fgetl(), does _NOT_ automagically read the
entire record _UNLESS_ and _IFF_ the format string provided tells it to
do that. Your initial description didn't say anything about reading
anything except these two values so I did just that--read records until
I found the first one desired, then read just what was needed to get the
variable value requested from the following record. Period. End of
story. That's why later when you came back and said "Oh, that's not the
end of what's needed" I said what I gave you was a shortcut specifically
for the first problem outlined.
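
A rough sketch of that difference, assuming a made-up three-line file. The trailing text on the Nrequired record and the size argument 1 are my additions: the 1 makes fscanf() stop after one value instead of reapplying the format.

```matlab
% Hypothetical sample file, created only for this sketch.
fname = tempname;
fid = fopen(fname,'wt');
fprintf(fid,'Nmoves=12\nNrequired=7 units\ndone\n');
fclose(fid);

fid = fopen(fname,'rt');
l = fgetl(fid);                             % whole first record, newline stripped
Nmoves = sscanf(l,'Nmoves=%d');             % parses the string; file not touched
Nrequired = fscanf(fid,'Nrequired=%d',1);   % reads only as far as the number
rest = fgetl(fid);                          % whatever fscanf left of that record
fclose(fid);
delete(fname);
disp(rest)
```

Looking at rest shows that the file position was still inside the Nrequired record after the fscanf() call, which is not the case after a fgetl().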

Now, the problem is that to read the rest of the desired records w/
fscanf() you've got to write specific formatting strings to handle them
(a pita since they're not symmetric in much of any useful way), and you
have to allow for the fact that the file position marker is in the
middle of the Nrequired record as above.

So, as noted in my previous response, given you want to do the other
stuff I'd suggest it's simpler to revert to fgetl/sscanf pairs.
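
One way the fgetl/sscanf pairing might look, as a sketch only. The file contents (including the 'Next=3' record) are invented for illustration, and the real file will need its own format strings for the later records.

```matlab
% Hypothetical sample file, created only for this sketch.
fname = tempname;
fid = fopen(fname,'wt');
fprintf(fid,'header\nNmoves=12\nNrequired=7\nNext=3\n');
fclose(fid);

fid = fopen(fname,'rt');
Nmoves = [];
Nrequired = [];
while ~feof(fid)
    l = fgetl(fid);                               % one whole record per call
    if ischar(l) && ~isempty(strfind(l,'Nmoves'))
        Nmoves = sscanf(l,'Nmoves=%d');           % parse the record just read
        l = fgetl(fid);                           % next whole record
        Nrequired = sscanf(l,'Nrequired=%d');
        break
    end
end
following = fgetl(fid);    % file position is at the start of the next record
fclose(fid);
delete(fname);
```

Because every read takes a complete record, each further value is just another fgetl()/sscanf() pair w/ no leftover partial line to worry about.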

Again, take the sample code and your example file and just type the
while loop in at the command line and look at what the contents of 'l'
are and then what happens if you follow the fscanf() call w/ a fgetl()
to understand the difference...

Also read

doc fscanf
doc fgetl

and friends carefully...

