Date: Jan 3, 2013 10:08 AM
Author: dpb
Subject: Re: textscan: handling spaces in fixed-width data

On 1/3/2013 7:23 AM, David wrote:
> Hello everyone, please consider the following simple example, with
> fixed-width-data (here: 1-digit per data):
>
> myString = '12 45 78';
> a = textscan(myString,repmat('%1d',1,9));
> b = textscan(myString,repmat('%1d',1,9),'TreatAsEmpty',' ');
>
> What I want to have is something like
> {1,2,<empty>,4,5,<empty>,7,8,<empty>}, i.e. spaces should be treated as
> empty/undefined value.
>
> But, what I get for a AND for b (they are identical!) is:
> {1,2,4,5,7,8,<empty>,<empty>,<empty>}
>
> Maybe I did not understand the "TreatAsEmpty"-option very well?! Do you
> have any advice how to use textscan for this?
>
> I have to read huge amounts of data, so the rowwise evaluation as
> characters/strings in combination with str2double seems not to be a good
> solution for me.


You've hit the conundrum of scanning fixed-width text input in Matlab--it
just simply doesn't know how (im[ns]ho it's just broken).

About all you can do is the workaround you describe above or read as
character and do a mass substitution of the blank w/ another character
that won't be silently eaten no matter what...

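So, something like

str=strrep(myString,' ','b');            % swap blanks for a character that isn't skipped
b = textscan(str,'%1d','TreatAsEmpty','b');   % input string is the first argument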

Trouble is that even here the return value will be 0, not [], so if
zero is a possible value you're still stuck, because textscan()'s
'EmptyValue' option is ignored for non-delimited input as well.
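
For the 1-digit-per-field case in your example, here is a rough sketch of
the read-as-character route that does give a missing value you can tell
apart from zero: plain char arithmetic instead of textscan. Using NaN as
the "empty" marker is my assumption (a numeric array can't hold []), and
it also assumes every non-blank character is a digit 0-9.

```matlab
myString = '12 45 78';            % example line from the original question
vals = double(myString) - '0';    % char codes -> digit values, one per column
vals(myString == ' ') = NaN;      % assumption: blanks are marked as NaN, not 0
% vals is now [1 2 NaN 4 5 NaN 7 8]
```

The same two lines should work unchanged on a char matrix holding one
record per row, so there is no row-by-row loop and no str2double call.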

textscan() is useful but it still has warts...of course, w/ fixed-width
fields, there's no parsing in Matlab that doesn't. :(

In reality, probably your best bet is to create the file as delimited,
as painful as that sounds.

If you're comfortable w/ Fortran, another workaround is a mex-file that
uses Fortran FORMAT statements to deal w/ the problem; I've written
those in the past.

I don't know why TMW won't take this on as a challenge despite it being
an issue since day 1...other things are more glamorous, it seems.

--