Date: Mar 28, 2013 11:09 AM
Author: Steven Lord
Subject: Re: The function textscan

"dpb" <> wrote in message news:kit8ad$9f$
> On 3/26/2013 4:56 PM, Steven_Lord wrote:
>> "dpb" <> wrote in message
>> news:kisv1h$4t2$
>> *snip*

>>> textread(), while deprecated by TMW and not as flexible as textscan(),
>>> does have some benefits that aren't available otherwise, the primary
>>> one being that it returns "ordinary" double arrays instead of only the
>>> cell arrays that textscan() returns. That can often be a nice advantage.

>> I agree that it can be an advantage. It can also be a complication.
>> Without careful analysis of the TEXTREAD call, it can be difficult to
>> determine the types that the output arguments can have. It may even be
>> impossible if one or more of the inputs is user-specified at runtime.

> I suppose, although I must say I've never run into that limitation. If it
> is necessary that the inputs be variable, then that ought to be planned for
> in the app.

For experienced programmers and developers, yes, they _ought to have_ a plan
for their application that handles this scenario.

Not all MATLAB programmers are experienced, though. They may not have
planned their application.

> In most instances, however (and I'd argue that it's the overwhelming
> majority of cases) the purpose is to read a fixed file format. That often
> is simply the case where an array is all one needs/wants and a cell is
> unneeded/unnecessary overhead.

The overhead will be small for that case, though, and for any sufficiently
large file ("sufficiently large" is smaller for regular hard drives than for
SSDs, but I suspect it's not all that large in either case) it is likely
swamped by disk access overhead. Getting the plain array out of the cell is
then a single indexing operation:

nonCellData = cellData{1};
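For a purely numeric file, here is a minimal sketch of getting a plain double array out of TEXTSCAN's cell output. It reads from a made-up char string instead of a file so it is self-contained; the sample data and every variable name other than cellData and nonCellData are illustrative assumptions. The 'CollectOutput' option combines consecutive columns of the same class into one cell, so a single brace index returns the whole matrix. The last line also shows the C{1} versus C(1) distinction.

```matlab
% Hypothetical sample text standing in for a file of three numeric columns.
txt = sprintf('1 2 3\n4 5 6\n7 8 9\n');

% Three %f specifiers produce a 1x3 cell array, one column vector per cell.
cellData = textscan(txt, '%f %f %f');

% Brace indexing returns the double array stored inside the first cell.
firstColumn = cellData{1};            % 3x1 double

% 'CollectOutput' merges consecutive same-class columns into one cell,
% so one brace index yields the whole numeric matrix.
collected = textscan(txt, '%f %f %f', 'CollectOutput', true);
nonCellData = collected{1};           % 3x3 double

% Parenthesis indexing returns a 1x1 cell, not the double array inside it.
parenIndex = cellData(1);
```

Whether the extra brace index is a burden is largely a matter of taste; the cell itself costs little for a file of this shape.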

> ...snip for brevity continuation of above thought...

>> The syntax for TEXTREAD also has another drawback: we can never add any
>> more output arguments with specific meanings. The syntax with (for
>> example) five data outputs and one "flag" output would look the same as
>> the syntax with six data outputs.
>> On the other hand, TEXTSCAN returns one data output argument and
>> whatever additional output arguments we choose. It currently has a
>> second output argument named position that lets you read part of a
>> string or file and continue reading from where you ended up later.
>> TEXTSCAN also returns a cell array as its first output. Always. That is
>> its documented behavior, and so if it doesn't then you're using a
>> version of TEXTSCAN that's shadowing the version shipped with MATLAB or
>> you've hit a bug. We know cell arrays have a steeper learning curve than
>> regular double arrays or even plain char arrays ('strings') as indicated
>> by plenty of questions in CSSM about cell array indexing [usually
>> dealing with the difference between C{1} and C(1).] But reading in files
>> written in arbitrary formats can be a challenge in general, and so we're
>> starting from a slightly higher point on the learning curve than if we
>> had, say, the PLUS function return cell arrays.
>> The output type consistency argument and the output extensibility
>> flexibility are IMO good reasons to prefer TEXTSCAN to TEXTREAD in many
>> situations.

> The key point in all of the above is "many" situations. I don't argue, and
> never have argued, that there isn't a place for textscan(); what I still
> argue for is that where the cell offers no advantage because the data are
> simply a set of numbers, there should be a way to bypass the cell directly,
> and that way shouldn't be deprecated or left without full support.

So if there were a flag available for TEXTSCAN that allowed reading ONE
array from the file into a plain numeric array instead of a cell array, that
would satisfy this use case? [We wouldn't be able to exactly reproduce the
calling sequence for TEXTREAD, as we've already locked down the second
output argument to be that location argument, but that's about as close as
we can get without a lot more fiddling.]
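The existing position output already supports reading a string or file in pieces. Here is a minimal sketch, again using a made-up char string in place of a file. The third input N limits how many times the format is applied. For string input, position counts the characters consumed, so indexing from position+1 resumes where the first call stopped.

```matlab
% Hypothetical sample text: four rows of two numeric columns.
txt = sprintf('1 2\n3 4\n5 6\n7 8\n');

% Apply the format only twice (N = 2), i.e. read the first two rows.
[firstPart, pos] = textscan(txt, '%f %f', 2);

% Resume scanning from the character after the last one consumed.
restPart = textscan(txt(pos+1:end), '%f %f');
```

With a file identifier instead of a string, a second TEXTSCAN call on the same fileID continues from the current file position.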

> If it is, in TMW's opinion, desirable to have a different syntax to
> incorporate that then do so...

>>> I keep harping on the desirability of continuing to fully support
>>> textread(). Whether the message is being received by TMW is hard to
>>> tell; there is essentially no feedback other than whatever they choose
>>> to put in the next release, unfortunately.

>> Look at the Release Notes for MATLAB:
>> Release R2012b, Language and Programming section, "Preservation of
>> string functions for backwards compatibility" item.
>> We do listen. We occasionally even change our minds based on the
>> feedback we receive. :)

> Yeah, but we typically don't get that feedback until the release itself
> ships; rarely does one hear back after tossing a suggestion over the wall,
> whether it is an enhancement request, a compatibility concern, or whatever.

It can be difficult sometimes to determine whether a request is a bug
report, an enhancement request, the result of a misunderstanding (which can
turn into an enhancement request for the documentation), etc. without
investigation. And sometimes the same underlying scenario can be phrased in
different ways to put it in different buckets:

Underlying scenario: User wants to add 0.1 three times and receive 0.3.
Bug report: When I compute 0.1 + 0.1 + 0.1 it is not equal to 0.3.
Enhancement request: Provide a floating-point numeric data type in MATLAB in
which 0.1 and 0.3 are stored exactly, so 0.1+0.1+0.1 equals 0.3 in that data
type.

[The IEEE 754-2008 spec defines types decimal32, decimal64, and decimal128
according to the IEEE 754 Wikipedia page.]

Now from experience we know that the bug report above is NOT a bug. But
other reports may not be quite so clear.
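A quick check shows why the 0.1 + 0.1 + 0.1 report is expected IEEE double-precision behavior rather than a bug. Neither 0.1 nor 0.3 is exactly representable in binary, and the rounded sum lands one unit in the last place above the stored 0.3. The tolerance below is an arbitrary illustrative choice, not a recommended value.

```matlab
% Sum three copies of 0.1 in IEEE double precision.
s = 0.1 + 0.1 + 0.1;

% Exact comparison fails: 0.1 and 0.3 are both rounded when stored in binary,
% and the rounded sum ends up one unit in the last place above the stored 0.3.
isExact = (s == 0.3);

% The leftover difference is tiny but nonzero (2^-54, roughly 5.55e-17).
gap = s - 0.3;

% Comparing against a tolerance is the usual workaround; 4*eps(0.3) is an
% arbitrary illustrative choice, not a universal recommendation.
tol = 4 * eps(0.3);
closeEnough = abs(gap) <= tol;
```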

> It is good that at least sometimes what goes in does have at least some
> effect, certainly.

Steve Lord
To contact Technical Support use the Contact Us link on