Date: Jun 3, 2013 11:14 PM
Author: Daniel Lichtblau
Subject: Re: Interval vs default, was Re: Mathematica numerics and... Re: Applying

I'm clipping quite a lot as this is already too long. Also some points
were clarified, in both directions, off-line and I won't go into all
of them. I'm not trying to respond to every point raised, just to a few that
I think are compelling. And, alas, this is probably way too technical
anyway.

> There is a wikipedia entry on the topic that (at least as of this time
> and date) is pretty good.


The Wikipedia article in question is this.

http://en.wikipedia.org/wiki/Significance_arithmetic

In brief, I don't much like it. Little is said about how significance
arithmetic really works (using first order approximations from
Taylor series). And it seems to mix "significant figures arithmetic"
with significance arithmetic. That is, to say the least, confusing.


> The important quality of interval arithmetic that is not preserved is
> that interval arithmetic (done correctly) preserves validity. That is,
> the correct answer is guaranteed to be enclosed in the result.


This is true but I think less important than one might think at first
glance. It is certainly necessary for some purposes. e.g. mathematical
proofs, to have error fully enclosed. It is less important for, say,
reliable computation, provided one has determined that results are not
corrupted, say by precision becoming too low for the first order model
to hold up.

I will give an example that more or less stradles these two worlds.
See:

http://ccsenet.org/journal/index.php/jmr/article/view/22480/14613

A counterexample is indicated to a conjecture from 2003. The code used
to verify the counterexample invokes NSolve at 500 digits. If I recall
correctly the exact computation would take longer than my patience allows,
and finding an easier example that would be faster was likewise beyond
my patience.

I am confident that the counterexample holds up just fine. This
despite the fact that one needs approximate arithmetic to see it through, and
despite the fact that significance arithmetic is used to find and count real-
valued solutions. Even the referee, who was in other respects quite skritchy,
did not argue this point. (Had that happened, I would simply have
convinced Mathematica to validate that or a different counterexample, and
probably unsettled the referee even more in so doing.)


> It seems to me that the default arithmetic is fairly complicated. Each
> extended float number F has some other number E associated with it
> (guess: it is a machine float representing something like log of
> relative error), and calculations require mucking about with F and E
> both, e.g. some calculations involving condition numbers relating to how
> fast a function changes as the argument changes.


Yes, that all comes from the chain rule and first derivatives.


> I don't understand the derivative argument -- intervals can be computed
> essentially by composition of interval functions. This may be overly
> pessimistic compared to actually finding extrema, but it is, so far
> as I know, fairly easy.


Maybe I am missing something basic, but I don't think it is actually
so easy. How do you determine the spread of an interval when evaluating some
transcendental function? With trigs, on the real line, one can take
advantage of periodicity, monotonicity between peaks and valleys, etc. But for
arbitrary functions i would think one must work to either locate, or get outer
bounds on, extremal values when evaluation on an interval. This does not strike
me as either easy or computationally fast.


> > A drawback, relative to interval arithmetic, is that as a first-order
> > approximation, in terms of error propagation, significance arithmetic
> > breaks down at low precision where higher-order terms can cause the
> > estimates to be off. I will add that, at that point, intervals are going
> > to give results that are also not terribly useful.

>
> An advocate of interval arithmetic (and at least for the moment I will
> take this position :) ) would say that interval arithmetic guarantees
> a valid answer. That is, the answer is inside the interval. Sometimes
> the interval may be very wide, even infinitely wide. But it contains
> the answer. Not so for significance arithmetic. While it may be
> appropriately modest for the documentation for N[] to say that
> N[expr,n] >>>attempts<<< to give a result ... This is not really
> what interval arithmetic does. It guarantees...


All fair points, though I do say a bit about reliability above. I will
add to that here with a short description of the underlying mechanics. We
have a bignum float and also a machine float to record and track
precision of the bignum. While this is an implementation aspect, all the same it
often makes sense to view a number as x + dx, where the 'dx' part indicates
the error range.

Error is tracked using derivatives to approximate. For example, Plis[]
can be fully bounded in this first order model since

(x+dx) + (y+dy) = (x+y) + (dx+dy)

Well and good. What about Times?

(x+dx) * (y+dy) = (x*y) + (x*dy + y*dx) + (dx*dy)

The error terms tracked are the middle two-- the dx*dy is in effect
dropped.

So how bad is this error dropping? Consider how we actually represent
our error. We use a machine double, and that itself has a certain
granularity. A bit of thought, which I will not attempt to write out
here, will show the following: for multiplication, the second order
error that is discarded will be less than the granularity of error
that can be recorded, provided the precision remains higher than that
can be kept in a machine double. For all systems on which Mathematica
runs, this amounts to around 16 digits.

Another way to view it is this. Were we to add one ULP to our
precision loss (we do not do this, ity's just hypothetical), we would fully
bound error from Times provided precision is above the granularity implied
by this ULP. (Paradoxically this means that using e.g. quad precision for
error would require greater precision for certainty of enclosing this
error. But on the flip side precision degradation would be far slower.

Similar happens for e.g. inverting (as in 1/(x+dx) and taking
of integer powers. While these are by no means the only operations one
does, they are the most common in many algorithms at least of the sort
I run across.

There are a few more things I might be able to say but I think this
one response has already taken me a bit afield. Maybe another post.


Daniel Lichtblau
Wolfram Research