
Re: numerical precision of sqrt(a^2+b^2) with 16 bit floating point numbers
Posted: Feb 22, 2014 6:16 PM


On Friday, February 21, 2014 11:58:33 AM UTC, WizardOfOz wrote:
> From my vague recollection of many years ago looking at these thing, I
> think I recall that you get worse results when the ratio of a to b is
> larger. The closer they are to being equal, the better the result.
>
> That's because when the ratio is very large/small the number of bits
> required for the sum will increase, and you will lose digits from the
> smaller of a or b.
>
> Also numbers that do not require as many binary digits to represent will be
> better, as squaring doubles the number of binary digits required, and so
> you are more likely to lose digits if a or b is not able to be represented
> in a small number of binary digits.
>
> So I think the worst case might be with (say) b being very much smaller
> than a and neither representable exactly by a floating point number and
> have '1' digits in the last binary digit in the floating point
> representation.
>
> Also, it would depend on the implementation of the sqrt function .. do we
> assume here that it gives as good as result (as close as an approximation)
> as is possible approximations for floating point numbers?
>
> That's just heuristic though, not in any way a thorough analysis :D
Totally wrong. Overflow and underflow are the real problem. If there is no overflow or underflow, you get one rounding error when calculating a^2, one when calculating b^2, one when calculating the sum, and one when calculating the square root. If one of a and b is much smaller than the other, the rounding error from squaring it is much smaller in absolute terms, so the total error is, on average, a little smaller.
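A quick way to see this is to simulate the naive formula, rounding after each of the four operations listed above. The sketch below assumes the 16-bit format is IEEE 754 half precision (binary16, largest finite value 65504) and uses Python's `struct` module, whose `'e'` format converts to and from binary16 with round-to-nearest-even. The helper names `to_half` and `naive_hypot16` are hypothetical, made up for this example.

```python
import math
import random
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value.

    struct's 'e' format rounds to nearest (ties to even); finite values too
    large for binary16 raise OverflowError, which we map to +/-inf.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def naive_hypot16(a: float, b: float) -> float:
    """sqrt(a^2 + b^2) with every intermediate result rounded to binary16."""
    a, b = to_half(a), to_half(b)
    a2 = to_half(a * a)            # rounding error 1
    b2 = to_half(b * b)            # rounding error 2
    s = to_half(a2 + b2)           # rounding error 3
    return to_half(math.sqrt(s))   # rounding error 4


# Overflow: 300^2 = 90000 exceeds the binary16 maximum of 65504.
print(naive_hypot16(300.0, 300.0))  # inf, although the true result is about 424.3
# Underflow: (1e-4)^2 is about 1e-8, below half the smallest subnormal (2^-24).
print(naive_hypot16(1e-4, 1e-4))    # 0.0, although the true result is about 1.4e-4

# Without overflow or underflow, the four roundings give only a small relative error.
rng = random.Random(0)
worst = 0.0
for _ in range(10_000):
    a, b = to_half(rng.uniform(1, 100)), to_half(rng.uniform(1, 100))
    exact = math.hypot(a, b)       # binary64 reference for the same binary16 inputs
    worst = max(worst, abs(naive_hypot16(a, b) - exact) / exact)
print(worst <= 2**-10 * 1.001)      # True
```

Under that assumption, a^2 overflows once |a| exceeds about 256, and a^2 rounds to zero once |a| drops below about 1.7e-4. Between those extremes, the worst relative error found stays within roughly two half-precision rounding errors (2 × 2^-11).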
Consider a = 1, b = eps. Then sqrt(1 + eps^2) is about 1 + eps^2/2. If eps is small enough, 1 + eps^2 rounds to 1, so the computed result is 1, with an error of about eps^2/2. The smaller eps is, the smaller the error.
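The a = 1, b = eps case can be checked the same way. Again assuming IEEE half precision (10 stored fraction bits, so the spacing of representable numbers just above 1 is 2^-10), 1 + eps^2 rounds back to 1 whenever eps^2 < 2^-11, for example for eps = 2^-6 and smaller. The hypothetical helper `hypot_one_eps16` below does the rounding with Python's `struct` `'e'` format and compares the half-precision result against a binary64 reference.

```python
import math
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value (ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def hypot_one_eps16(eps: float) -> tuple[float, float]:
    """Compute sqrt(1 + eps^2) in binary16; return (computed, exact - computed)."""
    eps = to_half(eps)
    s = to_half(1.0 + to_half(eps * eps))  # 1 + eps^2 rounds to 1 when eps^2 < 2^-11
    computed = to_half(math.sqrt(s))
    exact = math.sqrt(1.0 + eps * eps)     # binary64 reference, far more accurate
    return computed, exact - computed


for k in range(6, 12):
    computed, err = hypot_one_eps16(2.0 ** -k)
    print(f"eps = 2^-{k}: computed = {computed}, error = {err:.3e}, "
          f"eps^2/2 = {2.0 ** (-2 * k) / 2:.3e}")
```

For each of these eps the computed value is exactly 1.0, and the error tracks eps^2/2 closely, shrinking by a factor of 4 each time eps is halved.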

