In article <firstname.lastname@example.org>, email@example.com says... > [...] > On second thought, it would probably be slower on an Altivec (or other > modern pipelined) machine. Here's why: > > A 6-degree polynomial using the standard Horner's rule takes exactly 5 > fused multiply-adds to run. On a G5, that would take 5 x 8 cyles = 40 > cyles, since each fma depends on the last.
Yes. Horner's method is, somewhat contrary to popular opinion, not the best way of evaluating polynomials (on a computer). Horner's method creates an expression tree that is deep, rather than wide, so you fall victim to instruction latencies in evaluating the expression.
On modern architectures you want a wide expression tree so that subtrees can be evaluated in parallel (without stalling due to latency dependencies).
A method that improves on Horner's method in this aspect is Estrin's method. It rewrites the polynomial as a (mostly balanced) binary tree using x^2 and A*x+B terms (where A and B are different subexpressions).
Christer Ericson Sony Computer Entertainment, Santa Monica