> For degree 6, you can evaluate a real polynomial in x using 4 multiplications and > 7 additions with the code > > z = (x + a0)*x + a1 > w = (x + a2)*z + a3 > the value of the polynomial = ((w + z + a4)*w + a5) * a6 > > You need to solve a cubic equation to work out the constants a0-a6 given the initial > coefficients; however there will always be a real solution. > > You mentioned fused multiply-adds. This algorithm requires > 4 multiplies/fused multiply-adds and just 4 additions. I don't know if you can > get the number of additions down even further.
On second thought, it would probably be slower on an Altivec (or other modern pipelined) machine. Here's why:
A 6-degree polynomial using the standard Horner's rule takes exactly 5 fused multiply-adds to run. On a G5, that would take 5 x 8 cyles = 40 cyles, since each fma depends on the last.
A 6-degree polynomial using the Knuth method above uses 4 fused multiply-adds and 4 additions. Unfortunately each addition also takes 8 cycles, so it comes out to around 8 x 8 cyles = 64 cycles, minus one or two since the initial x + a0 doesn't depend on x + a2 and can be pipelined. Unless some of the coefficients are zero or the same, or I can do more of the operations in parallel... I have to a bit more analysis on that.