Fast polynomial evaluation
I guess everybody knows what a polynomial is.
Polynomials are really cool and used all over the place in programming. Evaluating a polynomial is really important, so how should we do it?
First of all lets see how the polynomial function looks like:
\(a_n x^n + a_{n-1} x^{n-1} + \dots + a_2 x^2 + a_1 x + a_0\)
We can see that a polynomial is basically defined by it’s coefficients \(a_i\). So we can actually store all we need for the evaluation by storing the coefficients into an array.
We can use a shorter notation \(\sum\limits_{i=1}^n a_i x^i\) that actually implies the first algorithm.
We implement this in Javascript
1 2 3 4 5 6 7 8 9 10 11 |
|
and then in Python
1 2 3 4 5 6 7 |
|
and of course C
1 2 3 4 5 6 7 8 9 10 11 |
|
Ok, now we can do better than this. We just use a little math trickery and split the polynomial into linear parts (monomial basis):
\(p(x) = a_0 + x(a_1+x(a_2+ \dots +x(a_{n-1} + a_n x)))\)
This is called the Horner method and it’s been around ages (even before Horner of course :) )
Now lets see the implementation:
Javascript
1 2 3 4 5 6 7 8 9 |
|
and then in Python
1 2 3 4 5 |
|
and of course C
1 2 3 4 5 6 7 8 9 |
|
As we can see, we basically avoid a multiplication by \(x\) each iteration.
Now that’s all fine and dandy but will that actually matter on todays computers?
Let’s do some profiling.
We note the G = \(\frac{t_{horner}}{t_{simple}}\) .
Python | Javascript | C | |
---|---|---|---|
G | 0.71 | 0.375 | 1.11 |
I did my tests on my home CPU: i7-2600 at 3.4GHZ (they could vary on your computers so feel free to run them and tell me your results)
We can see that there are some gains in speed when using the Horner method in Python and in Javascript.
We also see that the Horner method is actually worst in C . Why is that? We should be having better times, because we avoid doing a multiplication ! :)
We can also check the assembly code (I took the debug code so we can actually understand what is happening, the debug times are still better for the simple evaluation):
Simple method
1 2 3 4 5 6 7 8 9 10 11 12 13 |
|
Horner
1 2 3 4 5 6 7 |
|
The problem is that the Horner scheme is really bad for the cpu pipeline, because one iteration needs to wait for the other one to finish. Horner is starving the poor pipeline.
In higher level languages (that don’t get converted to machine code) you don’t really feel this, because everything is slower and you are waiting for the language interpreter to do it’s job (which is not starving anytime soon :) ).
If we want to improve the c polynomial evaluation we could actually use a Divede et impera method like Estrin’s scheme.
In practice, if we know the polynomial we sometimes can rewrite it using Estrin’s scheme, without coding the whole algorithm, as it has overhead that is insignificant only when the polynomial is huge.
if you tldr and just want to see the test code click here