
Appendix A Can we trust the accuracy of function evaluations on computers?

Let us first see an example of how functions are evaluated by computers. Concretely, we will take a look at the sine function.

How functions are evaluated on a computer. As mentioned in Section 2.1, functions can be evaluated via Taylor series. The code below implements the Taylor series of the sine function truncated to 6 non-zero terms (so that the first neglected term has order 13) and compares some of its values with the corresponding values of the math.sin function.
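A minimal Python sketch of such a comparison (the sample points below are an arbitrary choice, not necessarily those used in the original code) could look as follows:

import math

def taylor_sin(x, terms=6):
    # Taylor polynomial of sin at 0 with `terms` non-zero terms (degree 2*terms - 1);
    # each term is obtained from the previous one, so no factorial is computed explicitly
    total, term = 0.0, x
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total

for x in (0.1, 0.5, 1.0, 2.0, 3.0):
    print(f"x = {x:3.1f}   Taylor: {taylor_sin(x):.16f}   math.sin: {math.sin(x):.16f}")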

It turns out that usually no single algorithm is good enough to evaluate a function numerically. For instance, the code below shows that evaluating the sine function at large angles close to \(2k\pi\) for some integer \(k\) makes the relative error grow to the order of \(k\cdot\varepsilon_m\) (namely, roughly \(\log_{10}k\) significant digits are lost).
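A possible sketch of this experiment, using angles of the form \(2k\pi+0.1\) so that the exact value of the sine is known, is the following (the values of \(k\) are again an arbitrary choice):

import math

# The exact value of sin(2*k*pi + 0.1) is sin(0.1) for every integer k,
# so the relative error below is entirely due to the loss of accuracy at large angles.
exact = math.sin(0.1)
for k in (1, 10**4, 10**8, 10**12, 10**15):
    x = 2 * k * math.pi + 0.1
    rel_err = abs(math.sin(x) - exact) / abs(exact)
    print(f"k = {k:>16d}   relative error = {rel_err:.2e}")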

The following concrete example shows why evaluating trigonometric functions at large angles close to zeros of the function is problematic. Consider the angle \(x = 8248.251512\) and let us reduce it to an angle in the interval \([-\pi/2,\pi/2]\):
\begin{equation*} x = x^* + 1313\times2\pi -\frac{\pi}{2} = x^* + 5251\times\frac{\pi}{2} \end{equation*}
where
\begin{equation*} x^*=-2.14758367...\times10^{-12}. \end{equation*}
Let us see what happens in a decimal floating point system with a 10-digit mantissa:
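Python's decimal module can emulate such a system; the sketch below assumes the reduction is carried out as \(x-5251\times\pi/2\) with \(\pi/2\) rounded to 10 significant digits, as in the identity above:

from decimal import Decimal, getcontext

getcontext().prec = 10                 # every operation is rounded to 10 significant digits (D10)

x = Decimal("8248.251512")
half_pi = Decimal("1.570796327")       # pi/2 rounded to 10 significant digits

x_star = x - 5251 * half_pi            # range reduction performed entirely in D10
print(x_star)                          # -1E-6, instead of the true -2.147...E-12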

Therefore, in D10,
\begin{equation*} x^*=8248.251512-8248.251513 = -1\times10^{-6}. \end{equation*}
The value of \(x^*\) in D10 is wrong by 6 orders of magnitude, and so will be the value of the sine of that reduced angle (and therefore of \(\cos x=\sin x^*\))!

A real-world software implementation of the sine function. The C code below is the implementation of the sine function used by the GNU implementation of the C standard library on Linux since October 2011 (a schematic Python sketch of its branch structure is given after the following list):
  1. When \(|x|<2^{-26}\text{,}\) either an underflow error is reported (when the angle is so small that it cannot be represented in double precision) or the value of \(\sin x\) is approximated by its Taylor polynomial of order 1, namely the approximation \(\sin x\sim x\) is used (lines 212-216).
  2. When \(2^{-26}\leq|x|< 0.855469\text{,}\) \(\sin x\) is approximated by its Taylor polynomial of order 5, evaluated in line 140 via Horner's method (lines 217-222).
  3. When \(0.855469\leq|x|<2.426265\text{,}\) \(\sin x\) is evaluated as \(\cos(\pi/2-x)\text{,}\) which is in turn approximated by its Taylor polynomial of order 6, evaluated in line 113 via Horner's method (lines 224-230).
  4. When \(2.426265\leq|x|< 105414350\text{,}\) the angle is first brought back, by periodicity, to an angle in the interval \([-\pi/2,\pi/2]\text{,}\) and then the value is obtained from the Taylor polynomial of order 5 of either \(\sin x\) or \(\cos x\text{,}\) depending on the initial angle (lines 232-237).
  5. When \(105414350\leq|x|< 2^{1024}\text{,}\) the same procedure as in the previous point is followed, but a different subroutine is used to reduce the angle by periodicity (lines 239-244).
  6. When \(|x|>2^{1024}\text{,}\) an overflow error is reported.
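The Python sketch below only mirrors the branch structure just described; it is not the glibc code, the polynomial orders and thresholds are taken from the list above, and its accuracy is far from double precision (in particular, the naive reduction in the last branch suffers exactly from the problems discussed in this appendix):

import math

def taylor_sin(x, terms):
    # Taylor polynomial of sin at 0 with `terms` non-zero terms
    total, term = 0.0, x
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total

def taylor_cos(x, terms):
    # Taylor polynomial of cos at 0 with `terms` non-zero terms
    total, term = 0.0, 1.0
    for n in range(terms):
        total += term
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))
    return total

def sketch_sin(x):
    ax = abs(x)
    if ax < 2.0**-26:
        return x                                   # case 1: sin x ~ x
    if ax < 0.855469:
        return taylor_sin(x, 3)                    # case 2: Taylor polynomial of order 5
    if ax < 2.426265:
        c = taylor_cos(math.pi / 2 - ax, 4)        # case 3: sin x = cos(pi/2 - |x|), up to the sign
        return c if x > 0 else -c
    # cases 4-5: reduce by periodicity to [-pi/2, pi/2] (naively!), then use the polynomial
    k = round(x / math.pi)
    r = x - k * math.pi
    return taylor_sin(r, 3) if k % 2 == 0 else -taylor_sin(r, 3)

for x in (1e-9, 0.5, 2.0, 100.0):
    print(f"x = {x:7g}   sketch: {sketch_sin(x):.12f}   math.sin: {math.sin(x):.12f}")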

How can we verify the accuracy of a function implementation? A way to make sure a function is well implemented is to compare its values with those obtained from a package able to work at higher accuracy. Nowadays there are several such packages, for instance the GNU MPFR C library or Python's mpmath library. Both libraries are able to perform efficient calculations at arbitrarily high accuracy (provided you have enough RAM for it!).

The code below compares the values of NumPy's sine function (in double precision, namely with no more than 17 significant decimal digits) against the values of mpmath's sine function with 20 significant decimal digits.
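A minimal sketch of such a comparison (the sample points are an arbitrary choice; both numpy and mpmath must be installed) might be:

import numpy as np
from mpmath import mp, mpf, sin as mpsin

mp.dps = 20                            # mpmath works with 20 significant decimal digits

for x in (0.1, 1.0, 8248.251512, 1e6, 1e15):
    double_val = np.sin(x)             # NumPy's double-precision value
    reference = mpsin(mpf(x))          # 20-digit reference value
    rel_err = abs((mpf(float(double_val)) - reference) / reference)
    print(f"x = {x:<14g}   np.sin = {double_val:.17g}   relative error = {rel_err}")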

Why not use hardware implementations of functions? Hardware implementations are faster than their software counterparts but have a big drawback: errors cannot be fixed by users, as they can be in open-source software packages (among which Python, GCC and MPFR). The unreliability of some hardware implementations is discussed at length in the following pages:

For instance, the picture below shows, in red, some ranges where the value of the cosine evaluated by an Intel Core i7-2600 CPU differs from the actual cosine value (evaluated by MPFR) by more than 1 ulp (unit in the last place), namely with a relative error larger than \(\varepsilon_m\text{.}\)

A more quantitative example: evaluating \(\sin(9223372035086174241)\) with an x87 FPU, like the one above, gives the value \(-0.011874025925697012908\text{,}\) while the actual value is \(-4.2053336735954077951\cdot10^{-10}\text{.}\) Notice that Intel declares in its manuals that its hardware implementation of the sine has an accuracy of 1 ulp. The following text, explaining the reason for such colossal errors, is taken from the first link above:

Catastrophic cancellation, enshrined in silicon. The first step in calculating trigonometric functions like sin() is range reduction. The input number, which Intel says can be up into the quintillion range, needs to be reduced down to between \(\pm\pi/2\text{.}\)

Range reduction in general is a tricky problem, but range reduction around \(\pi\) is actually very easy. You just have to subtract the input number from a sufficiently precise approximation of pi. The worst-case for range reduction near \(\pi\) will be the value that is closest to pi. For double-precision this will be a 53-bit approximation to \(\pi\) that is off by, at most, half a unit in the last place (0.5 ULP). If you want to have 53 bits of accuracy in the result then you need to subtract from an approximation to \(\pi\) that is accurate to at least 106 bits so that, even after the leading bits cancel, you'll still be left with enough.

But you actually need more than that. The x87 FPU has 80-bit registers which have a 64-bit mantissa. If you are doing long-double math and you want your range reduction of numbers near \(\pi\) to be accurate then you need to subtract from at least a 128-bit approximation to pi.

This is still easy. If you know your input number is near \(\pi\) then just extract the mantissa and do high-precision integer math to subtract from a hard-coded 128-bit approximation to pi. Round carefully and then convert back to floating-point. In case you want to give this a try I converted \(\pi\) to 192 bits of hexadecimal:

C90FDAA2 2168C234 C4C6628B 80DC1CD1 29024E08 8A67CC74

But Intel does not use a 128-bit approximation to \(\pi\text{.}\) Their approximation has just 66 bits. Here it is, taken from their manuals:

C90FDAA2 2168C234 C

Therein lies the problem. When doing range reduction from double-precision (53-bit mantissa) \(\pi\) the results will have about 13 bits of precision (66 minus 53), for an error of up to 2^40 ULPs (53 minus 13). The measured error is lower at approximately 164 billion ULPs (~37 bits), but that is still large. When doing range reduction from the 80-bit long-double format there will be almost complete cancellation and the worst-case error is 1.376 quintillion ULPs.

Oops.
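To see this cancellation in action, we can redo the range reduction of the angle in the example above with approximations of \(\pi\) truncated to different numbers of significant bits. The sketch below, based on mpmath, only mimics the effect of a short hard-coded constant, so the value produced with 66 bits of \(\pi\) will not coincide with the x87 output, but it shows the same catastrophic loss of accuracy; with enough bits of \(\pi\) one recovers essentially the value quoted above.

from mpmath import mp, mpf, pi, sin, floor, nstr

mp.prec = 300                              # generous working precision for the arithmetic itself
x = mpf(9223372035086174241)               # the 63-bit angle from the example above (stored exactly)

for bits in (66, 128, 192):
    # pi truncated to `bits` significant binary digits (pi has 2 bits before the binary point)
    pi_short = mpf(int(floor(pi * 2**(bits - 2)))) / 2**(bits - 2)
    k = int(floor(x / pi_short + mpf("0.5")))   # nearest multiple of the truncated pi
    r = x - k * pi_short                        # range reduction done with the truncated pi
    value = sin(r) if k % 2 == 0 else -sin(r)   # sin(x) = (-1)^k * sin(x - k*pi)
    print(f"{bits:3d}-bit pi   ->   sin = {nstr(value, 20)}")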