Section 2.2 Case study 2: evaluating the derivative of a function

All results below will make use of Theorem 2.1.1.

Subsection 2.2.1 First algorithm for \(f'(x)\text{:}\) Forward differencing

Truncate the right-hand side of the formula above after the first two terms. We get that
\begin{equation*} f(x_0+h) = f(x_0) + f'(x_0)h + \frac{1}{2}f''(s)h^2 \end{equation*}
and so that
\begin{equation*} f'(x_0) = \frac{f(x_0+h)-f(x_0)}{h} - \frac{1}{2}f''(s)h. \end{equation*}
Truncation error. This formula tells us two important things:

  1. We can approximate the "calculus" limit \(f'(x_0)\) with the much simpler "algebraic" rate of change
    \begin{equation*} \frac{f(x_0+h)-f(x_0)}{h}\text{.} \end{equation*}
    This approximation for the derivative is called Forward differencing.
  2. By doing this, we make an error that, if \(f''\) is bounded close to \(x_0\text{,}\) goes to zero at least as fast as the first power of \(h=x-x_0\text{.}\)

Point (1) is the (quite simple, in this case) algorithm: in order to find the derivative, we evaluate the Forward differencing ratio.

Point (2) is the truncation error: the error due to our approximation is bounded by a constant times \(h\text{.}\) The smaller \(h\text{,}\) the better the approximation.

Let us test this formula with Python! The following is a very instructive numerical experiment. We consider the function \(f(x)=\sin x\) and the point \(x_0=1\) and we evaluate numerically the derivative for different values of \(h\) with the algorithm devised above.
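A minimal sketch of such an experiment, assuming NumPy and Matplotlib (the function \(\sin x\), the point \(x_0=1\) and the idea of sweeping \(h\) over many orders of magnitude come from the text; everything else, including the variable names, is illustrative):

import numpy as np
import matplotlib.pyplot as plt

f = np.sin
x0 = 1.0
exact = np.cos(x0)                  # exact value of f'(x0)

# values of h spanning many orders of magnitude
h = 10.0 ** np.arange(-1.0, -17.0, -1.0)

# forward differencing and its absolute error
forward = (f(x0 + h) - f(x0)) / h
err = np.abs(forward - exact)

plt.plot(h, err, 'o-')
plt.xlabel('h')
plt.ylabel('error')
plt.show()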

What do we expect to see? The error of our numerical evaluation should decrease at least with the first power of \(h\text{.}\) Does this happen? Well, almost...

It is hard to say anything from the picture above, since almost all values of \(h\) are concentrated in a very small region of the plot. It is better to use the loglog instruction rather than plot...
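For instance, continuing the sketch above with the same arrays h and err (again just a sketch):

plt.loglog(h, err, 'o-')            # logarithmic scales on both axes
plt.xlabel('h')
plt.ylabel('error')
plt.show()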

What is going on? We forgot to take the round-off error into consideration! The round-off is absolutely negligible with respect to truncation when \(h\) is not "too small", in this case when \(h>10^{-8}\) (this threshold depends on \(f\) and \(x_0\)).

When \(h\) gets smaller, though, the round-off becomes the main source of error!

Let us be more quantitative. In the best-case scenario, both \(f(x_0+h)\) and \(f(x_0)\) have an error of one unit in their last significant digit, and so their difference has an error twice as large (we must always assume the worst-case scenario!).

Since the values of the sine function close to \(x_0=1\) are of order 1, the error \(\varepsilon\) on each of them is about \(2^{-53}\simeq10^{-16}\).

Hence the error in evaluating \(f'(x_0)\) with the current algorithm is

\begin{equation*} \left|\frac{f(x_0+h)-f(x_0)\pm2\varepsilon}{h}-f'(x_0)\right| = \left|\frac{f''(s)}{2}h\pm\frac{2\varepsilon}{h}\right| \leq \left|\frac{f''(s)}{2}\right|\cdot|h|+\frac{2\varepsilon}{|h|}. \end{equation*}

In our case,

\begin{equation*} |f''(s)/2|\simeq\sin(1)/2\simeq0.84/2\simeq1/2. \end{equation*}

The minimum of the function
\begin{equation*} \frac{h}{2}+\frac{2\varepsilon}{h} \end{equation*}
is achieved at the value of \(h\) such that
\begin{equation*} \frac{1}{2}-\frac{2\varepsilon}{h^2}=0\,, \end{equation*}
namely
\begin{equation*} h=2\sqrt{\varepsilon}\simeq2\times10^{-8}. \end{equation*}
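This back-of-the-envelope estimate is easy to check numerically; a small sketch (the names are illustrative):

import numpy as np

eps = 2.0 ** -53                    # round-off on values of size about 1
h_best = 2.0 * np.sqrt(eps)         # stationary point of h/2 + 2*eps/h
min_err = h_best / 2 + 2 * eps / h_best
print(h_best, min_err)              # both are roughly 2e-8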

What do we learn from here? The example above clearly shows one important fact: counter-intuitive as it may be, because of the round-off, decreasing \(h\) beyond some critical value increases the error rather than decreasing it!

In the concrete case discussed above, the smallest error we can achieve is somewhere between \(10^{-8}\) and \(10^{-7}\text{,}\) obtained at \(h\simeq2\times10^{-8}\text{.}\)

Subsection 2.2.2 Second algorithm for \(f'(x)\text{:}\) Centered differencing

How can we improve our precision in the evaluation of \(f'(x)\text{?}\) Since arbitrarily decreasing the value of \(h\) does not work, we must either switch to a higher-precision floating-point system or use a better algorithm.

While it is possible to use quadruple precision or even arbitrary precision, these systems can be quite demanding from the hardware point of view, and most of the time the approach of choice is to find a more efficient algorithm.

A simple improvement of the algorithm above comes from the following observation:
\begin{equation*} f(x_0+h) = f(x_0) + f'(x_0)h + \frac{1}{2}f''(x_0)h^2 + \frac{1}{6}f'''(s)h^3, \end{equation*}
\begin{equation*} f(x_0-h) = f(x_0) - f'(x_0)h + \frac{1}{2}f''(x_0)h^2 - \frac{1}{6}f'''(t)h^3, \end{equation*}
so that
\begin{equation*} f(x_0+h)-f(x_0-h) = 2f'(x_0)h + C h^3, \end{equation*}
where \(C=(f'''(s)+f'''(t))/6\) is bounded as long as \(f'''\) is bounded close to \(x_0\text{.}\)

Hence
\begin{equation*} f'(x_0) = \frac{f(x_0+h)-f(x_0-h)}{2h} + O(h^2), \end{equation*}
namely also the ratio
\begin{equation*} \frac{f(x_0+h)-f(x_0-h)}{2h} \end{equation*}
can be used to approximate \(f'(x_0)\text{.}\) This approximation for the derivative is called Centered differencing.

In this case, the truncation error goes to zero as \(h^2\text{.}\) For instance, this means that, if we decrease \(h\) by a factor of 10, the truncation error decreases by a factor of 100! This suggests that the truncation error can be made small without having to take "too small" values of \(h\text{.}\) Let us give it a try:
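A sketch of the corresponding experiment, reusing f, x0, h, exact and err from the previous sketch (names are illustrative):

centered = (f(x0 + h) - f(x0 - h)) / (2 * h)
err_c = np.abs(centered - exact)

plt.loglog(h, err, 'o-', label='forward differencing')
plt.loglog(h, err_c, 's-', label='centered differencing')
plt.xlabel('h')
plt.ylabel('error')
plt.legend()
plt.show()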

It works! At \(h\simeq10^{-5}\) we reach an error of only about \(10^{-11}\), three orders of magnitude lower than with the previous algorithm. At smaller values of \(h\text{,}\) though, the error goes up again. The reason is similar to the case of the previous algorithm: the total error of our algorithm is about \(\frac{2\varepsilon}{|h|}+Dh^2\text{,}\) whose minimum value is reached at \(h=\sqrt[3]{\displaystyle\frac{\varepsilon}{D}}\text{.}\)

In our concrete case \(D\simeq |\sin'''(1)|/3\simeq1/6\simeq0.2\text{,}\) so
\begin{equation*} h_{best}\simeq\sqrt[3]{\frac{10^{-16}}{0.2}}\simeq 8\times 10^{-6}\simeq 10^{-5}. \end{equation*}
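Again, this rough estimate can be reproduced with a few lines (eps and D are the crude estimates above):

eps, D = 1e-16, 0.2
h_best = (eps / D) ** (1 / 3)       # about 8e-6
min_err = 2 * eps / h_best + D * h_best ** 2
print(h_best, min_err)              # an error of order 1e-11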

Subsection 2.2.3 NumPy's own function for numerical differentiation

Centered differencing is the method used by NumPy to evaluate numerical derivatives. The corresponding NumPy function is called gradient, because it can evaluate the partial derivatives of a function with respect to each of its variables. For a function of a single variable, of course, the gradient is simply its derivative.
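For example, a minimal usage sketch (the sampling grid and the function are illustrative):

import numpy as np

x = np.linspace(0.0, 2.0 * np.pi, 1000)   # sampling grid
y = np.sin(x)                             # sampled values of f

# gradient uses centered differences at the interior points
# and one-sided differences at the two endpoints
dy = np.gradient(y, x)

print(np.max(np.abs(dy - np.cos(x))))     # small error: second-order accurate in the interior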