Section 2.2 Case study 1: evaluating the derivative of a function
Recall first of all the following fundamental "Calculus 1" result:
Theorem 2.2.1.
Let \(f(x)\) be a smooth function, namely \(f(x)\) has derivatives \(f^{(n)}(x)\) of every order and each function \(f^{(n)}(x)\) is continuous in \(x\text{.}\) Then, given any point \(x_0\text{,}\)
\begin{equation*}
f(x_0+h) = f(x_0) + f'(x_0)h +
\frac{f''(x_0)}{2!}h^2+\dots+\frac{f^{(n)}(x_0)}{n!}h^n+\frac{f^{(n+1)}(s)}{(n+1)!}h^{n+1}
\end{equation*}
for every \(h\) small enough, i.e. every \(x=x_0+h\) close enough to \(x_0\text{,}\) and some \(s\) between \(x_0\) and \(x\text{.}\)
Subsection 2.2.1 First algorithm for \(f'(x)\text{:}\) Forward differencing
Cut the r.h.s. of the formula above after the first two terms, i.e. take \(n=1\) in Theorem 2.2.1. We get that
\begin{equation*}
f(x_0+h) = f(x_0) + f'(x_0)h + \frac{1}{2}f''(s)h^2
\end{equation*}
and so that
\begin{equation*}
f'(x_0) = \frac{f(x_0+h)-f(x_0)}{h} - \frac{1}{2}f''(s)h.
\end{equation*}
Truncation error. This formula tells us two important things:
- We can approximate the "calculus" limit \(f'(x_0)\) with the much simpler "algebraic" rate of change\begin{equation*} \frac{f(x_0+h)-f(x_0)}{h}\text{.} \end{equation*}This approximation for the derivative is called Forward differencing.
- By doing this, we make an error that, if \(f''\) is bounded close to \(x_0\text{,}\) goes to zero at least as fast as the first power of \(h=x-x_0\text{,}\) as the quick numerical check below illustrates.
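As a quick check of this first-order behavior, here is a minimal MATLAB sketch; the test function \(f(x)=\sin x\) and the point \(x_0=1\) are the same concrete case used in the rest of this section, and the variable names are just illustrative.
% Truncation error of forward differencing for f(x) = sin(x) at x0 = 1
x0 = 1;  exact = cos(x0);                % exact value of f'(1)
for h = [1e-1 1e-2 1e-3 1e-4]
    fd = (sin(x0 + h) - sin(x0)) / h;    % forward difference quotient
    fprintf('h = %8.1e   error = %8.1e\n', h, abs(fd - exact))
end
% each time h shrinks by a factor 10, the error shrinks by about 10 as well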
Round-off error. There is a second source of error: in double precision, each evaluation of \(f\) is affected by a round-off error of size up to about \(\varepsilon\simeq10^{-16}\text{,}\) so the quantity we actually compute is \(\frac{f(x_0+h)-f(x_0)\pm2\varepsilon}{h}\text{.}\) Its distance from \(f'(x_0)\) is therefore bounded by
\begin{equation*}
\left|\frac{f(x_0+h)-f(x_0)\pm2\varepsilon}{h}-f'(x_0)\right|
=
\left|\frac{f''(s)}{2}h\pm\frac{2\varepsilon}{h}\right|
\leq
\left|\frac{f''(s)}{2}\right|\cdot|h|+\frac{2\varepsilon}{|h|}.
\end{equation*}
In our case (\(f(x)=\sin x\) at \(x_0=1\)),
\begin{equation*}
|f''(s)/2|\simeq\sin(1)/2\simeq0.84/2\simeq1/2.
\end{equation*}
The minimum of the function
\begin{equation*}
\frac{h}{2}+\frac{2\varepsilon}{h}
\end{equation*}
is achieved at the value of \(h\) such that
\begin{equation*}
\frac{1}{2}-\frac{2\varepsilon}{h^2}=0\,,
\end{equation*}
namely
\begin{equation*}
h=2\sqrt{\varepsilon}\simeq2\times10^{-8}.
\end{equation*}
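A sketch like the following (again assuming the test case \(f(x)=\sin x\) at \(x_0=1\)) lets us watch the competition between truncation and round-off as \(h\) shrinks:
% Total error of forward differencing as h decreases (f = sin, x0 = 1)
x0 = 1;  exact = cos(x0);
h   = 10.^(-(1:16));                      % h = 1e-1 down to 1e-16
fd  = (sin(x0 + h) - sin(x0)) ./ h;       % forward difference quotients
err = abs(fd - exact);                    % total (truncation + round-off) error
loglog(h, err, 'o-'), xlabel('h'), ylabel('error')
% the error decreases like h down to roughly h = 1e-8, where it is of
% order 1e-8, and then grows again as round-off takes over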
What do we learn from here? The example above shows us clearly one important fact: counter-intuitive as it may be, because of round-off, decreasing \(h\) beyond some critical value increases the error rather than diminishing it! In the concrete case discussed above, the smallest error we can achieve is somewhere between \(10^{-8}\) and \(10^{-7}\text{,}\) obtained at \(h\simeq10^{-8}\text{.}\)
Subsection 2.2.2 Second algorithm for \(f'(x)\text{:}\) Centered differencing
How can we improve our precision in the evaluation of \(f'(x)\text{?}\) Since decreasing the value of \(h\) arbitrarily does not work, we must either switch to a higher-precision floating-point system or use a better algorithm. While it is possible to use quadruple or even arbitrary precision, these systems can be quite demanding from the hardware point of view, and most of the time the preferred approach is to find a more efficient algorithm. A simple improvement of the algorithm above comes from the following observation:
\begin{equation*}
f(x_0+h) = f(x_0) + f'(x_0)h + \frac{1}{2}f''(x_0)h^2 + \frac{1}{6}f'''(s)h^3,
\end{equation*}
\begin{equation*}
f(x_0-h) = f(x_0) - f'(x_0)h + \frac{1}{2}f''(x_0)h^2 - \frac{1}{6}f'''(t)h^3,
\end{equation*}
so that
\begin{equation*}
f(x_0+h)-f(x_0-h) = 2f'(x_0)h + C h^3,
\end{equation*}
where \(C=(f'''(s)+f'''(t))/6\) stays bounded as \(h\to0\) (it is roughly \(f'''(x_0)/3\) for small \(h\)). Hence
\begin{equation*}
f'(x_0) = \frac{f(x_0+h)-f(x_0-h)}{2h} + O(h^2),
\end{equation*}
namely also the ratio
\begin{equation*}
\frac{f(x_0+h)-f(x_0-h)}{2h}
\end{equation*}
can be used to approximate \(f'(x_0)\text{.}\) This approximation for the derivative is called Centered differencing. In this case, the truncation error goes to zero as \(h^2\text{:}\) if we decrease \(h\) by a factor 10, the truncation error decreases by a factor 100! This suggests that the truncation error can be made small without having to take "too small" values of \(h\text{.}\) Let us give it a try:
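A minimal MATLAB sketch of this experiment, still assuming the test case \(f(x)=\sin x\) at \(x_0=1\text{,}\) could look as follows:
% Centered differencing for f(x) = sin(x) at x0 = 1
x0 = 1;  exact = cos(x0);
h     = 10.^(-(1:16));                          % h = 1e-1 down to 1e-16
cdiff = (sin(x0 + h) - sin(x0 - h)) ./ (2*h);   % centered difference quotients
err   = abs(cdiff - exact);
loglog(h, err, 'o-'), xlabel('h'), ylabel('error')
% the error now decreases like h^2, bottoms out around 1e-11 near h = 1e-5,
% and then grows again because of round-off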
It works! At \(h\simeq10^{-5}\) we reached an error of just about \(10^{-11}\text{,}\) three orders of magnitude lower than with the previous algorithm. At lower values of \(h\text{,}\) though, the error goes up again. The reason is the same as for the previous algorithm: the total error is about \(\frac{2\varepsilon}{|h|}+Dh^2\text{,}\) whose minimum value is reached at \(h=\sqrt[3]{\displaystyle\frac{\varepsilon}{D}}\text{.}\) In our concrete case \(D\simeq|\sin'''(1)|/3=\cos(1)/3\simeq0.2\text{,}\) so
\begin{equation*}
h_{best}\simeq\sqrt[3]{\frac{10^{-16}}{0.2}}\simeq 8\times 10^{-6}\simeq 10^{-5}.
\end{equation*}
Subsection 2.2.3 MATLAB's own function for numerical differentiation
MATLAB provides its own built-in function for numerical differentiation, called gradient, so named because it is able to evaluate the partial derivatives of a function with respect to all of its variables. For a function of a single variable, of course, the gradient is simply its derivative.
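For instance, a short sketch of how gradient might be applied to our function \(\sin x\) (the sampling grid below is an arbitrary illustrative choice):
% Numerical derivative of sin(x) on [0, 2*pi] with MATLAB's gradient
x  = linspace(0, 2*pi, 1000);     % illustrative sampling grid
f  = sin(x);                      % sampled function values
df = gradient(f, x(2) - x(1));    % second argument = grid spacing
max(abs(df - cos(x)))             % compare with the exact derivative
% gradient uses centered differences at the interior points and
% one-sided differences at the two endpoints of the grid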