
Lecture Notes - MECN3032/CHMT3008

September 18, 2017

Contents
1 Numerical Methods Outline (MECN3032 and CHMT3008) 4
1.1 Course Structure and Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Course Assessment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Course Topics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Hardware Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Machine Arithmetic, Errors and Norms 6


2.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.1 Round-off Error and IEEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.2 Error Propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.3 Stability and Conditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.1 Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.2 Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

3 Systems of Linear Equations 17


3.1 Matrix Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.2 Uniqueness of Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3.2.1 Linear Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.3 Methods of Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3.1 Direct Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3.2 Gaussian Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.3.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.3.4 LU Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3.5 Cholesky’s Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.3.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3.7 Indirect Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3.8 Jacobi’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.9 Gauss-Seidel Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.3.10 Convergence Criteria for Jacobi and Gauss-Seidel Methods . . . . . . . . . . . 36
3.3.11 Relaxation Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.3.12 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

4 Numerical Solutions to Nonlinear Equations 39


4.1 Nonlinear equations in one unknown: f (x) = 0 . . . . . . . . . . . . . . . . . . . . . . 40
4.1.1 Interval Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.1.2 Bisection Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.1.3 False position method or Regula Falsi . . . . . . . . . . . . . . . . . . . . . . . 43
4.1.4 Fixed Point Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.1.5 Newton’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2 Newton’s Method for Systems of Nonlinear Equations . . . . . . . . . . . . . . . . . 47

5 Numerical Differentiation 49
5.1 Finite Difference Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.1.1 Approximations to f'(x) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.1.2 Approximations to f''(x) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.1.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.2 Richardson’s Extrapolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.2.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.2.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

6 Numerical Integration 54
6.1 Quadrature Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.2 Newton-Cotes Quadrature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.2.1 Trapezoidal Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
6.2.2 Simpson’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
6.3 Romberg Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6.3.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
6.3.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

7 Data Fitting and Interpolation 61


7.1 Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.1.1 Weierstrass Approximation Theorem . . . . . . . . . . . . . . . . . . . . . . . . 61
7.1.2 Linear Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.1.3 Quadratic Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
7.1.4 Lagrange Interpolating Polynomials . . . . . . . . . . . . . . . . . . . . . . . . 64

7.1.5 Newton’s Divided Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.1.6 Errors of Newton’s interpolating polynomials . . . . . . . . . . . . . . . . . . 68
7.1.7 Cubic Splines Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
7.1.8 Runge’s Phenomenon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.2 Least Squares Fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
7.2.1 Linear Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
7.2.2 Polynomial Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.2.3 Least Squares Exponential Fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
7.2.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

8 Ordinary Differential Equations (ODEs) 83


8.1 Initial Value Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
8.1.1 Stability of ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
8.1.2 Euler’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
8.1.3 Modified Euler’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
8.1.4 Runge-Kutta Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
8.2 Systems of First Order ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
8.2.1 R-K Method for Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
8.3 Converting an nth Order ODE to a System of First Order ODEs . . . . . . . . . . . . 96
8.3.1 Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
8.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

1 Numerical Methods Outline (MECN3032 and CHMT3008)
1.1 Course Structure and Details
• Office: UG 3 - Maths Science Building (MSB)
• Consultation: Tuesdays - 12:30 - 14:00
• Lecture Venues:
– Fridays: WSS5
– Tuesdays: Unsupervised lab + consultation time

1.2 Course Assessment


• “Numerical Methods” is combined with a statistics component for MECN3032, but not for
CHMT3008
– MECN3032 students must obtain a sub-minimum of 40% for each component and 50%
overall to pass
– CHMT3008 students must obtain 50% overall to pass
• There will be two tests and no assignment
• There will be a lab most weeks. These labs may or may not count for extra marks.
• The programming language used for the course will be Matlab/Octave

1.3 Course Topics


We will be covering the following topics throughout the course:

• Errors
• Norms
• Systems of Linear Equations
• Nonlinear Equations
• Numerical Differentiation
• Numerical Integration
• Data Fitting and Interpolation
• Ordinary Differential Equations (ODEs)

1.4 Hardware Requirements


The course will be very computational in nature, however, you do not need your own personal
machine. The PC pools for MIA and Chemical engineering have Matlab/Octave installed already.
You should have already used these in your second year computing courses. The labs will be
running the IDEs for Matlab/Octave while I will be using Jupyter for easier presentation and ex-
planation in lectures. You will at some point need to become familar with Jupyter as the tests will
be conducted in the Maths Science Labs (MSL) utilising this platform for autograding purposes.
If you do have your own machine and would prefer to work from that, you are more than
welcome. Since all the notes and code will be presented through Jupyter, please follow these
steps:

• Install Anaconda from here: https://repo.continuum.io/archive/Anaconda3-4.2.0-Windows-x86_64.exe
– Make sure when installing Anaconda to add the installation to PATH when prompted
(it will be deselected by default)
• Next, depending on whether you own a copy of Matlab or not you can either install a Matlab
kernel into Jupyter or an Octave kernel.
– Here is the Octave kernel: https://github.com/Calysto/octave_kernel

– Here is the Matlab kernel: https://github.com/Calysto/matlab_kernel
• Follow the instructions on the respective github pages to install. My recommendation
would be to use the Octave kernel for simplicity.
• To launch a Jupyter notebook, open the command prompt (cmd) and type jupyter notebook.
This should launch the browser and jupyter. If you see any proxy issues while on campus,
then you will need to set the proxy to exclude the localhost.

If you are not running Windows but rather Linux please speak to me in person about how to
setup your system.

2 Machine Arithmetic, Errors and Norms
2.1 Preliminaries
2.1.1 Round-off Error and IEEE
From school we know that the real number line is continuous. Unfortunately, it is impossible to
store infinitely many numbers on a computer, so computers can only perform finite digit arithmetic,
potentially leading to round-off errors. Computers make use of two formats for numbers:
fixed-point numbers for integers and floating point numbers for the reals. These are
described in the table below:

Size             Description
Bit              0 or 1
Byte             8 bits
Word (reals)     4 bytes (single precision) or 8 bytes (double precision)
Word (integers)  1, 2, 4 or 8 bytes, signed (can hold both positive and negative integers)
                 1, 2, 4 or 8 bytes, unsigned (can hold only 0 and positive integers)

Generally speaking, Matlab/Octave will use double precision real numbers. Exceptions may
occur on large problems where memory has become a concern. Thus, consider double precision
as the standard and focus for the course. Double precision makes use of 8 bytes (i.e. 64 bits). For
the most part, this provides sufficient accuracy for computations.
The format for a floating point number is:

x = ±z × 2^p,

here, z is called the mantissa and p the exponent. To ensure a unique representation, we nor-
malise 1 ≤ z < 2. To be able to represent a floating point number, we have to limit the number
of digits within the mantissa and exponent respectively. For double precision this is 53 bits for
the mantissa and 11 bits for the exponent. This allows numbers ranging from just over 2^-1022 to
almost 2^1024 (about 2.23 × 10^-308 to 1.8 × 10^308 in decimal).
Should an exponent value exceed this range then we are unable to represent the number and
we have experienced an overflow error. You may often see this represented in Matlab/Octave
with the special value ±Inf. Alternatively, if an exponent is too small to be represented,
then we experience underflow. Underflow can generally be considered the better of the two since,
depending on the problem, a number that close to zero can usually be approximated safely by zero.
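A quick way to see these limits in Matlab/Octave is to query the built-in constants and push past
them. This is a minimal illustration; the exact digits printed depend on your platform:

% Largest/smallest normalised double precision values and machine spacing
realmax          % largest representable double (about 1.8e308)
realmin          % smallest normalised double (about 2.2e-308)
eps              % spacing between 1 and the next representable double (2^-52)

2*realmax        % overflow: the result is Inf
realmin/2^60     % underflow: well below even the subnormal range, flushed to 0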
Given that we can represent at most 2^64 numbers in double precision, any other number must
be approximated by one of the representable numbers. We can illustrate this with an
example. Consider the real numbers in the range 1 + 2^-53 ≤ x < 1 + (2^-52 + 2^-53). Here, the
number may be rounded to x* = 1 + 2^-52 since this is exactly representable in double precision.
However, since this is no longer the true value we have introduced an error, albeit small. The
absolute error of this is:

Absolute Error = |x* − x|,                        (2.1)

|x* − x| ≤ 2^-53.

A more accurate representation of the error would be utilising relative error. This can be com-
puted using:

Relative Error = |x* − x| / |x|.                  (2.2)
To summarise:

• Numbers represented as floating points are not evenly spaced; fixed-point numbers are.
• The advantage of floating point representation is that a very large range of values can
be approximated by it.
• Operations with floating points may yield: (i) the exact answer, (ii) a
rounded/approximated version of it, or (iii) a non-representable value, i.e. NaN or Inf.

Let us consider a Matlab/Octave example to illustrate:


We have Ax = b:

A = [ 2     1 ]      b = [  1 ]
    [ 1.99  1 ],         [ -1 ]

Dividing by 100, we get Cz = d:

C = [ 0.02    0.01 ]      d = [  0.01 ]
    [ 0.0199  0.01 ],         [ -0.01 ]

Solving both of these should yield x = z. Let's see if it does.

In [6]: format long
A = [2 1; 1.99 1];
b = [1; -1];
x = A\b

C = [0.02 0.01; 0.0199 0.01];
d = [0.01; -0.01];
z = C\d

fprintf('Does x = z?\n')
answer = num2str(all(x == z));   % '1' if every component matches exactly
fprintf('Answer = %s \n', answer)

x =

1.0e+02 *

1.999999999999998
-3.989999999999997

z =

1.0e+02 *

1.999999999999998
-3.989999999999997

Does x = z?
Answer = 1

2.1.2 Error Propagation


Unfortunately, round-off errors can lead to another issue when considering the accuracy of so-
lutions. Since numerous iterations or multiple steps are often undertaken within a computation,
the final value may have accrued a number of compounded round-off errors. This commonly
happens when the number of digits available are limited. Consider the example below, we are
attempting to add from 0 in steps of 0.1 for 100 steps. We can see that the answer should be 10,
but does the code generate this?

In [7]: x = .1;
sum = 0;
for i = 1:100
sum = sum + x;
end
format long
sum

sum =

9.999999999999981

So we can now see that computational error builds up. We can consider the total error as the
following:

f*(x*) − f(x) = [f*(x*) − f(x*)]      + [f(x*) − f(x)],
                 computational error     propagated data error

where x is the true value, f (x) the desired result, x∗ the approximate input and f ∗ the approx-
imate function computed.

2.1.3 Stability and Conditioning


A problem is said to be insensitive or well-conditioned if a relative change in the input causes a
similar relative change in the solution. A problem is said to be sensitive or ill-conditioned if
a relative change in the input causes a large change in the solution. Analogously, an algorithm is
considered stable if it always produces the solution to a nearby problem, and unstable otherwise.
We measure the sensitivity of a problem by computing the condition number of the problem,
given by:

Cond = (relative change in solution) / (relative change in input data)
     = |(f(x̂) − f(x))/f(x)| / |(x̂ − x)/x|,

where x̂ is a point near x. The problem is ill-conditioned or sensitive if its condition number is
much larger than 1.

Example Consider the propagated data error when a function f is evaluated for an approximate
input argument x* = x + h instead of the true value x. We then know:

Absolute Error:     f(x + h) − f(x) ≈ h f'(x)
Relative Error:     [f(x + h) − f(x)] / f(x) ≈ h f'(x)/f(x)
Condition Number:   Cond = |h f'(x)/f(x)| / |h/x| = |x f'(x)/f(x)|

The relative error in the function value can be much larger or smaller than that in the input.
This depends on the function in question and the value of the input. For example, take f(x) = e^x.
Here the absolute error is approximately h e^x, the relative error is approximately h, and the
condition number is approximately |x|.

In [23]: fun = @(x) exp(x);


x = 0.1:0.1:5;
x1 = 0.1:0.1:10;
y = fun(x);
y1 = fun(x1);
h = 0.01;

figure
subplot(2, 1, 1);
plot(x, h*y, x, h*ones(1, length(x)), x, abs(x), 'linewidth', 1.5);
legend('Absolute Error', 'Relative Error', 'Cond','Location','northwest')
title('Standard Plot')

subplot(2, 1, 2);
semilogy(x, h*y, x, h*ones(1, length(x)), x, abs(x), 'linewidth', 1.5);
legend('Absolute Error', 'Relative Error', 'Cond','Location','northwest')
title('Log Plot')

figure
subplot(2, 1, 1);
plot(x1, h*y1, x1, h*ones(1, length(x1)), x1, abs(x1), 'linewidth', 1.5);
legend('Absolute Error', 'Relative Error', 'Cond','Location','northwest')
title('Standard Plot')

subplot(2, 1, 2);
semilogy(x1, h*y1, x1, h*ones(1, length(x1)), x1, abs(x1), 'linewidth', 1.5);
legend('Absolute Error', 'Relative Error', 'Cond','Location','northwest')
title('Log Plot')

Example Consider computing values of the cosine function near π/2. Let x ≈ π/2 and let h
be some small perturbation to x. Then the error in computing cos(x + h) is:
Absolute error = cos(x + h) − cos(x) ≈ −h sin(x) ≈ −h, and relative error ≈ h tan(x) ≈ ∞.
Therefore, small changes in x near π/2 can cause massive relative changes in cos(x) regardless of
the method used for computing it! For example:

function o = absoluteError(f, x, xstar)


o = abs(f(x) - f(xstar));
end

function o = relativeError(f, x, xstar)


o = abs(f(x) - f(xstar))/abs(f(x));
end

In [24]: f = @(x) cos(x);


fx = @(x) x;
x = 1.57079;
xstar = 1.57078;
o1 = absoluteError(f, x, xstar);
o2 = relativeError(f, x, xstar); % relative change in ouput
o3 = relativeError(fx, x, xstar); % relative change in input
o4 = o2/o3; % ratio of output change to input change
fprintf('We can see that the relative change in the output is %2.3f. This is roughly a\nquarter of a million times larger (~%5.4f) than the\nrelative change in the input, %1.8f\n', o2, o4, o3)

We can see that the relative change in the output is 1.581. This is roughly a
quarter of a million times larger (~248275.7898) than the
relative change in the input, 0.00000637

In summary:

• The concept of stability of an algorithm is analogous to conditioning of a mathematical
problem.
• Both deal with the sensitivity to perturbations.
• An algorithm is stable if the result it produces is relatively insensitive to perturbations re-
sulting from approximations made during computation.
• Accuracy refers to the actual closeness of a computed solution to the true solution of the
problem.
• Stability of an algorithm does not guarantee accuracy. Accuracy also depends on the condi-
tioning of the problem as well as the stability of the algorithm; Gaussian elimination on a
well-conditioned system, or a stable method on an ill-conditioned system, are examples.

2.1.4 Exercises
Complete the following exercises:

1. What are the absolute and relative errors in approximating π by the following quantities:
• 3
• 3.14
• 22/7
2. Consider the function f : R^2 → R defined by f(x, y) = x − y. Measuring the size of the input
(x, y) by |x| + |y|, and assuming that |x| + |y| ≈ 1 and x − y ≈ ε, show that cond(f) ≈ 1/ε.
3. Suppose x and y are true (nonzero) values and x̃ and ỹ are approximations to them. That is:

x̃ = x(1 − r)
ỹ = y(1 − s)

• Show that the relative error in x̃ is |r| and the relative error in ỹ is |s|
• Show that we can bound the relative error in x̃ỹ as an approximation to xy by:

|(x̃ỹ − xy) / (xy)| ≤ |r| + |s| + |rs|.
4. If a is an approximate value for a quantity whose true value is t, and a has a relative error of
r. Prove from the definitions of these terms that:

a = t(1 + r)

5. Consider the problem of evaluating sin(x), in particular, the propagated data error, that is,
the error in the function value due to a perturbation h in the argument x.
• Estimate the absolute error in evaluating sin(x)
• Estimate the relative error in evaluating sin(x)
• Estimate the condition number for this problem
• For what values of the argument x is this problem highly sensitive?

2.2 Norms
Norms are essential in numerical work since they enable us to have a measure of the size of a vec-
tor or matrix. A norm is a real valued function and is required to possess the following properties:

Property                 Description
‖A‖ ≥ 0                  for all A
‖A‖ = 0                  if and only if A is the zero matrix (vector)
‖cA‖ = |c| ‖A‖           for all c ∈ R and all A
‖A + B‖ ≤ ‖A‖ + ‖B‖      for all A and B (called the triangle inequality)

In order to distinguish between different norms we use a subscript. The above properties,
however, hold for all norms.

2.2.1 Vectors
The most commonly used norms for a vector x̄ ∈ R^n are:

ℓ1:   ‖x̄‖_1 = Σ_{i=1}^{n} |x_i|,                          (2.3)

the Euclidean norm (i.e. the least squares/minimum energy),

ℓ2:   ‖x̄‖_2 = √( Σ_{i=1}^{n} x_i^2 ),                     (2.4)

and the ∞ norm,

ℓ∞:   ‖x̄‖_∞ = max_{1≤i≤n} |x_i|                           (2.5)

In [1]: %%python
import numpy as np
import pylab as pl

def l1(xs):
return np.array([np.sqrt((1 - np.sqrt(x ** 2.0)) ** 2.0) for x in xs])

def l2(xs):
return np.array([np.sqrt(1.0 - x ** 2.0) for x in xs])

xs = np.linspace(0, 1, 100)

# l1 norm
pl.plot(xs, l1(xs), "r-", label="$\ell_1$")
pl.plot(xs, -1.0 * l1(xs), "r-")
pl.plot(-1 * xs, l1(xs), "r-")
pl.plot(-1 * xs, -1.0 * l1(xs), "r-")

# l2 norm
pl.plot(xs, l2(xs), "b-", label="$\ell_2$")
pl.plot(xs, -1.0 * l2(xs), "b-")
pl.plot(-1 * xs, l2(xs), "b-")
pl.plot(-1 * xs, -1.0 * l2(xs), "b-")

# l_infty norm
pl.plot(np.linspace(-1, 1, 10), np.ones(10), "g-", label="$\ell_\infty$")
pl.plot(np.linspace(-1, 1, 10), -1*np.ones(10), "g-")
pl.plot(np.ones(10), np.linspace(-1, 1, 10), "g-")
pl.plot(-1*np.ones(10), np.linspace(-1, 1, 10), "g-")

# Internal axis
pl.plot([-1.25, 1.25], [0, 0], "k-")
pl.plot([0, 0], [-1.25, 1.25], "k-")

pl.xlabel("$x$")
pl.ylabel("$y$")
pl.legend()
pl.title("Unit Norms", fontweight = "bold")
pl.axis("equal")
pl.show()

Example If x = [−3 1 0 2]^T then,

‖x̄‖_1 = |−3| + |1| + |0| + |2| = 6,
‖x̄‖_2 = √( (−3)^2 + 1^2 + 0^2 + 2^2 ) = √14,
‖x̄‖_∞ = max{|−3|, |1|, |0|, |2|} = 3
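These values can be checked directly in Matlab/Octave with the built-in norm function; a small
verification sketch:

x = [-3 1 0 2]';       % column vector from the example
n1 = norm(x, 1)        % l1 norm       -> 6
n2 = norm(x, 2)        % l2 norm       -> sqrt(14), about 3.7417
ninf = norm(x, inf)    % infinity norm -> 3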

2.2.2 Matrices
If A ∈ R^(n×n) the ℓ1 and ℓ∞ norms are:

‖A‖_1 = max_{1≤j≤n} Σ_{i=1}^{n} |a_ij|,                   (2.6)
‖A‖_∞ = max_{1≤i≤n} Σ_{j=1}^{n} |a_ij|,                   (2.7)

which are the maximum absolute column sum and row sum respectively.

Example Consider

A = [  5  −2   2 ]
    [  3   1   2 ]
    [ −2  −2   3 ]

If we sum the absolute values in each column we get {10, 5, 7}, giving:

‖A‖_1 = 10.

If we sum the absolute values in each row we get {9, 6, 7}, thus ‖A‖_∞ = 9.

There is no simple formula for the ℓ2 norm of a matrix. One method is:

‖A‖_2 = √( max{eig(A^T A)} ),

that is, the square root of the largest eigenvalue (in absolute value) of A^T A.

Example Using A from the above example:

In [1]: A = [5 3 -2; -2 1 -2; 2 2 3]


AT = A'
e = eig(AT*A)
l2 = sqrt(max(e)) % Computing the l2 using the formula
l2f = norm(A) % Computing the l2 using the builtin function

A =

5 3 -2
-2 1 -2
2 2 3

AT =

5 -2 2
3 1 2
-2 -2 3

e =

3.7998
17.1864
43.0138

l2 =

6.5585

l2f =

6.5585

3 Systems of Linear Equations
3.1 Matrix Representation
A linear system is a set of linear equations. Systems of linear equations arise in a large number
of areas, both directly in the mathematical modelling of physical situations and indirectly in the
numerical solution of other mathematical problems. A system of algebraic equations has the form:

A11 x1 + A12 x2 + . . . + A1n xn = b1
A21 x1 + A22 x2 + . . . + A2n xn = b2
  ⋮                                                        (3.1)
Am1 x1 + Am2 x2 + . . . + Amn xn = bm,

where the coefficients Aij and the constants bj are known, and xi represent the unknowns. In
matrix notation the equations are written as:

[ A11  A12  . . .  A1n ] [ x1 ]   [ b1 ]
[ A21  A22  . . .  A2n ] [ x2 ] = [ b2 ]
[  ⋮    ⋮           ⋮  ] [  ⋮ ]   [  ⋮ ]
[ Am1  Am2  . . .  Amn ] [ xn ]   [ bm ]
or simply,

Ax = b, (3.2)
where:
• m < n, we have an under-determined system of linear equations.
• m = n, we have a square system of linear equations.
• m > n, we have an over-determined system of linear equations.
A set of equations with a zero right-hand side, i.e. Ax = 0, is called a homogeneous set of
equations.

3.2 Uniqueness of Solution


An n × n matrix A is said to be singular if it has any one of the following properties:
1. A^-1 does not exist, i.e. there is no matrix M such that AM = MA = I, where I is the identity
matrix.
2. The determinant is zero, i.e. det(A) = 0.
3. rank(A) < n, i.e. the rank of the matrix is less than the number of rows.
4. Az = 0 for some vector z ≠ 0.
Should the above not be the case, then the matrix is said to be non-singular. To determine
whether a solution to Ax = b exists depends on A being singular or non-singular. Should A be
non-singular, then A−1 exists and thus Ax = b has a unique solution x = A−1 b independent of the
value of b. Conversely, if A is singular, then the number of solutions is dependent on the vector
b. Depending on b we may have; (i) no solution, or (ii) infinitely many solutions, i.e. if a singular
system has a solution, then that solution cannot be unique. To summarise:

Solution                    Matrix
One solution                non-singular
No solution                 singular
Infinitely many solutions   singular

In [1]: x = -5:1:5;
y1 = 2*x + 5*ones(1, length(x));
y2 = -2*x - 5*ones(1, length(x));
y3 = 2*x + 20*ones(1, length(x));
figure
subplot(1, 3, 1); plot(x, y1, x, y2); title('Unique Solution'); xlabel('x'); ylabel('y');
subplot(1, 3, 2); plot(x, y1, x, y1); title('Infinitely Many Solutions'); xlabel('x'); ylabel('y');
subplot(1, 3, 3); plot(x, y1, x, y3); title('No solution'); xlabel('x'); ylabel('y');
suptitle('2D Representation of Singular/Non-Singular Outcomes')

Example Consider the following:

2x1 + 3x2 = b1,
5x1 + 4x2 = b2.

We can write this as:

[ 2  3 ] [ x1 ]   [ b1 ]
[ 5  4 ] [ x2 ] = [ b2 ]

Here the system is nonsingular regardless of the value of b. If b = [8 13]^T, then the unique solution
is x = [1 2]^T.
Now consider:

[ 2  3 ] [ x1 ]   [ b1 ]
[ 4  6 ] [ x2 ] = [ b2 ]

Here the system is singular regardless of the value of b. With b = [4 7]^T there is no solution,
and with b = [4 8]^T we have:

x = [ γ          ]
    [ (4 − 2γ)/3 ],

where γ is any real number.
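The two coefficient matrices can be examined directly in Matlab/Octave; a brief sketch (the
variable names below are my own):

A1 = [2 3; 5 4];      % non-singular case
A2 = [2 3; 4 6];      % singular case (second row is twice the first)

det(A1)               % -7, non-zero, so a unique solution exists for any b
det(A2)               % 0, singular
rank(A2)              % 1 < 2, confirms singularity

A1 \ [8; 13]          % unique solution [1; 2]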

3.2.1 Linear Systems


The modelling of linear systems inevitably leads to equations of the form Ax = b, where b
is the input and x represents the response of the system. The coefficient matrix A represents the
characteristics of the system and is independent of the input. That is to say if the input changes,
the equations have to be solved with a different b but the same A. Thus, it would be desirable to
have an equation solving algorithm that can handle any number of constant vectors with minimal
computational effort.

3.3 Methods of Solution
There are two classes of methods for solving systems of equations: direct and indirect methods.
In direct methods, a single answer (hopefully the unique solution) is obtained after implementing
the steps of the algorithm. This is done by performing row operations.
Recap - Elementary row operations on systems of equations are:

Operation              Description
Row swap               Interchanging two equations in a system gives a new system which is
                       equivalent to the old one. Denoted as (Ri) ↔ (Rj).
Scalar multiplication  We can multiply an equation by a non-zero scalar. This gives a new
                       system equivalent to the old one. Denoted as (λRi) → (Ri).
Row operation          We can replace an equation with the sum of two equations. This is still
                       equivalent to the old system. Denoted as (Ri + λRj) → (Ri).

Indirect methods begin with an initial guess for the solution x, and then iteratively refine
the solution until a given convergence criterion is reached. Iterative methods are generally less
efficient than direct methods due to the large number of iterations required. However, they have
significant advantages if the coefficient matrix is large and sparsely populated.

3.3.1 Direct Methods


We look at two direct methods in this course, namely; (i) Gaussian Elimination, and (ii) LU De-
composition. We can see an overview in the Table below:

Method                 Initial Form    Final Form
Gaussian Elimination   Ax = b          Ux = c
LU Decomposition       Ax = b          LUx = b

In the Table above, U represents the upper triangular matrix, L the lower triangular matrix,
and I the identity matrix. Thus a 3 × 3 upper triangular matrix has the form:

U = [ U11  U12  U13 ]
    [  0   U22  U23 ],
    [  0    0   U33 ]

while a 3 × 3 lower triangular matrix appears as,

L = [ L11   0    0  ]
    [ L21  L22   0  ].
    [ L31  L32  L33 ]

Example Determine whether the following matrix is singular:

A = [ 2.1  −0.6   1.1 ]
    [ 3.2   4.7  −0.8 ]
    [ 3.1  −6.5   4.1 ]

Solution: Expanding along the first row,

|A| = 2.1 |4.7 −0.8; −6.5 4.1| − (−0.6) |3.2 −0.8; 3.1 4.1| + 1.1 |3.2 4.7; 3.1 −6.5|
    = 2.1(14.07) + 0.6(15.60) + 1.1(−35.37) = 0.

Thus, since the determinant is zero, the matrix is singular.

3.3.2 Gaussian Elimination


One method of solving systems of linear equations is Gaussian Elimination, a special case of
which is the Gauss-Jordan method (reduces to Ix = c). You should already be familiar with this
from your mathematics courses.
The Gaussian Elimination algorithm is comprised of two steps:

• Forward Elimination: transforms the equations into upper triangular form.


• Back substitution: solves for the unknown solution vector.

Consider the system of equations Ax = b:

[ a11  a12  . . .  a1n ] [ x1 ]   [ b1 ]
[ a21  a22  . . .  a2n ] [ x2 ] = [ b2 ]
[  ⋮    ⋮           ⋮  ] [  ⋮ ]   [  ⋮ ]
[ an1  an2  . . .  ann ] [ xn ]   [ bn ]

a system of n equations and n unknowns.

Forward Elimination
Step 1: Express the equation system in augmented form:

        [ a11  a12  . . .  a1n | b1 ]
[A|b] = [ a21  a22  . . .  a2n | b2 ]
        [  ⋮    ⋮           ⋮  |  ⋮ ]
        [ an1  an2  . . .  ann | bn ]

Step 2: To eliminate the elements below a11 we apply the sequence of row operations:

Ri ← Ri − mi1 R1,   mi1 = ai1/a11,   i = 2, 3, . . . , n.

Here a11 is called the pivot element and mi1 the multiplier. Note that we require a11 ≠ 0. The
new augmented matrix obtained is:

[ a11  a12      a13      . . .  a1n      | b1     ]
[  0   a22^(1)  a23^(1)  . . .  a2n^(1)  | b2^(1) ]
[  ⋮     ⋮        ⋮              ⋮       |   ⋮    ]
[  0   an2^(1)  an3^(1)  . . .  ann^(1)  | bn^(1) ]
The superscript (1) refers to coefficients which may have changed as a result of the row operations
in the first step. Repeat the process to eliminate the elements below the diagonal element a22^(1):

Ri ← Ri − mi2 R2,   mi2 = ai2^(1)/a22^(1),   i = 3, 4, . . . , n

The element a22^(1) is now the pivot:

[ a11  a12      a13      . . .  a1n      | b1     ]
[  0   a22^(1)  a23^(1)  . . .  a2n^(1)  | b2^(1) ]
[  0    0       a33^(2)  . . .  a3n^(2)  | b3^(2) ]
[  ⋮     ⋮        ⋮              ⋮       |   ⋮    ]
[  0    0       an3^(2)  . . .  ann^(2)  | bn^(2) ]

The procedure is repeated until we have introduced zeros below the main diagonal in the first
n − 1 columns. We then have the desired upper triangular form:

[ a11  a12      a13      . . .  a1n        | b1       ]
[  0   a22^(1)  a23^(1)  . . .  a2n^(1)    | b2^(1)   ]
[  0    0       a33^(2)  . . .  a3n^(2)    | b3^(2)   ]
[  ⋮     ⋮        ⋮              ⋮         |   ⋮      ]
[  0    0        0       . . .  ann^(n−1)  | bn^(n−1) ]
Back Substitution
We may then use back substitution to obtain:

xn = bn^(n−1) / ann^(n−1)                                                      (3.3)

xi = ( 1/aii^(i−1) ) [ bi^(i−1) − Σ_{j=i+1}^{n} aij^(i−1) xj ],   i = n − 1, . . . , 1     (3.4)

Example Consider the following:

[ 1   1   1 |  4 ]
[ 2   3   1 |  9 ]
[ 1  −1  −1 | −2 ]

R2 ← R2 − 2R1,   R3 ← R3 − R1

[ 1   1   1 |  4 ]
[ 0   1  −1 |  1 ]
[ 0  −2  −2 | −6 ]

R3 ← R3 + 2R2

[ 1   1   1 |  4 ]
[ 0   1  −1 |  1 ]
[ 0   0  −4 | −4 ]

Writing the system in full,

x1 + x2 + x3 = 4
     x2 − x3 = 1
        −4x3 = −4

We can now solve directly for x3, x2 and x1,

x3 = −4/(−4) = 1
x2 = 1 + x3 = 2
x1 = 4 − x2 − x3 = 1
See Burden and Faires for more.
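The result of the worked example can be verified in Matlab/Octave with the backslash operator
(which itself performs an LU-based elimination):

A = [1 1 1; 2 3 1; 1 -1 -1];
b = [4; 9; -2];
x = A\b        % returns [1; 2; 1], matching the hand calculation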

Partial Pivoting The Gaussian Elimination method fails if the pivot aii is zero, and loses accuracy
if it is small. Division by zero or by a small number increases the error in the computation and
may lead to an unexpected solution. This issue is addressed using partial pivoting.
To perform partial pivoting, we ensure that at each step the pivot aii has the largest possible
absolute value. That is, search the ith column (from the diagonal down) for the element with the
largest magnitude. Once found, perform a row swap so that the small or zero pivot is moved off
the diagonal.

Procedure:
1. Find the entry in the current working column with the largest absolute value. This is the new pivot.
2. Perform a row interchange if required so that the new pivot is on the diagonal.
3. Perform the elimination procedure as per usual.

Example Consider:

[ 0.0030   59.14 | 59.17 ]
[ 5.291   −6.130 | 46.78 ]

Applying pivoting (and then eliminating) yields:

[ 5.291  −6.130 | 46.78 ]
[ 0       59.14 | 58.91 ]

from which we obtain:

x2 = 0.9961
x1 = (46.78 + 6.130(0.9961)) / 5.291 = 52.89/5.291 = 9.996
Although not exact, this solution is closer to the expected solution than if we had not applied
partial pivoting.
True solution below.

In [8]: % True solution


A = [0.003 59.14; 5.291 -6.13]
b = [59.17; 46.78]
ans = A\b;
fprintf('The true solution is: %2.2f\n', ans)

A =

0.0030 59.1400
5.2910 -6.1300

b =

59.1700
46.7800

The true solution is: 10.00


The true solution is: 1.00

See the pseudocode for Gaussian Elimination below. Note it is pseudocode and not actual
Matlab/Octave code, so please do not copy and paste this and expect it to work. Use it to help
program your own function.

% Pseudocode for Gaussian Elimination


input a
input b
for k = 1:n-1 do
for i = k+1:n do
factor = a(i, k)/a(k, k)
for j = k+1:n do
a(i, j) = a(i, j) - factor*a(k, j)
end
b(i) = b(i) - factor*b(k)

end
end

x(n) = b(n)/a(n, n)
for i = n-1:-1:1 do
sum = b(i)
for j = i+1:n do
sum = sum - a(i, j)*x(j)
end
x(i) = sum/a(i, i)
end
return x

3.3.3 Exercises
1. Use Gaussian Elimination with and without partial pivoting to solve the following linear
system:

x1 − x2 + 3x3 = 2
3x1 − 3x2 + x3 = −1
x1 + x2 = 3

2. Given the linear system:

2x1 − 6αx2 = 3
3αx1 − x2 = − 32

• Find values of α for which the system has no solution.


• Find values of α for which the system has an infinite number of solutions.
• Assuming a unique solution exists for a given α, find the solution
3. Solve for the following equations:

2x + y = 3,
2x + 1.001y = 0

4. Change the second equation in (3) to 2x + 1.002y = 0 and solve the new system. What do
you observe between the two solutions? What does this imply about the coefficient matrix?
5. Determine whether the following matrix is singular:
 
2.1 −0.6 1.1
3.2 4.7 −0.8
3.1 −6.5 4.1

6. Do row swaps change the solution of a system of equations? If not, what do they do?

7. Do column swaps change the solution of a system of equations?
8. Compute the condition number of the matrix:
 
1 −1 −1
0 1 −2
0 0 1

9. True or False. If x is any n vector, then kxk1 ≥ kxk∞

3.3.4 LU Decomposition
A drawback of Gaussian Elimination is that the vector b must be known prior to the forward
elimination. The LU decomposition method only requires the coefficient matrix A and can be
performed independently of the vector b. Consider the n × n linear system:

Ax = b
The general principle is to factorise the matrix A into two triangular matrices:

A = LU, (3.5)
where L and U are lower and upper triangular matrices respectively. The system:

Ax = LUx = b,

can then be solved by letting
Ux = y,
so that
Ly = b.
First we solve the system:

Ly = b, (3.6)
by forward substitution for y, and then solve the system:

Ux = y, (3.7)
by backward substitution for x.
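In Matlab/Octave the two triangular solves can be carried out with the built-in lu function and
the backslash operator, which exploits triangular structure automatically. A minimal sketch (note
that lu pivots, so it returns a permutation matrix P and we solve PAx = Pb):

A = [4 3; 6 3];
b = [10; 12];

[L, U, P] = lu(A);   % P*A = L*U (P records any row swaps from pivoting)
y = L \ (P*b);       % forward substitution:  L*y = P*b
x = U \ y            % backward substitution: U*x = y; here x = [1; 2], same as A\b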
LU decomposition is not unique, i.e. there are numerous combinations where the product LU
yields A. Three commonly used decompositions are:

Decomposition              Description
Doolittle's decomposition  Lii = 1, i = 1, 2, . . . , n, i.e. the diagonal elements of L are ones.
Crout's decomposition      Uii = 1, i = 1, 2, . . . , n, i.e. the diagonal elements of U are ones.
Cholesky's decomposition   L = U^T, i.e. A = LL^T

In this course we will only consider Doolittle's decomposition and Cholesky's decomposition.

Doolittle's Decomposition Doolittle's decomposition is similar to Gaussian Elimination. The gen-
eral formula for Doolittle's factorisation of the general system is:

    [ a11  a12  · · ·  a1n ]   [ 1    0    · · ·  0 ] [ u11  u12  · · ·  u1n ]
A = [ a21  a22  · · ·  a2n ] = [ l21  1    · · ·  0 ] [  0   u22  · · ·  u2n ]
    [  ⋮    ⋮           ⋮  ]   [  ⋮    ⋮          ⋮ ] [  ⋮    ⋮           ⋮  ]
    [ an1  an2  · · ·  ann ]   [ ln1  ln2  · · ·  1 ] [  0    0   · · ·  unn ]
A nice observation of Doolittle’s decomposition is that:

• The matrix U is identical to the upper triangular matrix obtained from Gaussian Elimination.
• Also, the off-diagonal elements of L are the pivot equation multipliers used in the Gaussian
Elimination, i.e. Lij are the multipliers that eliminated Aij . Note this is true when no partial
pivoting is used.

Having computed U using GE, we can state that:

lij = ( 1/ujj ) ( aij − Σ_{k=1}^{j−1} lik ukj ),   i = j + 1, . . . , n.          (3.8)

Example Use Doolittle's decomposition to solve the system:

2x1 − 3x2 + x3 = 7
x1 − x2 − 2x3 = −2
3x1 + x2 − x3 = 0

Solution:

    [ 2  −3   1 ]   [ 1    0    0 ] [ 2  −3    1  ]
A = [ 1  −1  −2 ] = [ l21  1    0 ] [ 0  u22  u23 ]
    [ 3   1  −1 ]   [ l31  l32  1 ] [ 0   0   u33 ]

With Doolittle's decomposition, the first row of U is always the same as that of A. Solving for the
above unknowns we get:

    [ 1    0   0 ]        [ 2  −3    1   ]
L = [ 1/2  1   0 ],   U = [ 0  1/2  −5/2 ]
    [ 3/2  11  1 ]        [ 0   0    25  ]

Now letting y = Ux we have:

     [ 1    0   0 ] [ y1 ]   [  7 ]
Ly = [ 1/2  1   0 ] [ y2 ] = [ −2 ]
     [ 3/2  11  1 ] [ y3 ]   [  0 ]

leading to:

y1 = 7
y2 = −2 − (1/2)(7) = −11/2
y3 = 0 − (3/2)(7) − 11(−11/2) = 50

and finally:

[ 2  −3    1   ] [ x1 ]   [   7   ]
[ 0  1/2  −5/2 ] [ x2 ] = [ −11/2 ]
[ 0   0    25  ] [ x3 ]   [  50   ]

yielding the required solution:

x3 = 2
x2 = 2(−11/2 + (5/2)(2)) = −1
x1 = (1/2)(7 − 2 + 3(−1)) = 1

In [48]: a = [2 -3 1;1 -1 -2; 3 1 -1]


fprintf('We can code up to check. Implementing the above,\n')
[l1 u1] = LUfactor(a)
fprintf('checking with the buildin function\n')
[l u p] = lu(a)
fprintf('Are these the same?.... Lets check\n')
l1*u1
inv(p)*l*u

a =

2 -3 1
1 -1 -2
3 1 -1

We can code up to check. Implementing the above,


l1 =

1.00000 0.00000 0.00000


0.50000 1.00000 0.00000
1.50000 11.00000 1.00000

u1 =

2.00000 -3.00000 1.00000


0.00000 0.50000 -2.50000
0.00000 0.00000 25.00000

checking with the buildin function


l =

1.00000 0.00000 0.00000


0.66667 1.00000 0.00000
0.33333 0.36364 1.00000

u =

3.00000 1.00000 -1.00000


0.00000 -3.66667 1.66667
0.00000 0.00000 -2.27273

p =

Permutation Matrix

0 0 1
1 0 0
0 1 0

Are these the same?... Lets check


ans =

2 -3 1
1 -1 -2
3 1 -1

ans =

2.00000 -3.00000 1.00000


1.00000 -1.00000 -2.00000
3.00000 1.00000 -1.00000

% pseudocode for LU decomposition


for k = 1:n-1 do
for i = k+1:n do
if a(i, k) ~= 0 do
l = a(i, k)/a(k, k)
a(i, k+1:n) = a(i, k+1:n) - l*a(k, k+1:n)
a(i, k) = l
end
end
end
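The helper LUfactor called in the example earlier is not listed in these notes; a minimal Doolittle-style
implementation (my own sketch, with no partial pivoting, so it assumes all pivots are non-zero)
could look like this:

function [L, U] = LUfactor(A)
% Doolittle LU factorisation without pivoting: A = L*U, ones on the diagonal of L.
n = size(A, 1);
L = eye(n);
U = A;
for k = 1:n-1
    for i = k+1:n
        L(i, k) = U(i, k) / U(k, k);            % multiplier, stored in L
        U(i, :) = U(i, :) - L(i, k) * U(k, :);  % eliminate the entry below the pivot
    end
end
end

Running it on a = [2 -3 1; 1 -1 -2; 3 1 -1] reproduces the l1 and u1 factors shown in the output above.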

3.3.5 Cholesky’s Decomposition


For symmetric, positive definite matrices, factorisation can be done by Cholesky's method. A
matrix is positive definite if:

A = A^T, and x^T A x > 0 ∀ x ≠ 0.                        (3.9)


Quick checks for positive definiteness:

29
• A positive definite matrix has real eigenvalues. Positive eigenvalues imply a global mini-
mum. Mixed eigenvalues imply a saddle point, i.e. no maximum or minimum. Negative eigen-
values imply negative definiteness, and thus a maximum is achieved. (See diagrams below)
• A symmetric matrix A is positive definite if and only if each of its leading principal subma-
trices has a positive determinant.

In [14]: [x, y] = meshgrid(-2:0.2:2, -2:0.2:2);


z1 = x.^2 + y.^2;
z2 = -x.^2 - y.^2;
z3 = x.^2;
z4 = x.^2 - y.^2;
figure
surf(z1);title('Postive Definite');xlabel('x');ylabel('y')
figure
surf(z2);title('Negative Definite');xlabel('x');ylabel('y')
figure
surf(z3);title('Positive Semi-Definite');xlabel('x');ylabel('y')
figure
surf(z4);title('Saddle Point - Indefinite');xlabel('x');ylabel('y')

Example Consider the matrix:

A = [  2  −1   0 ]
    [ −1   2  −1 ].
    [  0  −1   2 ]

The submatrix A1 = [2] and |A1| = 2 > 0.
The submatrix A2 = [2 −1; −1 2] and |A2| = 3 > 0.
The submatrix A3 = A and |A| = 4 > 0.
Therefore A is positive definite.
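In Matlab/Octave a quick numerical check (a sketch, not a proof) is to look at the eigenvalues, or
to ask chol for its failure flag:

A = [2 -1 0; -1 2 -1; 0 -1 2];
eig(A)             % all eigenvalues positive, so A is positive definite
[R, p] = chol(A);  % p == 0 when the Cholesky factorisation succeeds,
                   % i.e. when A is symmetric positive definite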

If A is symmetric and positive definite, then U = L^T and hence:

A = LU = LL^T                                            (3.10)

The benefit of performing Cholesky over regular LU decomposition, when it is applicable, is that
it takes roughly half the number of operations. This is primarily attributed to the symmetry of
the problem. We can summarise the general recurrence relations as follows:

l11 = √(a11)

li1 = a1i / l11,   i = 2, . . . , n

lii = ( aii − Σ_{k=1}^{i−1} lik^2 )^(1/2),   i = 2, . . . , n

lij = ( aij − Σ_{k=1}^{j−1} ljk lik ) / ljj,   j = 1, 2, . . . , i − 1,  i ≥ 2
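These relations translate almost line-for-line into code. A minimal sketch (my own, assuming A is
symmetric positive definite; the built-in chol(A) returns the upper triangular factor R with
R'*R = A, i.e. L = R'):

function L = choleskyFactor(A)
% Cholesky factorisation A = L*L' for a symmetric positive definite matrix A.
n = size(A, 1);
L = zeros(n);
for i = 1:n
    for j = 1:i-1
        L(i, j) = (A(i, j) - L(i, 1:j-1) * L(j, 1:j-1)') / L(j, j);  % off-diagonal entry
    end
    L(i, i) = sqrt(A(i, i) - L(i, 1:i-1) * L(i, 1:i-1)');            % diagonal entry
end
end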

3.3.6 Exercises
1. Utilising both LU and Cholesky decomposition, factorise the following matrix:

   [  4   2  14 ]
   [  2  17  −5 ]
   [ 14  −5  83 ]

2. Solve the equations Ax = b using LU decomposition where:

   A = [  8  −6   2 ]        [  28 ]
       [ −4  11  −7 ],   b = [ −40 ]
       [  4  −7   6 ]        [  33 ]

3. Under what conditions can you use Cholesky decomposition?


4. True or False: Once the LU factorisation of a matrix A has been computed to solve a linear
system Ax = b, subsequent linear systems with the same matrix but different right hand side
vectors can be solved without refactoring the matrix?
5. Prove that the matrix:

   A = [ 0  1 ]
       [ 1  0 ],

   has no LU factorisation, i.e. no L and U exist such that A = LU.
6. What is the LU factorisation of the following matrix:

   [ 1  a ]
   [ c  b ].

   Also, under what conditions is this matrix singular?

3.3.7 Indirect Methods


For large linear systems, the full matrix factorization becomes impractical. Iterative methods can
often be used in such circumstances. These schemes are also called indirect because the solution
is obtained from successive approximations. Here we consider several of such schemes.
An iterative solution scheme for a system of equations can always be written in the form:

x^(i+1) = Bx^(i) + c,   i = 0, 1, 2, . . .                           (3.11)

where B is an iteration matrix, c is a constant vector and i is an iteration counter. We start with
an initial guess x^(0) of the true solution x of the system Ax = b. Using the iterative scheme (3.11)
we generate a sequence of vectors x^(1), x^(2), x^(3), . . . , each of which is a better approximation
to the true solution than the previous one. This is called iterative refinement.
The iterative refinement is stopped when two successive approximations are found to differ, in
some sense, by less than a given tolerance ε. We shall use the stopping criterion:

max_{1≤j≤n} |xj^(i) − xj^(i−1)| / |xj^(i)| < ε,   i > 0.             (3.12)
Consider an n × n system of equations A x = b where A is non-singular and the diagonal
elements of A are non-zero. Define

• L to be strictly lower triangular part of A.


• U to be strictly upper triangular part of A.
• D to be diagonal part of A.

i.e.,
A = D + L + U,
where L, D and U are defined by:

Lij = { aij, i > j        Dij = { aij, i = j        Uij = { aij, i < j
      { 0,   i ≤ j              { 0,   i ≠ j              { 0,   i ≥ j

For example a 3 × 3 matrix can be represented as:

[ a11  a12  a13 ]   [  0    0   0 ]   [ a11   0    0  ]   [ 0  a12  a13 ]
[ a21  a22  a23 ] = [ a21   0   0 ] + [  0   a22   0  ] + [ 0   0   a23 ]
[ a31  a32  a33 ]   [ a31  a32  0 ]   [  0    0   a33 ]   [ 0   0    0  ]

Hence substituting A = L + D + U in A x = b we get:

(L + D + U)x = b

We can then re-arrange the equation to get:

Dx = −(L + U)x + b. (3.13)


This is the basis for Jacobi’s method.

3.3.8 Jacobi’s Method


Consider a system of equations Ax = b where A is an n × n matrix. Solving the ith equation for
xi we get:

x1 = [ b1 − (a12 x2 + a13 x3 + . . . + a1n xn) ] / a11
x2 = [ b2 − (a21 x1 + a23 x3 + . . . + a2n xn) ] / a22               (3.14)
  ⋮
xn = [ bn − (an1 x1 + an2 x2 + . . . + a_{n,n−1} x_{n−1}) ] / ann

In matrix form this is:

x = D^-1 [ b − (L + U)x ]                                            (3.15)

We can write equation (3.15) in iterative form as:

x^(i+1) = D^-1 [ b − (L + U)x^(i) ]                                  (3.16)

which is clearly the standard form (i.e. of the form of equation (3.11)) for iterative solution with
BJ = −D^-1(L + U) and c = D^-1 b.
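A minimal matrix-form implementation of (3.16) (a sketch of my own, assuming A has non-zero
diagonal entries) is:

function x = jacobi(A, b, x0, tol, maxit)
% Jacobi iteration x^(i+1) = D^-1*(b - (L+U)*x^(i)), i.e. equation (3.16).
D  = diag(diag(A));        % diagonal part of A
LU = A - D;                % strictly lower plus strictly upper part
x  = x0;
for i = 1:maxit
    xnew = D \ (b - LU*x);
    if max(abs(xnew - x) ./ abs(xnew)) < tol   % stopping criterion (3.12)
        x = xnew;
        return
    end
    x = xnew;
end
end

A typical call would be x = jacobi(A, b, zeros(size(b)), 1e-8, 100).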

3.3.9 Gauss-Seidel Method


The Gauss-Seidel iteration uses the most recent estimates at each step in the hope of achieving
faster convergence:

x1^(i+1) = [ b1 − (a12 x2^(i) + a13 x3^(i) + . . . + a1n xn^(i)) ] / a11
x2^(i+1) = [ b2 − (a21 x1^(i+1) + a23 x3^(i) + . . . + a2n xn^(i)) ] / a22               (3.17)
  ⋮
xn^(i+1) = [ bn − (an1 x1^(i+1) + an2 x2^(i+1) + . . . + a_{n,n−1} x_{n−1}^(i+1)) ] / ann

or in discrete form:

xj^(i+1) = ( 1/ajj ) [ bj − Σ_{k<j} ajk xk^(i+1) − Σ_{k>j} ajk xk^(i) ]                  (3.18)

In matrix form:

x^(i+1) = D^-1 [ b − Lx^(i+1) − Ux^(i) ],                                                (3.19)

where the most recent estimates are used throughout. For this method the iteration matrix is:

B_GS = −(D + L)^-1 U and c = (D + L)^-1 b.

3.3.10 Convergence Criteria for Jacobi and Gauss-Seidel Methods
Convergence of an iterative method means the successive approximations will tend to a particular
vector x as i → ∞.
For any real x^(0), the sequence {x^(k)}, k = 0, 1, 2, . . . , defined by (3.11) converges to the unique
solution of x = Bx + c if and only if ‖B‖ < 1.
A sufficient condition for convergence of the Jacobi and the Gauss-Seidel methods is that the
coefficient matrix is diagonally dominant:

|aii| > Σ_{j≠i} |aij|,   ∀ i.

Since this is only a sufficient condition, systems will sometimes converge even if the coefficient
matrix is not diagonally dominant. Occasionally, it is possible to re-arrange a system of equations
to give a diagonally dominant coefficient matrix.

Example Consider

A = [ 1   3  −5 ]
    [ 1   4   1 ]
    [ 4  −1   2 ]

We have:

i = 1 :  |1| > |3| + |−5| = 8    (not true)
i = 2 :  |4| > |1| + |1| = 2     (true)
i = 3 :  |2| > |4| + |−1| = 5    (not true)

Clearly the inequalities are not satisfied for i = 1 and i = 3, so this matrix is not diagonally
dominant. If we re-arrange A by swapping rows 1 and 3 to get:

A' = [ 4  −1   2 ]
     [ 1   4   1 ]
     [ 1   3  −5 ]

then

i = 1 :  |4| > |−1| + |2| = 3    (true)
i = 2 :  |4| > |1| + |1| = 2     (true)
i = 3 :  |5| > |1| + |3| = 4     (true)

i.e. A' is diagonally dominant.


Note: > If both the Jacobi and the GS are convergent, the GS method converges twice as fast as
the Jacobi method.
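The diagonal dominance test is easy to automate; a one-line sketch for a square matrix A:

A = [4 -1 2; 1 4 1; 1 3 -5];   % the re-arranged matrix A' from the example
isDiagDominant = all(abs(diag(A)) > sum(abs(A), 2) - abs(diag(A)))   % returns 1 (true)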

3.3.11 Relaxation Method
This is a method used to achieve faster convergence, or in some cases to obtain convergence for
systems that do not converge under Gauss-Seidel. The method takes a weighted average of x^(i)
and the Gauss-Seidel update x_GS^(i+1):

x^(i+1) = (1 − ω)x^(i) + ω x_GS^(i+1),   0 < ω < 2

In component form:

xj^(i+1) = (1 − ω)xj^(i) + ( ω/ajj ) [ bj − Σ_{k<j} ajk xk^(i+1) − Σ_{k>j} ajk xk^(i) ],     (3.20)

where ω ∈ (0, 2) is a weight factor, called the relaxation coefficient. It can be shown that the
solution diverges for ω ∉ (0, 2). ω is chosen to accelerate convergence:

• If ω = 1, we recover the Gauss-Seidel iteration.
• If 1 < ω < 2, we have successive over-relaxation (SOR).
• If 0 < ω < 1, we have successive under-relaxation.
Equation (3.20) can be re-arranged as:

ajj xj^(i+1) + ω Σ_{k<j} ajk xk^(i+1) = ω bj + (1 − ω) ajj xj^(i) − ω Σ_{k>j} ajk xk^(i),     (3.21)

which in matrix form is:

(D + ωL)x^(i+1) = ωb + [(1 − ω)D − ωU]x^(i),
or
x^(i+1) = (D + ωL)^-1 { ωb + [(1 − ω)D − ωU]x^(i) }

Therefore the iteration matrix and the constant vector are:

Bω = (D + ωL)^-1 [(1 − ω)D − ωU],   c = (D + ωL)^-1 ωb

To obtain an optimum value of ω it can be shown that, if λ is the largest eigenvalue in magnitude
of BJ = D^-1(L + U), then:

ω_opt = 2 / ( 1 + √(1 − λ^2) ).

For large systems determining λ may be complicated; however, techniques do exist for its estima-
tion.
With an optimal value of ω (usually ω > 1) the convergence rate of SOR can be an order of
magnitude higher than that of GS.
For the example system

4x1 + 3x2 = 24
3x1 + 4x2 − x3 = 30
−x2 + 4x3 = −24

(the same example used for the Jacobi and Gauss-Seidel methods), equation (3.20) with ω = 1.25 is:

x1^(i+1) = (1 − 1.25)x1^(i) − (3(1.25)/4)x2^(i) + 24(1.25)/4
x2^(i+1) = −(3(1.25)/4)x1^(i+1) + (1 − 1.25)x2^(i) + (1.25/4)x3^(i) + 30(1.25)/4
x3^(i+1) = (1.25/4)x2^(i+1) + (1 − 1.25)x3^(i) − 24(1.25)/4

If x^(0) = (1, 1, 1)^T, five iterations lead to:

[ x1 ]   [  3.00037211 ]
[ x2 ] = [  4.0029250  ]
[ x3 ]   [ −5.0057135  ]
3.3.12 Exercises
1. Using Jacobi and GS methods perform 5 iterations on the system:

3x1 + 3x2 − 7x3 = 4


3x1 − x2 + x3 = 1
3x1 + 6x2 + 2x3 = 0

using the initial approximation [1 1 1]T .


• Are the results converging?
• Check to see if the matrix is diagonally dominant.
• If not diagonally dominant re–arrange it to make it diagonally dominant and repeat the
iterations. Are the results convergent this time?
2. Perform the first three Jacobi and GS iterations for the solution of the following system, start-
ing from (0, 0, 0, 0, 0):

[  8  −2   1   0   0 ] [ x1 ]   [ 7.2 ]
[ −2   8  −2   1   0 ] [ x2 ]   [ 2.1 ]
[  1  −2   8  −2   1 ] [ x3 ] = [ 1.6 ]
[  0   1  −2   8  −2 ] [ x4 ]   [ 2.1 ]
[  0   0   1  −2   8 ] [ x5 ]   [ 7.2 ]

3. Applying a weighting factor of 1 when using SOR, means we are essentially implementing
what?
4. Write the pseudocode for both Jacobi and Gauss-Seidel methods.

4 Numerical Solutions to Nonlinear Equations
Non-linear equations occur in many real-world problems and are rarely solvable analytically.
It is of great importance to solve equations of the form

f (x) = 0,

in many applications in science and engineering. The values of x that make f (x) = 0 are called
the roots (or the zeros) of this equation.
This type of problem also includes determining the points of intersection of curves. If f (x)
and g(x) represent equations of two curves, the intersection points correspond to the roots of the
function F (x) = f (x) − g(x) = 0.
Examples of nonlinear equations:

• ax^2 + bx + c = 0 (two roots)
• x^3 + 2x^2 − x − 2 = 0 (three roots)
• x sin x = 1 (infinitely many roots)
• x = e^(−x) (one root)
• x = e^x (no real roots)

In [2]: x = 0:0.1:5;
f1 = @(x) x.^3+2*x.^2-x-2; f2 = @(x) x.*sin(x) - 1; f3 = @(x) x - exp(-x); f4 = @(x) x - exp(x);
y1 = f1(x);
y2 = f2(x);
y3 = f3(x);
y4 = f4(x);

figure
hold on
plot(x, y1);
plot(x, y2);
plot(x, y3);
plot(x, y4);xlabel('x');ylabel('y');
axis([0 2*pi -5.5 5.5]);
legend('x^3+2x^2-x-2','x sin x -1', 'x - exp(-x)', 'x - exp(x)');
title('Some Nonlinear Equations');
hold off

4.1 Nonlinear equations in one unknown: f (x) = 0
We shall examine two types of iterative methods for determining the roots of the equation f (x) =
0, namely:

• Bracketing methods, also known as interval methods.


• Fixed point methods

To obtain these intervals or initial approximations graphical methods are usually used.

4.1.1 Interval Methods


These methods require an initial interval which is guaranteed to contain a root. The width of this
interval (bracket) is reduced iteratively until it encloses the root to a desired accuracy.

4.1.2 Bisection Method


The bisection method is an incremental search method in which the interval is always divided in
half.
Intermediate value theorem:
If f (x) is real and continuous in an interval [a, b] and f (a)f (b) < 0, then there exists a point
c ∈ (a, b) such that f (c) = 0.

If we calculate the midpoint of [a, b], i.e.

c = (a + b)/2,

then:
then:

• If f (a)f (c) < 0 then f (a) and f (c) have opposite signs and so the root must lie in the smaller
interval [a, c].
• If f (a)f (c) > 0 then f (a) and f (c) have the same signs and so f (b) and f (c) must have
opposite signs, so the root lies in [c, b].

Example Perform two iterations of the bisection method on the function f (x) = x2 − 1, using
[0, 3] as your initial interval.
Answer: The root lies at 1, but after two iterations, the interval will be [0.75, 1.5].

In [4]: x = -1:0.1:3;
f = @(x) x.^2 - 1;
y = f(x);

figure
hold on
grid on
plot(x, y, 'b');
plot(0.75, f(0.75), 'r*');
plot(1, f(1), 'k*');
plot(1.5, f(1.5), 'r*');
hold off

Stopping Criteria:
We use a stopping criterion of

|bn − an| < ε

We have

|b1 − a1| = |b − a|
|b2 − a2| = (1/2)|b1 − a1|
  ⋮
|bn − an| = (1/2)|b_{n−1} − a_{n−1}|
          = (1/2^2)|b_{n−2} − a_{n−2}|
          = (1/2^(n−1))|b1 − a1|

We require that |bn − an| ≈ ε, which implies

(1/2^(n−1))|b1 − a1| ≈ ε,   or   2^n = 2|b1 − a1|/ε

or

n = log( 2|b1 − a1|/ε ) / log 2                         (4.1)


Example Find the root of f(x) = sin(x) − 0.5 between 0 and 1. Iterate until the interval is of
length 1/2^3.
Answer: the final interval is [0.5, 0.625]. f(0.5) = −0.0206

If the bisection algorithm is applied to a continuous function f on an interval [a, b],
where f(a)f(b) < 0, then, after n steps, an approximate root will have been computed
with error at most (b − a)/2^(n+1).

• Bisection will always converge to a root if the function is continuous.


• Reliable but slow. The method does not exploit any knowledge about the function
in question.
• Convergence rate is linear. It gains the same amount of accuracy from iteration to
iteration.
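A compact bisection routine (a sketch of my own; it assumes f is continuous and that f(a) and
f(b) have opposite signs):

function c = bisection(f, a, b, tol)
% Bisection: repeatedly halve [a, b] until it is shorter than tol. Assumes f(a)*f(b) < 0.
while (b - a) > tol
    c = (a + b)/2;
    if f(a)*f(c) < 0
        b = c;            % root lies in [a, c]
    else
        a = c;            % root lies in [c, b]
    end
end
c = (a + b)/2;
end

For instance, bisection(@(x) x.^2 - 1, 0, 3, 1e-6) returns approximately 1.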

4.1.3 False position method or Regula Falsi


The bisection method is attractive because of its simplicity and guaranteed convergence. Its dis-
advantage is that it is, in general, extremely slow.
Regula Falsi algorithm is a method of finding roots based on linear interpolation. Its conver-
gence is linear, but it is usually faster than bisection. On each iteration a line is drawn between
the endpoints (a, f (a)) and (b, f (b)) and the point where this line crosses the x−axis taken as the
point c.

In [6]: x = 0:0.1:3;
f = @(x) x.^2 - 1;
y1 = f(x);
y2 = 3 * x - 1;

figure
hold on
grid on
plot(x, y1, 'b');
plot(x, y2, 'g');
plot(0, f(0), 'r*');
plot(1/3, 0, 'k*');
plot(3, f(3), 'r*');
hold off

The equation of the line through (a, f(a)) and (b, f(b)) is

y = f(a) + ( (x − a)/(b − a) ) (f(b) − f(a)).

We require the point c where y = 0, i.e.

f(a) + ( (c − a)/(b − a) ) (f(b) − f(a)) = 0,

from which we solve for c to get:

c = ( a f(b) − b f(a) ) / ( f(b) − f(a) )                 (4.2)

Example Perform two iterations of the false position method on the function f(x) = x^2 − 1, using
[0, 3] as your initial interval. Compare your answers to those of the bisection method.
Answer: False position, in other words, performs a linear fit onto the function, and then directly
solves that fit.
With bisection we obtain the following (a, c, b) values:

a          c          b
0          1.5        1.5
0.75       0.75       1.5
0.75       1.125      1.125
0.9375     0.9375     1.125
0.9375     1.03125    1.03125
0.984375   0.984375   1.03125

Stopping criteria The false position method often approaches the root from one side only, so
we require a different stopping criterion from that of the bisection method. We usually choose:

|c − c*| < ε

where c* is the value of c calculated in the previous step.

• Normally faster than Bisection Method. Can decrease the interval by more than
half at each iteration.
• Superlinear convergence rate. Linear convergence rate in the worst case.
• Usually approaches the root from one side.

Exercise Use the bisection method and the false position method to find the root of f (x) =
x2 − x − 2 that lies in the interval [1, 4].

4.1.4 Fixed Point Methods


For these methods we start with an initial approximation to the root and produce a sequence of
approximations, each closer to the root than its predecessor.

4.1.5 Newton’s Method


This is one of the most widely used of all root-finding formulae. It works by taking as the new
approximation the point of intersection of the tangent to the curve y = f (x) at xi with the x–axis.
Thus we seek to solve the equation f (x) = 0, where f is assumed to have a continuous derivative
f 0.

Newton developed this method for solving equations while wanting to find the root
of the equation x^3 − 2x − 5 = 0. Although he demonstrated the method only for poly-
nomials, it is clear he realised its broader applications.

In [9]: x = 1.5:0.1:4; xp = 2.89:0.01:4; y = 0:0.01:13.36;


f = @(x) x^3 - 2*x - 5; fv = @(x) x.^3 - 2.*x - 5; g = @(x) 3*x^2 - 2;
tp = @(f, g, x, x0) f(x0) + g(x0).*(x - x0);
x0 = 4;
nM = @(f, g, x) x - (f(x)/g(x));
figure

hold on
grid on
plot(x, fv(x)); % main function
plot(x, zeros(length(x), 1)); % x-axis
plot(xp, tp(f, g, xp, x0)); % first tangent
plot([2.89 2.89], [0 13.36]); % second guess
plot(x0, f(x0), 'ko'); % initial point
for i = 1:4
xn = nM(f, g, x0);
x0 = xn;
plot(x0, f(x0), 'r*');
end
hold off

Newton's method can be derived in several ways; we choose to do it using Taylor series.
Let x_{i+1} = x_i + h and obtain a Taylor expansion of f(x_{i+1}) about x_i,

f(x_{i+1}) = f(x_i) + h f'(x_i) + (h^2/2) f''(x_i) + · · ·               (4.3)

An approximation is obtained by truncating the Taylor series after two terms:

f(x_{i+1}) ≈ f(x_i) + h f'(x_i)

Thus this series has an error O(h^2).
Ideally f(x_{i+1}) = 0, so that solving for h gives

h = −f(x_i)/f'(x_i),   provided f'(x_i) ≠ 0.

Therefore

x_{i+1} = x_i + h = x_i − f(x_i)/f'(x_i),   i = 0, 1, 2, · · ·           (4.4)

which is called Newton's (or Newton-Raphson's) iterative formula.

• Requires the derivative of the function.


• Has quadratic convergence rate. Linear in worst case.
• May not converge if too far from the root.
• Could get caught in basins of attraction with certain sinusoidals.
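A direct implementation of (4.4) (a sketch of my own; it stops when successive iterates agree to
within tol or after maxit steps, and assumes f'(x) stays away from zero):

function x = newton(f, fprime, x0, tol, maxit)
% Newton-Raphson iteration x_{i+1} = x_i - f(x_i)/f'(x_i), equation (4.4).
x = x0;
for i = 1:maxit
    xnew = x - f(x)/fprime(x);
    if abs(xnew - x) < tol
        x = xnew;
        return
    end
    x = xnew;
end
end

For example, newton(@(x) x^3 - 2*x - 5, @(x) 3*x^2 - 2, 4, 1e-10, 50) converges to roughly
2.0946, the root of the equation Newton originally studied (and plotted above).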

4.2 Newton’s Method for Systems of Nonlinear Equations


Newton’s method may also be used to find roots of a system of two or more non-linear equations.
Consider a system of two equations:

f (x, y) = 0, g(x, y) = 0, (4.5)


Using Taylor's expansion of the two functions near (x, y) we have

f(x + h, y + k) = f(x, y) + h ∂f/∂x + k ∂f/∂y + terms in h^2, k^2, hk     (4.6)
g(x + h, y + k) = g(x, y) + h ∂g/∂x + k ∂g/∂y + terms in h^2, k^2, hk     (4.7)

and if we keep only the first order terms, we are looking for a couple (h, k) such that:

f(x + h, y + k) = 0 ≈ f(x, y) + h ∂f/∂x + k ∂f/∂y                         (4.8)
g(x + h, y + k) = 0 ≈ g(x, y) + h ∂g/∂x + k ∂g/∂y                         (4.9)
hence it is equivalent to the linear system:

[ ∂f/∂x  ∂f/∂y ] [ h ]     [ f(x, y) ]
[ ∂g/∂x  ∂g/∂y ] [ k ] = − [ g(x, y) ]                                    (4.10)

The 2 × 2 matrix is called the Jacobian matrix (or Jacobian) and is sometimes denoted as:

J(x, y) = [ ∂f/∂x  ∂f/∂y ]
          [ ∂g/∂x  ∂g/∂y ]

The couple (h, k) is thus

[ h ]                   [ f(x, y) ]
[ k ] = −J^-1(x, y)     [ g(x, y) ]

The generalization to the (n × n) Jacobian for a system of n equations in the n variables
(x_1, x_2, . . . , x_n) is immediate:

    J = [ ∂f_1/∂x_1  ∂f_1/∂x_2  · · ·  ∂f_1/∂x_n ]
        [ ∂f_2/∂x_1  ∂f_2/∂x_2  · · ·  ∂f_2/∂x_n ]
        [    ...         ...    ...       ...    ]
        [ ∂f_n/∂x_1  ∂f_n/∂x_2  · · ·  ∂f_n/∂x_n ]

If we define x_{i+1} = x_i + h and y_{i+1} = y_i + k then equation (4.10) suggests the iteration
formula:

    [ x_{i+1} ]   [ x_i ]                    [ f(x_i, y_i) ]
    [ y_{i+1} ] = [ y_i ]  −  J⁻¹(x_i, y_i)  [ g(x_i, y_i) ]
Starting with an initial guess (x0 , y0 ) and under certain conditions it’s possible to show that this
iteration process converges to a root of the system.
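
As a sketch of how this works in practice (an illustrative example, not from the notes), consider the assumed system f(x, y) = x^2 + y^2 − 1, g(x, y) = x − y. The Jacobian is formed analytically and each step solves the linear system (4.10) with MATLAB's backslash rather than an explicit inverse:

% Newton's method for a 2x2 system (illustrative example, assumed functions)
F = @(v) [v(1)^2 + v(2)^2 - 1;   % f(x,y) = x^2 + y^2 - 1
          v(1) - v(2)];          % g(x,y) = x - y
J = @(v) [2*v(1), 2*v(2);        % Jacobian of [f; g]
          1,      -1];
v = [1; 0];                      % assumed starting guess
for i = 1:10
    dv = J(v)\(-F(v));           % solve J*[h; k] = -[f; g]
    v  = v + dv;
    if norm(dv) < 1e-12, break, end
end
fprintf('Root: (%.6f, %.6f)\n', v(1), v(2))   % should be near (1/sqrt(2), 1/sqrt(2))

Solving J Δ = −F at each step is numerically preferable to forming J⁻¹ explicitly.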

Exercise: Use Newton’s method to look for a root near x0 = −0.6, y0 = 0.6.

f (x, y) = x3 − 3xy 2 − 1
g(x, y) = 3x2 y − y 3

Exercises
1. Show that the equation x = cos x has a solution in the interval [0, π/2]. Use the bisection
method to reduce the interval containing the solution to a length of 0.2.
2. Use the bisection method to solve
e−x = ln x, a = 1, b=2
3. Apply (i) the bisection method (ii) False Position and (iii) Newton’s method to solve each of
the following equations to, at least, 6D.
(a) x2 = e−x (b) 2x = tan x, near x=1
4. Make one Newton iteration for each of the following systems:
(a) xy 2 = yex , x cos y − 1 = e−y , near (0, 1)
(b) f1 (x, y) = x2 − 2y 2 − 1, f2 (x, y) = x3 y 2 − 2, near (1.5, 1)
5. Briefly explain how bracketing algorithms work to find zeros of one dimensional functions
and describe two variations used in practice.
6. Is Newton’s Method guaranteed to find the zero of any continuous function that has a zero
and for any starting point?
7. Given an initial bracket of [0, 100], how many steps of Bisection Method are required to
reduce the bracket size below 10−15 ?
8. Explain the meaning of the phrase: A convergent numerical method is qualitatively just as good
as an analytical solution
9. Motivate the False-Position Method, why is it generally preferable to the Bisection Method?

5 Numerical Differentiation
In certain situations it is difficult to work with the actual derivative of a function. In some cases
a derivative may fail to exist at a point. Another situation is when dealing with a function rep-
resented only by data and no analytic expression. In such situations it is desirable to be able to
approximate the derivative from the available information. Presented below are methods used to
approximate f'(x).
Numerical differentiation is not a particularly accurate process. It suffers from round-off errors
(due to machine precision) and errors through interpolation. Therefore, a derivative of a function
can never be computed with the same precision as the function itself.

5.1 Finite Difference Methods


The derivative of y = f (x) is:

    dy/dx = f'(x) = lim_{h→0} [f(x + h) − f(x)]/h.    (5.1)

5.1.1 Approximations to f 0 (x)


Given a smooth function f : R → R, we wish to approximate its first and second derivatives at a
point x. Consider the Taylor series expansions:

    f(x + h) = f(x) + f'(x) h + (f''(x)/2) h^2 + (f'''(x)/6) h^3 + . . . ,    (5.2)

and

    f(x − h) = f(x) − f'(x) h + (f''(x)/2) h^2 − (f'''(x)/6) h^3 + . . . .    (5.3)

Solving for f'(x) in Equation (5.2), we obtain the Forward Difference Formula:

    f'(x) = [f(x + h) − f(x)]/h − (f''(x)/2) h + . . . ≈ [f(x + h) − f(x)]/h,    (5.4)

which gives an approximation that is first-order accurate since the dominant term in the re-
mainder of the series is O(h).
Similarly, from Equation (5.3) we derive the Backward Difference Formula:

    f'(x) = [f(x) − f(x − h)]/h + (f''(x)/2) h + . . . ≈ [f(x) − f(x − h)]/h,    (5.5)

which is also O(h).
Now, subtracting Equation (5.3) from Equation (5.2) gives the Central Difference Formula:

    f'(x) = [f(x + h) − f(x − h)]/(2h) − (f'''(x)/6) h^2 + . . . ≈ [f(x + h) − f(x − h)]/(2h),    (5.6)

which is second order accurate, i.e. O(h^2).

5.1.2 Approximations to f''(x)
Adding Equation (5.3) to Equation (5.2) gives the Central Difference Formula for the second
derivative:

    f''(x) = [f(x + h) − 2f(x) + f(x − h)]/h^2 − (f⁽⁴⁾(x)/12) h^2 + . . . ≈ [f(x + h) − 2f(x) + f(x − h)]/h^2,    (5.7)

which is second order accurate (O(h^2)).
Of course we can keep using function values at additional points, x ± 2h, x ± 3h, . . . etc.
This gives similar difference formulas of higher accuracy, or formulas for higher-order deriva-
tives. The downside is that they require more function values, which may add considerable
computational cost depending on the situation.
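
As a brief sketch (with an assumed test function and assumed step sizes), the second-derivative formula (5.7) can be checked numerically:

% Central difference approximation to f''(x), equation (5.7)
cdd2 = @(f, x, h) (f(x + h) - 2*f(x) + f(x - h))/h^2;
f = @(x) x.^2.*cos(x);                 % assumed test function
exact = 2*cos(1) - 4*sin(1) - cos(1);  % f''(x) = 2cos x - 4x sin x - x^2 cos x at x = 1
for h = [0.1 0.05 0.025]
    approx = cdd2(f, 1, h);
    fprintf('h = %7.4f  f''''(1) approx %.6f  error %.2e\n', h, approx, abs(approx - exact))
end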

Mathematica Demonstration An interactive Mathematica demonstration of these formulas accompanies the lectures.

5.1.3 Example
Compute an approximation to f 0 (1) for f (x) = x2 cos(x) using the central difference formula and
h = 0.1, 0.05, 0.025, 0.0125.

In [1]: cfd = @(f, x, h) (f(x + h) - f(x - h))/(2*h)


x = 1;
h = [0.1 0.05 0.025 0.0125 0.00625];

for i = 1:length(h)
y = cfd(@(x) x^2*cos(x), x, h(i));
fprintf('The derivative at x = 1 with h = %1.5f is f^1(x) = %4.6f\n', h(i), y)
end
tans = 2*cos(1) -sin(1);
fprintf('----------------------------------------------------------------\n')
fprintf('The true solution at x = 1 is: f^1(x) = %4.6f\n', tans)
fprintf('----------------------------------------------------------------\n')

cfd =

function_handle with value:

@(f,x,h)(f(x+h)-f(x-h))/(2*h)

The derivative at x = 1 with h = 0.10000 is fˆ1(x) = 0.226736


The derivative at x = 1 with h = 0.05000 is fˆ1(x) = 0.236031
The derivative at x = 1 with h = 0.02500 is fˆ1(x) = 0.238358
The derivative at x = 1 with h = 0.01250 is fˆ1(x) = 0.238940
The derivative at x = 1 with h = 0.00625 is fˆ1(x) = 0.239085
----------------------------------------------------------------
The true solution at x = 1 is: fˆ1(x) = 0.239134

----------------------------------------------------------------

5.2 Richardson’s Extrapolation


In numerical differentiation, and soon in integration, we compute approximate values that
depend on some stepsize. Ideally the stepsize would approach zero, as seen in our demo;
however, due to rounding error this is simply not possible. Using nonzero stepsizes, we may
nevertheless be able to estimate what the value would be for a stepsize approaching zero. If we
compute some value F for several stepsizes hi and know something about the behaviour of F as
h → 0, then it may be possible to extrapolate from the known values an approximation of F at
h = 0. This extrapolation will be of higher order accuracy than any of the originally used values.
In summary:

Richardson extrapolation method is a procedure which combines several approximations


of a certain quantity to yield a more accurate approximation of that quantity.

Suppose we are computing some quantity F and assume that the result depends on some stepsize
h. Denoting the approximation by f(h), we have F = f(h) + E(h), where E(h) represents an
error. Richardson's extrapolation can remove the error provided E(h) = c h^p, where c and p are
constants. We start by computing f(h) at some value of h, say h_1, giving:

    F = f(h_1) + c h_1^p,

and another value h = h_2:

    F = f(h_2) + c h_2^p.

Then solving the above equations for F we get:

    F = [(h_1/h_2)^p f(h_2) − f(h_1)] / [(h_1/h_2)^p − 1],

which is the Richardson's Extrapolation Formula. In this course we will only consider half-steps,
thus h_2 = h_1/2. So if we use our difference formulae to compute our initial approximations
T_0^1, T_0^2, . . . , T_0^n (which should be of as high an order as possible), then we end up with
the formula:

    T_m^i = [4^m T_{m−1}^{i+1} − T_{m−1}^i] / (4^m − 1),   m, i = 1, 2, . . . , n.    (5.8)

5.2.1 Example
Build a Richardson’s extrapolation table for f (x) = x2 cos(x) to evaluate f 0 (1) for h =
0.1, 0.05, 0.025, 0.0125.
Solution:
We have:

    T_1^1 = T_0^2 + (1/3)(T_0^2 − T_0^1) = (1/3)(4 T_0^2 − T_0^1)
    T_1^2 = T_0^3 + (1/3)(T_0^3 − T_0^2) = (1/3)(4 T_0^3 − T_0^2)
    T_1^3 = T_0^4 + (1/3)(T_0^4 − T_0^3) = (1/3)(4 T_0^4 − T_0^3)
    T_2^1 = T_1^2 + (1/15)(T_1^2 − T_1^1) = (1/15)(16 T_1^2 − T_1^1)
    T_2^2 = T_1^3 + (1/15)(T_1^3 − T_1^2) = (1/15)(16 T_1^3 − T_1^2)
    T_3^1 = T_2^2 + (1/63)(T_2^2 − T_2^1) = (1/63)(64 T_2^2 − T_2^1)
In Tabular form:

h_i      T_0^i      T_1^i      T_2^i      T_3^i

0.1      0.226736
0.05     0.236031   0.239129
0.025    0.238358   0.239133   0.239134
0.0125   0.238938   0.239132   0.239132   0.239132
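
A programmatic version of this table can be built directly from formula (5.8); the sketch below (an assumed implementation, not from the notes) uses the central difference values as the first column:

% Richardson extrapolation table for f'(1), f(x) = x^2*cos(x)
cfd = @(f, x, h) (f(x + h) - f(x - h))/(2*h);
f = @(x) x.^2.*cos(x);
h = [0.1 0.05 0.025 0.0125];
n = length(h);
T = zeros(n, n);
for i = 1:n
    T(i, 1) = cfd(f, 1, h(i));                            % first column: central differences
end
for m = 1:n-1                                             % successive extrapolations, eq. (5.8)
    for i = m+1:n
        T(i, m+1) = (4^m*T(i, m) - T(i-1, m))/(4^m - 1);
    end
end
disp(T)                                                   % lower-triangular table as above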

5.2.2 Exercises
• Use the centred difference formula to approximate the derivative of each of the following
functions at the specified location and for the specified size:
– y = tan x at x = 4, h = 0.1

– y = sin(0.5 x) at x = 1, h = 0.125

• A jet fighter’s position on an aircraft carrier’s runway was timed during landing: where x
is the distance from the end of the carrier, measured in metres and t is the time in seconds.
Estimate the velocity and acceleration for each time point and plot these values accordingly.

t 0 0.51 1.03 1.74 2.36 3.24 3.82


x 154 186 209 250 262 272 274

• Use Richardson's extrapolation to estimate the first derivative of y = sin x at x = π/3 using
stepsizes h_1 = π/3 and h_2 = π/6. Employ central differences. Work out the true solution
analytically and compare it with your estimates.
• The following data was collected when a large oil tanker was loading. Calculate the flow
rate Q = dV/dt for each time point.

t, min 0 15 30 45 60 90 120
V, 106 barrels 0.5 0.65 0.73 0.88 1.03 1.14 1.30

6 Numerical Integration
A common problem is to evaluate the definite integral:

    I = ∫_a^b f(x) dx.    (6.1)

Here we wish to compute the area under the curve f(x) over an interval [a, b] on the real
line. The numerical approximation of definite integrals is known as numerical quadrature. We
will consider the interval of integration to be finite and assume the integrand f is smooth and
continuous.
Since integration is an infinite summation we will need to approximate this infinite sum by a
finite sum. This finite sum involves sampling the integrand at some finite number of points within
the interval; this is known as a quadrature rule. Thus, our goal is to determine which sample
points to take and how to weight their contributions in the quadrature formula. We can design
these to a desired accuracy at which we are satisfied with the computational cost required. Gener-
ally, this computational cost is measured by the number of integrand evaluations required.

6.1 Quadrature Rules


An n-point quadrature formula has the form:

    I = ∫_a^b f(x) dx = Σ_{i=1}^{n} w_i f(x_i) + R_n.    (6.2)

The points x_i are the values at which f is evaluated (called nodes), the multipliers w_i are the
weights, and R_n is the remainder. To approximate the value of the integral we compute:

    I ≈ Σ_{i=1}^{n} w_i f(x_i),    (6.3)

giving the quadrature rule.


Methods of numerical integration are divided into two groups; (i) Newton-Cotes formulas
and (ii) Gaussian Quadrature. Newton-Cotes formulas deal with evenly spaced nodes. They are
generally used when f (x) can be computed cheaply. With Gaussian Quadrature nodes are chosen
to deliver the best possible accuracy. It requires fewer evaluations of the integrand and is often
used when f (x) is expensive to compute. It is also used when dealing with integrals containing
singularities or infinite limits. In this course we will only be working with Newton-Cotes.

6.2 Newton-Cotes Quadrature


If the nodes x_i are equally spaced on the interval [a, b], then the resulting quadrature rule is known
as a Newton-Cotes Quadrature rule. A closed Newton-Cotes rule includes the endpoints a and
b; if not, the rule is open.
Consider the definite integral:

    I = ∫_a^b f(x) dx.    (6.4)

Dividing the interval of integration (a, b) into n equal intervals, each of length h = (b − a)/n,
we obtain our nodes x_0, x_1, . . . , x_n. We then approximate f(x) with an interpolant of degree
n which passes through all the nodes. Thus:

    I = ∫_a^b f(x) dx ≈ ∫_a^b P_n(x) dx.    (6.5)

6.2.1 Trapezoidal Rule


This is the first and simplest of Newton–Cotes closed integration formulae. It corresponds to the
case when the polynomial is of first degree. We partition the interval [a, b] of integration into n
subintervals of equal width, and with n + 1 points x0 , x1 , · · · , xn , where x0 = a and xn = b. Let

    x_{i+1} − x_i = h = (b − a)/n,   i = 0, 1, 2, · · · , n − 1.

On each subinterval [x_i, x_{i+1}], we approximate f(x) with a first degree polynomial,

    P_1(x) = f_i + [(f_{i+1} − f_i)/(x_{i+1} − x_i)](x − x_i)
           = f_i + [(f_{i+1} − f_i)/h](x − x_i).

Then we have:

    ∫_{x_i}^{x_{i+1}} f(x) dx ≈ ∫_{x_i}^{x_{i+1}} P_1(x) dx
                              = ∫_{x_i}^{x_{i+1}} { f_i + [(f_{i+1} − f_i)/h](x − x_i) } dx
                              = h f_i + [(f_{i+1} − f_i)/h](h^2/2)
                              = (h/2)(f_i + f_{i+1}).

Geometrically, the trapezoidal rule is equivalent to approximating the area of the trapezoid un-
der the straight line connecting f(x_i) and f(x_{i+1}). Summing over all subintervals and simplifying
gives:

    I ≈ (h/2) [f_0 + 2(f_1 + f_2 + · · · + f_{n−1}) + f_n],    (6.6)

which is known as the Composite Trapezoidal rule. In practice we always use the composite
trapezoidal rule, since it is simply the trapezoidal rule applied in a piecewise fashion. The error
of the composite trapezoidal rule is the difference between the value of the integral and the com-
puted numerical result:

    E = ∫_a^b f(x) dx − I,    (6.7)
So:

    E_T = −[(b − a)h^2/12] f''(ξ),   ξ ∈ [a, b],    (6.8)

where ξ is a point which exists between a and b. We can also see that the error is of order
O(h^2). Therefore, if the integrand is convex (f'' > 0) the error is negative and the trapezoidal rule
overestimates the true value. Should the integrand be concave (f'' < 0), the error is positive and we
have underestimated the true value.

Example: Using the trapezoidal rule, evaluate:


    ∫_0^1 1/(1 + x^2) dx = π/4,

use n = 6, i.e. we need 7 nodes.
Solution:
Since n = 6 then h = (1 − 0)/6 = 1/6, therefore:

    I ≈ (1/12) [f_0 + 2(f_1 + f_2 + f_3 + f_4 + f_5) + f_6]
In [1]: trap = @(f, x, h) (h/2).*(f(x(1)) + sum(2.*f(x(2:end-1))) + f(x(end)));
fprintf('Computed inputs:\n')
x = linspace(0, 1, 7)
h = 1/6
f = @(x) (1+x.^2).^(-1);
ans = trap(f, x, h);
fprintf('The trapezoidal method yields: %1.6f\n', ans)
tans = pi/4;
fprintf('The true answer: %1.6f\n', tans)

figure()
hold on
grid on
plot(x, f(x), 'r*-');xlabel('x');ylabel('y');
x2 = linspace(0, 1, 100);
y2 = f(x2);
plot(x2, y2, 'b-');
legend('Trapezoidal Rule','Analytical Solution');
title('Trapezoidal Rule Vs Analytical Solution');
hold off

Computed inputs:

x =

0 0.1667 0.3333 0.5000 0.6667 0.8333 1.0000

h =

0.1667

The trapezoidal method yields: 0.784241


The true answer: 0.785398

6.2.2 Simpson’s Rule


The trapezoidal rule approximates the area under a curve by summing over the areas of trape-
zoids formed by connecting successive points by straight lines. A more accurate estimate of the
area can be achieved by using polynomials of higher degree to connect the points. Simpson’s rule
uses a second degree polynomial to connect adjacent points. Interpolating polynomials are con-
venient for this approximation. So the interval [a, b] is subdivided into an even number of equal
subintervals (n is even). Next we pass a parabolic interpolant through three adjacent
nodes. Therefore our approximation over each pair of subintervals is:

    I = (h/3) [f_{i−1} + 4 f_i + f_{i+1}].    (6.9)
Summing the definite integrals over each subinterval pair [x_{i−1}, x_{i+1}] for i = 1, 3, 5, · · · , n − 1 pro-
vides the approximation:

    ∫_a^b f(x) dx ≈ (h/3) [(f_0 + 4f_1 + f_2) + (f_2 + 4f_3 + f_4) + · · · + (f_{n−2} + 4f_{n−1} + f_n)]    (6.10)

By simplifying this sum we obtain the approximation scheme:

    ∫_a^b f(x) dx ≈ (h/3) [f_0 + 4f_1 + 2f_2 + 4f_3 + · · · + 2f_{n−2} + 4f_{n−1} + f_n]
                  ≈ (h/3) [f_0 + 4(f_1 + f_3 + · · · + f_{n−1}) + 2(f_2 + f_4 + · · · + f_{n−2}) + f_n]    (6.11)
This method of approximation is known as Composite Simpson's 1/3 Rule. The error for
Simpson's rule is:

    E_S = −[(b − a)h^4/180] f⁽⁴⁾(ξ),   ξ ∈ [a, b],    (6.12)

giving an error of order O(h^4). Hence if the integrand is a polynomial of degree n ≤ 3, then the
error is zero and we obtain the exact value. The same can be said for the trapezoidal rule if the
integrand is linear.
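
As a minimal sketch (assuming n is even, and using the same integrand as the trapezoidal example, ∫_0^1 dx/(1 + x^2) = π/4), the composite 1/3 rule (6.11) can be coded as:

% Composite Simpson's 1/3 rule, equation (6.11); n must be even
simp = @(f, x, h) (h/3)*(f(x(1)) + 4*sum(f(x(2:2:end-1))) ...
                       + 2*sum(f(x(3:2:end-2))) + f(x(end)));
f = @(x) 1./(1 + x.^2);
n = 6;                               % assumed (even) number of subintervals
x = linspace(0, 1, n + 1);
h = (1 - 0)/n;
fprintf('Simpson:    %.6f\n', simp(f, x, h))
fprintf('Exact pi/4: %.6f\n', pi/4)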

6.3 Romberg Integration


This method of integration uses the trapezoidal rule to obtain the initial approximation to the
integral, followed by Richardson's extrapolation to obtain improvements.
We can show that for a trapezoidal approximation:

    I = ∫_a^b f(x) dx = T(h) + ε(h),   ε(h) = a_1 h^2 + a_2 h^4 + a_3 h^6 + · · · = O(h^2),

where,

    T(h) = (h/2)(f_0 + 2f_1 + 2f_2 + · · · + 2f_{n−1} + f_n),   h = (b − a)/n.
Consider two trapezoidal approximations with spacings 2h and h, where n is even.

    I = T(2h) + a_1(2h)^2 + a_2(2h)^4 + a_3(2h)^6 + · · ·    (6.13)
    I = T(h) + a_1 h^2 + a_2 h^4 + a_3 h^6 + · · ·    (6.14)

If we subtract equation (6.13) from 4 times equation (6.14) we eliminate the leading error term
(i.e. of O(h^2)) and we get

    I = (1/3)(4T(h) − T(2h)) + 4 a_2 h^4 + 20 a_3 h^6 + · · ·

after dividing right through by 3. But:

    (1/3)(4T(h) − T(2h)) = (h/3)[(2f_0 + 4f_1 + 4f_2 + · · · + 4f_{n−1} + 2f_n) − (f_0 + 2f_2 + 2f_4 + · · · + 2f_{n−2} + f_n)]
                         = (h/3)(f_0 + 4f_1 + 2f_2 + 4f_3 + · · · + 2f_{n−2} + 4f_{n−1} + f_n)
                         = S(h),

which is the Simpson's rule, S(h), for h, with an error O(h^4).

If we repeat for h/2, assuming that n is a multiple of 4, we have:

    I_h = S(h) + c_1 h^4 + c_2 h^6 + · · ·    (6.15)
    I_{h/2} = S(h/2) + c_1 (h/2)^4 + c_2 (h/2)^6 + · · ·    (6.16)

Multiply (6.16) by 16 and subtract (6.15) to get

    I = [16 S(h/2) − S(h)]/15 + d_1 h^6 + · · ·

which is now more accurate, with an error O(h^6).
We now generalize the results for h_k = (b − a)/2^k, n = 2^k. Hence the trapezoidal rule for 2^k
subintervals (i.e. n is even) becomes

    T_{0,k} = (h_k/2)(f_0 + 2f_1 + 2f_2 + · · · + 2f_{2^k−1} + f_{2^k})
    I = T_{0,k} + a_1 h_k^2 + a_2 h_k^4 + a_3 h_k^6 + · · ·

We define

    T_{1,k} = (1/3)(4 T_{0,k+1} − T_{0,k}),   k = 0, 1, · · ·

which is the Simpson's rule for h_k and hence has an error O(h_k^4), i.e.,

    I = T_{1,k} + c_1 h_k^4 + c_2 h_k^6 + · · ·

In general, we define

    T_{m,k} = [4^m T_{m−1,k+1} − T_{m−1,k}] / (4^m − 1),   k = 1, · · · ,   m = 1, 2, · · ·    (6.17)

We can represent the approximations in the triangular form:

    h_1    T_{0,1}
    h_2    T_{0,2}    T_{1,1}
    h_3    T_{0,3}    T_{1,2}    T_{2,1}
    ...    ...        ...        ...
    h_m    T_{0,m}    T_{1,m−1}  . . .    T_{m−1,1}

6.3.1 Example
Use Romberg integration to find the integral of f (x) = e−x for x ∈ [0, 1]. Take the initial sub–
interval as h = (1 − 0)/2 = 0.5. Use 6 decimal places

hk T0k T1k T2k
0.5 0.645235
0.25 0.635409 0.632134
0.125 0.632943 0.632121 0.632121

Hence T21 = 0.632121 with an error of O(h6 ).
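
The calculation above can be reproduced with a short script; the sketch below (an assumed implementation, with three halvings of the step size) builds the table column by column using formula (6.17):

% Romberg integration for f(x) = exp(-x) on [0, 1], cf. equation (6.17)
f = @(x) exp(-x);
a = 0; b = 1;
levels = 3;                          % number of step-halvings (assumed)
R = zeros(levels, levels);
for k = 1:levels
    n = 2^k;                         % 2, 4, 8 subintervals -> h = 0.5, 0.25, 0.125
    h = (b - a)/n;
    x = a:h:b;
    R(k, 1) = (h/2)*(f(a) + 2*sum(f(x(2:end-1))) + f(b));   % trapezoidal T_{0,k}
end
for m = 1:levels-1
    for k = m+1:levels
        R(k, m+1) = (4^m*R(k, m) - R(k-1, m))/(4^m - 1);
    end
end
disp(R)                              % R(3,3) should reproduce 0.632121 from the table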

6.3.2 Exercises
• Use (a) the trapezoidal rule (b) Simpson’s rule to estimate I for the following:
  – (i) f(x) = 1/(1 + x^2), over the interval [0, 1] for n = 4
  – (ii) f(x) = x e^{−x^2} over the interval [0, 2] for n = 4. Compare your numerical results
    with the analytical ones.
• Use Romberg's method to approximate the integral

    I = ∫_0^1 √(1 − x^2) dx

  Use h_1 = 0.2, h_2 = 0.1 and h_3 = 0.05.

• Estimate ∫_0^π f(x) dx as accurately as possible, where f(x) is defined by the data:

    x values   0        π/4      π/2      3π/4     π
    f(x)       1.0000   0.3431   0.2500   0.3431   1.0000

• The period of a simple pendulum of length L is τ = 4 √(L/g) h(θ_0), where g is the gravitational
  acceleration, θ_0 represents the angular amplitude and:

    h(θ_0) = ∫_0^{π/2} dθ / √(1 − sin^2(θ_0/2) sin^2 θ).

  Compute h(15°), h(30°) and h(45°).

7 Data Fitting and Interpolation
7.1 Interpolation
Typically, from experimental observations or statistical measurements we may have the value of
a function f at a set of points x0 , x1 , · · · , xn (x0 < x1 < · · · < xn ). However, we do not have an
analytic expression for f which would allow us to calculate the value of f at an arbitrary point.
You will frequently have occasion to estimate intermediate values between precise data points
when dealing with real world data sets. The most common method used for this purpose is poly-
nomial interpolation.
Polynomial functions which fit the known data are commonly used to allow us to approximate
these arbitrary points. If we use this function to approximate f for some point x0 < x < xn then
the process is called interpolation. If we use it to approximate f for x < x0 or x > xn then it is
called extrapolation.
Polynomials are used because:

• Computers can handle them easily, which makes for fast and efficient programming.

• The integration and differentiation of polynomials is straightforward computationally.

• Polynomials are smooth functions - i.e. not only is a polynomial a continuous function, but
all the derivatives exist and are themselves continuous.

• Polynomials uniformly approximate continuous functions. This means that, given any
function which is continuous on some interval [a, b] and any positive number ε (no matter
how small), we can find a polynomial P such that

    |f(x) − P(x)| < ε,   x ∈ [a, b]

This result is known as Weierstrass Approximation theorem.

For n + 1 data points, there is one and only one polynomial of order n that passes through
all the points. For example, there is only one straight line (that is, a first-order polynomial) that
connects two points. Similarly, only one parabola connects a set of three points. Polynomial
interpolation consists of determining the unique nth-order polynomial that fits n + 1 data points.
This polynomial then provides a formula to compute intermediate values.

7.1.1 Weierstrass Approximation Theorem


One of the most useful and well-known classes of functions mapping the set of real numbers into
itself is the algebraic polynomials, the set of functions of the form,

    P_n(x) = a_n x^n + a_{n−1} x^{n−1} + . . . + a_1 x + a_0,

where n is a nonnegative integer and a0 , ..., an are real constants. One reason for their impor-
tance is that they uniformly approximate continuous functions. By this we mean that given any
function, defined and continuous on a closed and bounded interval, there exists a polynomial that
is as “close” to the given function as desired. This result is expressed precisely in the Weierstrass
Approximation Theorem.

Definition 7.1 (Weierstrass Approximation Theorem) Suppose that f is defined and continuous on
[a, b]. For each ε > 0, there exists a polynomial P(x), with the property that,
|f(x) − P(x)| < ε, for all x in [a, b].
Note: Karl Weierstrass (1815-1897) is often referred to as the father of modern analysis be-
cause of his insistence on rigor in the demonstration of mathematical results. He was instru-
mental in developing tests for convergence of series, and determining ways to rigorously define
irrational numbers. He was the first to demonstrate that a function could be everywhere con-
tinuous but nowhere differentiable, a result that shocked some of his contemporaries.

7.1.2 Linear Interpolation


Given only two points (x0 , f (x0 )) and (x1 , f (x1 )) (y = f (x)) the obvious interpolating function is
the (unique) straight line that passes through them.
Let P1 (x) = a0 + a1 x = f (x). Since this polynomial has to pass through these two points, it is
required that:

a0 + a1 x0 = f (x0 ) (7.1)
a0 + a1 x1 = f (x1 ) (7.2)
By solving for a_0 and a_1, it is easy to show that:

    a_0 = [f(x_0)x_1 − f(x_1)x_0]/(x_1 − x_0),   a_1 = [f(x_1) − f(x_0)]/(x_1 − x_0)

and hence:

    P_1(x) = [f(x_0)x_1 − f(x_1)x_0]/(x_1 − x_0) + x [f(x_1) − f(x_0)]/(x_1 − x_0)

which can be rearranged to yield:

    P_1(x) = f(x_0) + [f(x_1) − f(x_0)]/(x_1 − x_0) (x − x_0)

which is a linear interpolating formula.
Hence at x = x∗ the linear interpolate is:

    f(x∗) = f(x_0) + [f(x_1) − f(x_0)]/(x_1 − x_0) (x∗ − x_0).

Note that the quotient [f(x_1) − f(x_0)]/(x_1 − x_0) is the slope of the line joining (x_0, f(x_0)) and
(x_1, f(x_1)). It is also a finite divided difference approximation to the first derivative.

Example Estimate ln(2) using linear interpolation given x0 = 1 and x1 = 6.
Solution:

    P_1(2) = ln 1 + [(ln 6 − ln 1)/(6 − 1)](2 − 1) = 0.3583519

Calculator value ln 2 = 0.6931472.
In this case the error is large because, firstly, the interval between the data points is large and,
secondly, we are linearly approximating a non-linear function.

7.1.3 Quadratic Interpolation


The error in the above example results because we approximated a curve with a straight line. We
can improve the estimate by introducing some curvature into the line connecting the data points.
Given three distinct points (xi , f (xi )), i = 0, 1, 2, a unique parabola (i.e., a second degree
polynomial) can be fitted through them:

P2 (x) = b0 + b1 x + b2 x2 , (7.3)
by finding suitable coefficients b0 , b1 and b2 . A particularly convenient form for representing
this polynomial is:

P2 (x) = a0 + a1 (x − x0 ) + a2 (x − x0 )(x − x1 ) (7.4)


Note: This polynomial is just equivalent to the general polynomial (7.3). This can be shown by
multiplying out the terms in (7.4)

    P_2(x) = (a_0 − a_1 x_0 + a_2 x_0 x_1) + (a_1 − a_2 x_0 − a_2 x_1) x + a_2 x^2

and hence:

    b_0 = a_0 − a_1 x_0 + a_2 x_0 x_1,   b_1 = a_1 − a_2 x_0 − a_2 x_1,   b_2 = a_2.
Thus equations (7.3) and (7.4) are equivalent formulations of the unique second degree polynomial
joining three points.
Determination of the coefficients a0 , a1 and a2 : The polynomial has to pass through the three
points. Substituting in x = x0 and x = x1 gives:

P2 (x0 ) = a0 = f (x0 ) (7.5)


    P_2(x_1) = f(x_0) + a_1(x_1 − x_0) = f(x_1)  ⇒  a_1 = [f(x_1) − f(x_0)]/(x_1 − x_0)    (7.6)

Finally, substituting x = x_2 in (7.4) and making use of the evaluated values of a_0 and a_1, we
can show, after some algebraic manipulation, that:

    a_2 = { [f(x_2) − f(x_1)]/(x_2 − x_1) − [f(x_1) − f(x_0)]/(x_1 − x_0) } / (x_2 − x_0)
Note: that a1 still represents the slope of the line joining (x0 , f (x0 )) and (x1 , f (x1 )). The last term
a2 (x − x0 )(x − x1 ) introduces the second order curvature into the formula.

Example Fit a second degree polynomial that goes through the points x0 = 1, x1 = 4 and x2 = 6
for f (x) = ln x. Use this polynomial to approximate ln 2.
Solution:
Polynomial,

P2 (x) = 0 + 0.46209813(x − 1) − 0.051873116(x − 1)(x − 4)

Estimate for ln 2, put x = 2 in P2 (x)

P2 (2) = 0 + 0.46209813(2 − 1) − 0.051873116(2 − 1)(2 − 4) = 0.56584436

This is a more accurate result than obtained using linear interpolation. We now have a relative
error of ε = 18.4%. Thus, the curvature introduced by the quadratic formula improves the inter-
polation compared with the result obtained using straight lines.

7.1.4 Lagrange Interpolating Polynomials


The general class of interpolating polynomials that require specification of certain points through
which they must pass is called Lagrange polynomials. Suppose we want to determine a first
degree polynomial that passes through two points (x0 , y0 ) and (x1 , y1 ). Let such a polynomial
have the form:

    P(x) = [(x − x_1)/(x_0 − x_1)] y_0 + [(x − x_0)/(x_1 − x_0)] y_1
         = L_0(x) y_0 + L_1(x) y_1

It is easy to verify that P (x0 ) = y0 and P (x1 ) = y1 . Thus the polynomial agrees with the
functional values at the two stipulated points. We also note the following about the quotients
L0 (x) and L1 (x). When x = x0 , L0 (x0 ) = 1 and L1 (x0 ) = 0. When x = x1 , L0 (x1 ) = 0 and
L1 (x1 ) = 1. Thus we need to construct the quotients L0 (x) and L1 (x) to determine the polynomial.
In general, to construct a polynomial of degree at most n that passes through the n + 1 points
(x0 , f (x0 )), (x1 , f (x1 )), . . . , (xn , f (xn )), we need to construct for k = 0, 1, . . . , n, a quotient Ln,k (x)
with the property that Ln,k (xi ) = 0 when i 6= k and Ln,k (xk ) = 1. To satisfy Ln,k (xi ) = 0 for each
i ≠ k requires that the numerator of L_{n,k} contain the term:

    (x − x_0)(x − x_1) . . . (x − x_{k−1})(x − x_{k+1}) . . . (x − x_n).

To satisfy L_{n,k}(x_k) = 1, the denominator of L_{n,k} must equal the above numerator evaluated at
x = x_k. Thus:

    L_{n,k}(x) = [(x − x_0) . . . (x − x_{k−1})(x − x_{k+1}) . . . (x − x_n)] / [(x_k − x_0) . . . (x_k − x_{k−1})(x_k − x_{k+1}) . . . (x_k − x_n)]
               = ∏_{i=0, i≠k}^{n} (x − x_i)/(x_k − x_i).

The Lagrange interpolating polynomial is thus given by:

    P(x) = L_{n,0}(x) f(x_0) + L_{n,1}(x) f(x_1) + . . . + L_{n,n}(x) f(x_n)    (7.7)

If there is no confusion about the degree of the required polynomial we shall simply use Lk
instead of Ln,k .
Error in Lagrange polynomial:
The error in the approximation by the Lagrange interpolating polynomial can be estimated, if
f(x) is known, as:

    E(x) = [f^{(n+1)}(ξ(x))/(n + 1)!] ∏_{i=0}^{n} (x − x_i),    (7.8)

for some ξ(x) ∈ (a, b), a ≤ x_0 ≤ x_1 ≤ . . . ≤ x_n ≤ b, assuming f^{(n+1)}(x) is continuous on
[a, b].

Example Use the following data to approximate f (1.5) using the Lagrange interpolating poly-
nomial for n = 1, 2, and 3.

xi values 1 1.3 1.6 1.9 2.2


f (xi ) 0.7651977 0.6200860 0.4554022 0.2818186 0.1103623

The interpolating polynomial should be,

    P(x) = (((0.0018251x + 0.0552928)x − 0.343047)x + 0.0733913)x + 0.977735,

which gives,

    P(1.5) ≈ 0.511819.
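
For reference, a direct (assumed) implementation of Equation (7.7), evaluated at x = 1.5 with all five data points, is sketched below; the nested-loop construction of the L_k quotients is one of several possible implementations:

% Lagrange interpolation, equation (7.7), applied to the data above
xi = [1 1.3 1.6 1.9 2.2];
fi = [0.7651977 0.6200860 0.4554022 0.2818186 0.1103623];
xq = 1.5;                             % query point
n  = length(xi);
P  = 0;
for k = 1:n
    Lk = 1;
    for j = [1:k-1, k+1:n]            % product over i ~= k
        Lk = Lk*(xq - xi(j))/(xi(k) - xi(j));
    end
    P = P + Lk*fi(k);                 % accumulate L_k(xq)*f(x_k)
end
fprintf('P(%.1f) = %.6f\n', xq, P)    % compare with the value quoted above

Its output should be close to the value above; any small difference is due only to the rounded coefficients of the Horner form.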

7.1.5 Newton’s Divided Differences


We first introduce the notation for the divided differences:

• The zeroth divided difference of f w.r.t. xi is f [xi ] = f (xi ) = fi .

• The first divided difference of f w.r.t. x_i and x_{i+1} is:

    f[x_i, x_{i+1}] = (f[x_{i+1}] − f[x_i])/(x_{i+1} − x_i) = (f_{i+1} − f_i)/(x_{i+1} − x_i)

• The second divided difference of f w.r.t. x_i, x_{i+1} and x_{i+2} is:

    f[x_i, x_{i+1}, x_{i+2}] = (f[x_{i+1}, x_{i+2}] − f[x_i, x_{i+1}])/(x_{i+2} − x_i)

• The k-th divided difference of f w.r.t. x_i, x_{i+1}, · · · , x_{i+k} is:

    f[x_i, x_{i+1}, · · · , x_{i+k}] = (f[x_{i+1}, x_{i+2}, · · · , x_{i+k}] − f[x_i, x_{i+1}, · · · , x_{i+k−1}])/(x_{i+k} − x_i)

We now fit an nth degree interpolating polynomial to the n + 1 data points (xi , f (xi )), i =
0, 1, · · · , n in the form:

Pn (x) = a0 + a1 (x − x0 ) + a2 (x − x0 )(x − x1 ) + · · · + an (x − x0 )(x − x1 ) · · · (x − xn−1 ).

Since the polynomial must pass through the points (x_i, f_i) we have:

• x = x_0:  P_n(x_0) = f_0 = a_0 = f[x_0]
• x = x_1:  P_n(x_1) = f_1 = f[x_0] + a_1(x_1 − x_0) = f[x_1]  ⇒  a_1 = (f[x_1] − f[x_0])/(x_1 − x_0) = f[x_0, x_1].
• x = x_2:

    P_n(x_2) = f_2 = f[x_2] = f[x_0] + f[x_0, x_1](x_2 − x_0) + a_2(x_2 − x_0)(x_2 − x_1),

and therefore:

    a_2 = {f[x_2] − f[x_0] − f[x_0, x_1](x_2 − x_0)} / [(x_2 − x_0)(x_2 − x_1)]

With some algebraic manipulation it can be shown that:

    a_2 = (f[x_1, x_2] − f[x_0, x_1])/(x_2 − x_0) = f[x_0, x_1, x_2]

In general:

    a_k = f[x_0, x_1, · · · , x_k]

so that:

    P_n(x) = f[x_0] + Σ_{k=1}^{n} f[x_0, · · · , x_k](x − x_0) · · · (x − x_{k−1})
           = f[x_0] + Σ_{k=1}^{n} f[x_0, · · · , x_k] ∏_{i=0}^{k−1} (x − x_i)    (7.9)

called Newton’s divided difference interpolating polynomial. All divided differences are
calculated in a similar process and the results are usually tabulated in:
a divided difference table:

xi f [xi ] f [xi , xi+1 ] f [xi , xi+1 , xi+2 ] f [xi , xi+1 , xi+2 , xi+3 ] f [xi , xi+1 , xi+2 , xi+3 , xi+4 ]

x0 f [x0 ]
f [x0 , x1 ]
x1 f [x1 ] f [x0 , x1 , x2 ]
f [x1 , x2 ] f [x0 , x1 , x2 , x3 ]
x2 f [x2 ] f [x1 , x2 , x3 ] f [x0 , x1 , x2 , x3 , x4 ]
f [x2 , x3 ] f [x1 , x2 , x3 , x4 ]
x3 f [x3 ] f [x2 , x3 , x4 ]
f [x3 , x4 ]
x4 f [x4 ]

Exercise Use a third degree polynomial passing through the points (1, ln 1), (4, ln 4), (5, ln 5) and
(6, ln 6) to estimate ln 2. (Ans: P3 (2) = 0.62876869).

Example Find a polynomial satisfied by (−4, 1245), (−1, 33), (0, 5), (2, 9), (5, 1335).
Solution:

xi f (xi ) f [xi , xi+1 ] f [xi , xi+1 , xi+2 ] f [xi , xi+1 , xi+2 , xi+3 ] f [xi , xi+1 , xi+2 , xi+3 , xi+4 ]

−4 1245
−404
−1 33 94
−28 −14
0 5 10 3
2 13
2 9 88
442
5 1335

Hence,

P4 (x) = 1245 − 404(x + 4) + 94(x + 4)(x + 1) − 14(x + 4)(x + 1)(x) (7.10)


+3(x + 4)(x + 1)x(x − 2)
= 3x4 − 5x3 + 6x2 − 14x + 5.

Note: If an extra data point (x, f (x)) is added, we only need to add an additional term to the
Pn (x) already found.

In general, if P_n(x) is the interpolating polynomial through the (n + 1) points (x_i, f_i), i =
0, 1, · · · , n, then Newton's divided difference formula gives P_{n+1} through these points plus
one more point (x_{n+1}, f_{n+1}) as:

    P_{n+1}(x) = P_n(x) + f[x_0, x_1, · · · , x_n, x_{n+1}] ∏_{i=0}^{n} (x − x_i)    (7.11)

P_{n+1}(x) improves the interpolation by introducing additional curvature.
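
The divided-difference coefficients can be generated column by column; the following sketch (an assumed implementation, not from the notes) reproduces the coefficients of the previous example and evaluates the polynomial by nested multiplication:

% Newton's divided differences for the data (-4,1245), (-1,33), (0,5), (2,9), (5,1335)
x = [-4 -1 0 2 5];
f = [1245 33 5 9 1335];
n = length(x);
D = zeros(n, n);
D(:, 1) = f(:);                          % zeroth divided differences
for j = 2:n
    for i = j:n
        D(i, j) = (D(i, j-1) - D(i-1, j-1))/(x(i) - x(i-j+1));
    end
end
a = diag(D)';                            % coefficients a_k = f[x_0, ..., x_k]
% Evaluate P_n at a point by nested multiplication
xq = 1;  P = a(n);
for k = n-1:-1:1
    P = a(k) + (xq - x(k))*P;
end
fprintf('Coefficients: %s\n', mat2str(a))
fprintf('P(%g) = %g\n', xq, P)           % P(1) should equal 3 - 5 + 6 - 14 + 5 = -5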

7.1.6 Errors of Newton’s interpolating polynomials


Let P_{n+1}(x) be the (n + 1)-th degree polynomial which fits y = f(x) at the n + 2 points
(x_0, f(x_0)), (x_1, f(x_1)), · · · , (x_n, f(x_n)), (x, f(x)). The last point is a general point. Then:

    P_{n+1}(x) = P_n(x) + f[x_0, x_1, · · · , x_n, x] ∏_{i=0}^{n} (x − x_i)

since f(x) ≈ P_{n+1}(x), we have

    ε_n(x) = P_{n+1}(x) − P_n(x) = f[x_0, x_1, · · · , x_n, x] ∏_{i=0}^{n} (x − x_i)

Remarks: For n = 0,

    f[x_0, x] = [f(x) − f(x_0)]/(x − x_0).

We have:

• (Mean value theorem) f[x_0, x] = [f(x) − f(x_0)]/(x − x_0) = f'(ξ), ξ ∈ [x_0, x].

• (Definition of a derivative) lim_{x → x_0} f[x_0, x] = f'(x_0).

In general, it can be shown that

    f[x_0, x_1, · · · , x_n] = (1/n!) f^{(n)}(ξ),   ξ ∈ [x_0, x_n]

and hence:

    f[x_0, x_1, · · · , x_n, x] = (1/(n + 1)!) f^{(n+1)}(ξ),   ξ ∈ [x_0, x]    (7.12)

The error is then:

    ε_n(x) = f[x_0, x_1, · · · , x_n, x] ∏_{i=0}^{n} (x − x_i)
           = (1/(n + 1)!) f^{(n+1)}(ξ) ∏_{i=0}^{n} (x − x_i),   ξ ∈ [x_0, x]    (7.13)

7.1.7 Cubic Splines Interpolation
The previous sections concerned the approximation of arbitrary functions on closed intervals by
the use of polynomials. However, the oscillatory nature of the high-degree polynomials, and the
property that a fluctuation over a small portion of the interval can induce large fluctuations over
the entire range, restricts their use.
The concept of the spline fit originated from the drafting technique of using a thin, flexible
strip to draw a smooth curve through a set of given points. The flexible spline was pinned or held
by weights so that the curve passed through all the data points. The spline passed smoothly from
one interval to the next because of the laws governing beam flexure.
The most widely used spline fitting is the cubic spline. In the cubic spline procedure, a cu-
bic polynomial is passed through each pair of points in such a manner that the first and second
derivatives are continuous throughout the table of points.
A cubic spline s with knots x_0 < x_1 < · · · < x_n satisfies:

• s is a polynomial of degree ≤ 3 in each knot interval I_i = [x_{i−1}, x_i], i = 1, 2, · · · , n;
• s, s' and s'' are continuous at the interior knots, so the pieces join smoothly.

For x_{i−1} < x < x_i let s(x) = s_i(x).

The first condition is that the spline must pass through all the data points. So:

    f_i = a_i + b_i(x_i − x_i) + c_i(x_i − x_i)^2 + d_i(x_i − x_i)^3,    (7.14)

which simplifies to,

    a_i = f_i.    (7.15)

Therefore, the constant in each cubic must be equal to the value of the dependent variable at
the beginning of the interval. This result can be incorporated into,

    s_i(x) = f_i + b_i(x − x_i) + c_i(x − x_i)^2 + d_i(x − x_i)^3.    (7.16)
The coefficients b_i and d_i are then obtained from:

    b_i = (f_{i+1} − f_i)/h_i − (h_i/3)(2c_i + c_{i+1}),    (7.17)
    d_i = (c_{i+1} − c_i)/(3 h_i),    (7.18)

where h_i is simply,

    h_i = x_{i+1} − x_i.    (7.19)
The solution for the c_i is somewhat more complicated. It requires solving the following tridiagonal
system of linear equations (here written with the end conditions c_1 = c_n = 0, as the first and last
rows show):

    [ 1                                                    ] [ c_1     ]   [ 0                                         ]
    [ h_1  2(h_1 + h_2)  h_2                               ] [ c_2     ]   [ 3(f[x_3, x_2] − f[x_2, x_1])              ]
    [          ...            ...          ...             ] [ ...     ] = [ ...                                       ]
    [          h_{n−2}  2(h_{n−2} + h_{n−1})  h_{n−1}      ] [ c_{n−1} ]   [ 3(f[x_n, x_{n−1}] − f[x_{n−1}, x_{n−2}])  ]
    [                                                   1  ] [ c_n     ]   [ 0                                         ]
Example Consider the table below. Fit cubic splines to the data and utilize the results to estimate
the value at x = 5.

i xi fi
1 3 2.5
2 4.5 1
3 7 2.5
4 9 0.5

Solution:
The first step is to generate the set of simultaneous equations that will be utilized to determine
the c coefficients:

    [ 1    0    0    0 ] [ c_1 ]   [ 0           ]
    [ 1.5  8    2.5  0 ] [ c_2 ]   [ 3(0.6 + 1)  ]
    [ 0    2.5  9    2 ] [ c_3 ] = [ 3(−1 − 0.6) ]
    [ 0    0    0    1 ] [ c_4 ]   [ 0           ]

    [ 1    0    0    0 ] [ c_1 ]   [  0   ]
    [ 1.5  8    2.5  0 ] [ c_2 ]   [  4.8 ]
    [ 0    2.5  9    2 ] [ c_3 ] = [ −4.8 ]
    [ 0    0    0    1 ] [ c_4 ]   [  0   ]

Therefore:

    c = [ 0;  0.839543726;  −0.766539924;  0 ].
Using our values for c we obtain the following for our d’s,

d1 = 0.186565272,
d2 = −0.214144487,
d3 = 0.127756654.

We can then compute the b's using equation (7.17),

b1 = −1.419771863,
b2 = −0.160456274,
b3 = 0.022053232.

These results allow us to develop the cubic splines for each interval using Equation (7.16):

s1 (x) = 2.5 − 1.419771863(x − 3) + 0.186565272(x − 3)3 ,


s2 (x) = 1 − 0.160456274(x − 4.5) + 0.839543726(x − 4.5)2 − 0.214144487(x − 4.5)3 ,
s3 (x) = 2.5 + 0.022053232(x − 7) − 0.766539924(x − 7)2 + 0.127756654(x − 7)3 .

The three equations can then be employed to compute values within each interval. For exam-
ple, the value at x = 5, which falls within the second interval, is calculated as,

s2 (5) = 1 − 0.160456274(5 − 4.5) + 0.839543726(5 − 4.5)2 − 0.214144487(5 − 4.5)3


= 1.102889734.
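
As an optional check (assumed code, not part of the worked example), MATLAB's built-in interpolation can be compared with the hand-computed spline. Note that interp1 with the 'spline' option uses not-a-knot end conditions rather than the end conditions c_1 = c_n = 0 used above, so the two values will differ slightly:

% Compare the hand-computed spline with MATLAB's built-in spline interpolation
xi = [3 4.5 7 9];
fi = [2.5 1 2.5 0.5];
s2 = @(x) 1 - 0.160456274*(x - 4.5) + 0.839543726*(x - 4.5).^2 ...
        - 0.214144487*(x - 4.5).^3;               % spline piece s2 from above
fprintf('Hand-computed spline  s2(5): %.6f\n', s2(5))
fprintf('Built-in (not-a-knot) spline: %.6f\n', interp1(xi, fi, 5, 'spline'))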

7.1.8 Runge’s Phenomenon


A major problem with interpolation is Runge’s Phenomenon. Let us consider an example in
Mathematica:
ClearAll[data, x];
data = RandomReal[{-10, 10}, 20];

ListPlot[data]

Manipulate[
Show[
Plot[InterpolatingPolynomial[data[[1 ;; n]], x], {x, 1, n},
PlotRange -> All],
ListPlot[data, PlotStyle -> Directive[PointSize[Large], Red]],
PlotRange -> All
], {n, 2, Length[data], 1}]

pctrl[d_, param_, noeud_] :=


LinearSolve[
Module[{n = Length[d]},
Table[BSplineBasis[{3, noeud}, j - 1, param[[i]]], {i, n}, {j,
n}]], d]

tcentr[d_] :=
Module[{a},
a = Accumulate[
Table[Norm[d[[i + 1]] - d[[i]]]^(1/2), {i, Length[d] - 1}]];
N[Prepend[a/Last[a], 0]]]

noeudmoy[d_, param_] :=
Join[{0, 0, 0, 0},
Table[1/3*Sum[param[[i]], {i, j, j + 2}], {j, 2,
Length[param] - 3}], {1, 1, 1, 1}]

dpts = Table[{i, data[[i]]}, {i, Length[data]}];

Manipulate[Module[{pCt},
pCt = pctrl[dpts[[1 ;; n]], tcentr[dpts[[1 ;; n]]],
noeudmoy[dpts[[1 ;; n]], tcentr[dpts[[1 ;; n]]]]];
Show[
ParametricPlot[
BSplineFunction[pCt,
SplineKnots ->
noeudmoy[dpts[[1 ;; n]], tcentr[dpts[[1 ;; n]]]]][x], {x, 0,
1}, PlotRange -> All],
ListPlot[data, PlotStyle -> Directive[PointSize[Large], Red]],
PlotRange -> All
]], {n, 4, Length[data], 1}]

Thus we can see that high order polynomials on equally spaced points lead to an exponential
growth of the infinity-norm error. To overcome this we used the splines technique from above;
another approach is to interpolate at Chebyshev points, which are distributed more densely
towards the ends of the interval.

Exercises

• Given the data points:

x -1.2 0.3 1.1


y -5.76 -5.61 -3.69

determine y at x = 0 using (a) Lagrange’s method and (b) Newton’s Divided Differences.

• Given the data points:

x 0.4 0.5 0.7 0.8


y 1.086 1.139 1.307 1.435

Estimate f (0.6) from the data using: 1. a second degree Lagrange polynomial 2. a third degree
Lagrange polynomial

• Given f (−2) = 46, f (−1) = 4, f (1) = 4, f (3) = 156, f (4) = 484, use Newton Divided
Differences to estimate f (0).

7.2 Least Squares Fitting


When considering experimental data it is commonly associated with noise. This noise could be
resultant of measurement error or some other experimental inconsistency. In these instances, we
want to find a curve that fits the data points “on the average”. That is, we do not want to overfit the
data, thereby amplifying any of the noise. With this in mind, the curve should have the simplest
form (i.e. lowest order polynomial possible). Let:
f (x) = f (x, a1 , a2 , . . . , am ),
be the function that is to be fitted to the n data points (xi , yi ), i = 1, 2, . . . , n. Thus, we have a
function of x that contains the parameters aj , j = 1, 2, . . . , m, where m < n. The shape of f (x) is
known a priori, normally from the theory associated with the experiment in question. This means
we are looking to fit the best parameters. Thus curve fitting is a two step process; (i) selecting the
correct form of f (x) and (ii) computing the parameters that produce the best fit to the data.
The notion of best fit (at least for the purpose of this course) considers noise bound to the
y-coordinate. The most common measure is the least squares fit, which minimises:

    S(a_1, a_2, . . . , a_m) = Σ_{i=1}^{n} [y_i − f(x_i)]^2,    (7.20)

with respect to each a_j. The optimal values of the parameters are given by the solution of the
equations:

    ∂S/∂a_k = 0,   k = 1, 2, . . . , m.    (7.21)

We measure the residual as r_i = y_i − f(x_i) from Equation (7.20), which represents the discrep-
ancy between the data points and the fitting function at xi . The function S is the sum of the squares
of all residuals.
A least squares problem is said to be linear if the fitting function is chosen as a linear combi-
nation of functions f_j(x):

    f(x) = a_1 f_1(x) + a_2 f_2(x) + . . . + a_m f_m(x).    (7.22)

An example could be f_1(x) = 1, f_2(x) = x, f_3(x) = x^2, etc. Often the fitting function is non-
linear in the parameters, and the problem then becomes considerably more difficult to solve. For
the purpose of this course we will only consider linear least squares.

7.2.1 Linear Least Squares


We fit the straight line y = a_0 + a_1 x through some given n points. The sum of the squares of the
deviations is

    S = Σ_{i=1}^{n} [y_i − f(x_i)]^2 = Σ_{i=1}^{n} [y_i − (a_0 + a_1 x_i)]^2

A necessary condition for S(a_0, a_1) to be a minimum is that the first partial derivatives of S w.r.t.
a_0 and a_1 must be zero:

    ∂S/∂a_0 = −2 Σ_{i=1}^{n} [y_i − a_0 − a_1 x_i] = 0    (7.23)
    ∂S/∂a_1 = −2 Σ_{i=1}^{n} x_i [y_i − a_0 − a_1 x_i] = 0    (7.24)

We can rewrite these sums as:

    a_0 n + a_1 Σ_{i=1}^{n} x_i = Σ_{i=1}^{n} y_i    (7.25)
    a_0 Σ_{i=1}^{n} x_i + a_1 Σ_{i=1}^{n} x_i^2 = Σ_{i=1}^{n} x_i y_i    (7.26)

These equations are called the normal equations. They can be solved simultaneously for a_1:

    a_1 = [n Σ_i x_i y_i − Σ_i x_i Σ_i y_i] / [n Σ_i x_i^2 − (Σ_i x_i)^2]    (7.27)

This result can then be used in conjunction with Equation (7.25) to solve for a_0:

    a_0 = (1/n) (Σ_{i=1}^{n} y_i − a_1 Σ_{i=1}^{n} x_i).    (7.28)

So in matrix form:

    [ n        Σ x_i   ] [ a_0 ]   [ Σ y_i     ]
    [ Σ x_i    Σ x_i^2 ] [ a_1 ] = [ Σ x_i y_i ]    (7.29)

Therefore:

    [ a_0 ]   [ n        Σ x_i   ]⁻¹ [ Σ y_i     ]
    [ a_1 ] = [ Σ x_i    Σ x_i^2 ]   [ Σ x_i y_i ]    (7.30)

Example Consider the data:

xi 1 2 3 4 5 6 7
yi 0.5 2.5 2.0 4.0 3.5 6.0 5.5

To find the least squares line approximation of this data, extend the table and sum the columns,
as below:

    x_i      y_i      x_i^2     x_i y_i
    1        0.5      1         0.5
    2        2.5      4         5.0
    3        2.0      9         6.0
    4        4.0      16        16.0
    5        3.5      25        17.5
    6        6.0      36        36.0
    7        5.5      49        38.5
    Σ = 28   Σ = 24   Σ = 140   Σ = 119.5

    a_1 = [7(119.5) − 28(24)] / [7(140) − 28^2] = 0.8393

and hence:

    a_0 = [24 − 0.8393(28)] / 7 = 0.0714

The least squares linear fit is:

    y = 0.0714 + 0.8393x

Or alternatively in matrix form we have:

    [ a_0 ]   [ 7   28  ]⁻¹ [ 24    ]
    [ a_1 ] = [ 28  140 ]   [ 119.5 ]

Solving gives the following

In [16]: A = [7 28; 28 140];


b = [24;119.5];
ans = A\b;
fprintf('The value for a_0 is: %.4f \n', ans(1))
fprintf('The value for a_1 is: %.4f \n', ans(2))
% Now lets plot and see our results

x = [1 2 3 4 5 6 7];
y = [0.5 2.5 2.0 4.0 3.5 6.0 5.5];
f = @(x) ans(1) + ans(2).*x;
xx = 0:0.1:7;
fx = f(xx);
figure
hold on
grid on
plot(x, y, 'r*')
plot(xx, fx)
title('Our approach using the above equations')
hold off
% Now let us see what the builtin function does
figure
hold on
p = polyfit(x, y, 1);   % fit the original data with the builtin function
yy = polyval(p, xx);
fprintf('The builtin function value for a_0 is: %.4f \n', p(2));
fprintf('The builtin function value for a_1 is: %.4f \n', p(1));
plot(xx, yy)
grid on
plot(x, y, 'r*')
title('Matlab builtin functions in action')
hold off

The value for a_0 is: 0.0714


The value for a_1 is: 0.8393
The builtin function value for a_0 is: 0.0714
The builtin function value for a_1 is: 0.8393

7.2.2 Polynomial Least Squares


The least squares procedure above can be readily extended to fit the data to an mth degree poly-
nomial:

f (x) = Pm (x) = a0 + a1 x + · · · + am xm (7.31)


through the n data points (x_1, y_1), (x_2, y_2), . . . , (x_n, y_n), where m ≤ n − 1.
Then, S takes the form:

    S = Σ_{i=1}^{n} [y_i − f(x_i)]^2    (7.32)

which depends on the m + 1 parameters a_0, a_1, · · · , a_m. We then have m + 1 conditions:

    ∂S/∂a_0 = 0,   ∂S/∂a_1 = 0,   · · · ,   ∂S/∂a_m = 0
which gives a system of m + 1 normal equations:

    a_0 n + a_1 Σ x_i + a_2 Σ x_i^2 + · · · + a_m Σ x_i^m = Σ y_i    (7.33)
    a_0 Σ x_i + a_1 Σ x_i^2 + a_2 Σ x_i^3 + · · · + a_m Σ x_i^{m+1} = Σ x_i y_i    (7.34)
    a_0 Σ x_i^2 + a_1 Σ x_i^3 + a_2 Σ x_i^4 + · · · + a_m Σ x_i^{m+2} = Σ x_i^2 y_i    (7.35)
    ...    (7.36)
    a_0 Σ x_i^m + a_1 Σ x_i^{m+1} + a_2 Σ x_i^{m+2} + · · · + a_m Σ x_i^{2m} = Σ x_i^m y_i    (7.37)

(all sums running over i = 1, . . . , n). These are m + 1 equations in the m + 1 unknowns a_0, a_1, · · · , a_m.


So for a quadratic polynomial fit, m = 2,and the required polynomial is f (x) = a0 + a1 x + a2 x2
obtained from solving the normal equations:

n
X n
X n
X
a0 n + a1 xi + a2 x2i = yi (7.38)
i=1 i=1 i=1
n
X Xn Xn Xn
a0 xi + a1 x2i + a2 x3i = xi yi (7.39)
i=1 i=1 i=1 i=1
Xn Xn Xn Xn
a0 x2i + a1 x3i + a2 x4i = x2i yi (7.40)
i=1 i=1 i=1 i=1

for a0 , a1 , and a2 .
Note: This system is symmetric and can be solved using Gauss elimination.

Exercise Fit a second degree polynomial to the data

xi 0 1 2 3 4
yi 2.1 7.7 13.6 27.2 40.9

In [17]: x = [0 1 2 3 4 5];
y = [2.1 7.7 13.6 27.2 40.9 61.1];
n = length(x);
sumX = sum(x);
sumY = sum(y);
sumX2 = sum(x.^2);
sumX3 = sum(x.^3);
sumX4 = sum(x.^4);
A = [n sumX sumX2; sumX sumX2 sumX3; sumX2 sumX3 sumX4]
b = [sumY; sum(x.*y); sum((x.^2).*y)]
a = round((A\b), 4)'
p = round(fliplr(polyfit(x, y, 2)), 4)
fprintf('Does our approximation give the same as the builtin function? (True=1)/(False=0) Answer: %.0f\n', isequal(a, p))
figure
hold on
xx = 0:0.1:6;
f = @(x) a(1) + a(2).*x + a(3).*x.^2;
fx = f(xx);
plot(x, y, 'r*');
plot(xx, fx);
grid on
hold off

A =

6 15 55
15 55 225
55 225 979

b =

1.0e+03 *

0.1526
0.5856
2.4888

a =

2.4786 2.3593 1.8607

p =

2.4786 2.3593 1.8607

Does our approximation give the same as the builtin function? (True=1)/(False=0) Answer: 1

Remark: As the degree m increases the coefficient matrix becomes extremely ill-conditioned.
It is therefore not recommended to fit least squares polynomials of degree greater than 4 to given
data points.
Also, it would be common practice to use built-in libraries to do these computations instead
of programming it yourself. In addition, any real world scenario would likely involve a massive
number of data points. Gradient descent techniques could also be applied. You may find these
within machine learning courses etc.

7.2.3 Least Squares Exponential Fit
Frequently a theory may suggest a model other than a polynomial fit. A common functional form
for the model is the exponential function:

y = aebx . (7.41)
for some constants a and b. We have from Equation (7.32):

    S = Σ_{i=1}^{n} [y_i − a e^{b x_i}]^2.    (7.42)

When the derivatives of S with respect to a and b are set equal to zero the resulting equations
are:

    ∂S/∂a = −2 Σ_{i=1}^{n} e^{b x_i} [y_i − a e^{b x_i}] = 0    (7.43)
    ∂S/∂b = −2 Σ_{i=1}^{n} a x_i e^{b x_i} [y_i − a e^{b x_i}] = 0    (7.44)

These two equations in two unknowns are nonlinear and generally difficult to solve.
It is sometimes possible to “linearise” the normal equations through a change of variables. If
we take natural logarithm of our equation (7.41) we have:

ln(y) = ln(aebx ) = ln(a) + bx

We introduce the variable Y = ln(y), a0 = ln(a) and a1 = b. Then the linearized equation becomes:

Y (x) = a0 + a1 x, (7.45)
and the ordinary least squares analysis may then be applied to the problem. Once the coeffi-
cients a0 and a1 have been determined, the original coefficients can be computed as a = ea0 and
b = a1 .

Example Fit an exponential function to the following data

xi 1.00 1.25 1.50 1.75 2.00


yi 5.10 5.79 6.53 7.45 8.46

To fit an exponential least squares fit to this data, extend the table as:

    x_i      y_i      Y_i = ln y_i    x_i^2       x_i Y_i
    1.00     5.10     1.629           1.0000      1.629
    1.25     5.79     1.756           1.5625      2.195
    1.50     6.53     1.876           2.2500      2.814
    1.75     7.45     2.008           3.0625      3.514
    2.00     8.46     2.135           4.0000      4.270
    Σ = 7.5  Σ = 33.3 Σ = 9.404       Σ = 11.875  Σ = 14.422

Using the normal equations for linear least squares give:

    a_1 = b = [5(14.422) − 7.5(9.404)] / [5(11.875) − (7.5)^2] = 0.5056

and hence:

    a_0 = ln a = [9.404 − 0.5056(7.5)] / 5 = 1.122,   a = e^{1.122}

The exponential fit is:

    Y = 1.122 + 0.5056x    (7.46)
    ln y = 1.122 + 0.5056x    (7.47)
    y = 3.071 e^{0.5056x}    (7.48)

In [18]: % Now lets check again with the builtin function


x = [1 1.25 1.5 1.75 2.0];
y = [5.1 5.79 6.53 7.45 8.46];
sumX = sum(x);
sumY = sum(y);
p = polyfit(x, log(y), 1);
p(2) = exp(p(2))
f = @(x) p(2)*exp(p(1).*x)

xx = 1:0.1:2.1;
fx = f(xx);
figure
hold on
grid on
plot(x, y, 'r*')
plot(xx, fx)
hold off

p =

0.5057 3.0725

f =

function_handle with value:

@(x)p(2)*exp(p(1).*x)

7.2.4 Exercises
• Find the least squares polynomials of degrees one, two and three for the data, computing
the error S in each case.

x 1.0 1.1 1.3 1.5 1.9 2.1


y 1.84 1.96 2.21 2.45 2.94 3.18

Ans:
y = 0.6209 + 1.2196x, y = 0.5966 + 1.2533x − 0.0109x2 ,
y = −0.01x3 + 0.0353x2 + 1.185x + 0.629

• An experiment is performed to define the relationship between applied stress and the time
to fracture for a stainless steel. Eight different values of stress are applied and the resulting
data is:

Applied stress, x, kg/mm2 5 10 15 20 25 30 35 40


Fracture time, t, h 40 30 25 40 18 20 22 15

Use a linear least squares fit to determine the fracture time for an applied stress of 33 kg/mm2.
(Ans: t = 39.75 − 0.6x, t = 19.95 hours)

• Fit a least squares exponential model to:

x 0.05 0.4 0.8 1.2 1.6 2.0 2.4


y 550 750 1000 1400 2000 2700 3750

(Ans: y = 530.8078e0.8157x )

8 Ordinary Differential Equations (ODEs)
Ordinary differential equations govern a great number of important physical processes and
phenomena. Not all differential equations can be solved using analytic techniques. Consequently,
numerical solutions have become an alternative method of solution, and these have become a very
large area of study.
Importantly, we note the following:

• By itself y 0 = f (x, y) does not determine a unique solution.


• This simply tells us the slope y 0 (x) of the solution function at each point, but not the actual
value y(x) at any point.
• There are an infinite family of functions satisfying an ODE.
• To single out a particular solution, a value y0 of the solution function must be specified at
some point x0 . These are called initial value problems.

8.1 Initial Value Problems


The general first order equation can be written as:

    dy/dx = f(x, y),    (8.1)
with f (x, y) given. Together with this may be given an initial condition, say y(x0 ) = y0 , in
which case (8.1) and this condition form an initial value problem. Its general solution contains a
single arbitrary constant of integration which can be determined from the given initial condition.

8.1.1 Stability of ODEs


Should members of the solution family of an ODE move away from each other over time, then the
equation is said to be unstable. If the family members move closer to one another with time then
the equation is said to be stable. Finally, if the solution curves do not approach or diverge from
one another with time, then the equation is said to be neutrally stable. So small perturbations
to a solution of a stable equation will be damped out with time since the solution curves are
converging. Conversely, an unstable equation would see the perturbation grow with time as the
solution curves diverge.
To give physical meaning to the above, consider a 3D cone. If the cone is stood on its circular
base, then applying a perturbation to the cone will see it return to its original position standing up,
implying a stable position. If the cone is balanced on its tip, then a small perturbation will see
the cone fall over, so the position is unstable. Finally, consider the cone resting on its side: applying
a perturbation will simply roll the cone to some new position, and thus the position is neutrally
stable.

Unstable ODE An example of an unstable ODE is y' = y. Its family of solutions is given by
the curves y(t) = ce^t. From the exponential growth of the solutions we can see that the solution
curves move away from one another as time increases, implying that the equation is unstable. We
can see this in the plot below.

In [19]: y = @(t, c) c.*exp(t);


t = 0:0.1:1;

figure
hold on
grid on
for c = 1:1:5
yt = y(t, c);
plot(t, yt);
xlabel('t');ylabel('y(t)')
end
title("Family of solution curves for ODE y^\prime = y")
hold off

Stable ODE Now consider the equation y 0 = −y. Here the family of solutions is given by y(t) =
ce−t . Since we have exponential decay of the solutions we can see that the equation is stable as
seen in Figure below.

In [20]: y = @(t, c) c.*exp(-t);


t = 0:0.1:1;

figure
hold on
grid on
for c = 1:1:5
yt = y(t, c);
plot(t, yt);
xlabel('t');ylabel('y(t)')
end
title("Family of solution curves for ODE y^\prime = -y")
hold off

Neutrally Stable ODE Finally, consider the ODE y' = a for a given constant a. Here the family
of solutions is given by y(t) = at + c, where c again is any real constant. Thus, in the example
plotted below where a = 1/2, the solutions are parallel straight lines which neither converge nor
diverge. Therefore, the equation is neutrally stable.

In [21]: y = @(t, c) 0.5.*t + c;


t = 0:0.1:5;

figure
hold on
grid on
for c = 1:1:5
yt = y(t, c);
plot(t, yt);
xlabel('t');ylabel('y(t)')
end
title("Family of solution curves for ODE y^\prime = 1/2")
hold off

8.1.2 Euler’s Method


The simplest numerical technique for solving differential equations is Euler's method. It involves
choosing a suitable step size h and an initial value y(x_0) = y_0, which are then used to estimate
y(x_1), y(x_2), · · · by a sequence of values y_i, i = 1, 2, . . . . Here we use the notation x_i = x_0 + ih.
A method of accomplishing this is suggested by the Taylor expansion

    y(x + h) = y(x) + h y'(x) + (1/2!) h^2 y''(x) + (1/3!) h^3 y'''(x) + · · ·

or, in terms of the notation introduced above:

    y_{i+1} = y_i + h y_i' + (1/2!) h^2 y_i'' + (1/3!) h^3 y_i''' + · · ·    (8.2)

By the differential equation (8.1), we have:

    y_i' = f(x_i, y_i)

which when substituted in (8.2) yields:

    y_{i+1} = y_i + h f(x_i, y_i) + (1/2!) h^2 f'(x_i, y_i) + (1/3!) h^3 f''(x_i, y_i) + · · ·    (8.3)

and so if we truncate the Taylor series (8.3) after the term in h, we have the approximate for-
mula:

    y_{i+1} = y_i + h f(x_i, y_i)    (8.4)


This is a difference formula which can be evaluated step by step. This is the formula for Euler’s
(or Euler–Cauchy) method. Thus given (x0 , y0 ) we can calculate (xi , yi ) for i = 1, 2, · · · , n. Since
the new value yi+1 can be calculated from known values of xi and yi , this method is said to be
explicit.

Error in Euler's Method Each time we apply an equation such as (8.4) we introduce two types
of errors:

• Local truncation error, introduced by ignoring the terms in h^2, h^3, · · · in equation (8.2).
  For Euler's method, this error is

    E = (h^2/2!) y''(ξ),   ξ ∈ [x_i, x_{i+1}],

  i.e. E = O(h^2). Thus the local truncation error per step is O(h^2).

• A further error introduced in y_{i+1} because y_i is itself in error. The size of this error will
  depend on the function f(x, y) and the step size h.

The above errors are introduced at each step of the calculation.

Example Apply the Euler’s method to solve the simple equation:

    dy/dx = x + y,   y(0) = 1
(Exercise: Solve the equation analytically and show that the analytic solution is y = 2ex − x − 1.)
Solution:
Here f (xi , yi ) = xi + yi . With h = 0.1, and y0 = 1 we compute y1 as:

y1 = y0 + hf (x0 , y0 ) = 1 + 0.1(0 + 1) = 1.1

The numerical results of approximate solutions at subsequent points x1 = 0.2, . . . can be computed
in a similar way, rounded to 3 decimal, to obtain places.

    x      y       y' = f(x, y)    y'h
0 1.000 1.000 0.100
0.1 1.100 1.200 0.120
0.2 1.220 1.420 0.142
0.3 1.362 1.662 0.166
0.4 1.528 1.928 0.193

The analytical solution at x = 0.4 is 1.584. The numerical value is 1.528 and hence the error is
about 3.5%. The accuracy of the Euler’s method can be improved by using a smaller step size h.
Another alternative is to use a more accurate algorithm.

In [22]: y0 = 1;
x0 = 0;
h = 0.1;
f = @(x, y) x + y;
yi = @(y, h, f) y + h*f;
yy = zeros(1, 4);
for i = 1:4
ff = f(x0, y0);
yy(1, i) = yi(y0, h, ff);
y0 = yy(1, i);
x0 = x0 + h;
end
yy = [1, yy];
fprintf('So our computed values are: \n')
fprintf(' %.3f\n', yy)

xx = 0:0.1:0.4;
tf = @(x) 2.*exp(x) - x - 1;
ty = tf(xx);
figure
hold on
grid on
plot(xx, ty)
plot(xx, yy, 'r*')
plot(xx, yy)
title("Euler's Method Vs Analytical Solution");
xlabel('x')
ylabel('y')
legend('Analytical','Euler')

So our computed values are:


1.000
1.100
1.220
1.362
1.528

8.1.3 Modified Euler’s Method
A fundamental source of error in Euler’s method is that the derivative at the beginning of the
interval is assumed to apply across the entire subinterval.
There are two ways we can modify the Euler method to produce better results. One method is
due to Heun (Heun’s method) and is well documented in numerical text books. The other method
we consider here is called the improved polygon (or modified Euler) method.
The modified Euler technique uses Euler’s method to predict the value of y at the midpoint of
the interval [xi , xi+1 ]:

    y_{i+1/2} = y_i + f(x_i, y_i) (h/2).    (8.5)

Then this predicted value is used to estimate a slope at the midpoint:

    y'_{i+1/2} = f(x_{i+1/2}, y_{i+1/2}),    (8.6)

which is assumed to represent a valid approximation of the average slope for the entire subin-
terval. This slope is then used to extrapolate linearly from xi to xi+1 using Euler’s method to
obtain:

    y_{i+1} = y_i + f(x_{i+1/2}, y_{i+1/2}) h    (8.7)

For the modified Euler method, the truncation error can be shown to be:

    E = −(h^3/12) y'''(ξ),   ξ ∈ [x_i, x_{i+1}]    (8.8)

Example Solve
    dy/dx = x + y,   y(0) = 1,   h = 0.1
using the modified Euler’s method described above.
Solution:

    x_i    y_i       y_{i+1/2}    y'_{i+1/2}    y'_{i+1/2} h
    0      1.000     1.050        1.100         0.110
    0.1    1.110     1.1705       1.3205        0.13205
    0.2    1.24205   1.31415      1.56415       0.15642
    0.3    1.39847   1.48339      1.83339       0.18334
    0.4    1.58180

The numerical solution at x = 0.4 is now 1.5818, which is much more accurate than the result obtained using Euler’s method. In this case the error is about 0.14%.

In [23]: y0 = 1;                        % initial condition y(0) = 1
x0 = 0;
h = 0.1;                                % step size
f = @(x, y) x + y;                      % right-hand side f(x, y)
f2 = @(x, y, h) y + (x + y)*(h/2);      % predicted value at the midpoint, equation (8.5)
yi = @(y, h, f) y + h*f;                % full step using the supplied slope, equation (8.7)
yy = zeros(1, 4);
for i = 1:4
    fff = f2(x0, y0, h);                % predicted midpoint value y_{i+1/2}
    ff = f((x0+(h/2)), fff);            % slope at the midpoint, equation (8.6)
    yy(1, i) = yi(y0, h, ff);           % advance the solution by one step
    y0 = yy(1, i);
    x0 = x0 + h;
end
yy = [1, yy];                           % prepend the initial value
fprintf('So our computed values are: \n')
fprintf(' %.3f\n', yy)

xx = 0:0.1:0.4;
tf = @(x) 2.*exp(x) - x - 1;
ty = tf(xx);
figure
hold on
grid on

plot(xx, ty)
plot(xx, yy, 'r*')
plot(xx, yy)
title("Modified Euler's Method Vs Analytical Solution");
xlabel('x')
ylabel('y')
legend('Analytical','Modified Euler')

So our computed values are:


1.000
1.110
1.242
1.398
1.582

8.1.4 Runge-Kutta Methods


Runge and Kutta were German mathematicians who proposed a family of methods for the numerical solution of ODEs.

The general form of the Runge–Kutta method is:

yi+1 = yi + hφ(xi , yi ; h), (8.9)


where φ(xi , yi ; h) is called the increment function.
In Euler’s method, φ(x_i, y_i; h) = f(x_i, y_i) = y_i', i.e. we are using the slope at the point x_i to extrapolate from y_i and obtain y_{i+1}. In the modified Euler’s method:

φ(x_i, y_i; h) = f(x_{i+1/2}, y_{i+1/2}) = y'_{i+1/2}.

The increment function can be written in a general form as:

φ = w_1 k_1 + w_2 k_2 + · · · + w_n k_n,   (8.10)

where the w’s are constant weights and the k’s are estimates of the slope obtained by evaluating f at points within the interval [x_i, x_{i+1}].

Second Order Runge-Kutta Method The second order R-K method has the form:

y_{i+1} = y_i + (w_1 k_1 + w_2 k_2),   (8.11)

where

k_1 = h f(x_i, y_i)   (8.12)
k_2 = h f(x_i + \frac{h}{2}, y_i + \frac{k_1}{2}),   (8.13)

and the weights satisfy w_1 + w_2 = 1. If w_1 = 1, then w_2 = 0 and we have Euler’s method. If w_2 = 1, then w_1 = 0 and we have the improved polygon (modified Euler) method:

y_{i+1} = y_i + k_2   (8.14)
        = y_i + h f(x_i + \frac{h}{2}, y_i + \frac{k_1}{2}).   (8.15)

If w_1 = w_2 = \frac{1}{2}, with k_2 now evaluated at the end of the interval rather than at the midpoint, we have:

y_{i+1} = y_i + \frac{1}{2}(k_1 + k_2),   (8.16)
k_1 = h f(x_i, y_i)   (8.17)
k_2 = h f(x_i + h, y_i + k_1),   (8.18)

which is called Heun’s method.
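Below is a minimal sketch of Heun's method (8.16)–(8.18), applied to the same test problem y' = x + y, y(0) = 1 used earlier (the choice of four steps and h = 0.1 is illustrative):

% Heun's method (second order R-K with w1 = w2 = 1/2) for y' = x + y, y(0) = 1
f = @(x, y) x + y;
h = 0.1;  x = 0;  y = 1;
for i = 1:4
    k1 = h*f(x, y);                 % k1: h times the slope at the start of the interval
    k2 = h*f(x + h, y + k1);        % k2: h times the slope at the predicted end point
    y = y + 0.5*(k1 + k2);          % Heun update, equation (8.16)
    x = x + h;
    fprintf('y(%.1f) = %.4f\n', x, y)
end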

Fourth Order Runge-Kutta Method The classical fourth order R–K method has the form:

y_{i+1} = y_i + \frac{1}{6}(k_1 + 2k_2 + 2k_3 + k_4),   (8.19)

where

k_1 = h f(x_i, y_i)   (8.20)
k_2 = h f(x_i + \frac{h}{2}, y_i + \frac{k_1}{2})   (8.21)
k_3 = h f(x_i + \frac{h}{2}, y_i + \frac{k_2}{2})   (8.22)
k_4 = h f(x_i + h, y_i + k_3).   (8.23)

This is the most popular R–K method. Its local truncation error per step is O(h^5), which gives a global error of O(h^4).

Example Solve the DE y' = x + y, y(0) = 1 using the 4th order Runge–Kutta method. Compare your results with those obtained from Euler’s method, modified Euler’s method and the actual value. Determine y(0.1) and y(0.2) only.
The solution using Runge-Kutta is obtained as follows:
For y_1:

k_1 = 0.1(0 + 1) = 0.1   (8.24)
k_2 = 0.1((0 + \frac{0.1}{2}) + (1 + \frac{0.1}{2})) = 0.11   (8.25)
k_3 = 0.1((0 + \frac{0.1}{2}) + (1 + \frac{0.11}{2})) = 0.1105   (8.26)
k_4 = 0.1((0 + 0.1) + (1 + 0.1105)) = 0.1211   (8.27)

and therefore:

y_1 = y_0 + \frac{1}{6}(0.1 + 2(0.11) + 2(0.1105) + 0.1211) = 1.1103

A similar computation yields

y(0.2) = y_2 = 1.1103 + \frac{1}{6}(0.1210 + 2(0.1321) + 2(0.1326) + 0.1443) = 1.2428
A table for all the approximate solutions using the required methods is:

x      Euler        Modified Euler    4th order RK    Actual value
0.1    1.1000000    1.1100000         1.1103417       1.1103418
0.2    1.2200000    1.2420500         1.2428052       1.2428055

In [24]: y0 = 1;                        % initial condition y(0) = 1
x0 = 0;
h = 0.1;                                % step size
k1 = @(x, y, h) h*(x + y);              % k1 = h*f(x, y) with f(x, y) = x + y
k2 = @(x, y, h, k1) h*((x + (h/2)) + (y + (k1/2)));   % k2 = h*f(x + h/2, y + k1/2)
k3 = @(x, y, h, k2) h*((x + (h/2)) + (y + (k2/2)));   % k3 = h*f(x + h/2, y + k2/2)
k4 = @(x, y, h, k3) h*((x + h) + (y + k3));           % k4 = h*f(x + h, y + k3)
yi = @(y, k1, k2, k3, k4) y + (1/6)*(k1 + 2*k2 + 2*k3 + k4);   % RK4 update, equation (8.19)
yy = zeros(1, 4);
for i = 1:4
    kw1 = k1(x0, y0, h);
    kw2 = k2(x0, y0, h, kw1);
    kw3 = k3(x0, y0, h, kw2);
    kw4 = k4(x0, y0, h, kw3);
    yy(1, i) = yi(y0, kw1, kw2, kw3, kw4);   % advance the solution by one step
    y0 = yy(1, i);
    x0 = x0 + h;
end
yy = [1, yy];                           % prepend the initial value
fprintf('So our computed values are: \n')
fprintf(' %.4f\n', yy)

xx = 0:0.1:0.4;
tf = @(x) 2.*exp(x) - x - 1;
ty = tf(xx);
figure
hold on
grid on
plot(xx, ty)
plot(xx, yy, 'r*')
plot(xx, yy)
title("Runge-Kutta 4 Method Vs Analytical Solution");
xlabel('x')
ylabel('y')
legend('Analytical','Runge-Kutta 4')

So our computed values are:


1.0000
1.1103
1.2428
1.3997
1.5836

8.2 Systems of First Order ODEs
An nth order system of first order initial value problems can be expressed in the form:

\frac{dy_1}{dx} = f_1(x, y_1, y_2, · · · , y_n), \qquad y_1(x_0) = α_1   (8.28)
\frac{dy_2}{dx} = f_2(x, y_1, y_2, · · · , y_n), \qquad y_2(x_0) = α_2   (8.29)
  ⋮   (8.30)
\frac{dy_n}{dx} = f_n(x, y_1, y_2, · · · , y_n), \qquad y_n(x_0) = α_n,   (8.31)

for x_0 ≤ x ≤ x_n.
The methods we have seen so far were for a single first order equation, in which we sought the solution y(x). Methods for solving first order systems of IVPs are simple generalizations of the methods for a single equation, bearing in mind that we now seek n solutions y_1, y_2, . . . , y_n, each with an initial condition y_k(x_0), k = 1, . . . , n, approximated at the points x_i, i = 1, 2, . . ..

8.2.1 R-K Method for Systems


Consider the system of two equations:

\frac{dy}{dx} = f(x, y, z), \qquad y(0) = y_0   (8.32)
\frac{dz}{dx} = g(x, y, z), \qquad z(0) = z_0.   (8.33)

Let y = y_1, z = y_2, f = f_1, and g = f_2. The fourth order R–K method would be applied as follows. For each j = 1, 2, corresponding to the solutions y_{j,i}, compute

k_{1,j} = h f_j(x_i, y_{1,i}, y_{2,i}), \qquad j = 1, 2   (8.34)
k_{2,j} = h f_j(x_i + \frac{h}{2}, y_{1,i} + \frac{k_{1,1}}{2}, y_{2,i} + \frac{k_{1,2}}{2})   (8.35)
k_{3,j} = h f_j(x_i + \frac{h}{2}, y_{1,i} + \frac{k_{2,1}}{2}, y_{2,i} + \frac{k_{2,2}}{2})   (8.36)
k_{4,j} = h f_j(x_i + h, y_{1,i} + k_{3,1}, y_{2,i} + k_{3,2}),   (8.37)

and:

y_{i+1} = y_{1,i+1} = y_{1,i} + \frac{1}{6}(k_{1,1} + 2k_{2,1} + 2k_{3,1} + k_{4,1})   (8.38)
z_{i+1} = y_{2,i+1} = y_{2,i} + \frac{1}{6}(k_{1,2} + 2k_{2,2} + 2k_{3,2} + k_{4,2}).   (8.39)
Note that we must calculate k1,1 , k1,2 , k2,1 , k2,2 , k3,1 , k3,2 , k4,1 , k4,2 in that order.
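The following is a minimal sketch of the fourth order R–K scheme above for a system of two equations; the particular system (y' = z, z' = −y with y(0) = 1, z(0) = 0) and the number of steps are purely illustrative:

% RK4 for the illustrative system dy/dx = z, dz/dx = -y, y(0) = 1, z(0) = 0
f1 = @(x, y, z) z;                  % dy/dx = f1(x, y, z)
f2 = @(x, y, z) -y;                 % dz/dx = f2(x, y, z)
h = 0.1;  x = 0;  y = 1;  z = 0;
for i = 1:4
    k11 = h*f1(x, y, z);                         k12 = h*f2(x, y, z);
    k21 = h*f1(x + h/2, y + k11/2, z + k12/2);   k22 = h*f2(x + h/2, y + k11/2, z + k12/2);
    k31 = h*f1(x + h/2, y + k21/2, z + k22/2);   k32 = h*f2(x + h/2, y + k21/2, z + k22/2);
    k41 = h*f1(x + h, y + k31, z + k32);         k42 = h*f2(x + h, y + k31, z + k32);
    y = y + (k11 + 2*k21 + 2*k31 + k41)/6;       % update y, equation (8.38)
    z = z + (k12 + 2*k22 + 2*k32 + k42)/6;       % update z, equation (8.39)
    x = x + h;
    fprintf('x = %.1f : y = %.5f, z = %.5f\n', x, y, z)
end

Note that, exactly as required above, both k_{1,j} values are computed before either k_{2,j}, and so on.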

8.3 Converting an nth Order ODE to a System of First Order ODEs


Consider the general second order initial value problem
y'' + a y' + b y = 0, \qquad y(0) = α_1, \quad y'(0) = α_2.

If we let

z = y', \qquad z' = y'',

then the original ODE can be written as the system

y' = z, \qquad y(0) = α_1   (8.40)
z' = −a z − b y, \qquad z(0) = α_2.   (8.41)

Once transformed into a system of first order ODEs, the methods for systems of equations apply.
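In code, the conversion amounts to nothing more than defining the two right-hand sides of (8.40)–(8.41); a minimal sketch (the coefficient values a and b below are purely illustrative):

% y'' + a*y' + b*y = 0 rewritten as the first order system y' = z, z' = -a*z - b*y
a = 1;  b = 2;                      % illustrative coefficients
f1 = @(x, y, z) z;                  % dy/dx
f2 = @(x, y, z) -a*z - b*y;         % dz/dx
% f1 and f2 can now be passed to any solver for systems, e.g. the fourth order
% R-K scheme of Section 8.2.1, with initial values y(0) = alpha1 and z(0) = alpha2.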

8.3.1 Exercise
Solve the second order differential equation:

y'' + 3x y' + 2x^2 y = 0, \qquad y(0) = 3, \quad y'(0) = 1,

using (i) the second order R–K method and (ii) the 4th order R–K method. Use h = 0.1. Do only two steps. Let z(x) = y'(x); we then have the system

y' = z, \qquad y(0) = 3
z' = −3x z − 2x^2 y, \qquad z(0) = 1.

8.4 Exercises
Use (i) Euler’s method and (ii) the modified Euler formula to solve the following IVPs:

• y' = sin(x + y), y(0) = 0

• y' = y x^2 − y, y(0) = 1

for h = 0.2 and h = 0.1.

• Determine y(0.4) for each of the above IVP.

• Use Richardson’s extrapolation to get improved approximations to the solutions at x = 0.4

• If f is a function of x only, show that the fourth-order Runge-Kutta formula, applied to the differential equation dy/dx = f(x), is equivalent to the use of Simpson’s rule (over one interval) for evaluating \int_0^x f(x)\,dx.

• Use fourth order Runge–Kutta method to solve the following IVPs:

– y' = 2xy, y(0) = 1
– y' = 1 + y^2, y(0) = 0,

Use h = 0.2 and determine the solutions at x = 0.4.

• Solve the following systems of IVPs:

– y' = yz, z' = xz, y(0) = 1, z(0) = −1

– y' = x − z^2, z' = x + y, y(0) = 1, z(0) = 2,

using (i) Euler’s method and (ii) the second order Runge-Kutta method with h = 0.1. Compute y and z at x = 0.2.

