
Lecture 1: Errors and Efficiency


Why numerical methods?
The solutions to most mathematical problems in the natural world cannot be determined
analytically. If you are asked to compute the perimeter of a circle with radius r, you know
that it is given by
C = 2πr .  (1)
This is because the perimeter can be computed analytically, since we can use calculus to
integrate along the perimeter of a circle to yield
C = ∮ ds = ∫_0^{2π} r dθ = r θ |_0^{2π} = 2πr .  (2)
Now let's say you are asked to compute the perimeter of an ellipse with major and minor
axes of a and b, respectively. Using the same technique (albeit more involved), you obtain
C = 4 ∫_0^{π/2} √(a^2 sin^2 θ + b^2 cos^2 θ) dθ = ?  (3)
This integral is known as an elliptic integral of the second kind, and has no known closed-
form solution. The only way, therefore, to obtain the perimeter of an ellipse is to compute it
numerically. Since there is no closed-form solution, you can only say that the perimeter of
the ellipse is given by the function C(a, b), which you could call the ellipse function. If
your calculator had the ellipse function on it, it would compute it numerically, just as it
computes the sine and cosine functions for you. There is no closed-form solution to the
sine function either. Your calculator knows what the solution is because it can approximate the
sine function using the first three terms of the Taylor series approximation
sin θ ≈ θ - θ^3/3! + θ^5/5! .  (4)
Certainly, problems in the natural world can be much larger and more complex than
the ellipse problem. If you are asked to calculate the drag on an automobile moving at
100 km/hr, you could calculate it exactly, but you would need roughly 20,000 years to do so
on the fastest computers available today. Clearly you could not even conceive of solving this
problem without numerical methods.
Types of errors
Referring back to the ellipse problem, if you did decide to solve it numerically, evidently
there would be errors in the way you computed it. Since a computer can only approximate
the solution, the degree of accuracy is dependent on several factors.
Truncation error
Every problem can be solved in many different ways. The error associated with the numerical
method you choose to solve the problem is termed the truncation error. In the Taylor series
approximation of the sine function, the exact solution is given by
sin θ = ∑_{n=0}^{∞} (-1)^n θ^{2n+1} / (2n+1)! .  (5)
Clearly, it is impossible to compute an infinite number of terms, so in this case, the truncation
error is given by the value of the terms you choose not to include in the calculation. Using
only three terms, the truncation error appears in the approximation of the sine function as
sin θ = θ - θ^3/3! + θ^5/5! + ∑_{n=3}^{∞} (-1)^n θ^{2n+1} / (2n+1)! .  (6)
In this case,
Truncation error = ∑_{n=3}^{∞} (-1)^n θ^{2n+1} / (2n+1)! .  (7)
As the number of terms increases, the magnitude of the truncation error decreases. No matter
how many terms are computed, however, the result will always be an approximation.
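As a concrete illustration (not part of the original handout), the following Python sketch compares the three-term approximation (4) with the value returned by math.sin, so the truncation error in (7) can be seen directly; the function name sin3 is arbitrary.

    import math

    def sin3(theta):
        # Three-term Taylor approximation of sin(theta), as in equation (4)
        return theta - theta**3/math.factorial(3) + theta**5/math.factorial(5)

    for theta in [0.1, 0.5, 1.0, 2.0]:
        approx = sin3(theta)
        exact = math.sin(theta)
        # The last column is the magnitude of the truncation error
        print(theta, approx, exact, abs(exact - approx))

The error grows quickly with θ, which is exactly the behavior described by the omitted terms in (7).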
Error resulting from problem assumptions
More often than not you are forced to solve a particular problem under assumptions that
make your answer only an approximation to the real problem. For example, consider the problem of
sound propagation in a concert hall. For practical purposes you use the wave equation to
determine the acoustical properties of the concert stadium and you assume that the speed
of sound is constant everywhere in the room. No matter how accurately you solve the
wave equation, you will be stuck with the error associated with this approximation. This is
independent of the numerical algorithm or the computer you use to solve the problem.
Error in original data
The original data you are given can also impose limits on the accuracy of the final solution
to your problem. If you are asked to predict the weather, no matter how accurate your
algorithm or equations are, if you don't know the present conditions exactly, you won't be
able to have an exact solution for what will happen to the weather tomorrow.
Propagated error
Whether or not the original conditions are exact, any error that develops in the solution will
propagate through to other parts of the computation and manifest itself in the final solution.
For example, suppose you have developed an algorithm that solves Fermat's last theorem in
five steps. If steps 2-4 are solved nearly exactly but depend on the solution of step 1, and
step 1 is only approximate, then the error associated with step 1 will manifest itself in the
final solution regardless of how accurately steps 2-4 are solved. This type of error shows up a
lot in time-dependent solutions that are impulsively started from rest. A great deal of error
results initially. But even if the solver becomes more accurate as time progresses, the error
associated with the initial conditions will propagate throughout the entire solution.
Human error and bugs
This form of error in a numerical method is what causes the last 1% of a particular numerical
project to take 99% of the time. Bugs arise during code development that are hidden from the
programmer, who may think the error comes from the numerical method itself, when in fact
a simple mistake was made while deriving or coding the method. These types of errors
result even in the largest of projects that take millions of human- and computer-hours to
complete. For example, the American National Aeronautics and Space Administration lost
one of its Martian surveyors due to a misunderstanding of the units used to quantify the
thruster forces!
Absolute vs. Relative error
Error is meaningless unless it is compared to the actual quantity that is being computed.
This is true not only for numerical methods, but for all approximations in general. For
example, if you are asked to estimate what the annual operating budget of your company is,
but you can only estimate it to within R 100,000, then you are better off if your company is
a billion rand-per-year corporation than if it is a million rand-per-year corporation, because
you will only be 0.01% off! If you work for a million rand-per-year company then you will
be 10% off, and certainly your boss will not be very happy. In this case the absolute error is
absolute error = |true value - approximate value| = R 100,000 ,  (8)
while the relative error is
relative error = absolute error / |true value| .  (9)
The absolute error of R 100,000 is meaningless unless it is given as a ratio to the true value.
In numerical methods, the relative error is used to determine whether or not a particular
solution is solved to within some predetermined level of accuracy.
Significant digits
When describing the accuracy of a numerical result, often it is useful to quantify it in terms of
the number of significant digits in which it agrees with the true value. We usually write integers
and rational numbers as their irrational counterparts with the last digit repeating in order
to determine the number of significant digits. In the following two examples, 1.0 = 0.9999....
The last digit is only significant if it differs from the exact value by less than 5; that is, if the
last digit is within 4 of the exact value, then it is significant.
For example,
if true value = 0.99999999
and approximate value = 0.9998,
then significant digits = 4
since |9 - 8| < 5 .
If the last digit differs by 5 or more, then it is not significant. For example,
if true value = 0.99999999
and approximate value = 0.9994,
then significant digits = 3
since |9 - 4| ≥ 5 .
Computer round-off error
A computer can only store a discrete set of numbers on the real number line. These numbers
are referred to as floating-point numbers. Every floating-point number is stored in a
particular base system. Most of us think in base 10, and calculators usually work in base 10,
but computers usually work in base 2 or base 16. Given its base, a floating-point number
has three parts, namely, the sign, the fraction, and the exponent. Some examples of floating-
point numbers in base 10 and in base 2 are given below.

Decimal number   FP notation        Sign   Fraction   Exponent (base 10)
22.45            .2245 × 10^2       +      2245       2
-0.0227          -.227 × 10^{-1}    -      227        -1

Decimal number   Binary equiv.   FP notation       Sign   Fraction   Exponent (base 2)
16.25            10000.01        .1000001 × 2^5    +      1000001    5
-.078125         -0.000101       -.101 × 2^{-3}    -      101        -3
A computer's precision is a function of how much space it has available to store the sign,
the fraction, and the exponent for each number. Let's say a computer that employs base-10
arithmetic can store the sign, two digits for the fraction, and an exponent that must be one of
-2, -1, 0, 1, 2. The number must then have the form
X = ±.d_1 d_2 × 10^p ,   p ∈ {-2, -1, 0, 1, 2} ,   d_1, d_2 ∈ {0, . . . , 9} .  (10)
With this number system we are limited to -99 ≤ X ≤ -10^{-3} and 10^{-3} ≤ X ≤ 99. Clearly,
numbers that do not fall in this range cannot exist on this computer. Those that are too
large in absolute value are termed either positive overflow or negative overflow, while
those too small in absolute value are termed either positive underflow or negative
underflow. Because of the limited number system, severe round-off error would occur if
one attempted to perform calculations using this number system. For example, the number
.02154 would be represented as .021.
The IEEE number system is the most common number system used on personal com-
puters, and obviously it has much more versatility than the X number system above. Single
precision IEEE floating point numbers use a total of 32 bits to store a number (including
the sign, fraction, and exponent), while double precision IEEE floating point numbers use
64 bits. The smallest positive normalized double precision number is 2.225E-308, and the
largest is approximately 1.798E308. Even so, because of the finite set of possible numbers,
round-off error still occurs, and sometimes it can even affect the outcome of a numerical result
significantly.
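A quick way to see round-off error on a machine with IEEE double precision arithmetic is the short Python sketch below (it assumes only the standard library; the exact digits printed depend on the platform).

    import sys

    # Double precision cannot represent 0.1 or 0.2 exactly, so their sum
    # differs from 0.3 by a small round-off error.
    print(0.1 + 0.2 == 0.3)          # False
    print(0.1 + 0.2 - 0.3)           # a small number on the order of 1e-17

    print(sys.float_info.epsilon)    # machine epsilon, about 2.22e-16
    print(sys.float_info.max)        # largest double, about 1.80e308
    print(sys.float_info.min)        # smallest normalized double, about 2.23e-308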
How good is a numerical method?
Number of operations
If two numerical methods are used to achieve the same result, the numerical method that uses
the fewest operations is more efficient. Likewise, the numerical method in which
the number of operations grows more slowly with the problem size is also more efficient.
Big O notation is used to quantify this behavior of numerical algorithms. For example,
consider an algorithm that computes the sum of N numbers, represented by
A_1 = ∑_{i=1}^{N} i .  (11)
This requires a total of N - 1 operations. If N doubles, then the method effectively takes
twice as many operations. In Big O notation, this is referred to as requiring O(N) operations.
Now consider another algorithm that computes the sum of all of the possible distinct products
of a list of non-repeating numbers. Since there are a total of
N_p = N + (N - 1) + (N - 2) + ... + 2 + 1  (12)
possible products, then
N_p = ∑_{i=1}^{N} i = N(N + 1)/2 .  (13)
There are then a total of N_p multiplications required, and then the sum takes another N_p - 1
operations. Therefore, this algorithm takes a total of
2 N_p - 1 = N(N + 1) - 1 = N^2 + N - 1  (14)
operations to complete. If the total number of entries N doubles, then for large N the total
number of calculations effectively quadruples. As N becomes very large, then, the method
is an O(N^2) method.
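To make the O(N) versus O(N^2) behavior concrete, here is a small Python sketch (my own illustration, not from the handout) that evaluates the operation counts given above for a few problem sizes.

    def count_sum_ops(N):
        # Summing N numbers: N - 1 additions, an O(N) algorithm.
        return N - 1

    def count_product_sum_ops(N):
        # N_p = N(N+1)/2 products, plus N_p - 1 additions: an O(N^2) algorithm.
        Np = N * (N + 1) // 2
        return 2 * Np - 1

    for N in [10, 20, 40]:
        print(N, count_sum_ops(N), count_product_sum_ops(N))
    # Doubling N roughly doubles the first count and quadruples the second.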
Speed
A given algorithm will require a given number of floating point operations. However,
different people can implement the same algorithm in different ways on a computer. Likewise,
different computers can perform differently with the same algorithm. Therefore, coupled
with the total number of operations, it is desirable to compare the total time a certain
algorithm takes to perform a certain number of operations. A good measure of an algorithm
when it is implemented on a particular computer is its performance in FLOPS, or floating
point operations per second. The FLOPS count of an algorithm is given by
FLOPS = (total number of operations) / (time to compute in seconds) .  (15)
The FLOPS performance of an algorithm for large-scale high-performance numerical
computations is usually referred to in MegaFLOPS (10^6 FLOPS), GigaFLOPS (10^9 FLOPS), or
TeraFLOPS (10^12 FLOPS).
Lecture 2: Solution of nonlinear equations I
The model nonlinear equation
In this lecture we will be using numerical methods to determine the roots of the nonlinear
equation f(x) = 0. As the model equation, we will compute the roots of
f(x) = x^2 + x - 6 ,  (1)
whose shape is shown in Figure 1 and which has roots at x = -3 and x = +2. We will use the
fact that the roots of f(x) are known exactly and compare them to the approximate values
computed with each numerical method that follows.
Figure 1: Graph of x^2 + x - 6.
Bisection method
The bisection method starts with two guesses x_1 and x_2 such that f(x_1) and f(x_2) lie on
either side of the root, so that f(x_1) f(x_2) < 0. A guess is made by bisecting the interval
between the two guesses to yield a new guess of x_3 = (x_1 + x_2)/2. The procedure repeats by
using x_3 and whichever of the previous two guesses brackets the root with x_3.
To find the root in between x_1 = -4 and x_2 = -1, we follow the steps outlined in the
following pseudocode:
Pseudocode for bisection
Start with two guesses x_1 and x_2 such that f(x_1) f(x_2) < 0.
1. Make a guess for the root with x_3 = (x_1 + x_2)/2.
2. If f(x_1) f(x_3) < 0, then set x_2 = x_3.
   Otherwise set x_1 = x_3.
3. Repeat until |x_1 - x_2| < 2ε.
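A direct Python translation of this pseudocode might look like the sketch below (the function and variable names are mine, not part of the handout).

    def bisection(f, x1, x2, eps=1e-4):
        # Assumes f(x1) and f(x2) bracket a root, i.e. f(x1)*f(x2) < 0.
        if f(x1) * f(x2) >= 0:
            raise ValueError("initial guesses do not bracket a root")
        while abs(x1 - x2) >= 2 * eps:
            x3 = 0.5 * (x1 + x2)           # bisect the interval
            if f(x1) * f(x3) < 0:
                x2 = x3                    # root lies in [x1, x3]
            else:
                x1 = x3                    # root lies in [x3, x2]
        return 0.5 * (x1 + x2)

    print(bisection(lambda x: x**2 + x - 6, -4.0, -1.0))   # approximately -3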
Using this technique we obtain the following history of the solution process with ε = 10^{-4}:
Iteration      x_1        x_2        x_3        f(x_3)    |x_1 - x_2|/2
1 -4.0000 -1.0000 -2.5000 -2.2500 1.5000
2 -4.0000 -2.5000 -3.2500 1.3125 0.7500
3 -3.2500 -2.5000 -2.8750 -0.6094 0.3750
4 -3.2500 -2.8750 -3.0625 0.3164 0.1875
5 -3.0625 -2.8750 -2.9688 -0.1553 0.0938
6 -3.0625 -2.9688 -3.0156 0.0784 0.0469
7 -3.0156 -2.9688 -2.9922 -0.0390 0.0234
8 -3.0156 -2.9922 -3.0039 0.0195 0.0117
9 -3.0039 -2.9922 -2.9980 -0.0098 0.0059
10 -3.0039 -2.9980 -3.0010 0.0049 0.0029
11 -3.0010 -2.9980 -2.9995 -0.0024 0.0015
12 -3.0010 -2.9995 -3.0002 0.0012 0.0007
13 -3.0002 -2.9995 -2.9999 -0.0006 0.0004
14 -3.0002 -2.9999 -3.0001 0.0003 0.0002
The advantage of the bisection method is that we know exactly how accurate our solution
will be given the number of iterations we use. Each guess halves the uncertainty associated
with the previous one, as shown in the error history in Figure 2. The uncertainty of the
initial guess will be that the root is given by
ROOT_{first guess} = x_3 ± |x_1 - x_2| / 2 .  (2)
The next guess will yield an error that is half that much, and so on, so that the n-th guess
will yield a root given by
ROOT_{nth guess} = x_3 ± |x_1 - x_2| / 2^n .  (3)
The other advantage of the bisection method is that it is guaranteed to find a root as
long as the first two guesses bracket one. Even if the function is multivalued between the
first two guesses, a root will be found. But which root is found depends on the initial values
of x_1 and x_2. For example, consider the function
f(x) = x^3 - 5x^2 + 3x + 1 ,  (4)
whose roots are given by 4.2361, 1.0000, and -0.2361. If the bisection method is used
with x_1 = -1 and x_2 = 6, as shown in Figure 3, the result will be x_3 = 4.2361 because
f((x_1 + x_2)/2) = f(2.5) < 0. If bisection is used with x_1 = -6 and x_2 = 6, then the
result will be x_3 = -0.2361. The middle root at x_3 = 1.0 can only be found by choosing
-0.2361 < x_1 < 1.0 and 1.0 < x_2 < 4.2361.
Figure 2: Error history of the bisection method.
Figure 3: Graph of x^3 - 5x^2 + 3x + 1, showing the roots at x = -0.2361, 1.0, and 4.2361.
The only drawback to the bisection method is that the rate of convergence is rather slow.
The following methods will yield faster convergence but are not as easy to implement nor
are they necessarily as robust.
Newton's method
Given a function f(x) = x^2 + x - 6, if we would like to find a root then we can use the Taylor
series expansion of the function about some initial guess to help us find the root. If we start
at x_1, then we can obtain information about the function in the vicinity of that point, say
at x_2 = x_1 + Δx, with the Taylor series about x_1,
f(x_2) = f(x_1 + Δx) = f(x_1) + Δx (df/dx)|_{x=x_1} + HOT ,  (5)
f(x_2) ≈ f(x_1) + (x_2 - x_1) (df/dx)|_{x=x_1} .  (6)
If we are looking for the root f(x_2) = 0, then we can use the Taylor series to approximate
what value of x_2 will yield f(x_2) = 0, so that we can solve for what x_2 needs to be in order
to approximate the root,
f(x_2) ≈ f(x_1) + (x_2 - x_1) (df/dx)|_{x=x_1} ,  (7)
0 = f(x_1) + (x_2 - x_1) (df/dx)|_{x=x_1} ,  (8)
x_2 = x_1 - f(x_1)/f'(x_1) .  (9)
Because this will yield only an approximation to the root, we need to continue this procedure
until |f(x_2)| < ε or |x_2 - x_1| < ε. The following is the pseudocode for Newton's method:
Pseudocode for Newton's method
1. Choose a starting value x_1.
2. Shoot for the root with x_2 = x_1 - f(x_1)/f'(x_1).
3. If |x_2 - x_1| < ε or |f(x_2)| < ε then done!
4. Otherwise set x_1 = x_2 and return to step 2.
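A minimal Python sketch of this pseudocode is given below; it assumes the derivative is supplied analytically, and the names are illustrative only.

    def newton(f, fprime, x1, eps=1e-5, maxit=50):
        for _ in range(maxit):
            x2 = x1 - f(x1) / fprime(x1)      # shoot for the root, equation (9)
            if abs(x2 - x1) < eps or abs(f(x2)) < eps:
                return x2
            x1 = x2
        return x2                              # return the last iterate if not converged

    f = lambda x: x**2 + x - 6
    fprime = lambda x: 2*x + 1
    print(newton(f, fprime, -4.0))             # converges to -3 in a few steps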
Which root is found depends on the initial guess x_1. If the function is f(x) = x^2 + x - 6 and
the initial guess is x_1 = -4, then the root found is x_2 = -3. The results of this iteration are
shown in the table below with ε = 10^{-5}, and the convergence history of the error is shown
in Figure 4. Comparing the results from the bisection method to Newton's method, we see
that Newton's method converges in only 4 steps while the bisection method converges in 14
steps, and the tolerance for Newton's method is 10 times smaller! This is because the error
of each step in Newton's method is roughly equal to the square of the error of the previous
step, whereas the error in the bisection method is only half of the previous step.
Figure 4: Error history of Newton's method.
Iteration      x_1        x_2        f(x_2)    |x_1 - x_2|/2
1 -4.0000 -3.1429 0.7347 0.8571
2 -3.1429 -3.0039 0.0193 0.1390
3 -3.0039 -3.0000 0.0000 0.0039
4 -3.0000 -3.0000 0.0000 0.0000
Which root is obtained depends on the initial guess x_1. How the root depends on the
guess is given below. The particular root found obeys the rule of thumb of Newton's method,
Initial guess x_1            Root found
-∞ < x_1 < -1/2              -3
-1/2 < x_1 < +∞              +2
in which the root finder always goes downhill from the initial guess, as shown in Figure 5.
This rule of thumb is what leads to one drawback of Newton's method. Figure 6 shows a
function for which the root finder will fail because it will always move downhill towards the
minimum of the function, which is not the root. The second drawback of Newton's method
is that it requires that we evaluate the function as well as its derivative at every iteration.
For larger calculations this can be quite a drawback. Also, more often than not, it is difficult
to compute the derivative f'(x) because the closed-form solution of f(x) may not be known
in advance.
Figure 5: The dependence of the root on the initial guess for the Newton method. The
root-finder always goes downhill.
Figure 6: Newton's method will fail if the initial guess x_1 is chosen to the right of the dashed
vertical line.
Lecture 3: Solution of nonlinear equations II
Secant method
The secant method is used as an alternative to Newton's method when the derivative f'(x)
is not known analytically. It derives its name from a line drawn through two points on a
curve, which is called a secant. From Figure 1, we see that the approximate root x_{NR} can
be obtained with Newton's method using
x_{NR} = x_0 - f(x_0)/f'(x_0) .  (1)
Figure 1: Illustration of Newton's method being used to approximate the root x_R at x_{NR}.
When the derivative is not known, then we need to choose two points x_1 and x_2 in the vicinity
of x_0 and write Newton's method using the approximate values at x_0 that are obtained from
the known quantities at x_1 and x_2. As shown in Figure 2, if the function is assumed to be
linear, then the approximate value of x_0 is
x_0 ≈ (1/2)(x_1 + x_2) .  (2)
The approximate values of f(x_0) and its derivative are then given by
f(x_0) ≈ (1/2)[f(x_1) + f(x_2)] ,  (3)
f'(x_0) ≈ [f(x_2) - f(x_1)] / (x_2 - x_1) .  (4)
Substituting these approximations into the formula for Newton's method (1) yields
x_3 = (1/2)(x_1 + x_2) - (1/2)[f(x_1) + f(x_2)] / { [f(x_2) - f(x_1)] / (x_2 - x_1) } ,  (5)
Figure 2: Illustration of the secant method being used to approximate the root x_R at x_3.
which, after some manipulation, yields
x_3 = x_2 - f(x_2) (x_1 - x_2) / [f(x_1) - f(x_2)] .  (6)
This same result could have been obtained by using similar triangles, for which, as shown in
Figure 3,
tan θ = [f(x_1) - f(x_2)] / (x_2 - x_1) = f(x_2) / (x_3 - x_2) .  (7)
Figure 3: Illustration of the secant method being used to approximate the root x_R at x_3
with similar triangles.
When we use the two points x_1 and x_2 to obtain a guess for x_3, x_3 will be closer to the
root than either of our two initial guesses. This is usually the case as long as x_2 is closer to
the root than x_1. If not, then the method will just take longer to converge, as long as f(x)
is continuous. In order to speed up convergence, we swap x_1 and x_2 if x_1 is closer to the
root. A simple test for determining whether or not we need to swap is whether
|f(x_1)| < |f(x_2)|, which works most of the time; in the cases where it does not work, the
algorithm still converges, just at a slower rate. The pseudocode for the secant method is
shown below.
Pseudocode for the secant method
1. Start with two guesses x_1 and x_2 near the root.
2. If |f(x_1)| < |f(x_2)| then swap x_1 and x_2 with
   Set x_temp = x_2
   Set x_2 = x_1
   Set x_1 = x_temp
3. Make a guess for the root with x_3 = x_2 - f(x_2) (x_1 - x_2) / [f(x_1) - f(x_2)].
4. Set x_1 = x_2
   Set x_2 = x_3
5. Return to step 3 until |f(x_3)| < ε.
Figures 4 and 5 demonstrate the effect of swapping the initial guesses with the function
f(x) = 3x + 10 sin(x). The non-swapped cases still converge to within ε = 10^{-5}, but at a
slower rate, and, as shown in Figure 5, swapping does not have to change the convergence
history by very much. Nevertheless, it is always a good idea to swap x_1 and x_2 when
|f(x_1)| < |f(x_2)|.
Figure 4: Demonstration of the effect of swapping the initial guesses when x_1 = 1 and x_2 = 6.
Figure 5: Demonstration of the effect of swapping the initial guesses when x_1 = 1 and x_2 = 8.
Linear interpolation
When the slope is relatively large near the root, the secant method can overshoot the root
and slow down convergence. This can be remedied by starting with guesses for x_1 and x_2
that bracket the root, and making sure that the subsequent values of x_1 and x_2 bracket
the root as well. This is essentially the same as the bisection method, except each time the
estimate for the root is interpolated linearly rather than bisected. Figure 6 depicts the
interpolation diagram for the case when x_1 and x_2 bracket the root. The pseudocode for
the linear interpolation method is identical to that for the secant method, except additional
steps are required to ensure that x_1 and x_2 always bracket the root.
Figure 6: Illustration of the linear interpolation method being used to approximate the root
x_R at x_3 with guesses x_1 and x_2 that bracket the root.
Pseudocode for the linear interpolation method
1. Start with two guesses x_1 and x_2 such that f(x_1) f(x_2) < 0.
2. Make a guess for the root with x_3 = x_2 - f(x_2) (x_1 - x_2) / [f(x_1) - f(x_2)].
3. If f(x_1) f(x_3) < 0
   Set x_2 = x_3
   Otherwise set x_1 = x_3.
4. Return to step 2 until |f(x_3)| < ε.
Fixed-point iteration
When the equation f(x) = 0 can be written in the form x = g(x), then the roots r that
satisfy f(r) = 0 are known as the fixed points of the function g. The fixed-point iteration
determines the roots (under the right conditions) using
x_{n+1} = g(x_n) ,   n = 0, 1, . . . .  (8)
The requirements are that g(x) and g'(x) be continuous within an interval surrounding a
root x_R and that |g'(x)| < 1 within that interval. If x_1 is chosen so that it is within the
interval, then the method will converge to x_R. This is a sufficient condition for convergence,
but it is not a necessary one. That is, other cases may converge even if they do not satisfy
|g'(x)| < 1 in the interval. The pseudocode for fixed-point iteration is given below.
Pseudocode for fixed-point iteration
1. Start with an initial guess x_1 such that |g'(x_1)| < 1.
2. Make a guess for the root with x_2 = g(x_1).
3. If |x_2 - x_1| < ε then done!
   Otherwise set x_1 = x_2 and return to step 2.
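A short Python sketch of fixed-point iteration is shown below; the rewritten form g(x) used for the model equation x^2 + x - 6 = 0 is just one possible choice, not one prescribed by the handout.

    import math

    def fixed_point(g, x1, eps=1e-6, maxit=200):
        for _ in range(maxit):
            x2 = g(x1)                  # next guess
            if abs(x2 - x1) < eps:
                return x2
            x1 = x2
        return x2

    # One way to write x^2 + x - 6 = 0 as x = g(x) is x = sqrt(6 - x); near the
    # root x = 2 this satisfies |g'(x)| < 1, so the iteration converges.
    print(fixed_point(lambda x: math.sqrt(6.0 - x), 1.0))   # approximately 2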
Lecture 4: Numerical differentiation
Finite difference formulas
Suppose you are given a data set of N non-equispaced points at x = x_i with values f(x_i),
as shown in Figure 1. Because the data are not equispaced in general, Δx_i ≠ Δx_{i+1}.
Figure 1: Data set of N points at x = x_i with values f(x_i).
Let's say we wanted to compute the derivative f'(x) at x = x_i. For simplicity of notation,
we will refer to the value of f(x) at x = x_i as f_i. Because, in general, we do not know the
form of f(x) when dealing with discrete points, we need to determine the derivatives of
f(x) at x = x_i in terms of the known quantities f_i. Formulas for the derivatives of a data
set can be derived using Taylor series.
The value of f(x) at x = x_{i+1} can be written in terms of the Taylor series expansion of
f about x = x_i as
f_{i+1} = f_i + Δx_{i+1} f'_i + (Δx_{i+1}^2/2) f''_i + (Δx_{i+1}^3/6) f'''_i + O(Δx_{i+1}^4) .  (1)
This can be rearranged to give us the value of the first derivative at x = x_i as
f'_i = (f_{i+1} - f_i)/Δx_{i+1} - (Δx_{i+1}/2) f''_i - (Δx_{i+1}^2/6) f'''_i + O(Δx_{i+1}^3) .  (2)
If we assume that the value of f''_i does not change significantly with changes in Δx_{i+1}, then
this gives the first order accurate approximation to the first derivative of f(x) at x = x_i, which
is written as
f'_i = (f_{i+1} - f_i)/Δx_{i+1} + O(Δx_{i+1}) .  (3)
This is known as a forward difference. The first order backward difference can be obtained
by writing the Taylor series expansion about f_i to obtain f_{i-1} as
f_{i-1} = f_i - Δx_i f'_i + (Δx_i^2/2) f''_i - (Δx_i^3/6) f'''_i + O(Δx_i^4) ,  (4)
which can be rearranged to yield the backward difference of f(x) at x_i as
f'_i = (f_i - f_{i-1})/Δx_i + O(Δx_i) .  (5)
The first order forward and backward difference formulas are first order accurate
approximations to the first derivative. This means that decreasing the grid spacing by a factor of two
will only increase the accuracy of the approximation by a factor of two. We can increase the
accuracy of the finite difference formula for the first derivative by using both of the Taylor
series expansions about f_i,
f_{i+1} = f_i + Δx_{i+1} f'_i + (Δx_{i+1}^2/2) f''_i + (Δx_{i+1}^3/6) f'''_i + O(Δx_{i+1}^4) ,  (6)
f_{i-1} = f_i - Δx_i f'_i + (Δx_i^2/2) f''_i - (Δx_i^3/6) f'''_i + O(Δx_i^4) .  (7)
Subtracting equation (7) from (6) yields
f_{i+1} - f_{i-1} = (Δx_{i+1} + Δx_i) f'_i + [(Δx_{i+1}^2 - Δx_i^2)/2] f''_i + [(Δx_{i+1}^3 + Δx_i^3)/6] f'''_i
                    + O(Δx_{i+1}^4) + O(Δx_i^4) ,
(f_{i+1} - f_{i-1})/(Δx_{i+1} + Δx_i) = f'_i + [(Δx_{i+1}^2 - Δx_i^2)/(2(Δx_{i+1} + Δx_i))] f''_i
                    + [(Δx_{i+1}^3 + Δx_i^3)/(6(Δx_{i+1} + Δx_i))] f'''_i
                    + O(Δx_{i+1}^4/(Δx_{i+1} + Δx_i)) + O(Δx_i^4/(Δx_{i+1} + Δx_i)) ,  (8)
which can be rearranged to yield
f'_i = (f_{i+1} - f_{i-1})/(Δx_{i+1} + Δx_i) - [(Δx_{i+1}^2 - Δx_i^2)/(2(Δx_{i+1} + Δx_i))] f''_i
       + O((Δx_{i+1}^3 + Δx_i^3)/(6(Δx_{i+1} + Δx_i))) .  (9)
In most cases, if the spacing of the grid points is not too erratic, such that Δx_{i+1} ≈ Δx_i,
equation (9) can be written as the central difference formula for the first derivative,
f'_i = (f_{i+1} - f_{i-1})/(2Δx_i) + O(Δx_i^2) .  (10)
What is meant by the order of accuracy?
Suppose we are given a data set of N = 16 points on an equispaced grid as shown in Figure
2, and we are asked to compute the first derivative f'_i at i = 2, . . . , N - 1 using the forward,
backward, and central difference formulas (3), (5), and (10).
Figure 2: A data set consisting of N = 16 points.
If we refer to the approximation of the first derivative as δf/δx, then these three formulas for
the first derivative on an equispaced grid with Δx_i = Δx can be written as
Forward difference:   δf/δx = (f_{i+1} - f_i)/Δx ,  (11)
Backward difference:   δf/δx = (f_i - f_{i-1})/Δx ,  (12)
Central difference:   δf/δx = (f_{i+1} - f_{i-1})/(2Δx) .  (13)
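The NumPy sketch below (not part of the handout) evaluates the forward, backward, and central formulas (11)-(13) for a known function and shows the expected first and second order error behavior as Δx is halved; sin(x) is used only as a convenient test function.

    import numpy as np

    def derivative_errors(dx):
        x = 0.5
        exact = np.cos(x)                              # d/dx sin(x)
        forward  = (np.sin(x + dx) - np.sin(x)) / dx
        backward = (np.sin(x) - np.sin(x - dx)) / dx
        central  = (np.sin(x + dx) - np.sin(x - dx)) / (2 * dx)
        return abs(forward - exact), abs(backward - exact), abs(central - exact)

    for dx in [0.1, 0.05, 0.025]:
        print(dx, derivative_errors(dx))
    # Halving dx roughly halves the forward/backward errors (first order)
    # and quarters the central error (second order).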
These three approximations to the first derivative of the data shown in Figure 2 are shown
in Figure 3. Now let's say we are given five more data sets, each of which defines the same
function f(x_i), but each one has twice as many grid points as the previous one to define the
function, as shown in Figure 4. The most accurate approximations to the first derivative
will be those that use the most refined data set with N = 512 data points. In order to quantify
how much more accurate the solution gets as we add more data points, we can compare the
derivative computed with each data set to the one computed with the most resolved data set.
To compare them, we can plot the difference in the derivative at x = 0.5 and call it the error,
such that
Error = | δf/δx - (δf/δx)_{n=6} | ,  (14)
where n = 1, . . . , 5 is the data set and n = 6 corresponds to the most refined data set. The
result is shown in Figure 5 on a log-log plot. For all three cases we can see that the error
closely follows the form
Error = k Δx^n ,  (15)
where k = 1.08 and n = 1 for the forward and backward difference approximations, and
k = 8.64 and n = 2 for the central difference approximation. When we plot the error of a
numerical method and it follows the form of equation (15), then we say that the method is
n-th order and that the error can be written as O(Δx^n). Because n = 1 for the forward and
backward approximations, they are said to be first order methods, while since n = 2 for
the central approximation, it is a second order method.
Figure 3: Approximation to the first derivative of the data shown in Figure 2 using three
different approximations.
Taylor tables
The first order finite difference formulas in the previous sections were written in the form
df/dx = δf/δx + Error ,  (16)
where δf/δx is the approximate form of the first derivative df/dx, with some error that determines
the order of accuracy of the approximation. In this section we define a general method of
estimating derivatives of arbitrary order of accuracy. We will assume equispaced points, but
the analysis can be extended to arbitrarily spaced points. The n-th derivative of a discrete
function f_i at points x = x_i can be written in the form
(d^n f/dx^n)|_{x=x_i} = δ^n f/δx^n + O(Δx^m) ,  (17)
where
δ^n f/δx^n = ∑_{j=-N_l}^{+N_r} a_{j+N_l} f_{i+j} ,  (18)
and m is the order of accuracy of the approximation, the a_{j+N_l} are the coefficients of the
approximation, and N_l and N_r define the width of the approximation stencil. For example, in
the central difference approximation to the first derivative,
f'_i = -(1/(2Δx)) f_{i-1} + 0 f_i + (1/(2Δx)) f_{i+1} + O(Δx^2)  (19)
     = a_0 f_{i-1} + a_1 f_i + a_2 f_{i+1} + O(Δx^2) .  (20)
Figure 4: The original data set and 5 more, each with twice as many grid points as the
previous one.
In this case, N_l = 1, N_r = 1, a_0 = -1/(2Δx), a_1 = 0, and a_2 = +1/(2Δx). In equation (18) the
discrete values f_{i+j} can be written in terms of the Taylor series expansion about x = x_i as
f_{i+j} = f_i + jΔx f'_i + ((jΔx)^2/2) f''_i + ...  (21)
      = f_i + ∑_{k=1}^{∞} ((jΔx)^k / k!) f^{(k)}_i .  (22)
Using this Taylor series approximation with m + 2 terms for the f_{i+j} in equation (18), where
m is the order of accuracy of the finite difference formula, we can substitute these values into
equation (17) and solve for the coefficients a_{j+N_l} to derive the appropriate finite difference
formula.
As an example, suppose we would like to determine a second order accurate approximation
to the second derivative of a function f(x) at x = x_i using the data at x_{i-1}, x_i, and
x_{i+1}. Writing this in the form of equation (17) yields
d^2 f/dx^2 = δ^2 f/δx^2 + O(Δx^2) ,  (23)
where, from equation (18),
δ^2 f/δx^2 = a_0 f_{i-1} + a_1 f_i + a_2 f_{i+1} .  (24)
Figure 5: Depiction of the error in computing the first derivative for the forward, backward,
and central difference formulas.
The Taylor series approximations to f_{i-1} and f_{i+1} to O(Δx^4) are given by
f_{i-1} ≈ f_i - Δx f'_i + (Δx^2/2) f''_i - (Δx^3/6) f'''_i + (Δx^4/24) f^{(iv)}_i ,  (25)
f_{i+1} ≈ f_i + Δx f'_i + (Δx^2/2) f''_i + (Δx^3/6) f'''_i + (Δx^4/24) f^{(iv)}_i .  (26)
Rather than substitute these into equation (24), we create a Taylor table, which requires
much less writing, as follows:

Term in (24)    f_i    Δx f'_i    Δx^2 f''_i    Δx^3 f'''_i    Δx^4 f^{(iv)}_i
a_0 f_{i-1}     a_0    -a_0       a_0/2         -a_0/6         a_0/24
a_1 f_i         a_1    0          0             0              0
a_2 f_{i+1}     a_2    a_2        a_2/2         a_2/6          a_2/24
                0      0          1             ?              ?

If we add the columns in the table then we have
a_0 f_{i-1} + a_1 f_i + a_2 f_{i+1} = (a_0 + a_1 + a_2) f_i + (-a_0 + a_2) Δx f'_i + (a_0 + a_2) (Δx^2/2) f''_i
                                      + (-a_0 + a_2) (Δx^3/6) f'''_i + (a_0 + a_2) (Δx^4/24) f^{(iv)}_i .  (27)
Because we would like the terms containing f_i and f'_i on the right hand side to vanish, we
must have a_0 + a_1 + a_2 = 0 and -a_0 + a_2 = 0. Furthermore, since we want to retain the
second derivative on the right hand side, we must have a_0/2 + a_2/2 = 1. This yields three
equations in three unknowns for a_0, a_1, and a_2, namely,
a_0 + a_1 + a_2 = 0 ,
-a_0 + a_2 = 0 ,
a_0/2 + a_2/2 = 1 ,  (28)
whose solution is given by a_0 = a_2 = 1 and a_1 = -2. Substituting these values into
equation (27) results in
f_{i-1} - 2 f_i + f_{i+1} = Δx^2 f''_i + (Δx^4/12) f^{(iv)}_i ,  (29)
which, after rearranging, yields the second order accurate finite difference formula for the
second derivative as
f''_i = (f_{i-1} - 2 f_i + f_{i+1}) / Δx^2 + O(Δx^2) ,  (30)
where the error term is given by
Error = -(Δx^2/12) f^{(iv)}_i .  (31)
As another example, let us compute the second order accurate one-sided difference formula
for the first derivative of f(x) at x = x_i using x_i, x_{i-1}, and x_{i-2}. The Taylor table
for this example is given below.

Term            f_i    Δx f'_i    Δx^2 f''_i    Δx^3 f'''_i    Δx^4 f^{(iv)}_i
a_0 f_{i-2}     a_0    -2a_0      2a_0          -4a_0/3        2a_0/3
a_1 f_{i-1}     a_1    -a_1       a_1/2         -a_1/6         a_1/24
a_2 f_i         a_2    0          0             0              0
                0      1          0             ?              ?

By requiring that a_0 + a_1 + a_2 = 0, -2a_0 - a_1 = 1, and
2a_0 + a_1/2 = 0, we have a_0 = 1/2, a_1 = -2, and a_2 = 3/2. Therefore, the second order
accurate one-sided finite difference formula for the first derivative is given by
df/dx = (f_{i-2} - 4 f_{i-1} + 3 f_i) / (2Δx) + O(Δx^2) ,  (32)
where the error term is given by
Error = (Δx^2/3) f'''_i .  (33)
Higher order finite difference formulas can be derived using the Taylor table method described
in this section. These are shown in Applied Numerical Analysis, sixth edition, by C. F. Gerald
& P. O. Wheatley, Addison-Wesley, 1999, pp. 373-374.
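The Taylor table procedure amounts to solving a small linear system for the stencil coefficients. The NumPy sketch below reproduces the two examples above (the centered second derivative and the one-sided first derivative); the way the matrices are assembled is my own illustration of the method, not code from the handout.

    import numpy as np

    # Centered second-derivative stencil: the rows are the constraints of equation (28).
    A = np.array([[1.0, 1.0, 1.0],     # a0 + a1 + a2 = 0
                  [-1.0, 0.0, 1.0],    # -a0 + a2 = 0
                  [0.5, 0.0, 0.5]])    # a0/2 + a2/2 = 1
    b = np.array([0.0, 0.0, 1.0])
    print(np.linalg.solve(A, b))       # [ 1. -2.  1.] -> (f_{i-1} - 2 f_i + f_{i+1}) / dx^2

    # One-sided first-derivative stencil on f_{i-2}, f_{i-1}, f_i:
    A = np.array([[1.0, 1.0, 1.0],     # a0 + a1 + a2 = 0
                  [-2.0, -1.0, 0.0],   # -2 a0 - a1 = 1
                  [2.0, 0.5, 0.0]])    # 2 a0 + a1/2 = 0
    b = np.array([0.0, 1.0, 0.0])
    print(np.linalg.solve(A, b))       # [ 0.5 -2.  1.5] -> (f_{i-2} - 4 f_{i-1} + 3 f_i) / (2 dx)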
Lecture 5: Numerical integration
Discretizing the integral
In this lecture we will derive methods of computing integrals of functions that cannot be
integrated analytically. A typical function that cannot be integrated analytically is the error
function,
Erf(y) = (2/√π) ∫_0^y e^{-x^2} dx .  (1)
To leave the analysis in its most general form, we will consider an evaluation of the integral
∫_a^b f(x) dx .  (2)
This integral is evaluated numerically by splitting up the domain [a, b] into N equally spaced
intervals, as shown in Figure 1. Because we assume that the intervals are constant, the
interval width is given by
h = Δx = x_{i+1} - x_i .  (3)
Figure 1: Discretization of a function f(x) into N = 8 equally spaced subintervals over [a, b].
The idea behind the numerical integration formulas is to approximate the integral in each
subinterval and add up the N approximate integrals to obtain the integral over [a, b].
Trapezoidal rule
The Trapezoidal rule approximates the function within each subinterval using the leading
terms of the Taylor series expansion about x_i, such that, in the range [x_i, x_{i+1}],
f(x) = f_i + (x - x_i) f'_i + (1/2)(x - x_i)^2 f''_i + O((x - x_i)^3) .  (4)
Using this approximation, we can evaluate the integral over [x_i, x_{i+1}] with
∫_{x_i}^{x_{i+1}} f(x) dx = ∫_{x_i}^{x_{i+1}} [ f_i + (x - x_i) f'_i + (1/2)(x - x_i)^2 f''_i ] dx ,  (5)
where we have omitted the truncation error term since the last term will end up being the
error term in the analysis. Making a change of variables such that
s = (x - x_i)/(x_{i+1} - x_i) = (x - x_i)/h ,  (6)
we have
∫_{x_i}^{x_{i+1}} f(x) dx = h ∫_0^1 [ f_i + h s f'_i + (1/2) h^2 s^2 f''_i ] ds ,
                          = [ h s f_i + (1/2) h^2 s^2 f'_i + (1/6) h^3 s^3 f''_i ]_0^1 ,
                          = h f_i + (1/2) h^2 f'_i + (1/6) h^3 f''_i .
Substituting in an approximation for the first derivative,
f'_i = (f_{i+1} - f_i)/h - (h/2) f''_i ,  (7)
we have
∫_{x_i}^{x_{i+1}} f(x) dx = h f_i + (1/2) h^2 [ (f_{i+1} - f_i)/h - (h/2) f''_i ] + (1/6) h^3 f''_i ,
                          = (1/2) h (f_i + f_{i+1}) - (1/12) h^3 f''_i .  (8)
This shows that the Trapezoidal rule approximates the integral of the function over the
subinterval [x_i, x_{i+1}] as the area of the trapezoid created by the function values f_i and
f_{i+1}, as shown in Figure 2.
The integral over [a, b] is evaluated by taking the sum of the approximate integrals evaluated
in each subinterval as
∫_a^b f(x) dx = ∑_{i=0}^{N-1} ∫_{x_i}^{x_{i+1}} f(x) dx ,
             = ∑_{i=0}^{N-1} [ (1/2) h (f_i + f_{i+1}) - (1/12) h^3 f''_i ] ,
             = (1/2) h (f_0 + 2f_1 + 2f_2 + . . . + 2f_{N-2} + 2f_{N-1} + f_N) - (h^3/12) ∑_{i=0}^{N-1} f''_i .
Figure 2: Depiction of how the trapezoidal rule approximates the integral on the subinterval
[x_i, x_{i+1}].
The error term is given by
Error = -(h^3/12) [ f''_0 + f''_1 + f''_2 + . . . + f''_{N-1} ] ,
      = -(N h^3/12) [ (f''_0 + f''_1 + f''_2 + . . . + f''_{N-1}) / N ] .
If the mean value of f''_i is given by
(f''_0 + f''_1 + f''_2 + . . . + f''_{N-1}) / N ,  (9)
then we know that it must lie within the bounds of f''(x), and hence it can be represented
as f''(ξ) for some ξ such that
f''(ξ) = (f''_0 + f''_1 + f''_2 + . . . + f''_{N-1}) / N .  (10)
Therefore, since Nh = (b - a), the error becomes
Error = -((b - a) h^2 / 12) f''(ξ) = O(h^2) ,  (11)
which shows that the trapezoidal rule is second order accurate.
Simpson's rules
Simpson's 1/3 rule
Simpson's 1/3 rule approximates the function within the interval [x_i, x_{i+2}] as a quadratic,
as shown in Figure 3. This is done by writing the Taylor series expansion of f(x) about
x = x_{i+1} to obtain
f(x) = f_{i+1} + (x - x_{i+1}) f'_{i+1} + (1/2)(x - x_{i+1})^2 f''_{i+1} + (1/6)(x - x_{i+1})^3 f'''_{i+1}
       + (1/24)(x - x_{i+1})^4 f^{(iv)}_{i+1} + O((x - x_{i+1})^5) .
Figure 3: Depiction of how Simpson's 1/3 rule approximates the function f(x) with a
quadratic through x_i, x_{i+1}, and x_{i+2}.
The integral in the subinterval [x_i, x_{i+2}] is then given by
∫_{x_i}^{x_{i+2}} f(x) dx = ∫_{x_i}^{x_{i+2}} [ f_{i+1} + (x - x_{i+1}) f'_{i+1} + (1/2)(x - x_{i+1})^2 f''_{i+1}
                           + (1/6)(x - x_{i+1})^3 f'''_{i+1} + (1/24)(x - x_{i+1})^4 f^{(iv)}_{i+1} ] dx ,
where the truncation error has been left off since the last term will end up being the error.
Making a change of variables such that
s = 2(x - x_{i+1})/(x_{i+2} - x_i) = (x - x_{i+1})/h ,  (12)
we have
∫_{x_i}^{x_{i+2}} f(x) dx = h ∫_{-1}^{+1} [ f_{i+1} + h s f'_{i+1} + (1/2) h^2 s^2 f''_{i+1} + (1/6) h^3 s^3 f'''_{i+1}
                           + (1/24) h^4 s^4 f^{(iv)}_{i+1} ] ds ,  (13)
which becomes
∫_{x_i}^{x_{i+2}} f(x) dx = [ h s f_{i+1} + (1/2) h^2 s^2 f'_{i+1} + (1/6) h^3 s^3 f''_{i+1} + (1/24) h^4 s^4 f'''_{i+1}
                           + (1/120) h^5 s^5 f^{(iv)}_{i+1} ]_{-1}^{+1} ,
                         = 2 h f_{i+1} + (1/3) h^3 f''_{i+1} + (1/60) h^5 f^{(iv)}_{i+1} .
Using the second order accurate approximation to the second derivative,
f''_{i+1} = (f_i - 2 f_{i+1} + f_{i+2}) / h^2 - (h^2/12) f^{(iv)}_{i+1} ,  (14)
the integral becomes
∫_{x_i}^{x_{i+2}} f(x) dx = 2 h f_{i+1} + (1/3) h^3 [ (f_i - 2 f_{i+1} + f_{i+2}) / h^2 - (h^2/12) f^{(iv)}_{i+1} ]
                           + (1/60) h^5 f^{(iv)}_{i+1} ,
                         = (1/3) h (f_i + 4 f_{i+1} + f_{i+2}) - (1/90) h^5 f^{(iv)}_{i+1} .  (15)
The integral over [a, b] is obtained by taking the sum of the approximate integrals over the
N/2 pairs of subintervals, as in
∫_a^b f(x) dx = ∑_{i=0,2,4,...}^{N-2} ∫_{x_i}^{x_{i+2}} f(x) dx ,
             = ∑_{i=0,2,4,...}^{N-2} [ (1/3) h (f_i + 4 f_{i+1} + f_{i+2}) - (1/90) h^5 f^{(iv)}_{i+1} ] .
The sum is given by
(1/3) h ( f_0 + 4f_1 + f_2 +
          f_2 + 4f_3 + f_4 +
          f_4 + 4f_5 + f_6 + . . . +
          f_{N-6} + 4f_{N-5} + f_{N-4} +
          f_{N-4} + 4f_{N-3} + f_{N-2} +
          f_{N-2} + 4f_{N-1} + f_N ) ,
which becomes
∫_a^b f(x) dx = (1/3) h (f_0 + 4f_1 + 2f_2 + 4f_3 + . . . + 4f_{N-3} + 2f_{N-2} + 4f_{N-1} + f_N)
               - (1/90) h^5 ∑_{i=0,2,4,...}^{N-2} f^{(iv)}_{i+1} .
The error term is given by
Error = -(1/90) h^5 ∑_{i=0,2,4,...}^{N-2} f^{(iv)}_{i+1} ,  (16)
which, using the same arguments as those for the Trapezoidal rule, becomes
Error = -(1/180) (b - a) h^4 f^{(iv)}(ξ) = O(h^4) ,  (17)
which shows that Simpson's 1/3 rule is fourth order accurate.
Simpson's 3/8 rule
Simpson's 3/8 rule approximates the function within the subinterval [x_i, x_{i+3}] using a cubic.
The Taylor series expansion is performed about x_{i+3/2} to obtain
f(x) = f_{i+3/2} + (x - x_{i+3/2}) f'_{i+3/2} + (1/2)(x - x_{i+3/2})^2 f''_{i+3/2} + (1/6)(x - x_{i+3/2})^3 f'''_{i+3/2}
       + (1/24)(x - x_{i+3/2})^4 f^{(iv)}_{i+3/2} + O((x - x_{i+3/2})^5) .  (18)
Integrating this function in a similar manner to that used for the 1/3 rule yields
∫_a^b f(x) dx = (3/8) h (f_0 + 3f_1 + 3f_2 + 2f_3 + 3f_4 + 3f_5 + . . .
               + 2f_{N-3} + 3f_{N-2} + 3f_{N-1} + f_N)  (19)
               - (1/80) (b - a) h^4 f^{(iv)}(ξ) .
Summary of integration formulas and pseudocodes
Trapezoidal rule
∫_a^b f(x) dx = (1/2) h (f_0 + 2f_1 + 2f_2 + . . . + 2f_{N-2} + 2f_{N-1} + f_N) + Error
Error = -(1/12) (b - a) h^2 f''(ξ) = O(h^2)
1. If f_i and h are already known discretely on an equispaced grid with N + 1 points,
   then proceed to step 2.
   Otherwise, choose interval [a, b] and set h = (b - a)/N.
   for i = 1 to N + 1
     Set x_i = a + h(i - 1)
     Set f_i = f(x_i)
   end
2. Set I = 0
   for i = 2 to N
     Set I = I + h f_i
   end
   Set I = I + (1/2) h (f_1 + f_{N+1})
3. The integral is given by I.
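A compact Python version of this trapezoidal pseudocode is sketched below (it assumes the integrand is available as a function; the names are illustrative).

    import math

    def trapezoid(f, a, b, N):
        h = (b - a) / N
        I = 0.5 * h * (f(a) + f(b))          # endpoint contributions
        for i in range(1, N):
            I += h * f(a + i * h)            # interior points weighted by h
        return I

    # Approximates the integral of exp(-x^2) from 0 to 1 (about 0.7468).
    print(trapezoid(lambda x: math.exp(-x*x), 0.0, 1.0, 100))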
Simpson's 1/3 rule (N divisible by 2)
∫_a^b f(x) dx = (1/3) h (f_0 + 4f_1 + 2f_2 + 4f_3 + . . . + 4f_{N-3} + 2f_{N-2} + 4f_{N-1} + f_N)
               + Error
Error = -(1/180) (b - a) h^4 f^{(iv)}(ξ) = O(h^4)
1. If f_i and h are already known discretely on an equispaced grid with N + 1 points,
   where N is even, then proceed to step 2.
   Otherwise, choose interval [a, b] and set h = (b - a)/N, with N even.
   for i = 1 to N + 1
     Set x_i = a + h(i - 1)
     Set f_i = f(x_i)
   end
2. Set I = 0
   for i = 1 to N/2
     Set I = I + (4/3) h f_{2i}
   end
   for i = 1 to N/2 - 1
     Set I = I + (2/3) h f_{2i+1}
   end
   Set I = I + (1/3) h (f_1 + f_{N+1})
3. The integral is given by I.
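A Python sketch of the Simpson's 1/3 pseudocode follows (N must be even; the 0-based indexing here replaces the 1-based indexing of the pseudocode).

    import math

    def simpson13(f, a, b, N):
        if N % 2 != 0:
            raise ValueError("N must be even for Simpson's 1/3 rule")
        h = (b - a) / N
        I = f(a) + f(b)
        for i in range(1, N):
            # interior points: weight 4 for odd indices, 2 for even indices
            I += (4 if i % 2 == 1 else 2) * f(a + i * h)
        return I * h / 3.0

    print(simpson13(lambda x: math.exp(-x*x), 0.0, 1.0, 100))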
Simpson's 3/8 rule (N divisible by 3)
∫_a^b f(x) dx = (3/8) h (f_0 + 3f_1 + 3f_2 + 2f_3 + 3f_4 + 3f_5 + . . .
               + 2f_{N-3} + 3f_{N-2} + 3f_{N-1} + f_N)  (20)
               + Error
Error = -(1/80) (b - a) h^4 f^{(iv)}(ξ) .
1. If f_i and h are already known discretely on an equispaced grid with N + 1 points,
   where N is divisible by 3, then proceed to step 2.
   Otherwise, choose interval [a, b] and set h = (b - a)/N, with N divisible by 3.
   for i = 1 to N + 1
     Set x_i = a + h(i - 1)
     Set f_i = f(x_i)
   end
2. Set I = 0
   for i = 2 to N
     Set I = I + (9/8) h f_i
   end
   for i = 1 to N/3 - 1
     Set I = I - (3/8) h f_{3i+1}
   end
   Set I = I + (3/8) h (f_1 + f_{N+1})
3. The integral is given by I.
Lecture 6: Review of ODEs
Types of ODEs
Homogeneous vs. inhomogeneous ODEs
The most basic first order ordinary differential equation can be written as
dy/dt + y = 0 .  (1)
We have written it in terms of t, but it can also be in terms of x, or any other variable.
The dependent variable is y, while the independent variable is t. This is a first order
homogeneous ODE. It is first order because the highest derivative is the first derivative of
y(t), and, because it can be written in terms of the function y(t) and its derivatives only, it
is homogeneous. By integrating the equation we see that the general solution is given by
y(t) = a e^{-t} ,  (2)
where a is some constant that is determined by the initial condition, which can either be in
terms of y(0) or y'(0). An example of a first order inhomogeneous ODE is given by
dy/dt + y = z(t) ,  (3)
which is inhomogeneous because of the forcing term z(t). Its general solution is given by the
sum of the homogeneous solution, say y_h(t), and the particular solution y_p(t), so that
y(t) = y_h(t) + y_p(t), where the homogeneous solution is given by the solution of
dy_h/dt + y_h = 0 ,  (4)
and the particular solution solves
dy_p/dt + y_p = z(t) .  (5)
Linear vs. nonlinear ODEs
An ODE will be nonlinear when its terms contain products of the dependent variable and
its derivatives. For example,
dy/dt + y = 0  (6)
is a linear homogeneous first order ODE, while
dy/dt + y^2 = 0  (7)
and
y dy/dt + y = 0  (8)
are nonlinear, because they contain products of the dependent variable y(t) and its
derivatives y^{(n)}(t) (remember that the zeroth derivative is the function itself, i.e. y^{(0)}(t) = y(t)).
An easy way to check whether an equation is linear is to substitute the dependent variable
times a constant, ky, in place of y and see if the equation changes. For example, substituting
ky for y in equation (6), we have
d(ky)/dt + (ky) = 0   →   dy/dt + y = 0 ,  (9)
which does not change the equation. However, if we make the same substitution into
equation (7), we have
d(ky)/dt + (ky)^2 = 0   →   dy/dt + k y^2 = 0 ,  (10)
which does change the ODE. Therefore, equation (7) is nonlinear.
Constant vs. non-constant coefficients
If an ODE consists of products of the dependent variable and the independent variable, then
it is an ODE with non-constant coefficients. For example,
dy/dt + y t = 0  (11)
and
sin(t) dy/dt + y = 0  (12)
are linear homogeneous first order ODEs, but because they consist of products of y^{(n)}(t)
and functions of t, they are ODEs with non-constant coefficients.
Higher order and systems of ODEs
In the previous examples, all of the ODEs have been first order ODEs. Higher
order ODEs contain higher order derivatives of the dependent variable, such as
d^2y/dt^2 + y = 0 ,  (13)
which is a second order linear homogeneous ODE, and
d^4y/dx^4 + dy/dx = 0 ,  (14)
which is a fourth order linear homogeneous ODE.
All higher order ODEs can be written as systems of ODEs, in that instead of being
written as an ODE in one variable, they are written as an ODE in several variables. The
second order ODE
d^2y/dt^2 + y = 0  (15)
can be written as a system of two first order ODEs if we let y_1 = y and y_2 = dy/dt, so
that
dy_1/dt = y_2 ,  (16)
dy_2/dt = -y_1 .  (17)
If we define the vector y as
y = [ y_1 ]
    [ y_2 ]  (18)
and the matrix A as
A = [  0   1 ]
    [ -1   0 ] ,  (19)
then the system of ODEs in equations (16) and (17) can be written in matrix-vector form as
d/dt [ y_1 ]   [  0   1 ] [ y_1 ]
     [ y_2 ] = [ -1   0 ] [ y_2 ] ,  (20)
or, in more compact notation,
dy/dt = A y .  (21)
As another example, consider the fourth order nonlinear ODE given by
d^4y/dx^4 + y dy/dx = 0 .  (22)
If we let
y = [ y_1 ]   [ y    ]
    [ y_2 ] = [ y'   ]
    [ y_3 ]   [ y''  ]
    [ y_4 ]   [ y''' ] ,  (23)
then the ODE in (22) can be written as four first order ODEs as
dy_1/dx = y_2 ,
dy_2/dx = y_3 ,
dy_3/dx = y_4 ,
dy_4/dx = -y_1 y_2 ,
or, in matrix form,
dy/dx = A y ,  (24)
where
A = [  0    1  0  0 ]
    [  0    0  1  0 ]
    [  0    0  0  1 ]
    [ -y_2  0  0  0 ] .  (25)
This method of rewriting ODEs in matrix form forms the basis for the numerical solution of
ODEs of all orders, since we can derive algorithms to solve first order ODEs and apply them
to their counterparts in matrix-vector form.
Initial and boundary conditions
All of the preceding examples considered different types of ODEs. But just as important
as the ODEs themselves are the initial and boundary conditions. An ODE without initial
or boundary conditions is like a brain without a body, or a ship without any water. The
ODE itself determines the general solution of a problem. It can only be written in terms
of unknown coefficients. But the initial and boundary conditions determine what those
coefficients must be in order to solve the problem.
Initial conditions are specified for time-dependent problems, while boundary conditions
are specified for space-dependent problems. The number of initial or boundary conditions
required depends on the order of the ODE. Consider, for example, the equation for the
height of a tennis ball that is dropped from your hand to the ground. Neglecting the forces
of friction imposed by the air on the ball, the ODE governing the height of the tennis ball is
given by
d^2y/dt^2 = -g ,  (26)
where y is the height of the tennis ball above the ground, and g = 9.81 m/s^2 is the acceleration
due to gravity. The general solution of this second order linear inhomogeneous ODE is given
by
y(t) = -(1/2) g t^2 + a t + b ,  (27)
where a and b are unknown coefficients. In order to determine what a and b are, you must
specify two initial conditions to determine the solution for the height with time. These
initial conditions are given by what you knew about the ball when you dropped it. That is,
you released it from rest, and you released it from a certain height off the ground. These
initial conditions in mathematical form are given by
You dropped the ball from 1 m above the ground: y(t = 0) = 1 .
You dropped the ball from rest: v(t = 0) = y'(t = 0) = 0 .
Substituting these into the general solution (27), we have the equation for the height of the
ball as
y(t) = 1 - (1/2) g t^2 .  (28)
The fundamental rule for initial conditions is that the number of initial conditions must
equal the order of the highest derivative in the problem. Therefore, you need two initial
conditions for a second order time-dependent ODE, while you need three boundary conditions
for a third order space-dependent ODE.
Solution methods for first order ODEs
For more information see: http://www.math.hmc.edu/calculus/tutorials/odes/
Separable ODEs
An ODE is separable if it can be written in the form
f(x) dx = g(y) dy .  (29)
We can then integrate both sides to find the solution y = h(x) (if we can do it analytically,
of course). As an example, consider the first order ODE
dy/dx + x y = 0 .  (30)
This ODE is separable because it can be written as
dy/y = -x dx ,  (31)
which can be solved by integrating both sides to yield
y = a e^{-x^2/2} ,  (32)
where a is a constant.
Integrating factor
Suppose we have a first order linear ODE of the form
dy/dx + f(x) y = g(x) .  (33)
If we multiply both sides by a new function h(x), we have
h dy/dx + h f y = g h .
Using the chain rule,
d/dx (h y) = h dy/dx + (dh/dx) y ,
so that
d/dx (h y) - y dh/dx + h f y = h g ,
d/dx (h y) + y [ h f - dh/dx ] = h g .
If we require that
h f - dh/dx = 0 ,  (34)
then we have
d/dx (h y) = h g ,  (35)
and the solution is then given by
y = (1/h(x)) ∫ g(x) h(x) dx .  (36)
From equation (34), the integrating factor h(x) must be given by
h(x) = e^{∫ f(x) dx} .  (37)
Change of variables
If a first order ODE cannot be separated but it can be written in the form
dy/dx = f(x, y) ,  (38)
where f(kx, ky) = f(x, y), then the change of variables z = y/x will make it separable. This
is known as a homogeneous equation of order zero. As an example, consider the ODE
dy/dx = (y - x)/(x - 4y) .  (39)
To test if this is a homogeneous equation of order zero, let x → kx and y → ky,
f(kx, ky) = (ky - kx)/(kx - 4ky) = (y - x)/(x - 4y) = f(x, y) .  (40)
Making the substitution z = y/x, or y = z x, we have
dy/dx = (z x - x)/(x - 4 z x) ,
x dz/dx + z = (z x - x)/(x - 4 z x) ,
x dz/dx = (4z^2 - 1)/(1 - 4z) .  (41)
This equation is separable and the solution is given by
(2y + x)^3 (2y - x) = c .  (42)
Lecture 7: Numerical solution of ODEs I
The model ODE
In this lecture we will be learning how to solve the first order ODE
dy/dt = f(t, y) .  (1)
The reason we analyze such a simplified equation is because, as we saw in the previous
lecture, all higher order ODEs can be written in the form of a system of first order ODEs,
which we write as
dy/dt = F(t, y) .  (2)
These higher order systems can be solved using the same methods we develop to solve the
model ODE (1).
Forward and backward Euler: explicit vs. implicit methods
Discretization
The model ODE (1) is written discretely by choosing a time step (or space step) at which
we would like to evaluate both sides of the equation. Let's say we want to evaluate both
sides of (1) at time step n. In this case, the model ODE would be written as
(dy/dt)|^n = f^n ,  (3)
where f^n = f(t^n, y^n). So far the discretization is exact. We have not made any
approximations yet because we are assuming that we can evaluate everything exactly. If we
approximate the left hand side with the forward discrete derivative,
(dy/dt)|^n = (y^{n+1} - y^n)/Δt + O(Δt) ,  (4)
then we have the first order accurate approximation to the model equation (1) as
(y^{n+1} - y^n)/Δt = f^n + O(Δt) ,  (5)
or
y^{n+1} = y^n + Δt f^n + O(Δt^2) .  (6)
This equation is known as the forward Euler method because it uses the forward discrete
derivative in time to evaluate the left hand side. Since we only use information from time
step n in order to evaluate y^{n+1}, this is known as an explicit method.
If we choose to write the model equation (1) at time step n + 1,
(dy/dt)|^{n+1} = f^{n+1} ,  (7)
then this can be approximated using the backward discrete derivative to yield
(y^{n+1} - y^n)/Δt = f^{n+1} + O(Δt) ,  (8)
or
y^{n+1} = y^n + Δt f^{n+1} + O(Δt^2) .  (9)
This is known as the backward Euler method because it uses the backward finite difference
to evaluate the first derivative. If you were to evaluate y at time step n + 1 you would see
that you need information at time step n + 1 in order to compute f^{n+1}. When you need
information at the next time step, the method is known as an implicit method.
An example
Let's say you want to numerically determine the evolution of the ODE
dy/dt = y cos y ,  (10)
with y(0) = 1. If we use the forward Euler method, we have
y^{n+1} = y^n + Δt y^n cos y^n ,
y^{n+1} = y^n (1 + Δt cos y^n) .  (11)
We can easily obtain y^1 if y^0 is known, because everything on the right hand side is known
explicitly. If we use the backward Euler method, however, we have
y^{n+1} = y^n + Δt y^{n+1} cos y^{n+1} ,
y^{n+1} (1 - Δt cos y^{n+1}) = y^n .  (12)
Now, instead of having the solution of y^1 in terms of y^0, we have a horrendous nonlinear
equation for y^1 that must be solved using a nonlinear equation solver, such as Newton's
method. Clearly, then, in this case, the explicit method is much faster than the implicit
method because we do not have to iterate at every time step to find the solution. The next
section shows the advantages of using implicit methods.
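To make the difference concrete, here is a Python sketch (my own illustration) that advances dy/dt = y cos y with both methods; the backward Euler step solves the nonlinear equation (12) at every time step with a few Newton iterations.

    import math

    def forward_euler_step(y, dt):
        return y * (1.0 + dt * math.cos(y))            # equation (11)

    def backward_euler_step(y_old, dt, newton_iters=10):
        y = y_old                                      # initial guess for y^{n+1}
        for _ in range(newton_iters):
            g  = y - y_old - dt * y * math.cos(y)      # residual of equation (12)
            dg = 1.0 - dt * (math.cos(y) - y * math.sin(y))
            y -= g / dg                                # Newton update
        return y

    dt, y_fe, y_be = 0.1, 1.0, 1.0
    for n in range(10):
        y_fe = forward_euler_step(y_fe, dt)
        y_be = backward_euler_step(y_be, dt)
    print(y_fe, y_be)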
The linearized ODE
In the preceding example we saw how the forward Euler method was much easier and faster
to use than the backward Euler method. Any time something seems too good to be true
in numerical methods, it really is too good to be true, which leads us to the first law of
numerical methods: there is no free lunch! The problem with the forward Euler method,
despite its simplicity, is that it can be unstable, while the implicit backward Euler method
is unconditionally stable.
In order to study the stability of numerical methods for ODEs, we first need a model
equation that we can use to apply each method to and analyze its stability properties. This
model equation is the linear ODE
dy/dt = -λ y ,  (13)
where λ is some characteristic value of the ODE that arises from assuming that the ODE
behaves in this linear manner. We need to do this because we would like to analyze the
linear stability characteristics of numerical methods applied to all ODEs in general. Take,
for example, the ODE used in the previous example,
dy/dt = y cos y .  (14)
In order to analyze the stability properties of this nonlinear ODE, we need to linearize it.
When we linearize an ODE, we analyze its behavior in the vicinity of some point (t_0, y_0) to
determine its stability properties. To analyze the behavior of an ODE in the vicinity of y_0
and t_0, we make the substitution y = y_0 + y' and t = t_0 + t', and assume that y' = y - y_0 and
t' = t - t_0 represent very small quantities. Substituting these values into equation (14), we
have
dy/dt = (dt'/dt) d(y_0 + y')/dt' = (y_0 + y') cos(y_0 + y') .  (15)
In order to linearize this, we need to use the Taylor series approximation of the cosine
function,
cos(y_0 + y') = cos(y_0) - y' sin(y_0) + O((y')^2) .  (16)
Substitution into equation (15) yields
dy'/dt' - (cos y_0 - y_0 sin y_0) y' = y_0 cos y_0 + O((y')^2) .  (17)
If we assume that y' is very small, then the second order term is negligible, and we have
dy'/dt' - (cos y_0 - y_0 sin y_0) y' = y_0 cos y_0 ,  (18)
which is a linear inhomogeneous ODE in terms of y' and t' that represents the behavior of
the original nonlinear ODE in equation (14) in the vicinity of (t_0, y_0). If we substitute back
in the values for y' = y - y_0 and t' = t - t_0 we have
dy/dt - (cos y_0 - y_0 sin y_0) y = y_0^2 sin y_0 .  (19)
If we split the linearized solution into its homogeneous and particular parts with y = y_h + y_p,
then the homogeneous solution satisfies
dy_h/dt = -λ y_h ,  (20)
where λ = -(cos y_0 - y_0 sin y_0). If we analyze the stability properties of this linearized ODE,
then we can apply that analysis to the nonlinear problem by seeing whether it remains stable
at all values of t_0 and y_0.
Handout 9 15/08/02 4
Stability
If we apply the forward Euler method to the model linearized ODE

    dy/dt = -λy ,   (21)

then we have

    y_{n+1} = y_n - λh y_n
            = y_n (1 - λh) ,   (22)

where h = Δt. If we write the amplification factor at each time step as

    G_n = | y_{n+1} / y_n | ,   (23)

then, for the forward Euler method, we have

    G_n = |1 - λh| ,   (24)

where the vertical bars denote the modulus, to account for the possibility that λ may not
necessarily be real. If the amplification factor is less than 1, then we are guaranteed that the
solution will not grow without bound, and hence it will be stable. If we assume that λ is
real, then for stability we must have

    -1 < 1 - λh < +1 ,   (25)

which implies that, for stability, 0 < λh < 2, if λ is real. This translates to a time step
restriction for stability, for which 0 < Δt < 2/λ.

Now consider the backward Euler method applied to the model linearized ODE. This
yields

    y_{n+1} = y_n - λh y_{n+1} ,
    (1 + λh) y_{n+1} = y_n ,   (26)

and the amplification factor is given by

    G_n = | 1 / (1 + λh) | .   (27)

If λ is real, then we must only have λh > 0, or Δt > 0. The backward Euler method is hence
stable in the linear sense for all Δt! While it may be more expensive to use the implicit
method, as in the example discretization of equation (14), it is guaranteed to be stable.

The greatest drawback of the Euler methods is that they are only first order accurate. In the
next sections, we derive more accurate methods to solve ODEs.
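As a quick check of these results, the following Octave/Matlab sketch (not part of the
original handout) integrates the model equation dy/dt = -λy with both Euler methods; the
values chosen for lambda and dt are illustrative and deliberately violate the forward Euler
limit λΔt < 2.

    % Illustrative sketch: forward vs. backward Euler on dy/dt = -lambda*y.
    lambda = 10; dt = 0.25;        % lambda*dt = 2.5 > 2, so forward Euler is unstable
    N = 40; y_fe = zeros(1,N+1); y_be = zeros(1,N+1);
    y_fe(1) = 1; y_be(1) = 1;      % both methods start from y(0) = 1
    for n = 1:N
      y_fe(n+1) = y_fe(n)*(1 - lambda*dt);   % forward Euler: G = |1 - lambda*dt| > 1
      y_be(n+1) = y_be(n)/(1 + lambda*dt);   % backward Euler: G = |1/(1 + lambda*dt)| < 1
    end
    % y_fe oscillates and grows without bound, while y_be decays monotonically to zero.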
Euler predictor-corrector method
The improved Euler method is derived by integrating the model ODE from t_n to t_{n+1} to
obtain

    ∫_{t_n}^{t_{n+1}} (dy/dt) dt = ∫_{t_n}^{t_{n+1}} f(y) dt .   (28)

Using the trapezoidal rule, we can approximate the above integral to third order accuracy
with

    y_{n+1} - y_n = (Δt/2)(f_n + f_{n+1}) + O(Δt³) ,   (29)

to obtain the second order accurate approximation to the model ODE as

    (y_{n+1} - y_n)/Δt = (1/2)(f_n + f_{n+1}) + O(Δt²) .   (30)

As it is, this method is an implicit method because we need information at time step n + 1
in order to evaluate the right hand side. Instead of using f_{n+1}, we will use a predicted value,
f* = f(y*), where y* is obtained with the forward Euler predictor step

    y* = y_n + Δt f_n .   (31)

The Euler predictor-corrector method is then given in two steps:

    Predictor:  y* = y_n + Δt f_n + O(Δt²) ,
    Corrector:  y_{n+1} = y_n + (Δt/2)(f_n + f*) + O(Δt³) .   (32)

This method is second order accurate, since y* approximates y_{n+1} to second order accuracy.
Substituting y* = y_{n+1} + O(Δt²) into f* yields

    f* = f(y*) = f(y_{n+1} + O(Δt²)) = f(y_{n+1}) + O(Δt²) .

Substituting this result into the corrector yields

    (y_{n+1} - y_n)/Δt = (1/2)(f_n + f_{n+1}) + O(Δt²) ,   (33)

which is identical in accuracy to equation (30).
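A minimal Octave/Matlab sketch of the two-step scheme (32) is given below; it is not part
of the original handout, and the right hand side, time step, and number of steps are
illustrative choices.

    % Euler predictor-corrector (improved Euler) for dy/dt = f(t,y).
    f  = @(t,y) -y.*cos(y);          % example right hand side, as in equation (14)
    dt = 0.1; N = 100;
    t  = (0:N)*dt; y = zeros(1,N+1); y(1) = 1;
    for n = 1:N
      fn     = f(t(n), y(n));
      ystar  = y(n) + dt*fn;                          % predictor (forward Euler)
      y(n+1) = y(n) + dt/2*(fn + f(t(n+1), ystar));   % corrector (trapezoidal rule)
    end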
Runge-Kutta methods
The Runge-Kutta methods are the most popular methods of solving ODEs numerically.
They can be derived for any order of accuracy, but we will derive the second order method
first. The second order Runge-Kutta method is derived by taking two steps to get from n
to n + 1 with

    y_{n+1} = y_n + a k_1 + b k_2 ,
    k_1 = h f(t_n, y_n) ,
    k_2 = h f(t_n + αh, y_n + βk_1) ,   (34)

where h = Δt is the time step. In order to determine what the constants a, b, α, and β
are, we must use the Taylor series to match the terms and make the method second order
accurate. By substituting in for k_1 and k_2, we have

    y_{n+1} = y_n + ah f(t_n, y_n) + bh f[t_n + αh, y_n + βh f(t_n, y_n)] .   (35)

In order to expand the third term in equation (35), we need to use the Taylor series expansion
of a function of more than one variable, which is given by

    f(t + Δt, y + Δy) = f(t, y) + Δt ∂f/∂t + Δy ∂f/∂y + O(Δt Δy) ,   (36)

which, when applied to the third term in equation (35), results in

    f[t_n + αh, y_n + βh f(t_n, y_n)] = f + αh ∂f/∂t + βh f ∂f/∂y ,   (37)

where all functions and derivatives are evaluated at time step n, and we have left off the
truncation error. Substituting this into equation (35) results in

    y_{n+1} = y_n + h(a + b) f + αbh² ∂f/∂t + βbh² f ∂f/∂y .   (38)

Since y is only dependent on the variable t, the Taylor series expansion about y_{n+1} is
given by the ordinary derivatives with

    y_{n+1} = y_n + h dy/dt + (h²/2) d²y/dt² + O(h³) .   (39)

But since the ODE we are trying to solve is given by

    dy/dt = f ,   (40)

then we know that

    d²y/dt² = df/dt ,   (41)

so equation (39) becomes

    y_{n+1} = y_n + hf + (h²/2) df/dt ,   (42)

where we have left off the truncation error. Since, from the chain rule, if f is a function of t
and y, then

    df = (∂f/∂t) dt + (∂f/∂y) dy ,   (43)

then

    df/dt = ∂f/∂t + (∂f/∂y)(dy/dt) = ∂f/∂t + f ∂f/∂y .   (44)

Substitution into equation (42) yields

    y_{n+1} = y_n + hf + (h²/2) ∂f/∂t + (h²/2) f ∂f/∂y .   (45)
Comparing this equation to equation (38),

    y_{n+1} = y_n + hf + (h²/2) ∂f/∂t + (h²/2) f ∂f/∂y ,
    y_{n+1} = y_n + h(a + b) f + αbh² ∂f/∂t + βbh² f ∂f/∂y ,   (46)

in order for the terms to match, we must have

    a + b = 1 ,
    αb = 1/2 ,
    βb = 1/2 .   (47)

This is a system of three equations in four unknowns. Therefore, we are free to choose one
independently and the others will then be determined, and the method will still be a second
order method. If we let a = 1/2, then the other parameters must be b = 1/2, α = 1, and
β = 1, so that the second order Runge-Kutta method is given by

    y_{n+1} = y_n + (1/2) k_1 + (1/2) k_2 ,
    k_1 = h f(t_n, y_n) ,
    k_2 = h f(t_n + h, y_n + h f(t_n, y_n)) ,   (48)

which is just the Euler predictor-corrector scheme, since k_2 = h f*, and substitution results
in

    y_{n+1} = y_n + (h/2)(f_n + f*) .   (49)

Higher order Runge-Kutta methods can be derived using the same technique. The most
popular method is the fourth order Runge-Kutta method, or RK4 method, which is given
by

    y_{n+1} = y_n + (1/6)(k_1 + 2k_2 + 2k_3 + k_4) ,
    k_1 = h f(t_n, y_n) ,
    k_2 = h f(t_n + h/2, y_n + (1/2) k_1) ,
    k_3 = h f(t_n + h/2, y_n + (1/2) k_2) ,
    k_4 = h f(t_n + h, y_n + k_3) .

Although this method is a fourth order accurate approximation to the model ODE, it requires
four function evaluations at each time step. Again, there is never any free lunch!
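The RK4 update above translates directly into code. The following Octave/Matlab function
is a sketch that is not part of the original handout; the function name rk4_step is an
illustrative choice.

    % rk4_step.m -- one step of the fourth order Runge-Kutta method for dy/dt = f(t,y).
    function ynew = rk4_step(f, t, y, h)
      k1 = h*f(t, y);
      k2 = h*f(t + h/2, y + k1/2);
      k3 = h*f(t + h/2, y + k2/2);
      k4 = h*f(t + h,   y + k3);
      ynew = y + (k1 + 2*k2 + 2*k3 + k4)/6;
    end
    % usage example: y(n+1) = rk4_step(@(t,y) -y.*cos(y), t(n), y(n), dt)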
Lecture 9: Numerical solution of boundary value problems

Initial vs. boundary value problems

In lectures 7 and 8 we discussed numerical solution techniques for initial value problems.
Those concerned solutions of ordinary differential equations of the form

    dy/dt = f(t, y) ,   (1)

where initial conditions were imposed at the same location, most likely t = 0 in time, of
the form

    y(0) = y_0 .   (2)

That is, every initial value of the elements of y is specified at the same location in time.
An example of an initial value problem is given by the second order ODE

    d²y/dt² + g = 0 ,   (3)

with initial conditions y(0) = y_0 and y'(0) = 0. This is written in vector form as

    dy/dt + f(t, y) = 0 ,   (4)

where

    y = [y_1, y_2]ᵀ = [y, y']ᵀ ,   (5)

and

    f(t, y) = [-y_2, g]ᵀ ,   (6)

with initial conditions

    y(0) = [y_0, 0]ᵀ .   (7)

The difference between initial and boundary value problems is that rather than initial
conditions being imposed at the same point in the independent variable (in this case, t),
boundary conditions are imposed at different values of the independent variable. As an
example of a boundary value problem, consider the second order ODE

    d²y/dx² + λ²y = 0 ,   (8)

with boundary conditions given by y(0) = 0 and y(1) = 1. This problem cannot be solved
using the methods we learned for the initial value problems because the two conditions
imposed on the problem are not at coincident locations of the independent variable x.
Boundary condition types
Dirichlet condition (Value specified)

When the value is specified at a particular location of the independent variable, this is known
as a Dirichlet boundary condition. Examples of a Dirichlet boundary condition are given by

    y(0) = a ,   (9)

or

    y(b) = 2 .   (10)
Neumann condition (Derivative specified)

If the derivative is specified, then this is known as a Neumann boundary condition. Examples
of Neumann conditions are given by

    y'(0) = 1 ,   (11)

and

    y'(a) = b .   (12)

Mixed condition (Gradient + value)

When the boundary condition specifies an equation that involves both a value and the
derivative, it is known as a mixed condition. Examples are given by

    y'(a) + y(a) = 0 ,   (13)

and

    y'(0) = 2y(0) .   (14)
The shooting method
The shooting method uses the methods developed for solving initial value problems to solve
boundary value problems. The idea is to write the boundary value problem in vector form
and begin the solution at one end of the boundary value problem, and shoot to the other
end with an initial value solver until the boundary condition at the other end converges to
its correct value.
The vector form of the boundary value problem is written in the same way as it was for
the initial value problems, except that not all of the initial conditions are known a priori.
As an example, take the boundary value problem

    d²y/dx² + λ²y = 0 ,   (15)

with boundary conditions y(0) = 0 and y(1) = 1. In vector form, this is given by

    dy/dx + f(x, y) = 0 ,   (16)

where

    y = [y_1, y_2]ᵀ = [y, y_x]ᵀ ,   (17)

and

    f(x, y) = [-y_2, λ²y_1]ᵀ .   (18)

All of the elements of the boundary condition vectors are not known initially, because certain
components will depend on the solution of the problem. Since we are only given y(0) and
y(1), the boundary condition vectors are given by

    y(0) = [0, ?]ᵀ ,   y(1) = [1, ?]ᵀ .   (19)

We leave question marks in place of the unknown boundary conditions because they will only
be known when we actually solve the problem. In this case, we will only know the values of
y'(0) and y'(1) when we have the solution to the boundary value problem (15).

As another example, suppose we want to express the boundary value problem

    y_xxxx + a y_xx = 0 ,   (20)

with boundary conditions y(0) = 0, y'(0) = 1, y(1) = 0, and y'(1) = 1, in vector form.
Because this is a fourth order ODE, we know that it has four elements in the y vector, and
as a result, it has the four given boundary conditions. The y vector is given by

    y = [y_1, y_2, y_3, y_4]ᵀ = [y, y_x, y_xx, y_xxx]ᵀ ,   (21)

and the boundary value problem is given by

    dy/dx + f(x, y) = 0 ,   (22)

where

    f(x, y) = [-y_2, -y_3, -y_4, a y_3]ᵀ ,   (23)

with boundary conditions

    y(0) = [0, 1, ?, ?]ᵀ ,   y(1) = [0, 1, ?, ?]ᵀ .   (24)
Because we are only given four boundary conditions, the other values of the derivatives at
the boundary are determined after a solution of the problem is found.
The best way to illustrate the shooting method is with an example.
An example of the shooting method

Find the solution of the boundary value problem

    d²y/dx² - y = 0 ,   (25)

with boundary conditions y(0) = 0 and y'(1) = 1.

1: Write the BVP in vector form

In order to solve this problem numerically, we write it in its vector form as

    dy/dx + f(x, y) = 0 ,   (26)

where

    y = [y_1, y_2]ᵀ = [y, y_x]ᵀ ,   (27)

and

    f(x, y) = [-y_2, -y_1]ᵀ ,   (28)

with boundary conditions

    y(0) = [0, ?]ᵀ ,   y(1) = [?, 1]ᵀ .   (29)

2: Discretize

The problem is first discretized into N points, the number of which depends on the desired
accuracy of the solution. We will use N = 20 for this example and assume that this yields
a converged result. The independent variable x is discretized with x_i = (i - 1)Δx, with
Δx = L/(N - 1), where L = 1 is the size of the domain. Sometimes we might need to use a
non-equispaced grid if the terms in the boundary value problem vary considerably in some
regions of the domain in which we are solving the problem. Since this problem is linear and
behaves smoothly, we do not need to worry about this.

3: Choose an integrator

For this problem we will use the Euler predictor-corrector algorithm, which will give us values
for y_1 and y_2 in the domain if we give it starting values y_1(0) and y_2(0). But only y_1(0) is
specified, so we need to iterate to determine y_2(0).

4: Iterate to find the solution

This is the trickiest part of the problem. Because the only boundary condition at x = 0 is
y_1(0) = 0, we need to guess the value of y_2(0) and use the predictor-corrector algorithm
to shoot to the other end of the domain and see if this guess satisfies the boundary condition
y_2(1) = 1. Let's say we guess a value of y_2(0) = 1. The predictor-corrector algorithm will
yield the result shown in Figure 1. Because y_2(1) = 1.542566 does not match the correct
value of y_2(1) = 1 (which is specified as a boundary condition), we need to try again.

Figure 1: Results of the shooting method with a guess of y_2(0) = 1, which yields
y_2(1) = 1.542566.

Let's try y_2(0) = -1. Using this guess, the predictor-corrector shooting method yields the
result shown in Figure 2. Again, this is the incorrect answer, since a guess of y_2(0) = -1
yields y_2(1) = -1.542566.

Figure 2: Results of the shooting method with a guess of y_2(0) = -1, which yields
y_2(1) = -1.542566.

The shooting method gives us a value for y_2(1) when we are given a value for y_2(0). That
is, if we guess the slope y_x at x = 0, then the shooting method will give us a value for the
slope y_x at x = 1, which is specified as a boundary condition in the problem as y'(1) = 1.
To solve the boundary value problem, we need to iterate with different values of y_2(0) until
we converge upon the correct value of y_2(1). This can be done with a root-finder such as the
bisection method, the secant method, or linear interpolation. The table below depicts the
results of the two previous guesses we used to solve the initial value problem.

    Guess number    Guess for y_2(0)    Result of shooting method y_2(1)
    1               1.0                 1.542566
    2               -1.0                -1.542566

We can use the secant method to find a good value for the next guess. If we let s be the
guess for y_2(0) and E(s) = y_2(1) - y'(1) be the error in the result of the shooting method,
then we need to use the secant method to find the root of

    E(s) = 0 .   (30)

This is done by using the formula for the secant method, which is given by

    s_3 = s_2 - E(s_2) (s_1 - s_2) / (E(s_1) - E(s_2)) .   (31)

Using the results from the table above, we have

    E(s_1) = 1.542566 - 1 = 0.542566 ,
    E(s_2) = -1.542566 - 1 = -2.542566 ,

and

    s_3 = -1.0 - (-2.542566) (1.0 - (-1.0)) / (0.542566 - (-2.542566)) = 0.648270 .   (32)

If we use y_2(0) = 0.648270, then the result is shown in Figure 3. As shown in the
figure, when we use a guess of y_2(0) = 0.648270, we end up with a slope at x = 1 of
y_2(1) = 0.999999, which is the exact value (or close enough)! The result in Figure 3 is
therefore the solution of the boundary value problem, which is y = sinh(x)/cosh(1). From
this we can see that the shooting method only requires us to shoot for the result three
times for linear boundary value problems. Two guesses are required, and then a linear
interpolation yields the solution to within the errors of the method used to integrate the
ODE. In this case, since the Euler predictor-corrector method is second-order accurate in
Δx, we know that we must have the solution to the boundary value problem to within
O(Δx²).

Only three steps are required to find the solution for linear problems, and the accuracy of
the result is governed by the accuracy of the shooting method used. For nonlinear problems,
however, more iterations are required, and one must continue to iterate until the residual
error in the root of E(s) is below some specified tolerance. If the tolerance is less than the
error of the shooting method, then the error in the solution of the boundary value problem
will be governed by the shooting method.
Figure 3: Results of the shooting method with a guess of y_2(0) = 0.648270, which yields
y_2(1) = 0.999999.
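The entire procedure fits in a few lines of Octave/Matlab. The sketch below is not part of
the original handout; it integrates with the Euler predictor-corrector, shoots with the two
guesses from the table, and applies one secant update. Note that here f denotes the right
hand side of dy/dx = f(y), i.e. the negative of the f in equation (26).

    % Shooting method for y'' - y = 0 with y(0) = 0 and y'(1) = 1.
    N = 20; dx = 1/(N-1);
    f = @(y) [y(2); y(1)];                 % y = [y; y_x], so dy/dx = [y_x; y]
    guesses = [1, -1]; yend = zeros(1,2);
    for k = 1:2
      y = [0; guesses(k)];                 % y(0) = 0 and the guessed slope y_2(0)
      for n = 1:N-1
        ystar = y + dx*f(y);                   % predictor
        y     = y + dx/2*(f(y) + f(ystar));    % corrector
      end
      yend(k) = y(2);                      % y_2(1) produced by this guess
    end
    E  = yend - 1;                                                   % E(s) = y_2(1) - 1
    s3 = guesses(2) - E(2)*(guesses(1) - guesses(2))/(E(1) - E(2));  % secant update (31)
    % s3 is close to 0.648, and shooting once more with it gives y_2(1) close to 1.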
The finite-difference method

The boundary value problem is given by

    d²y/dx² - y = 0 ,   (33)

with boundary conditions y(0) = 0 and y'(1) = 1. In order to solve this boundary value
problem with the finite difference method, the following steps should be taken.

1: Discretize x

The discretization of the boundary value problem for the finite-difference method is done
differently than for the shooting method. In order to guarantee second order accuracy of the
Neumann (derivative) boundary condition at x = 1 (and lead to a tridiagonal system as in
step 5), the grid must be staggered about that boundary. That is, the x values must lie
on either side of the point x = 1. In order to stagger the grid, a discretization of x with N
points must be given by

    x_i = (i - 3/2) Δx ,   (Neumann boundary conditions)   (34)

with Δx = 1/(N - 2). This is the discretization we will use for the current problem, since
it has a Neumann boundary condition.

As an aside, if the problem only consists of Dirichlet boundary conditions, then it is better
to collocate the x values with the boundaries. In this case it is best to use the discretization

    x_i = (i - 1) Δx ,   (Dirichlet boundary conditions)   (35)

with Δx = 1/(N - 1).
2: Discretize the governing ODE

The governing ODE for this problem can be discretized by rewriting it as a finite difference
equation at each point x_i, for which

    (d²y/dx²)_i - y_i = 0 ,   i = {2, ..., N-1} .   (36)

The second order accurate finite difference approximation is then given by

    (y_{i-1} - 2y_i + y_{i+1})/Δx² - y_i = 0 ,   (37)

which can be rewritten as

    a_i y_{i-1} + b_i y_i + c_i y_{i+1} = d_i ,   (38)

where

    a_i = 1/Δx² ,
    b_i = -(1 + 2/Δx²) ,
    c_i = 1/Δx² ,
    d_i = 0 .

We have neglected the discretization error, keeping in mind that the discretization is second
order accurate in Δx, and we will assume that d_i ≠ 0 and that the coefficients are not
constant with i in order to be as general as possible. These equations are only valid for
i ∈ {2, ..., N-1}, since the discrete second derivative is not defined at i = 1 or i = N as we
have written it.

3: Discretize the boundary conditions

Just as the governing ODE is discretized, so must the boundary conditions be. The boundary
condition at x = 0 is given by y(0) = 0. Because the grid we are using is staggered, we
do not have values at x = 0, but rather, we have values at x_1 = -Δx/2 and x_2 = +Δx/2.
Therefore, the value at x = 0 must be interpolated with the values y_1 and y_2. This is
given by a centered interpolation to obtain y_{3/2} as

    y_{3/2} = (y_1 + y_2)/2 + O(Δx²) = 0 .   (39)

Solving for y_1 and neglecting the discretization error, we have

    y_1 = -y_2 .   (40)

The boundary condition at x = 1 is discretized by writing the second-order accurate
approximation for the first derivative at x = 1 to obtain

    (dy/dx)_{i=N-1/2} = (y_N - y_{N-1})/Δx + O(Δx²) = 1 .   (41)

Leaving out the discretization error, we have

    y_N = y_{N-1} + Δx .   (42)
4: Embed the boundary conditions

The discretized ODE (38) is only valid for i ∈ {2, ..., N-1}. Therefore, it can only be used
to solve for points in that range. Any terms in the discretized ODE that contain points not
in that range are removed by embedding the boundary conditions. If we write the discretized
ODE at i = 2 and i = N-1, we have

    a_2 y_1 + b_2 y_2 + c_2 y_3 = d_2 ,
    a_{N-1} y_{N-2} + b_{N-1} y_{N-1} + c_{N-1} y_N = d_{N-1} .   (43)

From the boundary conditions, we know that

    y_1 = -y_2 ,
    y_N = y_{N-1} + Δx .

Substituting the boundary conditions into equations (43), we have

    (b_2 - a_2) y_2 + c_2 y_3 = d_2 ,
    a_{N-1} y_{N-2} + (b_{N-1} + c_{N-1}) y_{N-1} = d_{N-1} - c_{N-1} Δx .   (44)

5: Set up the linear system

The discretized set of ODEs that governs the behavior of y_i, where i ∈ {2, ..., N-1}, is then
given by

    i = 2:               (b_2 - a_2) y_2 + c_2 y_3 = d_2 ,
    i = {3, ..., N-2}:   a_i y_{i-1} + b_i y_i + c_i y_{i+1} = d_i ,
    i = N-1:             a_{N-1} y_{N-2} + (b_{N-1} + c_{N-1}) y_{N-1} = d_{N-1} - c_{N-1} Δx .

This represents a linear system of the form

    [ b_2   c_2                                       ] [ y_2     ]   [ d_2     ]
    [ a_3   b_3   c_3                                 ] [ y_3     ]   [ d_3     ]
    [       a_4   b_4   c_4                           ] [ y_4     ]   [ d_4     ]
    [            ...   ...   ...                      ] [  ...    ] = [  ...    ]
    [          a_{N-3}  b_{N-3}  c_{N-3}              ] [ y_{N-3} ]   [ d_{N-3} ]
    [                   a_{N-2}  b_{N-2}  c_{N-2}     ] [ y_{N-2} ]   [ d_{N-2} ]
    [                            a_{N-1}  b_{N-1}     ] [ y_{N-1} ]   [ d_{N-1} ]

where we have performed the replacements

    b_2 → b_2 - a_2 ,
    b_{N-1} → b_{N-1} + c_{N-1} ,
    d_{N-1} → d_{N-1} - c_{N-1} Δx .   (45)
6: Solve the linear system

The linear system derived in the previous step can be represented as

    A y = d .   (46)

The objective is to now solve the system with

    y = A⁻¹ d .   (47)

We can usually take advantage of the structure of A in order to speed up the calculation
of its inverse. In this case it turns out that A is a tridiagonal matrix. That is, it has three
diagonals, and as a result, it can be solved with the use of a tridiagonal solver.

The solution y then represents the solution of the boundary value problem we initially
set out to solve. Due to the accumulation of errors in the tridiagonal solver, this method
turns out to be first-order accurate in Δx, as opposed to the second-order accurate shooting
method with the use of the Euler predictor-corrector method.
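The following Octave/Matlab sketch assembles and solves the system above for this problem;
it is not part of the original handout, N is an illustrative choice, and the backslash operator
stands in for a dedicated tridiagonal solver.

    % Finite-difference solution of y'' - y = 0, y(0) = 0, y'(1) = 1 on the staggered grid (34).
    N  = 22; dx = 1/(N-2);
    a  = 1/dx^2;  b = -(1 + 2/dx^2);  c = 1/dx^2;    % constant coefficients for this problem
    M  = N - 2;                                      % unknowns y_2, ..., y_{N-1}
    A  = diag(b*ones(M,1)) + diag(a*ones(M-1,1),-1) + diag(c*ones(M-1,1),1);
    d  = zeros(M,1);
    A(1,1) = b - a;                                  % embed y_1 = -y_2
    A(M,M) = b + c;  d(M) = d(M) - c*dx;             % embed y_N = y_{N-1} + dx
    y  = A\d;                                        % interior values y_2, ..., y_{N-1}
    x  = ((2:N-1) - 3/2)*dx;                         % staggered grid points
    % y should be close to the exact solution sinh(x)/cosh(1).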
Lecture 10: Numerical solution of characteristic-value problems

A characteristic-value problem

In lecture 9 we covered two methods for solving boundary value problems. Those concerned
the solution of ODEs with conditions imposed at different values of the independent variable.
Now we will cover the numerical solution of characteristic-value problems. Characteristic-value
problems are a subset of boundary value problems because the governing ODE is a boundary
value problem, except that in characteristic-value problems we are also concerned with finding
the characteristic value, or eigenvalue, that governs the behavior of the boundary value
problem.

Consider the boundary value problem

    d²y/dx² + k²y = 0 ,   (1)

with boundary conditions y(0) = 0 and y(1) = 0. The value of k is not known a priori, and
it represents the characteristic value, or eigenvalue, of the problem. This characteristic value
problem has an analytical solution, which is given by

    y(x) = a cos(kx) + b sin(kx) .   (2)

Substituting in the boundary conditions, we have

    y(0) = a = 0 ,
    y(1) = a cos(k) + b sin(k) = 0 .

The trivial solution is a = b = 0, but this is not a very useful result. A more useful result is
to set

    sin(k) = 0 ,   (3)

which is satisfied when

    k = nπ ,   n = 1, 2, ... .   (4)

The general solution to the problem is then given by

    y(x) = b sin(nπx) ,   n = 1, 2, ... .   (5)

These solutions are known as the eigenfunctions of the boundary value problem and k is
known as the eigenvalue. It is common to write the solution as

    y_n(x) = b sin(k_n x) ,   k_n = nπ ,   n = 1, 2, ... ,   (6)

where y_n(x) is referred to as the nth eigenfunction with corresponding eigenvalue k_n.

In this lecture we will be concerned with solution methods for finding the eigenvalue of
the characteristic-value problem.
The characteristic-value problem in matrix form

Just as we did in lecture 9, we will discretize the characteristic-value problem using the
second order finite difference representation of the second derivative to yield

    (y_{i-1} - 2y_i + y_{i+1})/Δx² + k²y_i = 0 ,   (7)

with y_1 = 0 and y_N = 0 as the boundary conditions. If we discretize the domain with N = 5
points, then we have Δx = 1/(N - 1) = 0.25, and the equations for y_i, i = 2, ..., 4, are given
by

    16y_1 - 32y_2 + 16y_3 + k²y_2 = 0 ,
    16y_2 - 32y_3 + 16y_4 + k²y_3 = 0 ,
    16y_3 - 32y_4 + 16y_5 + k²y_4 = 0 ,

but since we know that y_1 = 0 and y_5 = 0, we have

    -32y_2 + 16y_3 + k²y_2 = 0 ,
    16y_2 - 32y_3 + 16y_4 + k²y_3 = 0 ,
    16y_3 - 32y_4 + k²y_4 = 0 .

This can be written in matrix-vector form as

    [  32 -16   0 ] [ y_2 ]       [ y_2 ]   [ 0 ]
    [ -16  32 -16 ] [ y_3 ]  - λ  [ y_3 ] = [ 0 ] ,   (8)
    [   0 -16  32 ] [ y_4 ]       [ y_4 ]   [ 0 ]

where λ = k². If we let

    A = [  32 -16   0 ]
        [ -16  32 -16 ]
        [   0 -16  32 ] ,   (9)

then the problem can be written in the form

    (A - λI) y = 0 ,   (10)

where

    y = [y_2, y_3, y_4]ᵀ ,   (11)

and the identity matrix is given by

    I = [ 1 0 0 ]
        [ 0 1 0 ]
        [ 0 0 1 ] .   (12)

Equation (10) is known as an eigenvalue problem, in which we must find the values of λ
that satisfy the nontrivial solution for which y ≠ 0. The nontrivial solution is determined
by finding the eigenvalues of A, which are given by a solution of

    det(A - λI) = 0 .   (13)
The eigenvalues are then given by a solution of the third-order polynomial

    (32 - λ) [ (32 - λ)² - 512 ] = 0 ,   (14)

whose roots are given by

    λ_1 = 9.37 ,
    λ_2 = 32.00 ,
    λ_3 = 54.63 ,

or, in terms of k = λ^(1/2),

    k_1 = 3.06 ,
    k_2 = 5.66 ,
    k_3 = 7.39 .

These are close to the analytical values of

    k_1 = π = 3.14 ,
    k_2 = 2π = 6.28 ,
    k_3 = 3π = 9.42 ,

but because we only discretized the problem with N = 5 points, the errors are considerably
large. The errors can be reduced by using more points, but as N gets very large, it becomes
much too difficult to solve for the eigenvalues. Therefore, we need to come up with other
methods to solve for the eigenvalues and eigenvectors of A that are more efficient.
The power method

The power method computes the largest eigenvalue and its corresponding eigenvector in the
following manner:

Pseudocode for the power method

1. Choose a starting vector x = [1, 1, 1]ᵀ. Set the starting eigenvalue as λ = 1.

2. Approximate the eigenvector with x ← Ax. The approximate eigenvalue is λ* = max(x).

3. Normalize with x ← x/λ*.

4. If |λ - λ*| < ε then finished: x is the eigenvector and λ* is its corresponding eigenvalue.
   Otherwise set λ = λ* and return to step 2.
As an example, consider the matrix

    A = [  32 -16   0 ]
        [ -16  32 -16 ]
        [   0 -16  32 ] .   (15)

If we start with x = [1, 1, 1]ᵀ, then the first iteration will give us

    x = Ax = [  32 -16   0 ] [ 1 ]   [ 16 ]
             [ -16  32 -16 ] [ 1 ] = [  0 ] ,   (16)
             [   0 -16  32 ] [ 1 ]   [ 16 ]

which gives us our first approximate eigenvalue as λ* = 16, and normalizes the eigenvector
as x = [1, 0, 1]ᵀ. Iterating again with this eigenvector, the second guess for the eigenvector
is given by

    x = Ax = [  32 -16   0 ] [ 1 ]   [  32 ]
             [ -16  32 -16 ] [ 0 ] = [ -32 ] ,   (17)
             [   0 -16  32 ] [ 1 ]   [  32 ]

which gives us our second guess of the eigenvalue as λ* = 32, and normalizes the eigenvector
as x = [1, -1, 1]ᵀ. Subsequent iterations yield eigenvectors of

    x = [  48.00 ]   [  53.33 ]   [  54.40 ]
        [ -64.00 ] , [ -74.67 ] , [ -76.80 ] , ... ,   (18)
        [  48.00 ]   [  53.33 ]   [  54.40 ]

and we see that we are converging upon the correct eigenvalue of λ = 54.63 and the
corresponding eigenvector x = [1, -1.4142, 1]ᵀ.
We can also find the smallest eigenvalue of A by using the power method with the inverse
A⁻¹. The advantage of the power method is that it is very simple and easy to program.
However, it has poor convergence characteristics and does not behave well for matrices with
repeated eigenvalues. There are a host of other methods available to compute the eigenvalues
of a matrix, but they are beyond the scope of this course. Both Octave and Matlab can be
used to compute the eigenvalues of a matrix with
>> [v,d]=eig(A)
where d is a diagonal matrix containing the eigenvalues of A and v is a matrix whose columns
are the eigenvectors of A.
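The pseudocode above translates into a few lines of Octave/Matlab. The sketch below is
not part of the original handout; the tolerance and the iteration cap are illustrative choices.

    % Power method applied to the matrix A of equation (15).
    A = [32 -16 0; -16 32 -16; 0 -16 32];
    x = [1; 1; 1]; lambda = 1; tol = 1e-8;
    for it = 1:200
      x = A*x;                      % step 2: apply A
      lambda_new = max(x);          % approximate eigenvalue
      x = x/lambda_new;             % step 3: normalize
      if abs(lambda_new - lambda) < tol, break, end    % step 4: convergence test
      lambda = lambda_new;
    end
    % lambda approaches 54.63 and x approaches [1; -1.4142; 1]; compare with [v,d] = eig(A).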
Lecture 11: Numerical solution of stochastic differential equations I

Continuous random variables

In terms of its probability density function p(x), a continuous random variable X will lie in
the range [a, b] with a probability given by

    P(a < X ≤ b) = ∫_a^b p(x) dx ,   (1)

where, by definition,

    ∫_{-∞}^{+∞} p(x) dx = 1 .   (2)

In terms of its cumulative distribution function F(x), the random variable X will lie in some
range [a, b] with a probability given by

    P(a < X ≤ b) = F(b) - F(a) .   (3)

The probability density function and the cumulative distribution function of a random
variable X are related with

    p(x) = dF(x)/dx .   (4)

From this we can see that

    ∫_{-∞}^{a} p(x) dx = ∫_{-∞}^{a} (dF/dx) dx = F(a) - lim_{b→-∞} F(b) ,   (5)

and, since the limit is zero, we can define the cumulative distribution function as

    F(a) = ∫_{-∞}^{a} p(x) dx .   (6)
Examples

If X is uniformly distributed on [0, 1], then the probability density function (PDF) of X is
given by

    p(x) = { 1   0 ≤ x ≤ 1
           { 0   otherwise .   (7)

As a check,

    ∫_{-∞}^{+∞} p(x) dx = ∫_0^1 1 dx = 1 .   (8)

The cumulative distribution function is then given by

    F(a) = { 0   a < 0
           { a   0 ≤ a < 1
           { 1   1 ≤ a .   (9)

If X is normally distributed with mean 0 and variance 1, that is, if X ~ N(0, 1), then
the PDF of X is given by

    p(x) = (1/√(2π)) e^(-x²/2) .   (10)

The cumulative distribution function is then given by

    F(a) = (1/√(2π)) ∫_{-∞}^{a} e^(-x²/2) dx .   (11)

Discrete random variables

If X is a discrete random variable, then it may only have a finite number of possible values
at locations x_i, so that

    p_i = P(X = x_i) ,   (12)

and, because X is normalized, we must have

    Σ_{i=1}^{N} p_i = 1 ,   (13)

where N is the number of possible values of X. As an example, a two-point distribution of
X is given by

    p_1 = P(X = -1) = 1/2 ,
    p_2 = P(X = +1) = 1/2 .

This is normalized since p_1 + p_2 = 1.
Moments of random variables

The p-th moment of a continuous random variable X is given by

    E(X^p) = ∫_{-∞}^{+∞} x^p p(x) dx ,   (14)

and the p-th moment of a discrete random variable X is given by

    E(X^p) = Σ_{i=1}^{N} x_i^p p_i .   (15)

The mean or expected value of a continuous random variable is its first moment,

    μ = E(X) = ∫_{-∞}^{+∞} x p(x) dx ,   (16)

which is also referred to as the expectation of X. The p-th centered moment of X is given by

    E((X - μ)^p) = ∫_{-∞}^{+∞} (x - μ)^p p(x) dx .   (17)

The first centered moment is always 0 by definition, since

    E(X - μ) = ∫_{-∞}^{+∞} (x - μ) p(x) dx
             = ∫_{-∞}^{+∞} x p(x) dx - μ ∫_{-∞}^{+∞} p(x) dx
             = μ - μ(1) = 0 .   (18)

The variance of a continuous random variable X is its second centered moment, and is
defined by

    Var(X) = E((X - μ)²) = σ² .   (19)

Expanding the second centered moment yields

    E((X - μ)²) = ∫ (x - μ)² p(x) dx
                = ∫ (x² - 2μx + μ²) p(x) dx
                = ∫ x² p(x) dx - 2μ ∫ x p(x) dx + μ² ∫ p(x) dx
                = E(X²) - 2μ² + μ²
                = E(X²) - μ² ,   (20)

so that

    Var(X) = E((X - μ)²) = σ² = E(X²) - μ² .   (21)
Examples

If X is uniformly distributed such that

    p(x) = { 1   0 ≤ x ≤ 1
           { 0   otherwise ,   (22)

then the expected value of X is given by

    μ = ∫_{-∞}^{0} x · 0 dx + ∫_0^1 x · 1 dx + ∫_1^{+∞} x · 0 dx
      = [x²/2] evaluated from 0 to 1
      = 1/2 .   (23)

The variance of X is then given by

    Var(X) = E(X²) - μ²
           = ∫_{-∞}^{+∞} x² p(x) dx - μ²
           = ∫_{-∞}^{0} x² · 0 dx + ∫_0^1 x² · 1 dx + ∫_1^{+∞} x² · 0 dx - 1/4
           = [x³/3] evaluated from 0 to 1, minus 1/4
           = 1/12 .

If X is normally distributed such that X ~ N(μ, σ), then the PDF is given by

    p(x) = (1/√(2πσ²)) e^(-(x-μ)²/(2σ²)) .   (24)

The expected value and variance are given by

    E(X) = μ ,
    Var(X) = σ² .
Two or more random variables

If X_1 and X_2 are continuously distributed random variables with a joint PDF p(x_1, x_2), then
in general

    E(X_1 + X_2) = E(X_1) + E(X_2) ,
    Var(X_1 + X_2) ≠ Var(X_1) + Var(X_2) .

The covariance function is given by

    Cov(X_1, X_2) = E((X_1 - μ_1)(X_2 - μ_2))
                  = ∫∫ (x_1 - μ_1)(x_2 - μ_2) p(x_1, x_2) dx_1 dx_2
                  = ∫∫ (x_1 x_2 - μ_1 x_2 - μ_2 x_1 + μ_1 μ_2) p(x_1, x_2) dx_1 dx_2
                  = ∫∫ x_1 x_2 p(x_1, x_2) dx_1 dx_2 - μ_1 ∫∫ x_2 p(x_1, x_2) dx_1 dx_2
                    - μ_2 ∫∫ x_1 p(x_1, x_2) dx_1 dx_2 + μ_1 μ_2 ∫∫ p(x_1, x_2) dx_1 dx_2
                  = E(X_1 X_2) - μ_1 μ_2 - μ_2 μ_1 + μ_1 μ_2
                  = E(X_1 X_2) - μ_1 μ_2 .

If X_1 and X_2 are independent, then we can write p(x_1, x_2) = p_1(x_1) p_2(x_2), and

    E(X_1 X_2) = E(X_1) E(X_2) ,
    Var(X_1 + X_2) = Var(X_1) + Var(X_2) .   (25)
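As a quick numerical check of these definitions, the short Octave/Matlab sketch below (not
part of the original handout) estimates the moments by Monte Carlo sampling; the sample
size M is an illustrative choice.

    % Monte Carlo check of the mean, variance, and covariance results above.
    M  = 1e6;
    U  = rand(M,1);                      % uniformly distributed on [0,1]
    Z  = randn(M,1);                     % normally distributed, N(0,1)
    [mean(U), var(U)]                    % should be close to 1/2 and 1/12
    [mean(Z), var(Z)]                    % should be close to 0 and 1
    X1 = randn(M,1); X2 = randn(M,1);    % independent random variables
    mean((X1 - mean(X1)).*(X2 - mean(X2)))   % sample Cov(X1,X2), close to zero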
Lecture 12: Numerical solution of stochastic differential equations II

Brownian Motion

The most fundamental example of a stochastic process is termed standard Brownian motion
or a standard Wiener process. Brownian motion was discovered by botanist Robert Brown
in 1827, when he observed the motion of a pollen grain as it moved randomly in a glass of
water. Because the water molecules collide with the pollen grain in a random fashion, the
pollen grain moves about randomly. The motion of the pollen grain is stochastic because
its position from one point in time to the next can only be defined in terms of a probability
density function.

Consider a pollen grain in a one-dimensional glass of water that is only allowed to move
in the vertical z-direction. At one point in time t_n its position is given by z_n, and at the
next point in time, say at t_{n+1}, its position is given by z_{n+1}. Depending on how many water
molecules collide with the pollen grain, it may move a very large amount, or it may not
move at all. It turns out that even though we can't determine how far the pollen grain
moves over a period of time, we can say that the distance it moves is normally distributed
with mean 0 and variance 1, as shown in Figure 1, where the units are arbitrary (they could
be micrometers, or 10⁻⁶ meters, in this example). If the distance it moves from time t_n to
t_{n+1} is given by Δz_n, then we can write

    z_{n+1} = z_n + Δz_n .   (1)

Figure 1: Depiction of the position z_n and the likely next position z_{n+1} of a pollen grain
undergoing Brownian motion, showing the normal distribution at t_{n+1}.
If the distance it moves is normally distributed with mean 0 and variance 1, then we can
write

    Δz_n ~ N(0, 1) ,   (2)

and from this we know that

    E(Δz_n) = 0 ,
    Var(Δz_n) = E(Δz_n²) - μ² = E(Δz_n²) = 1 .

Beginning with z_1 = 0, we can run a simulation of a pollen track by using the random number
generation capabilities of any programming language. In Matlab or Octave, for example, the
randn routine returns a normally distributed random number with mean 0 and variance 1.
Over 500 time steps, the particle track of a pollen grain is shown in Figure 2. If we repeat
the simulation for 100 particles that are all released at z = 0 and assume that they do not
interact with one another, then their resultant tracks are shown in Figure 3. From this
figure we can see that the particles are more likely to end up closer to where they started
from than very far away. To get a better idea of the development of the particle distributions,
we can run the simulation with 10 000 particles and plot probability density functions of the
particle distributions at different points in time, as shown in Figure 4. Rather than look
at the PDFs, we can simply look at the standard deviation of the particle distributions as a
function of time. Figure 5 depicts the standard deviation as a function of time as well as
the theoretical prediction σ = √t for 100 particles, and Figure 6 depicts those for 10 000
particles.

Figure 2: Particle track z_n of a pollen grain undergoing Brownian motion over 500 time steps
from t_1 = 0 to t_500 = 499.

Figure 3: Particle tracks z_n of 100 pollen grains undergoing Brownian motion over 500 time
steps from t_1 = 0 to t_500 = 499.

Figure 4: Probability density functions of the particle distributions at different points in
time for a total of 10 000 particles.

Figure 5: Standard deviation as a function of time for the particle distributions with 100
particles.

Figure 6: Standard deviation as a function of time for the particle distributions with 10 000
particles.

These figures show us that, in the limit of an infinite number of particles, the standard
deviation of the distribution of particles is given by

    σ(t) = √t .   (3)

If we designate the start time of this simulation as t = s, then we know that the standard
deviation grows according to

    σ(t) = √(t - s) .   (4)
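The simulation described above can be reproduced with a few lines of Octave/Matlab using
randn. The sketch below is not part of the original handout; the numbers of steps and
particles match the figures, but are otherwise illustrative.

    % Brownian motion: 100 particle tracks over 500 unit time steps.
    Nsteps = 500; Nparticles = 100;
    z = zeros(Nsteps, Nparticles);               % all particles start at z = 0
    for n = 1:Nsteps-1
      z(n+1,:) = z(n,:) + randn(1, Nparticles);  % z_{n+1} = z_n + dz_n with dz_n ~ N(0,1)
    end
    plot(0:Nsteps-1, z)                          % particle tracks, as in Figures 2 and 3
    sigma = std(z, 0, 2);                        % spread across particles at each time
    % sigma(n) grows roughly like sqrt(t) = sqrt(n-1), as in Figure 5.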
We can see the behavior of the standard deviation with time by looking at the PDFs of
the particle distributions in Figure 4. Each PDF gets wider and shorter as the simulation
progresses, indicating that the PDFs are normal distributions centered with mean 0 and
variance σ²(t) = t - s. The PDF can then be written as a function of time and space as

    p(z, t) = (1/√(2πσ²(t))) e^(-z²/(2σ²(t))) .   (5)

In the previous simulations we used a time step of Δt = 1. After this amount of time
we assumed that the particle moved according to a normal distribution with mean 0 and
variance 1, which was represented with

    Δz_n ~ N(0, 1) .   (6)

Now suppose that the time step is much larger, and say that it is given by Δt = t - s. Because
the standard deviation grows with time according to σ = √(t - s), the likelihood of a
particle being farther away is higher because more time has passed. This is accounted for by
writing

    Δz_n ~ √(t - s) N(0, 1) ,   (7)

and as a result we know that

    E(Δz_n) = 0 ,
    Var(Δz_n) = E((√(t - s) N(0, 1))²) - μ² = (t - s) E(N(0, 1)²) = t - s .

We can then run a particle simulation with one time step to see what the distribution of
particles will be like at any point in time, since we can write the position z of a particle after
a certain amount of time t - s as

    z_1 = z_0 + √(t - s) N(0, 1) .   (8)

This forms the basis for the more formal definition of Brownian motion or a Wiener process.

Standard Brownian Motion or Standard Wiener Process

Standard Brownian motion or a standard Wiener process governs the behavior of the random
variable W(t) in continuous time 0 ≤ t ≤ T, and satisfies the following conditions (from [1]):

1. W(0) = 0 with probability 1.

2. If 0 < s < t < T, then the random variable ΔW = W(t) - W(s) is normally distributed
   with mean 0 and variance t - s, and satisfies

       ΔW ~ √(t - s) N(0, 1) .   (9)

3. If 0 < s < t < u < v < T and ΔW_1 = W(t) - W(s) and ΔW_2 = W(v) - W(u), then
   ΔW_1 and ΔW_2 are independent.

The motion of the pollen grain in a glass of water can be written in this standard Brownian
notation by letting the position of the pollen grain at time t_n be W_n, and saying that the
pollen grain will move according to

    W_{n+1} = W_n + ΔW_n ,   (10)

where, from the first condition, we must have W_1 = 0, and the increment is given by

    ΔW_n = √Δt N(0, 1) ,   (11)

where Δt = t_{n+1} - t_n is the time increment.
Stochastic ODEs and Stochastic Calculus

Suppose we have an ODE given by

    dx/dt = f(x, t) + g(x, t) ξ(t) ,   (12)

where ξ(t) is some random perturbation. If we write it in differential form, then we have

    dx = f(x, t) dt + g(x, t) ξ(t) dt .   (13)

If we let ξ(t) dt = dW(t), then we have the stochastic differential equation

    dx = f(x, t) dt + g(x, t) dW(t) .   (14)

In order to solve this equation, the standard method would be to integrate both sides to
obtain

    x(t) = x(0) + ∫_0^t f(x(s), s) ds + ∫_0^t g(x(s), s) dW(s) .   (15)

It turns out that if the integrands are non-deterministic, standard calculus does not apply.
In order to solve the integral, we must make a change of variables and write the stochastic
differential equation (14) in terms of v(x). This is done by using the Taylor-series expansion

    v(x + Δx) = v(x) + Δx dv/dx + (1/2) Δx² d²v/dx² + O(Δx³) .   (16)

Then we can write the differential for v(x) as

    dv = lim_{Δx→0} [v(x + Δx) - v(x)] = (dv/dx) dx + (1/2) dx² d²v/dx² + O(dx³) ,   (17)

which, when x is deterministic, simplifies to

    dv = (dv/dx) dx .   (18)

However, for stochastic calculus, dx² does not approach 0 faster than dx, so equation (18)
is incorrect. Substituting in for dx from equation (14) we have

    dv = (dv/dx) (f(x, t) dt + g(x, t) dW)
         + (1/2) (d²v/dx²) ( f²(x, t) dt² + 2 f(x, t) g(x, t) dW dt + g²(x, t) dW² ) .   (19)

We can neglect dt² and dW dt, but it turns out that dW² = dt and it cannot be neglected,
so the stochastic differential equation for v is given by

    dv = ( f(x, t) dv/dx + (1/2) g²(x, t) d²v/dx² ) dt + g(x, t) (dv/dx) dW .   (20)

We can now integrate this equation using the methods of deterministic calculus to obtain
the solution for v(x) as

    v(t) = v(0) + ∫_0^t ( f(x, s) dv/dx + (1/2) g²(x, s) d²v/dx² ) ds + ∫_0^t g(x, s) (dv/dx) dW ,   (21)

where we have assumed that v(x) and v(t) are equivalent since v(x) = v(x(t)).
References

[1] D. J. Higham. An algorithmic introduction to numerical simulation of stochastic
differential equations. SIAM Review, 43:525-546, 2001.
Lecture 13: Numerical solution of stochastic differential equations III

Discrete stochastic integrals

We would like an approximate solution to the simplified stochastic differential equation

    dx(t) = W(t) dW(t) ,   (1)

where the behavior of W(t) is governed by the rules of Brownian motion. Integrating both
sides, we obtain

    x(t) = ∫_0^t W(s) dW(s) ,   (2)

where we have assumed that x(0) = 0 for simplicity.
The Itô integral

In the Itô integral, the integral (2) is approximated with the Riemann sum

    x(t) = Σ_{j=0}^{N-1} W_j (W_{j+1} - W_j) ,   (3)

where we have assumed that t = NΔt, W_N = W(t), and W_0 = W(0). This can be written as

    x(t) = (1/2) Σ_{j=0}^{N-1} [ W_{j+1}² - W_j² - (W_{j+1}² - 2W_j W_{j+1} + W_j²) ]
         = (1/2) Σ_{j=0}^{N-1} [ W_{j+1}² - W_j² - (W_{j+1} - W_j)² ]
         = (1/2) [ (W_1² - W_0²) + (W_2² - W_1²) + ... + (W_{N-1}² - W_{N-2}²) + (W_N² - W_{N-1}²) ]
           - (1/2) Σ_{j=0}^{N-1} (W_{j+1} - W_j)²
         = (1/2) (W_N² - W_0²) - (1/2) Σ_{j=0}^{N-1} (W_{j+1} - W_j)²
         = (1/2) (W_N² - W_0²) - (1/2) Σ_{j=0}^{N-1} ΔW_j² .

The sum can be written as

    Σ_{j=0}^{N-1} ΔW_j² = N [ (1/N) Σ_{j=0}^{N-1} ΔW_j² ] ,   (4)

which is just the discrete form of the variance of a random variable with zero mean,

    Var(ΔW) = E(ΔW²) = (1/N) Σ_{j=0}^{N-1} ΔW_j² .   (5)

Since we know that ΔW_j = W_{j+1} - W_j is normally distributed with mean 0 and variance
Δt, because it governs the jump for Brownian motion, then, as N → ∞,

    (1/N) Σ_{j=0}^{N-1} ΔW_j² = Δt ,   (6)

so that the approximation to the integral (2) becomes

    x(t) = (1/2) (W_N² - W_0²) - (1/2) N Δt .   (7)

From the definition of Brownian motion, W_0 = W(0) = 0, and after substituting in for
W_N = W(t) and NΔt = t, we have

    x(t) = (1/2) W(t)² - (1/2) t .   (8)
The Stratonovich integral

Rather than approximating the integral at the left side of each interval, the Stratonovich
integral approximates the integral (2) with the midpoint rule

    x(t) = Σ_{j=0}^{N-1} W( (t_j + t_{j+1})/2 ) (W_{j+1} - W_j) .   (9)

We will approximate the value of W(t_{j+1/2}) with

    W(t_{j+1/2}) = (1/2) (W_j + W_{j+1}) + C_j ,   (10)

where C_j must be determined so that the above approximation still satisfies the rules of
Brownian motion. If we let Z(t_j) = W(t_{j+1/2}) represent a random variable that must satisfy
the rules of Brownian motion, then the second rule of Brownian motion says that if Z(t_j) and
Z(t_j + Δt) are both random variables, then the increment ΔZ_j = Z(t_j + Δt) - Z(t_j) must
have mean 0 and variance Δt, that is, E(ΔZ_j) = 0 and Var(ΔZ_j) = Δt. The increment is
given by

    ΔZ_j = (1/2) [ (W_{j+1} + W_{j+2}) + 2C_{j+1} - (W_j + W_{j+1}) - 2C_j ]
         = (1/2) (W_{j+2} - W_j) + (C_{j+1} - C_j) .

Since we know that W_{j+1} = W_j + ΔW_j and W_{j+2} = W_{j+1} + ΔW_{j+1}, then W_{j+2} = W_j +
ΔW_j + ΔW_{j+1}, and the increment is given by

    ΔZ_j = (1/2) (ΔW_j + ΔW_{j+1}) + ΔC_j ,   (11)
where ΔC_j = C_{j+1} - C_j. Taking the mean of ΔZ_j yields

    E(ΔZ_j) = E[ (1/2)(ΔW_j + ΔW_{j+1}) + ΔC_j ]
            = (1/2) [ E(ΔW_j) + E(ΔW_{j+1}) ] + E(ΔC_j) ,   (12)

but since E(ΔW_j) = 0 and E(ΔW_{j+1}) = 0, and we require that E(ΔZ_j) = 0, then we know
that the behavior of C_j must satisfy E(ΔC_j) = E(C_{j+1}) - E(C_j) = 0. This is most easily
satisfied by requiring that C_j have a zero mean. Taking the variance of ΔZ_j yields

    Var(ΔZ_j) = E(ΔZ_j²) - E(ΔZ_j)² ,   (13)

but since we require that E(ΔZ_j) = 0, the variance is given by

    Var(ΔZ_j) = E(ΔZ_j²) .   (14)

Substituting in the values from above we have

    Var(ΔZ_j) = E[ (1/4) (ΔW_j + ΔW_{j+1} + 2ΔC_j)² ]
              = (1/4) [ E((ΔW_j + ΔW_{j+1})²) + 4E(ΔC_j ΔW_j) + 4E(ΔC_j ΔW_{j+1}) + 4E(ΔC_j²) ]
              = (1/4) [ E(ΔW_j²) + 2E(ΔW_j ΔW_{j+1}) + E(ΔW_{j+1}²)
                        + 4E(ΔC_j ΔW_j) + 4E(ΔC_j ΔW_{j+1}) + 4E(ΔC_j²) ] .

Because ΔW_j, ΔW_{j+1}, and ΔC_j are all independent of each other, we know that

    E(ΔW_j ΔW_{j+1}) = E(ΔW_j) E(ΔW_{j+1}) = 0 ,
    E(ΔC_j ΔW_{j+1}) = E(ΔC_j) E(ΔW_{j+1}) = 0 ,
    E(ΔC_j ΔW_j) = E(ΔC_j) E(ΔW_j) = 0 ,

so that we have

    Var(ΔZ_j) = (1/4) [ E(ΔW_j²) + E(ΔW_{j+1}²) ] + E(ΔC_j²) .   (15)

Because ΔW_j and ΔW_{j+1} satisfy the rules for Brownian motion, and we want ΔZ_j to satisfy
the rules for Brownian motion, we must have

    Var(ΔZ_j) = Δt ,
    E(ΔW_j²) = Δt ,
    E(ΔW_{j+1}²) = Δt ,

so that we have

    E(ΔC_j²) = Δt/2 .   (16)

Substituting in for ΔC_j = C_{j+1} - C_j, we have

    E(C_{j+1}²) - 2E(C_{j+1} C_j) + E(C_j²) = Δt/2 .   (17)

Since C_{j+1} and C_j are independent, E(C_{j+1} C_j) = E(C_{j+1}) E(C_j) = 0 and we must have

    E(C_{j+1}²) + E(C_j²) = Δt/2 ,   (18)

which is satisfied when

    E(C_{j+1}²) = E(C_j²) = Δt/4 .   (19)

If C_j is normally distributed with mean 0 and variance Δt/4, then W(t_{j+1/2}) will satisfy
the rules for Brownian motion, and is given by

    W(t_{j+1/2}) = (1/2) (W_j + W_{j+1}) + C_j ,   (20)

where

    C_j ~ N(0, Δt/4) .   (21)
Substitution into equation (9) yields

    x(t) = (1/2) Σ_{j=0}^{N-1} (W_j + W_{j+1} + 2C_j) (W_{j+1} - W_j)
         = (1/2) Σ_{j=0}^{N-1} (W_{j+1}² - W_j²) + Σ_{j=0}^{N-1} C_j (W_{j+1} - W_j)
         = (1/2) [ (W_1² - W_0²) + (W_2² - W_1²) + ... + (W_{N-1}² - W_{N-2}²) + (W_N² - W_{N-1}²) ]
           + Σ_{j=0}^{N-1} C_j (W_{j+1} - W_j)
         = (1/2) (W_N² - W_0²) + N [ (1/N) Σ_{j=0}^{N-1} C_j ΔW_j ] .

The last term corresponds to an approximation of N E(C_j ΔW_j), which, since C_j and ΔW_j
are independent, approaches 0 as N → ∞. Therefore, with W_0 = W(0) = 0 and W_N = W(t),
the Stratonovich integral approximates the integral (2) with

    x(t) = (1/2) W(t)² .   (22)

Comparing the two methods, we see that, depending on where W(t) is evaluated when
approximating

    x(t) = ∫_0^t W(s) dW(s) ,   (23)

the result can be very different, since

    x(t) = (1/2) W(t)² - t/2    (Itô)
    x(t) = (1/2) W(t)²          (Stratonovich) .   (24)
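This difference is easy to see numerically. The Octave/Matlab sketch below (not part of the
original handout) evaluates both discrete sums along a single Brownian path; the step count
and time step are illustrative choices.

    % Ito and Stratonovich sums for the integral of W dW along one Brownian path.
    N  = 1e4; dt = 1e-3; t = N*dt;
    dW = sqrt(dt)*randn(1,N);
    W  = [0 cumsum(dW)];                              % W_0, W_1, ..., W_N
    ito   = sum(W(1:N).*dW);                          % left endpoint, equation (3)
    C     = sqrt(dt/4)*randn(1,N);                    % C_j ~ N(0, dt/4), equation (21)
    strat = sum((0.5*(W(1:N) + W(2:N+1)) + C).*dW);   % midpoint rule, equation (9)
    [ito,   0.5*W(N+1)^2 - 0.5*t]                     % compare with the Ito result in (24)
    [strat, 0.5*W(N+1)^2]                             % compare with the Stratonovich result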
Numerical discretization techniques

From lecture 12, the model stochastic differential equation is given by

    dx(t) = f(x, t) dt + g(x, t) dW(t) ,   (25)

where dW(t) represents a random variable in continuous time that satisfies the rules of
Brownian motion and f(x, t) and g(x, t) are deterministic functions. If we integrate both
sides of equation (25) from t_n to t_{n+1}, then we have

    x_{n+1} = x_n + ∫_{t_n}^{t_{n+1}} f(x, t) dt + ∫_{t_n}^{t_{n+1}} g(x, t) dW(t) ,   (26)

where x_{n+1} = x(t_{n+1}) and x_n = x(t_n). In the next sections we use different methods to
approximate the deterministic and stochastic integrals on the right hand side (for details see
[1]).

Strong and weak convergence

A discretization of a stochastic differential equation governing the behavior of the random
variable x(t) is said to have strong order of convergence n if we can define a constant k such
that

    E |x_j - x(jΔt)| ≤ k Δt^n ,   (27)

where x_j is the discrete solution and x(jΔt) is the exact solution. Strong convergence
therefore depends on the expected value of the error of the solution at some point in time.

A discretization of a stochastic differential equation is said to have weak order of convergence
n if we can define a constant k such that

    |E(x_j) - E(x(jΔt))| ≤ k Δt^n .   (28)

Weak convergence is not as strict as strong convergence because it is a function of the error
of the mean rather than the mean of the error.

The Euler-Maruyama method

In the Euler-Maruyama method, the deterministic integral in equation (26) is approximated
with the rectangular rule and the Itô rule is used to compute the stochastic integral to yield

    x_{n+1} = x_n + Δt f_n + g_n ΔW_n ,   (29)

where f_n = f(x_n, t_n), g_n = g(x_n, t_n), and ΔW_n = W(t_{n+1}) - W(t_n). We can see that this
method reduces to the forward Euler method for deterministic ODEs. This method has a strong
order of convergence of n = 1/2 and a weak order of convergence of n = 1.
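A minimal Octave/Matlab sketch of the Euler-Maruyama update (29) is given below; it is
not part of the original handout, and the drift, noise function, time step, and number of
steps are illustrative choices.

    % Euler-Maruyama for dx = f(x,t) dt + g(x,t) dW.
    f = @(x,t) -x;   g = @(x,t) 0.5;          % example drift and (additive) noise
    T = 1; N = 500; dt = T/N;
    x = zeros(1,N+1); x(1) = 1;
    for n = 1:N
      dW     = sqrt(dt)*randn;                % dW_n ~ sqrt(dt) N(0,1)
      x(n+1) = x(n) + dt*f(x(n), n*dt) + g(x(n), n*dt)*dW;
    end
    plot(0:dt:T, x)                           % one realization of the solution path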
Milstein's higher order method

The Euler-Maruyama method can be made to converge strongly to first order by keeping
higher order terms in the stochastic integral in equation (25). Using this method, the
discretized stochastic differential equation becomes

    x_{n+1} = x_n + Δt f_n + g_n ΔW_n + (1/2) g_n g'_n (ΔW_n² - Δt) ,   (30)

where f_n = f(x_n, t_n), g_n = g(x_n, t_n), g'_n = dg/dx evaluated at (x_n, t_n), and
ΔW_n = W(t_{n+1}) - W(t_n).
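As a sketch of how (30) is used in practice (not part of the original handout), consider the
test equation dx = λx dt + μx dW, for which g(x) = μx and dg/dx = μ; the parameter values
and step count below are illustrative, and the exact solution of this particular equation,
driven by the same Brownian path, is the standard result x(T) = x(0) exp((λ - μ²/2)T + μW(T)).

    % Milstein scheme for dx = lambda*x dt + mu*x dW.
    lambda = 2; mu = 1; T = 1; N = 256; dt = T/N;
    x = 1; W = 0;
    for n = 1:N
      dW = sqrt(dt)*randn;
      W  = W + dW;
      x  = x + dt*lambda*x + mu*x*dW + 0.5*mu*(mu*x)*(dW^2 - dt);   % equation (30)
    end
    xexact = exp((lambda - 0.5*mu^2)*T + mu*W);    % exact solution for the same path
    % abs(x - xexact) shrinks like dt (strong first order) as N is increased.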
References

[1] D. J. Higham. An algorithmic introduction to numerical simulation of stochastic
differential equations. SIAM Review, 43:525-546, 2001.