1. When something is unclear, ask me, not your neighbor, who is busy himself. Ask
as many questions as you need.
Viscous damping:

\[ \ddot{x} + 2\gamma\,\dot{x} + \omega_0^2\,x = 0 \]

with the solution

\[ x(t) = A\,\exp(-\gamma t)\,\exp\!\left(\pm i\sqrt{\omega_0^2-\gamma^2}\,t\right) \]
Fluid mechanics: Flow around a sphere with increasing Reynolds number/ flow speed:
Analytical solutions exist only for the Stokes flow problem.
(Flow regimes with increasing Reynolds number: Stokes flow, standing vortices, vortex street, turbulence.)
0.3 UNIX-Workstations
The MAC-computers in the terminal-room cannot be used remotely. If students want
to log in remotely from outside the terminal-room, access is possible to the SUN-
cluster with the name sun.edu.cc.uec.ac.jp, which consists neither of MACs
nor PCs, but of UNIX-Workstations. The login (also possible from the MAC-computers)
has to be via the secure shell; login with an X-terminal is possible as
ssh -X sun.edu.cc.uec.ac.jp
or
ssh -Y sun.edu.cc.uec.ac.jp
(whether -X or -Y is needed depends on the version of the operating system one is logging
in from). UNIX was originally written to be used from a command-prompt window,
not from GUI/Window systems. It is advisable to be able to use the original UNIX-
commands. If you like UNIX and would like a UNIX-like environment on your PC,
install the free CYGWIN-package, www.cygwin.com.
Some survival-UNIX-commands:
mkdir ml
cd ml
0.4 MATLAB
0.4.1 Introduction: Interpreters and Compilers
In general, for programming projects with high numerical complexity, it is
best to develop the algorithms in MATLAB. MATLAB is, like BASIC or symbolic
computer languages like MAPLE, MATHEMATICA, MACSYMA and REDUCE, an
interpreter language, i.e. the language commands are translated into processor instruc-
tions one by one at runtime. Nevertheless, MATLAB is not a symbolic language, but performs all calculations
numerically, i.e. with floating point numbers.1 The language can be used either from
a command prompt or as a functional (or object-oriented) programming language. In
compiler languages like FORTRAN, C or PASCAL, the program is fully translated
into processor instructions before execution. If errors occur at runtime, the memory
contents are difficult to analyze, usually only with the help of a debugger, which may
alter the program execution and memory layout up to the point that some errors can-
not be reproduced. The debuggers' properties vary much more than the language itself. In
MATLAB, after a program crash, the data are still accessible in MATLAB's memory
and can be analyzed using the commands of the MATLAB language itself.
Interpreters allow fast program development. As a rule, their execution times are
higher than those of compiler languages, but during program development, the
compile time is usually more costly than the actual runtime. In MATLAB, when complex
builtin functions are invoked via short commands, like a matrix inversion, the
speed advantage of the compiler languages is very often negligible.
Many programming languages have a whole zoo of data types. MATLAB's elementary
data type is the complex matrix. (Recently, MATLAB also offers more
kinds of data types, but we will not use them in this course.) Variables can be pro-
cessed up to the point where they take a complex value. Variables which are used as
indices must nevertheless have an integer value.
Because it is not possible to declare variables in MATLAB, it refuses to process vari-
ables which are not initialized. In FORTRAN77, for example, it was possible to use
variables which were neither declared nor initialized, and which assumed the value 0
at the moment they were used.
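A minimal sketch of this behavior (the variable names are arbitrary):

```matlab
clear          % empty the memory, so no variable is defined
b = a + 1      % error: 'a' is undefined, MATLAB refuses to continue
a = 2;
b = a + 1      % now b is assigned the value 3
```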
MATLAB is started by typing
matlab
at the command prompt, which starts the MATLAB-desktop. If you are busy and you
don't want to see the splash-screen (MATLAB-Commercial) at the program start, use
matlab -nosplash
¹ The symbolic package available with MATLAB is basically MAPLE with a MATLAB-Interface.
Basic Commands:
edit starts the MATLAB-Editor with Syntax-highlighting of MATLAB-
commands. You can use any editor you like to write MATLAB-files,
but the line-end may vary between operating systems and may lead
to trouble
clear empties the memory
clear a clear the variable a from the memory
who displays the variables which have been assigned
help gives help concerning a specific topic
help help tells you how to use the help function
lookfor looks for a word in the help files, useful if you are looking for a com-
mand according to context, but are not sure about the command
name
disp(a) displays the value of the variable a
disp('a') displays just the string 'a'
rand random number generator, will be used a lot to initialize data
format formats the output; format compact suppresses output of empty lines,
format short rounds the displayed output to about five significant digits, but
the computations are still performed with full precision
% comment sign
ls lists the files in the current working directory of MATLAB, i.e. the directory
in which MATLAB can access the files directly
cd changes the current working directory of MATLAB
² To get an idea why the JAVA-Interface of MATLAB crashes so often, see the internal memo from
SUN at http://www.internalmemos.com/memos/memodetails.php?memo id=1321
Special Characters:
! escape sequence, allows using UNIX-Commands like cd, pwd from
the MATLAB-prompt
[...] 1. vector brackets referring to the value of the entry, [1 2 3] is a
vector with the entries 1, 2 and 3.
2. brackets referring to the output arguments of functions.
(...) 1. Brackets referring to the indices of a vector, a(3) is the third
element of the vector a
2. brackets referring to the input arguments of a function.
... three dots mark the end of a line which is continued in the next line
; has no syntactical function like in C but is only used to suppress the
output of the operation
i, j stand for the imaginary unit sqrt(-1), but can also be overwritten
for other uses.
pi is indeed 3.1415....
, separates commands when several command lines should be written in
the same editor line
: separates loop bounds, lower_bound:stepwidth:upper_bound.
WARNING!
lower_bound,stepwidth,upper_bound (with commas) only displays the variables
lower_bound, stepwidth, upper_bound
Control statements are usually terminated via the end command, no matter whether
it is an if statement or a for loop:
a=2
b=3

for i=1:10
  i
end

if (a>b)
  disp('a>b')
else
  disp('a<=b')
end
MATLAB was started by Cleve Moler, a famous researcher in numerical linear algebra,
as a MATRIX LABORATORY for his students, which should allow fast, safe and easy
development of algorithms for numerical matrix analysis.
MATLAB has evolved into a general purpose language with specialized applications in
many fields. Many books in the meantime use MATLAB either as a formal language or
for the programming examples; have a look at http://www.mathworks.com/support/
books/index.jsp.
Matrix Syntax:
* multiplies two matrices according to the con-
ventions of inner/outer/matrix product
.* multiplies two matrices elementwise
a(2:4) elements of vector a from the second to the
fourth element
end the last element in a row/column of a vec-
tor/matrix
a(2:end) elements of vector a from the second to the last
element
b=c(2:3,2:6) assigns b the values of the matrix c from rows 2
to 3 and columns 2 to 6
With the matrix syntax and the proper use of brackets, many operations can be sim-
plified without the use of loops:
for i=1:20
  a(i)=i/2
end

or

a=[0.5:0.5:10]

or

a=linspace(0.5,10,20)
Many functions either operate on vectors and matrices elementwise, or they are matrix
functions in the sense that the operations are performed on the matrix as a whole.
Matrix/Vector Functions:
length gives the longest dimension of a matrix, or the length of a vector
size gives the dimensions of a matrix
linspace(a,b,m) make a vector with entries in m equidistant intervals between a and b
rand(n,m) sets up a random matrix with n rows and m columns
exp exponential function, works elementwise on a matrix
expm matrix exponential function, works on the eigenvalues of a matrix and
can only be used for square matrices
eig eigenvalue decomposition
inv matrix inversion
norm matrix/vector norm
det determinant
svd singular value decomposition
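The elementwise/matrix distinction can be seen in a small sketch:

```matlab
A = [0 1; 0 0];
exp(A)     % elementwise: [1 e; 1 1], since exp(0)=1 and exp(1)=e
expm(A)    % matrix exponential: [1 1; 0 1], since A^2 = 0, expm(A) = I + A
```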
function [out_arg1,out_arg2,arg3]=my_function(in_arg1,in_arg2,arg3)
% function [out_arg1,out_arg2,arg3]=my_function(in_arg1,in_arg2,arg3)
% The first comment after the function declaration is
% displayed if "help my_function" is typed, so use it to write
% self-documenting functions
........
return
It is advisable to always end a function with a return statement, and also the main
program.
For input arguments, MATLAB functions use call by value, which means that the
input arguments (in round brackets) cannot be modified in the functions. Only the
output arguments (in []-brackets) can be modified by the called function. If an argu-
ment is to be used as an input argument and an output argument, it must appear in
the round brackets and in the []-brackets, like arg3 in the above example.
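As a sketch (the function name count_calls and its variables are made up for illustration):

```matlab
function [y,count]=count_calls(x,count)
% doubles x and increments a counter; count appears as input and
% output argument, so the caller receives the updated value
y=2*x;
count=count+1;
return
```

After n=0; [y,n]=count_calls(5,n) the caller's n has become 1 and y is 10.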
Global variables can be defined with the statement global in a similar way as variables
are declared in other programming languages. The same declaration must then be used
in the functions which use the variable. Global variables can also be overwritten in the
functions; they are call-by-reference variables.
FORTRAN uses call by reference for all input variables of subroutines. C uses call by
value for scalars and call by reference for arrays, so that a pointer to a variable must
be used if a scalar is to be modified in the functions.
Functions can be overloaded for different numbers of input parameters and for scalar
and matrix arguments. If the operations used in the function allow an interpretation in
the matrix sense, the function can automatically be used for matrices.
Exercises
1. Set up a vector with the entries (1, 2, 3, 4, . . . n) once using a for-loop, the second
time using an implicit loop.
2. Multiply every second element with a constant a, once using a for-loop, once
using an implicit loop.
3. Write a program which finds out which elements of a vector are even
4. See what happens if you set up ones(L) , ones(L,1), ones(1,L), and what
happens when you try to multiply these objects with each other.
a) What do you expect the following program to do?
clear
step=2
upper_bound=10
for i=1,step,upperbound
disp(i)
end
return
b) What does the program really do? c) How do you have to rewrite the program
so that it does what you expected it to do in a)?
9. Use the help-function of MATLAB to find out the relation between the built-in
function gamma and the factorial.
Chapter 1
How to Write Better Programs
In this chapter, I will discuss the basics of programming style for numerical computing.1
Everything seems to be a matter of course, and during several courses, some students
who considered themselves experienced programmers skipped these lessons. Usually,
after two weeks of homework, they ran into exactly those pitfalls, problems and errors that I
discussed in these pages, and usually wasted several hours which could have been spent
productively. My usual comment was: We had this two weeks ago when you didn't
attend . . .
¹ I will use the terminology Computational Physics, Computational Engineering, Scientific
Computing, Scientific Programming pretty much as synonyms. Numerical methods, numerical
mathematics, numerical algorithms I will use when I want to emphasize mathematical techniques
to handle floating point computations, minimize roundoff errors, control discretization errors etc.
Numerical physics I will use if I want to emphasize that the techniques for computational
physics require an understanding of the floating point computations involved.
should be avoided. If one wants to express the order of the derivative, e.g. the time
derivative of a coordinate (the Gear predictor-corrector method uses up to the 5th time
derivative ....), it may be good practice to use
x_0, x_1, x_2, x_3 . . . . .
for the coordinate, first derivative, second and so on. A convention like
dx_f, dy_f, dx2_f, dx_dy_f .....
is also not a bad idea.
The Fortran77-standard allowed only variable (and subroutine) names of six charac-
ters, so once I spent a happy week in 1989 rewriting my longer variable names into
shorter ones. That was before the coming of the Fortran90-standard; nowadays, all Fortran-
Compilers accept longer variable names, but you may come across programs in the old
convention. I am not sure about the variable lengths for C++-compilers, but be aware
that internally the compiler will expand the variable name variablename in
structure structurename in the object objectname to something like
objectname_structurename_variablename
and when these names become too long, this may also cause trouble. A colleague's
program once refused to compile because the internal name representation was longer
than 256 characters, and debugging tools also have problems if e.g. subroutine names
in objects or modules become too long. As far as too long variable names are
concerned, one may run into similar problems with new C++ compilers as one did
with Fortran77 compilers decades ago.
Be aware that similar variable names can easily be confused, especially if they make
use of uppercase/lowercase letters and the underscore, like
Variablename, variablename, Variable_name
so the use of all three in the same program will certainly cause problems. It is a
good convention to use variable names which sound different. At some point in one's
programming career, one should decide whether to write composite variable names
with an underscore, Variable_name, or not, as Variablename, or as VariableName.
More considerations about the conventions and choices for variable names can be found
in Code Complete.
It is practical to reserve i,j,k as loop variables for short loops, which increases read-
ability, especially if the original mathematical formulae, e.g. for vector operations, use
them as indices. For longer loops, descriptive loop variable names are preferable:
for i_particle=1:n_particle
  m_particle(i_particle)=r_particle^2*pi
end
The following program³ from the International Obfuscated C Code Contest shows the other extreme of readability:
#define _ -F<00||--F-OO--;
int F=00,OO=00;
main(){F_OO();printf("%1.3f\n",4.*-F/OO/OO);}F_OO()
{
_-_-_-_
² Further information on how to write good programs can be found in: Code Complete, Steve
McConnell, Microsoft Press, Paperback 1993, also available in Japanese.
³ Homepage at http://www.ioccc.org/
_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_-_-_-_-_
_-_-_-_-_-_-_-_
_-_-_-_
}
As the purpose of science is clarity, the purpose of writing scientific code is also clarity and
readability. Unreadable code is code which is hard to debug, and errors in scientific computing
are much more difficult to detect, as you may just get the second digit of your result wrong,
than in the case of commercial software, where you can always tell by messages like segmentation violation. Moreover,
commercial software vendors can make money by selling software updates, whereas in scien-
tific computing, people who wrote buggy code will have trouble in their career. Unreadable
code is not the fault of the programming language, though some programming languages
attract chaotic programmers more than others. The advantage of restrictive programming
languages like ADA is that you cannot make certain classes of errors.
What is in a line
Be aware that identical operations in a computer are not sped up by cramming everything
into the same line;
a=2*b
a=a^2
a=a/c+d
will take the same computer time as
a=((2*b)^2)/c+d
Which is more readable depends on the implemented formulae. There are tricks called per-
formance optimization which actually allow faster program execution due to the style in which the
code is written, but this has nothing to do with cramming many commands into a single line;
it can only be discussed in a later chapter.
Coherence
If you are not sure which lines in the code should be grouped together, it is best to stick
to the concept of coherence: write operations which affect the same variables in consecutive
lines. Instead of
a1=b1*c1
a2=b2+c2
a3=b3/c3
a1=a1-d1/e1
a2=(a1+a2)/2
a3=a3*a2
it is better to write
a1=b1*c1
a1=a1-d1/e1
a2=b2+c2
a2=(a1+a2)/2
a3=b3/c3
a3=a3*a2
Once I had to find the error in the program of a student. The result was correct, except that
it was 10 orders of magnitude wrong. He should have divided the result by a timestep dt.
The student knew that if one has to do many divisions by the same number dt, it is faster to
compute the inverse i_dt=1/dt and multiply with i_dt. And he thought that he could save
programming time by not defining a new variable i_dt, so his program looked like
dt=10^-5
dt=1/dt
............
(one page of code)
............
result=preliminary_result/dt
A perfect interaction of a stupid choice of variable names (the name of the variable at the
end did not match its meaning), a code which was longer than one page, so one could not
read it in a single window, and an incoherent way of using the variable dt.
mass=input('mass? ')
if (mass<=0)
  error('mass should be larger than 0')
end
For general software, it may be a good idea to define a default value. For most numerical
applications (except for accuracy thresholds), specifying a default input may be a very bad
idea.
Logical expressions like (~a<b*c&d==0) are hard to read; brackets make the intended precedence explicit.
1.3.2 Comments
Usually, every line which contains information which is not self-explaining, like
volume=lx*lx*lz
mass=volume*rho
should better be documented. Of course, the amount of comments necessary grows with the
number of people who are supposed to use the code, with the number of functions and lines
in the code and with the complexity. If you are not sure who will use the code, then better
write your program documentation in English. It is generally a good Idea to formalize ones
documentation, especially at the beginning of functions/subroutines:
%PURPOSE: What the program is supposed to do
%USAGE: When and how the program is to be used
%AUTHOR: Who wrote the program
%DATE: Date when the program was written
%ALGORITHM: If the algorithm used is more complicated
% than what you can document in the body of the subroutine, you
% better explain the algorithm here
%LITERATURE: If you have used a complicated algorithm e.g. for
% matrix inversion etc., write from which book or article the algorithm
% comes; usually you have also used its naming conventions, and anybody
% who wants to understand the algorithm (maybe you after ten years) better
% reads the literature first.
%CAVEATS: If the routine works only under certain conditions, document them here
%TODO: How to improve the algorithm the next time you have time
%REVISION HISTORY: Write the date when you modified the algorithm.
The above example is easy to maintain, to modify or to add to. What is not easy to
maintain would be something like
% PURPOSE:
% +-------------+
and so on and so on. The simpler you design your comments, the more likely it is that
you really write them in the way they should be written. If any of the above points do not apply, leave
them out. If the routine is complete and runs as it should, don't write an empty TODO
point. If your routine name is my_asin (my arcsine), then you don't have to document much in
principle. But if the routine actually computes the arcsine in a non-standard way by polynomial
approximation, you better write where you have it from in the literature. If the routine is
vectorized, this should be stated in the PURPOSE. If the vectorization works only if a vectorized
division is available, this should be written in the CAVEATS. If you write a routine for the first
time, you don't have to write a REVISION HISTORY, the date is enough.
And when you change the routine, also change the comments! Nothing is more confusing
than working with a correct routine for my_sinus which calculates a cosine.
Exercises
1. Check whether the MATLAB-programs you wrote up to now are in accordance
with the above ideas.
2. Write a program which creates a matrix where the first column contains equally spaced
x-values between -5 and 5, and the second column contains the values of the second-
order polynomial y = ax^2 + bx + c.
3. Write a program which creates a matrix where the first column contains equally
spaced x-values between -5 and 5, and the second column contains the values of the
function y = 1/(1 + x).
4. Write a program which can detect whether the result of a mathematical computation
has complex parts.
Chapter 2
Stochastic methods I
Stochastic methods use concepts from probability theory. Knowledge about stochastic meth-
ods is important in every field of science and engineering, because each data series contains
a certain element of chance or a certain scattering of the data.
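Such a script could look as follows (a minimal sketch; the file name showrand.m and the matrix sizes are inferred from the output below):

```matlab
% showrand.m: demonstrate the uniform random number generator
rand           % a single random number between 0 and 1
a=rand(1,4)    % a 1x4 vector of random numbers
b=rand(4,4)    % a 4x4 matrix of random numbers
```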
This program using the function rand for equally distributed random numbers gives the
following output:
>> showrand
ans =
0.9501
a =
0.2311 0.6068 0.4860 0.8913
b =
0.7621 0.4447 0.7382 0.9169
0.4565 0.6154 0.1763 0.4103
0.0185 0.7919 0.4057 0.8936
0.8214 0.9218 0.9355 0.0579
the mean of the squares of the differences between the respective samples and their mean.
The square root of the variance is called the standard deviation.
Exercise: Calculate by hand the theoretical mean and variance of random numbers
equally distributed between 0 and 1.
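The definition can be checked directly in MATLAB; a small sketch (for uniform random numbers, the theoretical values are mean 1/2 and variance 1/12):

```matlab
d=rand(1,50000);      % 50000 uniform random numbers
m=mean(d)             % sample mean, close to 1/2
v=mean((d-m).^2)      % variance: mean squared deviation, close to 1/12
s=sqrt(v)             % standard deviation, close to 0.2887
```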
Another random number generator in MATLAB is randn, which creates random numbers
according to the Gauss distribution

\[ G(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-x_m)^2}{2\sigma^2}\right), \]

and the normally distributed random numbers from randn have mean x_m = 0 and standard
deviation σ = 1.
Exercise 2: Estimate how the statistical error depends on the number of random numbers
used, by comparing the theoretical variance for the randn random number generator with
the actually measured variance.
clear
format compact
format short
a=rand(1,4)
hist(a)
d=rand(1,50000);
subplot(4,2,5), hist(d)
set(gca,'Xticklabel','')
title('50000 random numbers, 10 bins')
axis tight
subplot(4,2,7), hist(d,100)
title('50000 numbers, 100 bins')
axis tight

(Figure: histograms of 4 and of 50000 uniform random numbers, with 10 and 100 bins.)
Exercise 3: Estimate the dependence of the statistical fluctuations, i.e. the dependence of the
differences in the number of entries in the histogram bins on the total number of entries.
A basic test for random numbers is whether the numbers of random numbers in each bin are the same
within the statistical fluctuations. Much more sophisticated tests for random numbers can
be found in Knuth¹. Nevertheless, to evaluate the usability of a random number algorithm
for a given problem, one should not rely on theoretically available tests alone, but one should
¹ Donald Knuth, The Art of Computer Programming, Addison-Wesley 1998.
test the algorithm on a related problem for which one knows the solution before applying
it to the problem with the unknown solution. Another visual way of controlling random number sequences is to
plot one sequence as the x- and the other as the y-coordinate:
clear
format compact
n_rn=100;
a=rand(n_rn,1);
b=rand(n_rn,1);
plot(a,b,'.')
axis image

(Figure: the points (a_i, b_i) scatter evenly over the unit square.)
The random number generator can be initialized (seeded) to obtain a reproducible sequence,
in MATLAB e.g. via rand('state',X), where X should be a numerical value. For general random number generators, very often
prime numbers have to be used as seeds, so always read the documentation first.
Before random numbers could be generated easily and fast with computer algorithms, math-
ematicians used tabulated random numbers², similar to the way tabulated values for integrals are still used
today. Some of these random number tables had been compiled using roulette results from
the Casino of Monte Carlo in Monaco, and so Monte Carlo methods got their name. In re-
cent years, it has become fashionable in computer science to name some methods Las Vegas
methods instead of Monte Carlo methods, but the difference is purely academic.
² See e.g. Random numbers in uniform and normal distribution: with indices for subsets, compiled
\[ P_{AB} = \frac{S_A}{S_B} \cdot \frac{\max_i(S_i)}{S_A + S_B}. \]
In the following program, the game strength (team_quality) is the same for each team;
nevertheless you will find that usually one team wins clearly. In the run which is depicted behind
the listing, the percentages of wins for all the teams are plotted. One can see that the
team leading in the beginning, team 2, finishes as the last team, whereas the also initially leading team 6
wins the championship. In real life, sports reporters waste a lot of time and energy
on explaining such developments, but in our simulation, we can see that such narrow
outcomes are just a result of chance. For stock exchange fluctuations, the same reasoning
applies.
for i=1:n_team
team_quality(i)=11
end
n_games_played(1:n_team)=0
n_games_won(1:n_team)=0
for i_game=1:n_game
for i_team=1:n_team
for j_team=i_team+1:n_team
n_games_played(i_team)=n_games_played(i_team)+1;
n_games_played(j_team)=n_games_played(j_team)+1;
win_probability=...
...% relative probability
(team_quality(i_team)/team_quality(j_team))*...
...% normalization
max(team_quality)/(team_quality(i_team)+team_quality(j_team));
plot(normalization,score(:,1)./normalization,'--',...
     normalization,score(:,2)./normalization,'.',...
     normalization,score(:,3)./normalization,'+',...
     normalization,score(:,4)./normalization,'-.',...
     normalization,score(:,5)./normalization,':',...
     normalization,score(:,6)./normalization,'-')
legend('team 1','team 2','team 3','team 4','team 5','team 6')
(Figure: fraction of games won for each of the six teams as a function of the number of games played, over 100 games.)
Exercise: Modify the quality of one team and see how its winning probability changes. Find out how
strongly you have to modify the quality so that this team wins in all test runs.
Chapter 3
Numerical Analysis I
hexadecimal to decimal or whatever, you can always use the MATLAB-functions dec2hex,
hex2dec, dec2bin and bin2dec. The 2 is pronounced as "to", like in "decimal to hexadecimal".
The same naming logic is applied in num2str, the conversion from numeric to string.
The difference between one integer and the next largest representable integer is always one,
and integers in different representations always denote the same integers.
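A few conversions as a sketch:

```matlab
dec2hex(255)    % returns the string 'FF'
hex2dec('ff')   % returns 255
dec2bin(6)      % returns '110'
bin2dec('110')  % returns 6
num2str(pi)     % returns the string '3.1416'
```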
Integers in FORTRAN are also sometimes declared as INTEGER*4, because 4 Byte = 4×8
Bit = 32 binary digits are used to represent these standard integers. As one bit is reserved for the
sign of the integer, the largest representable integer is something like 2^31−1, the smallest −2^31+1.
As extensions to standard FORTRAN, there exist in some compilers also the INTEGER*8
type (8-Byte integers, from −2^63+1 to 2^63−1) and the INTEGER*2 type (2-Byte integers).
INTEGER*8 is convenient when large integer values have to be expressed without the rounding
occurring in floating point computations, whereas INTEGER*2 is convenient if large arrays
of integers must be stored where the integers can only take very few values. The danger of
using the non-standard integer-types is that if one changes the compiler (or the computer
one works on), these data-types may not be available any more, and one has to rewrite the
whole program.
The C/C++-standards do not define the absolute accuracy of their data types, but provide
the types int and long int, where long int possibly has the larger number of digits (but may
have the same number as short int). Additionally, there is the unsigned data type, which
allows representing a largest number which is twice as large as in the signed data
type.
program test_implicit
implicit none
write(*,*) 3/7 ! = 0 Integer-division
write(*,*) 3./7. ! = 0.428571 REAL*4-Division
write(*,*) 3.d0/7.d0 ! = 0.428571428571429 REAL*8 Division
stop
end
3.2.1 Error
For the following sections, it will be convenient to define the numerical error of an opera-
tion: the difference between the outcome of an exact operation using real numbers and the
numerical operation using numbers as they are stored in the computer:

exact:           A ∘ B = C
numerical:       A ∘ B = C̃
absolute error:  ε_absolute = |C̃ − C|
relative error:  ε_relative = |C̃ − C| / |C|

With respect to representing mathematical real numbers, e.g. multiples of
5./9., on the computer, integers have a constant absolute error (on average, the error is of the
order of 1), whereas floating point numbers have a constant relative error, as can be seen
in the following table.
Floating point number                      Integer
Operation                      Error       Operation  Error
50./9.=5.555555555555556   < O(10^-14)     50/9=5    < O(1)
500./9.=55.55555555555556  < O(10^-13)     500/9=55  < O(1)
relative error constant                    absolute error constant
3.2.2 Usage
Floating point numbers are the only numbers on a computer with which fast numerical computations are
possible over a large range of values. Floating Point Operations per second, FLOPS, are usually
given as the benchmark for computers, and currently the fastest computer in the world,
the Earth Simulator near Yokohama, can do about 40 Tera-FLOPS. The precision of the
declared variables is usually expressed in the declaration statement: in the FORTRAN77
standard, REAL*4/REAL*8 (or DOUBLE PRECISION) expressed that 4/8 Byte were used
to represent the data.
3.2.3 Data-Layout
In floating point numbers, mantissa and exponent are stored in such a way that the number
is represented as a sum of powers of the base β, with precision t and lower and upper bounds
for the exponent e, L ≤ e ≤ U. A floating point number x can then be represented as

\[ x = \pm\left(\frac{d_1}{\beta} + \frac{d_2}{\beta^2} + \ldots + \frac{d_t}{\beta^t}\right)\beta^e \]

with

\[ 0 \le d_i \le \beta - 1, \quad (i = 1, \ldots, t). \]
The usual real numbers in a higher programming language like C or FORTRAN have the
following characteristics:

Kind    Byte/Bit   mantissa/exponent   Range                          valid digits
Real    4/32       23/8                8.43·10^-37 ... 3.37·10^38     6-7
Double  8/64       52/11               4.19·10^-307 ... 1.67·10^308   15-16
3.2.4 Example
The above representation does not give equidistant numbers, as can be seen if the distribution
of numbers is plotted for β = 2, −1 ≤ e ≤ 2, t = 3:

(Figure: the representable numbers marked on the number line from −4 to 4; they cluster near 0 and thin out towards ±4.)
As can be seen from the above graph, floating point numbers have as many numbers between 1
and 10 as between 10 and 100, whereas integers and fixed point numbers have as many numbers
in the interval from 0 to 1 as from 1 to 2. In other words, if numbers are rounded to fixed
point numbers, there is a constant absolute error over the whole range of numbers, whereas
for floating point numbers, there is a constant relative error over the whole range of available
numbers.
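The constant relative error can be seen directly at the MATLAB prompt (a sketch; eps is about 2.2·10^-16 for double precision):

```matlab
1 + eps == 1        % false: eps is the spacing of floats near 1
1 + eps/2 == 1      % true: eps/2 is below the resolution near 1
1e16 + 1 == 1e16    % true: near 1e16, the spacing of floats is 2
```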
The builtin function in MATLAB to find out the largest relative spacing between
successive floating point numbers is eps. This function depends on the implementation of
MATLAB as well as on the hardware and will give different results on different processors. If
you use a computer language other than MATLAB or FORTRAN90, where these functions
are built in, you can use the following algorithm:
% program eval_myeps
clear
format compact
% compute machine-epsilon
myeps=1.
myepsp1=myeps+1.
while (myepsp1>1)
myeps=0.5*myeps;
myepsp1=1+myeps;
end
myeps
Other builtin functions which are convenient to get ideas about the feasibility of some numer-
ical algorithm are realmax, the largest representable floating point number, and realmin,
the smallest floating point number which is larger than 0. All these functions eps, realmin,
realmax are implementation dependent, i.e. their result may be different on different com-
puter models, because the mathematical operations are wired in a different way on the
chip.
The actual numbers of valid digits of mantissa and exponent are usually not defined in
the programming language standard itself. The IEEE standard uses the layout
S EEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
0 1 11 12 63
with the sign S, the exponent digits E and the mantissa digits F, whereas CRAY used something like
S EEEEEEEEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
0 1 17 18 63
which, due to the lower accuracy and other idiosyncrasies in rounding, has now totally van-
ished. For most numerical computations, double precision is sufficient, and the errors of
single precision computations will be too large. In addition to double precision, many man-
ufacturers offered REAL*16/quadruple precision, which usually will be considerably slower
than double precision.
Modern compilers and processors, such as the Pentium 4 and the G4/G5, allow faster computa-
tions if the compiler options allow relaxed rounding for double precision, so that the results will be
considerably less accurate than double precision/16 digits, but still more accurate than single
precision/8 digits.
That 4/8 Byte are used does not mean that all compiler functions operating on these data
types are computed with full accuracy, correct up to the last bit. The IEEE standard demands
that all results are given in such a way that only the last/least significant bit is rounded.
Because this can become quite costly, one can usually choose compiler options which offer
higher accuracy but lower performance, or faster but less accurate code. Self-implemented
routines may suffer from additional errors, which will be discussed in the next sections.
One should never check two floating point numbers for equality literally, as in

if (a==b)

but check for equality up to a certain error epsilon. Be sure whether you need the absolute error

n=10
epsilon=10^(-n)
if (abs(a-b)<epsilon)

or the relative error

n=10
epsilon=10^(-n)
if (abs(a-b)<epsilon*max(abs(a),abs(b)))
A third variant,

n=10
epsilon=10^(-n)
if (abs(a-b)/max(abs(a),abs(b))<epsilon) % don't do this !!!!

will crash the program in case a = b = 0. Moreover, a multiplication can be executed faster
than a division, so if the if-condition is inside an often executed loop, the division can slow
down the execution of the loop considerably.
> sqrt(-1)
ans = 0 + 1i
> asin(1.5)
ans = 1.57080 - 0.96242i
This may become a problem if the expected result is indeed real, but very near a value where
the real function is undefined, e.g. if the result without rounding error should be 1, but due
to rounding it is e.g. 1.000000000001, and the asin computed from it is

> asin(1.000000000001)
ans = 1.5708e+00 - 1.4143e-06i

so that the computation will be continued with a complex part. In such cases, the input
should always be checked with an if-statement whether it conforms to the expectations.
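Such a check can be sketched as follows (the tolerance tol and the choice to clamp rather than abort are illustrative assumptions):

```matlab
tol=1e-9;            % illustrative tolerance for rounding errors
x=1.000000000001;    % value that should be exactly 1
if abs(x)>1 && abs(x)-1<tol
  x=sign(x);         % clamp back onto the domain of asin
elseif abs(x)>1
  error('argument of asin outside [-1,1]')
end
asin(x)              % now guaranteed to be real
```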
There is also an IEEE standard which defines such exceptions, e.g. what should be done if
a number is divided by 0. The result is stored in a bit pattern which is output as NaN,
Not a Number. MATLAB is a bit more sophisticated. For a start, it gives the correct
result for the division:
> 4/0
warning: division by zero
ans = Inf
> -2/0
warning: division by zero
ans = -Inf
For Inf, the usual rules apply, but some cases are different:
> Inf+3
ans = Inf
> Inf+Inf
ans = Inf
> Inf/Inf
ans = NaN
> Inf-Inf
ans = NaN
When testing for equality via the ==-operator, one idiosyncrasy is that Inf is always
equal to Inf in MATLAB, but NaN is always unequal to NaN:
> 4==4
ans = 1
> Inf==Inf
ans = 1
> NaN==NaN
ans = 0
> isnan(4)
ans = 0
> isnan(NaN)
ans = 1
To test which numbers are the largest and smallest, the MATLAB functions realmin and
realmax can be used. Because Inf, -Inf and NaN must be represented as floating-point
bit patterns in MATLAB, there are about three to four bit patterns less available in MATLAB
than in compilers for e.g. FORTRAN or C which don't use Inf and NaN. Because the
bit patterns of the largest numbers are used, the largest representable floating-point number
is smaller than in the compilers.
3.5 Errors
As we have seen in the previous chapter, the representation of real numbers as floating point
approximation leads intrinsically to rounding errors. In the following, we will treat additional
sources of error which occur in the evaluation of algebraic equations.
The error which results when e.g. an infinite series is computed with only a finite
number of operations, i.e. truncated after a finite step, is called the truncation error. In fact,
if in a given interval a function f(x) is given by an infinite polynomial series with
coefficients a_1, a_2, a_3, ..., and the series is truncated after n steps, an
approximation

f(x) = \sum_{i=0}^{n} \tilde{a}_i x^i + O(x^{n+1})    (3.2)

can be found which has a smaller error than the truncated series using the coefficients a_i of
the infinite series. Such a series is called an n-th order approximation of f(x), which often
makes use of the expansion of the function in terms of orthogonal polynomials.1
Whereas the exponential function exp(x) is defined by the infinite series

exp(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!},

the best finite approximation to exp(x) in the interval 0 ≤ x ≤ ln 2 with 10 digits is

exp(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5 + a_6 x^6 + a_7 x^7 + \epsilon(x)

with |\epsilon(x)| ≤ 2 \cdot 10^{-10}; the coefficients are given in Tab. 3.2. Be aware that the
coefficients for the truncated polynomial approximation depend on the interval for which the
approximation should be used, to minimize the error.
n an 1/n!
0 1.00000 00000 1.000000000000000000
1 -0.99999 99995 -1.000000000000000000
2 0.49999 99206 0.500000000000000000
3 -0.16666 53019 -0.166666666666666657
4 0.04165 73475 0.041666666666666664
5 -0.00830 13598 -0.008333333333333333
6 0.00132 98820 0.001388888888888889
7 -0.00014 131261 -0.000198412698412698
Table 3.2: Coefficients for the polynomial approximation of exp(x) in the interval
0 ≤ x ≤ ln 2 (middle column) and the corresponding coefficients of the infinite Taylor
series.
In practice, many transcendental functions f(x) introduced in elementary mathematics
courses are numerically better approximated by methods other than polynomial
approximations, e.g. by making explicit use of divisions, which can themselves mimic
operations of infinite order in x, either via Padé approximation (quotient of two polynomial
expressions) or via continued fractions.
An effective strategy, especially with periodic functions, is argument reduction, so that one
does not have to compute the Taylor series for large x, but for a small x near the origin,
by either shifting the periodic functions like sin, cos into the interval [0, π/4], or by
decomposing the function into a product of an integer argument and a non-integer argument,
like in the case of the exponential function, where one computes

1 Chap. 22, Handbook of Mathematical Functions, M. Abramowitz, I. Stegun, National Bureau of
Standards.
2 Handbook of Mathematical Functions, M. Abramowitz, I. Stegun, National Bureau of Standards.
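A naive Taylor summation of exp(x) for a large negative argument shows the problem; the following is a minimal sketch, assuming the argument x = -20.5 (chosen to match the quoted exact value exp(-20.5) ≈ 1.25e-09):

```matlab
% naive Taylor summation of exp(x) for large negative x
clear
format long e
x=-20.5;           % assumed argument, exp(-20.5) = 1.2502e-09
myexp=1; term=1;
for n=1:120
  term=term*x/n;   % n-th Taylor term x^n/n!
  myexp=myexp+term;
end
myexp              % the tiny true value is lost in the cancellation
                   % of the huge intermediate terms
```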
we obtain
myexp=-4.422614950123058e-07
as a result, instead of the correct
exp(x) = 1.250152866386743e 09.
As we see, the result is so wrong that not even its sign is correct: we get a negative value
for a computation which should always give positive values. The problem is not the range of
the numbers, because the smallest number representable in MATLAB precision is about
10^{-300}, much smaller than the correct result of about 10^{-9}. The problem is also not a
truncation error, as we are still adding Taylor contributions, even though the result does not
change any more after the 95th iteration. The problem is that we try to add terms which are
smaller than the last digit of the partial sum: the huge intermediate terms of alternating sign
cancel almost completely, and the rounding errors of this cancellation dominate the tiny result.
There are possibilities to circumvent such kinds of problems, which will be explained later in
the lecture.
A similar cancellation occurs in the expression

a=cos(x)^2-sin(x)^2

which gives dubious results whenever the argument x is near a multiple of π/4, with an
arbitrary number of canceled digits. The problem can simply be circumvented by using the
trigonometric identity cos(2x) = cos^2(x) - sin^2(x), so that

a = cos(2x)    (3.3)

always gives the result with the accuracy of the compiler's evaluation of the cos function.
As an example, consider the integrals E_n = \int_0^1 x^n e^{x-1} dx. From partial integration
we obtain a relation between E_n and E_{n-1}, which can be used to iteratively compute E_n
if E_0 is given:

\int_0^1 x^n e^{x-1} dx = \left[ x^n e^{x-1} \right]_0^1 - n \int_0^1 x^{n-1} e^{x-1} dx
                        = 1 - n \int_0^1 x^{n-1} e^{x-1} dx,

E_n = 1 - n E_{n-1},   n = 1, 2, ...
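The instability of the forward recursion, and the stability of the backward recursion E_{n-1} = (1 - E_n)/n, can be demonstrated in a few lines (the starting values and iteration counts are illustrative):

```matlab
% forward recursion: exact start E0=1-1/e, rounding error grows like n!
Ef=1-exp(-1);
for n=1:20
  Ef=1-n*Ef;
end
Ef                    % useless: initial rounding error amplified by 20!
% backward recursion: deliberately wrong start E30=0, error shrinks
Eb=0;
for n=30:-1:21
  Eb=(1-Eb)/n;
end
Eb                    % accurate approximation of E20
```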
As can be seen, the second iteration with the wrong starting value converges to the
right end value, whereas the first iteration with the right starting value converges
to a wrong result. This shows the art of numerical computing, which is to obtain a
correct end result with a good routine and a wrong starting value, instead of obtaining a
wrong end result with a correct starting value but a bad routine.
It will become obvious later in this course that integration is always the good direction
in numerical computing, which can decrease initial errors, whereas differentiation is the
bad direction, which can increase initial errors. This is in contrast to manual calculation,
where differentiation is easier to treat than integration.
For a function f(x), the Taylor series around a point x_0 reads

f(x) = \sum_{\nu=0}^{\infty} \frac{f^{(\nu)}(x_0)}{\nu!} (x - x_0)^\nu .

For the functions exp(t), sin(t), cos(t), the Taylor series is given below:

exp(t) = \sum_{n=0}^{\infty} \frac{t^n}{n!} = 1 + \frac{t}{1!} + \frac{t^2}{2!} + \frac{t^3}{3!} + \dots

sin(t) = \sum_{n=0}^{\infty} (-1)^n \frac{t^{2n+1}}{(2n+1)!} = t - \frac{t^3}{3!} + \frac{t^5}{5!} - \dots

cos(t) = \sum_{n=0}^{\infty} (-1)^n \frac{t^{2n}}{(2n)!} = 1 - \frac{t^2}{2!} + \frac{t^4}{4!} - \frac{t^6}{6!} + \dots

[Figure: Taylor approximations of cos(x) (0th, 2nd, 4th, 6th order) and of exp(x) (0th to
3rd order), compared with the functions themselves on the interval -5 ≤ x ≤ 5]
If we truncate the (for transcendental functions infinite) series after a finite number of terms,
we obtain the Taylor approximation. The evaluation of a Taylor approximation, e.g. of
fourth order with the coefficients a, b, c, d, e, can be done in an efficient and in an inefficient
way. Using the above formula directly, we can write

f(x)=a+b*x+c*x*x+d*x*x*x+e*x*x*x*x
so that we need four additions and ten multiplications. If we place brackets around the
expression in a skilled way (the Horner scheme), four additions and four multiplications are
sufficient:

f(x)=a+(b+(c+(d+e*x)*x)*x)*x

It is easy to write down the derivative of the above polynomial as

f(x)=b+(2*c+(3*d+4*e*x)*x)*x
In MATLAB, the evaluation of polynomials is implemented with the function polyval, the
derivative with polyder, but the order of the coefficients is the opposite from the above
example.

[Figure: f(x) = x^2 - 1 and its derivative d/dx f(x) = 2x]
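A short sketch of the coefficient convention (polyval expects the highest power first; the polynomial x^2 - 1 from the figure is used as an example):

```matlab
p=[1 0 -1];          % coefficients of x^2 - 1, highest power first
polyval(p,2)         % evaluates 2^2 - 1 = 3
dp=polyder(p)        % derivative coefficients: [2 0], i.e. 2x
polyval(dp,2)        % evaluates 2*2 = 4
```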
3.7.2 Integration I
In the same way that many transcendental functions can be represented by an infinite Taylor
series but approximated as a finite polynomial series in x, integrals and derivatives can be
approximated by replacing the infinitely small differential dx by the finite difference Δx,
and the error can be expressed as a power of Δx, as in the approximation of transcendental
functions by finite power series. The simplest method to numerically evaluate an integral

I = \int_a^b f(x)\,dx    (3.10)

is the first-order method I^{(1)}, where b - a is a multiple of Δx and the integration points
are spaced equidistantly.3 I^{(1)} means that the method is of first order in Δx; the error is
of the order of Δx^2.
Numerical integration is sometimes called quadrature, maybe from the time when the
integral was approximated numerically by drawing squares under the graph, and this box
counting was the first non-analytical quadrature. As an example, let us compute the
integral

\int_a^b \exp(-x^2)\,dx = \frac{\sqrt{\pi}}{2} \left( \mathrm{erf}(b) - \mathrm{erf}(a) \right),

which is a bit unintuitive because it needs the error function erf to be represented
analytically. With the integration bounds [0, 1] the integral is, with about 15 digits accuracy,

\int_0^1 \exp(-x^2)\,dx = 0.7468241328124270.
Now let us approximate this integral with the rectangle midpoint rule, where we replace the
integral by a Riemann sum with n grid points: we evaluate the function in the middle of
the n - 1 intervals of equal width h, instead of at the left or right end of each integration
interval,

\int_a^b f(x)\,dx \approx h \left[ f(x_0) + f(x_1) + f(x_2) + \dots + f(x_n) \right] .

clear
format long
n=101 % n odd !
dx=1/(n-1) % stepsize
xrect=[dx/2:dx:1-dx/2];
yrect=exp(-xrect.*xrect);
sum(yrect)*dx

[Figure: integration with the rectangle midpoint rule]
3 There are methods which don't choose the points equidistantly, but optimize the choice of points
so that the most accurate approximation is obtained with the minimum number of points.
Instead of evaluating the function in the middle of each interval, the trapeze rule evaluates
it at the left and right bound of each interval and counts each interior function value for the
two adjacent intervals.

[Figure: integration with the trapeze rule]
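The trapeze-rule program can be sketched along the same lines as the midpoint-rule program (the variable names are illustrative):

```matlab
clear
format long
n=101                        % n odd !
dx=1/(n-1);                  % stepsize
xtrap=[0:dx:1];
ytrap=exp(-xtrap.*xtrap);
% interior points count twice, the two boundary points once
(sum(ytrap)-0.5*(ytrap(1)+ytrap(n)))*dx
```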
Surprisingly, the result of 0.74681800146797 is one digit less accurate than the result with the
midpoint rule, although the program was more complicated, because we had to think about
a proper way to implement the trapeze shape for each interval. If we think about a graph
with mostly negative curvature, the trapeze rule will end up with an approximation which is
constantly below the true function value, whereas the rectangles of the midpoint rule are partly
above, partly below the graph, so that there is error compensation already within one interval.
In the rectangle midpoint rule, we have chosen the quadrature points in the middle of the
interval. If we had chosen the values for the function evaluation at the left/right
boundary of each interval, we would have obtained 0.74997860426211 / 0.74365739867383,
considerably less accurate than the rectangle midpoint rule.
It can be shown4 that the midpoint rule has an accuracy of

\frac{1}{24} h^3 \sum_i f''(\xi_i),    (3.12)

better than the trapeze rule, which has an accuracy of

\frac{1}{12} h^3 \sum_i f''(\xi_i),    (3.13)

[Figure: integral over a convex curve: the trapeze rule underestimates, the rectangle
midpoint rule overestimates the integral]
so it is surprising that textbooks usually introduce numerical quadrature via the trapeze rule.
Because both formulae are correct up to the second power of h, and the error is of the third
power, they are called formulae of second order.
More accurate, namely of third order, is the composite Simpson rule S(f), which makes use
of combining the rectangle midpoint rule R(f) and the trapeze rule T(f). When we compare
the integrals for the midpoint rule and for the trapeze rule, we see that in our integration

4 G.E. Forsythe, M. Malcolm, C. Moler, Computer Methods for Mathematical Computations,
Prentice Hall 1977
interval with a convex function, the trapeze rule always gives a too small result, the midpoint
rule always a too large result. Therefore, if we average R(f) and T(f), we will get a
better result than with R(f) or T(f) alone. Because the error of T(f) (Eqn. 3.13) is twice as
large as the error of R(f) (Eqn. 3.12), and of opposite sign, we should not take the direct
average (1/2)R(f) + (1/2)T(f), but the weighted average for which the errors of both rules
cancel:

S(f) = \frac{2}{3} R(f) + \frac{1}{3} T(f).
Its error can be shown5 to be of the order of

\frac{1}{2880} h^4 \sum_i f''''(\xi_i).
For our example with the integral from 0 to 1 over exp(-x^2), we obtain 0.746824132817537
as the result, instead of the exact 0.7468241328124270....
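Combining the midpoint-rule and trapeze-rule programs from above, the composite Simpson rule can be sketched as follows (variable names are illustrative):

```matlab
clear
format long
n=101; dx=1/(n-1);
xr=[dx/2:dx:1-dx/2];                 % midpoints of the intervals
R=sum(exp(-xr.*xr))*dx;              % rectangle midpoint rule
xt=[0:dx:1]; yt=exp(-xt.*xt);
T=(sum(yt)-0.5*(yt(1)+yt(n)))*dx;    % trapeze rule
S=2/3*R+1/3*T                        % composite Simpson rule
```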
In our second-order formulae, we tried to approximate the graph with straight lines and
integrated the area below the curve. A parabola is determined by 3 points, and therefore one
can also try to approximate the graph via a parabola instead of a straight line to obtain a
Simpson rule directly by supplying three integration points for each interval.
It is therefore necessary to have an odd number of integration points, and the direct derivation
of the Simpson rule can be done for an integration interval of length 2h by inserting the
Taylor expansion of the function f(x) with the νth derivatives f^{(ν)} around the point x_0,

f(x) = \sum_{\nu=0}^{\infty} \frac{f^{(\nu)}(x_0)}{\nu!} (x - x_0)^\nu .

The implementation of the Simpson rule then reads:
clear
format long,format compact
n=101 % n odd !
dx=1/(n-1) % Stepsize
xsimp=[0:dx:1];
ysimp=exp(-xsimp.*xsimp);
(4*sum(ysimp(2:2:n))+2*sum(ysimp(3:2:n-1))+ysimp(1)+ysimp(n))*dx/3
which gives the result 0.74682413289418, slightly worse than the composite Simpson rule.
In the following table, we compare the errors of the different orders by underlining the correct
digits and introduce the big-O notation.

[Table: the methods, their results for \int_0^1 \exp(-x^2)\,dx, and their order of correctness]
1. If a formula is of nth order and a discretization of 1/100 of the interval length is used,
then for a first order implementation the error is about 1/100 = 1 %, for a second order
method it will be about 1/10,000 and for a third order method about 1/1,000,000.
(Of course, the prefactors of the order terms also have to be taken into consideration.)

4. Be aware that it is not possible to integrate functions numerically if their integral has
no solution due to divergence etc.
In this section, we have discussed the error resulting from the integration over a whole interval.
This is also called the global error, in contrast to the local error, which occurs in the
approximation of a single interval. Numerical methods suffering from truncation error differ in
whether the global error is the same as the local error, whether the global error is larger than
the local error (many solvers for differential equations which do not conserve energy), or
whether the global error is smaller than the local error (error compensation as in the case of
the Simpson integration).
One generally should be very careful in using a method with low order accuracy and a small
stepsize.

[Figure: relative accuracy for integrating exp(-x^2) between 0 and 1]
Carrying the error compensation in formulae with truncation error further to higher orders,
by combining low order methods as in the case of the composite Simpson rule so that a
higher-order method results, is called Romberg integration. If the limit of infinitely high
orders is taken, this is called Richardson extrapolation, and these ideas can also be applied
to differentiation and to the numerical solution of differential equations.
3.7.3 Differentiation I
In the same way one can derive the Newton-Cotes formulae for integrals from the Taylor
expansion, as in the previous section, one can derive formulae for the derivatives using the
Taylor expansion6. Such approximations are often called finite difference formulae, as they
approximate the differential with a finite difference. For data which take the values
f_{i-2}, f_{i-1}, f_i, f_{i+1}, f_{i+2} at equidistant points, we get the following finite
difference schemes for first order derivatives:

Name:                Finite difference scheme:                            Leading error:
Forward difference   (f_{i+1} - f_i)/Δx                                   Δx f''(ξ)/2
Backward difference  (f_i - f_{i-1})/Δx                                   Δx f''(ξ)/2
3-point symmetric    (f_{i+1} - f_{i-1})/(2Δx)                            Δx² f'''(ξ)/6
3-point asymmetric   (-1.5 f_i + 2 f_{i+1} - 0.5 f_{i+2})/Δx              Δx² f'''(ξ)/3
5-point symmetric    (f_{i-2} - 8 f_{i-1} + 8 f_{i+1} - f_{i+2})/(12Δx)   Δx⁴ f⁽⁵⁾(ξ)/30

Note that the coefficients in front of f_{i-2}, f_{i-1}, f_i, f_{i+1}, f_{i+2} have to add up to 0.
For second order derivatives, similar schemes are written down in the following table, and
again the coefficients add up to 0:

Name:                Finite difference scheme:                                        Leading error:
3-point symmetric    (f_{i-1} - 2 f_i + f_{i+1})/Δx²                                  Δx² f''''(ξ)/12
3-point asymmetric   (f_i - 2 f_{i+1} + f_{i+2})/Δx²                                  Δx f'''(ξ)
5-point symmetric    (-f_{i-2} + 16 f_{i-1} - 30 f_i + 16 f_{i+1} - f_{i+2})/(12Δx²)  Δx⁴ f⁽⁶⁾(ξ)/90
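The different orders can be checked numerically; a small sketch comparing the forward and the 3-point symmetric difference for sin(x) (the stepsize and test point are illustrative):

```matlab
dx=0.01; x=1;                      % illustrative stepsize and point
exact=cos(x);                      % exact derivative of sin(x)
fwd=(sin(x+dx)-sin(x))/dx;         % forward difference, error O(dx)
sym=(sin(x+dx)-sin(x-dx))/(2*dx);  % symmetric difference, error O(dx^2)
abs(fwd-exact)                     % roughly dx/2*|sin(1)|, about 4e-3
abs(sym-exact)                     % roughly dx^2/6*|cos(1)|, about 9e-6
```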
In contrast to numerical integration, which smoothes out errors via error compensation,
numerical differentiation roughens up the solution. If high accuracy is desired, there are
usually better solutions than computing the derivatives directly via finite difference schemes.
The graph below shows the numerical integral

\int_0^x \sin(y)\,dy = 1 - \cos(x)

and the numerical derivative

\frac{d}{dx} \sin(x) = \cos(x)

with some additional noise of 1 % in sin(x). The graph was produced with the following
program:
clear
format compact
nstep=200
x=linspace(0,4*pi,nstep);
dx=mean(diff(x));
idx=1/dx;
y=sin(x-dx/2)+0.01*(rand(size(x))-.5);
subplot(3,1,1)
6 Clive A.J. Fletcher, Computational Techniques for Fluid Dynamics, Vol. 1, 2nd ed., Springer 1990
[Figure, three panels: top: sin(x) with noise, its numerical derivative, cos(x) and
1 - int(sin(x)); middle: absolute error of derivative and integral; bottom: relative error of
derivative and integral]
plot(x,y,'-.',x(1:nstep-1),diff(y)*idx,...
x(1:nstep-1),cos(x(1:nstep-1)),':',...
x(1:nstep-1),1-cumsum(y(1:nstep-1))*dx,'--')
axis tight
legend('sin(x)+-0.005*rand','d/dx sin(x)','cos(x)','1-int(sin(x))')
subplot(3,1,2)
plot(x(1:nstep-1),diff(y)*idx-cos(x(1:nstep-1)),...
x(1:nstep-1),1-cumsum(y(1:nstep-1))*dx-cos(x(1:nstep-1)),':')
legend('cos(x)-d/dx*sin(x)','cos(x)-1+int(sin(x))')
title('absolute error')
axis tight
subplot(3,1,3)
plot(x(1:nstep-1),((diff(y)*idx-cos(x(1:nstep-1)))./cos(x(1:nstep-1))),...
x(1:nstep-1),(1-cumsum(y(1:nstep-1))*dx-cos(x(1:nstep-1)))...
./cos(x(1:nstep-1)),':')
legend('(cos(x)-d/dx*sin(x))/cos(x)','(cos(x)-1+int(sin(x)))/cos(x)')
axis tight
title('relative error')
return
Both the differential and the integral should give cos(x), but the differential is so noisy that
the result deviates visibly from the exact solution. The integral over the noisy result gives
nevertheless a smooth curve. This is again a case of a good and a bad direction of
numerical computing, as we encountered before by rewriting the iterative computation of the
equation

E_n = 1 - n E_{n-1}   (numerically unstable)

into

E_{n-1} = (1 - E_n)/n   (numerically stable).
As can be seen, of differentiation and its inverse operation, integration, differentiation
constitutes the bad direction in numerical analysis, integration the good direction.
In numerical analysis, integrals, also of higher order, can usually be computed with sufficient
precision, in contrast to derivatives, whereas in analytical calculations it is usually always
possible to compute derivatives, but very often the computation of closed forms for integrals
is problematic.
Exercises:
1. Write a program which produces floating point numbers for base 2 and mantissa 4, as well
as for base 4 with mantissa 2.
a) Choose the exponent so that both number systems are roughly comparable.
b) Plot the positions of the numbers.
c) Compare both number systems: Which number system can be supposed to have the better
roundoff properties?
2. Write a program which computes the exponential function exp(x) using the Taylor series,
and one program which computes the exponential function by evaluating the integer part of
x using powers of the Euler number e and the non-integer part using the Taylor series. For
which size of the arguments become the
Chapter 4
Graphics
>> a=[3:6]
a =
3 4 5 6
>> a=[3:.5:6]
a =
3.0000 3.5000 4.0000 4.5000 5.0000 5.5000 6.0000
This is different from loops in FORTRAN and C, where the stepsize is added as the third ar-
gument for a loop statement. Whereas the colon operator notation using : constructs a vector
with a given lower and upper bound for a given stepsize, [lower_bound:stepsize:upper_bound],
if instead of the stepsize the number of points is known, it is more convenient to use the
linspace-function
>> a=linspace(3,6,7)
a =
3.0000 3.5000 4.0000 4.5000 5.0000 5.5000 6.0000
The logspace function correspondingly constructs logarithmically spaced vectors between
10^a and 10^b, so that too large an exponent produces an overflow:

>> b=logspace(1,1000,3)
b =
10 Inf Inf
>> b=logspace(1,4,4)
b =
10 100 1000 10000
If several vectors should be concatenated, this can be done with the brackets for the array-
constructor []
>> c=[1 3]
c =
1 3
>> c=[4 c b]
c =
Columns 1 through 6
4 1 3 10 100 1000
Column 7
10000
After a lot of vector operations, one usually also needs functions which give information
about the vectors used. The most elementary function, which displays information about
variables, is
>> who
a ans b c
The length of a vector is displayed by
>> length(a)
ans =
7
but this function makes no difference between column- and row vectors. For information on
higher dimensions, one has to use the function
>> size(a)
ans =
1 7
Vector elements can be accessed either via for loops, like in other programming languages,
as in
>> for i=1:length(b)
f(i)=2*b(i)
end
f =
20
f =
20 200
f =
20 200 2000
f =
20 200 2000 20000
or via the colon-notation with : and round brackets so that for a vector
>> c=.2:.2:1.2
c =
0.2000 0.4000 0.6000 0.8000 1.0000 1.2000
the assignment of the second to the fourth element to a vector g can be written as
>> g=c(2:4)
g =
0.4000 0.6000 0.8000
The whole of a vector can be assigned without specifying the bounds like in
>> h=c(:)
h =
0.2000
0.4000
0.6000
0.8000
1.0000
1.2000
If the vector from a lower bound up to the end should be assigned, this can be done via
the end statement in round brackets together with the colon operator :
>> v=c(4:end)
v =
0.8000 1.0000 1.2000
Functions which operate on vectors are usually defined in the canonical way, that means
in a way in which one expects the function to work. The functions prod and sum acting on a
vector behave in the way one expects, e.g. they give as a result the product and sum of the
vector elements. Whereas prod and sum act on vectors and give a scalar as a result,
the functions cumsum and cumprod, which compute the cumulative sum and the cumulative
product, give a vector as a result
>> cumsum(1:5)
ans =
1     3     6    10    15
One must be careful with the use of multiplicative operators *, / and ^, which are in
MATLAB in general interpreted in the sense of numerical linear algebra, so that column-
and line- operations must match. If one wants to use these operators elementwise, one should
use their elementwise variants which are preceded by a ., as in .*, ./ and .^.
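The difference can be seen in a short example (a row vector is used so that both variants are defined):

```matlab
a=[1 2 3];
a.*a              % elementwise product: 1 4 9
a*a'              % inner product, row times column: 14
a.^2              % elementwise power: 1 4 9
```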
The ones function sets up a matrix with 1 as every element; called with a single argument,
it produces a square matrix:

>> ones(2)
ans =
1 1
1 1
For non-square matrices, two indices have to be specified, where the first is the row index
and the second is the column index, for example
>> ones(3,2)
ans =
1 1
1 1
1 1
The zeros function behaves in the same way as the ones function, except that it sets up
matrices with 0 as every element
>> zeros(2,3)
ans =
0 0 0
0 0 0
In linear algebra, the identity matrix is very important, and therefore the unit matrix in
MATLAB is named eye (eye-dentity / identity)
>> eye(3)
ans =
1 0 0
0 1 0
0 0 1
It may be surprising, but the identity-matrix is also defined for non-square matrices, as the
following example shows
>> eye(2,5)
ans =
1 0 0 0 0
0 1 0 0 0
Another important matrix function is the constructor for the random matrix
>> rand(2,4)
ans =
0.8214 0.6154 0.9218 0.1763
0.4447 0.7919 0.7382 0.4057
>> c=ones(2)-eye(2)
c =
0 1
1 0
>> b=zeros(2)
b =
0 0
0 0
>> d=[2 3
4 5]
d =
2 3
4 5
>> e=[c b
d c]
e =
0 1 0 0
1 0 0 0
2 3 0 1
4 5 1 0
A very convenient function, similar to linspace in one dimension, which can be used to set up
arguments for functions in higher dimensions is the meshgrid function, whose functionality
is as follows:
>> x=[.1:.1:.5]
x =
0.1000 0.2000 0.3000 0.4000 0.5000
>> y=[20:-4:1]
y =
20 16 12 8 4
>> subplot(2,2, 1)
>> plot(y)
>> subplot(2,2,2)
>> plot(x,y)
which displays on the screen (note the different scale on the x-axis)
[Figure: left: plot(y) with the index 1...5 on the x-axis; right: plot(x,y) with 0.1...0.5
on the x-axis]
Plots of vectors can be done either by plotting the vector directly or by specifying two vectors,
where the first will be taken as the x-axis. If the vector lengths do not match, MATLAB issues
an error message and stops the program execution. The plots are automatically done in the
sub-plot which has been called last.
A subtle way of plotting is the plot of a vector of complex numbers. If you have a complex
vector c, you can get the real part x and the imaginary part y via

x=real(c)
y=imag(c)

The command plot(c) has then the same effect as plot(x,y), which means that the
imaginary part is plotted versus the real part.
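For example, a complex vector on the unit circle plots as a circle (the number of points is arbitrary):

```matlab
theta=linspace(0,2*pi,100);
c=exp(1i*theta);     % points on the unit circle in the complex plane
plot(c)              % same as plot(real(c),imag(c))
axis equal           % so the circle does not appear as an ellipse
```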
If a new plotting window should be opened, this can be done via the figure command: the
first window is created by the figure(1) command, which is automatically executed if no
plotting window is open, figure(2) opens a second plotting window, and so on. Plots are
done in the window for which the figure command was called last.
There is a wide variety of ways to influence graph annotation in MATLAB:

% example for graph annotation
subplot(2,2,1)
x=[.1:.1:.5]
plot(x,x.*x,x,x.*x.*log(x))
xlabel('X-axis')
ylabel('Y-axis')

[Figure: annotated plot of x^2 and x^2*log(x) with axis labels, a legend and the text
'any label here']

To look at a drawing in higher resolution, use the zoom command, aim with the mouse
pointer at the region which should be zoomed and click the left mouse button (the right
mouse button unzooms the region again).
Arrays are plotted with the mesh command, which plots the data in a wire-frame type of
graph, as below on the right.
The view command can be used to set a different viewing angle for three-dimensional plots. It
is also possible to change the viewpoint interactively via the rotate3d command by pointing
with the mouse on the frame and pulling the frame of the 3D-graph.
[Figure: two wire-frame mesh plots of the same array, seen from different viewing angles]
clear
format compact
x=[linspace(0.1,10,100)];
subplot(2,2,1)
plot(x,x,'-',x,exp(x),'--',x,1./x,':',x,log(x),'-.',x,1./sqrt(x),'-+')
axis([0 10 -3 20 ])
title('linear plot')
legend('x','exp(x)','1/x','log(x)','1/sqrt(x)')
subplot(2,2,2)
semilogx(x,x,'-',x,exp(x),'--',x,1./x,':',x,log(x),'-.',x,1./sqrt(x),'-+')
axis([0.1 10 -3 20 ])
title('semilogarithmic in x-direction')
legend('x','exp(x)','1/x','log(x)','1/sqrt(x)',2)
subplot(2,2,3)
semilogy(x,x,'-',x,exp(x),'--',x,1./x,':',x,log(x),'-.',x,1./sqrt(x),'-+')
axis([0.1 10 .01 20000 ])
title('semilogarithmic in y-direction')
legend('x','exp(x)','1/x','log(x)','1/sqrt(x)',2)
subplot(2,2,4)
loglog(x,x,'-',x,exp(x),'--',x,1./x,':',x,log(x),'-.',x,1./sqrt(x),'-+')
axis([0.1 10 .01 20000 ])
title('logarithmic')
legend('x','exp(x)','1/x','log(x)','1/sqrt(x)',2)
[Figure: the same curves in linear, semilogarithmic and double-logarithmic scales]
Many systems in science and mathematics can be better understood by just plotting typical
properties in different scales. Logarithmic, linear, exponential and power laws can be found
in nature, and are easily identifiable by plotting the data in different scales.
If instead of the y-axis the x-axis is chosen in logarithmic scale, logarithmic curves become
straight lines. Logarithmic curves grow slower than linear curves. Typical examples of
logarithmic behavior are animal senses. Light and sound are perceived on a logarithmic
scale, i.e. a sound is not perceived twice as loud if the pressure of the sound wave is twice as
high, but if it is about 10^2 times as high.
clear
format compact
a=linspace(.0,100,1000)
la=length(a)
x=linspace(0.1,1000,100);
y=zeros(size(x));
for i=1:la
y=y+exp(-.1*a(i).*x);
end
loglog(x,y)
This means that a power law is found in a system if there are many different scales which
contribute to an exponential phenomenon, and on each size scale (variable a in the above
example) there is a different prefactor in the exponential law.
In the same way as the curve of a power law can be written as a superposition of exponential
curves, a Lorentzian can be written as a superposition of Gaussian curves.
Another example for power laws is the 1/f noise in many technical applications. It is very
often found in systems where a seemingly continuous process is the result of discrete
processes, and the deviation from the mean causes the noisy fluctuations.
axis([xmin xmax ymin ymax])

defines the minimal and maximal coordinates for the x- and y-axis respectively. Because
MATLAB usually chooses the axes so as to end at values which are multiples of 1, 10, 100,
it is sometimes necessary to set
axis tight
so that the axes terminate at the extremal values of the plot. Apart from the axis, a grid can
be specified by using grid. Sometimes it is necessary to modify picture properties like the
axis labels etc. from the default values chosen by MATLAB. For the program
clear
format compact
x=linspace(0,2*pi,10);
y=sin(x);
h=plot(x,y)
g=axes
get(h)
get(g)
the plot is defined as a variable, and these variables can be displayed with the get command.
The entries for h and g can then be directly modified using the set command, by specifying
the object name h or g, the property to modify, e.g. Color, and the new value, e.g.

set(g,'Color',[.5 .5 .5])

Another possible usage, if one already knows the property name, is e.g.

set(gca,'XTickLabel',{'One';'Two';'Three';'Four'})

which labels the first four tick marks on the x-axis and then reuses the labels until all ticks
are labeled. The labels can be positioned like

set(gca,'XTickLabel',{'1';'10';'100'})
An image can be read into a variable and displayed with
name=imread('name.gif')
image(name)
The data in the variable name can then be manipulated like a usual MATLAB array.
Output of a MATLAB session can be logged with
diary on
and MATLAB will then write not only to the screen, but also into the file diary. If one
wants to redirect the output into a file with a special name, one can use the command
diary('special_filename')
Logging is switched off again with
diary off
If you want to include the output in a LaTeX document and preserve the computer-output
look, you can use the
\begin{verbatim}
\end{verbatim}
environment; all the program examples in this scriptum are produced in such a way.
\epsfig{file=graphiken/circle_square.eps,height=2cm}
\epsfig{file=graphiken/circle_square.eps,height=2cm,angle=-90}
\epsfig{file=graphiken/circle_square.eps,height=2cm,width=4cm}
In principle, all postscript files should be includable in LaTeX, but some programs produce
postscript output which is not compatible with LaTeX. Under UNIX, one can use the command
ps2epsi name.ps
to convert a file name.ps to a file name.epsi, which corresponds to the encapsulated postscript
interchange format.
xv name.jpg
will load the image name.jpg into the xv viewer. Pressing the right mouse button will make
a menu appear, and the graphics can be saved as postscript by choosing the appropriate menu
entry (SAVE FORMAT POSTSCRIPT).
Chapter 5
Linear Algebra
Usually, one learns about linear algebra in the first year of study, but often one needs it much
later, when one has already forgotten most of it. MATLAB means MATrix LABoratory, and
its first version was written by Cleve Moler so that his students could learn linear algebra
more easily.
General documentation of MATLAB can also be found at http://www.mathworks.com/
access/helpdesk/help/techdoc/matlab.shtml
A =
0.520109 0.340012 0.470293
0.510104 0.326988 0.636776
0.010375 0.782090 0.900370
> diag(A)
ans =
0.52011
0.32699
0.90037
If the input of the diag command is a vector, diag constructs a matrix with the vector on
the diagonal, a typical example of how commands are overloaded in MATLAB:
> b=[3 5 7]
b =
3 5 7
> A=diag(b)
A =
3 0 0
0 5 0
0 0 7
> A=rand(2)
A =
0.66166 0.48661
0.69184 0.39113
> B=A'
B =
0.66166 0.69184
0.48661 0.39113
Because MATLAB knows the difference between column and row vectors, the transpose
operator ' can also be used to transform column into row vectors and vice versa:
> v=[1 2 3 4 5]
v =
1 2 3 4 5
> u=v'
u =
1
2
3
4
5
For complex-valued matrices, the '-operator gives the Hermitian conjugate matrix:
> H=rand(3)+sqrt(-1)*rand(3)
H =
0.59574 + 0.89043i 0.91601 + 0.87663i 0.19920 + 0.74066i
0.71691 + 0.73996i 0.31324 + 0.44034i 0.19254 + 0.85119i
0.38660 + 0.13756i 0.33661 + 0.71527i 0.29184 + 0.58186i
> G=H'
G =
0.59574 - 0.89043i 0.71691 - 0.73996i 0.38660 - 0.13756i
0.91601 - 0.87663i 0.31324 - 0.44034i 0.33661 - 0.71527i
0.19920 - 0.74066i 0.19254 - 0.85119i 0.29184 - 0.58186i
The commands which extract the upper/lower triangular part of a matrix are triu/tril:
> A
A =
0.951650 0.084814 0.208357
0.109170 0.585341 0.562931
0.667123 0.528991 0.860920
> tril(A)
ans =
0.95165 0.00000 0.00000
0.10917 0.58534 0.00000
0.66712 0.52899 0.86092
> triu(A)
ans =
0.95165 0.08481 0.20836
0.00000 0.58534 0.56293
0.00000 0.00000 0.86092
If the columns or rows should be flipped, i.e. if their order should be inverted, this can be
done with the commands flipud and fliplr, flip up-down and flip left-right:
> fliplr(A)
ans =
0.73180 0.40541
0.55208 0.79014
> flipud(A)
ans =
0.79014 0.55208
0.40541 0.73180
These two commands can be combined to rotate a complex matrix by 180 degrees, which is
not the Hermitian conjugate:
> A=rand(2)+sqrt(-1)*rand(2)
A =
0.839504 + 0.572899i 0.466803 + 0.675260i
0.086815 + 0.252680i 0.132638 + 0.086518i
> B=fliplr(flipud(A))
B =
0.132638 + 0.086518i 0.086815 + 0.252680i
0.466803 + 0.675260i 0.839504 + 0.572899i
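The effect of the two flips can be checked in plain Python (a sketch of the array operations, not of MATLAB itself): combining them reverses both the row and the column order, a 180-degree rotation, which is clearly different from the ordinary transpose.

```python
def flipud(m):
    """Reverse the order of the rows (like MATLAB's flipud)."""
    return m[::-1]

def fliplr(m):
    """Reverse the order of the columns (like MATLAB's fliplr)."""
    return [row[::-1] for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

A = [[1, 2], [3, 4]]
rot180 = fliplr(flipud(A))   # entry (i,j) comes from (n-1-i, n-1-j)
```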
> u=[1 2 3 4]
u =
1 2 3 4
> v=[5 6 7 8]
v =
5 6 7 8
> u.*v
ans =
5 12 21 32
The inner product for a row-vector u and a column-vector w is computed with the operator
*:
> u=[1 2 3 4]
u =
1 2 3 4
> w=[1 1 2 2]'
w =
1
1
2
2
> u*w
ans = 17
> w*u
ans =
1 2 3 4
1 2 3 4
2 4 6 8
2 4 6 8
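The arithmetic of the two products can be written out in plain Python (an illustrative sketch, not MATLAB's operators): the inner product contracts the two vectors to a scalar, while the outer product w*u builds a matrix whose row i is w(i) times the row vector u.

```python
u = [1, 2, 3, 4]   # row vector
w = [1, 1, 2, 2]   # column vector

inner = sum(a * b for a, b in zip(u, w))      # u*w in MATLAB: a scalar
outer = [[wi * uj for uj in u] for wi in w]   # w*u in MATLAB: a 4x4 matrix
```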
Matrices can be treated in the same way as vectors, with elementwise multiplication .* or
multiplication * in the sense of linear algebra:
> A=[1 2
> 3 4]
A =
1 2
3 4
> B=[1 -1
> -2 2]
B =
1 -1
-2 2
> A*B
ans =
-3 3
-5 5
> A.*B
ans =
1 -2
-6 8
> A=[1 2
> 3 4]
A =
1 2
3 4
> v=[1
> 2]
v =
1
2
> A*v
ans =
5
11
MATLAB also has the Kronecker product as a built-in function:
> u=[1 2 3 4]
u =
1 2 3 4
> v=[5 6 7 8]
v =
5 6 7 8
> kron(u,v)
ans =
5 6 7 8 10 12 14 16 15 18 21 24 20 24 28 32
> kron(u,v')
ans =
5 10 15 20
6 12 18 24
7 14 21 28
8 16 24 32
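The Kronecker product itself is easy to sketch in plain Python (an illustrative implementation, not MATLAB's): entry (i·p+k, j·q+l) of the result is A(i,j)·B(k,l), which reproduces both outputs above.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    m, n = len(A), len(A[0])
    p, q = len(B), len(B[0])
    return [
        [A[i][j] * B[k][l] for j in range(n) for l in range(q)]
        for i in range(m) for k in range(p)
    ]

u = [[1, 2, 3, 4]]           # row vector as a 1x4 matrix
vT = [[5], [6], [7], [8]]    # column vector as a 4x1 matrix
```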
Whereas the elementwise matrix product computed with .* is commutative, the matrix
product computed with * is of course not commutative.
Vectors whose scalar product is 0 are called orthogonal. Whereas orthogonality of two
vectors v and w can be defined in theoretical mathematics as the property that their scalar
product is exactly zero, v·w = 0, in numerical mathematics it is necessary to define orthogonality
in a way so that possible rounding errors are taken into account, as the following example
shows:
> v*w
ans = -9.64636952420157e-17
Obviously the last result should be exactly zero, but due to the rounding errors in the
computation, there is a finite error. How the definition of orthogonality can be applied in
such a way that rounding errors are taken into account can be seen in the next section about
the rank of matrices.
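A tolerance-based orthogonality test can be sketched in plain Python (the vectors and the tolerance 1e-12 are illustrative choices, not from the script): a vector made orthogonal by Gram-Schmidt passes the test even though its dot product is only zero up to rounding errors.

```python
import math

def is_orthogonal(v, w, tol=1e-12):
    """Treat v and w as orthogonal when |v.w| is tiny relative to |v||w|."""
    dot = sum(a * b for a, b in zip(v, w))
    scale = math.sqrt(sum(a * a for a in v)) * math.sqrt(sum(b * b for b in w))
    return abs(dot) <= tol * scale

v = [1.0, 2.0, 3.0]
w0 = [4.0, 5.0, 6.0]
# Gram-Schmidt: subtract the component of w0 along v; the result is
# orthogonal to v only up to rounding errors
c = sum(a * b for a, b in zip(v, w0)) / sum(a * a for a in v)
w = [b - c * a for a, b in zip(v, w0)]
```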
> size(A)
ans =
2 2
size gives a two-element row vector as an answer; the number of columns is size(A,2), the
number of rows size(A,1). The length of a column / row vector v can also be computed with
size(v,1) / size(v,2).
The rank of a matrix is, in theoretical linear algebra, the number of linearly independent
rows/columns. Because the definition of linear independence is closely related to the definition
of orthogonality, we will use the rank computation as the criterion for orthogonality. The
rank of a matrix can be computed with MATLAB's rank command. How the rank command
works will be explained later in the section about the singular value decomposition, along
with how one should choose the optional threshold in MATLAB's rank command. First let us
review some theorems about the rank of matrices¹:
Random matrices nearly always have full rank, i.e. the rank of a matrix constructed
with rand is the same as the number of columns/rows, and if the number of
rows/columns is larger than the number of columns/rows, we have rank(A)=min(size(A)):
> A=rand(3,4)
A =
0.23382 0.43570 0.42862 0.97961
0.79868 0.34546 0.69142 0.74305
0.66927 0.71192 0.15419 0.11667
> rank(A)
ans = 3
Square matrices which have a rank smaller than their number of columns/rows are
called singular. They cannot be inverted, and systems of linear equations where the
equations form a singular matrix cannot be solved. Their determinant vanishes.
The rank of a matrix does not change through transposition, complex or Hermitian
conjugation.
The product of non-singular matrices has the same rank as the matrices themselves:
> A=rand(3,4)
A =
¹ Roger A. Horn, Charles R. Johnson, Matrix Analysis, Cambridge University Press, 1991
> B=rand(3)
B =
0.908288 0.703948 0.363589
0.245781 0.950685 0.097344
0.942011 0.726192 0.064962
> C=B*A
C =
0.52703 0.94231 0.57935 1.45301
0.18896 0.92705 0.17690 0.70990
0.50276 0.75283 0.44795 1.19982
> rank(A)
ans = 3
> rank(B)
ans = 3
> rank(C)
ans = 3
For a rank-deficient matrix, the rank of the product is the same as that of the factor
with the lowest rank:
> A=rand(2,4)
A =
0.200421 0.795092 0.896583 0.454798
0.838726 0.220597 0.018236 0.018493
> A(3,:)=A(2,:)
A =
0.200421 0.795092 0.896583 0.454798
0.838726 0.220597 0.018236 0.018493
0.838726 0.220597 0.018236 0.018493
> B=rand(3)
B =
0.94359 0.31700 0.20635
0.50896 0.36833 0.40063
0.48172 0.19705 0.42594
> C=B*A
C =
0.62806 0.86569 0.85555 0.43882
> rank(A)
ans = 2
> rank(B)
ans = 3
> rank(C)
ans = 2
5.3.3 Rank-Inequalities
For A ∈ M_{m,n} we have rank A ≤ min(m, n).
When a column or a row of a matrix is deleted, the rank of the resulting matrix
cannot be larger than the rank of the original matrix.
For a product of two matrices, rank(AB) ≤ min(rank A, rank B).
Properties of matrix norms:
||A|| ≥ 0 (non-negativity)
||A|| = 0 if and only if A = 0
> A=rand(3)
A =
0.209224 0.413728 0.212479
0.106481 0.192283 0.074438
0.291095 0.436435 0.508115
> det(A)
ans = -0.0017939
> det(B)
ans = 0.0017939
> A=rand(2)
A =
0.29975 0.85007
0.88812 0.33290
> B=rand(2)
B =
0.89979 0.72370
0.53648 0.97567
> A/B
ans =
-0.33410 1.11909
1.40492 -0.70090
> C=inv(B)
C =
1.9926 -1.4780
-1.0956 1.8376
> A*C
ans =
-0.33410 1.11909
1.40492 -0.70090
Because the product C*A does not necessarily give the same result as the product A*C, there
is also the left division \ of a matrix, which with the above matrices gives
> C*A
ans =
-0.71537 1.20181
1.30362 -0.31963
> B\A
ans =
-0.71537 1.20181
1.30362 -0.31963
If one tries to invert a singular matrix, MATLAB still gives a result (usually wrong) and
issues a warning:
> A=[1 1
> 1 1]
A =
1 1
1 1
> inv(A)
warning: inverse: matrix singular to machine precision, rcond = 0
ans =
1 1
1 0
> B=inv(A)
warning: inverse: matrix singular to machine precision, rcond = 0
B =
1 1
1 0
> B*A
ans =
2 2
1 1
Therefore, there are 6 possible orders in which to program the loops, but basically there are
only two possibilities:
clear
format long
n=20
b=randn(n).*10.^(16*randn(n));
c=randn(n).*10.^(16*randn(n));
tic
% Version 1: Dot-Product
a1=zeros(n);
for j=1:n
for i=1:n
for k=1:n
a1(i,j)=a1(i,j)+b(i,k)*c(k,j);
end
end
end
toc
tic
% Equivalent to
a2=zeros(n);
for j=1:n
for i=1:n
a2(i,j)=b(i,:)*c(:,j);
end
end
toc
tic
% Version 2: Daxpy-Product
a3=zeros(n);
for j=1:n
for k=1:n
for i=1:n
a3(i,j)=a3(i,j)+b(i,k)*c(k,j);
end
end
end
toc
tic
% equivalent to
a4=zeros(n);
for j=1:n
for k=1:n
a4(:,j)=a4(:,j)+c(k,j)*b(:,k);
end
end
toc
return
We have also included the tic and toc commands to measure the time used for a matrix
multiplication. It can be seen that MATLAB performs much faster if the inner loop is
evaluated using the :-notation.
The first version of the matrix-matrix multiplication has an inner vector product as its kernel,
the inner part of the routine. The second version of the matrix multiplication has a kernel
which can be written as
\[
y = a\, x + y,
\]
an operation which in words reads a X Plus Y, for which the acronym
SAXPY or DAXPY (S for single, D for double precision) is often in use.
It turns out that both versions are numerically equivalent, and for l x l matrices both need
2l^3 floating point operations (multiplications and additions).
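The two loop orders can be transcribed into plain Python (a toy sketch, not the MATLAB timing experiment): the dot-product version and the daxpy version traverse the work differently but compute exactly the same product.

```python
def matmul_dot(B, C):
    """Version 1: the innermost loop forms a dot product (i-j-k order)."""
    n = len(B)
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            for k in range(n):
                A[i][j] += B[i][k] * C[k][j]
    return A

def matmul_daxpy(B, C):
    """Version 2: the innermost loop is a daxpy, y = a*x + y (j-k-i order)."""
    n = len(B)
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            for i in range(n):
                A[i][j] += B[i][k] * C[k][j]
    return A

B = [[1.0, 2.0], [3.0, 4.0]]
C = [[5.0, 6.0], [7.0, 8.0]]
```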
It is common to give the speed of computers by how many FLoating point Operations Per
Second (FLOPS) they can perform. Modern PCs are in the range of a few hundred MFLOPS,
workstations are nowadays in the GFLOPS range, and the Earth Simulator, a supercomputer
near Yokohama, can do about 4 TeraFLOPS.
Using programs to test the speed of computers is called benchmarking.
We write the system of linear equations as
\[
A x = b,
\]
and transform the matrix and the right-hand-side vector b via elementary row operations
(subtracting multiples of some rows from other rows) to upper triangular form,
where all the elements below the diagonal are 0:
\[
\left(\begin{array}{cccc|c}
a_{1,1} & a_{1,2} & \cdots & a_{1,k} & b_1 \\
0 & a_{2,2} & \cdots & a_{2,k} & b_2 \\
\vdots & & \ddots & \vdots & \vdots \\
0 & \cdots & 0 & a_{k,k} & b_k
\end{array}\right)
\]
The unknowns can then be computed successively from the bottom up:
\[
x_k = b_k / a_{k,k},
\]
\[
x_{k-1} = \left( b_{k-1} - a_{k-1,k}\, x_k \right) / a_{k-1,k-1},
\]
\[
x_{k-2} = \left( b_{k-2} - a_{k-2,k-1}\, x_{k-1} - a_{k-2,k}\, x_k \right) / a_{k-2,k-2},
\]
and in general
\[
x_i = \frac{1}{a_{i,i}} \left( b_i - \sum_{j=i+1}^{k} a_{i,j}\, x_j \right).
\]
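The backward substitution scheme translates directly into plain Python (a generic sketch; the small test system is mine, not from the script):

```python
def back_substitute(U, b):
    """Solve U x = b for an upper triangular matrix U, from the last row upward."""
    k = len(b)
    x = [0.0] * k
    for i in range(k - 1, -1, -1):
        # subtract the already known unknowns, then divide by the diagonal entry
        s = sum(U[i][j] * x[j] for j in range(i + 1, k))
        x[i] = (b[i] - s) / U[i][i]
    return x

x = back_substitute([[2.0, 1.0], [0.0, 4.0]], [5.0, 8.0])
```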
This scheme of eliminating elements so that a triangular coefficient matrix survives for which
the unknowns can be computed in a trivial way is called Gaussian elimination. As an example
\[
\begin{pmatrix} 9 & 3 & 4 \\ 4 & 3 & 4 \\ 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix} 7 \\ 8 \\ 3 \end{pmatrix},
\]
in augmented form
\[
\left(\begin{array}{ccc|c} 9 & 3 & 4 & 7 \\ 4 & 3 & 4 & 8 \\ 1 & 1 & 1 & 3 \end{array}\right).
\]
We start by interchanging the first and the last row:
\[
\left(\begin{array}{ccc|c} 1 & 1 & 1 & 3 \\ 4 & 3 & 4 & 8 \\ 9 & 3 & 4 & 7 \end{array}\right).
\]
Next, we subtract 4 times the first row from the second row and 9 times the first row from
the last row:
\[
\left(\begin{array}{ccc|c} 1 & 1 & 1 & 3 \\ 0 & -1 & 0 & -4 \\ 0 & -6 & -5 & -20 \end{array}\right).
\]
Finally, we add -6 times the second row to the last row, and obtain the triangular system
\[
\left(\begin{array}{ccc|c} 1 & 1 & 1 & 3 \\ 0 & -1 & 0 & -4 \\ 0 & 0 & -5 & 4 \end{array}\right),
\]
from which we can compute the unknowns successively as x_3 = -4/5, x_2 = 4 and x_1 = -1/5.
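The whole procedure, with the row interchange done automatically as partial pivoting, can be sketched in plain Python (an illustrative implementation, not from the script); applied to this system it reproduces the solution found by hand.

```python
def gauss_solve(A, b):
    """Gaussian elimination with partial pivoting, then backward substitution."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        # pivot: bring the row with the largest entry in this column to the top
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        # eliminate the entries below the diagonal
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # backward substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

x = gauss_solve([[9.0, 3.0, 4.0], [4.0, 3.0, 4.0], [1.0, 1.0, 1.0]], [7.0, 8.0, 3.0])
# x is approximately (-1/5, 4, -4/5)
```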
For the repeated solution of systems Ax = b with the same matrix A, it pays off to factor A
once into a lower triangular matrix L and an upper triangular matrix U,
\[
LU = A,
\]
so that the solution can again be computed in a trivial way. In MATLAB, the LU-factorization
can be computed via the lu command, for example as
can be computed via the lu command, for example as
> a
a =
> [l,u]=lu(a)
l =
1.000000000000000 0.000000000000000 0.000000000000000
-0.044275098518011 1.000000000000000 0.000000000000000
0.556903499136249 0.577325122175704 1.000000000000000
u =
-1.068845629692078 0.583466410636902 -0.017438033595681
0.000000000000000 -0.669700958047086 -0.289110165605131
0.000000000000000 0.000000000000000 -0.929460456605607
The solution of a linear system Ax = b can be computed in MATLAB with the slash and
backslash operators, which perform division not only for scalars,
5/4
ans = 1.25000000000000
> 5\4
ans = 0.800000000000000
but also for matrices. The algebraic meaning is
\[
A\backslash B = A^{-1} B, \qquad A/B = A\, B^{-1},
\]
and for matrices (remember that MATLAB means MATrix LABoratory) these are not
necessarily the same. The solution of Ax = b can be obtained by formally dividing through A
from the left:
\[
Ax = b \;\Rightarrow\; A\backslash A\, x = A\backslash b \;\Rightarrow\; x = A\backslash b.
\]
The solution of the system, including a test whether Ax is really equal to b,
can then be programmed in the following way:
> A=rand(3)
A =
> b=rand(3,1)
b =
0.81931
0.61835
0.14195
> x=A\b
x =
-0.74296
-2.36996
2.67169
> A*x
ans =
0.81931
0.61835
0.14195
In LINPACK, the elimination step is called factoring (because the LU-decomposition produces
the two factors L and U), and the Double precision GEneral matrix FActoring routine is
therefore called DGEFA. The solution/substitution routine is DGESL, SL for solution.
There exists also a LINPACK benchmark, which sets up matrices in a well-defined way,
computes the matrix inverses, then computes the number of floating point operations and
the time, and from these the FLOP rate. In this way, the speed of computers has been
evaluated for decades.²
The inverse of a matrix A can be computed by solving
\[
A\, X = E,
\]
where the columns of the identity matrix E have the role of the right-hand side b and the
columns of the inverse are the unknowns x. It is now clear why it is advantageous to use the
LU-decomposition, as it allows the simultaneous solution of systems with arbitrarily many
columns on the right hand side. After the factoring is completed, the solution step for
computing the inverse of an l x l matrix takes l times as many steps as the solution of the
system for a single-column right-hand side. We can see this with the flops command, which
was available in old versions of MATLAB (before version 6) and measured the number of
floating point operations, and the following example program:
clear
format compact
n=150
A=randn(n);
b=randn(n,1);
flops(0)
x1=A\b;
flops
flops(0)
x2=inv(A)*b;
flops
return
For 150 x 150 matrices, the number of FLOPS necessary for the solution of the linear system
is 2419042; for the computation of the matrix inverse it is 6907967. This means that the
number of FLOPS required for the matrix inversion is about three times as much as for the
solution of the linear system. For the solution of the linear system, the highest computational
cost is actually the factoring, not the backward substitution, and we can see that the backward
substitution takes twice as many operations as the factoring itself.
A=
1.000000000000000 1.000000000000000
0.999999990000000 1.000000000000000
the inverse is
99999999.4975241 -99999999.4975241
-99999998.4975241 99999999.4975241
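The blow-up of the inverse can be checked with the closed-form inverse of a 2x2 matrix in plain Python (a sketch of the arithmetic): the determinant of this matrix is about 1e-8, so every division by it produces entries of order 1e8.

```python
def inv2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1.0, 1.0], [1.0 - 1e-8, 1.0]]   # determinant about 1e-8: nearly singular
Ainv = inv2(A)
```

A tiny perturbation of A therefore changes the inverse enormously, which is exactly the ill-conditioning the example illustrates.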
5.6 Eigenvalues
The eigenvalues can be computed in MATLAB via the eig command. For a random matrix
A, we obtain the eigenvalues as
>> A=randn(2)
A =
0.5181 -1.2274
0.8397 0.1920
>> eig(A)
ans =
0.3551 + 1.0020i
0.3551 - 1.0020i
so one can see that the eigenvalues of a real square matrix are in general not real. For a
symmetric matrix, we see that
>> A=A+A'
A =
1.0363 -0.3876
-0.3876 0.3840
>> eig(A)
ans =
1.2167
0.2035
we obtain real eigenvalues. Formally, the eigenvalues λ_i of a matrix A are often introduced
as the roots of the characteristic polynomial of A,
\[
\det(A - \lambda E) = 0, \qquad
E = \begin{pmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{pmatrix}.
\]
If we insert a diagonal matrix,
\[
\det\left( \begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix}
- \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right)
= (a - \lambda)(b - \lambda) = 0,
\]
we see that the solutions for λ are exactly a and b. In other words, the eigenvalues of a
diagonal matrix are the diagonal entries themselves. As we have seen above, the
eigenvalues of a real symmetric matrix are real. If we look at the characteristic
polynomial, we see that for an upper triangular matrix the off-diagonal elements drop out of
the characteristic polynomial, so also for a triangular matrix the eigenvalues are exactly the
diagonal elements:
A =
1.0363 -0.3876
0 0.3840
>> eig(A)
ans =
1.0363
0.3840
>> [u,l]=eig(A)
u =
0.85065080835204 0.52573111211913
-0.52573111211913 0.85065080835204
l =
-0.23606797749979 0
0 4.23606797749979
(the eigenvalues l are then output not as a vector, but as a diagonal matrix). In our
above example, where we iteratively multiplied the vector v with the matrix A, the end result
for v is
>> v
v =
0.52573111213781
0.85065080834049
which is the right column of u, and therefore the eigenvector to the larger eigenvalue λ_2 =
4.23606797749979. In other words, our iterative multiplication of a vector with a matrix is a way
to find the largest eigenvalue and the eigenvector corresponding to this largest eigenvalue;
in the literature, this method is often called the power method, because it corresponds
to multiplying a power of A onto v:
\[
A^n v \to c\, u_{\max}.
\]
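The power method fits in a few lines of plain Python (an illustrative sketch with a hypothetical starting vector and step count): for the symmetric matrix with eigenvalues 2 ± √5 used above, the iteration converges to the dominant eigenpair.

```python
import math

def power_method(A, v, steps=50):
    """Repeatedly multiply by A and renormalize; v turns into the dominant eigenvector."""
    for _ in range(steps):
        w = [sum(a * x for a, x in zip(row, v)) for row in A]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    lam = sum(a * b for a, b in zip(v, Av))   # Rayleigh quotient of the final vector
    return lam, v

A = [[1.0, 2.0], [2.0, 3.0]]   # eigenvalues 2 - sqrt(5) and 2 + sqrt(5)
lam, v = power_method(A, [1.0, 0.0])
```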
The matrix u which contains the eigenvectors is at the same time the transformation which
transforms A to diagonal form, so that
\[
u' A u = \begin{pmatrix} -0.23606797749979 & 0 \\ 0 & 4.23606797749979 \end{pmatrix}.
\]
One might think that the eigenvalues are also computed numerically as the roots of the
characteristic polynomial. Actually, this is not the case. The numerical algorithm for the
computation of the eigenvalues, which will not be elaborated here, makes use of all of the
l x l matrix entries, whereas the solution of the characteristic polynomial only makes use of
the l coefficients computed from the l x l matrix entries, so we would again lose significant
information, as in the example of the intersection computation of ellipses via the fourth
order polynomial.
Conversely, instead of computing eigenvalues via roots, it is usually preferable to compute
the roots of a polynomial by rewriting it as the corresponding eigenvalue problem. First let
us divide the polynomial P(x) by its leading coefficient, so that the coefficient of the highest
power of x becomes one.
Then one can set up the so-called companion matrix C_P for the monic polynomial
P(x) = x^k + a_{k-1} x^{k-1} + ... + a_1 x + a_0, e.g. as
\[
C_P = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
-a_0 & -a_1 & -a_2 & \cdots & -a_{k-1}
\end{pmatrix},
\]
and the eigenvalues of C_P are the roots of the polynomial P(x). For example, the polynomial
\[
P(x) = x^3 - 2x^2 - 5x + 6 = 0
\]
has the companion matrix
C =
0 1 0
0 0 1
-6 5 2
> eig(C)
ans =
3.0000
1.0000
-2.0000
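The construction can be checked in plain Python without an eigenvalue solver (a sketch): building the companion matrix for P(x) = x³ - 2x² - 5x + 6 and evaluating det(C - rI) at the roots 3, 1, -2 gives zero each time, confirming that the roots are indeed eigenvalues of C.

```python
def companion(coeffs):
    """Companion matrix of the monic polynomial x^k + a_{k-1} x^{k-1} + ... + a_0;
    coeffs = [a_0, a_1, ..., a_{k-1}]."""
    k = len(coeffs)
    C = [[0.0] * k for _ in range(k)]
    for i in range(k - 1):
        C[i][i + 1] = 1.0          # ones on the superdiagonal
    C[k - 1] = [-a for a in coeffs]  # negated coefficients in the last row
    return C

def det3(M):
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# P(x) = x^3 - 2x^2 - 5x + 6 has a0=6, a1=-5, a2=-2 and roots 3, 1, -2
C = companion([6.0, -5.0, -2.0])
residuals = []
for r in (3.0, 1.0, -2.0):
    M = [[C[i][j] - (r if i == j else 0.0) for j in range(3)] for i in range(3)]
    residuals.append(det3(M))   # det(C - r*I) vanishes at every root
```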
Now let us solve the Lorenz model with constant stepsize using the Euler method and let
us plot the eigenvalues of the propagation matrix. We know already that the Euler method is
bad, so our solution will be inaccurate, but it will be much more interesting to implement the
Euler method in two different ways and see how the two solutions diverge from each other.
t_max=1.3
dt=0.01 % diverges with this timestep: dt=0.011;
sigma=10; r=28; b=8/3; % standard Lorenz parameters
ndt=round(t_max/dt);
x=zeros(ndt,1);
mat_eig=zeros(ndt,1);
x(1)=1;
y=x; z=x;
k=[1 1 1]';
k(:,2:ndt)=zeros(3,ndt-1);
for i=1:ndt-1
  % Euler step, componentwise
  x(i+1)=x(i)+dt*sigma*(y(i)-x(i));
  y(i+1)=y(i)+dt*(r*x(i)-y(i)-x(i)*z(i));
  z(i+1)=z(i)+dt*(x(i)*y(i)-b*z(i));
  % Euler step in matrix form, with the state-dependent propagation matrix
  prop=[-sigma sigma 0;
        r -1 -k(1,i);
        0 k(1,i) -b];
  k(:,i+1)=k(:,i)+dt*prop*k(:,i);
  mat_eig(i+1)=max(abs(eig(prop*dt)));
end
subplot(4,1,1)
plot3(x(1:ndt),y(1:ndt),z(1:ndt));
subplot(4,1,2)
plot3(k(1,1:ndt),k(2,1:ndt),k(3,1:ndt));
subplot(4,1,3)
plot3(k(1,1:ndt)-x(1:ndt),...
k(2,1:ndt)-y(1:ndt),...
k(3,1:ndt)-z(1:ndt));
subplot(4,1,4)
plot(mat_eig)
This is the first surprise: two implementations of the Euler method don't give numerically
identical results, and the difference increases if we increase the maximal time. The next
surprise comes when we increase the timestep from dt=0.01 to dt=0.013. We can see that
then the solution and the maximal eigenvalues start to diverge, and if the maximal time is
taken longer, the program even crashes because it reaches infinity. Here we have found the
property of the solution of differential equations that the eigenvalues of the corresponding
matrix times the time step may not become larger than 1, or the solution does not converge
any more.
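The criterion can be seen most clearly on the scalar test equation y' = -λy, where each Euler step multiplies y by (1 - λ·dt); a plain-Python sketch with illustrative values of λ and dt:

```python
def euler_decay(lam, dt, steps):
    """Integrate y' = -lam*y with the Euler method; each step multiplies y by (1 - lam*dt)."""
    y = 1.0
    for _ in range(steps):
        y += dt * (-lam * y)
    return y

stable   = euler_decay(lam=10.0, dt=0.05, steps=100)   # |1 - 0.5| < 1: decays
unstable = euler_decay(lam=10.0, dt=0.25, steps=100)   # |1 - 2.5| > 1: blows up
```

The exact solution decays in both cases; only the product of eigenvalue and timestep decides whether the numerical solution does the same.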
The eigenvalue spectrum obtained from the Euler method is also representative for the
eigenvalues which we would obtain from higher-order methods like Runge-Kutta, which are
themselves only a sophisticated concatenation of Euler steps with different step sizes.
For ordinary differential equations there is a closed theory about which solution method
should be applied in which case. In the case of ordinary differential equations, the total
differential imposes additional constraints on the solution, so that the numerical equations can
be satisfied more easily. In contrast, partial differential equations are much more difficult to
treat numerically, because the boundary conditions impose certain constraints on the solution
method, so that in the case of nonlinear equations the optimal choice of a solution strategy
is far from obvious.
Rewriting the equation with the second derivative of the position x, we get
\[
F(x, \dot{x}, t) = m \ddot{x},
\]
which due to the second derivative of x is called an ordinary differential equation of second
order. In general, it can be shown that n ordinary differential equations of order m can be
rewritten into n·m coupled differential equations of first order. For the case of Newton's
equations of motion, this can be done by introducing the velocity v as the derivative of x, so
that
\[
F(x, \dot{x}, t) = m \dot{v}, \qquad \dot{x} = v.
\]
Because standard texts in numerical analysis prefer to deal with first order differential
equations, it is important to understand the latter form.
\[
\ddot{x} = -2D\dot{x} - \omega_0^2\, x.
\]
The solution of this equation with the damping term 2D and the frequency ω0 of the undamped
oscillation is, for the initial conditions used below,
\[
x(t) = x_0\, e^{-Dt} \cos(\omega_d t), \qquad \omega_d = \sqrt{\omega_0^2 - D^2}.
\]
Though there are solution schemes to solve second order equations directly, it is usually
simpler to solve equations of second order by reducing them to a system of coupled first order
equations. For our problem, we introduce the velocity v and its time derivative \dot{v} = a, so this
leads to the system of first order equations
\[
\dot{v} = -2Dv - \omega_0^2\, x, \qquad \dot{x} = v.
\]
It is customary in the mathematical community to introduce the vector y = (v, x) (without
vector symbol) and to rewrite the equation as
\[
\frac{d}{dt} y = F(y, t),
\]
where F(y, t) is a vector-valued function with the time t and the vector y as arguments.
The time derivative can be approximated by the difference quotient
\[
\frac{dy}{dt} \approx \frac{y(t + \Delta t) - y(t)}{\Delta t},
\]
so that the first order approximation for the step from one time t_i with value y(t_i) and
function value F_i = F(y_i, t_i) to the next is
\[
y(t_i + \Delta t) \approx y(t_i) + F_i\, \Delta t.
\]
clear
format compact
y(1,1)=v0
y(1,2)=x0
t(1)=t0
n=1
while (t(n)<t_max)
% velocity
y(n+1,1)=y(n,1)+dt*(-omega0^2 * y(n,2)- 2*D*y(n,1) );
% position
y(n+1,2)=y(n,2)+dt*y(n,1);
% current time
t(n+1)=t(n)+dt;
n=n+1;
end
% exact solution
omega_d=sqrt(omega0^2-D^2);
y_ex=(x0*exp(-D*t).*cos(omega_d*t));
subplot(2,2,1)
plot(t,y(:,2),'-',t,y_ex,':')
legend('Euler, dt=0.1','Exact')
axis tight
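The same loop reads almost identically in plain Python (a transcription with illustrative values for the parameters D, omega0, dt and the initial conditions, which the excerpt above does not define):

```python
import math

D, omega0 = 0.2, 1.0                   # damping and undamped frequency (example values)
omega_d = math.sqrt(omega0**2 - D**2)
dt, t_max = 1e-3, 10.0
v, x, t = -D, 1.0, 0.0                 # v(0) chosen to match x(t) = exp(-D t) cos(w_d t)
while t < t_max - 1e-12:
    # Euler step: new values from the old values and the old tangent
    v, x = v + dt * (-omega0**2 * x - 2 * D * v), x + dt * v
    t += dt
x_exact = math.exp(-D * t) * math.cos(omega_d * t)
```

With this small timestep the numerical and the exact solution stay close; with dt around 0.1 the period and amplitude errors described below become visible.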
Strategy of the Euler method: evaluate the value at the right side of the interval via the
starting value and the tangent at the left side of the interval.
Result of the Euler method for the damped harmonic oscillator: the period and the amplitude
are wrong.
By construction, the Euler method is a first-order method, because we only retained the
terms proportional to Δt in the expansion. Of course, if we plot the absolute error for our
exponentially vanishing solution, the error also vanishes exponentially; therefore we don't
draw the error here.
For some ordinary differential equations, reducing the timestep also increases the accumulated
rounding error (one addition per timestep) for integrating over the same time interval, so in
the limit dt → 0 we won't obtain the correct result with the Euler method. Therefore there is
one thing about the Euler method which should be kept in mind: NEVER USE THE EULER
METHOD IN A SERIOUS APPLICATION¹
¹ Except for stochastic differential equations, where the stochastic noise destroys the systematic
error anyway.
1. The idea of first advancing the time integration in a predictor step, then to modify
the result in a corrector step, is the basis of the so-called predictor-corrector methods.
98 CHAPTER 6. ORDINARY DIFFERENTIAL EQUATIONS
2. The idea to use more than a single value of F (y, t) within a single time interval dt is
the basis of the Runge-Kutta-type methods.
y(1,1)=v0
y(1,2)=x0
t(1)=t0
n=1
while (t(n)<t_max)
% predictor step
% velocity
y_pred1=y(n,1)+dt*(-omega0^2 * y(n,2)- 2*D*y(n,1) );
% position
y_pred2=y(n,2)+dt*y(n,1);
% velocity
y(n+1,1)=y(n,1)+dt*(-omega0^2*.5*(y(n,2)+y_pred2)- 2*D*.5*(y(n,1)+y_pred1));
% position
y(n+1,2)=y(n,2)+dt*.5*(y(n,1)+y_pred1);
% current time
t(n+1)=t(n)+dt;
n=n+1;
end
% exact solution
omega_d=sqrt(omega0^2-D^2);
y_ex=(x0*exp(-D*t).*cos(omega_d*t));
subplot(2,2,1)
plot(t,y(:,2),'-',t,y_ex,':')
legend('Heun, dt=0.1','Exact')
axis tight
Strategy of Heun's method: evaluate the values and tangents at the left and right side of the
interval as predicted values and take the average as corrected value.
Result of Heun's method for the damped harmonic oscillator: the period and the amplitude
are computed much more accurately than for the Euler method.
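The gain in accuracy can be made quantitative with a small plain-Python experiment (a sketch; the oscillator parameters are illustrative): at the same timestep, the Heun solution is much closer to the exact solution than the Euler solution.

```python
import math

def euler_step(f, y, t, dt):
    F = f(t, y)
    return [yi + dt * Fi for yi, Fi in zip(y, F)]

def heun_step(f, y, t, dt):
    """Predictor-corrector: average the tangents at both ends of the interval."""
    F1 = f(t, y)
    y_pred = [yi + dt * Fi for yi, Fi in zip(y, F1)]
    F2 = f(t + dt, y_pred)
    return [yi + 0.5 * dt * (a + b) for yi, a, b in zip(y, F1, F2)]

def integrate(step, f, y0, t_max, dt):
    y, t = list(y0), 0.0
    while t < t_max - 1e-12:
        y = step(f, y, t, dt)
        t += dt
    return y

# damped harmonic oscillator, y = (v, x), illustrative parameters
D, w0 = 0.2, 1.0
f = lambda t, y: [-w0 ** 2 * y[1] - 2 * D * y[0], y[0]]
wd = math.sqrt(w0 ** 2 - D ** 2)
exact = math.exp(-D * 10.0) * math.cos(wd * 10.0)
err_euler = abs(integrate(euler_step, f, [-D, 1.0], 10.0, 0.1)[1] - exact)
err_heun = abs(integrate(heun_step, f, [-D, 1.0], 10.0, 0.1)[1] - exact)
```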
6.2.4 Stability
Up to now we have focused purely on accuracy, in the sense that a numerical solution will
have some finite error in comparison to the exact solution of the problem, but will have more
or less the same shape. Actually, a more fundamental problem in numerical analysis is
stability: loosely speaking, the question whether a numerical solution has the same shape as
the exact solution at all. Consider the following first order differential equation,
\[
\frac{d}{dt}\, y(t) = 1 - t\, y^{1/3},
\]
whose solution for y(0) = 1 is strictly real in the interval [0, 5]. The numerical solution
overshoots for too large time-steps, as is shown in the following graphs for the numerical
solution with the Euler method, so that y(t) becomes negative, and MATLAB therefore
delivers the complex roots of the negative values of y(t). The result of the numerical
integration for too large time-steps shows a totally different shape than the exact solution,
and is therefore called unstable. It is therefore a primary aim to choose numerical methods
and time-steps so that the solution is stable; accuracy is only a secondary concern.
Regrettably, some methods which give very high accuracy for some problems show very poor
stability for other problems. It is often advisable to check the stability of a method by
using different time steps and seeing whether the numerical solution changes. If a small change
of the time-step leads to only a small change in the solution, the solution is stable. The
mathematical definition of stability is that a solution undergoes only a small change for a
small change of the initial conditions, and in this respect the time-step represents something
like an initial condition.
[Two plots of the numerical solution y(t) on the interval 0 ≤ t ≤ 5: a stable solution for a
small time-step (left) and an unstable, overshooting solution for a too large time-step (right).]
% velocity
y(n+1,1)=y(n,1)+dt*(-omega0^2 * y(n,2)- 2*D*y(n,1) );
% position
y(n+1,2)=y(n,2)+dt*y(n,1);
% velocity
y_pred1=y(n,1)+dt*(-omega0^2 * y(n,2)- 2*D*y(n,1) );
% position
y_pred2=y(n,2)+dt*y(n,1);
6.3. PROGRAMMING ORDINARY DIFFERENTIAL EQUATIONS 101
% velocity
y(n+1,1)=y(n,1)+dt*(-omega0^2*.5*(y(n,2)+y_pred2)- 2*D*.5*(y(n,1)+y_pred1));
% position
y(n+1,2)=y(n,2)+dt*.5*(y(n,1)+y_pred1);
and this is certainly not readable any more. One insight is that the force law for the time
integration of the spring was entered twice, once for the predictor step and once for the
corrector step, so it would be a good idea to move the force law evaluation into a MATLAB
function.
function [output_arg1,output_arg2]=function_name(input_arg1,input_arg2);
% Comment following the function declaration; This comment will be displayed
% when you type
% "help function_name"
% from the MATLAB prompt.
global a % a global variable, which must be declared as global
% somewhere else and initialized
output_arg1=input_arg1+input_arg2*a
output_arg2=input_arg1*input_arg2
[out1,out2]=function_name(25,24)
[out1,out2]=function_name(25)
MATLAB terminates with an error message. MATLAB functions cannot overwrite their
input arguments in the calling program; in the following example,
function [output_arg1,output_arg2]=function_name(input_arg1,input_arg2);
output_arg1=input_arg1+input_arg2
output_arg2=input_arg1*input_arg2
input_arg1=15
return % end of function
the line
input_arg1=15
does not have any effect in the calling program, because only output arguments (in the
square brackets []) are copied back to the calling program. If not all output or input
arguments are assigned, MATLAB terminates with an error message. Overloading, the use of
a variable number of input arguments, is possible; in this case one has to query the number of
input arguments with the MATLAB function nargin and the number of output arguments
with nargout. We will not treat overloading here, but it is easy to find examples of overloaded
methods by looking at some MATLAB functions which exist as MATLAB code (most
MATLAB functions are written in MATLAB) in the toolbox directory.
which hist
will display the directory in which the toolbox MATLAB function hist can be found, and it
is possible to load the function into the editor and view the usage of operator overloading.
with the time t and the generalized coordinates y as input and the first order derivative dydt
of the generalized coordinates as output. We can rewrite Heun's method
% velocity
y_pred1=y(n,1)+dt*(-omega0^2 * y(n,2)- 2*D*y(n,1) );
% position
y_pred2=y(n,2)+dt*y(n,1);
% velocity
y(n+1,1)=y(n,1)+dt*(-omega0^2*.5*(y(n,2)+y_pred2)- 2*D*.5*(y(n,1)+y_pred1));
% position
y(n+1,2)=y(n,2)+dt*.5*(y(n,1)+y_pred1);
using the MATLAB function (we will retain the time as a function argument though there
is no explicit time dependence in our force law)
as
y(1,1)=v0
y(1,2)=x0
t(1)=t0
n=1
while (t(n)<t_max)
% predicted value
y_pred=y(n,:)+dt*f(t(n),y(n,:));
% corrected value
y(n+1,:)=y(n,:)+.5*dt*(f(t(n),y(n,:))+f(t(n)+dt,y_pred));
t(n+1)=t(n)+dt;
n=n+1;
end
% exact solution
omega_d=sqrt(omega0^2-D^2);
y_ex=(x0*exp(-D*t).*cos(omega_d*t));
subplot(2,2,1)
plot(t,y(:,2),'-',t,y_ex,':')
legend('Heun, dt=0.1','Exact')
axis tight
which is much more readable than the original. This also allows us to see how many function
evaluations are necessary for a given timestep: whereas for the original Euler method we used
one function evaluation per timestep, for Heun's method we need two function evaluations
per timestep. Therefore, Heun's method is not only more accurate, but also more costly.
Higher order methods will need even more function evaluations; therefore we rewrite Heun's
method as a function, along with the computation of the number of steps and a check of
the reasonableness of the input parameters. Moreover, the feval command of MATLAB
is used so that the MATLAB file which contains the differential equation can be passed as
104 CHAPTER 6. ORDINARY DIFFERENTIAL EQUATIONS
an argument. Moreover, we have initialized tout and yout so that we don't lose time by
allocating new memory space when a new element is added to these vectors in each timestep:
if (size(yinit,1)==1)
  yinit=yinit'; % make sure yinit is a column vector
end
yout(:,1)=yinit;
y=yinit;
tout(1)=tstart;
n=1;
6.4. THE CLASSICAL RUNGE-KUTTA FORMULA 105
for k=1:nsteps
F1 = feval(f,y,tout(n));
t_full = tout(n) + dt;
ytemp = y + dt*F1;
F2 = feval(f,ytemp,t_full);
n=n+1;
y= y + .5*dt*(F1 + F2);
yout(:,n)=y;
tout(n)=t_full;
end
return
This program can then be called from a driver routine (a routine which does nothing else
than call a specific function) as follows:
clear ;
format compact
global D, D=.2 ,
global omega0, omega0=1 % Damping and Force constant
omega_d=sqrt(omega0^2-D^2);
dt=0.1, t0=0, t_max=20 % time-step, start-time, end time
x0=1 %Initial conditions
v0=-D*exp(-D*t0)*cos(omega_d*t0)-omega_d*exp(-D*t0)*sin(omega_d*t0);
[t,y]=heun([v0;x0],t0,t_max,dt,'harm_osc');
% exact solution
y_ex=(x0*exp(-D*t).*cos(omega_d*t));
subplot(2,2,1)
plot(t,(y(2,:)-y_ex)./y_ex,':')
legend('Heun, dt=0.1','Exact')
axis tight
if (size(yinit,1)==1)
  yinit=yinit'; % make sure yinit is a column vector
end
yout=zeros(length(yinit),nsteps);
tout=zeros(1,nsteps);
yout(:,1)=yinit;
y=yinit;
tout(1)=tstart;
n=1;
half_dt = 0.5*dt;
dt_6=dt/6;
for k=1:nsteps
F1 = feval(f,y,tout(n));
t_half = tout(n) + half_dt;
ytemp = y + half_dt*F1;
F2 = feval(f,ytemp,t_half);
ytemp = y + half_dt*F2;
F3 = feval(f,ytemp,t_half);
t_full = tout(n) + dt;
ytemp = y + dt*F3;
F4 = feval(f,ytemp,t_full);
y = y + dt_6*(F1 + F4 + 2.*(F2+F3));
n=n+1;
yout(:,n) = y;
tout(n)=t_full;
end
return;
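The four stage evaluations F1 to F4 above can be condensed into a single-step helper. The following Python sketch mirrors them (the names and the quick check on y' = -y are our choice, not from the text):

```python
# One step of the classical Runge-Kutta formula, mirroring the four
# feval calls in the MATLAB function above.
import math

def rk4_step(f, t, y, dt):
    F1 = f(t, y)
    F2 = f(t + 0.5*dt, [y[i] + 0.5*dt*F1[i] for i in range(len(y))])
    F3 = f(t + 0.5*dt, [y[i] + 0.5*dt*F2[i] for i in range(len(y))])
    F4 = f(t + dt,     [y[i] +     dt*F3[i] for i in range(len(y))])
    return [y[i] + dt/6.0*(F1[i] + 2*F2[i] + 2*F3[i] + F4[i])
            for i in range(len(y))]

# Quick check on y' = -y with exact solution exp(-t):
decay = lambda t, y: [-y[0]]
y, t, dt = [1.0], 0.0, 0.1
for _ in range(10):                        # integrate to t = 1
    y = rk4_step(decay, t, y, dt)
    t += dt
err = abs(y[0] - math.exp(-1.0))
```

With dt = 0.1 over only ten steps, the fourth-order method already reaches an error far below what Euler or Heun achieve at the same stepsize.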
The same driver as for Heun's program above can be used, just with the line
[t,y]=heun([v0;x0],t0,t_max,dt,'harm_osc');
replaced by
[t,y]=rk4_class([v0;x0],t0,t_max,dt,'harm_osc');
Actually, this solution is not the exact solution for the initial value problem with v0 = 0, but
for the initial value problem with v0 = -D*x0, the derivative of the exact decaying solution
at t = 0. The improvement of the Runge-Kutta method compared to Heun's method would
not be visible with the wrong initial value, because the initial value is so far off that the
numerical solution is simply wrong. Whenever one computes numerical solutions to compare
them with exact solutions, one should make sure that both are solutions of the identical
problem.
[Figure: solutions for the correct and the incorrect initial condition, compared with the
exact solution.]
6.4.3 Accuracy
Now that we have outlined several algorithms of different order (different truncation error
with respect to the Taylor expansion in dt), we should compare the above methods with
respect to their cost and accuracy. As has been mentioned already above, the cost of a
Runge-Kutta step is four function evaluations per timestep, in contrast to a single function
evaluation for Euler and two function evaluations for Heun. Let us compare the accuracy
of the three methods, once for the absolute accuracy |y_computed - y_exact|, and once for
the relative accuracy (y_computed - y_exact)/y_exact. For the Euler method, we obtain an
exponentially decaying absolute error, due to the fact that the solution decays exponentially,
while the relative error increases exponentially. The absolute error starts at the order of
10^-2, which is the square of the timestep, (dt = 0.1)^2, as was expected.
[Figure: Euler, dt=0.1 - absolute error (left) and relative error (right).]
For Heun's method, the absolute error starts at the order of 10^-3, which is the cube of
the timestep, (dt = 0.1)^3, as was also expected. The relative error is constant for a certain
time, and then increases exponentially.
[Figure: Heun, dt=0.1 - absolute error (left) and relative error (right).]
For the classical method by Runge and Kutta, the absolute error starts at the order of
10^-6, which is by sheer luck one order more accurate than the fifth power of the timestep,
(dt = 0.1)^5 = 10^-5, which was expected for the absolute error. Again, as for Heun's
method, the relative error is constant for a certain time, and then diverges exponentially.
[Figure: classical Runge-Kutta, dt=0.1 - absolute error (left) and relative error (right).]
The last investigations have reviewed some old concepts and shown some important new
concepts for error analysis:
1. The orders of the Euler, Heun and Runge-Kutta methods are 1, 2 and 4, respectively;
therefore the absolute error at the beginning of the integration process is one power of
dt higher than the order, i.e. of order dt^2, dt^3 and dt^5 for an initial amplitude of
the order of 1.
2. The local error is the error for a single timestep, and the local absolute error at the
beginning of the integration is the same as the local relative error.
3. The behavior of the relative error is a bit more complicated: as can be seen, the relative
error increases during the integration process, but not monotonically. The error at
the end of the integration process is called the global error, and it can be seen that
the global relative error is much larger than the local absolute error. Whenever one
performs a time-integration of ordinary differential equations, one should know what
error is actually permissible, and this is determined by the physical problem.
y_ex=(x0*exp(-D*t).*cos(omega_d*t));
subplot(2,2,1)
semilogy(t,abs(y(:,2)-y_ex),':')
title('ode23, dt=0.1, absolute error')
axis tight
subplot(2,2,2)
semilogy(t,abs((y(:,2)-y_ex)./y_ex),':')
title('ode23, dt=0.1, relative error')
axis tight
and the file for the differential equation harm_osc2
function dydt=harm_osc2(t,y)
format compact
global D
global omega0
% d velocity/dt
dydt(1,1)=-omega0^2 * y(2)- 2*D*y(1);
% d position/dt
dydt(2,1)=y(1);
return
This file is different from our previously used harm_osc.m file, as the order of the input
parameters t and y is exchanged. The following solution for our damped harmonic oscillator
has been computed:
6.5. ADAPTIVE STEPSIZE CONTROL 111
[Figure: ode23 solution of the damped harmonic oscillator (top) and the timestep chosen
by the adaptive algorithm (bottom).]
Above, the solution is plotted; below we see the timestep. The time-adaption algorithm
changed the timestep depending on whether the oscillation was near a relative extremum or
in a nearly straight segment of the motion. The accuracy of the time integration can be set
via the input parameters of the ode23 function, see help ode23. The accuracy diagram for
the default accuracy is the following:
[Figure: ode23 with default accuracy - absolute error (left) and relative error (right).]
The same plots can be made for the ode45 algorithm, which gives the following accuracy
diagram
[Figure: ode45 - absolute error (left) and relative error (right).]
[Figure: ode45 solution (top) and timestep (bottom).]
and it can be seen that MATLAB starts with a very small timestep and then increases the
timestep significantly to reach the default accuracy of the time integrator. The advantages
of these adaptive methods for "reasonable" ordinary differential equations are:
- One can specify the relative and absolute errors on input, and one obtains a solution
which is guaranteed to be inside the specified errors.
- The performance is optimal, i.e. for the given method the solution cannot be computed
with fewer timesteps/less computer time.
- Without knowing anything about the system, or about the relation between the timestep
and the resulting error for the given set of equations, one obtains a correct solution.
Therefore, it is always a good idea to start the investigation of a problem with the above
methods. But there are some caveats for the case of "unreasonable" differential equations,
and such systems are often encountered in daily life; they are treated in the next
subsection.
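The principle behind adaptive stepsize control can be sketched with step doubling: take one full step and two half steps, and accept the step only if the difference between the two results is below a tolerance. Note that MATLAB's ode23/ode45 use embedded Runge-Kutta pairs rather than step doubling; this Python sketch (our names, and our test equation y' = -y) only illustrates the idea.

```python
# Adaptive stepsize by step doubling with Heun's method: the difference
# between one full step and two half steps serves as local error estimate.
import math

def f(t, y):
    return -y                                  # test equation y' = -y

def heun_step(t, y, dt):
    y_pred = y + dt*f(t, y)                    # predictor
    return y + 0.5*dt*(f(t, y) + f(t + dt, y_pred))

def adaptive_heun(y, t, t_end, dt, tol=1e-6):
    n_accepted = 0
    while t < t_end:
        dt = min(dt, t_end - t)
        y_big  = heun_step(t, y, dt)                               # one full step
        y_half = heun_step(t + dt/2, heun_step(t, y, dt/2), dt/2)  # two half steps
        est = abs(y_big - y_half)              # local error estimate
        if est <= tol:
            t, y = t + dt, y_half              # accept the finer result
            n_accepted += 1
            if est < tol/4:
                dt *= 1.5                      # grow when comfortably accurate
        else:
            dt *= 0.5                          # reject the step and retry
    return y, n_accepted

y_end, n_steps = adaptive_heun(1.0, 0.0, 5.0, 0.5)
err = abs(y_end - math.exp(-5.0))
```

Starting from a deliberately too-large dt = 0.5, the controller shrinks the step until the tolerance is met, and the final solution stays well inside the requested accuracy.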
F_Coul = -f_n sign(v),
it may happen that the solution of the equation is not smooth enough, so that even a
reduction of the timestep does not lead to the same solution for the different orders of
the function evaluation of the ODE solver. In that case, the solver stops, or it continues
only with very small timesteps, so that the solution is not finished within acceptable time.
Bouncing balls: If an object is in free flight in a gravitational field, its trajectory is
parabolic. If it hits a target, the motion is suddenly reversed. For the numerical
time integration, the free motion allows a very large timestep, whereas at the moment
when the target is hit, the timestep has to be reduced drastically. It is possible that
numerical solvers with adaptive stepsize control are not able to reduce the stepsize
appropriately, and in the simulation the impacting particle may not be reflected, but
may fly through the target. The risk of such a mishap is higher for higher-order
solvers, e.g. of 8th order.
Because the adaptive stepsize control needs some information about how the timestep must
be reduced, MATLAB allows one to specify the way in which the timestep should be changed
via the options structure (see odeset).
Obviously, this force law has a jump at v = 0, and we know from physics that the friction
F_Coul can take any value between -f_n and f_n for v = 0. Actually, there is a method to solve
such an undetermined problem in a numerically exact way², but we will just try to use the
adaptive stepsize control in the hope that we get a reasonable solution by decreasing the
stepsize. Using the ordinary differential equation
y'' = -y - 2 D sign(y'),
we solve for D=0.1 with the following program (t1, t1_plot and y1_plot stem from an
analogous run with D=0.05):
clear
format compact
global D
D=0.1
[t2,y2]=ode23('lin_coul_osc',[0 tmax],[1 0]);
t2_plot=linspace(0,max(t2),2*length(t2));
y2_plot=interp1(t2,y2,t2_plot,'spline');
t2_plot=[t2_plot tmax];
y2_plot=[y2_plot ; [0,1]];
subplot(2,2,1)
plot(t1_plot,y1_plot(:,1),...
     t2_plot,y2_plot(:,1),':')
axis tight
legend('D=0.05','D=0.1')
title('d^2/dt^2 x + 2 D sign(d/dt x) + x = 0')
subplot(2,2,2)
semilogy(t1(2:end),diff(t1),t2(2:end),diff(t2),':')
ylabel('dt')
xlabel('timestep')
legend('D=0.05','D=0.1')
axis([0 7 1e-5 1])
return

function dydt = f(t,y)
% lin_coul_osc.m
global D
dydt = [y(2); -y(1)-2*D*sign(y(2))];
return
[Figure: solution (top) and timestep (bottom) for D=0.05 and D=0.1.]
It can be seen that for D = 0.05, as long as the oscillation resembles the oscillation of the
damped harmonic oscillator, the timestep is comparable to the one one expects for the
harmonic oscillator. For D = 0.1, where the 0-amplitude is reached, the timestep goes down
by several orders of magnitude to guarantee the vanishing of the amplitude, and the
integration is slowed down by several orders of magnitude in comparison with the damped
harmonic oscillator.
² Hairer et al., Solving Ordinary Differential Equations I, Springer.
6.6. STIFF DIFFERENTIAL EQUATIONS 115
it may happen that the solver reduces the timestep to numerically zero and the solution
process terminates with an error message.
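The mechanism can be sketched on the simplest stiff test equation, y' = -1000 y (our choice of example, not from the text): an explicit method is stable only below a stepsize limit, while an implicit method produces a decaying solution for any stepsize.

```python
# Stiffness in one line: for y' = lam*y with lam = -1000, explicit Euler
# is stable only for dt < 2/|lam| = 0.002, implicit Euler for any dt.
lam = -1000.0
dt = 0.01                                  # five times the explicit stability limit

y_explicit, y_implicit = 1.0, 1.0
for _ in range(50):
    y_explicit = y_explicit*(1.0 + dt*lam) # amplification factor -9: blows up
    y_implicit = y_implicit/(1.0 - dt*lam) # amplification factor 1/11: decays
```

This is why stiff solvers such as ode23s use implicit formulas: the exact solution is essentially zero after a few milliseconds, yet the explicit method would force a timestep below 0.002 for the entire integration interval.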
[Figure: vandermodeODE with ode23 - solution (left) and timestep (right).]
[Figure: vandermodeODE with ode23s - solution (left) and timestep (right).]
F1 = feval(f,yout(2,1));
y_mdt=yout(2,1)-F1*dt2;
F2 = feval(f,yout(2,1));
yout(2,2)=2*yout(2,1)-y_mdt-F2*dt2;
tout(2)=dt;
for k=2:nsteps-1
F1 = feval(f,yout(2,k));
t_full = tout(k) + dt;
yout(2,k+1)=2*yout(2,k)-yout(2,k-1)+F1*dt2;
tout(k+1)=t_full;
end
return;
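The loop above can be sketched in Python for the undamped oscillator x'' = -omega0^2 x (the names are ours; the first-order start follows the text). The point to observe is that the amplitude stays bounded near 1 over the whole run instead of drifting:

```python
# Position (Stormer-)Verlet for x'' = -omega0^2 x with x(0)=1, v(0)=0:
# x_{n+1} = 2 x_n - x_{n-1} + a(x_n) dt^2.
omega0, dt = 1.0, 0.1
x0, v0 = 1.0, 0.0

x_prev = x0 - v0*dt                # first-order start for x(-dt)
x = x0
max_amp = abs(x)
for _ in range(3000):              # integrate to t = 300
    x_new = 2*x - x_prev - (omega0**2 * x)*dt*dt   # Verlet update
    x_prev, x = x, x_new
    max_amp = max(max_amp, abs(x))
```

Even after roughly 50 oscillation periods, the numerical amplitude remains within a fraction of a percent of the initial amplitude; the phase, not the amplitude, carries the dominant error.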
6.7.2 Precision
We compare the numerical solution for the harmonic oscillator without damping
function out=verlet_lin_osc(in)
% verlet-lin-osc
% linear oscillator with frequency omega
% for use with verlet-type integrator
global omega0
out=-omega0^2*in;
return
clear ;
format compact
global D, D=0 ,
global omega0, omega0=1 % Damping and Force constant
omega_d=omega0;
tic
[t,y]=verlet([v0 x0],t0,t_max,dt,'verlet_lin_osc');
y_ex=(x0*cos(omega_d*t)); % exact solution
subplot(3,1,1)
semilogy(t,abs((y(2,:)-y_ex)))
title('Error for verlet')
axis tight
subplot(3,1,2)
semilogy(rkt2,abs(rky2(:,2)-y_rk2))
title('Error for ode23')
axis tight
subplot(3,1,3)
semilogy(rkt4,abs(rky4(:,2)-y_rk4))
title('Error for ode45')
axis tight
return
[Figure: absolute error for Verlet (top), ode23 (middle) and ode45 (bottom).]
It can be seen that the Verlet algorithm has a larger error for the initial timesteps, because
we start the scheme with only a first-order approximation for the earliest timestep. Never-
theless, the error bound is constant over the whole integration interval. The remarkable
property of the Verlet method is that its
6.7.3 Velocities
The Verlet method makes use only of the coordinates, not of the velocities. Because the
velocities don't occur in the equations, they can only be estimated, e.g. via the central-
difference relation
v_n ~ (x_{n+1} - x_{n-1}) / (2 dt),
so that the velocity at a timestep is only known after the completion of the following
timestep. Therefore, it is not possible to incorporate velocity-dependent interactions in the
Verlet scheme.
Often, it is not clear how large a timestep should be chosen for a given dissipative problem.
Some people advocate the following procedure: run the problem without dissipation and fix
the timestep so that the change in energy during the simulation is negligible, then use this
timestep for the dissipative system. Our exploration of the symplectic integrator shows
that such a strategy is meaningless. The non-dissipative systems are a totally different
class than the dissipative systems; even the best non-symplectic integrators cannot
compete with quite mediocre symplectic integrators. Conversely, symplectic integrators
cannot be used with dissipation: in the above Verlet-Störmer integration there is no possibil-
ity to implement velocity-dependent forces, because at the time the forces must be computed,
the velocity is not yet known. The same is true for modifications like the velocity-Verlet
scheme, where one knows the velocity half a timestep too late; using the velocity from the
previous timestep introduces errors which are of the same order as the error of the Verlet
scheme itself.
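The claim that even a mediocre symplectic integrator beats a non-symplectic one for conservative systems can be illustrated by the energy after a long run: Heun's method (second order, non-symplectic) drifts, while the Verlet scheme (also second order) does not. A Python sketch on x'' = -x with x(0) = 1, v(0) = 0 and exact energy 1/2 (our names and test setup):

```python
# Energy after integrating the undamped oscillator x'' = -x to t = 500.
dt, n = 0.1, 5000

# Heun on the system (x, v): the energy grows steadily.
x, v = 1.0, 0.0
for _ in range(n):
    xp, vp = x + dt*v, v - dt*x            # predictor (Euler step)
    x, v = x + 0.5*dt*(v + vp), v - 0.5*dt*(x + xp)
heun_energy = 0.5*(v*v + x*x)

# Position Verlet on x alone: the energy stays bounded.
x_prev, xv = 1.0, 1.0                      # v(0) = 0, so x(-dt) ~ x(0)
for _ in range(n):
    x_new = 2*xv - x_prev - xv*dt*dt
    x_prev, xv = xv, x_new
vv = (xv - x_prev)/dt                      # crude backward-difference velocity
verlet_energy = 0.5*(vv*vv + xv*xv)
```

After 500 time units, Heun's numerical energy has visibly drifted away from 1/2, whereas the Verlet energy merely oscillates around it, even though the velocity used in the Verlet energy is only a first-order estimate.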