
Mathematical Modeling

and Simulation
Nguyen V.M. Man, Ph.D.
Applied Statistician
September 6, 2010
Contact: mnguyen@cse.hcmut.edu.vn
or mannvm@uef.edu.vn
Contents
0.1 Mathematical modeling and simulation: Why?
0.2 Mathematical modeling and simulation: How?
0.3 Cautions
0.4 Typical applications
0.5 Computing Software
1 Dynamic Systems
1.1 Introduction
1.2 Discrete Dynamic Systems: a case study
1.3 Continuous Dynamic Systems
2 Stochastic techniques
2.1 Generating functions
2.2 Convolutions
2.3 Compound distributions
2.4 Introductory Stochastic Processes
2.5 Markov Chains (MC), a key tool in modeling random phenomena
2.6 Classification of States
2.7 Limiting probabilities and Stationary distribution of a MC
2.8 Exercises
3 Simulation
3.1 Introductory Simulation
3.2 Generation of random numbers
3.3 Transformation of random numbers into input data
3.4 Measurement of output data
3.5 Analysis of output: making meaningful inferences
3.6 Simulation languages
3.7 Research 1: Simulation of Queueing systems with multiclass customers
4 Probabilistic Modeling
4.1 Markovian Models
4.1.1 Exponential distribution
4.1.2 Poisson process
4.2 Bayesian Modeling in Probabilistic Nets
5 Statistical Modeling in Quality Engineering
5.1 Introduction to Statistical Modeling (SM)
5.2 DOE in Statistical Quality Control
5.3 How to measure factor interactions?
5.4 What should we do to bring experiments into daily life?
6 New directions and Conclusion
6.1 Black-Scholes model in Finance
6.2 Drug Resistance and Design of Anti-HIV drug
6.3 Epidemic Modeling
6.4 Conclusion
7 Appendices
7.1 Appendix A: Theory of stochastic matrix for MC
7.2 Appendix B: Spectral Theorem for Diagonalizable Matrices
Keywords: linear algebra, computational algebra, graph, random processes,
simulation, combinatorics, statistics, Markov chains, discrete time processes
Introduction
We propose a few specific mathematical modeling techniques used in various
applications such as Statistical Simulation of Service systems, Reliability
engineering, Finance engineering, Biomathematics, Pharmaceutical Science,
and Environmental Science. These are aimed at graduates in Applied
Mathematics, Computer Science and Applied Statistics at HCM City.
The aims of the course
This lecture integrates mathematical and computing techniques into the
modeling and simulation of industrial and biological processes.
The structure of the course. The course consists of three parts:
Part I: Introductory specic topics
Part II: Methods and Tools
Part III: Connections and research projects

Working method.
Each group of 2 graduates is expected to carry out a small independent
research project (max 25 pages, font size 11, 1.5 line spacing, Times New
Roman) on the chosen topic and submit their report at the end of the
course [week 15].
Examination. The grading will be based on performance in:
* hand-ins of homework assignments (weight 20% of the grade)
* a written report of group work on a small project topic (20%) and
three oral presentations about the project (20%)
* a final exam (40%) covering the basic mathematical and statistical
methods that have been introduced
Literature. Many sources; they will be given in the lectures.
Prerequisites. The participants will benefit from a solid knowledge of
advanced calculus, discrete mathematics, basic knowledge of symbolic
computing, ordinary and partial differential equations, and programming
experience with Matlab, Scilab, R, Maple (or an equivalent language).
Part I: Introductory specific topics and case studies

This course aims at teaching the principles and practice of mathematical
and statistical modeling. An important part of the course is to recognize the
essential mechanisms governing a phenomenon. These mechanisms have to
be translated into mathematics and included in the model. This activity
requires both a good understanding of the system under consideration and
good mathematical skills. Although mathematical modelling may make use
of all fields of mathematics, this course will concentrate on applications in
Pharmaceutics, Finance and Industry, and focus mostly on discrete models.
Differential equations could be involved, but at a moderate level.
Besides self-study and class teaching, an important part of the course
concerns working in a small group on a specific project. The topics for
project work may differ from year to year.
Organization. The first part of the course consists of a few lectures where
the main methods are presented. The second part of the course is offered as
intensive-course weeks, fully devoted to assigned projects. Here the
graduates work in groups of 2 students. In most cases a computer program
for simulating, investigating or computing a certain physical phenomenon
has to be developed. Furthermore, there are six weeks of presentations where
all the groups present and discuss their projects. Finally, the written report
of your project has to be handed in.
Time distribution will be:
15 weeks = 1 intro + 2 methods + 1 problem solving +
1 presentation A (introducing software) + 2 methods + 1 problem solving +
3 weeks of presentation B (setting up model) +
3 weeks of presentation C (using model) + 1 review.
The main topics are:
MM and Simulation: mathematical models for simulation, why?
MM and Simulation: mathematical models for simulation, how?
An Economic System: inventory models
A Pharmaceutical Phenomenon: a mathematical view of compartment models
Part II: Methods of MM and Simulation

We will discuss the following:


Introductory Simulation
Dynamic Systems
Probabilistic and stochastic techniques
Part III: New applications of MMS

We investigate a few fascinating applications:


Probabilistic Modeling
Statistical Modeling
Pharmaceutical Modeling
Financial Modeling

Proposed project list


1. Multi-compartment Models in Pharmaceutical Science [ref. chapter I,
II of R. Bellman]
2. Pharmacokinetic Properties of Compartment Models: the case of one
drug [ref. chapter I, IV of R. Bellman]
3. The use of Control Theory in Optimal Dosage Determination [ref.
chapter I, VII of R. Bellman]
4. The use of Decision Theory and Dynamic programming in Optimal
Dosage Determination [ref. chapter I, IX, X of R. Bellman]
5. Application of Large Deviation theory in insurance industry [ref. F.
Esscher, Notices of AMS, Feb 2008.]
Problem: if too many claims are made against the insurance
company, we worry about the total claim amount exceeding the
reserve fund set aside for paying these claims.
Part I: Motivation of MMS.
0.1 Mathematical modeling and simulation: Why?
Increasing the understanding of the system,
Predicting the future system behavior,
Carrying out technical and quantitative computations for control design,
from which optimization can be done,
Studying human-machine cooperation.
0.2 Mathematical modeling and simulation: How?
The modeling process itself is (or should be) most often an
iterative process: one can distinguish in it a number of rather separate steps
which usually must be repeated. One begins with the real system under
investigation and pursues the following sequence of steps:
(i) empirical observations, experiments, and data collection;
(ii) formalization of properties, relationships and mechanisms
which result in a biological or physical model (e.g., mechanisms,
biochemical reactions, etc., in a metabolic pathway model;
stress-strain, pressure-force relationships in mechanics; functional
relationships between cost and reliabilities of distinct components in a
software development project);
(iii) abstraction or mathematization resulting in a mathematical
model (e.g., algebraic and/or differential equations with constraints
and initial and/or boundary conditions);
(iv) model analysis (which can consist of simulation studies, analytical
and qualitative analysis including stability analysis, and use of
mathematical techniques such as perturbation studies);
(v) interpretation and comparison (with the real system) of the
conclusions, predictions and conjectures obtained from step (iv);
(vi) changes in understanding of mechanisms, etc., in the real system.
0.3 Cautions
Common difficulties/limitations often encountered in the modeling of
systems:
(a) Availability and accuracy of data;
(b) Analysis of the mathematical model;
(c) Use of local representations that are invalid for the overall system;
(d) Assumptions that the model is the real system;
(e) Obsession with the solution stage;
(f) Communication in interdisciplinary efforts.
0.4 Typical applications
Finance Trend Analysis with Stochastic Calculus
Two projects
Financial Economics- Models in Finance Engineering
Two projects
Pharmaceutical Phenomena- from math view
Three projects
0.5 Computing Software
Working Groups introduce software packages:
SciLab, Matlab,
Maple, Mathematica
G.A.P, Singular, R, OpenModelica, and so on

Part II: Techniques and Algorithms.


Chapter 1
Dynamic Systems
We outline the modeling process of dynamic systems and introduce major
tools of the trade.
1.1 Introduction
Our viewpoints are:
the connection between models and data, namely the connection between
dynamic modeling and statistical modeling, is the first concern, and
modern computer-based statistical methods must be applied
intensively to dynamic models.
Connecting models with data is almost always the eventual goal. So taking
the time to learn statistical theory and discrete methods will make you a
better modeler and a more effective collaborator with experimentalists.
1.2 Discrete Dynamic Systems: a case study
Consider a simple discrete dynamic system S that depends on four binary
variables f, g, c, w. [S changes its states when f, g, c, w change their values.]
Suppose that
only f is the factor that can change states of S, i.e. state u changes to
another state v if and only if the equality u_f + v_f = 1 holds;
a state u = (u_f, u_g, u_c, u_w) changes to another state v = (v_f, v_g, v_c, v_w)
iff at most two coordinates of them are different;
the system evolves from the initial state s_I = (0000) (source) to a final
state s_F = (1111) (sink).
The aim: given some constraints between the four binary variables, we
want to choose a shortest path from the source s_I = (0000) to the sink
s_F = (1111).
First Tool: Invariants
An invariant is an unchanging property of a process or system as that
process evolves. Similar terms for invariant are law and pattern. In
mathematical modeling and algorithmic problem solving, very often
1/ you want to model a system as concisely as possible (of course after
dropping unimportant aspects to make the model tractable);
2/ then your goal is to find some solution or conclusion by logical
reasoning on the model, using mathematical techniques or by simulating it.
The crucial structures/objects/properties that you wish to find and
employ during both phases above are fixed patterns, or invariants, of
the whole process evolution. Reason: these fixed rules help you to
keep the model within tractable range, and moreover to restrict the search
domain of solutions.
Example 1. Cutting a chocolate bar of rectangular shape by horizontal
and vertical cuts into unit-size squares provides an invariant between the
number of cuts c and the number of pieces p:
p − c = 1
no matter how large the bar is!
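The invariant can be checked by simulation. The following sketch (in Python, standing in for any of the course languages; the function name is ours) cuts a bar into unit squares in a random order, counting cuts and pieces as it goes:

```python
import random

def cuts_to_unit_squares(width, height):
    """Cut a width x height bar into unit squares, one straight cut
    of one piece at a time, choosing the cut order at random."""
    pieces = [(width, height)]   # the current rectangular pieces
    cuts = 0
    while any(w > 1 or h > 1 for (w, h) in pieces):
        # pick any piece that is not yet a unit square
        i = random.choice([k for k, (w, h) in enumerate(pieces)
                           if w > 1 or h > 1])
        w, h = pieces.pop(i)
        if w > 1 and (h == 1 or random.random() < 0.5):
            s = random.randint(1, w - 1)   # vertical cut position
            pieces += [(s, h), (w - s, h)]
        else:
            s = random.randint(1, h - 1)   # horizontal cut position
            pieces += [(w, s), (w, h - s)]
        cuts += 1
    return cuts, len(pieces)

c, p = cuts_to_unit_squares(4, 7)
```

However the cuts are ordered, each cut turns one piece into two, so p − c = 1 throughout: that is the invariant.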
How to find invariants? But how to figure out interesting invariants? This
activity depends very much on the type and complexity of your system or
process. You have to detect and employ any meaningful relationship
between the key components/factors/variables that make the process
run or work. The more models and application domains you encounter,
and the more mathematical methods you have at hand, the better the
invariants can be found and exploited.
Definition 1 (System invariants). Invariants of the system S are specific
constraints or properties that do not change when the system factors
(variables) change their values [to ensure S stability, existence ...] during
the evolution of the concerned system.
Fact 1.
a) Try your best to represent invariants mathematically, not by words!
b) When you think that some rules could be invariants, you have to prove
your thoughts by logical reasoning.
Example 2. A farmer wishes to ferry a goat, a cabbage and a wolf across a
river with a small boat which can accommodate at most one of his
belongings at a time. Furthermore, he will not let the goat be alone (i.e.
without him) with the cabbage, for the clear reason that the predator will
eat the prey in that case. What would be possible invariants of the process
of transporting between the river banks?
You cannot formulate any invariant without introducing some major
variables. Here four variables f, g, c, w would be reasonably good for
describing the moving process, where f, g, c, w all receive values from the
binary set {L(eft bank), R(ight bank)} = {0, 1}. Well, the situation that the
goat is not allowed to be alone with the cabbage without the farmer is just
symbolically represented by the fixed rule:
Invariant 1: (g ≠ c) OR (f = g = c)
This rule is indeed an invariant, since it is (or better, must be) maintained
in the whole process. For instance, if we require that
(i) either f, g and c must receive the same value v ∈ {0, 1},
(ii) or g and c cannot receive the same value v ∈ {0, 1} if f gets value 1 − v,
what would be the invariants you could state?
In case (i), it is easy to formulate the first invariant as f = g = c.
Brute force.
Definition 2 (Brute-force). Search exhaustively all possibilities of the
search space and, after checking, list all solutions.
Obviously all possible states V of the system S above are the solution set of
the polynomial system of equations
f^2 − f = g^2 − g = c^2 − c = w^2 − w = 0.
However, if no constraint is detected and imposed, you have to search
exhaustively all possible paths from s_I to s_F: this task is computationally
infeasible when we have many binary variables or their values are not
binary. For instance, if we use
Constraint I:
(I_a) f, g and c must receive the same value a ∈ {0, 1},
(I_b) OR g and c must receive different values,
then a few states must be discarded from V, such as (1, 0, 0, 1), (0, 1, 1, 0).
Finding the invariant saying that the small boat can accommodate at most
one of the farmer's belongings at a time is trickier! To do this, you have
to think about the state-changing aspect of the process when the farmer
rows from one side of the river to the other. [You will not draw a boat
with some items on it and move it across the river several times, will you?]
Then a simple question naturally arises: what are the states of the process?
This question leads us to the next mathematical tool, shown in the
next part.
State-transition graphs. Usually the state set of a graph G = (V, E) can be
given by a binary space/set V consisting of length-n vectors:
V = {u = (u_1, u_2, ..., u_n) : u_i = 0 or u_i = 1}.
The Hamming distance d(u, v) between two binary states
u = (u_1, u_2, ..., u_n) and v = (v_1, v_2, ..., v_n) is the number of their
distinct coordinates:
d(u, v) = |u_1 − v_1| + |u_2 − v_2| + ... + |u_n − v_n| = Σ_{i=1}^{n} |u_i − v_i|.
The weight of a binary state/vector is defined to be
wt(u) = d(u, 0) = Σ_{i=1}^{n} |u_i − 0| = Σ_{i=1}^{n} u_i.
The Hamming distance d(·, ·) defined on some binary space V is also called
the Hamming metric, and the space V equipped with the Hamming metric
d(·, ·) is called a Hamming metric space.
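Both quantities are one-liners in code; a minimal Python sketch (the function names are ours):

```python
def hamming_distance(u, v):
    """d(u, v): the number of coordinates where u and v differ."""
    return sum(abs(ui - vi) for ui, vi in zip(u, v))

def weight(u):
    """wt(u) = d(u, 0): the number of nonzero coordinates of u."""
    return sum(u)

d = hamming_distance((0, 0, 0, 0), (1, 0, 1, 0))  # differs in 2 coordinates
w = weight((1, 1, 1, 1))
```

Note that d(u, v) is symmetric in u and v, as any metric must be.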
Definition 3 (State-transition graph). The state-transition graph G = (V, E)
of an evolving system S is a directed graph where
the vertex set V consists of all feasible states that the system can realize;
the edge set E consists of arcs e = (u, v) such that state u can reach
state v during the evolution of the concerned system.
Very often, changing states in a state-transition graph G = (V, E) can be
handled mathematically by measuring the Hamming distance between an
original state u = (u_1, u_2, ..., u_n) and its effect state v = (v_1, v_2, ..., v_n).
Example 3 (The farmer's river-crossing problem, cont.). The states of the
river-crossing process are binary vectors of length 4,
u = (u_f, u_g, u_c, u_w) = (u_1, u_2, u_3, u_4) ∈ {0, 1}^4,
if we encode the left bank L and the right bank R by 0, 1 as done above!
In our specific example above, V can hold all 16 = 2^4 possible states if no
system invariants were found and imposed on S. With Constraint I, V
can be redefined as V := V \ {(1, 0, 0, 1), (0, 1, 1, 0)}.
We understand that when the farmer is rowing his boat, for instance from a
left-bank state u = (u_1, u_2, u_3, u_4) to a right-bank state
v = (v_1, v_2, v_3, v_4) (or the other way round), his position must change.
The changing of state u to v creates an edge e = (u, v) ∈ E, indeed! More
clearly, the edge e = (u, v) is truly determined iff:
if u_1 = L (i.e. 0) then v_1 = R (i.e. 1); or the other way round.
Aha, we have just found another invariant, which must always be true to let
the process run: an edge e = (u, v) would exist if we equivalently have
Invariant 2: u_1 + v_1 = 1, where the sum is binary plus.
Combining this with the fact that
the small boat can accommodate at most one of the farmer's belongings, we
realize that
a starting state u changes at most two of its coordinates to reach the
resulting state v.
Hence, the third invariant is found:
Invariant 3: d(u, v) = Σ_{i=1}^{4} |u_i − v_i| ≤ 2.
Decomposition
Knowing how to describe a process or system by a state-transition
graph G = (V, E) is not enough! The reason is that we sometimes wish
to search over all eligible states in V to find the best solutions, or
to determine an optimal path running through that search space V.
This comes down to listing all states in V efficiently! In that situation, we
could split the search space into several small-enough pieces, a step usually
called Decomposition, and then list all elements in those pieces, which is
called Brute force.
Example 4 (The farmer's river-crossing problem, cont.). The set of
eligible states V consists of two parts: one holds every state corresponding
to the position L of the farmer, and the other holds every state
corresponding to the position R of the farmer.
This observation tells us to decompose the state vertices V into two subsets:
V_L = {u_L = (0, u_2, u_3, u_4)} ⊆ {0, 1}^4 and V_R = {u_R = (1, u_2, u_3, u_4)} ⊆ {0, 1}^4.
Now brute-forcing the subset V_L means listing all vertices
u_L = (0, u_2, u_3, u_4); just keep Invariant 1 in mind and maintain it!
Do similarly for the subset V_R.
Solution
To determine a solution for the farmer, in our specific instance so far, we
find a path running through the search space V by combining several
methods suggested above.
Example 5 (The farmer's river-crossing problem, cont.). With the starting
state LLLL = 0000 of the farmer, transporting all his items to the right
bank requires us, mathematically, to find a path P (consisting of edges
e = (u, v) ∈ E) to the final state RRRR = 1111.
To do this, for any vertex u = (u_1, u_2, u_3, u_4) ∈ P, Invariants 2 and 3
provide candidate vertices v = (1 − u_1, v_2, v_3, v_4) for making valid edges
e = (u, v) ∈ E. You have to follow every possible track generated at any
intermediate vertex of P, until you arrive at the final state 1111.
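This track-following is exactly breadth-first search on the state-transition graph. A Python sketch (helper names ours) that imposes only the constraints stated in this case study: the goat-cabbage safety rule, Invariant 2 and Invariant 3. The classic puzzle also forbids leaving the wolf alone with the goat; without that extra rule, the search may find a route shorter than the classic seven-crossing solution.

```python
from collections import deque
from itertools import product

def safe(state):
    # the goat may share a bank with the cabbage only if the farmer is there
    f, g, c, w = state
    return g != c or f == g == c

def neighbors(u):
    for v in product((0, 1), repeat=4):
        flips = sum(abs(a - b) for a, b in zip(u, v))
        # Invariant 2: the farmer changes banks; Invariant 3: d(u, v) <= 2
        if u[0] != v[0] and 1 <= flips <= 2 and safe(v):
            yield v

def shortest_path(source=(0, 0, 0, 0), sink=(1, 1, 1, 1)):
    """Breadth-first search from source to sink in the state-transition graph."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == sink:
            path = []
            while u is not None:   # walk the parent links back to the source
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in neighbors(u):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None

path = shortest_path()
```

Every state on the returned path satisfies Invariant 1, and every consecutive pair of states satisfies Invariants 2 and 3.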
1.3 Continuous Dynamic Systems
Major steps are:
1. Setting the objective
2. Building an initial model
3. Developing equations for process rates
4. Nonlinear rates from data: nonparametric models
5. Stochastic models
6. Fitting rate equations by calibration
Setting the objective
A few crucial steps should be considered in this phase:
Decide whether the objective is theoretical or practical modeling, where
theoretical modeling: the putative model helps us to understand the
system and interpret observations of its behavior;
practical modeling: the putative model helps us to predict the system.
Decide how much numerical accuracy you need.
Assess the feasibility of your goals: be a bit pessimistic (start
small first, and then expand to a more complex model).
Assess the feasibility of your data: be a bit optimistic (don't
worry if you miss some data at the beginning).
Building an initial model
Conceptual model and diagram. The best-known approach is the
compartmental model; you have to decide which variables and processes in
the system are the most important and in which compartment each should
be located.
Developing equations for process rates. Having drawn a model
diagram, we next need an equation for each process rate. Mathematically,
we need a differential equation expressed by:
an Ordinary Differential Equation (ODE) of the form
x' = f(x, u, t),
where x' denotes the derivative of x (the state variables) with respect
to the time variable t, and u the input vector variable; or
Differential Algebraic Equations (DAE):
x' = f(x, u, t), and 0 = g(x, u, t).
Linear rates: when and why? See R. Bellman.
Nonlinear rates from data: fitting parametric models. See R. Bellman.
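Once a rate equation x' = f(x, t) is written down, the simplest way to simulate it is the forward Euler scheme x_{k+1} = x_k + h f(x_k, t_k). A minimal Python sketch (the test equation and step count are our choices; a real project would use a proper solver from Matlab, Scilab or R):

```python
def euler(f, x0, t0, t1, steps):
    """Forward Euler integration of x' = f(x, t) from t0 to t1."""
    x, t = x0, t0
    h = (t1 - t0) / steps
    for _ in range(steps):
        x = x + h * f(x, t)   # one Euler step
        t = t + h
    return x

# example: x' = -x with x(0) = 1, whose exact solution is x(t) = exp(-t)
x1 = euler(lambda x, t: -x, 1.0, 0.0, 1.0, 10000)
```

Halving h roughly halves the error, which is why Euler is only a first-order method; higher-order Runge-Kutta schemes do much better for comparable cost.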
Chapter 2
Stochastic techniques
We will discuss the following:
Generating Functions
Stochastic processes
Markov chains
2.1 Generating functions

Introduction. Probabilistic models often involve several random variables
of interest. For example, in a medical diagnosis context, the results of
several tests may be significant, or in a networking context, the workloads
of several gateways may be of interest. All of these random variables are
associated with the same experiment, sample space, and probability law,
and their values may relate in interesting ways. Generating functions are
important in handling stochastic processes involving integral-valued random
variables.
Elementary results
Suppose we have a sequence of real numbers a_0, a_1, a_2, .... Introducing the
dummy variable x, we may define a function
A(x) = a_0 + a_1 x + a_2 x^2 + ... = Σ_{j=0}^{∞} a_j x^j.   (2.1)
If the series converges in some real interval −x_0 < x < x_0, the function A(x)
is called the generating function of the sequence {a_j}.
Fact 2. If the sequence {a_j} is bounded by some constant K, then A(x)
converges at least for |x| < 1. [Prove it!]
Fact 3. In case the sequence {a_j} represents probabilities, we introduce
the restrictions
a_j ≥ 0,   Σ_{j=0}^{∞} a_j = 1.
The corresponding function A(x) is then a probability-generating function.
We consider the (point) probability distribution and the tail probability of
a random variable X, given by
P[X = j] = p_j,   P[X > j] = q_j;
then the usual distribution function is
P[X ≤ j] = 1 − q_j.
The probability-generating function now is
P(x) = Σ_{j=0}^{∞} p_j x^j = E(x^X), where E indicates the expectation operator.
Also we can define a generating function for the tail probabilities:
Q(x) = Σ_{j=0}^{∞} q_j x^j.
Q(x) is not a probability-generating function, however.
Fact 4.
a/ P(1) = Σ_{j=0}^{∞} p_j 1^j = 1, and
|P(x)| ≤ Σ_{j=0}^{∞} |p_j x^j| ≤ Σ_{j=0}^{∞} p_j = 1 if |x| < 1. So P(x) is
absolutely convergent at least for |x| ≤ 1.
b/ Q(x) is absolutely convergent at least for |x| < 1.
c/ Connection between P(x) and Q(x): (check this!)
(1 − x)Q(x) = 1 − P(x), or equivalently P(x) + Q(x) = 1 + xQ(x).
Mean and variance of a probability distribution
m = E(X) = Σ_{j=0}^{∞} j p_j = P'(1) = Σ_{j=0}^{∞} q_j = Q(1)   (why!?)
Recall that the variance of the probability distribution {p_j} is
σ^2 = E(X(X − 1)) + E(X) − [E(X)]^2,
so we need to know
E[X(X − 1)] = Σ_{j=0}^{∞} j(j − 1) p_j = P''(1) = 2Q'(1)?
Therefore,
σ^2 = ??? What is it?
Exercise: Find the formula of the r-th factorial moment
μ_[r] = E(X(X − 1)(X − 2) ... (X − r + 1)).
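Filling in the blank is the standard identity σ^2 = P''(1) + P'(1) − [P'(1)]^2. Here is a quick numerical check in Python on a truncated Poisson(λ = 2) distribution (the truncation point N is our choice), for which both the mean and the variance equal λ:

```python
import math

lam = 2.0
N = 60   # truncation point; the Poisson tail beyond this is negligible
p = [math.exp(-lam) * lam**j / math.factorial(j) for j in range(N)]

mean = sum(j * p[j] for j in range(N))             # P'(1) = E[X]
fact2 = sum(j * (j - 1) * p[j] for j in range(N))  # P''(1) = E[X(X-1)]
variance = fact2 + mean - mean**2                  # sigma^2
```

For Poisson(λ) the check gives mean ≈ λ and variance ≈ λ, up to the truncation error.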
Finding a generating function from a recurrence.
Multiply both sides by x^n.
Example: Fibonacci sequence:
f_n = f_{n−1} + f_{n−2}  ⟹  F(x) = x + xF(x) + x^2 F(x).
Finding a recurrence from a generating function.
Whenever you know F(x), we find its power series; the coefficients before
x^n are the Fibonacci numbers.
How? Just remember how to find the partial-fractions expansion of F(x),
in particular the basic expansion
1/(1 − x) = 1 + x + x^2 + ...
In general, if G(x) is the generating function of a sequence (g_n) then
G^(n)(0) = n! g_n.
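For the Fibonacci case, solving the relation above gives F(x) = x/(1 − x − x^2), and the coefficients can be read off by formal power-series long division. A Python sketch (the function name is ours):

```python
def series_coefficients(num, den, n_terms):
    """First n_terms power-series coefficients of num(x)/den(x).
    num and den are coefficient lists, lowest degree first; den[0] != 0."""
    num = num + [0] * (n_terms - len(num))
    coeffs = []
    for n in range(n_terms):
        # long division: match the coefficient of x^n on both sides
        c = num[n] - sum(den[k] * coeffs[n - k]
                         for k in range(1, min(n, len(den) - 1) + 1))
        coeffs.append(c / den[0])
    return coeffs

# F(x) = x / (1 - x - x^2): the coefficients are the Fibonacci numbers
fib = series_coefficients([0, 1], [1, -1, -1], 10)
```

The division step is just the recurrence in disguise: matching coefficients of x^n in F(x)(1 − x − x^2) = x reproduces f_n = f_{n−1} + f_{n−2}.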
Multiple random variables. We consider probabilities involving
simultaneously the numerical values of several random variables and
investigate their mutual couplings. In this section, we will extend the
concepts of PMF and expectation developed so far to multiple random
variables.
Consider two discrete random variables X, Y : S → R associated with the
same experiment. The joint PMF of X and Y is defined by
p_{X,Y}(x, y) = P[X = x, Y = y]
for all pairs of numerical values (x, y) that X and Y can take. We will use
the abbreviated notation P(X = x, Y = y) instead of the more precise
notations P[(X = x) ∩ (Y = y)] or P[X = x and Y = y]. For the pair of
random variables X, Y, we say
Definition 4. X and Y are independent if for all x, y ∈ R, we have
P[X = x, Y = y] = P[X = x] P[Y = y], i.e. p_{X,Y}(x, y) = p_X(x) p_Y(y),
or in terms of conditional probability,
P(X = x | Y = y) = P[X = x].
This can be extended to the so-called mutual independence of a finite
number n of random variables.
Expectation. The expectation operator defines the expected value of a
random variable X as follows.
Definition 5.
E(X) = Σ_{x ∈ Range(X)} P[X = x] · x.
If we consider X as a function from a sample space S to the naturals N, then
E(X) = Σ_{i=0}^{∞} P[X > i].  (Why?)
Functions of Multiple Random Variables. When there are multiple
random variables of interest, it is possible to generate new random variables
by considering functions involving several of these random variables. In
particular, a function Z = g(X, Y) of the random variables X and Y defines
another random variable. Its PMF can be calculated from the joint PMF
p_{X,Y} according to
p_Z(z) = Σ_{(x,y): g(x,y)=z} p_{X,Y}(x, y).
Furthermore, the expected value rule for functions naturally extends and
takes the form
E[g(X, Y)] = Σ_{(x,y)} g(x, y) p_{X,Y}(x, y).
Theorem 6. We have two important results on expectation.
1. (Linearity) E(X + Y) = E(X) + E(Y) for any pair of random
variables X, Y.
2. (Independence) E(X · Y) = E(X) · E(Y) for any pair of independent
random variables X, Y.
2.2 Convolutions
Now we consider two nonnegative independent integral-valued random
variables X and Y , having the probability distributions
PX = j = a
j
, PY = k = b
k
. (2.2)
The joint probability of the event (X = j, Y = k) is a
j
b
k
obviously. We
form a new random variable
S = X +Y,
then the event S = r comprises the mutually exclusive events
(X = 0, Y = r), (X = 1, Y = r 1), , (X = r, Y = 0).
Fact 5. The probability distribution of the sum S then is
PS = r = c
r
= a
0
b
r
+a
1
b
r1
+ +a
r
b
0
.
Proof.
p_S(r) = P(X + Y = r) = Σ_{(x,y): x+y=r} P(X = x and Y = y) = Σ_x p_X(x) p_Y(r − x).
Definition 7. This method of compounding two sequences of numbers (not
necessarily probabilities) is called convolution. The notation
{c_j} = {a_j} ∗ {b_j}
will be used.
Fact 6. Define the generating functions of the sequences {a_j}, {b_j} and
{c_j} by
A(x) = Σ_{j=0}^{∞} a_j x^j,  B(x) = Σ_{j=0}^{∞} b_j x^j,  C(x) = Σ_{j=0}^{∞} c_j x^j;
it follows that C(x) = A(x)B(x). [Check this!]
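The identity C(x) = A(x)B(x) is easy to check numerically. A Python sketch with two fair dice (the evaluation point x = 0.3 is an arbitrary choice of ours):

```python
def convolve(a, b):
    """Convolution c_r = sum_j a_j * b_{r-j} of two finite sequences."""
    c = [0.0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c

def gf(coeffs, x):
    """Evaluate the generating function of a finite sequence at x."""
    return sum(cj * x**j for j, cj in enumerate(coeffs))

die = [0.0] + [1 / 6] * 6      # P[X = j] for one fair die, j = 0..6
two = convolve(die, die)       # distribution of the sum of two dice

x = 0.3
lhs = gf(two, x)               # C(x)
rhs = gf(die, x) * gf(die, x)  # A(x) B(x)
```

As a by-product, two[7] recovers the familiar P[X + Y = 7] = 6/36, and the coefficients of the convolution sum to 1, as any probability distribution must.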
In practical applications, the sum of several independent integral-valued
random variables X_i can be defined:
S_n = X_1 + X_2 + ... + X_n,  n ∈ Z^+.
If the X_i have a common probability distribution given by {p_j}, with
probability-generating function P(x), then the probability-generating
function of S_n is P(x)^n. Clearly, the distribution of S_n is the n-fold
convolution
{p_j} ∗ {p_j} ∗ ... ∗ {p_j} (n factors) = {p_j}^{∗n}.
2.3 Compound distributions
In our discussion so far of sums of random variables, we have always
assumed that the number of variables in the sum is known and xed , i.e., it
is nonrandom. We now generalize the previous concept of convolution to
the case where the number N of random variables X
k
contributing to the
sum is itself a random variable! In particular, we consider the sum
S
N
= X
1
+X
2
+ +X
N
, where
PX
k
= j = f
j
,
PN = n = g
n
,
PS
N
= l = h
l
.
(2.3)
Probability-generating functions of X, N and S are
F(x) =

f
j
x
j
,
G(x) =

g
n
x
n
,
H(x) =

h
l
x
l
.
(2.4)
Compute H(x) with respect to F(x) and G(x). Prove that
H(x) = G(F(x)).
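The identity can be verified numerically on a small discrete case; the particular choice N ~ Binomial(3, 1/2) and X_k uniform on {1, 2} is ours. The sketch computes {h_l} twice: by direct enumeration of outcomes, and by expanding H(x) = G(F(x)) = Σ_n g_n F(x)^n as a polynomial in x.

```python
from itertools import product

def convolve(a, b):
    """c_r = sum_j a_j * b_{r-j} of two finite sequences."""
    c = [0.0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c

def poly_pow(p, n):
    """Coefficient list of p(x)**n, i.e. the n-fold convolution of p."""
    out = [1.0]
    for _ in range(n):
        out = convolve(out, p)
    return out

f = [0.0, 0.5, 0.5]        # P[X_k = j]: X_k uniform on {1, 2}
g = [1/8, 3/8, 3/8, 1/8]   # P[N = n]: N ~ Binomial(3, 1/2)

# direct computation of h_l = P[S_N = l] by enumerating all outcomes
h_direct = [0.0] * 7       # S_N ranges over 0..6
for n, gn in enumerate(g):
    for xs in product((1, 2), repeat=n):
        h_direct[sum(xs)] += gn * 0.5 ** n

# H(x) = G(F(x)) = sum_n g_n F(x)^n, expanded coefficient by coefficient
h_composed = [0.0] * 7
for n, gn in enumerate(g):
    for l, coef in enumerate(poly_pow(f, n)):
        h_composed[l] += gn * coef
```

Both routes give the same distribution, e.g. h_0 = P[N = 0] = 1/8, as composition of generating functions predicts.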
Example 6. A remote village has three gas stations, and each one of them
is open on any given day with probability 1/2, independently of the others.
The amount of gas available in each gas station is unknown and is
uniformly distributed between 0 and 1000 gallons. We wish to characterize
the distribution of the total amount of gas available at the gas stations that
are open.
The number N of open gas stations is a binomial random variable with
p = 1/2, and the corresponding transform is
G_N(x) = (1 − p + p e^x)^3 = (1/8)(1 + e^x)^3.
The transform (moment generating function) F_X(x) associated with the
amount of gas available in an open gas station is
F_X(x) = (e^{1000x} − 1) / (1000x).
The transform H_S(x) associated with the total amount S of gas available at
the three gas stations of the village that are open is the same as G_N(x),
except that each occurrence of e^x is replaced with F_X(x), i.e.,
H_S(x) = G(F(x)) = (1/8)(1 + F_X(x))^3.
Application in Large Deviation theory
We are interested in a practical situation in the insurance industry,
originally studied from 1932 by F. Esscher (Notices of the AMS, Feb 2008).
Problem: if too many claims are made against the insurance company, we
worry about the total claim amount exceeding the reserve fund set aside
for paying these claims.
Our aim: to compute the probability of this event.
Modeling. Each individual claim is a random variable; we assume some
distribution for it, and the total claim is then the sum S of a large number
of (independent or not) random variables. The probability that this sum
exceeds a certain reserve amount is the tail probability of the sum S of
independent random variables.
Large Deviation theory, going back to Esscher, requires the calculation of
moment generating functions! If your random variables are independent,
then the moment generating function of the sum is the product of the
individual ones, but if they are not (as in a Markov chain) then there is no
longer just one moment generating function!
Research project: study Large Deviation theory to solve this problem.
2.4 Introductory Stochastic Processes
The concept. A stochastic process is a collection (usually infinite) of random variables, denoted X_t or X(t), where the parameter t often represents time. The state space of a stochastic process consists of all realizations x of X_t; that is, X_t = x says the random process is in state x at time t. Stochastic processes can generally be subdivided into four distinct categories, depending on whether t and X_t are discrete or continuous:
1. Discrete processes: both are discrete, such as the Bernoulli process (die rolling) or discrete time Markov chains.
2. Continuous time discrete state processes: the state space of X_t is discrete and the index set, e.g. the time set T of t, is continuous, such as an interval of the reals R.
- Poisson process: the number of clients X(t) who have entered ACB from the time it opened until time t. X(t) has a Poisson distribution with mean E[X(t)] = λt (λ being the arrival rate).
- Continuous time Markov chain.
- Queueing process: people not only enter but also leave the bank, so we also need the distribution of the service time (the time a client spends in ACB).
3. Continuous processes: both X_t and t are continuous, such as a diffusion process (Brownian motion).
4. Discrete time continuous state processes: X_t is continuous and t is discrete, the so-called TIME SERIES, such as
- monthly fluctuations of the inflation rate of Vietnam,
- daily fluctuations of a stock market.
Examples
1. Discrete processes: the random walk model consisting of positions X_t of an object (a drunkard) at discrete time points t during 24 hours, whose directional distance from a particular point 0 is measured in integer units. Here T = {0, 1, 2, ..., 24}.
2. Continuous time discrete state processes: X_t is the number of births in a given population during the time period [0, t]. Here T = R_+ = [0, ∞) and the state space is {0, 1, 2, ...}. The sequence of failure times of a machine is a specific instance.
3. Continuous processes: X_t is the population density at time t ∈ T = R_+ = [0, ∞), and the state space of X_t is R_+.
4. TIME SERIES of daily fluctuations of a stock market.
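Example 1, the drunkard's walk, is easy to simulate; a minimal sketch assuming steps of +1 or -1 with equal probability:

```python
import random

def random_walk(steps, p_up=0.5, seed=1):
    """Positions X_0, ..., X_steps of a simple random walk started at 0,
    taking +1 with probability p_up and -1 otherwise at each step."""
    rng = random.Random(seed)
    x, path = 0, [0]
    for _ in range(steps):
        x += 1 if rng.random() < p_up else -1
        path.append(x)
    return path

path = random_walk(24)   # the drunkard's hourly positions, T = {0, 1, ..., 24}
print(path)
```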
What interesting characteristics of a stochastic process do we want to know? We know that a stochastic process is a mathematical model of a probabilistic experiment that evolves in time and generates a sequence of numerical values. Three interesting aspects are:
(a) We tend to focus on the dependencies in the sequence of values generated by the process. For example, how do future prices of a stock depend on past values?
(b) We are often interested in long-term averages involving the entire sequence of generated values. For example, what is the fraction of time that a machine is idle?
(c) We sometimes wish to characterize the likelihood or frequency of certain boundary events. For example, what is the probability that within a given hour all circuits of some telephone system become simultaneously busy, or what is the frequency with which some buffer in a computer network overflows with data?
A few fundamental properties and categories
1. STATIONARY property: A process is stationary when all the X(t) have the same distribution. That means, for any τ, the distribution of a stationary process is unaffected by a shift of the time origin, and X(t) and X(t + τ) have the same distribution. For the first-order distribution,

F_X(x; t) = F_X(x; t + τ) = F_X(x), and f_X(x; t) = f_X(x).

These processes are found in arrival-type processes, in which we are interested in occurrences that have the character of an arrival, such as message receptions at a receiver, job completions in a manufacturing cell, customer purchases at a store, etc. We will focus on models in which the interarrival times (the times between successive arrivals) are independent random variables.
- The case where arrivals occur in discrete time and the interarrival times are geometrically distributed is the Bernoulli process.
- The case where arrivals occur in continuous time and the interarrival times are exponentially distributed is the Poisson process.
The Bernoulli process and the Poisson process will be investigated in detail in the Stochastic Processes course.
2. MARKOVIAN (memoryless) property: Many processes have the memoryless property, arising from experiments that evolve in time and in which the future evolution exhibits a probabilistic dependence on the past. As an example, the future daily prices of a stock are typically dependent on past prices. However, in a Markov process we assume a very special type of dependence: the next value depends on past values only through the current value; that is, X_{i+1} depends only on X_i, and not on any previous values.
2.5 Markov Chains (MC), a key tool in modeling random phenomena
We discuss the concept of the discrete time Markov chain, or just Markov chain, in this section. Suppose we have a sequence M of consecutive trials, numbered n = 0, 1, 2, .... The outcome of the nth trial is represented by the random variable X_n, which we assume to be discrete and to take one of the values j in a finite set Q of discrete outcomes/states {e_1, e_2, e_3, ..., e_s}.
M is called a (discrete time) Markov chain if, while occupying Q states at each of the unit time points 0, 1, 2, 3, ..., n - 1, n, n + 1, ..., M satisfies the following property, called the Markov property or memoryless property:

P(X_{n+1} = j | X_n = i, ..., X_0 = a) = P(X_{n+1} = j | X_n = i), for all n = 0, 1, 2, ....

(In each time step n to n + 1, the process can stay in the same state e_i (at both n and n + 1) or move to another state e_j (at n + 1) according to the memoryless rule, which says that the future behavior of the system depends only on the present and not on its past history.)
Definition 8 (One-step transition probability).
Denote the absolute probability of outcome j at the nth trial by

p_j(n) = P(X_n = j). (2.5)

The one-step transition probability, denoted

p_ij(n + 1) = P(X_{n+1} = j | X_n = i),

is defined as the conditional probability that the process is in state j at time n + 1 given that the process was in state i at the previous time n, for all i, j ∈ Q.
Independent-of-time property: homogeneous Markov chains. If the state transition probabilities p_ij(n + 1) in a Markov chain M are independent of the time n, they are said to be stationary, time homogeneous, or just homogeneous. The state transition probability of a homogeneous chain can then be written without mentioning the time point n:

p_ij = P(X_{n+1} = j | X_n = i). (2.6)

Unless stated otherwise, we assume and will work with homogeneous Markov chains M. The one-step transition probabilities given by (2.6) must satisfy

Σ_{j=1}^{s} p_ij = 1 for each i = 1, 2, ..., s, and p_ij ≥ 0.
Transition Probability Matrix. In practical applications, we are typically given the initial distribution (i.e. the probability distribution of the starting position of the object of interest at time point 0) and the transition probabilities, and we want to determine the probability distribution of the position X_n for any time point n > 0. The Markov property, quantitatively described through the transition probabilities, can be represented in the state transition matrix P = [p_ij]:

P = [ p_11  p_12  p_13  ...  p_1s ]
    [ p_21  p_22  p_23  ...  p_2s ]
    [ p_31  p_32  p_33  ...  p_3s ]
    [  ...   ...   ...  ...  ...  ]
    [ p_s1  p_s2  p_s3  ...  p_ss ]    (2.7)
Briefly, we have

Definition 9. A (homogeneous) Markov chain M is a triple (Q, p, P) in which:
- Q is a finite set of states (identified with an alphabet Σ),
- p(0) are the initial probabilities (at the initial time point n = 0),
- P = [p_ij] is the matrix of state transition probabilities, in which p_ij = P(X_{n+1} = j | X_n = i),
and such that the memoryless property is satisfied, i.e.,

P[X_{n+1} = j | X_n = i, ..., X_0 = a] = P[X_{n+1} = j | X_n = i], for all n.
In practice, the initial probabilities p(0) are obtained at the current time (the beginning of a study), and the transition probability matrix P is found from empirical observations in the past. In most cases, the major concern is using P and p(0) to predict the future.
Example 7. The Coopmart chain (denoted C) in SG currently controls 60% of the daily processed-food market; their rivals Maximart and other brands (denoted M) take the other share. Data from the previous years (2006 and 2007) show that 88% of C's customers remained loyal to C, while 12% switched to rival brands. In addition, 85% of M's customers remained loyal to M, while the other 15% switched to C. Assuming that these trends continue, use MC theory to determine C's share of the market (a) in 5 years and (b) over the long run.
Proposed solution. Suppose that brand attraction is time homogeneous. For a sample of large enough size n, we denote the customers' brand choice in year n by a random variable X_n. The market share probability of the whole population can then be approximated using the sample statistics, e.g.

P(X_n = C) = |{x : X_n(x) = C}| / n, and P(X_n = M) = 1 - P(X_n = C).

Setting n = 0 for the current time, the initial probabilities are

p(0) = [0.6, 0.4] = [P(X_0 = C), P(X_0 = M)].

Obviously we want to know the market share probabilities p(n) = [P(X_n = C), P(X_n = M)] at any year n > 0. We now introduce a transition probability matrix with rows and columns labeled C and M:

          C              M
P = C [ 1 - a = 0.88   a = 0.12     ]  =  [ 0.88  0.12 ]
    M [ b = 0.15       1 - b = 0.85 ]     [ 0.15  0.85 ],    (2.8)

where a = p_CM = P[X_{n+1} = M | X_n = C] and b = p_MC = P[X_{n+1} = C | X_n = M].
Higher-order transition probabilities.
The aim: find the absolute probabilities at any stage n. We write

p_ij^(n) = P(X_{n+m} = j | X_m = i), with p_ij^(1) = p_ij, (2.9)

for the n-step transition probability; by homogeneity it is independent of m ∈ N, see Equation (2.6). The n-step transition matrix is denoted P^(n) = (p_ij^(n)).
For the case n = 0, we have

p_ij^(0) = δ_ij = 1 if i = j, and 0 if i ≠ j.
Chapman-Kolmogorov equations. The Chapman-Kolmogorov equations relate the n-step transition probabilities to the k-step and (n - k)-step transition probabilities:

p_ij^(n) = Σ_{h=1}^{s} p_ih^(n-k) p_hj^(k), 0 < k < n.

This results in the matrix notation

P^(n) = P^(n-k) P^(k).

Since P^(1) = P, we get P^(2) = P^2, and in general P^(n) = P^n.
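The matrix identity P^(2) = P^2 is easy to verify numerically; a sketch using the Coopmart matrix from Example 7:

```python
# Transition matrix of the Coopmart chain (Example 7).
P = [[0.88, 0.12],
     [0.15, 0.85]]

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][h] * b[h][j] for h in range(n)) for j in range(n)]
            for i in range(n)]

P2 = mat_mul(P, P)        # the two-step matrix P^(2) = P * P
print(P2)
# P^2 is again a stochastic matrix: every row still sums to 1.
for row in P2:
    assert abs(sum(row) - 1.0) < 1e-12
```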
Let p^(n) denote the vector form of the probability mass distribution (pmf, or absolute probability distribution) associated with X_n of a Markov process, that is,

p^(n) = [p_1(n), p_2(n), p_3(n), ..., p_s(n)],

where each p_i(n) is defined as in (2.5).

Proposition 10. The absolute probability distribution p^(n) at any stage n of a Markov chain is given, in matrix form with p^(n) a row vector, by

p^(n) = p^(0) P^n, where p^(0) = p is the initial probability vector. (2.10)
Proof. We employ two facts:
* P^(n) = P^n, and
* the absolute probability distribution p^(n+1) at stage n + 1 (associated with X_{n+1}) can be found from the 1-step transition matrix P = [p_ij] and the distribution p^(n) = [p_1(n), p_2(n), p_3(n), ..., p_s(n)] at stage n (associated with X_n):

p_j(n + 1) = Σ_{i=1}^{s} p_i(n) p_ij, or in matrix notation p^(n+1) = p^(n) P.

Then just do the induction:

p^(n+1) = p^(n) P = p^(n-1) P^2 = ... = p^(0) P^{n+1}.
Example 8 (The Coopmart chain, cont.). (a) C's share of the market in 5 years can be computed by

p^(5) = [p_C(5), p_M(5)] = p^(0) P^5.
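This computation can be carried out directly; a sketch using the row-vector convention p^(n+1) = p^(n) P (each row of P sums to one):

```python
P = [[0.88, 0.12],
     [0.15, 0.85]]
p = [0.6, 0.4]                      # p^(0) = [P(X_0 = C), P(X_0 = M)]

def step(p, P):
    """One step of the chain: p^(n+1) = p^(n) P (row vector times matrix)."""
    return [sum(p[i] * P[i][j] for i in range(len(p))) for j in range(len(p))]

for _ in range(5):                  # p^(5) = p^(0) P^5
    p = step(p, P)
print(p)                            # C's market share after 5 years is p[0]
```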
2.6 Classification of States
Accessible states. State j is said to be accessible from state i if for some n ≥ 0, p_ij^(n) > 0, and we write i → j. Two states i and j accessible to each other are said to communicate, and we write i ↔ j. If all states communicate with each other, then we say that the Markov chain is irreducible.
Recurrent states. Let A(i) be the set of states that are accessible from i. We say that i is recurrent if for every j that is accessible from i, i is also accessible from j; that is, for all j ∈ A(i) we have i ∈ A(j).
When we start at a recurrent state i, we can only visit states j ∈ A(i), from which i is accessible. Thus, from any future state, there is always some probability of returning to i and, given enough time, this is certain to happen. By repeating this argument, if a recurrent state is visited once, it will be revisited an infinite number of times.
Transient states. A state is called transient if it is not recurrent. In particular, there are states j ∈ A(i) such that i is not accessible from j. After each visit to state i, there is positive probability that the process enters such a j. Given enough time, this will happen, and state i cannot be visited after that. Thus, a transient state will only be visited a finite number of times.
If i is a recurrent state, the set of states A(i) that are accessible from i forms a recurrent class (or simply class), meaning that the states in A(i) are all accessible from each other, and no state outside A(i) is accessible from them. Mathematically, for a recurrent state i, we have A(i) = A(j) for all j that belong to A(i), as can be seen from the definition of recurrence. It can be seen that at least one recurrent state must be accessible from any given transient state. This is intuitively evident, and a more precise justification is given in the theoretical problems section. It follows that there must exist at least one recurrent state, and hence at least one class. Thus, we reach the following conclusion.
Markov Chain Decomposition.
- A MC can be decomposed into one or more recurrent classes, plus possibly some transient states.
- A recurrent state is accessible from all states in its class, but is not accessible from recurrent states in other classes.
- A transient state is not accessible from any recurrent state.
- At least one, possibly more, recurrent states are accessible from a given transient state.
Remark 7. For the purpose of understanding the long-term behavior of Markov chains, it is important to analyze chains that consist of a single recurrent class. For the purpose of understanding short-term behavior, it is also important to analyze the mechanism by which any particular class of recurrent states is entered starting from a given transient state.
Periodic states.
Absorption probabilities. In this section, we study the short-term behavior of Markov chains. We first consider the case where the Markov chain starts at a transient state. We are interested in the first recurrent state to be entered, as well as in the time until this happens. When focusing on such questions, the subsequent behavior of the Markov chain (after a recurrent state is encountered) is immaterial. State j is said to be an absorbing state if p_jj = 1; that is, once state j is reached, it is never left. We assume, without loss of generality, that every recurrent state k is absorbing:

p_kk = 1, p_kj = 0 for all j ≠ k.

- If there is a unique absorbing state k, its steady-state probability is 1 (because all other states are transient and have zero steady-state probability), and it will be reached with probability 1, starting from any initial state.
- If there are multiple absorbing states, the probability that one of them will eventually be reached is still 1, but the identity of the absorbing state to be entered is random, and the associated probabilities may depend on the starting state.

In the sequel, we fix a particular absorbing state, denoted s, and consider the absorption probability a_i that s is eventually reached, starting from i:

a_i = P(X_n eventually becomes equal to the absorbing state s | X_0 = i).

Absorption probabilities can be obtained by solving the system of linear equations

a_s = 1;  a_i = 0 for all absorbing i ≠ s;  a_i = Σ_{j=1}^{m} p_ij a_j for all transient i.
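The linear system above can be solved by simple fixed-point (value) iteration; a sketch for the symmetric gambler's-ruin chain on states 0..4, where 0 and 4 are absorbing and the known answer for absorption at s = 4 starting from i is i/4:

```python
# States 0..4; 0 and 4 are absorbing, interior states move +-1 with prob 1/2.
P = [[1.0, 0, 0, 0, 0],
     [0.5, 0, 0.5, 0, 0],
     [0, 0.5, 0, 0.5, 0],
     [0, 0, 0.5, 0, 0.5],
     [0, 0, 0, 0, 1.0]]
s = 4                              # target absorbing state

a = [0.0] * 5
a[s] = 1.0                         # a_s = 1; a_i = 0 for the other absorbing state
for _ in range(5_000):             # iterate a_i <- sum_j p_ij a_j on transient states
    a = [a[0]] + [sum(P[i][j] * a[j] for j in range(5)) for i in (1, 2, 3)] + [a[4]]
print(a)                           # approximately [0, 0.25, 0.5, 0.75, 1]
```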
2.7 Limiting probabilities and Stationary distribution of a MC
Definition 11. A vector p* is called a stationary distribution of a Markov chain {X_n, n ≥ 0} with state transition matrix P if

p* P = p*.

This equation indicates that a stationary distribution p* is a left eigenvector of P with eigenvalue 1. In general, we wish to know the limiting probabilities p^(∞) obtained by letting n → ∞ in the equation

p^(∞) = lim_{n→∞} p^(0) P^n.

We need some general results to determine the stationary distribution p* and the limiting probabilities p^(∞) of a Markov chain.
A) Markov chains that have two states. First we investigate the case of Markov chains that have two states, say Q = {e_1, e_2}. Let a = p_{e_1 e_2} and b = p_{e_2 e_1} be the state transition probabilities between the distinct states in a two-state Markov chain; its state transition matrix is

P = [ p_11  p_12 ] = [ 1 - a    a   ]
    [ p_21  p_22 ]   [   b    1 - b ],  where 0 < a < 1, 0 < b < 1. (2.11)
Proposition 12.
a) The n-step transition probability matrix is given by

P^(n) = P^n = 1/(a+b) * [ b  a ]  +  (1-a-b)^n/(a+b) * [  a  -a ]
                        [ b  a ]                        [ -b   b ]

b) Find the limit matrix when n → ∞.

To prove this basic Proposition 12 (computing the transition probability matrix of two-state Markov chains), we use a fundamental result of Linear Algebra that is recalled in Subsection ??.
Proof. The eigenvalues of the state transition matrix P, found by solving the equation

c(λ) = |λI - P| = 0,

are λ_1 = 1 and λ_2 = 1 - a - b. The spectral decomposition of a square matrix says that P can be decomposed into two constituent matrices E_1, E_2 (since only two eigenvalues were found):

E_1 = 1/(λ_1 - λ_2) [P - λ_2 I],  E_2 = 1/(λ_2 - λ_1) [P - λ_1 I].

These E_1, E_2 are mutually orthogonal idempotent matrices, i.e. E_1 E_2 = 0 = E_2 E_1, and

P = λ_1 E_1 + λ_2 E_2;  E_1^2 = E_1, E_2^2 = E_2.

Hence P^n = λ_1^n E_1 + λ_2^n E_2 = E_1 + (1 - a - b)^n E_2, or

P^(n) = P^n = 1/(a+b) * [ b  a ]  +  (1-a-b)^n/(a+b) * [  a  -a ]
                        [ b  a ]                        [ -b   b ]

b) The limit matrix when n → ∞ (since |1 - a - b| < 1):

lim_{n→∞} P^n = 1/(a+b) * [ b  a ]
                          [ b  a ]
B) Markov chains that have more than two states. For s > 2, it is cumbersome to compute the constituent matrices E_i of P; instead we can employ the so-called regular property. A Markov chain is regular if there exists m ∈ N such that P^(m) = P^m > 0 (every entry is positive).
2.8 Exercises
A/ Simple skills.
Let Z_1, Z_2, ... be independent identically distributed r.v.'s with P(Z_n = 1) = p and P(Z_n = -1) = q = 1 - p for all n. Let

X_n = Σ_{i=1}^{n} Z_i, n = 1, 2, ...,

and X_0 = 0. The collection of r.v.'s {X_n, n ≥ 0} is a random process, and it is called the simple random walk X(n) in one dimension.
(a) Describe the simple random walk X(n).
(b) Construct a typical sample sequence (or realization) of X(n).
(c) Find the probability that X(n) = 2 after four steps.
(d) Verify the result of part (c) by enumerating all possible sample sequences that lead to the value X(n) = 2 after four steps.
(e) Find the mean and variance of the simple random walk X(n). Find the autocorrelation function R_X(n, m) of the simple random walk X(n).
(f) Show that the simple random walk X(n) is a Markov chain.
(g) Find its one-step transition probabilities.
(h) Derive the first-order probability distribution of the simple random walk X(n).
Solution.
(a) The simple random walk X(n) is a discrete-parameter (or discrete-time), discrete-state random process. The state space is E = {..., -2, -1, 0, 1, 2, ...}, and the index parameter set is T = {0, 1, 2, ...}.
(b) A sample sequence x(n) of a simple random walk X(n) can be produced by tossing a coin every second and letting x(n) increase by unity if a head H appears and decrease by unity if a tail T appears. Thus, for instance, we have a small realization of X(n) in Table 2.1:

n             0   1   2   3   4   5   6   7   8   9  10
Coin tossing      H   T   T   H   H   H   T   H   H   T
x_n           0   1   0  -1   0   1   2   1   2   3   2

Table 2.1: Simple random walk from coin tossing

The sample sequence x(n) obtained above can be plotted in the (n, x(n))-plane. The simple random walk X(n) specified in this problem is said to be unrestricted because there are no bounds on the possible values of X. The simple random walk process is often used in the following primitive gambling model: toss a coin; if a head appears, you win one dollar; if a tail appears, you lose one dollar.
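Part (c) asks for P(X(4) = 2): reaching 2 in four steps requires three +1 steps and one -1 step, so P(X(4) = 2) = C(4,3) p^3 q. A brute-force enumeration in the spirit of part (d), sketched for p = 1/2:

```python
from itertools import product

p = 0.5
q = 1 - p
prob = 0.0
for steps in product((1, -1), repeat=4):       # enumerate all 2^4 sample paths
    if sum(steps) == 2:                        # paths ending at X(4) = 2
        n_up = steps.count(1)
        prob += p ** n_up * q ** (4 - n_up)
print(prob)                                    # 4 * p^3 * q = 0.25 for p = 1/2
```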
B/ Concepts.
1. Show that if P is a Markov matrix, then P^n is also a Markov matrix for any positive integer n.
2. A state transition diagram of a finite-state Markov chain is a line diagram with a vertex corresponding to each state and a directed line between two vertices i and j if p_ij > 0. In such a diagram, if one can move from i to j by a path following the arrows, then i → j. The diagram is useful for determining whether a finite-state Markov chain is irreducible or not, or for checking periodicities. Draw the state transition diagrams and classify the states of the MCs with the following transition probability matrices:

P_1 = [ 0    0.5  0.5 ]   P_2 = [ 0  0  0.5  0.5 ]   P_3 = [ 0.3  0.4  0  0    0.3 ]
      [ 0.5  0    0.5 ]         [ 1  0  0    0   ]         [ 0    1    0  0    0   ]
      [ 0.5  0.5  0   ]         [ 0  1  0    0   ]         [ 0    0    0  0.6  0.4 ]
                                [ 0  1  0    0   ]         [ 0    0    1  0    0   ]

3. Verify the transitivity property of a Markov chain; that is, if i → j and j → k, then i → k. (Hint: use the Chapman-Kolmogorov equations.)
4. Show that in a finite-state Markov chain, not all states can be transient.
C/ Markov Chains and Modeling.
1. A certain product is made by two companies, A and B, that control the entire market. Currently, A and B have 60 percent and 40 percent, respectively, of the total market. Each year, A loses 5 percent of its market share to B, while B loses 3 percent of its share to A. Find the relative proportion of the market that each holds after 2 years.
2. Let two gamblers, A and B, initially have k dollars and m dollars, respectively. Suppose that at each round of their game, A wins one dollar from B with probability p and loses one dollar to B with probability q = 1 - p. Assume that A and B play until one of them has no money left. (This is known as the Gambler's Ruin problem.) Let X_n be A's capital after round n, where n = 0, 1, 2, ... and X_0 = k.
(a) Show that {X(n) = X_n, n ≥ 0} is a Markov chain with absorbing states.
(b) Find its transition probability matrix P. Write P out when p = q = 1/2 and N = 4.
(c*) What is the probability of A's losing all his money?
Hint: Different rounds are assumed independent. The gambler A, say, plays continuously until he either accumulates a target amount of m, or loses all his money. We introduce the Markov chain whose state i represents the gambler's wealth at the beginning of a round. The states i = 0 and i = m correspond to losing and winning, respectively. All states are transient, except for the winning and losing states, which are absorbing. Thus, the problem amounts to finding the probabilities of absorption at each of these two absorbing states. Of course, these absorption probabilities depend on the initial state i.
D/ Advanced Skills.
Theorem 13. If every eigenvalue of a matrix P yields linearly independent left eigenvectors in number equal to its multiplicity, then
* there exists a nonsingular matrix M whose rows are left eigenvectors of P, such that
* D = M P M^{-1} is a diagonal matrix whose diagonal elements are the eigenvalues of P, repeated according to multiplicity.
Apply this to a practical problem in Business Intelligence through a case study of the mobile phone industry in VN. According to a recent survey, there are four big mobile producers/sellers N, S, M and L, and their market distribution in 2007 is given by the stochastic matrix

         N    M    L    S
P = N [ 1    0    0    0   ]
    M [ 0.4  0    0.6  0   ]
    L [ 0.2  0    0.1  0.7 ]
    S [ 0    0    0    1   ]

Is P regular? Ergodic? Find the long-term distribution matrix L = lim_{m→∞} P^m. What is your conclusion? (Remark: the states N and S are called absorbing states.)
Chapter 3
Simulation
This chapter is aimed at providing a brief introduction to simulation methods and tools within Industrial Statistics, Computational Mathematics and Operations Research.
3.1 Introductory Simulation
Practical Motivation. When an organisation has realised that a system is not operating as desired, it will look for ways to improve its performance. To do so, it is sometimes possible to experiment with the real system and, through observation and the aid of Statistics, reach valid conclusions towards future system improvement. However, experiments with a real system may entail ethical and/or economic problems, which may be avoided by dealing with a prototype, a physical model.
Sometimes it is not feasible or possible to build a prototype, yet we may obtain a mathematical model describing, through equations and constraints, the essential behaviour of the system. This analysis may sometimes be done through analytical or numerical methods, but the model may be too complex to be dealt with. Statistically, in the design phase of a system, there is no system available, and we cannot rely on measurements for generating a pdf. In such extreme cases, we may use simulation. Large complex system simulation has become common practice in many industrial areas. Essentially, simulation consists of
(i) building a computer model that describes the behaviour of a system; and
(ii) experimenting with this model to reach conclusions that support decisions.
Once we have a computer simulation model of the actual system, we need to generate values for the random quantities that are part of the system input (to the model).
Note that, besides Simulation, two other key methods used to solve practical problems in OR are Linear Programming and Statistical Methods.
In this chapter, from the statistical point of view, we introduce key concepts, methods and tools from simulation with the Industrial Statistics orientation in mind. The major parts of this section are from [8] and [28]. We mainly consider the problem within Step (ii) only. To conduct Step (i) correctly and meaningfully, a close collaboration with experts in specific areas is vital. Topics discussing Step (i) are shown in the other chapters. We learn:
1. How to generate random numbers?
2. How to transform random numbers into input data?
3. How to measure/record output data?
4. How to analyze and interpret output data and make meaningful inferences?
3.2 Generation of random numbers
General concepts. Ref. Section 3.1 of [8] and [28].
The most basic computational component in simulation involves the generation of random variables distributed uniformly between 0 and 1. These can then be used to generate other random variables, both discrete and continuous, depending on the practical context. A few major requirements for a meaningfully reliable simulation:
- the simulation is run long enough to obtain an estimate of the operating characteristics of the system;
- the number of runs should also be large enough to obtain reliable estimates;
- the result of each run is a random sample, which implies that a simulation is a statistical experiment that must be conducted using statistical tools such as: i) point estimation, ii) confidence intervals and iii) hypothesis testing.
A schematic diagram to mathematically simulate a system. If a system S is described by a discrete random variable X, a fundamental diagram to simulate S is:

random number generator G → uniform random variable U → pdf or cdf of X.
3.3 Transformation of random numbers into input data
Ref. Section 3.2 of [8].
Now some advanced simulation techniques.
Practicality: we use G to randomly compute a specific value of X in the last two phases of the above diagram, using the so-called discrete inverse transform method, in which we write the cdf of X as F(k) = Σ_{i=0}^{k} p(i) ∈ [0, 1]; then:
- generate a uniform random number U ∈ [0, 1] by G;
- find the value X = k by determining the interval [F(k - 1), F(k)] containing U; mathematically, this means finding the preimage F^{-1}(U).
The Transformation Method
Generally, we need an algorithm, named the Transformation Method, described in two steps:
Step 1: use an algorithm A to generate variates V_n, n = 1, 2, ... of a r.v. V (V = U in the above example) with a specific cdf F_V(v) in the continuous case or pdf f_V(v) in the discrete case. Then
Step 2: employ an appropriate transformation g(.) to generate a variate of X, namely X_n = g(V_n).
Theorem 14 (Relationship of V and X). Consider a r.v. V with pdf f_V(v) and a given transformation X = g(V). Denote by v_1, v_2, ..., v_n the real roots of the equation

x - g(v) = 0; (3.1)

then the pdf of the r.v. X is given by

f_X(x) = Σ_{l=1}^{n} f_V(v_l) * 1 / |dg/dv (v_l)|.

Given x, if Equation (3.1) has no real solutions, then the pdf f_X(x) = 0.
Proof. DIY.
The two most important uses of the Transformation Method are:
A) Linear (affine when b ≠ 0) case: X = g(V) = aV + b, where a, b ∈ R. Then

f_X(x) = (1/|a|) f_V((x - b)/a).

B) Inverse case: X = g(V) = F_X^{-1}(V), where F_X(x) is the cdf of the random variable X.
Theorem 15 (Inverse case). Consider a r.v. V with uniform cdf F_V(v) = v, v ∈ [0, 1]. Then the transformation X = g(V) = F_X^{-1}(V) gives variates x of X with cdf F_X(x).
Proof. For any real number a, by the monotonicity of the cdf F_X,

P(X ≤ a) = P[F_X^{-1}(V) ≤ a] = P[V ≤ F_X(a)] = F_V(F_X(a)) = F_X(a).

Using this, an algorithm can be formulated for generating variates of a r.v. X:
1. Invert the given cdf F_X(x) to find its inverse F_X^{-1}.
2. Generate a uniform variate V ∈ [0, 1].
3. Generate variates x via the transformation X = F_X^{-1}(V).
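As a concrete instance of this algorithm (an illustration, not from the text), take X exponential with rate λ: F_X(x) = 1 - e^{-λx}, so F_X^{-1}(V) = -ln(1 - V)/λ. A sketch:

```python
import math
import random

def exponential_variate(lam, rng):
    """Inverse-transform method: X = F^{-1}(V) = -ln(1 - V)/lam
    for V uniform on [0, 1)."""
    v = rng.random()                     # Step 2: uniform variate V
    return -math.log(1.0 - v) / lam      # Step 3: apply the inverse cdf

rng = random.Random(7)
lam = 2.0
sample = [exponential_variate(lam, rng) for _ in range(100_000)]
print(sum(sample) / len(sample))         # close to the theoretical mean 1/lam = 0.5
```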
Example 9. Consider a Bernoulli r.v. X ~ B(p), where p = P(X = 1). The cdf F_X(x) = P(X ≤ x) is a step (staircase) function u(.). [That is, u(t) = b_i if a_i ≤ t < a_{i+1}, where (a_i)_i is an ascending sequence.] Here

F_X(x) = 0 if x < 0; F_X(x) = 1 - p if 0 ≤ x < 1; and F_X(x) = (1 - p) + p = 1 if x ≥ 1.

How do we generate X? We employ V ~ UniDist([0, 1]) and the fact that the inverse is

F_X^{-1}(V) = u(V - (1 - p)),

where u is the unit step function: X = 1 exactly when V > 1 - p.
Example 10. Consider a binomial r.v. X ~ BinomDist(n, p), where p is the success probability of each trial. X takes values in {0, 1, ..., n}, and the distribution is given by the probability function

p(k) = P(X = k) = C(n, k) p^k (1 - p)^{n-k}.

We employ V ~ UniDist([0, 1]) and use

F_X(x) = P(X ≤ x) = V  ⟺  x = F_X^{-1}(V) = u(V),

in which the parameters of the step function u(V) are given by

u(V) = k if Σ_{i=0}^{k-1} p(i) < V ≤ Σ_{i=0}^{k} p(i), k ∈ {1, ..., n};  u(V) = 0 if V ≤ p(0).

How is this done? Simply split the vertical interval [0, 1] into n + 1 subintervals, with the length of the kth subinterval equal to p(k) = P(X = k), k ∈ {0, 1, ..., n}.
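The discrete inverse transform for the binomial case can be sketched as follows; the sample-mean check uses the illustrative values n = 5, p = 0.7:

```python
import random

def binomial_pmf(n, p):
    """Table p(k) = C(n,k) p^k (1-p)^(n-k), built iteratively."""
    pmf = [(1 - p) ** n]
    for k in range(1, n + 1):
        pmf.append(pmf[-1] * (n - k + 1) / k * p / (1 - p))
    return pmf

def binomial_variate(pmf, rng):
    """Discrete inverse transform: return the first k with F(k) >= V."""
    v = rng.random()
    cum = 0.0
    for k, pk in enumerate(pmf):
        cum += pk
        if v <= cum:
            return k
    return len(pmf) - 1        # guard against floating-point round-off

rng = random.Random(3)
n, p = 5, 0.7
pmf = binomial_pmf(n, p)
sample = [binomial_variate(pmf, rng) for _ in range(50_000)]
print(sum(sample) / len(sample))   # close to the mean n*p = 3.5
```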
Example 11 (Implementation in Maple). If a system S is described by a binomial random variable X describing an arrival process of tankers at a seaport with parameters (n, p) = (5, 0.7), then its mean is μ = np = 3.5. A binomial random variate x for the number of tankers arriving can be generated using the Maple function

x = random[binomiald[n, p]][1].

Key steps to obtain a reliable simulation result:
1. determine a proper simulation run length T, i.e. T generations of a uniform random number U ∈ [0, 1], with the same probability distribution each time;
2. determine a proper number of runs R;
3. design good random number generators G.
Determining R.
Key steps in designing good random number generators G.
Good random number generators are mainly based on integer arithmetic with modulo operations. A good one could be V_i = a V_{i-1} mod 2^32 for computers with 32-bit CPUs.
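Such a multiplicative congruential generator can be sketched in a few lines; the multiplier 1664525 below is one common choice from the literature, an assumption rather than a value prescribed by the text:

```python
class MultiplicativeLCG:
    """V_i = a * V_{i-1} mod 2^32, scaled to [0, 1).
    a = 1664525 is a common literature choice (an assumption here);
    the seed is forced odd so the sequence never collapses to 0."""
    def __init__(self, seed=12345, a=1664525, m=2**32):
        self.state = seed | 1          # force an odd seed
        self.a = a
        self.m = m

    def next_uniform(self):
        self.state = (self.a * self.state) % self.m
        return self.state / self.m     # a pseudo-uniform number in [0, 1)

g = MultiplicativeLCG()
u = [g.next_uniform() for _ in range(10_000)]
print(sum(u) / len(u))                 # should be near 0.5 for a reasonable generator
```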
3.4 Measurement of output data
Ref. Section 3.3 of [8].
The third step in a simulation process consists of passing the inputs through the simulation model to obtain outputs to be analysed later. We shall consider the two main application areas in Industrial Statistics: Monte Carlo simulation and discrete event simulation.
Discrete event simulation models. Discrete event simulation (DES) deals with systems whose state changes at discrete times, not continuously. These methods were initiated in the late 1950s; for example, the first DES-specific language, GSP, was developed at General Electric by Tocher and Owen to study manufacturing problems.
To study such systems, we build a discrete event model. Its evolution in time implies changes in the attributes of one of its entities, or model components, and takes place at a given instant. Such a change is called an event. The time between two instants is an interval. A process describes the sequence of states of an entity throughout its life in the system.
There are several strategies to describe such evolution, depending on the mechanism that regulates time evolution within the system.
- When the evolution is based on time increments of the same duration, we talk about synchronous simulation.
- When the evolution is based on intervals, we talk about asynchronous simulation.
Generation of values of a Markov Chain
Discrete Time Markov Chain (DTMC). We consider a homogeneous DTMC described by a transition matrix P. How do we generate sample paths of states X_n? (See pp. 574-575 of Viniotis.)
Briey, we have learned that
Denition 16. A (homogeneous) Markov chain M is a triple (Q, p, A) in
which:
Q is a nite set of states (be identied with an alphabet ),
p are initial probabilities, (at time point t = 0)
P are state transition probabilities, denoted by a matrix P = [p
ij
] in
which
p
ij
= P(X
n+1
= j[X
n
= i)
.
And such that the memoryless property is satised,ie.,
P(X
n+1
= j[X
n
= i, , X
0
= a) = P(X
n+1
= j[X
n
= i), for all n.
Independence-of-time property: homogeneous Markov chains. If the state
transition probabilities p_ij(n+1) in a Markov chain M are independent of
the time n, they are said to be stationary, time-homogeneous or just
homogeneous. The state transition probabilities of a homogeneous chain can
then be written without mentioning the time point n:

    p_ij = P(X_{n+1} = j | X_n = i).    (3.2)

Unless stated otherwise, we assume and will work with homogeneous
Markov chains M. The one-step transition probabilities given by (3.2) of
these Markov chains must satisfy

    sum_{j=1}^{s} p_ij = 1 for each i = 1, 2, ..., s, and p_ij >= 0.
Transition Probability Matrix. In practical applications, we are typically given
the initial distribution (i.e., the probability distribution of the starting position
of the object concerned at time point 0) and the transition probabilities,
and we want to determine the probability distribution of the position X_n
for any time point n > 0. The Markov property, quantitatively described
through transition probabilities, can be represented conveniently in the
so-called state transition matrix P = [p_ij]:

        | p_11  p_12  p_13  ...  p_1s |
        | p_21  p_22  p_23  ...  p_2s |
    P = | p_31  p_32  p_33  ...  p_3s |        (3.3)
        | ...   ...   ...   ...  ...  |
Definition 17. A vector p* is called a stationary distribution of a Markov
chain {X_n, n >= 0} with state transition matrix P if:

    p* P = p*.

Question: how to find a stationary distribution of a Markov chain?
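Numerically, one way is to take the left eigenvector of P for eigenvalue 1 and normalize it to sum to one. A sketch with numpy; the two-state matrix is an illustrative assumption:

```python
import numpy as np

# Illustrative two-state transition matrix (rows sum to 1).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# p* P = p* means p* is a left eigenvector of P for eigenvalue 1,
# i.e. an ordinary eigenvector of P transposed.
vals, vecs = np.linalg.eig(P.T)
i = np.argmin(np.abs(vals - 1.0))   # index of the eigenvalue closest to 1
p = np.real(vecs[:, i])
p = p / p.sum()                      # normalize to a probability vector

assert np.allclose(p @ P, p)         # stationarity: p* P = p*
```

For this P the balance equation 0.1 p1 = 0.5 p2 with p1 + p2 = 1 gives p* = (5/6, 1/6), which the code reproduces.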
Consider a homogeneous DTMC X_n described by the transition matrix
P = [p_ij]. How do we generate sample paths of X_n? Two issues are involved
here:
a) only steady-state results are of interest;
b) transient results are of interest as well.
In case a), we want to generate values for a single stationary random variable
p* that describes the steady-state behavior of the MC. Since p* is a
one-dimensional pdf, the algorithm after Theorem 15 suffices.
Instances of synchronous and asynchronous simulation. We illustrate
both strategies by describing how to sample from a Markov chain with state
space S and transition matrix

    P = (p_ij), with p_ij = P(X(n+1) = j | X(n) = i).

The obvious way to simulate the (n+1)-th transition, given X(n), is:

    Generate X(n+1) ~ {p_{X(n)j} : j in S}.

This synchronous approach has the potential shortcoming that
X(n) = X(n+1), with the corresponding computational effort lost.
Alternatively, we may simulate T_n, the time until the next change of state,
and then sample the new state X(n + T_n). If X(n) = s, T_n follows a
geometric distribution GeomDist(p_ss) of parameter p_ss, and X(n + T_n) will
have a discrete distribution with mass function

    {p_sj / (1 - p_ss) : j in S \ {s}}.

Should we wish to sample N transitions of the chain, assuming X(0) = i_0,
we do:

    Set t = 0, X(0) = i_0
    While t < N:
        Sample h ~ GeomDist(p_{X(t)X(t)})
        Sample X(t+h) ~ {p_{X(t)j} / (1 - p_{X(t)X(t)}) : j in S \ {X(t)}}
        Set t = t + h
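Both strategies above can be sketched in Python; the three-state chain is an illustrative assumption, and the asynchronous routine assumes no absorbing state (every p_ss < 1):

```python
import random

# Illustrative 3-state chain; each row of P sums to 1.
P = [[0.5, 0.3, 0.2],
     [0.1, 0.8, 0.1],
     [0.2, 0.2, 0.6]]
S = list(range(len(P)))

def synchronous(x0, N, rng):
    """Sample every transition, including self-loops X(n+1) = X(n)."""
    path, x = [x0], x0
    for _ in range(N):
        x = rng.choices(S, weights=P[x])[0]
        path.append(x)
    return path

def asynchronous(x0, N, rng):
    """Jump only at state changes: holding time h ~ GeomDist(p_ss)."""
    t, x, path = 0, x0, [(0, x0)]
    while t < N:
        p_ss = P[x][x]
        h = 1
        while rng.random() < p_ss:   # geometric holding time in state x
            h += 1
        # conditional distribution over the other states: p_sj / (1 - p_ss)
        weights = [P[x][j] / (1 - p_ss) if j != x else 0.0 for j in S]
        x = rng.choices(S, weights=weights)[0]
        t += h
        path.append((t, x))
    return path
```

The synchronous path records every time step; the asynchronous path records only (time, state) pairs at actual state changes, saving the wasted self-loop draws.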
Two key strategies for asynchronous simulation.
One is event scheduling. The simulation time advances to the
next event, and the corresponding activities are executed. If we have k types
of events (1, 2, ..., k), we maintain a list of events ordered according to
their execution times (t_1, t_2, ..., t_k). The routine R_i associated with the
i-th type of event is started at time tau_i = min(t_1, t_2, ..., t_k).
An alternative strategy is process interaction. A process represents
an entity and the set of actions that it experiences throughout its life within
the model. The system behaviour may be described as a set of processes
that interact, for example by competing for limited resources. A list of
processes is maintained, ordered according to the occurrence of the next
event. Processes may be interrupted, their routines having multiple entry
points, designated reactivation points.
Each execution of the program corresponds to a replication, which
simulates the system behaviour for a long enough period of
time, providing average performance measures, say X(n), after n customers
have been processed. If the system is stable, X(n) converges to X.
If, e.g., processing 1000 jobs is considered long enough, we associate with
each replication j of the experiment the output X_j(1000). After several
replications, we would analyse the results as described in the next section.
3.5 Analyzing output: making meaningful inferences
Ref 3.4, [8] and [21], Section 5.
3.6 Simulation languages
Use JMT system or OpenModelica.
3.7 Research 1: Simulation of Queueing systems
with multiclass customers
Classical queueing models have been extensively studied since the 1960s,
during the emergence of the internet. One of the pioneers of the field is Leonard
Kleinrock at UCLA. In fact, queueing models are applied not only in
networks and systems of computers but also in any service system of an
economy that involves resource allocation and/or sharing. In Europe, the
project called Euro-NGI (European Network of Excellence Project on
Design and Engineering of the Next-Generation Internet) was created
just a few years ago.
We restrict ourselves to studying and simulating basic queueing systems
such as the M/M/1, M/M/1/K and M/G/1 systems. Now, how to improve the
work in [8]?
Part III: Practical Applications of MMS.
Chapter 4
Probabilistic Modeling
This is a seminar-based chapter. Main references are:
1/ Chapter 6 of Simulation: A Modeler's Approach by James Thompson,
Wiley, 2000
2/ Chapters 1-4 of Mathematical Modeling and Computer
Simulation by Daniel Maki and Maynard Thompson, Thomson
Brooks/Cole, 2006
3/ Chapter 4 of Risk and Financial Management: Mathematical
Methods by C. S. Tapiero, Wiley, 2004
4.1 Markovian Models

4.1.1 Exponential distribution

An exponential random variable T with parameter λ > 0 has the density

    f(t) = λ e^{-λt}, t >= 0;

the cumulative distribution and the tail (survivor) distribution respectively are

    P(T <= t) = ∫_0^t f(x) dx = 1 - e^{-λt};  and  P(T > t) = ∫_t^{+∞} f(x) dx = e^{-λt}.

Memoryless property of exponential distributions.
For exponential random variables T,

    P(T > s + t) = P(T > t) P(T > s).
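This identity is equivalent to P(T > s + t | T > s) = P(T > t) and can be checked numerically; the rate λ = 2 and the time points below are illustrative assumptions:

```python
import math

lam, s, t = 2.0, 0.5, 1.3   # illustrative rate and time points

def tail(u):
    """Survivor function of an Exponential(lam) variable: P(T > u)."""
    return math.exp(-lam * u)

# Memoryless property: P(T > s + t) = P(T > s) * P(T > t),
# equivalently P(T > s + t | T > s) = P(T > t).
assert abs(tail(s + t) - tail(s) * tail(t)) < 1e-12
```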
The Erlang random variable. If T_1, T_2 are two independent and
identically distributed (i.i.d.) exponential random variables, what would be
the distribution of S_2 = T_1 + T_2?
4.1.2 Poisson process

Suppose an experiment begins at time point t = 0 and its i-th event
occurs at a random time point T_i >= 0, named the point of occurrence,
for i = 1, 2, ... Let Z_n = T_n - T_{n-1} denote the interarrival time period. If
the Z_n's are i.i.d. then {Z_n, n >= 1} is called a recurrent/renewal process;
{T_n, n >= 0} itself is called an arrival process.

Counting process N(t).
If we now view time t as continuous, a random process {N(t), t >= 0} is said
to be a counting process if N(t) counts the number of events that have
occurred in the interval (0, t]. Obviously:
1. N(t) takes values in {0, 1, 2, ...};
2. N(s) <= N(t) if s <= t;
3. N(t) - N(s) = the number of events that have occurred in the
interval (s, t].

A Poisson process {N(t), t >= 0} is a special type of counting process.
A counting process {N(t), t >= 0} is said to be a Poisson process with rate
λ > 0 if N(0) = 0, the process has independent increments, and the number
of events in any interval of length t is Poisson distributed with mean λt.
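A Poisson process lends itself to direct simulation: its interarrival times Z_n are i.i.d. Exponential(λ), so arrival epochs are cumulative sums of exponential gaps. The rate and horizon below are illustrative assumptions:

```python
import random

def poisson_arrivals(lam, horizon, rng):
    """Arrival epochs T_1 < T_2 < ... in (0, horizon] of a rate-lam Poisson process."""
    arrivals, t = [], 0.0
    while True:
        t += rng.expovariate(lam)   # Z_n ~ Exponential(lam) interarrival gap
        if t > horizon:
            return arrivals
        arrivals.append(t)

rng = random.Random(42)
T = poisson_arrivals(lam=3.0, horizon=10.0, rng=rng)
# N(10) = len(T); its expectation is lam * horizon = 30.
```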
Interarrival times of the Poisson process
Nonhomogeneous Poisson process
Compound Poisson process
4.2 Bayesian Modeling in Probabilistic Nets
Chapter 5
Statistical Modeling in
Quality Engineering
5.1 Introduction to Statistical Modeling (SM)
This chapter is planned for persons interested in the design, conduct and
analysis of experiments in the physical, chemical, biological, medical, social,
psychological, economic, engineering or industrial sciences. The chapter will
examine how to design experiments, carry them out, and analyze the data
they yield. Our major aims are:
1/ provide an introduction to descriptive and inferential statistical concepts
and methods. Topics include grouping of data, measures of central tendency
and dispersion, probability concepts and distributions, sampling, statistical
estimation, and statistical hypothesis testing.
2/ introduce a specific problem in Statistical Quality Control: Design of
Experiments (DOE)
Why Statistics
[See [27] for more information.]
Statistical methods are applied in an enormous diversity of problems in
such fields as:
- Agriculture (which varieties grow best?)
- Genetics, Biology (selecting new varieties, species)
- Economics (how are the living standards changing?)
- Market Research (comparison of advertising campaigns)
- Education (what is the best way to teach small children reading?)
- Environmental Studies (do strong electric or magnetic fields induce higher
  cancer rates?)
- Meteorology (is global warming a reality?)
- Medicine (which drug is best?)
- Psychology (how are shyness and loneliness related?)
- Social Science (comparison of people's reactions to different stimuli)
Basic terms
1. A population is the collection of items under discussion. It may be
finite or infinite; it may be real or hypothetical. A sample is a subset
of a population. The sample should be chosen to be representative of
the population because we usually want to draw conclusions or
inferences about the population based on the sample.
2. An appropriate statistical model for our data will often be of the
form Observed data = f(x; θ) + error,
where x are the variables we have measured and θ are the parameters
of our model.
3. Variable. A property or characteristic on which information is
obtained in an experiment. There are two major kinds of variables:
a. Quantitative Variables (measurements and counts)
continuous (such as heights, weights, temperatures); their values
are often real numbers; there are few repeated values;
discrete (counts, such as numbers of faulty parts, numbers of
telephone calls etc); their values are usually integers; there may
be many repeated values.
b. Qualitative Variables (factors, class variables); these variables
classify objects into groups.
categorical (such as methods of transport to College); there is no
sense of order;
ordinal (such as income classied as high, medium or low); there
is natural order for the values of the variable.
4. Observation. The collection of information in an experiment, or the
actual values obtained on variables in an experiment. Response
variables are outcomes or observed values of an experiment.
5. Parameters and Statistics. A parameter is a numerical
characteristic of a population or a process. A statistic is a numerical
characteristic that is computed from a sample of observations.
6. Distribution. A tabular, graphical or theoretical description of the
values of a variable using some measure of how frequently they occur
in a population, a process or a sample.
7. Parametric methods versus non-parametric methods. A
parametric method for making statistical inferences assumes that
samples come from a known family of distributions. For example, the
method of analysis of variance assumes that samples are drawn from
normal distributions. Non-parametric methods allow making statistical
inferences from samples without assuming that the sample comes
from any underlying family of distributions, and make no assumptions
about any population parameters.
8. Mathematical models and Statistical models. A model is
termed mathematical if it is derived from theoretical considerations
that represent exact, error-free assumed relationships among the
variables. A model is termed statistical if it is derived from data that
are subject to various types of specification, observational,
experimental, and/or measurement errors.
9. Regression analysis is used to model relationships between random
variables and determine the magnitude of the relationships between
variables. Some are independent variables or predictors, also
called explanatory variables, control variables, or regressors, usually
named X_1, ..., X_d. The others are response variables, also called
dependent variables, explained variables, predicted variables, or
regressands, usually named Y. If there is more than one response
variable, we speak of multivariate regression.
Brief aims of designing experiments
Various (statistical) designs are discussed and their respective differences,
advantages, and disadvantages are noted. In particular, factorial and
fractional factorial designs are discussed in detail. These are designs in
which two or more factors are varied simultaneously; the experimenter
wishes to study not only the effect of each factor, but also how the effect of
one factor changes as the other factors change. The latter is generally
referred to as an interaction among factors. Generally, designing
experiments helps us do the following:
perform experiments to evaluate the effects the factors have on the
characteristics of interest, and also discover possible relationships among the
factors (which could affect the characteristics). The goal is to use this new
understanding to improve the product;
answer questions such as:
1. What are the key factors in a process?
2. At what settings would the process deliver acceptable performance?
3. What are the key, main and interaction effects in the process?
4. What settings would bring about less variation in the output?
Important steps in designing experiments
Several critical steps should be followed to achieve our goals:
1. State objective: write a mission statement for the experiment or
project;
2. Choose response: this is about consultation; ask clients what
they want to know, or ask yourself; pay attention to
nominal-the-best responses;
3. Perform pre-experiment data analysis;
4. Choose factors and levels: use a flowchart to represent the
process or system, and a cause-effect diagram to list the potential factors
that may impact the response;
5. Select experimental plan
6. Perform the experiment
7. Analyze the data
8. Draw conclusions and make recommendations.
5.2 DOE in Statistical Quality Control

History. The history of DOE goes back to the 1930s, when Sir R. A. Fisher
in England used Latin squares to randomize plant
varieties before planting at his farm, among other activities. The goal was
to get high-productivity harvests. The mathematical theory of combinatorial
designs was developed by R. C. Bose in the 1950s in India and then in the
US. Nowadays, DOE is extensively studied and employed in virtually every
human activity, and the mathematics for DOE is very rich.
The term Algebraic Statistics was coined by Pistone, Riccomagno and
Wynn in 2000. Motivated by problems in Design of Experiments, such as
computing fractional factorial designs, they developed the systematic use of
Groebner basis methods for problems in discrete probability and statistics.
In this lecture, the fractional factorial design has been chosen for detailed
study in view of its considerable record of success over the last thirty years.
It has been found to allow cost reduction, increase efficiency of
experimentation, and often reveal the essential nature of a process.

What is an Experiment Design? Fix n finite subsets D_1, ..., D_n of the set
of real numbers R. Their Cartesian product D = D_1 × ... × D_n is a finite
subset of R^n. In statistics, the set D is called a full factorial design. A basic
aim of our study is to use full factorial designs or their subsets to find a
regression model describing the relationship between factors.
An example of special interest is the case when D_i = {0, 1} for all i. In that
case, D consists of the 2^n vertices of the standard n-cube and is referred to
as a full Boolean design. For instance, consider a full factorial design 2^3
with three binary factors: the factor x_1 of mixture ratio, the factor x_2 of
temperature, the factor x_3 of experiment time period, and the response y
of wood toughness. The levels of the factors are given in the following table:

    Factor           Low (0)   High (1)
    Mix(ture) Ratio  45p       55p
    Temp(erature)    100C      150C
    Time period      30m       90m

Table 5.1: Factor levels of the 2^3 factorial experiment
5.3 How to measure factor interactions?

This is a very complicated topic! See more at [7].

5.4 What should we do to bring experiments into
daily life?

There are a few ways to do that, but we have to employ Data Analysis
techniques a great deal. We illustrate by going through a
particular instance, a forward-looking application in the wood
industry; see the next section for the data analysis.

Description
A household furniture production project requires studying product
toughness using 8 factors. The steps are:
Select experimental plan
    RUN   Mix Ratio   Temp       Time      Yield
    1     45p (-)     100C (-)   30m (-)    8
    2     55p (+)     100C (-)   30m (-)    9
    3     45p (-)     150C (+)   30m (-)   34
    4     55p (+)     150C (+)   30m (-)   52
    5     45p (-)     100C (-)   90m (+)   16
    6     55p (+)     100C (-)   90m (+)   22
    7     45p (-)     150C (+)   90m (+)   45
    8     55p (+)     150C (+)   90m (+)   56

Table 5.2: Results of an example 2^3 Full Factorial Experiment
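From Table 5.2, the main effect of a factor can be estimated as the mean yield at its high level minus the mean yield at its low level. A sketch of that computation:

```python
# Rows of Table 5.2: (MixRatio, Temp, Time, Yield), levels coded -1/+1.
runs = [(-1, -1, -1,  8), (+1, -1, -1,  9),
        (-1, +1, -1, 34), (+1, +1, -1, 52),
        (-1, -1, +1, 16), (+1, -1, +1, 22),
        (-1, +1, +1, 45), (+1, +1, +1, 56)]

def main_effect(factor):
    """Mean yield at the high (+1) level minus mean yield at the low (-1) level."""
    hi = [y for *x, y in runs if x[factor] == +1]
    lo = [y for *x, y in runs if x[factor] == -1]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

effects = {name: main_effect(i)
           for i, name in enumerate(["MixRatio", "Temp", "Time"])}
# Temperature dominates here: its main effect is 33, versus 9 for the other two.
```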
Choose factors and levels
State objective
Conducting experiments
Data analysis
Draw conclusions and make recommendations
Select experimental plan
We employ a strength-3 fractional factorial design, also called a strength-3
mixed-level Orthogonal Array (OA), that has 96 runs and is able to
accommodate studying up to eight factors. This array is denoted by
OA(96; 6 4^2 2^5; 3); its factors and their levels are described in Table 5.3.
The factor description of a workable design. The full factorial design
of the eight factors described above is the Cartesian product

    {0, 1, ..., 5} × {0, 1, ..., 3}^2 × {0, 1}^5.

Using the full design, we are able to estimate all interactions, but
performing all 6 × 4^2 × 2^5 = 3072 runs exceeds the firm's budget. Instead
we use a fractional factorial design, that is, a subset of the elements of the
full factorial design.
Our aim is to choose a fractional design that has a rather small run size but
still allows us to estimate the main effects and some of the two-factor
interactions.
A workable solution is the 96-run experimental design presented in Table
5.4. It allows us to estimate the main effect of each factor and some of
their pairwise interactions.
Table 5.3: Eight factors, the number of levels and the level meanings

    Factor                           #  Levels
    1 (A) wood                       6  0: pine; 1: oak; 2: birch; 3: chestnut; 4: poplar; 5: walnut
    2 (B) glue                       4  0: a (least adhesive); 1: b; 2: c; 3: d (most adhesive)
    3 (C) moisture content           4  0: 10%; 1: 20%; 2: 30%; 3: 40%
    4 (D) processing time            2  0: 1 h(our); 1: 2h
    5 (E) pretreatment               2  0: no; 1: yes
    6 (F) indenting of wood samples  2  0: no; 1: yes
    7 (G) pressure                   2  0: 1 pas(cal); 1: 10 pas
    8 (H) hardening conditions       2  0: no; 1: yes
The construction of new factors given the run size of an OA of strength 2
and 3 (i.e., extending factors while fixing the number of experiments and the
strength) by a combined approach is detailed in Chapters 3 and 4 of [3].

Remark 8.
1. If we want to measure simultaneously all effects up to the two-factor
interactions of the above 8 factors, an ?-run fractional design would be needed.
2. Constructing a ?-run design is possible, and one could be found with
trial-and-error algorithms. But it would lack some attractive features such
as balance, which will be discussed below.
3. The responses Y have been computed by simulation, not by conducting
actual experiments.
    run   A  B  C  D  E  F  G  H    Y
      1   0  0  0  0  0  0  0  0
      2   0  0  1  1  1  0  1  1
      3   0  0  2  1  0  1  1  0
      4   0  0  3  0  1  1  0  1
      ...
     81   5  0  0  1  1  1  0  0
     82   5  0  1  0  0  1  1  1
     83   5  0  2  0  1  0  1  0
     84   5  0  3  1  0  0  0  1
     85   5  1  0  0  1  0  1  1
     86   5  1  1  1  1  1  0  0
     87   5  1  2  1  0  0  0  1
     88   5  1  3  0  0  1  1  0
     89   5  2  0  0  0  1  0  1
     90   5  2  1  1  0  0  1  0
     91   5  2  2  1  1  1  1  1
     92   5  2  3  0  1  0  0  0
     93   5  3  0  1  0  0  1  0
     94   5  3  1  0  1  0  0  1
     95   5  3  2  0  0  1  0  0
     96   5  3  3  1  1  1  1  1

Table 5.4: A mixed orthogonal design with 3 distinct sections (the 96-run
balanced factorial design). Columns: A = wood type (6 levels), B = glue (4),
C = moisture content (4), D = processing time (2), E = pretreatment (2),
F = indenting of wood samples (2), G = pressure (2), H = hardening
conditions (2); Y = yield.
Chapter 6
New directions and
Conclusion
This is a seminar-based chapter. Topics could be
6.1 Black-Scholes model in Finance
See Rubinstein.
6.2 Drug Resistance and Design of Anti-HIV drug
See Richard Bellman.
6.3 Epidemic Modeling
See O. Diekmann.
6.4 Conclusion
Chapter 7
Appendices
7.1 Appendix A: Theory of stochastic matrix for
MC
A stochastic matrix is a matrix for which each row sum equals one. If
the column sums also equal one, the matrix is called doubly stochastic. Hence
the transition probability matrix P = [p_ij] is a stochastic matrix.
Proposition 18. Every stochastic matrix K has
- 1 as an eigenvalue (possibly with multiplicity), and
- no eigenvalue exceeding 1 in absolute value; that is, all
eigenvalues λ_i satisfy |λ_i| <= 1.
Proof. DIY.

Fact 9. If K is a stochastic matrix then K^m is a stochastic matrix.
Proof. Let e = [1, 1, ..., 1]^t be the all-one vector, then use the fact that
Ke = e. Prove that K^m e = e.
Let A = [a_ij] > 0 denote that every element a_ij of A satisfies the condition
a_ij > 0.

Definition 19.
- A stochastic matrix P = [p_ij] is ergodic if lim_{m→∞} P^m = L (say)
exists, that is, each p_ij^(m) has a limit when m → ∞.
- A stochastic matrix P is regular if there exists a natural number m such
that P^m > 0. In our context, a Markov chain with transition probability
matrix P is called regular if there exists an m > 0 such that P^m > 0,
i.e., there is a finite positive integer m such that after m time-steps,
every state has a nonzero chance of being occupied, no matter what
the initial state.
Example 12. Is the matrix

    P = | 0.88  0.12 |
        | 0.15  0.85 |

regular? ergodic? Calculate the limit matrix L = lim_{m→∞} P^m.
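A numeric sketch of Example 12 with numpy: P itself is strictly positive, so P is regular with m = 1, and repeated powers show the rows converging to a common limit:

```python
import numpy as np

P = np.array([[0.88, 0.12],
              [0.15, 0.85]])

# P > 0 element-wise, so P is regular (take m = 1), hence ergodic.
assert (P > 0).all()

# For a regular chain, the rows of P^m converge to the stationary p*.
L = np.linalg.matrix_power(P, 200)
assert np.allclose(L[0], L[1])            # identical rows in the limit
# Balance equation 0.12 p1 = 0.15 p2 with p1 + p2 = 1 gives p* = (5/9, 4/9).
assert np.allclose(L[0], [5/9, 4/9])
```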
The limit matrix L = lim_{m→∞} P^m practically shows the long-term behaviour
(distribution, property) of the process. How do we know the existence of L
(i.e., the ergodicity of the transition matrix P = [p_ij])?
Theorem 20. A stochastic matrix P = [p_ij] is ergodic if and only if
* the only eigenvalue λ of modulus (magnitude) 1 is λ = 1 itself, and
* if λ = 1 has multiplicity k, there exist k independent eigenvectors
associated with this eigenvalue.

For a regular homogeneous Markov chain we have the following theorem.

Theorem 21 (Regularity of stochastic matrix). If a stochastic matrix
P = [p_ij] is regular then:
1. 1 is an eigenvalue of multiplicity one, and all other eigenvalues λ_i
satisfy |λ_i| < 1;
2. P is ergodic, that is, lim_{m→∞} P^m = L exists. Furthermore, L's rows
are identical and equal to the stationary distribution p*.
Proof. If (1) is proved then, by Theorem 20, P = [p_ij] is ergodic. Hence,
when P = [p_ij] is regular, the limit matrix L = lim_{m→∞} P^m does exist. By
the Spectral Decomposition (7.1),

    P = E_1 + λ_2 E_2 + ... + λ_k E_k, where all |λ_i| < 1, i = 2, ..., k.

Then, by (7.2),

    L = lim_{m→∞} P^m = lim_{m→∞} (E_1 + λ_2^m E_2 + ... + λ_k^m E_k) = E_1.

Let the vector p* be the unique left eigenvector associated with the largest
eigenvalue λ_1 = 1 (which is a simple eigenvalue since it has multiplicity one),
that is, p* P = p*, or p* (P - 1I) = 0 (p* is called a stationary distribution
of the MC). Your final task is proving that L's rows are identical and equal
to the stationary distribution p*, i.e., L = [p*; ...; p*].
Corollary 22. A few important remarks: (a) for a regular MC, the
long-term behavior does not depend on the initial state distribution
probabilities p(0); (b) in general, the limiting distributions are influenced by
the initial distribution p(0) whenever the stochastic matrix P = [p_ij] is
ergodic but not regular. (See more at problem D.)
Example 13. Consider a Markov chain with two states and transition
probability matrix

    P = | 3/4  1/4 |
        | 1/2  1/2 |

(a) Find the stationary distribution p* of the chain.
(b) Find lim_{n→∞} P^n by first evaluating P^n.
(c) Find lim_{n→∞} P^n.
7.2 Appendix B: Spectral Theorem for
Diagonalizable Matrices

Consider a square matrix P of order s with spectrum
σ(P) = {λ_1, λ_2, ..., λ_k} consisting of its eigenvalues. A few basic facts are:
- If (λ_1, x_1), (λ_2, x_2), ..., (λ_k, x_k) are eigenpairs for P, then
S = {x_1, ..., x_k} is a linearly independent set. If B_i is a basis for the
null space N(P - λ_i I), then B = B_1 ∪ B_2 ∪ ... ∪ B_k is a linearly
independent set.
- P is diagonalizable if and only if P possesses a complete set of
eigenvectors (i.e., a set of s linearly independent eigenvectors). Moreover,
H^{-1} P H = D = Diagmat(λ_1, λ_2, ..., λ_s) if and only if the columns of
H constitute a complete set of eigenvectors and the λ_j's are the
associated eigenvalues, i.e., each (λ_j, H[:, j]) is an eigenpair for P.

Spectral Theorem for Diagonalizable Matrices. A square matrix P of
order s with spectrum σ(P) = {λ_1, λ_2, ..., λ_k} consisting of eigenvalues is
diagonalizable if and only if there exist constituent matrices
E_1, E_2, ..., E_k (called the spectral set) such that

    P = λ_1 E_1 + λ_2 E_2 + ... + λ_k E_k,    (7.1)

where the E_i's have the following properties:
- E_i E_j = 0 whenever i ≠ j, and E_i^2 = E_i for all i = 1..k;
- E_1 + E_2 + ... + E_k = I.
In practice we employ (7.1) in two ways.

Way 1: if we know the decomposition (7.1) explicitly, then we can compute
powers

    P^m = λ_1^m E_1 + λ_2^m E_2 + ... + λ_k^m E_k, for any integer m > 0.    (7.2)

Way 2: if we know P is diagonalizable, then we find the constituent
matrices E_i by:
* finding the nonsingular matrix H = (x_1 | x_2 | ... | x_k), where each x_i is
a basis (right) eigenvector of the null space

    N(P - λ_i I) = {v : (P - λ_i I) v = 0}, i.e., P v = λ_i v;

** then P = H D H^{-1} = (x_1 | x_2 | ... | x_k) D H^{-1}, where
D = diag(λ_1, ..., λ_k) is the diagonal matrix, and H^{-1} = K^t stacks the
rows y_1^t, y_2^t, ..., y_k^t (i.e., K = (y_1 | y_2 | ... | y_k)). Here each y_i
is a basis left eigenvector, i.e., a solution of y^t P = λ_i y^t.
The constituent matrices are E_i = x_i y_i^t.
Example 14. Diagonalize the following matrix and provide its spectral
decomposition:

    P = |  1   -4  -4 |
        |  8  -11  -8 |
        | -8    8   5 |

The characteristic equation is p(λ) = det(P - λI) = -(λ^3 + 5λ^2 + 3λ - 9) = 0.
So λ = 1 is a simple eigenvalue, and λ = -3 is repeated twice (its algebraic
multiplicity is 2). Any set of vectors x satisfying
x ∈ N(P - λI), i.e., (P - λI)x = 0, can be taken as a basis of the eigenspace
(null space) N(P - λI). Bases for the eigenspaces are:

    N(P - 1I) = span{[1, 2, -2]^t};  N(P + 3I) = span{[1, 1, 0]^t, [1, 0, 1]^t}.

It is easy to check that these three eigenvectors x_i form a linearly
independent set, so P is diagonalizable. The nonsingular matrix (also called
the similarity transformation matrix)

    H = (x_1 | x_2 | x_3) = |  1  1  1 |
                            |  2  1  0 |
                            | -2  0  1 |

will diagonalize P, and since P = H D H^{-1} we have

    H^{-1} P H = D = Diagmat(λ_1, λ_2, λ_2) = Diagmat(1, -3, -3)
               = | 1   0   0 |
                 | 0  -3   0 |
                 | 0   0  -3 |

Here,

    H^{-1} = |  1  -1  -1 |
             | -2   3   2 |
             |  2  -2  -1 |

implies that y_1^t = [1, -1, -1], y_2^t = [-2, 3, 2], y_3^t = [2, -2, -1].
Therefore, the constituent matrices are

    E_1 = x_1 y_1^t = |  1  -1  -1 |    E_2 = x_2 y_2^t = | -2   3   2 |
                      |  2  -2  -2 |                      | -2   3   2 |
                      | -2   2   2 |                      |  0   0   0 |

    E_3 = x_3 y_3^t = |  2  -2  -1 |
                      |  0   0   0 |
                      |  2  -2  -1 |

Obviously,

    P = λ_1 E_1 + λ_2 E_2 + λ_3 E_3 = E_1 - 3 E_2 - 3 E_3 = |  1   -4  -4 |
                                                            |  8  -11  -8 |
                                                            | -8    8   5 |
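The spectral decomposition of Example 14 can be verified numerically; a numpy sketch:

```python
import numpy as np

P = np.array([[ 1,  -4, -4],
              [ 8, -11, -8],
              [-8,   8,  5]], dtype=float)

# Eigenvectors x_i and rows y_i^t of H^{-1} from Example 14.
x1, x2, x3 = np.array([1, 2, -2]), np.array([1, 1, 0]), np.array([1, 0, 1])
y1, y2, y3 = np.array([1, -1, -1]), np.array([-2, 3, 2]), np.array([2, -2, -1])

E1, E2, E3 = np.outer(x1, y1), np.outer(x2, y2), np.outer(x3, y3)

# Constituent-matrix properties and the decomposition itself:
assert np.allclose(E1 + E2 + E3, np.eye(3))        # E_1 + E_2 + E_3 = I
assert np.allclose(E1 @ E2, np.zeros((3, 3)))      # E_i E_j = 0 for i != j
assert np.allclose(P, E1 - 3 * E2 - 3 * E3)        # equation (7.1)
# Powers via (7.2): P^m = 1^m E_1 + (-3)^m (E_2 + E_3)
assert np.allclose(np.linalg.matrix_power(P, 4), E1 + 81 * (E2 + E3))
```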
Bibliography
[1] Arjeh M. Cohen, Computer algebra in industry: Problem Solving in
Practice, Wiley, 1993
[2] Nguyen, V. M. Man and the DAG group at Eindhoven University of
Technology, www.mathdox.org/nguyen, 2005,
[3] Nguyen, V. M. Man Computer-Algebraic Methods for the Construction
of Designs of Experiments, Ph.D. thesis, 2005, Technische Universiteit
Eindhoven, www.mathdox.org/nguyen
[4] Nguyen, Van Minh Man, Depart. of Computer Science, Faculty of CSE,
HCMUT, Vietnam, www.cse.hcmut.edu.vn/ mnguyen
[5] Brouwer E. Andries, Cohen M. Arjeh and Nguyen, V. M. Man,
Orthogonal arrays of strength 3 and small run sizes,
www.cse.hcmut.edu.vn/ mnguyen/OrthogonalArray-strength3.pdf,
Journal of Statistical Planning and Inference, 136 (2007)
[6] Nguyen, V. M. Man, Constructions of strength 3 mixed orthogonal
arrays,
www.cse.hcmut.edu.vn/ mnguyen/Specific-Constructions-OAs.pdf,
Journal of Statistical Planning and Inference 138- Jan 2008,
[7] Eric D. Schoen and Nguyen, V. M. Man, Enumeration and
Classification of Orthogonal Arrays, Faculty of Applied Economics,
University of Antwerp, Belgium (2007)
[8] Huynh, V. Linh and Nguyen, V. M. Man, Discrete Event Modeling in
Optimization for Project Management, B.E. thesis, HCMUT, 69 pages,
2008.
[9] T. Beth, D. Jungnickel and H. Lenz, Design Theory, vol. II, pp. 880,
Encyclopedia of Mathematics and Its Applications, Cambridge
University Press (1999)
[10] Glonek G.F.V. and Solomon P.J., Factorial and time course designs for
cDNA microarray experiments, Biostatistics 5, 89-111, 2004
[11] N. J. A. Sloane, A Library of Orthogonal Arrays
http://www.research.att.com/ njas/oadir/index.html/,
[12] Warren Kuhfeld,
http://support.sas.com/techsup/technote/ts723.html/
[13] Hedayat, A. S. and Sloane, N. J. A. and Stufken, J., Orthogonal
Arrays, Springer-Verlag, 1999
[14] Madhav, S. P., iSixSigma LLC, Design Of Experiment For Software
Testing, isixsigma.com/library/content/c030106a.asp, 2004
[15] Bernd Sturmfels, Solving Polynomial Systems, AMS, 2002
[16] OpenModelica project, Sweden 2006,
www.ida.liu.se/ pelab/modelica/OpenModelica.html
[17] Computer Algebra System for polynomial computations, Germany
2006 www.singular.uni-kl.de/
[18] Sudhir Gupta. Balanced Factorial Designs for cDNA Microarray
Experiments, Communications in Statistics: Theory and
Methods, Volume 35, Number 8 (2006) , pp. 1469-1476
[19] Morris W. Hirsch, Stephen Smale, Differential Equations, Dynamical
Systems and Linear Algebra, 1980
[20] James Thompson, Simulation: A Modeler's Approach, Wiley, 2000
[21] David Insua, Jesus Palomo, Simulation in Industrial Statistics, SAMSI,
2005
[22] Ruth J. Williams, Introduction to the Mathematics of Finance, AMS
vol 72, 2006
[23] C. S. Tapiero, Risk and Financial Management- Mathematical
Methods, Wiley, 2004
[24] A.K. Basu, Introduction to Stochastic Processes, Alpha Science 2005
[25] L. Kleinrock, Queueing Systems, vol 2, John Wiley & Sons, 1976
[26] L. Kleinrock, Time-shared systems: A theoretical treatment, Journal of
the ACM 14 (2), 1967, 242-261.
[27] S. G. Gilmour, Fundamentals of Statistics I, Lecture Notes School of
Mathematical Sciences Queen Mary, University of London, 2006
[28] M. Parlar, Interactive Operations Research with Maple, Birkhäuser,
2000.
[29] Tim Holliday, Pistone, Riccomagno, Wynn, The Application of
Computational Algebraic Geometry to the Analysis of Design of
Experiments: A Case Study.
Copyright 2010 by
Lecturer Nguyen V. M. Man, Ph.D. in Statistics
Working area Algebraic Statistics, Experimental Designs,
Statistical Optimization and Operations Research
Institution University of Technology of HCMC
Address 268 Ly Thuong Kiet, Dist. 10, HCMC, Vietnam
Ehome: www.cse.hcmut.edu.vn/~mnguyen
Email: mnguyen@cse.hcmut.edu.vn
mannvm@uef.edu.vn