Mathematical Modeling and Simulation
Nguyen V.M. Man, Ph.D.
Applied Statistician
September 6, 2010
Contact: mnguyen@cse.hcmut.edu.vn
or mannvm@uef.edu.vn
Contents
0.1 Mathematical modeling and simulation - Why? . . . . . . . . . 6
0.2 Mathematical modeling and simulation - How? . . . . . . . . . 6
0.3 Cautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
0.4 Typical applications . . . . . . . . . . . . . . . . . . . . . . . 7
0.5 Computing Software . . . . . . . . . . . . . . . . . . . . . . . 7
1 Dynamic Systems 9
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.2 Discrete Dynamic Systems- a case study . . . . . . . . . . . . 9
1.3 Continuous Dynamic Systems . . . . . . . . . . . . . . . . . . 14
2 Stochastic techniques 17
2.1 Generating functions . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Convolutions . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.3 Compound distributions . . . . . . . . . . . . . . . . . . . . . 22
2.4 Introductory Stochastic Processes . . . . . . . . . . . . . . . 24
2.5 Markov Chains (MC), a key tool in modeling random phenomena 26
2.6 Classification of States . . . . . . . . . . . . . . . . . . . . . . 30
2.7 Limiting probabilities and Stationary distribution of a MC . . 32
2.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3 Simulation 37
3.1 Introductory Simulation . . . . . . . . . . . . . . . . . . . . . 37
3.2 Generation of random numbers . . . . . . . . . . . . . . . . . 38
3.3 Transforming random numbers into input data . . . . . . . 39
3.4 Measurement of output data . . . . . . . . . . . . . . . . . . 41
3.5 Analysis of output - Making meaningful inferences . . . . . . 45
3.6 Simulation languages . . . . . . . . . . . . . . . . . . . . . . . 45
3.7 Research 1: Simulation of Queueing systems with multiclass
customers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4 Probabilistic Modeling 47
4.1 Markovian Models . . . . . . . . . . . . . . . . . . . . . . . . 47
4.1.1 Exponential distribution . . . . . . . . . . . . . . . . . 47
4.1.2 Poisson process . . . . . . . . . . . . . . . . . . . . . . 48
4.2 Bayesian Modeling in Probabilistic Nets . . . . . . . . . . . . 48
5 Statistical Modeling in Quality Engineering 49
5.1 Introduction to Statistical Modeling (SM) . . . . . . . . . . . 49
5.2 DOE in Statistical Quality Control . . . . . . . . . . . . . . . 52
5.3 How to measure factor interactions? . . . . . . . . . . . . . . 53
5.4 What should we do to bring experiments into daily life? . . . 53
6 New directions and Conclusion 57
6.1 Black-Scholes model in Finance . . . . . . . . . . . . . . . . . 57
6.2 Drug Resistance and Design of Anti-HIV drug . . . . . . . . . 57
6.3 Epidemic Modeling . . . . . . . . . . . . . . . . . . . . . . . . 57
6.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
7 Appendices 59
7.1 Appendix A: Theory of stochastic matrix for MC . . . . . . . 59
7.2 Appendix B: Spectral Theorem for Diagonalizable Matrices . 61
Keywords: linear algebra, computational algebra, graph, random processes,
simulation, combinatorics, statistics, Markov chains, discrete time processes
Introduction
We propose a few specific mathematical modeling techniques used in various
applications such as Statistical Simulations of Service systems, Reliability
engineering, Finance engineering, Biomathematics, Pharmaceutical Science,
and Environmental Science. These are aimed at graduates in Applied
Mathematics, Computer Science and Applied Statistics at HCM City.
The aims of the course
This lecture integrates mathematical and computing techniques into
the modeling and simulation of industrial and biological processes.
The structure of the course. The course consists of three parts:
Part I: Introductory specific topics
Part II: Methods and Tools
Part III: Connections and research projects
Working method.
Each group of 2 graduates is expected to carry out a small independent
research project (max 25 pages, font size 11, 1.5 line spacing, Times New
Roman) on the chosen topic and submit their report at the end of the
course [week 15].
Examination The grading will be based on performance in:
* hand-ins of home-work assignments (weight 20% of the grade)
* a written report of group work on a small project topic (20%) and
three-times oral presentation about the project (20%)
* a final exam (40%) covering basic mathematical and statistical methods
that have been introduced
Literature. Various sources, to be announced in the lectures.
Prerequisites The participants will benefit from a solid knowledge of
advanced calculus, discrete mathematics, basic knowledge of symbolic
computing, ordinary and partial differential equations, and programming
experience with Matlab, Scilab, R, Maple (or an equivalent language).
Part I: Introductory specific topics and case studies
d(u, v) = \sum_{i=1}^{n} |u_i - v_i|.
The weight of a binary state/vector is defined to be
wt(u) = d(u, 0) = \sum_{i=1}^{n} |u_i - 0| = \sum_{i=1}^{n} u_i.
The Hamming distance d(., .) defined on some binary space V is also called
the Hamming metric, and the space V equipped with the Hamming metric
d(., .) is called a Hamming metric space.
Definition 3 (State-transition graph). The state-transition graph G = (V, E) of
a developing system S is a directed graph in which:
- the vertex set V consists of all feasible states that the system can realize;
- the edge set E consists of arcs e = (u, v) such that state u can reach
state v during the evolution of the concerned system.
Very often, the changing of states in a state-transition graph G = (V, E) can be
conducted mathematically by measuring the Hamming distance between an
original state u = (u_1, u_2, . . . , u_n) and its effect state v = (v_1, v_2, . . . , v_n).
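The Hamming distance and weight above are easy to compute; a minimal Python sketch (the function names are ours, not from the text):

```python
# Hamming distance and weight on binary state vectors,
# as used for state-transition graphs (a minimal sketch).

def hamming(u, v):
    """d(u, v) = number of coordinates where u and v differ."""
    return sum(ui != vi for ui, vi in zip(u, v))

def weight(u):
    """wt(u) = d(u, 0) = number of 1s in u."""
    return hamming(u, (0,) * len(u))

u = (0, 1, 1, 0)
v = (1, 1, 0, 0)
print(hamming(u, v))  # 2
print(weight(u))      # 2
```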
Example 3 (The farmer's crossing-river problem, cont.). The states of the
river-crossing process are binary vectors of length 4,
u = (u_f, u_g, u_c, u_w) = (u_1, u_2, u_3, u_4) \in {0, 1}^4,
if we encode the left bank L and the right bank R by 0, 1 as done above!
In our specific example above, V can hold all 16 = 2^4 possible states if no
system invariants were found and imposed on S. With Constraint I, V
can be redefined as V := V \ {(1, 0, 0, 1), (0, 1, 1, 0)}.
We understand that when the farmer is rowing his boat, for instance from a left
river bank state u = (u_1, u_2, u_3, u_4) to a right river bank state
v = (v_1, v_2, v_3, v_4) (or the other way round), his position must change. The
change of state u to v creates an edge e = (u, v) \in E, indeed! More precisely, the
edge e = (u, v) is truly determined iff:
if u_1 = L (i.e. 0) then v_1 = R (i.e. 1); or the other way round.
Aha, we have just found another invariant that must always hold for the process
to run; say, an edge e = (u, v) exists if we equivalently have:
Invariant 2: u_1 + v_1 = 1, where the sum is binary plus.
Combining this with the fact that
the small boat can accommodate at most one of the farmer's belongings, we
realize that
a starting state u changes at most two of its coordinates to become the resulting
state v.
Hence, the third invariant is found:
Invariant 3: d(u, v) = \sum_{i=1}^{4} |u_i - v_i| <= 2.
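The state space and the invariants above can be checked mechanically; a sketch in Python, using only the two states excluded by Constraint I as given in the text:

```python
from itertools import product

# States are (farmer, goat, cabbage, wolf) positions: 0 = left bank, 1 = right.
# Constraint I (as stated in the text) forbids exactly two states;
# Invariants 2 and 3 then define the edges of the state-transition graph.
FORBIDDEN = {(1, 0, 0, 1), (0, 1, 1, 0)}
V = [u for u in product((0, 1), repeat=4) if u not in FORBIDDEN]

def d(u, v):
    """Hamming distance between two binary states."""
    return sum(ui != vi for ui, vi in zip(u, v))

# Edge (u, v): the farmer changes banks (Invariant 2: u_1 + v_1 = 1 mod 2)
# and at most one belonging moves with him (Invariant 3: d(u, v) <= 2).
E = [(u, v) for u in V for v in V
     if (u[0] + v[0]) % 2 == 1 and d(u, v) <= 2]

print(len(V))  # 14 states remain out of 16
```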
Decomposition
Knowing how to describe a process or system by a state-transition
graph G = (V, E) is not enough! The reason is that we sometimes wish
to search through all eligible states in V to find the best solutions, or
to determine an optimal path running through that search space V.
This comes down to listing all states in V efficiently! In that situation, we
can split the search space into several small-enough pieces, a step usually
called Decomposition, and then list all elements in those pieces, called
Brute force.
Example 4 (The farmer's crossing-river problem, cont.). The set of
eligible states V consists of two parts: one holds every state corresponding
to position L of the farmer, and the other holds every state
corresponding to position R of the farmer.
This observation tells us to decompose the state vertices V into two subsets:
V_L = {u_L = (0, u_2, u_3, u_4)} \subseteq {0, 1}^4 and
V_R = {u_R = (1, u_2, u_3, u_4)} \subseteq {0, 1}^4.
\dot{x} = f(x, u, t),
where x is the state of the system.
Given a sequence a_0, a_1, a_2, . . ., form the power series
A(x) = \sum_{j=0}^{\infty} a_j x^j. (2.1)
If the series converges in some real interval -x_0 < x < x_0, the function A(x)
is called the generating function of the sequence {a_j}.
Fact 2. If the sequence {a_j} is bounded by some constant K, then A(x)
converges at least for |x| < 1. [Prove it!]
Fact 3. In case the sequence {a_j} represents probabilities, we introduce
the restriction
a_j >= 0, \sum_{j=0}^{\infty} a_j = 1.
The corresponding function A(x) is then a probability-generating function.
We consider the (point) probability distribution and the tail probability of
a random variable X, given by
P[X = j] = p_j, P[X > j] = q_j;
then the usual distribution function is
P[X <= j] = 1 - q_j.
The probability-generating function now is
P(x) = \sum_{j=0}^{\infty} p_j x^j = E(x^X),
where E indicates the expectation operator.
Also we can define a generating function for the tail probabilities:
Q(x) = \sum_{j=0}^{\infty} q_j x^j.
Q(x) is not a probability-generating function, however.
Fact 4.
a/ P(1) = \sum_{j=0}^{\infty} p_j 1^j = 1, and
|P(x)| <= \sum_{j=0}^{\infty} |p_j x^j| <= \sum_{j=0}^{\infty} p_j = 1 if |x| <= 1. So P(x) is absolutely
convergent at least for |x| <= 1.
b/ Q(x) is absolutely convergent at least for |x| < 1.
c/ Connection between P(x) and Q(x): (check this!)
(1 - x)Q(x) = 1 - P(x), or P(x) + Q(x) = 1 + xQ(x).
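The identity in c/ can be checked numerically; a sketch for a geometric distribution p_j = (1 - r) r^j, truncated at J terms (the truncation error is negligible for this r and J):

```python
# Numerical check of (1 - x) Q(x) = 1 - P(x) for a geometric
# distribution p_j = (1 - r) r^j (a sketch; series truncated at J terms).

r, x, J = 0.4, 0.5, 200
p = [(1 - r) * r**j for j in range(J)]
q = [sum(p[j + 1:]) for j in range(J)]  # q_j = P[X > j]

P = sum(pj * x**j for j, pj in enumerate(p))
Q = sum(qj * x**j for j, qj in enumerate(q))

print(abs((1 - x) * Q - (1 - P)) < 1e-9)  # True
```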
Mean and variance of a probability distribution
m = E(X) = \sum_{j=0}^{\infty} j p_j = P'(1) = \sum_{j=0}^{\infty} q_j = Q(1) (why!?)
Recall that the variance of the probability distribution {p_j} is
\sigma^2 = E(X(X - 1)) + E(X) - [E(X)]^2;
we need to know
E[X(X - 1)] = \sum_{j=0}^{\infty} j(j - 1) p_j = P''(1) = 2Q'(1)?
Therefore,
\sigma^2 = ??? (What is it?)
Exercise: Find the formula of the r-th factorial moment
\mu_[r] = E(X(X - 1)(X - 2) · · · (X - r + 1)).
Finding a generating function from a recurrence.
Multiply both sides by x^n.
Example: Fibonacci sequence
f_n = f_{n-1} + f_{n-2} ==> F(x) = x + xF(x) + x^2 F(x).
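Solving the relation above gives F(x) = x/(1 - x - x^2); a sketch checking numerically that the Fibonacci numbers are indeed its series coefficients (the series is truncated, which is harmless for small x):

```python
# Check that the power-series coefficients of F(x) = x / (1 - x - x^2)
# are the Fibonacci numbers (a sketch; the recurrence c_n = c_{n-1} + c_{n-2}
# is exactly what F = x + xF + x^2 F encodes).

x, N = 0.1, 30
fib = [0, 1]
for _ in range(2, N):
    fib.append(fib[-1] + fib[-2])

series = sum(f * x**n for n, f in enumerate(fib))  # truncated power series
closed = x / (1 - x - x**2)                        # closed form
print(abs(series - closed) < 1e-9)  # True
```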
Finding a recurrence from a generating function.
Whenever you know F(x), find its power series P; the coefficients of P
before x^n are the Fibonacci numbers.
How? Just remember how to find the partial-fraction expansion of F(x), in
particular the basic expansion
1/(1 - \alpha x) = 1 + \alpha x + \alpha^2 x^2 + · · ·
In general, if G(x) is a generating function of a sequence (g_n), then
G^{(n)}(0) = n! g_n.
Multiple random variables. We consider probabilities involving
simultaneously the numerical values of several random variables and to
investigate their mutual couplings. In this section, we will extend the
concepts of PMF and expectation developed so far to multiple random
variables.
Consider two discrete random variables X, Y : S -> R associated with the
same experiment. The joint PMF of X and Y is defined by
p_{X,Y}(x, y) = P[X = x, Y = y]
for all pairs of numerical values (x, y) that X and Y can take. We will use
the abbreviated notation P(X = x, Y = y) instead of the more precise
notations P[(X = x) \cap (Y = y)] or P[X = x and Y = y]. For the pair of
random variables X, Y, we say
Definition 4. X and Y are independent if for all x, y \in R, we have
P[X = x, Y = y] = P[X = x] P[Y = y], i.e., p_{X,Y}(x, y) = p_X(x) · p_Y(y),
or, in terms of conditional probability,
P(X = x | Y = y) = P(X = x).
This can be extended to the so-called mutual independence of a finite
number n of r.v.s.
Expectation. The expectation operator defines the expected value of a
random variable X as
Definition 5.
E(X) = \sum_{x \in Range(X)} P(X = x) · x.
If we consider X as a function from a sample space S to the naturals N, then
E(X) = \sum_{i=0}^{\infty} P(X > i). (Why?)
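The tail-sum formula can be verified directly on a small N-valued distribution; a sketch (the pmf below is an arbitrary example, not from the text):

```python
# Check E(X) = sum_{i >= 0} P(X > i) for a small N-valued pmf (a sketch).

pmf = {0: 0.2, 1: 0.5, 2: 0.2, 3: 0.1}   # an arbitrary example distribution

mean = sum(x * p for x, p in pmf.items())
tail_sum = sum(sum(p for x, p in pmf.items() if x > i)
               for i in range(max(pmf)))  # P(X > i) = 0 for i >= max value

print(abs(mean - tail_sum) < 1e-12)  # True
```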
Functions of Multiple Random Variables. When there are multiple
random variables of interest, it is possible to generate new random variables
by considering functions involving several of these random variables. In
particular, a function Z = g(X, Y) of the random variables X and Y defines
another random variable. Its PMF can be calculated from the joint PMF
p_{X,Y} according to
p_Z(z) = \sum_{(x,y) | g(x,y)=z} p_{X,Y}(x, y).
Furthermore, the expected-value rule for functions naturally extends and
takes the form
E[g(X, Y)] = \sum_{(x,y)} g(x, y) p_{X,Y}(x, y).
Theorem 6. We have two important results on expectation.
1. (Linearity) E(X + Y) = E(X) + E(Y) for any pair of random
variables X, Y.
2. (Independence) E(X · Y) = E(X) · E(Y) for any pair of independent
random variables X, Y.
2.2 Convolutions
Now we consider two nonnegative independent integral-valued random
variables X and Y, having the probability distributions
P{X = j} = a_j, P{Y = k} = b_k. (2.2)
The joint probability of the event (X = j, Y = k) is obviously a_j b_k. We
form a new random variable
S = X + Y;
then the event {S = r} comprises the mutually exclusive events
(X = 0, Y = r), (X = 1, Y = r - 1), . . . , (X = r, Y = 0).
Fact 5. The probability distribution of the sum S then is
P{S = r} = c_r = a_0 b_r + a_1 b_{r-1} + · · · + a_r b_0.
Proof.
p_S(r) = P(X + Y = r) = \sum_{(x,y): x+y=r} P(X = x and Y = y) = \sum_x p_X(x) · p_Y(r - x).
Definition 7. This method of compounding two sequences of numbers (not
necessarily probabilities) is called convolution. The notation
{c_j} = {a_j} * {b_j}
will be used.
Fact 6. Define the generating functions of the sequences {a_j}, {b_j} and {c_j} by
A(x) = \sum_{j=0}^{\infty} a_j x^j, B(x) = \sum_{j=0}^{\infty} b_j x^j, C(x) = \sum_{j=0}^{\infty} c_j x^j;
it follows that C(x) = A(x)B(x). [Check this!]
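Fact 5 and Fact 6 can be checked together on two small pmfs; a sketch (the two uniform distributions are our example, not the text's):

```python
# Convolution c = a * b of two pmfs, and the identity C(x) = A(x)B(x)
# evaluated at one point (a sketch for two small uniform distributions).

a = [0.5, 0.5]                   # X uniform on {0, 1}
b = [0.25, 0.25, 0.25, 0.25]     # Y uniform on {0, 1, 2, 3}

c = [0.0] * (len(a) + len(b) - 1)
for j, aj in enumerate(a):
    for k, bk in enumerate(b):
        c[j + k] += aj * bk      # c_r = sum_j a_j b_{r-j}

x = 0.3
A = sum(v * x**i for i, v in enumerate(a))
B = sum(v * x**i for i, v in enumerate(b))
C = sum(v * x**i for i, v in enumerate(c))
print(abs(C - A * B) < 1e-12)  # True
```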
In practical applications, the sum of several independent integral-valued
random variables X_i can be defined:
S_n = X_1 + X_2 + · · · + X_n, n \in Z_+.
If the X_i have a common probability distribution given by {p_j}, with
probability-generating function P(x), then the probability-generating
function of S_n is P(x)^n. Clearly, the distribution of S_n is the n-fold
convolution
{p_j} * {p_j} * · · · * {p_j} (n factors) = {p_j}^{*n}.
2.3 Compound distributions
In our discussion so far of sums of random variables, we have always
assumed that the number of variables in the sum is known and fixed, i.e., it
is nonrandom. We now generalize the previous concept of convolution to
the case where the number N of random variables X_k contributing to the
sum is itself a random variable! In particular, we consider the sum
S_N = X_1 + X_2 + · · · + X_N, where
P{X_k = j} = f_j, P{N = n} = g_n, P{S_N = l} = h_l. (2.3)
The probability-generating functions of X, N and S are
F(x) = \sum_j f_j x^j, G(x) = \sum_n g_n x^n, H(x) = \sum_l h_l x^l. (2.4)
Compute H(x) with respect to F(x) and G(x). Prove that
H(x) = G(F(x)).
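The identity H(x) = G(F(x)) can be checked on a small compound sum by building h_l directly from the n-fold convolutions of f; a sketch (the two pmfs are our example, not the text's):

```python
# Check H(x) = G(F(x)) for a compound sum S_N = X_1 + ... + X_N
# (a sketch; N takes values in {0, 1, 2}, each X_k in {0, 1}).

f = [0.3, 0.7]        # pmf of each X_k
g = [0.2, 0.5, 0.3]   # pmf of N

def conv(p, q):
    """Convolution of two pmfs (as in Section 2.2)."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r

# h_l = sum_n g_n * P(X_1 + ... + X_n = l): mix the n-fold convolutions of f.
h = [0.0] * ((len(g) - 1) * (len(f) - 1) + 1)
power = [1.0]                  # pmf of the empty sum (n = 0)
for n, gn in enumerate(g):
    for l, pl in enumerate(power):
        h[l] += gn * pl
    power = conv(power, f)

x = 0.4
F = sum(v * x**i for i, v in enumerate(f))
G_of_F = sum(gn * F**n for n, gn in enumerate(g))
H = sum(v * x**i for i, v in enumerate(h))
print(abs(H - G_of_F) < 1e-12)  # True
```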
Example 6. A remote village has three gas stations, and each one of them
is open on any given day with probability 1/2, independently of the others.
The amount of gas available in each gas station is unknown and is
uniformly distributed between 0 and 1000 gallons. We wish to characterize
the distribution of the total amount of gas available at the gas stations that
are open.
The number N of open gas stations is a binomial random variable with
p = 1/2, and the corresponding transform is
G_N(x) = (1 - p + p e^x)^3 = (1/8)(1 + e^x)^3.
The transform (probability-generating function) F_X(x) associated with the
amount of gas available in an open gas station is
F_X(x) = (e^{1000x} - 1) / (1000x).
The transform H_S(x) associated with the total amount S of gas available at
the gas stations of the village that are open is the same as G_N(x),
except that each occurrence of e^x is replaced with F_X(x), i.e.,
H_S(x) = G(F(x)) = (1/8)(1 + F_X(x))^3.
Application in Large Deviation theory
We are interested in a practical situation in the insurance industry, first
studied in 1932 by F. Esscher (see Notices of the AMS, Feb 2008).
Problem: too many claims could be made against the insurance company;
we worry about the total claim amount exceeding the reserve fund set aside
for paying these claims.
Our aim: to compute the probability of this event.
Modeling. Each individual claim is a random variable, we assume some
distribution for it, and the total claim is then the sum S of a large number
of (independent or not) random variables. The probability that this sum
exceeds a certain reserve amount is the tail probability of the sum S of
these random variables.
The Large Deviation technique introduced by Esscher requires the calculation
of moment generating functions! If the random variables are independent,
then the moment generating function of the sum is the product of the individual
ones; but if they are not (as in a Markov chain), then there is no longer
just one moment generating function!
Research project: study Large Deviation theory to solve this problem.
2.4 Introductory Stochastic Processes
The concept. A stochastic process is just a collection (usually infinite) of
random variables, denoted X_t or X(t), where the parameter t often represents
time. The state space of a stochastic process consists of all realizations x of X_t;
i.e., X_t = x says the random process is in state x at time t. Stochastic
processes can be generally subdivided into four distinct categories,
depending on whether t and X_t are discrete or continuous:
1. Discrete processes: both are discrete, such as the Bernoulli process (die
rolling) or Discrete Time Markov chains.
2. Continuous time discrete state processes: the state space of X_t is
discrete and the index set, e.g. the time set T of t, is continuous, such as an
interval of the reals R.
- Poisson process: the number of clients X(t) who have entered
ACB from the time it opened until time t. X(t) has the
Poisson distribution with mean E[X(t)] = \lambda t (\lambda being the
arrival rate).
- Continuous time Markov chain.
- Queueing process: people not only enter but also leave the bank;
we need the distribution of service time (the time a client spends
in ACB).
3. Continuous processes: both X_t and t are continuous, such as a diffusion
process (Brownian motion).
4. Discrete time continuous state processes: X_t is continuous and t is
discrete; the so-called TIME SERIES, such as
- monthly fluctuations of the inflation rate of Vietnam,
- daily fluctuations of a stock market.
Examples
1. Discrete processes: a random walk model consisting of positions X_t of
an object (a drunkard) at discrete time points t during 24 hours,
whose directional distance from a particular point 0 is measured in
integer units. Here T = {0, 1, 2, . . . , 24}.
2. Continuous time discrete state processes: X_t is the number of births in a
given population during the time period [0, t]. Here T = R_+ = [0, \infty) and
the state space is {0, 1, 2, . . .}. The sequence of failure times of a
machine is a specific instance.
3. Continuous processes: X_t is population density at time
t \in T = R_+ = [0, \infty), and the state space of X_t is R_+.
4. TIME SERIES of daily fluctuations of a stock market.
Which interesting characteristics of a stochastic process (SP) do we want to
know? We know a stochastic process is a mathematical model of a probabilistic
experiment that evolves in time and generates a sequence of numerical
values. Three interesting aspects of an SP that we want to know:
(a) We tend to focus on the dependencies in the sequence of values
generated by the process. For example, how do future prices of a stock
depend on past values?
(b) We are often interested in long-term averages involving the entire
sequence of generated values. For example, what is the fraction of time that a
machine is idle?
(c) We sometimes wish to characterize the likelihood or frequency of certain
boundary events. For example, what is the probability that within a
given hour all circuits of some telephone system become simultaneously
busy, or what is the frequency with which some buffer in a
computer network overflows with data?
Few fundamental properties and categories
1. STATIONARY property: A process is stationary when all the X(t)
have the same distribution. That means, for any \tau, the distribution of
a stationary process will be unaffected by a shift in the time origin,
and X(t) and X(t + \tau) will have the same distribution. For the
first-order distribution,
F_X(x; t) = F_X(x; t + \tau) = F_X(x); and f_X(x; t) = f_X(x).
These processes are found in Arrival-Type Processes, for which we
are interested in occurrences that have the character of an arrival,
such as message receptions at a receiver, job completions in a
manufacturing cell, customer purchases at a store, etc. We will focus
on models in which the interarrival times (the times between
successive arrivals) are independent random variables.
- The case where arrivals occur in discrete time and the interarrival
times are geometrically distributed is the Bernoulli process.
- The case where arrivals occur in continuous time and the
interarrival times are exponentially distributed is the Poisson
process.
The Bernoulli process and the Poisson process will be investigated in
detail in the Stochastic Processes course.
2. MARKOVIAN (memoryless) property: Many processes with the
memoryless property arise from experiments that evolve in time and
in which the future evolution exhibits a probabilistic dependence on
the past. As an example, the future daily prices of a stock are
typically dependent on past prices. However, in a Markov process, we
assume a very special type of dependence: the next value depends on
past values only through the current value; that is, X_{i+1} depends only
on X_i, and not on any previous values.
2.5 Markov Chains (MC), a key tool in modeling
random phenomena
We discuss the concept of discrete time Markov chains, or just Markov
chains, in this section. Suppose we have a sequence M of consecutive trials,
numbered n = 0, 1, 2, . . .. The outcome of the nth trial is represented by the
random variable X_n, which we assume to be discrete and to take one of the
values j in a finite set Q of discrete outcomes/states {e_1, e_2, e_3, . . . , e_s}.
M is called a (discrete time) Markov chain if, while occupying Q states at
each of the unit time points 0, 1, 2, 3, . . . , n - 1, n, n + 1, . . ., M satisfies the
following property, called the Markov property or memoryless property:
P(X_{n+1} = j | X_n = i, . . . , X_0 = a) = P(X_{n+1} = j | X_n = i), for all n = 0, 1, 2, . . .
(In each time step n to n + 1, the process can stay at the same state e_i (at
both n, n + 1) or move to another state e_j (at n + 1) with respect to the
memoryless rule, saying that the future behavior of the system depends only
on the present and not on its past history.)
Definition 8 (One-step transition probability).
Denote the absolute probability of outcome j at the nth trial by
p_j(n) = P(X_n = j). (2.5)
The one-step transition probability, denoted
p_ij(n + 1) = P(X_{n+1} = j | X_n = i),
is defined as the conditional probability that the process is in state j at time
n + 1 given that the process was in state i at the previous time n, for all
i, j \in Q.
Independence of time - Homogeneous Markov chains. If
the state transition probabilities p_ij(n + 1) in a Markov chain M are
independent of time n, they are said to be stationary, time homogeneous or
just homogeneous. The state transition probability in a homogeneous chain
can then be written without mentioning the time point n:
p_ij = P(X_{n+1} = j | X_n = i). (2.6)
Unless stated otherwise, we assume and will work with homogeneous
Markov chains M. The one-step transition probabilities given by 2.6 of
these Markov chains must satisfy:
\sum_{j=1}^{s} p_ij = 1 for each i = 1, 2, . . . , s, and p_ij >= 0.
Transition Probability Matrix. In practical applications, we are likely given
the initial distribution (i.e. the probability distribution of the starting position
of the concerned object at time point 0) and the transition probabilities,
and we want to determine the probability distribution of the position X_n
for any time point n > 0. The Markov property, quantitatively described
through transition probabilities, can be represented in the state transition
matrix P = [p_ij]:

P = [ p_11  p_12  p_13  . . .  p_1s ]
    [ p_21  p_22  p_23  . . .  p_2s ]
    [ p_31  p_32  p_33  . . .  p_3s ]
    [  .     .     .    . . .   .   ]   (2.7)
Briefly, we have
Definition 9. A (homogeneous) Markov chain M is a triple (Q, p, P) in
which:
- Q is a finite set of states (which can be identified with an alphabet),
- p(0) are the initial probabilities (at the initial time point n = 0),
- P are the state transition probabilities, given by a matrix P = [p_ij] in
which
p_ij = P(X_{n+1} = j | X_n = i),
and such that the memoryless property is satisfied, i.e.,
P[X_{n+1} = j | X_n = i, . . . , X_0 = a] = P[X_{n+1} = j | X_n = i], for all n.
In practice, the initial probabilities p(0) are obtained at the current time
(the beginning of a research), and the transition probability matrix P is found
from empirical observations in the past. In most cases, the major concern is
using P and p(0) to predict the future.
Example 7. The Coopmart chain (denoted C) in SG currently controls
60% of the daily processed-food market; their rivals Maximart and other
brands (denoted M) take the other share. Data from the previous years
(2006 and 2007) show that 88% of C's customers remained loyal to C, while
12% switched to rival brands. In addition, 85% of M's customers remained
loyal to M, while the other 15% switched to C. Assuming that these trends
continue, use MC theory to determine C's share of the market (a) in 5
years and (b) over the long run.
Proposed solution. Suppose that the brand attraction is time homogeneous.
For a sample of large enough size n, we denote the customers' choice in
year n by a random variable X_n. The market share probability of the
whole population then can be approximated by using the sample statistics,
e.g.
P(X_n = C) = |{x : X_n(x) = C}| / n, and P(X_n = M) = 1 - P(X_n = C).
Set n = 0 for the current time; the initial probabilities then are
p(0) = [0.6, 0.4] = [P(X_0 = C), P(X_0 = M)].
Obviously we want to know the market share probabilities
p(n) = [P(X_n = C), P(X_n = M)] at any year n > 0. We now introduce a
transition probability matrix with rows and columns labeled C and M:

P =
       C      M
  C [ 0.88  0.12 ]     [ 1 - a = 0.88     a = 0.12    ]
  M [ 0.15  0.85 ]  =  [   b = 0.15     1 - b = 0.85  ],   (2.8)

where a = p_CM = P[X_{n+1} = M | X_n = C] and b = p_MC = P[X_{n+1} = C | X_n = M].
p^{(n)}_{ij} = \sum_{h=1}^{s} p^{(n-k)}_{ih} p^{(k)}_{hj}, 0 < k < n.
This results in the matrix notation
P^{(n)} = P^{(n-k)} P^{(k)}.
Since P^{(1)} = P, we get P^{(2)} = P^2, and in general P^{(n)} = P^n.
Let p^{(n)} denote the vector form of the probability mass distribution (pmf or
absolute probability distribution) associated with X_n of a Markov process,
that is,
p^{(n)} = [p_1(n), p_2(n), p_3(n), . . . , p_s(n)],
where each p_i(n) is defined as in 2.5.
Proposition 10. The absolute probability distribution p^{(n)} at any stage n
of a Markov chain is given in the matrix form
p^{(n)} = P^n p^{(0)}, where p^{(0)} = p is the initial probability vector. (2.10)
Proof. We employ two facts:
* P^{(n)} = P^n, and
* the absolute probability distribution p^{(n+1)} at any stage n + 1 (associated
with X_{n+1}) can be found from the 1-step transition matrix P = [p_ij] and the
distribution
p^{(n)} = [p_1(n), p_2(n), p_3(n), . . . , p_s(n)]
at any stage n (associated with X_n):
p_j(n + 1) = \sum_{i=1}^{s} p_ij p_i(n), or in matrix notation p^{(n+1)} = P · p^{(n)}.
Then just do the induction:
p^{(n+1)} = P · p^{(n)} = P · P · p^{(n-1)} = · · · = P^{n+1} p^{(0)}.
Example 8 (The Coopmart chain, cont.). (a/) C's share of the market
in 5 years can be computed by
p^{(5)} = [p_C(5), p_M(5)] = P^5 p^{(0)}.
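Both the five-year share and the long-run share b/(a + b) can be computed directly; a sketch in Python, iterating p(n+1) = p(n)P with the row-vector convention matching the labeled matrix (2.8):

```python
# Coopmart example: market share after 5 years and in the long run
# (a sketch; p(n+1) = p(n) P in the row-vector convention).

P = [[0.88, 0.12],   # from C: stay with C / switch to M
     [0.15, 0.85]]   # from M: switch to C / stay with M

p = [0.6, 0.4]       # p(0) = [P(X_0 = C), P(X_0 = M)]
for _ in range(5):
    p = [p[0] * P[0][0] + p[1] * P[1][0],
         p[0] * P[0][1] + p[1] * P[1][1]]

print(round(p[0], 4))        # C's share after 5 years: 0.5648
print(0.15 / (0.12 + 0.15))  # long-run share b/(a + b): ~0.5556
```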
2.6 Classification of States
Accessible states. State j is said to be accessible from state i if for some
n >= 0, p^{(n)}_{ij} > 0, and we write i -> j. Two states i and j accessible to each
other are said to communicate, and we write i <-> j. If all states
communicate with each other, then we say that the Markov chain is
irreducible.
Recurrent states. Let A(i) be the set of states that are accessible from i. We
say that i is recurrent if for every j that is accessible from i, i is also
accessible from j; that is, for all j \in A(i) we have that i \in A(j).
When we start at a recurrent state i, we can only visit states j \in A(i) from
which i is accessible. Thus, from any future state, there is always some
probability of returning to i and, given enough time, this is certain to
happen. By repeating this argument, if a recurrent state is visited once, it
will be revisited an infinite number of times.
Transient states. A state is called transient if it is not recurrent. In
particular, there are states j \in A(i) such that i is not accessible from j.
After each visit to state i, there is positive probability that the state enters
such a j. Given enough time, this will happen, and state i cannot be visited
after that. Thus, a transient state will only be visited a finite number of
times.
If i is a recurrent state, the set of states A(i) that are accessible from i
forms a recurrent class (or simply class), meaning that states in A(i) are all
accessible from each other, and no state outside A(i) is accessible from
them. Mathematically, for a recurrent state i, we have A(i) = A(j) for all j
that belong to A(i), as can be seen from the definition of recurrence. It can
be seen that at least one recurrent state must be accessible from any given
transient state. This is intuitively evident, and a more precise justification
is given in the theoretical problems section. It follows that there must exist
at least one recurrent state, and hence at least one class. Thus, we reach
the following conclusion.
Markov Chain Decomposition.
A MC can be decomposed into one or more recurrent classes, plus
possibly some transient states.
A recurrent state is accessible from all states in its class, but is not
accessible from recurrent states in other classes.
A transient state is not accessible from any recurrent state.
At least one, possibly more, recurrent states are accessible from a
given transient state.
Remark 7. For the purpose of understanding the long-term behavior of
Markov chains, it is important to analyze chains that consist of a single
recurrent class.
For the purpose of understanding short-term behavior, it is also important
to analyze the mechanism by which any particular class of recurrent states
is entered starting from a given transient state.
Periodic states.
Absorption probabilities. In this section, we study the short-term behavior
of Markov chains. We first consider the case where the Markov chain starts
at a transient state. We are interested in the first recurrent state to be
entered, as well as in the time until this happens. When focusing on such
questions, the subsequent behavior of the Markov chain (after a recurrent
state is encountered) is immaterial. State j is said to be an absorbing state
if p_jj = 1; that is, once state j is reached, it is never left. We assume,
without loss of generality, that every recurrent state k is absorbing:
p_kk = 1, p_kj = 0 for all j != k.
If there is a unique absorbing state k, its steady-state probability is 1
(because all other states are transient and have zero steady-state
probability), and will be reached with probability 1, starting from any
initial state.
If there are multiple absorbing states, the probability that one of them
will be eventually reached is still 1, but the identity of the absorbing
state to be entered is random and the associated probabilities may
depend on the starting state.
In the sequel, we fix a particular absorbing state, denoted by s, and consider
the absorption probability a_i that s is eventually reached, starting from i:
a_i = P(X_n eventually becomes equal to the absorbing state s | X_0 = i).
Absorption probabilities can be obtained by solving a system of linear
equations:
a_s = 1; a_i = 0 for all absorbing i != s; a_i = \sum_{j=1}^{m} p_ij a_j for all transient i.
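A hand-solved instance of this linear system, for a hypothetical symmetric gambler's-ruin chain (not from the text) on {0, 1, 2, 3} with 0 and 3 absorbing and s = 3:

```python
# Absorption probabilities a_i = P(reach s | X_0 = i) from the system
# a_s = 1, a_i = 0 for other absorbing i, a_i = sum_j p_ij a_j (transient i).
# Sketch: symmetric gambler's ruin on {0, 1, 2, 3}, p = 1/2 each way, s = 3.
#
# Substituting a_0 = 0 and a_3 = 1 leaves a 2x2 system:
#   a_1 = 0.5*a_0 + 0.5*a_2   ->   a_1 - 0.5*a_2 = 0
#   a_2 = 0.5*a_1 + 0.5*a_3   ->  -0.5*a_1 + a_2 = 0.5
# Eliminating a_1 = 0.5*a_2 in the second equation gives 0.75*a_2 = 0.5.
a2 = 0.5 / 0.75
a1 = 0.5 * a2
print(a1, a2)  # 0.333...  0.666...
```

The same system for a larger chain would be solved with a general linear solver rather than by hand.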
2.7 Limiting probabilities and Stationary
distribution of a MC
Definition 11. A vector p* is called a stationary distribution of a Markov chain
with transition matrix P if
p* P = p*.
This equation indicates that a stationary distribution p* is a left
eigenvector of P with eigenvalue 1. In general, we wish to know the limiting
probabilities
p* = lim_{n -> \infty} P^n p^{(0)}.
We need some general results to determine the stationary distribution p*
of a Markov chain.
A) Markov chains that have two states. At first we investigate the
case of Markov chains that have two states, say Q = {e_1, e_2}. Let a = p_{e_1 e_2}
and b = p_{e_2 e_1} be the state transition probabilities between distinct states in a
two-state Markov chain; its state transition matrix is

P = [ p_11  p_12 ]   [ 1 - a    a   ]
    [ p_21  p_22 ] = [   b    1 - b ].

The eigenvalues of P are \lambda_1 = 1 and \lambda_2 = 1 - a - b, with the
constituent matrices

E_1 = 1/(\lambda_1 - \lambda_2) [P - \lambda_2 I], E_2 = 1/(\lambda_2 - \lambda_1) [P - \lambda_1 I].
2.8. EXERCISES 33
That means, E
1
, E
2
are orthogonal matrices, i.e. E
1
E
2
= 0 = E
2
E
1
, and
P =
1
E
1
+
2
E
2
; E
2
1
= E
1
, E
2
2
= E
2
.
Hence, P
n
=
n
1
E
1
+
n
2
E
2
= E
1
+ (1 a b)
n
E
2
, or
P
(n)
= P
n
=
1
a +b
b a
b a
+ (1 a b)
n
a a
b b
b a
b a
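The closed form for P^n can be checked numerically against direct matrix powers. A plain-Python sketch; the values a = 0.3, b = 0.6 are arbitrary illustration choices, not from the text.

```python
def matmul2(A, B):
    """2x2 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def power_direct(P, n):
    """P^n by repeated multiplication."""
    R = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        R = matmul2(R, P)
    return R

def power_closed(a, b, n):
    """P^n = (1/(a+b)) * ([[b, a], [b, a]] + (1-a-b)^n * [[a, -a], [-b, b]])."""
    s, r = a + b, (1 - a - b) ** n
    return [[(b + r * a) / s, (a - r * a) / s],
            [(b - r * b) / s, (a + r * b) / s]]

a, b = 0.3, 0.6
P = [[1 - a, a], [b, 1 - b]]
```

Both routines agree entry by entry for every n, which is a convenient sanity check when a, b come from data.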
B) Markov chains that have more than two states. For s > 2 it is
cumbersome to compute the constituent matrices E_i of P, so we can instead
employ the so-called regular property: a Markov chain is regular if there
exists m ∈ N such that P^(m) = P^m > 0 (every entry is positive).
2.8 Exercises
A/ Simple skills.
Let Z_1, Z_2, ... be independent identically distributed r.v.s with
P(Z_n = 1) = p and P(Z_n = -1) = q = 1 - p for all n. Let

    X_n = Σ_{i=1}^{n} Z_i,  n = 1, 2, ...,

and X_0 = 0. The collection of r.v.s {X_n, n ≥ 0} is a random process, and it
is called the simple random walk X(n) in one dimension.
(a) Describe the simple random walk X(n).
(b) Construct a typical sample sequence (or realization) of X(n).
(c) Find the probability that X(n) = 2 after four steps.
(d) Verify the result of part (c) by enumerating all possible sample
sequences that lead to the value X(n) = 2 after four steps.
(e) Find the mean and variance of the simple random walk X(n). Find
the autocorrelation function R
X
(n, m) of the simple random walk
X(n).
(f) Show that the simple random walk X(n) is a Markov chain.
(g) Find its one-step transition probabilities.
(h) Derive the rst-order probability distribution of the simple random
walk X(n).
Solution.
(a) The simple random walk X(n) is a discrete-parameter (or discrete-time),
discrete-state random process. The state space is
E = {..., -2, -1, 0, 1, 2, ...}, and the index parameter set is T = {0, 1, 2, ...}.
(b) A sample sequence x(n) of a simple random walk X(n) can be produced
by tossing a coin every second and letting x(n) increase by unity if a head H
appears and decrease by unity if a tail T appears. Thus, for instance, we
have a small realization of X(n) in Table 2.1:

    n             0   1   2   3   4   5   6   7   8   9   10
    Coin tossing      H   T   T   H   H   H   T   H   H   T
    x_n           0   1   0  -1   0   1   2   1   2   3   2

Table 2.1: Simple random walk from coin tossing
The sample sequence x(n) obtained above is plotted in the (n, x(n))-plane.
The simple random walk X(n) specified in this problem is said to be
unrestricted because there are no bounds on the possible values of X. The
simple random walk process is often used in the following primitive
gambling model: toss a coin; if a head appears, you win one dollar; if a tail
appears, you lose one dollar.
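The gambling model is straightforward to simulate. A minimal plain-Python sketch (the step and trial counts are arbitrary choices) estimates P(X(4) = 2); for a fair coin the exact value is C(4,3)(1/2)^4 = 1/4, since reaching 2 in four steps requires exactly three heads and one tail.

```python
import random

def estimate_walk_prob(steps=4, target=2, p=0.5, trials=200_000, seed=1):
    """Monte Carlo estimate of P(X(steps) = target) for the simple random walk."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = sum(1 if rng.random() < p else -1 for _ in range(steps))
        hits += (x == target)
    return hits / trials

est = estimate_walk_prob()   # close to the exact value 0.25
```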
B/ Concepts.
    P_1 = [ 0    0.5  0.5 ]     P_2 = [ 0  0  0.5  0.5 ]
          [ 0.5  0    0.5 ]           [ 1  0  0    0   ]
          [ 0.5  0.5  0   ]           [ 0  1  0    0   ]
                                      [ 0  1  0    0   ]

              N    M    L    S
    P_3 = N [ 1    0    0    0   ]
          M [ 0.4  0    0.6  0   ]
          L [ 0.2  0    0.1  0.7 ]
          S [ 0    0    0    1   ]
F(k) = Σ_{i=0}^{k} p(i) ∈ [0, 1], then:
- generate a uniform random number U ∈ [0, 1] by G,
- find the value X = k by determining the interval (F(k - 1), F(k)]
containing U; mathematically this means finding the preimage F^{-1}(U).
The Transformation Method
Generally, we need an algorithm, named the Transformation Method, described
in two steps:

Step 1: use an algorithm A to generate variates V_n, n = 1, 2, ..., of a r.v. V
(V = U in the above example) with specified cdf F_V(v) in the continuous
case or probability function f_V(v) in the discrete case. Then

Step 2: employ an appropriate transformation g(·) to generate a variate of
X, namely X_n = g(V_n).
Theorem 14 (Relationship of V and X). Consider a r.v. V with pdf f_V(v)
and a given transformation X = g(V). Denote by v_1, v_2, ..., v_n the real
roots of the equation x - g(v) = 0. Then the pdf of the r.v. X is given by

    f_X(x) = Σ_{l=1}^{n} f_V(v_l) · 1/|dg/dv (v_l)|.        (3.1)

Given x, if the equation x - g(v) = 0 has no real solutions, then the pdf
f_X(x) = 0.

Proof. DIY
The two most important uses of the Transformation Method are:

A) Linear case (affine when b ≠ 0): X = g(V) = aV + b, where a, b ∈ R and
a ≠ 0. Then

    f_X(x) = (1/|a|) f_V((x - b)/a).

B) Inverse case: X = g(V) = F_X^{-1}(V), where F_X(x) is the cdf of the
random variable X.
Theorem 15 (Inverse case). Consider a r.v. V with uniform cdf
F_V(v) = v, v ∈ [0, 1]. Then the transformation X = g(V) = F_X^{-1}(V)
gives variates x of X with cdf F_X(x).

Proof. For any real number a, by the monotonicity of the cdf F_X,

    P(X ≤ a) = P[F_X^{-1}(V) ≤ a] = P[V ≤ F_X(a)] = F_V(F_X(a)) = F_X(a).
Using this, an algorithm for generating variates of a r.v. X is formulated:

1. Invert the given cdf F_X(x) to find its inverse F_X^{-1}.
2. Generate a uniform variate V ∈ [0, 1].
3. Generate variates x via the transformation X = F_X^{-1}(V).
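For instance, the three steps apply directly to an exponential r.v. with rate λ, where F_X(x) = 1 - e^{-λx} and hence F_X^{-1}(v) = -ln(1 - v)/λ. A plain-Python sketch; λ = 2 is an arbitrary illustration value.

```python
import math
import random

def exp_variate(lam, rng):
    v = rng.random()                   # step 2: V ~ Uniform[0, 1)
    return -math.log(1.0 - v) / lam    # step 3: x = F^{-1}(V)

rng = random.Random(42)
lam = 2.0
sample = [exp_variate(lam, rng) for _ in range(100_000)]
mean = sum(sample) / len(sample)       # should approach 1/lam = 0.5
```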
Example 9. Consider a Bernoulli r.v. X ∼ B(p), where p = P(X = 1). The
cdf F_X(x) = P(X ≤ x) is a step (stair-case) function u(·).
[That is, u(t) = b_i if a_i ≤ t < a_{i+1}, where (a_i)_i is an ascending
sequence.] Here

    F_X(x) = 0 if x < 0,  F_X(x) = 1 - p if 0 ≤ x < 1,  and
    F_X(x) = (1 - p) + p = 1 if 1 ≤ x.

How to generate X? We employ V ∼ UniDist([0, 1]) and the fact that the
inverse is

    F_X^{-1}(V) = u(V - (1 - p)),

with u the unit step: X = 0 if V ≤ 1 - p and X = 1 otherwise.
Example 10. Consider a binomial r.v. X ∼ BinomDist(n, p), where p is the
success probability of each trial. X takes values in X = {0, 1, ..., n}, and
the distribution is given by the probability function

    p(k) = P(X = k) = (n choose k) p^k (1 - p)^{n-k}.

We employ V ∼ UniDist([0, 1]) and use

    F_X(x) = P(X ≤ x) = V  ⟺  x = F_X^{-1}(V) = u(V),

in which the parameters of the step function u(V) are given by:

    u(V) = k if Σ_{i=0}^{k-1} p(i) < V ≤ Σ_{i=0}^{k} p(i), k ∈ {1, ..., n};
    u(V) = 0 if V ≤ p(0).

How is this done? Simply split the interval [0, 1] into n + 1 subintervals,
with the length of the kth subinterval equal to p(k) = P(X = k),
k ∈ {0, 1, ..., n}.
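The subinterval splitting can be coded directly (a plain-Python sketch; n = 10 and p = 0.3 are arbitrary illustration values):

```python
import random
from math import comb

def binom_pmf(n, p):
    """p(k) = C(n, k) p^k (1-p)^(n-k), k = 0..n."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def binom_variate(pmf, rng):
    """Inverse transform: return k with F(k-1) < V <= F(k)."""
    v, acc = rng.random(), 0.0
    for k, pk in enumerate(pmf):
        acc += pk
        if v <= acc:
            return k
    return len(pmf) - 1   # guard against round-off in the cumulative sum

rng = random.Random(7)
n, p = 10, 0.3
pmf = binom_pmf(n, p)
draws = [binom_variate(pmf, rng) for _ in range(50_000)]
mean = sum(draws) / len(draws)   # should approach n*p = 3.0
```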
Σ_{j=1}^{s} p_ij = 1 for each i = 1, 2, ..., s, and p_ij ≥ 0.
Transition Probability Matrix. In practical applications we are usually given
the initial distribution (i.e. the probability distribution of the starting
position of the concerned object at time point 0) and the transition
probabilities, and we want to determine the probability distribution of the
position X_n for any time point n > 0. The Markov property, quantitatively
described through transition probabilities, can be represented conveniently
in the so-called state transition matrix P = [p_ij]:

    P = [ p_11  p_12  p_13  ...  p_1s ]
        [ p_21  p_22  p_23  ...  p_2s ]
        [ p_31  p_32  p_33  ...  p_3s ]
        [  .     .     .    ...   .   ]
        [ p_s1  p_s2  p_s3  ...  p_ss ]        (3.3)
Definition 17. A vector π is called a stationary distribution of the Markov
chain if π P = π.

Question: how do we find a stationary distribution of a Markov chain?
Consider a homogeneous DTMC X_n described by the transition matrix
P = [p_ij]. How do we generate sample paths of X_n? Two issues are
involved here:
a) only steady-state results are of interest;
b) transient results are of interest as well.
In case a), we want to generate values of a single stationary random variable
with distribution π; since π is a one-dimensional pdf, the algorithm after
Theorem 15 suffices.
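For the steady-state case, π itself can be approximated by repeatedly applying P to any initial distribution, which converges for regular chains. A plain-Python sketch; the two-state matrix with a = 0.12, b = 0.15 is chosen because its stationary distribution has the closed form (b, a)/(a + b) = (5/9, 4/9).

```python
def stationary(P, iters=1000):
    """Power iteration: pi <- pi P, starting from the uniform distribution."""
    s = len(P)
    pi = [1.0 / s] * s
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(s)) for j in range(s)]
    return pi

P = [[0.88, 0.12], [0.15, 0.85]]
pi = stationary(P)   # close to (0.15/0.27, 0.12/0.27) = (5/9, 4/9)
```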
Instances of synchronous and asynchronous simulation. We illustrate
both strategies by describing how to sample from a Markov chain with state
space S and transition matrix

    P = (p_ij), with p_ij = P(X(n + 1) = j | X(n) = i).

The obvious way to simulate the (n + 1)-th transition, given X(n), is:

    Generate X(n + 1) ∼ { p_{x(n)j} : j ∈ S }.

This synchronous approach has the potential shortcoming that
X(n) = X(n + 1), with the corresponding computational effort lost.
Alternatively, we may simulate T_n, the time until the next change of state,
and then sample the new state X(n + T_n). If X(n) = s, T_n follows a
geometric distribution GeomDist(p_ss) of parameter p_ss, and X(n + T_n)
will have a discrete distribution with mass function

    { p_sj / (1 - p_ss) : j ∈ S \ {s} }.
Should we wish to sample N transitions of the chain, assuming X(0) = i_0,
we do:

    t = 0, X(0) = i_0
    while t < N:
        sample h ∼ GeomDist(p_{x(t)x(t)})
        sample X(t + h) ∼ { p_{x(t)j} / (1 - p_{x(t)x(t)}) : j ∈ S \ {x(t)} }
        t = t + h
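The asynchronous scheme above can be sketched in plain Python. The 2-state matrix in the demo is an arbitrary illustration; the sketch assumes p_ss < 1 in every state (no absorbing states).

```python
import random

def sample_async(P, i0, N, seed=0):
    """Sample roughly N transitions: holding time h ~ GeomDist(p_ss),
    then a jump drawn from { p_sj / (1 - p_ss) : j != s }."""
    rng = random.Random(seed)
    t, state = 0, i0
    path = [(0, i0)]
    while t < N:
        p_stay = P[state][state]
        h = 1                              # steps until the state changes
        while rng.random() < p_stay:
            h += 1
        v, acc, nxt = rng.random(), 0.0, None
        for j, pj in enumerate(P[state]):  # conditional jump distribution
            if j == state:
                continue
            acc += pj / (1.0 - p_stay)
            if v <= acc:
                nxt = j
                break
        if nxt is None:                    # round-off guard: last j != state
            nxt = max(j for j in range(len(P)) if j != state)
        t += h
        state = nxt
        path.append((t, state))
    return path

path = sample_async([[0.5, 0.5], [0.3, 0.7]], 0, 25, seed=4)
```

By construction every recorded transition changes the state and the clock advances by at least one step, which is exactly the saving over the synchronous scheme.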
Two key strategies for asynchronous simulation.

One is event scheduling. The simulation time advances to the next event and
the corresponding activities are executed. If we have k types of events
(1, 2, ..., k), we maintain a list of events ordered according to their
execution times (t_1, t_2, ..., t_k). The routine R_i associated with the
i-th type of event is started at time t_i = min(t_1, t_2, ..., t_k), i.e.
when its event is the next to occur.

An alternative strategy is process interaction. A process represents an
entity and the set of actions that it experiences throughout its life within
the model. The system behaviour may be described as a set of processes that
interact, for example by competing for limited resources. A list of
processes is maintained, ordered according to the occurrence of the next
event. Processes may be interrupted, their routines having multiple entry
points, designated reactivation points.
Each execution of the program corresponds to a replication: simulating the
system behaviour for a long enough period of time and obtaining average
performance measures, say X(n), after n customers have been processed. If
the system is stable, X(n) → X as n → ∞. If, e.g., processing 1000 jobs is
considered long enough, we associate with each replication j of the
experiment the output X_j(1000). After several replications, we would
analyse the results as described in the next section.
3.5 Analyzing output: Making meaningful inferences

See Section 3.4, [8], and [21, Section 5].
3.6 Simulation languages
Use JMT system or OpenModelica.
3.7 Research 1: Simulation of Queueing systems
with multiclass customers
Classical queueing models have been studied extensively since the 1960s,
during the emergence of the Internet. One of the pioneers of the field is
Leonard Kleinrock at UCLA. In fact, queueing models are applied not only in
networks and systems of computers but also in any service system of an
economy that involves resource allocation and/or sharing. In Europe, the
project Euro-NGI (European Network of Excellence Project on Design and
Engineering of the Next-Generation Internet) was created just a few years
ago.

We restrict ourselves to studying and simulating basic queueing systems
such as the M/M/1, M/M/1/K and M/G/1 systems. Now, how can the work
in [8] be improved?
P(T ≤ t) = ∫_0^t f(x) dx;  and  P(T > t) = ∫_t^{+∞} f(x) dx = e^{-λt}.

Memoryless property of exponential distributions.
For an exponential random variable T,

    P(T > s + t) = P(T > t) P(T > s).
The Erlang random variable. If T_1, T_2 are two independent and
identically distributed (i.i.d.) exponential random variables, what would be
the distribution of S_2 = T_1 + T_2?
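The question can be explored empirically before deriving the answer: S_2 has an Erlang-2 (i.e. Gamma(2, λ)) distribution, with mean 2/λ and variance 2/λ². A plain-Python sketch, with λ = 1.5 chosen arbitrarily:

```python
import math
import random

rng = random.Random(3)
lam, n = 1.5, 100_000

def exp_variate():
    """One Exp(lam) draw via inverse transform."""
    return -math.log(1.0 - rng.random()) / lam

s2 = [exp_variate() + exp_variate() for _ in range(n)]   # S_2 = T_1 + T_2
mean = sum(s2) / n                           # Erlang-2 mean: 2/lam
var = sum((x - mean) ** 2 for x in s2) / n   # Erlang-2 variance: 2/lam**2
```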
4.1.2 Poisson process
Suppose an experiment begins at time t = 0, and its ith event occurs at the
time point T_i ≥ 0, a random variable named the point of occurrence, for
i = 1, 2, .... Let Z_n = T_n - T_{n-1} denote the interarrival time period.
If the Z_n are i.i.d., then {Z_n, n ≥ 1} is called a recurrent (renewal)
process; {T_n, n ≥ 0} itself is called an arrival process.
Counting process N(t).
If we now view time t as continuous, a random process {N(t), t ≥ 0} is said
to be a counting process if N(t) counts the number of events that have
occurred in the interval (0, t]. Obviously:

1. N(t) ∈ N;
2. N(s) ≤ N(t) if s ≤ t;
3. N(t) - N(s) = the number of events that have occurred in the
interval (s, t].

A Poisson process {N(t), t ≥ 0} is a special type of counting process: a
counting process {N(t), t ≥ 0} is said to be a Poisson process with rate
λ > 0 if N(0) = 0, the process has independent increments, and the number
of events in any interval of length t is Poisson distributed with mean λt.
Remark: the Poisson distribution is the limiting case of the binomial
distribution.
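A Poisson process can be simulated directly from its exponential interarrival times, and the counting property E[N(t)] = λt checked empirically. A plain-Python sketch; λ = 2 and t = 3 are arbitrary illustration values.

```python
import math
import random

def poisson_count(lam, t, rng):
    """N(t): number of arrivals in (0, t] with Exp(lam) interarrival times."""
    n, clock = 0, 0.0
    while True:
        clock += -math.log(1.0 - rng.random()) / lam   # next interarrival
        if clock > t:
            return n
        n += 1

rng = random.Random(11)
lam, t, reps = 2.0, 3.0, 50_000
mean = sum(poisson_count(lam, t, rng) for _ in range(reps)) / reps  # ~ lam*t = 6
```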
Interarrival times of the Poisson process
Nonhomogeneous Poisson process
Compound Poisson process
4.2 Bayesian Modeling in Probabilistic Nets
Chapter 5
Statistical Modeling in
Quality Engineering
5.1 Introduction to Statistical Modeling (SM)
This chapter is planned for persons interested in the design, conduct and
analysis of experiments in the physical, chemical, biological, medical, social,
psychological, economic, engineering or industrial sciences. The chapter will
examine how to design experiments, carry them out, and analyze the data
they yield. Our major aims are:
1/ provide an introduction to descriptive and inferential statistical concepts
and methods. Topics include grouping of data, measures of central tendency
and dispersion, probability concepts and distributions, sampling, statistical
estimation, and statistical hypothesis testing;
2/ introduce a specific problem in Statistical Quality Control: Design of
Experiments (DOE).
Why Statistics
[See [27] for more information.]
Statistical methods are applied in an enormous diversity of problems in
such fields as:
- Agriculture (which varieties grow best?)
- Genetics, Biology (selecting new varieties, species)
- Economics (how are the living standards changing?)
- Market Research (comparison of advertising campaigns)
- Education (what is the best way to teach small children reading?)
- Environmental Studies (do strong electric or magnetic fields induce higher
cancer rates?)
- Meteorology (is global warming a reality?)
- Medicine (which drug is best?)
- Psychology (how are shyness and loneliness related?)
- Social Science (comparison of people's reactions to different stimuli)
Basic terms
1. A population is the collection of items under discussion. It may be
finite or infinite; it may be real or hypothetical. A sample is a subset
of a population. The sample should be chosen to be representative of
the population, because we usually want to draw conclusions or
inferences about the population based on the sample.
2. An appropriate statistical model for our data will often be of the
form

       Observed data = f(x; θ) + error,

   where x are variables we have measured and θ are parameters of our
   model.
3. Variable. A property or characteristic on which information is
obtained in an experiment. There are two major kinds of variables:
a. Quantitative Variables (measurements and counts)
continuous (such as heights, weights, temperatures); their values
are often real numbers; there are few repeated values;
discrete (counts, such as numbers of faulty parts, numbers of
telephone calls etc); their values are usually integers; there may
be many repeated values.
b. Qualitative Variables (factors, class variables); these variables
classify objects into groups.
categorical (such as methods of transport to College); there is no
sense of order;
ordinal (such as income classied as high, medium or low); there
is natural order for the values of the variable.
4. Observation. The collection of information in an experiment, or the
actual values obtained on variables in an experiment. Response
variables are outcomes or observed values of an experiment.
5. Parameters and Statistics. A parameter is a numeric
characteristic of a population or a process. A statistic is a numerical
characteristic that is computed from a sample of observations.
6. Distribution. A tabular, graphical or theoretical description of the
values of a variable using some measure of how frequently they occur
in a population, a process or a sample.
7. Parametric methods versus non-parametric methods. A
parametric method makes statistical inferences under the assumption
that samples come from a known family of distributions. For example,
the method of analysis of variance assumes that samples are drawn
from normal distributions. Non-parametric methods make statistical
inferences without assuming that the sample comes from any particular
underlying family of distributions, and make no assumptions about
population parameters.
8. Mathematical models and Statistical models. A model is
termed mathematical if it is derived from theoretical considerations
that represent exact, error-free assumed relationships among the
variables. A model is termed statistical if it is derived from data that
are subject to various types of specification, observation,
experimental, and/or measurement errors.
9. Regression analysis is used to model relationships between random
variables and to determine the magnitude of those relationships. Some
variables are independent variables or predictors, also called
explanatory variables, control variables, or regressors, usually
named X_1, ..., X_d. The others are response variables, also called
dependent variables, explained variables, predicted variables, or
regressands, usually named Y. If there is more than one response
variable, we speak of multivariate regression.
Brief aims of designing experiments
Various (statistical) designs are discussed and their respective differences,
advantages, and disadvantages are noted. In particular, factorial and
fractional factorial designs are discussed in detail. These are designs in
which two or more factors are varied simultaneously; the experimenter
wishes to study not only the effect of each factor, but also how the effect of
one factor changes as the other factors change. The latter is generally
referred to as an interaction among factors. Generally, designing
experiments helps us:
- perform experiments to evaluate the effects the factors have on the
characteristics of interest, and also discover possible relationships among
the factors (which could affect the characteristics), the goal being to use
this new understanding to improve the product;
- answer questions such as:
1. What are the key factors in a process?
2. At what settings would the process deliver acceptable performance?
3. What are the key, main and interaction eects in the process?
4. What settings would bring about less variation in the output?
Important steps in designing experiments
Several critical steps should be followed to achieve our goals:
1. State the objective: write a mission statement for the experiment or
project;
2. Choose the response: this is about consultation; ask clients what they
want to know, or ask yourself; pay attention to nominal-the-best
responses;
3. Perform pre-experiment data analysis;
4. Choose factors and levels: use a flowchart to represent the process or
system, and use a cause-effect diagram to list the potential factors
that may impact the response;
5. Select the experimental plan;
6. Perform the experiment;
7. Analyze the data;
8. Draw conclusions and make recommendations.
5.2 DOE in Statistical Quality Control
History. DOE's history goes back to the 1930s, when Sir R. A. Fisher in
England used Latin squares to randomize plant varieties before planting at
his farm, among other activities. The goal was to get high-productivity
harvests. The mathematical theory of combinatorial designs was developed
by R. C. Bose in the 1950s in India and then in the US. Nowadays, DOE is
extensively studied and employed in virtually every human activity, and the
mathematics for DOE is very rich.
The term Algebraic Statistics was coined by Pistone, Riccomagno and
Wynn in 2000. Motivated by problems in Design of Experiments, such as
computing fractional factorial designs, they developed the systematic use of
Groebner basis methods for problems in discrete probability and statistics.
In this lecture, the fractional factorial design has been chosen for detailed
study in view of its considerable record of success over the last thirty years.
It has been found to allow cost reduction, increase the efficiency of
experimentation, and often reveal the essential nature of a process.
What is an Experiment Design? Fix n finite subsets D_1, ..., D_n of the set
of real numbers R. Their Cartesian product D = D_1 × ... × D_n is a finite
subset of R^n. In statistics, the set D is called a full factorial design. A
basic aim of our study is to use full factorial designs or their subsets to
find a regression model describing the relationship between factors.

An example of special interest is the case where D_i = {0, 1} for all i. In
that case, D consists of the 2^n vertices of the standard n-cube and is
referred to as a full Boolean design. For instance, consider a full factorial
design 2^3 with three binary factors: the factor x_1 of mixture ratio, the
factor x_2 of temperature, the factor x_3 of experiment time period, and the
response y of wood toughness. The levels of the factors are given in the
following table:

    Factor            Low (0)   High (1)
    Mix(ture) Ratio   45p       55p
    Temp(erature)     100C      150C
    Time period       30m       90m

Table 5.1: Factor levels of the 2^3 factorial experiment
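The 2^3 full factorial design is just the Cartesian product of the three level sets, which can be generated mechanically (a plain-Python sketch; the dictionary keys are illustrative names, not notation from the text):

```python
from itertools import product

levels = {
    "mix_ratio": ["45p", "55p"],
    "temperature": ["100C", "150C"],
    "time_period": ["30m", "90m"],
}

# D = D1 x D2 x D3: all 2^3 = 8 runs of the full factorial design
runs = [dict(zip(levels, combo)) for combo in product(*levels.values())]
```

The first run is the all-low setting and the last the all-high setting, matching the run order commonly tabulated for 2^3 experiments.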
5.3 How to measure factor interactions?
This is a very complicated topic! See more in [7].
5.4 What should we do to bring experiments into
daily life?
There are a few ways to do that, but we have to employ Data Analysis
techniques a great deal. We illustrate our approach by going through a
particular instance, e.g. a forward-looking application in the wood
industry; see the next section for the data analysis.

Description
A household furniture production project requires studying product
toughness using 8 factors. The steps are:
Select experimental plan
    RUN  Mix Ratio  Temp      Time     Yield
    1    45p (-)    100C (-)  30m (-)  8
    2    55p (+)    100C (-)  30m (-)  9
    3    45p (-)    150C (+)  30m (-)  34
    4    55p (+)    150C (+)  30m (-)  52
    5    45p (-)    100C (-)  90m (+)  16
    6    55p (+)    100C (-)  90m (+)  22
    7    45p (-)    150C (+)  90m (+)  45
    8    55p (+)    150C (+)  90m (+)  56

Table 5.2: Results of an example 2^3 Full Factorial Experiment
Choose factors and levels
State objective
Conducting experiments
Data analysis
Draw conclusions and make recommendations
Select experimental plan
We employ a strength-3 fractional factorial design, also called a strength-3
mixed-level Orthogonal Array (OA), that has 96 runs and is able to
accommodate up to eight factors. This array is denoted by
OA(96; 6^1 4^2 2^5; 3); its factors and their levels are described in Table 5.3.

The factor description of a workable design. The full factorial design
of the eight factors described above is the Cartesian product

    {0, 1, ..., 5} × {0, 1, ..., 3}^2 × {0, 1}^5.

Using the full design we would be able to estimate all interactions, but
performing all 3072 runs exceeds the firm's budget. Instead we use a
fractional factorial design, that is, a subset of the elements of the full
factorial design.

Our aim is to choose a fractional design that has a rather small run size but
still allows us to estimate the main effects and some of the two-factor
interactions. A workable solution is the 96-run experimental design presented
in Table 5.4. It allows us to estimate the main effect of each factor and
some of their pairwise interactions.
Table 5.3: Eight factors, the number of levels and the level meanings

                                            Level
    Factor  Description            #   0          1        2      3         4       5
    1 (A)   wood                   6   pine       oak      birch  chestnut  poplar  walnut
    2 (B)   glue                   4   a (least adhesive)  b      c         d (most adhesive)
    3 (C)   moisture content       4   10%        20%      30%    40%
    4 (D)   processing time        2   1 h(our)   2h
    5 (E)   pretreatment           2   no         yes
    6 (F)   indenting of
            wood samples           2   no         yes
    7 (G)   pressure               2   1 pas(cal) 10 pas
    8 (H)   hardening conditions   2   no         yes
The construction of new factors given the run size of an OA of strength 2
or 3 (i.e., extending factors while fixing the number of experiments and the
strength) by a combined approach is detailed in Chapters 3 and 4 of [3].

Remark 8.
1. If we want to measure simultaneously all effects up to two-factor
interactions of the above 8 factors, an ? run fractional design would be
needed.
2. Constructing a ? run design is possible, and could be found with
trial-and-error algorithms. But it lacks some attractive features, such
as balance, which will be discussed below.
3. The responses Y have been computed by simulation, not by conducting
actual experiments.
56CHAPTER 5. STATISTICAL MODELINGINQUALITYENGINEERING
    run  A  B  C  D  E  F  G  H  Y
    (A wood, B glue type, C moisture content, D processing time,
    E pretreatment, F indenting of wood samples, G pressure,
    H hardening conditions, Y yield; numbers of levels: 6, 4, 4, 2, 2, 2, 2, 2)
1 0 0 0 0 0 0 0 0
2 0 0 1 1 1 0 1 1
3 0 0 2 1 0 1 1 0
4 0 0 3 0 1 1 0 1
. . . . . . . . .
. . . . . . . . .
. . . . . . . . .
81 5 0 0 1 1 1 0 0
82 5 0 1 0 0 1 1 1
83 5 0 2 0 1 0 1 0
84 5 0 3 1 0 0 0 1
85 5 1 0 0 1 0 1 1
86 5 1 1 1 1 1 0 0
87 5 1 2 1 0 0 0 1
88 5 1 3 0 0 1 1 0
89 5 2 0 0 0 1 0 1
90 5 2 1 1 0 0 1 0
91 5 2 2 1 1 1 1 1
92 5 2 3 0 1 0 0 0
93 5 3 0 1 0 0 1 0
94 5 3 1 0 1 0 0 1
95 5 3 2 0 0 1 0 0
96 5 3 3 1 1 1 1 1
Table 5.4: A mixed orthogonal design with 3 distinct sections
Table 5.4 shows the 96-run balanced factorial design.
Chapter 6
New directions and
Conclusion
This is a seminar-based chapter. Topics could be
6.1 Black-Scholes model in Finance
See Rubenstein.
6.2 Drug Resistance and Design of Anti-HIV drug
See Richard Bellman.
6.3 Epidemic Modeling
See O. Diekmann.
6.4 Conclusion
Chapter 7
Appendices
7.1 Appendix A: Theory of stochastic matrix for
MC
A stochastic matrix is a matrix in which each row sum equals one. If the
column sums also equal one, the matrix is called doubly stochastic. With
this convention, the transition probability matrix P = [p_ij] is a stochastic
matrix.
Proposition 18. Every stochastic matrix K has
- 1 as an eigenvalue (possibly with multiplicity), and
- no eigenvalue exceeding 1 in absolute value; that is, all eigenvalues λ_i
satisfy |λ_i| ≤ 1.

Proof. DIY
Fact 9. If K is a stochastic matrix then K^m is a stochastic matrix.

Proof. Let e = [1, 1, ..., 1]^t be the all-one vector; then use the fact that
Ke = e to prove that K^m e = e.

Let A = [a_ij] > 0 denote that every element a_ij of A satisfies the
condition a_ij > 0.
Definition 19.
A stochastic matrix P = [p_ij] is ergodic if lim_{m→∞} P^m = L (say)
exists, that is, each p^(m)_ij has a limit when m → ∞.
A stochastic matrix P is regular if there exists a natural number m such
that P^m > 0. In our context, a Markov chain with transition probability
matrix P is called regular if there exists an m > 0 such that P^m > 0,
i.e. there is a finite positive integer m such that after m time-steps,
every state has a nonzero chance of being occupied, no matter what the
initial state is.
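Regularity is easy to test numerically by raising P to successive powers until every entry is positive (a plain-Python sketch; the two demo matrices are standard illustrations, not from the text):

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_regular(P, max_power=50):
    """True if some P^m, m <= max_power, has every entry > 0."""
    Q = P
    for _ in range(max_power):
        if all(x > 0 for row in Q for x in row):
            return True
        Q = matmul(Q, P)
    return False

P_reg = [[0.0, 1.0], [0.5, 0.5]]   # P^2 > 0, hence regular
P_per = [[0.0, 1.0], [1.0, 0.0]]   # periodic: powers alternate, never all > 0
```

The bound max_power is a practical cutoff for the sketch; a chain that is regular at all is regular for some moderate m.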
Example 12. Is the matrix

    P = [ 0.88  0.12 ]
        [ 0.15  0.85 ]

regular?
Proof. If (1) is proved then, by Theorem 20, P = [p_ij] is ergodic. Hence,
when P = [p_ij] is regular, the limit matrix L = lim_{m→∞} P^m does exist.
By the Spectral Decomposition (7.1),

    P = E_1 + λ_2 E_2 + ... + λ_k E_k,  where all |λ_i| < 1, i = 2, ..., k.

Then, by (7.2),

    L = lim_{m→∞} P^m = lim_{m→∞} (E_1 + λ_2^m E_2 + ... + λ_k^m E_k) = E_1.

Let the vector π satisfy π P = π, i.e. π (P - 1I) = 0 (π is the stationary
distribution); then every row of L equals π, i.e. L = [π, ..., π]^t.
Corollary 22. A few important remarks: (a) for a regular MC, the
long-term behavior does not depend on the initial state distribution
probabilities p(0); (b) in general, the limiting distributions are influenced
by the initial distribution p(0) whenever the stochastic matrix P = [p_ij] is
ergodic but not regular. (See more at problem D.)
Example 13. Consider a Markov chain with two states and transition
probability matrix

    P = [ 3/4  1/4 ]
        [ 1/2  1/2 ].

The matrix K is built from the row vectors

    K = [ y_1^t ]
        [ y_2^t ]
        [  ...  ]
        [ y_k^t ]     (i.e., K^t = (y_1 | y_2 | ... | y_k)).

Here each y_i is a basis left eigenvector of P for the eigenvalue λ_i, i.e. a
basis vector of the null subspace N = {v : v^t P = λ_i v^t}. The constituent
matrices are E_i = x_i y_i^t.
Example 14. Diagonalize the following matrix and provide its spectral
decomposition.

    P = [  1   -4   -4 ]
        [  8  -11   -8 ]
        [ -8    8    5 ].

The characteristic equation is p(λ) = det(P - λI) = 0, i.e.
λ^3 + 5λ^2 + 3λ - 9 = 0. So λ = 1 is a simple eigenvalue, and λ = -3 is
repeated twice (its algebraic multiplicity is 2). Any set of vectors x
satisfying x ∈ N(P - λI), i.e. (P - λI)x = 0, can be taken as a basis of the
eigenspace (null space) N(P - λI). Bases for the eigenspaces are:

    N(P - 1I) = span{ [1, 2, -2]^t },
    N(P + 3I) = span{ [1, 1, 0]^t, [1, 0, 1]^t }.

It is easy to check that these three eigenvectors x_i form a linearly
independent set, so P is diagonalizable. The nonsingular matrix (also called
the similarity transformation matrix)

    H = (x_1 | x_2 | x_3) = [  1  1  1 ]
                            [  2  1  0 ]
                            [ -2  0  1 ]

will diagonalize P, and since P = H D H^{-1} we have

    H^{-1} P H = D = Diagmat(λ_1, λ_2, λ_2) = Diagmat(1, -3, -3)
               = [ 1   0   0 ]
                 [ 0  -3   0 ]
                 [ 0   0  -3 ].

Here

    H^{-1} = [  1  -1  -1 ]
             [ -2   3   2 ]
             [  2  -2  -1 ]

implies that y_1^t = [1, -1, -1], y_2^t = [-2, 3, 2], y_3^t = [2, -2, -1].
Therefore, the constituent matrices are

    E_1 = x_1 y_1^t = [  1  -1  -1 ]    E_2 = x_2 y_2^t = [ -2  3  2 ]
                      [  2  -2  -2 ]                      [ -2  3  2 ]
                      [ -2   2   2 ],                     [  0  0  0 ],

    E_3 = x_3 y_3^t = [ 2  -2  -1 ]
                      [ 0   0   0 ]
                      [ 2  -2  -1 ].

Obviously,

    P = λ_1 E_1 + λ_2 E_2 + λ_3 E_3 = E_1 - 3E_2 - 3E_3 = [  1   -4   -4 ]
                                                          [  8  -11   -8 ]
                                                          [ -8    8    5 ].
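The decomposition can be verified mechanically. A plain-Python sketch using the constituent matrices E_i of this example, with signs reconstructed from the stated eigenvalues 1 and -3; since all entries are integers, exact equality checks apply.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * x for x in row] for row in A]

E1 = [[1, -1, -1], [2, -2, -2], [-2, 2, 2]]
E2 = [[-2, 3, 2], [-2, 3, 2], [0, 0, 0]]
E3 = [[2, -2, -1], [0, 0, 0], [2, -2, -1]]

# P = 1*E1 + (-3)*E2 + (-3)*E3
P = add(E1, add(scale(-3, E2), scale(-3, E3)))
```

The constituent matrices are idempotent projections (E_i^2 = E_i) that annihilate one another (E_i E_j = 0 for i ≠ j), which is what makes P^n = Σ λ_i^n E_i immediate.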
Bibliography
[1] Arjeh M. Cohen, Computer algebra in industry: Problem Solving in
Practice, Wiley, 1993
[2] Nguyen, V. M. Man and the DAG group at Eindhoven University of
Technology, www.mathdox.org/nguyen, 2005,
[3] Nguyen, V. M. Man Computer-Algebraic Methods for the Construction
of Designs of Experiments, Ph.D. thesis, 2005, Technische Universiteit
Eindhoven, www.mathdox.org/nguyen
[4] Nguyen, Van Minh Man, Depart. of Computer Science, Faculty of CSE,
HCMUT, Vietnam, www.cse.hcmut.edu.vn/ mnguyen
[5] Brouwer E. Andries, Cohen M. Arjeh and Nguyen, V. M. Man,
Orthogonal arrays of strength 3 and small run sizes,
www.cse.hcmut.edu.vn/ mnguyen/OrthogonalArray-strength3.pdf,
Journal of Statistical Planning and Inference, 136 (2007)
[6] Nguyen, V. M. Man, Constructions of strength 3 mixed orthogonal
arrays,
www.cse.hcmut.edu.vn/ mnguyen/Specific-Constructions-OAs.pdf,
Journal of Statistical Planning and Inference 138- Jan 2008,
[7] Eric D. Schoen and Nguyen, V. M. Man, Enumeration and
Classification of Orthogonal Arrays, Faculty of Applied Economics,
University of Antwerp, Belgium (2007)
[8] Huynh, V. Linh and Nguyen, V. M. Man, Discrete Event Modeling in
Optimization for Project Management, B.E. thesis, HCMUT, 69 pages,
2008.
[9] T. Beth, D. Jungnickel and H. Lenz, Design Theory, vol. II, pp. 880,
Encyclopedia of Mathematics and Its Applications, Cambridge
University Press (1999)
[10] Glonek G.F.V. and Solomon P.J., Factorial and time course designs for
cDNA microarray experiments, Biostatistics 5, 89-111, 2004
[11] N. J. A. Sloane, A Library of Orthogonal Arrays
http://www.research.att.com/ njas/oadir/index.html/,
[12] Warren Kuhfeld,
http://support.sas.com/techsup/technote/ts723.html/
[13] Hedayat, A. S. and Sloane, N. J. A. and Stufken, J., Orthogonal
Arrays, Springer-Verlag, 1999
[14] Madhav, S. P., iSixSigma LLC, Design Of Experiment For Software
Testing, isixsigma.com/library/content/c030106a.asp, 2004
[15] Bernd Sturmfels, Solving Polynomial Systems, AMS, 2002
[16] OpenModelica project, Sweden 2006,
www.ida.liu.se/ pelab/modelica/OpenModelica.html
[17] Computer Algebra System for polynomial computations, Germany
2006 www.singular.uni-kl.de/
[18] Sudhir Gupta. Balanced Factorial Designs for cDNA Microarray
Experiments, Communications in Statistics: Theory and
Methods, Volume 35, Number 8 (2006) , pp. 1469-1476
[19] Morris W. Hirsch, Stephen Smale, Differential Equations, Dynamical
Systems and Linear Algebra, 1980
[20] James R. Thompson, Simulation: A Modeler's Approach, Wiley, 2000
[21] David Insua, Jesus Palomo, Simulation in Industrial Statistics, SAMSI,
2005
[22] Ruth J. Williams, Introduction to the Mathematics of Finance, AMS
vol 72, 2006
[23] C. S. Tapiero, Risk and Financial Management- Mathematical
Methods, Wiley, 2004
[24] A.K. Basu, Introduction to Stochastic Processes, Alpha Science 2005
[25] L. Kleinrock, Queueing Systems, vol 2, John Wiley & Sons, 1976
[26] L. Kleinrock, Time-shared systems: A theoretical treatment, Journal of
the ACM 14 (2), 1967, 242-261.
[27] S. G. Gilmour, Fundamentals of Statistics I, Lecture Notes School of
Mathematical Sciences Queen Mary, University of London, 2006
[28] M. Parlar, Interactive Operations Research with Maple, Birkhäuser,
2000.
[29] Tim Holliday, Pistone, Riccomagno, Wynn, The Application of
Computational Algebraic Geometry to the Analysis of Design of
Experiments: A Case Study.
Copyright 2010 by
Lecturer Nguyen V. M. Man, Ph.D. in Statistics
Working area Algebraic Statistics, Experimental Designs,
Statistical Optimization and Operations Research
Institution University of Technology of HCMC
Address 268 Ly Thuong Kiet, Dist. 10, HCMC, Vietnam
Ehome: www.cse.hcmut.edu.vn/~mnguyen
Email: mnguyen@cse.hcmut.edu.vn
mannvm@uef.edu.vn