
Introduction to Probability
- A phenomenon is called random if the outcome of an experiment is uncertain.
- Random phenomena often follow recognizable patterns.
- This long-run regularity of random phenomena can be described mathematically.
- The mathematical study of randomness is called probability theory: probability provides a mathematical description of randomness.
Elementary Probability Theory
- Probability is a set function P that assigns to each event A in the sample space Ω a number P(A), called the probability of event A, such that the following are true:
  - P(A) ≥ 0 for every event A
  - P(Ω) = 1
  - If {A1, A2, ..., An} is a countable collection of mutually exclusive events, then P(A1 ∪ A2 ∪ ... ∪ An) = P(A1) + P(A2) + ... + P(An)
- The probability of the union of mutually exclusive events is the sum of their individual probabilities.
Sample Spaces
- Probabilities estimate the frequency of outcomes of random experiments.
- Outcomes can be from a finite or countable sample space (set) Ω of events, or be tuples drawn over the reals R.
  - Coin toss: Ω = {H, T}
  - Packets to a URL per day: Ω = N (positive integers)
  - Rain in cm/month in Prov.: Ω = R (real numbers)
Probability Space
- Sample space: the set of all possible outcomes.
- Events: a family F of subsets of the sample space Ω.
  E.g. Ω = {H,T}^3, F0 = {TTT, HHT, HTH, THH} (even no. of Hs), F1 = {HTT, THT, TTH, HHH} (odd no. of Hs).
- Events are mutually exclusive if they are disjoint, e.g. F0 and F1 above.
- A probability distribution is a function that assigns a probability 0 ≤ P(E) ≤ 1 to each event E.
Properties of Probability Function
1. For any event E in Ω, 0 ≤ P(E) ≤ 1, and P(Ω) = 1.
   For any finite or countably infinite sequence of disjoint events E1, E2, ...,
   P(E1 ∪ E2 ∪ ...) = P(E1) + P(E2) + ...
Probabilities of Events
2. Joint probability of A and B: P(A ∩ B), also written P(A, B).
3. Probability of a union: P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
4. Complement of event A: Ā = Ω − A, with P(A) + P(Ā) = 1.
Probabilities of Events
5. If events A and B are mutually exclusive:
   P(A ∩ B) = 0
   P(A ∪ B) = P(A) + P(B)
6. Conditional probability of A given B:
   P(A|B) = P(A,B)/P(B), or equivalently P(A,B) = P(A|B)P(B)
   Bayes' rule: P(B|A) = P(B)P(A|B)/P(A).
7. Events A and B are statistically independent if
   P(A|B) = P(A), i.e., P(A,B) = P(A)P(B)
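A small numeric sketch of rules 6 and 7 above, using made-up values for P(B), P(A|B), and P(A) (purely illustrative, not from the slides):

```python
# Hypothetical numbers chosen only to exercise the identities above.
P_B = 0.3          # P(B)
P_A_given_B = 0.5  # P(A|B)
P_A = 0.4          # P(A)

P_AB = P_A_given_B * P_B               # joint: P(A,B) = P(A|B) P(B) = 0.15
P_B_given_A = P_B * P_A_given_B / P_A  # Bayes' rule: P(B|A) = P(B) P(A|B) / P(A)
print(P_AB, P_B_given_A)               # 0.15  0.375

# Independence check: A and B are independent iff P(A,B) == P(A) * P(B)
print(abs(P_AB - P_A * P_B) < 1e-12)   # False for these numbers
```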
Marginal Probability
- Given a sample space Ω = K² containing pairs of events (Ai, Bj) over K, the marginal probability of Ai is

  P(Ai) = Σj P(Ai, Bj),

  where the Bj are mutually exclusive and exhaustive.


Principle of Inclusion/Exclusion
Let |A| = size (cardinality) of A. Then:

|A ∪ B| = |A| + |B| − |A ∩ B|

|A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|
Definition of random variable
- A random variable is a function that assigns a number to each outcome in a sample space.
- If the set of all possible values of a random variable X is countable, then X is discrete. The distribution of X is described by a probability mass function:

  p(a) = P{s ∈ S : X(s) = a} = P{X = a}

- Otherwise, X is a continuous random variable if there is a nonnegative function f(x), defined for all real numbers x, such that for any set B,

  P{s ∈ S : X(s) ∈ B} = P{X ∈ B} = ∫(over B) f(x) dx

  f(x) is called the probability density function of X.


Applications of Random Variables
- Study of probability
- The number of stock price increases in a month
- Analysis of games of chance and stochastic events
- Results of scientific experiments
- Estimation of wireline and wireless channels, because of the randomness of the received signal
- Many business applications where we have a random variable whose probability distribution and expected value are unknown
Random Variables
- Discrete: Bernoulli, Binomial, Geometric, Poisson.
- Continuous: Uniform, Exponential, Gamma, Normal.
- Expectation & Variance, Joint Distributions, Moment Generating Functions, Limit Theorems.
pmf’s and cdf’s
- The probability mass function (pmf) for a discrete random variable is positive for at most a countable number of values of X: x1, x2, ..., and

  Σi p(xi) = 1

- The cumulative distribution function (cdf) for any random variable X is

  F(x) = P{X ≤ x}

  F(x) is a nondecreasing function with lim(x→−∞) F(x) = 0 and lim(x→∞) F(x) = 1.

- For a discrete random variable X: F(a) = Σ(xi ≤ a) p(xi)
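A short sketch, assuming a small hand-made pmf (not from the slides), showing that the cdf of a discrete random variable is the running sum F(a) = Σ(xi ≤ a) p(xi):

```python
# Hypothetical pmf on {0, 1, 2, 3}; values sum to 1.
pmf = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}
assert abs(sum(pmf.values()) - 1.0) < 1e-12

def cdf(a, pmf=pmf):
    """F(a) = P{X <= a} for a discrete random variable."""
    return sum(p for x, p in pmf.items() if x <= a)

print(cdf(-1), cdf(1), cdf(3))   # 0  0.5  1.0  (nondecreasing, limits 0 and 1)
```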
Bernoulli Random Variable
- An experiment has two possible outcomes, called "success" and "failure": this is sometimes called a Bernoulli trial.
- The probability of success is p.
- X = 1 if success occurs, X = 0 if failure occurs.
- Then p(0) = P{X = 0} = 1 − p and p(1) = P{X = 1} = p.
- X is a Bernoulli random variable with parameter p.
Binomial Random Variable
- A sequence of n independent Bernoulli trials is performed, where the probability of success on each trial is p.
- X is the number of successes.
- Then for i = 0, 1, ..., n,

  p(i) = P{X = i} = (n choose i) p^i (1 − p)^(n−i)

  where (n choose i) = n! / (i!(n − i)!)

- X is a binomial random variable with parameters n and p.
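A sketch of the binomial pmf above using only the Python standard library (the parameters n and p are illustrative):

```python
# Binomial pmf p(i) = C(n, i) p^i (1-p)^(n-i).
from math import comb

def binomial_pmf(i, n, p):
    return comb(n, i) * p**i * (1 - p)**(n - i)

n, p = 10, 0.3                                    # illustrative parameters
probs = [binomial_pmf(i, n, p) for i in range(n + 1)]
print(sum(probs))                                 # ~1.0, as a pmf must
print(max(range(n + 1), key=lambda i: probs[i]))  # most likely number of successes
```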
Geometric Random Variable
- A sequence of independent Bernoulli trials is performed with p = P(success).
- X is the number of trials up to and including the first success.
- Then X may equal 1, 2, ... and

  p(i) = P{X = i} = (1 − p)^(i−1) p,  i = 1, 2, ...

- X is named after the geometric series: if |r| < 1, then Σ(i=0 to ∞) r^i = 1/(1 − r).
  Use this to verify that Σ(i=1 to ∞) p(i) = 1.
Poisson Random Variable
- X is a Poisson random variable with parameter λ > 0 if

  p(i) = P{X = i} = e^(−λ) λ^i / i!,  i = 0, 1, ...

- Note: Σ(i=0 to ∞) p(i) = 1 follows from e^λ = Σ(i=0 to ∞) λ^i / i!.
- X can represent the number of "rare events" that occur during an interval of specified length.
- A Poisson random variable can also approximate a binomial random variable with large n and small p if λ = np: split the interval into n subintervals, and label the occurrence of an event during a subinterval as "success".
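A sketch comparing the Poisson pmf with λ = np to the binomial pmf for large n and small p (the values below are chosen only for illustration):

```python
# Poisson(lambda = n*p) approximating Binomial(n, p) for large n, small p.
from math import comb, exp, factorial

def binom_pmf(i, n, p):
    return comb(n, i) * p**i * (1 - p)**(n - i)

def poisson_pmf(i, lam):
    return exp(-lam) * lam**i / factorial(i)

n, p = 1000, 0.003           # large n, small p (illustrative values)
lam = n * p                  # lambda = np = 3
for i in range(6):
    print(i, round(binom_pmf(i, n, p), 5), round(poisson_pmf(i, lam), 5))  # very close
```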
Continuous random variables
- A probability density function (pdf) must satisfy:

  f(x) ≥ 0

  ∫(−∞ to ∞) f(x) dx = 1

  P{a ≤ X ≤ b} = ∫(a to b) f(x) dx   (note P{X = a} = 0)

- The cdf is F(a) = P{X ≤ a} = ∫(−∞ to a) f(x) dx, so f(x) = dF(x)/dx.
- P{a − ε/2 ≤ X ≤ a + ε/2} ≈ ε f(a) means that f(a) measures how likely X is to be near a.
Uniform random variable
- X is uniformly distributed over an interval (a, b) if its pdf is

  f(x) = 1/(b − a) for a < x < b,  0 otherwise

  (all we know about X is that it takes a value between a and b).
- Then its cdf is:

  F(x) = 0 for x ≤ a,  (x − a)/(b − a) for a < x < b,  1 for x ≥ b
Exponential random variable
- X has an exponential distribution with parameter λ > 0 if its pdf is

  f(x) = λe^(−λx) for x ≥ 0,  0 otherwise

- Then its cdf is:

  F(x) = 0 for x < 0,  1 − e^(−λx) for x ≥ 0

- This distribution has very special characteristics that we will use often!
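A sketch of sampling from this distribution by inverting the cdf (a standard technique; the function name and rate value are ours, not from the slides):

```python
# Inverse-cdf sampling: F(x) = 1 - exp(-lambda x)  =>  x = -ln(1 - U)/lambda, U ~ Uniform(0,1).
import random, math

def exponential_sample(lam, rng=random):
    u = rng.random()                 # uniform on [0, 1)
    return -math.log(1.0 - u) / lam

lam = 2.0                            # illustrative rate
samples = [exponential_sample(lam) for _ in range(100_000)]
print(sum(samples) / len(samples))   # ~1/lambda = 0.5 (see the expectation slides below)
```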
Gamma random variable
- X has a gamma distribution with parameters λ > 0 and α > 0 if its pdf is

  f(x) = λe^(−λx) (λx)^(α−1) / Γ(α) for x ≥ 0,  0 otherwise

- It gets its name from the gamma function

  Γ(α) = ∫(0 to ∞) e^(−x) x^(α−1) dx

- If α is an integer, then Γ(α) = (α − 1)!
Normal random variable
- X has a normal distribution with parameters μ and σ² if its pdf is

  f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²)),  −∞ < x < ∞

- This is the classic "bell-shaped" distribution widely used in statistics. It has the useful property that a linear function Y = aX + b is normally distributed with parameters aμ + b and (aσ)². In particular, Z = (X − μ)/σ has the standard normal distribution with parameters 0 and 1.
Expectation
- The expected value (mean) of a random variable is

  E[X] = Σi xi p(xi)  (discrete)
  E[X] = ∫(−∞ to ∞) x f(x) dx  (continuous)

- Also called the first moment, like the moment of inertia of the probability distribution.
- If the experiment is repeated and the random variable observed many times, it represents the long-run average value of the r.v.
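A sketch of the long-run-average interpretation, assuming X ~ Bernoulli(p) with an illustrative p (not from the slides):

```python
# The sample mean of repeated observations approaches E[X] = p for a Bernoulli(p) variable.
import random

p, trials = 0.3, 100_000
total = sum(1 if random.random() < p else 0 for _ in range(trials))
print(total / trials)   # close to 0.3
```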
Expectations of Discrete Random Variables
- Bernoulli: E[X] = 1(p) + 0(1 − p) = p
- Binomial: E[X] = np
- Geometric: E[X] = 1/p (by a trick, see text)
- Poisson: E[X] = λ: the parameter is the expected or average number of "rare events" per interval; the random variable is the number of events in a particular interval chosen at random
Expectations of Continuous Random Variables
- Uniform: E[X] = (a + b)/2
- Exponential: E[X] = 1/λ
- Gamma: E[X] = α/λ
- Normal: E[X] = μ: the first parameter is the expected value; note that its density is symmetric about x = μ:

  f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²)),  −∞ < x < ∞
Expectation of a function of a r.v.
- First way: if X is a r.v., then Y = g(X) is a r.v. Find the distribution of Y, then find

  E[Y] = Σi yi p(yi)

- Second way: if X is a random variable, then for any real-valued function g,

  E[g(X)] = Σi g(xi) p(xi)  (X discrete)
  E[g(X)] = ∫(−∞ to ∞) g(x) f(x) dx  (X continuous)

- If g(X) is a linear function of X: E[aX + b] = aE[X] + b
Higher-order moments
- The nth moment of X is E[X^n]:

  E[X^n] = Σi xi^n p(xi)  (discrete)
  E[X^n] = ∫(−∞ to ∞) x^n f(x) dx  (continuous)

- The variance is

  Var(X) = E[(X − E[X])²]

- It is sometimes easier to calculate as

  Var(X) = E[X²] − (E[X])²
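A sketch checking Var(X) = E[X²] − (E[X])² on a small hypothetical pmf (values are ours, chosen only for illustration):

```python
# Compute the first two moments and the variance of a discrete random variable.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

EX  = sum(x * p for x, p in pmf.items())        # first moment E[X]
EX2 = sum(x**2 * p for x, p in pmf.items())     # second moment E[X^2]
var = EX2 - EX**2
print(EX, EX2, var)                             # 1.1  1.7  0.49
```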
Variances of Discrete Random Variables
- Bernoulli: E[X²] = 1(p) + 0(1 − p) = p; Var(X) = p − p² = p(1 − p)
- Binomial: Var(X) = np(1 − p)
- Geometric: Var(X) = (1 − p)/p² (by a trick similar to the one for E[X])
- Poisson: Var(X) = λ: the parameter is also the variance of the number of "rare events" per interval!

Variances of Continuous Random Variables
- Uniform: Var(X) = (b − a)²/12
- Exponential: Var(X) = 1/λ²
- Gamma: Var(X) = α/λ²
- Normal: Var(X) = σ²: the second parameter is the variance
Jointly Distributed Random Variables
- For the definitions of joint cdf, pmf, pdf, and marginal distributions, the main results we can use are:

  E[X + Y] = E[X] + E[Y]
  E[aX + bY] = aE[X] + bE[Y]
  E[a1 X1 + a2 X2 + ... + an Xn] = a1 E[X1] + ... + an E[Xn]

- This is especially useful with indicator r.v.'s: IA = 1 if A occurs, 0 otherwise.
Independent Random Variables
- X and Y are independent if

  P{X ≤ a, Y ≤ b} = P{X ≤ a} P{Y ≤ b}

  p(x, y) = pX(x) pY(y)  (discrete)
  f(x, y) = fX(x) fY(y)  (continuous)

- This implies that, for any functions g and h,

  E[g(X) h(Y)] = E[g(X)] E[h(Y)]
Covariance
- The covariance of X and Y is:

  Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y]

- If X and Y are independent then Cov(X, Y) = 0.
- Properties:

  Cov(X, X) = Var(X)
  Cov(X, Y) = Cov(Y, X)
  Cov(cX, Y) = c Cov(X, Y)
  Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z)
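A simulation sketch (illustrative, using independent uniforms; not from the slides) of two of these facts: Cov(X, Y) ≈ 0 for independent X and Y, and Cov(X, X) = Var(X):

```python
# Sample covariance of independent uniforms is near 0; Cov(X, X) equals Var(X).
import random

n = 100_000
X = [random.random() for _ in range(n)]   # independent Uniform(0,1) samples
Y = [random.random() for _ in range(n)]

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

print(cov(X, Y))   # ~0 (independent)
print(cov(X, X))   # ~Var(Uniform(0,1)) = 1/12 ≈ 0.0833
```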
Variance of a sum of r.v.'s

  Var(Σ(i=1 to n) Xi) = Cov(Σ(i=1 to n) Xi, Σ(j=1 to n) Xj) = Σ(i=1 to n) Var(Xi) + 2 Σi Σ(j<i) Cov(Xi, Xj)

- If X1, X2, ..., Xn are independent, then

  Var(Σ(i=1 to n) Xi) = Σ(i=1 to n) Var(Xi)
Moment generating function
- The moment generating function of a r.v. X is

  φ(t) = E[e^(tX)] = Σi e^(t·xi) p(xi)  (X discrete)
  φ(t) = E[e^(tX)] = ∫(−∞ to ∞) e^(tx) f(x) dx  (X continuous)

- Its name comes from the fact that

  d^n φ(t) / dt^n evaluated at t = 0 equals E[X^n]

- Also, if X and Y are independent, then

  φ(X+Y)(t) = φX(t) φY(t)

- And there is a one-to-one correspondence between the m.g.f. and the distribution function of a r.v.; this helps to identify distributions with the reproductive property.
Random Process-Introduction
- Random signals cannot be described prior to their occurrence.
- However, when observed over a long period, a random signal or noise may exhibit certain regularities that can be described in terms of probabilities and statistical averages.
- Such a model, in the form of a probabilistic description of a collection of functions of time, is called a random process.
Random Process-Definition
- A random process may be thought of as a collection, or ensemble, of functions of time, any one of which might be observed on any trial of an experiment.
- It is a family or ensemble of signals that correspond to every possible outcome of a certain signal measurement.
- Each signal in this collection is referred to as a sample function of the process.
Random process
- A random process is a collection of time functions, or signals, corresponding to the various outcomes of a random experiment.

[Figure: an ensemble of sample functions (realizations, each a deterministic function of time t); fixing a time instant across the ensemble gives a random variable taking a real-number value.]
Random process-Basic concepts
- Random processes, like other types of signals, can be classified in a number of different ways:
  - Continuous or discrete
  - Analog or digital
  - Deterministic or non-deterministic
  - Stationary or non-stationary
  - Ergodic or non-ergodic
Cross-correlation and Autocorrelation
- Correlation determines the degree of similarity between two signals.
- If the signals are identical, then the correlation coefficient ρ is 1; if they are totally different, then ρ = 0; and if they are identical except that the phase is shifted by exactly 180 degrees (i.e. mirrored), then ρ = −1.
- When two independent signals are compared, the procedure is known as cross-correlation; when the same signal is compared to phase-shifted copies of itself, the procedure is known as autocorrelation.
Autocorrelation
- Autocorrelation of an energy signal: Rx(τ) = ∫(−∞ to ∞) x(t) x(t + τ) dt
- Autocorrelation of a power signal: Rx(τ) = lim(T→∞) (1/T) ∫(−T/2 to T/2) x(t) x(t + τ) dt
- For a periodic signal: Rx(τ) = (1/T0) ∫(over one period T0) x(t) x(t + τ) dt
- Autocorrelation of a random signal, for a WSS process: Rx(τ) = E{x(t) x(t + τ)}

Properties of an autocorrelation function
- For real-valued signals (and WSS in the case of random signals):
  1. The autocorrelation and the spectral density form a Fourier transform pair.
  2. The autocorrelation is symmetric around zero.
  3. Its maximum value occurs at the origin.
  4. Its value at the origin is equal to the average power or energy.
Stationarity
- A random process is stationary if all its statistical properties do not change with time. Types:
  - First order stationary process
  - Second order stationary process
  - Wide sense stationary process
  - Strict sense stationary process
First order stationary process
- The probability density function of a first order stationary process satisfies

  px(x1, t1) = px(x1, t1 + Δ)  for all t1, Δ ∈ ℝ

- If x(t) is a first order stationary process, then

  E{x(t)} = E{x(t + Δ)} = x̄ = constant
Second order stationary process
- The second order density function of a second order stationary process satisfies

  px(x1, x2, t1, t2) = px(x1, x2, t1 + Δ, t2 + Δ)  for all t1, t2, Δ ∈ ℝ

- If x(t) is a second order stationary process, then

  Rxx(t1, t1 + τ) = E{x(t1) x(t1 + τ)} = Rxx(τ)

Second order stationary process
- Second order stationarity may be more restrictive than needed in many applications. A more useful form is the wide sense stationary process.
- A process is wide sense stationary if

  E{x(t)} = x̄ = constant
  E{x(t1) x(t1 + τ)} = Rxx(τ)

- Second order stationary ⇒ wide sense stationary (the converse need not hold).
Stationary process
- Definition of the mean:

  E(X(tk)) = ∫(−∞ to ∞) x p_X(tk)(x) dx = mX(tk)

- Definition of the autocorrelation:

  RX(t1, t2) = E[X(t1) X(t2)]

  where X(t1), X(t2) are random variables obtained at times t1, t2.
- A process is (wide-sense) stationary if

  E(X(tk)) = mX = constant
  RX(t1, t2) = RX(t1 − t2) = RX(τ)
Example
  x(t) = A cos(ω0 t + φ)

  A and ω0 are constants; φ is a uniformly distributed random variable on [0, 2π].

  E{x(t)} = ∫(0 to 2π) A cos(ω0 t + φ) (1/2π) dφ = 0

  E{x(t) x(t + τ)} = ∫(0 to 2π) A cos(ω0 t + φ) A cos(ω0 (t + τ) + φ) (1/2π) dφ
                   = (A²/2) cos(ω0 τ) + (A²/2) E{cos(2ω0 t + ω0 τ + 2φ)}
                   = (A²/2) cos(ω0 τ)

  E{x(t)} = constant and E{x(t) x(t + τ)} = Rxx(τ)  ⇒  x(t) is wide sense stationary.
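A numerical sketch of this example (the constants A, ω0, t, and τ are illustrative): the ensemble mean comes out near 0 and the ensemble autocorrelation near (A²/2)cos(ω0τ).

```python
# Monte Carlo check of the random-phase cosine example above.
import math, random

A, w0 = 2.0, 1.5                      # illustrative constants
t, tau, trials = 0.7, 0.4, 200_000

mean_acc, corr_acc = 0.0, 0.0
for _ in range(trials):
    phi = random.uniform(0.0, 2.0 * math.pi)
    x_t    = A * math.cos(w0 * t + phi)
    x_ttau = A * math.cos(w0 * (t + tau) + phi)
    mean_acc += x_t
    corr_acc += x_t * x_ttau

print(mean_acc / trials)   # ~0
print(corr_acc / trials)   # ~(A^2/2) cos(w0*tau) = 2*cos(0.6) ≈ 1.65
```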
Nth order and strict sense stationary
- A random process is stationary of order N if its Nth order density function is invariant to a shift of the time origin.
- A random process is strict sense stationary if it is stationary of all orders N = 1, 2, 3, ....
Jointly Wide Sense stationary
- x(t) and y(t) are jointly wide sense stationary if each is wide sense stationary and

  E{x(t) y(t + τ)} = Rxy(τ)
Time Average
- The time average of x(t) is

  T.A. of x = lim(T→∞) (1/2T) ∫(−T to T) x(t) dt

- The time autocorrelation function is

  R̄xx(τ) = lim(T→∞) (1/2T) ∫(−T to T) x(t) x(t + τ) dt
Ergodic Process
- A process is ergodic if the time average is equal to the ensemble average.
- If a process is ergodic then a single sample time signal contains all statistical variations of the process.
- We do not need more than one sample function.
Ergodic Process
- If the time average and time autocorrelation are equal to the statistical (ensemble) average and statistical autocorrelation, then the process is ergodic:

  ∫(−∞ to ∞) x p(x) dx = lim(T→∞) (1/2T) ∫(−T to T) x(t) dt

  E{x(t) x(t + τ)} = lim(T→∞) (1/2T) ∫(−T to T) x(t) x(t + τ) dt

- Ergodicity is very restrictive. The assumption of ergodicity is used to simplify the problem.
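A sketch reusing the random-phase cosine example: the time average over a single realization matches the ensemble average of 0, illustrating ergodicity in the mean (constants and record length are illustrative):

```python
# Time average of one realization of x(t) = A cos(w0 t + phi), phi fixed for this realization.
import math, random

A, w0 = 2.0, 1.5
phi = random.uniform(0.0, 2.0 * math.pi)     # one realization of the random phase
T, dt = 500.0, 0.01
n = int(2 * T / dt)

time_avg = sum(A * math.cos(w0 * (-T + k * dt) + phi) for k in range(n)) * dt / (2 * T)
print(time_avg)   # ~0, the same as the ensemble average E{x(t)} = 0
```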
Spectral density:
- The spectral density (auto-spectral density, power spectral density, spectrum) Sx(n) describes the average frequency content of a random process x(t) as a function of frequency n.
- The quantity Sx(n)·δn represents the contribution to σx² from the frequency increment δn:

  σx² = ∫(0 to ∞) Sx(n) dn

- Units of Sx(n): [units of x]² · sec
Spectral density:
2 2
Sx (n) = Lim  X T (n) 
T →∞ T
 
Where XT(n) is the Fourier Transform of the process x(t)
taken over the time interval -T/2<t<+T/2

Where XT(n) is the Fourier Transform of the process x(t)


taken over the time interval -T/2<t<+T/2

Usually a Fast Fourier Transform (FFT) algorithm is used


Spectral density:

  Sx(n) = 2 ∫(−∞ to ∞) ρx(τ) e^(−i2πnτ) dτ

- The spectral density is twice the Fourier transform of the autocorrelation function.
- Inverse relationship:

  ρx(τ) = Real{ ∫(0 to ∞) Sx(n) e^(i2πnτ) dn } = ∫(0 to ∞) Sx(n) cos(2πnτ) dn

- Thus the spectral density and the autocorrelation are closely linked: they basically provide the same information about the process x(t).
Cross-correlation:

[Figure: records of two random processes x(t) and y(t) over a time interval T, with a time lag τ indicated.]

- The cross-correlation function describes the general dependency of x(t) on another random process y(t + τ), delayed by a time delay τ:

  cxy(τ) = lim(T→∞) (1/T) ∫(0 to T) [x(t) − x̄][y(t + τ) − ȳ] dt
Covariance:
- The covariance is the cross-correlation function with the time delay τ set to zero:

  cxy(0) = lim(T→∞) (1/T) ∫(0 to T) [x(t) − x̄][y(t) − ȳ] dt, i.e. the time average of x′(t)·y′(t)

- Note that here x′(t) and y′(t) are used to denote the fluctuating parts of x(t) and y(t) (mean parts subtracted).
Correlation coefficient:
- The correlation coefficient ρ is the covariance normalized by the standard deviations of x and y:

  ρ = (time average of x′(t)·y′(t)) / (σx·σy)

- When x and y are identical to each other, the value of ρ is +1 (full correlation).
- When y(t) = −x(t), the value of ρ is −1.
- In general, −1 ≤ ρ ≤ +1.
Correlation - application:
- The fluctuating wind loading of a tower depends on the correlation coefficient between wind velocities, and hence wind loads, at various heights.
- For heights z1 and z2:

  ρ(z1, z2) = (time average of u′(z1)·u′(z2)) / (σu(z1)·σu(z2))
Cross spectral density:
- By analogy with the spectral density:

  Sxy(n) = 2 ∫(−∞ to ∞) cxy(τ) e^(−i2πnτ) dτ

- The cross spectral density is twice the Fourier transform of the cross-correlation function for the processes x(t) and y(t).
- The cross-spectral density (cross-spectrum) is a complex number:

  Sxy(n) = Cxy(n) + i·Qxy(n)

- Cxy(n) is the co(-incident) spectral density (in phase).
- Qxy(n) is the quad(-rature) spectral density (out of phase).
Normalized co-spectral density:

  ρxy(n) = Cxy(n) / √(Sx(n)·Sy(n))

- It is effectively a correlation coefficient for fluctuations at frequency n.
- Application: excitation of resonant vibration of structures by fluctuating wind forces. If x(t) and y(t) are local fluctuating forces acting at different parts of the structure, ρxy(n1) describes how well the forces are correlated ('synchronized') at the structural natural frequency n1.
Input - output relationships:
  Input x(t) → Linear system → Output y(t)

- There are many cases in which it is of interest to know how an input random process x(t) is modified by a system to give a random output process y(t).
- Application: the input is wind force; the output is structural response (e.g. displacement, acceleration, stress). The 'system' is the dynamic characteristics of the structure.
- Linear system (1): the output resulting from a sum of inputs is equal to the sum of the outputs produced by each input individually (additive property).
- Linear system (2): the output produced by a constant times the input is equal to the constant times the output produced by the input alone (homogeneous property).
Input - output relationships :
  Sy(n) = A·|H(n)|²·Sx(n)

- |H(n)|² is a transfer function, frequency response function, or 'admittance'.
Thank You
