
Fact Sheet 0

MA204 Statistics 2008


Department of Mathematics, IIT Madras

1. Probability Postulates: Let Ai, for i ∈ I, a countable index set, be events in the sample
   space S, i.e., subsets of S.
   1. 0 ≤ P(Ai) ≤ 1.
   2. P(∅) = 0 and P(S) = 1.
   3. If the Ai are mutually exclusive events, then P(∪i Ai) = Σ_i P(Ai).

2. Discrete Probability: If A is an event in a discrete sample space, then P(A) is the
   sum of the probabilities of the individual outcomes comprising A. If an experiment has
   N equally likely outcomes and an event A constitutes n of them, then P(A) = n/N.

3. Complementarity: If A and A′ are complementary events, then P(A′) = 1 − P(A).

4. Sub-event: If A ⊆ B ⊆ S are events, then P(A) ≤ P(B).

5. General Addition: If A, B are any events, then P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

6. Conditional Probability: For two events A, B with P(A) ≠ 0, the conditional
   probability of B given A is P(B|A) = P(A ∩ B)/P(A).

7. Independence: Two events A and B are independent iff P(A ∩ B) = P(A)P(B).

8. Independence: Events A1, · · · , An, for n > 2, are independent iff the probability of the
   intersection of any 2, 3, 4, . . . , n of these events equals the product of their respective
   probabilities.

9. Rule of Elimination: If events B1, B2, . . . , Bk constitute a partition of the sample space
   S and P(Bi) ≠ 0 for each i, then for any event A in S, P(A) = Σ_{i=1}^{k} P(Bi)P(A|Bi).

10. Bayes’ Theorem: If events B1, B2, . . . , Bk constitute a partition of the sample space S
    and P(Bi) ≠ 0 for each i, then for any nonempty event A in S,

        P(Bj|A) = P(Bj)P(A|Bj) / Σ_{i=1}^{k} P(Bi)P(A|Bi).
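
A minimal Python sketch of the Rule of Elimination and Bayes’ Theorem on a hypothetical
three-part partition (all numbers illustrative):

    # Hypothetical partition B1, B2, B3: priors P(Bi) and likelihoods P(A|Bi).
    priors = [0.5, 0.3, 0.2]
    likelihoods = [0.02, 0.05, 0.10]

    # Rule of Elimination: P(A) = sum_i P(Bi) P(A|Bi)
    p_a = sum(p * l for p, l in zip(priors, likelihoods))

    # Bayes' Theorem: P(Bj|A) for each j; the posteriors sum to 1.
    posteriors = [p * l / p_a for p, l in zip(priors, likelihoods)]
    print(p_a, posteriors)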

11. Chebyshev’s Theorem: Let X be a random variable with mean µ and standard
    deviation σ. Then

        P(|X − µ| < kσ) ≥ 1 − 1/k², for any k > 0.

    (The probability is at least 1 − 1/k² that X will take on a value within k standard
    deviations of the mean.)

12. Markov’s Inequality: Let X be a random variable with probability density f(x), where
    f(x) = 0 for x < 0. If µ is the mean, then for any a > 0, P(X ≥ a) ≤ µ/a.
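
Both inequalities are easy to check empirically; a minimal sketch, assuming numpy is
available, on a hypothetical exponential population with µ = σ = 2:

    import numpy as np

    rng = np.random.default_rng(0)
    mu = sigma = 2.0
    x = rng.exponential(scale=mu, size=100_000)   # nonnegative, mean 2, sd 2

    a, k = 5.0, 1.5
    print(np.mean(x >= a), mu / a)                            # Markov: left <= right
    print(np.mean(np.abs(x - mu) < k * sigma), 1 - 1 / k**2)  # Chebyshev: left >= right
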
Fact Sheet 1
MA204 Statistics 2008
Department of Mathematics, IIT Madras

1. Discrete Uniform Distribution:
   pdf is f(x) = 1/k for x = x1, · · · , xk, distinct.
   µ = (Σ xi)/k, σ² = Σ (xi − µ)²/k. [For xi = i, µ = (k + 1)/2, σ² = (k² − 1)/12.]

2. Bernoulli Distribution: Parameter θ = probability of success.
   pdf is f(x; θ) = θ^x (1 − θ)^{1−x} for x = 0, 1. (f(0; θ) = 1 − θ and f(1; θ) = θ.)
   µ = θ, σ² = θ(1 − θ).

3. Binomial Distribution: Parameters n and θ = probability of success in a trial. X is
   the number of successes in n trials.
   pdf is b(x; n, θ) = (n choose x) θ^x (1 − θ)^{n−x}, for x = 0, 1, 2, · · · , n.
   µ = nθ, σ² = nθ(1 − θ).
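
A quick check of these formulas against scipy.stats, for hypothetical parameters
n = 20, θ = 0.3:

    from scipy.stats import binom

    n, theta = 20, 0.3
    print(binom.pmf(5, n, theta))                        # b(5; 20, 0.3)
    print(binom.mean(n, theta), n * theta)               # both 6.0
    print(binom.var(n, theta), n * theta * (1 - theta))  # both 4.2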

4. Poisson Distribution: Parameter λ.
   pdf is p(x; λ) = λ^x e^{−λ}/x!, for x = 0, 1, 2, · · ·
   µ = λ, σ² = λ.

5. Uniform Distribution: Parameters α < β.
   Density is f(x) = 1/(β − α) for α < x < β, and 0 elsewhere.
   µ = (α + β)/2, σ² = (β − α)²/12.

6. Gamma Distribution: Parameters α > 0, β > 0.
   Density is f(x) = x^{α−1} e^{−x/β} / (β^α Γ(α)) for x > 0, and 0 elsewhere.
   µ = αβ, σ² = αβ².

7. Exponential Distribution: Parameter θ > 0.
   Density is f(x) = (1/θ) e^{−x/θ} for x > 0, and 0 elsewhere.
   µ = θ, σ² = θ².

8. Chi-square Distribution: Parameter ν > 0.
   Density is f(x) = x^{(ν/2)−1} e^{−x/2} / (2^{ν/2} Γ(ν/2)) for x > 0, and 0 elsewhere.
   µ = ν, σ² = 2ν. (ν, usually an integer, is called the degrees of freedom.)

9. Beta Distribution: Parameters α > 0, β > 0.
   Density is f(x) = {Γ(α + β)/(Γ(α)Γ(β))} x^{α−1} (1 − x)^{β−1} for 0 < x < 1, and 0
   elsewhere.
   µ = α/(α + β), σ² = αβ/{(α + β)²(α + β + 1)}.
10. Normal Distribution: Parameters µ and σ > 0.
    Density is n(x; µ, σ) = (1/(σ√(2π))) e^{−(1/2)((x−µ)/σ)²}, for −∞ < x < ∞.
    Mean = µ, Variance = σ².
    When µ = 0, σ² = 1, it is called the standard normal distribution.
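
The densities in items 5-10 map directly onto scipy.stats; note that scipy’s scale
parameter plays the role of β (gamma) and θ (exponential). A sketch with hypothetical
parameter values, printing (mean, variance) pairs to compare with the formulas above:

    from scipy.stats import gamma, expon, chi2, beta, norm

    print(gamma(a=3.0, scale=2.0).stats())   # (6, 12):     alpha*beta, alpha*beta^2
    print(expon(scale=1.5).stats())          # (1.5, 2.25): theta, theta^2
    print(chi2(df=4).stats())                # (4, 8):      nu, 2*nu
    print(beta(2.0, 3.0).stats())            # (0.4, 0.04)
    print(norm(loc=1.0, scale=2.0).stats())  # (1, 4):      mu, sigma^2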

11. Theorem-1: Let X have a binomial distribution with the parameters n and θ and let
    Y = X/n. Then E(Y) = θ and σY² = θ(1 − θ)/n.
12. Theorem-2: If X has a normal distribution with mean µ and standard deviation σ,
    then Z = (X − µ)/σ has the standard normal distribution.
13. Theorem-3: If X is a random variable having a binomial distribution with the parameters
    n and θ, then the moment generating function of Z = (X − nθ)/√(nθ(1 − θ)) approaches
    the moment generating function of the standard normal distribution, as n → ∞.

Fact Sheet 2
MA204 Statistics 2008
Department of Mathematics, IIT Madras

• If X1, . . . , Xn constitute a random sample, i.e., they are independent and identically
  distributed, then the sample mean is X̄ = (1/n) Σ_{i=1}^{n} Xi, and the sample variance
  is S² = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)².
  The variance of the sample is Var = σX² = (1/n) Σ_{i=1}^{n} (Xi − X̄)².
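
In code the two divisors correspond to numpy’s ddof argument; a sketch on hypothetical
data:

    import numpy as np

    x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # hypothetical sample
    print(x.mean())       # sample mean: 5.0
    print(x.var(ddof=1))  # sample variance S^2 (divisor n-1): 32/7
    print(x.var(ddof=0))  # variance of the sample (divisor n): 4.0
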
• From a finite population of size N, if a sample X1, . . . , Xn is drawn where Xi is the i-th
  one drawn, the joint probability density of these random variables is given by
      f(x1, . . . , xn) = 1/{N(N − 1) · · · (N − n + 1)}.
  We say that X1, . . . , Xn constitute a random sample from such a population.
• When a normal distribution is used as an approximation to a binomial distribution,
each nonnegative integer k is represented as the interval [k − 0.5, k + 0.5]. That is, the
binomial probability P (X = k) is computed as the corresponding normal probability
P (k − 0.5 ≤ X ≤ k + 0.5). (Continuity Correction)
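
A numeric sketch of the continuity correction, comparing the exact binomial probability
with the corrected normal approximation for hypothetical n = 40, θ = 0.5, k = 22:

    from math import sqrt
    from scipy.stats import binom, norm

    n, theta, k = 40, 0.5, 22
    mu, sigma = n * theta, sqrt(n * theta * (1 - theta))
    exact = binom.pmf(k, n, theta)
    approx = norm.cdf(k + 0.5, mu, sigma) - norm.cdf(k - 0.5, mu, sigma)
    print(exact, approx)  # the two values agree closely
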
• The t-distribution with ν degrees of freedom has the density:

      f(t) = {Γ((ν + 1)/2) / (√(πν) Γ(ν/2))} (1 + t²/ν)^{−(ν+1)/2} for t ∈ R.

• The F-distribution with degrees of freedom ν1 and ν2 has the density:

      f(x) = {Γ((ν1 + ν2)/2) / (Γ(ν1/2)Γ(ν2/2))} (ν1/ν2)^{ν1/2} x^{(ν1/2)−1} (1 + (ν1/ν2)x)^{−(ν1+ν2)/2}

  for x > 0, and 0 otherwise.

Theorem 1 If X1, . . . , Xn constitute a random sample drawn from an infinite population with
mean µ and variance σ², then µX̄ = E(X̄) = µ and Var(X̄) = σX̄² = σ²/n.

Theorem 2 : Law of Large Numbers For any positive constant c, the probability that X̄
will take on a value between µ − c and µ + c is at least 1 − σ²/(nc²).
Theorem 3 : Central Limit Theorem If X1, . . . , Xn constitute a random sample from
an infinite population with mean µ and variance σ², then as n → ∞, the limiting distribution
of Z = (X̄ − µ)/(σ/√n) is the standard normal distribution.
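
A simulation sketch of the Central Limit Theorem, with hypothetical sample sizes:
standardized means of samples from a skewed (exponential) population behave like a
standard normal variable.

    import numpy as np

    rng = np.random.default_rng(1)
    mu = sigma = 1.0                 # Exp(1) population: mean 1, sd 1
    n, reps = 50, 10_000
    xbar = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
    z = (xbar - mu) / (sigma / np.sqrt(n))
    print(z.mean(), z.std())         # approximately 0 and 1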

Theorem 4 If X1, . . . , Xn constitute a random sample from a normal population with mean
µ and variance σ², then X̄ has a normal distribution with mean µ and variance σ²/n.
Theorem 5 If X̄ is the mean of a random sample of size n drawn from a population of size N
with mean µ and variance σ², then E(X̄) = µ and Var(X̄) = (σ²/n) · (N − n)/(N − 1).
Theorem 6 If X1, . . . , Xn are independent random variables each having the standard normal
distribution, then Y = Σ_{i=1}^{n} Xi² has the chi-square distribution with ν = n degrees of
freedom.
Theorem 7 If X1, . . . , Xn are independent random variables having chi-square distributions
with degrees of freedom ν1, . . . , νn, respectively, then Y = Σ_{i=1}^{n} Xi has the chi-square
distribution with ν1 + · · · + νn degrees of freedom.
Theorem 8 If X1 and X2 are independent random variables, X1 has a chi-square distribution
with ν1 degrees of freedom, and X1 + X2 has a chi-square distribution with ν1 + ν2 (> ν1)
degrees of freedom, then X2 has a chi-square distribution with ν2 degrees of freedom.
Theorem 9 If X̄ and s² are the sample mean and the sample variance of a random sample
of size n drawn from a normal population with mean µ and variance σ², then X̄ and s² are
independent and the random variable Y = (n − 1)s²/σ² has a chi-square distribution with
n − 1 degrees of freedom.
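
A simulation sketch of Theorem 9, with hypothetical population parameters: the values
of (n − 1)s²/σ² should match the chi-square distribution with n − 1 degrees of freedom.

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(2)
    mu, sigma, n, reps = 5.0, 2.0, 10, 20_000
    samples = rng.normal(mu, sigma, size=(reps, n))
    y = (n - 1) * samples.var(axis=1, ddof=1) / sigma**2
    print(y.mean(), chi2.mean(n - 1))  # both near n - 1 = 9
    print(y.var(), chi2.var(n - 1))    # both near 2(n - 1) = 18
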
Theorem 10 If Y, Z are independent random variables, where Y has a chi-square distribution
with ν degrees of freedom and Z has the standard normal distribution, then t = Z/√(Y/ν)
has the t-distribution with ν degrees of freedom.
Theorem 11 If X̄ and s² are the sample mean and the sample variance of a random sample
of size n drawn from a normal population with mean µ and variance σ², then t = (X̄ − µ)/(s/√n)
has the t-distribution with n − 1 degrees of freedom.
Theorem 12 If U, V are independent random variables having chi-square distributions with
degrees of freedom ν1, ν2, respectively, then F = (U/ν1)/(V/ν2) has the F-distribution with
degrees of freedom ν1 and ν2.
Theorem 13 If s1² and s2² are the sample variances of random samples of sizes n1 and n2,
respectively, drawn from normal populations with variances σ1² and σ2², respectively, then
F = (s1²/σ1²)/(s2²/σ2²) = (σ2² s1²)/(σ1² s2²) has the F-distribution with degrees of freedom
n1 − 1 and n2 − 1.

Fact Sheet 3
MA204 Statistics 2008
Department of Mathematics, IIT Madras

• zα/2 is such that the integral of the standard normal density from zα/2 to ∞ is equal to
  α/2.

• tα/2, n−1 is such that if T is a random variable having a t-distribution with n − 1 degrees
  of freedom, then P(T ≥ tα/2, n−1) = α/2.

• χ²α/2, n−1 is such that if X is a random variable having a χ² distribution with n − 1
  degrees of freedom, then P(X ≥ χ²α/2, n−1) = α/2.

• fα/2, n1−1, n2−1 is such that if X is a random variable having the F-distribution with
  n1 − 1 and n2 − 1 degrees of freedom, then P(X ≥ fα/2, n1−1, n2−1) = α/2. We also have
  fα,m,n · f1−α,n,m = 1.
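
These tail points are the ppf (inverse cdf) values in scipy.stats; a sketch with
hypothetical α and sample sizes, including a check of the F identity:

    from scipy.stats import norm, t, chi2, f

    alpha, n, n1, n2 = 0.05, 15, 8, 12
    print(norm.ppf(1 - alpha / 2))           # z_{alpha/2}
    print(t.ppf(1 - alpha / 2, n - 1))       # t_{alpha/2, n-1}
    print(chi2.ppf(1 - alpha / 2, n - 1))    # chi^2_{alpha/2, n-1}
    fa = f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)      # f_{alpha/2, n1-1, n2-1}
    print(fa * f.ppf(alpha / 2, n2 - 1, n1 - 1))   # f_{a,m,n} * f_{1-a,n,m} = 1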

Theorem 1 Let X̄ be the mean of a random sample of size n from a normal population with
the known variance σ². If X̄ is used as an estimator of the mean of the population, then the
probability is 1 − α that the error will be less than zα/2 · σ/√n.
Theorem 1 is restated as:
Theorem 2 If x̄ is the value of the mean of a random sample of size n from a normal
population with the known variance σ², then a (1 − α)100% confidence interval for the mean
of the population is given by

    x̄ − zα/2 · σ/√n < µ < x̄ + zα/2 · σ/√n.
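
A minimal sketch of Theorem 2 with hypothetical numbers (x̄ = 24.3, σ = 2, n = 36,
95% confidence):

    from math import sqrt
    from scipy.stats import norm

    xbar, sigma, n, alpha = 24.3, 2.0, 36, 0.05
    half = norm.ppf(1 - alpha / 2) * sigma / sqrt(n)
    print(xbar - half, xbar + half)  # the (1 - alpha)100% interval for mu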

If a one-sided confidence interval is computed, then Theorem 2 takes the form:


Theorem 3 If x̄ is the value of the mean of a random sample of size n from a normal population
with the known variance σ², then a one-sided (1 − α)100% confidence interval for the mean of
the population is given by

    µ < x̄ + zα · σ/√n.

Theorem 4 If x̄ and s² are the values of the mean and the sample variance of a random
sample of size n from a normal population, then a (1 − α)100% confidence interval for the mean
of the population is given by

    x̄ − tα/2, n−1 · s/√n < µ < x̄ + tα/2, n−1 · s/√n.
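
A sketch of Theorem 4 from hypothetical raw data, for the usual case where σ is unknown:

    import numpy as np
    from scipy.stats import t

    x = np.array([10.2, 9.8, 10.5, 10.1, 9.9, 10.4])  # hypothetical sample
    n, alpha = len(x), 0.05
    half = t.ppf(1 - alpha / 2, n - 1) * x.std(ddof=1) / np.sqrt(n)
    print(x.mean() - half, x.mean() + half)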

Theorem 5 If x̄1 and x̄2 are the values of the means of independent random samples of sizes
n1 and n2 from normal populations with the known variances σ1² and σ2², respectively, then a
(1 − α)100% confidence interval for the difference between the two population means is given
by

    x̄1 − x̄2 − zα/2 · √(σ1²/n1 + σ2²/n2) < µ1 − µ2 < x̄1 − x̄2 + zα/2 · √(σ1²/n1 + σ2²/n2).

Theorem 6 If x̄1, x̄2, s1², s2² are the values of the means and sample variances of independent
random samples of sizes n1 and n2 from normal populations with equal (unknown) variances,
then a (1 − α)100% confidence interval for the difference between the two population means is
given by

    x̄1 − x̄2 − tα/2, n1+n2−2 · sp √(1/n1 + 1/n2) < µ1 − µ2 < x̄1 − x̄2 + tα/2, n1+n2−2 · sp √(1/n1 + 1/n2),

where sp² = {(n1 − 1)s1² + (n2 − 1)s2²}/(n1 + n2 − 2).
Theorem 7 Let X be a binomial random variable with the parameters n and p. For large n,
with p* = x/n, an approximate (1 − α)100% confidence interval for p is given by

    p* − zα/2 · √(p*(1 − p*)/n) < p < p* + zα/2 · √(p*(1 − p*)/n).

That is, if p* is used as an estimate of p, then with (1 − α)100% confidence we can assert that
the error is less than zα/2 · √(p*(1 − p*)/n).
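
A sketch of Theorem 7 for hypothetical counts, x = 136 successes in n = 400 trials:

    from math import sqrt
    from scipy.stats import norm

    x, n, alpha = 136, 400, 0.05
    p = x / n                                       # p* = 0.34
    half = norm.ppf(1 - alpha / 2) * sqrt(p * (1 - p) / n)
    print(p - half, p + half)
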
Theorem 8 Let X1 and X2 be binomial random variables with the parameters n1, p1 and
n2, p2, respectively. Suppose n1, n2 are large. With p1* = x1/n1 and p2* = x2/n2, an
approximate (1 − α)100% confidence interval for p1 − p2 is given by

    p1* − p2* − zα/2 · √(p1*(1 − p1*)/n1 + p2*(1 − p2*)/n2) < p1 − p2
        < p1* − p2* + zα/2 · √(p1*(1 − p1*)/n1 + p2*(1 − p2*)/n2).

Theorem 9 Let s² be the value of the sample variance of a random sample of size n from a
normal population. A (1 − α)100% confidence interval for σ² is given by

    (n − 1)s²/χ²α/2, n−1 < σ² < (n − 1)s²/χ²1−α/2, n−1.

Theorem 10 Let s1² and s2² be the values of the sample variances of independent random
samples of sizes n1 and n2 from normal populations. A (1 − α)100% confidence interval for
σ1²/σ2² is given by

    (s1²/s2²) · (1/fα/2, n1−1, n2−1) < σ1²/σ2² < (s1²/s2²) · fα/2, n2−1, n1−1.

Fact Sheet 4
MA204 Statistics 2008
Department of Mathematics, IIT Madras

The likelihood ratio technique yields the following critical regions C for a given level of
significance α:

1. A random sample of size n is drawn from a normal population with known variance σ².
   H0: µ = µ0; H1¹: µ > µ0; H1²: µ < µ0; H1³: µ ≠ µ0. Take z = (x̄ − µ0)/(σ/√n). Then
   C1: z ≥ zα; C2: z ≤ −zα; C3: |z| ≥ zα/2.
   Note: If n ≥ 30, this can be used for any population, and also s² can be used in place of
   σ² if σ² is unknown.
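
A sketch of this test with hypothetical numbers, for the two-sided alternative H1³:

    from math import sqrt
    from scipy.stats import norm

    xbar, mu0, sigma, n, alpha = 21.6, 20.0, 4.0, 36, 0.05
    z = (xbar - mu0) / (sigma / sqrt(n))  # z = 2.4
    z_crit = norm.ppf(1 - alpha / 2)      # z_{alpha/2} ~ 1.96
    print(abs(z) >= z_crit)               # True here: reject H0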

2. A random sample of size n < 30 is drawn from a normal population with unknown
   variance.
   H0: µ = µ0; H1¹: µ > µ0; H1²: µ < µ0; H1³: µ ≠ µ0. Take t = (x̄ − µ0)/(s/√n). Then
   C1: t ≥ tα, n−1; C2: t ≤ −tα, n−1; C3: |t| ≥ tα/2, n−1.

3. Two independent random samples of sizes n1, n2 are drawn from normal populations with
   known variances σ1², σ2², respectively.
   H0: µ1 − µ2 = δ; H1¹: µ1 − µ2 > δ; H1²: µ1 − µ2 < δ; H1³: µ1 − µ2 ≠ δ.
   Take z = (x̄1 − x̄2 − δ)/√(σ1²/n1 + σ2²/n2). Then C1: z ≥ zα; C2: z ≤ −zα; C3: |z| ≥ zα/2.
   Note: If n1 ≥ 30, n2 ≥ 30, this can be used for any populations, and also s1², s2² can be
   used in place of σ1², σ2² if the latter are unknown.

4. Two independent random samples of sizes n1 < 30, n2 < 30 are drawn from normal
   populations with the same unknown variance σ².
   H0: µ1 − µ2 = δ; H1¹: µ1 − µ2 > δ; H1²: µ1 − µ2 < δ; H1³: µ1 − µ2 ≠ δ.
   Take sp² = {(n1 − 1)s1² + (n2 − 1)s2²}/(n1 + n2 − 2) and t = (x̄1 − x̄2 − δ)/(sp √(1/n1 + 1/n2)).
   Then C1: t ≥ tα, n1+n2−2; C2: t ≤ −tα, n1+n2−2; C3: |t| ≥ tα/2, n1+n2−2.
   Note: If n1 = n2, then sp² = (s1² + s2²)/2.

5. Given two independent random samples of sizes n1 and n2 from two normal populations.
   H0: σ1² = σ2²; H1¹: σ1² > σ2²; H1²: σ1² < σ2²; H1³: σ1² ≠ σ2². Then
   C1: s1²/s2² ≥ fα, n1−1, n2−1; C2: s2²/s1² ≥ fα, n2−1, n1−1;
   C3: s1²/s2² ≥ fα/2, n1−1, n2−1 for s1² ≥ s2², and s2²/s1² ≥ fα/2, n2−1, n1−1 for s1² < s2².

6. A random sample of size n is drawn from a normal population.
   H0: σ² = σ0²; H1¹: σ² > σ0²; H1²: σ² < σ0²; H1³: σ² ≠ σ0². Take χ² = (n − 1)s²/σ0². Then
   C1: χ² ≥ χ²α, n−1; C2: χ² ≤ χ²1−α, n−1; C3: χ² ≥ χ²α/2, n−1 or χ² ≤ χ²1−α/2, n−1.

7. A random sample of size n shows x successes.
   H0: θ = θ0; H1¹: θ > θ0; H1²: θ < θ0; H1³: θ ≠ θ0.
   Let kα be the smallest integer for which Σ_{y=kα}^{n} b(y; n, θ0) ≤ α, and k′α be the
   largest integer for which Σ_{y=0}^{k′α} b(y; n, θ0) ≤ α. Then
   C1: x ≥ kα; C2: x ≤ k′α; C3: x ≥ kα/2 or x ≤ k′α/2.
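
The critical values kα and k′α can be found by scanning the binomial tail sums; a sketch
for hypothetical n = 20, θ0 = 0.5, α = 0.05:

    from scipy.stats import binom

    n, theta0, alpha = 20, 0.5, 0.05
    # smallest k with P(X >= k) <= alpha; note binom.sf(k - 1) = P(X >= k)
    k_alpha = min(k for k in range(n + 1) if binom.sf(k - 1, n, theta0) <= alpha)
    # largest k with P(X <= k) <= alpha
    k_alpha_prime = max(k for k in range(n + 1) if binom.cdf(k, n, theta0) <= alpha)
    print(k_alpha, k_alpha_prime)  # 15 and 5 here; C1 rejects when x >= k_alpha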

8. A random sample of size n > 20 shows x successes.
   H0: θ = θ0; H1¹: θ > θ0; H1²: θ < θ0; H1³: θ ≠ θ0.
   Take z = ((x ± 0.5) − nθ0)/√(nθ0(1 − θ0)); choose + if x < nθ0 and − if x > nθ0.
   Then C1: z ≥ zα; C2: z ≤ −zα; C3: |z| ≥ zα/2.

9. Let xi be the observed value of a binomial random variable Xi with parameters ni and
   θi, respectively, for i = 1, 2, . . . , k. H0: θ1 = θ2 = · · · = θk; H1: θi ≠ θj for some i, j.
   Take θ̂ = (x1 + x2 + · · · + xk)/(n1 + n2 + · · · + nk) and
   χ² = Σ_{i=1}^{k} (xi − ni θ̂)²/{ni θ̂(1 − θ̂)} = Σ_{i=1}^{k} Σ_{j=1}^{2} (fij − eij)²/eij.
   Then C is χ² ≥ χ²α, k−1.
   Note: If H0 is θ1 = θ2 = · · · = θk = θ0 and H1 is θi ≠ θ0 for some i, then C is χ² ≥ χ²α, k.

10. In (9), if an r × c table is taken instead of the k × 2 table, where we denote by θi· the
    probability that an item falls into the i-th row, by θ·j the probability that an item falls
    into the j-th column, and by θij the probability that an item falls into the i-th row and
    j-th column, and H0 as θij = θi· · θ·j, H1 as θij ≠ θi· · θ·j, then C is χ² ≥ χ²α, (r−1)(c−1).
    Here, χ² is computed by χ² = Σ_{i=1}^{r} Σ_{j=1}^{c} (fij − eij)²/eij, with
    fij the observed frequency for the cell in the i-th row and j-th column,
    fi· the sum of all fij in the i-th row, the row total,
    f·j the sum of all fij in the j-th column, the column total,
    f the sum of all entries in the table, i.e., Σi Σj fij, and
    θ̂i· = fi·/f, θ̂·j = f·j/f, eij = θ̂i· · θ̂·j · f = fi· · f·j/f.
Remark: Also, (9) is used for testing ‘goodness of fit’ when we expect that the observed
data follow some particular distribution. There, we interpret the fi as the observed
frequencies, and the ei as the expected frequencies obtained by the use of that particular
distribution.
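
A sketch of the r × c computation on a hypothetical 2 × 3 table, checked against scipy’s
chi2_contingency (which computes the same statistic when Yates’ correction is off):

    import numpy as np
    from scipy.stats import chi2, chi2_contingency

    f_obs = np.array([[30, 20, 10],
                      [20, 30, 40]])  # hypothetical observed frequencies f_ij
    f = f_obs.sum()
    e = np.outer(f_obs.sum(axis=1), f_obs.sum(axis=0)) / f  # e_ij = f_i. * f_.j / f
    stat = ((f_obs - e) ** 2 / e).sum()
    r, c = f_obs.shape
    print(stat, chi2.ppf(0.95, (r - 1) * (c - 1)))       # reject H0 if stat >= critical
    print(chi2_contingency(f_obs, correction=False)[0])  # same chi-square statistic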
