
Additional Study Material

Discrete and continuous probability distributions
Normal distribution
Using normal distribution
Sampling distributions
Sampling distribution for sample mean
Variable has discrete number of possible
outcomes
Probability distribution has probability for
each possible outcome
Expected proportion of cases with that value in the
long run
Probabilities for all outcomes must add up to 1
Number of Heads Probability
0 0.0312
1 0.1562
2 0.3125
3 0.3125
4 0.1562
5 0.0312
Total 1.0000
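The table above can be reproduced with a short sketch. It assumes (the slides do not say so explicitly) that the variable is the number of heads in five flips of a fair coin, so each probability is a binomial coefficient divided by 32. The names `n_flips` and `probs` are illustrative only.

```python
from math import comb

# Assumption: number of heads in 5 flips of a fair coin (binomial, p = 0.5).
n_flips = 5
probs = [comb(n_flips, k) / 2**n_flips for k in range(n_flips + 1)]

for k, p in enumerate(probs):
    print(f"{k}  {p:.4f}")
print(f"Total  {sum(probs):.4f}")
```

The exact values are 1/32, 5/32, and 10/32; the table shows them to four decimal places.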
[Figure: bar chart of Probability (0.00 to 0.35) by Number of heads (0 to 5)]
Expected value (mean) of a probability
distribution is the sum over all possible
values of the value times the probability
Number of Heads Probability yP(y)
0 0.0312 0.0000
1 0.1562 0.1562
2 0.3125 0.6250
3 0.3125 0.9375
4 0.1562 0.6248
5 0.0312 0.1560
Total 1.0000 2.5000
$\mu = \sum y \, P(y) = 2.5$
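As a check on the expected value calculation, here is a minimal sketch that sums value times probability. It again assumes five flips of a fair coin, so it uses exact probabilities rather than the four-decimal table entries.

```python
from math import comb

# Assumption: probabilities are binomial for 5 fair coin flips.
probs = [comb(5, y) / 2**5 for y in range(6)]

# Expected value: sum over all outcomes of y * P(y).
expected_value = sum(y * p for y, p in enumerate(probs))
print(expected_value)
```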
For continuous variables taking on all
possible values
Probability distribution is a smooth curve
with the area under the curve for an interval
representing the probability of a value being
in that interval


Probability
Bell curve
Gaussian distribution
[Figure: normal (bell) curve; x-axis from -3.0 to 3.0, y-axis from 0.00 to 0.45]
Height and weight
IQs and many other test scores
Measurement errors
Sample statistics
Result of processes when values are affected
by a large number of small random effects
Height
IQ
Measurement error
Drawing a sample from a population
Can have varying means, displacing to left or
right
Can have varying standard deviations, making
steeper or flatter
[Figure: normal curves with Std dev=0.5, Std dev=1, and Std dev=2; x-axis from -3.0 to 3.0]
Formula for the normal distribution

$f(y) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(y - \mu)^2 / (2\sigma^2)}$

This implies that normal distribution is fully
specified given the mean and standard
deviation
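To make the point that the mean and standard deviation are the only inputs, the normal density can be written as a small function. This is a sketch; `normal_pdf` is an illustrative name, and the result is compared with Python's standard-library `statistics.NormalDist`.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(y, mu, sigma):
    """Height of the normal curve at y for mean mu and standard deviation sigma."""
    return exp(-((y - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# Peak of the standard normal curve (mean 0, standard deviation 1).
peak = normal_pdf(0.0, 0.0, 1.0)
print(round(peak, 4), round(NormalDist(0, 1).pdf(0.0), 4))
```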
Symmetrical
Continuous
Extends to infinity, never reaching zero
Total area under curve is 1.0
About 34 percent of area falls between mean
and one standard deviation above
So about 68 percent is within +/- one
standard deviation of mean
About 95 percent is within +/- two standard
deviations of mean
About 99.7 percent is within +/- three
standard deviations of mean
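The 68, 95, and 99.7 percent figures can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the dictionary name `within` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Area within +/- k standard deviations of the mean, for k = 1, 2, 3.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, area in within.items():
    print(f"within +/- {k} SD: {area:.4f}")
```

Two standard deviations cover slightly more than 95 percent; the exact 95 percent interval is about 1.96 standard deviations either side of the mean.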
Normal distribution with mean of zero and
standard deviation of one
Since mean and standard deviation define any
normal distribution, the standard normal
distribution can be used for any normally
distributed variable by converting mean to zero
and standard deviation to one (z scores)
z score is the number of standard deviations
a value falls from the mean

$z = \frac{y - \mu}{\sigma}$

Converts a value from any normally
distributed variable to a value for the
standard normal distribution
Standard normal distribution is a probability
distribution for normally-distributed variables
with mean of zero and standard deviation of
one
Area under curve between two values
corresponds to probability of value falling
between those values
Have tables of areas under standard normal
distribution

[Figure: standard normal curve with z = 1.55 marked; x-axis from -3.0 to 3.0]
Second Decimal Place of z
z . . . 0.04 0.05 0.06 . . .
. . . . . . . . . . . . . . . . . .
1.4 . . . 0.0749 0.0735 0.0722 . . .
1.5 . . . 0.0618 0.0606 0.0594 . . .
1.6 . . . 0.0505 0.0495 0.0485 . . .
. . . . . . . . . . . . . . . . . .

[Figure: standard normal curve; area in upper tail above z = 1.55]
P = .0606

[Figure: standard normal curve; areas in both tails, below z = -1.55 and above z = 1.55]
P = .0606 + .0606 = .1212

[Figure: standard normal curve; area between the mean and z = 1.55]
P = .5 - .0606 = .4394

[Figure: standard normal curve; area below z = 1.55]
P = 1 - .0606 = .9394
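The four areas for z = 1.55 can also be computed directly instead of read from a table. This is a sketch using the standard-library `statistics.NormalDist`; the variable names are illustrative. Small differences from the slide values come from rounding the table entry before adding or subtracting.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
z = 1.55

upper_tail = 1 - std_normal.cdf(z)       # area above z
both_tails = 2 * upper_tail              # below -z plus above z (symmetry)
mean_to_z = std_normal.cdf(z) - 0.5      # area between the mean and z
below_z = std_normal.cdf(z)              # area below z

print(f"{upper_tail:.4f} {both_tails:.4f} {mean_to_z:.4f} {below_z:.4f}")
```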

Value Frequency
20 0
25 3
30 5
35 6
40 8
45 13
50 5
55 6
60 3
65 1
More 0
[Figure: Histogram of Frequency (0 to 14) by Value (20 to 65, More)]
$\mu = 40.68 \qquad \sigma = 9.88$

$z = \frac{45 - 40.68}{9.88} = .4372$
[Figure: standard normal curve; area in upper tail above z = .4372]
P = .3300

$z = \frac{35 - 40.68}{9.88} = -.5749$

$z = \frac{45 - 40.68}{9.88} = .4372$
[Figure: standard normal curve; area between z = -.5749 and z = .4372]
P = .5 - .3300 = .1700
P = .5 - .2843 = .2157
P = .2157 + .1700 = .3857
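The same probability, that a value falls between 35 and 45 when the mean is 40.68 and the standard deviation is 9.88, can be sketched in a few lines. The code converts to z scores as in the slides and then takes the difference of two cumulative areas; the result differs from .3857 only in the third decimal because the table lookup rounds z to two places.

```python
from statistics import NormalDist

mu, sigma = 40.68, 9.88  # values given in the example

def z_score(y):
    """Number of standard deviations y falls from the mean."""
    return (y - mu) / sigma

z_low, z_high = z_score(35), z_score(45)
std_normal = NormalDist()
p_between = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(f"z_low={z_low:.4f} z_high={z_high:.4f} P={p_between:.4f}")
```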
Have probability for some unknown z value
Find probability for tail
Look up probability in body of table
Read off z value from row and column
headings
Convert z value to value for variable
$z = \frac{y - \mu}{\sigma}$

$y = z\sigma + \mu$
Probability above value is 0.20
z score for probability of 0.20 is 0.84
Use z score to compute value

[Figure: standard normal curve; upper-tail Area = 0.20 above z = 0.84]
$y = z\sigma + \mu = (0.84)(9.88) + 40.68 = 48.98$
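Working backward from a probability to a value can be sketched with the inverse of the cumulative distribution, which plays the role of reading z from the body of the table. `inv_cdf` is part of the standard-library `statistics.NormalDist`; the other names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 40.68, 9.88   # values from the example
upper_tail_area = 0.20    # probability above the unknown value

# inv_cdf takes the area BELOW the value, so use 1 - upper tail.
z = NormalDist().inv_cdf(1 - upper_tail_area)
y = z * sigma + mu

print(f"z={z:.2f} y={y:.2f}")
```

The unrounded z is about 0.8416, so the computed value is slightly above the 48.98 obtained with z = 0.84.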
When the SAT was first developed, the
scoring was standardized so that the scores
had a mean of 500 and a standard deviation
of 100

[Figure: normal curve of test scores, x-axis from 200 to 800; P = .75 below the cutoff, P = .25 in the upper tail]
Area under curve in upper tail above the 75th percentile is 0.25
Look up 0.25 in body of table to find z score,
which is 0.67
Convert z score to test score
$y = z\sigma + \mu = (0.67)(100) + 500 = 567$

[Figure: normal curve of test scores, x-axis from 200 to 800; area 0.30 in the lower tail]
Area under curve in lower tail below the 30th percentile
is 0.30
Look up 0.30 in body of table to find z score,
which is 0.52; the value is below the mean, so z = -0.52
Convert z score to test score
$y = z\sigma + \mu = (-0.52)(100) + 500 = 448$
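Both percentile calculations for the test-score example can be sketched with a normal distribution of mean 500 and standard deviation 100. The code uses the standard-library `statistics.NormalDist`; `score_at_percentile` is an illustrative helper, not something defined in the slides.

```python
from statistics import NormalDist

scores = NormalDist(mu=500, sigma=100)  # scaling described in the text

def score_at_percentile(p):
    """Score with proportion p of the distribution below it."""
    return scores.inv_cdf(p)

p75 = score_at_percentile(0.75)
p30 = score_at_percentile(0.30)
print(f"75th percentile: {p75:.0f}, 30th percentile: {p30:.0f}")
```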
Draw samples from a large population (or any
population with replacement)
Calculate means for each sample: $\bar{y}_1, \bar{y}_2, \bar{y}_3, \ldots$
Means will, of course, vary
Then we can look at the distribution of the
sample means: the sampling distribution
2500 random samples of size 75 selected, with
replacement, from actual data from survey of
library patrons
For each sample, calculated mean of variable
SPEND, "How much time did you spend in the
library?"
Create histogram showing distribution of
sample means
[Figure: Sampling distribution of the mean SPEND; histogram of Frequency (0 to 450) by sample mean (13 to 79); N=75 Samples=2500 Mean=41.34 SE=4.58]
Would expect mean of sample means to be
close to mean of population
For our example, mean of population: $\mu = 41.38$
Mean of sample means: 41.34
Standard deviation of sample means will be
less than standard deviation of population
Will depend on standard deviation of
population and sample size
Called standard error of the mean

$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}$
Theoretical standard error of the mean
(includes correction for sampling from a finite
population): $\sigma_{\bar{y}} = 4.06$
Standard error (standard deviation) of sample
means in example: $\sigma_{\bar{y}} = 4.58$
Standard error varies with inverse of square
root of sample size n
Therefore, increasing sample size by a factor of 4
only cuts the standard error in half
Diminishing returns to increases in sample
size
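The diminishing returns can be seen with a short sketch of the standard error formula, using the population standard deviation of 40.22 reported for SPEND. It applies the plain formula (standard deviation divided by the square root of n) with no finite population correction, so the numbers will not match the corrected theoretical values quoted in the slides. `standard_error` is an illustrative name.

```python
from math import sqrt

sigma = 40.22  # population SD of SPEND reported in the slides

def standard_error(n):
    """Standard error of the mean, without finite population correction."""
    return sigma / sqrt(n)

for n in (25, 100, 400):
    print(f"n={n:4d}  SE={standard_error(n):.2f}")

ratio = standard_error(100) / standard_error(25)  # 4x the sample size
print(ratio)
```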
If the distribution of the variable in the
population is a normal distribution
Then the distribution of the sample means
will be a normal distribution
Then we can use normal distribution to make
probability statements about sample means
But what if variable is not normally
distributed?
As sample size n increases, distribution of
sample means will approach a normal
distribution with mean of $\mu$ and standard
deviation of $\sigma / \sqrt{n}$
So if variable is not normally distributed, can
still assume normal distribution of sample
means for larger samples, generally 30 or
more
Distribution of SPEND variable, "How much
time did you spend in the library?", is far from
a normal distribution
But as sample size increases, distribution of
sample means approaches normal
distribution

[Figure: Frequency distribution of SPEND; histogram of Frequency (0 to 120) by SPEND (10 to 190); Mean=41.38 SD=40.22]

[Figure: Sampling distribution of the mean SPEND; histogram of Frequency (0 to 7) by sample mean (13 to 79); N=8 Samples=50 Mean=40.98 SE=14.65]

[Figure: Sampling distribution of the mean SPEND; histogram of Frequency (0 to 45) by sample mean (13 to 79); N=8 Samples=500 Mean=41.69 SE=14.90]

[Figure: Sampling distribution of the mean SPEND; histogram of Frequency (0 to 175) by sample mean (13 to 79); N=8 Samples=2500 Mean=41.25 SE=14.38]

[Figure: Sampling distribution of the mean SPEND, redrawn with Frequency axis from 0 to 450; N=8 Samples=2500 Mean=41.25 SE=14.38]

[Figure: Sampling distribution of the mean SPEND; histogram of Frequency (0 to 450) by sample mean (13 to 79); N=25 Samples=2500 Mean=41.43 SE=8.13]

[Figure: Sampling distribution of the mean SPEND; histogram of Frequency (0 to 450) by sample mean (13 to 79); N=75 Samples=2500 Mean=41.34 SE=4.58]

n     Mean of        Standard Deviation   Theoretical
      Sample Means   of Sample Means      Standard Error
8     41.25          14.38                14.06
25    41.48          8.13                 7.73
75    41.34          4.58                 4.06
Population Mean = 41.38
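The pattern in this table can be imitated with a small simulation. The survey data are not available here, so this sketch substitutes a hypothetical skewed population: an exponential distribution with mean 41.38, whose standard deviation equals its mean. That population is an assumption chosen only because it is far from normal, so the numbers will resemble the table without matching it.

```python
import random
import statistics
from math import sqrt

random.seed(0)
POP_MEAN = 41.38  # population mean from the slides; exponential shape is assumed

def sample_mean(n):
    """Mean of one random sample of size n from the hypothetical population."""
    return statistics.fmean(random.expovariate(1 / POP_MEAN) for _ in range(n))

results = {}
for n in (8, 25, 75):
    means = [sample_mean(n) for _ in range(2500)]
    observed_se = statistics.stdev(means)   # SD of the 2500 sample means
    theoretical_se = POP_MEAN / sqrt(n)     # sigma / sqrt(n), sigma = mean here
    results[n] = (statistics.fmean(means), observed_se, theoretical_se)
    print(f"n={n:2d} mean={results[n][0]:.2f} "
          f"SD of means={observed_se:.2f} theoretical SE={theoretical_se:.2f}")
```

A histogram of `means` for each n would show the skew fading as n grows, which is the behavior the preceding figures illustrate.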
Continuing example, suppose results from
survey on how much time people spend in
library represent actual population values
Mean 41.29 minutes
Standard deviation 40.018 minutes
Then if you took a sample of size 50 from the
population of library users
What is the probability that the sample mean
would be greater than 35 minutes?
$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} = \frac{40.018}{\sqrt{50}} = 5.659$

$z = \frac{\bar{y} - \mu}{\sigma_{\bar{y}}} = \frac{35 - 41.29}{5.659} = -1.112$

$P(z > -1.112) = 1 - 0.134 = 0.866$
[Figure: standard normal curve; area above z = -1.112]
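The probability that a sample mean exceeds 35 minutes can be sketched by treating the sample mean as normally distributed with the population mean and the standard error. The inputs are the values given in the example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 41.29, 40.018, 50  # values given in the example

se = sigma / sqrt(n)              # standard error of the mean
z = (35 - mu) / se                # z score of a sample mean of 35
p_above_35 = 1 - NormalDist().cdf(z)

print(f"SE={se:.3f} z={z:.3f} P={p_above_35:.3f}")
```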
Then if you took a large number of samples of
size 50 from the population of library users
What is the proportion of the sample means
that would be more than 10 minutes below or
above the population mean?

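$\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}} = \frac{40.018}{\sqrt{50}} = 5.659$

$z = \frac{\bar{y} - \mu}{\sigma_{\bar{y}}} = \frac{31.29 - 41.29}{5.659} = -1.767$

$z = \frac{\bar{y} - \mu}{\sigma_{\bar{y}}} = \frac{51.29 - 41.29}{5.659} = 1.767$

$P(z < -1.767 \text{ or } z > 1.767) = 2(0.0384) = 0.0768$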
[Figure: standard normal curve; areas in both tails, below z = -1.767 and above z = 1.767]
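The two-tailed proportion can be sketched the same way: find the z score for a 10-minute deviation and double the upper-tail area. The result is close to 0.0768; the small difference comes from the table lookup rounding z to 1.77.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 41.29, 40.018, 50  # values given in the example
deviation = 10                    # minutes above or below the population mean

se = sigma / sqrt(n)
z = deviation / se
p_two_tails = 2 * (1 - NormalDist().cdf(z))  # symmetry: both tails are equal

print(f"z={z:.3f} P={p_two_tails:.4f}")
```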
