
Marginal Probability
Probability of a single event occurring.
Event A = price of IBM stock rises by at least $1 in one day
Pr(A) = 0.04 = 4%
Joint Probability
Probability of all of multiple events occurring.
Event A = price of IBM stock rises by at least $1 in one day
Event B = price of GE stock rises by at least $1 in one day
Pr(A) = 0.04 = 4%
Pr(B) = 0.03 = 3%

Probability of both IBM and GE rising by at least $1 in one day
= Pr(A and B) = 0.02 = 2%
Joint Probability
Two events are independent if the occurrence of one is not contingent on the
occurrence of the other.

A = Price of IBM rises by at least $1 in one day.
B = Price of IBM rises by at least $1 in one week.

The events are not independent because the occurrence of A increases the
probability that B occurs.

For independent events:
Pr(A and B) = Pr(A) Pr(B)
Disjoint Probability

Probability of any of multiple events occurring.
Event A = price of IBM stock rises by at least $1 in one day
Event B = price of GE stock rises by at least $1 in one day
Pr(A) = 0.04 = 4%
Pr(B) = 0.03 = 3%

Pr(A or B) = Pr(A) + Pr(B) − Pr(A and B)

Probability of either IBM or GE rising by at least $1 in one day
= Pr(A or B) = 0.04 + 0.03 − 0.02 = 0.05

Venn Diagram

[Venn diagram: circles A and B overlap. Area 3 is the intersection, area 2 is A only, area 4 is B only, and area 1 lies outside both circles.]

Area        Meaning
1           Aᶜ and Bᶜ
2           A and Bᶜ
3           A and B
4           Aᶜ and B
2, 3        A
3, 4        B
1, 2, 4     Aᶜ or Bᶜ
1, 2, 3     A or Bᶜ
2, 3, 4     A or B
1, 3, 4     Aᶜ or B
1, 2, 3, 4  A or Aᶜ
1, 2, 3, 4  B or Bᶜ
Empty       A and Aᶜ
Empty       B and Bᶜ

(Aᶜ denotes the complement of A, i.e. "not A.")
Venn Diagram

A = Price of IBM stock rises by at least $1 in one day.
B = Price of GE stock rises by at least $1 in one day.

Pr(A and B) = 0.02
Pr(A) = 0.04
Pr(B) = 0.03

[Venn diagram: A only = 0.02, A and B = 0.02, B only = 0.01, outside both circles = 1 − 0.02 − 0.02 − 0.01 = 0.95.]
Venn Diagram

A = Price of IBM stock rises by at least $1 in one day.
B = Price of GE stock rises by at least $1 in one day.

[Venn diagram: A only = 0.02, A and B = 0.02, B only = 0.01, outside both circles = 0.95.]

What is the probability of the price of GE rising by at least $1 and the price of IBM not rising by at least $1?
Pr(B and Aᶜ) = 0.01

What is the probability of neither the price of IBM rising by at least $1 nor the price of GE rising by at least $1?
Pr(Aᶜ and Bᶜ) = 0.95
Conditional Probability
Probability of an event occurring given that another event has already
occurred.
Event A = price of IBM stock rises by at least $1 in one day
Event B = price of IBM stock rises by at least $1 in the same week

Pr(B|A) = Pr(A and B) / Pr(A)
Pr(A|B) = Pr(A and B) / Pr(B)
Pr(A) = 0.04
Pr(B) = 0.02
Pr(A and B) = 0.01
Pr(B|A) = 0.01 / 0.04 = 0.25
Pr(A|B) = 0.01 / 0.02 = 0.50
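
These two ratios are easy to verify in code; a minimal Python sketch using the slide's numbers (plain Python, no libraries):

    # Conditional probability from the slide's numbers.
    p_A = 0.04         # Pr(IBM rises by at least $1 in one day)
    p_B = 0.02         # Pr(IBM rises by at least $1 in the same week)
    p_A_and_B = 0.01   # Pr(A and B)

    p_B_given_A = p_A_and_B / p_A   # 0.25
    p_A_given_B = p_A_and_B / p_B   # 0.50
    print(p_B_given_A, p_A_given_B)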
Conditional Probability

A = Price of IBM stock rises by at least $1 in one day.
B = Price of IBM stock rises by at least $1 in the same week.

Pr(A and B) = 0.01
Pr(A) = 0.04
Pr(B) = 0.02

[Venn diagram: A only = 0.03, A and B = 0.01, B only = 0.01.]

Pr(A|B) = 0.01 / (0.01 + 0.01) = 0.50
Pr(B|A) = 0.01 / (0.01 + 0.03) = 0.25
Conditional Probability
Table shows number of NYC police officers promoted and not promoted.
Question: Did the force exhibit gender discrimination in promoting?
Male Female
Promoted 288 36
Not Promoted 672 204

Define the events. There are two events:
1. An officer can be male.
2. An officer can be promoted.
Being female is not a separate event; it is simply not being male.

Events
M = Being a male
P = Being promoted
Mᶜ = Being a female (not M)
Pᶜ = Not being promoted
Conditional Probability

              Male   Female
Promoted       288       36
Not Promoted   672      204

Divide all areas by 1,200 to find the probability associated with each area.

[Venn diagram: M only (male, not promoted) = 672 → 56%; M and P (male, promoted) = 288 → 24%; P only (female, promoted) = 36 → 3%; outside both (female, not promoted) = 204 → 17%.]
Conditional Probability

              Male   Female
Promoted       288       36
Not Promoted   672      204

[Venn diagram: M only = 56%, M and P = 24%, P only = 3%, outside both = 17%.]

What is the probability of being male and being promoted?
Pr(M and P) = 0.24

What is the probability of being female and being promoted?
Pr(Mᶜ and P) = 0.03

Males appear to be promoted at 8 times the frequency of females.
Conditional Probability

              Male   Female
Promoted       288       36
Not Promoted   672      204

[Venn diagram: M only = 56%, M and P = 24%, P only = 3%, outside both = 17%.]

But perhaps Pr(M and P) is greater than Pr(Mᶜ and P) simply because there are more males on the force.

The comparison we want to make is Pr(P|M) vs. Pr(P|Mᶜ).

Pr(P|M) = Pr(P and M) / Pr(M) = 0.24 / (0.56 + 0.24) = 0.30

Pr(P|Mᶜ) = Pr(P and Mᶜ) / Pr(Mᶜ) = 0.03 / (0.03 + 0.17) = 0.15

Males are promoted at 2 times the frequency of females.
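
The same comparison can be scripted directly from the counts; a minimal Python sketch using the table above:

    # Promotion counts from the table (1,200 officers total).
    promoted     = {"male": 288, "female": 36}
    not_promoted = {"male": 672, "female": 204}

    pr_P_given_M = promoted["male"] / (promoted["male"] + not_promoted["male"])        # 0.30
    pr_P_given_F = promoted["female"] / (promoted["female"] + not_promoted["female"])  # 0.15

    print(pr_P_given_M, pr_P_given_F)   # males are promoted at twice the rate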
Mutually Exclusive and Jointly Exhaustive Events
A set of events is mutually exclusive if no more than one of the events can occur.
A = IBM stock rises by at least $1, B = IBM stock falls by at least $1
A and B are mutually exclusive but not jointly exhaustive

A set of events is jointly exhaustive if at least one of the events must occur.
A = IBM stock rises by at least $1, B = IBM stock rises by at least $2,
C = IBM stock rises by less than $1 (or falls)
A, B, and C are jointly exhaustive but not mutually exclusive

A set of events is mutually exclusive and jointly exhaustive if exactly one of the
events must occur.
A = IBM stock rises, B = IBM stock falls, C = IBM stock does not change
A, B, and C are mutually exclusive and jointly exhaustive
Bayes' Theorem

Pr(A|B) = Pr(B|A) Pr(A) / Pr(B)

For N mutually exclusive and jointly exhaustive events A₁, A₂, …, A_N:

Pr(B) = Pr(B|A₁) Pr(A₁) + Pr(B|A₂) Pr(A₂) + … + Pr(B|A_N) Pr(A_N)
Bayes Theorem
Your firm purchases steel bolts from two suppliers: #1 and #2.
65% of the units come from supplier #1; the remaining 35% come from supplier
#2.
Inspecting the bolts for quality is costly, so your firm only inspects periodically.
Historical data indicate that 2% of supplier #1's units fail, and 5% of supplier #2's
units fail.
During production, a bolt fails, causing a production line shutdown. What is the
probability that the defective bolt came from supplier #1?
The naïve answer is that there is a 65% chance that the bolt came from supplier
#1, since 65% of the bolts come from supplier #1.
The naïve answer ignores the fact that the bolt failed. We want to know not
Pr(bolt came from #1), but Pr(bolt came from #1 | bolt failed).
Bayes' Theorem

Define the following events:
S₁ = bolt came from supplier #1, S₂ = bolt came from supplier #2
F = bolt fails

Solution:
We know: Pr(F|S₁) = 2%, Pr(S₁) = 65%, Pr(F|S₂) = 5%
We want to know: Pr(S₁|F)

Bayes' Theorem: Pr(S₁|F) = Pr(F|S₁) Pr(S₁) / Pr(F)

Because S₁ and S₂ are mutually exclusive and jointly exhaustive:
Pr(F) = Pr(F|S₁) Pr(S₁) + Pr(F|S₂) Pr(S₂)
      = (2%)(65%) + (5%)(35%) = 3.1%

Therefore: Pr(S₁|F) = (2%)(65%) / (3.1%) = 42%
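
The computation is mechanical enough to script; a minimal Python sketch (plain Python, no libraries):

    # Bayes' Theorem for the defective-bolt example.
    pr_S1, pr_S2 = 0.65, 0.35     # supplier shares
    pr_F_given_S1 = 0.02          # failure rate, supplier #1
    pr_F_given_S2 = 0.05          # failure rate, supplier #2

    # Total probability of a failure (S1 and S2 are mutually
    # exclusive and jointly exhaustive):
    pr_F = pr_F_given_S1 * pr_S1 + pr_F_given_S2 * pr_S2   # 0.031

    pr_S1_given_F = pr_F_given_S1 * pr_S1 / pr_F           # ~0.42
    print(pr_F, pr_S1_given_F)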
Probability Measures: Summary

Pr(A and B) = Pr(A) Pr(B)
  where A and B are independent events

Pr(A or B) = Pr(A) + Pr(B) − Pr(A and B)

Pr(A|B) = Pr(A and B) / Pr(B)
        = Pr(B|A) Pr(A) / Pr(B)

Pr(A) = Pr(A|B₁) Pr(B₁) + Pr(A|B₂) Pr(B₂) + … + Pr(A|Bₙ) Pr(Bₙ)
  where B₁ through Bₙ are mutually exclusive and jointly exhaustive
Probability Measures: Where We're Going

Random events:

Probabilities given:
  Simple Probability
  Joint Probability
  Disjoint Probability
  Conditional Probability

Probabilities not given:
  Random event is discrete: Binomial, Hypergeometric, Poisson, Negative binomial
  Random event is continuous: Exponential, Normal, t, Log-normal, Chi-square, F
Probability Distributions
So far, we have seen examples in which the probabilities of events are known
(e.g. probability of a bolt failing, probability of being male and promoted).

The behavior of a random event (or random variable) is summarized by the
variable's probability distribution.

A probability distribution is a set of probabilities, one associated with each of
the possible events.

Example: The roll of a die is a random variable. There are 6 possible events that
can occur. The probability of each event occurring is the same (1/6) for all the
events. We call this distribution a uniform distribution.
Probability Distributions

Example:

Let X be the random variable defined as the roll of a die. There are six
possible events: X = {1, 2, 3, 4, 5, 6}. (A random variable is a mechanism that
selects one event out of all possible events.)

Pr(X = 1) = 1/6 = 16.7%
Pr(X = 2) = 1/6 = 16.7%
Pr(X = 3) = 1/6 = 16.7%
Pr(X = 4) = 1/6 = 16.7%
Pr(X = 5) = 1/6 = 16.7%
Pr(X = 6) = 1/6 = 16.7%

In general, we say that the probability distribution function for X (the function
that gives the probability of each event occurring) is:
Pr(X = k) = 0.167

and the cumulative distribution function for X (the function that gives the
probability of any one of a set of events occurring) is:
Pr(X ≤ k) = 0.167k
Discrete vs. Continuous Distributions

In discrete distributions, the random variable takes on specific values.

For example:
If X can take on the values {1, 2, 3, 4, 5, …}, then X is a discrete random
variable.
Number of profitable quarters is a discrete random variable.

If X can take on any value between 0 and 10, then X is a continuous random
variable.
P/E ratio is a continuous random variable.
Discrete Distributions

Terminology

Trial: An opportunity for an event to occur or not occur.
Success: The occurrence of an event.
Binomial Distribution

The binomial distribution gives the probability of an event occurring multiple
times.

N : Number of trials
x : Number of successes
p : Probability of a single success

Pr(x successes out of N trials) = C(N, x) · pˣ · (1 − p)^(N − x)

where C(N, x) = N! / (x!(N − x)!)

mean = Np
variance = Np(1 − p)
Binomial Distribution

Example

A CD manufacturer produces CDs in batches of 10,000. On average, 2% of the
CDs are defective.

A retailer purchases CDs in batches of 1,000. The retailer will return any
shipment if 3 or more CDs are found to be defective. For each batch received,
the retailer inspects thirty CDs. What is the probability that the retailer will return
the batch?

N = 30 trials
x = 3 successes
p = 0.02

Pr(3 successes out of 30 trials) = C(30, 3) · 0.02³ · (1 − 0.02)²⁷ ≈ 0.019 = 1.9%
Binomial Distribution

Example

A CD manufacturer produces CDs in batches of 10,000. On average, 2% of the
CDs are defective.

A retailer purchases CDs in batches of 1,000. The retailer will return any
shipment if 3 or more CDs are found to be defective. For each batch received,
the retailer inspects thirty CDs. What is the probability that the retailer will return
the batch?

N = 30 trials
x = 3 successes
p = 0.02

Pr(3 successes out of 30 trials) = C(30, 3) · 0.02³ · (1 − 0.02)²⁷ ≈ 0.019 = 1.9%

Error

The formula gives us the probability of exactly 3 successes out of 30 trials. But
the retailer will return the shipment if it finds at least 3 defective CDs. What we
want is

Pr(3 out of 30) + Pr(4 out of 30) + … + Pr(30 out of 30)
Binomial Distribution

N = 30 trials, x = 3 successes, p = 0.02:
Pr(3 successes out of 30 trials) = C(30, 3) · 0.02³ · (1 − 0.02)²⁷ ≈ 0.019 = 1.9%

N = 30 trials, x = 4 successes, p = 0.02:
Pr(4 successes out of 30 trials) = C(30, 4) · 0.02⁴ · (1 − 0.02)²⁶ ≈ 0.003 = 0.3%

N = 30 trials, x = 5 successes, p = 0.02:
Pr(5 successes out of 30 trials) = C(30, 5) · 0.02⁵ · (1 − 0.02)²⁵ ≈ 0.0003 = 0.03%

Etc. out to x = 30 successes.

Alternatively

Because Pr(0 or more successes) = 1, we have an easier path to the answer:

Pr(3 or more successes) = 1 − Pr(2 or fewer successes)
Binomial Distribution

N = 30 trials, x = 0 successes, p = 0.02:
Pr(0 successes out of 30 trials) = C(30, 0) · 0.02⁰ · (1 − 0.02)³⁰ ≈ 0.545

N = 30 trials, x = 1 success, p = 0.02:
Pr(1 success out of 30 trials) = C(30, 1) · 0.02¹ · (1 − 0.02)²⁹ ≈ 0.334

N = 30 trials, x = 2 successes, p = 0.02:
Pr(2 successes out of 30 trials) = C(30, 2) · 0.02² · (1 − 0.02)²⁸ ≈ 0.099

Pr(2 or fewer successes) = 0.545 + 0.334 + 0.099 = 0.978

Pr(3 or more successes) = 1 − 0.978 = 0.022 = 2.2%
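
Outside the Probabilities worksheet, the same numbers can be checked in Python; a minimal sketch using scipy.stats (scipy is an assumption here; any binomial calculator works):

    from scipy.stats import binom

    n, p = 30, 0.02
    # Pr(x = k) for k = 0, 1, 2
    print([round(binom.pmf(k, n, p), 3) for k in range(3)])  # [0.545, 0.334, 0.099]

    # Pr(3 or more) = 1 - Pr(2 or fewer)
    print(1 - binom.cdf(2, n, p))   # ~0.022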
Binomial Distribution

Using the Probabilities worksheet:

1. Find the section of the worksheet titled "Binomial Distribution."
2. Enter the probability of a single success.
3. Enter the number of trials.
4. Enter the number of successes.
5. For Cumulative?, enter FALSE to obtain Pr(x successes out of N trials); enter
TRUE to obtain Pr(≤ x successes out of N trials).

Example:

  Binomial Distribution
  Prob of a Single Success   0.02
  Number of Trials           30
  Number of Successes        2
  Cumulative?                TRUE    ← TRUE yields Pr(x ≤ 2) instead of Pr(x = 2)
  P(# of successes)          0.978   ← Pr(x ≤ 2)
  1 - P(# of successes)      0.022   ← 1 − Pr(x ≤ 2) = Pr(x ≥ 3)
Binomial Distribution
Application:

Management proposes tightening quality control so as to reduce the defect rate from 2% to
1%. QA estimates that the resources required to implement the additional quality controls
will cost the firm an additional $70,000 per year.

Suppose the firm ships 10,000 batches of CDs annually. It costs the firm $1,000 every time
a batch is returned. Is it worth it for the firm to implement the additional quality controls?
Low QA:

Defect rate = 2%
Pr(batch will be returned) = Pr(3 or more defects out of 30) = 2.2%
Expected annual cost of product returns
= (2.2%)($1,000 per batch)(10,000 batches shipped annually)
= $220,000
High QA:

Defect rate = 1%
Pr(batch will be returned) = Pr(3 or more defects out of 30) = 0.3%
Expected annual cost of product returns
= (0.3%)($1,000 per batch)(10,000 batches shipped annually)
= $30,000
Going with improved QA results
in cost savings of $190,000 at a
cost of $70,000 for a net gain of
$120,000.
Binomial Distribution
Application:

Ford suspects that the tread on Explorer tires will separate from the tire, causing a fatal
accident. Tests indicate that this will happen on one set of (four) tires out of 5 million. As of
2000, Ford had sold 875,000 Explorers. Ford estimated the cost of a general recall to be
$30 million. Ford also estimated that every accident involving separated treads would cost
Ford $3 million to settle.

Should Ford recall the tires?
What we know:

Success = tread separation
Pr(a single success) = 1 / 5 million = 0.0000002
Number of trials = 875,000

Employing the pdf for the binomial distribution, we have:

Pr(0 successes) = 83.9%
Pr(1 success) = 14.7%
Pr(2 successes) = 1.3%
Pr(3 successes) = 0.1%
Binomial Distribution
Expectation:

An expectation is the sum of the probabilities of all possible events multiplied by the
outcome of each event.

Suppose there are three mutually exclusive and jointly exhaustive events: A, B, and C.
The costs to a firm of events A, B, and C occurring are, respectively, TC_A, TC_B, and TC_C.
The probabilities of events A, B, and C occurring are, respectively, p_A, p_B, and p_C.

The expected cost to the firm is:
E(cost) = (TC_A)(p_A) + (TC_B)(p_B) + (TC_C)(p_C)

Should Ford issue a recall?

Issue recall:

Cost = $30 million

Do not issue recall:

E(cost) = Pr(0 incidents)(Cost of 0 incidents) + Pr(1 incident)(Cost of 1 incident) + …
        ≈ (83.9%)($0 m) + (14.7%)($3 m) + (1.3%)($6 m) + (0.1%)($9 m)
        ≈ $528,000
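
The same expectation can be computed directly from the binomial pmf; a minimal scipy.stats sketch (scipy is an assumption; the exact sum, about $525,000, differs slightly from the slide's $528,000 because the slide rounds each probability first):

    from scipy.stats import binom

    n, p = 875_000, 1 / 5_000_000
    cost_per_incident = 3_000_000    # $3 million per settlement

    # E(cost of not recalling) = sum over k of Pr(k incidents) * k * $3m
    expected_cost = sum(binom.pmf(k, n, p) * k * cost_per_incident for k in range(6))
    print(expected_cost)             # ~ $525,000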
Hypergeometric Distribution

The hypergeometric distribution gives the probability of an event occurring
multiple times when the number of possible successes is fixed.

N : Number of possible trials
n : Number of actual trials
X : Number of possible successes
x : Number of actual successes

Pr(x successes out of n trials) = [C(X, x) · C(N − X, n − x)] / C(N, n)

where C(N, x) = N! / (x!(N − x)!)
Hypergeometric Distribution

Example

A CD manufacturer ships a batch of 1,000 CDs to a retailer. The manufacturer
knows that 20 of the CDs are defective. The retailer will return any shipment if 3
or more CDs are found to be defective. For each batch received, the retailer
inspects thirty CDs. What is the probability that the retailer will return the batch?

N = 1,000 possible trials
n = 30 actual trials
X = 20 possible successes
x = 3 actual successes

Pr(3 successes out of 30 trials) = [C(20, 3) · C(980, 27)] / C(1000, 30) ≈ 0.017
Hypergeometric Distribution

Example

A CD manufacturer ships a batch of 1,000 CDs to a retailer. The manufacturer
knows that 20 of the CDs are defective. The retailer will return any shipment if 3
or more CDs are found to be defective. For each batch received, the retailer
inspects thirty CDs. What is the probability that the retailer will return the batch?

N = 1,000 possible trials
n = 30 actual trials
X = 20 possible successes
x = 3 actual successes

Pr(3 successes out of 30 trials) = [C(20, 3) · C(980, 27)] / C(1000, 30) ≈ 0.017

Error

The formula gives us the probability of exactly 3 successes. The retailer will
return the shipment if there are 3 or more defects. Therefore, we want

Pr(return shipment) = Pr(3 defects) + Pr(4 defects) + … + Pr(20 defects)

Note: There are a maximum of 20 defects.
Hypergeometric Distribution

N = 1,000 possible trials, n = 30 actual trials, X = 20 possible successes

x = 0 actual successes:
Pr(0 successes out of 30 trials) = [C(20, 0) · C(980, 30)] / C(1000, 30) ≈ 0.541

x = 1 actual success:
Pr(1 success out of 30 trials) = [C(20, 1) · C(980, 29)] / C(1000, 30) ≈ 0.341

x = 2 actual successes:
Pr(2 successes out of 30 trials) = [C(20, 2) · C(980, 28)] / C(1000, 30) ≈ 0.099

Pr(return shipment) = 1 − (0.541 + 0.341 + 0.099) = 0.019 = 1.9%
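
A minimal scipy.stats sketch of the same computation (scipy is an assumption; note how its parameter names map onto the slide's):

    from scipy.stats import hypergeom

    # scipy's hypergeom(M, n, N): M = population size (possible trials),
    # n = successes in the population, N = sample size (actual trials).
    rv = hypergeom(M=1000, n=20, N=30)

    pr_two_or_fewer = sum(rv.pmf(k) for k in range(3))   # ~0.981
    print(1 - pr_two_or_fewer)                           # ~0.019 = 1.9%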
Hypergeometric Distribution

Using the Probabilities worksheet:

1. Find the section of the worksheet titled "Hypergeometric Distribution."
2. Enter the number of possible trials.
3. Enter the number of possible successes.
4. Enter the number of actual trials.
5. Enter the number of actual successes.

Note: Excel does not offer the option of calculating the cumulative distribution
function. You must do this manually.

Example:

  Hypergeometric Distribution
  Number of Possible Trials        1,000
  Number of Possible Successes     20
  Number of Actual Trials          30
  Number of Actual Successes       3
  P(# of successes in sample)      0.017   ← Pr(x = 3)
  1 - P(# of successes in sample)  0.983   ← 1 − Pr(x = 3) = Pr(x ≠ 3)
Hypergeometric Distribution

If we erroneously use the binomial distribution, what is our estimate of the
probability that the retailer will return the batch?

Results using the hypergeometric distribution:

Possible Trials = 1,000
Actual Trials = 30
Possible Successes = 20
Actual Successes = 0, 1, 2

Pr(return shipment) = 1 − (0.541 + 0.341 + 0.099) = 0.019 = 1.9%

Results using the binomial distribution:

Trials = 30
Successes = 0, 1, 2
Probability of a single success = 20 / 1,000 = 0.02

Pr(return shipment) = 2.2%
Hypergeometric Distribution
Suppose each return costs us $1,000 and we ship 10,000 cases per year.


Estimated cost of returns using hypergeometric distribution

($1,000)(10,000)(1.9%) = $190,000


Estimated cost of returns using binomial distribution

($1,000)(10,000)(2.2%) = $220,000


Using the incorrect distribution overestimates the probability of return by only
0.3 percentage points. Who cares? That small error translates into a $30,000
overestimation of costs.
Hypergeometric Distribution
How does hypergeometric distribution differ from binomial distribution?

With binomial distribution, the probability of a success does not change as trials are
realized.

With hypergeometric distribution, the probabilities of subsequent successes change as trials
are realized.

Binomial Example:

Suppose the probability of a given CD being defective is 50%. You have a shipment of 2
CDs.

You inspect one of the CDs. There is a 50% chance that it is defective.
You inspect the other CD. There is a 50% chance that it is defective.

On average, you expect 1 defective CD. However, it is possible that there are no defective
CDs. It is also possible that both CDs are defective.

Because the probability of defect is constant, this process is binomial.
Hypergeometric Distribution
How does hypergeometric distribution differ from binomial distribution?

With binomial distribution, the probability of a success does not change as trials are
realized.

With hypergeometric distribution, the probabilities of subsequent successes change as trials
are realized.

Hypergeometric Example:

Suppose there is one defective CD in a shipment of two CDs.

You inspect one of the CDs. There is a 50% chance that it is defective. You inspect the
second CD. Even without inspecting, you know for certain whether the second CD will be
defective or not.

Because you know that one of the CDs is defective, if the first one is not defective, then
the second one must be defective.
If the first one is defective, then the second one cannot be defective.

Because the probability of the second CD being defective depends on whether or not the
first CD was defective, the process is hypergeometric.
Hypergeometric Distribution
Example

Andrew Fastow, former CFO of Enron, was tried for securities fraud. As is usual in these
cases, if the prosecution requests documents, then the defense is obligated to surrender
those documents even if the documents contain information that is damaging to the
defense. One tactic is for the defense to submit the requested documents along with many
other documents (called "decoys") that are not damaging to the defense. The point is to
bury the prosecution under a blizzard of paperwork so that it becomes difficult for the
prosecution to find the few incriminating documents among the many decoys.

Suppose that the prosecutor requests all documents related to Enron's financial status.
Fastow's lawyers know that there are 10 incriminating documents among the set requested.
Fastow's lawyers also know that the prosecution will be able to examine only 50 documents
between now and the trial date.

If the prosecution finds no incriminating documents, it is likely that Fastow will be found not
guilty. Assuming that each document requires the same amount of time to examine, and
assuming that the prosecution will randomly select 50 documents out of the total for
examination, how many documents (decoys plus the 10 incriminating documents) should
Fastow's lawyers submit so that the probability of the prosecution finding no incriminating
documents is 90%?
Hypergeometric Distribution

Example

Success = an incriminating document
N = unknown
n = 50
X = 10
x = 0

N = 4,775 gives Pr(0 successes out of 50 trials) = 0.900

Worksheet results:

  Hypergeometric Distribution
  Number of Possible Trials        4,775
  Number of Possible Successes     10
  Number of Actual Trials          50
  Number of Actual Successes       0
  P(# of successes in sample)      0.900
  1 - P(# of successes in sample)  0.100
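
The required total can also be found by searching; a minimal scipy.stats sketch (scipy and the brute-force search are assumptions, not the deck's method):

    from scipy.stats import hypergeom

    X, n_exam = 10, 50       # incriminating documents, documents examined
    N = 60                   # smallest total that permits a sample of 50
    while hypergeom(N, X, n_exam).pmf(0) < 0.90:
        N += 1
    print(N)                 # 4,775 total documents (4,765 decoys)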
Poisson Distribution

The Poisson distribution gives the probability of an event occurring multiple times
within a given time interval.

λ : Average number of successes per unit time
x : Number of successes
e ≈ 2.71828

Pr(x successes per unit time) = (e^(−λ) · λˣ) / x!
Poisson Distribution

Example

Over the course of a typical eight-hour day, 100 customers come into a store.
Each customer remains in the store for 10 minutes (on average). One
salesperson can handle no more than three customers in 10 minutes. If it is likely
that more than three customers will show up in a single 10-minute interval, then
the store will have to hire another salesperson.

What is the probability that more than 3 customers will arrive in a single 10-
minute interval?

Time interval = 10 minutes

There are 48 ten-minute intervals during an 8-hour work day.

100 customers per day / 48 ten-minute intervals ≈ 2.08 customers per interval.

λ = 2.08 successes per interval (on average)
x = 4, 5, 6, … successes

Pr(x successes per unit time) = (e^(−λ) · λˣ) / x!
Poisson Distribution

Time interval = 10 minutes
λ = 2.08 successes per interval
x = 4, 5, 6, … successes

Pr(x ≥ 4) = 1 − [Pr(x = 0) + Pr(x = 1) + Pr(x = 2) + Pr(x = 3)]

Pr(0 successes) = (e^(−2.08) · 2.08⁰) / 0! ≈ 0.125
Pr(1 success)   = (e^(−2.08) · 2.08¹) / 1! ≈ 0.260
Pr(2 successes) = (e^(−2.08) · 2.08²) / 2! ≈ 0.270
Pr(3 successes) = (e^(−2.08) · 2.08³) / 3! ≈ 0.187

Pr(x ≥ 4) = 1 − (0.125 + 0.260 + 0.270 + 0.187) = 0.158 = 15.8%
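
The same tail probability in one line; a minimal scipy.stats sketch (scipy is an assumption; the worksheet on the next slide gives the same answer):

    from scipy.stats import poisson

    lam = 100 / 48              # ~2.08 customers per 10-minute interval
    # Pr(x >= 4) = 1 - Pr(x <= 3); sf(3) gives Pr(x > 3)
    print(poisson.sf(3, lam))   # ~0.158 = 15.8%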
Poisson Distribution

Using the Probabilities worksheet:

1. Find the section of the worksheet titled "Poisson Distribution."
2. Enter the average number of successes per time interval.
3. Enter the number of successes per time interval.
4. For Cumulative?, enter FALSE to obtain Pr(x successes per interval); enter
TRUE to obtain Pr(≤ x successes per interval).

Example:

  Poisson Distribution
  E(Successes / time interval)            2.08
  Successes / time interval               3
  Cumulative?                             TRUE   ← TRUE yields Pr(x ≤ 3) instead of Pr(x = 3)
  P(# successes in a given interval)      0.842  ← Pr(x ≤ 3)
  1 - P(# successes in a given interval)  0.158  ← 1 − Pr(x ≤ 3) = Pr(x ≥ 4)
Suppose you want to hire an additional salesperson on a part-time basis. On
average, for how many hours per week will you need this person? (Assume a 40-
hour work week.)
There is a 15.8% probability that, in any given 10-minute interval, more than 3
customers will arrive. During these intervals, you will need another salesperson.

In one work day, there are 48 ten-minute intervals. In a 5-day work week, there
are (48)(5) = 240 ten-minute intervals.

On average, you need a part-time worker for 15.8% of these, or (0.158)(240) =
37.92 intervals.

37.92 ten-minute intervals = 379 minutes = 6.3 hours, or 6 hours 20 minutes.

Note: An easier way to arrive at the same answer is: (40 hours)(0.158) = 6.3
hours.
Poisson Distribution
Negative Binomial Distribution

The negative binomial distribution gives the probability of the xth occurrence of
an event happening on the Nth trial.

N : Number of trials
x : Number of successes
p : Probability of a single success occurring

Pr(xth success occurring on the Nth trial) = C(N − 1, x − 1) · pˣ · (1 − p)^(N − x)

where C(N − 1, x − 1) = (N − 1)! / [(x − 1)!(N − x)!]
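
A direct translation of the pmf; a minimal Python sketch (the CD numbers reuse the earlier example and are illustrative assumptions):

    from math import comb

    def neg_binom_pmf(N, x, p):
        """Pr(the x-th success occurs on the N-th trial)."""
        return comb(N - 1, x - 1) * p**x * (1 - p)**(N - x)

    # e.g. probability that the 3rd defective CD (p = 0.02) is
    # found on exactly the 30th inspection:
    print(neg_binom_pmf(30, 3, 0.02))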
Discrete Distributions: Summary

Pertinent Information                      Distribution

Probability of a single success            Binomial
Number of trials
Number of successes

Number of possible trials                  Hypergeometric
Number of actual trials
Number of possible successes
Number of actual successes

Average successes per time interval        Poisson
Number of successes per time interval
While the discrete distributions are useful for describing phenomena in which the
random variable takes on discrete (e.g. integer) values, many random variables
are continuous and so are not adequately described by discrete distributions.

Example:

Income, Financial Ratios, Sales.

Technically, financial variables are discrete because they are measured in discrete units
(cents). However, the size of the discrete units is so small relative to the typical
values of the random variable that these variables behave like continuous
random variables.

E.g. A firm that typically earns $10 million has an income level that is 1 billion
times the size of the discrete units in which the income is measured.
Continuous Distributions
Continuous Uniform Distribution

The continuous uniform distribution is a distribution in which the probability of
the random variable taking on a given range of values is equal for all ranges of
the same size.

Example:
X is a uniformly distributed random variable that can take on any value in the
range [1, 5].

Pr(1 < X < 2) = 1/4 = 0.25
Pr(2 < X < 3) = 1/4 = 0.25
Pr(3 < X < 4) = 1/4 = 0.25
Pr(4 < X < 5) = 1/4 = 0.25

Note: The probability of X taking on a specific value is zero.
Continuous Uniform Distribution

Example:

Pr(1 < X < 2) = 1/4 = 0.25
Pr(2 < X < 3) = 1/4 = 0.25
Pr(3 < X < 4) = 1/4 = 0.25
Pr(4 < X < 5) = 1/4 = 0.25

In general, we say that the probability density function for X is:
pdf(X) = 0.25 for all k

(note: Pr(X = k) = 0 for all k)

and the cumulative density function for X is:
Pr(X ≤ k) = (k − 1) / 4

For a uniform distribution where
a = minimum value of the random variable
b = maximum value of the random variable:

mean = (a + b) / 2
variance = (b − a)² / 12
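
For a quick check of these probabilities and moments, a minimal scipy.stats sketch (scipy is an assumption; the arithmetic is just as easy by hand):

    from scipy.stats import uniform

    a, b = 1, 5
    X = uniform(loc=a, scale=b - a)   # uniform on [1, 5]

    print(X.cdf(2) - X.cdf(1))        # Pr(1 < X < 2) = 0.25
    print(X.mean(), X.var())          # (a+b)/2 = 3, (b-a)^2/12 ~ 1.33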
Exponential Distribution

The exponential distribution gives the probability of the maximum amount of time
required until the next occurrence of an event.

λ : Average number of time intervals between the occurrences of successes
x : Maximum time intervals until the next success occurs

Pr(the next success occurs in x or fewer time intervals) = 1 − e^(−x/λ)

mean = λ
variance = λ²
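
A minimal sketch of the formula under the slide's parameterization (plain Python; the 4.8-minute average gap between customers is an assumed figure, implied by the store example's 2.08 arrivals per 10-minute interval):

    from math import exp

    lam = 4.8   # assumed: average of 4.8 minutes between customer arrivals
    x = 10      # probability the next customer arrives within 10 minutes

    print(1 - exp(-x / lam))   # ~0.875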
Many continuous random processes are normally distributed. Among them are:
1. Proportions (provided that the proportion is not close to the extremes of 0 or
1).
2. Sample Means (provided that the means are computed based on a large
enough sample size).
3. Differences in Sample Means (provided that the means are computed based
on a large enough sample size).
4. Mean Differences (provided that the means are computed based on a large
enough sample size).
5. Most natural processes (including many economic and financial processes).
Normal Distribution
Normal Distribution

There are an infinite number of normal distributions, each with a different mean
and variance.

We describe a normal distribution by its mean and variance:
μ = Population mean
σ² = Population variance

The normal distribution with a mean of zero and a variance of one is called the
standard normal distribution:
μ = 0
σ² = 1
Normal Distribution

The pdf (probability density function) for normal distributions is bell-shaped. This means
that the random variable can take on any value over the range −∞ to +∞, but the
probability of the random variable straying from its mean decreases as the distance from
the mean increases.

[Figure: bell-shaped normal pdf.]
Normal Distribution

For all normal distributions, approximately:
50% of the observations lie within ±2/3 σ of the mean
68% of the observations lie within ±1 σ of the mean
95% of the observations lie within ±2 σ of the mean
99% of the observations lie within ±3 σ of the mean

Example:
Suppose the return on a firm's stock price is normally distributed with a mean of 10%
and a standard deviation of 6%. We would expect that, at any given point in time:
1. There is a 50% probability that the return on the stock is between 6% and 14%.
2. There is a 68% probability that the return on the stock is between 4% and 16%.
3. There is a 95% probability that the return on the stock is between −2% and 22%.
4. There is a 99% probability that the return on the stock is between −8% and 28%.
Normal Distribution

Population Measures:
μ : Population mean
σ² : Population variance
(Calculated using all possible observations.)

Sample Measures (estimates of population measures):
x̄ : Sample mean
s² : Sample variance
(Calculated using a subset of all possible observations.)

Variance measures the square of the average dispersion of observations around a mean.

Population Variance: σ² = (1/N) · Σᵢ₌₁ᴺ (xᵢ − μ)²

Sample Variance: s² = (1/(N − 1)) · Σᵢ₌₁ᴺ (xᵢ − x̄)²
Problem of Unknown Population Parameters

If we do not have all possible observations, then we cannot compute the
population mean and variance. What to do?

Take a sample of observations and use the sample mean and sample variance
as estimates of the population parameters.

Problem: If we use the sample mean and sample variance instead of the
population mean and population variance, then we can no longer say that 50%
of observations lie within ±2/3 σ, etc.

In fact, the normal distribution no longer describes the distribution of
observations. We must use the t-distribution.

The t-distribution accounts for the fact that (1) the observations are normally
distributed, and (2) we aren't sure what the mean and variance of the
distribution are.
t-Distribution

There are an infinite number of t-distributions, each with different degrees of
freedom. Degrees of freedom is a function of the number of observations in a
data set.

For most purposes, degrees of freedom = N − 1, where N is the number of
observations in the sample.

The more degrees of freedom (i.e. observations) that exist, the closer the
t-distribution is to the standard normal distribution.
The standard normal distribution is the same as the t-distribution
with an infinite number of degrees of freedom.
t-Distribution
t-Distribution

Percentage of observations lying within the given number of standard deviations of the mean:

                        Degrees of Freedom
Standard Deviations      5     10    20    30    ∞
2/3                     47%   48%   49%   49%   50%
1                       64%   66%   67%   68%   68%
2                       90%   93%   94%   95%   95%
3                       97%   98%   99%   99%   99%
t-Distribution

Example:
Consumer Reports tests the gas mileage of seven SUVs. They find that the sample of
SUVs has a mean mileage of 15 mpg with a standard deviation of 3 mpg. Assuming
that the population of gas mileages is normally distributed, based on this sample,
what percentage of SUVs get more than 20 mpg?

[Figure: t-distribution centered at 15 mpg with s = 3 mpg; the area to the right of 20 mpg is the quantity we want.]

We don't know the area indicated because we don't know the properties of a
t-distribution with a mean of 15 and a standard deviation of 3. However, we can
convert this distribution to a distribution whose properties we do know.
The formula for conversion is:

Test statistic = (Test value − mean) / standard deviation

Test value is the value we are examining (in this case, 20 mpg), mean is the mean of the
sample observations (in this case, 15 mpg), and standard deviation is the standard
deviation of the sample observations (in this case, 3 mpg).
t-Distribution

Example:
Consumer Reports tests the gas mileage of seven SUVs. They find that the sample of
SUVs has a mean mileage of 15 mpg with a standard deviation of 3 mpg. Assuming
that the population of gas mileages is normally distributed, based on this sample,
what percentage of SUVs get more than 20 mpg?

Test statistic = (Test value − mean) / standard deviation = (20 − 15) / 3 ≈ 1.67

[Figure: the t-distribution with mean 15 mpg and s = 3 mpg is converted to the standardized t₆ distribution with mean 0 and s = 1; the area to the right of 20 mpg corresponds to the area to the right of 1.67.]

We can look up the area to the right of 1.67 on a t₆ distribution.
t-Distribution

Test statistic = (Test value − mean) / standard deviation = (20 − 15) / 3 ≈ 1.67

Worksheet results:

  t Distribution
  Test statistic           1.670
  Degrees of Freedom       6
  Pr(t > Test statistic)   7.30%
  Pr(t < Test statistic)   92.70%

[Figure: t₆ distribution; the area to the right of 1.67 is 0.073.]

Based on this sample, about 7.3% of SUVs get more than 20 mpg.
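
The same lookup in code; a minimal scipy.stats sketch (scipy is an assumption; the deck itself uses the probabilities spreadsheet):

    from scipy.stats import t

    t_stat = (20 - 15) / 3       # ~1.67
    print(t.sf(t_stat, df=6))    # Pr(t > 1.67) with 6 df, ~0.073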
t-Distribution

Example:
A light bulb manufacturer wants to monitor the quality of the bulbs it produces. To monitor
product quality, inspectors test one bulb out of every thousand to find its burn-life. Since the
production machinery was installed, inspectors have tested 30 bulbs and found an average
burn-life of 1,500 hours with a standard deviation of 200. Management wants to recalibrate its
machines anytime a particularly short-lived bulb is discovered. Management defines "short-lived"
as a burn-life so short that 999 out of 1,000 bulbs burn longer. What is the minimum number
of hours a test bulb must burn for production not to be recalibrated?

[Figure: the t-distribution with mean 1,500 hrs and s = 200 hrs; we want the burn-life X hrs that leaves an area of 0.001 in the left tail. On the standardized t₂₉ distribution, an area of 0.001 lies to the left of −3.3963 (and 1 − 0.001 = 0.999 to the right).]

Worksheet results:

  t Distribution
  Degrees of Freedom       29
  Pr(t > Critical value)   99.90%
  Critical Value           -3.3963
t-Distribution

Example:
A light bulb manufacturer wants to monitor the quality of the bulbs it produces. To monitor
product quality, inspectors test one bulb out of every thousand to find its burn-life. Since the
production machinery was installed, inspectors have tested 30 bulbs and found an average
burn-life of 1,500 hours with a standard deviation of 200. Management wants to recalibrate its
machines anytime a particularly short-lived bulb is discovered. Management defines "short-lived"
as a burn-life so short that 999 out of 1,000 bulbs burn longer. What is the minimum number
of hours a test bulb must burn for production not to be recalibrated?

Test statistic = (Test value − mean) / standard deviation
(X − 1,500) / 200 = −3.3963  →  X ≈ 821 hrs

[Figure: the area of 0.001 to the left of −3.3963 on the t₂₉ distribution corresponds to 821 hours on the original distribution with mean 1,500 hrs and s = 200 hrs.]
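
The critical value and cutoff in code; a minimal scipy.stats sketch (scipy is an assumption):

    from scipy.stats import t

    # Critical value leaving 0.001 in the left tail of a t-distribution
    # with 29 degrees of freedom:
    crit = t.ppf(0.001, df=29)   # ~ -3.3963

    cutoff = 1500 + crit * 200   # ~ 821 hours
    print(crit, cutoff)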
t-Distribution

Example:
Continuing with the previous example, suppose we had used the normal distribution
instead of the t-distribution to answer the question.

The probabilities spreadsheet gives us the following results:

  Standard Normal Distribution (Z)
  Pr(Z > Critical value)   0.10%
  Critical Value           3.0902

Test statistic = (Test value − mean) / standard deviation
(Test value − 1,500) / 200 = −3.09
Test value = (−3.09)(200) + 1,500 = 882
Correct distribution

Using the t-distribution, we recalibrate production whenever we observe a light bulb
with a life of 821 or fewer hours.

Incorrect distribution

Using the standard normal distribution, we recalibrate production whenever we
observe a light bulb with a life of 882 or fewer hours.


By incorrectly using the standard normal distribution, we would recalibrate
production too frequently.
When can we use the normal distribution?

As an approximation, when the number of observations is large enough that the
difference in results is negligible. The difference starts to become negligible at 30
or more degrees of freedom. For more accurate results, use the t-distribution.
t-Distribution vs. Normal Distribution
Terminology

We have been using the terms test statistic and critical value somewhat
interchangeably. Which term is appropriate depends on whether the number
described is being used to find an implied probability (test statistic), or represents a
known probability (critical value).

When we wanted to know the probability of an SUV getting more than 20 mpg, we
constructed the test statistic and asked, "What is the probability of observing the test
statistic?"

When we wanted to know what cut-off to impose for recalibrating production of light
bulbs, we found the critical value that gave us the probability we wanted, and then
asked, "What test value has the probability implied by the critical value?"
Test Statistic vs. Critical Value
Test Statistic vs. Critical Value

Example

The return on IBM stock has averaged 19.3% over the past 10 years with a standard
deviation of 4.5%. Assuming that past performance is indicative of future results and
assuming that the population of rates of return is normally distributed, what is the
probability that the return on IBM next year will be between 10% and 20%?

1. Picture the problem with respect to the appropriate distribution.
2. Determine what area(s) represents the answer to the problem. (The question asks
   for the area between 10% and 20%.)
3. Determine what area(s) you must find; this depends on how the probability table
   or function is defined. (Look up these areas.)
4. Perform computations to find the desired area based on known areas.
Example

The return on IBM stock has averaged 19.3% over the past 10 years with a standard
deviation of 4.5%. Assuming that past performance is indicative of future results and
assuming that the population of rates of return is normally distributed, what is the
probability that the return on IBM next year will be between 10% and 20%?
Convert the question to a form that can be analyzed.
t-Distribution
t-Distribution

Example
The return on IBM stock has averaged 19.3% over the past 10 years with a standard
deviation of 4.5%. Assuming that past performance is indicative of future results and
assuming that the population of rates of return is normally distributed, what is the
probability that the return on IBM next year will be between 10% and 20%?

Test statistic = (Test value − mean) / standard deviation

Left test statistic = (10% − 19.3%) / 4.5% ≈ −2.07
Right test statistic = (20% − 19.3%) / 4.5% ≈ 0.16
t-Distribution

Example
The return on IBM stock has averaged 19.3% over the past 10 years with a standard
deviation of 4.5%. Assuming that past performance is indicative of future results and
assuming that the population of rates of return is normally distributed, what is the
probability that the return on IBM next year will be between 10% and 20%?

Left tail (test statistic = −2.07):

  t Distribution
  Test statistic           -2.070
  Degrees of Freedom       9
  Pr(t > Test statistic)   96.58%

Area to the left of −2.07: 100% − 96.58% = 3.42%

Right tail (test statistic = 0.16):

  t Distribution
  Test statistic           0.160
  Degrees of Freedom       9
  Pr(t > Test statistic)   43.82%
t-Distribution

Example

The return on IBM stock has averaged 19.3% over the past 10 years with a standard
deviation of 4.5%. Assuming that past performance is indicative of future results and
assuming that the population of rates of return is normally distributed, what is the
probability that the return on IBM next year will be between 10% and 20%?

Area to the left of −2.07: 3.42%
Area to the right of 0.16: 43.82%

3.42% + 43.82% = 47.24%
100% − 47.24% = 52.76%

There is a 53% chance that IBM will yield a return between 10% and 20% next year.
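
The whole calculation in one place; a minimal scipy.stats sketch (scipy is an assumption, the deck itself uses the probabilities spreadsheet):

    from scipy.stats import t

    mean, sd, df = 0.193, 0.045, 9
    lo = (0.10 - mean) / sd    # ~ -2.07
    hi = (0.20 - mean) / sd    # ~ 0.16

    # Pr(-2.07 < t < 0.16) = cdf(hi) - cdf(lo)
    print(t.cdf(hi, df) - t.cdf(lo, df))   # ~0.53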
t-Distribution

Example
Your firm has negotiated a labor contract that requires that the firm provide annual raises no
less than the rate of inflation. This year, the total cost of labor covered under the contract will
be $38 million. Your CFO has indicated that the firm's current financing can support up to a
$2 million increase in labor costs. Based on the historical inflation numbers below, calculate
the probability of labor costs increasing by at least $2 million next year.

Year  Inflation Rate    Year  Inflation Rate
1982  6.2%              1993  3.0%
1983  3.2%              1994  2.6%
1984  4.3%              1995  2.8%
1985  3.6%              1996  3.0%
1986  1.9%              1997  2.3%
1987  3.6%              1998  1.6%
1988  4.1%              1999  2.2%
1989  4.8%              2000  3.4%
1990  5.4%              2001  2.8%
1991  4.2%              2002  1.6%
1992  3.0%              2003  1.8%

Calculate the mean and standard deviation for inflation:
Sample mean = 3.2%
Sample stdev = 1.2%
t-Distribution

Example (continued)

Sample mean = 3.2%
Sample stdev = 1.2%
N = 22, so degrees of freedom = 21 (t₂₁)

A $2 million increase on a $38 million base is a 2/38 = 5.26% increase.

Test statistic = (5.26% − 3.2%) / 1.2% ≈ 1.717

  t Distribution
  Test statistic           1.717
  Degrees of Freedom       21
  Pr(t > Test statistic)   5.03%
  Pr(t < Test statistic)   94.97%

There is about a 5% probability that labor costs will increase by at least $2 million next year.
t-Distribution

Example
Your firm has negotiated a labor contract that requires that the firm provide annual raises no
less than the rate of inflation. This year, the total cost of labor covered under the contract will
be $38 million. Your CFO has indicated that the firm's current financing can support up to a
$2 million increase in labor costs. The CFO wants to know the magnitude of a possible
worst-case scenario. Answer the following: There is a 90% chance that the increase in
labor costs will be no more than what amount?

  t Distribution
  Degrees of Freedom       21
  Pr(t > Critical value)   10.00%
  Critical Value           1.3232

(Test value − 3.2%) / 1.2% = 1.3232  →  Test value = 4.79%

A 4.79% increase on a $38 million base is (4.79%)($38 million) = $1.82 million.
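
The worst-case cutoff in code; a minimal scipy.stats sketch (scipy is an assumption):

    from scipy.stats import t

    mean, sd, df = 0.032, 0.012, 21
    crit = t.ppf(0.90, df)               # ~1.3232 (10% in the right tail)

    worst_case_rate = mean + crit * sd   # ~4.79%
    print(worst_case_rate * 38)          # ~ $1.82 million on a $38m base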
The government has contracted a private firm to produce hand grenades. The specifications
call for the grenades to have 10-second fuses. The government has received a shipment of
100,000 grenades and will test a sample of 20 grenades. If, based on the sample, the
government determines that the probability of a grenade going off in less than 8 seconds
exceeds 1%, then the government will reject the entire shipment.

The test results are as follows.

Time to Detonation Number of Grenades
8 seconds 2
9 seconds 3
10 seconds 10
11 seconds 3
12 seconds 1
13 seconds 1

In general, one would not expect time measures to be normally distributed (because time
cannot be negative). However, if the ratio of the mean to the standard deviation is large
enough, we can use the normal distribution as an approximation.

Should the government reject the shipment?
t-Distribution
Time to Detonation Number of Grenades
8 seconds 2
9 seconds 3
10 seconds 10
11 seconds 3
12 seconds 1
13 seconds 1

First: What is the ratio of the mean to the standard deviation?
Mean = 10.05 seconds
Standard deviation = 1.20 seconds

Ratio is 8.375.

A ratio of greater than 8 is a decent heuristic.

This is not a rigorous test for the appropriateness of the normal distribution. But,
it is not too bad for a quick and dirty assessment.
t-Distribution
t-Distribution

Should the government reject the shipment?

Naïve answer: Don't reject the shipment. Because none of the grenades detonated in
less than 8 seconds, Pr(detonation in less than 8 seconds) = 0.

[Histogram: number of grenades vs. seconds to detonation; no grenades detonated in less than 8 seconds. A histogram shows the number of observations of each type.]
Should the government reject the shipment?

Correct answer:

We use the sample data to infer the shape of the population distribution.
[Figure: the inferred population distribution shows a positive probability of detonation times of less than 8 seconds.]
t-Distribution
t-Distribution

Should the government reject the shipment?

Correct answer:

1. Find the test statistic that corresponds to 8 seconds.

   Test statistic = (8 − 10.05) / 1.2 ≈ −1.71

2. Find the area to the left of the test statistic.

     t Distribution
     Test statistic           -1.710
     Degrees of Freedom       19
     Pr(t > Test statistic)   94.82%
     Pr(t < Test statistic)   5.18%

   Pr(detonation < 8 seconds) = 5.2%

3. Reject the shipment because the probability of early detonation is too high.
In the previous example, we noted that the normal distribution may not properly
describe the behavior of random variables that are bounded.

A normally distributed random variable can take on any value from negative infinity to
positive infinity. If the random variable you are analyzing is bounded (i.e. it cannot
cover the full range from negative to positive infinity), then using the normal
distribution to predict the behavior of the random variable can lead to erroneous
results.
Example:

Using the data from the hand grenade example, the probability of a single grenade
detonating in less than zero seconds is 0.0001. That means that, on average, we can
expect one grenade out of every 10,000 to explode after a negative time interval.

Since this is logically impossible, we must conclude that the normal distribution is not
the appropriate distribution for describing time-to-detonation.
Lognormal Distribution
In instances in which a random variable must take on a positive value, it is often the
case that the random variable has a lognormal distribution.

A random variable is lognormally distributed when the natural logarithm of the
random variable is normally distributed.
Example: Return to the hand grenade example.

Time to Detonation Log of Time to Detonation Number of Grenades
8 seconds 2.0794 2
9 seconds 2.1972 3
10 seconds 2.3026 10
11 seconds 2.3979 3
12 seconds 2.4849 1
13 seconds 2.5649 1


As time approaches positive infinity, ln(time) approaches positive infinity.
As time approaches zero, ln(time) approaches negative infinity.
Lognormal Distribution
Lognormal Distribution

Assuming that the times-to-detonation were normally distributed, we found a 5.2%
probability of detonation occurring in under 8 seconds.

Assuming that the times-to-detonation are lognormally distributed, what is the
probability of detonation occurring in under 8 seconds?

Log of Time to Detonation   Number of Grenades
2.0794                      2
2.1972                      3
2.3026                      10
2.3979                      3
2.4849                      1
2.5649                      1

Mean = 2.3010
Standard deviation = 0.1175

Test statistic = (ln(8) − 2.3010) / 0.1175 ≈ −1.8856

  t Distribution
  Test statistic           -1.886
  Degrees of Freedom       19
  Pr(t > Test statistic)   96.27%
  Pr(t < Test statistic)   3.73%

Pr(detonation < 8 seconds) = 3.7%
Example

You are considering buying stock in a small cap firm. The firm's sales over the past
nine quarters are shown below. You expect your investment to appreciate in value
next quarter provided that the firm's sales next quarter exceed $27 million. Based on
this assumption, what is the probability that your investment will appreciate in value?

Quarter Sales (millions)
1 $25.2
2 $12.1
3 $27.9
4 $28.9
5 $32.0
6 $29.9
7 $34.4
8 $29.8
9 $23.2
Because sales cannot be negative, it may be more appropriate to model the firm's
sales as lognormal rather than normal.
Lognormal Distribution
Lognormal Distribution

Example

What is the probability that the firm's sales will exceed $27 million next quarter?

Quarter   Sales (millions)   ln(Sales)
1         $25.2              3.227
2         $12.1              2.493
3         $27.9              3.329
4         $28.9              3.364
5         $32.0              3.466
6         $29.9              3.398
7         $34.4              3.538
8         $29.8              3.395
9         $23.2              3.144

Mean of ln(Sales) = 3.261
Standard deviation of ln(Sales) = 0.311

Test statistic = (ln(27) − 3.261) / 0.311 ≈ 0.1106

  t Distribution
  Test statistic           0.1106
  Degrees of Freedom       8
  Pr(t > Test statistic)   45.73%

Pr(sales exceeding $27 million next quarter) = 46%
Odds are that the investment will decline in value.
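
The full lognormal workflow, from raw sales to tail probability; a minimal Python sketch (scipy is an assumption):

    import math
    from statistics import mean, stdev
    from scipy.stats import t

    sales = [25.2, 12.1, 27.9, 28.9, 32.0, 29.9, 34.4, 29.8, 23.2]
    logs = [math.log(s) for s in sales]

    m, s = mean(logs), stdev(logs)          # ~3.261, ~0.311 (sample stdev)
    t_stat = (math.log(27) - m) / s         # ~0.11

    print(t.sf(t_stat, df=len(sales) - 1))  # Pr(sales > $27m) ~ 0.46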
Lognormal Distribution

Example

Suppose we, incorrectly, assumed that sales were normally distributed.

Quarter   Sales (millions)
1         $25.2
2         $12.1
3         $27.9
4         $28.9
5         $32.0
6         $29.9
7         $34.4
8         $29.8
9         $23.2

Mean = 27.044
Standard deviation = 6.520

Test statistic = (27 − 27.044) / 6.520 ≈ −0.007

  t Distribution
  Test statistic           -0.007
  Degrees of Freedom       8
  Pr(t > Test statistic)   50.27%

Pr(sales exceeding $27 million next quarter) > 50%
Odds are that the investment will increase in value.

The incorrect distribution yields the opposite conclusion.
Lognormal Distribution

Warning:

The mean of the logs is not the same as the log of the mean.

Mean sales = 27.044
ln(27.044) = 3.298

But:

Mean of log sales = 3.261

The same is true for the standard deviation: the standard deviation of the logs is not
the same as the log of the standard deviation.

In using the lognormal distribution, we need the mean of the logs and the
standard deviation of the logs.
When should one use the lognormal distribution?

You should use the lognormal distribution if the random variable is either non-negative or non-positive

Can one use the normal distribution as an approximation of the lognormal distribution?

Yes, but only when the ratio of the mean to the standard deviation is large (e.g. greater than 8).

Note: If the random variable is only positive (or only negative), then you are always better off using the
lognormal distribution vs. the normal or t-distributions. The rules above give guidance for using the
normal or t-distributions as approximations.


Hand grenade example
Mean / Standard deviation = 10.05 / 1.20 = 8.38
Normal distribution overestimated the probability of early detonation by 1.5%
(3.7% for lognormal vs. 5.2% for t-distribution)

Quarterly sales example
Mean / Standard deviation = 27.04 / 6.52 = 4.15
Normal distribution overestimated the probability of appreciation by 4.6%
(45.7% for lognormal vs. 50.3% for t-distribution)
Lognormal Distribution
So far, we have looked at the distribution of individual observations.

Gas mileage for a single SUV.
Burn life for a single light bulb.
Return on IBM stock next quarter.
Inflation rate next year.
Time to detonation for a single hand grenade.
Firm's sales next quarter.

In each case, we had sample means and sample standard deviations and asked,
"What is the probability of the next observation lying within some range?"



Note: Although we drew on information contained in a sample of many observations,
the probability questions we asked always concerned a single observation.

In these cases, the random variable we analyzed was a single draw from the
population.
Distribution of Sample Means
We now want to ask probability questions about sample means.

Example:

EPA standards require that the mean gas mileage for a manufacturer's cars be at least
20 mpg. Every year, the EPA takes a sampling of the gas mileages of a manufacturer's
cars. If the mean of the sample is below 20 mpg, the manufacturer is fined.

In 2001, GM produced 145,000 cars. Suppose five EPA analysts each select 10 cars
and measure their mileages. The analysts obtain the following results.
Analyst #1 Analyst #2 Analyst #3 Analyst #4 Analyst #5
17 22 16 21 24
16 22 20 20 24
19 19 17 22 20
21 22 17 20 22
19 25 23 18 17
21 18 23 22 23
16 16 19 19 22
16 24 22 23 15
19 18 20 17 19
22 15 15 21 15
Distribution of Sample Means
Notice that each analyst obtained a different sample mean. The sample means are:

Analyst #1: 18.6
Analyst #2: 20.1
Analyst #3: 19.2
Analyst #4: 20.3
Analyst #5: 20.1

The analysts obtain different sample means because their samples consist of
different observations. Which is correct?
Each sample mean is an estimate of the population mean.

The sample means vary depending on the observations picked.
The sample means are, themselves, random variables.
Distribution of Sample Means
Notice that we have identified two distinct random variables:

1. The process that generates the observations is one random variable (e.g. the
mechanism that determines each car's mpg).

2. The mean of a sample of observations is another random variable (e.g. the
average mpg of a sample of cars).


The distribution of sample means is governed by the central limit theorem.
Central Limit Theorem

Regardless of the distribution of the random variable generating the observations, the
sample means of the observations are t-distributed.

Example:

It doesn't matter whether mileage is distributed normally, lognormally, or according to
any other distribution; the sample means of gas mileages are t-distributed.
Distribution of Sample Means
Example:

The following slides show sample means taken from a uniformly distributed random
variable.

The random variable can take on any number over the range 0 through 1 with equal
probability.

For each slide, we see the mean of a sample of observations of this uniformly
distributed random variable.
Distribution of Sample Means
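Before looking at the histograms, here is a minimal simulation sketch (our own addition, not from the original slides) of how such sample means can be generated; the seed and variable names are arbitrary assumptions.

import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

for n in (1, 2, 5, 20, 200):
    # 1,000 samples, each of n observations; average within each sample
    means = rng.uniform(0, 1, size=(1000, n)).mean(axis=1)
    print(f"n = {n:>3}: sd of the 1,000 sample means = {means.std(ddof=1):.3f}")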
[Histogram: One Thousand Sample Means, Each Derived from 1 Observation. Horizontal axis: value of sample mean (0.00 to 0.92 in steps of 0.04); vertical axis: number of sample means observed.]
Distribution of Sample Means
[Histogram: One Thousand Sample Means, Each Derived from 2 Observations. Horizontal axis: value of sample mean (0.00 to 0.92 in steps of 0.04); vertical axis: number of sample means observed.]
Distribution of Sample Means
[Histogram: One Thousand Sample Means, Each Derived from 5 Observations. Horizontal axis: value of sample mean (0.00 to 0.92 in steps of 0.04); vertical axis: number of sample means observed.]
Distribution of Sample Means
[Histogram: One Thousand Sample Means, Each Derived from 20 Observations. Horizontal axis: value of sample mean (0.00 to 0.92 in steps of 0.04); vertical axis: number of sample means observed.]
Distribution of Sample Means
[Histogram: One Thousand Sample Means, Each Derived from 200 Observations. Horizontal axis: value of sample mean (0.00 to 0.92 in steps of 0.04); vertical axis: number of sample means observed.]
Distribution of Sample Means
Notice two things that occur as we increase the number of observations that feed into
each sample.

1. The distribution of sample means very quickly becomes bell-shaped. This is the
result of the central limit theorem: basing a sample mean on more observations
causes the sample mean's distribution to approach the normal distribution.

2. The variance of the distribution decreases. This is the result of our next topic: the
variance of a sample mean.
Distribution of Sample Means
The variance of a sample mean decreases as the number of observations comprising the
sample increases.

Standard deviation of sample means (called the standard error) =
(standard deviation of the observations) / √(number of observations comprising the sample mean)
Example:

In the previous slides, we saw sample means of observations drawn from a uniformly
distributed random variable.

The variance of a uniformly distributed random variable that ranges from 0 to 1 is 1/12.

Therefore:
Variance of sample means based on 1 observation = (1/12) / 1 = 0.0833
Variance of sample means based on 2 observations = (1/12) / 2 = 0.0417
Variance of sample means based on 5 observations = (1/12) / 5 = 0.0167
Variance of sample means based on 20 observations = (1/12) / 20 = 0.0042
Variance of sample means based on 200 observations = (1/12) / 200 = 0.0004
Distribution of Sample Means
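A quick check of the table above; this sketch is our own addition.

pop_var = 1 / 12  # variance of a Uniform(0, 1) random variable

for n in (1, 2, 5, 20, 200):
    # variance of a sample mean = population variance / sample size
    print(f"n = {n:>3}: variance of sample means = {pop_var / n:.4f}")
# prints 0.0833, 0.0417, 0.0167, 0.0042, 0.0004, matching the slide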
Example:
Let us return to the EPA analysts.
Analyst #1 looked at 10 cars and found an average mileage of 18.6 and a standard
deviation of 2.271. Analyst #1's data was: 17, 16, 19, 21, 19, 21, 16, 16, 19, 22

Based on this sample, GM can expect 95% of cars to have mileages between what
two extremes?
t Distribution
Degrees of freedom 9
Pr(t > critical value) 2.50%
Critical values ±2.262

(Left − 18.6) / 2.271 = −2.262 ⟹ Left = (−2.262)(2.271) + 18.6 = 13.5
(Right − 18.6) / 2.271 = 2.262 ⟹ Right = (2.262)(2.271) + 18.6 = 23.7
Distribution of Sample Means
Example:
Let us return to the EPA analysts.
Analyst #1 looked at 10 cars and found an average mileage of 18.6 and a standard
deviation of 2.271. Analyst #1's data was: 17, 16, 19, 21, 19, 21, 16, 16, 19, 22
Based on this sample, GM can expect that 95% of analysts who look at 10 cars each
will find average mileages between what two extremes?
t Distribution
Degrees of freedom 9
Pr(t > critical value) 2.50%
Critical values ±2.262

Standard error = 2.271 / √10 = 0.718
(Left − 18.6) / 0.718 = −2.262 ⟹ Left = (−2.262)(0.718) + 18.6 = 17.0
(Right − 18.6) / 0.718 = 2.262 ⟹ Right = (2.262)(0.718) + 18.6 = 20.2
Distribution of Sample Means
Example:
Let us return to the EPA analysts.
Analyst #1 looked at 10 cars and found an average mileage of 18.6 and a standard
deviation of 2.271. Analyst #1's data was: 17, 16, 19, 21, 19, 21, 16, 16, 19, 22
Based on this sample, GM can expect that 95% of analysts who look at 20 cars each
will find average mileages between what two extremes?
t Distribution
Degrees of freedom 9
Pr(t > critical value) 2.50%
Critical values ±2.262

Standard error = 2.271 / √20 = 0.508
(Left − 18.6) / 0.508 = −2.262 ⟹ Left = (−2.262)(0.508) + 18.6 = 17.5
(Right − 18.6) / 0.508 = 2.262 ⟹ Right = (2.262)(0.508) + 18.6 = 19.7

(The critical value still comes from t with 9 degrees of freedom because the standard
deviation is estimated from analyst #1's sample of 10 cars.)
Distribution of Sample Means
Example:
Let us return to the EPA analysts.
Analyst #1 looked at 10 cars and found an average mileage of 18.6 and a standard
deviation of 2.271. Analyst #1's data was: 17, 16, 19, 21, 19, 21, 16, 16, 19, 22
95% of cars will have mileages between 13.5 mpg and 23.7 mpg.

95% of analysts who look at 10 cars each should find average mileages between 17.0 mpg
and 20.2 mpg.

95% of analysts who look at 20 cars each should find average mileages between 17.5 mpg
and 19.7 mpg.
Distribution of Sample Means
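These three ranges can be reproduced with a short sketch (our own addition, assuming SciPy is available).

import math
from scipy import stats

mean, sd, n = 18.6, 2.271, 10
crit = stats.t.ppf(0.975, df=n - 1)   # 2.262

# range for individual cars (uses the standard deviation)
print(mean - crit * sd, mean + crit * sd)          # about [13.5, 23.7]
# ranges for sample means (use the standard error)
for m in (10, 20):
    se = sd / math.sqrt(m)
    print(m, mean - crit * se, mean + crit * se)   # [17.0, 20.2] and [17.5, 19.7]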
While we can't know the values of the population parameters (unless we have the
entire population of data), we can make statements about how likely it is to find the
population parameters within certain ranges.

We construct confidence intervals to describe ranges over which population parameters
are likely to exist.
Example:

Suppose EPA analyst #1 found the following data:

Sample mean = 18.6 mpg
Sample standard deviation = 2.271 mpg
Sample size = 20

Standard error = 0.508
From the t₉ distribution, we know that:
50% of sample means lie within 0.7027 standard errors of the population mean
75% of sample means lie within 1.2297 standard errors of the population mean
95% of sample means lie within 2.2622 standard errors of the population mean
Confidence Intervals
We can use this information to construct confidence intervals around the population
mean, where a confidence interval is:

Measure ± (critical value)(standard deviation of the measure)

The measure and the standard deviation are found in the data. What critical value we
select depends on the level of confidence we desire.

There is a 50% chance that the population mean is found within the range:
18.6 ± (0.7027)(0.508) = [18.2, 19.0]
There is a 75% chance that the population mean is found within the range:
18.6 ± (1.2297)(0.508) = [18.0, 19.2]
There is a 95% chance that the population mean is found within the range:
18.6 ± (2.2622)(0.508) = [17.5, 19.7]
Increasing the level of confidence
widens the range of focus.
Confidence Intervals
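A sketch of the same construction (our own addition), letting SciPy supply the critical values.

from scipy import stats

mean, se, df = 18.6, 0.508, 9
for level in (0.50, 0.75, 0.95):
    crit = stats.t.ppf(0.5 + level / 2, df)   # 0.7027, 1.2297, 2.2622
    print(f"{level:.0%}: [{mean - crit * se:.1f}, {mean + crit * se:.1f}]")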
At the extremes, we can say

1. There is a 100% chance that the population mean is between negative infinity and
positive infinity.

2. There is a 0% chance that the population mean is exactly 18.60000000000000…
The first statement gives perfect certainty about an infinitely unfocused range.

The second statement gives zero certainty about an infinitely focused range.
Usually, when statisticians mention error, they are referring to the range on a 95%
confidence interval.

Confidence Intervals
Example:

You take a sample of 40 technology companies. The average P/E for the
sample is 71.8. The standard deviation of the P/Es of the 40 companies is
22.4. What is the measurement error associated with this average (at the
95% confidence level)?
The confidence interval is:

Measure ± (critical value)(standard deviation of the measure)
Sample mean = 71.8
Standard error = 22.4 / √40 = 3.54
Critical value (from t₃₉) = 2.0227
71.8 ± (2.0227)(3.54) = 71.8 ± 7.16

The ±7.16 is the measurement error; 71.8 estimates the average P/E ratio for all tech companies.
Confidence Intervals
Example:

Your firm solicits estimates for constructing a new building. You receive the
following seven estimates:

$10 million, $12 million, $15 million, $13 million,
$11 million, $14 million, $12 million

Based on this information, construct a 90% confidence interval for the
estimated cost of the building.
Measure ± (critical value)(standard deviation of the measure)
Sample mean = $12.4 million
Standard deviation = $1.7 million
Critical value (from t₆) = 1.9432
$12.4 million ± (1.9432)($1.7 million) = [$9.1 million, $15.7 million]
Confidence Intervals
What if we had used the standard deviation of the sample mean (the standard error)
instead of the standard deviation of the observations?

Measure ± (critical value)(standard deviation of the measure)
Sample mean = $12.4 million
Standard deviation of the sample mean = $1.7 million / √7 = $643,000
Critical value (from t₆) = 1.9432
$12.4 million ± (1.9432)($643,000) = [$11.2 million, $13.6 million]

This is not a 90% confidence interval for the cost of the building, but a 90% confidence
interval for the average cost of seven buildings.

For comparison, using the standard deviation of the observations:
Standard deviation = $1.7 million
Critical value (from t₆) = 1.9432
$12.4 million ± (1.9432)($1.7 million) = [$9.1 million, $15.7 million]

This is a 90% confidence interval for the cost of the building. The difference lies in
the choice of standard deviations.
Confidence Intervals
Confidence interval for the cost of the building

There is a 90% probability that the cost of a single building will be between $9.1 million
and $15.7 million.


Confidence interval for the average cost of the buildings

There is a 90% probability that, when constructing seven buildings, the average cost per
building will be between $11.2 million and $13.6 million.
Confidence Intervals
Proportions are means of categorical data. Categorical data is usually non-numeric and
represents a state or condition rather than a value.

Example:

In a vote between George Bush and Al Gore, the data are categorical. E.g. Bush, Gore,
Gore, Bush, Gore, Bush, Bush, Bush, Gore, etc.

A proportion measures the frequency of a single category relative to all categories. For
example, if the data set includes 10 "Bush" votes and 12 "Gore" votes, then the category
"Gore" represents 12 / (10 + 12) = 55% of all the categories.
A population proportion (usually denoted as π) is calculated based on the entire
population of data. A sample proportion (usually denoted as p) is calculated based on a
sample of the data.

The properties of the sample proportion are:
Population mean = π
Sample standard deviation = √( p(1 − p) / N )
Distribution = normal (provided Np > 5 and N(1 − p) > 5)
Distribution of Proportions
Example:

There are 8.3 million registered voters in Florida. Within the first few hours after the
polls closed in the 2000 election, the count showed 50.5% of the vote going to George
Bush. This estimate was based on only 200,000 votes. Build a 99% confidence interval
for the population proportion of votes for Bush.
Measure = p = 0.505
Standard deviation of the measure = √( 0.505(1 − 0.505) / 200,000 ) = 0.0011
Distribution of Proportions
Example:

There are 8.3 million registered voters in Florida. Within the first few hours after the
polls closed in the 2000 election, the count showed 50.5% of the vote going to George
Bush. This estimate was based on only 200,000 votes. Build a 99% confidence interval
for the population proportion of votes for Bush.
Measure ± (critical value)(standard deviation of the measure)
Measure = p = 0.505
Standard deviation of the measure = √( 0.505(1 − 0.505) / 200,000 ) = 0.00112
Critical value (standard normal) = 2.5758, since for 99% confidence each tail holds
half of 1%, which is 0.5%.

0.505 ± (2.5758)(0.00112) = [0.502, 0.508]

There is a 99% probability that the population proportion of votes for Bush is between
50.2% and 50.8%.
Distribution of Proportions
Example:

Given that a sample of voters shows one candidate with a 1% lead (50.5% vs. 49.5%),
what is the minimal number of votes that can be cast such that a 99.99% confidence
interval for the candidate's population proportion is greater than 50%?
Measure ± (critical value)(standard deviation of the measure)
Measure = p = 0.505
Standard deviation of the measure = √( 0.505(1 − 0.505) / N ) = √( 0.249975 / N )
Critical value (standard normal) = 3.8906

Left end of confidence interval = 0.505 − (3.8906)√( 0.249975 / N )
Setting the left end equal to 0.50:
0.505 − (3.8906)√( 0.249975 / N ) = 0.50 ⟹ N = 151,353

For elections in which the winner wins by at least 1%, one can poll (approximately) 150,000
voters and get, with a margin of error of 0.01%, the same result as that obtained by polling all
voters. This margin of error implies 1 miscalled election out of every 10,000 elections.
Distribution of Proportions
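The algebra above solves for N directly; here is a minimal sketch of that calculation (our own addition).

from scipy import stats

p = 0.505
z = stats.norm.ppf(1 - 0.0001 / 2)      # 3.8906 for a 99.99% two-sided interval
N = p * (1 - p) * (z / (p - 0.50)) ** 2
print(round(N))                         # roughly 151,000 voters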
Given these results, why were the political parties so concerned with counting every
vote in Florida?

Polling 150,000 people works only if the people are selected randomly.

In Florida, political parties were advocating recounts only for subsets of voters (i.e.
states and counties) that were predominantly aligned with one or the other party.

The argument in the Florida election ultimately revolved around attempts to introduce
and block sampling biases.
Sampling Bias
Sampling bias is a systematic tendency for samples to misrepresent the population from which they
are drawn.

A sample is not biased merely because it fails to represent the population; there is always a
measurable probability that a given sample will fail to represent the population. Rather, the
data selection process is biased if repeated samples consistently misrepresent the population.

Types of sampling biases:

Selection bias: Researcher excludes atypical subsets of the population from the data.
E.g. Estimate the average rate of return on low P/B stocks.
Problem: Firms with low P/B fail at a higher rate than firms with high P/B. Failed firms do not
appear in the data set.
Result: Sample mean return is greater than population mean return.

Non-response bias: Atypical subsets of subjects exclude themselves from the data.
E.g. Estimate the standard deviation of household incomes.
Problem: Individuals at the high and low extremes will be less likely to respond.
Result: Sample standard deviation is less than the population standard deviation.

Measurement bias: The measurement applied to the sample atypically approximates the population.
E.g. Estimate average purchasing power by measuring income over time.
Problem: As prices rise, incomes rise, but purchasing power does not.
Result: Sample mean of income exceeds population mean of purchasing power.
Sampling Bias
Thus far, we have

1. Estimated the probability of finding single observations that are certain distances
away from the population mean.
2. Estimated the probability of finding sample means that are certain distances away
from the population mean.
3. Estimated left and right boundaries that contain the population mean at varying
degrees of confidence.
We now want to test statements about the population mean.

Procedure for testing a hypothesis:

State a null hypothesis concerning the population parameter. The null
hypothesis is what we will assume is true.
State an alternative hypothesis concerning the population parameter. The
alternative hypothesis is what we will assume to be true if the null hypothesis is
false.
Calculate the probability of observing a sample that disagrees with the null at
least as much as the sample you observed.
Hypothesis Testing
Example:

Suppose we want to test the hypothesis that Bush obtained more than 50% of the vote
in Florida.

1. Our null hypothesis is H₀: π ≥ 0.5.
2. Our alternative hypothesis is Hₐ: π < 0.5.
3. Based on a sample of 200,000 votes, we observed p = 0.505. Calculate the
probability of observing p = 0.505 (or less) when, in fact, π ≥ 0.5.

Since we are assuming that π ≥ 0.5, or (in the most conservative case) that π = 0.5,
we are also assuming that the standard deviation of p is
√( (0.5)(1 − 0.5) / 200,000 ) = 0.001118.
Hypothesis Testing
We now ask the question: Assuming that the null hypothesis is true, what is the probability of
observing a sample that disagrees with the null at least as much as the sample we observed?
The area to the right of 0.505 is the
probability of finding a sample proportion of
at least 0.505 when, in fact, the population
proportion is 0.5.
The area to the left of 0.505 is the
probability of finding a sample proportion
of at most 0.505 when, in fact, the
population proportion is 0.5.
According to the null hypothesis, we
assume that the center of the
distribution is 0.5.
The sample proportion we found was
0.505.
Hypothesis Testing
We now ask the question: Assuming that the null hypothesis is true, what is the probability of
observing a sample that disagrees with the null at least as much as the sample we observed?
The area to the left of 0.505 is the
probability of finding a sample proportion
of at most 0.505 when, in fact, the
population proportion is 0.5.
Because the setup of the distribution
assumes that the population proportion is at
least 0.5, we are more concerned with the
alternative tail.
The area of the alternative tail tells us the
probability of observing a sample as good
or worse than the one we observed when,
in fact, the null hypothesis is true.
Using the formula for the test statistic, we
find that the area of the alternative tail is
0.9996.
We say: "Assuming that Bush would gain at least 50% of the vote, there is a
99.96% chance that a sample of 200,000 votes would show at most 50.5% for
Bush."
Hypothesis Testing
We now ask the question: Assuming that the null hypothesis is true, what is the probability of
observing a sample that disagrees with the null at least as much as the sample we observed?
Assuming that Bush would gain at least 50% of the vote, there is a 99.96% chance
that a sample of 200,000 votes would show at most 50.5% for Bush.
Notice that this statement is not very enlightening. What it says (in effect) is: If
you assume that Bush wins, then the sample results we see are reasonable. This
sounds like a circular argument.
Example:
1. You buy a new house and, although you have seen no termites in the house, you
assume that the house is in danger of termite infestation.
2. You spend $5,000 on a new treatment that is supposed to guarantee that
termites will never infest your house.
3. Following the treatment, you see no termites.
4. You conclude that the treatment was worth the $5,000.

The problem with this line of reasoning is that your belief that the expensive
treatment works is based on the (possibly false) assumption that you had termites.
Hypothesis Testing
Following the termite treatment, two things can happen: either you don't see
termites in the house, or you do see termites in the house.
Example:
1. You buy a new house and, although you have seen no termites in the house, you
assume that the house is in danger of termite infestation.
2. You spend $5,000 on a new treatment that is supposed to guarantee that
termites will never infest your house.
3. Following the treatment, you see no termites.
4. You conclude that the treatment was worth the $5,000.
If you don't see termites, you can conclude nothing. It could be the case that the
treatment works, or it could be the case that the treatment doesn't work but you'll
never know because you don't have termites.

If you do see termites, you can conclude that the treatment doesn't work.
Hypothesis Testing
Returning to the election example, finding a sample proportion of 0.505 does not tell
us that the population proportion is greater than 0.5 because we began the analysis
assuming that the population proportion was greater than 0.5.

However, if we found a sample proportion of (for example) 49.8%, this may tell us
something.
H₀: π ≥ 0.5
Hₐ: π < 0.5

Assuming that (in the most conservative case) π = 0.5,
stdev(p) = √( (0.5)(1 − 0.5) / 200,000 ) = 0.001118.

Test statistic = (test value − mean) / standard deviation = (0.498 − 0.5) / 0.001118 = −1.7889
The area of the alternative tail is 3.7%.

We conclude:
If, in fact, the population proportion of
votes for Bush is at least 50%, then there is
only a 3.7% chance of observing a sample
proportion of, at most, 49.8%.
Hypothesis Testing
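A minimal sketch of this test (our own addition).

import math
from scipy import stats

p_hat, pi0, n = 0.498, 0.5, 200_000
se = math.sqrt(pi0 * (1 - pi0) / n)   # 0.001118 under the conservative null
z = (p_hat - pi0) / se                # -1.7889
print(stats.norm.cdf(z))              # alternative (left) tail area = 0.037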
The area corresponding to the alternative hypothesis is called the p-value ("p"
stands for probability).

In words, the p-value is the probability of rejecting the null hypothesis when, in fact,
the null hypothesis is true.


For example, suppose that the sample of 200,000 voters had a sample proportion of
49.8% voting for Bush.

The null hypothesis is that the population proportion exceeds 0.5, i.e., Bush wins the
election.

So, if Bush were to concede the election before the entire population of votes were
tallied (i.e. if Bush were to reject the null hypothesis), then there is a 3.7% chance
that he would be conceding when, in fact, the population of votes is in his favor.
Hypothesis Testing
In making decisions on the basis of samples, you can make either of two types of errors.


Type I Error
Reject the null hypothesis when, in fact, the null hypothesis is true.

Example: Conclude that the termite treatment does work when, in fact, it does not work.


Type II Error
Fail to reject the null hypothesis when, in fact, the null hypothesis is false.

Example: Conclude that the termite treatment does not work when, in fact, it does work.


Because all of our analyses begin with an assumption about the population, our p-values
will always refer to Type I errors. This does not mean that we are immune from Type II
errors. Rather, it means that the calculation of Type II errors is beyond the scope of this
course.
Hypothesis Testing
Returning to the EPA example, there are two ways the EPA analyst could construct
hypotheses.
Option 1:
H₀: μ ≥ 20
Hₐ: μ < 20
Presumption: GM is in compliance unless the data indicate otherwise.
Implications of results:
Reject the null: GM is not in compliance.
Fail to reject the null: No conclusion.

Option 2:
H₀: μ ≤ 20
Hₐ: μ > 20
Presumption: GM is not in compliance unless the data indicate otherwise.
Implications of results:
Reject the null: GM is in compliance.
Fail to reject the null: No conclusion.
Hypothesis Testing
Conclusion: If the fleet's average mileage did exceed 20 mpg, then the probability of
finding a sample with (at most) an average mileage of 18.6 would be 1.1%.

Alternatively: The null hypothesis is that GM's fleet meets or exceeds EPA requirements.
Based on the sample data, were the EPA to declare GM in violation of EPA requirements
(i.e. reject the null hypothesis), there would be a 1.1% chance that the EPA's ruling
would be incorrect.
H₀: μ ≥ 20
Hₐ: μ < 20

Sample mean = 18.6
Standard deviation of the sample mean = 0.508
Test statistic = (18.6 − 20) / 0.508 = −2.756

t Distribution
Test statistic −2.756
Degrees of freedom 9
Pr(t > test statistic) 98.89%
Pr(t < test statistic) 1.11%
Hypothesis Testing
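A minimal sketch of the p-value calculation (our own addition).

from scipy import stats

t_stat = (18.6 - 20) / 0.508        # -2.756
print(stats.t.cdf(t_stat, df=9))    # alternative (left) tail area = 0.011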
Procedure for hypothesis testing using significance level approach:
1. State the null and alternative hypotheses.
2. Picture the distribution and identify the null and alternative areas.
3. Using the significance level, identify the critical value(s) that separate the null
and alternative areas.
4. Calculate the test statistic.
5. Place the test statistic on the distribution. If it falls in the alternative area, reject
the null hypothesis. If it falls in the null area, fail to reject the null hypothesis.
Procedure for hypothesis testing using p-value approach:
1. State the null and alternative hypotheses.
2. Picture the distribution and identify the null and alternative areas.
3. Calculate the test statistic.
4. Find the area from the test statistic toward the alternative area(s). This area is
the p-value.
5. Interpretation: p-value is the probability of rejecting the null when, in fact, the
null is true.
Two approaches to hypothesis testing
Hypothesis Testing
Example (significance level approach):

Using the EPA data from analyst #1, test the hypothesis that the (population) average
mileage of GM's car fleet exceeds 20 mpg. Test the hypothesis at the 5% significance
level.
Area in alternative tail = 5%
Critical value = -1.833
Test statistic = -1.949

Test statistic falls in the alternative tail → reject the null hypothesis.
Hypothesis Testing
Example (p-value approach):

Using the EPA data from analyst #1, test the hypothesis that the (population) average
mileage of GM's car fleet exceeds 20 mpg.
Test statistic = -1.949

Area from test statistic toward alternative
area = 4.16%

Interpretation: If we were to reject the
null, there would be a 4.16% chance that
we would be incorrect.
Hypothesis Testing
(t₉ distribution)
Example:

Test the hypothesis that the average real rate of return on 12 month municipal bonds
exceeds 3% at a 5% significance level.

Sample data:
N = 50
x̄ = 4.2%
sₓ = 9.9%
Standard error = 9.9% / √50 = 1.4%

Hypotheses:
H₀: μ ≥ 3%
Hₐ: μ < 3%

Test statistic = (x̄ − μ) / (standard error) = (4.2% − 3%) / 1.4% = 0.8571

Critical value:
The critical value is the value that causes the alternative tail area to equal the
significance level. Hₐ is to the left, so the critical value is −1.677 (from t₄₉).

The test statistic (0.8571) falls in the null area → fail to reject H₀.
Hypothesis Testing
Example:

A paint manufacturer advertises that, when applied correctly, its paint will resist peeling
for 5 years.

A consumer watchdog group has filed a class action suit against the manufacturer for
false advertisement. Based on the following data (numbers reflect years prior to peeling),
test the manufacturer's claim at the 1% level of significance.

Sample data:
4.9, 5.2, 3.7, 5.3, 4.8, 4.5, 5.1, 5.8, 4.1, 4.7
N = 10
x̄ = 4.81
sₓ = 0.6064
Standard error = 0.6064 / √10 = 0.1917

H₀: μ ≥ 5
Hₐ: μ < 5
(Presumption of innocence; Hₐ is on the left.)

Test statistic = (x̄ − μ) / (standard error) = (4.81 − 5) / 0.1917 = −0.991

Critical value (from t₉ at the 1% level) = −2.8214

The test statistic falls in the null tail → fail to reject the null hypothesis.
Hypothesis Testing
Example:

A paint manufacturer advertises that, when applied correctly, its paint will resist peeling
for 5 years.

A consumer watchdog group has filed a class action suit against the manufacturer for
false advertisement. Based on the following data (numbers reflect years prior to peeling),
calculate the p-value for the manufacturer's claim.

Sample data:
4.9, 5.2, 3.7, 5.3, 4.8, 4.5, 5.1, 5.8, 4.1, 4.7
Using the p-value approach, we find the area of the alternative tail, starting at the
test statistic.

Test statistic = (4.81 − 5) / 0.1917 = −0.991
Area of the alternative (left) tail on t₉ = 0.174

Conclusion: Assuming that the null hypothesis is true, there is a 17.4% chance that we
would find a sample mean (based on 10 observations) of 4.81 or less.

Alternatively: We can reject the null hypothesis, but there would be a 17.4% chance
that we would be wrong in doing so.
Hypothesis Testing
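SciPy packages this entire test into one call; below is a sketch (our own addition; the alternative keyword assumes SciPy 1.6 or later).

from scipy import stats

years = [4.9, 5.2, 3.7, 5.3, 4.8, 4.5, 5.1, 5.8, 4.1, 4.7]
result = stats.ttest_1samp(years, popmean=5, alternative='less')
print(result.statistic, result.pvalue)   # about -0.991 and 0.174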
Frequently, we are interested in comparing the means of two populations.

Statistically, this is a more complicated problem than simply testing a single sample
mean.

In the means tests we have seen thus far, we have always compared a sample mean to
some fixed number.

Example: In testing the hypothesis that the mean return on bonds exceeds 3%, we
compared a random variable (the sample mean) to a fixed number (3%).

When we perform a test on a single sample mean, we are comparing a single random
variable to a fixed number.

When we perform a test comparing two sample means, we are comparing two random
variables to each other.
Distribution of a Difference in Sample Means
Let x̄_a − x̄_b be a difference in sample means. The properties of the difference in
sample means are:

Population mean = μ_a − μ_b

Sample standard deviation = √( s_a²/N_a + s_b²/N_b )

Distribution: t, with degrees of freedom
df = ( s_a²/N_a + s_b²/N_b )² / [ (s_a²/N_a)² / (N_a − 1) + (s_b²/N_b)² / (N_b − 1) ]
Distribution of a Difference in Sample Means
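The two formulas above translate directly into code. The helper below is a sketch (our own addition); the function name is ours. The usage lines preview the bond example on the following slides.

import math

def se_and_df(s_a, n_a, s_b, n_b):
    """Standard error and degrees of freedom for a difference in sample means."""
    va, vb = s_a ** 2 / n_a, s_b ** 2 / n_b
    se = math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return se, df

se, df = se_and_df(0.014, 43, 0.011, 50)
print(se, df)                 # about 0.0026 and 79
print((0.051 - 0.042) / se)   # test statistic, about 3.4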
Example:

Test the hypothesis (at a 1% significance level) that the average rate of return on 12
month Aaa bonds is less than the average rate of return on 12 month municipal bonds.

We draw two samples from two different populations (Aaa bonds and municipal bonds).

We now have two random variables (the sample means from each population).
We obtain the following sample data:
N_Aaa = 43, N_muni = 50
x̄_Aaa = 5.1%, x̄_muni = 4.2%
s_Aaa = 1.4%, s_muni = 1.1%

Stdev(x̄_Aaa − x̄_muni) = √( (0.014)² / 43 + (0.011)² / 50 ) = 0.003

Our hypotheses are:
H₀: μ_Aaa − μ_muni ≤ 0%
Hₐ: μ_Aaa − μ_muni > 0%
Difference in Means Test
Example:

Test the hypothesis (at a 1% significance level) that the average rate of return on 12
month Aaa bonds is less than the average rate of return on 12 month municipal bonds.
Our hypotheses are:
H₀: μ_Aaa − μ_muni ≤ 0%
Hₐ: μ_Aaa − μ_muni > 0%

We obtained the following sample data:
N_Aaa = 43, N_muni = 50
x̄_Aaa = 5.1%, x̄_muni = 4.2%
s_Aaa = 1.4%, s_muni = 1.1%
Stdev(x̄_Aaa − x̄_muni) = 0.003

The degrees of freedom are:
df = ( 0.014²/43 + 0.011²/50 )² / [ (0.014²/43)² / (43 − 1) + (0.011²/50)² / (50 − 1) ] ≈ 79

Test statistic = ( (0.051 − 0.042) − 0 ) / 0.003 = 3.407
Difference in Means Test
Example:

Test the hypothesis (at a 1% significance level) that the average rate of return on 12
month Aaa bonds is less than the average rate of return on 12 month municipal bonds.
Our hypotheses are:
H₀: μ_Aaa − μ_muni ≤ 0%
Hₐ: μ_Aaa − μ_muni > 0%

Test statistic = ( (0.051 − 0.042) − 0 ) / 0.003 = 3.407

t Distribution
Degrees of freedom 79
Pr(t > critical value) 1.00%
Critical value 2.3745

The test statistic falls in the alternative tail → reject the null hypothesis.
Difference in Means Test
Example:

Find the p-value for the hypothesis that the average rate of return on 12 month Aaa
bonds is less than the average rate of return on 12 month municipal bonds.
Our hypotheses are:
H₀: μ_Aaa − μ_muni ≤ 0%
Hₐ: μ_Aaa − μ_muni > 0%

Test statistic = ( (0.051 − 0.042) − 0 ) / 0.003 = 3.407

t Distribution
Test statistic 3.407
Degrees of freedom 79
Pr(t > test statistic) 0.05%
Pr(t < test statistic) 99.95%

The probability of finding a sample that disagrees with the null by at least as much as
the sample we observed when, in fact, the null hypothesis is true is 0.05%. We can
reject the null hypothesis, but there is a 0.05% chance that we would be wrong in
doing so.
Difference in Means Test
Example:

Find the p-value for the hypothesis that the average rate of return on 12 month Aaa
bonds is less than the average rate of return on 12 month municipal bonds.
Summary of the calculations:
x̄_Aaa = 0.051, s_Aaa = 0.014, N_Aaa = 43
x̄_muni = 0.042, s_muni = 0.011, N_muni = 50
Stdev(x̄_Aaa − x̄_muni) = 0.003
Test statistic (distributed t) = 3.407, df = 79.28
Difference in Means Test
Using Data Set #1, test the hypothesis (at a 5% level of significance) that the average
cost of unleaded gas is more expensive today than in the past.

Suggestion: The data set ranges from January 1976 through April 2003. Split the data set
into three parts (1/76 through 12/84, 1/85 through 12/93, and 1/94 through 4/03) and
test for a difference in population means between the first and third parts.
1. State the hypotheses:
H₀: μ₁ − μ₃ ≤ 0
Hₐ: μ₁ − μ₃ > 0

2. Calculate sample statistics and the test statistic:
N₁ = 108, N₃ = 112
x̄₁ = 80.58, x̄₃ = 106.92
s₁ = 24.49, s₃ = 15.37
Stdev(x̄₁ − x̄₃) = 2.768
Test statistic (distributed t) = −9.515, df = 178.85

3. Find the appropriate critical value:
t Distribution
Degrees of freedom 178
Pr(t > critical value) 5.00%
Critical value 1.6535
Difference in Means Test
Using Data Set #1, test the hypothesis (at a 5% level of significance) that the average
cost of unleaded gas is more expensive today than in the past.

Suggestion: The data set ranges from January 1976 through April 2003. Split the data set
into three parts (1/76 through 12/84, 1/85 through 12/93, and 1/94 through 4/03) and
test for a difference in population means between the first and third parts.
4. Compare the test statistic to the critical value:
H₀: μ₁ − μ₃ ≤ 0
Hₐ: μ₁ − μ₃ > 0

The test statistic (−9.515) falls in the null area → fail to reject the null hypothesis.
Difference in Means Test
Using Data Set #1, test the hypothesis (at a 5% level of significance) that the average
cost of unleaded gas is more expensive today than in the past.

Note: The question asks if the average cost of unleaded gas is more expensive today
than in the past. One way to interpret this is in terms of price (which we have done).
Another way to interpret this is in terms of purchasing power. If the researcher intended
this latter interpretation, then we may have introduced measurement bias: the price of
gas in dollars may not reflect the cost of gas in purchasing power.
Using Data Set #2, test the hypothesis (at a 5% level of significance) that the average
cost of unleaded gas (in terms of purchasing power) is more expensive today than in the
past.

Suggestion: Again, split the data set in three parts and compare the sample means of
parts 1 and 3. This data set includes average hourly earnings of private sector
employees. Use the ratio of the price of gas to average hourly earnings as a
measurement of the purchasing power cost of gas.

Note: The cost of gas (in terms of purchasing power) is the price of gas divided by the
wage rate ($ / gal) / ($ / hr) = hr / gal = how many hours a person must work to be
able to afford 1 gallon of gas.
Difference in Means Test
Using Data Set #2, test the hypothesis (at a 5% level of significance) that the average
cost of unleaded gas (in terms of purchasing power) is more expensive today than in the
past.
1. State the hypotheses:
H₀: μ₁ − μ₃ ≤ 0
Hₐ: μ₁ − μ₃ > 0

2. Calculate sample statistics and the test statistic:
N₁ = 108, N₃ = 112
x̄₁ = 0.116, x̄₃ = 0.082
s₁ = 0.021, s₃ = 0.009
Stdev(x̄₁ − x̄₃) = 0.002
Test statistic (distributed t) = 15.508, df = 143.91

3. Find the appropriate critical value:
t Distribution
Degrees of freedom 144
Pr(t > critical value) 5.00%
Critical value 1.6555
Difference in Means Test
Using Data Set #2, test the hypothesis (at a 5% level of significance) that the average
cost of unleaded gas (in terms of purchasing power) is more expensive today than in the
past.

4. Compare the test statistic to the critical value:
H₀: μ₁ − μ₃ ≤ 0
Hₐ: μ₁ − μ₃ > 0

The test statistic (15.508) falls in the alternative tail → reject the null hypothesis.
Difference in Means Test
You work for a printing firm. In the past, the firm employed people to maintain the high
speed copier. Six months ago, in an effort to reduce costs, management laid off the
maintenance crew and contracted out service of the machine. You are looking at
maintenance logs for the copier and note the following times between copier
breakdowns. Test the hypothesis (at the 5% level) that the contracted maintenance does
as good of a job as the in-house maintenance did.

In-House Maintenance
26, 27, 22, 13, 8, 10, 28, 7, 16, 23, 26, 25

Contracted Maintenance
17, 13, 21, 17, 8, 6, 27, 6, 2, 20, 8, 9
Data represents time, so we can consider using
the lognormal distribution. Note: This is an
analysis of the sample means. The Central Limit
Theorem tells us that sample means are
(asymptotically) t-distributed regardless of the
distribution of the underlying data. So, while
taking logs will improve accuracy, it is not
necessary (and becomes less necessary the
larger the data set).
Convert data to logs:

In-House Maintenance
3.26, 3.30, 3.09, 2.56, 2.08, 2.30, 3.33, 1.95, 2.77, 3.14, 3.26, 3.22

Contracted Maintenance
2.83, 2.56, 3.04, 2.83, 2.08, 1.79, 3.30, 1.79, 0.69, 3.00, 2.08, 2.20
Difference in Means Test
You work for a printing firm. In the past, the firm employed people to maintain the high
speed copier. Six months ago, in an effort to reduce costs, management laid off the
maintenance crew and contracted out service of the machine. You are looking at
maintenance logs for the copier and note the following times between copier
breakdowns. Test the hypothesis (at the 5% level) that the contracted maintenance does
as good of a job as the in-house maintenance did.
ln(In-House Maintenance)
3.26, 3.30, 3.09, 2.56, 2.08, 2.30, 3.33, 1.95, 2.77, 3.14, 3.26, 3.22

ln(Contracted Maintenance)
2.83, 2.56, 3.04, 2.83, 2.08, 1.79, 3.30, 1.79, 0.69, 3.00, 2.08, 2.20
1. State the hypotheses:
H₀: μ_in-house = μ_contracted
Hₐ: μ_in-house ≠ μ_contracted

2. Calculate sample statistics and the test statistic:
N_ln(in-house) = 12, N_ln(contracted) = 12
x̄_ln(in-house) = 2.85, x̄_ln(contracted) = 2.35
s_ln(in-house) = 0.508, s_ln(contracted) = 0.729

Stdev(x̄_ln(in-house) − x̄_ln(contracted)) = 0.256
Test statistic (distributed t) = 1.967, df = 19.64
Difference in Means Test
You work for a printing firm. In the past, the firm employed people to maintain the high
speed copier. Six months ago, in an effort to reduce costs, management laid off the
maintenance crew and contracted out service of the machine. You are looking at
maintenance logs for the copier and note the following times between copier
breakdowns. Test the hypothesis (at the 5% level) that the contracted maintenance does
as good of a job as the in-house maintenance did.

H₀: μ_in-house − μ_contracted = 0
Hₐ: μ_in-house − μ_contracted ≠ 0

N_ln(in-house) = 12, N_ln(contracted) = 12
x̄_ln(in-house) = 2.85, x̄_ln(contracted) = 2.35
s_ln(in-house) = 0.508, s_ln(contracted) = 0.729
Test statistic = 1.967, df = 19

3. Find the appropriate critical values (2.5% in each tail):
t Distribution
Degrees of freedom 19
Pr(t > critical value) 2.50%
Critical values ±2.0930
Difference in Means Test
You work for a printing firm. In the past, the firm employed people to maintain the high
speed copier. Six months ago, in an effort to reduce costs, management laid off the
maintenance crew and contracted out service of the machine. You are looking at
maintenance logs for the copier and note the following times between copier
breakdowns. Test the hypothesis (at the 5% level) that the contracted maintenance does
as good of a job as the in-house maintenance did.

H₀: μ_in-house − μ_contracted = 0
Hₐ: μ_in-house − μ_contracted ≠ 0
Test statistic = 1.967, df = 19

4. Compare the test statistic to the critical values (±2.0930):
The test statistic falls in the null area → fail to reject the null hypothesis.
Difference in Means Test
You work for a printing firm. In the past, the firm employed people to maintain the high
speed copier. Six months ago, in an effort to reduce costs, management laid off the
maintenance crew and contracted out service of the machine. You are looking at
maintenance logs for the copier and note the following times between copier
breakdowns. Test the hypothesis (at the 10% level) that the contracted maintenance
does as good of a job as the in-house maintenance did.

H₀: μ_in-house − μ_contracted = 0
Hₐ: μ_in-house − μ_contracted ≠ 0
Test statistic = 1.967, df = 19

At the 10% significance level (5% in each tail):
t Distribution
Degrees of freedom 19
Pr(t > critical value) 5.00%
Critical values ±1.7291

The test statistic (1.967) falls in the alternative area → reject the null hypothesis.
Difference in Means Test
You work for a printing firm. In the past, the firm employed people to maintain the high
speed copier. Six months ago, in an effort to reduce costs, management laid off the
maintenance crew and contracted out service of the machine. You are looking at
maintenance logs for the copier and note the following times between copier
breakdowns. Test the hypothesis that the contracted maintenance does as good of a job
as the in-house maintenance did.
Conclusion:

1. We reject the hypothesis that the contracted maintenance does as good a job as
the in-house maintenance at a 10% level of significance.

2. We fail to reject the hypothesis that the contracted maintenance does as good a
job as the in-house maintenance at a 5% level of significance.

3. p-value = (3.20%)(2) = 6.4% = the probability of rejecting the null when, in fact,
the null is true.

t Distribution
Test statistic 1.967
Degrees of freedom 19
Pr(t > test statistic) 3.20%
Pr(t < test statistic) 96.80%

We multiply the one-tail area by two because this is a two-tailed test: an equal portion
of the alternative tail exists on the opposite side of the distribution.
Difference in Means Test
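For comparison, the whole copier analysis can be run as a two-sided unequal-variance (Welch) t test on the logged data; the sketch below is our own addition, not part of the original deck.

import numpy as np
from scipy import stats

in_house = np.log([26, 27, 22, 13, 8, 10, 28, 7, 16, 23, 26, 25])
contract = np.log([17, 13, 21, 17, 8, 6, 27, 6, 2, 20, 8, 9])

# equal_var=False requests the unequal-variance (Welch) version of the test
result = stats.ttest_ind(in_house, contract, equal_var=False)
print(result.statistic, result.pvalue)   # about 1.97 and 0.064 (two-sided)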
A plaintiff claims that an on-the-job injury has reduced his ability to earn tips. He is suing
for lost future income. His tips for twelve weeks before and after the injury are shown
below. Test the hypothesis that his injury reduced his earning power.

Before injury
200, 210, 250, 180, 220, 200, 210, 230, 240, 190, 220, 250

After injury
200, 230, 190, 180, 200, 190, 210, 200, 220, 200, 180, 220

H₀: μ_before − μ_after ≤ 0
Hₐ: μ_before − μ_after > 0
(Presumption of innocence: assume the injury did not reduce earning power.)

N_before = 12, N_after = 12
x̄_before = 216.67, x̄_after = 201.67
s_before = 22.697, s_after = 15.859
Test statistic = 1.877, df = 19

t Distribution
Test statistic 1.877
Degrees of freedom 19
Pr(t > test statistic) 3.80%
Pr(t < test statistic) 96.20%
Difference in Means Test
A difference in proportions test examines samples from two populations in an attempt to
compare the two population proportions.
Let p_a − p_b be a difference in sample proportions. Its properties are:

Population mean = π_a − π_b

Sample standard deviation = √( p_a(1 − p_a)/N_a + p_b(1 − p_b)/N_b )

Distribution: standard normal, provided
N_a p_a > 5, N_a(1 − p_a) > 5, N_b p_b > 5, and N_b(1 − p_b) > 5
Distribution of a Difference in Proportions
An ABC News poll (summer 2003) of 551 women and 478 men shows that 31% of men
and 36% of women would rather see Hillary Clinton as President in 2004 than George
Bush.

Test the hypothesis that the two proportions are equal.
H₀: π_men − π_women = 0
Hₐ: π_men − π_women ≠ 0

N_men = 478, N_women = 551
p_men = 0.31, p_women = 0.36

Stdev(p_men − p_women) = √( (0.31)(1 − 0.31)/478 + (0.36)(1 − 0.36)/551 ) = 0.029

Checking the normality conditions:
N_men p_men = 148.2 > 5
N_men (1 − p_men) = 329.8 > 5
N_women p_women = 198.4 > 5
N_women (1 − p_women) = 352.6 > 5

The difference in sample proportions is normally distributed.
Difference in Proportions Test
An ABC News poll (summer 2003) of 551 women and 478 men shows that 31% of men
and 36% of women would rather see Hillary Clinton as President in 2004 than George
Bush.

H₀: π_men − π_women = 0
Hₐ: π_men − π_women ≠ 0

p_men = 0.310, N_men = 478
p_women = 0.360, N_women = 551
Stdev(p_men − p_women) = 0.029
Test statistic (distributed standard normal) = −1.699

Standard Normal Distribution (Z)
Test statistic −1.699
Pr(Z > test statistic) 95.53%
Pr(Z < test statistic) 4.47%

p-value = (4.47%)(2) = 8.94%
The probability of rejecting the null hypothesis when the null is true is about 9%.
Difference in Proportions Test
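A minimal sketch of this difference-in-proportions test (our own addition).

import math
from scipy import stats

p1, n1 = 0.31, 478    # men
p2, n2 = 0.36, 551    # women
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # 0.029
z = (p1 - p2) / se                                        # -1.699
print(2 * stats.norm.cdf(-abs(z)))                        # two-sided p-value, 0.089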
So far, we have assumed that the population of data is infinite. For example, in the case
of bond yield data, the population of data representing the return on IBM bonds is all the
returns that ever were, ever will be, or ever could have been.

There are some instances in which the population data is not only finite, but small in
comparison to the sample size.
In these instances, the sample data reflects more information than normal because it
represents, not a sample from an infinitely sized population, but a significant portion of
the entire population.
Finite Population Correction Factor
For example, suppose we want to construct a 95% confidence interval for the average
price of retail gas in Pittsburgh. There are 500 gas stations and we have the following
sample data:

$1.05, $1.12, $1.15, $1.17, $1.08, $0.99, $1.15, $1.22, $1.14, $1.17, $1.05, $1.10.

The mean for the sample is $1.12
The standard deviation is $0.06

The question is about the mean of the price of gas. According to the Central Limit
Theorem, sample means are t-distributed regardless of the distributions of the underlying
data, so we can skip the lognormal transformation. The critical value for a 95%
confidence interval on a t₁₁ distribution is 2.201.

The 95% confidence interval is:
$1.12 ± (2.201)($0.06 / √12) = [$1.08, $1.16]
Finite Population Correction Factor
For example, suppose we want to construct a 95% confidence interval for the average
price of retail gas in Pittsburgh. There are 500 gas stations and we have the following
sample data:

$1.05, $1.12, $1.15, $1.17, $1.08, $0.99, $1.15, $1.22, $1.14, $1.17, $1.05, $1.10.
Now, suppose that we have the same sample, but that there are only 25 gas stations in
Pittsburgh. The 12 observations in our sample now constitute a large portion of the total
population. As such, the information we obtain from the sample should more clearly
reflect the population than it did when there were 500 gas stations in the population.
To account for this additional information, we adjust the standard deviation of the mean
by the finite population correction factor. The fpcf reduces the size of the standard
deviation of the mean to reflect the fact that the sample represents a large portion of the
total population.
$1.12 ± (2.201)($0.06 / √12) = [$1.08, $1.16]
Finite Population Correction Factor
For example, suppose we want to construct a 95% confidence interval for the average
price of retail gas in Pittsburgh. There are 500 gas stations and we have the following
sample data:

$1.05, $1.12, $1.15, $1.17, $1.08, $0.99, $1.15, $1.22, $1.14, $1.17, $1.05, $1.10.
Correcting the standard deviation of the sample mean by the finite population correction
factor, we have:
Corrected s_x̄ = s_x̄ × √( (N − n) / (N − 1) )
where N = population size and n = sample size.

For comparison, the uncorrected 95% confidence interval was:
$1.12 ± (2.201)($0.06 / √12) = [$1.08, $1.16]
Finite Population Correction Factor
For example, suppose we want to construct a 95% confidence interval for the average
price of retail gas in Pittsburgh. There are 25 gas stations and we have the following
sample data:

$1.05, $1.12, $1.15, $1.17, $1.08, $0.99, $1.15, $1.22, $1.14, $1.17, $1.05, $1.10.
Notes on the finite population correction factor:

1. The correction does not apply to standard deviations of observations. The fpcf only
applies to standard deviations covered by the central limit theorem (including
standard deviations of means, standard deviations of proportions, standard
deviations of differences in means, and standard deviations of differences in
proportions).

2. The correction becomes necessary only when the sample size exceeds 5% of the
population size.
$1.12 ± (2.201)($0.06 / √12) × √( (25 − 12) / (25 − 1) ) ≈ [$1.09, $1.15]
Finite Population Correction Factor
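A sketch of the corrected interval (our own addition). Note that it applies the square-root form of the correction factor to the standard error.

import math
from scipy import stats

prices = [1.05, 1.12, 1.15, 1.17, 1.08, 0.99, 1.15, 1.22, 1.14, 1.17, 1.05, 1.10]
n, N = len(prices), 25
mean = sum(prices) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in prices) / (n - 1))

fpcf = math.sqrt((N - n) / (N - 1))        # finite population correction
se = sd / math.sqrt(n) * fpcf              # corrected standard error
crit = stats.t.ppf(0.975, df=n - 1)        # 2.201
print(mean - crit * se, mean + crit * se)  # roughly $1.09 to $1.15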
The analyses we have seen thus far all involve single observations or sample means.
Often, we will also want to conduct tests on variances.

Example:
Two paint companies both claim that their paints will resist peeling for an average of 10
years. You collect relevant durability data on both brands of paint.

Brand A
10, 12, 10, 9, 10, 11, 8, 12, 9, 9

Brand B
12, 6, 6, 1, 6, 17, 5, 17, 17, 13


Both samples have means of 10. But, the sample from brand A exhibits a standard
deviation of 1.3 compared to 5.9 for brand B.

While both brands appear to have the same average performance, brand A has more
uniform product quality (i.e. lower variance).
Distribution of Sample Variances
Let s be a sample standard deviation. The properties of a sample standard deviation:

Population standard deviation = σ

(N − 1)s² / σ² is distributed χ² with N − 1 degrees of freedom
Distribution of Sample Variances
Copyright 2003. Do not distribute or copy without permission.
169
Chi-Square Distribution (χ²), 11 degrees of freedom:
  Pr(χ² > critical value) = 5.00%, critical value = 19.675
Example:

A tire manufacturer wants to keep the standard deviation of useful miles below 20,000. A
sample of 12 tires has a standard deviation of 18,000 miles. At a 5% level of significance,
test the hypothesis that the production process does not require adjustment.
H₀: σ ≤ 20,000
Hₐ: σ > 20,000

Test statistic = (12 - 1)(18,000²)/20,000² = 8.91

The test statistic (8.91) falls in the null area (below the critical value of 19.675), so we fail to reject the null hypothesis.
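A short Python sketch of this test (scipy is assumed available; the inputs mirror the slide):

from scipy import stats

n, s, sigma0 = 12, 18_000, 20_000
test_stat = (n - 1) * s**2 / sigma0**2     # (12 - 1)(18,000²)/20,000² = 8.91
critical = stats.chi2.ppf(0.95, df=n - 1)  # 19.675 (5% in the upper tail)
print(test_stat > critical)                # False -> fail to reject H0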
Variance Test
Copyright 2003. Do not distribute or copy without permission.
170
Chi-Square Distribution (χ²), 11 degrees of freedom:
  Pr(χ² > critical value) = 95.00%, critical value = 4.575
Example:

A tire manufacturer wants to keep the standard deviation of useful miles below 20,000. A
sample of 12 tires has a standard deviation of 18,000 miles. At a 5% level of significance,
test the hypothesis that the production process does require adjustment.
H₀: σ ≥ 20,000
Hₐ: σ < 20,000

Test statistic = (12 - 1)(18,000²)/20,000² = 8.91

The test statistic (8.91) falls in the null area (above the critical value of 4.575), so we fail to reject the null hypothesis.
Variance Test
Copyright 2003. Do not distribute or copy without permission.
171
Example:

A tire manufacturer wants to keep the standard deviation of useful miles below 20,000. A
sample of 12 tires has a standard deviation of 18,000 miles. At a 5% level of significance,
test the hypothesis that the production process does require adjustment.
H₀: σ ≥ 20,000 vs. Hₐ: σ < 20,000   (the process does require adjustment)
H₀: σ ≤ 20,000 vs. Hₐ: σ > 20,000   (the process does not require adjustment)
We have tested both sets of hypotheses and, in each case, failed
to reject the null hypothesis.
Isn't this contradictory because the two nulls are opposites?
No.
Remember: failing to reject the null (technically) leaves us with no conclusion.
Therefore, what happened is that we ran two tests and neither resulted in a
conclusion.
Variance Test
Copyright 2003. Do not distribute or copy without permission.
172
Chi-Square Distribution (χ²), 11 degrees of freedom:
  Test statistic = 8.910
  Pr(χ² > test statistic) = 63.02%
  Pr(χ² < test statistic) = 36.98%
  Critical value: #NUM! (no significance level was entered)
Example:

A tire manufacturer wants to keep the standard deviation of useful miles below 20,000. A
sample of 12 tires has a standard deviation of 18,000 miles. What is the p-value for the
hypothesis that the production process does not require adjustment?
H₀: σ ≤ 20,000
Hₐ: σ > 20,000

Test statistic = (12 - 1)(18,000²)/20,000² = 8.91
p-value is the area from the test statistic toward
the alternative area.
p-value is the probability of erroneously rejecting
the null hypothesis.
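Under the same assumptions, the p-value on this slide can be computed directly; a one-line scipy sketch:

from scipy import stats

p_value = stats.chi2.sf(8.91, df=11)   # Pr(χ² > 8.91) = 0.6302, i.e., 63.02%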
Variance Test
Copyright 2003. Do not distribute or copy without permission.
173
Example:

A tire manufacturer wants to keep the standard deviation of useful miles below 20,000. A
sample of 12 tires has a standard deviation of 18,000 miles. What is the p-value for the
hypothesis that the production process does require adjustment?
Test statistic = (12 - 1)(18,000²)/20,000² = 8.91

Chi-Square Distribution (χ²), 11 degrees of freedom:
  Test statistic = 8.910
  Pr(χ² > test statistic) = 63.02%
  Pr(χ² < test statistic) = 36.98%
p-value is the area from the test statistic toward
the alternative area.
p-value is the probability of erroneously rejecting
the null hypothesis.
H₀: σ ≥ 20,000
Hₐ: σ < 20,000
Variance Test
Copyright 2003. Do not distribute or copy without permission.
174
Example:

A tire manufacturer wants to keep the standard deviation of useful miles below 20,000. A
sample of 12 tires has a standard deviation of 18,000 miles. The production process is faulty if the population standard deviation exceeds 20,000. The production process is OK if the population standard deviation is less than 20,000.
H₀: σ ≥ 20,000 vs. Hₐ: σ < 20,000
H₀: σ ≤ 20,000 vs. Hₐ: σ > 20,000
There is a 63% chance that we would be wrong in believing that
the production process required adjustment.
There is a 37% chance that we would be wrong in believing that
the production process is OK.
Under most circumstances, we only regard probabilities below 5% as unusual.
Therefore, the sample data does not clearly refute either null hypothesis.
The data tell us nothing.
Variance Test
Copyright 2003. Do not distribute or copy without permission.
175
Chi-Square Distribution (χ²), 11 degrees of freedom (5% significance split between the tails):
  Upper tail: Pr(χ² > critical value) = 2.50%, critical value = 21.920
  Lower tail: Pr(χ² > critical value) = 97.50%, critical value = 3.816
Example:

A tire manufacturer wants to keep the standard deviation of useful miles below 20,000. A
sample of 12 tires has a standard deviation of 18,000 miles. At a 5% level of significance,
test the hypothesis that the population standard deviation equals 20,000.
Test statistic = (12 - 1)(18,000²)/20,000² = 8.91

H₀: σ = 20,000
Hₐ: σ ≠ 20,000

The test statistic (8.91) falls between the critical values 3.816 and 21.920, in the null area, so we fail to reject the null hypothesis.
Variance Test
Copyright 2003. Do not distribute or copy without permission.
176
Example:

Inspectors check chlorine levels in water at a processing facility several times each day.
The city has two goals: (1) to maintain an average chlorine level of 3 ppm, and (2) to
maintain a standard deviation of no more than 0.4 ppm (too little chlorine and people die
of disease; too much chlorine and people die of poisoning). Over a two day period,
inspectors take the following readings. Test the hypothesis that the water is adequately
treated at the 1% significance level.

Chlorine samples (ppm)
3.7, 2.5, 3.8, 3.4, 2.9, 2.6, 2.4, 3.3, 4.3, 3.4
Variance Test
Note: Although the data is non-negative, for the analysis of the sample mean, it is not
necessary to perform the log-normal transformation. According to the Central
Limit Theorem, sample means are t-distributed regardless of the distribution of
the underlying data. Having said this, performing the log-transformation will not
hurt and may improve the accuracy of the results somewhat.
Copyright 2003. Do not distribute or copy without permission.
177
Example:

Inspectors check chlorine levels in water at a processing facility several times each day.
The city has two goals: (1) to maintain an average chlorine level of 3 ppm, and (2) to
maintain a standard deviation of no more than 0.4 ppm (too little chlorine and people die
of disease; too much chlorine and people die of poisoning). Over a two day period,
inspectors take the following readings. Test the hypothesis that the water is adequately
treated at the 1% significance level.

Chlorine samples (ppm)
3.7, 2.5, 3.8, 3.4, 2.9, 2.6, 2.4, 3.3, 4.3, 3.4
Sample mean of logs = 1.156
Sample stdev of logs = 0.195

H₀: μ = 3
Hₐ: μ ≠ 3

Test statistic = (1.156 - ln 3) / (0.195/√10) = 0.923

The test statistic falls in the null area, so we fail to reject the null hypothesis.
Variance Test
Copyright 2003. Do not distribute or copy without permission.
178
Example:

Inspectors check chlorine levels in water at a processing facility several times each day.
The city has two goals: (1) to maintain an average chlorine level of 3 ppm, and (2) to
maintain a standard deviation of no more than 0.4 ppm (too little chlorine and people die
of disease; too much chlorine and people die of poisoning). Over a two day period,
inspectors take the following readings. Test the hypothesis that the water is adequately
treated at the 1% significance level.

Chlorine samples (ppm)
3.7, 2.5, 3.8, 3.4, 2.9, 2.6, 2.4, 3.3, 4.3, 3.4
Variance Test
Note: Although the data is non-negative, for the analysis of the sample variance, it is
not necessary to perform the log-normal transformation. This is because the
distributions we use for analyzing variances and standard deviations (the chi-
square and F-distributions) account for the fact that sample variance is non-
negative.
Copyright 2003. Do not distribute or copy without permission.
179
Example:

Test the hypothesis that the water is adequately treated at the 1% significance level.

Chlorine samples (ppm)
3.7, 2.5, 3.8, 3.4, 2.9, 2.6, 2.4, 3.3, 4.3, 3.4
Sample standard deviation = 0.622

H₀: σ ≤ 0.4
Hₐ: σ > 0.4

Test statistic = (10 - 1)(0.622²)/0.4² = 21.76
Variance Test
Chi-Square Distribution (χ²), 9 degrees of freedom:
  Pr(χ² > critical value) = 1.00%, critical value = 21.666
The test statistic (21.76) exceeds the critical value (21.666) and falls in the alternative area, so we reject the null hypothesis.
Copyright 2003. Do not distribute or copy without permission.
180
Example:

Inspectors check chlorine levels in water at a processing facility several times each day.
The city has two goals: (1) to maintain an average chlorine level of 3 ppm, and (2) to
maintain a standard deviation of no more than 0.4 ppm (too little chlorine and people die
of disease; too much chlorine and people die of poisoning). Over a two day period,
inspectors take the following readings. Test the hypothesis that the water is adequately
treated at the 1% significance level.

Chlorine samples (ppm)
3.7, 2.5, 3.8, 3.4, 2.9, 2.6, 2.4, 3.3, 4.3, 3.4

The tests we conducted were at the 1% significance level. This means that there is a 1%
probability that we might draw a sample that caused us to erroneously reject the null
hypothesis.

Suppose we want to err on the side of caution: we would rather risk concluding that the water is not adequately treated when, in fact, it is, than risk concluding that the water is adequately treated when, in fact, it is not.

How should we adjust our significance level?
Increasing the significance level of the test increases the probability of rejecting the null when, in fact, the null is true.
Variance Test
Copyright 2003. Do not distribute or copy without permission.
181
Example:

Inspectors check chlorine levels in water at a processing facility several times each day.
The city has two goals: (1) to maintain an average chlorine level of 3 ppm, and (2) to
maintain a standard deviation of no more than 0.4 ppm (too little chlorine and people die
of disease; too much chlorine and people die of poisoning). Over a two day period,
inspectors take the following readings. Test the hypothesis that the water is adequately
treated at the 10% significance level.

Chlorine samples (ppm)
3.7, 2.5, 3.8, 3.4, 2.9, 2.6, 2.4, 3.3, 4.3, 3.4
Sample mean of logs = 1.156
Sample stdev of logs = 0.195

H₀: μ = 3
Hₐ: μ ≠ 3

Test statistic = (1.156 - ln 3) / (0.195/√10) = 0.923

The test statistic falls in the null area, so we fail to reject the null hypothesis.
Variance Test
Copyright 2003. Do not distribute or copy without permission.
182
Example:

Test the hypothesis that the water is adequately treated at the 10% significance level.

Chlorine samples (ppm)
3.7, 2.5, 3.8, 3.4, 2.9, 2.6, 2.4, 3.3, 4.3, 3.4
Sample standard deviation = 0.622

H₀: σ ≤ 0.4
Hₐ: σ > 0.4

Test statistic = (10 - 1)(0.622²)/0.4² = 21.76
Variance Test
Chi-Square Distribution (χ²), 9 degrees of freedom:
  Pr(χ² > critical value) = 10.00%, critical value = 14.684
The test statistic (21.76) exceeds the critical value (14.684) and falls in the alternative area, so we reject the null hypothesis.
Copyright 2003. Do not distribute or copy without permission.
183
When we constructed confidence intervals for sample means and for observations, we
used the formula:
measure ± (critical value)(stdev of measure)

This formula comes from the test statistic for normally (and t-) distributed random variables. Note:

measure + (cv)(stdev) = upper limit  ⟺  cv = (upper limit - measure) / stdev
measure - (cv)(stdev) = lower limit  ⟺  cv = (measure - lower limit) / stdev

The formula for the critical value (cv) shown above is the same as the formula for the test statistic:

test statistic = (estimate - parameter) / (stdev of estimate)
Therefore, when we find a confidence interval, what we are really doing is:
1. Setting the test statistic equal to the critical value that gives us the desired level of
confidence, and
2. Solving for parameter.
Confidence Interval for a Variance
Copyright 2003. Do not distribute or copy without permission.
184
Because the formula for the test statistic for a sample variance is different than the
formula for the test statistic for a sample mean, we would expect the formula for the
confidence interval to be different also.
Test statistic = (N - 1)(estimate²) / parameter²

Setting the test statistic equal to the critical value that gives us the desired level of confidence, and solving for the parameter, we get:

parameter² = (N - 1)(estimate²) / critical value  ⟹  parameter = √[(N - 1)(estimate²) / critical value]

Note that we use only the positive root because standard deviations are non-negative.
Confidence Interval for a Variance
Copyright 2003. Do not distribute or copy without permission.
185
Example:

A sample of 10 observations has a standard deviation of 3. Find the 95% confidence
interval for the population standard deviation.

To find a 95% confidence interval, we need the two critical values that give 2.5% in the
upper and lower tails.
Chi-Square Distribution (χ²), 9 degrees of freedom:
  Upper tail: Pr(χ² > critical value) = 2.50%, critical value = 19.023
  Lower tail: Pr(χ² > critical value) = 97.50%, critical value = 2.700

Upper limit = √[(10 - 1)(3²) / 2.700] = 5.48
Lower limit = √[(10 - 1)(3²) / 19.023] = 2.06
We find that there is a 95% probability that the
population standard deviation lies between 2.06
and 5.48.
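A minimal Python sketch of this computation (scipy assumed):

import math
from scipy import stats

n, s = 10, 3.0
chi_upper = stats.chi2.ppf(0.975, df=n - 1)    # 19.023 (2.5% in the upper tail)
chi_lower = stats.chi2.ppf(0.025, df=n - 1)    # 2.700  (2.5% in the lower tail)
lower = math.sqrt((n - 1) * s**2 / chi_upper)  # 2.06
upper = math.sqrt((n - 1) * s**2 / chi_lower)  # 5.48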
Confidence Interval for a Variance
Copyright 2003. Do not distribute or copy without permission.
186
In the same way that we had different procedures for testing a sample mean versus
testing a difference in two sample means, we similarly have different procedures for
testing a sample variance versus testing a difference in two sample variances.

Where variances are concerned, however, we look not at the difference in the sample
variances, but at the ratio of the sample variances.
Let s_a / s_b be a ratio of two sample standard deviations.

If s_a > s_b, then the ratio will be greater than 1.
If s_a < s_b, then the ratio will be less than 1.
If s_a = s_b, then the ratio will equal 1.
Distribution of a Difference in Sample Variances
Copyright 2003. Do not distribute or copy without permission.
187
The properties of a ratio of sample standard deviations:
Let s_a and s_b be sample standard deviations taken from two different populations with population standard deviations σ_a and σ_b. Then the quantity

(s_a² / s_b²) / (σ_a² / σ_b²)

is distributed F with N_a - 1 and N_b - 1 degrees of freedom.
Distribution of a Difference in Sample Variances
Copyright 2003. Do not distribute or copy without permission.
188
Example:

A recent consumer behavior study was designed to test the "beer goggles" effect. A
group of volunteers was shown pictures (head shots) of members of the opposite sex
and asked to rate the people in the pictures according to attractiveness. Another group of
volunteers was given two units of alcohol, shown the same pictures, and also asked to
rate the people in the pictures according to attractiveness.

Test the hypothesis that, when subjects consume alcohol, they (on average) find pictures
of the opposite sex more attractive.
The straightforward hypothesis test is a difference of means test where:
H₀: μ_drunk - μ_sober ≥ 0
Hₐ: μ_drunk - μ_sober < 0
Difference in Variances Test
Copyright 2003. Do not distribute or copy without permission.
189
Example:

A recent consumer behavior study was designed to test the "beer goggles" effect. A
group of volunteers was shown pictures (head shots) of members of the opposite sex
and asked to rate the people in the pictures according to attractiveness. Another group of
volunteers was given two units of alcohol, shown the same pictures, and also asked to
rate the people in the pictures according to attractiveness.
H₀: μ_drunk - μ_sober ≥ 0
Hₐ: μ_drunk - μ_sober < 0
Suppose we collect data, run the appropriate tests and fail to reject the null hypothesis.
Can we conclude (roughly speaking) that, on average, drinking alcohol causes one to find
the opposite sex more attractive?

Yes. However, it may be the case that the alcohol only affects a subset of the population.
For example, perhaps only men are affected; or, perhaps only those who rarely drink are
affected.

The difference in means test does not detect these cases; it only detects differences in the average of all subjects in the samples.
Difference in Variances Test
Copyright 2003. Do not distribute or copy without permission.
190
Example:

A recent consumer behavior study was designed to test the "beer goggles" effect. A
group of volunteers was shown pictures (head shots) of members of the opposite sex
and asked to rate the people in the pictures according to attractiveness. Another group of
volunteers was given two units of alcohol, shown the same pictures, and also asked to
rate the people in the pictures according to attractiveness.
Scenario #1

Sober
3, 2, 3, 1, 1, 3, 4, 2, 3, 4

Drunk
4, 3, 4, 2, 2, 4, 5, 3, 4, 5
Consider the following two scenarios (calculate the means and stdevs for the data sets):
Average rating for sober is 2.6 compared to
an average rating for drunk of 3.6.

Standard deviations for both sober and drunk
are 1.07 because all 10 subjects were
affected by the alcohol.
Average rating for sober is 2.6 compared to an
average rating for drunk of 3.6.

Standard deviation for sober is 1.07, but for drunk is
1.90 because only males (the last 5 observations)
were affected by the alcohol.
Scenario #2

Sober
3, 2, 3, 1, 1, 3, 4, 2, 3, 4

Drunk
3, 2, 3, 1, 1, 5, 6, 4, 5, 6
Only males are affected. Everyone is affected.
Difference in Variances Test
Copyright 2003. Do not distribute or copy without permission.
191
Example:

A recent consumer behavior study was designed to test the "beer goggles" effect. A
group of volunteers was shown pictures (head shots) of members of the opposite sex
and asked to rate the people in the pictures according to attractiveness. Another group of
volunteers was given two units of alcohol, shown the same pictures, and also asked to
rate the people in the pictures according to attractiveness.
Consider the following two scenarios (calculate the means and stdevs for the data sets):
Implication:

A difference in means test would report the same result for scenarios #1 and #2
(population mean for drunk is greater than population mean for sober).

But, a difference in variances test would show that all of the subjects were affected by
the alcohol in scenario #1, while only some of the subjects were affected by the alcohol
in scenario #2.
Difference in Variances Test
Copyright 2003. Do not distribute or copy without permission.
192
Example:
A recent consumer behavior study was designed to test the "beer goggles" effect. A
group of volunteers was shown pictures (head shots) of members of the opposite sex
and asked to rate the people in the pictures according to attractiveness. Another group of
volunteers was given two units of alcohol, shown the same pictures, and also asked to
rate the people in the pictures according to attractiveness.
Using the scenario #2 data, test the hypotheses (at the 10% significance level):
H₀: σ_drunk = σ_sober
Hₐ: σ_drunk ≠ σ_sober
F Distribution, 9 and 9 degrees of freedom (10% significance split between the tails):
  Upper tail: Pr(F > critical value) = 5.00%, critical value = 3.179
  Lower tail: Pr(F > critical value) = 95.00%, critical value = 0.315
Difference in Variances Test
Copyright 2003. Do not distribute or copy without permission.
193
Example:

A recent consumer behavior study was designed to test the "beer goggles" effect. A
group of volunteers was shown pictures (head shots) of members of the opposite sex
and asked to rate the people in the pictures according to attractiveness. Another group of
volunteers was given two units of alcohol, shown the same pictures, and also asked to
rate the people in the pictures according to attractiveness.
Using the scenario #2 data, test the hypotheses (at the 10% significance level):
H₀: σ_drunk = σ_sober
Hₐ: σ_drunk ≠ σ_sober

N_drunk = 10, N_sober = 10
s_drunk = 1.90, s_sober = 1.07

Test statistic = s_drunk² / s_sober² = 1.90² / 1.07² ≈ 3.12

The test statistic (3.12) falls in the null area (between 0.315 and 3.179), so we fail to reject the null hypothesis.
Difference in Variances Test
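A Python sketch of this two-tailed F test (scipy assumed; the standard deviations are the rounded values from the slide):

from scipy import stats

s_drunk, s_sober, n = 1.90, 1.07, 10
f_stat = s_drunk**2 / s_sober**2         # ~3.15 with these rounded inputs; 3.12 unrounded
upper = stats.f.ppf(0.95, n - 1, n - 1)  # 3.179 (5% in each tail for a 10% test)
lower = stats.f.ppf(0.05, n - 1, n - 1)  # 0.315
print(f_stat > upper or f_stat < lower)  # False -> fail to reject H0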
Copyright 2003. Do not distribute or copy without permission.
194
Using the scenario #2 data, test the hypotheses (at the 10% significance level):
In scenario #2, we know for certain that only
males were affected, so we should expect to
see a difference in the standard deviations
across the two samples (sober vs. drunk).

Why did we end up failing to reject the null
hypothesis?

The result may be due to the small number
of observations in the samples.
What if we had only one more observation in
each sample, but our sample standard
deviations remained the same?
F Distribution, 10 and 10 degrees of freedom:
  Pr(F > critical value) = 5.00%, critical value = 2.978
One more observation in each sample gives df = 10.
The sample stdevs don't change, so the test statistic (3.12) doesn't change.
The critical value falls to 2.98, so now we reject the null hypothesis.
Difference in Variances Test
Copyright 2003. Do not distribute or copy without permission.
195
Procedure for Hypothesis Testing
Hypothesis Testing: Summary
1. State hypotheses
2. Picture distribution*
3. Identify null and alternative regions
4. Calculate test statistic*

p-value approach:
  5. p-value = area from the test statistic toward the alternative tail(s). The p-value is the probability of being wrong in rejecting the null, i.e., the probability that the results are due to random chance.

significance level approach:
  5. Find the critical value(s) that define alternative area(s) equal to the significance level.
  6. If the test statistic falls in an alternative area, reject the null hypothesis. If the test statistic falls in the null area, fail to reject the null hypothesis.

*procedure varies depending on the type of test being performed
Copyright 2003. Do not distribute or copy without permission.
196
Hypothesis Test / Test Statistic / Distribution

Mean:
  Test statistic: (x̄ - μ) / (s/√N)
  Distribution: t with N - 1 degrees of freedom

Difference in means:
  Test statistic: [(x̄₁ - x̄₂) - (μ₁ - μ₂)] / √(s₁²/N₁ + s₂²/N₂)
  Distribution: t, with degrees of freedom
    df = (s₁²/N₁ + s₂²/N₂)² / [(s₁²/N₁)²/(N₁ - 1) + (s₂²/N₂)²/(N₂ - 1)]

Proportion:
  Test statistic: (p̄ - p) / √(p(1 - p)/N)
  Distribution: standard normal, provided Np ≥ 5 and N(1 - p) ≥ 5

Difference in proportions:
  Test statistic: [(p̄₁ - p̄₂) - (p₁ - p₂)] / √(p₁(1 - p₁)/N₁ + p₂(1 - p₂)/N₂)
  Distribution: standard normal, provided N₁p₁ ≥ 5, N₂p₂ ≥ 5, N₁(1 - p₁) ≥ 5, and N₂(1 - p₂) ≥ 5

Variance:
  Test statistic: (N - 1)s² / σ²
  Distribution: χ² with N - 1 degrees of freedom

Difference in variances:
  Test statistic: s₁² / s₂²
  Distribution: F with N₁ - 1 and N₂ - 1 degrees of freedom
Hypothesis Testing: Summary
Copyright 2003. Do not distribute or copy without permission.
197
The goal of exploratory analysis is to obtain a measure of a phenomenon.


Example:
Subjects are given a new breakfast cereal to taste and asked to rate the cereal.


The measured phenomenon is taste. Although taste is subjective, by taking the average
of the measures from a large number of subjects, we can measure the underlying
objective components that give rise to the subjective feeling of taste.
Causal vs. Exploratory Analysis
Copyright 2003. Do not distribute or copy without permission.
198
Causal vs. Exploratory Analysis
The goal of causal analysis is to obtain the change in measure of a phenomenon due to the
presence vs. absence of a control variable.

Example:
Two groups of subjects are given the same breakfast cereal to taste and are asked to rate the
cereal. One group is given the cereal in a black and white box. The other in a multi-colored
box.

The two groups of subjects exist under identical conditions (same cereal, same testing
environment, etc.), with the exception of the color of the cereal box. Because the color of the
cereal box is the only difference between the two groups, we call the color of the box the
control variable. If we find a difference in subjects' reported tastes, then we know that the difference in perceived taste is due to the color (or lack of color) of the cereal box.

It is possible that, apart from random chance, one group of subjects reports liking the cereal
and the other does not (e.g. one group was tested in the morning and the other in the
evening). We would call this a confound. A confound is the presence of an additional (and
unwanted) difference in the two groups. When a confound is present, it makes it difficult
(perhaps impossible) to determine how much of the difference in reported taste between the
two groups is due to the control and how much is due to the confound.
Copyright 2003. Do not distribute or copy without permission.
199
Because the techniques for causal and exploratory analysis are identical (with the
exception that causal analysis includes the use of a control variable whereas exploratory
analysis does not), we will limit our discussion to causal analysis.
Causal vs. Exploratory Analysis
Copyright 2003. Do not distribute or copy without permission.
200
The Likert Scale

We use the Likert scale to rate responses to qualitative questions.

Example:

Which of the following best describes your opinion of the taste of Coke?
Too Sweet Very Sweet Just Right Slightly Sweet Not Sweet
1 2 3 4 5
The Likert scale elicits more information than a simple Yes/No response: the analyst can gauge the degree, rather than simply the direction, of opinion.
Designing Survey Instruments
Copyright 2003. Do not distribute or copy without permission.
201
Rules for Using the Likert Scale

1. Use 5 or 7 gradations of response.
fewer than 5 yields too little information
more than 7 creates too much difficulty for respondents in distinguishing one
response from another
2. Always include a mid-point (or neutral) response.
3. When appropriate, include a separate response for "Not applicable" or "Don't know."
4. When possible, include a descriptor with each response rather than simply a single
descriptor on each end of the scale.

Example:

Yes:  Very Bad (1)   Bad (2)   Neutral (3)   Good (4)   Very Good (5)

No:   Very Bad (1)   (2)   (3)   (4)   Good (5)

The presence of the lone words at the ends of the scale will introduce a bias by causing subjects to shun the center of the scale.
Designing Survey Instruments
Copyright 2003. Do not distribute or copy without permission.
202
Rules for Using the Likert Scale

5. Use the same words and (where possible) the same number of words for each
descriptor.

Example:

Yes:  Very Bad (1)   Bad (2)   Neutral (3)   Good (4)   Very Good (5)

No:   Bad (1)   Poor (2)   OK (3)   Better (4)   Best (5)
When using different words for different descriptors, subjects may perceive varying
quantities of difference between points on the scale.

For example, subjects may perceive that the difference between "Bad" and "Poor" is less than the difference between "Poor" and "OK."
Designing Survey Instruments
Copyright 2003. Do not distribute or copy without permission.
203
Rules for Using the Likert Scale

6. Avoid using zero as an endpoint on the scale.

Example:

Yes:  Very Bad (1)   Bad (2)   Neutral (3)   Good (4)   Very Good (5)

No:   Very Bad (0)   Bad (1)   Neutral (2)   Good (3)   Very Good (4)

On average, subjects will associate the number zero with "bad." Thus, using zero at the endpoint of the scale can bias subjects away from the side of the scale with the zero.
Designing Survey Instruments
Copyright 2003. Do not distribute or copy without permission.
204
Rules for Using the Likert Scale

7. Avoid using unbalanced negative numbers.

Example:

Yes:  Very Bad (-2)   Bad (-1)   Neutral (0)   Good (1)   Very Good (2)

No:   Very Bad (-3)   Bad (-2)   Neutral (-1)   Good (0)   Very Good (1)

Subjects associate negative numbers with "bad." If you have more negative numbers on one side of the scale than the other, subjects will be biased away from that side of the scale.
Designing Survey Instruments
Copyright 2003. Do not distribute or copy without permission.
205
Rules for Using the Likert Scale

8. Keep the descriptors balanced.

Example:

Yes:  Very Bad (1)   Bad (2)   Neutral (3)   Good (4)   Very Good (5)

No:   Very Bad (1)   Bad (2)   Slightly Good (3)   Good (4)   Very Good (5)

Subjects will be biased toward the side with more descriptors.
Designing Survey Instruments
Copyright 2003. Do not distribute or copy without permission.
206
Rules for Using the Likert Scale

9. Arrange the scale so as to maintain (1) symmetry around the neutral point, and (2)
consistency in the intervals between points.

Example:

Yes:  Very Bad (1)   Bad (2)   Neutral (3)   Good (4)   Very Good (5)   [symmetric and evenly spaced]

No:   Very Bad (1)   Bad (2)   Neutral (3)   Good (4)   Very Good (5)   [spacing asymmetric around the neutral point]

No:   Very Bad (1)   Bad (2)   Neutral (3)   Good (4)   Very Good (5)   [inconsistent intervals between points]
In the second example, subjects perceive the difference between "Neutral" and "Very Bad" to be greater than the difference between "Neutral" and "Very Good." Responses will be biased toward the right side of the scale.

In the third example, subjects perceive the difference between "Very Bad" and "Bad" to be greater than the difference between "Bad" and "Neutral." Responses will be biased toward the center of the scale.
Designing Survey Instruments
Copyright 2003. Do not distribute or copy without permission.
207
Rules for Using the Likert Scale

10. Use multi-item scales for ill-defined constructs.

Example:

Yes (multi-item scale):
  I liked the product.
    Strongly Agree (1)  Agree (2)  Neutral (3)  Disagree (4)  Strongly Disagree (5)
  I am satisfied with the product.
    Strongly Agree (1)  Agree (2)  Neutral (3)  Disagree (4)  Strongly Disagree (5)
  I believe that this is a good product.
    Strongly Agree (1)  Agree (2)  Neutral (3)  Disagree (4)  Strongly Disagree (5)

No (single item):
  I liked the product.
    Strongly Agree (1)  Agree (2)  Neutral (3)  Disagree (4)  Strongly Disagree (5)
Designing Survey Instruments
Copyright 2003. Do not distribute or copy without permission.
208
Rules for Using the Likert Scale

10. Use multi-item scales for ill-defined constructs.

Ill-defined constructs may be interpreted differently by different people. Use the
multi-item scale (usually three items) and then average the items to obtain a single
response for the ill-defined construct.

Example:
The ill-defined construct is "product satisfaction."

We construct three questions, each of which touches on the idea of product satisfaction. A subject gives the following responses:

I liked the product. 4
I am satisfied with the product. 4
I believe that this is a good product. 3

Average response for "product satisfaction" is 3.67.
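The averaging step is just an arithmetic mean of the item responses; a one-line Python sketch:

from statistics import mean

responses = [4, 4, 3]    # the three product-satisfaction items above
score = mean(responses)  # 3.67 -> single measure for the construct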
Designing Survey Instruments
Copyright 2003. Do not distribute or copy without permission.
209
Rules for Using the Likert Scale

10. Use multi-item scales for ill-defined constructs.

Be careful that the multi-item scales all measure the same ill-defined construct.

Yes:
  I liked the product.
  I am satisfied with the product.
  I believe that this is a good product.

No:
  I liked the product.
  I am satisfied with the product.
  I will purchase the product.

The statement "I will purchase the product" includes the consideration of price, which the other two questions do not.
Designing Survey Instruments
Copyright 2003. Do not distribute or copy without permission.
210
Rules for Using the Likert Scale

11. Occasionally, it is useful to verify that the subjects are giving considered (as
opposed to random) answers. To do this, ask the same question more than once at
different points in the survey. Look at the variance of the responses across the
multiple instances of the question. If the subject is giving considered answers, the
variance should be small.
Designing Survey Instruments
Copyright 2003. Do not distribute or copy without permission.
211
Rules for Using the Likert Scale

12. Avoid self-referential questions.
Yes:  How do you perceive that others around you feel right now?
No:   How do you feel right now?
Self-referential questions elicit bias because they encourage the respondent to
answer subsequent questions consistently with the self-referential question.

Example:
If we ask the subject how he feels and he responds positively, then his subsequent
answers will be biased in a positive direction. The subject will, unconsciously,
attempt to behave consistently with his reported feelings.

Exception:
You can ask a self-referential question if it is the last question in the survey. As long as the subject does not go back and change previous answers, there is no opportunity for the self-reference to bias the subject's responses.
Designing Survey Instruments
Copyright 2003. Do not distribute or copy without permission.
212
Example:

We want to test the effect of relevant news on purchase decisions. Specifically, we
want to know if the presence of positive news about a low-cost product increases the
probability of consumers purchasing that product.
Causal Design:

We will expose two groups of subjects to news announcements about aspirin. The control group will see a neutral announcement that says nothing about the performance of aspirin.
The experimental group will see a positive announcement that says that aspirin has
positive health benefits.

After exposure to the announcements, we will ask each group to rate their attitudes
toward aspirin. Our hypothesis is that there is no difference in the average attitudes
toward aspirin between the two groups.

To account for possible preconceptions about aspirin, before we show the subjects
the news announcements, we will ask how frequently they take aspirin. To account
for possible gender effects, we will also ask subjects to report their genders.
Designing Survey Instruments
Copyright 2003. Do not distribute or copy without permission.
213
How often do you take aspirin?

Infrequently Occasionally Frequently
1 2 3 4 5 6 7


Please identify your gender (M/F).
All subjects are first asked to respond to
these questions.
Designing Survey Instruments
Copyright 2003. Do not distribute or copy without permission.
214
Subjects in the control
group see this news
announcement. The
analyst reads the
headline and the
introductory paragraph.
Designing Survey Instruments
Please rate your attitude toward aspirin.
Unfavorable Neutral Favorable
1 2 3 4 5 6 7
Subjects in the control group
are then asked to answer this
question.
Copyright 2003. Do not distribute or copy without permission.
215
Subjects in the
experimental group see
this news announcement.
The analyst reads the
headline and the
introductory paragraph.
Please rate your attitude toward aspirin.
Unfavorable Neutral Favorable
1 2 3 4 5 6 7
Subjects in the experimental
group are then asked to
answer this question.
Designing Survey Instruments
Copyright 2003. Do not distribute or copy without permission.
216
Results:
Results for an actual experiment are shown below.
The data is in Data Set #3.
Test the following hypotheses:
H₀: μ_control = μ_baseline
Hₐ: μ_control ≠ μ_baseline

H₀: σ_control = σ_baseline
Hₐ: σ_control ≠ σ_baseline
Rejecting the null in the first set of hypotheses would indicate that the news did have an impact on subjects' attitudes toward aspirin.

Rejecting the null in the second set of hypotheses would indicate that the news had an impact on the degree of disparity in subjects' attitudes toward aspirin.
Attitude Use Gender (1=male, 0=female) Group (1=control, 0=baseline)
7 1 1 0
4 2 1 0
5 3 1 0
5 3 0 0
4 4 0 0
6 4 0 0
7 1 1 0
4 4 1 0
5 4 0 0
1 2 0 0
5 4 1 0
5 2 0 1
5 3 1 1
3 1 1 1
2 1 1 1
5 2 0 1
5 2 0 1
4 1 1 1
6 3 1 1
4 1 0 1
6 2 1 1
4 1 0 1
5 4 1 1
Designing Survey Instruments
Copyright 2003. Do not distribute or copy without permission.
217
Designing Survey Instruments
Note: The survey responses are non-negative (the lowest possible response is 1). This may suggest that a log-normal transformation is appropriate. However, we are testing the mean of the observations and, therefore, by the Central Limit Theorem, we do not need to perform the log-normal transformation.
Copyright 2003. Do not distribute or copy without permission.
218
Test the following hypotheses:

H₀: μ_experimental = μ_control
Hₐ: μ_experimental ≠ μ_control

x̄_control = 4.82, s_control = 1.66, N_control = 11
x̄_experimental = 4.50, s_experimental = 1.17, N_experimental = 12

p-value = (30.13%)(2) = 60.26%

There is a 60% chance that we would be incorrect in believing that the news altered the subjects' average attitude toward aspirin.
Difference in Means Test:
  x̄₁ = 4.820, s₁ = 1.660, N₁ = 11
  x̄₂ = 4.500, s₂ = 1.170, N₂ = 12
  Stdev(x̄₁ - x̄₂) = 0.604
  Test statistic (distributed t) = 0.530, df = 17.82
Designing Survey Instruments
t Distribution:
  Test statistic = 0.530, degrees of freedom = 18
  Pr(t > test statistic) = 30.13%
  Pr(t < test statistic) = 69.87%
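A Python sketch reproducing this difference-in-means test from the summary statistics (scipy assumed; the degrees-of-freedom expression is the Welch formula, which matches the df = 17.82 shown above):

import math
from scipy import stats

x1, s1, n1 = 4.82, 1.66, 11   # control group
x2, s2, n2 = 4.50, 1.17, 12   # experimental group

se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # 0.604
t = (x1 - x2) / se                        # 0.530
df = se**4 / ((s1**2 / n1)**2 / (n1 - 1) + (s2**2 / n2)**2 / (n2 - 1))  # 17.82
p = 2 * stats.t.sf(abs(t), df)            # ~0.60, matching the p-value above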
Copyright 2003. Do not distribute or copy without permission.
219
Test the following hypotheses:

H₀: σ_control = σ_baseline
Hₐ: σ_control ≠ σ_baseline

s_control = 1.66, N_control = 11
s_experimental = 1.17, N_experimental = 12

Test statistic = s_control² / s_experimental² = 1.66² / 1.17² = 2.01
p-value = (13.38%)(2) = 26.76%

There is a 27% chance that we would be incorrect in believing that the news altered the disparity in subjects' attitudes toward aspirin.

Conclusion:

In market research, we typically use 10% as the cut-off for determining significance of results.

Advertising had no significant effect on either the average attitude toward aspirin or the disparity of attitudes toward aspirin.
Designing Survey Instruments
F Distribution:
  Test statistic = 2.010
  df in numerator = 10, df in denominator = 11
  Pr(F > test statistic) = 13.38%
  Pr(F < test statistic) = 86.62%
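The corresponding two-tailed F-test p-value in Python (scipy assumed):

from scipy import stats

f_stat = 1.66**2 / 1.17**2                  # 2.01
p = 2 * stats.f.sf(f_stat, 11 - 1, 12 - 1)  # ~0.27, matching (13.38%)(2)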
Copyright 2003. Do not distribute or copy without permission.
220
The results appear to indicate that the news announcement had no effect at all on the
subjects. It is possible that the news announcement does not affect people who do not
take aspirin.

Let us filter the data set, removing all subjects who report that they infrequently use
aspirin. Our filtered data set will include only those subjects who responded with at least 2
to the question regarding frequency of use.
Attitude Use Gender (1=male, 0=female) Group (1=control, 0=baseline)
4 2 1 0
5 3 1 0
5 3 0 0
4 4 0 0
6 4 0 0
4 4 1 0
5 4 0 0
1 2 0 0
5 4 1 0
5 2 0 1
5 3 1 1
5 2 0 1
5 2 0 1
6 3 1 1
6 2 1 1
5 4 1 1
Designing Survey Instruments
Filtered data set
Copyright 2003. Do not distribute or copy without permission.
221
Test the following hypotheses:

H₀: μ_experimental = μ_control
Hₐ: μ_experimental ≠ μ_control

x̄_control = 4.33, s_control = 1.41, N_control = 9
x̄_experimental = 5.33, s_experimental = 0.52, N_experimental = 7

p-value = (3.90%)(2) = 7.8%

There is an 8% chance that we would be incorrect in believing that the news altered the subjects' average attitude toward aspirin.
Designing Survey Instruments
Difference in Means Test:
  x̄₁ = 4.330, s₁ = 1.410, N₁ = 9
  x̄₂ = 5.330, s₂ = 0.520, N₂ = 7
  Stdev(x̄₁ - x̄₂) = 0.509
  Test statistic (distributed t) = -1.963, df = 10.61

t Distribution:
  Test statistic = -1.963, degrees of freedom = 10
  Pr(t > test statistic) = 96.10%
  Pr(t < test statistic) = 3.90%
Copyright 2003. Do not distribute or copy without permission.
222
Test the following hypotheses:

H₀: σ_experimental = σ_control
Hₐ: σ_experimental ≠ σ_control

s_control = 1.41, N_control = 9
s_experimental = 0.52, N_experimental = 7

Test statistic = s_control² / s_experimental² = 1.41² / 0.52² = 7.35

p-value = (1.28%)(2) = 2.56%

There is a 3% chance that we would be incorrect in believing that the news altered the disparity in subjects' attitudes toward aspirin.
Designing Survey Instruments
F Distribution:
  Test statistic = 7.350
  df in numerator = 8, df in denominator = 6
  Pr(F > test statistic) = 1.28%
  Pr(F < test statistic) = 98.72%
Copyright 2003. Do not distribute or copy without permission.
223
The results using the filtered data appear to indicate that, for subjects who report using aspirin more than infrequently:

1. The news announcement significantly changed (increased) subjects' average attitude toward aspirin.

2. The news announcement significantly changed (decreased) the disparity in subjects' attitudes toward aspirin.

The increase in subjects' attitudes toward aspirin is what the aspirin manufacturer would hope for.

The decrease in disparity of attitudes is an added bonus. This can be interpreted as a reduction in the uncertainty of the benefit of aspirin.
Designing Survey Instruments
Copyright 2003. Do not distribute or copy without permission.
224
Thus far, we have learned the following statistical techniques

Calculating probabilities using
Marginal probability
Joint probability
Disjoint probability
Conditional probability
Bayes' theorem

Estimating probabilities for
Binomial processes
Hypergeometric processes
Poisson processes

Constructing confidence intervals for
Single observations
Population means
Population proportions
Population variances

Conducting hypothesis tests for
Population mean
Population proportion
Population variance
Difference in two population means
Difference in two population proportions
Difference in two population variances
A Look Back
Copyright 2003. Do not distribute or copy without permission.
225
In regression analysis, we look at how one variable (or a group of variables) can affect
another variable.

We use a technique called ordinary least squares or OLS. The OLS technique looks at a
sample of two (or more) variables and filters out random noise so as to find the underlying
deterministic relationship among the variables.
Example:

A retailer suspects that monthly sales follow unemployment rate announcements with a one-
month lag. When the Bureau of Labor Statistics announces that the unemployment rate is up,
one month later, sales appear to fall. When the BLS announces that the unemployment rate
is down, one month later, sales appear to rise.

The retailer wants to know if this relationship actually exists. If so, the retailer can use BLS
announcements to help predict future sales.
In linear regression analysis, we assume that the relationship between the two variables (in
this example, sales and unemployment rate) is linear and that any deviation from the linear
relationship must be due to noise (i.e. unaccounted randomness in the data).
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
226
Example:

The chart below shows data (see Data Set #4) on sales and the unemployment rate collected
over a 10 month period.
Notice that the relationship (if there is one)
between the unemployment rate and sales is
subject to some randomness.

Over some months (e.g., May to June), an increase in the previous month's unemployment rate corresponds to a decrease in the current month's sales.

But, over other months (e.g., June to July), an increase in the previous month's unemployment rate corresponds to an increase in the current month's sales.
Date  Monthly Sales (current month)  Unemployment Rate (previous month)
January $257,151 4.5%
February $219,202 4.7%
March $222,187 4.6%
April $267,041 4.4%
May $265,577 4.8%
June $192,566 4.9%
July $197,655 5.0%
August $200,370 4.9%
September $203,730 4.7%
October $181,303 4.8%
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
227
Example:

It is easier to picture the relationship between unemployment and sales if we graph the data.
Since we are hypothesizing that changes in the unemployment rate cause changes in sales,
we put unemployment on the horizontal axis and sales on the vertical axis.
[Scatter plot: Unemployment Rate (previous month), 4.3% to 5.1%, on the horizontal axis; Sales (current month), $160,000 to $280,000, on the vertical axis.]
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
228
[Scatter plot with fitted line y = -11,648,868x + 771,670: Unemployment Rate (previous month) on the horizontal axis; Sales (current month) on the vertical axis.]

Example:

OLS finds the line that most closely fits the data. Because we have assumed that the relationship is linear, two numbers describe the relationship: (1) the slope, and (2) the vertical intercept.

Vertical intercept = 771,670
Slope = -11,648,868

Estimated model: Sales-hat = 771,670 - 11,648,868(unemp rate)
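As a check, fitting the ten data points from the table two slides back with ordinary least squares should reproduce the slide's slope and intercept (up to rounding). A sketch using scipy:

from scipy import stats

unemp = [0.045, 0.047, 0.046, 0.044, 0.048, 0.049, 0.050, 0.049, 0.047, 0.048]
sales = [257151, 219202, 222187, 267041, 265577, 192566, 197655, 200370, 203730, 181303]

fit = stats.linregress(unemp, sales)
print(fit.intercept, fit.slope)   # roughly 771,670 and -11,648,868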
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
229
[Scatter plot with fitted line y = -11,648,868x + 771,670, as on the previous slide.]

The graph shows two relationships:
1. The regression model is the scattering of dots and represents the actual data.
2. The estimated (or fitted) regression model is the line and represents the regression model after random noise has been removed.

Regression model (α and β are the true intercept and slope; u_t is the noise, also called the error term):

Sales_t = α + β(unemp rate_{t-1}) + u_t

Estimated regression model (the hats denote the estimated intercept, slope, and sales after estimating and removing noise):

Sales-hat_t = α-hat + β-hat(unemp rate_{t-1})
Sales-hat = 771,670 - 11,648,868(unemp rate)

An unemployment rate of 4.5% is observed with sales of $257,151. After eliminating noise, we estimate that sales should have been 771,670 - (11,648,868)(0.045) = $247,471.

The estimated noise associated with this observation is:

u-hat_t = Sales - Sales-hat = $257,151 - $247,471 = $9,680
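The fitted-value and residual arithmetic for this observation, as a plain Python sketch (no libraries needed):

intercept, slope = 771_670, -11_648_868
fitted = intercept + slope * 0.045   # ~247,471: estimated sales at 4.5% unemployment
residual = 257_151 - fitted          # ~9,680: actual sales minus fitted sales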
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
230
Terminology:

Variables on the right hand side of the regression equation are called exogenous, or
explanatory, or independent variables. They usually represent variables that are assumed to
influence the left hand side variable.

The variable on the left hand side of the regression equation is called the endogenous, or
outcome, or dependent variable. The dependent variable is the variable whose behavior you
are interested in analyzing.

The intercept and slopes of the regression model are called parameters. The intercept and
slopes of the estimated (or fitted) regression model are called estimated parameters.

The noise term in the regression model is called the error or noise. The estimated error is
called the residual, or estimated error.
Regression model:           Y = α + βX + u
Fitted (estimated) model:   Y-hat = α-hat + (β-hat)X
Residual (estimated error): u-hat = Y - Y-hat

Here X is the explanatory variable, Y is the outcome variable, α and β are the parameters, u is the error (noise), α-hat and β-hat are the parameter estimates, and Y-hat is the fitted (estimated) outcome variable.
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
231
[Scatter plot with fitted line y = -11,648,868x + 771,670, as on the previous slides.]

OLS estimates the regression model parameters by selecting parameter values that minimize the variance of the residuals.

Residual = the difference between the actual and fitted values of the outcome variable.
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
232
[The same scatter plot and fitted line as the previous slide.]

OLS estimates the regression model parameters by selecting parameter values that minimize the variance of the residuals.

Residual = the difference between the actual and fitted values of the outcome variable.
Choosing different
parameter values moves
the estimated regression
line away (on average)
from the data points. This
results in increased
variance in the residuals.
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
233
To perform regression in Excel: (1) Select TOOLS, then DATA ANALYSIS
(2) Select REGRESSION
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
234
To perform regression in Excel: (3) Enter the range of cells containing outcome (Y) and
explanatory (X) variables
(4) Enter a range of cells for the output
Constant is zero
Check this box to force the vertical intercept to be
zero.

Confidence level
Excel automatically reports 95% confidence intervals.
Check this box and enter a level of confidence if you
want a different confidence interval.

Residuals
Check this box if you want Excel to report the
residuals.

Standardized residuals
Check this box if you want Excel to report the
residuals in terms of standard deviations from the
mean.
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
235
Regression results
Vertical intercept
estimate
Slope estimate
Standard deviation of vertical intercept estimate
Standard deviation of slope estimate
Test statistic and p-value for H₀: parameter = 0
95% confidence interval around parameter estimate
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
236
The properties of a regression parameter estimate:

If we select a different sample of observations from a population and then perform OLS, we will obtain slightly different parameter estimates. Thus, regression parameter estimates are random variables.

Let β-hat be a regression parameter estimate and let β be the population parameter. Then

(β-hat - β) / s_β-hat  is distributed t with N - k degrees of freedom,

where k = the number of parameters in the regression model. The standard deviation of β-hat varies depending on the regression model.
Distribution of Regression Parameter Estimates
Copyright 2003. Do not distribute or copy without permission.
237
Regression demo
Enter population values here.
Spreadsheet selects a sample from
the population and calculates
parameter estimates based on the
sample.
Press F9 to select a new sample.
Distribution of Regression Parameter Estimates
Copyright 2003. Do not distribute or copy without permission.
238
Example:

Proponents of trade restrictions claim that free trade costs American jobs because of foreign
competition. Free trade advocates claim that free trade creates American jobs because of
foreign demand for American products.

Using regression analysis, test the hypothesis that higher levels of unemployment accompany
lower levels of trade restrictions.
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
239
1. State the regression model.
Unemp Rate_t = β₀ + β₁(Freedom of Trade_t) + u_t

Problem: We don't have a measure for freedom of trade.
Solution: Greater trade freedom results in more trade, so use total trade as a proxy for freedom of trade.

Unemp Rate_t = β₀ + β₁(Total Trade_t) + u_t

Problem: Because the economy grows over time, we would expect total trade to grow over time also.
Solution: Instead of looking at total trade, look at trade as a percentage of GDP. This measure tells us what percentage of total economic activity is devoted to trade.

Unemp Rate_t = β₀ + β₁(Total Trade_t / GDP_t) + u_t
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
240
2. Collect the data.
Data Set #5 contains the following information (for the U.S., 1/92 through 3/03):

1. Unemployment rate
2. Volume of Exports
3. Volume of Imports
4. Gross domestic product (GDP)

Calculate total trade as a % of GDP
Regression model: Unemp Rate_t = β₀ + β₁(Total Trade_t / GDP_t) + u_t
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
241
3. State the hypotheses.
Our hypothesis is: "Higher levels of unemployment accompany lower levels of trade restrictions."

The explanatory variable we are using is a proxy for freedom of trade, not trade restrictions.

Restating in terms of freedom of trade, our hypothesis becomes: "Higher levels of unemployment accompany higher levels of freedom of trade."
Regression model: Unemp Rate_t = β₀ + β₁(Total Trade_t / GDP_t) + u_t

In statistical notation, the hypotheses are:

H₀: β₁ ≥ 0
Hₐ: β₁ < 0
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
242
4. Estimate the regression parameters using OLS.
Regression model: Unemp Rate_t = β₀ + β₁(Total Trade_t / GDP_t) + u_t

H₀: β₁ ≥ 0
Hₐ: β₁ < 0
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.895688128
R Square 0.802257223
Adjusted R Square 0.800770435
Standard Error 0.004771175
Observations 135
ANOVA
df SS MS F Significance F
Regression 1 0.012283307 0.012283307 539.5909374 1.19383E-48
Residual 133 0.003027626 2.27641E-05
Total 134 0.015310933
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 0.190194429 0.00586219 32.444264 4.82099E-65 0.178599252 0.201789605 0.178599252 0.201789605
X Variable 1 -7.205346804 0.310186266 -23.22909678 1.19383E-48 -7.818882827 -6.591810782 -7.818882827 -6.591810782
From the output: β₀-hat = 0.1902 with standard deviation s_β₀-hat = 0.00586; β₁-hat = -7.2053 with standard deviation s_β₁-hat = 0.3102.
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
243
5. Construct the test statistic.
Regression model: Unemp Rate_t = β₀ + β₁(Total Trade_t / GDP_t) + u_t

H₀: β₁ ≥ 0
Hₐ: β₁ < 0
Test statistic = (β₁-hat - hypothesized value) / s_β₁-hat = (-7.205 - 0) / 0.310 = -23.23
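In Python (scipy assumed; the coefficient and standard error are taken from the regression output above):

from scipy import stats

b1, se_b1 = -7.205346804, 0.310186266
t = b1 / se_b1                # -23.23
p = stats.t.cdf(t, df=133)    # Pr(t < -23.23): the one-tailed p-value for Ha: beta1 < 0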
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
244
6. Picture the distribution and identify the null and alternative areas.
Regression model: Unemp Rate_t = β₀ + β₁(Total Trade_t / GDP_t) + u_t

H₀: β₁ ≥ 0
Hₐ: β₁ < 0

[Picture: a t distribution with 133 degrees of freedom; the alternative area is the lower (left) tail.]
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
245
7. Insert the test statistic and find the area of the alternative tail (p-value approach).
Regression model: Unemp Rate_t = β₀ + β₁(Total Trade_t / GDP_t) + u_t

H₀: β₁ ≥ 0
Hₐ: β₁ < 0
The test statistic is -23.23, and the p-value is 0.00%.

The probability of our being wrong in believing that higher levels of unemployment are associated with lower levels of free trade is virtually 0%.

t Distribution, 133 degrees of freedom:
  Test statistic = -23.23
  Pr(t > test statistic) = 100.00%
  Pr(t < test statistic) = 0.00%
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
246
7. Insert the test statistic and find the area of the alternative tail (p-value approach).
Regression model: Unemp Rate_t = β₀ + β₁(Total Trade_t / GDP_t) + u_t

H₀: β₁ ≥ 0
Hₐ: β₁ < 0
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.895688128
R Square 0.802257223
Adjusted R Square 0.800770435
Standard Error 0.004771175
Observations 135
ANOVA
df SS MS F Significance F
Regression 1 0.012283307 0.012283307 539.5909374 1.19383E-48
Residual 133 0.003027626 2.27641E-05
Total 134 0.015310933
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 0.190194429 0.00586219 32.444264 4.82099E-65 0.178599252 0.201789605 0.178599252 0.201789605
X Variable 1 -7.205346804 0.310186266 -23.22909678 1.19383E-48 -7.818882827 -6.591810782 -7.818882827 -6.591810782
Note: The test statistic and p-value (for two-tailed test)
are given in the output.
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
247
8. Check results by looking at a graph of the data.
Regression model: Unemp Rate_t = β₀ + β₁(Total Trade_t / GDP_t) + u_t

H₀: β₁ ≥ 0
Hₐ: β₁ < 0
[Scatter plot, January 1992 to March 2003: Trade as % of GDP (1.5% to 2.3%) on the horizontal axis; Unemployment Rate (0% to 9%) on the vertical axis.]
Regression Analysis
Copyright 2003. Do not distribute or copy without permission.
248
Regression model: Unemp Rate_t = β₀ + β₁(Total Trade_t / GDP_t) + u_t
Our results only indicate that higher levels of free trade are associated with lower
levels of unemployment. The results do not say anything about causality.

Example:

The incidence of alarm clocks going off is strongly associated with the rising of the sun. However, this does not mean that alarm clocks cause the sun to rise. The relationship is correlational, not causal.

Example:

Could it be that the relationship between free trade and the unemployment rate is
reverse causal?
Perhaps lower levels of unemployment cause higher levels of trade rather than
higher levels of trade causing lower levels of unemployment.
Correlation vs. Causation
Copyright 2003. Do not distribute or copy without permission.
249
Regression model: Unemp Rate_t = β₀ + β₁(Total Trade_t / GDP_t) + u_t
One way to check for causality (though, technically, this is not a rigorous test), is to
look for a relationship that spans time.

Example:

If higher levels of free trade cause lower levels of unemployment, then past trade levels should be negatively related to future unemployment levels.
To run this (quasi) test for causality, let us alter our regression model as follows:
Unemp Rate_t = β₀ + β₁(Total Trade / GDP)_{t-6} + u_t

The unemployment rate today is a function of trade six months ago.
Correlation vs. Causation
Regression model:

$\text{Unemp Rate}_t = \beta_0 + \beta_1 \left(\dfrac{\text{Total Trade}}{\text{GDP}}\right)_{t-6} + u_t$
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.833589958
R Square 0.694872218
Adjusted R Square 0.692469637
Standard Error 0.005515358
Observations 129
ANOVA
df SS MS F Significance F
Regression 1 0.008797804 0.008797804 289.219064 1.55366E-34
Residual 127 0.003863235 3.04192E-05
Total 128 0.012661039
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 0.168489919 0.006784649 24.83399257 1.39934E-50 0.155064324 0.181915514 0.155064324 0.181915514
X Variable 1 -6.109445472 0.359243017 -17.00644184 1.55366E-34 -6.820322542 -5.398568401 -6.820322542 -5.398568401
$H_0\!: \beta_1 \ge 0$; $H_a\!: \beta_1 < 0$

$\text{Test statistic} = \dfrac{\text{test value} - \text{hypothesized value}}{\text{standard deviation}} = \dfrac{\hat\beta_1 - \beta_1}{s_{\hat\beta_1}} = \dfrac{-6.109 - 0}{0.359} = -17.01$
Probability of wrongly rejecting the null
hypothesis is (virtually) 0%.
Correlation vs. Causation
Regression model:

$\text{Unemp Rate}_t = \beta_0 + \beta_1 \left(\dfrac{\text{Total Trade}}{\text{GDP}}\right)_{t-6} + u_t$
Notice that our regression model is expressed in terms of levels. The regression assumes that the level of the unemployment rate is a function of the level of trade (as a % of GDP).

Another way to test for causality is to look at the relationship between changes instead of levels of data. Such a relationship would assume that the change in the unemployment rate is a function of the change in trade (as a % of GDP).

The level relationship says: When trade is high, unemployment is low.
The change relationship says: When trade increases, unemployment decreases.
Correlation vs. Causation
Regression model:

$\Delta\,\text{Unemp Rate}_t = \beta_0 + \beta_1\,\Delta\!\left(\dfrac{\text{Total Trade}}{\text{GDP}}\right)_{t-6} + u_t$
We use a capital delta to signify change. By convention, a delta in front of a variable indicates the change from the previous observation to the current observation; $\Delta\,\text{Unemp Rate}_t$ is the change in the unemployment rate from month t-1 to month t.

The regression model shown above assumes that the change in unemployment from time t-1 to time t is a function of the change in total trade (as a % of GDP) from time t-7 to time t-6.
Correlation vs. Causation
Regression model:

$\Delta\,\text{Unemp Rate}_t = \beta_0 + \beta_1\,\Delta\!\left(\dfrac{\text{Total Trade}}{\text{GDP}}\right)_{t-6} + u_t$
When computing changes and taking lags, be extremely careful not to make errors in lining up the data with the dates. The table below shows the first few rows of data for Data Set #4 after the appropriate changes and lags have been made. The outcome variable is ΔUnemp(t); the explanatory variable is Δ(Trade/GDP)(t-6). We must discard the early outcome observations because there are no matching observations in the explanatory variable.

Date    Unemp(t)  Trade/GDP(t)  Unemp(t-1)  Trade/GDP(t-1)  ΔUnemp(t)  ΔTrade/GDP(t)  ΔTrade/GDP(t-6)
Jan-92  0.073     0.01651
Feb-92  0.074     0.01664       0.073       0.01651          0.001      0.00013
Mar-92  0.074     0.01643       0.074       0.01664          0.000     -0.00020
Apr-92  0.074     0.01649       0.074       0.01643          0.000      0.00005
May-92  0.076     0.01650       0.074       0.01649          0.002      0.00002
Jun-92  0.078     0.01684       0.076       0.01650          0.002      0.00034
Jul-92  0.077     0.01701       0.078       0.01684         -0.001      0.00017
Aug-92  0.076     0.01642       0.077       0.01701         -0.001     -0.00059        0.00013
Sep-92  0.076     0.01673       0.076       0.01642          0.000      0.00032       -0.00020
Oct-92  0.073     0.01697       0.076       0.01673         -0.003      0.00024        0.00005
Nov-92  0.074     0.01671       0.073       0.01697          0.001     -0.00026        0.00002
Dec-92  0.074     0.01674       0.074       0.01671          0.000      0.00002        0.00034
Jan-93  0.073     0.01664       0.074       0.01674         -0.001     -0.00010        0.00017
Feb-93  0.071     0.01648       0.073       0.01664         -0.002     -0.00017       -0.00059
Mar-93  0.070     0.01709       0.071       0.01648         -0.001      0.00062        0.00032
Apr-93  0.071     0.01706       0.070       0.01709          0.001     -0.00003        0.00024
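The same bookkeeping can be done programmatically; pandas keeps the dates aligned automatically. A sketch under the same hypothetical column names as before:

```python
import pandas as pd

df = pd.read_csv("dataset4.csv", parse_dates=["date"], index_col="date")

# First differences: this month's value minus last month's value.
df["d_unemp"] = df["unemp"].diff()
df["d_trade_gdp"] = df["trade_gdp"].diff()

# Lag the trade change six months: row t holds the change from t-7 to t-6.
df["d_trade_gdp_lag6"] = df["d_trade_gdp"].shift(6)

# Observations without a matching explanatory value are discarded.
reg_data = df.dropna(subset=["d_unemp", "d_trade_gdp_lag6"])
```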
Correlation vs. Causation
Regression model:

$\Delta\,\text{Unemp Rate}_t = \beta_0 + \beta_1\,\Delta\!\left(\dfrac{\text{Total Trade}}{\text{GDP}}\right)_{t-6} + u_t$
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.180447306
R Square 0.03256123
Adjusted R Square 0.024883145
Standard Error 0.001397002
Observations 128
ANOVA
df SS MS F Significance F
Regression 1 8.2764E-06 8.2764E-06 4.240800722 0.041523461
Residual 126 0.000245903 1.95161E-06
Total 127 0.00025418
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -0.000128959 0.00012384 -1.04133302 0.299714994 -0.000374035 0.000116117 -0.000374035 0.000116117
X Variable 1 -0.978057529 0.474941881 -2.059320451 0.041523461 -1.917953036 -0.038162022 -1.917953036 -0.038162022
$H_0\!: \beta_1 \ge 0$; $H_a\!: \beta_1 < 0$

$\text{Test statistic} = \dfrac{\hat\beta_1 - \beta_1}{s_{\hat\beta_1}} = \dfrac{-0.978 - 0}{0.4749} = -2.059$
Probability of being incorrect in rejecting the null hypothesis is 2.1%.
t distribution, df = 127:
Test statistic: -2.059
Pr(t > test statistic) = 97.92%
Pr(t < test statistic) = 2.08%
Warning: This is a two-tailed p-value.
Correlation vs. Causation
Regression model:

$\Delta\,\text{Unemp Rate}_t = \beta_0 + \beta_1\,\Delta\!\left(\dfrac{\text{Total Trade}}{\text{GDP}}\right)_{t-6} + u_t$
$H_0\!: \beta_1 \ge 0$; $H_a\!: \beta_1 < 0$

$\text{Test statistic} = \dfrac{-0.978 - 0}{0.4749} = -2.059$ (t distribution, df = 127; Pr(t < test statistic) = 2.08%)

Probability of being incorrect in rejecting the null hypothesis is 2.1%.
Conclusion: The data support the proposition that an increase in trade (as a % of GDP) today is associated with a decrease in the unemployment rate six months later.
Correlation vs. Causation
Applications of regression analysis:

1. Impact study
2. Prediction

Impact study: Impact studies are concerned with measuring the impact of explanatory variables on an outcome variable. Whether or not the resultant regression model adequately predicts the outcome variable is (for the most part) inconsequential.

Prediction: Prediction models are concerned with accounting for as many sources of influence on the outcome variable as possible. The more sources of influence that can be accounted for, the better the model is able to predict the outcome variable. To what extent the explanatory variables impact the outcome variable is (for the most part) inconsequential.
Regression Analysis
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.180447306
R Square 0.03256123
Adjusted R Square 0.024883145
Standard Error 0.001397002
Observations 128
ANOVA
df SS MS F Significance F
Regression 1 8.2764E-06 8.2764E-06 4.240800722 0.041523461
Residual 126 0.000245903 1.95161E-06
Total 127 0.00025418
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -0.000128959 0.00012384 -1.04133302 0.299714994 -0.000374035 0.000116117 -0.000374035 0.000116117
X Variable 1 -0.978057529 0.474941881 -2.059320451 0.041523461 -1.917953036 -0.038162022 -1.917953036 -0.038162022
Regression model:

$\Delta\,\text{Unemp Rate}_t = \beta_0 + \beta_1\,\Delta\!\left(\dfrac{\text{Total Trade}}{\text{GDP}}\right)_{t-6} + u_t$
R² measures the proportion of variation in the outcome variable that is accounted for by variations in the explanatory variables.

Example:

In our regression model, fluctuations in the change in our trade measure (lagged 6 months) account for 3.3% of fluctuations in the change in the unemployment rate.
Regression Analysis
Regression model:

$\Delta\,\text{Unemp Rate}_t = \beta_0 + \beta_1\,\Delta\!\left(\dfrac{\text{Total Trade}}{\text{GDP}}\right)_{t-6} + u_t, \qquad R^2 = 0.033$
If our model accounts for 3.3% of the fluctuations in changes in the unemployment rate, then the remaining 96.7% of the fluctuations are unaccounted for. Remember that the error term represents all factors that influence changes in unemployment other than those explicitly appearing in the model.

We have said two (apparently) contradictory things:

1. The slope coefficient is non-zero: changes in trade significantly affect changes in unemployment.
2. The R² is small: fluctuations in changes in trade account for only 3.3% of fluctuations in changes in unemployment.

These two statements are not contradictory because the slope coefficient and the R² measure different things. What the results tell us is that the influence of trade on unemployment is consistent enough to be detected against the background noise. However, the background noise is extremely loud.
Regression Analysis
[Figure: two simulated data sets generated from $Y_t = \beta_0 + \beta_1 X_t + u_t$. Left panel: $\sigma_u = 0.5$, $R^2 = 0.72$. Right panel: $\sigma_u = 1.0$, $R^2 = 0.44$. Doubling the noise lowers the $R^2$ even though the underlying relationship is unchanged.]
Regression Analysis
In multiple regression analysis, the OLS technique finds the linear relationship between an outcome variable and a group of explanatory variables.

As in simple regression analysis, OLS filters out random noise so as to find the underlying deterministic relationship. OLS also identifies the individual effects of each of the multiple explanatory variables.

Simple regression: $Y_t = \beta_0 + \beta_1 X_t + u_t$

Multiple regression: $Y_t = \beta_0 + \beta_1 X_{1,t} + \beta_2 X_{2,t} + \cdots + \beta_m X_{m,t} + u_t$
Multiple Regression Analysis
Example:

A trucking company wants to be able to predict the round-trip travel time of its trucks. Data Set #6 contains historical information on miles traveled, number of deliveries per trip, and total travel time. Use the information to predict a truck's round-trip travel time.
Approach #1: Calculate Average Time per Mile

Trucks in the data set required a total of 87 hours to travel a total of 4,000 miles. Dividing hours by miles, we find an average of roughly 0.02 hours per mile journeyed.

Problem:

This approach ignores a possible fixed effect. For example, if travel time is measured starting from the time that out-bound goods begin loading, then there will be some fixed time (the time it takes to load the truck) tacked on to all of the trips. For longer trips, this fixed time will be amortized over more miles and will have less of an impact on the time/mile ratio than for shorter trips.

This approach also ignores the impact of the number of deliveries.
Miles Traveled Deliveries Travel Time (hours)
500 4 11.3
250 3 6.8
500 4 10.9
500 2 8.5
250 2 6.2
400 2 8.2
375 3 9.4
325 4 8
450 3 9.6
450 2 8.1
Multiple Regression Analysis
Example:

A trucking company wants to be able to predict the round-trip travel time of its trucks. Data Set #6 contains historical information on miles traveled, number of deliveries per trip, and total travel time. Use the information to predict a truck's round-trip travel time.
Approach #2: Calculate Average Time per Mile and Average Time per Delivery

Trucks in the data set averaged 87 / 4,000 = 0.02 hours per mile journeyed, and 87 / 29 = 3 hours per delivery.

Problem:

Like the previous approach, this approach ignores a possible fixed effect.

This approach does account for the impact of both miles and deliveries, but it ignores the possible interaction between miles and deliveries. For example, trucks that travel more miles likely also make more deliveries. Therefore, when we combine the time/miles and time/delivery measures, we may be double-counting time.
Multiple Regression Analysis
Example:

A trucking company wants to be able to predict the round-trip travel time of its trucks. Data Set #6 contains historical information on miles traveled, number of deliveries per trip, and total travel time. Use the information to predict a truck's round-trip travel time.
Approach #3: Regress Time on Miles

The regression model will detect and isolate any fixed effect.

Problem:

The model ignores the impact of the number of deliveries. For example, a 500-mile journey with 4 deliveries will take longer than a 500-mile journey with 1 delivery.
Miles Traveled Deliveries Travel Time (hours)
500 4 11.3
250 3 6.8
500 4 10.9
500 2 8.5
250 2 6.2
400 2 8.2
375 3 9.4
325 4 8
450 3 9.6
450 2 8.1
$\text{Time}_i = \beta_0 + \beta_1(\text{miles}_i) + u_i$
Multiple Regression Analysis
Example:

A trucking company wants to be able to predict the round-trip travel time of its trucks. Data Set #6 contains historical information on miles traveled, number of deliveries per trip, and total travel time. Use the information to predict a truck's round-trip travel time.
Approach #4: Regress Time on Deliveries

The regression model will detect and isolate any fixed effect and will account for the impact of the number of deliveries.

Problem:

The model ignores the impact of miles traveled. For example, a 500-mile journey with 4 deliveries will take longer than a 200-mile journey with 4 deliveries.
Miles Traveled Deliveries Travel Time (hours)
500 4 11.3
250 3 6.8
500 4 10.9
500 2 8.5
250 2 6.2
400 2 8.2
375 3 9.4
325 4 8
450 3 9.6
450 2 8.1
$\text{Time}_i = \beta_0 + \beta_1(\text{deliveries}_i) + u_i$
Multiple Regression Analysis
Example:

A trucking company wants to be able to predict the round-trip travel time of its trucks. Data Set #6 contains historical information on miles traveled, number of deliveries per trip, and total travel time. Use the information to predict a truck's round-trip travel time.
Approach #5: Regress Time on Both Miles and Deliveries

The multiple regression model (1) will detect and isolate any fixed effect, (2) will account for the impact of the number of deliveries, (3) will account for the impact of miles, and (4) will eliminate the overlapping effects of miles and deliveries.
Miles Traveled Deliveries Travel Time (hours)
500 4 11.3
250 3 6.8
500 4 10.9
500 2 8.5
250 2 6.2
400 2 8.2
375 3 9.4
325 4 8
450 3 9.6
450 2 8.1
$\text{Time}_i = \beta_0 + \beta_1(\text{miles}_i) + \beta_2(\text{deliveries}_i) + u_i$
Multiple Regression Analysis
Example:

A trucking company wants to be able to predict the round-trip travel time of its trucks. Data Set #6 contains historical information on miles traveled, number of deliveries per trip, and total travel time. Use the information to predict a truck's round-trip travel time.
Regression model:

$\text{Time}_i = \beta_0 + \beta_1(\text{miles}_i) + \beta_2(\text{deliveries}_i) + u_i$
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.950678166
R Square 0.903788975
Adjusted R Square 0.876300111
Standard Error 0.573142152
Observations 10
ANOVA
df SS MS F Significance F
Regression 2 21.60055651 10.80027826 32.87836743 0.00027624
Residual 7 2.299443486 0.328491927
Total 9 23.9
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 1.131298533 0.951547725 1.188903619 0.273240329 -1.118752683 3.38134975
X Variable 1 0.01222692 0.001977699 6.182396959 0.000452961 0.007550408 0.016903431
X Variable 2 0.923425367 0.221113461 4.176251251 0.004156622 0.400575489 1.446275244
Estimated regression model:

$\widehat{\text{Time}}_i = \hat\beta_0 + \hat\beta_1(\text{miles}_i) + \hat\beta_2(\text{deliveries}_i)$

$\hat\beta_0 = 1.13$ (0.952) [0.2732]
$\hat\beta_1 = 0.01$ (0.002) [0.0005]
$\hat\beta_2 = 0.92$ (0.221) [0.0042]
$R^2 = 0.90$
Standard deviations of parameter estimates and p-values are typically shown in parentheses and brackets, respectively, near the parameter estimates.
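For readers working outside Excel, here is a sketch of the same estimation with statsmodels, keying in the ten observations from the table above (variable names are mine, not from the data set):

```python
import pandas as pd
import statsmodels.api as sm

data = pd.DataFrame({
    "miles":      [500, 250, 500, 500, 250, 400, 375, 325, 450, 450],
    "deliveries": [4, 3, 4, 2, 2, 2, 3, 4, 3, 2],
    "time":       [11.3, 6.8, 10.9, 8.5, 6.2, 8.2, 9.4, 8.0, 9.6, 8.1],
})

X = sm.add_constant(data[["miles", "deliveries"]])  # adds the intercept term
fit = sm.OLS(data["time"], X).fit()
print(fit.summary())  # coefficients, standard errors, p-values, R-squared

# Prediction for a 600-mile trip with 1 delivery (used in Approach #5 later).
new_trip = pd.DataFrame({"const": [1.0], "miles": [600], "deliveries": [1]})
print(fit.predict(new_trip))  # about 9.4 hours with unrounded coefficients;
# the slides round the mileage coefficient to 0.01 and report about 8.1 hours
```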
Multiple Regression Analysis
Example:

A trucking company wants to be able to predict the round-trip travel time of its trucks. Data Set #6 contains historical information on miles traveled, number of deliveries per trip, and total travel time. Use the information to predict a truck's round-trip travel time.
Notes on results:

1. The constant is not significantly different from zero.
2. The slope coefficients are significantly different from zero.
3. Variation in miles and deliveries, together, accounts for 90% of the variation in time.
Estimated regression model:

$\widehat{\text{Time}}_i = \hat\beta_0 + \hat\beta_1(\text{miles}_i) + \hat\beta_2(\text{deliveries}_i)$

$\hat\beta_0 = 1.13$ (0.952) [0.2732]
$\hat\beta_1 = 0.01$ (0.002) [0.0005]
$\hat\beta_2 = 0.92$ (0.221) [0.0042]
$R^2 = 0.90$
The parameter estimates are measures of the marginal impact of the explanatory variables on the outcome variable. Marginal impact measures the impact of one explanatory variable after the impacts of all the other explanatory variables are filtered out.

Marginal impacts of explanatory variables:

0.01 = increase in time given an increase of 1 mile traveled.
0.92 = increase in time given an increase of 1 delivery.
Multiple Regression Analysis
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict the round-trip travel time for a truck that is
traveling 600 miles and making 1 delivery.
Approach #1: Prediction based on average time-per-mile
$\widehat{\text{Time}}_i = (\text{average hours per mile})(\text{miles}_i)$

$\widehat{\text{Time}}_i = 0.02(600) = 12 \text{ hours}$
[Bar chart: predicted travel time in hours by approach; Approach #1 = 12.]
Prediction
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict the round-trip travel time for a truck that is
traveling 600 miles and making 1 delivery.
Approach #2: Prediction based on average time-per-mile and time-per-delivery
$\widehat{\text{Time}}_i = (\text{average hours per mile})(\text{miles}_i) + (\text{average hours per delivery})(\text{deliveries}_i)$

$\widehat{\text{Time}}_i = 0.02(600) + 3(1) = 15 \text{ hours}$
[Bar chart: predicted travel time in hours; Approach #1 = 12, Approach #2 = 15.]
Prediction
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict the round-trip travel time for a truck that is
traveling 600 miles and making 1 delivery.
Approach #3: Prediction based on simple regression of time on miles
$\widehat{\text{Time}}_i = \hat\beta_0 + \hat\beta_1(\text{miles}_i)$

$\widehat{\text{Time}}_i = 3.27 + 0.01(600) = 9.3 \text{ hours}$
[Bar chart: predicted travel time in hours; Approach #1 = 12, Approach #2 = 15, Approach #3 = 9.3.]
Prediction
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict the round-trip travel time for a truck that is
traveling 600 miles and making 1 delivery.
Approach #4: Prediction based on simple regression of time on deliveries
$\widehat{\text{Time}}_i = \hat\beta_0 + \hat\beta_1(\text{deliveries}_i)$

$\widehat{\text{Time}}_i = 5.38 + 1.14(1) = 6.5 \text{ hours}$
[Bar chart: predicted travel time in hours; Approach #1 = 12, Approach #2 = 15, Approach #3 = 9.3, Approach #4 = 6.5.]
Prediction
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict the round-trip travel time for a truck that is
traveling 600 miles and making 1 delivery.
Approach #5: Prediction based on multiple regression of time on miles and deliveries
$\widehat{\text{Time}}_i = \hat\beta_0 + \hat\beta_1(\text{miles}_i) + \hat\beta_2(\text{deliveries}_i)$

$\widehat{\text{Time}}_i = 1.13 + 0.01(600) + 0.92(1) = 8.1 \text{ hours}$
[Bar chart: predicted travel time in hours; Approach #1 = 12, Approach #2 = 15, Approach #3 = 9.3, Approach #4 = 6.5, Approach #5 = 8.1.]
Prediction
Example:
A trucking company wants to be able to predict the round-trip travel time of its trucks. Data
Set #6 contains historical information on miles traveled, number of deliveries per trip, and
total travel time. Use the information to predict the round-trip travel time for a truck that is
traveling 600 miles and making 1 delivery.
Compare the R² (goodness of fit) from the three regression models (approaches #3, #4, and #5).
In approach #3, 66% of the variation in time is explained. This leaves 34% of the variation in time unexplained and, therefore, unpredictable.

In approach #4, only 38% of the variation in time is explained. This leaves 62% of the variation in time unexplained and, therefore, unpredictable.

In approach #5, 90% of the variation in time is explained. This leaves only 10% of the variation in time unexplained and unpredictable.
Approach #3: $\text{Time}_i = \beta_0 + \beta_1(\text{miles}_i) + u_i$, $R^2 = 0.66$

Approach #4: $\text{Time}_i = \beta_0 + \beta_1(\text{deliveries}_i) + u_i$, $R^2 = 0.38$

Approach #5: $\text{Time}_i = \beta_0 + \beta_1(\text{miles}_i) + \beta_2(\text{deliveries}_i) + u_i$, $R^2 = 0.90$
Prediction and Goodness of Fit
Example:

A trucking company wants to be able to predict the round-trip travel time of its trucks. Data Set #6 contains historical information on miles traveled, number of deliveries per trip, and total travel time. Use the information to predict a truck's round-trip travel time.
In the table below, we have added a new explanatory variable (called "Random"; see Data Set #7) that contains randomly derived numbers. Because the numbers are random, they have no impact on the dependent variable.
Miles Traveled Deliveries Random Travel Time (hours)
500 4 0.087 11.3
250 3 0.002 6.8
500 4 0.794 10.9
500 2 0.910 8.5
250 2 0.606 6.2
400 2 0.239 8.2
375 3 0.265 9.4
325 4 0.842 8
450 3 0.662 9.6
450 2 0.825 8.1
Estimate the following regression model:

$\text{Time}_i = \beta_0 + \beta_1(\text{miles}_i) + \beta_2(\text{deliveries}_i) + \beta_3(\text{random}_i) + u_i$
Prediction and Goodness of Fit
Example:

A trucking company wants to be able to predict the round-trip travel time of its trucks. Data Set #6 contains historical information on miles traveled, number of deliveries per trip, and total travel time. Use the information to predict a truck's round-trip travel time.
Notice that the goodness of fit measure has increased from 0.904 (in Approach #5) to 0.909. This would seem to indicate that this model provides a better fit than did Approach #5.

It turns out that, every time you add an explanatory variable, the R² increases. This is because OLS looks for any portion of the remaining noise that the new variable can explain. At the very worst, OLS will find no explanatory power to attribute to the new variable and the R² will not change; adding another explanatory variable never causes the R² to fall.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.95328585
R Square 0.908753912
Adjusted R Square 0.863130868
Standard Error 0.60287941
Observations 10
ANOVA
df SS MS F Significance F
Regression 3 21.7192185 7.2397395 19.91874794 0.001603903
Residual 6 2.1807815 0.363463583
Total 9 23.9
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 1.455892534 1.150895704 1.265008226 0.252777599 -1.360249864 4.272034931
X Variable 1 0.012185716 0.002081561 5.85412499 0.001097066 0.007092317 0.017279115
X Variable 2 0.894281428 0.238113014 3.755701594 0.009446164 0.311639445 1.476923411
X Variable 3 -0.394347384 0.690166078 -0.57138042 0.588489374 -2.083124175 1.294429407
Prediction and Goodness of Fit
Example:

A trucking company wants to be able to predict the round-trip travel time of its trucks. Data Set #6 contains historical information on miles traveled, number of deliveries per trip, and total travel time. Use the information to predict a truck's round-trip travel time.
To determine whether or not a new explanatory variable adds anything of substance, we look at the adjusted R². The adjusted R² includes a penalty for adding more explanatory variables.

Approach #5 had an adjusted R² of 0.876. When we added the random explanatory variable, the adjusted R² dropped to 0.863. This indicates that the extra explanatory power the new variable adds does not make up for the loss in degrees of freedom from adding the variable to the model. Therefore, your model is actually improved by leaving the new variable out.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.95328585
R Square 0.908753912
Adjusted R Square 0.863130868
Standard Error 0.60287941
Observations 10
ANOVA
df SS MS F Significance F
Regression 3 21.7192185 7.2397395 19.91874794 0.001603903
Residual 6 2.1807815 0.363463583
Total 9 23.9
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 1.455892534 1.150895704 1.265008226 0.252777599 -1.360249864 4.272034931
X Variable 1 0.012185716 0.002081561 5.85412499 0.001097066 0.007092317 0.017279115
X Variable 2 0.894281428 0.238113014 3.755701594 0.009446164 0.311639445 1.476923411
X Variable 3 -0.394347384 0.690166078 -0.57138042 0.588489374 -2.083124175 1.294429407
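The penalty is the standard degrees-of-freedom adjustment. As a quick check against the output above (n = 10 observations, k = 3 explanatory variables):

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k-1} = 1 - (1 - 0.9088)\,\frac{9}{6} \approx 0.863$$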
Prediction and Goodness of Fit
Technical notes on R² and adjusted R²:

1. Regardless of the number of explanatory variables, R² always measures the proportion of variation in the outcome variable explained by variations in the explanatory variables.
2. You cannot compare R²s or adjusted R²s from two models that use different outcome variables.
3. Adjusted R² is often written as $\bar{R}^2$.
Prediction and Goodness of Fit
Provided the data you are analyzing is well behaved, the parameter estimates that you obtain via the OLS procedure have the following properties:

1. Unbiasedness
2. Consistency
3. Efficiency
Properties of OLS Parameter Estimates
Provided the data you are analyzing is well behaved, the parameter estimates that you
obtain via the OLS procedure have the following properties:
1. The parameter estimates are unbiased.
An estimate is unbiased when the expected value of the estimate is equal to the
parameter the estimate intends to measure.
Example:
Consider rolling a die. The population mean of the die rolls is 3.5. Suppose we take a sample of N rolls of the die. Let $X_i$ be the i-th die roll. We then estimate the population mean via the equation:

$\text{Parameter Estimator \#1} = \dfrac{1}{N}\displaystyle\sum_{i=1}^{N} X_i$
Parameter Estimator #1 is unbiased because, on average, it will equal 3.5.
Suppose we use a different equation, called Parameter Estimator #2, to estimate the population mean of the die rolls:

$\text{Parameter Estimator \#2} = \dfrac{1}{N+1}\displaystyle\sum_{i=1}^{N} X_i$
Parameter Estimator #2 is biased because, on average, it will be less than 3.5.
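A quick simulation, as a sanity check on the two estimators (a sketch; not part of the original slides):

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 20, 100_000

rolls = rng.integers(1, 7, size=(trials, N))  # many samples of N die rolls
est1 = rolls.sum(axis=1) / N        # Parameter Estimator #1
est2 = rolls.sum(axis=1) / (N + 1)  # Parameter Estimator #2

print(est1.mean())  # close to 3.5: unbiased
print(est2.mean())  # close to 3.5 * N/(N+1) = 3.33: biased downward
```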
Properties of OLS Parameter Estimates
Provided the data you are analyzing is well behaved, the parameter estimates that you
obtain via the OLS procedure have the following properties:
2. The parameter estimates are consistent.
An estimate is consistent when the expected difference between the estimate and
the population parameter decreases as the sample size increases.
Example:

$\text{Parameter Estimator \#1} = \dfrac{1}{N}\displaystyle\sum_{i=1}^{N} X_i$
Parameter Estimator #1 is unbiased because, on average, it will equal 3.5. It is also consistent because the estimate comes closer to 3.5 (on average) as N increases.

Similarly, Parameter Estimator #2 is biased but consistent. Parameter Estimator #2 is, on average, less than 3.5, but as the number of observations increases, Parameter Estimator #2 comes closer (on average) to 3.5.
$\text{Parameter Estimator \#2} = \dfrac{1}{N+1}\displaystyle\sum_{i=1}^{N} X_i$
Properties of OLS Parameter Estimates
Provided the data you are analyzing is well behaved, the parameter estimates that you
obtain via the OLS procedure have the following properties:
2. The parameter estimates are consistent.
An estimate is consistent when the expected difference between the estimate and
the population parameter decreases as the sample size increases.
Example:
Suppose we use a different equation, called Parameter Estimator #3, to estimate the population mean of the die rolls. Based on a single die roll $X_i$:

$\text{Parameter Estimator \#3} = \begin{cases} 1 & \text{if } X_i \text{ is odd} \\ 6 & \text{if } X_i \text{ is even} \end{cases}$

Parameter Estimator #3 is unbiased because, on average, it will equal 3.5. But Parameter Estimator #3 is inconsistent because, as the sample size increases, the parameter estimator does not come closer to the population parameter of 3.5.
Properties of OLS Parameter Estimates
Provided the data you are analyzing is well behaved, the parameter estimates that you
obtain via the OLS procedure have the following properties:
3. The parameter estimates are efficient.
An estimate is efficient when it has the lowest achievable standard deviation
(among all linear, unbiased estimators).
Example:
Suppose we use Parameter Estimator #4 to estimate the population mean of the die rolls. Parameter Estimator #4 multiplies the N observations, takes the N-th root of the product, and adds 0.5:

$\text{Parameter Estimator \#4} = 0.5 + \sqrt[N]{\displaystyle\prod_{i=1}^{N} X_i}$
Parameter Estimator #4 is unbiased because, on average, it will equal 3.5. Parameter Estimator #4 is consistent because, as the sample size increases, the parameter estimator comes closer (on average) to the population parameter of 3.5.

Parameter Estimator #4 is inefficient because its standard deviation is not the minimum achievable standard deviation: Parameter Estimator #1 has a lower standard deviation.
Properties of OLS Parameter Estimates
Summary of properties of OLS parameter estimates (assuming well-behaved data):

Unbiasedness: $E(\hat\beta) = \beta$
Consistency: $\text{plim}(\hat\beta) = \beta$
Efficiency: $s_{\hat\beta}$ = minimum of all linear, unbiased estimators of $\beta$

Let $\bar X$ be a sample estimator for the population mean, $\mu$.

Unbiased and consistent: $\bar X = \dfrac{1}{N}\sum_{i=1}^{N} X_i$; $E(\bar X) = \mu$, and $E(|\bar X - \mu|)$ approaches zero as $N$ increases.

Biased and consistent: $\bar X = \dfrac{1}{N+1}\sum_{i=1}^{N} X_i$; $E(\bar X) < \mu$, and $E(|\bar X - \mu|)$ approaches zero as $N$ increases.

Unbiased and inconsistent: $\bar X = 1$ if $X_N$ is odd, $6$ if $X_N$ is even (a single roll); $E(\bar X) = \mu$, but $E(|\bar X - \mu|)$ does not approach zero as $N$ increases.

Biased and inconsistent: $\bar X = \dfrac{1}{3}\sum_{i=1}^{N} X_i$; $E(\bar X) > \mu$, and $E(|\bar X - \mu|)$ does not approach zero as $N$ increases.
Properties of OLS Parameter Estimates
What does "well behaved" mean?

"Well behaved" is a short-hand term meaning "the data conform to all the applicable assumptions."

The full scope of the OLS assumptions is beyond the scope of this course. Some of the assumptions are:
1. The error term is normally distributed.
2. The error term has a population mean of zero.
3. The error term has a population variance that is constant and finite.
4. Past values of the error term are unrelated to future values of the error term.
5. The underlying relationship between the outcome and explanatory variables is
linear.
6. The explanatory variables are not measured with error.
7. There are no relevant explanatory variables excluded from the regression model.
8. There are no irrelevant explanatory variables included in the regression model.
9. The regression parameters do not change over the sample.
Properties of OLS Parameter Estimates
We will look at a few of the more egregious violations of the OLS assumptions (called statistical anomalies).

Statistical anomalies cause OLS parameter estimates to no longer be unbiased, consistent, and efficient.

Our goal is to:
1. Recognize the impact of the anomalies on the regression results.
2. Test for the presence of statistical anomalies.
3. Correct for the statistical anomalies.

We will cover the anomalies in their (approximate) order of severity. Note that some of these anomalies are specific to either time-series or cross-sectional data.

Time-series: Data is indexed by time. The order of the data matters.
Cross-sectional: Data is not indexed by time. The order of the data does not matter.
Statistical Anomalies
Non-stationarity (also called unit root) occurs when at least one of the variables in a time-series model has an infinite population variance.

Example:

Stock prices are non-stationary. If you plot the Dow-Jones Industrial Average (see Data Set #8), you will see that stock prices follow a trend. Data series that follow trends have infinite population variances.
[Figure: Dow Jones Industrial Average, 1896-2000. The index trends upward over time, from near zero to above 10,000.]
Non-Stationarity
The chart below shows the standard deviation of the DJIA computed from 1896 up to each indicated date. Because the DJIA follows a trend, the standard deviation increases over time. This means that the population standard deviation is infinite.

[Figure: expanding-window standard deviation of the DJIA, 1900-2000; the standard deviation grows steadily, from near zero to above 2,000.]
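The expanding-window standard deviation in that chart is straightforward to reproduce; a sketch in pandas (file and column names are hypothetical):

```python
import pandas as pd

djia = pd.read_csv("djia_annual.csv", index_col="year")["close"]

# Standard deviation of all observations from the start up to each date.
expanding_sd = djia.expanding(min_periods=2).std()
print(expanding_sd.tail())  # keeps growing: the signature of a trending series
```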
Non-Stationarity
Implications of non-stationarity:

1. Parameter estimates are biased and inconsistent.
2. Standard deviations of parameter estimates are biased and inconsistent.
3. The R² measure is biased and inconsistent.
4. These results hold for all parameter estimates, regardless of which variable(s) is (are) non-stationary.

The implications indicate that, in the presence of non-stationarity, none of the OLS results are useful to us. This makes non-stationarity one of the most severe of the statistical anomalies.
Non-Stationarity
Example of the implications of non-stationarity:

Using Data Set #8, estimate the following regression model:

$\text{DJIA}_t = \beta_0 + \beta_1\,\text{DJIA}_{t-1} + u_t$

You should get the results shown below. Note that the results seem too good to be true:

1. The R² measure is very close to 1.
2. Some of the p-values are exceptionally close to zero.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.985124455
R Square 0.970470193
Adjusted R Square 0.970183495
Standard Error 392.8090624
Observations 105
ANOVA
df SS MS F Significance F
Regression 1 522302143.5 522302143.5 3385.001074 1.31605E-80
Residual 103 15892792.83 154298.9595
Total 104 538194936.4
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 25.36791839 42.79835355 0.592731175 0.554660204 -59.51244428 110.2482811
X Variable 1 1.067973529 0.018356128 58.18076206 1.31605E-80 1.031568512 1.104378547
1. The model explains virtually all of the
variation in the DJIA.
2. The probability of the slope coefficient
equaling zero is about the same as the
probability of six killer asteroids all
hitting the Earth within the next 60
seconds.
Non-Stationarity
Example of the implications of non-stationarity:

Using Data Set #8, estimate the following regression model:

$\text{DJIA}_t = \beta_0 + \beta_1\,\text{DJIA}_{t-1} + u_t$

To see the impact of non-stationarity, split the data set into three parts:

1. 1897 through 1931
2. 1897 through 1966
3. 1897 through 2002

Estimate the regression model for each of the three subsets and compare the results.
Non-Stationarity
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.835430571
R Square 0.69794424
Adjusted R Square 0.688791035
Standard Error 32.42075194
Observations 35
Coefficients Standard Error t Stat P-value
Intercept 19.81139741 10.86212922 1.823896311 0.077235343
X Variable 1 0.826611264 0.094662408 8.732201973 4.30227E-10
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.978701538
R Square 0.9578567
Adjusted R Square 0.957236945
Standard Error 45.10223435
Observations 70
Coefficients Standard Error t Stat P-value
Intercept 1.743586516 7.660180603 0.227616894 0.820627082
X Variable 1 1.051148157 0.026737665 39.31338637 1.71371E-48
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.985124455
R Square 0.970470193
Adjusted R Square 0.970183495
Standard Error 392.8090624
Observations 105
Coefficients Standard Error t Stat P-value
Intercept 25.36791839 42.79835355 0.592731175 0.554660204
X Variable 1 1.067973529 0.018356128 58.18076206 1.31605E-80
(The three outputs above cover, in order, 1896 through 1931, 1896 through 1966, and 1896 through 2002.)

As we add observations, R² is approaching one, and uncertainty is approaching zero.
Non-Stationarity
The implication is that, eventually, we will be able to predict next year's DJIA from this year's DJIA with absolute certainty.

We call these results spurious because they appear reasonable but are really the result of a statistical anomaly, not an underlying statistical relationship.
Non-Stationarity
Detecting non-stationarity:

1. For each variable in the model: (a) regress the variable on itself lagged one period, (b) regress the variable on a constant term and itself lagged one period, and (c) regress the variable on a constant term, a time trend, and itself lagged one period.
2. Test the null hypothesis that the absolute value of the coefficient on the lagged variable is greater than or equal to 1. If the slope coefficient is greater than or equal to one (in absolute value) for any of the three tests, then the variable is non-stationary.

Note: This is only an approximate test. Because this test assumes non-stationarity, the test statistic is not t-distributed but tau-distributed. As the tau-distribution is beyond the scope of this course, you can use the t-distribution as an approximation.

Note also that the tails on the tau-distribution are fatter than the tails on the t-distribution. Therefore, if you fail to reject the null (in step 2 above) using the t-distribution, then you would also fail to reject the null using the tau-distribution.
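A sketch of test (b), the regression on a constant and one lag (for a proper unit-root test with tau critical values, statsmodels' adfuller function can be used instead):

```python
import pandas as pd
import statsmodels.api as sm

djia = pd.read_csv("djia_annual.csv", index_col="year")["close"]  # hypothetical file

lagged = djia.shift(1).dropna()
current = djia.loc[lagged.index]

fit = sm.OLS(current, sm.add_constant(lagged)).fit()
beta1, se1 = fit.params.iloc[1], fit.bse.iloc[1]

# t-type statistic for H0: beta1 >= 1 (non-stationary) against Ha: beta1 < 1.
print((beta1 - 1.0) / se1)
```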
Non-Stationarity
Example: Test the DJIA for non-stationarity.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.985073331
R Square 0.970369467
Adjusted R Square 0.960754083
Standard Error 391.5821301
Observations 105
ANOVA
df SS MS F Significance F
Regression 1 522247933.7 522247933.7 3405.893011 9.67537E-81
Residual 104 15947002.72 153336.5646
Total 105 538194936.4
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 0 #N/A #N/A #N/A #N/A #N/A
X Variable 1 1.072811655 0.016390124 65.45476184 2.41925E-86 1.040309466 1.105313844
$\text{DJIA}_t = \beta_1\,\text{DJIA}_{t-1} + u_t$
A test of the null hypothesis that the slope is greater than or equal to one yields a test statistic of 4.439 and a p-value of (virtually) zero. We therefore conclude that the DJIA is non-stationary.

Because the DJIA is non-stationary, any regression including the DJIA produces biased and inconsistent results.
Non-Stationarity
Correcting for non-stationarity:

1. Remove the trend from the non-stationary variable by: (a) taking the first difference, (b) taking the natural log, (c) taking the percentage change, or (d) taking the second difference.
2. Test the transformed version of the variable to verify that the transformed variable is now stationary.
3. Re-run the regression using the transformed version of the variable.

Note: If you have a model in which one of the variables is non-stationary and another is not, you need only perform this transformation on the non-stationary variable. However, it is often easier to interpret the results if you perform the same transformation on all the variables in the model.
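The de-trending transformations from step 1, sketched in pandas (hypothetical series):

```python
import numpy as np
import pandas as pd

djia = pd.read_csv("djia_annual.csv", index_col="year")["close"]

first_diff = djia.diff()          # (a) first difference
log_level = np.log(djia)          # (b) natural log (often combined with a difference)
growth = djia.pct_change()        # (c) percentage change: (x_t - x_{t-1}) / x_{t-1}
second_diff = djia.diff().diff()  # (d) second difference
```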
Non-Stationarity
Correct the DJIA model for non-stationarity:

1. Transform the DJIA into the growth rate in the DJIA. The transformation is:

$\text{GDJIA}_t = \dfrac{\text{DJIA}_t - \text{DJIA}_{t-1}}{\text{DJIA}_{t-1}}$

2. Test the growth rate in the DJIA to verify that the non-stationarity has been removed (test 1 of 3: regress the dependent variable on its own lag):

$\text{GDJIA}_t = \beta_1\,\text{GDJIA}_{t-1} + u_t$
SUMMARY OUTPUT
Regression Statistics
Multiple R 65535
R Square -0.12944963
Adjusted R Square -0.13915837
Standard Error 0.236589292
Observations 104
ANOVA
df SS MS F Significance F
Regression 1 -0.660786781 -0.66078678 -11.80514097 #NUM!
Residual 103 5.765372783 0.055974493
Total 104 5.104586002
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 0 #N/A #N/A #N/A #N/A #N/A
X Variable 1 0.001742434 0.098580251 0.017675287 0.985932089 -0.193768065 0.197252934
A test of the null hypothesis that the slope is greater than or equal to one yields a test statistic of -10.1 and a p-value of (virtually) 100%. We therefore conclude that GDJIA is stationary.
Non-Stationarity
Correct the DJIA model for non-stationarity:

3. Test the growth rate in the DJIA to verify that the non-stationarity has been removed (test 2 of 3: regress the dependent variable on its own lag and a constant).
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.128401676
R Square 0.01648699
Adjusted R Square 0.006844706
Standard Error 0.221855516
Observations 104
ANOVA
df SS MS F Significance F
Regression 1 0.08415926 0.08415926 1.709863516 0.193942749
Residual 102 5.020426742 0.04921987
Total 103 5.104586002
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 0.090018858 0.023138826 3.890381371 0.000178584 0.044123129 0.135914587
X Variable 1 -0.128568196 0.098322481 -1.307617496 0.193942749 -0.323590273 0.06645388
Non-Stationarity
A test of the null hypothesis that the slope is greater than or equal to one yields a test statistic of -11.5 and a p-value of (virtually) 100%. We therefore conclude that GDJIA is stationary.
Correct the DJIA model for non-stationarity:

4. Test the growth rate in the DJIA to verify that the non-stationarity has been removed (test 3 of 3: regress the dependent variable on its own lag, a constant, and a time trend).
Non-Stationarity
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.148095661
R Square 0.021932325
Adjusted R Square 0.002564648
Standard Error 0.222333051
Observations 104
ANOVA
df SS MS F Significance F
Regression 2 0.111955438 0.055977719 1.132418977 0.326309523
Residual 101 4.992630564 0.049431986
Total 103 5.104586002
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 0.06182561 0.044173173 1.399618949 0.164691437 -0.025802071 0.149453291
X Variable 1 -0.134769286 0.098880518 -1.362950854 0.175928971 -0.330921607 0.061383035
X Variable 2 0.000546484 0.000728767 0.749874369 0.455073571 -0.000899194 0.001992162
A test of the null hypothesis that the slope is greater than or equal to one yields a test statistic of -11.5 and a p-value of (virtually) 100%. We therefore conclude that GDJIA is stationary.
Now that we know that GDJIA is stationary, we can estimate our transformed model:

$\text{GDJIA}_t = \beta_0 + \beta_1\,\text{GDJIA}_{t-1} + u_t$
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.128401676
R Square 0.01648699
Adjusted R Square 0.006844706
Standard Error 0.221855516
Observations 104
ANOVA
df SS MS F Significance F
Regression 1 0.08415926 0.08415926 1.709863516 0.193942749
Residual 102 5.020426742 0.04921987
Total 103 5.104586002
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 0.090018858 0.023138826 3.890381371 0.000178584 0.044123129 0.135914587
X Variable 1 -0.1285682 0.098322481 -1.3076175 0.193942749 -0.323590273 0.06645388
Our fitted model is:

$\widehat{\text{GDJIA}}_t = 0.09 - 0.129\,\text{GDJIA}_{t-1}$
Non-Stationarity
Using the fitted model, predict the DJIA for 2003:

$\text{DJIA}_{2001} = 11005$, $\text{DJIA}_{2002} = 10104$

$\text{GDJIA}_{2002} = \dfrac{10104 - 11005}{11005} = -0.082$

$\widehat{\text{GDJIA}}_{2003} = 0.09 - 0.1286(-0.082) = 0.101$

$\widehat{\text{DJIA}}_{2003} = 10104 \times (1 + 0.101) = 11125$
As of today, the DJIA for 2003 is 9,710. This is significantly different from the prediction of 11,125. Note that the regression model has an R² of less than 0.02. This means that the model fails to explain 98% of the variation in the growth rate of the DJIA.
Non-Stationarity
Using the spurious model, predict the DJIA for 2003:

$\widehat{\text{DJIA}}_t = 25.4 + 1.07\,\text{DJIA}_{t-1}$

$\text{DJIA}_{2002} = 10104$

$\widehat{\text{DJIA}}_{2003} = 25.4 + 1.07(10104) = 10837$
Although the spurious prediction is closer to the actual than was the prediction using the stationary model, the prediction is extremely far from the actual considering the (reported) R² of 0.97.

Prediction from stationary model: 11,125 (15% overestimate)
Prediction from non-stationary model: 10,837 (12% overestimate)
Actual: 9,710

Note that it is simply random chance that the non-stationary model gave a (slightly) closer prediction. We would not necessarily expect the non-stationary model to give a better (or worse) prediction.

What is important is that the non-stationary model misled us (via the high R² and low standard deviations) into thinking that it would produce good predictions.
Non-Stationarity
Non-linearity occurs when the relationship between the outcome and explanatory variables is non-linear.

Example:

Suppose that the true relationship between two variables is:

$Y_i = \beta_0 + \beta_1 X_i^2 + u_i$

OLS assumes (incorrectly in this case) that the relationship between the outcome and explanatory variables is linear. When OLS attempts to find the best fitting linear relationship, it will end up with something like that shown in the figure below.
[Figure: scatter plot of quadratic data with a fitted straight line.] Non-linear data can cause the fitted model to be biased in one direction at the extremes and biased in the other direction in the center.
Non-Linearity
Implications of non-linearity:

1. Parameter estimates are biased and inconsistent.
2. Standard deviations of parameter estimates are biased and inconsistent.
3. The R² measure is biased and inconsistent.

The implications indicate that, in the presence of non-linearity, none of the OLS results are useful to us. Like non-stationarity, this makes non-linearity one of the most severe of the statistical anomalies.
Non-Linearity
In the regression demo, enter a 2 for the X Exponent. The demo now generates data according to the model:

$Y_i = \beta_0 + \beta_1 X_i^2 + u_i$

Repeatedly press F9 and notice that the confidence interval for the slope coefficient does not include the population value. OLS is producing biased results.
Non-Linearity
Example:

As Director of Human Resources, you are charged with generating estimates of the cost of labor for a firm that is opening an office in Pittsburgh. To estimate the cost of labor, you need two numbers for each job description: (1) base salary, and (2) benefits.

You are comfortable with your base salary estimates. You need to generate estimates for benefits. Data Set #9 contains median salary and benefits numbers for a random sampling of white-collar jobs in the Pittsburgh area.

Using this data, generate a model that can be used to predict the cost of benefits given base salary. Estimate the following model:

$\text{Benefits}_i = \beta_0 + \beta_1\,\text{Salary}_i + u_i$
Non-Linearity
Fitted model:

$\widehat{\text{Benefits}}_i = -6332 + 0.5976\,\text{Salary}_i$
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.905889157
R Square 0.820635165
Adjusted R Square 0.805688095
Standard Error 12339.93518
Observations 14
ANOVA
df SS MS F Significance F
Regression 1 8360260745 8360260745 54.90274589 8.16758E-06
Residual 12 1827288004 152274000.3
Total 13 10187548749
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept -6331.559158 7782.474687 -0.813566303 0.431740633 -23288.11456 10624.99625
X Variable 1 0.597573741 0.080648162 7.409638715 8.16758E-06 0.421856495 0.773290987
According to the output, this model accounts for 82% of the variation in benefits.

Example: You expect someone earning a $90,000 base salary to cost the firm an additional -6332 + 0.5976($90,000) = $47,452 in benefits.
Non-Linearity
Now, create a plot of the data and overlay the fitted regression model. Notice that the line appears to overestimate the center observations and underestimate the end observations.

Warning: The apparent over- and under-estimation may be due to a few outliers in the data. That is, if you obtained more data, this apparent non-linearity might go away.

So, do we have non-linearity or not? Can you find a theoretical justification for non-linearity?
[Figure: scatter plot of Benefits ($0-$120,000) against Base Salary ($0-$180,000) with the fitted line overlaid.]
Yes. The value of most benefits is tied to pay (e.g., the firm contributes 5% of gross salary to a 401k). But, as salary rises, the number of benefits also increases (e.g., basic health, retirement, dental, stock options, car, expense account, use of the corporate jet, ...).

Because both the value and the number of benefits increase with salary, we should expect a non-linear relationship.
Non-Linearity
Example:

What is the form of the non-linearity? We don't know, but we can try different forms and compare the R².

Note: We can compare the R²s from the different models because the outcome variable is the same in all the models.

Candidate models:

$\text{Benefits}_i = \beta_0 + \beta_1\,\text{Salary}_i + u_i$
$\text{Benefits}_i = \beta_0 + \beta_1 \ln(\text{Salary}_i) + u_i$
$\text{Benefits}_i = \beta_0 + \beta_1 e^{\text{Salary}_i} + u_i$
$\text{Benefits}_i = \beta_0 + \beta_1 (\text{Salary}_i)^{-1} + u_i$
$\text{Benefits}_i = \beta_0 + \beta_1 (\text{Salary}_i)^{2} + u_i$
$\text{Benefits}_i = \beta_0 + \beta_1 (\text{Salary}_i)^{3} + u_i$

Note: In the exponential model, the value of exp(Salary) is too large. Therefore, for this model, we first divide salary by 100,000, then take the exponential. This will scale down the slope coefficient and the standard deviation of the slope coefficient, but the ratio of the estimate to its standard deviation and the other regression results will not change.
Non-Linearity
Model and R²:

$\text{Benefits}_i = \beta_0 + \beta_1\,\text{Salary}_i + u_i$: $R^2 = 0.821$
$\text{Benefits}_i = \beta_0 + \beta_1 \ln(\text{Salary}_i) + u_i$: $R^2 = 0.750$
$\text{Benefits}_i = \beta_0 + \beta_1 e^{\text{Salary}_i} + u_i$: $R^2 = 0.844$
$\text{Benefits}_i = \beta_0 + \beta_1 (\text{Salary}_i)^{-1} + u_i$: $R^2 = 0.622$
$\text{Benefits}_i = \beta_0 + \beta_1 (\text{Salary}_i)^{2} + u_i$: $R^2 = 0.843$
$\text{Benefits}_i = \beta_0 + \beta_1 (\text{Salary}_i)^{3} + u_i$: $R^2 = 0.828$

The squared and exponential models explain more of the variation in benefits than does the linear model, and the two yield almost identical R²s. Since the squared model is less complicated, we'll use that model to predict benefits.
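A sketch of this comparison in Python (hypothetical column names; salary is scaled before exponentiating, as the note above suggests):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("dataset9.csv")  # hypothetical columns: salary, benefits
y = df["benefits"]

transforms = {
    "linear":  df["salary"],
    "log":     np.log(df["salary"]),
    "exp":     np.exp(df["salary"] / 100_000),  # scale first so exp() stays finite
    "inverse": 1.0 / df["salary"],
    "squared": df["salary"] ** 2,
    "cubed":   df["salary"] ** 3,
}

for name, x in transforms.items():
    fit = sm.OLS(y, sm.add_constant(x)).fit()
    print(f"{name:8s} R^2 = {fit.rsquared:.3f}")
```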
Non-Linearity
Regression model:

$\text{Benefits}_i = \beta_0 + \beta_1 (\text{Salary}_i)^2 + u_i$
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.918129382
R Square 0.842961561
Adjusted R Square 0.829875025
Standard Error 11546.41629
Observations 14
ANOVA
df SS MS F Significance F
Regression 1 8587712000 8587712000 64.4144123 3.63765E-06
Residual 12 1599836749 133319729.1
Total 13 10187548749
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 14734.7501 4959.966734 2.970735671 0.011685201 3927.911135 25541.58907
X Variable 1 3.34675E-06 4.16996E-07 8.025858976 3.63765E-06 2.43819E-06 4.2553E-06
Estimated regression model:

$\widehat{\text{Benefits}}_i = 14735 + 0.00000335\,(\text{Salary}_i)^2$
Non-Linearity
Estimate the cost of benefits for the following salaries using both the linear model and the preferred non-linear model.

Salaries: $20,000, $40,000, $60,000, $80,000, $100,000.

Preferred non-linear model ($\widehat{\text{Benefits}} = 14735 + 0.00000335\,\text{Salary}^2$):

14735 + 0.00000335(20,000²) ≈ $16,074
14735 + 0.00000335(40,000²) ≈ $20,090
14735 + 0.00000335(60,000²) ≈ $26,783
14735 + 0.00000335(80,000²) ≈ $36,154
14735 + 0.00000335(100,000²) ≈ $48,202

Linear model ($\widehat{\text{Benefits}} = -6332 + 0.5976\,\text{Salary}$):

-6332 + 0.5976(20,000) = $5,620
-6332 + 0.5976(40,000) = $17,571
-6332 + 0.5976(60,000) = $29,523
-6332 + 0.5976(80,000) = $41,474
-6332 + 0.5976(100,000) = $53,426

Compared to the preferred non-linear model, the linear model is biased downward at low salary levels and biased upward at high salary levels.
Non-Linearity
Regime change occurs when the parameters change value at one (or more) points in the data set.

Example:

Conventional wisdom says that (Reagan aside) Democrats contribute to greater deficits (i.e., smaller surpluses) than do Republicans. Data Set #10 contains relevant macroeconomic data and data on political parties in power from 1929 through 2001. Test the hypothesis (at 5% significance) that a change in control of the Congress by Democrats corresponds to a change in the Federal government surplus (as a % of GDP).

1. Generate the Federal budget surplus as a % of GDP.

2. State the regression model:

$\left(\dfrac{\text{Budget Surplus}}{\text{GDP}}\right)_t = \alpha + \beta\,(\%\text{ Congressional Seats Held by Democrats})_t + u_t$

3. Test the hypothesis: $H_0\!: \beta = 0$; $H_a\!: \beta \ne 0$.
Regime Change
Regime change occurs when the parameters change value at one (or more) points in the data set.

Example:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.124324727
R Square 0.015456638
Adjusted R Square 0.00158983
Standard Error 0.050239746
Observations 73
ANOVA
df SS MS F Significance F
Regression 1 0.002813412 0.002813412 1.114650028 0.294652679
Residual 71 0.179206277 0.002524032
Total 72 0.182019689
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 0.014202688 0.040708434 0.348888091 0.728205865 -0.066967664 0.095373039
X Variable 1 -0.073210394 0.069343136 -1.055769874 0.294652679 -0.211476747 0.06505596
Regression model:

$\left(\dfrac{\text{Budget Surplus}}{\text{GDP}}\right)_t = \alpha + \beta\,(\%\text{ Congressional Seats Held by Democrats})_t + u_t$

Fail to reject the null.
Regime Change
$H_0\!: \beta = 0$; $H_a\!: \beta \ne 0$

t distribution, df = 71; 5% two-tailed critical value = 1.9939 (2.5% in each tail). The test statistic from the output is -1.056, which is smaller in absolute value than the critical value, so we fail to reject the null.
Regime change occurs when the parameters change value at one (or more) points in the data set.

Example:

This analysis ignores the effect of war. If the country is involved in a war, it is forced to run greater deficits regardless of which party controls the Congress. How can we account for this war effect?

The war effect is a regime change. We are hypothesizing that, during war, the regression model changes. Let us propose two regression models:

Peace years: $\left(\dfrac{\text{Budget Surplus}}{\text{GDP}}\right)_t = \alpha_1 + \beta\,(\%\text{ Congressional Seats Held by Democrats})_t + u_t$

War years: $\left(\dfrac{\text{Budget Surplus}}{\text{GDP}}\right)_t = \alpha_2 + \beta\,(\%\text{ Congressional Seats Held by Democrats})_t + u_t$

The models assume different baseline surpluses ($\alpha_1$ vs. $\alpha_2$) but the same marginal surplus ($\beta$).
Regime Change
Example:
If we run two separate regressions, we not only fail to hold the marginal effects constant,
but we lose information from the observations that we have removed from the
regression.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.543091875
R Square 0.294948784
Adjusted R Square 0.280559984
Standard Error 0.020616858
Observations 51
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 0.060045099 0.017698364 3.392692117 0.001377543 0.024478927 0.09561127
X Variable 1 -0.136171953 0.030076456 -4.52752662 3.82563E-05 -0.196612817 -0.07573109
Estimated war year model:

SUMMARY OUTPUT
Regression Statistics
Multiple R 0.299629119
R Square 0.089777609
Adjusted R Square 0.04426649
Standard Error 0.079192076
Observations 22
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept -0.325767374 0.197127821 -1.652569243 0.114031776 -0.736968613 0.085433864
X Variable 1 0.47422938 0.337647235 1.404511366 0.175507104 -0.230090084 1.178548844
Another way to solve the problem is to think of the change in the baseline (the constant
term) as a regime change. In this regime change, the value of the constant term is
different over some subset of the data than it is over other subsets.
Let us define a dummy variable as follows:

D_t = 1 if year t is a war year, 0 otherwise
Using the dummy variable, we can combine our two models into one (avoiding the
information loss that comes from splitting the data) and hold the marginal effect
constant.
(Budget Surplus / GDP)_t = α + γD_t + β(% Congressional Seats Held by Democrats)_t + u_t

For peace years, D_t is zero: the γD_t term disappears, and we are left with our "peace year" model. For war years, D_t is one: the γD_t term becomes γ, so the constant term is α + γ. Therefore, α + γ is the constant term for the "war year" model. For both models, the marginal effect, β, is the same.
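A sketch of this combined regression in Python, under the same assumed column names as before plus a hypothetical 0/1 war column:

```python
# Combined dummy-intercept model; "war" is an assumed 0/1 column.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("dataset10.csv")
df["surplus_gdp"] = (df["receipts"] - df["outlays"]) / df["gdp"]

X = sm.add_constant(df[["war", "pct_dem_seats"]])
combined = sm.OLS(df["surplus_gdp"], X).fit()
print(combined.params)  # const = alpha, war = gamma, pct_dem_seats = beta
```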
Let us test our hypothesis accounting for a possible regime shift in the constant term
between war and peace years.
D_t = 1 if year t is a war year, 0 otherwise

(Budget Surplus / GDP)_t = α + γD_t + β(% Congressional Seats Held by Democrats)_t + u_t
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.309328104
R Square 0.095683876
Adjusted R Square 0.069846272
Standard Error 0.048492023
Observations 73
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 0.022906038 0.039447193 0.580676002 0.56332367 -0.055768843 0.10158092
X Variable 1 -0.030824288 0.012369247 -2.492010132 0.015073295 -0.055493952 -0.006154624
X Variable 2 -0.07220134 0.066932075 -1.07872555 0.284413228 -0.205693045 0.061290366
The estimated regression model is

(Budget Surplus / GDP)_t = 0.023 - 0.031 D_t - 0.072 (% Democrats)_t
H0: β = 0; Ha: β ≠ 0. The test statistic on the % Democrats coefficient is t = -1.08 with 70 degrees of freedom, which does not exceed the critical value of 1.9944 (2.5% in each tail) in magnitude. Fail to reject the null.
Is there a regime change from war to peace years? If there is no regime change, then
the coefficient attached to the dummy variable will be (statistically) zero.
D_t = 1 if year t is a war year, 0 otherwise

(Budget Surplus / GDP)_t = α + γD_t + β(% Congressional Seats Held by Democrats)_t + u_t

H0: γ = 0
Ha: γ ≠ 0
From the regression output above, the coefficient on the dummy variable has a test statistic of t = -2.49 with 70 degrees of freedom, which exceeds the critical value of 1.9944 (2.5% in each tail) in magnitude. Reject the hypothesis that there is no regime change.
We designed our regression model to account for a possible regime change in the constant term. It is also possible that there is a regime change in the slope coefficient.

The slope coefficient measures the marginal effect on the budget surplus of increasing the percentage of the Congress that is controlled by Democrats. It is possible that the marginal effect of Democrats in Congress changes in war vs. peace years.

Consider the following model:

(Budget Surplus / GDP)_t = α + β(% Democrats)_t + δ(D_t)(% Democrats)_t + u_t

In peace years, D_t = 0, so the model becomes

(Budget Surplus / GDP)_t = α + β(% Democrats)_t + u_t

In war years, D_t = 1, so the model becomes

(Budget Surplus / GDP)_t = α + (β + δ)(% Democrats)_t + u_t
To test for a regime change in the slope coefficient, we generate a new regressor that is
% Democrats multiplied by the dummy variable. We include this new regressor in our
model.
(Budget Surplus / GDP)_t = α + β(% Democrats)_t + δ(D_t)(% Democrats)_t + u_t

D_t = 1 if year t is a war year, 0 otherwise
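Generating the interaction regressor is one line of code. A sketch under the same column assumptions as the earlier sketches:

```python
# Slope-shift regression; file and column names are assumptions.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("dataset10.csv")
df["surplus_gdp"] = (df["receipts"] - df["outlays"]) / df["gdp"]
df["war_x_dem"] = df["war"] * df["pct_dem_seats"]  # (D_t)(% Democrats)

X = sm.add_constant(df[["pct_dem_seats", "war_x_dem"]])
slope_shift = sm.OLS(df["surplus_gdp"], X).fit()
print(slope_shift.params)  # war_x_dem estimates delta, the slope shift
```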
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.28024201
R Square 0.078535584
Adjusted R Square 0.05220803
Standard Error 0.048949634
Observations 73
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 0.019037674 0.039724529 0.479242286 0.633260099 -0.060190336 0.098265684
X Variable 1 -0.067417636 0.06761427 -0.99709183 0.322154281 -0.202269935 0.067434663
X Variable 2 -0.046776582 0.021368623 -2.189031199 0.031931708 -0.089394921 -0.004158243
With 70 degrees of freedom, the critical value is 1.9944 (2.5% in each tail). For H0: β = 0 vs. Ha: β ≠ 0, the test statistic is t = -1.00, so we fail to reject. For H0: δ = 0 vs. Ha: δ ≠ 0, the test statistic is t = -2.19, which exceeds the critical value in magnitude, so we reject.
Copyright 2003. Do not distribute or copy without permission.
321
We can account for both possible regime changes, in the baseline (constant term) and in the marginal effect (slope), in the same model as follows:

D_t = 1 if year t is a war year, 0 otherwise

(Budget Surplus / GDP)_t = α + γD_t + β(% Democrats)_t + δ(D_t)(% Democrats)_t + u_t

H0: γ = 0, Ha: γ ≠ 0 (regime change in the constant term)
H0: δ = 0, Ha: δ ≠ 0 (regime change in the slope)

SUMMARY OUTPUT
Regression Statistics
Multiple R 0.443267307
R Square 0.196485905
Adjusted R Square 0.16155051
Standard Error 0.046039584
Observations 73
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 0.060045099 0.039522283 1.51927203 0.13326319 -0.018799674 0.138889871
X Variable 1 -0.385812473 0.121226874 -3.182565561 0.002189546 -0.627653394 -0.143971552
X Variable 2 -0.136171953 0.067163847 -2.027459116 0.04647992 -0.27016012 -0.002183787
X Variable 3 0.610401333 0.207468917 2.942133892 0.004435278 0.196512296 1.02429037

(X Variable 1 is D_t, X Variable 2 is % Democrats, and X Variable 3 is (D_t)(% Democrats).)
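The full model simply adds both war terms at once. A sketch, continuing the data frame built in the previous sketch:

```python
# Full model: intercept shift (war) plus slope shift (war_x_dem),
# continuing the data frame from the previous sketch.
X = sm.add_constant(df[["war", "pct_dem_seats", "war_x_dem"]])
full = sm.OLS(df["surplus_gdp"], X).fit()
print(full.summary())  # war -> gamma, pct_dem_seats -> beta, war_x_dem -> delta
```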
Reject the null in each case: with 69 degrees of freedom the critical value is 1.9949 (2.5% in each tail), and the test statistics on γ (t = -3.18), β (t = -2.03), and δ (t = 2.94) all exceed it in magnitude.

Conclusion: adjusting for the impact of war on the budget, the evidence suggests that an increase in Democratically controlled seats increases the budget surplus in war and increases the budget deficit in peace.
We can split our results into two estimated regression models. We use one to predict the
impact of political party on the budget surplus in war years, and the other to predict the
impact in peace years.
The estimated regression model is

(Budget Surplus / GDP)_t = 0.060 - 0.386 D_t - 0.136 (% Democrats)_t + 0.610 (D_t)(% Democrats)_t

For war years (D_t = 1), the estimated regression model is

(Budget Surplus / GDP)_t = -0.326 + 0.474 (% Democrats)_t

For peace years (D_t = 0), the estimated regression model is

(Budget Surplus / GDP)_t = 0.060 - 0.136 (% Democrats)_t
We interpret the slope coefficients as follows:

After accounting for a baseline impact of war, in peace time every 1 percentage point (0.01) increase in Democrat-controlled seats is associated with a 0.136 percentage point (0.00136) decrease in the budget surplus (relative to GDP). In war time, every 1 percentage point (0.01) increase in Democrat-controlled seats is associated with a 0.474 percentage point (0.00474) increase in the budget surplus.

To put the numbers in perspective:

1. There are currently 440 members of Congress.
2. Replacing one Republican with a Democrat increases the percentage of Democrats by 0.23 percentage points (0.0023).
3. In peace time, we expect this to be associated with a -0.03% [-0.03% = (-0.136)(0.23%)] change in the surplus (relative to GDP).
4. GDP is currently $12 trillion.
5. So, we expect that every replacement of a Republican with a Democrat will cost the Federal government (0.03%)($12 trillion) = $3.6 billion.
In war time:

1. The replacement of one Republican with a Democrat is associated with a 0.11% [0.11% = (-0.136 + 0.610)(0.23%)] change in the surplus (relative to GDP).
2. GDP is currently $12 trillion.
3. So, in war, we expect that every replacement of a Republican with a Democrat will save the Federal government (0.11%)($12 trillion) ≈ $13 billion.
Implications of regime change:

1. Parameter estimates may be biased and inconsistent.
2. Standard deviations of parameter estimates may be biased and inconsistent.

Unlike the cases of non-stationarity and non-linearity, the R² is a reliable estimator. Therefore, you can compare the adjusted R²s of models with and without regime change corrections to decide whether or not it is necessary to account for a regime change.

If the data is time-series and the regime shift will not occur again, then parameter estimates will be biased but consistent. As more data is added, the regime shift is pushed further into the past and becomes increasingly insignificant.
Detecting regime change:

1. Create a dummy variable that is 1 in one state and 0 in the other state.
2. Include the dummy itself as a regressor.
3. For each regressor, X, create a new regressor that is the dummy multiplied by X. Include all of these new regressors in the regression.
4. Test the hypotheses that the coefficients attached to the dummy and to the new regressors are zero.
5. A parameter estimate that fails the zero test indicates the presence of a regime shift for that regressor (or the constant term). A sketch of this recipe appears below.
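A generic sketch of the recipe, for a hypothetical data frame with an outcome column y, a list of regressor columns xs, and a 0/1 state dummy column d:

```python
import pandas as pd
import statsmodels.api as sm

def detect_regime_change(df, y, xs, d):
    """Return p-values for the dummy d and each (d * x) interaction."""
    data = df.copy()
    inter = []
    for x in xs:                     # step 3: dummy times each regressor
        name = f"{d}_x_{x}"
        data[name] = data[d] * data[x]
        inter.append(name)
    X = sm.add_constant(data[[d] + xs + inter])  # step 2 plus step 3
    fit = sm.OLS(data[y], X).fit()
    return fit.pvalues[[d] + inter]  # step 4: small p-values flag a shift
```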
Correcting for regime change:

1. After determining which regressors (and/or the constant) are subject to regime changes, include dummies for those regressors (and/or the constant).
2. You can correct for regime change using the level or the deviation approach, and you can use different approaches for different regressors (and/or the constant). Both are sketched after this list.
3. For the deviation approach: for each regressor, X, associated with a regime change, generate a new regressor, (D)(X), and include it in the regression model.
4. For the level approach: for each regressor, X, associated with a regime change, generate two new regressors, (D)(X) and (1 - D)(X). Remove the original regressor X from the regression and replace it with these two new regressors.
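A sketch of both corrections for a single affected regressor, with hypothetical columns y, x, and dummy d:

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("data.csv")           # hypothetical file and columns
df["d_x"] = df["d"] * df["x"]          # (D)(X)
df["d1_x"] = (1 - df["d"]) * df["x"]   # (1 - D)(X)

# Deviation approach: keep x and add (D)(X); its coefficient is the shift.
dev = sm.OLS(df["y"], sm.add_constant(df[["x", "d_x"]])).fit()

# Level approach: replace x with (D)(X) and (1-D)(X); each coefficient is
# the slope within its own regime.
lev = sm.OLS(df["y"], sm.add_constant(df[["d_x", "d1_x"]])).fit()
```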
Omitted Variables

An omitted variable is an explanatory regressor that belongs in the regression model (i.e. the explanatory variable has a significant impact on the outcome variable) but which does not appear in the regression model.

Example:

Suppose an outcome variable, Y, is determined by two explanatory variables, X and W. This results in the true regression model:

Y_i = β₀ + β₁X_i + β₂W_i + u_i

Suppose we hypothesize a different regression model that excludes W:

Y_i = β₀ + β₁X_i + u_i

When we estimate the hypothesized model, OLS will assign some of the impact that should have gone to W to the constant term and to X. This will result in the parameter estimates being biased and inconsistent. The simulation below illustrates the effect.
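A small simulation of this bias; the particular coefficients and the positive correlation between X and W are illustrative assumptions:

```python
# Omitted-variable bias: W has a negative effect and is correlated with X.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
w = 0.8 * x + rng.normal(size=n)               # W is correlated with X
y = 1.0 + 2.0 * x - 1.5 * w + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(np.column_stack([x, w]))).fit()
short = sm.OLS(y, sm.add_constant(x)).fit()    # omits W
print(full.params[1])   # close to the true beta_1 = 2.0
print(short.params[1])  # biased toward 2.0 + (-1.5)(0.8) = 0.8
```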
Example:

Data Set #11 contains voter demographics and the percentage of voters (by voting district) who claim to have voted for a candidate in the last election. Your goal is to use the voter demographics to predict what percentage of the vote your candidate will garner in other districts.

You hypothesize the following regression model:

Votes Garnered_i = β₀ + β₁(Income_i) + u_i
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.235498805
R Square 0.055459687
Adjusted R Square 0.014392717
Standard Error 0.079740721
Observations 25
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 0.471662433 0.115189263 4.094673574 0.000444586 0.233375611 0.709949255
X Variable 1 2.32719E-06 2.00258E-06 1.162096993 0.257113527 -1.81546E-06 6.46984E-06
H0: β₁ = 0
Ha: β₁ ≠ 0

Looking at the marginal effect: every $1,000 increase in average income in a district implies a projected (1,000)(0.000002) = 0.2% increase in garnered votes.
Suppose that garnered votes are not only a function of average income within a district, but also of the disparity of income across households. Unknown to you, the true regression model is:

Votes Garnered_i = β₀ + β₁(Income_i) + β₂(Income Disparity_i) + u_i

Your hypothesized model excludes Income Disparity; therefore, your model suffers from the omitted variable problem. The results below are those you would have obtained had you included Income Disparity in the model.

SUMMARY OUTPUT
Regression Statistics
Multiple R 0.541404531
R Square 0.293118866
Adjusted R Square 0.228856945
Standard Error 0.07053354
Observations 25
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 0.448319072 0.102249939 4.384541224 0.000235831 0.23626545 0.660372694
X Variable 1 3.9192E-06 1.86557E-06 2.100806052 0.047339969 5.02412E-08 7.78817E-06
X Variable 2 -6.31882E-06 2.32338E-06 -2.719665179 0.012513327 -1.11372E-05 -1.50042E-06

Looking at the marginal effect: every $1,000 increase in average income in a district implies a projected (1,000)(0.000004) = 0.4% increase in garnered votes. This is twice the impact that you estimated. By excluding Income Disparity from your model, you forced OLS to attribute some of the negative impact of Income Disparity to Income, which caused your estimate of the coefficient on Income to be biased downward.
Implications of omitted variables:

1. Parameter estimates may be biased and inconsistent.
2. Standard deviations of parameter estimates may be biased and inconsistent.

Unlike the cases of non-stationarity and non-linearity, the R² is a reliable estimator. Therefore, you can compare the adjusted R²s of models with and without the candidate variable to decide whether or not it needs to be included.

The higher the correlation between the omitted variable and the other variables in the model, the greater will be the bias and inconsistency. If the omitted variable is not correlated with one or more of the included variables, then the parameter estimates for those variables will be unbiased and consistent.
Detecting and correcting omitted variables:

1. If you have reason to believe that a given explanatory variable has been wrongly excluded, include the variable and test whether its coefficient is non-zero.
2. If the coefficient is non-zero, the variable should be included in the regression model.

Warning: it is possible that, by random chance, a given explanatory variable will pass the test for a non-zero coefficient when, in fact, the variable does not belong in the equation. Therefore, you should first have a theoretically justifiable reason why the variable should be included before considering inclusion.
Extraneous Variables

An extraneous variable is an explanatory regressor that does not belong in the regression model but which does appear in the regression model.

Example:

Suppose an outcome variable, Y, is determined by one explanatory variable, X. This results in the true regression model:

Y_i = β₀ + β₁X_i + u_i

Suppose we hypothesize a different regression model that includes both X and another variable, W:

Y_i = β₀ + β₁X_i + β₂W_i + u_i

When we estimate the hypothesized model, OLS will pick up some (randomly occurring) relationship between W and Y and attribute that relationship to W when, in fact, it should be attributed to the error term, u. This will result in the parameter estimates being inefficient.
Example:

Applying the following regression model to the data in Data Set #11, we obtain the results shown below:

Votes Garnered_i = β₀ + β₁(Income_i) + β₂(Income Disparity_i) + u_i
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 0.448319072 0.102249939 4.384541224 0.000235831 0.23626545 0.660372694
X Variable 1 3.9192E-06 1.86557E-06 2.100806052 0.047339969 5.02412E-08 7.78817E-06
X Variable 2 -6.31882E-06 2.32338E-06 -2.719665179 0.012513327 -1.11372E-05 -1.50042E-06
We can generate a third variable consisting of randomly selected numbers and include it in the regression. Because this third variable does not impact the outcome variable, it is extraneous. The results of this regression are shown below:

Votes Garnered_i = β₀ + β₁(Income_i) + β₂(Income Disparity_i) + β₃(Random_i) + u_i
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 0.448888291 0.105133686 4.269690417 0.000340931 0.230250784 0.667525797
X Variable 1 3.93864E-06 1.94016E-06 2.030056577 0.05520916 -9.61497E-08 7.97342E-06
X Variable 2 -6.33939E-06 2.40567E-06 -2.635186476 0.015473615 -1.13423E-05 -1.33652E-06
X Variable 3 -0.00262222 0.046489209 -0.05640492 0.955552469 -0.099301839 0.094057399
The presence of an extraneous variable
increases the standard errors of the
parameter estimates.
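This comparison is easy to reproduce. A sketch with placeholder file and column names standing in for Data Set #11:

```python
# Append a random regressor and compare standard errors; names are assumed.
import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("dataset11.csv")
df["random"] = np.random.default_rng(1).standard_normal(len(df))

base = sm.OLS(df["votes"], sm.add_constant(df[["income", "disparity"]])).fit()
extra = sm.OLS(df["votes"],
               sm.add_constant(df[["income", "disparity", "random"]])).fit()
print(base.bse)    # standard errors without the extraneous regressor
print(extra.bse)   # typically larger with it
```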
Implications of extraneous variables:

1. Parameter estimates are unbiased and consistent.
2. Parameter estimates are inefficient.

Because the implications of extraneous variables are much less onerous than those of omitted variables, when in doubt as to whether to include a given explanatory variable in a model, it is usually wise to err on the side of including rather than excluding.
Detecting and correcting extraneous variables:

1. If you have reason to believe that a given explanatory variable is extraneous, test whether the coefficient attached to the variable is (statistically) zero.
2. If the coefficient is zero, the variable should be excluded from the regression model.

Warning: it is possible that, by random chance, a given explanatory variable will pass the test for a zero coefficient when, in fact, the variable does belong in the equation. Therefore, if you have a theoretically justifiable reason why the variable should be included in the model, you may want to leave it in even if its coefficient is zero. If the variable truly does influence the outcome variable, the coefficient may come up as non-zero with different sample data.
Multicollinearity

Multicollinearity occurs when two or more of the explanatory variables are correlated.

Example:

Data Set #12 contains clinical trial data for a new blood pressure drug. Using the data, estimate the following regression model:

Blood Pressure_i = β₀ + β₁(Dosage_i) + β₂(Reported Stress_i) + β₃(Daily Caffeine Intake_i) + u_i
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.675097429
R Square 0.455756539
Adjusted R Square 0.4202624
Standard Error 17.26241704
Observations 50
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 111.7045551 10.91943115 10.22988777 1.96572E-13 89.72490125 133.684209
X Variable 1 -0.10538001 0.042907405 -2.455986577 0.01788627 -0.191748054 -0.019011967
X Variable 2 3.549658078 1.950808713 1.81958285 0.075334043 -0.37711244 7.476428595
X Variable 3 1.557211753 1.780563939 0.874560985 0.386356235 -2.026874137 5.141297643
Dosage of the drug appears to have a strongly significant impact on blood pressure (p = 0.02). Stress appears to have a slightly significant effect on blood pressure (p = 0.08). Caffeine intake appears not to affect blood pressure (p = 0.39).
Now estimate the model with Daily Caffeine Intake removed:

Blood Pressure_i = β₀ + β₁(Dosage_i) + β₂(Reported Stress_i) + u_i
Dosage of the drug appears to have a strongly significant impact on blood pressure (p = 0.02). Stress appears to have a remarkably significant effect on blood pressure (p = 0.00). The results for the marginal impact of stress on blood pressure changed dramatically when we dropped Caffeine Intake from the model.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.668361599
R Square 0.446707226
Adjusted R Square 0.423162853
Standard Error 17.21918057
Observations 50
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 115.0315253 10.20971156 11.26687318 5.93644E-15 94.49225435 135.5707963
X Variable 1 -0.105731825 0.042798055 -2.470481988 0.017177119 -0.191830325 -0.019633324
X Variable 2 5.046727412 0.933290929 5.407453621 2.09574E-06 3.169190011 6.924264813
Now estimate the model with Daily Caffeine Intake included and Reported Stress removed:

Blood Pressure_i = β₀ + β₁(Dosage_i) + β₃(Daily Caffeine Intake_i) + u_i
Dosage of the drug appears to have a strongly significant impact on blood pressure (p = 0.02). Caffeine Intake appears to have a remarkably significant effect on blood pressure (p = 0.00). The results for the marginal impact of caffeine on blood pressure changed dramatically when we dropped Reported Stress from the model.
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.645433374
R Square 0.41658424
Adjusted R Square 0.391758038
Standard Error 17.6817017
Observations 50
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 110.3315158 11.15791351 9.888185256 4.59794E-13 87.88471037 132.7783213
X Variable 1 -0.107984588 0.043925114 -2.458379208 0.017696906 -0.196350436 -0.019618739
X Variable 2 4.400144179 0.874724943 5.030317487 7.59199E-06 2.640426232 6.159862125
The results you are seeing are typical of multicollinearity. It is likely that Caffeine Intake and Reported Stress are correlated. Because they are correlated, they (at least in part) reflect the same information. When you include only one (either one) of the regressors in the model, you get a significant marginal effect. But when you include both, OLS attempts to allocate an amount of explanatory power that is worthy of only one regressor across two regressors. As a result, neither of them appears very significant.
All regressors included:
     Coefficients   Standard Error   P-value
β₀   111.705        10.919           0.000
β₁   -0.105         0.043            0.018
β₂   3.550          1.951            0.075
β₃   1.557          1.781            0.386

Reported Stress included, Caffeine Intake excluded:
     Coefficients   Standard Error   P-value
β₀   115.032        10.210           0.000
β₁   -0.106         0.043            0.017
β₂   5.047          0.933            0.000

Reported Stress excluded, Caffeine Intake included:
     Coefficients   Standard Error   P-value
β₀   110.332        11.158           0.000
β₁   -0.108         0.044            0.018
β₃   4.400          0.875            0.000
Implications of multicollinearity:

1. Parameter estimates are unbiased and consistent.
2. Parameter estimates are inefficient.

The higher the correlation between the multicollinear regressors, the greater the inefficiency (i.e. the greater the standard errors associated with the parameter estimates).

In the extreme case of perfect multicollinearity (one explanatory regressor is an exact linear function of another), the regression will fail. Either the software will return an error or the results will show an R² of one and standard errors of zero or infinity.
Detecting multicollinearity:

1. To detect multicollinearity, calculate the Variance Inflation Factor (VIF) for each explanatory variable.
2. A VIF greater than 4 indicates detectable multicollinearity. A VIF greater than 10 indicates severe multicollinearity.

Correcting multicollinearity:

The correction for multicollinearity often introduces worse anomalies than the multicollinearity itself. The correction is to drop from the model the explanatory variable with the greatest VIF. However, if the offending explanatory variable does affect the outcome variable, then by dropping the variable you eliminate the multicollinearity but create an omitted variable.

As the implications of the omitted variable anomaly are more onerous than those of multicollinearity, it is usually desirable to live with the multicollinearity.

An exception is the case of severe multicollinearity (a VIF greater than 10). In this case, the bias and inconsistency caused by omitting the variable may be of less consequence than the inefficiency caused by the multicollinearity.
Variance Inflation Factor:

To compute the VIF for explanatory regressor j, regress explanatory regressor j on a constant term and all of the other explanatory regressors, and let R_j² be the R² from that auxiliary regression. Then

VIF_j = 1 / (1 - R_j²)
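The auxiliary-regression recipe translates directly to code. A sketch, with assumed file and column names standing in for Data Set #12:

```python
# VIF via auxiliary regressions; file and column names are assumptions.
import pandas as pd
import statsmodels.api as sm

def vif(df, col, others):
    aux = sm.OLS(df[col], sm.add_constant(df[others])).fit()
    return 1.0 / (1.0 - aux.rsquared)

df = pd.read_csv("dataset12.csv")
cols = ["dosage", "stress", "caffeine"]
for c in cols:
    print(c, round(vif(df, c, [x for x in cols if x != c]), 2))
```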
Example:

Calculate the VIFs for Dosage, Reported Stress, and Daily Caffeine Intake.

Auxiliary regression for Dosage: Dosage_i = β₀ + β₂(Reported Stress_i) + β₃(Daily Caffeine Intake_i) + u_i

VIF_Dosage = 1 / (1 - 0.0076) = 1.01

Auxiliary regression for Reported Stress: Reported Stress_i = β₀ + β₁(Dosage_i) + β₃(Daily Caffeine Intake_i) + u_i

VIF_Reported Stress = 1 / (1 - 0.7717) = 4.38

Auxiliary regression for Daily Caffeine Intake: Daily Caffeine Intake_i = β₀ + β₁(Dosage_i) + β₂(Reported Stress_i) + u_i

VIF_Daily Caffeine Intake = 1 / (1 - 0.7715) = 4.38
The VIFs indicate that there is detectable multicollinearity for Reported Stress and Daily Caffeine Intake. However, because the VIFs are well below 10, we would not drop either variable from the model.
Summary of Statistical Anomalies

Anomaly                  Properties of OLS Parameter Estimates
Non-stationarity         Biased, inconsistent, inefficient
Non-linearity            Biased, inconsistent, inefficient
Regime change            Biased, (possibly) inconsistent, inefficient
Omitted variables        Biased, inconsistent, inefficient
Extraneous variables     Unbiased, consistent, inefficient
Multicollinearity        Unbiased, consistent, inefficient