Inferential Statistics:
Hypothesis Testing
Section 5
Dr. Paul Bottomley
Bottomleypa@cardiff.ac.uk (Room F03)
Silver, pp. 206-213.
Statistical Hypotheses
A statistical hypothesis is an assertion, claim or prediction
about a population parameter (e.g., a mean, a standard deviation, a proportion).
Statistical inference: does the sample statistic support such
a claim about the population parameter?
The truth is never known with certainty unless we examine
the whole population, hence the problem of sampling error.
Hypotheses are usually formulated in the hope of rejecting
them! But we need strong empirical evidence to do so.
$H_0: \mu_{CS1} = \mu_{CS2}$
$H_1: \mu_{CS1} \neq \mu_{CS2}$
A test of any hypothesis where the alternative hypothesis
(H1) is directional (one-sided) is called a one-tailed test.
$H_0: \mu_{CS1} \leq \mu_{CS2}$ versus $H_1: \mu_{CS1} > \mu_{CS2}$
$H_0: \mu_{CS1} \geq \mu_{CS2}$ versus $H_1: \mu_{CS1} < \mu_{CS2}$
Now for an Old Multiple Choice Question
A motor vehicle breakdown company's records suggest that it takes
on average 45 minutes to reach customers (motorists). The firm
makes many changes in an effort to improve its response time and
conducts a survey to test the success of the changes. Identify the
appropriate pair of null & alternative hypotheses to test this claim.
$\mu_{\bar{x}} = \mu$ (unbiased estimate)
The standard deviation of all the possible sample means (SD of
the sampling distribution) is known as the standard error.
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}}$ (estimate of true standard error)
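The standard error can be checked empirically. The sketch below is only an illustration: the population values (mean 50, standard deviation 10), the sample size and the number of samples are all assumed for the demonstration and do not come from the slides. It draws many samples, records each sample mean, and compares the standard deviation of those means with sigma / sqrt(n).

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population and sampling plan (assumed values, not from the slides).
POP_MEAN, POP_SD = 50.0, 10.0
N = 25              # size of each sample
NUM_SAMPLES = 2000  # number of samples drawn

# Draw many samples and record the mean of each one.
sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)  # should sit close to POP_MEAN
sd_of_means = statistics.stdev(sample_means)    # empirical standard error
theoretical_se = POP_SD / N ** 0.5              # sigma / sqrt(n)

print(f"mean of sample means: {mean_of_means:.2f}")
print(f"SD of sample means:   {sd_of_means:.2f}")
print(f"sigma / sqrt(n):      {theoretical_se:.2f}")
```

The first figure should land near the population mean and the second near the third, which is what the two formulas above assert.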
Sampling Distribution
Of the Sample Means
(SDSM)
Testing Means for Large Samples
Null hypothesis: population mean equals K, implies that the SDSM
has a mean equal to K (CLT).
Only 5% of all sample means lie beyond 1.96 standard errors
from the hypothesized value (K).
Test: find out how many standard errors our sample mean (X bar)
is from K.
$Z = \frac{\bar{X} - K}{s/\sqrt{n}}$
[Figure: Standard Normal (Z) curve centred on K under H0, with "Don't Reject" between -1.96 and +1.96 and a 2.5% "Reject" region in each tail]
Decision Rule
If |Z| < 1.96 we retain (accept) H0 as being true.
If |Z| > 1.96 we reject H0 and H1 is supported.
Department Store Shopping
Average monthly spending on department store shopping
of people living in Bath is claimed to be 180. A random
sample of 300 households shows spending on average is
185 with a standard deviation of 55. Use a 5% level of
significance to check the validity of this claim.
Computations:
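One way to carry out the computation is sketched below. It assumes the test is two-tailed (H0: mean spending equals 180; H1: it does not), which is a reading of "check the validity of this claim" rather than something the slide states. The variable names are illustrative.

```python
import math

# Figures taken from the problem statement.
claimed_mean = 180   # K, the hypothesized population mean
sample_mean = 185
sample_sd = 55
n = 300

critical_z = 1.96    # two-tailed critical value at the 5% level

# Estimated standard error of the mean: s / sqrt(n).
standard_error = sample_sd / math.sqrt(n)

# Number of standard errors between the sample mean and K.
z = (sample_mean - claimed_mean) / standard_error

decision = "reject H0" if abs(z) > critical_z else "retain H0"

print(f"standard error = {standard_error:.3f}")
print(f"Z = {z:.2f}")
print(decision)
```

Under these assumptions Z is about 1.57, which is inside the range -1.96 to +1.96, so the claim of 180 is retained at the 5% level.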
Department Store Shopping Cont.
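[Figure: standard normal curve Z(0,1) under H0, with the rejection region in the lower tail below the critical value -2.33; $\bar{X} = 35$, $\mu = 40$]
$Z = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{35 - 40}{15/\sqrt{100}} = \frac{-5}{1.5} = -3.33$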
As the calculated Z value (absolute) is greater than the critical
value, we reject H0. At the 1% significance level, we can conclude
that average journey times to and from the city have decreased.
When to Retain H0
5% Significance Level Rule
Decision Rule:
If the chance of the result occurring under H0 is < 5%, reject H0.
If the chance of our result occurring under H0 is > 5%, retain H0.
Be Careful:
Rejection of H0 never proves H1 is true, but it does signify
support for H1.
Retention of H0 never proves H0 is true; our tests are only
capable of disproving (but not confirming) a hypothesis.
In Practice:
Computers report significance levels; YOU decide whether
to retain or reject H0.
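The "chance of the result occurring under H0" can be computed directly from the standard normal distribution. The sketch below uses Python's standard-library `NormalDist` and the figures from the two worked examples in this section; the helper name `z_statistic` is hypothetical, and treating the spending test as two-tailed and the journey-time test as lower-tailed is an assumption about how those examples are set up.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def z_statistic(sample_mean, k, s, n):
    """Number of estimated standard errors between the sample mean and K."""
    return (sample_mean - k) / (s / sqrt(n))


# Department store spending: two-tailed, so count both tails.
z_store = z_statistic(185, 180, 55, 300)
p_store = 2 * (1 - std_normal.cdf(abs(z_store)))

# Journey times: one-tailed (decrease), so only the lower tail counts.
z_journey = z_statistic(35, 40, 15, 100)
p_journey = std_normal.cdf(z_journey)

for label, p in [("store", p_store), ("journey", p_journey)]:
    verdict = "reject H0" if p < 0.05 else "retain H0"
    print(f"{label}: p = {p:.4f} -> {verdict}")
```

Comparing the reported probability with 5% gives the same decision as comparing Z with the critical value.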
Recap: Is my correlation a real effect?
Q: Is there a correlation between room temp. and productivity? Take a small sample (n = 7), r = -0.81.
But could a value of r as big as this have occurred by chance? How well could chance do?
The histogram shows values of r calculated from 8973 random 7-point scatter plots (dartboards).

[Figure: histogram of the 8973 values of r, with three example scatter plots labelled "Very few will look like this", "Most will look like this" and "Very few will look like this"]

Value of r   Frequency   Cumulative Frequency   Proportion of values below r
-1           0           0                      0
-0.99        0           0                      0
-0.97        2           2                      0.0002
-0.95        2           4                      0.0004
-0.93        5           9                      0.0010
-0.91        4           13                     0.0015
-0.89        11          24                     0.0027
-0.87        14          38                     0.0042
-0.85        10          48                     0.0053
-0.83        14          62                     0.0069
-0.81        17          79                     0.0088
-0.79        14          93                     0.0104
-0.77        32          125                    0.0139
-0.75        37          162                    0.0181
-0.73        38          200                    0.0223
-0.71        42          242                    0.0270
-0.69        38          280                    0.0312
-0.67        40          320                    0.0357
...
0.95         3           8971                   0.9998
0.97         2           8973                   1
0.99         0           8973                   1
1.00         0           8973                   1
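The "dartboard" experiment can be reproduced in a few lines. This is a minimal sketch, not the procedure used to build the table: it assumes each dartboard is 7 points with x and y drawn independently and uniformly at random, and the seed and function name are arbitrary. Its counts will therefore differ somewhat from the table.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(7)        # arbitrary seed so the run is repeatable
NUM_PLOTS = 8973      # same number of scatter plots as in the table
POINTS = 7            # points per scatter plot
OBSERVED_R = -0.81    # correlation found in the real sample

# Each "dartboard" has x and y chosen independently, so any r is pure chance.
r_values = []
for _ in range(NUM_PLOTS):
    xs = [random.random() for _ in range(POINTS)]
    ys = [random.random() for _ in range(POINTS)]
    r_values.append(pearson_r(xs, ys))

# How often does chance alone do at least as well as the observed value?
as_extreme = sum(1 for r in r_values if abs(r) >= abs(OBSERVED_R))
proportion = as_extreme / NUM_PLOTS

print(f"{as_extreme} of {NUM_PLOTS} random plots have |r| >= 0.81")
print(f"proportion = {proportion:.4f}")
```

A small proportion means that chance rarely produces a correlation this strong with only seven points.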
Recap: Hypothesis Test for Correlations
Null: no correlation between room temperature and production (r = 0); Alt: r ≠ 0.
Step 1: Decide on a value for the significance level
Usually 0.05 (5%) for social science data (industry standard)
Step 2: Is it a 1- or 2-tail test?
Could the correlation be either +ve or -ve (2-tail), only +ve (1-tail) or only -ve (1-tail)?
Calculate r
Step 3: Work out degrees of freedom (df) ( n-2 for a correlation coefficient )
Two random points will always lie on a straight line (r = +1 or -1 every time)
Step 4: Calculate a critical value for r and identify the critical region
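[Figure: scale running from r = -1 through r = 0 to r = +1 showing the critical regions, with the sample value r = -0.81 marked; seven pairs of data points (n = 7)]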
If r is in critical region, reject null hypothesis
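Steps 3 and 4 can be sketched as follows. The slides do not say how the critical value of r is obtained; this sketch assumes one common route, converting the critical t value for n - 2 degrees of freedom with r = t / sqrt(t^2 + df). The t value 2.571 is the standard two-tailed 5% table entry for 5 degrees of freedom.

```python
import math

n = 7                 # pairs of data points
observed_r = -0.81    # sample correlation

# Step 3: degrees of freedom for a correlation coefficient.
df = n - 2

# Two-tailed 5% critical t for df = 5, read from a standard t-table.
t_critical = 2.571

# Step 4: convert the critical t into a critical r (assumed method).
r_critical = t_critical / math.sqrt(t_critical ** 2 + df)

# The critical region is |r| > r_critical for a 2-tail test.
in_critical_region = abs(observed_r) > r_critical

print(f"df = {df}")
print(f"critical r = +/-{r_critical:.3f}")
print("reject H0" if in_critical_region else "retain H0")
```

With these inputs the critical value is about 0.754, so r = -0.81 falls in the critical region.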
A New Corporate Identity
A design agency develops a new logo for a client. A
random sample of 60 customers views the logo and their
thoughts and opinions are recorded. The logo receives an
average rating of 4.25 with a standard deviation of 0.75 on
a seven-point scale (1 = very bad, 7 = very good). Test
whether the logo scored above the scale midpoint (at the
5% level of significance).
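[Figure: standard normal curve under H0, with the rejection region in the upper tail beyond the critical value 1.64; $\mu = 4$, $\bar{X} = 4.25$]
$Z = \frac{\bar{X} - \mu}{s/\sqrt{n}} = \frac{4.25 - 4}{0.75/\sqrt{60}} = \frac{0.25}{0.75/7.75} = 2.58$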
Since the calculated Z value is greater than the critical value,
we reject H0. The logo was favorably received, scoring above
the scale midpoint, at the 5% (and 1%) level of significance.
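The one-tailed critical values quoted here can be recovered from the standard normal distribution rather than read from a table. The sketch below uses the standard-library `NormalDist.inv_cdf`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the logo example.
midpoint = 4          # hypothesized mean under H0 (scale midpoint)
sample_mean = 4.25
sample_sd = 0.75
n = 60

z = (sample_mean - midpoint) / (sample_sd / sqrt(n))

# Upper-tail critical values: the whole significance level sits in one tail.
std_normal = NormalDist()
critical_5pct = std_normal.inv_cdf(0.95)
critical_1pct = std_normal.inv_cdf(0.99)

print(f"Z = {z:.2f}")
print(f"5% one-tailed critical value: {critical_5pct:.3f}")
print(f"1% one-tailed critical value: {critical_1pct:.3f}")
print("reject at 5%:", z > critical_5pct)
print("reject at 1%:", z > critical_1pct)
```

Because the alternative hypothesis is directional (score above the midpoint), the 5% critical value is about 1.64 rather than the 1.96 used for a two-tailed test.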
Conclusions
Be careful selecting the significance level (α).
Less than 5% (1%) - difference was (highly) significant.
Is somewhat conservative - favors retaining H0.
But are these statistical differences practically meaningful?