
Two-Sample Tests

A. Ramesh
Department of Management Studies
Indian Institute of Technology Roorkee

Two-Sample Tests Overview







The means of two independent populations


The means of two related populations
The proportions of two independent populations
The variances of two independent populations

Two-Sample Tests Overview

Two Sample Tests

Test                                 Example
Independent Population Means         Group 1 vs. Group 2
Means, Related Populations           Same group before vs. after treatment
Independent Population Proportions   Proportion 1 vs. Proportion 2
Independent Population Variances     Variance 1 vs. Variance 2

Two-Sample Tests
Independent Population Means

$\sigma_1$ and $\sigma_2$ known
$\sigma_1$ and $\sigma_2$ unknown

Goal: Test a hypothesis or form a confidence interval for the difference between two population means, $\mu_1 - \mu_2$.
The point estimate for this difference is the difference between the sample means:
$\bar{X}_1 - \bar{X}_2$

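Sampling Distribution of the Difference Between Two Sample Means

[Figure: a sample of size $n_1$ from Population 1 gives $\bar{X}_1 = \sum x / n_1$; a sample of size $n_2$ from Population 2 gives $\bar{X}_2 = \sum x / n_2$; the statistic of interest is $\bar{X}_1 - \bar{X}_2$.]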

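Sampling Distribution of the Difference between Two Sample Means

\[ \mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2 \qquad \sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]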

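Z Formula for the Difference in Two Sample Means

For $n_1 \ge 30$, $n_2 \ge 30$, and independent samples:

\[ Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]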

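Confidence Interval to Estimate $\mu_1 - \mu_2$ When $n_1$ and $n_2$ are large and $\sigma_1$, $\sigma_2$ are unknown

\[ (\bar{X}_1 - \bar{X}_2) - Z\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \;\le\; \mu_1 - \mu_2 \;\le\; (\bar{X}_1 - \bar{X}_2) + Z\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \]

\[ \text{Prob}\left[ (\bar{X}_1 - \bar{X}_2) - Z\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \;\le\; \mu_1 - \mu_2 \;\le\; (\bar{X}_1 - \bar{X}_2) + Z\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \right] = 1 - \alpha \]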
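The interval above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `ci_diff_means` is hypothetical, and the inputs are the summary statistics of the wage example that appears later in these slides (means 70.700 and 62.187, variances 264.164 and 166.411, sample sizes 32 and 34), plugged in where the formula calls for the sample variances.

```python
from math import sqrt
from statistics import NormalDist


def ci_diff_means(x1_bar, x2_bar, var1, var2, n1, n2, confidence=0.95):
    """Large-sample confidence interval for mu1 - mu2 (hypothetical helper)."""
    # Two-sided critical value Z_{alpha/2}, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Standard error of the difference between the two sample means.
    std_error = sqrt(var1 / n1 + var2 / n2)
    diff = x1_bar - x2_bar
    return diff - z * std_error, diff + z * std_error


# Assumed inputs: summary statistics taken from the wage example.
low, high = ci_diff_means(70.700, 62.187, 264.164, 166.411, 32, 34)
print(f"95% CI for mu1 - mu2: ({low:.2f}, {high:.2f})")
```

If the resulting interval does not contain 0, that agrees with rejecting a two-tailed null hypothesis of equal means at the matching significance level.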

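Two-Sample Tests
Independent Populations
Two Independent Populations, Comparing Means

Lower-tail test:
$H_0: \mu_1 \ge \mu_2$, i.e., $H_0: \mu_1 - \mu_2 \ge 0$
$H_1: \mu_1 < \mu_2$, i.e., $H_1: \mu_1 - \mu_2 < 0$

Upper-tail test:
$H_0: \mu_1 \le \mu_2$, i.e., $H_0: \mu_1 - \mu_2 \le 0$
$H_1: \mu_1 > \mu_2$, i.e., $H_1: \mu_1 - \mu_2 > 0$

Two-tail test:
$H_0: \mu_1 = \mu_2$, i.e., $H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 \ne \mu_2$, i.e., $H_1: \mu_1 - \mu_2 \ne 0$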

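Two-Sample Tests
Independent Populations
Two Independent Populations, Comparing Means

Lower-tail test:
$H_0: \mu_1 - \mu_2 \ge 0$
$H_1: \mu_1 - \mu_2 < 0$
Rejection region: area $\alpha$ in the lower tail, below $-z_\alpha$.
Reject $H_0$ if $Z < -Z_\alpha$

Upper-tail test:
$H_0: \mu_1 - \mu_2 \le 0$
$H_1: \mu_1 - \mu_2 > 0$
Rejection region: area $\alpha$ in the upper tail, above $z_\alpha$.
Reject $H_0$ if $Z > Z_\alpha$

Two-tail test:
$H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 \ne 0$
Rejection region: area $\alpha/2$ in each tail, below $-z_{\alpha/2}$ and above $z_{\alpha/2}$.
Reject $H_0$ if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$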

Problem 1: Two Sample Z test


A random sample of 32 advertising
managers from across the United States is
taken. The advertising managers are
contacted by telephone and asked what their
annual salary is.
A similar random sample is taken of 34
auditing managers. The resulting salary data
are listed in Table , along with the sample
means, the population standard deviations,
and the population variances.

Hypothesis Testing for Differences Between Means: The Wage Example

Advertising Managers
 74.256   57.791   71.115
 96.234   65.145   67.574
 89.807   96.767   59.621
 93.261   77.242   62.483
103.030   67.056   69.319
 74.195   64.276   35.394
 75.932   74.194   86.741
 80.742   65.360   57.351
 39.672   73.904
 45.652   54.270
 93.083   59.045
 63.384   68.508

$n_1 = 32$
$\bar{X}_1 = 70.700$
$\sigma_1 = 16.253$
$\sigma_1^2 = 264.164$

Auditing Managers
 69.962   77.136   43.649
 55.052   66.035   63.369
 57.828   54.335   59.676
 63.362   42.494   54.449
 37.194   83.849   46.394
 99.198   67.160   71.804
 61.254   37.386   72.401
 73.065   59.505   56.470
 48.036   72.790   67.814
 60.053   71.351   71.492
 66.359   58.653
 61.261   63.508

$n_2 = 34$
$\bar{X}_2 = 62.187$
$\sigma_2 = 12.900$
$\sigma_2^2 = 166.411$

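Hypothesis Testing for Differences Between Means: The Wage Example

$H_o: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \ne 0$

[Figure: sampling distribution of $\bar{X}_1 - \bar{X}_2$ with a non-rejection region in the center, bounded by the two critical values, and a rejection region of area $\alpha/2 = .025$ in each tail.]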
Hypothesis Testing for Differences Between Means: The Wage Example

If $Z < -1.96$ or $Z > 1.96$, reject Ho.
If $-1.96 \le Z \le 1.96$, do not reject Ho.

[Figure: standard normal curve centered at 0 with critical values $Z_c = -1.96$ and $Z_c = 1.96$; the non-rejection region lies between them, and each tail is a rejection region of area $\alpha/2 = .025$.]

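Hypothesis Testing for Differences between Means: The Wage Example

If $Z < -1.96$ or $Z > 1.96$, reject Ho.
If $-1.96 \le Z \le 1.96$, do not reject Ho.

[Figure: standard normal curve centered at 0 with critical values $Z_c = -1.96$ and $Z_c = 1.96$, a rejection region of area $\alpha/2 = .025$ in each tail, and the observed value $Z = 2.35$ in the upper rejection region.]

\[ Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(70.700 - 62.187) - (0)}{\sqrt{\dfrac{264.164}{32} + \dfrac{166.411}{34}}} = 2.35 \]

Since Z = 2.35 > 1.96, reject Ho.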
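The calculation above can be reproduced with a short Python sketch. It assumes only the summary statistics shown on the slides; the helper name `two_sample_z` is hypothetical, and the two-tailed p-value is an addition that the slides do not report.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x1_bar, x2_bar, var1, var2, n1, n2, hypothesized_diff=0.0):
    """Z statistic for the difference between two independent means."""
    std_error = sqrt(var1 / n1 + var2 / n2)
    return ((x1_bar - x2_bar) - hypothesized_diff) / std_error


# Summary statistics from the wage example (advertising vs. auditing managers).
z = two_sample_z(70.700, 62.187, 264.164, 166.411, 32, 34)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-tailed p-value
reject_h0 = abs(z) > z_crit

print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, reject Ho: {reject_h0}")
```

Comparing `abs(z)` with the critical value is the same decision rule as the two rejection regions drawn on the slide.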

Problem 2: Two Sample Z test







Greystone Department Stores, Inc., operates two stores in


Buffalo, New York: One is in the inner city and the other is in
a suburban shopping center.
The regional manager noticed that products that sell well in
one store do not always sell well in the other.
The manager believes this situation may be attributable to
differences in customer demographics at the two locations.
Customers may differ in age, education, income, and so on.
Suppose the manager asks us to investigate the difference
between the mean ages of the customers who shop at the two
stores.

Data

$\sigma_1 = 10$ and $\sigma_2 = 10$
$\alpha = .05$
$n_1 = 30$
$n_2 = 40$
$\bar{x}_1 = 82$ and $\bar{x}_2 = 78$

Solution


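The point estimate of the difference is $\bar{x}_1 - \bar{x}_2 = 82 - 78 = 4$ years.
The margin of error is $1.96\sqrt{10^2/30 + 10^2/40} = 4.73$ years, and the 95% confidence interval estimate of the difference between the two population means is 4 - 4.73 = -0.73 years to 4 + 4.73 = 8.73 years.
The interval contains 0 (equivalently, $Z = 4/2.415 = 1.66 < 1.96$), so do not reject Ho.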

Two-Sample Tests
Independent Populations: $\sigma_1$ and $\sigma_2$ unknown

Independent Population Means
$\sigma_1$ and $\sigma_2$ known
$\sigma_1$ and $\sigma_2$ unknown

Assumptions:
• Samples are randomly and independently drawn
• Populations are normally distributed
• Population variances are unknown but assumed equal

Two-Sample Tests
Independent Populations

Independent Population Means
$\sigma_1$ and $\sigma_2$ known
$\sigma_1$ and $\sigma_2$ unknown

Forming interval estimates:
• The population variances are assumed equal, so use the two sample standard deviations and pool them to estimate $\sigma$
• The test statistic is a t value with $(n_1 + n_2 - 2)$ degrees of freedom

The t Test for Differences in Population Means


Each of the two populations is normally
distributed.
The two samples are independent.
At least one of the samples is small, n < 30.
The values of the population variances are
unknown.
The variances of the two populations are equal:
$\sigma_1^2 = \sigma_2^2$

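t Formula to Test the Difference in Means Assuming $\sigma_1^2 = \sigma_2^2$

\[ t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2 (n_1 - 1) + S_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} \, \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \]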

Problem 1: Independent Populations and $\sigma_1$ and $\sigma_2$ unknown and equal
At the Hernandez Manufacturing Company, an application
of this test arises.
New employees are expected to attend a three-day seminar
to learn about the company. At the end of the seminar, they
are tested to measure their knowledge about the company.
The traditional training method has been lecture and a
question-and-answer session. Management decided to
experiment with a different training procedure, which
processes new employees in two days by using DVDs and
having no question-and-answer session.
If this procedure works, it could save the company
thousands of dollars over a period of several years.
However, there is some concern about the effectiveness of
the two-day method, and company managers would like to
know whether there is any difference in the effectiveness
of the two training methods.

Hernandez Manufacturing Company: Test Scores for New Employees After Training

Training Method A
56   51   45
47   52   43
42   53   52
50   42   48
47   44   44

$n_1 = 15$
$\bar{X}_1 = 47.73$
$S_1^2 = 19.495$

Training Method B
59   57   53
52   56   65
53   55   53
54   64   57

$n_2 = 12$
$\bar{X}_2 = 56.5$
$S_2^2 = 18.273$

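Hernandez Manufacturing Company

$H_o: \mu_1 - \mu_2 = 0$
$H_a: \mu_1 - \mu_2 \ne 0$

$\alpha/2 = .05/2 = .025$
$df = n_1 + n_2 - 2 = 15 + 12 - 2 = 25$
$t_{.025,25} = 2.060$

If $t < -2.060$ or $t > 2.060$, reject Ho.
If $-2.060 \le t \le 2.060$, do not reject Ho.

[Figure: t distribution centered at 0 with critical values $-t_{.025,25} = -2.060$ and $t_{.025,25} = 2.060$; the non-rejection region lies between them, and each tail is a rejection region of area $\alpha/2 = .025$.]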

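Hernandez Manufacturing Company

\[ t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2 (n_1 - 1) + S_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} \, \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{(47.73 - 56.50) - 0}{\sqrt{\dfrac{(19.495)(14) + (18.273)(11)}{15 + 12 - 2}} \, \sqrt{\dfrac{1}{15} + \dfrac{1}{12}}} = -5.20 \]

If $t < -2.060$ or $t > 2.060$, reject Ho.
If $-2.060 \le t \le 2.060$, do not reject Ho.

Since t = -5.20 < -2.060, reject Ho.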
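The same pooled-variance calculation can be sketched in Python. This is a minimal illustration that assumes the summary statistics listed for the two training methods; the helper name `pooled_t` is hypothetical, and the critical value 2.060 is copied from the slide rather than computed.

```python
from math import sqrt


def pooled_t(x1_bar, x2_bar, var1, var2, n1, n2, hypothesized_diff=0.0):
    """Pooled-variance t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    # Weighted average of the two sample variances.
    pooled_var = (var1 * (n1 - 1) + var2 * (n2 - 1)) / df
    std_error = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)
    return ((x1_bar - x2_bar) - hypothesized_diff) / std_error, df


# Summary statistics for Training Method A and Training Method B.
t_stat, df = pooled_t(47.73, 56.5, 19.495, 18.273, 15, 12)

T_CRIT = 2.060  # t_{.025,25}, taken from the slide
reject_h0 = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.2f} with {df} df, reject Ho: {reject_h0}")
```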

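Confidence Interval to Estimate $\mu_1 - \mu_2$ with Small Samples and $\sigma_1^2 = \sigma_2^2$

\[ (\bar{X}_1 - \bar{X}_2) \pm t \sqrt{\frac{S_1^2 (n_1 - 1) + S_2^2 (n_2 - 1)}{n_1 + n_2 - 2}} \, \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]

where $df = n_1 + n_2 - 2$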
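As a hedged illustration of this interval, the sketch below applies it to the Hernandez Manufacturing summary statistics from the preceding problem. The helper name `pooled_ci` is hypothetical, and the critical value 2.060 (for 25 degrees of freedom and 95% confidence) is taken from the slides rather than computed.

```python
from math import sqrt


def pooled_ci(x1_bar, x2_bar, var1, var2, n1, n2, t_crit):
    """Confidence interval for mu1 - mu2 using the pooled variance."""
    df = n1 + n2 - 2
    pooled_var = (var1 * (n1 - 1) + var2 * (n2 - 1)) / df
    margin = t_crit * sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)
    diff = x1_bar - x2_bar
    return diff - margin, diff + margin


# Assumed inputs: Method A vs. Method B summary statistics, t_{.025,25} = 2.060.
low, high = pooled_ci(47.73, 56.5, 19.495, 18.273, 15, 12, t_crit=2.060)
print(f"95% CI for mu1 - mu2: ({low:.2f}, {high:.2f})")
```

Both endpoints are negative, which agrees with the two-tailed test rejecting Ho for these data.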

Problem 2: Independent Populations and $\sigma_1$ and $\sigma_2$ unknown and equal

You are a financial analyst for a brokerage firm. Is there a difference in dividend yield between stocks listed on the NYSE and NASDAQ (National Association of Securities Dealers Automated Quotations)? You collect the following data:

                 NYSE    NASDAQ
Number           21      25
Sample mean      3.27    2.53
Sample std dev   1.30    1.16

Assuming both populations are approximately normal with equal variances, is there a difference in average yield ($\alpha = 0.05$)?

Solution

$H_0: \mu_1 - \mu_2 = 0$, i.e., $\mu_1 = \mu_2$
$H_1: \mu_1 - \mu_2 \ne 0$, i.e., $\mu_1 \ne \mu_2$

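Two-Sample Tests
Independent Populations

The test statistic is:

\[ t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{S_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} = \frac{(3.27 - 2.53) - 0}{\sqrt{1.5021 \left( \dfrac{1}{21} + \dfrac{1}{25} \right)}} = 2.040 \]

where the pooled variance is:

\[ S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{(21 - 1)\,1.30^2 + (25 - 1)\,1.16^2}{(21 - 1) + (25 - 1)} = 1.5021 \]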

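Two-Sample Tests
Independent Populations

$H_0: \mu_1 - \mu_2 = 0$, i.e., $\mu_1 = \mu_2$
$H_1: \mu_1 - \mu_2 \ne 0$, i.e., $\mu_1 \ne \mu_2$
$\alpha = 0.05$
$df = 21 + 25 - 2 = 44$
Critical Values: $t = \pm 2.0154$
Test Statistic: $t = 2.040$

[Figure: t distribution centered at 0 with critical values -2.0154 and 2.0154; each tail is a rejection region of area .025, and the test statistic 2.040 falls in the upper rejection region.]

Decision: Reject $H_0$ at $\alpha = 0.05$

Conclusion: There is evidence of a difference in the means.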

Two-Sample Tests: Dependent Samples

Examples of dependent samples:
• Before and after measurements on the same individual
• Studies of twins
• Studies of spouses

Individual   Before   After
1            32       39
2            11       15
3            21       35
4            17       13
5            30       41
6            38       39
7            14       22

Two-Sample Tests
Related Populations
Tests Means of 2 Related Populations
Paired or matched samples
Repeated measures (before/after)
Use difference between paired values:
D = X1 - X2

Assumptions:
Both Populations Are Normally
Distributed

Two-Sample Tests
Related Populations

The ith paired difference is $D_i$, where
$D_i = X_{1i} - X_{2i}$

The point estimate for the population mean paired difference is $\bar{D}$:

\[ \bar{D} = \frac{\sum_{i=1}^{n} D_i}{n} \]

Two-Sample Tests
Related Populations

Suppose the population standard deviation of the difference scores, $\sigma_D$, is known.
The test statistic for the mean difference is a Z value:

\[ Z = \frac{\bar{D} - \mu_D}{\dfrac{\sigma_D}{\sqrt{n}}} \]

Where
$\mu_D$ = hypothesized mean difference
$\sigma_D$ = population standard deviation of differences
$n$ = the sample size (number of pairs)

Two-Sample Tests
Related Populations

If $\sigma_D$ is unknown, you can estimate the unknown population standard deviation with a sample standard deviation:

\[ S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}} \]

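Two-Sample Tests
Related Populations

The test statistic for $\bar{D}$ is now a t statistic:

\[ t = \frac{\bar{D} - \mu_D}{\dfrac{S_D}{\sqrt{n}}} \]

Where t has $n - 1$ d.f. and $S_D$ is:

\[ S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}} \]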

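Two-Sample Tests
Related Populations

Lower-tail test:
$H_0: \mu_D \ge 0$
$H_1: \mu_D < 0$
Rejection region: area $\alpha$ in the lower tail, below $-t_\alpha$.
Reject $H_0$ if $t < -t_\alpha$

Upper-tail test:
$H_0: \mu_D \le 0$
$H_1: \mu_D > 0$
Rejection region: area $\alpha$ in the upper tail, above $t_\alpha$.
Reject $H_0$ if $t > t_\alpha$

Two-tail test:
$H_0: \mu_D = 0$
$H_1: \mu_D \ne 0$
Rejection region: area $\alpha/2$ in each tail, below $-t_{\alpha/2}$ and above $t_{\alpha/2}$.
Reject $H_0$ if $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$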

Problem 1: Two-Sample Tests, Related Populations

Assume you send your salespeople to a customer service training workshop. Has the training made a difference in the number of complaints? You collect the following data:

              Number of Complaints
Salesperson   Before (1)   After (2)   Difference, Di (2 - 1)
C.B.                                   -2
T.F.          20           6           -14
M.H.                                   -1
R.K.                                    0
M.O.                                   -4

Two-Sample Tests
Related Populations Example

Using the differences $D_i$ (After minus Before) for the five salespeople:

\[ \bar{D} = \frac{\sum_{i=1}^{n} D_i}{n} = -4.2 \]

\[ S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}} = 5.67 \]

Two-Sample Tests
Related Populations Example

Has the training made a difference in the number of complaints (at the $\alpha = 0.01$ level)?
$H_0: \mu_D = 0$
$H_1: \mu_D \ne 0$

Critical Value = $\pm 4.604$
d.f. = n - 1 = 4

Test Statistic:

\[ t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{n}} = \frac{-4.2 - 0}{5.67 / \sqrt{5}} = -1.66 \]

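Two-Sample Tests
Related Populations Example

[Figure: t distribution with rejection regions of area $\alpha/2$ below -4.604 and above 4.604; the test statistic -1.66 falls between the critical values.]

Decision: Do not reject $H_0$ (the t statistic is not in the rejection region)

Conclusion: There is no evidence of a significant change in the number of complaints.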
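The paired test above can be checked with a short Python sketch. Two points are assumptions rather than values read from the slides: the difference for R.K. is taken as 0 (the only value consistent with the reported mean difference), and the critical value 4.604 is copied from the example rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

# Differences (After - Before) for C.B., T.F., M.H., R.K., M.O.
# The 0 for R.K. is inferred so that the mean matches the reported value.
differences = [-2, -14, -1, 0, -4]

n = len(differences)
d_bar = mean(differences)    # mean paired difference
s_d = stdev(differences)     # sample standard deviation (n - 1 divisor)

hypothesized_mean_diff = 0
t_stat = (d_bar - hypothesized_mean_diff) / (s_d / sqrt(n))

T_CRIT = 4.604  # two-tailed critical value for alpha = 0.01 and 4 d.f.
reject_h0 = abs(t_stat) > T_CRIT

print(f"D-bar = {d_bar:.1f}, S_D = {s_d:.2f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```

Working on the list of differences reduces the two related samples to a one-sample t test, which is why the degrees of freedom are n - 1 = 4.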

Two-Sample Tests
Related Populations

The confidence interval for $\mu_D$ ($\sigma_D$ known) is:

\[ \bar{D} \pm Z_{\alpha/2} \frac{\sigma_D}{\sqrt{n}} \]

Where
$n$ = the sample size (number of pairs in the paired sample)

Two-Sample Tests
Related Populations

The confidence interval for $\mu_D$ ($\sigma_D$ unknown) is:

\[ \bar{D} \pm t_{n-1} \frac{S_D}{\sqrt{n}} \]

where

\[ S_D = \sqrt{\frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1}} \]

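Sampling Distribution of Differences in Sample Proportions

For large samples, with
1. $n_1 \hat{p}_1 > 5$,
2. $n_1 \hat{q}_1 > 5$,
3. $n_2 \hat{p}_2 > 5$, and
4. $n_2 \hat{q}_2 > 5$, where $\hat{q} = 1 - \hat{p}$,

the difference in sample proportions is normally distributed with

\[ \mu_{\hat{p}_1 - \hat{p}_2} = P_1 - P_2 \quad \text{and} \quad \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{P_1 Q_1}{n_1} + \frac{P_2 Q_2}{n_2}} \]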

Z Formula for the Difference in Two Population Proportions

\[ Z = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}} \]

$\hat{p}_1$ = proportion from sample 1
$\hat{p}_2$ = proportion from sample 2
$n_1$ = size of sample 1
$n_2$ = size of sample 2
$P_1$ = proportion from population 1
$P_2$ = proportion from population 2
$Q_1 = 1 - P_1$
$Q_2 = 1 - P_2$

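Z Formula to Test the Difference in Population Proportions

\[ Z = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{(\bar{P}\,\bar{Q}) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} \]

\[ \bar{P} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2} \]

\[ \bar{Q} = 1 - \bar{P} \]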

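Two Population Proportions
Hypothesis for Population Proportions

Lower-tail test:
$H_0: \pi_1 \ge \pi_2$, i.e., $H_0: \pi_1 - \pi_2 \ge 0$
$H_1: \pi_1 < \pi_2$, i.e., $H_1: \pi_1 - \pi_2 < 0$

Upper-tail test:
$H_0: \pi_1 \le \pi_2$, i.e., $H_0: \pi_1 - \pi_2 \le 0$
$H_1: \pi_1 > \pi_2$, i.e., $H_1: \pi_1 - \pi_2 > 0$

Two-tail test:
$H_0: \pi_1 = \pi_2$, i.e., $H_0: \pi_1 - \pi_2 = 0$
$H_1: \pi_1 \ne \pi_2$, i.e., $H_1: \pi_1 - \pi_2 \ne 0$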

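Two Population Proportions
Hypothesis for Population Proportions

Lower-tail test:
$H_0: \pi_1 - \pi_2 \ge 0$
$H_1: \pi_1 - \pi_2 < 0$
Rejection region: area $\alpha$ in the lower tail, below $-z_\alpha$.
Reject $H_0$ if $Z < -Z_\alpha$

Upper-tail test:
$H_0: \pi_1 - \pi_2 \le 0$
$H_1: \pi_1 - \pi_2 > 0$
Rejection region: area $\alpha$ in the upper tail, above $z_\alpha$.
Reject $H_0$ if $Z > Z_\alpha$

Two-tail test:
$H_0: \pi_1 - \pi_2 = 0$
$H_1: \pi_1 - \pi_2 \ne 0$
Rejection region: area $\alpha/2$ in each tail, below $-z_{\alpha/2}$ and above $z_{\alpha/2}$.
Reject $H_0$ if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$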

Two Independent Population


Proportions: Example
Is there a significant difference between the
proportion of men and the proportion of
women who will vote Yes on Proposition
A?
In a random sample of 72 men, 36 indicated
they would vote Yes and, in a sample of 50
women, 31 indicated they would vote Yes
Test at the .05 level of significance

Two Independent Population Proportions: Example

$H_0: \pi_1 - \pi_2 = 0$ (the two proportions are equal)
$H_1: \pi_1 - \pi_2 \ne 0$ (there is a significant difference between proportions)

The sample proportions are:
Men: $p_1 = 36/72 = .50$
Women: $p_2 = 31/50 = .62$

The pooled estimate for the overall proportion is:

\[ \bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{36 + 31}{72 + 50} = \frac{67}{122} = .549 \]

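Two Independent Population Proportions: Example

The test statistic for $\pi_1 - \pi_2$ is:

\[ z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\bar{p}(1 - \bar{p}) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}} = \frac{(.50 - .62) - (0)}{\sqrt{.549\,(1 - .549) \left( \dfrac{1}{72} + \dfrac{1}{50} \right)}} = -1.31 \]

Critical Values = $\pm 1.96$ for $\alpha = .05$

[Figure: standard normal curve with rejection regions of area .025 below -1.96 and above 1.96; the test statistic -1.31 falls between the critical values.]

Decision: Do not reject $H_0$

Conclusion: There is no evidence of a significant difference between men and women in the proportion who will vote Yes.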
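The pooled-proportion test above can be sketched in Python as follows. The helper name `two_proportion_z` is hypothetical; the counts come from the Proposition A example, and the null hypothesis is assumed to be a difference of zero.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: pi1 - pi2 = 0 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error


# 36 of 72 men and 31 of 50 women indicated they would vote Yes.
z = two_proportion_z(36, 72, 31, 50)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
reject_h0 = abs(z) > z_crit

print(f"z = {z:.2f}, critical values = +/-{z_crit:.2f}, reject H0: {reject_h0}")
```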

Two Independent Population Proportions

The confidence interval for $\pi_1 - \pi_2$ is:

\[ (p_1 - p_2) \pm Z_{\alpha/2} \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}} \]
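As an illustration, the sketch below evaluates this interval for the Proposition A example (36 of 72 men and 31 of 50 women). The slides do not work this interval out, so the numbers it prints are only what the formula gives for those counts; the helper name `ci_diff_proportions` is hypothetical. Note that the interval uses the unpooled standard error, unlike the test statistic.

```python
from math import sqrt
from statistics import NormalDist


def ci_diff_proportions(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for pi1 - pi2 with an unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * std_error, diff + z * std_error


low, high = ci_diff_proportions(36, 72, 31, 50)
print(f"95% CI for pi1 - pi2: ({low:.3f}, {high:.3f})")
```

The interval contains 0, which agrees with the decision not to reject H0 in the example.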

F Test for Two Population Variances

\[ F = \frac{S_1^2}{S_2^2} \]

df numerator $= \nu_1 = n_1 - 1$
df denominator $= \nu_2 = n_2 - 1$

F Distribution with $\nu_1 = 10$ and $\nu_2 = 8$

[Figure: density curve of the F distribution with $\nu_1 = 10$ and $\nu_2 = 8$; horizontal axis from 0.00 to 6.00, vertical axis from 0 to 0.8.]

A Portion of the F Distribution Table for $\alpha = 0.025$

$F_{.025,9,11} = 3.59$ (numerator df = 9, denominator df = 11)

Denominator                    Numerator Degrees of Freedom
Degrees of
Freedom          1       2       3       4       5       6       7       8       9
 1          647.79  799.48  864.15  899.60  921.83  937.11  948.20  956.64  963.28
 2           38.51   39.00   39.17   39.25   39.30   39.33   39.36   39.37   39.39
 3           17.44   16.04   15.44   15.10   14.88   14.73   14.62   14.54   14.47
 4           12.22   10.65    9.98    9.60    9.36    9.20    9.07    8.98    8.90
 5           10.01    8.43    7.76    7.39    7.15    6.98    6.85    6.76    6.68
 6            8.81    7.26    6.60    6.23    5.99    5.82    5.70    5.60    5.52
 7            8.07    6.54    5.89    5.52    5.29    5.12    4.99    4.90    4.82
 8            7.57    6.06    5.42    5.05    4.82    4.65    4.53    4.43    4.36
 9            7.21    5.71    5.08    4.72    4.48    4.32    4.20    4.10    4.03
10            6.94    5.46    4.83    4.47    4.24    4.07    3.95    3.85    3.78
11            6.72    5.26    4.63    4.28    4.04    3.88    3.76    3.66    3.59
12            6.55    5.10    4.47    4.12    3.89    3.73    3.61    3.51    3.44

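Testing Population Variances

Purpose: To determine if two independent populations have the same variability.

Two-tail test:
$H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \ne \sigma_2^2$

Lower-tail test:
$H_0: \sigma_1^2 \ge \sigma_2^2$
$H_1: \sigma_1^2 < \sigma_2^2$

Upper-tail test:
$H_0: \sigma_1^2 \le \sigma_2^2$
$H_1: \sigma_1^2 > \sigma_2^2$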

Suppose a machine produces metal sheets that are specified to be 22 millimeters thick.
Because of the machine, the operator, the raw material, the
manufacturing environment, and other factors, there is
variability in the thickness.
Two machines produce these sheets. Operators are
concerned about the consistency of the two machines. To
test consistency, they randomly sample 10 sheets produced
by machine 1 and 12 sheets produced by machine 2.
The thickness measurements of sheets from each machine
are given in the table on the following page. Assume sheet
thickness is normally distributed in the population.
How can we test to determine whether the variance from
each sample comes from the same population variance
(population variances are equal) or from different
population variances (population variances are not equal)?

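Sheet Metal Example: Hypothesis Test for Equality of Two Population Variances

$H_o: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \ne \sigma_2^2$

$\alpha = 0.05$
$n_1 = 10$
$n_2 = 12$

\[ F = \frac{S_1^2}{S_2^2} \]

df numerator $= \nu_1 = n_1 - 1 = 9$
df denominator $= \nu_2 = n_2 - 1 = 11$

$F_{.025,9,11} = 3.59$
$F_{.975,11,9} = \dfrac{1}{F_{.025,9,11}} = \dfrac{1}{3.59} = 0.28$

If $F < 0.28$ or $F > 3.59$, reject Ho.
If $0.28 \le F \le 3.59$, do not reject Ho.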

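Sheet Metal Manufacturer: Rejection Regions

If $F < 0.28$ or $F > 3.59$, reject Ho.
If $0.28 \le F \le 3.59$, do not reject Ho.

[Figure: F distribution with critical values $F_{.975,11,9} = 0.28$ and $F_{.025,9,11} = 3.59$; the non-rejection region lies between them, with a rejection region in each tail.]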

Sheet Metal Example

Machine 1
22.3   21.8   22.2
21.8   21.9   21.6
22.3   22.4
21.6   22.5

$n_1 = 10$
$S_1^2 = 0.1138$

Machine 2
22.0   22.2   22.0
22.1   22.0   22.1
21.8   21.7   21.9
21.9   21.9   22.1

$n_2 = 12$
$S_2^2 = 0.0202$

\[ F = \frac{S_1^2}{S_2^2} = \frac{0.1138}{0.0202} = 5.63 \]

Since $F = 5.63 > F_c = 3.59$, reject Ho.
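The F test for the sheet metal example can be sketched in Python as follows. It assumes the two sample variances reported on the slide (0.1138 and 0.0202) and takes the critical values from the slides (3.59 from the F table and its reciprocal for the lower tail) instead of computing them; the variable names are hypothetical.

```python
# Sample variances and sample sizes reported for the two machines.
s1_sq, n1 = 0.1138, 10
s2_sq, n2 = 0.0202, 12

f_stat = s1_sq / s2_sq
df_numerator = n1 - 1
df_denominator = n2 - 1

# Critical values as used on the slides (alpha = 0.05, two-tailed).
F_UPPER = 3.59            # F_{.025,9,11} from the table
F_LOWER = 1 / F_UPPER     # about 0.28

reject_h0 = f_stat < F_LOWER or f_stat > F_UPPER

print(f"F = {f_stat:.2f} with ({df_numerator}, {df_denominator}) df, reject Ho: {reject_h0}")
```

Computing the variances directly from the raw thickness measurements gives a slightly different ratio, because the slide rounds the variances before dividing.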
