
1

2
Determine the confidence interval for the mean when σ is known or n > 30.
Determine the confidence interval for the mean when σ is unknown and n < 30.
Determine the minimum sample size for finding a confidence interval for the mean.

3
Population
o All items of interest
o Group of interest to
investigator

4
Population
o All items of interest
o Group of interest to investigator
Sample
o Portion of population
o Will be used to reach conclusions about population
5
1. Easier than studying the whole population
2. Costs less
3. Takes less time
4. Sometimes testing involves risk
5. Sometimes testing requires the destruction
of the item being studied

6
Population
The complete collection of measurements, outcomes, objects or individuals under study.
Parameter
A number that describes a population characteristic.
Sample
A subset of a population, containing the objects or outcomes that are actually observed.
Statistic
A number that describes a sample characteristic. The value of a statistic is known only after we have taken the sample. We use the statistic to estimate the parameter.
7
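Sample statistics (observation):
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Population parameters (truth, not observable):
$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
We use the sample statistics to make guesses about the whole population.
*Hat notation ^ is often used to indicate an estimate.
8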
The objective of estimation is to determine
the value of a population parameter on
the basis of a sample statistic.
There are two types of estimators:
Point Estimator
Interval estimator
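As a small sketch of point estimation, Python's standard-library `statistics` module computes the sample statistics directly. `variance` and `stdev` use the n - 1 divisor of the sample formulas, while `pvariance` uses N and applies only when the data are the entire population. The data values below are hypothetical, chosen only for illustration.

```python
import statistics

# Hypothetical sample, not taken from the slides
data = [12.0, 15.0, 11.0, 14.0, 13.0]

x_bar = statistics.mean(data)      # point estimate of the population mean mu
s2 = statistics.variance(data)     # sample variance, divisor n - 1
s = statistics.stdev(data)         # sample standard deviation
pop_var = statistics.pvariance(data)  # divisor N: only valid if data is the whole population

print(x_bar, s2, s, pop_var)
```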

9
A point estimate is a specific numerical value used to estimate a parameter. The best point estimate of the population mean μ is the sample mean $\bar{X}$.
An interval estimate (confidence interval) is an interval or a range of values used to estimate the parameter. This estimate may or may not contain the value of the parameter being estimated.
It is stated in terms of probability; we are never 100% sure.

10
Elements of Confidence Interval Estimation
A probability that the population parameter falls somewhere within the interval.
[Figure: a confidence interval around the sample statistic, bounded by the lower and upper confidence limits.]

11
[Figure: a random sample is drawn from a population whose mean μ is unknown; the sample mean is $\bar{X}$ = 50.]
"I am 95% confident that μ is between 40 & 60."

12
Of all the unbiased estimators, we prefer the estimator
whose sampling distribution has the smallest spread or
variability.
The distance between an estimate and the true value of the parameter is the error of estimation, like the distance between the bullet and the bull's-eye.

13
A confidence interval is a specific interval estimate of a parameter, determined by using data obtained from a sample and the specific confidence level of the estimate.
The confidence level is the probability that the interval estimate will contain the parameter.
For example, we may want to be 95% certain that our confidence interval contains the population mean μ.

14
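Level of confidence: the probability that the unknown population parameter is in the confidence interval, interpreted over repeated trials (e.g. in 95 of 100 trials for 95% confidence).
Denoted $(1-\alpha) \times 100\%$ = level of confidence, e.g. 90%, 95%, 99%.
α (level of significance) is the probability that the parameter is not within the interval over repeated trials (e.g. 100 trials), NOT IN THIS TRIAL ALONE!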
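The repeated-trials interpretation can be checked by simulation. This sketch assumes a hypothetical normal population (μ = 50, σ = 10, samples of n = 25) with σ known, builds a 95% z-interval from each of 2000 samples, and counts how often the interval contains μ; the fraction should land near 0.95, but any single interval either contains μ or it does not.

```python
import random
from statistics import NormalDist, mean

# Hypothetical "true" population used only for the simulation
MU, SIGMA, N, TRIALS = 50.0, 10.0, 25, 2000
z = NormalDist().inv_cdf(0.975)   # z_{alpha/2} for 95% confidence

rng = random.Random(1)            # fixed seed so the run is repeatable
hits = 0
for _ in range(TRIALS):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = mean(sample)
    margin = z * SIGMA / N ** 0.5  # sigma treated as known
    if x_bar - margin <= MU <= x_bar + margin:
        hits += 1

coverage = hits / TRIALS
print(f"{coverage:.3f} of the intervals contain mu")
```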

15
General form: point estimate ± (measure of how confident we want to be) × (standard error)
Point estimate: the value of the statistic in my sample (e.g., mean, odds ratio, etc.).
Measure of how confident we want to be: a critical value from a Z table or a t table, depending on the sampling distribution of the statistic.
Standard error: the standard error of the statistic.
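A minimal sketch of the general form in Python. The helper name `confidence_interval` is hypothetical; the z critical values come from `statistics.NormalDist().inv_cdf`, which plays the role of the Z table here.

```python
from statistics import NormalDist

def confidence_interval(point_estimate, critical_value, standard_error):
    """point estimate +/- (critical value) x (standard error)."""
    margin = critical_value * standard_error
    return point_estimate - margin, point_estimate + margin

# z_{alpha/2} for common confidence levels, from the standard normal distribution
for level in (0.90, 0.95, 0.99):
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)
    print(f"{level:.0%}: z = {z:.3f}")
```

This prints z = 1.645, 1.960 and 2.576, the familiar table values.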

16
Factors affecting the interval width:
Data variation, measured by σ
Sample size, through $\sigma_{\bar{X}} = \sigma/\sqrt{n}$
Level of confidence, $100(1-\alpha)\%$
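The three factors can be seen in the width of a z-interval, 2 z_{α/2} σ/√n. This sketch uses a hypothetical σ = 10 and a hypothetical helper `interval_width`: quadrupling n halves the width, and raising the confidence level widens it.

```python
from statistics import NormalDist

def interval_width(sigma, n, level):
    """Width 2 * z_{alpha/2} * sigma / sqrt(n) of a z-interval with sigma known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z * sigma / n ** 0.5

print(interval_width(10, 25, 0.95))   # baseline
print(interval_width(10, 100, 0.95))  # 4x the data: half the width
print(interval_width(10, 25, 0.99))   # more confidence: wider
print(interval_width(20, 25, 0.95))   # more variation: wider
```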

17
18
Confidence interval for population
means
Four cases need to be considered:
i. A large sample taken from a normally distributed population with known population variance.
ii. A small sample taken from a normally distributed population with known population variance.
iii. A large sample taken from a population with unknown population variance.
iv. A small sample taken from a normally distributed population with unknown population variance.
19
Is σ known?
Yes: use $z_{\alpha/2}$ values no matter what the sample size is.*
No: is n > 30?
   Yes: use $z_{\alpha/2}$ values and s in place of σ.
   No: use $t_{\alpha/2}$ values and s in the formula.**
*Variable must be normally distributed when n < 30.
**Variable must be approximately normally distributed.
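The decision flowchart can be written as a small function. The name `choose_interval_method` and its string results are hypothetical; the sketch follows the flowchart literally, so n = 30 exactly falls on the t branch even though other slides use "n ≥ 30" for the large-sample case.

```python
def choose_interval_method(sigma_known, n):
    """Return the critical value and spread measure the flowchart prescribes."""
    if sigma_known:
        return "z with sigma"        # any n; population must be normal if n < 30
    if n > 30:
        return "z with s"            # s replaces sigma for large samples
    return "t with s, df = n - 1"    # population approximately normal

print(choose_interval_method(True, 12))
print(choose_interval_method(False, 50))
print(choose_interval_method(False, 15))
```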
20
Confidence Intervals (σ Known)
Assumptions
Population standard deviation σ is known
Population is normally distributed
Confidence interval estimate:
$\bar{X} \pm Z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$

21
σ unknown and n > 30 (s replaces σ):
$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$

22
A random sample of 50 males showed a mean daily intake of dairy products equal to 756 grams with a standard deviation of 35 grams. Calculate a 95% confidence interval for the population average μ.

$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}} = 756 \pm 1.96\frac{35}{\sqrt{50}} = 756 \pm 9.70$
giving (746.30, 765.70), or 746.30 ≤ μ ≤ 765.70 grams.
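The arithmetic of this example can be reproduced in a few lines; `NormalDist().inv_cdf(0.975)` gives z ≈ 1.96, and s replaces σ because n = 50 is large.

```python
from math import sqrt
from statistics import NormalDist

n, x_bar, s = 50, 756, 35          # values from the example (grams)
z = NormalDist().inv_cdf(0.975)    # z_{0.025}, about 1.96
margin = z * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(f"({lower:.2f}, {upper:.2f})")  # (746.30, 765.70)
```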
23
Confidence Intervals (σ Unknown, n < 30)
Assumptions
Population is normally distributed
Use Student's t distribution
Confidence interval estimate:
$\bar{X} \pm t_{\alpha/2,\,n-1}\left(\frac{S}{\sqrt{n}}\right)$

24
Student's t Distribution
[Figure: the standard normal (Z) curve compared with t (df = 13) and t (df = 5), all centered at 0. The t curves are bell-shaped and symmetric but have fatter tails.]
25
Degrees of Freedom (df)
Number of observations that are free to vary after the sample mean has been calculated.
Example: the mean of 3 numbers is 2.
X1 = 1 (or any number)
X2 = 2 (or any number)
X3 = 3 (cannot vary)
Mean = 2
degrees of freedom = n - 1 = 3 - 1 = 2

26
An unconfined compression test performed on 15
concrete cylinders produced the following strength
results (in psi):
2670 2580 2400 2490 2640 2590 2440 2170

2410 2590 2730 2690 2730 2480 2360


Determine a 95% confidence interval for the true
average strength of the concrete.
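A sketch of the t-interval for this data set. Python's standard library has no t quantile function, so the critical value t_{0.025,14} = 2.145 is typed in from a standard t table (df = n - 1 = 14); with SciPy available, `scipy.stats.t.ppf(0.975, 14)` would give the same value.

```python
from math import sqrt
from statistics import mean, stdev

strength = [2670, 2580, 2400, 2490, 2640, 2590, 2440, 2170,
            2410, 2590, 2730, 2690, 2730, 2480, 2360]  # psi
n = len(strength)
x_bar, s = mean(strength), stdev(strength)   # stdev uses divisor n - 1
t_crit = 2.145   # t_{0.025, 14} from a t table
margin = t_crit * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin
print(f"x_bar = {x_bar:.1f}, s = {s:.1f}, CI = ({lower:.1f}, {upper:.1f})")
```

This gives roughly 2443.9 to 2618.8 psi.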

27
28
Sometimes we are interested in comparing the means of two populations, for example:
The average growth of plants fed using two different nutrients.
The average scores for students taught with two different teaching methods.
To make this comparison, we take:
A random sample of size $n_1$ drawn from population 1 with mean $\mu_1$ and variance $\sigma_1^2$.
A random sample of size $n_2$ drawn from population 2 with mean $\mu_2$ and variance $\sigma_2^2$.
29
We compare the two averages by making inferences about $\mu_1 - \mu_2$, the difference in the two population averages.
If the two population averages are the same, then $\mu_1 - \mu_2 = 0$.
The best estimate of $\mu_1 - \mu_2$ is the difference in the two sample means, $\bar{x}_1 - \bar{x}_2$.

30
$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
This relationship is exact if the two populations are normally distributed.

31
Aluminium spars from two different suppliers are used in manufacturing the wing of a commercial aircraft. You have been asked to determine if the latest shipments from each supplier are equally strong. From past experience, the standard deviations of the tensile strengths are known to be 1.5 kg/mm² for Supplier 1 and 1.0 kg/mm² for Supplier 2 (who has tighter quality control). A sample of 12 spars from Supplier 1 has a mean tensile strength of 87.6 kg/mm² and a sample of 10 spars from Supplier 2 has a mean tensile strength of 72.5 kg/mm². If μ1 and μ2 denote the true mean tensile strengths for the two shipments of spars, find a 90% confidence interval on the difference in mean strength.
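A sketch of the known-σ two-sample interval applied to the spar data. For 90% confidence, α/2 = 0.05, so the critical value is `NormalDist().inv_cdf(0.95)`, about 1.645.

```python
from math import sqrt
from statistics import NormalDist

# Supplier 1 and Supplier 2, sigma known from past experience (kg/mm^2)
n1, x1, sigma1 = 12, 87.6, 1.5
n2, x2, sigma2 = 10, 72.5, 1.0
z = NormalDist().inv_cdf(0.95)       # z_{0.05} for 90% confidence
se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
diff = x1 - x2
lower, upper = diff - z * se, diff + z * se
print(f"({lower:.2f}, {upper:.2f})")
```

The interval is roughly 14.22 to 15.98 kg/mm².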

32
Confidence interval for μ1 - μ2:
$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

33
Average Daily Intakes Men Women
Sample size 50 50
Sample mean 756 762
Sample Std Dev 35 30

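Compare the average daily intake of dairy products of men and women using a 95% confidence interval.

$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = (756 - 762) \pm 1.96\sqrt{\frac{35^2}{50} + \frac{30^2}{50}} = -6 \pm 12.78$
giving the interval (-18.78, 6.78).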
34
When the population variances are unequal:
$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

35
where the degrees of freedom are
$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$

36
The drying time of pavement marking paint is of concern to
transportation engineers. Of two such paints from a particular
manufacturer, it is suspected that yellow paint dries faster
than white paint. Sample measurements of the drying times of
both paints (in minutes) are given below.
White: 120, 132, 123, 122, 140, 110, 120, 107

Yellow: 126, 124, 116, 125, 109, 130, 125, 117, 129, 120
Determine a 95% confidence interval on the difference in mean drying times, assuming that the drying times are normally distributed and the population standard deviations of the drying times are not equal.
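A sketch of the unequal-variance interval for the paint data, taking population 1 = white and population 2 = yellow. The degrees of freedom come from the ν formula given earlier (about 11.04 here); the sketch rounds ν down to 11 and types in t_{0.025,11} = 2.201 from a t table, since the standard library has no t quantile function.

```python
from math import sqrt
from statistics import mean, variance

white = [120, 132, 123, 122, 140, 110, 120, 107]
yellow = [126, 124, 116, 125, 109, 130, 125, 117, 129, 120]

n1, n2 = len(white), len(yellow)
a, b = variance(white) / n1, variance(yellow) / n2   # s1^2/n1 and s2^2/n2
se = sqrt(a + b)
# Welch-Satterthwaite degrees of freedom
nu = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))

t_crit = 2.201   # t_{0.025, 11} from a t table, nu rounded down to 11
diff = mean(white) - mean(yellow)
lower, upper = diff - t_crit * se, diff + t_crit * se
print(f"nu = {nu:.2f}, CI = ({lower:.2f}, {upper:.2f})")
```

The interval, roughly -9.84 to 9.14 minutes, contains 0.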
37
If random samples of size n1 and n2 are drawn from two normal populations with equal but unknown variances,
$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2}\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
d.f. = n1 + n2 - 2

38
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

39
The drying time of pavement marking paint is of concern to
transportation engineers. Of two such paints from a particular
manufacturer, it is suspected that yellow paint dries faster than
white paint. Sample measurements of the drying times of both
paints (in minutes) are given below.
White: 120, 132, 123, 122, 140, 110, 120, 107

Yellow: 126, 124, 116, 125, 109, 130, 125, 117, 129, 120
Determine a 95% confidence interval on the difference in mean drying times, assuming that the drying times are normally distributed and the population standard deviations of the drying times are equal.
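The same data under the equal-variance assumption, as a sketch: the pooled variance s_p² weights each sample variance by its degrees of freedom, and df = 8 + 10 - 2 = 16. The critical value t_{0.025,16} = 2.120 is taken from a t table.

```python
from math import sqrt
from statistics import mean, variance

white = [120, 132, 123, 122, 140, 110, 120, 107]
yellow = [126, 124, 116, 125, 109, 130, 125, 117, 129, 120]

n1, n2 = len(white), len(yellow)
sp2 = ((n1 - 1) * variance(white) + (n2 - 1) * variance(yellow)) / (n1 + n2 - 2)
se = sqrt(sp2 * (1 / n1 + 1 / n2))
t_crit = 2.120   # t_{0.025, 16} from a t table
diff = mean(white) - mean(yellow)
lower, upper = diff - t_crit * se, diff + t_crit * se
print(f"sp^2 = {sp2:.2f}, CI = ({lower:.2f}, {upper:.2f})")
```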

40
Dependent samples are samples that are paired or matched in some way.
Samples in which the same subjects are used in a pre-post situation are dependent.

41
$\bar{d} \pm t_{\alpha/2,\,n-1}\frac{s_d}{\sqrt{n}}$
This can only be used if both populations are normally distributed.

42
The manager of a fleet of automobiles is testing two brands of
radial tires. He assigns one tire of each brand at random to the
two front wheels of eight different cars and runs the cars until
the tires wear out. The tire lives (in miles) are shown below.
Assuming that the tire lives for both brands are normally
distributed, determine a 99% confidence interval on the
difference in mean life.

43
Car Brand 1 Brand 2 D = Brand 1 - Brand 2
1 36925 34318
2 45300 42280
3 36240 35500
4 32100 31950
5 37210 38015
6 48360 47800
7 38200 37810
8 33500 33215
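A sketch of the paired interval for the tire data: form D = Brand 1 - Brand 2 for each car, then treat the differences as a single sample. For 99% confidence with df = 8 - 1 = 7, the critical value t_{0.005,7} = 3.499 is taken from a t table.

```python
from math import sqrt
from statistics import mean, stdev

brand1 = [36925, 45300, 36240, 32100, 37210, 48360, 38200, 33500]
brand2 = [34318, 42280, 35500, 31950, 38015, 47800, 37810, 33215]
d = [b1 - b2 for b1, b2 in zip(brand1, brand2)]   # D = Brand 1 - Brand 2
n = len(d)
d_bar, s_d = mean(d), stdev(d)
t_crit = 3.499   # t_{0.005, 7} from a t table
margin = t_crit * s_d / sqrt(n)
lower, upper = d_bar - margin, d_bar + margin
print(d)
print(f"d_bar = {d_bar:.3f}, CI = ({lower:.1f}, {upper:.1f})")
```

The interval, roughly -727.5 to 2464.3 miles, contains 0.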

44
45

HYPOTHESIS TESTING
Understand the definitions used in hypothesis
testing.
State the null and alternative hypotheses.
Determine critical values for the z and t tests.
State the five steps used in hypothesis testing.
Test means for large samples using the z test.
Test means for small samples using the t test.

46
A hypothesis is a claim (assumption) about the population parameter.
The parameter may be a population mean, proportion, correlation coefficient, ...
The parameter must be identified before analysis.
Example claim: "I believe that the mean weight of cereal packages is 300 grams!"

47
Prevalent opinion is that the mean age in the population group is 50 (null hypothesis).
A random sample gives a mean age of 45.
"Sample mean is only 45! Reject null hypothesis!"

48
Is σ known?
Yes: use $z_{\alpha/2}$ values no matter what the sample size is.*
No: is n > 30?
   Yes: use $z_{\alpha/2}$ values and s in place of σ.
   No: use $t_{\alpha/2}$ values and s in the formula.**
*Variable must be normally distributed when n < 30.
**Variable must be approximately normally distributed.
49
Not Guilty until proved otherwise!
Null hypothesis remains valid
until proved otherwise!

Sometimes an innocent person is proved guilty.
The same may happen in hypothesis testing: we may reject the null hypothesis although it is true. (There is always a risk of being wrong when we reject the null hypothesis; the risk is due to sampling error.)

50
                    H0 True              H0 False
Reject H0           Type I error         Correct decision
Do not reject H0    Correct decision     Type II error

51
The level of significance is the maximum probability of committing a type I error. This probability is symbolized by α; that is,
$\alpha = P(\text{type I error})$
The probability of a type II error is symbolized by β; that is,
$\beta = P(\text{type II error})$

52
Step 1: State the hypothesis, and identify the
claim.
Step 2: Compute the test value.
Step 3: Find the critical value from the appropriate
table.
Step 4: Make the decision to reject or not reject
the null hypothesis.
Step 5: Summarize the results.

53
Statistical hypothesis:
A statement about the parameters of one or more
populations.

There are two types of statistical hypotheses for each situation:
the null hypothesis
the alternative hypothesis.

54
H0: Mean height of males equals 174.
H1: Mean height of males is not equal to 174.

H0 : Half of the population is in favour of nuclear power plant.


H1 : More than half of the population is in favour of nuclear
power plant.

H0 : The amount of overtime work is equal for males and females.


H1 : The amount of overtime work is not equal for males and
females.

H0 : There is no correlation between interest rate and gold price.


H1 : There is correlation between interest rate and gold price.

55
Null hypothesis H0: contains a statement of equality, such as ≤, = or ≥.
Alternative hypothesis H1 (Ha): contains a statement of inequality, such as <, ≠ or >.
H0 and H1 are complementary statements: "If I am false, you are true."

56

>   Is greater than; Is increased
<   Is less than; Is decreased or reduced from
≥   Is greater than or equal to; Is at least
≤   Is less than or equal to; Is at most
=   Is equal to; Has not changed from
≠   Is not equal to; Has changed from

57
After stating the hypotheses, the researcher's next step is to design the study. The researcher selects the correct statistical test, chooses an appropriate level of significance, and formulates a plan for conducting the study.

58
A statistical test uses the data obtained from
a sample to make a decision about whether
or not the null hypothesis should be rejected.

Test value: the numerical value obtained from a statistical test.

59
After a significance level is chosen, a
critical value is selected from a table for
the appropriate test.
The critical value(s) separate the critical region from the noncritical region.

60
The critical or rejection region is the range of values of the test value that indicates that there is a significant difference and that the null hypothesis should be rejected.
The noncritical or nonrejection region is the range of values of the test value that indicates that the difference was probably due to chance and that the null hypothesis should not be rejected.

61
If the alternative hypothesis uses a not-equal sign (≠), this indicates a two-tailed (nondirectional) test: the region of rejection is located in both tails.
If the alternative hypothesis uses a less-than or greater-than sign (< or >), this is a directional test: the region of rejection is located in one tail of the sampling distribution.
A one-tailed test is either right-tailed or left-tailed, depending on the direction of the inequality of the alternative hypothesis.

62
Right-tailed test: a test in which the sample statistic is hypothesized to be in the right tail of the sampling distribution.
Left-tailed test: a test in which the sample statistic is hypothesized to be in the left tail of the sampling distribution.

63
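Left-tailed test
H0: μ = k, H1: μ < k
α = 0.10, C.V. = -1.28
α = 0.05, C.V. = -1.65
α = 0.01, C.V. = -2.33
[Figure: standard normal curve with the critical region to the left of -z and the noncritical region to its right.]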

64
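Right-tailed test
H0: μ = k, H1: μ > k
α = 0.10, C.V. = +1.28
α = 0.05, C.V. = +1.65
α = 0.01, C.V. = +2.33
[Figure: standard normal curve with the noncritical region to the left of +z and the critical region to its right.]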

65
Two-tailed test
H0: μ = k, H1: μ ≠ k
α = 0.10, C.V. = ±1.65
α = 0.05, C.V. = ±1.96
α = 0.01, C.V. = ±2.58
[Figure: standard normal curve with critical regions beyond -z and +z and the noncritical region between them.]

66
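H0: μ ≥ 3, H1: μ < 3: rejection region (area α) in the left tail.
H0: μ ≤ 3, H1: μ > 3: rejection region (area α) in the right tail.
H0: μ = 3, H1: μ ≠ 3: rejection regions (area α/2 each) in both tails.
[Figure: sampling distributions centered at 0 showing the critical value(s) and rejection regions for each case.]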

67
The null hypothesis is rejected when the test statistic falls in the specified critical region of the sampling distribution.
Rejection of the null hypothesis leads one to believe that the alternative hypothesis is true.

68
P - Value
The P-value is the smallest level of significance at which H0 would be rejected when a specified test procedure is used on a given data set.
Equivalently, the p-value is the smallest type I error rate we would accept if we rejected H0 at the observed value of the test statistic.

Decision rule:
If p-value ≥ α, do not reject H0.
If p-value < α, reject H0.
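A minimal sketch of the p-value decision rule for a z statistic, using `statistics.NormalDist().cdf`. The helper names `z_p_value` and `decide` are hypothetical, and z = 2.10 in the usage line is an arbitrary illustrative value.

```python
from statistics import NormalDist

def z_p_value(z, tail):
    """p-value of a z test statistic; tail is 'left', 'right' or 'two'."""
    phi = NormalDist().cdf
    if tail == "left":
        return phi(z)
    if tail == "right":
        return 1 - phi(z)
    if tail == "two":
        return 2 * (1 - phi(abs(z)))
    raise ValueError("tail must be 'left', 'right' or 'two'")

def decide(p_value, alpha):
    """Reject H0 only when p-value < alpha."""
    return "Reject H0" if p_value < alpha else "Do not reject H0"

print(decide(z_p_value(2.10, "two"), 0.05))   # p is about 0.036
```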

69
70
Hypothesis Testing
for population means
Four cases need to be considered:
i. A large sample taken from a normally distributed population with known population variance.
ii. A small sample taken from a normally distributed population with known population variance.
iii. A large sample taken from a population with unknown population variance.
iv. A small sample taken from a normally distributed population with unknown population variance.
71
Is σ known?
Yes: use $z_{\alpha/2}$ values no matter what the sample size is.*
No: is n > 30?
   Yes: use $z_{\alpha/2}$ values and s in place of σ.
   No: use $t_{\alpha/2}$ values and s in the formula.**
*Variable must be normally distributed when n < 30.
**Variable must be approximately normally distributed.
72
Assumptions
Population is normally distributed
Z test statistic:
$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$

73
A manufacturer of light bulbs claims that its light
bulbs have a mean life of 1520 hours with a
standard deviation of 85 hours. A random
sample of 40 such bulbs is selected for testing. If
the sample produces a mean value of 1505
hours, is there sufficient evidence to claim that
the mean life is significantly less than the
manufacturer claimed?
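A sketch of the left-tailed z test for the light-bulb claim (H0: μ = 1520, H1: μ < 1520, σ = 85 known). The problem does not state α, so α = 0.05 is an assumption made here for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar = 1520, 85, 40, 1505
z = (x_bar - mu0) / (sigma / sqrt(n))   # left-tailed: H1 is mu < 1520
p_value = NormalDist().cdf(z)
alpha = 0.05                            # assumed; not given in the problem
z_crit = NormalDist().inv_cdf(alpha)    # about -1.645
print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0: {z < z_crit}")
```

With z ≈ -1.12 and p ≈ 0.13, the sketch does not reject H0 at α = 0.05.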

74
By the central limit theorem, when the population standard deviation is unknown, the sample standard deviation s can be used in the formula as long as the sample size is 30 or more:
$z = \frac{\bar{X} - \mu}{s/\sqrt{n}}$

75
A certain type of brick is being considered for use
in a particular construction project. The brick will
be used unless sample evidence strongly suggests
that the true average compressive strength is more
than 3200 psi. A random sample of 36 bricks is
selected and each is tested to failure. The sample
average compressive strength is 3109 psi with a
standard deviation of 156 psi. At a level of
significance of α = 0.05, should the brick be used?
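A sketch of the large-sample z statistic for the brick data, with s = 156 in place of σ because n = 36 ≥ 30. As worded, the problem gives H1: μ > 3200 (right-tailed); because the sample mean lies below 3200, the left-tailed p-value is printed alongside to show how strongly the conclusion depends on the direction chosen for H1.

```python
from math import sqrt
from statistics import NormalDist

mu0, n, x_bar, s, alpha = 3200, 36, 3109, 156, 0.05
z = (x_bar - mu0) / (s / sqrt(n))     # large sample: s replaces sigma
p_right = 1 - NormalDist().cdf(z)     # H1: mu > 3200, as the problem is worded
p_left = NormalDist().cdf(z)          # H1: mu < 3200, for comparison only
print(f"z = {z:.2f}, right-tailed p = {p_right:.4f}, left-tailed p = {p_left:.4f}")
```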

76
If the population being sampled is known to be normally or approximately normally distributed but the sample size is small (typically n < 30),
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
The degrees of freedom are d.f. = n - 1.

77
In order to test gasoline mileage performance for a new
version of one of its compact cars, an automobile
manufacturer selected six nonprofessional drivers to drive
test cars from Phoenix to Los Angeles. At the conclusion of
the trip, the resulting gas mileage numbers for the six cars
were:
32.2 29.3 31.5 28.7 30.2 30.0
The manufacturer wishes to advertise that this car gets 30
mpg or better on the highway. Do the sample data support
the claim that the manufacturer would like to make?
Assume α = 0.05.
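A sketch of the one-sample t test for the mileage data. It treats the advertised claim as the alternative (H0: μ = 30, H1: μ > 30), so that rejecting H0 would support the claim; that framing is an assumption. The right-tailed critical value t_{0.05,5} = 2.015 comes from a t table.

```python
from math import sqrt
from statistics import mean, stdev

mpg = [32.2, 29.3, 31.5, 28.7, 30.2, 30.0]
mu0, alpha = 30, 0.05
n = len(mpg)
t = (mean(mpg) - mu0) / (stdev(mpg) / sqrt(n))   # df = n - 1 = 5
t_crit = 2.015   # t_{0.05, 5} from a t table (right-tailed)
print(f"t = {t:.3f}, reject H0: {t > t_crit}")
```

Here t ≈ 0.59, well below 2.015, so the data do not provide significant support for the claim.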

78
Jars of honey are filled by a machine. It has been found that the quantity of honey in a jar has mean 460.3 g, with standard deviation 3.2 g. It is believed that the machine controls have been altered in such a way that, although the standard deviation is unaltered, the mean quantity may have changed. A random sample of 60 jars is taken and the mean quantity of honey per jar is found to be 461.2 g. State suitable null and alternative hypotheses, and carry out a test using a 5% level of significance.
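A sketch of the two-tailed z test for the honey problem, with H0: μ = 460.3 and H1: μ ≠ 460.3 ("may have changed" gives no direction) and σ = 3.2 known.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar, alpha = 460.3, 3.2, 60, 461.2, 0.05
z = (x_bar - mu0) / (sigma / sqrt(n))          # two-tailed test
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0: {abs(z) > z_crit}")
```

With z ≈ 2.18 > 1.96 (p ≈ 0.029), the sketch rejects H0 at the 5% level.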

79
A sample of eight containers is selected at random from a
large batch. The containers have powder contents with
masses x g,
1998.5, 2000.4, 1999.9, 2005.8,
2011.5, 2007.6, 2001.3, 2002.4
Assuming a normal distribution for the masses of the
contents, show that there is significant evidence, at the 5%
level of significance, that the mean mass of the contents of
the containers in this batch is greater than 2000 g.
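A sketch of the right-tailed t test for the container masses (H0: μ = 2000, H1: μ > 2000, n = 8, df = 7). The critical value t_{0.05,7} = 1.895 is taken from a t table.

```python
from math import sqrt
from statistics import mean, stdev

masses = [1998.5, 2000.4, 1999.9, 2005.8, 2011.5, 2007.6, 2001.3, 2002.4]
mu0 = 2000
n = len(masses)
t = (mean(masses) - mu0) / (stdev(masses) / sqrt(n))
t_crit = 1.895   # t_{0.05, 7} from a t table (right-tailed)
print(f"x_bar = {mean(masses):.3f}, t = {t:.3f}, reject H0: {t > t_crit}")
```

Here t ≈ 2.17 > 1.895, consistent with significant evidence that the mean exceeds 2000 g.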

80
TRY !

Does an average box of cereal contain more than 368 grams of cereal? A random sample of 36 boxes showed a mean weight of 372.5 grams and a standard deviation of 15 grams. Test at the 1% level of significance.

81
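Hypothesis testing for μ
σ is known:
$z_{test} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
σ is unknown, n ≥ 30:
$z_{test} = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
σ is unknown, n < 30:
$t_{test} = \frac{\bar{X} - \mu}{s/\sqrt{n}}$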

82
TWO SAMPLE HYPOTHESIS
TESTS ON MEAN

83
Hypothesis / Remarks

H0: μ1 - μ2 = 0 or H0: μ1 = μ2
H1: μ1 - μ2 ≠ 0 or H1: μ1 ≠ μ2
Tests an alternative hypothesis that the means of the two populations are different.

H0: μ1 - μ2 ≥ 0 or H0: μ1 ≥ μ2
H1: μ1 - μ2 < 0 or H1: μ1 < μ2
Tests an alternative hypothesis that the mean of the first population is less than the mean of the second population.

H0: μ1 - μ2 ≤ 0 or H0: μ1 ≤ μ2
H1: μ1 - μ2 > 0 or H1: μ1 > μ2
Tests an alternative hypothesis that the mean of the first population is greater than the mean of the second population.
84
Assumptions for the test to determine the difference
between two means:
1. The samples must be independent of each other;
that is, there can be no relationship between the
subjects in each sample.
2. The populations from which the samples were obtained must be normally distributed. If the standard deviations of the variable are known, or the sample sizes are greater than or equal to 30, use the z test.

85
Formula for the z test for comparing two
means from independent populations

$z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$

86
A random sample of 20 specimens of cold-rolled steel
had an average yield strength of 29.8 ksi. A second
random sample of 25 galvanized steel specimens
gave an average yield strength of 34.7 ksi. Assuming
that the two yield strength distributions are normal
with σ1 = 4.0 and σ2 = 5.0, do the data indicate that
the true average yield strengths, μ1 and μ2, are
different? Assume α = 0.01.
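A sketch of the two-sample z test for the steel data (H0: μ1 - μ2 = 0, H1: μ1 - μ2 ≠ 0), with σ1 = 4.0 and σ2 = 5.0 known and α = 0.01, so the two-tailed critical value is about 2.576.

```python
from math import sqrt
from statistics import NormalDist

n1, x1, sigma1 = 20, 29.8, 4.0    # cold-rolled steel
n2, x2, sigma2 = 25, 34.7, 5.0    # galvanized steel
alpha = 0.01
z = (x1 - x2) / sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # hypothesized difference is 0
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.3f}, p = {p_value:.5f}, reject H0: {abs(z) > z_crit}")
```

With z ≈ -3.65, |z| exceeds 2.576 and the sketch rejects H0.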

87
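If the populations being sampled are known to be normally distributed and the sample sizes are large (typically n ≥ 30), the sample standard deviations can replace σ1 and σ2:
$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$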

88
Difference Between Two Means (σ1 and σ2 Unknown, Small Samples)
A t test is used to test the difference between means when:
the two samples are independent,
the sample sizes are small,
the samples are taken from two normally or approximately normally distributed populations.
There are two different options for the use of t tests:
i. the population variances are not equal;
ii. the population variances are equal.

89
If the populations being sampled are known to be normally distributed but the standard deviations are unknown and cannot be presumed to be equal to each other:
$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

90
where the degrees of freedom (df) are
$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$

91
The drying time of pavement marking paint is of concern
to transportation engineers. Of two such paints from a
particular manufacturer, it is suspected that yellow paint
dries faster than white paint. Sample measurements of the
drying times of both paints (in minutes) are given below.
White: 120, 132, 123, 122, 140, 110, 120, 107
Yellow: 126, 124, 116, 125, 109, 130, 125, 117, 129, 120
Test, at the 5% significance level, the suspicion that yellow paint has a shorter mean drying time than white paint. Assume that the drying times are normally distributed and that the population standard deviations of the drying times are not equal.
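A sketch of the unequal-variance t test for the paint data, with population 1 = white and population 2 = yellow, so "yellow dries faster" becomes H1: μ1 - μ2 > 0. The Welch degrees of freedom (about 11.04) are rounded down to 11, and t_{0.05,11} = 1.796 is taken from a t table.

```python
from math import sqrt
from statistics import mean, variance

white = [120, 132, 123, 122, 140, 110, 120, 107]
yellow = [126, 124, 116, 125, 109, 130, 125, 117, 129, 120]

n1, n2 = len(white), len(yellow)
a, b = variance(white) / n1, variance(yellow) / n2
t = (mean(white) - mean(yellow)) / sqrt(a + b)   # hypothesized difference is 0
nu = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
t_crit = 1.796   # t_{0.05, 11} from a t table (right-tailed)
print(f"t = {t:.3f}, nu = {nu:.2f}, reject H0: {t > t_crit}")
```

Here t ≈ -0.08, so the data give no support for the suspicion at the 5% level.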

92
If the populations being sampled are known to be normally distributed, the sample sizes are small (typically n < 30), and the population variances are assumed to be equal:
$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$
where df = n1 + n2 - 2.

93
$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$

94
A pooled estimate of the variance is a weighted average of the two sample variances, with the degrees of freedom of each variance as the weights.
The pooled estimate of variance is used to
calculate the standard error in the t test when
the variances are equal.

95
The drying time of pavement marking paint is of concern to
transportation engineers. Of two such paints from a particular
manufacturer, it is suspected that yellow paint dries faster
than white paint. Sample measurements of the drying times of
both paints (in minutes) are given below.
White: 120, 132, 123, 122, 140, 110, 120, 107
Yellow: 126, 124, 116, 125, 109, 130, 125, 117, 129, 120
Test, at the 5% significance level, the suspicion that yellow paint has a shorter mean drying time than white paint. Assume that the drying times are normally distributed and that the population standard deviations of the drying times are equal.
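The pooled version of the same test, as a sketch: H1: μ_white - μ_yellow > 0, df = 8 + 10 - 2 = 16, and the right-tailed critical value t_{0.05,16} = 1.746 is taken from a t table.

```python
from math import sqrt
from statistics import mean, variance

white = [120, 132, 123, 122, 140, 110, 120, 107]
yellow = [126, 124, 116, 125, 109, 130, 125, 117, 129, 120]

n1, n2 = len(white), len(yellow)
sp2 = ((n1 - 1) * variance(white) + (n2 - 1) * variance(yellow)) / (n1 + n2 - 2)
t = (mean(white) - mean(yellow)) / sqrt(sp2 * (1 / n1 + 1 / n2))
t_crit = 1.746   # t_{0.05, 16} from a t table (right-tailed)
print(f"sp^2 = {sp2:.2f}, t = {t:.3f}, reject H0: {t > t_crit}")
```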

96
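Two-sample tests on μ1 - μ2:
σ1², σ2² known:
$z_{test} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
σ1², σ2² unknown, n1, n2 ≥ 30:
$z_{test} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
σ1², σ2² unknown, n1, n2 < 30, σ1² = σ2²:
$t_{test} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \quad df = n_1 + n_2 - 2$
σ1², σ2² unknown, n1, n2 < 30, σ1² ≠ σ2²:
$t_{test} = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \quad \nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$
97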
Hypotheses:
Two-tailed: H0: μD = 0, H1: μD ≠ 0
Left-tailed: H0: μD ≥ 0, H1: μD < 0
Right-tailed: H0: μD ≤ 0, H1: μD > 0
μD is the expected mean of the differences of the matched pairs.

98
The formula for the t test for dependent samples:
$t = \frac{\bar{D} - \mu_D}{s_D/\sqrt{n}}$
with d.f. = n - 1

99
In an experiment designed to evaluate an additive to
increase the strength of concrete, each of five
batches of concrete was divided in half and the
additive added to one half of each batch. The
resulting compressive strength measurements (load
in kips at failure) are shown below. Does the
additive work? Test at α = 0.05.
Treated 16.1 14.7 17.4 13.7 16.9
Untreated 14.8 13.2 15.5 12.3 15.9

100
Concrete Treated Untreated D= treated-untreated

1 16.1 14.8
2 14.7 13.2
3 17.4 15.5
4 13.7 12.3
5 16.9 15.9
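A sketch of the paired t test for the additive data: D = treated - untreated for each batch, H0: μD ≤ 0 and H1: μD > 0 ("the additive works"), df = 5 - 1 = 4. The right-tailed critical value t_{0.05,4} = 2.132 is taken from a t table.

```python
from math import sqrt
from statistics import mean, stdev

treated = [16.1, 14.7, 17.4, 13.7, 16.9]
untreated = [14.8, 13.2, 15.5, 12.3, 15.9]
d = [a - b for a, b in zip(treated, untreated)]   # D = treated - untreated
n = len(d)
t = mean(d) / (stdev(d) / sqrt(n))   # hypothesized mu_D = 0
t_crit = 2.132   # t_{0.05, 4} from a t table (right-tailed)
print(f"d = {[round(x, 1) for x in d]}, t = {t:.2f}, reject H0: {t > t_crit}")
```

With t ≈ 9.71, the sketch rejects H0: the data indicate that the additive increases strength.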

101
There is a relationship between confidence intervals and (two-tailed) hypothesis testing.
When the null hypothesis is rejected in a hypothesis testing situation, the confidence interval for the mean computed at the same level of significance will not contain the hypothesized mean.
Likewise, when the null hypothesis is not rejected, the confidence interval computed will contain the hypothesized mean.
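The relationship can be checked numerically. This sketch reuses the honey-jar numbers from the earlier problem (μ0 = 460.3, σ = 3.2, n = 60, x̄ = 461.2) and compares a two-tailed z test at α = 0.05 with the 95% z-interval: the test rejects exactly when μ0 falls outside the interval.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar, alpha = 460.3, 3.2, 60, 461.2, 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
se = sigma / sqrt(n)
z = (x_bar - mu0) / se
lower, upper = x_bar - z_crit * se, x_bar + z_crit * se
rejected = abs(z) > z_crit
mu0_inside = lower <= mu0 <= upper
print(f"95% CI = ({lower:.2f}, {upper:.2f}), reject H0: {rejected}, mu0 in CI: {mu0_inside}")
assert rejected != mu0_inside   # rejecting H0 <=> mu0 outside the interval
```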

102
103
