Single Population
Mean
Term report
Statistical Inference
- Comparing Two Population Means Using Dependent
Samples or Matched Pairs ($\mu_1$ vs. $\mu_2$) (Daniel, Section 7.4)
When using dependent samples, each observation from population 1 has a one-to-one
correspondence with an observation from population 2. One of the most common cases
where this arises is when we measure the response on the same subjects before and after
treatment. This is commonly called a pre-test/post-test situation. Sometimes, however,
pairs of subjects from the two populations are meaningfully matched on some pre-specified
criteria. For example, we might match individuals of the same race, gender,
socio-economic status, height, weight, etc., to control for the influence these
characteristics might have on the response of interest. When this is done we say that we
are controlling for the effects of race, gender, etc. By using matched pairs of subjects
we in effect remove the effect of these potential confounding factors, giving us a
clearer picture of the difference between the two populations being studied.
DATA FORMAT
Matched Pair    X_1i    X_2i    d_i = X_1i - X_2i
1               X_11    X_21    d_1
2               X_12    X_22    d_2
3               X_13    X_23    d_3
...             ...     ...     ...
n               X_1n    X_2n    d_n
$H_a: \mu_d > \Delta_o$ or $H_a: \mu_d < \Delta_o$ or $H_a: \mu_d \neq \Delta_o$
$$\bar{d} \pm t \cdot \frac{s_d}{\sqrt{n}}$$
This interval has a 100(1 - $\alpha$)% chance of covering the true mean paired difference.
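The paired-difference calculations can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the function names, the `delta0` argument (the hypothesized mean paired difference), and the four pre/post readings are made up for the example. The test statistic used, $t = (\bar{d} - \Delta_o)/(s_d/\sqrt{n})$ with n - 1 degrees of freedom, is the standard paired t statistic and is assumed here. The t quantile `t_crit` must be supplied from a t table or from JMP.

```python
import math
import statistics


def paired_summary(pre, post, delta0=0.0):
    """Summarize paired differences d_i = pre_i - post_i.

    Returns (d_bar, s_d, se, t_stat, df). delta0 is the hypothesized
    mean paired difference (an assumed name, not from the notes).
    """
    d = [x1 - x2 for x1, x2 in zip(pre, post)]
    n = len(d)
    d_bar = statistics.mean(d)
    s_d = statistics.stdev(d)        # sample SD, divides by n - 1
    se = s_d / math.sqrt(n)          # standard error of the mean difference
    t_stat = (d_bar - delta0) / se
    return d_bar, s_d, se, t_stat, n - 1


def paired_ci(d_bar, se, t_crit):
    """Interval d_bar +/- t_crit * se; t_crit comes from a t table."""
    return d_bar - t_crit * se, d_bar + t_crit * se


# Hypothetical pre/post readings, for illustration only.
pre = [10, 12, 14, 16]
post = [8, 11, 11, 14]
d_bar, s_d, se, t_stat, df = paired_summary(pre, post)
# 3.182 is the tabled 97.5th percentile of the t-distribution with 3 df.
low, high = paired_ci(d_bar, se, t_crit=3.182)
print(d_bar, round(t_stat, 3), df, (round(low, 3), round(high, 3)))
```

Passing a nonzero `delta0` (for example 10 for the systolic question below) tests whether the mean reduction exceeds that amount rather than zero.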
Research Questions:
Is there evidence to suggest that Captopril results in a systolic blood pressure
decrease of at least 10 mmHg on average in patients 30 minutes after taking it?
Is there evidence to suggest that Captopril results in a diastolic blood pressure
decrease of at least 5 mmHg on average in patients 30 minutes after taking it?
For each blood pressure we need to consider paired differences of the form
$d_i = BP_{pre,i} - BP_{post,i}$. For paired differences defined this way, positive values
correspond to a reduction in blood pressure half an hour after taking Captopril. To
answer the research questions above we need to conduct the following hypothesis tests:
$H_o: \mu_{syspre - syspost} \le 10$ mmHg
$H_a: \mu_{syspre - syspost} > 10$ mmHg
and
$H_o: \mu_{diapre - diapost} \le 5$ mmHg
$H_a: \mu_{diapre - diapost} > 5$ mmHg
Below are the relevant statistical summaries of the paired differences for both blood
pressure measurements.
[JMP output: summary statistics of the paired differences for Systolic BP and Diastolic BP]
We can use the t-Probability Calculator in JMP to find the associated p-values or better
yet use JMP to conduct the entire t-test.
[JMP output: t-test for Systolic Blood Pressure]
Both tests result in rejection of the null hypotheses. Thus we have sufficient evidence to
suggest that taking Captopril will result in a mean decrease in systolic blood pressure
exceeding 10 mmHg (p = _______) and a mean decrease in diastolic blood pressure
exceeding 5 mmHg (p = _______). Furthermore, we estimate that the mean change in
systolic blood pressure will be somewhere between _______ mmHg and ______ mmHg,
and that the mean change in diastolic blood pressure could be as large as ______ mmHg.
To answer the question of interest we need tools for comparing the population mean
hemoglobin level for dogs not exposed to cadmium oxide vs. that for dogs that have had
cadmium oxide exposure, i.e. how does $\mu_{control}$ compare to $\mu_{exposed}$?
Basic Idea:
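$$(\bar{X}_1 - \bar{X}_2) \pm t \cdot SE(\bar{X}_1 - \bar{X}_2)$$
where
$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$
where
$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} \quad \text{if } n_1 \neq n_2$$
$$s_p^2 = \frac{s_1^2 + s_2^2}{2} \quad \text{if } n_1 = n_2$$
Test Statistic
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{SE(\bar{X}_1 - \bar{X}_2)}$$
~ t-distribution with $df = n_1 + n_2 - 2$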
Do men and women have the same normal body temperature? Putting this into a
statement involving parameters that can be tested:
$H_o: \mu_F = \mu_M$ or $(\mu_F - \mu_M) = 0$
$H_a: \mu_F \neq \mu_M$ or $(\mu_F - \mu_M) \neq 0$
Intuitive Decision
To get a first sense of whether the data favor the null or the alternative hypothesis, you could
review the summary statistics for the variable you are interested in testing across the two
groups. Remember, these summary statistics and/or graphs describe only the observations you
sampled; to make decisions about all observations of interest, we must apply some
inferential technique (i.e., hypothesis tests or confidence intervals).
One of the best graphical displays for this situation is side-by-side boxplots. To get
side-by-side boxplots, select Analyze > Fit Y by X. Place Gender in the X box and
Temperature in the Y box. Place the mean diamonds on the boxplots and jitter the
points. The more separation there is between the mean diamonds, the more likely we are to
reject the null hypothesis (i.e., the data tend to support the alternative hypothesis).
Summary Statistics
$\bar{x}_F = 98.39$
$\bar{x}_M = 98.10$
$s_F = .743$
$s_M = .699$
$n_F = 65$
$n_M = 65$
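As a rough check on what the pooled t-test will report, the pooled formulas given earlier can be applied directly to these summary statistics. The sketch below is illustrative only: the function name `pooled_t_from_summary` is made up, and the hypothesized difference is taken to be 0, matching the null hypothesis above. The resulting t value still has to be compared with a t-distribution on $n_F + n_M - 2$ degrees of freedom (from a table or JMP) to obtain a p-value.

```python
import math


def pooled_t_from_summary(mean1, s1, n1, mean2, s2, n2):
    """Pooled two-sample t statistic computed from summary statistics.

    Returns (sp2, se, t_stat, df) for a hypothesized difference of 0.
    """
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    # Standard error of the difference in sample means.
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t_stat = (mean1 - mean2 - 0) / se
    return sp2, se, t_stat, n1 + n2 - 2


# Body temperature summary statistics from the notes (females, then males).
sp2, se, t_stat, df = pooled_t_from_summary(98.39, 0.743, 65, 98.10, 0.699, 65)
print(round(sp2, 4), round(se, 4), round(t_stat, 2), df)
```

Because the two sample sizes are equal here, the pooled variance reduces to the simple average of the two sample variances.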
Assumptions
1. The two groups must be independent of each other.
2. The observations from each group should be normally distributed.
3. Decide whether or not we wish to assume the population variances are equal.
Assessing Normality of the Two Sampled Populations
To assess normality we select Normal Quantile Plot from the Oneway Analysis pull-down menu.
Normality appears to be satisfied here.
$H_o: \sigma_F^2 = \sigma_M^2$
$H_a: \sigma_F^2 \neq \sigma_M^2$
JMP gives four different tests for examining the equality of population variances. To use
the results of these tests, simply examine the resulting p-values. If any or all are less than .10
or .05, the assumption of equal variances is questionable and the unequal variance t-Test
should be used instead of the pooled t-Test.
Several new boxes of output will appear below the graph once the appropriate option has
been selected, some of which we will not concern ourselves with. The relevant box for us
will be labeled t Test as shown below for the mean body temperature comparison.
Because we have concluded that the equality of variance assumption is reasonable for
these data, we can refer to the output for the t-Test assuming equal variances.
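$$t = \frac{\bar{x}_A - \bar{x}_B}{SE(\bar{x}_A - \bar{x}_B)} = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{s_p^2 \left( \frac{1}{n_A} + \frac{1}{n_B} \right)}} \sim t \text{ distribution}$$
where,
$$s_p^2 = \frac{(n_A - 1) s_A^2 + (n_B - 1) s_B^2}{(n_A - 1) + (n_B - 1)}$$
or (when $n_A = n_B$)
$$s_p^2 = \frac{s_A^2 + s_B^2}{2}$$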
CI for $(\mu_A - \mu_B)$
Assumptions:
For this case we make the following assumptions:
1. The samples from the two populations were drawn independently.
2. The population variances/standard deviations are NOT equal.
(This can be formally tested, or a rule of thumb can be used.)
3. The populations are both normally distributed. This assumption can be relaxed
when the samples from both populations are large.
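100(1 - $\alpha$)% Confidence Interval for $(\mu_1 - \mu_2)$
$$(\bar{X}_1 - \bar{X}_2) \pm t \cdot SE(\bar{X}_1 - \bar{X}_2)$$
where
$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
and
$$df = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\frac{\left( s_1^2 / n_1 \right)^2}{n_1 - 1} + \frac{\left( s_2^2 / n_2 \right)^2}{n_2 - 1}}$$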
The t-quantiles are the same as those we have seen previously.
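The unequal-variance standard error and degrees-of-freedom formula are tedious by hand, so a small sketch may help. It is not from the original notes: the function names are made up, and the sample standard deviations and sizes in the example are hypothetical. The degrees of freedom are generally not a whole number. The t quantile `t_crit` is assumed to be looked up separately for that df.

```python
import math


def welch_se_df(s1, n1, s2, n2):
    """Standard error and approximate df for the unequal-variance case."""
    v1 = s1**2 / n1                  # estimated variance of the first sample mean
    v2 = s2**2 / n2                  # estimated variance of the second sample mean
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se, df


def welch_ci(mean1, mean2, se, t_crit):
    """Interval (mean1 - mean2) +/- t_crit * se; t_crit comes from a t table."""
    diff = mean1 - mean2
    return diff - t_crit * se, diff + t_crit * se


# Hypothetical summary statistics, for illustration only.
se, df = welch_se_df(s1=2.0, n1=10, s2=4.0, n2=20)
print(round(se, 4), round(df, 2))
```

When the two sample variances and sample sizes are equal, this df formula reduces to 2(n - 1), the same value the pooled test would use.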
Hypothesis Testing
Test Statistic
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - 0}{SE(\bar{X}_1 - \bar{X}_2)}$$
~ t-distribution with df = (see formula above)
tumor cells were generally larger than the radii of benign breast tumor cells. Assuming
the researchers initially hypothesized that cancerous breast tumor cells have larger radii
than non-cancerous cells, conduct a test to see if this is supported by these data.
The cell radii of the malignant tumors certainly appear to be larger than the cell radii of
the benign tumors. The summary statistics support this, with sample means/medians of
roughly 17 and 12 units, respectively. The 95% CIs for the mean cell radius for the two
tumor groups do not overlap, which further supports the existence of a significant
difference in cell radii.
Formally Testing the Equality of Population Variances (see Section 7.8)
In JMP
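$H_o: \sigma_1^2 = \sigma_2^2$
$H_a: \sigma_1^2 \neq \sigma_2^2$
or equivalently
$H_o: \sigma_1 = \sigma_2$
$H_a: \sigma_1 \neq \sigma_2$
Test Statistic
$F = \max\left( \frac{s_1^2}{s_2^2}, \frac{s_2^2}{s_1^2} \right)$ which has an F-distribution with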
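The F statistic itself is simple to compute from two sample standard deviations. The sketch below is an illustration only: the function name is made up, and because the tumor summary statistics are not reproduced in these notes, it reuses the body temperature standard deviations from the earlier example. Obtaining a p-value still requires an F table or JMP.

```python
def variance_ratio(s1, s2):
    """F = max(s1^2 / s2^2, s2^2 / s1^2); always at least 1."""
    v1, v2 = s1**2, s2**2
    return max(v1 / v2, v2 / v1)


# Body temperature standard deviations from the earlier example.
f_stat = variance_ratio(0.743, 0.699)
print(round(f_stat, 3))
```

A ratio close to 1, as here, is consistent with treating the two population variances as equal.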
Conclusion:
where,
$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}$$
and
Confidence Level          z
95% ($\alpha$ = .05)      1.96
90% ($\alpha$ = .10)      1.645
99% ($\alpha$ = .01)      2.576
Hypothesis Testing
Test Statistic
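$$z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{SE(\hat{p}_1 - \hat{p}_2)}$$
~ standard normal dist. provided $n_1$, $n_2$ are large (see above)
with
$$SE(\hat{p}_1 - \hat{p}_2) = \sqrt{\hat{p} \hat{q} \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$
where
$$\hat{p} = \frac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}, \qquad \hat{q} = 1 - \hat{p}$$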
In JMP, select Analyze > Fit Y by X and place Surgery in the X box and Age in the Y.
The following output from JMP is obtained.
The results of Fisher's Exact Test are always included in the JMP output whenever we are working with a
2 x 2 contingency table.
These data come from a study looking at the effects of smoking during pregnancy on
birth weight. Amongst the 381 non-smokers in the study, 13 had babies with low birth
weight, while amongst the 299 mothers who smoked during pregnancy, 28 had babies
with low birth weight. Is there evidence to suggest that the proportion of babies born
with low birth weight is greater for mothers who smoked during pregnancy?
Smoking Status    Normal Birth Weight    Low Birth Weight    Row Totals
Nonsmoker         368 (96.59%)           13 (3.41%)          381
Smoker            271 (90.64%)           28 (9.36%)          299
Column Totals     639                    41                  680
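The counts in this table are enough to carry out the large-sample z test for two proportions described earlier. The sketch below is one way to do it, not the worked solution from the notes: the function name is made up, the smokers are taken as group 1, and an upper-tailed alternative (smokers have the higher proportion) is assumed from the wording of the question. The p-value uses the standard normal distribution from Python's `statistics` module.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H_o: p1 = p2, from counts x out of n."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion; equals (n1*p1 + n2*p2) / (n1 + n2).
    p_pool = (x1 + x2) / (n1 + n2)
    q_pool = 1 - p_pool
    se = math.sqrt(p_pool * q_pool * (1 / n1 + 1 / n2))
    z = (p1 - p2 - 0) / se
    return p1, p2, z


# Low birth weight counts: 28 of 299 smokers, 13 of 381 nonsmokers.
p_smoker, p_nonsmoker, z = two_proportion_z(28, 299, 13, 381)
p_value = 1 - NormalDist().cdf(z)    # upper-tailed p-value (assumed alternative)
print(round(p_smoker, 4), round(p_nonsmoker, 4), round(z, 2), round(p_value, 4))
```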
Hypothesis Test:
1)
2)
3)
4)
5)
Construct and interpret a 95% CI for $(p_{smoker} - p_{nonsmoker})$
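A sketch of the interval computation, using the unpooled standard error from the confidence interval formula given earlier, is shown here. The function name and the `level` argument are made up for illustration. The z quantile is taken from the standard normal distribution rather than the table, so for 95% it is approximately 1.96. Interpretation of the interval is left as in the exercise.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, level=0.95):
    """Large-sample CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Smokers (28 of 299) minus nonsmokers (13 of 381).
low, high = two_proportion_ci(28, 299, 13, 381)
print(round(low, 4), round(high, 4))
```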
Conclusion:
Find and interpret the RR and OR for low birth weight associated with smoking
during pregnancy (Note: this was on Assignment 3).
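1) OR = ________          RR = _________________

                         Disease Present    Disease Absent
Risk Factor Present            a                  b
Risk Factor Absent             c                  d

$$SE(\ln(OR)) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$$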
2) Compute $SE(\ln(RR)) = \sqrt{\frac{b}{a(a+b)} + \frac{d}{c(c+d)}}$
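The quantities on this worksheet can be computed together from the four cell counts. The sketch below is an illustration under stated assumptions, not the notes' own solution: the function name is made up. Smokers are treated as the risk-factor-present row and low birth weight as the disease-present column, so a = 28, b = 271, c = 13, d = 368. RR and OR use the usual definitions, RR = [a/(a+b)] / [c/(c+d)] and OR = ad/(bc). The standard error of ln(OR) uses the usual formula sqrt(1/a + 1/b + 1/c + 1/d).

```python
import math


def rr_or_summary(a, b, c, d):
    """Relative risk, odds ratio, and the standard errors of their logs.

    a, b = disease present/absent counts with the risk factor present;
    c, d = disease present/absent counts with the risk factor absent.
    """
    rr = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    se_ln_rr = math.sqrt(b / (a * (a + b)) + d / (c * (c + d)))
    se_ln_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return rr, odds_ratio, se_ln_rr, se_ln_or


# Assumed mapping: smokers are the exposed row, low birth weight the outcome.
rr, odds_ratio, se_ln_rr, se_ln_or = rr_or_summary(a=28, b=271, c=13, d=368)
print(round(rr, 2), round(odds_ratio, 2), round(se_ln_rr, 3), round(se_ln_or, 3))
```

Note that the data table in these notes lists nonsmokers first and normal birth weight first, so the counts have to be reordered as above before applying the a, b, c, d formulas.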