Research Methods

Lecture 8
Introduction to Inferential Statistics

Topics
The z Test: What It Is and What It
Does
Confidence Intervals Based on the
z Distribution
The t Test: What It Is and What It
Does
Confidence Intervals Based on the
t Distribution

Topics
The Chi-Square (χ²) Goodness-of-Fit Test: What It Is and What It
Does
Correlation Coefficients and
Statistical Significance

Warning
Pay attention! The material covered
in this lecture is not easy, and you
will need to go over it several times
to get the full meaning

The z Test:
What It Is and What It Does
The z test compares the mean of a sample
to the mean of the population
z test: a parametric inferential statistical
test of the null hypothesis for a single
sample where the population variance is
known
Sampling distribution: a distribution of
sample means based on random
samples of a fixed size from a population
Standard error of the mean: the standard
deviation of the sampling distribution

The z Test
The key idea here is that we are
comparing the distribution of the
individual scores that make up
the population with the statistics of
a sample of size N
Because each sample mean averages N scores, we
expect its variation to be smoothed
out compared to the population
This idea is formalized by the Central Limit Theorem

The z Test
Central limit theorem
States that for any population with a
mean $\mu$ and a standard deviation $\sigma$,
the distribution of sample means for
sample size N:
Will have a mean of $\mu$
Will have a standard deviation of $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{N}}$
Will approach a normal distribution as N
approaches infinity
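The three claims of the theorem can be checked with a small simulation. The sketch below is only an illustration: the uniform population, the sample size of 25, and the 5000 repetitions are all assumptions chosen for the demonstration, not values from the lecture.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

N = 25              # size of each sample (assumed for illustration)
NUM_SAMPLES = 5000  # number of samples drawn (assumed for illustration)

# Population: uniform on [0, 1], which is clearly not normal.
mu = 0.5
sigma = math.sqrt(1 / 12)

# Draw many samples of size N and record the mean of each one.
sample_means = [
    statistics.fmean([random.random() for _ in range(N)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
standard_error = sigma / math.sqrt(N)  # value predicted by the theorem

print(f"mean of sample means: {mean_of_means:.3f} (population mean {mu})")
print(f"SD of sample means:   {sd_of_means:.4f} (sigma / sqrt(N) = {standard_error:.4f})")
```

The simulated mean and standard deviation of the sample means should land close to $\mu$ and $\sigma / \sqrt{N}$, even though the individual scores are not normally distributed.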

The z Test
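Formula for z:
$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$
where $\bar{X}$ is the sample mean, $\mu$ is the population mean, and $\sigma_{\bar{X}}$ is the standard error of the mean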

Example data - 1
In the population, IQ has a
mean of 100 with an SD of 15
Suppose that we test a class of 75
students and find that their mean IQ
is 103.5. Are they a special class
with an above-average IQ?
This is a one-tailed z test

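Calculations for the One-Tailed z Test
We can calculate the standard error of the mean:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} = \frac{15}{\sqrt{75}} = 1.73$$
We now use 1.73 in the z-test formula:
$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{103.5 - 100}{1.73} = 2.02$$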

Interpreting the One-Tailed z Test
How do we now interpret z = 2.02?
Critical value
The value of a test statistic that marks the
edge of the region of rejection in a sampling
distribution
Values equal to it or beyond it fall in the
region of rejection

Region of rejection
The area of a sampling distribution that lies
beyond the test statistic's critical value
When a score falls within this region, H0 is
rejected

Interpreting the One-Tailed z Test

From standard tables, the critical value of z for
a one-tailed test at the 5% significance level is 1.645
Since 2.02 > 1.645, we reject
the null hypothesis with 95% confidence: the
class is clever!
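The IQ example can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the helper name `z_test` is made up for the illustration, and the critical value is computed from the normal distribution rather than read from a table.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (standard error, z) for a single-sample z test."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    return standard_error, z

# Values from the IQ example: mu = 100, sigma = 15, N = 75, sample mean = 103.5
se, z = z_test(sample_mean=103.5, pop_mean=100, pop_sd=15, n=75)

alpha = 0.05
z_critical = NormalDist().inv_cdf(1 - alpha)  # one-tailed critical value
p_value = 1 - NormalDist().cdf(z)             # area in the upper tail beyond z

print(f"standard error = {se:.2f}")
print(f"z = {z:.2f}, critical z = {z_critical:.3f}")
print("reject H0" if z >= z_critical else "fail to reject H0")
```

The upper-tail area `p_value` is below 0.05 here, which is the same decision expressed as a probability instead of a critical value.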

Example data - 2
Students within a certain age range
have a mean weight of 90 pounds
(USA) with an SD of 17 pounds
We select a group of 50 who are
taking part in an exercise
programme and their mean weight
is 86 pounds. Is the programme
having any effect on their weight?
This is a two-tailed z test

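Calculations for the Two-Tailed z Test
The calculations follow exactly the
same steps as the one-tailed z test:
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} = \frac{17}{\sqrt{50}} = 2.40$$
$$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{86 - 90}{2.40} = -1.67$$
Is the z of -1.67 significant? We
need to know the critical z value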

Interpreting the Two-Tailed z Test

For a two-tailed test the significance level of 5% is
effectively split into 2 parts, 2.5% in each tail, which gives a critical z of ±1.96
The size of our result, 1.67, is less than 1.96 and so
we cannot reject the null hypothesis
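The same check can be sketched in Python for the weight example. The variable names are illustrative. Carried at full precision the statistic comes out at about -1.66; the -1.67 quoted in the slides results from rounding the standard error to 2.40 before dividing. The decision is the same either way.

```python
import math
from statistics import NormalDist

# Values from the weight example: mu = 90, sigma = 17, N = 50, sample mean = 86
pop_mean, pop_sd, n, sample_mean = 90, 17, 50, 86

se = pop_sd / math.sqrt(n)
z = (sample_mean - pop_mean) / se

alpha = 0.05
# Two-tailed test: alpha is split equally between the two tails.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"standard error = {se:.2f}, z = {z:.2f}")
print(f"critical z = +/-{z_critical:.2f}")
print("reject H0" if abs(z) >= z_critical else "fail to reject H0")
```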

Statistical Power
One-tailed test: statistically a more
powerful test than a two-tailed test
Statistical power: the probability of
correctly rejecting a false H0
With a one-tailed test, we are more
likely to reject H0
$z_{obt}$ (the obtained z) does not have to be as large to be
considered significantly different from
the population mean

The z Test
As the sample size increases:
The standard error of the mean
decreases
This increases the statistical power
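Both effects on power (number of tails and sample size) can be illustrated numerically. The sketch below assumes a hypothetical scenario that is not in the lecture: the true mean really is 103.5 while H0 says 100, with $\sigma = 15$. The function name `z_test_power` is invented for the example, and its one-tailed branch assumes rejection in the upper tail only.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)

def z_test_power(true_mean, null_mean, pop_sd, n, alpha=0.05, tails=1):
    """Probability of rejecting H0 when the true mean is `true_mean`.

    The one-tailed version assumes H0 is rejected in the upper tail only.
    """
    standard_error = pop_sd / math.sqrt(n)
    shift = (true_mean - null_mean) / standard_error  # expected value of z
    if tails == 1:
        z_critical = STD_NORMAL.inv_cdf(1 - alpha)
        return 1 - STD_NORMAL.cdf(z_critical - shift)
    z_critical = STD_NORMAL.inv_cdf(1 - alpha / 2)
    upper_tail = 1 - STD_NORMAL.cdf(z_critical - shift)
    lower_tail = STD_NORMAL.cdf(-z_critical - shift)
    return upper_tail + lower_tail

# Hypothetical scenario: true mean 103.5, H0 mean 100, sigma 15.
for n in (25, 75, 150):
    one = z_test_power(103.5, 100, 15, n, tails=1)
    two = z_test_power(103.5, 100, 15, n, tails=2)
    print(f"N = {n:3d}: one-tailed power = {one:.2f}, two-tailed power = {two:.2f}")
```

Under these assumptions the one-tailed power exceeds the two-tailed power at every N, and both grow as N increases because the standard error shrinks.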

The z test is appropriate to use:
If the parameters, such as $\mu$ and $\sigma$, are
known
With interval or ratio data

Otherwise the t test is appropriate to use:
In cases where the sample size is small or
Where $\sigma$ is not known

Notation
One-tailed test: Ha: $\mu_0 < \mu_1$ or $\mu_0 > \mu_1$
Two-tailed test: Ha: $\mu_0 \neq \mu_1$
Null hypothesis: H0: $\mu_0 = \mu_1$
z (N=50) = -1.67, p<0.05 (one-tailed)

Confidence Intervals
We can estimate the population mean from a
sample within a certain degree of
confidence
Confidence interval: an interval of a certain
width that we feel confident will contain $\mu$
Statisticians recommend a 95% or a 99%
confidence interval
Formula for the confidence interval:
$$CI = \bar{X} \pm z\,\sigma_{\bar{X}}$$

Example
Using our previous example with the weights
of students, we found a sample mean of 86 pounds
from a sample of 50, with an SD of 17 pounds
We found:
$\sigma_{\bar{X}} = 2.40$
In a z distribution (two-tailed) 95% of the results
are found within $z = \pm 1.96$ of the mean
We therefore estimate the population mean
as $86.0 \pm 4.7$ ($2.40 \times 1.96$) with 95%
confidence
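The interval for the weight example can be computed directly. This is a small sketch; the function name `z_confidence_interval` is an illustrative choice, and the critical z is derived from the normal distribution instead of being looked up.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, pop_sd, n, confidence=0.95):
    """Return (low, high) for a z-based confidence interval on the mean."""
    standard_error = pop_sd / math.sqrt(n)
    # Two-tailed critical z: the leftover area is split between both tails.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin

# Weight example: sample mean 86, SD 17, N = 50
low, high = z_confidence_interval(86.0, 17, 50)
print(f"95% CI: {low:.1f} to {high:.1f} pounds")
```

The interval runs from roughly 81.3 to 90.7 pounds. It contains 90, which agrees with the two-tailed z test failing to reject the null hypothesis.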

The t Test:
What It Is and What It Does
t test
A parametric inferential statistical test of the
null hypothesis for a single sample where the
population variance is not known

Student's t distribution: a set of
distributions that, although symmetrical
and bell-shaped, are not normally
distributed
Degrees of freedom (df): the number of
scores in a sample that are free to vary;
generally df = N - 1

The t Test
Since we do not know the population
statistics we need to estimate them
from our data. The steps are:
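Estimated standard deviation of the population:
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$
Estimated standard error of the mean:
$$s_{\bar{X}} = \frac{s}{\sqrt{N}}$$
The t statistic:
$$t = \frac{\bar{X} - \mu}{s_{\bar{X}}}$$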

Calculations for the t Test


Calculations for the t test follow
exactly the same steps as for the z
test, including the choice of a one-
or two-tailed test
Different tables (t tables, indexed by df) are used
to find the critical values to see
whether the null hypothesis can be
rejected

Example one tail t test


SAT Score
1010
1200
1310
1075
1149
1078
1129
1069
1350
1390

The mean SAT score of
students entering a US
university is 1090
Biology majors (N = 10) have a
mean SAT score of 1176.
Are they cleverer than
average?

Example
$s = 131.80$
$s_{\bar{X}} = \dfrac{131.80}{\sqrt{10}} = 41.68$
$$t = \frac{1176 - 1090}{41.68} = +2.06$$
From tables for 95% confidence with df = 9
we need a t of 1.833 (one-tailed)
2.06 > 1.833 and so we reject the null
hypothesis
t(9) = 2.06, p<0.05 (one-tailed)
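The SAT example can be recomputed from the ten raw scores. The sketch below uses only the standard library, so the critical value 1.833 is entered by hand from the table lookup quoted above rather than computed. Worked at full precision, the estimated standard error is about 41.68 and t rounds to 2.06.

```python
import math
import statistics

sat_scores = [1010, 1200, 1310, 1075, 1149, 1078, 1129, 1069, 1350, 1390]
POPULATION_MEAN = 1090
T_CRITICAL = 1.833  # one-tailed, alpha = 0.05, df = 9 (table value from the example)

n = len(sat_scores)
df = n - 1
sample_mean = statistics.mean(sat_scores)
s = statistics.stdev(sat_scores)       # sample SD, divides by N - 1
standard_error = s / math.sqrt(n)      # estimated standard error of the mean
t = (sample_mean - POPULATION_MEAN) / standard_error

print(f"mean = {sample_mean}, s = {s:.2f}, standard error = {standard_error:.2f}")
print(f"t({df}) = {t:.2f}")
print("reject H0" if t >= T_CRITICAL else "fail to reject H0")
```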

The t Test
Assumptions of the t test
Data are interval or ratio
Population distribution of scores is
symmetrical

The t test is used in situations in which:
Population mean is known
But the population standard deviation ($\sigma$) is
not known

If these criteria are not met:
A non-parametric test is more appropriate

Confidence Intervals
Based on the t Distribution
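For a one-sample t test, the
confidence interval is determined
by:
$$CI = \bar{X} \pm t_{cv}\, s_{\bar{X}}$$
where $t_{cv}$ is the two-tailed critical value of t for df = N - 1
Typically, statisticians recommend
using either the 95% or 99%
confidence interval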
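As an illustration, a 95% interval can be sketched for the ten SAT scores of the biology majors, assuming the usual form: sample mean plus or minus the critical t times the estimated standard error. The critical value 2.262 (two-tailed, df = 9) is an assumption taken from standard t tables; it does not appear in the lecture.

```python
import math
import statistics

sat_scores = [1010, 1200, 1310, 1075, 1149, 1078, 1129, 1069, 1350, 1390]
T_CRITICAL_TWO_TAILED = 2.262  # assumed table value: 95%, df = 9

n = len(sat_scores)
sample_mean = statistics.mean(sat_scores)
standard_error = statistics.stdev(sat_scores) / math.sqrt(n)

margin = T_CRITICAL_TWO_TAILED * standard_error
low, high = sample_mean - margin, sample_mean + margin

print(f"95% CI: {low:.0f} to {high:.0f}")
```

The interval is wider than a z-based interval with the same standard error would be, because the t distribution has heavier tails at small df.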

The Chi-Square (χ²) Goodness-of-Fit Test
Chi-square (χ²) goodness-of-fit test
A nonparametric inferential procedure that
determines how well an observed frequency
distribution fits an expected distribution

Observed frequency: the frequency with
which participants fall into a category
Expected frequency: the frequency
expected in a category if the sample data
represent the population

The Chi-Square (χ²) Goodness-of-Fit Test
Formula for chi-square:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where
O is the observed frequency
E is the expected frequency

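In approved style, the result is
reported as the statistic with its degrees of freedom and sample size, followed by the probability level:
$\chi^2(df, N) = \text{value}$, $p < 0.05$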

The Chi-Square (χ²) Goodness-of-Fit Test
Assumptions and appropriate use
Appropriate for nominal (categorical)
data
The frequencies in each expected
frequency cell should not be too small
(not less than 5)
The sample should be randomly
selected and the observations must be
independent

Example
In the USA 17% of teenagers at
High School get pregnant
In a certain school 7 girls were
pregnant out of 80
Does this school have a
significantly lower incidence of
teenage pregnancy?

Example
Frequency   Pregnant   Not pregnant
Observed    7          73
Expected    14         66

In this example:
χ² = 4.24
From tables the critical value at 95% is
3.84 and so the null hypothesis is rejected
With 2 categories (pregnant and not
pregnant) we have df = 1
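The chi-square value can be verified with a short calculation. The sketch uses the observed counts from the example (7 pregnant, 73 not pregnant out of 80) and the expected counts 14 and 66; the expected count of 14 is 17% of 80 (13.6) rounded to a whole number. The critical value 3.84 is entered by hand from the table lookup quoted above.

```python
observed = {"pregnant": 7, "not pregnant": 73}
expected = {"pregnant": 14, "not pregnant": 66}  # 17% of 80 = 13.6, rounded to 14
CHI2_CRITICAL = 3.84  # alpha = 0.05, df = 1 (table value from the example)

# Sum (O - E)^2 / E over every category.
chi_square = sum(
    (observed[category] - expected[category]) ** 2 / expected[category]
    for category in observed
)
df = len(observed) - 1

print(f"chi-square({df}) = {chi_square:.2f}")
print("reject H0" if chi_square >= CHI2_CRITICAL else "fail to reject H0")
```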

Correlation Coefficients and Statistical Significance
We can have confidence levels for the correlation
coefficients that we met earlier
A one-tailed test of a correlation coefficient
Means that we have predicted the expected
direction of the correlation coefficient
A two-tailed test
Means that we have not predicted the
direction of the correlation coefficient
Degrees of freedom for the Pearson product-moment correlation coefficient:
N - 2, where N represents the total number of
pairs of observations
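One common way to test a Pearson r, not spelled out in the lecture, is to convert it to a t statistic with df = N - 2 using $t = r\sqrt{N-2}/\sqrt{1-r^2}$. The sketch below applies that conversion to hypothetical values (r = 0.60 from 10 pairs) that are invented purely for illustration; the critical values for df = 8 are assumptions taken from standard t tables.

```python
import math

def t_from_r(r, n_pairs):
    """Convert a Pearson r into a t statistic with df = N - 2."""
    df = n_pairs - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df

# Hypothetical values, not data from the lecture.
r, n_pairs = 0.60, 10
t, df = t_from_r(r, n_pairs)

T_CRIT_ONE_TAILED = 1.860  # assumed table value: alpha = 0.05, df = 8
T_CRIT_TWO_TAILED = 2.306  # assumed table value: alpha = 0.05, df = 8

print(f"r = {r}, t({df}) = {t:.2f}")
print("one-tailed:", "significant" if t >= T_CRIT_ONE_TAILED else "not significant")
print("two-tailed:", "significant" if abs(t) >= T_CRIT_TWO_TAILED else "not significant")
```

With these made-up numbers the correlation is significant only when the direction was predicted in advance, which shows why the choice between a one-tailed and a two-tailed test matters.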

Summary
Parametric tests: the z test and the t test
The distributions should be bell-shaped
Certain parameters should be known
Data should be interval or ratio

Nonparametric test: chi-square test


Population parameters are not needed
The underlying distribution of scores is not
assumed to be normal
Data are most commonly nominal or ordinal

Memory test
Population average is 7
http://faculty.washington.edu/chudler/stm0.html
