
HYPOTHESIS TESTING*

main source: Vernoy & Vernoy (1997) & Falik & Brown (1983)

SOME ESSENTIAL ISSUES PERTAINING TO STATISTICAL ANALYSES PERFORMED ON EXPERIMENTAL DATA
* Were the data collected from a sample that truly represents the population? What is the likelihood of a given outcome by chance alone (probability)?
* What does a well-controlled and well-designed experiment (research) look like?
* Suppose that I have run a well-controlled and well-designed experiment and I observed a notable difference in behavior between the groups (i.e. samples). Is it valid to draw a direct generalization of my findings to the entire population?
* What statistical techniques should be employed that would permit me to draw a valid generalization?

The main aim of this section is to provide you with the basics and essentials pertaining to the third issue:

Suppose that I have run a well-controlled and well-designed experiment and I observed a notable difference in behavior between the groups (i.e. samples). Is it valid to draw a direct generalization of my findings to the entire population?

A Very Strong Note!!


A difference in behavior observed between your groups, as indicated by the difference in the sample means, may exist for one of two reasons:
a. there is no actual difference between the groups because both samples were taken from the same population; the observed difference is just a chance occurrence due to the error involved in sampling.
b. a difference actually exists because each sample came from a different population, and the difference is therefore real.
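Reason (a) can be seen directly in a small simulation. The sketch below is illustrative only: it assumes a hypothetical normally distributed population (mean 50, SD 10, chosen arbitrarily) and draws two groups of 20 from that same population. Because both groups share one population, any difference between their means is sampling error alone.

```python
import random
import statistics

# Hypothetical population parameters, chosen only for illustration.
POP_MEAN = 50.0
POP_SD = 10.0
N = 20  # size of each group

def draw_sample(rng: random.Random, n: int) -> list[float]:
    """Draw n scores from the same normal population."""
    return [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]

rng = random.Random(1)  # fixed seed so the run is reproducible
group_1 = draw_sample(rng, N)
group_2 = draw_sample(rng, N)  # same population as group_1

mean_1 = statistics.mean(group_1)
mean_2 = statistics.mean(group_2)
difference = mean_1 - mean_2

# H0 is true by construction here, so any non-zero difference
# between the two sample means is pure sampling error.
print(f"mean of group 1: {mean_1:.2f}")
print(f"mean of group 2: {mean_2:.2f}")
print(f"difference due to chance alone: {difference:.2f}")
```

Changing the seed gives a different, usually non-zero, difference each time; a statistical test has to decide whether an observed difference is larger than this kind of chance variation.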

Two Types of Hypotheses


The Null Hypothesis (H0)
The hypothesis which states that there is no real difference between the sample means (i.e. between the means of the populations they represent) or between the sample mean and the population mean:
$H_0: \mu_1 = \mu_2$

The Research Hypothesis or Alternative Hypothesis (H1)
The hypothesis which states that the difference between the sample means or between the sample mean and the population mean is real (i.e. the difference does exist!):
$H_1: \mu_1 \neq \mu_2$ or $H_1: \mu_1 < \mu_2$ or $H_1: \mu_1 > \mu_2$
Strong Notes: H0 and H1 may involve statistical indexes other than means (refer to examples)!!

Some Examples of Null Hypothesis


* Mean of M_Test of Group 1 is equal to mean of M_Test of Group 2
$H_0: \mu_1 = \mu_2$
* Sample mean is equal to the population mean
$H_0: \bar{X} = \mu$
* The three population means are equal to each other
$H_0: \mu_1 = \mu_2 = \mu_3$
* The standard deviation of Group A is equal to the standard deviation of Group B
$H_0: \sigma_A = \sigma_B$

Some Examples of Alternative Hypothesis


* Running time in the 100-metre dash is lower for men than for women
$H_1: \mu_{\text{males}} < \mu_{\text{females}}$
* Practice improves test scores
$H_1: \mu_{pg} > \mu_{cg}$
* The relationship between income (X) and education (Y) for Asians is different from that for people living in European communities
$H_1: \rho_{XY,\,\text{Asians}} \neq \rho_{XY,\,\text{Europeans}}$

The Experiment:
Two groups of subjects take part in an experiment to see the effects of rehearsing faces once every hour via mental imagery on memory for faces (facial memory). One group of subjects practises the rehearsal while the other does not. A test of facial memory is conducted 8 hours later to check the facial memory of the subjects.
The Hypotheses:
Ho: There is no real difference (in mean test scores) between subjects who use mental imagery and those who do not. (Any observed difference is due to chance, i.e. to the fact that subjects in each group were not identically matched.)

H1: Subjects who rehearse the faces once an hour via mental imagery score higher than subjects who do not rehearse. (There is a real difference in mean test scores between subjects who use mental imagery and those who do not.)

The Idea of Hypothesis Testing


Ho may involve an MCT/MV index (a measure of central tendency or of variability) other than the mean, such as the median or the variance. However, such an Ho would be more difficult to test statistically.
Several statistical tests (which are largely mean-based) are formulated in such a way that they test the null hypothesis. They are set up to test whether the difference between the groups' performances (the difference between the sample means) is large enough for researchers to rule out the possibility that the difference is due to chance (i.e. the fact that subjects in each group were not identical).
If the results of the tests indicate that the difference is real, that is, significant, then researchers can reject the null hypothesis and accept the research (alternative) hypothesis.
On the other hand, if the results indicate that the difference between the groups is not significant, researchers say that they fail to reject the null hypothesis. They cannot accept the null hypothesis; they can only fail to reject it, because there is always a chance, however small, that the difference is real but the experiment was not sensitive enough to confirm the research hypothesis.

IN THE CASE WHERE THE TEST FINDS A SIGNIFICANT DIFFERENCE!
The difference may be real or simply due to chance!
How do we determine whether the difference between samples is real or due to chance?

The Experiment:
Two groups of subjects take part in an experiment to see the effects of rehearsing faces once every hour via mental imagery. One group of subjects practises the rehearsal while the other does not. A test of facial memory is conducted 8 hours later to check the facial memory of the subjects.
The Hypotheses:
Ho: There is no real difference between subjects who use mental imagery and those who do not.
H1: Subjects who rehearse the faces once an hour via mental imagery score higher than subjects who do not rehearse.

Suppose you are a researcher who has just conducted an experiment investigating these hypotheses, and you need to decide which one is actually true. As you make this decision, there are two ways you can be correct and two ways you can be wrong (see Fig. 7).

Fig. 7 The Four Possible Outcomes of A Statistical Decision

You will be CORRECT if you reject the null hypothesis when it is in reality false or if you fail to reject it when it is in reality true.
However, if you decide to reject the null hypothesis when it is actually true (you accept a false research hypothesis), you will have committed an error called a Type I error (its probability is symbolized as α).

If you fail to reject the null hypothesis when it is actually false (you fail to accept a true research hypothesis), again you will have committed an error, called a Type II error (its probability is symbolized as β).

Strong Notes:
It is imperative that researchers take great pains to avoid making these errors, particularly Type I errors. When researchers accept research hypotheses that are in truth false, they can mislead not only themselves and other researchers but also people who apply the research results to the real world.
If Type I errors are committed in a series of learning studies and the results are applied to the classroom, countless hours of learning time may be wasted. Thus, Type I errors are considered to be much more serious than Type II errors.
When a Type II error is made because a researcher fails to reject the null hypothesis when the research hypothesis is actually true, the researcher or some other researcher may have to repeat the experiment, or a variation of it, sometime in the future. This may result in lost time, but it will not result in fallacious theories being implemented in the real world. Type II errors may slow down the progress of science, but they don't lead it down blind alleys.
Actually, if researchers select the appropriate statistical tests (the most widely used tests are explained in the following chapters) and apply them correctly, the chance of committing a Type I error is kept small and is controlled by the chosen significance level α, usually 0.05 (about 5 out of 100). The purpose of these tests is to determine whether the statistical differences recorded among the sample groups are significant.
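The claim that the Type I error rate is controlled by α can be checked by simulation. The sketch below assumes a hypothetical setup (population mean 100, SD 15, samples of 25, a two-tailed one-sample z-test at α = 0.05) in which H0 is true by construction, so every rejection is a Type I error. Over many repetitions the rejection rate should land close to 0.05.

```python
import math
import random
from statistics import NormalDist

# Hypothetical setup: H0 is TRUE, because every sample comes from a
# population whose mean really is MU_0.
MU_0 = 100.0
SIGMA = 15.0
N = 25
ALPHA = 0.05
REPLICATIONS = 10_000

# Two-tailed critical value: alpha/2 in each tail.
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)
SE = SIGMA / math.sqrt(N)  # standard error of the mean

rng = random.Random(42)  # fixed seed for a reproducible run
false_rejections = 0
for _ in range(REPLICATIONS):
    sample_mean = sum(rng.gauss(MU_0, SIGMA) for _ in range(N)) / N
    z = (sample_mean - MU_0) / SE
    if abs(z) > Z_CRIT:
        # Rejecting a true H0 is, by definition, a Type I error.
        false_rejections += 1

type_i_rate = false_rejections / REPLICATIONS
print(f"critical z = {Z_CRIT:.3f}")
print(f"observed Type I error rate = {type_i_rate:.4f} (alpha = {ALPHA})")
```

Lowering `ALPHA` to 0.01 shrinks the observed Type I error rate accordingly, at the cost of making real differences harder to detect (more Type II errors).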

Fig. 8 shows a distribution of all possible sample means; any one particular sample mean will fall somewhere within this distribution.

Fig. 8 The Distribution of Sample Means
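A distribution of sample means like the one in Fig. 8 can be approximated by drawing many samples and keeping each sample's mean. The sketch assumes a hypothetical normal population (mean 100, SD 15) and samples of size 25. The sample means cluster around the population mean with a spread close to σ/√n (here 15/5 = 3), and only about 5% of them fall more than 1.96 standard errors away, out in the tails.

```python
import math
import random
import statistics

# Hypothetical population parameters (illustration only).
MU = 100.0
SIGMA = 15.0
N = 25               # size of each sample
NUM_SAMPLES = 5_000  # number of sample means collected

rng = random.Random(7)  # fixed seed for a reproducible run

# Draw many samples and keep only each sample's mean.
sample_means = [
    statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
standard_error = SIGMA / math.sqrt(N)  # theoretical SD of sample means

# Share of sample means lying more than 1.96 standard errors from MU,
# i.e. out in the two tails of the distribution.
share_in_tails = sum(
    abs(m - MU) > 1.96 * standard_error for m in sample_means
) / NUM_SAMPLES

print(f"mean of sample means: {mean_of_means:.2f} (population mean {MU})")
print(f"SD of sample means:   {sd_of_means:.2f} (sigma / sqrt(n) = {standard_error:.2f})")
print(f"share in the tails:   {share_in_tails:.3f}")
```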

If that sample mean is near the population mean, as is X1 in Fig. 9, it is likely that Ho is true, and that any difference between the experimental group receiving the independent variable and the rest of the population not receiving it is negligible and is due merely to a sampling error. In such a case, the researcher should treat the sample as being part of the greater population (that is, fail to reject Ho).
On the other hand, if the sample mean lies in one of the tails of the distribution, as does X2 in Fig. 9, it is quite unlikely that Ho is true. In this case, the researcher should reject the null hypothesis and accept the research hypothesis. In other words, the experimental group receiving the independent variable is different enough from the population not receiving it that the researcher should consider the experimental group as part of a separate population that behaves in a distinct manner because of the effects of the independent variable.

Fig. 9 Sample mean X1 is near the mean of the distribution of sample means and therefore within the area where the null hypothesis is likely, whereas sample mean X2 is out in the tail of the distribution where the null hypothesis is unlikely.

Questions:
How far out in the tail should the sample mean lie before Ho can be rejected?
How much of a chance of committing a Type I error are researchers willing to take?

Answer:
Most psychologists agree that an α level of 0.05 (or simply α = 0.05) is reasonable, which means that the null hypothesis can reasonably be rejected if, assuming Ho is true, there is less than a 0.05 probability of obtaining a sample mean as extreme as the one observed; this keeps the probability of committing a Type I error at 0.05 or less. Therefore, the α levels should be set at the points in the tails where only 5% (0.05) of the distribution will yield more extreme scores. If a particular sample mean falls within these α areas of the curve, we can reject Ho (see Fig. 10).

Fig. 10 The placement of α areas for (a and b) a one-tailed test and (c) a two-tailed test. If a sample mean falls within the α area, we can reject the null hypothesis.
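The boundaries of the α areas are critical z values on the standard normal curve. A minimal sketch using Python's standard-library `statistics.NormalDist` finds the points that leave exactly 5% of the distribution beyond them: all of it in one tail for a one-tailed test, or 2.5% in each tail for a two-tailed test. It then checks that the tail areas add back up to α.

```python
from statistics import NormalDist

ALPHA = 0.05
std_normal = NormalDist()  # mean 0, SD 1: the z distribution

# One-tailed tests put the whole alpha area in one tail.
z_upper = std_normal.inv_cdf(1 - ALPHA)  # H1 predicts "above": reject if z > z_upper
z_lower = std_normal.inv_cdf(ALPHA)      # H1 predicts "below": reject if z < z_lower

# A two-tailed test splits alpha, putting alpha/2 in each tail.
z_two = std_normal.inv_cdf(1 - ALPHA / 2)  # reject if z < -z_two or z > z_two

# Check that the area beyond each critical value equals alpha.
area_upper = 1 - std_normal.cdf(z_upper)
area_lower = std_normal.cdf(z_lower)
area_two = std_normal.cdf(-z_two) + (1 - std_normal.cdf(z_two))

print(f"one-tailed, upper: z > {z_upper:.3f} (area {area_upper:.3f})")
print(f"one-tailed, lower: z < {z_lower:.3f} (area {area_lower:.3f})")
print(f"two-tailed: |z| > {z_two:.3f} (area {area_two:.3f})")
```

At α = 0.05 these are the familiar values of about 1.645 for a one-tailed test and 1.96 for a two-tailed test.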

ONE-TAILED HYPOTHESES
The decision to spread the α areas between the two tails of the distribution or to concentrate all of the α area in only one tail of the distribution depends on the research hypothesis.
If the research hypothesis specifies that the sample mean will definitely be above or definitely be below the population mean, it is of course reasonable to look in only one tail of the theoretical sampling distribution to see whether the sample mean is significant. For example, if our research hypothesis states that eating chocolate will improve memory, then we only need to look in the tail above the population mean for significance.
Research hypotheses that specify the direction of the experimental effect are called one-tailed hypotheses because we need to look in only the one specified tail for significance. Tests of hypotheses such as these are called one-tailed tests (see Fig. 11).

Fig. 11 The placement of α areas for (a and b) a one-tailed test and (c) a two-tailed test. If a sample mean falls within the α area, we can reject the null hypothesis.


TWO-TAILED HYPOTHESES
Many research hypotheses suggest merely that the sample mean will be different from the population mean without specifying the direction of the difference. Because the direction is not specified, it is necessary to look in both tails of the theoretical sampling distribution for significance (see Fig. 12). Tests of hypotheses such as these are known as two-tailed tests.
As an example, if our research hypothesis states that a diet including chocolate changes a person's memory for faces, it is not clear whether this change is for the better or for the worse. Since the hypothesis does not specify the direction of the change, it is a two-tailed hypothesis.

Fig. 12 Two-tailed statistical tests split α and put half of α in the tail below the mean and the other half of α in the tail above the mean.

How do we know when to use a one-tailed test and when to use a two-tailed test?
The decision to use a one-tailed test or a two-tailed test begins with your literature search and the statement of your research hypothesis.
If your knowledge of previous research leads you to believe that your experiment will result in the mean of one group or one level of the independent variable being greater than the mean of the other, then your research hypothesis should state this prediction and you will run a one-tailed test.
If it is not clear from your knowledge of previous research which group or level will have the larger mean, then your hypothesis will reflect this uncertainty and you will do a two-tailed test.
Summary:
* When the research hypothesis predicts a direction, do a one-tailed test.
* When it does not predict a direction, do a two-tailed test.

Example:
If you know that caffeine aids memory, you can hypothesize that people who use caffeine before taking a memory test will perform better than people who do not use caffeine. This hypothesis leads to a one-tailed test because the prediction is that one group will perform better than the other group.

In the real world of research, most researchers take the conservative approach and use the two-tailed test more often than the one-tailed test.
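The summary rule can be written as a small decision function. This is a sketch for illustration; the function name `reject_h0` and the labels `"greater"`, `"less"` and `"two-sided"` are hypothetical choices made here, not part of any particular statistics package. It compares an observed z score with the critical value for the chosen direction.

```python
from statistics import NormalDist

def reject_h0(z: float, alpha: float, alternative: str) -> bool:
    """Return True if H0 is rejected for an observed z score.

    alternative (hypothetical labels):
      "greater"   -> one-tailed, H1: mu > mu0 (alpha in the upper tail)
      "less"      -> one-tailed, H1: mu < mu0 (alpha in the lower tail)
      "two-sided" -> two-tailed, H1: mu != mu0 (alpha/2 in each tail)
    """
    std_normal = NormalDist()
    if alternative == "greater":
        return z > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":
        return z < std_normal.inv_cdf(alpha)
    if alternative == "two-sided":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError(f"unknown alternative: {alternative!r}")

z_observed = 1.80  # hypothetical test statistic
print(reject_h0(z_observed, 0.05, "greater"))    # True  (1.80 > 1.645)
print(reject_h0(z_observed, 0.05, "two-sided"))  # False (1.80 < 1.960)
```

The usage lines show why the two-tailed test is the conservative choice: the same z of 1.80 is significant when all of α sits in the predicted tail, but not when α is split between both tails.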


Eg. 1:
Using Table z, identify the critical values needed for rejection of the null hypothesis under the following conditions:

              Directionality
α         one-tailed    two-tailed
0.10      ________      ________
0.15      ________      ________
0.001     ________      ________

source: Fallik & Brown (1983) p 322 Pr10 Pr 15
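Answers read from Table z can be cross-checked with the inverse of the standard normal CDF. The sketch below uses the standard-library `statistics.NormalDist`; `critical_z` is a hypothetical helper name. It returns the positive critical value: for a one-tailed test use +z (upper tail) or −z (lower tail) depending on the predicted direction, and for a two-tailed test use ±z.

```python
from statistics import NormalDist

ALPHAS = [0.10, 0.15, 0.001]
std_normal = NormalDist()

def critical_z(alpha: float, two_tailed: bool) -> float:
    """Positive critical z; attach + or - according to the tail(s) tested."""
    tail_area = alpha / 2 if two_tailed else alpha
    return std_normal.inv_cdf(1 - tail_area)

print(f"{'alpha':>6} {'one-tailed':>11} {'two-tailed':>11}")
for a in ALPHAS:
    one = critical_z(a, two_tailed=False)
    two = critical_z(a, two_tailed=True)
    print(f"{a:>6} {one:>11.3f} {two:>11.3f}")
```

Printed tables may differ from these values in the last decimal place because of rounding in the table.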



Eg. 2:
A group of 36 students was selected to undergo a special program claimed to be an effective maths learning program. At the end of the one-year program, their maths performance is measured using a maths test, which gives a mean of 67 marks. It is assumed that the mean and standard deviation of the maths test scores of the whole population are 45 and 16 respectively. Run a suitable test to see if the program has actually assisted this group of learners to learn maths more effectively as compared with the whole population.

a. State all assumptions required before you can run the z-test.
b. State the null and alternative hypotheses. Run the hypothesis test using α = 0.01.
c. Determine the minimum score of the maths test which defines the cut-off value for the program to be significantly effective in assisting this group of learners in maths learning as compared to the whole population at α = 0.01.
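Parts (b) and (c) can be checked numerically. This sketch takes the values given in Eg. 2 and makes one labelled assumption: because the question asks whether the program helps learners do *better*, the test is treated as one-tailed in the upper tail, with $H_0: \mu = 45$ and $H_1: \mu > 45$. It also presumes the usual z-test conditions (a random sample, a known population SD, and an approximately normal distribution of sample means, which n = 36 makes reasonable).

```python
import math
from statistics import NormalDist

# Values given in Eg. 2.
N = 36
SAMPLE_MEAN = 67.0
POP_MEAN = 45.0
POP_SD = 16.0
ALPHA = 0.01

# Assumption: directional H1 (the program helps), so the test is
# one-tailed in the upper tail.  H0: mu = 45, H1: mu > 45.
se = POP_SD / math.sqrt(N)                # standard error: 16 / 6
z = (SAMPLE_MEAN - POP_MEAN) / se         # (67 - 45) / (16 / 6)
z_crit = NormalDist().inv_cdf(1 - ALPHA)  # upper-tail critical z for alpha = 0.01

decision = "reject H0" if z > z_crit else "fail to reject H0"

# Part (c): the smallest sample mean that would still be significant.
cutoff_mean = POP_MEAN + z_crit * se

print(f"standard error = {se:.3f}")
print(f"z = {z:.2f}, critical z (one-tailed, alpha = {ALPHA}) = {z_crit:.3f}")
print(f"decision: {decision}")
print(f"cut-off sample mean for significance: {cutoff_mean:.2f}")
```

With these numbers z = 8.25, far beyond the one-tailed critical value of about 2.326, and the cut-off mean works out to roughly 51.20 marks. If the hypothesis were instead stated without a direction, the critical value would come from α/2 in each tail and the cut-off would move accordingly.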
