Professional Documents
Culture Documents
STATISTICS FOR
EDUCATIONAL
RESEARCH
Project Directors:
Module Writer:
Moderators:
Developed by:
Printed by:
Table of Contents
Course Guide
Course Assignment Guide
Topic 1  Introduction to Statistics
  1.1  What is Statistics?
  1.2  Two Kinds of Statistics
    1.2.1  Descriptive Statistics
    1.2.2  Inferential Statistics
    1.2.3  Descriptive or Inferential Statistics
  1.3  Variables
    1.3.1  Independent Variable
    1.3.2  Dependent Variable
  1.4  Operational Definition of Variables
  1.5  Sampling
  1.6  Sampling Techniques
    1.6.1  Simple Random Sampling
    1.6.2  Systematic Sampling
    1.6.3  Stratified Sampling
    1.6.4  Cluster Sampling
  1.7  SPSS Software
  Summary
  Key Terms
Topic 2  Descriptive Statistics
  2.1  What are Descriptive Statistics?
  2.2  Measures of Central Tendency
    2.2.1  Mean
    2.2.2  Median
    2.2.3  Mode
  2.3  Measures of Variability or Dispersion
    2.3.1  Range
    2.3.2  Standard Deviation
  2.4  Frequency Distribution
    2.4.1  Tables
    2.4.2  SPSS Procedure
  2.5  Graphs
    2.5.1  Bar Charts
    2.5.2  Histogram
    2.5.3  Line Graphs
  Summary
  Key Terms
Topic 3  Normal Distribution
  3.1  What is Normal Distribution?
  3.2  Why is Normal Distribution Important?
  3.3  Characteristics of the Normal Curve
    3.3.1  Mean, Median and Mode
  3.4  Three-Standard-Deviations Rule
  3.5  Inferential Statistics and Normality
    3.5.1  Assessing Normality using Graphical Methods
    3.5.2  Assessing Normality using Statistical Techniques
  3.6  What to Do if the Distribution is Not Normal?
  Summary
  Key Terms
Topic 4  Hypothesis Testing
  4.1  What is a Hypothesis?
  4.2  Testing a Hypothesis
    4.2.1  Null Hypothesis
    4.2.2  Alternative Hypothesis
  4.3  Type I and Type II Error
  4.4  Two-tailed and One-tailed Test
    4.4.1  Two-tailed Test
    4.4.2  One-tailed Test
  Summary
  Key Terms
Topic 5  t-test
  5.1  What is the t-test?
  5.2  Hypothesis Testing using the t-test
  5.3  t-test for Independent Means
  5.4  t-test for Independent Means using SPSS
  5.5  t-test for Dependent Means
  5.6  t-test for Dependent Means using SPSS
  Summary
  Key Terms
Topic 6
Topic 7  Analysis of Covariance
Topic 8  Correlation
  8.1  What is a Correlation Coefficient?
  8.2  Pearson Product-Moment Correlation Coefficient
    8.2.1  Range of Values of r_xy
  8.3  Calculation of the Pearson Correlation Coefficient (r or r_xy)
  8.4  Pearson Product-Moment Correlation using SPSS
    8.4.1  SPSS Output
    8.4.2  Significance of the Correlation Coefficient
    8.4.3  Hypothesis Testing for Significant Correlation
    8.4.4  To Obtain a Scatter Plot using SPSS
  8.5  Spearman Rank Order Correlation Coefficient
  8.6  Spearman Rank Order Correlation using SPSS
  Summary
  Key Terms
Topic 9  Linear Regression
  9.1  What is Simple Linear Regression?
  9.2  Estimating Regression Coefficients
  9.3  Significance Tests for Regression Coefficients
    9.3.1  Testing the Assumption of Linearity
    9.3.2  Testing the Significance of the Slope
  9.4  Simple Linear Regression using SPSS
  9.5  Multiple Regression
  9.6  Multiple Regression using SPSS
  Summary
  Key Terms

Topic 10  Non-parametric Tests
  10.1  Parametric versus Non-parametric Tests
  10.2  Chi-Square Tests
    10.2.1  One Variable or Goodness-of-Fit Test
    10.2.2  χ² Test for Independence: 2 x 2
  10.3  Mann-Whitney U Tests
  10.4  Kruskal-Wallis Rank Sum Tests
  Summary
  Key Terms
Appendix
COURSE GUIDE
Course Synopsis
To enable you to achieve the FOUR objectives of the course, HMEF5113 is
divided into 10 topics. Specific objectives are stated at the start of each topic,
indicating what you should be able to do after completing the topic.
Topic 1:
Introduction
The topic introduces the meaning of Statistics and explains the
difference between descriptive and inferential statistics. As
inferential statistics is used to make inferences about the
population on specific variables based on a sample, this topic also
explains the meanings of different types of variables and
highlights the different sampling techniques in educational
research.
Topic 2:
Descriptive Statistics
The topic introduces the different descriptive statistics, namely the
mean, the median, the mode and the standard deviation, and how
they are computed. SPSS procedures on how to obtain these
descriptive statistics are also provided.
Topic 3:
Normal Distribution
Topic 4:
Hypothesis Testing
The topic explains the difference between the null and alternative
hypotheses and their use in research. It also introduces the
concepts of Type I error and Type II error. It illustrates the
difference between the two-tailed and one-tailed tests and
explains when they are used in hypothesis testing.
Topic 5:
t-test
This topic explains what the t-test is and its use in hypothesis
testing. It also highlights the assumptions for using the t-test. Two
types of t-test are elaborated in the topic. The first one is the t-test
for independent means, while the second one is the t-test for
dependent means. Computation of the t-statistic using formulae,
as well as the SPSS procedures, is explained.
Topic 6:
Topic 7:
Analysis of Covariance
This topic explains what analysis of covariance (ANCOVA) is
about and the assumptions for using ANCOVA in hypothesis
testing. It also demonstrates how to compute and interpret
ANCOVA using SPSS.
Topic 8:
Correlation
This topic explains the concept of linear relationship between
variables. It discusses the use of statistical tests to determine
correlation and demonstrates how to compute correlation between
variables using SPSS and interpret correlation results.
Topic 9:
Linear Regression
This topic explains the concept of causal relationship between
variables. It discusses the use of statistical tests to determine slope,
intercept and the regression equation. It also demonstrates how to
run regression analysis using SPSS and interpret the results.
xii
COURSE GUIDE
Topic 10:
Non-parametric Tests
This topic provides a brief explanation of parametric and non-parametric tests. Detailed descriptions of the chi-square, Mann-Whitney and Kruskal-Wallis tests and the assumptions underlying
these statistical techniques are provided to facilitate student
learning. It demonstrates how the non-parametric statistical
procedures can be computed using formulae as well as SPSS and
how the statistical results should be interpreted.
INTRODUCTION
Lists the headings and subheadings of each topic to provide an overview of the
contents of the topic and prepare you for the major concepts to be studied and
learned.
LEARNING OUTCOMES
This is a listing of what you should be able to do after successful
completion of a topic. In other words, it states whether you are able to
explain, compare, evaluate, distinguish, list, describe, relate and so forth.
You should use these indicators to guide your study. When you have finished a
topic, you must go back and check whether you have achieved the learning
outcomes or are able to do what is required of you. If you make a habit of
doing this, you will improve your chances of understanding the contents of
the course.
Copyright Open University Malaysia (OUM)
COURSE GUIDE
xiii
SELF-CHECK
Questions are interspersed at strategic points in the topic to encourage
review of what you have just read and retention of recently learned
material. The answers to these questions are found in the paragraphs
before the questions. This is to test immediately whether you have
understood the few paragraphs of text you have read. Working through
the questions will help you determine whether you understand the topic.
ACTIVITY
These are situations drawn from research projects to show how
knowledge of the principles of research methodology may be applied to
real-world situations. The activities illustrate key points and concepts
dealt with in each topic.
SUMMARY
The main ideas of each topic are listed in brief sentences to provide a review of
the content. You should ensure that you understand every statement listed. If
you do not, go back to the topic and find out what you do not know.
KEY TERMS
Key Terms discussed in the topic are placed at the end of each topic to make you
aware of the main ideas. If you are unable to explain these terms, you should go
back to the topic to clarify.
DISCUSSION QUESTIONS:
At the end of each topic, a list of questions is presented that are best solved
through group interaction and discussion. You can answer the questions
individually, but you are encouraged to work with your coursemates and
discuss them online and during the seminar sessions.
At the end of each topic, a list of articles and book titles directly related to
the contents of the topic is provided. As far as possible, the articles and
books suggested for further reading will be available in OUM's Digital Library
(which you can access) and OUM's Library. Relevant Internet resources are also
made available to enhance your understanding of selected curriculum concepts
and principles as applied in real-world situations.
Facilitator
Your facilitator will mark your assignment. Do not hesitate to discuss during the
seminar session or online if:
1. You do not understand any part of the course content or the assigned
readings; or
2.
(a) The most important step is to read the contents of this Course Guide
thoroughly.
(b) Organise a study schedule. Note the time you are expected to spend
on each topic and the date for submission of your assignment as well
as seminar and examination dates. These are stated in your Course
Assessment Guide. Put all this information in one place, such as your
diary or a wall calendar. Whatever method you choose to use, you
should decide on and jot down your own dates for working on each
topic. You have some flexibility as there are 10 topics spread over a
period of 14 weeks.
(c) Once you have created your own study schedule, make every effort to
stick to it. The main reason students are unable to cope is because
they get behind in their coursework.
(d) Work through the topic. (The contents of the topic have been arranged to
provide a sequence for you to follow.) Do the Activities to see if you can
apply the concepts learned to real-world situations.
(e) When you have completed the topic, review the learning outcomes to
confirm that you have achieved them and are able to do what is
required.
(f) If you are confident, you can proceed to the next topic. Proceed topic
by topic through the course and try to pace your study so that you
keep yourself on schedule.
(g) After completing all topics, review the course and prepare yourself for
the final examination. Check that you have achieved all topic learning
outcomes and the course objectives (listed in this Course Guide).
FINAL REMARKS
Once again, welcome to the course. To maximise your gain from this course
you should try at all times to relate what you are studying to the real world.
Look at the environment in your institution and ask yourself whether the ideas
discussed apply. Most of the ideas, concepts and principles you learn in this
course have practical applications. It is important to realise that much of what
COURSE ASSIGNMENT GUIDE
INTRODUCTION
This guide explains the basis on which you will be assessed in this course during
the semester. It contains details of the facilitator-marked assignments, final
examination and participation required for the course.
One element in the assessment strategy of the course is that all students should
have the same information as facilitators about the answers to be assessed.
Therefore, this guide also contains the marking criteria that facilitators will use in
assessing your work.
Please read through the whole guide at the beginning of the course.
ACADEMIC WRITING
(a) Plagiarism
(i) What is Plagiarism?
Any written assignment (essays, projects, take-home exams, etc.) submitted by a
student must not be deceptive regarding the abilities, knowledge or amount of
work contributed by the student. There are many ways that this rule can be
violated. Among them are:
Other sources:
Works by others:
Duplication: The student submits the same essay for two or more courses.
(b) Documenting Sources
Whenever you quote, paraphrase, summarise or otherwise refer to the work of
another, you are required to cite its original source. Offered here are some of
the most commonly cited forms of material.
Direct Citation
Indirect Citation
(c) Referencing
All sources that you cite in your paper should be listed in the References
section at the end of your paper. Here is how you should format your
References.
Journal Article
Online Journal
Webpage
Book
Article in a Book
Printed Newspaper
ASSESSMENT
Please refer to myVLE.
Topic 1
Introduction to Statistics
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Define statistics;
2. Differentiate between descriptive and inferential statistics;
3. Compare the different types of variables;
4. Explain the importance of sampling; and
5. Differentiate between the types of sampling procedures.
INTRODUCTION
This topic introduces the meaning of statistics and explains the difference between
descriptive and inferential statistics. As inferential statistics is used to make
inference about the population on specific variables based on a sample, this topic
also explains the meanings of different types of variables and highlights the
different sampling techniques in educational research.
1.1 WHAT IS STATISTICS?
Note that the word "mathematics" is mentioned in two of the definitions above,
while "science" is stated in the other definition. Some students are afraid of
mathematics and science. These students feel that since they are from the fields of
humanities and social sciences, they are weak in mathematics. Being terrified of
mathematics does not just happen overnight. Chances are that you may have had
bad experiences with mathematics in earlier years (Kranzler, 2007).
Fear of mathematics can lead to a defeatist attitude which may affect the way you
approach statistics. In most cases, the fear of statistics is due to irrational beliefs.
Just because you had difficulty in the past, does not mean that you will always
have difficulty with quantitative subjects. You have come this far in your
education and by doing this course in statistics, it is not likely that you are an
incapable person.
You have to convince yourself that statistics is not a difficult subject and you need
not worry about the mathematics involved. Identify your irrational beliefs and
thoughts about statistics. Are you telling yourself: "I'll never be any good in
statistics," "I'm a loser when it comes to anything dealing with numbers," or
"What will other students think of me if I do badly?"
For each of these irrational beliefs about your abilities, ask yourself what evidence
is there to suggest that "you will never be good in statistics" or that "you are weak
at mathematics." When you do that, you will begin to replace your irrational
beliefs with positive thoughts and you will feel better. You will realise that your
earlier beliefs about statistics are the cause of your unpleasant emotions. Each
time you feel anxious or emotionally upset, question your irrational beliefs. This
may help you to overcome your initial fears.
Keeping this in mind, this course presents statistics in a form that appeals to
those who fear mathematics. The emphasis is on the applied aspects of
statistics; with the aid of statistical software called the Statistical
Package for the Social Sciences (better known as SPSS), you need not worry
mathematical formulas have been kept to a minimum. Nevertheless, you still need
to know about the different formulas used, what they mean and when they are
used.
1.2 TWO KINDS OF STATISTICS
Statistics are all around you. Television uses a lot of statistics: for example, when
it reports that during the holidays, a total of 134 people died in traffic accidents;
the stock market fell by 26 points; or that the number of violent crimes in the city
has increased by 12%. Imagine a football game between Manchester United and
Liverpool and no one kept score! Without statistics, you could not plan your
budget, pay your taxes, enjoy games to their fullest, evaluate classroom
performance and so forth. Are you beginning to get the picture? We need
statistics. Generally, there are two kinds of statistics:
Descriptive Statistics
Inferential Statistics
1.2.1 Descriptive Statistics
Descriptive statistics are used to describe the basic features of the data in a study.
Historically, descriptive statistics began during Roman times, when the empire
undertook censuses of births, deaths, marriages and taxes. They provide simple
summaries about the sample and the measures.
analysis, they form the basis of virtually every quantitative analysis of data. With
descriptive statistics, you are simply describing what is or what the data show.
1.2.2 Inferential Statistics

1.2.3 Descriptive or Inferential Statistics
Descriptive statistics and inferential statistics are interrelated. You must always
use techniques of descriptive statistics to organise and summarise the information
obtained from a sample before carrying out an inferential analysis. Furthermore,
the preliminary descriptive analysis of a sample often reveals features that lead
you to the choice of the appropriate inferential method.
As you proceed through this course, you will obtain a more thorough
understanding of the principles of descriptive and inferential statistics. You should
establish the intent of your study. If the intent of your study is to examine and
explore the data obtained for its own intrinsic interest only, the study is
descriptive. However, if the information is obtained from a sample of a population
and the intent of the study is to use that information to draw conclusions about the
population, the study is inferential. Thus, a descriptive study may be performed on
a sample as well as on a population. Only when an inference is made about the
population, based on data obtained from the sample, does the study become
inferential.
SELF-CHECK 1.1
1. Define statistics.
2. Explain the differences between descriptive and inferential statistics.
3. When would you use the two types of statistics?
4. Explain two ways in which descriptive statistics and inferential
statistics are interrelated.
1.3 VARIABLES
Before you can use a statistical tool to analyse data, you need to have data which
have been collected. What is data? Data is defined as pieces of information which
are processed or analysed to enable interpretation. Quantitative data consist of
numbers, while qualitative data consist of words and phrases. For example, the
scores obtained from 30 students in a mathematics test are referred to as data. To
explain the performance of these students you need to process or analyse the
scores (or data) using a calculator or computer or manually. We collect and
analyse data to explain a phenomenon. A phenomenon is explained based on the
interaction between two or more variables. The following is an example of a
phenomenon:
Intelligence Quotient (IQ) and Attitude Influence
Performance in Mathematics
Note that there are THREE variables explaining the particular phenomenon,
namely, Intelligence Quotient, Attitude and Mathematics Performance.
What is a Variable?
A variable is a construct that is deliberately and consciously invented or adopted
for a special scientific purpose. For example, the variable Intelligence is a
construct based on observation of presumably intelligent and less intelligent
behaviours. Intelligence can be specified by observing and measuring using
intelligence tests, as well as interviewing teachers about intelligent and less
intelligent students. Basically, a variable is something that varies and has a
value. A variable is a symbol to which are assigned numerals or values. For
example, the variable mathematics performance is assigned scores obtained
from performance on a mathematics test and may vary or range from 0 to 100.
A variable can be either a continuous variable or a categorical variable. The
variable gender, for example, has only two values, male and female, and is
called a categorical variable. Other examples of categorical variables include
graduate/non-graduate, low income/high income and citizen/non-citizen. There
are also variables which have more than two values. For example, the variable
religion may take several values, such as Islam, Christianity, Sikhism,
Buddhism and Hinduism. Categorical variables are also known as nominal
variables. A continuous variable takes numeric values such as 1, 2, 3, 4, 10
and so on. An example is the score on mathematics performance, which ranges
from 0 to 100. Other examples are salary, age, IQ, weight, etc.
When you use any statistical tool, you should be very clear on which variables
have been identified as independent and which are dependent variables.
1.3.1 Independent Variable
1.3.2 Dependent Variable
Put another way, the DV is the variable predicted to, whereas the independent
variable is predicted from. The DV is the presumed effect, which varies with
changes or variation in the independent variable.
1.4 OPERATIONAL DEFINITION OF VARIABLES
Even though there are general principles of the discovery method, its application
in the classroom may vary. In other words, you have to define the variable
operationally, that is, state how it is used in the experiment.
SELF-CHECK 1.2
1.
What is a variable?
2.
3.
1.5 SAMPLING
Every day, we make judgments and decisions based on samples. For example,
when you pick a grape and taste it before buying the whole bunch of grapes, you
are doing a sampling. Based on the one grape you have tasted, you will make the
decision whether to buy the grapes or not. Similarly, when a teacher asks a student
two or three questions, he is trying to determine the student's grasp of an entire
subject. People are not usually aware that such a pattern of thinking is called
sampling.
A sample is that part of the population or universe which we select for the
purpose of investigation. The sample is used as an "example"; in fact, the
word sample is derived from the Latin exemplum, which means example. A
sample should exhibit the characteristics of the population or universe; it
should be a "microcosm," a word which literally means "small universe." In
Figure 1.2, the sample also consists of one #, $, @, & and %.
In most studies, investigation of the sample is the only way of finding out
about a particular phenomenon. In some cases, due to financial, time and
physical constraints, it is practically impossible to study the whole population.
Hence, an investigation of the sample is the only way of making a study.
If one were to study the population, then every item in the population is
studied. Imagine having to study 500,000 Form 5 students in Malaysia!
Wonder what the costs would be! Even if you had the money and time to study
the entire population of Form 5 students in the country, it might take so much
time that the findings would be of no use by the time they became available.
Studying the population may not be necessary, since we have sound sampling
techniques that will yield satisfactory results. Of course, we cannot expect
from a sample exactly the same answer that might be obtained from studying
the whole population.
1.6 SAMPLING TECHNIQUES
When some students are asked how they selected the sample for a study, quite a
few are unable to explain convincingly the techniques used and the rationale for
selecting the sample. If you have to draw a sample, you must choose the method
for obtaining the sample from the population. In making that choice, keep in mind
that the sample will be used to draw conclusions about the entire population.
Consequently, the sample should be a representative sample, that is, it should
reflect as closely as possible the relevant characteristics of the population under
consideration.
1.6.1 Simple Random Sampling
All individuals in the defined population have an equal and independent chance of
being selected as a member of the sample. Independent means that the selection
of one individual does not affect in any way the selection of any other individual.
So, each individual, event or object has an equal probability of being selected.
Suppose, for example, there are 10,000 Form 1 students in a particular district and
you want to select a simple random sample of 500 students. When we select the
first case, each student has one chance in 10,000 of being selected. Once the
student is selected, the next student to be selected has a 1 in 9,999 chance of being
selected. Thus, as each case is selected, the probability of being selected next
changes slightly because the population from which we are selecting has become
one case smaller.
Using a Table of Random Numbers (refer to Figure 1.3) to select a sample,
obtain a list of all Form 1 students in Daerah Petaling and assign a number to each
student. Then, get a table of random numbers which consists of a long series of
three or four digit numbers generated randomly by a computer. Using the table,
you randomly select a row or column as a starting point, then select all the
numbers that follow in that row or column. If more numbers are needed, proceed
to the next row or column until enough numbers have been selected to make up
the desired sample.
Say, for example, you choose line 3 and begin your selection. You will select
student #265, followed by student #313 and student #492. When you come to
805 you skip the number because you only need numbers between 1 and 500.
You proceed to the next number, i.e. student #404. Again you skip 550 and
proceed to select student #426. You continue until you have selected all 500
students to form your sample. To avoid repetition, you also eliminate numbers
that have occurred previously. If you have not found enough numbers by the time
you reach the bottom of the table, you move over to the next line or column.
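The table-of-random-numbers procedure described above is exactly what a computer can automate. As a minimal sketch (not part of the module; the seed is arbitrary and chosen only to make the draw reproducible), Python's `random.sample` draws without replacement, which corresponds to skipping numbers that have occurred previously in the table:

```python
import random

random.seed(3)  # arbitrary fixed seed so the draw is reproducible

# Number the 10,000 Form 1 students 1..10000, as with the list of students.
population = range(1, 10_001)

# random.sample draws without replacement, so no student number repeats --
# the same effect as eliminating previously selected numbers in the table.
sample = random.sample(population, k=500)

print(len(sample))       # 500 students selected
print(len(set(sample)))  # 500 -- confirms there are no duplicates
```

Every student still has an equal chance of inclusion, which is the defining property of simple random sampling.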
SELF-CHECK 1.3
1.
2.
3.
1.6.2 Systematic Sampling
1.6.3 Stratified Sampling
In certain studies, the researcher wants to ensure that certain sub-groups or
strata of individuals are included in the sample, and for this stratified
sampling is preferred. For example, if you intend to study differences in
reasoning skills among students in your school according to socio-economic
status and gender, random sampling may not ensure that you have a sufficient
number of male and female students at each socio-economic level. The size of
the sample in each stratum is usually taken in proportion to the size of the
stratum in the population.
[Table: four strata of 160, 140, 360 and 340 individuals; TOTAL 1,000]
ACTIVITY 1.3
Male, full-time teachers = 90
Male, part-time teachers = 18
Female, full-time teachers = 63
Female, part-time teachers = 9
The data above shows the number of full-time and part-time teachers in a
school according to gender. Select a sample of 40 teachers using stratified
sampling.
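One way to approach such an allocation can be sketched in Python, assuming proportional allocation (each stratum contributes to the sample of 40 in proportion to its share of the 180 teachers); this is a sketch of the technique, not the module's own worked answer:

```python
import random

random.seed(1)  # arbitrary seed for a reproducible draw

strata = {
    "male full-time": 90,
    "male part-time": 18,
    "female full-time": 63,
    "female part-time": 9,
}
total = sum(strata.values())  # 180 teachers in all
sample_size = 40

sample = {}
for name, size in strata.items():
    # Proportional allocation: this stratum's share of the 40 places.
    n = round(sample_size * size / total)
    # Simple random sample *within* the stratum (teachers numbered 1..size).
    sample[name] = random.sample(range(1, size + 1), k=n)

for name, members in sample.items():
    print(name, len(members))
```

With these numbers the allocation works out exactly: 20 male full-time, 4 male part-time, 14 female full-time and 2 female part-time teachers, totalling 40.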
1.6.4 Cluster Sampling
In cluster sampling, the unit of sampling is not the individual but rather a naturally
occurring group of individuals. Cluster sampling is used when it is more feasible
or convenient to select groups of individuals than it is to select individuals from a
defined population. Clusters are chosen to be as heterogeneous as possible, that is,
the subjects within each cluster are diverse and each cluster is somewhat
representative of the population as a whole. Thus, only a sample of the clusters
needs to be taken to capture all the variability in the population.
For example, in a particular district there are 10,000 households clustered into 25
sections. In cluster sampling, you draw a random sample of five sections or
clusters from the list of 25 sections or clusters. Then, you study every household
in each of the five sections or clusters. The main advantage of cluster sampling is
that it saves time and money. However, it may be less precise than simple random
sampling.
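The two-stage procedure in the household example (draw 5 of the 25 sections at random, then study every household in each drawn section) can be sketched in Python; the grouping of household IDs into sections is invented purely for illustration:

```python
import random

random.seed(7)  # arbitrary seed for a reproducible draw

# 10,000 households grouped into 25 sections (clusters). Here each cluster
# is simply a list of hypothetical household IDs, 400 per section.
households = list(range(10_000))
clusters = [households[i::25] for i in range(25)]

# Stage 1: randomly draw 5 of the 25 clusters.
chosen = random.sample(clusters, k=5)

# Stage 2: study *every* household in each chosen cluster.
sample = [h for cluster in chosen for h in cluster]

print(len(sample))  # 2,000 households studied in total
```

Note that the unit of random selection is the section, not the household, which is what distinguishes cluster sampling from simple random sampling.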
1.7 SPSS SOFTWARE

Summary
Descriptive statistics include the construction of graphs, charts and tables and
the calculation of various descriptive measures such as averages (means) and
measures of variation (standard deviations).
Operational definition means that variables used in the study must be defined
as they are used in the context of the study.
Population (universe) is defined as an aggregate of people, objects, items, etc.
possessing common characteristics, while sample is that part of the population
or universe we select for the purpose of investigation.
In cluster sampling, the unit of sampling is not the individual but rather a
natural group of individuals.
Key Terms
Cluster sampling
Dependent variable
Descriptive statistics
Independent variable
Inferential statistics
Nominal variable
Ordinal variable
Random sampling
Sampling
Statistics
Stratified sampling
Systematic sampling
Variable
Topic 2
Descriptive Statistics
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Explain what is meant by descriptive statistics;
2. Compute the mean;
3. Compute the standard deviation;
4. Explain the implication of differences in standard deviations;
5. Identify the median and the mode; and
6. Explain the types of charts used to display data.
INTRODUCTION
This topic introduces the different descriptive statistics, namely the mean, the
median, the mode and the standard deviation, and how they are computed. SPSS
procedures on how to obtain these descriptive statistics are also provided.
2.1 WHAT ARE DESCRIPTIVE STATISTICS?
Graphical methods are better suited than numerical methods for identifying
patterns in the data; numerical approaches are more precise and objective.
Descriptive statistics are typically distinguished from inferential statistics. With
descriptive statistics you are simply describing what is or what the data show
based on the sample. With inferential statistics, you are trying to reach
conclusions based on the sample that extend beyond the immediate data. For
instance, we use inferential statistics to infer from the sample data what the
population might think. Or, we use inferential statistics to make judgments of the
probability that an observed difference between groups is dependable or might
have happened by chance in this study. Thus, we use inferential statistics to make
inferences from our data to more general conditions; we use descriptive statistics
simply to describe what is going on in our data.
Descriptive statistics are used to present quantitative descriptions in a manageable
form. In a research study, we may have lots of measures or we may measure a
large number of people on any measure. Descriptive statistics help us to simply
depict large amounts of data in a sensible way. Each descriptive statistic reduces
lots of data into a simpler summary. For instance, consider Grade Point Average
(GPA). This single number describes the general performance of a student across
a potentially wide range of course experiences. The number describes a large
number of discrete events such as the grade obtained for each subject taken.
However, every time you try to describe a large set of observations with a single
indicator you run the risk of distorting the original data or losing important details.
The GPA does not tell you whether a student was in a difficult or easy course, or
whether the student was taking courses in his major field or in other disciplines.
Given these limitations, descriptive statistics provide a powerful summary of
phenomena that may enable comparisons across people or other units.
2.2
MEASURES OF CENTRAL TENDENCY
The mean and the standard deviation are the most widely used statistical tools in educational and psychological research. The mean is the most frequently used measure of central tendency, while the standard deviation is the most frequently used measure of variability or dispersion.
2.2.1
Mean
The mean of a set of ten scores is computed as follows:

Mean or X̄ = ΣX / N
= (23 + 22 + 26 + 21 + 30 + 24 + 20 + 27 + 25 + 32) / 10
= 250 / 10
= 25.0
In the computation of the mean, every score counts. As a result, extreme values at either end of the group or series of scores severely affect the value of the mean. The mean can be "pulled towards" the extreme scores, which may give a distorted picture of the group or series of scores.
However, in general, the mean is a good measure of central tendency for roughly
symmetric distributions but can be misleading in skewed distributions (see the
example on page 20) since it can be greatly influenced by extreme scores.
2.2.2
Median
Median is the score found at the exact middle of the set of values. One way to
compute the median is to list all scores in ascending order and then locate the
score in the centre of the sample. For example, if we order the following seven
scores as shown below, we would get:
12, 18, 22, 25, 30, 37, 40
Score 25 is the median because it represents the halfway point for the distribution
of scores.
Look at this set of eight scores. What is the median score?
15, 15, 15, 20, 20, 21, 25, 36
There are eight scores. The fourth score (20) and the fifth score (20) represent the
halfway point. Since both of these scores are 20, the median is 20.
If the two middle scores had different values, you would interpolate to determine the median by adding the two values and dividing the sum by 2. For example, for the scores
15, 15, 15, 18, 20, 21, 25, 36
the median is (18 + 20) / 2 = 19.
2.2.3
Mode
Mode is the most frequently occurring value in the set of scores. To determine the
mode, you might again order the scores as shown below and then count each one.
15, 15, 15, 20, 20, 21, 25, 36
The most frequently occurring value is the mode. In our example, the value 15
occurs three times and is the mode. In some distributions, there is more than one
modal value. For instance, in a bimodal distribution there are two values that
occur most frequently.
If the distribution is truly normal (i.e. bell-shaped), the mean, median and mode
are all equal to each other.
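The three measures for the eight scores above can be checked with Python's standard statistics module (a minimal sketch):

```python
import statistics

scores = [15, 15, 15, 20, 20, 21, 25, 36]

print(statistics.mean(scores))    # 20.875 -> sum of scores / number of scores
print(statistics.median(scores))  # 20.0   -> average of the 4th and 5th scores
print(statistics.mode(scores))    # 15     -> the most frequently occurring value
```

Note that with an even number of scores, median automatically averages the two middle values, exactly as in the interpolation rule above.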
Should You Use the Mean or the Median?
The mean and median are two common measures of central tendencies of a
typical score in a sample. Which of these two should you use when describing
your data? It depends on your data. In other words, you should ask yourself
whether the measure of central tendency you have selected gives a good
indication of the typical score in your sample. If you suspect that the measure of
central tendency selected does not give a good indication of the typical score, then
you most probably have chosen the wrong one.
The mean is the most frequently used measure of central tendency and it should
be used if you are satisfied that it gives a good indication of the typical score in
your sample. However, there is a problem with the mean. Since it uses all the
scores in a distribution, it is sensitive to extreme scores.
Example:
The mean of this set of nine scores:
(20 + 22 + 25 + 26 + 30 + 31 + 33 + 40 + 42) / 9 = 29.89
Copyright Open University Malaysia (OUM)
If we were to change the last score from 42 to 70, see what happens to the mean:
(20 + 22 + 25 + 26 + 30 + 31 + 33 + 40 + 70) / 9 = 33.00
Obviously, this mean is not a good indication of the typical score in this set of data. The extreme score has changed the mean from 29.89 to 33.00. If these were test scores, the mean may give the impression that students performed better overall when in fact only one student scored highly.
NOTE: Keep in mind this characteristic when interpreting the mean
obtained from a set of data.
If you find that you have an extreme score and you are unable to use the mean,
then you should use the median. The median is not sensitive to extreme scores. If
you examine the above example, the median is 30 in both distributions. The
reason is simply that the median score does not depend on the actual scores
themselves beyond putting them in ascending order. So the last score in a
distribution could be 80, 150 or 5,000 and the median still would not change. It is
this insensitivity to extreme scores that makes the median useful when you cannot
use the mean.
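The comparison above can be reproduced in a few lines of Python (a sketch using the nine scores from the example):

```python
import statistics

original = [20, 22, 25, 26, 30, 31, 33, 40, 42]
with_outlier = original[:-1] + [70]  # change the last score from 42 to 70

# the mean shifts noticeably when one extreme score is introduced
print(round(statistics.mean(original), 2))      # 29.89
print(round(statistics.mean(with_outlier), 2))  # 33.0

# the median is unaffected by the extreme score
print(statistics.median(original))      # 30
print(statistics.median(with_outlier))  # 30
```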
2.3
MEASURES OF VARIABILITY OR
DISPERSION
Variability or dispersion refers to the spread of the values around the central
tendency. There are two common measures of dispersion, the range and the
standard deviation.
2.3.1
Range
Range is simply the highest value minus the lowest value. For example, in a distribution, if the highest value is 36 and the lowest is 15, the range is 36 - 15 = 21.
2.3.2
Standard Deviation
The standard deviation measures the spread of scores around the mean. The formula for the sample standard deviation is:

SD = √[ Σ(X - X̄)² / (N - 1) ]

Using the same ten scores as before:

(a) Compute the mean of the scores (X̄ = 25).

(b) Subtract the mean from each score and square the result:

X        X - X̄           (X - X̄)²
23       23 - 25 = -2        4
22       22 - 25 = -3        9
26       26 - 25 = +1        1
21       21 - 25 = -4       16
30       30 - 25 = +5       25
24       24 - 25 = -1        1
20       20 - 25 = -5       25
27       27 - 25 = +2        4
25       25 - 25 =  0        0
32       32 - 25 = +7       49

Σ(X - X̄)² = 134

(c) Apply the formula:

SD = √[ Σ(X - X̄)² / (N - 1) ]
   = √(134 / (10 - 1))
   = √(134 / 9)
   = 3.8586
In Class B (Figure 2.2), there is a low variance or a small standard deviation, which means that most of the scores are clustered around the mean, i.e. most of the scores lie within ±3 of the mean. If the mean is 50, approximately 95% of the students scored between 47 and 53.
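The worked example in Section 2.3.2 can be checked with the standard-library statistics module; stdev divides by N - 1, exactly as in the formula (a minimal sketch):

```python
import statistics

scores = [23, 22, 26, 21, 30, 24, 20, 27, 25, 32]

mean = statistics.mean(scores)                 # 25
sum_sq = sum((x - mean) ** 2 for x in scores)  # 134, the sum of squared deviations
sd = statistics.stdev(scores)                  # sample SD, divides by N - 1
print(round(sd, 4))                            # 3.8586
```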
ACTIVITY 2.1
Below are the scores obtained by students in two classes on a history test:
Class A marks: 15, 25, 20, 20, 18, 22, 16, 24, 28, 12
Class B marks: 10, 30, 13, 27, 16, 24, 5, 35, 28, 12
(a) Compute the mean of the two classes.
(b) Compute the standard deviation of the two classes.
(c) Explain the implication of differences in standard deviations.
2.4
FREQUENCY DISTRIBUTION
2.4.1
Tables
Tables can contain a great deal of information but they also take up a lot of space
and may overwhelm readers with details. How should tables be presented in a
manner that can be easily understood? In general, frequency tables are best for variables with a limited number of categories (see Table 2.2).
Table 2.2: Question: Should Sex Education be Taught in Secondary School?
Response                  Frequency    Percent    Valid Percent    Cumulative Percent
4. Strongly Agree              1          7.7           7.7                7.7
3. Agree                       3         23.1          23.1               30.8
2. Disagree                    4         30.8          30.8               61.5
1. Strongly Disagree           5         38.5          38.5              100.0
Total                         13        100.0         100.0
Table 2.2 summarises the responses of 13 teachers with regard to the teaching of
sex education in secondary school.
The first column contains the values or categories of the variable (opinion on teaching sex education in schools, i.e. the extent of agreement).
The percent column lists the percentage of the whole sample in each
category. These percentages are based on the total sample size, including
those who did not answer the question. Those who did not answer will be
shown as missing cases in this column.
The valid percent column contains the percentage of those who gave a valid response to the question belonging to each category. When there are no missing cases, the valid percent column is identical to the percent column.
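The percent and cumulative percent columns of Table 2.2 can be reproduced with a short script (a sketch; the individual responses below are reconstructed from the table's frequencies and are therefore hypothetical):

```python
from collections import Counter

# hypothetical raw data: 13 responses coded 4 = Strongly Agree ... 1 = Strongly Disagree
responses = [4, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 1]

counts = Counter(responses)
n = len(responses)
cumulative = 0.0
for value in sorted(counts, reverse=True):  # list 4 (Strongly Agree) first, as in Table 2.2
    pct = 100 * counts[value] / n
    cumulative += pct
    print(value, counts[value], round(pct, 1), round(cumulative, 1))
```

Running this prints the frequency, percent and cumulative percent for each category, matching Table 2.2 row by row.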
2.4.2
SPSS Procedure
1.
Select Analyze from the main menu.
2.
Click Descriptive Statistics and then Frequencies... to open the Frequencies dialogue box.
3.
Select the variable(s) you require (i.e. opinion on sex education) and click on the arrow button to move the variable into the Variable(s) box.
4.
Click the Statistics... command push button to open the Frequencies: Statistics sub-dialogue box.
5.
In the Central Tendency box, select the Mean, Median and Mode check boxes.
6.
In the Dispersion box, select the Std. deviation and Range check boxes.
7.
Click Continue and then OK.
2.5
GRAPHS
2.5.1
Bar Charts
The following are elements of a graph that should be given due consideration
(refer to Figure 2.3):
The X-axis represents the values of the variable being displayed. The X-axis may be divided into discrete categories (bar charts) or continuous values (line graphs). Which units are used depends on the level of measurement of the variable being graphed.
In the example in Figure 2.3, the X-axis represents the students' gain scores after undergoing an innovative instructional programme.
2.5.2
Histogram
Histograms are different from bar charts because they are used to display
continuous variables (see the histogram in Figure 2.4).
Figure 2.4: Percentage who agreed that sex education should be taught
in secondary schools
The X-axis represents the different age groups, while the Y-axis represents the
percentages of respondents.
Each bar in the X-axis represents one age group in ascending order.
The Y-axis in this case represents the percentages of respondents in the Sex
Education survey.
Among the 18 to 28 age group, only 20% agreed that sex education should
be taught in schools compared to 60% in the 51 to 61 age group.
About 40% in the 40 to 50 age group and 50% among the 29 to 39 age
group agreed that sex education should be taught in secondary schools.
Only 10% of those aged 73 years and older agreed that secondary school
students should be taught sex education.
2.5.3
Line Graphs
The line graph serves a similar function as a histogram. It should be used for
continuous variables. The main differences between a line graph and a histogram
are that on a line graph, the frequency of any value on the X-axis is represented by
a point on a line rather than by a single column and the values of the continuous
variable are not automatically grouped into a smaller number of groups as they are
in histograms. As such, the line graph reflects the frequencies or percentages of
every value of the x variable and thus avoids potential distortions due to the way
in which the values are grouped.
The line graph in Figure 2.5 shows the frequency of using the library among a
group of male and female respondents. The level of measurement of the Y-axis
variable is ordinal or interval. Line graphs are more suitable for variables that
have more than five or six categories. They are less suited for variables with a
very large number of values as this can produce a very jagged and confusing
graph.
Since a separate line is produced for each category of the x variable, only x variables with a small number of categories should be used. This will normally mean that the x variable is a nominal or ordinal variable.
ACTIVITY 2.2
Interpret the line graph (Figure 2.5) showing the frequency of a group of
respondents visiting the library. A separate line is used for male and
female respondents.
Mean, median and mode are common descriptive statistics used to measure
central tendency, while standard deviation is the commonly used statistic to
measure variability or dispersion of data.
Graphs are also used to condense large sets of data and these include the use
of bar charts, histograms and line graphs.
Frequency distribution
Graphs
Mean
Measures of central tendency
Measures of variability or dispersion
Median
Mode
Range
Standard deviation
Topic 3
Normal Distribution
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1.
2.
3.
4.
5.
INTRODUCTION
This topic explains what normal distribution is and introduces the graphical as
well as the statistical techniques used in assessing normality. It also presents SPSS
procedures for assessing normality.
3.1
WHAT IS NORMAL DISTRIBUTION?
Now that you know what the mean and the standard deviation of a set of scores are, we can proceed to examine the concept of normal distribution. The normal curve was developed mathematically in 1733 by de Moivre as an approximation to the binomial distribution. Laplace used the normal curve in 1783 to describe the distribution of errors. However, it was Gauss who popularised the normal curve when he used it to analyse astronomical data in 1809, and it became known as the Gaussian distribution.
The term normal distribution refers to a particular way in which scores or observations tend to pile up or distribute around a particular value rather than be spread evenly across the range of values.
3.2
WHY IS NORMAL DISTRIBUTION IMPORTANT?
Many kinds of statistical tests (such as t-test, ANOVA) are derived from a
normal distribution. In other words, most of these statistical tests work best
when the sample tested is distributed normally.
Fortunately, these statistical tests work very well even if the distribution is only
approximately normally distributed. Some tests work well even with very wide
deviations from normality. They are described as robust tests that are able to
tolerate the lack of a normal distribution.
3.3
CHARACTERISTICS OF THE NORMAL CURVE
Figure 3.1 shows a normal distribution of IQ scores with a mean of 100. As you can see, the distribution is symmetric. If you folded the graph in the centre, the two sides would match, i.e. they are identical.
3.3.1
Mean, Median and Mode
The centre of the distribution is the mean. The mean of a normal distribution is
also the most frequently occurring value (i.e. the mode) and it is also the value
that divides the distribution of scores into two equal parts (i.e. the median). In any
normal distribution, the mean, median and the mode all have the same value (i.e.
100 in the example above).
3.4
THREE-STANDARD-DEVIATIONS RULE
For a normal distribution, the proportion of scores falling in any interval corresponds to the area under the curve. The three-standard-deviations rule, when applied to a variable, states that almost all the possible observations or scores of the variable lie within three standard deviations on either side of the mean. The normal curve is close to (but does not touch) the horizontal axis outside the range of three standard deviations on either side of the mean. Based on the graph in Figure 3.1, you will notice that with a mean of 100 and a standard deviation of 15:
About 68% of all IQ scores fall between 85 (one standard deviation below the mean: 100 - 15 = 85) and 115 (one standard deviation above the mean: 100 + 15 = 115).
About 95% of all IQ scores fall between 70 (two standard deviations below the mean: 100 - 30 = 70) and 130 (two standard deviations above the mean: 100 + 30 = 130).
About 99.7% of all IQ scores fall between 55 (three standard deviations below the mean: 100 - 45 = 55) and 145 (three standard deviations above the mean: 100 + 45 = 145).
A normal distribution can have any mean and standard deviation. However, the
percentage of cases or individuals falling within one, two or three standard
deviations from the mean is always the same. The shape of a normal distribution
does not change. Means and standard deviations will differ from variable to
variable but the percentage of cases or individuals falling within specific intervals
is always the same in a true normal distribution.
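These percentages can be verified with Python's statistics.NormalDist (a sketch using the IQ example, mean 100 and standard deviation 15):

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

for k in (1, 2, 3):
    lower, upper = 100 - k * 15, 100 + k * 15
    share = iq.cdf(upper) - iq.cdf(lower)  # area under the curve between the bounds
    print(f"within {k} SD ({lower}-{upper}): {share * 100:.1f}%")
```

This prints approximately 68.3%, 95.4% and 99.7%, and the same three figures come out whatever mean and sigma are used, illustrating that the shape of the normal distribution does not change.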
ACTIVITY 3.1
1. What is meant by the statement that a population is normally
distributed?
2. Two normally distributed variables have the same means and the
same standard deviations. What can you say about their distributions?
Explain your answer.
3. Which normal distribution has a wider spread: the one with mean 1
and standard deviation 2 or the one with mean 2 and standard
deviation 1? Explain your answer.
4. The mean of a normal distribution has no effect on its shape. Explain.
5. What are the parameters for a normal curve?
3.5
INFERENTIAL STATISTICS AND NORMALITY
Often in statistics, one would like to assume that the sample under investigation
has a normal distribution or an approximate normal distribution. However, such
an assumption should be supported in some way by some techniques. As
mentioned earlier, the use of several inferential statistics such as the t-test and
ANOVA require that the distribution of the variables analysed are normally
distributed or at least approximately normally distributed. However, as discussed
in Topic 1, if a simple random sample is taken from a population, the distribution
of the observed values of a variable in the sample will approximate the
distribution of the population. Generally, the larger the sample, the better the
approximation tends to be. In other words, if the population is normally
distributed, the sample of observed values would also be normally distributed if
the sample is randomly selected and it is large enough.
3.5.1
Assessing Normality using Graphical Methods
A simple first check is to inspect the shape of the sample distribution: if the sample is reasonably large and it comes from a normal population, its distribution should look more or less normal.
For example, when you administer a questionnaire to a group of school principals,
you want to be sure that your sample of 250 principals is normally distributed.
Why? The assumption of normality is a prerequisite for many inferential
statistical techniques and there are two main ways of determining the normality of
distribution.
The normality of a distribution can be determined using graphical methods (such as histograms, stem-and-leaf plots and boxplots) or using statistical procedures (such as the Kolmogorov-Smirnov statistic and the Shapiro-Wilk statistic).
SPSS Procedures for Assessing Normality
There are several procedures for obtaining the graphs and statistics used to assess normality. The EXPLORE procedure is the most convenient when both graphs and statistics are required.
From the main menu, select Analyse.
Click Descriptive Statistics and then Explore ....to open the Explore dialogue
box.
Select the variable you require and click the arrow button to move this variable
into the Dependent List: box.
Click the Plots...command push button to obtain the Explore: Plots subdialogue box.
Click the Histogram check box and the Normality plots with tests check box,
and ensure that the Factor levels together radio button is selected in the
Boxplots display.
Click Continue.
In the Display box, ensure that Both is activated.
Click the Options...command push button to open the Explore: Options subdialogue box.
In the Missing Values box, click the Exclude cases pairwise radio button (if it is not selected by default).
Click Continue and then OK.
(a)
(b)
What does it mean? It means that more students were getting low scores in
the test and this indicates that the test was too difficult. Alternatively, it
could mean that the questions were not clear or the teaching methods and
materials did not bring about the desired learning outcomes.
Refer to Figure 3.4 which shows the distribution of the scores obtained by
students on a test. There is a negative skew because it has a longer tail in the
negative direction or to the left (towards the lower values on the horizontal
axis).
What does it mean? It means that more students were getting high scores on
the test. This may indicate that either the test was too easy or the teaching
methods and materials were successful in bringing about the desired
learning outcomes.
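Skewness can also be quantified numerically rather than judged from the histogram alone. A minimal sketch (using the population form of the moment coefficient of skewness; positive values indicate a longer right tail, negative values a longer left tail):

```python
import statistics

def skewness(xs):
    # Fisher-Pearson moment coefficient of skewness (population form)
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    n = len(xs)
    return sum((x - m) ** 3 for x in xs) / (n * s ** 3)

right_skewed = [1, 1, 2, 2, 3, 10]  # long right tail -> positive skew (hard test)
left_skewed = [1, 8, 9, 9, 10, 10]  # long left tail -> negative skew (easy test)

print(skewness(right_skewed) > 0)  # True
print(skewness(left_skewed) < 0)   # True
```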
(c)
Kurtosis
(i)
Low Kurtosis: Data with low kurtosis tend to have a flat top near the
mean rather than a sharp peak.
(ii)
High Kurtosis: Data with high kurtosis tend to have a distinct peak
near the mean, decline rather rapidly and have a heavy tail.
If, on the other hand, the distribution is flat, the kurtosis value is less than 0; such a distribution (Graph C) is described as platykurtic and has negative kurtosis.
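Like skewness, kurtosis can be computed directly. A minimal sketch (using the population form of excess kurtosis, where 0 corresponds to the normal curve, positive values are leptokurtic and negative values platykurtic):

```python
import statistics

def excess_kurtosis(xs):
    # fourth standardised moment minus 3 (population form)
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    n = len(xs)
    return sum((x - m) ** 4 for x in xs) / (n * s ** 4) - 3

peaked = [5, 5, 5, 5, 5, 5, 5, 1, 9]  # sharp peak near the mean, heavy tails
flat = [1, 2, 3, 4, 5, 6, 7, 8, 9]    # evenly spread scores, flat top

print(excess_kurtosis(peaked) > 0)  # True  (leptokurtic, high kurtosis)
print(excess_kurtosis(flat) < 0)    # True  (platykurtic, low kurtosis)
```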
(d)
Boxplot
(i)
The BOX
The box has hinges that form the outer boundaries of the box. The
hinges are the scores that cut off the top and bottom 25% of the data.
Thus, 50% of the scores fall within the hinges. The thick horizontal
line through the box represents the median. In the case of a normal
distribution, the line runs through the centre of the box.
If the median is closer to the top of the box, then the distribution is
negatively skewed. If it is closer to the bottom of the box, then it is
positively skewed.
(ii)
WHISKERS
The smallest and largest observed values within the distribution are
represented by the horizontal lines at either end of the box, commonly
referred to as whiskers.
The two whiskers indicate the spread of the scores.
Scores that fall outside the upper and lower whiskers are classified as
extreme scores or outliers. If the distribution has any extreme scores,
i.e. 3 or more box lengths from the upper or lower hinge, these will be
represented by a circle (o).
When an outlier is found, we should investigate why the score is so extreme. Could it be that you made an error in data entry?
Why is it important to identify outliers? This is because many of the
statistical techniques used involve calculation of means. The mean is
sensitive to extreme scores and it is important to be aware whether
your data contain such extreme scores if you are to draw conclusions
from the statistical analysis conducted.
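The boxplot's fences can be mimicked numerically with the interquartile range (the box length). A minimal sketch; the multiplier k plays the role of the "number of box lengths" mentioned above, and both the value 1.5 and the sample scores here are illustrative assumptions:

```python
import statistics

def beyond_fences(scores, k=1.5):
    # hinges: the values cutting off the bottom and top 25% of the data
    q1, _, q3 = statistics.quantiles(scores, n=4)
    box_length = q3 - q1  # the interquartile range (height of the box)
    lower, upper = q1 - k * box_length, q3 + k * box_length
    return [x for x in scores if x < lower or x > upper]

scores = [15, 15, 15, 18, 20, 21, 25, 80]  # 80 looks like a data-entry error
print(beyond_fences(scores))       # [80]
print(beyond_fences(scores, k=3))  # [80] -- 80 is 3 or more box lengths out as well
```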
(e)
Normal Probability Plot
The normal probability plot is formed by plotting each observed value against the z-score expected for its rank under normality. If the sample is from a normal distribution, then the observed values or scores fall more or less in a straight line.
SPSS Procedures
1. Select Analyze from the main menu.
2. Click Descriptive Statistics and then Explore.....to open the Explore
dialogue box.
3. Select the variable you require (i.e. mathematics score) and click on
the arrow button to move this variable to the Dependent List: box.
4. Click the Plots....command push button to obtain the Explore: Plots
sub dialogue box.
5. Click the Histogram check box and the Normality plots with tests
check box and ensure that the Factor levels together radio button is
selected in the Boxplots display.
6. Click Continue.
7. In the Display box, ensure that Both is activated.
8. Click the Options....command push button to open the Explore:
Options sub-dialogue box.
9. In the Missing Values box, click on the Exclude cases pairwise radio
button. If this option is not selected then, by default, any variable with
missing data will be excluded from the analysis. That is, plots and
statistics will be generated only for cases with complete data.
10. Click on Continue and then OK.
Note that these commands will give you the 'Histogram', 'Stem-and-leaf
plots', 'Boxplots' and 'Normality Plots'.
When you use a normal probability plot to assess the normality of a variable, remember that judging whether the distribution is roughly linear, and hence normal, is subjective. The graph in Figure 3.10 is an example of a normal probability plot. Though none of the values falls exactly on the line, most of the points are very close to it.
Values that are above the line represent units for which the observation
is larger than its normal score
Values that are below the line represent units for which the observation
is smaller than its normal score
Note that there is one value that falls well outside the overall pattern of the
plot. It is called an outlier and you will have to remove the outlier from the
sample data and redraw the normal probability plot.
Even with the outlier, the values are close to the line and you can conclude
that the distribution will look like a bell-shaped curve. If the normal scores
plot departs only slightly from having all of its dots on the line, then the
distribution of the data departs only slightly from a bell-shaped curve. If one
or more of the dots departs substantially from the line, then the distribution
of the data is substantially different from a bell-shaped curve.
Outliers:
Refer to the normal probability plot in Figure 3.11. Note that there are
possible outliers which are values lying off the hypothetical straight line.
Outliers are anomalous values in the data which may be due to recording
errors, which may be correctable, or they may be due to the sample not
being entirely from the same population.
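The plot itself can be constructed by hand: sort the data and pair each value with the z-score expected for its rank. A minimal sketch (the (i - 0.5)/n plotting position is one common convention; others exist):

```python
from statistics import NormalDist

def normal_scores(n):
    # expected z-score for each rank, using the (i - 0.5)/n plotting position
    return [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

sample = sorted([12, 18, 22, 25, 30, 37, 40])
pairs = list(zip(normal_scores(len(sample)), sample))
for z, x in pairs:
    print(f"{z:+.3f}  {x}")  # points near a straight line suggest normality
```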
ACTIVITY 3.2
3.5.2
Assessing Normality using Statistical Techniques
The normality of a distribution should not be judged on the basis of a statistical test alone. In particular, when the sample is large, statistical
tests for normality can be sensitive to very small (i.e. negligible) deviations in
normality. Therefore, if the sample is very large, a statistical test may reject the
assumption of normality when the data set, as shown using graphical methods, is
essentially normal and the deviation from normality is too small to be of practical
significance.
(a)
Kolmogorov-Smirnov Test
You could use the Kolmogorov-Smirnov test to evaluate statistically whether the difference between the observed distribution and a theoretical normal distribution is small enough to be due to chance alone. If it could be due to chance, you would treat the distribution as being normal. If the difference between the actual distribution and the theoretical normal distribution is too large to be attributed to chance (sampling error), you would treat the actual distribution as not being normal.
In terms of hypothesis testing, the Kolmogorov-Smirnov test is based on Ho:
that the data are normally distributed. The test is used for samples which
have more than 50 subjects.
H0: The data are normally distributed.
Ha: The data are not normally distributed.

Kolmogorov-Smirnov Test (Distribution: Normal)

           Statistic      df       Sig.
SCORE        .21         1598      .000*

Since the reported significance value (.000) is less than .05, the null hypothesis is rejected: the distribution of the scores is not normal.
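The Kolmogorov-Smirnov statistic itself is simply the largest gap between the empirical distribution of the sample and the fitted normal curve. A minimal sketch of the statistic (the critical values and p-values that SPSS reports are not computed here):

```python
from statistics import NormalDist, fmean, stdev

def ks_statistic(scores):
    # largest absolute gap between the empirical CDF and a fitted normal CDF
    xs = sorted(scores)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        d = max(d, abs(i / n - cdf), abs((i - 1) / n - cdf))
    return d

print(ks_statistic([12, 18, 22, 25, 30, 37, 40]))  # small values suggest normality
```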
(b)
Shapiro-Wilk Test
Another powerful and commonly employed test for normality is the Shapiro-Wilk test. It is an effective method for testing whether a data set has been drawn from a normal distribution.
If the normal probability plot has curvature that is evidence of non-normality in the tails of the distribution, the test statistic will be relatively low.
In terms of hypothesis testing, the Shapiro-Wilk test is based on Ho: that the
data are normally distributed. The test is used for samples which have less
than 50 subjects.
H0: The data are normally distributed.
Ha: The data are not normally distributed.

Shapiro-Wilk Test (Distribution: Normal)

             Statistic     df     Sig.
Group 1        .912        22     .055
Group 2        166         14     .442
Group 3        .900        16     .084
The Shapiro-Wilk normality tests indicate that the scores are normally
distributed in each of the three groups. All the p-values reported are more
than 0.05 and hence you DO NOT REJECT the null hypothesis.
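The decision rule applied above can be written out explicitly (a sketch; the Sig. values are taken from the table):

```python
alpha = 0.05
sig_values = {"Group 1": 0.055, "Group 2": 0.442, "Group 3": 0.084}  # Sig. from the table

for group, p in sig_values.items():
    if p > alpha:
        print(group, "p =", p, "-> do not reject H0: treat as normally distributed")
    else:
        print(group, "p =", p, "-> reject H0: treat as not normally distributed")
```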
NOTE:
It should be noted that with large samples, even a very small deviation from
normality can yield low significance levels. So a judgment still has to be made as
to whether the departure from normality is large enough to matter.
3.6
WHAT TO DO IF THE DISTRIBUTION IS NOT NORMAL?
You have TWO choices if the distribution is not normal and they are:
(a)
(b)
           Statistic      df      Sig.
SCORE        0.57         999     .200*
The use of several inferential statistics such as t-tests and ANOVA requires
that the variables analysed are normally distributed or at least approximately
normally distributed.
The graphical methods used to assess normality are the histogram, the boxplot and the normal probability plot.
The statistical techniques used to assess normality are the Kolmogorov-Smirnov test and the Shapiro-Wilk test.
Boxplot
Histogram
Kolmogorov-Smirnov test
Normal distribution
Normal probability plot
Shapiro-Wilk test
Topic 4
Hypothesis Testing
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Explain the difference between the null and alternative hypotheses and their use in research;
2. Differentiate between Type I and Type II errors; and
3. Explain when the two-tailed and one-tailed test is used.
INTRODUCTION
The topic explains the difference between the null and alternative hypotheses and
their use in research. It also introduces the concepts of Type I error and Type II
error. It illustrates the difference between the two-tailed and one-tailed tests and
explains when they are used in hypothesis testing.
4.1
WHAT IS A HYPOTHESIS?
Your car did not start. You have a hunch and put forward the hypothesis that "the
car does not start because there is no petrol." You check the fuel gauge to either
accept or reject the hypothesis. If you find there is petrol, you reject the
hypothesis.
Next, you hypothesise that "the car did not start because the spark plugs are dirty."
You check the spark plugs to determine if they are dirty. You find that the spark
plugs are indeed dirty. You do not reject the hypothesis.
(ii)
Children who attend kindergarten are more likely to have higher reading
scores.
(iii) The discovery method of teaching may enhance the creative thinking skills
of students.
(iv) Children who go for tuition tend to perform better in mathematics.
All these are examples of hypotheses. However, these statements are not particularly useful because of words such as "may," "tend to" and "more likely." Using such tentative words does not suggest how you would go about testing them. To solve this problem, a hypothesis should state a possible prediction in terms of measurable variables.
Examine the hypothesis in Figure 4.1. It has all the attributes mentioned:
The variables are "critical thinking" and "gender," which are both measurable.
ACTIVITY 4.1
1. Rewrite the four hypotheses using the formalised style shown. Ensure
that each hypothesis has all the attributes stated.
2. Write two more original hypotheses of your own using this form.
4.2
TESTING A HYPOTHESIS
4.2.1
Null Hypothesis
The null hypothesis is a hypothesis (or hunch) about the population. It represents a
theory that has been put forward because it is believed to be true. The word "null"
means nothing or zero. So, a null hypothesis states that nothing happened. For
example, there is no difference between males and females in critical thinking
skills or there is no relationship between socio-economic status and academic
performance. Such a hypothesis is denoted with the symbol "Ho:".
Say, for example, you conduct an experiment to test the effectiveness of the
discovery method in learning science compared to the lecture method. You select
a random sample of 30 students for the discovery method group and 30 students
for the lecture method group (see Topic 1 on Random Sampling).
Based on your sample, you hypothesise that there are no differences in science
achievement between students in the discovery method group and students in the
lecture method group. In other words, you make the claim that there are no
differences in science scores between the two groups in the population. This is
represented by the following two forms of the null hypothesis, denoted Ho:

Ho: μ1 = μ2
OR
Ho: μ1 - μ2 = 0

The science mean score for the discovery method group (μ1) is EQUAL to the mean score for the lecture method group (μ2).
The science mean score for the discovery method group (μ1) MINUS the mean score for the lecture method group (μ2) is equal to ZERO.
The null hypothesis is often the reverse of what the researcher actually believes in
and it is put forward to allow the data to contradict it (You may find it strange but
it has its merit!).
Based on the findings of the experiment, you found that there was a significant
difference in science scores between the discovery method group and the lecture
method group.
In fact, the mean score of subjects in the discovery method group was HIGHER
than the mean of subjects in the lecture method group. What do you do?
You REJECT the null hypothesis because earlier you had said they would be
equal.
4.2.2
Alternative Hypothesis
Ha: μ1 ≠ μ2
The alternative hypothesis might be that the science mean scores of the discovery method group and the lecture method group are DIFFERENT.

Ha: μ1 > μ2
The alternative hypothesis might be that the science mean scores of the discovery method group are HIGHER than the mean scores of the lecture method group.

Ha: μ1 < μ2
The alternative hypothesis might be that the science mean scores of the discovery method group are LOWER than the mean scores of the lecture method group.
SELF-CHECK 4.1
1. What is the meaning of a null hypothesis?
2. What do you mean when you "reject" the null hypothesis?
3. What is the alternative hypothesis?
4. What do you mean when you "accept" the alternative hypothesis?
4.3
TYPE I AND TYPE II ERROR
You can claim that the two means are not equal in the population when in fact they are.
Or you can fail to say that there is a difference when there really is a difference.
The null hypothesis can be true or false and you can reject or not reject the null
hypothesis. There are four possible situations which arise in testing a hypothesis
and they are summarised in Figure 4.2.
                          Ho is TRUE              Ho is FALSE
Do Not Reject Ho:         Correct Decision        Risk committing
[Say it is TRUE]          [no problem]            Type II Error

Reject Ho:                Risk committing         Correct Decision
[Say it is FALSE]         Type I Error            [no problem]
You decide to Reject the Null Hypothesis (Ho). You have made a correct decision if in the real world the null hypothesis is FALSE.
You decide to Reject the Null Hypothesis (Ho). You risk committing a Type I error if in the real world the null hypothesis is TRUE.
You decide NOT to Reject the Null Hypothesis (Ho). You risk committing a Type II error if in the real world the null hypothesis is FALSE.
You decide NOT to Reject the Null Hypothesis (Ho). You have made a correct decision if in the real world the null hypothesis is TRUE.
In other words, when you detect a difference in the sample you are studying and a
difference really exists in the population, you are fine. Likewise, when you find no
difference in your sample and there is indeed no difference in the population,
you are fine.
ACTIVITY 4.3
You can use the logic of hypothesis testing in the courtroom. A student
is being tried for stealing a motorcycle. The judicial system is based on
the premise that a person is "innocent until proven guilty." It is the court
that must prove based on sufficient evidence that the student is guilty.
Thus, the null and alternative hypotheses would be:
Ho: The student is innocent
Ha: The student is guilty
1. Using the table in Figure 4.2, state the four possible outcomes of the
court's decision.
2. Interpret the Type I and Type II errors in this context.
4.4
TWO-TAILED AND ONE-TAILED TEST
Note:
A hypothesis test is called a ONE-TAILED TEST if it is either left-tailed or right-tailed; i.e. if it is not TWO-TAILED.
4.4.1
Two-tailed Test
EXAMPLE:
You conducted a study to determine if there is a difference in spatial thinking
between male and female adolescents. Your sample consists of 40 male and 42
female adolescents. You administer a 30-item spatial thinking test to the sample
and the results showed that males scored a mean of 23.4 and females a mean of 24.1.
Step 1:
You want to test the following null and alternative hypotheses:
Ho: μ1 = μ2
Ha: μ1 ≠ μ2
Step 2:
Using the t-test for independent means (which we will discuss in detail in
Topic 5), you obtained a t-value of 1.554. Based on the alternative
hypothesis, you decide that you are going to use a two-tailed test.
Step 3:
If you are using an alpha (α) of .05 for a two-tailed test, you have to divide .05 by
2 and you get 0.025 for each side of the rejection area.
Step 4:
The df = n1 + n2 − 2 = (40 + 42) − 2 = 80. Look up the t table in Table 4.1 and find
that the critical value is 1.990; the graph in Figure 4.3 shows that the Do Not
Reject area ranges from −1.990 to +1.990.
Table 4.1: Critical Values for Student's t-test (excerpt)

df    One-tail:  0.250  0.100  0.050  0.025  0.010  0.005
      Two-tail:  0.500  0.200  0.100  0.050  0.020  0.010
50               0.679  1.299  1.676  2.009  2.403  2.678
60               0.679  1.296  1.671  2.000  2.390  2.660
70               0.678  1.294  1.667  1.994  2.381  2.648
80               0.678  1.292  1.664  1.990  2.374  2.639
90               0.677  1.291  1.662  1.987  2.368  2.632
Step 5:
The t-value you have obtained is 1.554 (We will discuss the formula for
computing the t-value in Topic 5). This value does not fall in the Rejection
Region. What is your conclusion? You do not reject Ho. In other words, you
conclude that there is NO SIGNIFICANT DIFFERENCE in spatial thinking
between male and female adolescents. You could also say that the test results are
not statistically significant at the 5% level and provide at most weak evidence
against the null hypothesis.
At α = 0.05, the data do not provide sufficient evidence to conclude that the
mean score on spatial thinking of females is superior to that of males, even
though the mean score obtained by females is higher than that of males.
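The decision rule used in Steps 4 and 5 can be sketched in a few lines of Python (an illustrative sketch only; the module itself works from the printed table or SPSS):

```python
# Two-tailed decision rule: reject Ho only if |t| exceeds the critical value.
# Values taken from the example: t = 1.554, critical value 1.990 (df = 80, alpha = .05).
def two_tailed_decision(t_obtained, t_critical):
    """Return 'reject Ho' if t falls in the rejection region, else 'do not reject Ho'."""
    return "reject Ho" if abs(t_obtained) > t_critical else "do not reject Ho"

print(two_tailed_decision(1.554, 1.990))   # → do not reject Ho (the example's result)
print(two_tailed_decision(-2.243, 1.990))  # → reject Ho (a t-value beyond the critical value)
```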
ACTIVITY 4.4
1. How would you have concluded if the t-value obtained is 2.243?
2. Explain how you might commit a Type I or Type II error.
4.4.2
One-tailed Test
EXAMPLE:
You conduct a study to determine if students taught to use mind maps are better at
recalling concepts and principles in economics. A sample of 10 students was
administered a 20-item economics test before the treatment (i.e. pretest). The
same test was administered after the treatment (i.e. posttest), which lasted six
weeks.
Step 1:
The null and alternative hypotheses are:
Ho: μ1 = μ2 (Mean score of the posttest equals the mean score of the pretest)
Ha: μ1 > μ2 (Mean score of the posttest is greater than the mean score of
the pretest)
Step 2:
Decide on the significance level (alpha). Here, you have set it at the 5%
significance level, or alpha (α) = 0.05.
Step 3:
Computation of the test statistic. Using the dependent t-test formula, you obtained
a t-value of 4.711.
Step 4:
The critical value for the right-tailed test is t with df = n − 1. The number of
subjects is n = 10 and α = 0.05. You check the "Table of Critical Values for the
t-Test", which reveals that for df = 10 − 1 = 9 the critical value is 1.833 (Figure
4.4).
Step 5:
You find that the t-value obtained is 4.711. It falls in the Rejection Region. What is
your conclusion? You reject Ho. In other words, you conclude that there is a
SIGNIFICANT DIFFERENCE in the performance in economics before and after the
treatment. You could also say that the test results are statistically significant at the 5%
level. Put another way, the p-value is less than the specified significance level of
0.05. (The p-value is provided in most outputs of statistical packages such as SPSS.)
At α = 0.05, the data provide sufficient evidence to conclude that the mean scores
on the posttest are superior to the mean scores obtained in the pretest. Evidently,
teaching students mind mapping enhances their recall of concepts and principles
in economics.
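If SciPy is available, the critical values used in both worked examples can be obtained programmatically instead of from a printed table (a sketch; `scipy.stats.t.ppf` is the inverse CDF of Student's t distribution):

```python
from scipy.stats import t

# Two-tailed test at alpha = .05 with df = 80: put .025 in each tail,
# so look up the 97.5th percentile.
two_tailed_crit = t.ppf(1 - 0.05 / 2, df=80)

# One-tailed (right-tailed) test at alpha = .05 with df = 9:
# the whole of alpha sits in the right tail.
one_tailed_crit = t.ppf(1 - 0.05, df=9)

print(round(two_tailed_crit, 3))  # ~1.990, matching Table 4.1
print(round(one_tailed_crit, 3))  # ~1.833, matching the worked example
```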
ACTIVITY 4.5
A researcher conducted a study to determine the effectiveness of
immediate feedback on the recall of information in biology. The
experimental group of 30 students was provided with immediate
feedback on the questions that were asked. The control group consisted
of 30 students who were given delayed feedback on the questions asked.
1. Determine the null hypothesis for the hypothesis test.
2. Determine the alternative hypothesis for the hypothesis test.
3. Classify the hypothesis test as two-tailed, left-tailed or right-tailed.
Explain your answer.
There are two types of error: Type I and Type II errors. Both relate to the
rejection or acceptance of the null hypothesis.
Type I error is committed when the researcher rejects the null when the null is
indeed true; in other words incorrectly rejecting the null.
The probability level at which the null is incorrectly rejected is called the
significance level, denoted by the symbol α, a value set a priori (before even
conducting the research) by the researcher.
Type II error is committed when the researcher fails to reject the null when the
null is indeed false; in other words, wrongly accepting the null. The probability of
committing a Type II error is denoted by β.
In any research, the intention of the researcher is to correctly reject a false null; if
the design is carefully selected and the samples represent the population, the
chances of achieving this objective are high. Thus, the power of the study is
defined as 1 − β.
Alternative hypothesis
Hypothesis
Inferential statistics
Null hypothesis
Power
Type I error
Type II error
Topic 5 t-test
LEARNING OUTCOMES
By the end of this topic, you will be able to:
1.
2.
3.
4.
INTRODUCTION
This topic explains what the t-test is and its use in hypothesis testing. It also
highlights the assumptions for using the t-test. Two types of t-test are elaborated
in the topic. The first is t-test for independent means while the second is the t-test
for dependent means. Computation of the t-statistic using formulae as well as
SPSS procedures is also explained.
5.1
WHAT IS t-TEST?
The t-test was developed by the statistician W.S. Gossett (see Figure 5.1), who
worked in a brewery in Dublin, Ireland. His pen name was "Student" and hence
the term Student's t-test, published in the scientific journal Biometrika in
1908. The t-test is a statistical tool used to infer differences between small
samples based on the mean and standard deviation.
5.2
HYPOTHESIS TESTING USING t-TEST
How do we go about establishing whether the differences in the two means are
statistically significant or due to chance? You begin by formulating a hypothesis
about the difference. This hypothesis states that the two means are equal or the
difference between the two means is zero and is called the null hypothesis.
Using the null hypothesis, you begin testing the significance by saying: "There is
no difference in the score obtained in science between subjects in the Discovery
group and the Lecture group."
(a) Ho: μ1 = μ2
OR
(b) Ho: μ1 − μ2 = 0
If you reject the null hypothesis, it means the difference between the two means
has statistical significance. On the other hand, if you do not reject the null
hypothesis, it means the difference between the two means is NOT statistically
significant and the difference is due to chance.
Note:
For a null hypothesis to be accepted, the difference between the two means need
not be equal to zero since sampling may account for the departure from zero.
Thus, you can accept the null hypothesis even if the difference between the two
means is not zero provided the difference is likely to be due to chance. However,
if the difference between the two means appears too large to have been brought
about by chance, you reject the null hypothesis and conclude that a real difference
exists.
ACTIVITY 5.1
1.
State TWO null hypotheses in your area of interest that can be tested
using the t-test.
2.
5.3
t-TEST FOR INDEPENDENT MEANS
The t-test is a powerful statistical tool that enables you to determine whether the
difference obtained between two groups is statistically significant. When two
groups are independent of each other, it means the samples drawn came from two
populations. In other words, the two groups are independent, or belong to
"unpaired groups" and "unpooled groups."
(a) Illustration
Say, for example, you conduct a study to determine the spatial reasoning
ability of 70 ten-year-old children in Malaysia. The sample consisted of 35
males and 35 females (see Figure 5.2). The sample of 35 males was drawn
from the population of ten-year-old males in Malaysia and the sample of 35
females was drawn from the population of ten-year-old females in Malaysia.
Note that they are independent samples because they come from two completely
different populations.
Research Question:
"Is there a significant difference in spatial reasoning between male and
female ten-year-old children?"
Null Hypothesis or Ho:
"There is no significant difference in spatial reasoning between male and
female ten-year-old children."
(b)
t = (X̄1 − X̄2) / SE(X̄1 − X̄2)

where

SE(X̄1 − X̄2) = √[ var1/(n1 − 1) + var2/(n2 − 1) ]
Group              Mean  SD   N   Variance
Group 1: Males     12    2.0  35  4.0
Group 2: Females   10    2.0  35  4.0
t = (12 − 10) / √[ 4.0/(35 − 1) + 4.0/(35 − 1) ]
  = 2 / √(0.1177 + 0.1177)
  = 2 / 0.485
  = 4.124
Note: The t-value will be positive if the mean for Group 1 is larger or more than
(>) the mean of Group 2 and negative if it is smaller or less than (<).
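The computation above can be reproduced directly from the module's formula (note that this version of the formula divides each variance by n − 1; a short Python sketch):

```python
import math

def t_independent(mean1, mean2, var1, var2, n1, n2):
    """t-value for independent means, following the module's formula:
    t = (X1bar - X2bar) / sqrt(var1/(n1 - 1) + var2/(n2 - 1))."""
    se = math.sqrt(var1 / (n1 - 1) + var2 / (n2 - 1))
    return (mean1 - mean2) / se

# Values from the spatial-reasoning example: males (Group 1) vs females (Group 2).
t_value = t_independent(12, 10, 4.0, 4.0, 35, 35)
print(round(t_value, 2))  # ~4.12, matching the worked value of 4.124
```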
(e) Alpha Level
As with any test of significance, you need to set the alpha level. In most
educational and social research, the "rule of thumb" is to set the alpha level
at .05. This means that 5% of the time (five times out of a hundred) you
would find a statistically significant difference between the means even if
there is none ("chance").
(f)
Degrees of Freedom
The t-test also requires that we determine the degrees of freedom (df) for the
test. In the t-test, the degrees of freedom are the sum of the subjects or
persons in both groups minus 2. Given the alpha level, the df, and the t-value, you look up the Table of Critical Values for Student's t-test (available
as an appendix in the back of most statistics texts) to determine whether the
t-value is large enough to be significant.
(g) Since the obtained t-value (4.124) exceeds the critical value, you can
conclude that the difference between the means for the two groups is
significant. In other words, males scored significantly higher than females on
the spatial reasoning test.
However, you do not have to go through this tedious process, as statistical
computer programs such as SPSS provide the significance test results,
saving you from looking them up in a table.
Table 5.1: Table of Critical Values for Student's t-test

df    One-tail:  0.250  0.100  0.050  0.025  0.010  0.005
      Two-tail:  0.500  0.200  0.100  0.050  0.020  0.010
21               0.686  1.323  1.721  2.080  2.518  2.831
22               0.686  1.321  1.717  2.074  2.508  2.819
23               0.685  1.319  1.714  2.069  2.500  2.807
24               0.685  1.318  1.711  2.064  2.492  2.797
25               0.684  1.316  1.708  2.060  2.485  2.787
26               0.684  1.315  1.706  2.056  2.479  2.779
27               0.684  1.314  1.703  2.052  2.473  2.771
28               0.683  1.313  1.701  2.048  2.467  2.763
29               0.683  1.311  1.699  2.045  2.462  2.756
30               0.683  1.310  1.697  2.042  2.457  2.750
40               0.681  1.303  1.684  2.021  2.423  2.704
50               0.679  1.299  1.676  2.009  2.403  2.678
60               0.679  1.296  1.671  2.000  2.390  2.660
70               0.678  1.294  1.667  1.994  2.381  2.648
80               0.678  1.292  1.664  1.990  2.374  2.639
90               0.677  1.291  1.662  1.987  2.368  2.632
100              0.677  1.290  1.660  1.984  2.364  2.626
∞                0.674  1.282  1.645  1.960  2.326  2.576
ACTIVITY 5.2
1. Would you reject Ho if you had set the alpha at 0.01 for a two-tailed
test?
2.
(i) Scale of Measurement
The data that you collect for the dependent variable should be based on
an instrument or scale that is continuous or ordinal. For example,
scores that you obtain from a 5-point Likert scale: 1, 2, 3, 4, 5 or marks
obtained in a mathematics test, the score obtained on an IQ test or the
score obtained on an aptitude test.
(ii)
Random Sampling
The sample of subjects should be randomly sampled from the
population of interest.
(iii) Normality
The data come from a distribution that has one of those nice bell-shaped curves known as a normal distribution. Refer to Topic 3: The
Normal Distribution, which provides both graphical and statistical
methods for assessing normality of a sample or samples.
(iv) Sample Size
Fortunately, it has been shown that if the sample size is reasonably
large, quite severe departures from normality do not seem to affect the
conclusions reached. Then again what is a reasonable sample size? It
has been argued that as long as you have enough people in each group
(typically greater or equal to 30 cases) and the groups are close to
equal in size, you can be confident that the t-test will be a good and
strong tool for getting the correct conclusions. Statisticians say that the
t-test is a "robust" test. Departure from normality is most serious when the
samples are small and unequal in size.
(v) Homogeneity of Variance
It has often been suggested by some researchers that homogeneity of
variance or equality of variance is actually more important than the
assumption of normality. In other words, are the standard deviations of
the two groups pretty close to equal? Most statistical software
packages provide a "test of equality of variances" along with the
results of the t-test, the most common being Levene's test of
homogeneity of variance. Refer to Table 5.2.
Table 5.2: Levene's Test of Homogeneity of Variance

                             Levene's Test    t-test for Equality of Means
                             F      Sig.      t     df     Sig.        Mean   Std. Error  Lower   Upper
                                                           (2-tailed)  Diff.  Diff.
Equal variances assumed      3.39   .080      .848  20     .047        1.00   1.18        −1.46   3.46
Unequal variances assumed                     .848  16.70  .049        1.00   1.18        −1.49   3.40
The Levene test is based on deviations from the group mean. It is robust in the
face of departures from normality, more so than traditional tests such as
Bartlett's test.
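If SciPy is available, Levene's test can be run directly (a sketch with made-up scores for two hypothetical groups; `scipy.stats.levene` is the standard implementation, and `center='mean'` selects the classic deviations-from-the-mean variant described above — SciPy's default uses the median):

```python
from scipy.stats import levene

# Two small, hypothetical groups of test scores (illustrative data only).
group1 = [12, 14, 11, 15, 13, 12, 14, 10, 13, 12]
group2 = [10, 9, 12, 11, 10, 13, 9, 11, 10, 12]

stat, p = levene(group1, group2, center='mean')
if p > 0.05:
    print("Equal variances can be assumed.")
else:
    print("Equal variances cannot be assumed.")
```

As with the SPSS output in Table 5.2, a Levene p-value above .05 means the "equal variances assumed" row of the t-test output should be read.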
ACTIVITY 5.3
Refer to Table 5.2. Based on Levene's Test of Homogeneity of Variance,
what is your conclusion? Explain.
To establish the statistical significance of the difference between the means of these two groups, the t-test is used. Use SPSS.
5.4
Group    N    Mean    Std. Deviation  Std. Error Mean
Male     451  7.9512  3.4618          2.345
Female   495  8.9980  3.1427          3.879
Levene's Test for Equality of Variances: F = 4.720, Sig. = .030

                             t       df     Sig.        Mean     Std. Error  95% CI    95% CI
                                            (2-tailed)  Diff.    Diff.       Lower     Upper
Equal variances assumed      −4.875  944    .000        −1.0468  .2147       −1.4682   −.6254
Unequal variances assumed    −4.853  911.4  .049        −1.0468  .2146       −1.4701   −.6234
The SPSS output above displays the results of the t-test to test whether the
difference between the two sample means is significantly different from zero.
Remember, the null hypothesis states there is no real difference between the
means (Ho: μ1 = μ2).
Interpretation:
t-value
This "t" value tells you how far away from 0, in terms of the number of standard
errors, the observed difference between the two sample means falls. The "t" value
is obtained by dividing the Mean Difference (−1.0468) by the Std. Error (.2146),
which is equal to −4.878.
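The arithmetic behind this t-value is a one-liner (a sketch using the figures reported in the output):

```python
# t = mean difference / standard error of the difference.
mean_difference = -1.0468   # 7.9512 - 8.9980
std_error = 0.2146          # from the SPSS output
t_value = mean_difference / std_error
print(round(t_value, 3))  # → -4.878
```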
p-value
If the p-value shown in the "Sig (2-tailed)" column is smaller than your chosen
alpha level, you reject the null hypothesis and argue that there is a real
difference between the populations. In other words, we can conclude that the
observed difference between the samples is statistically significant.
Mean Difference
This is the difference between the means (labelled "Mean Difference"), i.e. 7.9512
− 8.9980 = −1.0468.
5.5
t-TEST FOR DEPENDENT MEANS
The Dependent Means t-test (also called the Paired t-test or the Repeated
Measures t-test) is used when you have data from only one group of subjects, i.e.
each subject obtains two scores under different conditions. For example, you give
a pretest and, after a particular treatment or intervention, give the same subjects a
posttest. In this form of design, the same subjects obtain a score on the pretest
and, after some intervention or manipulation, obtain a score on the posttest. Your
objective is to determine whether the difference between the means for the two
sets of scores is significant.
Example:
Research Questions:
Null Hypotheses:
There is no significant difference between the pretest and the posttest for the
discovery method group.
There is no significant difference between the pretest and the posttest for the
chalk and talk group.
t = D̄ / (SD / √N)

Where,
t     = t-ratio
D̄     = average difference (ΣD / N)
SD    = standard deviation of the difference scores
ΣD²   = difference scores squared, then summed
(ΣD)² = sum of the difference scores, squared
N     = number of pairs
EXAMPLE:
A researcher conducted a study on personality changes in 15 college women from
Year 1 to Year 4. A 30-item personality test was administered in Year 1 and then
again in Year 4 to the same 15 women. The results of the study are shown in
Table 5.4.
Table 5.4: Results of the Study

Subject   Year 1 Test  Year 4 Test    D     D²
Number    (Pretest)    (Posttest)
1         21           24            +3      9
2         18           20            +2      4
3         13           15            +3      9
4         10           15            +5     25
5         22           20            −2      4
6         15           19            +4     14
7         17           18            +1      1
8         24           22            −2      4
9         25           28            +3      9
10        20           23            +3      9
11        21           25            +4     16
12        19           22            +3      9
13        17           16            −1      1
14        20           26            +6     36
15        16           19            +3      9
          X̄ = 18.5     X̄ = 20.8      ΣD = 35  ΣD² = 159
Step 1:
Calculate the mean score for the Year 1 Test by adding up all the Year 1 Test
scores and dividing by the number of subjects. This gives a mean score of
18.5. Similarly, calculate the mean score of the Year 4 Test, which gives
20.8.
Step 2:
Next, calculate the standard deviation of the difference scores using the following
formula:

SD = √[ (ΣD² − (ΣD)²/N) / (N − 1) ]
   = √[ (159 − 35²/15) / (15 − 1) ]
   = √[ (159 − 81.67) / 14 ]
   = √5.52
   = 2.35
Step 3:
Calculate the effect size (D̄ / SD): the mean difference divided by the standard
deviation.
The mean difference is 20.8 − 18.5 = 2.3 and the standard deviation is 2.35.
Substituting these values gives 2.3 / 2.35 = 0.979.
To determine the likelihood that the effect size is a function of chance, calculate
the t-ratio by multiplying the effect size by the square root of the number of
pairs: t = 0.979 × √15 = 3.79.
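Steps 2 and 3 can be verified in a few lines of Python (a sketch using the sums printed in Table 5.4):

```python
import math

# Sums reported in Table 5.4 (N = 15 pairs).
n = 15
sum_d = 35               # sum of the difference scores
sum_d_sq = 159           # sum of the squared difference scores
mean_diff = 20.8 - 18.5  # posttest mean minus pretest mean

# Step 2: standard deviation of the difference scores.
sd = math.sqrt((sum_d_sq - sum_d ** 2 / n) / (n - 1))

# Step 3: effect size, then the t-ratio.
effect_size = mean_diff / sd
t_value = effect_size * math.sqrt(n)

print(round(sd, 2))       # ~2.35
print(round(t_value, 2))  # ~3.79
```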
Table 5.5: Critical Values for Student's t-test (excerpt)

df    One-tail:  0.100  0.050  0.025  0.010  0.005
      Two-tail:  0.200  0.100  0.050  0.020  0.010
10               1.372  1.812  2.228  2.764  3.169
11               1.363  1.796  2.201  2.718  3.106
12               1.356  1.782  2.179  2.681  3.055
13               1.350  1.771  2.160  2.650  3.012
14               1.345  1.761  2.145  2.624  2.977
15               1.341  1.753  2.131  2.602  2.947
16               1.337  1.746  2.120  2.583  2.921
17               1.333  1.740  2.110  2.567  2.898
18               1.330  1.734  2.101  2.552  2.878
19               1.328  1.729  2.093  2.539  2.861
20               1.325  1.725  2.086  2.528  2.845
Step 4:
Having computed the t-value (which is 3.79) you look up the t-value in The Table
of Critical Values for Student's t-test or The Table of Significance which tells us
whether the ratio is large enough to say that the difference between the groups is
significant. In other words, the difference observed is not likely due to chance or
sampling error. Refer to Table 5.5.
Alpha Level
The researchers set the alpha level at 0.05. This means that 5% of the time (five
out of a hundred) you would find a statistically significant difference between the
means even if there is none ("chance"). Note that a one-tailed test places the
whole of alpha (0.05) in one tail; it is a two-tailed test that would divide 0.05 by 2
to give 0.025 in each tail.
Degrees of Freedom
The t-test also requires that we determine the degrees of freedom (df) for the test.
For the dependent t-test, the degrees of freedom are the number of pairs minus 1,
which is 15 − 1 = 14. Given the alpha level, the df and the t-value, you look up the
table (available as an appendix in the back of most statistics texts) to determine
whether the t-value is large enough to be significant.
Step 5:
The t-value obtained is 3.79, which is greater than the critical value of 2.145.
Hence, the null hypothesis (Ho) is rejected and Ha is accepted, which states that
the Posttest Mean > Pretest Mean. It can be concluded
that the difference between the means is significant. In other words, there is
overwhelming evidence that a "gain" has taken place on the personality inventory
from Year 1 to Year 4 for the women undergraduates.
Again, you do not have to go through this tedious process, as statistical computer
programs such as SPSS provide the significance test results, saving you from
looking them up in a table.
Misapplication of the Formula
A common error made by some research students is the misapplication of the
formula. Researchers who have Dependent Samples fail to recognise this fact, and
inappropriately apply the t-test for Independent Groups to test the hypothesis
that μ1 = μ2. If an inappropriate Independent Groups t-test is performed with
Dependent Groups, the standard error will be greatly overestimated and real
differences between the two means may be judged "non-significant" (Type II
Error).
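The warning above can be demonstrated with the Table 5.4 data (a sketch; SciPy's `ttest_rel` and `ttest_ind` are assumed available, and small differences from the module's hand calculation come from rounding in the printed table):

```python
from scipy.stats import ttest_rel, ttest_ind

# Pretest and posttest scores from Table 5.4 (the same 15 women both times).
pretest  = [21, 18, 13, 10, 22, 15, 17, 24, 25, 20, 21, 19, 17, 20, 16]
posttest = [24, 20, 15, 15, 20, 19, 18, 22, 28, 23, 25, 22, 16, 26, 19]

t_paired, p_paired = ttest_rel(posttest, pretest)  # correct: dependent samples
t_indep, p_indep = ttest_ind(posttest, pretest)    # WRONG for this design

# The paired test uses the (smaller) standard error of the difference scores,
# so it detects the gain that the independent test may miss.
print(round(t_paired, 2), round(p_paired, 4))
print(round(t_indep, 2), round(p_indep, 4))
```

Because the pretest and posttest scores are positively correlated, the paired t is much larger than the inappropriately applied independent t, illustrating the Type II risk described above.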
5.6
EXAMPLE:
In a study, the researcher was keen to determine if teaching note-taking
techniques improved achievement in history. A sample of 22 students was
selected for the study and taught note-taking techniques for a period of four
weeks. The research question put forward is:
"Is there a significant difference in performance in history before and after
the treatment?" i.e. you wish to determine whether the difference between the
means for the two sets of scores is significant.
To establish the statistical significance of the means obtained on the pretest and
posttest, the repeated measures t-test (also called the dependent-samples or
paired-samples t-test) was conducted using SPSS.
Data was collected from the same group of subjects under both conditions; each
subject obtains a score on the pretest and, after the treatment (or intervention or
manipulation), a score on the posttest.
Ho: μ1 = μ2
Ha: μ1 ≠ μ2
Pair 1     Mean   N   Std. Deviation  Std. Error Mean
Pretest    8.50   22  3.34            .71
Posttest   13.86  22  2.75            .59
The Paired Samples Statistics table above reports the mean values on the
variable (history test) for the pretest and posttest. The posttest mean (13.86) is
higher than the pretest mean (8.50), indicating improved performance in the
history test after the treatment. The standard deviation for the pretest is 3.34,
which is reasonably close to the standard deviation for the posttest, 2.75.
The question remains: Is this mean difference large enough to convince us that
there is a significant difference in performance in history, a consequence of
teaching note-taking techniques?
Paired Differences (Pair 1: Pretest − Posttest)

Mean        Std.       Std. Error  95% CI   95% CI                  Sig.
Difference  Deviation  Mean        Lower    Upper     t      df     (2-tailed)
−5.36       2.90       .62         −6.65    −4.076    −8.65  21     .000
t-Value
This "t" value tells you how far away from 0, in terms of the number of standard
errors, the observed difference between the two sample means falls. The "t" value
is obtained by dividing the mean difference (−5.36) by the std. error (.62), which
is equal to −8.65. Refer to Figure 5.4.
p-value
The p-value shown in the "Sig (2-tailed)" column is smaller than your chosen
alpha level (0.05), so you reject the null hypothesis and argue that there is a
real difference between the pretest and posttest.
In other words, we can conclude, that the observed difference between the two
means is statistically significant.
Mean Difference
This is the difference between the means: 8.50 − 13.86 = −5.36.
ACTIVITY 5.4
t-test for Dependent Means or Groups
Case Study 1:
In a study, a researcher was interested in finding out whether attitude
towards science would be enhanced when students are taught science
using the Inquiry Method. A sample of 22 students were administered an
attitude toward science scale before the experiment. The treatment was
conducted for one semester, after which the same attitude scale was
administered to the same group of students.
ATTITUDE   N   Mean   Std. Deviation  Std. Error Mean
Pretest    22  8.50   3.33            .71
Posttest   22  13.86  2.75            .59

Paired Differences (Pretest − Posttest)

Mean   Std.       Std. Error  95% CI  95% CI               Sig.
       Deviation  Mean        Lower   Upper    t      df   (2-tailed)
−5.36  2.90       .62         −6.65   −4.08    −8.66  21   .000
1.
2.
3.
4.
5.
ACTIVITY 5.5
t-test for Independent Means or Groups
Case Study 2:
A researcher was interested in finding out about the creative thinking skills
of secondary school students. He administered a 10-item creative thinking
test to a sample of 4,404 sixteen-year-old students drawn from all over
Malaysia.
GENDER  N     Mean    Std. Deviation  Std. Error Mean
Male    1966  6.9410  2.2858          5.155E-02
Female  2438  6.8351  2.4862          5.035E-02
Levene's Test: F = 19.408, Sig. = .000

t-test for Equality of Means
t      df       Sig. (2-tailed)  Mean Difference  Std. Error
1.456  4402     .145             .1059            7.271E-02
1.469  4327.13  .142             .1059            7.206E-02
1.
2.
3.
Briefly describe the 'Group Statistics' table with regards to the means
and variability of scores.
4.
5.
6.
7.
The Paired t-test is used when you have before and after data from a single
group of subjects. In this test, the t-statistic is computed using the mean of the
difference scores rather than the difference in the means between two groups. As
such, all subjects must have both pretest and posttest data.
Critical value
Homogeneity of variance
Independent sample
P-value
Related sample (paired sample)
Significance level
Topic 6 One-way Analysis of Variance (One-way ANOVA)
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1.
2.
3.
4.
5.
INTRODUCTION
This topic explains what One-way Analysis of Variance (ANOVA) is about and
the assumptions for using ANOVA in hypothesis testing. It demonstrates how
ANOVA can be computed using the formula and the SPSS procedures. Also
explained are the interpretation of the related statistical results and the use of post-hoc comparison tests.
Suppose, in an experimental study, you are interested in comparing the means of
three groups (i.e. k = 3) rather than two.
You might be tempted to use the multiple t-test and compare the means separately;
i.e. you compare the means of Group 1 and 2, followed by Group 1 and 3 and so
forth. What is the danger of doing this? Multiple t-tests increase the likelihood of
committing a Type I error (i.e. claiming that two means are not equal when in fact
they are equal). In other words, you reject a null hypothesis when it is TRUE. On a
practical level, using the t-test to compare many means is a cumbersome process in
terms of the calculations involved.
Example
Let us look at the following example, which shows the results of a study on
Attitude towards Homework among Students of Varying Ability Levels. Subjects
were divided into three groups: High Ability, Average Ability and Low Ability.
The total sample size is 505 students. You need a special class of statistical
techniques called the One-way Analysis of Variance or One-way ANOVA which
we will discuss here.
Table 6.1: Attitudes toward Homework among 14-Year-Old Students

Group            N    Mean   Std. Deviation  Std. Error  95% Conf. Int. for Mean
High ability     220  13.03  3.17            0.12        12.79 – 13.27
Average ability  212  11.99  2.93            0.11        11.77 – 12.21
Low ability      73   9.54   3.50            0.40        8.73 – 10.36
What do the three standard deviations tell you? Note that the standard deviation
for high ability (3.17) and average ability (2.93) students are fairly close, while
low ability students have a somewhat bigger standard deviation of 3.50.
What do the three Standard Errors tell you? Refer to Table 6.1 and you will
notice that there is a column called 'standard error'. What is the standard error?
The standard error is a measure of how much the sample means would vary if you
were to take repeated samples from the same population. The first two groups
contain more than 200 students each; the standard error of the mean for each of
these groups is fairly small: 0.12 for high ability students and 0.11 for average
ability students. However, the standard error for the low ability group is
comparatively high at 0.40. Why? The smaller number of low ability students
(n = 73) and the larger standard deviation explain why the standard error is
larger.
What does "95 Pct Conf. Int for Mean" mean? The last column displays the
confidence interval. What is the confidence interval? It is the range which is
likely to contain the true population value or mean. If you take repeated
samples of 14-year-old students from the same population of 14-year-old
students in the country and calculate a confidence interval from each, about
95% of those intervals should include the unknown population value or mean. For example,
you can be 95% confident that, in the population, the mean of high ability
students is somewhere between 12.79 and 13.27. Similarly, you can be 95%
confident that, in the population, the mean of low ability students is somewhere
between 8.73 and 10.36.
You will notice that the confidence interval is wider for low ability students
(i.e. 1.63) compared to confidence interval for high ability students (i.e. 0.48).
Why? This is due to the larger standard error (0.40) obtained by low ability
students. Since the confidence interval depends on the standard error of the
mean, the confidence interval for low ability students is wider than for high
ability students. So, the larger the standard error, the wider will be the
confidence interval. Makes sense, right?
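The standard error and confidence interval for the low ability group can be reproduced by hand (a sketch; the critical value 1.99 is taken from the df = 80 row of Table 5.1, the closest tabled value to this group's df = 72):

```python
import math

# Low ability group from Table 6.1.
n, mean, sd = 73, 9.54, 3.50

se = sd / math.sqrt(n)  # standard error of the mean
t_crit = 1.99           # approximate two-tailed .05 critical value (Table 5.1, df = 80)
lower = mean - t_crit * se
upper = mean + t_crit * se

print(round(se, 2))                      # ~0.41 (Table 6.1 reports 0.40)
print(round(lower, 2), round(upper, 2))  # ~8.72 to 10.36, matching Table 6.1
```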
If the F-value is significant, it tells us that the population means are probably not
all equal and you reject the null hypothesis. Next, you have to locate where the
significance lies or which of the means are significantly different. You have to use
post-hoc analysis to determine this.
ACTIVITY 6.1
1.
What is the standard error? Why does the standard error vary?
2.
The null hypothesis states that the means of high ability, average ability and low
ability students are the same; i.e. μ1 = μ2 = μ3.
To test the null hypothesis, the One-way Analysis of Variance is used. The
One-way ANOVA is a statistical technique used to test the null hypothesis that
several population means are equal. The word 'variance' is used because the
technique examines the variability in the sample. In other words, how much do
the scores of individual students vary from the mean? Based on this variability
or variance, it determines whether there is reason to believe that the population
means are not equal. In our example, do the scores vary between the three
groups of students?
The alternative hypothesis states that there is a difference between the three groups
of students (see Figure 6.2). However, the alternative hypothesis does not state
which groups differ from one another. It just says that the means of each group are
not all the same; or at least one of the groups differs from the others.
Are the means really different? We need to figure out whether the observed
differences in the sample means are attributed to just the natural variability among
sample means or whether there is reason to believe that the three groups of
students have different means in the population. In other words, are the differences
due to chance or there is a 'real' difference.
6.3
(a) Between-Group Variance
The diagram in the previous Figure 6.2 presents the results of the study. Let
us look more closely at the two types of variability or variance. Note that
each of the three groups has a mean which is also known as the
sample mean.
The high SES group has a mean of 4.12 for the creativity test
The middle SES group has a mean of 4.37 for the creativity test
The low SES has a mean of 3.99 for the creativity test
Within-Group Variance
Within-group variance or variability is a measure of how much the observations or scores within a group vary. It is simply the variance of the
observations or scores within a group or sample, and it is used to estimate the
variance within a group in the population. Remember, ANOVA requires the
assumption that all of the groups have the same variance in the population.
Since you do not know if all of the groups have the same mean, you cannot
just calculate the variance for all of the cases together. You must calculate
the variance for each of the groups individually and then combine these into
an "average" variance.
Within-group variance for the example shows that the 313 students within
the high SES group have different scores, the 297 students within the middle
SES group have different scores and the 340 students within the low SES
also have different scores. Among the three groups, there is slightly greater
variability or variance among Low SES subjects (SD = 1.31) compared to
High SES subjects with a SD of 1.28.
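The pooling into an "average" variance can be sketched in a few lines. Using the group sizes and standard deviations reported for the three SES groups, it reproduces the Within Sum of Squares and Within Mean Squares figures used later in the worked example:

```python
# Summary statistics per SES group: (mean, SD, n)
groups = [(4.12, 1.28, 313), (4.37, 1.30, 297), (3.99, 1.31, 340)]

# Within Sum of Squares pools each group's squared deviations:
# WSS = sum over groups of (n_i - 1) * SD_i^2
wss = sum((n - 1) * sd ** 2 for _, sd, n in groups)
df_within = sum(n for _, _, n in groups) - len(groups)   # 950 - 3 = 947
wms = wss / df_within                                    # "average" variance

print(round(wss, 2), df_within, round(wms, 2))  # 1593.18 947 1.68
```

Because each group contributes its own squared deviations, groups with more subjects get proportionally more weight in the pooled estimate.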
6.4 COMPUTING F-STATISTIC
The F-test or the F-ratio is a measure of how different the means are relative to the
variability or variance within each sample. The larger the F value, the greater the
likelihood that the differences between means are due to something other than
chance alone; i.e. real effects or the means are significantly different from one
another.
The following is the summarised formula for computing the F-statistic or F-ratio:

F = Between Mean Squares (BMS) / Within Mean Squares (WMS)
Based on the study (see Table 6.2 for results) about the relationship between
creativity and socio-economic status of the subject, computation of the F-statistics
is as follows:
Table 6.2: Results

         High SES    Middle SES    Low SES
Mean     4.12        4.37          3.99
SD       1.28        1.30          1.31
n        313         297           340
Degrees of freedom:
This sum of squares has a number of degrees of freedom equal to the number
of groups minus 1. In this case, df = (3-1) = 2
Step 2: Computation of the Between Mean Squares (BMS)

Divide the BSS figure (45.21) by its degrees of freedom (2) to get our estimate of the variation between groups, referred to as the "Between Mean Squares":

Between Mean Squares = BSS / df = 45.21 / 2 = 22.61
Step 3: Computation of the Within Sum of Squares (WSS)
To measure the variation within groups, we find the sum of the squared
deviation between scores on the Torrance Creative Test and the group
average, calculating separate measures for each group, and then summing the
group values. This is a sum referred to as the "Within Sum of Squares" (or
WSS).
Step 4: Computation of the Within Mean Squares (WMS)

Within Mean Squares = WSS / df = 1593.18 / 947 = 1.68
Step 5: Computation of the F-statistic

F = BMS / WMS = 22.61 / 1.68 ≈ 13.4

The computed F-statistic is then compared with the critical value of F. An excerpt from the table of critical values of F at alpha = 0.05, where df2 is the within-group degrees of freedom and the columns correspond to df1:

df2     df1 = 1    df1 = 2    df1 = 3    df1 = 4
96      3.940      3.091      2.699      2.466
97      3.939      3.090      2.698      2.465
98      3.938      3.089      2.697      2.465
99      3.937      3.088      2.696      2.464
100     3.936      3.087      2.696      2.463
120     3.920      3.070      2.680      2.450
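The whole computation, from raw scores to the F-ratio, can be sketched as follows. The three groups below are small hypothetical samples, not the SES data:

```python
def one_way_f(groups):
    """F-ratio for a one-way ANOVA computed from raw scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between Sum of Squares: weighted squared gaps of group means
    bss = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Within Sum of Squares: squared deviations inside each group
    wss = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    bms = bss / (k - 1)              # between mean squares
    wms = wss / (n_total - k)        # within mean squares
    return bms / wms

f = one_way_f([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
print(round(f, 6))   # 7.0
```

Here the between mean squares (7.0) is seven times the within mean squares (1.0): the group means are spread far apart relative to the spread inside each group, so F is large.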
Finally, compare the F-statistic (13.34) with the critical value 3.07. At p = 0.05, the F-statistic is larger than the critical value, and hence there is strong evidence to reject the null hypothesis, indicating that there is a significant difference in creativity among the three groups of students. While
the F-statistic assesses the null hypothesis of equal means, it does not address
the question of which means are different. For example, all three groups may differ significantly from one another, or two may be equal but differ from the third. To
establish which of the three groups are different, you have to follow up with
post-hoc comparison or tests.
Step 7: Post-Hoc Comparisons or Tests
There are many techniques available for post-hoc comparisons and they are
as follows:
Duncan
Dunnett
Scheffe
Tukey's HSD
         Mean1    Mean2    Mean3
Mean1
Mean2
Mean3
Tukey HSD
Tukey's HSD runs a series of post-hoc tests, which are like a series of t-tests. However, the post-hoc tests are more stringent than the
regular t-tests. It indicates how large an observed difference must be for the
multiple comparison procedure to call it significant. Any absolute difference
between means has to exceed the value of HSD to be statistically significant.
Most statistical programmes will give you an output in the form of a table as
shown above. Group means are listed as a matrix. An asterisk (*) indicates
which pairs of means are significantly different.
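The HSD decision rule can be sketched as follows. The critical value of the studentized range statistic (q) would normally be read from a table for the number of groups and the within-group degrees of freedom, so the q, mean-square and group-size values below are purely illustrative, and the formula shown is the equal-group-size version (unequal sizes use a harmonic-mean n):

```python
import math

def tukey_hsd(q_crit, wms, n_per_group):
    """Honestly Significant Difference: the smallest absolute mean
    difference that Tukey's procedure calls significant."""
    return q_crit * math.sqrt(wms / n_per_group)

# Illustrative values only (q_crit comes from a studentized-range table)
q_crit, wms, n = 3.36, 1.68, 300
hsd = tukey_hsd(q_crit, wms, n)
for m1, m2 in [(4.12, 4.37), (4.12, 3.85)]:
    print(abs(m1 - m2) > hsd)   # significant only if the gap exceeds HSD
```

Any pair of means whose absolute difference exceeds the single HSD threshold is flagged; this is what produces the asterisks in the output matrix.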
Note that only the mean of Group 3 is significantly different from Group 1.
In other words, High SES (Mean = 4.12) subject scored significantly higher
on creativity than Low SES (Mean = 3.85) subjects. There was no significant
difference between High SES and Middle SES subjects nor was there a
significant difference between Middle SES and Low SES subjects.
If the same subjects are tested twice, such as in the case of a pretest and posttest design, you should instead use the Repeated Measures One-way ANOVA (see Topic 7).
(c) Normal Populations
For each population, the variable under consideration is normally distributed
(Refer to Topic 2 for techniques to determine normality of distribution). In
other words, to use the One-way ANOVA you have to ensure that the
distributions for each of the groups are normal. The analysis of variance is
robust if each of the distributions is symmetric or if all the distributions are
skewed in the same direction. This assumption can be tested by running
several normality tests as stated next:
Table 6.3: Means, Skewness and Kurtosis for the Three Groups

Group      Statistic   Value     Std. Error
Group 1    Mean        43.82     2.20
           Skewness    .973      .491
           Kurtosis    .341      .953
Group 2    Mean        60.14     2.71
           Skewness    -.235     .597
           Kurtosis    -1.066    1.154
Group 3    Mean        64.75     3.61
           Skewness    -.407     .564
           Kurtosis    -1.289    1.091
The Shapiro-Wilk normality tests indicate that the scores are normally distributed in each of the three conditions. The Kolmogorov-Smirnov statistic is significant for Group 1, but that statistic is more appropriate for larger sample sizes. Refer to Figure 6.3.
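Skewness and kurtosis figures like those in Table 6.3 can be computed from raw scores. The sketch below uses the simple moment-based formulas; SPSS applies small-sample corrections, so its values differ slightly for small groups:

```python
import math

def skew_kurtosis(xs):
    """Moment-based skewness and excess kurtosis (population formulas)."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    skew = sum(((x - m) / sd) ** 3 for x in xs) / n
    kurt = sum(((x - m) / sd) ** 4 for x in xs) / n - 3  # 0 for a normal
    return skew, kurt

# A perfectly symmetric sample has zero skewness
s, k = skew_kurtosis([1, 2, 3, 4, 5])
print(round(s, 6), round(k, 2))   # 0.0 -1.3
```

A skewness near zero and a kurtosis near zero are what the normality assumption hopes to see; large positive or negative values flag asymmetry or heavy/light tails.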
(d) Homogeneity of Variance
Just like the t-test, Levene's test of homogeneity of variance is used for the One-way ANOVA and is shown in Figure 6.4. The p-value, which is 0.113, is greater than the alpha of 0.05. Hence, it can be concluded that the variances are homogeneous, which is reported as Levene (2, 49) = 2.28, p = .113.
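Levene's statistic is essentially a one-way ANOVA performed on absolute deviations from each group's mean, which can be sketched as follows (the three small groups are hypothetical; obtaining the p-value would additionally require the F distribution):

```python
def levene_w(groups):
    """Levene's test statistic (mean-centred version): a one-way ANOVA
    F computed on absolute deviations from each group's own mean.  The
    p-value would come from the F(k-1, N-k) distribution."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations from each group's own mean
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    grand = sum(sum(g) for g in z) / n_total
    means = [sum(g) / len(g) for g in z]
    bss = sum(len(g) * (m - grand) ** 2 for g, m in zip(z, means))
    wss = sum((x - m) ** 2 for g, m in zip(z, means) for x in g)
    return (bss / (k - 1)) / (wss / (n_total - k))

# Group 2 is far more spread out than the others, inflating the statistic
w = levene_w([[4, 5, 6, 5], [1, 5, 9, 5], [4, 6, 4, 6]])
print(w > 1)
```

When all groups have similar spread, the absolute deviations look alike across groups and the statistic stays small; unequal spreads push it up.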
ACTIVITY 6.2

1. What are the assumptions that must be met when using ANOVA?

2. Null Hypothesis:
   Ho: μ1 = μ2 = μ3 = μ4
   Alternative Hypothesis:
   Ha: the population means are not all equal (e.g. μ3 ≠ μ4)
Procedure for the One-way ANOVA with post-hoc analysis Using SPSS
1.
Select the Analyze menu.
2.
Click Compare Means and One-Way ANOVA ..... to open the One-Way
ANOVA dialogue box.
3.
Select the dependent variable (i.e. inductive reasoning) and click the arrow
button to move the variable into the Dependent List box.
4. Select the independent variable (i.e SES) and click the arrow button to move
the variable into the Factor box.
5. Click the Options ..... command push button to open the One-Way
ANOVA: Options sub-dialogue box.
6. Click the check boxes for Descriptive and Homogeneity-of-variance.
7. Click Continue.
8. Click the Post Hoc .... command push button to open the One-Way
ANOVA: Post Hoc Multiple Comparisons sub-dialogue box. You will
notice that a number of multiple comparison options are available. In this
example you will use the Tukey's HSD multiple comparison test.
9. Click the check box for Tukey.
10. Click Continue and then OK.
(a)
Before you conduct the One-way ANOVA, you have to make sure that your data meet the relevant assumptions for using the One-way ANOVA. Let's first look at the test of homogeneity of variances, since satisfying this assumption is necessary for interpreting ANOVA results.
Levene's test for homogeneity of variances assesses whether the population variances for the groups are significantly different from each other. The null hypothesis states that the population variances are equal.
The following Figure 6.5 shows the SPSS output for Levene's test. Note that the Levene F-statistic has a value of 0.383 and a p-value of 0.765. Since p is greater than α = 0.05 (i.e. 0.765 > 0.05), we do not reject the null
hypothesis. Hence, we can conclude that the data does not violate the
homogeneity-of-variance assumption.
(b)
Another SPSS output is the "Descriptives" table which presents the means
and standard deviations of each group (see Figure 6.6). You will notice that
the means are not all the same. However, this relatively simple conclusion
actually raises more questions. See if you can answer these questions in
Figure 6.6.
As you may have realised, just by looking at the Descriptives table, the
group means cannot tell us decisively if significant differences exist. What is
the next step?
(c) Significant Differences
If you divide 33.445 by 11.072, you will get the F value of 3.021, which is significant at 0.029. Since 0.029 < α = 0.05, we can reject the null hypothesis and accept the alternative hypothesis. You can conclude that there is a significant difference in inductive reasoning between the four SES groups. But which groups differ?
(d) Multiple Comparisons
Having obtained a significant result, you can go further and determine, using a post-hoc test, where the significance lies. There are many different kinds of post-hoc tests that examine which means are different from each other. One commonly used procedure is Tukey's HSD test. The Tukey test compares all pairs of group means, and the results are shown in the Multiple Comparisons table in Figure 6.8.
Dependent Variable: Inductive Reasoning Ability
Tukey HSD
Note that each mean is compared with every other mean twice, so the results are essentially repeated in the table. Interpreting the table reveals that:
There is a significant difference only between Low SES subjects (Mean =
8.01) and Very High SES subjects (Mean = 8.49) at p = 0.047. i.e. Very
High SES scored significantly higher than Low SES at p = 0.047.
However, there are no significant differences between the other groups.
ACTIVITY 6.3
45
55
59
58
42
54
61
41
62
59
48
57
49
36
48
63
44
65
ACTIVITY 6.4
Education
Business/
Management
Social
Science
Computer
Science
62
42
80
81
49
52
57
75
63
31
87
58
68
80
64
67
39
22
28
48
79
71
29
26
40
68
62
36
15
76
45
2.
3.
4.
5.
6.
7.
The one-way ANOVA is used to compare the differences between more than
two groups of samples from unrelated populations.
Even though ANOVA is used to compare the mean, this test uses the variance
in computing the test statistics.
This test requires large samples. Other assumptions are normal distribution of the population parameter, variables measured at least at the interval level, and equality of variance between the groups.
Test Statistics: F
Between group variances are due to the differences between the groups (could
be due to different treatment etc.), while within group variances are due to
sampling (the differences among the members of the same group).
Technically, for any comparison between groups, the between group variance
should be large simply because they are different groups while within the
group itself the variances should be low (assuming the members are
homogenous).
The F-statistics are based on the premise that if different treatments have
different effects (or different groups respond differently due to their inherited
differences), the between group variance is large while the within group
variance (also called the residual variance) is low. If there is any difference
between the groups, the F-value will be high, causing the null hypothesis to be
rejected.
Analysis of variance
F-test
Between group variance
Within group variance
Sum of squares
Between mean squares
Within mean squares
Post-hoc comparisons
Topic 7
Analysis of Covariance (ANCOVA)
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Define Analysis of Covariance (ANCOVA);
2. Explain the logic of ANCOVA;
3. Identify the assumptions for using ANCOVA;
4. Compute ANCOVA using SPSS; and
5. Interpret ANCOVA using SPSS.
INTRODUCTION
This topic explains what analysis of covariance (ANCOVA) is about and the
assumptions for using it in hypothesis testing. It also demonstrates how to
compute and interpret ANCOVA using SPSS.
7.1
The high correlation also means that a large portion of the variance found in the geography test is actually contributed by the covariable or covariate 'Attitude' and would show up as measurement error.
What should you do? You should remove the covariance from the
geography test thereby removing a substantial portion of the extraneous
variance of individual differences; i.e. you want to "subtract out" or
"remove" Attitude scores and you will be left with the "residual" (it is what
is left over). When you subtract, you have reduced geography scores
variability or variance while maintaining the group difference.
Put another way, you use ANCOVA to "reduce noise" to produce a more
efficient and powerful estimate of the treatment effect. In other words, you
adjust geography scores for variability on the covariate (attitude scores).
If you have two or more covariables or covariates, make sure that among
themselves there is little intercorrelation (otherwise you are introducing
redundant covariates and end up losing precision). For example, you surely
would not want to use both family income and father's occupation as
covariates because it is likely that they are both highly correlated.
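The adjustment that ANCOVA performs can be sketched with a toy example: each group's outcome mean is shifted to what it would be if the group sat at the overall covariate mean, using the pooled within-group regression slope. All of the data below are hypothetical:

```python
def pooled_slope(groups):
    """Pooled within-group regression slope of Y on the covariate X."""
    num = den = 0.0
    for xs, ys in groups:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        num += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den += sum((x - mx) ** 2 for x in xs)
    return num / den

def adjusted_means(groups):
    """ANCOVA-style adjusted means: shift each group's Y mean to the
    grand covariate mean along the pooled within-group slope."""
    b = pooled_slope(groups)
    all_x = [x for xs, _ in groups for x in xs]
    grand_x = sum(all_x) / len(all_x)
    return [sum(ys) / len(ys) - b * (sum(xs) / len(xs) - grand_x)
            for xs, ys in groups]

# Hypothetical (covariate, outcome) pairs: group 2 looks better on Y
# partly because its covariate scores are higher.
g1 = ([1, 2, 3], [10, 12, 14])
g2 = ([4, 5, 6], [16, 18, 20])
print(adjusted_means([g1, g2]))   # [15.0, 15.0]
```

In this toy example the apparent advantage of group 2 disappears entirely once the covariate is controlled: the whole raw difference was carried by the covariate, which is exactly the kind of correction described above.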
7.2
There are a number of assumptions that underlie the analysis of covariance. Most
of the assumptions apply to One-way ANOVA, with the addition of two more
assumptions. As stated by Coakes and Steed (2000), the assumptions are:
(a)
(b)
(c)
(d)
(e)
(f)
Look at the graph in Figure 7.3, which shows regression lines for each group
separately. Look to see how each group differs on mean age. The Graduates, for
instance have a mean age of 38, their score on knowledge of current events is 14;
while the mean age for the Diploma holders is 45 and their score on knowledge of
current events is 12.5. The mean for the subjects with High school qualifications
is 50 and their score on the knowledge of current events test is 11.5. What does
this tell you? It is probably obvious to you that part of the differences in
knowledge of current events is due to the groups having a different mean age.
So you decide to include Age as a covariate and use ANCOVA.
Copyright Open University Malaysia (OUM)
(a)
ANCOVA reduces the error variance by removing the variance due to the
relationship between age (covariate) and the dependent variable (knowledge
of current events).
(b)
ANCOVA adjusts the means on the covariate for all of the groups,
leading to the adjustment of the means of the dependent variable
(knowledge of current events).
7.3
One of the most common designs in which ANCOVA is used is the pretest-posttest design. This consists of a test given BEFORE an experimental condition is carried out, followed by the same test AFTER the experimental condition. In
this case, the pretest score is used as a covariate. In the pretest-posttest design, the
researcher seeks to partial out (remove or hold constant) the effect of the pretest,
in order to focus on possible changes following the intervention or treatment.
A researcher wanted to find out if the critical thinking skills of students can be improved using the inquiry method when teaching science. A sample of 30 students was selected and divided into the following groups: 13 high ability subjects, 8 average ability subjects and 9 low ability subjects. A 10-item critical
thinking test was developed by the researcher and administered before the
intervention and after the intervention.
7.3.1
A One-way ANOVA was conducted on the data and the results are shown in
Table 7.1 as follows.
Table 7.1: Test of Homogeneity of Variance

Levene Statistics    df1    df2    Sig.
.711                 2      27     .500
The homogeneity of variance table (Table 7.1) indicates that the variances of the three groups are similar: the null hypothesis of equal variances is not rejected, as the p value of 0.500 is greater than the alpha of .05. Hence, you have not violated one of the assumptions for using ANOVA.
Table 7.2: Means and Standard Deviations

Ability     n     Mean    Std. Deviation
Low         9     3.22    1.78
Average     8     4.87    1.45
High        13    4.84    2.11
Total       30    4.37    1.95
Table 7.2 shows the means and standard deviations for the three groups of subjects: low, average and high ability. Although the high ability subjects scored 4.84 and the low ability subjects scored only 3.22, the difference between the ability levels is not significant. Therefore, teaching students using the inquiry method seems to have no significant effect on critical thinking.
Table 7.3: ANOVA Table
Dependent Variable: Critical Thinking

Source             Sum of Squares    df    Mean Square    F          Sig.
Corrected Model    16.844a           2     8.422          2.416      .108
Intercept          535.184           1     535.184        153.522    .000
Between Groups     16.844            2     8.422          2.416      .108
Within Groups      94.123            27    3.486
Total              583.000           30
Corrected Total    110.967           29
7.3.2
The same critical thinking test was administered before the commencement of the
experiment which served as the pretest. What happens when the scores of the
pretest are included in the model as a covariate?
See the ANOVA table with the covariate included. Compare this to the ANOVA
table when the covariate was not included. The format of the ANOVA table is
largely the same as without the covariate (see Table 7.4), except that there is an
additional row of information about the covariate (pretest).
Table 7.4: ANOVA Table with the Covariate (Pretest) Included
Dependent Variable: Critical Thinking

Source             Sum of Squares    df    Mean Square    F         Sig.
Corrected Model    31.920            3     10.640         3.500     .030
Intercept          76.069            1     76.069         25.020    .000
PRETEST            15.076            1     15.076         4.959     .035
Between Groups     25.185            2     12.593         4.142     .037
Within Groups      79.047            26    3.040
Total              683.000           30
Corrected Total    110.967           29
Table 7.5: Adjusted Means

Ability     Mean    Std. Error
Low         2.92    .59
Average     4.71    .62
High        5.15    .50

Table 7.6: Pairwise Comparisons

           Average    High
Low                   *
Average

* Significant at p = .05
Looking first at the significance values, it is clear that the covariate (i.e. pretest)
significantly influenced the dependent variable (i.e. posttest), because the
significance values are less than .05. Therefore, performance in the pretest had a
significant influence on the posttest. What is more interesting is that when the
effect of the pretest is removed, teaching science using the inquiry method
becomes significant (p is .037 which is less than .05). There was a significant
effect of the inquiry method of teaching on critical thinking after controlling
for the effect of the pretest, F(2,26) = 4.14, p <.05.
Table 7.5 shows the adjusted means (the Sidak test was used to obtain the adjusted means). These values should be compared with Table 7.2 to see the effect of the covariate on the means of the three groups. The results show that low ability subjects differed significantly from high ability subjects on the critical thinking test (see Table 7.6). However, there were no significant differences between average and high ability subjects.
CONCLUSION
This example illustrates how ANCOVA can help us exert stricter experimental
control by taking into account confounding variables to give us a purer measure
of the effect of the experimental manipulation. Without taking into account the
pretest, we would have concluded that the inquiry method of teaching science had
no effect on critical thinking of subjects, yet clearly it does.
SPSS PROCEDURES TO CONDUCT AN ANALYSIS OF COVARIANCE (ANCOVA)
Select the dependent variable (e.g. geography test) and click on the
arrow button to move the variable into the Dependent Variable: box
Select the covariate (e.g. attitude) and click on the arrow button to
move the variable into the Covariates(s): box
ACTIVITY 7.1
A researcher conducted a study on the memory of four groups of people
of different age groups. Since memory may be related to IQ, the
researcher decided to control it.
1. What is the covariate?
2. What would his analysis show?
3. State a hypothesis for the study.
ACTIVITY 7.2
Refer to the following Table 7.7, which is an SPSS output and answer
the following questions:
1. State the independent variable. Give reasons.
2. Which is the covariate? Explain.
3. State the dependent variable. Give reasons.
4. State a hypothesis for the above results.
5. Do you reject or do not reject the hypothesis stated above?
Table 7.7: SPSS Output
Dependent Variable: Reaction Time

Source             Sum of Squares    df    Mean Square    F        Sig.
Corrected Model    76.252            3     25.417         3.647    .064
Intercept          4.792             1     4.792          .688     .431
Age                4.252             1     4.252          .610     .457
Group              41.974            2     20.987         3.012    .106
Error              55.748            8     6.969
Total              1860.000          12
Corrected Total    132.000           11
Covariate
Homogeneity of regression
Homogeneity of variance
Independence
Linearity
Normality
Reliability of the covariate
Topic 8
Correlation
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1.
2.
3.
INTRODUCTION
8.1
Researchers are often concerned with the way two variables relate to each other
for given groups of persons such as students in schools and workers in a factory or
office. For example, do students who have higher scores in mathematics also have
higher scores in science? Is there a relationship between a person's self-esteem
and his personality? Is there a relationship between attitudes towards reading and
the number of books read? Is there a relationship between years of experience as a
teacher and attitudes towards teaching?
The correlation coefficient is a number between −1 and +1. If there is no relationship between the values, the correlation coefficient is 0 or very low. As the strength of the relationship between the values increases, so does the absolute size of the correlation coefficient. Thus, the larger the correlation coefficient is in absolute value, the stronger the relationship.
8.2 PEARSON PRODUCT-MOMENT CORRELATION COEFFICIENT
(b)
Assumptions Testing
Correlational analysis has the following underlying assumptions (Coakes and Steed, 2002, SPSS Analysis Without Anguish, Brisbane: John Wiley & Sons):

Related Pairs: the data are to be collected from related pairs; i.e. if you obtain a score on an X variable, there must be a score on the Y variable from the same subject.
Trivial                       0.01 – 0.09
Low to Moderate               0.10 – 0.29
Moderate to Substantial       0.30 – 0.49
Substantial to Very Strong    0.50 – 0.69
Very Strong                   0.70 – 0.89
Near Perfect                  > 0.90
How high does a correlation coefficient have to be, to be called strong? How
small is a weak correlation? The answer to these questions varies with the
variables being studied. For example, if the literature shows that in previous
research, a correlation of 0.51 was found between variable X and variable Y, but
in your study you obtained a correlation of 0.60; then you might conclude that the
correlation between variable X and Y is strong.
However, Cohen (1988) has provided some guidelines to determine the strength
of the relationship between two variables by providing descriptors for the
coefficients. Keep in mind that in education and psychology, it is rare that the
coefficients will be very strong or near perfect since the variables measured
are constructs involving human characteristics, which are subject to wide
variation.
Example:
Data was gathered for the following two variables (IQ test and science test) from a sample of 12 students. Refer to Table 8.1 below.

Table 8.1: Data of Two Variables (IQ Test and Science Test)

Student No.    IQ Test (X)    Science Test (Y)
1              120            31
2              112            25
3              110            19
4              120            24
5              103            17
6              126            28
7              113            18
8              114            20
9              106            16
10             108            15
11             128            27
12             109            19
Each unit or student is represented by a point on the scatter diagram (see the following Figure 8.2). A dot is placed for each student at the point of intersection of a straight line drawn through his IQ score perpendicular to the X-axis and through his science score perpendicular to the Y-axis.
Figure 8.2: Scatter Diagram Showing the Relationship between IQ Scores (X-axis)
and Science Score (Y-axis) for 12 Students
8.2.1
Note that rxy can never take on a value less than −1 nor a value greater than +1 (r refers to the correlation coefficient, x the X-axis and y the Y-axis). The following are three graphs showing various values of rxy and the type of linear relationship that exists between X and Y for the given values of rxy.
(a)
Positive Correlation
Value of rxy = + 1.00 = Perfect and Direct Relationship.
See Figure 8.3. If Attitudes (x) and English Achievement (y) have a positive relationship, then the slope (b) will be a positive number. Lines with positive slopes go from the bottom left toward the upper right; i.e. an increase from 1 to 2 on the X-axis is followed by an increase from 3 to 3.5 on the Y-axis.
(b) Negative Correlation
Value of rxy = −1.00 = Perfect Inverse Relationship.
See Figure 8.4. If Attitudes (x) and English Achievement (y) have a negative relationship, then the slope (b) will be a negative number. Lines with negative slopes go from the upper left to the lower right. The graph in Figure 8.4 has a slope of −0.5: an increase of 1 on the X-axis is associated with a decrease of 0.5 on the Y-axis; i.e. an increase from 1 to 2 on the X-axis is followed by a decrease from 5 to 4.5 on the Y-axis.
(c) Zero Correlation
Value of rxy = .00 = No Relationship.
If Attitudes (x) and English Achievement (y) have a zero relationship (as shown in Figure 8.5), then there is NO SYSTEMATIC RELATIONSHIP between X and Y. Here, some students with high Attitude scores have low English scores, while some students with low Attitude scores have high English scores.
8.3
The computation of the Pearson product-moment correlation can be illustrated with the verbal test (x) and spatial test (y) scores of 12 students:

Name         Verbal Test (x)   Spatial Test (y)   x²     y²     xy
Seng Huat    13                7                  169    49     91
Fauzul       10                6                  100    36     60
Shalini      12                9                  144    81     108
Tajang       14                10                 196    100    140
Sheela       10                7                  100    49     70
Kumar        12                11                 144    121    132
Mei Ling     13                12                 169    144    156
Azlina       9                 10                 81     100    90
Ganesh       14                13                 196    169    182
Ahmad        11                12                 121    144    132
Kong Beng    8                 9                  64     81     72
Ningkan      9                 8                  81     64     72

Σx = 135    Σy = 114    Σx² = 1565    Σy² = 1138    Σxy = 1305

SSxy = Σxy − (Σx)(Σy)/n = 1305 − (135)(114)/12 = 1305 − 1282.50 = 22.50

SSxx = Σx² − (Σx)²/n = 1565 − (135)²/12 = 1565 − 1518.75 = 46.25

SSyy = Σy² − (Σy)²/n = 1138 − (114)²/12 = 1138 − 1083.00 = 55.00

r = SSxy / √(SSxx × SSyy) = 22.50 / √(46.25 × 55.00) = 22.50 / 50.44 ≈ 0.446
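The same computation can be checked directly from the raw scores (hand-worked squares and roundings can differ in the last decimal place, so treat the final digit as approximate):

```python
import math

# Verbal (x) and spatial (y) test scores for the 12 students
x = [13, 10, 12, 14, 10, 12, 13, 9, 14, 11, 8, 9]
y = [7, 6, 9, 10, 7, 11, 12, 10, 13, 12, 9, 8]

n = len(x)
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(a * a for a in x) - sum(x) ** 2 / n
ss_yy = sum(b * b for b in y) - sum(y) ** 2 / n
r = ss_xy / math.sqrt(ss_xx * ss_yy)
print(round(r, 3))   # 0.446
```

A coefficient of this size falls in Cohen's "moderate to substantial" band: higher verbal scores tend to go with higher spatial scores, but far from perfectly.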
8.4 PEARSON PRODUCT-MOMENT CORRELATION USING SPSS
A study was conducted to determine the relationship between reading ability and
performance in science. A Reading Ability and Science test was administered to
200 lower secondary students. The Pearson product-moment correlation was used
to determine the significance of the relationship. The steps for using SPSS are as
follows:
SPSS Procedures:
3. Select the variables you require (i.e. reading and science) and
click on the arrow button to move the variables into the
Variables: box.
8.4.1
SPSS Output
To interpret the correlation coefficient, you examine the coefficient and its associated significance value (p). The output shows that the relationship between reading and science scores is significant, with a correlation coefficient of r = 0.63 at p < .05. Thus, higher reading scores are associated with higher scores in science.
8.4.2
8.4.3
The null hypothesis (Ho) states that the correlation between X and Y is 0.0. What is the probability that the correlation obtained in the sample came from a population where the parameter is 0.0? The t-test for the significance of a correlation coefficient is used. Note that the correlation between reading and science (r = 0.630) is significant at p < 0.05.
Hence, the null hypothesis is REJECTED, which affirms that the two variables are positively related in the population.
Coefficient of Determination:
r = the correlation between X and Y = 0.630, and r² = the coefficient of determination = (0.630)² = 0.3969.
Hence, 39.7% of the variance in Y can be explained by X.
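The t-test for a correlation coefficient uses the standard formula t = r√(n − 2)/√(1 − r²) with df = n − 2; the sketch below applies it to the r = 0.63, n = 200 reading/science example:

```python
import math

def t_for_r(r, n):
    """t statistic for testing H0: rho = 0,
    t = r * sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r, n = 0.63, 200                 # values from the reading/science example
t = t_for_r(r, n)
print(round(t, 2), round(r ** 2, 4))
```

A t of around 11 with 198 degrees of freedom is far beyond any conventional critical value, which is why the null hypothesis of zero correlation is so decisively rejected here.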
8.4.4
SPSS Output
SPSS Procedures:
1.
2.
3.
4.
5.
Select the first variable (i.e. science) and click on the arrow button to
move the variable into the Y Axis: box.
6.
Select the second variable (i.e. reading) and click on the arrow button to move the variable into the X Axis: box.
7.
Click on OK.
As you can see from the scatter plot (Figure 8.7) there is a linear relationship
between reading and science scores. Given that the scores cluster uniformly
around the regression line, the assumption of homogeneity of variance has not
been violated.
The Spearman rank order correlation is computed from the ranks of the two variables. Ranks are assigned by giving rank 1 to the smallest score, rank 2 to the next value, and so on; scores with the same value share the average of the rank positions they occupy.

Sales (mil) X   Advertisement (mil) Y   Rank X   Rank Y   d      d²
157.5           47.7                    2.5      1        1.5    2.25
157.5           52.2                    2.5      2.5      0      0
160.0           52.2                    4.5      2.5      2      4
160.0           54.5                    4.5      4.5      0      0
167.6           54.5                    7        4.5      2.5    6.25
154.9           59.0                    1        6        -5     25
167.6           61.3                    7        7.5      -0.5   0.25
172.7           61.3                    10       7.5      2.5    6.25
167.6           64.5                    7        9        -2     4
170.2           65.8                    9        10       -1     1
175.3           66.8                    11       11       0      0
182.9           68.1                    12       12       0      0

Σd² = 49

rs = 1 − 6Σd² / (n(n² − 1)) = 1 − 6(49) / (12(144 − 1)) = 1 − 294/1716 ≈ 0.829
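The ranking-and-differencing procedure can be sketched as follows, using the twelve sales and advertisement figures; tied values share the average of the rank positions they occupy:

```python
def average_ranks(values):
    """Rank 1 = smallest; tied values share the average of their ranks."""
    order = sorted(values)
    ranks = []
    for v in values:
        first = order.index(v) + 1      # first rank position occupied by v
        count = order.count(v)          # how many ties share that value
        ranks.append(first + (count - 1) / 2)
    return ranks

sales  = [157.5, 157.5, 160.0, 160.0, 167.6, 154.9,
          167.6, 172.7, 167.6, 170.2, 175.3, 182.9]
advert = [47.7, 52.2, 52.2, 54.5, 54.5, 59.0,
          61.3, 61.3, 64.5, 65.8, 66.8, 68.1]

rx, ry = average_ranks(sales), average_ranks(advert)
d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
n = len(sales)
rs = 1 - 6 * d2 / (n * (n ** 2 - 1))
print(d2, round(rs, 3))   # 49.0 0.829
```

With ties present this d²-based formula is a close approximation; computing a Pearson correlation on the ranks themselves gives essentially the same answer.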
8.5
The Spearman Rank Order Correlation is used to determine the relationship between the two variables. The SPSS procedure is as follows:
1.
2.
3.
Select the variables you require (i.e. reading and science) and click on
the arrow button to move the variables into the Variables: box.
4.
5.
6.
Click on OK.
Results

Correlations (Spearman's rho)

                                 rq2       rq6
rq2    Correlation Coefficient   1.000     .507**
       Sig. (2-tailed)           .         .000
       N                         203       203
rq6    Correlation Coefficient   .507**    1.000
       Sig. (2-tailed)           .000      .
       N                         203       203
The p-value of 0.000 (less than 0.05) shows that the relationship is a true reflection of the phenomenon in the population. In other words, the relationship seen in the sample is NOT due to mere chance.
ACTIVITY 8.1
Geography Test
22
17
20
18
23
21
19
24
19
16
(a)
(b)
What is the mean for the self-efficacy scale and the mean of the
geography test?
(c)
(d)
(e)
(f)
The linear relationship between two variables is evaluated from two aspects: the strength of the relationship (correlation) and the cause-effect association (regression).
The value of the correlation coefficient ranges from −1 to +1. Any value close to these extremes indicates a strong linear relationship in the same or opposite direction.
There are two methods for computing the correlation coefficient: the Pearson correlation and the Spearman rank order correlation. The latter is the non-parametric equivalent of the former, used when the data are measured at an ordinal level or when the sample size is small.
The correlation coefficient computed from the sample indicates the strength of the relationship in the sample. To generalise a linear relationship to the population, a significance test needs to be performed.
Coefficient of determination
Linear relationship
Pearson's product-moment correlation
Scatter diagram
Spearman rank order correlation
Topic 9
Linear Regression
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1.
2.
3.
4.
INTRODUCTION
9.1
Basically, regression is a technique of fitting the best straight line to represent a cluster of points (see the following Figure 9.1). The points are defined in a two-dimensional plane. The straight line expresses the linear association between the variables studied. It is a useful technique for establishing a cause-effect relationship between variables and for forecasting future results or outcomes. An important consideration in linear regression analysis is that the researcher must identify the independent and dependent variables prior to the analysis.
9.2
Y = a + bX
Slope
The inclination of the regression line relative to the base line:
b = (nΣXY − ΣX ΣY) / (nΣX² − (ΣX)²)
Y-intercept
The point at which the regression line intercepts the Y-axis:
a = Ȳ − bX̄
Example:
A study was conducted at TESCO Hypermarket to determine whether there is a
cause-effect relationship between sales and expenditure on advertisements.
Table 9.1 illustrates the computation of the regression coefficients.
Table 9.1: Computation of Regression Coefficients

Month   Expenditure (X)   Sales (Y)   X*Y         X²
1       157.5             47.7        7507.07     24799.95
2       157.5             52.2        8222.03     24799.95
3       160.0             52.2        8354.64     25606.40
4       160.0             54.5        8717.89     25606.40
5       167.6             54.5        9133.03     28103.17
6       154.9             59.0        9144.56     24006.40
7       167.6             61.3        10274.66    28103.17
8       172.7             61.3        10586.01    29832.20
9       167.6             64.5        10812.78    28103.17
10      170.2             65.8        11202.95    28961.23
11      175.3             66.8        11707.37    30716.07
12      182.9             68.1        12454.13    33445.09
Total   1993.9            707.9       118117.11   332083.21
Mean    166.2             59.0

b = (nΣXY − ΣX ΣY) / (nΣX² − (ΣX)²) = 0.63
a = Ȳ − bX̄ = 59.0 − 0.63(166.2) = −46.86
The regression equation for the relationship between sales and expenditure on
advertisements is:
Sales = −46.86 + 0.63 (Expenditure on advertisement)
This means that, on average, every increase of RM 100,000 in advertisement
expenditure will lead to an increase of RM 0.63 million in sales.
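The Table 9.1 computation can be reproduced with the sums-based formulas from Section 9.2. This is a sketch; the small differences from the text's 0.63 and −46.86 come from the rounding of the tabled sums.

```python
# Least-squares slope and intercept from the raw Table 9.1 data.
x = [157.5, 157.5, 160.0, 160.0, 167.6, 154.9,   # expenditure on ads (X)
     167.6, 172.7, 167.6, 170.2, 175.3, 182.9]
y = [47.7, 52.2, 52.2, 54.5, 54.5, 59.0,         # sales (Y)
     61.3, 61.3, 64.5, 65.8, 66.8, 68.1]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n
print(f"b = {b:.2f}, a = {a:.2f}")
# Close to the text's b = 0.63 and a = -46.86; the text's figures use
# the rounded column totals from Table 9.1.
```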
9.3
The computed slope simply shows the degree of the relationship between the
variables in the sample observed. Whether this is due to chance or reflects a
true relationship between the two variables can only be determined through a
significance test for the regression coefficient.
Example
If the researcher would like to test the hypothesis that there is a true
relationship between sales and expenditure on advertising, the following
procedures need to be followed.
9.3.1
Prior to proceeding with the significance test for the slope, the assumption
of linearity needs to be tested first. This is simply to gather statistical
evidence that the proposed linear regression model is an appropriate model for
describing the relationship between the variables. The linearity test is also
called the global test.
The Hypothesis
Ho: The variation in the dependent variable is not explained by the linear
model (R2 = 0).
Ha: A significant portion of the variation in the dependent variable is
explained by the linear model (R2 ≠ 0).
The F-value is 13.46 and the p-value is 0.01:

Source       df   SS       MS       F       p-value
Regression   1    254.65   254.65   13.46   0.01
Residual     9    170.22   18.91
Total        10   424.88
Since the p-value is smaller than 0.05, reject the null hypothesis and accept
the alternative hypothesis: there is a linear relationship between the
variables studied. From the data, it is evident that there is a linear
relationship between sales and expenditure on advertising.
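The p-value for the global test comes from the upper tail of the F distribution with the ANOVA's degrees of freedom, which can be checked as a quick sketch:

```python
# p-value for the global (linearity) test: F = 13.46 with 1 and 9 df,
# as in the ANOVA table above.
from scipy import stats

F, df1, df2 = 13.46, 1, 9
p = stats.f.sf(F, df1, df2)   # upper-tail area of the F distribution
print(f"p = {p:.4f}")          # well below the 0.05 threshold
```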
Now, we can proceed to the test of significance for the regression slope.
9.3.2
The next step is testing the significance of the slope. This is to test
whether there is a significant contribution of the predictor variable to the
changes in the dependent variable. In our case, it is to test the significance
of the contribution of expenditure on advertising to sales.
Note: For simple linear regression, where there is only one independent
variable, if a linear relationship is proven, the significance test for the
slope will show a significant departure from zero.
Requirements
Parameter to be tested: Regression slope, b
Normality: The sample statistic (in this case, b) resembles a normal
distribution.
Sample size: Large
Recommended test: t-test for regression slope.
Test statistic: t = b / SE(b)
The Hypothesis
H0: The regression slope is equal to zero.
Ha: The regression slope is not equal to zero.
            Coefficients   Standard Error   t-Stat   P-value
Intercept   -46.86         14.77            -3.17    0.006
Slope       0.633          0.1656           3.82     0.005
The t-value is 3.82 and the p-value is 0.005.
Since the p-value is smaller than 0.05, reject the null hypothesis and accept
the alternative hypothesis: the regression slope is not equal to zero. There
is a true relationship between the variables studied. Sales is linearly
related to expenditure on advertisement. The regression equation for this
relationship is:
Sales = −46.86 + 0.633 (Expenditure on advertisement) + Error
The R2 is 0.599, meaning that 59.9% of the variation in sales is attributed to
the variation in expenditure on advertising.
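The same slope test can be run in one call with `scipy.stats.linregress` on the Table 9.1 data (a sketch; small rounding differences from the hand computation are expected):

```python
# Slope, intercept and slope p-value for the TESCO example via linregress.
from scipy import stats

expenditure = [157.5, 157.5, 160.0, 160.0, 167.6, 154.9,
               167.6, 172.7, 167.6, 170.2, 175.3, 182.9]
sales = [47.7, 52.2, 52.2, 54.5, 54.5, 59.0,
         61.3, 61.3, 64.5, 65.8, 66.8, 68.1]

res = stats.linregress(expenditure, sales)
print(f"slope = {res.slope:.3f}, intercept = {res.intercept:.2f}, "
      f"p = {res.pvalue:.4f}")
# A p-value below 0.05 means the slope departs significantly from zero.
```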
9.4
Linear Relationship
Normal Error
Homoscedasticity
SPSS Procedures:
1.
2.
3.
Select the dependent variable and push it into the Dependent Box
4.
5.
6.
Click Continue
7.
Click on OK.
Results
The first step in regression analysis: Global Hypothesis
Ho: The variation in the dependent variable is not explained by the linear
model (R2 = 0).
Ha: A significant portion of the variation in the dependent variable is
explained by the linear model (R2 ≠ 0).
Refer to Figure 9.2.
Since the p-value is less than 0.05, reject the null hypothesis and conclude
that a significant portion of the variation in the dependent variable is
explained by the linear model. Refer to Figure 9.3.
The R2 is 0.306, indicating that about 30.6% of the variation in customers'
satisfaction can be attributed to changes in the respondents' perception of
the employees' knowledge.
The next step is to test the significance of the slope. In simple linear
regression, if the global hypothesis shows that there is a significant linear
relationship between the dependent and independent variables, the significance
test for the slope will also provide evidence that it is significantly
different from zero.
The Hypothesis
H0:
Ha:
Since the p-value is less than 0.05, reject the null hypothesis and conclude
that the regression slope is not equal to zero. Thus,
Customers' Satisfaction = 0.553 (Employees' knowledge) + 2.596 + Error
9.5
MULTIPLE REGRESSION
X1: TV advertisement cost
X2: Training of sales executives cost
X3: Cost for employing promoters
X4: Cost for distributing free samples
X5: Cost for leasing prime spots at hyper and supermarkets
(a)
(b)
The Hypothesis
Ho: The variation in the sales is not explained by the linear model comprising
the costs for TV advertisement, training of sales executives, employing
promoters, distributing free samples, and leasing prime spots (R2 = 0).
The researcher performs the ANOVA for the linear relationship between
sales and all the defined predictor variables. The result for it is shown in
Table 9.4.
Table 9.4: The Results of the ANOVA for Multiple Regressions

Model        Sum of Squares   df     Mean Square   Sig.
Regression   30.866           5      6.173
Residual     90.216           4652   0.019
Total        121.082          4657
Since the p-value is smaller than 0.05, reject the null hypothesis and accept
the alternative hypothesis: there is a linear relationship between the
variables studied. From the analysis, it is evident that there is a linear
relationship between the sales and the combination of the predictor variables.
The next step is the test of significance for the regression slope (for every
independent [predictor] variable). This is to determine the contribution of
each predictor variable independently.
(c)
Requirements
(d)
Test statistic: t = b / SE(b)
The Hypothesis
The researcher performs the t-test for regression slopes for the linear
relationship between Sales and the following variables:
(i)
(ii)
               Unstandardised Coefficients
               B         Std. Error   t        Sig.
(Constant)     3.5373    0.4038       8.76     .000
TV ads         0.1214    0.0261       4.650    .000
Train          -0.1247   0.0944       -1.321   .429
Promoters      0.2626    0.0138       19.095   .000
Free samples   0.05965   0.0114       5.208    .000
Prime spots    0.2163    0.1531       1.413    .115
Since the p-value is smaller than 0.05 for (i) costs for TV advertisements,
(ii) employing promoters and (iii) distributing free samples, these predictors
contribute significantly to sales.
The regression model for this relationship between Sales and costs of
advertisements is:
Sales = 3.54 +0.1214 (TV) + 0.2626(Promoters) + 0.0597(Free Samples) + error
The adjusted R2 is 0.254, meaning that 25.4% of the variation in the sales is
attributed to the combined variation in the costs for TV advertisement,
employing promoters, and distributing free samples.
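A minimal multiple-regression sketch with numpy is shown below. The data are simulated for illustration only; the study's actual dataset is not reproduced in the text, so the coefficients here are hypothetical.

```python
# Hedged sketch: fitting a multiple-regression model by least squares.
# All data below are simulated; the variable names mirror the example.
import numpy as np

rng = np.random.default_rng(0)
n = 200
tv = rng.uniform(0, 10, n)            # hypothetical TV advertisement cost
promoters = rng.uniform(0, 10, n)     # hypothetical cost of employing promoters
free_samples = rng.uniform(0, 10, n)  # hypothetical cost of free samples
# Simulated sales with known coefficients plus noise.
sales = (3.5 + 0.12 * tv + 0.26 * promoters
         + 0.06 * free_samples + rng.normal(0, 0.5, n))

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), tv, promoters, free_samples])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
print("intercept and slopes:", np.round(coef, 2))
```

In practice a statistics package also reports the standard error and t-test for each slope, as in the table above; `lstsq` only recovers the point estimates.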
9.6
In a study on hospital service quality, the researcher classified service
quality into the following dimensions: assurance, reliability, service policy,
tangibles, problem solving and convenience. Apart from this, he also assessed
the patients' overall satisfaction with the services. The following is a
description of the hospital service quality dimensions.
Dimension
Number of Items
Assurance
Reliability
Service Policy
Tangibles
Problem Solving
Convenience
Ho: The variation in patients' overall satisfaction is not explained by the
linear model comprising patients' assessment of assurance, reliability,
service policy, tangibles, problem solving and convenience (R2 = 0).
Ha: A significant portion of the variation in patients' overall satisfaction
is explained by the linear model comprising patients' assessment of assurance,
reliability, service policy, tangibles, problem solving and convenience
(R2 ≠ 0).
SPSS Procedures:
1.
2.
3.
Select the dependent variable and push it into the Dependent Box
4.
Select the independent variables and push them into the Independent
Box
5.
6.
Click Continue
7.
Click on OK.
Results
Since the p-value is less than 0.05, reject the null hypothesis and conclude that a
significant portion of the variation in the dependent variable is explained by the
linear model.
The next step is the test of significance for the regression slope (for every
independent [predictor] variable). This is to determine the contribution of each
predictor variable independently.
The Hypothesis
H0:
Ha:
Model   R       R Square   Adjusted R Square   R Square Change   Durbin-Watson
1       .790a   .624       .619                .624              1.952
Refer to Figure 9.7. The adjusted R2 is 0.619, meaning that 61.9% of the
variation in the overall satisfaction is attributed to the combined variation
in patients' perception of the assurance, reliability and convenience of
services provided by the hospital.
ACTIVITY 9.1
(a)
(b)
(c)
Plot a Scatter Plot for the data and find the best fitting line.
(d)
(e)
The linear relationship between two variables is evaluated from two aspects:
the strength of the relationship (correlation), and the cause-effect association
(regression).
In statistics, correlation is used to denote association between two quantitative
variables, assuming that the association is linear.
Linear regression is a technique to establish the cause-effect relationship
between two variables. If the two variables are related, changes in one will
lead to some changes in the corresponding variable. If the researcher can
identify the cause and the effect variable, the relationship can be
represented in the form of the following equation:
Y = a + bX;
where Y is the dependent variable, X is the independent variable, and a and b
are two constants to be estimated.
Intercept
Regression equation
Linear regression
Multiple regression
Slope
Regression coefficient
Topic 10 Non-parametric Tests
LEARNING OUTCOMES
By the end of this topic, you should be able to:
1.
2.
3.
4.
INTRODUCTION
This topic provides a brief explanation of parametric and non-parametric
tests. Detailed descriptions are given of the chi-square, Mann-Whitney and
Kruskal-Wallis tests. Besides that, the assumptions underlying these
statistical techniques are provided to facilitate student learning. The topic
demonstrates how non-parametric statistical procedures can be computed using
formulae as well as SPSS, and how the statistical results should be
interpreted.
10.1
Descriptive statistics are used to compute summary statistics (e.g. mean, median,
standard deviation) to describe the samples, while statistical tests are used for
making inference from sample to the intended population. The following diagram
in Figure 10.1 illustrates this.
Copyright Open University Malaysia (OUM)
The parametric or distribution constraint test is a statistical test that requires the
distribution of the population to be specified. Thus, parametric inferential methods
assume that the distributions of the variables being assessed belong to some form
of known probability distribution (e.g. assumption that the observed data are
sampled from a normal distribution).
In contrast, for a non-parametric test (also known as a distribution-free
test), the distribution is not specified prior to the research but instead
determined from the data. Thus, this family of tests does not require
assumptions about the distribution. Most commonly used non-parametric tests
rank the outcome variable from low to high and then analyse the ranks rather
than the actual observations.
Choosing the right test contributes to the validity of the research findings.
Improper use of statistical tests will not only cause the validity of the test
results to be questioned and do little justice to the research, but at times
it can be a serious error, especially if the results have major implications,
for example, when they are used in policy formulation.
Parametric tests have greater statistical power compared to their non-parametric
equivalent. However, parametric tests cannot be used all the time. Instead, they
should be used if the researcher is sure that the data are sampled from a
population that follows a normal distribution (at least approximately).
For a large data set, use the Kolmogorov-Smirnov test (sample > 100) or the
Shapiro-Wilk test (sample < 100) to test whether the distribution of the data
differs significantly from normal. These tests can be found in most
statistical software.
When in doubt, use a non-parametric test; you may have less statistical power
but at least the result is valid.
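This decision step can be sketched with the Shapiro-Wilk test in scipy. The sample below is simulated rather than real survey data, so the printed decision is only illustrative.

```python
# Checking normality before choosing between parametric and
# non-parametric tests (simulated, hypothetical scores).
from scipy import stats
import numpy as np

rng = np.random.default_rng(42)
scores = rng.normal(loc=50, scale=10, size=60)  # hypothetical sample, n < 100

stat, p = stats.shapiro(scores)
if p > 0.05:
    decision = "no significant departure from normality: parametric test is reasonable"
else:
    decision = "significant departure from normality: consider a non-parametric test"
print(f"W = {stat:.3f}, p = {p:.3f} -> {decision}")
```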
Sample size plays a crucial role in deciding the family of statistical tests:
parametric or non-parametric. In a large sample, the central limit theorem ensures
that parametric tests work well even if the population is not normal. Parametric
tests are robust to deviations from normal distributions, when the sample size is
large. The issue here is how large is large enough; a rule of thumb suggests that a
sample size of about 30 or more for each category of observation is sufficient to
use the parametric test. The non-parametric tests also work well with large
samples. The non-parametric tests are only slightly less powerful than parametric
tests with large samples.
On the other hand, if the sample size is small we cannot rely on the central limit
theorem; thus, the p value may be inaccurate if the parametric tests were to be
used. The non-parametric test suffers greater loss of statistical power with small
sample size. Table 10.1 summarises some of the commonly used parametric and
non-parametric tests but not all of them are explained in this module.
Table 10.1: Commonly Used Parametric and Non-parametric Tests

Parametric tests
Requirements: random sampling; large sample size; level of measurement at
least interval; population parameter is normally distributed.
One-sample tests: Z-test for population proportion; Z-test for population
mean; t-test for population mean.
Two-sample tests: Z-test for equality of two proportions; t-test for
population mean; paired t-test.
Tests involving more than two groups: one-way ANOVA.

Non-parametric tests
Requirements: random sampling; small sample size (less than 30); level of
measurement can be lower than interval; distribution of the population
parameter is not important.
One-sample test: χ2 goodness of fit.
Two-sample tests: χ2 test for differences between two populations; Fisher's
Exact test; McNemar's test; χ2 test of independence; Wilcoxon signed rank
test; Mann-Whitney U test.
(b)
Assumptions
Even though certain assumptions are not critical for using the chi-square,
you need to address a number of generic assumptions:
10.2.1
This test enables us to find out whether a set of Obtained (or Observed)
Frequencies differs from a set of Expected Frequencies. Usually the Expected
Frequencies are the ones that we expect to find if the null hypothesis is true. We
compare our Observed Frequencies with the Expected Frequencies and see how
good the fit is.
Example :
A sample of 110 teenagers was asked, which of the four hand phone brands they
preferred. The number of people choosing the different brands was recorded in
Table 10.2.
Table 10.2: Preferences for Brands of Hand Phones
Brand A
Brand B
Brand C
Brand D
20 teenagers
60 teenagers
10 teenagers
20 teenagers
We want to find out if one or more brands are preferred over others. If they are
not, then we should expect roughly the same number of people in each category.
There will not be exactly the same number of people in each category, but they
should be near equal.
Another way of saying this is: If the null hypothesis is TRUE, and some brands
are not preferred more than others, then all brands should be equally represented.
We expect roughly EQUAL NUMBERS IN EACH CATEGORY, if the NULL
HYPOTHESIS is TRUE.
Expected Frequencies
There are 110 people, and there are four categories. If the null hypothesis is true,
then we should expect 110 / 4 = 27.5 teenagers to be in each category. This is
because, if all brands of hand phones are equally popular, we would expect
roughly equal numbers of people in each category. In other words, the number of
teenagers should be evenly distributed among the four brands.
The numbers that we find in the four categories, if the null hypothesis is true
are called the EXPECTED FREQUENCIES (i.e. all brands are equally
popular).
The numbers that we find in the four categories are called the OBSERVED
FREQUENCIES (i.e. based on the data we collected).
See Table 10.3. What the χ2 test does is compare the Observed Frequencies
with the Expected Frequencies.
If all brands of hand phones are equally popular, the Observed Frequencies
will not differ from the Expected Frequencies.
Table 10.3 shows the observed and expected frequencies for the four brands of
hand phones. It is often difficult to tell just by looking at the data, which
is why you have to use the χ2 test.
Table 10.3: Expected and Observed Frequencies and the Differences

Column 1   Column 2       Column 3       Column 4             Column 5    Column 6
           Observed (O)   Expected (E)   Difference (O − E)   (O − E)2    (O − E)2 / E
Brand A    20             27.5           -7.5                 56.25       2.05
Brand B    60             27.5           32.5                 1056.25     38.41
Brand C    10             27.5           -17.5                306.25      11.14
Brand D    20             27.5           -7.5                 56.25       2.05
TOTAL                                                                     53.65
Step 4:
Add up the figures you obtained in Column 6 and you get 53.65. So the χ2 is
53.65.
The formula for the χ2 which you computed above is:
χ2 = Σ (O − E)2 / E
where O is the observed frequency and E is the expected frequency.
Step 5:
The degrees of freedom (DF) is one less than the number of categories. In this
case, DF = 4 categories − 1 = 3. We need to know this, for it is usual to
report the DF along with the χ2 and the associated probability level.
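The hand-phone computation can be reproduced with `scipy.stats.chisquare`, which tests against equal expected frequencies (110 / 4 = 27.5 per brand) by default:

```python
# Chi-square goodness-of-fit for the hand-phone brand preferences.
from scipy import stats

observed = [20, 60, 10, 20]            # Brands A-D from Table 10.2
stat, p = stats.chisquare(observed)    # equal expected counts by default
print(f"chi-square = {stat:.2f}, df = {len(observed) - 1}, p = {p:.6f}")
# Matches the text's 53.65 up to rounding; p is far below 0.05.
```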
SPSS Output

Hand phones
Chi-Square    53.65a
df            3
Asymp. Sig.   .0000
Click on the Weight Cases to open the Weight Cases dialogue box.
Select the variable you require and click on the right arrow button to
move the variable in the Frequency Variable: box.
Click on OK. The message Weight On should appear on the status bar
at the bottom of the application window.
Select the variable you require and click on the right arrow button to
move to the variable into the Test Variable List: box.
Click on OK.
10.2.2
Example
A researcher is interested in finding out whether male students from high
income or low income families get into trouble more often in school. Table
10.5 shows the frequencies of male students from low and high income families
who have discipline problems in school:
Table 10.5: Observed Frequencies

                         Low Income   High Income   Total
Discipline Problems      46           37            83
No Discipline Problems   71           83            154
Total                    117          120           237
The expected value for each cell of the table can be calculated using the
following formula:
Expected Frequency = (Row total × Column total) / Total for table
For example, in the table comparing the percentage of high income and low
income students involved in disciplinary problems, the expected count for the
number of low income students with discipline problems is:
Expected Frequency (E1) = (117 × 83) / 237 = 40.97
Similarly, Expected Frequency (E4) = (120 × 154) / 237 = 77.97
Use the formula and compute the Expected Frequencies for E2 and E3. Table 10.6
shows the completed expected frequencies for all the four cells.
Table 10.6: Observed and Expected Frequencies

                         Low Income           High Income          Total
Discipline Problems      O = 46, E1 = 40.97   O = 37, E3 = 42.03   83
No Discipline Problems   O = 71, E2 = 76.03   O = 83, E4 = 77.97   154
Total                    117                  120                  237

χ2 = Σ (O − E)2 / E = 1.87
x2
(a)
Degrees of Freedom
Before we can proceed, we need to know how many degrees of freedom we have.
When a comparison is made between one sample and another, a simple rule is
that the degrees of freedom equal (number of columns − 1) × (number of rows
− 1), not counting the totals for rows or columns.
Statistical Significance
When the computed χ2 statistic is less than the critical value in the table
for a 0.05 probability level, we DO NOT reject the null hypothesis of equal
distributions.
Since our χ2 = 1.87 statistic is less than the critical value for the 0.05
probability level (3.841), we DO NOT reject the null hypothesis and conclude
that students from low income families are NOT SIGNIFICANTLY more likely to
have discipline problems than students from high income families.
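The same conclusion follows from `scipy.stats.chi2_contingency`; passing `correction=False` reproduces the uncorrected χ2 computed by hand above (about 1.87):

```python
# Chi-square test of independence for the income/discipline table.
from scipy import stats

table = [[46, 71],   # low income: discipline / no discipline
         [37, 83]]   # high income
chi2, p, df, expected = stats.chi2_contingency(table, correction=False)
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p:.3f}")
# p > 0.05, so the null hypothesis of equal distributions is not rejected.
```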
Critical values of χ2

df   0.50    0.10    0.05    0.02     0.01     0.001
1    0.455   2.706   3.841   5.412    6.635    10.827
2    1.386   4.605   5.991   7.824    9.210    13.815
3    2.366   6.251   7.815   9.837    11.345   16.268
4    3.357   7.779   9.488   11.668   13.277   18.465
5    4.351   9.236   11.070  13.388   15.086   20.517
SPSS PROCEDURES FOR THE CHI-SQUARE TEST FOR INDEPENDENCE
Select a row variable and click on right arrow button to move the variable
into the Row(s): box
Select a column variable and click on the right arrow button to move the
variable into the Column(s): box
Click on the Statistics command push button to open the Crosstabs: Statistics
sub-dialogue box
Note:
Click on Continue.
In the Counts box, click on the Observed and Expected check boxes.
In the Percentages box, click on the Row, Column and Total check boxes.
ACTIVITY 10.1
Look at the following table:

Age group   10-14 years   15-19 years   20-24 years   25-29 years
Frequency   72            31            15            50
ACTIVITY 10.2

        Yes   No   Total
Urban   36    14   50
Rural   30    25   55
Total   66    39   105
Questions:
The Mann-Whitney U test compares two independent samples, each sample drawn
from an unrelated population. This test uses the median as the parameter for
comparisons. The Mann-Whitney U test is applied when the sample size is small
(less than 30 per group) and/or when the level of measurement is ordinal.
Refer to Figure 10.2.
Unrelated Samples
Test statistic: T = S − n1(n1 + 1)/2
where S is the sum of ranks of population 1 and n1 is the sample size of
population 1. Population 1 is the population with the smaller sum of rank
values.
The Mann-Whitney test uses the rank sum as the test statistic. The procedure
is as follows:
The two independent samples are combined and ranks are assigned to the scores
(it can be a mean score).
The distribution functions of the two populations differ only with respect to
location, if they differ at all.
Example:
The brand preference scores of the treatment group (n = 17) and the control
group (n = 10) were:
Treatment: 11.9, 11.7, 9.5, 9.4, 8.7, 8.2, 7.7, 7.4, 7.4, 7.1, 6.9, 6.8, 6.3,
5.0, 4.2, 4.1, 2.2
Control: 6.6, 5.8, 5.4, 5.1, 5.0, 4.3, 3.9, 3.3, 2.4, 1.7
We wish to know whether these data provide sufficient evidence to indicate that
behaviour modification psychotherapy using TV advertisements improves the
brand preference among adult shoppers.
The Hypothesis
Ho: There is no difference in the brand preference between the group that
received behaviour modification therapy and the control group.
Ha: There is a difference in the brand preference between the group that received
behaviour modification therapy and the control group.
The level of significance is set at 0.05 (α = 0.05). Table 10.9 presents the
results of the analysis of the brand preference scores of the treatment and
control groups.
Treatment   Rank   Ctrl   Rank
11.9        27     6.6    15
11.7        26     5.8    13
9.5         25     5.4    12
9.4         24     5.1    11
8.7         23     5.0    9.5
8.2         22     4.3    8
7.7         21     3.9    5
7.4         19.5   3.3    4
7.4         19.5   2.4    3
7.1         18     1.7    1
6.9         17
6.8         16
6.3         14
5.0         9.5
4.2         7
4.1         6
2.2         2
Ranking is done by arranging all the scores from both groups in ascending
order. A rank of 1 is given to the smallest score, and tied scores share the
rank.
T = S − n1(n1 + 1)/2 = 81.5 − 10(10 + 1)/2 = 81.5 − 55 = 26.5
p = 0.003
Example of SPSS output of the Mann-Whitney Test (refer to Figure 10.4 below).

Group       N    Mean Rank   Sum of Ranks
Treatment   17   17.44       296.5
Control     10   8.15        81.5
Since the p-value is smaller than 0.05, reject null hypothesis and conclude the
alternative hypothesis. There is a difference in the brand preference between the
group that received behaviour modification therapy and the group that did not.
The brand preference score of the group that received behaviour modification
therapy is significantly different compared to the group that did not receive any
therapy. From the mean rank, it is evident that the brand preference score for the
group that received behaviour modification therapy is higher. In other words, the
behaviour modification psychotherapy using TV advertisement enhances brand
preference among adults.
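The same test can be run with `scipy.stats.mannwhitneyu` on the Table 10.9 scores; the smaller of the two U values corresponds to the hand-computed T:

```python
# Mann-Whitney U test for the behaviour modification example.
from scipy import stats

treatment = [11.9, 11.7, 9.5, 9.4, 8.7, 8.2, 7.7, 7.4, 7.4,
             7.1, 6.9, 6.8, 6.3, 5.0, 4.2, 4.1, 2.2]
control = [6.6, 5.8, 5.4, 5.1, 5.0, 4.3, 3.9, 3.3, 2.4, 1.7]

res = stats.mannwhitneyu(treatment, control, alternative="two-sided")
# scipy reports U for the first sample; take the smaller of U and n1*n2 - U.
u_small = min(res.statistic, len(treatment) * len(control) - res.statistic)
print(f"U = {u_small}, p = {res.pvalue:.4f}")   # p is below 0.05
```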
Example: Mann-Whitney Test using SPSS
The Mann-Whitney Test can also be used to compare two distinct groups'
(e.g. male and female) ratings of a particular phenomenon. In a service
quality survey carried out at the Kuching General Hospital, the researcher
gauged the knowledge of hospital staff using a specially designed
questionnaire. He would like to test whether the knowledge level reported by
male and female respondents is similar or differs significantly. Table 10.10
provides the mean score and standard deviation of respondents' assessment of
the knowledge of the hospital staff.
Table 10.10: Hospital Staff Knowledge

Gender   N    Mean   Std. Deviation
Male     24   4.58   1.213
Female   31   5.00   1.065
The Hypothesis
Ho :
Ha:
SPSS Command
SPSS PROCEDURES FOR THE MANN WHITNEY TEST
Select the variable you require and click on the right arrow button to
move the variable in the Test Variable List box.
Click on OK.
Ranks

Gender   N    Mean Rank   Sum of Ranks
Male     24   25.46       611.00
Female   31   29.97       929.00
Total    55

Test Statisticsa

                         Knowledge of hospital staff
Mann-Whitney U           311.000
Wilcoxon W               611.000
Z                        -1.099
Asymp. Sig. (2-tailed)   .272

Since the p-value (0.272) is greater than 0.05, do not reject the null
hypothesis: the assessments of staff knowledge by male and female respondents
do not differ significantly.
Test statistic: H = [12 / (N(N + 1))] Σ(i=1 to k) (Ri² / ni) − 3(N + 1),
where N is the total number of observations, k is the number of groups, Ri is
the sum of ranks for group i and ni is the size of group i.
The independent samples are combined and ranks are assigned to the scores
(it can be a mean score).
The populations are identical except for a possible difference in location for
at least one population.
Example:
257 302 206 318 449 334 299 149 282 351
Objective:
The Hypothesis
Group I (Clerk, n = 10): 257, 302, 206, 318, 449, 334, 299, 149, 282, 351;
sum of ranks = 69
Group II (Supervisor, n = 6): sum of ranks = 90
Group III (Manager, n = 5): sum of ranks = 72
The Kruskal-Wallis statistic is computed using the formula:
H = [12 / (N(N + 1))] Σ(i=1 to k) (Ri² / ni) − 3(N + 1)
  = [12 / (21 × 22)] (69²/10 + 90²/6 + 72²/5) − 3(21 + 1)
  = 8.36
SPSS Output

Group        N    Mean Rank
Clerk        10   6.90
Supervisor   6    15.00
Manager      5    15.67
Total        21

Chi-Square    8.361
df            2
Asymp. Sig.   0.015
The Kruskal-Wallis χ2 value is 8.361 and the p-value is 0.015. Since the
p-value is smaller than 0.05, reject the null hypothesis and accept the
alternative hypothesis: there is a difference in the average monthly
expenditure on mobile phone usage among the three groups. Even though the
test statistic does not provide information on where the differences lie,
judging from the mean ranks, clerks spend the least compared to supervisors
and managers.
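The H statistic in the worked example can be reproduced directly from the rank sums (R = 69, 90, 72 for groups of size 10, 6 and 5):

```python
# Kruskal-Wallis H from the rank sums in the worked example.
rank_sums = [(69, 10), (90, 6), (72, 5)]   # (sum of ranks, group size)
N = sum(n for _, n in rank_sums)           # total observations

H = 12 / (N * (N + 1)) * sum(R**2 / n for R, n in rank_sums) - 3 * (N + 1)
print(f"H = {H:.2f}")   # H = 8.36, matching the text
```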
Example: Kruskal-Wallis Test using SPSS
With reference to the hospital service quality survey, the management wanted
to see how respondents' employment influenced their assessment of the
knowledge of hospital staff. Respondents were grouped into three categories
of employment (public, private and students), while knowledge of hospital
staff was rated on a five-point scale (assumed ordinal). The hospital
administrator wanted to know who gave better ratings: public sector
employees, private sector employees or students.
The Hypothesis
Select the variable you require and click on the right arrow button to
move the variable in the Test Variable List box.
Click on OK.
Results
Ranks

Employment   N    Mean Rank
Government   1    18.00
Private      5    9.60
Students     17   12.35
Total        23

Test Statisticsa,b

              Knowledge of staff (assessment before attending seminar)
Chi-Square    1.694
df            2
Asymp. Sig.   .429
ACTIVITY 10.3
           Fail
Method X   21
Method Y   29
(a)
(b)
(c)
There are two categories of statistical tests: (i) the parametric and (ii)
non-parametric tests.
The parametric or distribution constraint tests are statistical tests that require
the distribution of the population to be specified.
Among the commonly used non-parametric tests are the chi-square test, Mann-Whitney test and Kruskal-Wallis test.
The chi-square test tests the significant difference in proportion and is very
useful when the variable measured is nominal.
The chi-square is very flexible and mainly used in two forms (i) comparing
the observed proportion with some known values, and (ii) comparing the
difference in distribution of proportions between two groups whereby each
group can have two or more categories.
The Kruskal-Wallis test serves the same purpose as the one way ANOVA,
comparing the differences between more than two groups of samples from
unrelated populations. This test uses the median as the parameter for
comparisons.
The Kruskal-Wallis test is used when the sample size is small and/or when the
level of measurement is ordinal.
Chi-square test
Contingency table
Degree of freedom
Kruskal-Wallis test
Mann-Whitney test
Mean rank
Non-parametric
Parametric
APPENDIX
Appendix A
Creating an SPSS Data File
After you have developed your questionnaire, you need to create an SPSS data
file to enable you to enter data in a format which can be read by SPSS. You
can do this via the SPSS Data Editor, which is built into the SPSS package.
When creating an SPSS data file, the items/questions in your questionnaire
have to be translated into variables. For example, suppose you have the
question "What is your occupation?" with several response options such as
1. Salesman 2. Clerk 3. Teacher 4. Accountant 5. Others; what you need to do
is translate your question into a variable name, perhaps called occu. In the
context of SPSS data entry, these response options are called value labels;
for example, Salesman is assigned a value label of 1, Clerk 2, Teacher 3,
Accountant 4 and Others 5. If the respondent is a teacher, you enter 3 when
inputting data into the variable occu in your data file. Sometimes you may
have a question which requires the respondent to state a value in absolute
terms, such as "Your annual salary is _________". In this case, you can
create a variable name called salary. Since this variable only requires the
respondent to state his/her salary, you do not need to create response
options; just enter the actual salary figure.
When defining the variable name, you have to consider the following:
(i)
(ii)
When defining a variable name, an uppercase character does not differ from a
lower case character.
Besides understanding the variable name convention and value labels, you will
also need to know other variable definitions such as variable label, variable
type, missing values, column format and measurement level. A variable label
describes the variable name; for example, if the variable name is occu, the
variable label can be "Respondent's occupation". You need not specify the
variable label if you do not wish to, but variable labels improve the
interpretability of your output, especially if you have many variables.
Missing values can also be assigned to a variable. It is
rare for one to obtain a questionnaire without any item being left blank. By
convention, a missing value is usually assigned a value of 9, but for
statistical analysis it would be preferable to fill the missing values with a
value equivalent to the mean of the variable. However, this can only be done
for interval or ratio level variables. For example, if you have the variable
income and data were derived from 150 respondents, of whom 20 did not provide
their income information, then compute the mean of the income via SPSS for
the 130 respondents who did, and recode all missing values as the computed
mean value.
The type of variable relates closely to your items in the questionnaire. For
example, the item age is a numeric variable, meaning you input the variable
using only numbers: if a person's age is 34, you type 34 under the age variable
column for that particular case. However, sometimes there is a need to use
alphanumeric characters to input data into a variable. A good example is the
respondent's address. In this case, the alphanumeric characters constitute what is
called a string variable. For example, a short open-ended question might be
"Please state your address", and the respondent will write his/her address using
alphanumeric characters such as 23 Jalan SS2/75, 47301 Petaling Jaya, Selangor.
This address is a combination of letters and numbers.
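The numeric-versus-string distinction can be illustrated with a small pandas sketch (the second address below is invented for the example):

```python
# Numeric vs. string variable types, sketched in Python with pandas.
# "age" holds numbers only; "address" is a string variable because it
# mixes letters and digits (alphanumeric characters).
import pandas as pd

df = pd.DataFrame({
    "age": [34, 27],  # numeric variable
    "address": [
        "23 Jalan SS2/75, 47301 Petaling Jaya, Selangor",
        "10 Jalan Contoh, 50450 Kuala Lumpur",  # invented example address
    ],  # string variable
})

print(df.dtypes)  # age is an integer dtype; address is a string/object dtype
```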
The column format in the Data Editor allows you to specify the alignment of your
data in a column: left, centre or right. Measurement in the SPSS variable
definition convention differs slightly from that used in statistics textbooks, as
SPSS uses "scale" to refer to both interval and ratio measurement. Ordinal and
nominal levels of measurement are maintained as they are. In statistical analysis,
it is extremely important to know the level of measurement of a particular
variable. A nominal variable (also called a categorical variable) classifies persons
or objects into two or more categories; for example, the variable gender is
categorised as 1 for Male and 2 for Female, and marital status as 1 for Single,
2 for Married and 3 for Divorced. Numbering in nominal variables does not
indicate that one category is higher or better than another: coding 1 for Male and
2 for Female does not mean that male is lower than female by virtue of the
number being smaller. In nominal measurement, the numbers are only labels.
An ordinal variable, on the other hand, not only classifies persons or objects but
also ranks them in terms of degree. Ordinal variables put persons or objects in
order from highest to lowest or from most to least. On an ordinal scale, the
intervals between ranks are not equal; the difference between rank 1 and rank 2
is not necessarily the same as the difference between rank 2 and rank 3. For
example, person A with a height of 5'10" is ranked 1, person B with a height of
5'5" is ranked 2, and person C with a height of 4'8" is ranked 3. The differences
in height among the three persons are not equal, but there is an order, i.e. A is
taller than B, and B is taller than C.
Interval variables have all the characteristics of nominal and ordinal variables but
also have equal intervals. For example, an achievement test score is treated as an
interval variable: the difference between a score of 50 and a score of 60 is
essentially the same as the difference between a score of 80 and a score of 90.
Interval scales, however, do not have a true zero point. Thus, if Ahmad has a
score of 0 for Mathematics, it does not mean he has no knowledge of
mathematics at all; nor does Muthu scoring 100 mean he has total knowledge of
Mathematics. A person who scores 90 marks scores twice as high as one who
scores 45, but we cannot say that a person scoring 90 knows twice as much as a
person scoring 45.
Ratio variables are the highest, most precise level of measurement. This type of
variable has all the properties of the other types above and, in addition, has a
true zero point. Take height, for example: a person who is 6 feet tall is twice as
tall as a person who is 3 feet tall. A person who weighs 50 kg is one third the
weight of another who weighs 150 kg. Since ratio scales cover mostly physical
measures, they are not used very often in social science research.
In SPSS, interval and ratio measurements are classified as scale variables.
Nominal and ordinal measurements remain as they are, i.e. nominal and ordinal
variables respectively.
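As an aside, the four levels of measurement can also be represented with pandas dtypes in Python. This is an analogue of, not a substitute for, the SPSS nominal/ordinal/scale classification; all variable names and values below are illustrative.

```python
# The four levels of measurement, sketched with pandas. Illustrative only.
import pandas as pd

# Nominal: codes are labels only -- no order (1 = Male, 2 = Female).
gender = pd.Categorical([1, 2, 1], categories=[1, 2], ordered=False)

# Ordinal: categories are ranked, but intervals between ranks are not equal.
height_rank = pd.Categorical(
    ["rank3", "rank2", "rank1"],
    categories=["rank3", "rank2", "rank1"],
    ordered=True,
)

# Scale (interval and ratio): plain numeric columns.
test_score = pd.Series([50, 60, 80])    # interval: equal steps, no true zero
weight_kg = pd.Series([50.0, 150.0])    # ratio: true zero point

print(gender.ordered, height_rank.ordered)    # False True
print(weight_kg.iloc[1] / weight_kg.iloc[0])  # 3.0 -- a meaningful ratio
```

The ratio 150/50 = 3 is meaningful only because weight has a true zero; the same division on interval-level test scores would not support a "knows three times as much" claim.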
A good understanding of the level of measurement will be useful when defining
the variables via the SPSS Data Editor and in the data analysis process. Before
you proceed to the next phase of data analysis, you need to enter the data in a
format which can be read by SPSS. There are several ways to do this: using
(i) the SPSS Data Editor, (ii) Excel, (iii) Access, or (iv) Word. The steps to enter
data via the SPSS Data Editor are described below.
How to define variables and enter data using the SPSS Data Editor

Steps

1. Click Start → All Programs → SPSS for Windows → SPSS 12.0 for
Windows → select Type in data → OK → Variable View. Start defining
your variables by specifying the following:
(a)
(b)
(c) Width: 8
(d) Decimal: 0
(e)
(f) Values: Under Value, type 1; under Value Label, type Male; click Add
(g)
(h) Click Add
(i)
(j) Columns: 8
(k) Align: Right
(l) Measure: Nominal

2. Proceed to define the second variable, and so forth, until you have completed
all variables in your questionnaire. Note that certain variables, such as ID,
do not have value labels. If you are not sure what the level of measurement
for a particular variable is, you may keep the default, which is Scale. If the
variable you are defining shares a specification (such as the variable label)
with a variable you have already defined, you may simply copy it into the
relevant cells.

3. After you have completed defining all your variables, the next step is to
enter data into the data cells by doing the following:
(a)
(b)
(c) Type in the data; e.g. if the respondent's gender is male, type 1 and
then proceed to the next variable by pressing the right arrow key on
your keyboard.
(d) Input the next variable, and so on, until you have completed all your
data input.
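For readers who also work outside SPSS, the Data Editor steps above have a rough programmatic analogue. This pandas sketch defines a coded gender variable with value labels 1 = Male and 2 = Female; the case values are illustrative, and pandas is used here only as a stand-in for the SPSS Data Editor.

```python
# A programmatic analogue of the Data Editor steps: define a coded
# "gender" variable with value labels and enter a few cases. Illustrative.
import pandas as pd

# Value labels, as in the SPSS Values dialog (1 = Male, 2 = Female).
value_labels = {1: "Male", 2: "Female"}

df = pd.DataFrame({
    "id": [1, 2, 3],       # an ID variable needs no value labels
    "gender": [1, 2, 1],   # enter 1 for male, 2 for female, case by case
})

# Attach the labels for readable output, keeping numeric codes for analysis.
df["gender_label"] = df["gender"].map(value_labels)

print(df)
```

Keeping the numeric codes alongside the labels mirrors SPSS's behaviour, where analyses run on the codes while output can display the value labels.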
MODULE FEEDBACK

Thank you.
Centre for Instructional Design and Technology
(Pusat Reka Bentuk Pengajaran dan Teknologi)
Tel No.: 03-27732578
Fax No.: 03-26978702