
HMEF5113

STATISTICS FOR
EDUCATIONAL
RESEARCH

Prof Dr John Arul Philips

Copyright © Open University Malaysia (OUM)

Project Directors:

Prof Dato' Dr Mansor Fadzil


Assoc Prof Dr Chung Han Tek
Open University Malaysia

Module Writer:

Prof Dr John Arul Philips


Asia e University

Moderators:

Dr Soon Seng Thah


Educational Planning and Research Division
Ministry of Education
Assoc Prof Dr Nagarajah Lee
Open University Malaysia

Developed by:

Centre for Instructional Design and Technology


Open University Malaysia

Printed by:

Meteor Doc. Sdn. Bhd.


Lot 47-48, Jalan SR 1/9, Seksyen 9,
Jalan Serdang Raya, Taman Serdang Raya,
43300 Seri Kembangan, Selangor Darul Ehsan

First Edition, May 2009


Second Edition, December 2012 (rs)
Copyright © Open University Malaysia (OUM), December 2012, HMEF5113
All rights reserved. No part of this work may be reproduced in any form or by any means
without the written permission of the President, Open University Malaysia (OUM).


Table of Contents

Course Guide                                                         ix
Course Assignment Guide                                             xxi

Topic 1   Introduction to Statistics                                  1
          1.1  What is Statistics?                                    1
          1.2  Two Kinds of Statistics                                3
               1.2.1  Descriptive Statistics                          3
               1.2.2  Inferential Statistics                          4
               1.2.3  Descriptive or Inferential Statistics           4
          1.3  Variables                                              5
               1.3.1  Independent Variable                            6
               1.3.2  Dependent Variable                              7
          1.4  Operational Definition of Variables                    7
          1.5  Sampling                                               8
          1.6  Sampling Techniques                                   10
               1.6.1  Simple Random Sampling                         10
               1.6.2  Systematic Sampling                            12
               1.6.3  Stratified Sampling                            12
               1.6.4  Cluster Sampling                               14
          1.7  SPSS Software                                         14
          Summary                                                    15
          Key Terms                                                  16

Topic 2   Descriptive Statistics                                     17
          2.1  What are Descriptive Statistics?                      17
          2.2  Measures of Central Tendency                          18
               2.2.1  Mean                                           18
               2.2.2  Median                                         19
               2.2.3  Mode                                           20
          2.3  Measures of Variability or Dispersion                 21
               2.3.1  Range                                          21
               2.3.2  Standard Deviation                             22
          2.4  Frequency Distribution                                25
               2.4.1  Tables                                         25
               2.4.2  SPSS Procedure                                 26
          2.5  Graphs                                                26
               2.5.1  Bar Charts                                     26
               2.5.2  Histogram                                      28
               2.5.3  Line Graphs                                    29
          Summary                                                    30
          Key Terms                                                  30

Topic 3   Normal Distribution                                        31
          3.1  What is Normal Distribution?                          31
          3.2  Why is Normal Distribution Important?                 32
          3.3  Characteristics of the Normal Curve                   32
               3.3.1  Mean, Median and Mode                          33
          3.4  Three-Standard-Deviations Rule                        34
          3.5  Inferential Statistics and Normality                  35
               3.5.1  Assessing Normality Using Graphical Methods    35
               3.5.2  Assessing Normality Using Statistical
                      Techniques                                     47
          3.6  What to Do if the Distribution is Not Normal?         50
          Summary                                                    50
          Key Terms                                                  51

Topic 4   Hypothesis Testing                                         53
          4.1  What is a Hypothesis?                                 53
          4.2  Testing a Hypothesis                                  55
               4.2.1  Null Hypothesis                                55
               4.2.2  Alternative Hypothesis                         57
          4.3  Type I and Type II Error                              57
          4.4  Two-tailed and One-tailed Tests                       60
               4.4.1  Two-tailed Test                                60
               4.4.2  One-tailed Test                                63
          Summary                                                    65
          Key Terms                                                  66

Topic 5   t-test                                                     67
          5.1  What is t-test?                                       67
          5.2  Hypothesis Testing Using t-test                       68
          5.3  t-test for Independent Means                          69
          5.4  t-test for Independent Means Using SPSS               77
          5.5  t-test for Dependent Means                            79
          5.6  t-test for Dependent Means Using SPSS                 83
          Summary                                                    87
          Key Terms                                                  88

Topic 6   One-way Analysis of Variance (One-way ANOVA)               89
          6.1  Logic of the One-way ANOVA                            92
          6.2  Between-Group and Within-Group Variance               93
          6.3  Computing the F-Statistic                             94
          6.4  Assumptions for Using One-way ANOVA                   99
          6.5  Using SPSS to Compute One-way ANOVA                  101
          Summary                                                   108
          Key Terms                                                 108

Topic 7   Analysis of Covariance (ANCOVA)                           109
          7.1  What is Analysis of Covariance (ANCOVA)?             109
          7.2  Assumptions for Using ANCOVA                         112
          7.3  Using ANCOVA in Pretest-Posttest Designs             116
               7.3.1  Before Including a Covariate                  116
               7.3.2  After Including a Covariate                   117
          Summary                                                   121
          Key Terms                                                 121

Topic 8   Correlation                                               122
          8.1  What is a Correlation Coefficient?                   122
          8.2  Pearson Product-Moment Correlation Coefficient       123
               8.2.1  Range of Values of rxy                        125
          8.3  Calculation of the Pearson Correlation Coefficient
               (r or rxy)                                           127
          8.4  Pearson Product-Moment Correlation Using SPSS        129
               8.4.1  SPSS Output                                   130
               8.4.2  Significance of the Correlation Coefficient   130
               8.4.3  Hypothesis Testing for Significant
                      Correlation                                   131
               8.4.4  To Obtain a Scatter Plot Using SPSS           132
          8.5  Spearman Rank Order Correlation Coefficient          133
          8.6  Spearman Rank Order Correlation Using SPSS           134
          Summary                                                   136
          Key Terms                                                 136

Topic 9   Linear Regression                                         137
          9.1  What is Simple Linear Regression?                    137
          9.2  Estimating Regression Coefficients                   138
          9.3  Significance Tests for Regression Coefficients       140
               9.3.1  Testing the Assumption of Linearity           140
               9.3.2  Testing the Significance of the Slope         141
          9.4  Simple Linear Regression Using SPSS                  142
          9.5  Multiple Regression                                  145
          9.6  Multiple Regression Using SPSS                       148
          Summary                                                   152
          Key Terms                                                 152

Topic 10  Non-parametric Tests                                      153
          10.1  Parametric Versus Non-parametric Tests              153
          10.2  Chi-Square Tests                                    157
                10.2.1  One Variable or Goodness-of-Fit Test        157
                10.2.2  Chi-Square Test for Independence: 2 × 2     161
          10.3  Mann-Whitney U Tests                                167
          10.4  Kruskal-Wallis Rank Sum Tests                       173
          Summary                                                   178
          Key Terms                                                 179

Appendix                                                            183

COURSE GUIDE


WELCOME TO HMEF5113 STATISTICS FOR EDUCATIONAL RESEARCH
Welcome to HMEF5113 Statistics for Educational Research, which is one of the
required courses for the Master of Education (MEd) programme. The course
assumes no previous knowledge of Statistics but it is a prerequisite course for
MEd students before they embark on their research projects. This is a three-credit
hour course conducted over a semester of 14 weeks.

WHAT WILL YOU GET FROM DOING THIS COURSE?


Description of the Course
The course provides the basic knowledge necessary for students to understand
the various statistical techniques and how to apply them when analysing data in
education and psychology. It will acquaint students with the meaning of statistics,
the normal distribution and hypothesis testing. The statistical techniques explained
in this course include the t-test, ANOVA, ANCOVA, correlation, linear regression,
chi-square, Mann-Whitney and Kruskal-Wallis. The emphasis is on the
assumptions underlying the use of these statistical techniques and on the
interpretation of data. Guides on how to use SPSS to analyse data, and on how to
interpret the output, are also presented at the end of each topic.

Aim of the Course


The main aim of the course is to provide you with basic knowledge on how to
use some basic statistical techniques in educational research.

Course Learning Outcomes


By the end of this course, you should be able to:

1.  Explain the differences between descriptive and inferential statistics and
    their uses in educational research;

2.  Assess the normality of a set of data using graphical as well as statistical
    techniques;

3.  Differentiate between null and alternative hypotheses and their use in
    educational research; and

4.  Apply the different statistical techniques in educational research, conduct
    statistical analyses using SPSS and make appropriate interpretations of
    statistical results.

HOW CAN YOU GET THE MOST FROM THIS COURSE?


Learning Package
In this Learning Module you are provided with TWO kinds of course materials:

1.  The Course Guide you are currently reading; and

2.  The Course Content (consisting of 10 topics).

Course Synopsis
To enable you to achieve the FOUR objectives of the course, HMEF5113 is
divided into 10 topics. Specific objectives are stated at the start of each topic,
indicating what you should be able to do after completing the topic.
Topic 1:  Introduction
The topic introduces the meaning of Statistics and explains the
difference between descriptive and inferential statistics. As
inferential statistics is used to make inferences about the
population on specific variables based on a sample, this topic also
explains the meanings of different types of variables and
highlights the different sampling techniques in educational
research.

Topic 2:  Descriptive Statistics
The topic introduces the different descriptive statistics, namely the
mean, the median, the mode and the standard deviation, and how
they are computed. SPSS procedures on how to obtain these
descriptive statistics are also provided.

Topic 3:  The Normal Distribution
The topic explains what the normal distribution is and introduces
the graphical as well as the statistical techniques used in assessing
normality. It also presents the SPSS procedures for assessing
normality.

Copyright Open University Malaysia (OUM)

COURSE GUIDE

xi

Topic 4:  Hypothesis Testing
The topic explains the difference between the null and alternative
hypotheses and their use in research. It also introduces the
concepts of Type I error and Type II error. It illustrates the
difference between the two-tailed and one-tailed tests and
explains when they are used in hypothesis testing.

Topic 5:  t-test
This topic explains what the t-test is and its use in hypothesis
testing. It also highlights the assumptions for using the t-test. Two
types of t-test are elaborated in the topic. The first one is the t-test
for independent means, while the second one is the t-test for
dependent means. Computation of the t-statistic using formulae,
as well as the SPSS procedures, is explained.

Topic 6:  One-way Analysis of Variance
This topic explains what one-way analysis of variance (ANOVA)
is about and the assumptions for using ANOVA in hypothesis
testing. It demonstrates how ANOVA can be computed using the
formula and the SPSS procedures. Also explained are the
interpretation of the related statistical results and the use of post-hoc comparison tests.

Topic 7:  Analysis of Covariance
This topic explains what analysis of covariance (ANCOVA) is
about and the assumptions for using ANCOVA in hypothesis
testing. It also demonstrates how to compute and interpret
ANCOVA using SPSS.

Topic 8:  Correlation
This topic explains the concept of linear relationship between
variables. It discusses the use of statistical tests to determine
correlation and demonstrates how to compute correlation between
variables using SPSS and interpret correlation results.

Topic 9:  Linear Regression
This topic explains the concept of causal relationship between
variables. It discusses the use of statistical tests to determine slope,
intercept and the regression equation. It also demonstrates how to
run regression analysis using SPSS and interpret the results.


Topic 10:  Non-parametric Tests
This topic provides a brief explanation of parametric and non-parametric
tests. Detailed descriptions of the chi-square, Mann-Whitney and
Kruskal-Wallis tests and the assumptions underlying these statistical
techniques are provided to facilitate student learning. It demonstrates how
the non-parametric statistical procedures can be computed using formulae
as well as SPSS and how the statistical results should be interpreted.

Organisation of Course Content


In distance learning, the module replaces the university lecturer. This is one of
the main advantages of distance learning: specially designed study materials allow
you to study at your own pace, anywhere and at any time. Think of it as reading
the lecture instead of listening to a lecturer. In the same way that a lecturer might
assign something for you to read or do, the module tells you what to read, when
to read and when to do the activities. Just as a lecturer might ask you questions
in class, your module provides exercises for you to do at appropriate points.
To help you read and understand the individual topics, numerous realistic
examples support all definitions, concepts and theories. Diagrams and text are
combined into a visually appealing, easy-to-read module. Throughout the course
content, diagrams, illustrations, tables and charts are used to reinforce important
points and simplify the more complex concepts. The module has adopted the
following features in each topic:

INTRODUCTION
Lists the headings and subheadings of each topic to provide an overview of the
contents of the topic and prepare you for the major concepts to be studied and
learned.

LEARNING OUTCOMES
This is a listing of what you should be able to do after successful
completion of a topic. In other words, whether you are able to explain,
compare, evaluate, distinguish, list, describe, relate and so forth. You
should use these indicators to guide your study. When you have finished a
topic, you must go back and check whether you have achieved the learning
outcomes or are able to do what is required of you. If you make a habit of
doing this, you will improve your chances of understanding the contents of
the course.

SELF-CHECK
Questions are interspersed at strategic points in the topic to encourage
review of what you have just read and retention of recently learned
material. The answers to these questions are found in the paragraphs
before the questions. This is to test immediately whether you have
understood the few paragraphs of text you have read. Working through
the questions will help you determine whether you understand the topic.

ACTIVITY
These are situations drawn from research projects to show how
knowledge of the principles of research methodology may be applied to
real-world situations. The activities illustrate key points and concepts
dealt with in each topic.

SUMMARY
The main ideas of each topic are listed in brief sentences to provide a review of
the content. You should ensure that you understand every statement listed. If
you do not, go back to the topic and find out what you do not know.

KEY TERMS
Key Terms discussed in the topic are placed at the end of each topic to make you
aware of the main ideas. If you are unable to explain these terms, you should go
back to the topic to clarify them.


DISCUSSION QUESTIONS
At the end of each topic, a list of questions that are best solved through group
interaction and discussion is presented. You can answer the questions
individually, but you are encouraged to work with your coursemates and to
discuss them online and during the seminar sessions.

At the end of each topic, a list of articles and books directly related to the
contents of the topic is provided. As far as possible, the articles and books
suggested for further reading will be available in OUM's Digital Library (which
you can access) and OUM's Library. Relevant Internet resources are also made
available to enhance your understanding of selected curriculum concepts and
principles as applied in real-world situations.

WHAT SUPPORT WILL YOU GET IN STUDYING THIS COURSE?

Seminars
There are 15 hours of seminars or face-to-face interaction supporting the course.
These consist of FIVE seminar sessions of three hours each. You will be notified
of the dates, times and location of these seminars, together with the name and
phone number of your tutor, as soon as you are allocated a seminar group.

MyVLE Online Discussion


Besides the face-to-face seminar sessions, you have the support of online
discussions. You should interact with other students and your facilitator using
myVLE. Your contributions to the online discussion will greatly enhance your
understanding of course content, how to go about doing the assignments and
preparation for the examination.


Facilitator
Your facilitator will mark your assignments. Do not hesitate to raise questions
during the seminar sessions or online if:

(a)  You do not understand any part of the course content or the assigned
     readings;

(b)  You have difficulty with the self-tests and activities; or

(c)  You have a question or a problem with the assignment.

HOW SHOULD YOU STUDY FOR THIS COURSE?


1.  Time Commitment for Studying

    You should plan to spend about six to eight hours per topic: reading the
    notes, doing the self-tests and activities, and referring to the suggested
    readings. You must also schedule time for online discussions. It is often
    more convenient to distribute the hours over a number of days than to
    spend one whole day per week on study. Some topics may require more
    work than others, although on average, it is suggested that you spend
    approximately three days per topic.

2.  Proposed Study Strategy

    The following is a proposed strategy for working through the course. If
    you run into any trouble, discuss it with your facilitator either online or
    during the seminar sessions. Remember, the facilitator is there to help you.

    (a)  The most important step is to read the contents of this Course Guide
         thoroughly.

    (b)  Organise a study schedule. Note the time you are expected to spend
         on each topic and the dates for submission of your assignment as
         well as the seminar and examination dates. These are stated in your
         Course Assessment Guide. Put all this information in one place, such
         as your diary or a wall calendar. Whatever method you choose, you
         should decide on and jot down your own dates for working on each
         topic. You have some flexibility, as there are 10 topics spread over a
         period of 14 weeks.

    (c)  Once you have created your own study schedule, make every effort
         to stick to it. The main reason students are unable to cope is that
         they get behind in their coursework.


    (d)  To begin reading a topic:

         •  Remember that in distance learning, much of your time will be
            spent READING the course content. Study the list of topics given
            at the beginning of each topic and examine the relationship of
            the topic to the other nine topics.

         •  Read the Topic Overview showing the headings and subheadings
            to get a broad picture of the topic.

         •  Read the topic's Learning Outcomes (what is expected of you).
            Do you already know some of the things to be discussed? What
            are the things you do not know?

         •  Read the Introduction (see how it is connected with the previous
            topic).

         •  Work through the topic. (The contents of the topic have been
            arranged to provide a sequence for you to follow.)

         •  As you work through the topic, you will be asked to do the
            self-tests at appropriate points. This is to find out if you
            understand what you have just read.

         •  Do the Activities (to see if you can apply the concepts learned to
            real-world situations).

    (e)  When you have completed the topic, review the learning outcomes
         to confirm that you have achieved them and are able to do what is
         required.

    (f)  If you are confident, proceed to the next topic. Proceed topic by
         topic through the course and try to pace your study so that you
         keep yourself on schedule.

    (g)  After completing all topics, review the course and prepare yourself
         for the final examination. Check that you have achieved all the topic
         learning outcomes and the course objectives (listed in this Course
         Guide).

FINAL REMARKS
Once again, welcome to the course. To maximise your gain from this course
you should try at all times to relate what you are studying to the real world.
Look at the environment in your institution and ask yourself whether the ideas
discussed apply. Most of the ideas, concepts and principles you learn in this
course have practical applications. It is important to realise that much of what
we do in education and training has to be based on sound theoretical


foundations. The contents of this course provide the principles and theories
explaining human learning whether it be in a school, college, university or
training organisation.
We wish you success with the course and hope that you will find it interesting,
useful and relevant in your development as a professional. We hope you will
enjoy your experience with OUM and we would like to end with a saying by
Confucius: "Education without thinking is labour lost."

COURSE ASSIGNMENT GUIDE

INTRODUCTION
This guide explains the basis on which you will be assessed in this course during
the semester. It contains details of the facilitator-marked assignments, final
examination and participation required for the course.
One element in the assessment strategy of the course is that all students should
have the same information as facilitators about the answers to be assessed.
Therefore, this guide also contains the marking criteria that facilitators will use in
assessing your work.
Please read through the whole guide at the beginning of the course.

ACADEMIC WRITING

(a)  Plagiarism

     (i)  What is Plagiarism?

          Any written assignment (essay, project, take-home exam, etc.)
          submitted by a student must not be deceptive regarding the
          abilities, knowledge or amount of work contributed by the
          student. There are many ways that this rule can be violated.
          Among them are:

          Paraphrases:
          A closely reasoned argument of an author is paraphrased, but the
          student does not acknowledge doing so. (Clearly, all our knowledge
          is derived from somewhere, but detailed arguments from clearly
          identifiable sources must be acknowledged.)

          Outright plagiarism:
          Large sections of the paper are simply copied from other sources,
          and the copied parts are not acknowledged as quotations.

          Other sources:
          These often include essays written by other students or sold by
          unscrupulous organisations. Quoting from such papers is perfectly
          legitimate if quotation marks are used and the source is cited.

          Works by others:
          Taking credit, deliberately or not, for works produced by others
          without giving proper acknowledgement. These works include
          photographs, charts, graphs, drawings, statistics, video clips, audio
          clips, verbal exchanges such as interviews or lectures, performances
          on television and texts printed on the Web.

          Duplication:
          The student submits the same essay for two or more courses.

     (ii)  How Can I Avoid Plagiarism?

           •  Insert quotation marks around any copied clause, phrase,
              sentence or paragraph, and cite the original source.

           •  Paraphrase the clause, phrase, sentence or paragraph in your
              own words and cite your source.

           •  Adhere to the APA (American Psychological Association)
              stylistic format, wherever applicable, when citing a source and
              when writing out the bibliography or reference page.

           •  Attempt to write independently without being overly dependent
              on information from another's original works.

           •  Educate yourself on what may be considered common
              knowledge (no copyright necessary), public domain (copyright
              has expired or is not protected under copyright law) or
              copyrighted material (legally protected).

(b)  Documenting Sources

     Whenever you quote, paraphrase, summarise or otherwise refer to the
     work of another, you are required to cite its original source. Offered here
     are some of the most commonly cited forms of material.

     Direct Citation
     "Simply having a thinking skill is no assurance that children will use
     it. In order for such skills to become part of day-to-day behaviour,
     they must be cultivated in an environment that values and sustains
     them. Just as children's musical skills will likely lay fallow in an
     environment that doesn't encourage music, learners' thinking skills
     tend to languish in a culture that doesn't encourage thinking."
     (Tishman, Perkins and Jay, 1995, p. 5)

     Indirect Citation
     According to Wurman (1988), the new disease of the 21st century will
     be "information anxiety", which has been defined as the ever-widening
     gap between what one understands and what one thinks one should
     understand.

(c)  Referencing

     All sources that you cite in your paper should be listed in the Reference
     section at the end of your paper. Here is how you should format your
     references.


     Journal Article
     DuFour, R. (2002). The learning-centred principal. Educational
     Leadership, 59(8), 12-15.

     Online Journal
     Evnine, S. J. (2001). The universality of logic: On the connection
     between rationality and logical ability [Electronic version]. Mind,
     110, 335-367.

     Webpage
     National Park Service. (2003, February 11). Abraham Lincoln
     Birthplace National Historic Site. Retrieved February 13, 2003, from
     http://www.nps.gov/abli/

     Book
     Naisbitt, J., & Aburdene, M. (1989). Megatrends 2000. London: Pan
     Books.

     Article in a Book
     Nickerson, R. (1987). Why teach thinking? In J. B. Baron & R. J.
     Sternberg (Eds.), Teaching thinking skills: Theory and practice
     (pp. 27-37). New York: W. H. Freeman and Company.

     Printed Newspaper
     Holden, S. (1998, May 16). Frank Sinatra dies at 82: Matchless stylist
     of pop. The New York Times, pp. A1, A22-A23.

ASSESSMENT
Please refer to myVLE.

TAN SRI DR ABDULLAH SANUSI (TSDAS) DIGITAL LIBRARY
The TSDAS Digital Library has a wide range of print and online resources
for the use of its learners. This comprehensive digital library, which is
accessible through the OUM portal, provides access to more than 30 online
databases comprising e-journals, e-theses, e-books and more. Examples of
databases available are EBSCOhost, ProQuest, SpringerLink, Books24x7,
InfoSci Books, Emerald Management Plus and Ebrary Electronic Books. As
an OUM learner, you are encouraged to make full use of the resources
available through this library.


Topic 1   Introduction to Statistics

LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Define statistics;
2. Differentiate between descriptive and inferential statistics;
3. Compare the different types of variables;
4. Explain the importance of sampling; and
5. Differentiate between the types of sampling procedures.

INTRODUCTION

This topic introduces the meaning of statistics and explains the difference between
descriptive and inferential statistics. As inferential statistics is used to make
inferences about the population on specific variables based on a sample, this topic
also explains the meanings of different types of variables and highlights the
different sampling techniques in educational research.

1.1  WHAT IS STATISTICS?

Let us refer to some definitions of statistics:

The American Heritage Dictionary defines statistics as:

     "The mathematics of the collection, organisation and interpretation of
     numerical data, especially the analysis of population characteristics by
     inference from sampling."

Merriam-Webster's Collegiate Dictionary defines statistics as:

     "A branch of mathematics dealing with the collection, analysis,
     interpretation and presentation of masses of numerical data."

Webster's New World Dictionary defines statistics as:

     "Facts or data of a numerical kind, assembled, classified and tabulated
     so as to present significant information about a given subject."

Jon Kettenring, President of the American Statistical Association, defines
statistics as:

     "The science of learning from data. Statistics is essential for the proper
     running of government, central to decision making in industry, and a
     core component of modern educational curricula at all levels."

Note that the word "mathematics" is mentioned in two of the definitions above,
while "science" is stated in another. Some students are afraid of
mathematics and science. These students feel that since they are from the fields of
humanities and social sciences, they are weak in mathematics. Being terrified of
mathematics does not just happen overnight. Chances are that you may have had
bad experiences with mathematics in earlier years (Kranzler, 2007).
Fear of mathematics can lead to a defeatist attitude which may affect the way you
approach statistics. In most cases, the fear of statistics is due to irrational beliefs.
Just because you had difficulty in the past, does not mean that you will always
have difficulty with quantitative subjects. You have come this far in your
education and by doing this course in statistics, it is not likely that you are an
incapable person.
You have to convince yourself that statistics is not a difficult subject and you need
not worry about the mathematics involved. Identify your irrational beliefs and
thoughts about statistics. Are you telling yourself: "I'll never be any good in
statistics", "I'm a loser when it comes to anything dealing with numbers" or
"What will other students think of me if I do badly?"


For each of these irrational beliefs about your abilities, ask yourself what evidence
is there to suggest that "you will never be good in statistics" or that "you are weak
at mathematics." When you do that, you will begin to replace your irrational
beliefs with positive thoughts and you will feel better. You will realise that your
earlier beliefs about statistics are the cause of your unpleasant emotions. Each
time you feel anxious or emotionally upset, question your irrational beliefs. This
may help you to overcome your initial fears.
Keeping this in mind, this course has been written by presenting statistics in a
form that appeals to those who fear mathematics. Emphasis is on the applied
aspects of statistics and with the aid of a statistical software called Statistical
Package for the Social Sciences (or better known as SPSS), you need not worry
too much about the intricacies of mathematical formulas. Computations of
mathematical formulas have been kept to a minimum. Nevertheless, you still need
to know about the different formulas used, what they mean and when they are
used.

1.2  TWO KINDS OF STATISTICS

Statistics are all around you. Television uses a lot of statistics: for example, when
it reports that during the holidays, a total of 134 people died in traffic accidents;
the stock market fell by 26 points; or that the number of violent crimes in the city
has increased by 12%. Imagine a football game between Manchester United and
Liverpool and no one kept score! Without statistics, you could not plan your
budget, pay your taxes, enjoy games to their fullest, evaluate classroom
performance and so forth. Are you beginning to get the picture? We need
statistics. Generally, there are two kinds of statistics:

•  Descriptive Statistics
•  Inferential Statistics

1.2.1  Descriptive Statistics

Descriptive statistics are used to describe the basic features of the data in a study.
Historically, descriptive statistics began during Roman times, when the empire
undertook censuses of births, deaths, marriages and taxes. They provide simple
summaries about the sample and the measures. Together with simple graphics
analysis, they form the basis of virtually every quantitative analysis of data. With
descriptive statistics, you are simply describing what is or what the data show.


Descriptive statistics are used to present quantitative descriptions in a manageable
form. In a research study, we may have lots of measures. Or we may measure a
large number of people on any measure. Descriptive statistics help us to simplify
large amounts of data in a sensible way. Each descriptive statistic reduces lots of
data into a simple summary. For instance, the Grade Point Average (GPA) for a
student describes the general performance of a student across a wide range of
subjects or courses.
Descriptive statistics includes the construction of graphs, charts and tables and the
calculation of various descriptive measures such as averages (e.g. mean) and
measures of variation (e.g. standard deviation). The purpose of descriptive
statistics is to summarise, arrange and present a set of data in such a way that
facilitates interpretation. Most of the statistical presentations appearing in
newspapers and magazines are descriptive in nature.

1.2.2 Inferential Statistics

Inferential statistics or statistical induction comprises the use of statistics
to make inferences concerning some unknown aspect of a population. Inferential
statistics are relatively new. Major development began with the works of Karl
Pearson (1857-1936) and the works of Ronald Fisher (1890-1962) who published
their findings in the early years of the 20th century. Since the work of Pearson and
Fisher, inferential statistics has evolved rapidly and is now applied in many
different fields and disciplines.
Inference is the act or process of deriving a conclusion based solely on what one
already knows. In other words, you are trying to reach conclusions that extend
beyond data obtained from your sample towards what the population might think.
You are using methods for drawing and measuring the reliability of conclusions
about a population based on information obtained from a sample of the
population. Among the widely used inferential statistical tools are t-test, analysis
of variance, Pearsons correlation, linear regression and multiple regression.

1.2.3 Descriptive or Inferential Statistics

Descriptive statistics and inferential statistics are interrelated. You must always
use techniques of descriptive statistics to organise and summarise the information
obtained from a sample before carrying out an inferential analysis. Furthermore,
the preliminary descriptive analysis of a sample often reveals features that lead
you to the choice of the appropriate inferential method.

As you proceed through this course, you will obtain a more thorough
understanding of the principles of descriptive and inferential statistics. You should
establish the intent of your study. If the intent of your study is to examine and
explore the data obtained for its own intrinsic interest only, the study is
descriptive. However, if the information is obtained from a sample of a population
and the intent of the study is to use that information to draw conclusions about the
population, the study is inferential. Thus, a descriptive study may be performed on
a sample as well as on a population. Only when an inference is made about the
population, based on data obtained from the sample, does the study become
inferential.
SELF-CHECK 1.1
1. Define statistics.
2. Explain the differences between descriptive and inferential statistics.
3. When would you use the two types of statistics?
4. Explain two ways in which descriptive statistics and inferential
statistics are interrelated.

1.3 VARIABLES

Before you can use a statistical tool to analyse data, you need to have data which
have been collected. What is data? Data is defined as pieces of information which
are processed or analysed to enable interpretation. Quantitative data consist of
numbers, while qualitative data consist of words and phrases. For example, the
scores obtained from 30 students in a mathematics test are referred to as data. To
explain the performance of these students you need to process or analyse the
scores (or data) using a calculator or computer or manually. We collect and
analyse data to explain a phenomenon. A phenomenon is explained based on the
interaction between two or more variables. The following is an example of a
phenomenon:
Intelligence Quotient (IQ) and Attitude Influence
Performance in Mathematics
Note that there are THREE variables explaining the particular phenomenon,
namely, Intelligence Quotient, Attitude and Mathematics Performance.


What is a Variable?
A variable is a construct that is deliberately and consciously invented or adopted
for a special scientific purpose. For example, the variable Intelligence is a
construct based on observation of presumably intelligent and less intelligent
behaviours. Intelligence can be specified by observing and measuring using
intelligence tests, as well as interviewing teachers about intelligent and less
intelligent students. Basically, a variable is something that varies and has a
value. A variable is a symbol to which are assigned numerals or values. For
example, the variable mathematics performance is assigned scores obtained
from performance on a mathematics test and may vary or range from 0 to 100.
A variable can be either a continuous variable or a categorical variable. In the
case of the variable gender, there are only two values, i.e. male and female,
and it is called a categorical variable. Other examples of categorical variables
include graduate/non-graduate, low income/high income and citizen/non-citizen.
There are also categorical variables which have more than two values. For
example, religion may take several values, such as Islam, Christianity, Sikhism,
Buddhism and Hinduism. Categorical variables are also known as nominal
variables. A continuous variable has numeric values such as 1, 2, 3, 4, 10 and
so on. An example is the score on mathematics performance, which ranges from 0
to 100. Other examples are salary, age, IQ, weight, etc.
When you use any statistical tool, you should be very clear on which variables
have been identified as independent and which are dependent variables.

1.3.1 Independent Variable

An independent variable (IV) is the variable that is presumed to cause a change
in the dependent variable (DV). The independent variables are the antecedents, while
the dependent variable is the consequent. See Figure 1.1 which describes a study
to determine which teaching method (independent variable) is effective in
enhancing the academic performance in history (dependent variable) of students.
An independent variable (teaching method) can be manipulated. Manipulated
means the variable can be manoeuvred, and in this case it is divided into
discovery method and lecture method. Other examples of independent
variables are gender (male and female), race (Malay, Chinese and Indian) and
socioeconomic status (high, middle and low). Other names for the independent
variable are treatment, factor and predictor variable.


1.3.2 Dependent Variable

A dependent variable is a variable that depends on other variable(s). The
dependent variable in this study is academic performance, which cannot be manipulated
by the researcher. Academic performance is a score and other examples of
dependent variables are IQ (score from IQ tests), attitude (score on an attitude
scale), self-esteem (score from a self-esteem test) and so forth. Other names for
the dependent variable are outcome variable, results variable and criterion
variable.

Figure 1.1: An example of independent variables and dependent variables

Put another way, the DV is the variable predicted to, whereas the IV is the
variable predicted from. The DV is the presumed effect, which varies with
changes or variation in the independent variable.

1.4 OPERATIONAL DEFINITION OF VARIABLES

As mentioned earlier, a variable is deliberately constructed for a specific
purpose. Hence, a variable used in your study may be different from a variable
used in another study even though they have the same name. For example, the
variable academic achievement used in your study may be computed based on
performance in the UPSR examination; while in another study, it may be
computed using a battery of tests you developed. Operational definition
(Bridgman, 1927) means that each variable used in the study must be defined as
it is used in the context of that study. This is done to facilitate measurement
and to eliminate confusion.
Thus, it is essential that you stipulate clearly how you have defined variables
specific to your study. For example, in an experiment to determine the
effectiveness of the discovery method in teaching science, the researcher will have
to explain in great detail the variable discovery method used in the experiment.

Even though there are general principles of the discovery method, its application
in the classroom may vary. In other words, you have to define the variable
operationally or how it is used in the experiment.
SELF-CHECK 1.2

1. What is a variable?
2. Explain the differences between a continuous variable and a nominal
   variable.
3. Why should variables be operationally defined?

1.5 SAMPLING

Every day, we make judgments and decisions based on samples. For example,
when you pick a grape and taste it before buying the whole bunch of grapes, you
are sampling. Based on the one grape you have tasted, you will decide whether
or not to buy the grapes. Similarly, when a teacher asks a student two or three
questions, he is trying to determine the student's grasp of an entire subject.
People are not usually aware that such a pattern of thinking is called sampling.

Population (Universe) is defined as an aggregate of people, objects, items,
etc. possessing common characteristics. It is the complete group of people,
objects, items, etc. which we want to study. Every person, object, item,
etc. has certain specified attributes. In Figure 1.2, the population consists of
#, $, @, & and %.

Sample is that part of the population or universe which we select for the
purpose of investigation. The sample is used as an "example" and in fact the
word sample is derived from the Latin exemplum, which means example. A
sample should exhibit the characteristics of the population or universe; it
should be a "microcosm," a word which literally means "small universe." In
Figure 1.2, the sample also consists of one #, $, @, & and %.


Figure 1.2: Drawing a sample from the population

We use samples to make inferences about the population. Reasoning from a
sample to the population is called statistical induction or inference. Based on the
characteristics of a specifically chosen sample (a small part of the population of
the group that we observe), we make inferences concerning the characteristics of
the population. We measure the trait or characteristic in a sample and generalise
the finding to the population from which the sample was taken.
Why is a sample used in educational research?
The study of a sample offers several advantages over a complete study of the
population. Why and when is it desirable to study a sample rather than the
population or universe?

In most studies, investigation of the sample is the only way of finding out
about a particular phenomenon. In some cases, due to financial, time and
physical constraints, it is practically impossible to study the whole population.
Hence, an investigation of the sample is the only way of making a study.

If one were to study the population, then every item in the population is
studied. Imagine having to study 500,000 Form 5 students in Malaysia!
Wonder what the costs will be! Even if you have the money and time to study
the entire population of Form 5 students in the country, it may take so much
time that the findings will be of no use by the time they become available.


Studying the population may not be necessary, since we have sound sampling
techniques that will yield satisfactory results. Of course, we cannot expect
from a sample exactly the same answer that might be obtained from studying
the whole population.

However, by using statistics, we can establish, based on the results obtained
from a sample, the limits within which the true answer lies, with a known
probability.

We are able to generalise logically and precisely about different kinds of
phenomena which we have never seen, simply based upon a sample of, say,
200 students.
ACTIVITY 1.1
1. What is the difference between a population and a sample?
2. Why is a study of the population practically impossible?
3. The sample should be representative of the population. Explain.
4. Provide a scenario of your own, in which a sample is not
representative.
5. Explain why a sample of 30 doctors from Kuala Lumpur taken to
estimate the average income of all Kuala Lumpur residents is not
representative.

1.6 SAMPLING TECHNIQUES

When some students are asked how they selected the sample for a study, quite a
few are unable to explain convincingly the techniques used and the rationale for
selecting the sample. If you have to draw a sample, you must choose the method
for obtaining the sample from the population. In making that choice, keep in mind
that the sample will be used to draw conclusions about the entire population.
Consequently, the sample should be a representative sample, that is, it should
reflect as closely as possible the relevant characteristics of the population under
consideration.

1.6.1 Simple Random Sampling

All individuals in the defined population have an equal and independent chance of
being selected as a member of the sample. Independent means that the selection
of one individual does not affect in any way the selection of any other individual.
So, each individual, event or object has an equal probability of being selected.

Suppose, for example, there are 10,000 Form 1 students in a particular district
and you want to select a simple random sample of 500 students. When we select
the first case, each student has one chance in 10,000 of being selected. Once
that student is selected, the next student to be selected has a 1 in 9,999
chance of being selected. Thus, as each case is selected, the probability of
being selected next changes slightly, because the population from which we are
selecting has become one case smaller.
Using a Table of Random Numbers (refer to Figure 1.3) to select a sample,
obtain a list of all Form 1 students in Daerah Petaling and assign a number to each
student. Then, get a table of random numbers, which consists of a long series of
three- or four-digit numbers generated randomly by a computer. Using the table,
you randomly select a row or column as a starting point, then select all the
numbers that follow in that row or column. If more numbers are needed, proceed
to the next row or column until enough numbers have been selected to make up
the desired sample.

Figure 1.3: Table of Random Numbers

Say, for example, you choose line 3 and begin your selection. You will select
student #265, followed by student #313 and student #492. When you come to
805 you skip the number because you only need numbers between 1 and 500.
You proceed to the next number, i.e. student #404. Again you skip 550 and
proceed to select student #426. You continue until you have selected all 500
students to form your sample. To avoid repetition, you also eliminate numbers
that have occurred previously. If you have not found enough numbers by the time
you reach the bottom of the table, you move over to the next line or column.
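In practice, the same selection can be carried out with software instead of a printed table. The short Python sketch below (the seed value is illustrative, chosen only to make the draw reproducible) numbers 10,000 students and draws 500 of them without replacement, which automatically enforces the skip-repeats rule described above:

```python
import random

random.seed(42)  # illustrative seed, so the draw below is reproducible

# Assign a number to every Form 1 student in the population (1 to 10,000)
population = list(range(1, 10_001))

# Draw 500 students; random.sample selects without replacement, which
# mirrors the rule of skipping numbers that have occurred previously
sample = random.sample(population, k=500)

print(len(sample))       # 500 students selected
print(len(set(sample)))  # 500 -> no student was selected twice
```

Because `random.sample` never repeats an element, there is no need to discard duplicates as you would with a printed table.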


SELF-CHECK 1.3

1. What is the meaning of random?
2. What is the simple random sampling technique?
3. Explain the use of the Table of Random Numbers in the selection of a
   random sample.

1.6.2 Systematic Sampling

Systematic sampling is random sampling with a system. From the sampling frame,
a starting point is chosen at random, and selections are made thereafter at
regular intervals. If it
can be ensured that the list of students from the accessible population is randomly
listed, then systematic sampling can be used. First, you divide the accessible
population (1,000) by the sample desired (100), which gives you 10. Next,
select a figure smaller than the number arrived at by the division, i.e. less
than 10. If you choose 8, then you select every eighth name from the list of
the population. If the random starting point is 10, then the subjects selected
are 10, 18, 26, 34, 42, 50, 58, 66, 74 and so on, until you have your sample of
100 subjects. This method differs
from random sampling because each member of the population is not chosen
independently. The advantage is that it spreads the sample more evenly over the
population and it is easier to select than a simple random sample.
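The procedure described above can be sketched in a few lines of Python. The numbers (population of 1,000, desired sample of 100, interval of 8, starting point of 10) are taken directly from the example in the text:

```python
# Systematic sampling following the worked example in the text
population_size = 1_000
desired_sample = 100

interval = 8   # a figure smaller than population_size // desired_sample (10)
start = 10     # random starting point from the list

# Select every eighth name from the starting point onwards
subjects = [start + i * interval for i in range(desired_sample)]

print(subjects[:9])   # [10, 18, 26, 34, 42, 50, 58, 66, 74]
print(len(subjects))  # 100
```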
ACTIVITY 1.2
1. Briefly discuss how you would select a sample of 300 teachers from a
population of 5,000 teachers in a district using systematic sampling.
2. What are some advantages of using systematic sampling?

1.6.3 Stratified Sampling

In certain studies, the researcher wants to ensure that certain sub-groups or stratum
of individuals are included in the sample and for this stratified sampling is
preferred. For example, if you intend to study differences in reasoning skills among
students in your school according to socio-economic status and gender, random
sampling may not ensure that you have a sufficient number of male and female
students at each socio-economic level. The size of the sample in each stratum is

taken in proportion to the size of the stratum. This is called proportional
allocation. Suppose that Table 1.1 shows the population of students in your school.
Table 1.1: Population of Students in Your School

    Male, High Income       160
    Female, High Income     140
    Male, Low Income        360
    Female, Low Income      340
    TOTAL                 1,000

The first step is to calculate the percentage in each group:

    % male, high income   = (160 / 1,000) x 100 = 16%
    % female, high income = (140 / 1,000) x 100 = 14%
    % male, low income    = (360 / 1,000) x 100 = 36%
    % female, low income  = (340 / 1,000) x 100 = 34%

If you want a sample of 100 students, you should ensure that:

    16% are male, high income   = 16 students
    14% are female, high income = 14 students
    36% are male, low income    = 36 students
    34% are female, low income  = 34 students
When you take a sample from each stratum randomly, it is referred to as
stratified random sampling. The advantage of stratified sampling is that it
ensures better coverage of the population than simple random sampling. Also, it is
often administratively more convenient to stratify a sample so that interviewers
can be specifically trained to deal with a particular age group or ethnic group.
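The proportional-allocation arithmetic above can be sketched in Python, using the stratum sizes from Table 1.1 (rounding converts the percentages back into whole students):

```python
# Proportional allocation for a stratified sample (strata from Table 1.1)
strata = {
    "male, high income": 160,
    "female, high income": 140,
    "male, low income": 360,
    "female, low income": 340,
}
population_total = sum(strata.values())  # 1,000
sample_size = 100

# Each stratum contributes in proportion to its share of the population
allocation = {
    name: round(count / population_total * sample_size)
    for name, count in strata.items()
}

print(allocation)
# {'male, high income': 16, 'female, high income': 14,
#  'male, low income': 36, 'female, low income': 34}
```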


ACTIVITY 1.3
Male, full-time teachers
Male, part-time teachers
Female, full-time teachers
Female, part-time teachers

=
=
=
=

90
18
63
9

The data above shows the number of full-time and part-time teachers in a
school according to gender.
Select a sample of 40 teachers using stratified sampling.

1.6.4 Cluster Sampling

In cluster sampling, the unit of sampling is not the individual but rather a naturally
occurring group of individuals. Cluster sampling is used when it is more feasible
or convenient to select groups of individuals than it is to select individuals from a
defined population. Clusters are chosen to be as heterogeneous as possible, that is,
the subjects within each cluster are diverse and each cluster is somewhat
representative of the population as a whole. Thus, only a sample of the clusters
needs to be taken to capture all the variability in the population.
For example, in a particular district there are 10,000 households clustered into 25
sections. In cluster sampling, you draw a random sample of five sections or
clusters from the list of 25 sections or clusters. Then, you study every household
in each of the five sections or clusters. The main advantage of cluster sampling is
that it saves time and money. However, it may be less precise than simple random
sampling.
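The two-stage procedure in the household example can be sketched in Python. The 25 sections and 10,000 households come from the text; giving every section exactly 400 households and fixing the seed are simplifying assumptions for illustration only:

```python
import random

random.seed(7)  # illustrative seed for a reproducible draw

# 10,000 households clustered into 25 sections (400 per section is assumed)
sections = {
    f"section {i}": [f"household {i}-{j}" for j in range(1, 401)]
    for i in range(1, 26)
}

# Stage 1: draw a random sample of 5 sections (clusters) from the 25
chosen_sections = random.sample(sorted(sections), k=5)

# Stage 2: study EVERY household in each chosen section
households_studied = [h for s in chosen_sections for h in sections[s]]

print(len(chosen_sections))     # 5
print(len(households_studied))  # 2000
```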

1.7 SPSS SOFTWARE

SPSS software is frequently used by educational researchers for data analysis.
It can be used to generate both descriptive and inferential statistical output to answer
research questions and test hypotheses. The software is modular with the base
module as its core. The other more commonly used modules are Regression
Models and Advanced Models.
To use SPSS, you have to create the SPSS data file. Once this data file is created
and data entered, you can run statistical procedures to generate your statistical
output. Refer to Appendix A at the end of this module on how to go about
creating this SPSS data file.

Summary

•  Statistics is a branch of mathematics dealing with the collection, analysis,
   interpretation and presentation of masses of numerical data.

•  Descriptive statistics include the construction of graphs, charts and tables
   and the calculation of various descriptive measures such as averages (means)
   and measures of variation (standard deviations).

•  Inferential statistics or statistical induction comprises the use of
   statistics to make inferences concerning some unknown aspect of a population.

•  A variable is a construct that is deliberately and consciously invented or
   adopted for a special scientific purpose.

•  A variable can be either a continuous variable (ordinal variable) or a
   categorical variable (nominal variable).

•  An independent variable (IV) is the variable that is presumed to cause a
   change in the dependent variable (DV).

•  A dependent variable is a variable that depends on other variable(s).

•  Operational definition means that each variable used in the study must be
   defined as it is used in the context of that study.

•  Population (universe) is defined as an aggregate of people, objects, items,
   etc. possessing common characteristics, while a sample is that part of the
   population or universe we select for the purpose of investigation.

•  In simple random sampling, all individuals in the defined population have an
   equal and independent chance of being selected as a member of the sample.

•  Systematic sampling is random sampling with a system. From the sampling
   frame, a starting point is chosen at random, and selections are made
   thereafter at regular intervals.

•  In a stratified sample, the sampling frame is divided into non-overlapping
   groups or strata and a sample is taken from each stratum.

•  In cluster sampling, the unit of sampling is not the individual but rather a
   natural group of individuals.

Key Terms

Cluster sampling
Dependent variable
Descriptive statistics
Independent variable
Inferential statistics
Nominal variable
Ordinal variable
Random sampling
Sampling
Statistics
Stratified sampling
Systematic sampling
Variable


Topic 2  Descriptive Statistics

LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Explain what is meant by descriptive statistics;
2. Compute the mean;
3. Compute the standard deviation;
4. Explain the implication of differences in standard deviations;
5. Identify the median and the mode; and
6. Explain the types of charts used to display data.

INTRODUCTION

This topic introduces the different descriptive statistics, namely the mean, the
median, the mode and the standard deviation, and how they are computed. SPSS
procedures on how to obtain these descriptive statistics are also provided.

2.1 WHAT ARE DESCRIPTIVE STATISTICS?

Descriptive statistics are used to summarise a collection of data and present it
in a way that can be easily and clearly understood. For example, a researcher
administered a scale via a questionnaire to measure self-esteem among 500
teenagers. How might these measurements be summarised? There are two basic
methods: numerical and graphical. Using the numerical approach, one might
compute the mean and the standard deviation. Using the graphical approach, one
might create a frequency table, bar chart, a line graph or a box plot. These
graphical methods display detailed information about the distribution of the

scores. Graphical methods are better suited than numerical methods for
identifying patterns in the data. Numerical approaches are more precise and
objective.
Descriptive statistics are typically distinguished from inferential statistics. With
descriptive statistics you are simply describing what is or what the data show
based on the sample. With inferential statistics, you are trying to reach
conclusions based on the sample that extend beyond the immediate data. For
instance, we use inferential statistics to infer from the sample data what the
population might think. Or, we use inferential statistics to make judgments of the
probability that an observed difference between groups is dependable or might
have happened by chance in this study. Thus, we use inferential statistics to make
inferences from our data to more general conditions; we use descriptive statistics
simply to describe what is going on in our data.
Descriptive statistics are used to present quantitative descriptions in a manageable
form. In a research study, we may have lots of measures or we may measure a
large number of people on any measure. Descriptive statistics help us to simply
depict large amounts of data in a sensible way. Each descriptive statistic reduces
lots of data into a simpler summary. For instance, consider Grade Point Average
(GPA). This single number describes the general performance of a student across
a potentially wide range of course experiences. The number describes a large
number of discrete events such as the grade obtained for each subject taken.
However, every time you try to describe a large set of observations with a single
indicator you run the risk of distorting the original data or losing important details.
The GPA does not tell you whether a student was in a difficult or easy course, or
whether the student was taking courses in his major field or in other disciplines.
Given these limitations, descriptive statistics provide a powerful summary of
phenomena that may enable comparisons across people or other units.

2.2 MEASURES OF CENTRAL TENDENCY

2.2.1 Mean

The mean and the standard deviation are the most widely used statistical tools
in educational and psychological research. The mean is the most frequently used
measure of central tendency, while standard deviation is the most frequently used
measure of variability or dispersion.


Computing the Mean

The mean, or X̄ (pronounced "X bar"), is the figure obtained when the sum of all
the items in the group is divided by the number of items (N). Say, for example,
you have the scores of 10 students on a science test.

The sum (Σ) of all the ten scores:

    ΣX = 23 + 22 + 26 + 21 + 30 + 24 + 20 + 27 + 25 + 32 = 250

The mean:

    X̄ = ΣX / N = 250 / 10 = 25.0

In the computation of the mean, every item counts. As a result, extreme values
at either end of the group or series of scores severely affect the value of the
mean. The mean could be "pulled towards" the extreme scores, which may give a
distorted picture of the group or series of scores or data.

However, in general, the mean is a good measure of central tendency for roughly
symmetric distributions, but it can be misleading in skewed distributions (see
the example under "Should You Use the Mean or the Median?" below) since it can
be greatly influenced by extreme scores.
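The computation above can be verified with a few lines of Python, using the same ten science-test scores:

```python
# Mean of the ten science-test scores from the text
scores = [23, 22, 26, 21, 30, 24, 20, 27, 25, 32]

total = sum(scores)         # sum of all items
mean = total / len(scores)  # X-bar = sum / N

print(total)  # 250
print(mean)   # 25.0
```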

2.2.2 Median

Median is the score found at the exact middle of the set of values. One way to
compute the median is to list all scores in ascending order and then locate the
score in the centre of the sample. For example, if we order the following seven
scores as shown below, we would get:
12, 18, 22, 25, 30, 37, 40
Score 25 is the median because it represents the halfway point for the distribution
of scores.
Look at this set of eight scores. What is the median score?
15, 15, 15, 20, 20, 21, 25, 36
There are eight scores. The fourth score (20) and the fifth score (20) represent the
halfway point. Since both of these scores are 20, the median is 20.


If the two middle scores had different values, you have to interpolate to
determine the median by adding up the two values and dividing the sum by 2. For
example:

15, 15, 15, 18, 20, 21, 25, 36

The median is (18 + 20) / 2 = 19.
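The listing-and-interpolating procedure can be sketched as a small Python function; the three calls reuse the score sets from the examples above:

```python
def median(values):
    """Order the scores and take the exact middle; with an even number
    of scores, interpolate by averaging the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([12, 18, 22, 25, 30, 37, 40]))      # 25
print(median([15, 15, 15, 20, 20, 21, 25, 36]))  # 20.0
print(median([15, 15, 15, 18, 20, 21, 25, 36]))  # 19.0
```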

2.2.3 Mode

Mode is the most frequently occurring value in the set of scores. To determine the
mode, you might again order the scores as shown below and then count each one.
15, 15, 15, 20, 20, 21, 25, 36
The most frequently occurring value is the mode. In our example, the value 15
occurs three times and is the mode. In some distributions, there is more than one
modal value. For instance, in a bimodal distribution there are two values that
occur most frequently.
If the distribution is truly normal (i.e. bell-shaped), the mean, median and mode
are all equal to each other.
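Counting the occurrences of each value, as described above, is easily done with Python's `collections.Counter`; the scores are the eight from the example:

```python
from collections import Counter

# Mode: the most frequently occurring value(s) in a set of scores
scores = [15, 15, 15, 20, 20, 21, 25, 36]
counts = Counter(scores)

highest = max(counts.values())
modes = [value for value, freq in counts.items() if freq == highest]

print(modes)  # [15] -- the value 15 occurs three times
```

In a bimodal distribution, `modes` would contain two values.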
Should You Use the Mean or the Median?
The mean and the median are two common measures of the central tendency of a
typical score in a sample. Which of these two should you use when describing
your data? It depends on your data. In other words, you should ask yourself
whether the measure of central tendency you have selected gives a good
indication of the typical score in your sample. If you suspect that the measure of
central tendency selected does not give a good indication of the typical score, then
you most probably have chosen the wrong one.
The mean is the most frequently used measure of central tendency and it should
be used if you are satisfied that it gives a good indication of the typical score in
your sample. However, there is a problem with the mean. Since it uses all the
scores in a distribution, it is sensitive to extreme scores.
Example:
The mean of this set of nine scores:
20 + 22 + 25 + 26 + 30 + 31 + 33 + 40 + 42 = 269; mean = 269 / 9 = 29.89

If we were to change the last score from 42 to 70, see what happens to the mean:
20 + 22 + 25 + 26 + 30 + 31 + 33 + 40 + 70 = 297; mean = 297 / 9 = 33.00
Obviously, this mean is not a good indication of the typical score in this set of
data. The extreme score has changed the mean from 29.89 to 33.00. If these were
test scores, it may give the impression that students performed better in the later
test when in fact only one student scored highly.
NOTE: Keep in mind this characteristic when interpreting the mean
obtained from a set of data.
If you find that you have an extreme score and you are unable to use the mean,
then you should use the median. The median is not sensitive to extreme scores. If
you examine the above example, the median is 30 in both distributions. The
reason is simply that the median score does not depend on the actual scores
themselves beyond putting them in ascending order. So the last score in a
distribution could be 80, 150 or 5,000 and the median still would not change. It is
this insensitivity to extreme scores that makes the median useful when you cannot
use the mean.
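The sensitivity of the mean, and the insensitivity of the median, can be demonstrated directly with the two score sets used above:

```python
from statistics import fmean, median

# The nine scores from the example, then the same set with 42 changed to 70
original = [20, 22, 25, 26, 30, 31, 33, 40, 42]
with_outlier = [20, 22, 25, 26, 30, 31, 33, 40, 70]

print(round(fmean(original), 2))      # 29.89
print(round(fmean(with_outlier), 2))  # 33.0 -- pulled up by the extreme score
print(median(original))               # 30
print(median(with_outlier))           # 30   -- the median is unchanged
```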

2.3 MEASURES OF VARIABILITY OR DISPERSION

Variability or dispersion refers to the spread of the values around the central
tendency. There are two common measures of dispersion, the range and the
standard deviation.

2.3.1 Range

Range is simply the highest value minus the lowest value. For example, in a
distribution, if the highest value is 36 and the lowest is 15, the range is
36 − 15 = 21.

2.3.2 Standard Deviation

Standard deviation is a more accurate and detailed estimate of dispersion
because an outlier can greatly exaggerate the range. The standard deviation shows the
relation that a set of scores has to the mean of the sample. For instance, when you
give a test, there is bound to be variation in the scores obtained by students.
Variability, variation or dispersion is determined by the distance of a particular
score from the norm or measure of central tendency such as the mean. The
standard deviation is a statistic that shows the extent of variability or variation
for a given series of scores from the mean.
Standard deviation makes use of the deviations of the individual scores from the
mean. Then, each individual deviation is squared to avoid the problem of plus
and minus. Standard deviation is the most often used measure of variability or
variation in educational and psychological research.
The following is the formula for calculating standard deviation:

    S = √[ Σ(X − X̄)² / (N − 1) ]

(a)  Interpretation of the Formula


Standard deviation is found by:

• Taking the difference between the mean X̄ and each item (X − X̄);

• Squaring this difference, (X − X̄)²;

• Summing all the squared differences, Σ(X − X̄)²;

• Dividing by the number of scores (N) minus 1; and

• Extracting the square root.
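These steps can be checked outside SPSS. The following Python sketch (an illustration only; the module itself uses SPSS) applies them to the ten scores of the worked example below:

```python
import math

scores = [23, 22, 26, 21, 30, 24, 20, 27, 25, 32]   # the ten test scores

n = len(scores)
mean = sum(scores) / n                              # X-bar = 25.0
squared_diffs = [(x - mean) ** 2 for x in scores]   # each (X - X-bar) squared
sum_sq = sum(squared_diffs)                         # 134.0
std_dev = math.sqrt(sum_sq / (n - 1))               # divide by N - 1, take the square root
print(round(std_dev, 4))                            # 3.8586

# For comparison, the range uses only the two extreme scores:
print(max(scores) - min(scores))                    # 32 - 20 = 12
```

The result, 3.8586, matches the hand computation shown next.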

(b)  Computing Standard Deviation


Example: A mathematics test was given to a group of 10 students. Their
scores are shown in Column 1 of Table 2.1.

Table 2.1: Example of Computing Standard Deviation

Column 1 (X)    Column 2 (X − X̄)    Column 3 (X − X̄)²
23              23 − 25 = −2         4
22              22 − 25 = −3         9
26              26 − 25 = +1         1
21              21 − 25 = −4         16
30              30 − 25 = +5         25
24              24 − 25 = −1         1
20              20 − 25 = −5         25
27              27 − 25 = +2         4
25              25 − 25 =  0         0
32              32 − 25 = +7         49

X̄ = 25                              Σ(X − X̄)² = 134
Apply the formula:

    Std. Deviation = √[ Σ(X − X̄)² / (N − 1) ] = √(134 / (10 − 1)) = √(134 / 9) = 3.8586

(c)  Differences in Standard Deviations
A mathematics test was administered to Class A and Class B. The
distributions of the scores are shown below.

In Class A (Figure 2.1), the scores are widely spread out, which means there
is high variance or a bigger standard deviation, i.e. most of the scores are ±6
from the mean. If the mean is 50, then you can say that approximately 95%
of the students scored between 44 and 56.


Figure 2.1: Standard deviation

In Class B (Figure 2.2), there is low variance or a small standard deviation, which
explains why most of the scores are clustered around the mean, i.e. most of the
scores are ±3 from the mean. If the mean is 50, approximately 95% of the
students scored between 47 and 53.

Figure 2.2: Standard deviation

ACTIVITY 2.1
Below are the scores obtained by students in two classes on a history test:
Class A marks: 15, 25, 20, 20, 18, 22, 16, 24, 28, 12
Class B marks: 10, 30, 13, 27, 16, 24, 5, 35, 28, 12
(a) Compute the mean of the two classes.
(b) Compute the standard deviation of the two classes.
(c) Explain the implication of differences in standard deviations.


2.4  FREQUENCY DISTRIBUTION

Frequency distribution is a way of displaying numbers in an organised manner.


A frequency distribution is simply a table that, at the minimum, displays how
many times in a data set each response or "score" occurs. A good frequency
distribution will display more information than this; although with just this
minimal information, many other bits of information can be computed.

2.4.1  Tables

Tables can contain a great deal of information but they also take up a lot of space
and may overwhelm readers with details. How should tables be presented in a
manner that can be easily understood? In general, frequency tables work best for
variables with a limited number of categories (see Table 2.2).
Table 2.2: Question: Should Sex Education be Taught in Secondary School?

Response                Frequency    Percent    Valid Percent    Cumulative Percent
4. Strongly Agree           1           7.7          7.7                7.7
3. Agree                    3          23.1         23.1               30.8
2. Disagree                 4          30.8         30.8               61.5
1. Strongly Disagree        5          38.5         38.5              100.0
Total                      13         100.0        100.0

Table 2.2 summarises the responses of 13 teachers with regard to the teaching of
sex education in secondary school.

The first column contains the values or categories of the variable (opinion
on teaching sex education in schools, i.e. the extent of agreement).

The frequency column indicates the number of respondents in each category.

The percent column lists the percentage of the whole sample in each
category. These percentages are based on the total sample size, including
those who did not answer the question. Those who did not answer will be
shown as missing cases in this column.

The valid percent column contains the percentage of those who gave a valid
response to the question that belongs to each category. When there are no
missing cases, the valid percent column is identical to the percent column.

The cumulative percentage column provides the rolling addition of
percentages from the first category to the last valid category. For example,
7.7 percent of teachers strongly agree that sex education should be taught in
secondary school. A further 23.1 percent of them simply agree that sex
education should be taught. The cumulative percentage column adds up the
percentage of those who strongly agree with those who agree (7.7 + 23.1 =
30.8). Thus, 30.8 percent at least agree (either agree or strongly agree) that
sex education should be taught in secondary school.
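The percent and cumulative percent columns that SPSS produces can be reproduced by hand. A small Python sketch (with coded responses invented to match the percentages in Table 2.2) illustrates the arithmetic:

```python
# Coded responses of 13 teachers: 4 = Strongly Agree ... 1 = Strongly Disagree.
# The counts (1, 3, 4, 5) are chosen to reproduce the percentages of Table 2.2.
responses = [4] + [3] * 3 + [2] * 4 + [1] * 5

n = len(responses)
cumulative = 0.0
for category in (4, 3, 2, 1):                 # same order as the SPSS output
    freq = responses.count(category)
    percent = 100 * freq / n                  # percent of the whole sample
    cumulative += percent                     # rolling addition of percentages
    print(category, freq, round(percent, 1), round(cumulative, 1))
```

The printed rows reproduce the 7.7, 30.8, 61.5 and 100.0 cumulative percentages of Table 2.2.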

2.4.2  SPSS Procedure

To obtain a frequency table, measure of central tendency and variability:

1. Select the Analyse menu.
2. Click on Descriptive Statistics and then on Frequencies to open the Frequencies dialogue box.
3. Select the variable(s) you require (i.e. opinion on sex education) and click on the arrow button to move the variable into the Variable(s) box.
4. Click on the Statistics... command push button to open the Frequencies: Statistics sub-dialogue box.
5. In the Central Tendency box, select the Mean, Median and Mode check boxes.
6. In the Dispersion box, select the Std. deviation and Range check boxes.
7. Click on Continue and then OK.

2.5  GRAPHS

Graphs are widely used in describing data. However, they should be used
appropriately, as there is a tendency for graphs to be cluttered, confusing and
downright misleading.

2.5.1  Bar Charts

The following are elements of a graph that should be given due consideration
(refer to Figure 2.3):

The X-axis represents the values of the variables being displayed. The X-axis
may be divided into discrete categories (bar charts) or continuous
values (line graphs). Which units are used depends on the level of
measurement of the variable being graphed.

In the example in Figure 2.3, the X-axis represents the students' gain scores
after undergoing an innovative instructional programme.

The Y-axis, which appears either in percentages or frequencies as in Figure
2.3, shows the frequency of students who obtained the various scores
indicated on the X-axis.

Interpretation of the graph on Students' Gain Scores:

A total of 275 students obtained between 1 and 5 marks as a result of
the innovative instructional programme; 199 obtained between 6 and
10 marks; 77 between 11 and 15 marks; and 28 between 16 and 20
marks.

The number of students who obtained high gain scores decreases
gradually.

Figure 2.3: Example of a bar chart
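The grouped counts behind a bar chart like Figure 2.3 come from binning a continuous score into ranges. A minimal Python sketch (using a small made-up sample, since the data behind the figure are not given in the module):

```python
def bin_counts(scores, bins):
    """Count how many scores fall into each inclusive (low, high) range."""
    return {f"{lo}-{hi}": sum(lo <= s <= hi for s in scores) for lo, hi in bins}

# The four gain-score ranges used on the X-axis of Figure 2.3
gain_score_bins = [(1, 5), (6, 10), (11, 15), (16, 20)]

sample = [2, 4, 5, 7, 9, 12, 18]          # hypothetical gain scores
print(bin_counts(sample, gain_score_bins))
# {'1-5': 3, '6-10': 2, '11-15': 1, '16-20': 1}
```

Each count then becomes the height of one bar on the Y-axis.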


2.5.2  Histogram

Histograms are different from bar charts because they are used to display
continuous variables (see the histogram in Figure 2.4).

Figure 2.4: Percentage who agreed that sex education should be taught
in secondary schools

The X-axis represents the different age groups, while the Y-axis represents the
percentages of respondents.

Each bar in the X-axis represents one age group in ascending order.

The Y-axis in this case represents the percentages of respondents in the Sex
Education survey.

Interpretation of the graph "Sex Education Should be Taught in Secondary School":

Among the 18 to 28 age group, only 20% agreed that sex education should
be taught in schools compared to 60% in the 51 to 61 age group.

About 40% in the 40 to 50 age group and 50% among the 29 to 39 age
group agreed that sex education should be taught in secondary schools.

Only 10% of those aged 73 years and older agreed that secondary school
students should be taught sex education.

2.5.3  Line Graphs

The line graph serves a similar function as a histogram. It should be used for
continuous variables. The main differences between a line graph and a histogram
are that on a line graph, the frequency of any value on the X-axis is represented by
a point on a line rather than by a single column and the values of the continuous
variable are not automatically grouped into a smaller number of groups as they are
in histograms. As such, the line graph reflects the frequencies or percentages of
every value of the x variable and thus avoids potential distortions due to the way
in which the values are grouped.
The line graph in Figure 2.5 shows the frequency of using the library among a
group of male and female respondents. The level of measurement of the Y-axis
variable is ordinal or interval. Line graphs are more suitable for variables that
have more than five or six categories. They are less suited for variables with a
very large number of values as this can produce a very jagged and confusing
graph.
Since a separate line is produced for each category of the grouping variable
(male and female in Figure 2.5), only grouping variables with a small number of
categories should be used. This will normally mean that the grouping variable is
a nominal or ordinal variable.

Figure 2.5: Example of a line graph


ACTIVITY 2.2
Interpret the line graph (Figure 2.5) showing the frequency of a group of
respondents visiting the library. A separate line is used for male and
female respondents.

• Descriptive statistics are used to summarise a collection of data and present it
in a way that can be easily and clearly understood.

• Mean, median and mode are common descriptive statistics used to measure
central tendency, while standard deviation is the commonly used statistic to
measure variability or dispersion of data.

• A frequency distribution is a table that, at the minimum, displays how many
times in a data set each response or "score" occurs.

• Graphs are also used to condense large sets of data and these include the use
of bar charts, histograms and line graphs.

Frequency distribution
Graphs
Mean
Measures of central tendency
Measures of variability or dispersion

Median
Mode
Range
Standard deviation


Topic 3  Normal Distribution

LEARNING OUTCOMES

By the end of this topic, you should be able to:

1. Explain what normal distribution means;
2. Assess normality using graphical techniques (histograms);
3. Assess normality using graphical techniques (box plots);
4. Assess normality using graphical techniques (normality plots); and
5. Assess normality using statistical techniques.

INTRODUCTION

This topic explains what normal distribution is and introduces the graphical as
well as the statistical techniques used in assessing normality. It also presents SPSS
procedures for assessing normality.

3.1

WHAT IS NORMAL DISTRIBUTION?

Now that you know what the mean and the standard deviation of a set of scores
stand for, we can proceed to examine the concept of normal distribution. The
normal curve was developed mathematically in 1733 by de Moivre as an
approximation to the binomial distribution. Laplace used the normal curve in 1783
to describe the distribution of errors. However, it was Gauss who popularised the
normal curve when he used it to analyse astronomical data in 1809, and it became
known as the Gaussian distribution.

The term normal distribution refers to a particular way in which scores or
observations tend to pile up or distribute around a particular value rather than be
scattered all over. The normal distribution, which is bell-shaped, is based on a
mathematical equation (which we will not get into).
While some argue that in the real world, scores or observations are seldom
normally distributed, others argue that in the general population, many variables
such as height, weight, IQ scores, reading ability, job satisfaction and blood
pressure turn out to have distributions that are bell-shaped or normal.

3.2  WHY IS NORMAL DISTRIBUTION IMPORTANT?

Normal distribution is important for the following reasons:

• Many physical, biological and social phenomena or variables are normally
distributed. However, some variables are only approximately normally
distributed.

• Many kinds of statistical tests (such as the t-test and ANOVA) are derived from a
normal distribution. In other words, most of these statistical tests work best
when the sample tested is distributed normally.

Fortunately, these statistical tests work very well even if the distribution is only
approximately normal. Some tests work well even with very wide
deviations from normality. They are described as robust tests that are able to
tolerate the lack of a normal distribution.

3.3  CHARACTERISTICS OF THE NORMAL CURVE

A normal distribution (or normal curve) is completely determined by the mean
and standard deviation, i.e. two normally distributed variables having the same
mean and standard deviation must have the same distribution. We often identify a
normal curve by stating the corresponding mean and standard deviation and
calling those the parameters of the normal curve.

A normal distribution is symmetric and centred at the mean of the variable, and its
spread depends on the standard deviation of the variable. The larger the standard
deviation, the flatter and more spread out is the distribution.


Figure 3.1: Normal distribution or curve

The graph in Figure 3.1 is a picture of a normal distribution of IQ scores among a
sample of adolescents:

• Mean is 100.
• Standard deviation is 15.

As you can see, the distribution is symmetric. If you folded the graph in the
centre, the two sides would match, i.e. they are identical.

3.3.1  Mean, Median and Mode

The centre of the distribution is the mean. The mean of a normal distribution is
also the most frequently occurring value (i.e. the mode) and it is also the value
that divides the distribution of scores into two equal parts (i.e. the median). In any
normal distribution, the mean, median and the mode all have the same value (i.e.
100 in the example above).
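This property is easy to verify for any roughly symmetric set of scores. A quick Python check (illustrative data only, not from the module):

```python
from statistics import mean, median, mode

# A small symmetric distribution of scores: 3 is the centre,
# the middle value and the most frequent value at the same time.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print(mean(scores), median(scores), mode(scores))   # all three equal 3
```

In a skewed distribution, by contrast, the three values drift apart.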

3.4  THREE-STANDARD-DEVIATIONS RULE

Normal distribution shows the area under the curve. The three-standard-deviations
rule, when applied to a variable, states that almost all the possible observations or
scores of the variable lie within three standard deviations to either side of the
mean. The normal curve is close to (but does not touch) the horizontal axis
outside the range of three standard deviations to either side of the mean. Based
on the graph in Figure 3.1, you will notice that with a mean of 100 and a standard
deviation of 15:

• 68% of all IQ scores fall between 85 (i.e. one standard deviation less than the
mean, 100 − 15 = 85) and 115 (i.e. one standard deviation more than
the mean, 100 + 15 = 115).

• 95% of all IQ scores fall between 70 (i.e. two standard deviations less than the
mean, 100 − 30 = 70) and 130 (i.e. two standard deviations more than
the mean, 100 + 30 = 130).

• 99.7% of all IQ scores fall between 55 (i.e. three standard deviations less than
the mean, 100 − 45 = 55) and 145 (i.e. three standard deviations more
than the mean, 100 + 45 = 145).
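These percentages follow from the normal curve itself and can be confirmed with Python's standard library (an aside to the module; the 68/95/99.7 figures quoted above are rounded values):

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)        # the IQ distribution of Figure 3.1

within_1 = iq.cdf(115) - iq.cdf(85)      # within one standard deviation
within_2 = iq.cdf(130) - iq.cdf(70)      # within two standard deviations
within_3 = iq.cdf(145) - iq.cdf(55)      # within three standard deviations
print(round(within_1, 3), round(within_2, 3), round(within_3, 3))   # 0.683 0.954 0.997
```

Because the percentages depend only on distances measured in standard deviations, the same three values hold for any mean and standard deviation.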

A normal distribution can have any mean and standard deviation. However, the
percentage of cases or individuals falling within one, two or three standard
deviations from the mean is always the same. The shape of a normal distribution
does not change. Means and standard deviations will differ from variable to
variable but the percentage of cases or individuals falling within specific intervals
is always the same in a true normal distribution.


ACTIVITY 3.1
1. What is meant by the statement that a population is normally
distributed?
2. Two normally distributed variables have the same means and the
same standard deviations. What can you say about their distributions?
Explain your answer.
3. Which normal distribution has a wider spread: the one with mean 1
and standard deviation 2 or the one with mean 2 and standard
deviation 1? Explain your answer.
4. The mean of a normal distribution has no effect on its shape. Explain.
5. What are the parameters for a normal curve?

3.5  INFERENTIAL STATISTICS AND NORMALITY

Often in statistics, one would like to assume that the sample under investigation
has a normal distribution or an approximate normal distribution. However, such
an assumption should be supported by appropriate techniques. As
mentioned earlier, the use of several inferential statistics such as the t-test and
ANOVA require that the distribution of the variables analysed are normally
distributed or at least approximately normally distributed. However, as discussed
in Topic 1, if a simple random sample is taken from a population, the distribution
of the observed values of a variable in the sample will approximate the
distribution of the population. Generally, the larger the sample, the better the
approximation tends to be. In other words, if the population is normally
distributed, the sample of observed values would also be normally distributed if
the sample is randomly selected and it is large enough.

3.5.1  Assessing Normality using Graphical Methods

Assessing normality means determining whether the samples of students,
teachers, parents or principals you are studying are normally distributed. When
you draw a sample from a population that is normally distributed, it does not
mean that your sample will necessarily have a distribution that is exactly normal.
Samples vary, so the distribution of each sample may also vary. However, if a
sample is reasonably large and it comes from a normal population, its distribution
should look more or less normal.
For example, when you administer a questionnaire to a group of school principals,
you want to be sure that your sample of 250 principals is normally distributed.
Why? The assumption of normality is a prerequisite for many inferential
statistical techniques and there are two main ways of determining the normality of
distribution.
The normality of a distribution can be determined using graphical methods (such
as histograms, stem-and-leaf plots and boxplots) or using statistical procedures
(such as the Kolmogorov-Smirnov and Shapiro-Wilk statistics).
SPSS Procedures for Assessing Normality

There are several procedures to obtain the different graphs and statistics to
assess normality; the EXPLORE procedure is the most convenient when both
graphs and statistics are required.

1. From the main menu, select Analyse.
2. Click Descriptive Statistics and then Explore... to open the Explore dialogue box.
3. Select the variable you require and click the arrow button to move this variable into the Dependent List: box.
4. Click the Plots... command push button to obtain the Explore: Plots sub-dialogue box.
5. Click the Histogram check box and the Normality plots with tests check box, and ensure that the Factor levels together radio button is selected in the Boxplots display.
6. Click Continue.
7. In the Display box, ensure that Both is activated.
8. Click the Options... command push button to open the Explore: Options sub-dialogue box.
9. In the Missing Values box, click the Exclude cases pairwise radio button (if not selected by default).
10. Click Continue and then OK.

(a)  Assessing Normality using Histogram


See the graph in Figure 3.2, which is a histogram showing the distribution of
scores obtained on a Scientific Literacy Test administered to a sample of
students.
The values on the vertical axis indicate the frequency or number of cases.
The values on the horizontal axis are midpoints of value ranges. For
example, the first bar is 20 and the second bar is 30, indicating that each bar
covers a range of 10.
A simple look at the bars shows that the distribution has the rough shape of
a normal distribution. However, there are some deviations. The question is
whether this deviation is small enough to say that the distribution is
approximately normal. Generating the histogram via the Explore option does
not show you the normal curve overlay. To show this overlay, you have to
generate the histogram using the Frequencies option (Analyse Descriptive
Statistics Frequencies Charts Histograms With Normal Curve).

Figure 3.2: Distribution of scores obtained on a Scientific Literacy Test

(b)  Assessing Normality using Skewness


Skewness is the degree of departure from the symmetry of a distribution. A
normal distribution is symmetrical. A non-symmetrical distribution is
described as being either negatively or positively skewed. A distribution is
skewed if one of its tails is longer than the other or the tail is pulled to either
the left or the right.
Refer to Figure 3.3, which shows the distribution of the scores obtained by
students on a test. There is a positive skew because it has a longer tail in the
positive direction or the long tail is on the right side (towards the high values
on the horizontal axis).

What does it mean? It means that more students were getting low scores in
the test and this indicates that the test was too difficult. Alternatively, it
could mean that the questions were not clear or the teaching methods and
materials did not bring about the desired learning outcomes.

Figure 3.3: Distribution of scores obtained by students on a test (positive skew)

Refer to Figure 3.4 which shows the distribution of the scores obtained by
students on a test. There is a negative skew because it has a longer tail in the
negative direction or to the left (towards the lower values on the horizontal
axis).
What does it mean? It means that more students were getting high scores on
the test. This may indicate that either the test was too easy or the teaching
methods and materials were successful in bringing about the desired
learning outcomes.

Figure 3.4: Distribution of scores obtained by students on a test (negative skew)

Interpreting the Statistics for Skewness


Besides graphical methods, you can also determine skewness by examining
the statistics reported. A normal distribution has a skewness of 0. See the
table on the right in Figure 3.5, which reports the skewness statistics for
three independent groups. A positive value indicates a positive skew, while
a negative value indicates a negative skew.
Among the three groups, Group 3 is not normally distributed compared to
the other two groups. Its skewness value of −1.200, whose absolute value is
greater than 1, indicates that the distribution is non-symmetrical (rule of
thumb: an absolute value greater than 1 indicates a non-symmetrical
distribution).

The distribution of Group 2, with a skewness value of .235, is closest to the
normal value of 0, followed by Group 1 with a skewness value of .973.

Figure 3.5: Skewness statistics for three independent groups

(c)  Assessing Normality using Kurtosis


Kurtosis indicates the degree of "flatness" or "peakedness" in a distribution
relative to the shape of normal distribution. Refer to the graphs in
Figure 3.6.

Figure 3.6: Kurtosis

(i)  Low Kurtosis: Data with low kurtosis tend to have a flat top near the
mean rather than a sharp peak.

(ii) High Kurtosis: Data with high kurtosis tend to have a distinct peak
near the mean, decline rather rapidly and have a heavy tail.

See the graphs in Figure 3.7:

• A normal distribution has a kurtosis of 0 and is called mesokurtic
(Graph A). (Strictly speaking, a mesokurtic distribution has a value of
3 but, in line with the practice used in SPSS, the adjusted version is 0.)

• If a distribution is peaked (tall and skinny), its kurtosis value is greater
than 0 and it is said to be leptokurtic (Graph B); it has a positive
kurtosis.

• If, on the other hand, the distribution is flat, its kurtosis value is less
than 0 and it is said to be platykurtic (Graph C); it has a negative
kurtosis.

Figure 3.7: Mesokurtic, Leptokurtic and Platykurtic

Interpreting the Statistics for Kurtosis

Besides graphical methods, you can also determine kurtosis by examining
the statistics reported. A normal distribution has a kurtosis of 0. See the
table in Figure 3.8, which reports the kurtosis statistics for three
independent groups.

Figure 3.8: Kurtosis statistics for three independent groups
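The kurtosis statistic can likewise be sketched from its moment definition. The version below subtracts 3 so that a normal distribution scores about 0, matching the adjusted convention mentioned earlier (SPSS additionally applies a small-sample correction, so its reported values will differ slightly):

```python
def excess_kurtosis(data):
    """Moment-based kurtosis minus 3: about 0 for normal data, negative
    for flat (platykurtic) data, positive for peaked (leptokurtic) data."""
    n = len(data)
    m = sum(data) / n
    var = sum((x - m) ** 2 for x in data) / n    # population variance
    m4 = sum((x - m) ** 4 for x in data) / n     # fourth central moment
    return m4 / var ** 2 - 3

# A flat, evenly spread sample is platykurtic (negative value):
print(round(excess_kurtosis([1, 2, 3, 4, 5]), 1))   # -1.3
```

A sample concentrated at the centre with a few far-out values gives a positive (leptokurtic) result instead.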

• Group 1, with a kurtosis value of 0.500 (a positive value), is more
normally distributed than the other two groups because its value is
closest to 0.

• Group 2, with a kurtosis value of −1.58, has a distribution that is more
flattened and not as normally distributed compared to Group 1.

• Group 3, with a kurtosis value of +1.65, has a distribution that is more
peaked and not as normally distributed compared to Group 1.

(d)  Assessing Normality using Box Plot


The boxplot also provides information about the distribution of scores.
Unlike the histogram which plots actual values, the boxplot summarises the
distribution using the median, the 25th and 75th percentiles, and extreme
scores in the distribution. See Figure 3.9, which shows a boxplot for the
same set of data on scientific literacy discussed earlier. Note that the lower
boundary of the box is the 25th percentile and the upper boundary is the
75th percentile.

Figure 3.9: Boxplot for the set of data on scientific literacy

(i)  The BOX
The box has hinges that form the outer boundaries of the box. The
hinges are the scores that cut off the top and bottom 25% of the data.
Thus, 50% of the scores fall within the hinges. The thick horizontal
line through the box represents the median. In the case of a normal
distribution, the line runs through the centre of the box.
If the median is closer to the top of the box, then the distribution is
negatively skewed. If it is closer to the bottom of the box, then it is
positively skewed.

(ii)  WHISKERS
The smallest and largest observed values within the distribution are
represented by the horizontal lines at either end of the box, commonly
referred to as whiskers.
The two whiskers indicate the spread of the scores.
Scores that fall outside the upper and lower whiskers are classified as
extreme scores or outliers. If the distribution has any extreme scores,
i.e. 3 or more box lengths from the upper or lower hinge, these will be
represented by a circle (o).
Outliers tell us that we should investigate why a score is so extreme.
Could it be that you may have made an error in data entry?
Why is it important to identify outliers? This is because many of the
statistical techniques used involve calculation of means. The mean is
sensitive to extreme scores and it is important to be aware whether
your data contain such extreme scores if you are to draw conclusions
from the statistical analysis conducted.

(e)  Assessing Normality using the Normal Probability Plot


Besides the histogram and the box plot, another frequently used graphical
technique of determining normality is the "Normality Probability Plot" or
"Normal Q-Q Plot." The idea behind a normal probability plot is simple. It
compares the observed values of the variable to the observations expected
for a normally distributed variable. More precisely, a normal probability plot
is a plot of the observed values of the variable versus the normal scores (the
observations expected for a variable having the standard normal
distribution).
In a normal probability plot, each observed value (score) obtained is
paired with its theoretical normal score, forming a linear pattern. If the
sample is from a normal distribution, then the observed values or scores fall
more or less in a straight line. The normal probability plot is formed by:

Vertical axis: Expected normal values

Horizontal axis: Observed values

SPSS Procedures
1. Select Analyse from the main menu.
2. Click Descriptive Statistics and then Explore... to open the Explore
dialogue box.
3. Select the variable you require (i.e. mathematics score) and click on
the arrow button to move this variable to the Dependent List: box.
4. Click the Plots... command push button to obtain the Explore: Plots
sub-dialogue box.
5. Click the Histogram check box and the Normality plots with tests
check box and ensure that the Factor levels together radio button is
selected in the Boxplots display.
6. Click Continue.
7. In the Display box, ensure that Both is activated.
8. Click the Options... command push button to open the Explore:
Options sub-dialogue box.
9. In the Missing Values box, click on the Exclude cases pairwise radio
button. If this option is not selected then, by default, any variable with
missing data will be excluded from the analysis. That is, plots and
statistics will be generated only for cases with complete data.
10. Click on Continue and then OK.
Note that these commands will give you the 'Histogram', 'Stem-and-leaf
plots', 'Boxplots' and 'Normality Plots'.
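The "expected normal values" that such a plot places on one axis can be approximated with Python's standard library. The sketch below uses Blom's plotting positions, one common convention (SPSS offers several, so its values may differ slightly); the observed scores are hypothetical:

```python
from statistics import NormalDist

def normal_scores(n):
    """Expected standard-normal value for the i-th of n ordered observations,
    using Blom's plotting position (i - 0.375) / (n + 0.25)."""
    z = NormalDist()   # standard normal distribution
    return [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

# Pair each sorted observed score with its expected normal score:
observed = sorted([52, 61, 48, 70, 55, 58, 65, 45])   # hypothetical mathematics scores
for obs, exp in zip(observed, normal_scores(len(observed))):
    print(obs, round(exp, 2))
```

Plotting the observed scores against these expected scores gives the straight-line pattern described above when the data are roughly normal.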


Figure 3.10: Example of a normal probability plot

When you use a normal probability plot to assess the normality of a variable,
you must remember that ascertaining whether the distribution is roughly
linear and normal is subjective. The graph in Figure 3.10 is an example of
a normal probability plot. Though none of the values falls exactly on the line,
most of the points are very close to the line.

• Values that are above the line represent units for which the observation
is larger than its normal score.

• Values that are below the line represent units for which the observation
is smaller than its normal score.

Note that there is one value that falls well outside the overall pattern of the
plot. It is called an outlier and you will have to remove the outlier from the
sample data and redraw the normal probability plot.


Even with the outlier, the values are close to the line and you can conclude
that the distribution will look like a bell-shaped curve. If the normal scores
plot departs only slightly from having all of its dots on the line, then the
distribution of the data departs only slightly from a bell-shaped curve. If one
or more of the dots departs substantially from the line, then the distribution
of the data is substantially different from a bell-shaped curve.
Outliers:
Refer to the normal probability plot in Figure 3.11. Note that there are
possible outliers which are values lying off the hypothetical straight line.
Outliers are anomalous values in the data which may be due to recording
errors, which may be correctable, or they may be due to the sample not
being entirely from the same population.

Figure 3.11: Outliers

Skewness to the left:

Refer to the normal probability plot in Figure 3.12. If both ends of the
normality plot fall below the straight line passing through the main body of
the values of the probability plot, then the population distribution from
which the data were sampled may be skewed to the left.


Figure 3.12: Skewness to the left

Skewness to the right:


If both ends of the normality plot bend above the straight line passing
through the values of the probability plot, then the population distribution
from which the data were sampled may be skewed to the right. Refer to
Figure 3.13.

Figure 3.13: Skewness to the right


ACTIVITY 3.2

Refer to the output of a normal probability plot for the distribution of
mathematics scores by eight students in Figure 3.14.

Figure 3.14: Normal probability plot for the distribution of mathematics scores

1. Comment on the distribution of scores.
2. Would you consider the distribution normal?
3. Are there outliers?

3.5.2 Assessing Normality Using Statistical Techniques

The graphical methods discussed present qualitative information about the
distribution of data. Histograms, box plots and normal probability plots are
graphical methods useful for determining whether data follow a normal curve.
Extreme deviations from normality are often readily identified from graphical
methods. However, in many instances the decision is not straightforward. Using
graphical methods to decide whether a data set is normally distributed involves
making a subjective decision; formal test procedures are usually necessary to test
the assumption of normality.
In general, both statistical tests and graphical plots should be used to determine
normality. However, the assumption of normality should not be rejected on the
basis of a statistical test alone. In particular, when the sample is large, statistical
tests for normality can be sensitive to very small (i.e. negligible) deviations in
normality. Therefore, if the sample is very large, a statistical test may reject the
assumption of normality when the data set, as shown using graphical methods, is
essentially normal and the deviation from normality is too small to be of practical
significance.
(a) Kolmogorov-Smirnov Test
You could use the Kolmogorov-Smirnov test to evaluate statistically
whether the difference between the observed distribution and a theoretical
normal distribution is small enough to be just due to chance. If it could be
due to chance, you would treat the distribution as being normal. If the
difference between the actual distribution and the theoretical normal
distribution is large, it is unlikely to be due to chance (sampling error),
and you would treat the actual distribution as not being normal.

In terms of hypothesis testing, the Kolmogorov-Smirnov test is based on Ho:
that the data are normally distributed. The test is used for samples which
have more than 50 subjects.
H0: DISTRIBUTION FITS THE DATA
Ha: DISTRIBUTION DOES NOT FIT THE DATA

DISTRIBUTION: NORMAL

If the Kolmogorov-Smirnov test yields a significance level of less than (<)
0.05, it means that the distribution is NOT normal.

However, if the Kolmogorov-Smirnov test yields a significance level of more
than (>) 0.05, it means that the distribution is normal.
Kolmogorov-Smirnov(a)

          Statistic     df       Sig.
SCORE     .21           1598     .000*

* This is a lower bound of the true significance.
a. Lilliefors Significance Correction
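For readers working outside SPSS, the same idea can be sketched with SciPy (a hypothetical example, not the module's procedure). Note one caveat: `scipy.stats.kstest` compares the sample against a fully specified normal distribution, whereas SPSS's Lilliefors-corrected version estimates the mean and SD from the data, so the two are not identical.

```python
import numpy as np
from scipy import stats

# Hypothetical sample assumed to come from a standard normal population
rng = np.random.default_rng(1)
sample = rng.normal(loc=0.0, scale=1.0, size=100)

# H0: the sample was drawn from N(0, 1)
statistic, p_value = stats.kstest(sample, "norm")

# A p-value above 0.05 would mean we do not reject normality;
# below 0.05 would mean the distribution is treated as not normal
print(statistic, p_value)
```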

(b) Shapiro-Wilk Test
Another powerful and commonly employed test for normality is the
Shapiro-Wilk test. It is an effective method for testing whether a data set
has been drawn from a normal distribution.

If the normal probability plot is approximately linear (the data follow a
normal curve), the test statistic will be relatively high.

If the normal probability plot has curvature, which is evidence of
non-normality in the tails of the distribution, the test statistic will be
relatively low.

In terms of hypothesis testing, the Shapiro-Wilk test is based on Ho: that the
data are normally distributed. The test is used for samples which have less
than 50 subjects.
H0: DISTRIBUTION FITS THE DATA
Ha: DISTRIBUTION DOES NOT FIT THE DATA

DISTRIBUTION: NORMAL

Reject the assumption of normality if the test of significance reports a
p-value of less than (<) 0.05.

DO NOT REJECT the assumption of normality if the test of significance
reports a p-value of more than (>) 0.05.

Table 3.1 shows the Shapiro-Wilk statistic for assessing normality.

Table 3.1: Shapiro-Wilk Statistic for Assessing Normality

SPSS Output
Tests of Normality

                              Shapiro-Wilk
Independent variable group    Statistic    df    Sig.
Group 1                       .912         22    .055
Group 2                       166          14    .442
Group 3                       .900         16    .084

The Shapiro-Wilk normality tests indicate that the scores are normally
distributed in each of the three groups. All the p-values reported are more
than 0.05 and hence you DO NOT REJECT the null hypothesis.
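As a small illustrative sketch (the data below are invented), the Shapiro-Wilk test is available in SciPy as `scipy.stats.shapiro`, and it is the usual choice for small samples such as the groups in Table 3.1:

```python
import numpy as np
from scipy import stats

# Invented scores for one small group (n < 50, where Shapiro-Wilk applies)
rng = np.random.default_rng(2)
group_scores = rng.normal(loc=70, scale=8, size=22)

# H0: the scores are normally distributed
w_statistic, p_value = stats.shapiro(group_scores)

# p > 0.05: do not reject H0, treat the distribution as normal
print(round(w_statistic, 3), round(p_value, 3))
```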


NOTE:
It should be noted that with large samples, even a very small deviation from
normality can yield low significance levels. So a judgment still has to be made as
to whether the departure from normality is large enough to matter.

3.6 WHAT TO DO IF THE DISTRIBUTION IS NOT NORMAL?

You have TWO choices if the distribution is not normal:

Use a Non-parametric Statistic
Transform the Variable to make it Normal

(a) Use a Non-parametric Statistic
In many cases, if the distribution is not normal, an alternative statistic will
be available especially for bivariate analyses such as correlation or
comparisons of means. These alternatives which do not require normal
distributions are called non-parametric or distribution-free statistics. Some
of these alternatives are shown in Figure 3.15 as follows:

Figure 3.15: Non-parametric or distribution-free statistics
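As one concrete sketch of such an alternative (the group names and scores here are hypothetical), the Mann-Whitney U test in SciPy plays the role of the independent-samples t-test when normality cannot be assumed:

```python
from scipy import stats

# Hypothetical scores from two independent groups whose
# distributions we do not assume to be normal
group_a = [12, 15, 11, 19, 14, 13]
group_b = [22, 18, 25, 17, 20, 24]

# Mann-Whitney U: a non-parametric alternative to the independent t-test,
# based on ranks rather than means
u_statistic, p_value = stats.mannwhitneyu(group_a, group_b,
                                          alternative="two-sided")

print(u_statistic, round(p_value, 3))
```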

(b) Transform the Variable to make it Normal


The shape of a distribution can be changed by expressing it in a different
way statistically. This is referred to as transforming the distribution.
Different types of transformations can be applied to "normalise" the
distribution. The type of transformation selected depends on the manner in
which the distribution departs from normality. (We will not discuss
transformation in this course.)
ACTIVITY 3.3

Examine the SPSS output below and determine if the sample is
normally distributed.

Kolmogorov-Smirnov(a)

          Statistic     df      Sig.
SCORE     0.57          999     .200*

* This is a lower bound of the true significance.
a. Lilliefors Significance Correction

Normal distribution refers to a particular way in which scores or observations
tend to pile up or distribute around a particular value.

The normal distribution is bell-shaped and is completely determined by the
mean and standard deviation.

The use of several inferential statistics such as t-tests and ANOVA requires
that the variables analysed are normally distributed or at least approximately
normally distributed.

Normality of a distribution can be assessed using graphical methods or
statistical techniques.

The graphical methods used to assess normality are the histogram, the boxplot
and the normal probability plot.

The statistical techniques used to assess normality are the
Kolmogorov-Smirnov test and the Shapiro-Wilk test.


Boxplot
Histogram
Kolmogorov-Smirnov test

Normal distribution
Normality probability plot
Shapiro-Wilk test


Topic 4 Hypothesis Testing

LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Explain the difference between the null and alternative hypotheses and
their use in research;
2. Differentiate between Type I and Type II errors; and
3. Explain when two-tailed and one-tailed tests are used.

INTRODUCTION

The topic explains the difference between the null and alternative hypotheses and
their use in research. It also introduces the concepts of Type I error and Type II
error. It illustrates the difference between the two-tailed and one-tailed tests and
explains when they are used in hypothesis testing.

4.1

WHAT IS A HYPOTHESIS?

Your car did not start. You have a hunch and put forward the hypothesis that "the
car does not start because there is no petrol." You check the fuel gauge to either
accept or reject the hypothesis. If you find there is petrol, you reject the
hypothesis.
Next, you hypothesise that "the car did not start because the spark plugs are dirty."
You check the spark plugs to determine if they are dirty. You find that the spark
plugs are indeed dirty. You do not reject the hypothesis.


TOPIC 4 HYPOTHESIS TESTING

Many researchers state their research questions in the form of a "hypothesis."
Hypothesis is singular and hypotheses are plural. A hypothesis is a tentative,
testable statement that explains a particular phenomenon. The key word is
"testable." Refer to the following statements:
(i) Juvenile delinquents tend to be from low socio-economic families.
(ii) Children who attend kindergarten are more likely to have higher reading
scores.
(iii) The discovery method of teaching may enhance the creative thinking skills
of students.
(iv) Children who go for tuition tend to perform better in mathematics.
All these are examples of hypotheses. However, these statements are not
particularly useful because of words such as "may," "tend to" and "more likely."
Using these tentative words does not suggest how you would go about proving it.
To solve this problem, a hypothesis should state:

Two or more variables that are measurable

An independent and dependent variable

A relationship between two or more variables

A possible prediction

Examine the hypothesis in Figure 4.1. It has all the attributes mentioned:

The variables are "critical thinking" and "gender," which are both measurable.

The independent variable is "gender," which can be manipulated as male and
female; the dependent variable is "critical thinking."

There is a possible relationship between the gender of undergraduates and
their critical thinking skills.

It is possible to predict that males may be better in critical thinking
compared to females, or vice-versa.


Figure 4.1: Hypothesis

ACTIVITY 4.1
1. Rewrite the four hypotheses using the formalised style shown. Ensure
that each hypothesis has all the attributes stated.
2. Write two more original hypotheses of your own using this form.

4.2 TESTING A HYPOTHESIS

4.2.1 Null Hypothesis

The null hypothesis is a hypothesis (or hunch) about the population. It represents a
theory that has been put forward because it is believed to be true. The word "null"
means nothing or zero. So, a null hypothesis states that nothing happened. For
example, there is no difference between males and females in critical thinking
skills or there is no relationship between socio-economic status and academic
performance. Such a hypothesis is denoted with the symbol "Ho:". In other words,
you are saying,

You do not expect the groups to be different.

You do not expect the variables to be related.


Say, for example, you conduct an experiment to test the effectiveness of the
discovery method in learning science compared to the lecture method. You select
a random sample of 30 students for the discovery method group and 30 students
for the lecture method group (see Topic 1 on Random Sampling).
Based on your sample, you hypothesise that there are no differences in science
achievement between students in the discovery method group and students in the
lecture method group. In other words, you make the claim that there are no
differences in science scores between the two groups in the population. This is
represented by the null hypothesis (Ho), written in either of two equivalent
notations:

Ho: μ1 = μ2     OR     Ho: μ1 − μ2 = 0

In other words, you are saying that:

The science mean score for the discovery method group (μ1) is EQUAL to
the mean score for the lecture method group (μ2); or

The science mean score for the discovery method group (μ1) MINUS the
mean score for the lecture method group (μ2) is equal to ZERO.

The null hypothesis is often the reverse of what the researcher actually believes in
and it is put forward to allow the data to contradict it (You may find it strange but
it has its merit!).
Based on the findings of the experiment, you found that there was a significant
difference in science scores between the discovery method group and the lecture
method group.
In fact, the mean score of subjects in the discovery method group was HIGHER
than the mean of subjects in the lecture method group. What do you do?

You REJECT the null hypothesis because earlier you had said they would be
equal.

You reject the null hypothesis in favour of the ALTERNATIVE
HYPOTHESIS (i.e. Ha).

4.2.2 Alternative Hypothesis

The Alternative Hypothesis (Ha) is the opposite of the null hypothesis. For
example, the alternative hypothesis for the study discussed earlier is that
THERE IS A DIFFERENCE in science scores between the discovery method group and
the lecture method group, represented by the following notation:

Ha: μ1 ≠ μ2
The alternative hypothesis might be that the science mean scores of the
discovery method group and the lecture method group are DIFFERENT.

Ha: μ1 > μ2
The alternative hypothesis might be that the science mean scores of the
discovery method group are HIGHER than the mean scores of the lecture method
group.

Ha: μ1 < μ2
The alternative hypothesis might be that the science mean scores of the
discovery method group are LOWER than the mean scores of the lecture method
group.
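These directional alternatives map directly onto the `alternative` argument of SciPy's independent-samples t-test (a hedged sketch with invented score data; the argument requires SciPy 1.6 or later):

```python
import numpy as np
from scipy import stats

# Invented science scores for the two teaching-method groups
rng = np.random.default_rng(4)
discovery = rng.normal(loc=43, scale=5, size=30)
lecture = rng.normal(loc=38, scale=5, size=30)

# Ha: mu1 != mu2 (two-sided alternative)
_, p_two_sided = stats.ttest_ind(discovery, lecture, alternative="two-sided")

# Ha: mu1 > mu2 (one-sided alternative)
_, p_greater = stats.ttest_ind(discovery, lecture, alternative="greater")

# When the observed t-value is positive, the one-sided p-value
# is half the two-sided one
print(p_greater < p_two_sided)
```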
SELF-CHECK 4.1
1. What is the meaning of a null hypothesis?
2. What do you mean when you "reject" the null hypothesis?
3. What is the alternative hypothesis?
4. What do you mean when you "accept" the alternative hypothesis?

4.3

TYPE I AND TYPE II ERROR

The aim of any hypothesis-testing situation is to make a decision; in
particular, you have to decide whether to reject the Null Hypothesis (Ho) in
favour of the Alternative Hypothesis (Ha). Although you would always like to
make a correct decision, there are times when you might make a wrong decision.

You can claim that the two means are not equal in the population when in fact
they are.

Or you can fail to say that there is a difference when there really is a
difference.

Statisticians have given names to these two types of errors as follows:

Type 1 Error
Claiming that two means are different when in fact they are equal. In other
words, you reject a null hypothesis when it is TRUE.

Type 2 Error
Claiming that there are no differences between two means when in fact there is
a difference. In other words, you FAIL TO REJECT a null hypothesis when it is
FALSE.
How do you remember to differentiate between the two types of errors?
Type 1 Error is the error you are likely to make when you examine your data and
say that "Something is happening here!" For example, you conclude that "There is
a difference between males and females." In fact, there is no difference between
males and females in the population.
Type 2 Error is the error you are likely to make when you examine your data and
say "Nothing is happening here!" For example, you conclude that "There is no
difference between males and females." In fact, there is a difference between
males and females in the population.
Four Possible Situations in Testing a Hypothesis

Ho: μ1 = μ2     OR     Ho: μ1 − μ2 = 0

The null hypothesis can be true or false, and you can reject or not reject the
null hypothesis. There are four possible situations which arise in testing a
hypothesis and they are summarised in Figure 4.2.
                        Ho: is TRUE          Ho: is FALSE
Do Not Reject Ho:       Correct Decision     Risk committing
[Say it is TRUE]        [no problem]         Type 2 Error

Reject Ho:              Risk committing      Correct Decision
[Say it is FALSE]       Type 1 Error         [no problem]

Figure 4.2: Four possible situations in testing a hypothesis


Based on your study:

You decide to Reject the Null Hypothesis (Ho). You have made a correct decision
if in the real world the null hypothesis is FALSE.

You decide to Reject the Null Hypothesis (Ho). You risk committing a Type 1
Error if in the real world the null hypothesis is TRUE.

You decide NOT to Reject the Null Hypothesis (Ho). You risk committing a Type 2
Error if in the real world the null hypothesis is FALSE.

You decide NOT to Reject the Null Hypothesis (Ho). You have made a correct
decision if in the real world the null hypothesis is TRUE.

In other words, when you detect a difference in the sample you are studying and
a difference also exists in the population, you are OK. When there is no
difference in the sample you are studying and there is no difference in the
population, you are also OK.
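The meaning of the 5% significance level can be made concrete with a small simulation (an illustrative sketch, not from the module): draw two samples from the same population, so that Ho is true by construction, and count how often a t-test wrongly rejects it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
alpha = 0.05
trials = 2000
type_1_errors = 0

for _ in range(trials):
    # Both samples come from the SAME population, so Ho is true;
    # every rejection here is a Type 1 error
    a = rng.normal(loc=50, scale=10, size=30)
    b = rng.normal(loc=50, scale=10, size=30)
    _, p_value = stats.ttest_ind(a, b)
    if p_value < alpha:
        type_1_errors += 1

# The observed error rate should hover near alpha = 0.05
print(type_1_errors / trials)
```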
ACTIVITY 4.3
You can use the logic of hypothesis testing in the courtroom. A student
is being tried for stealing a motorcycle. The judicial system is based on
the premise that a person is "innocent until proven guilty." It is the court
that must prove based on sufficient evidence that the student is guilty.
Thus, the null and alternative hypotheses would be:
Ho: The student is innocent
Ha: The student is guilty
1. Using the table in Figure 4.2, state the four possible outcomes of the
court's decision.
2. Interpret the Type I and Type II errors in this context.

4.4 TWO-TAILED AND ONE-TAILED TEST

In your study, you want to determine if there is a difference in spatial
thinking between males and females; i.e. the null hypothesis is Ho: μ1 = μ2.
The alternative hypothesis is Ha: μ1 ≠ μ2. A hypothesis test whose alternative
hypothesis has this form is called a TWO-TAILED TEST.

In your study, you want to determine if females are inferior in spatial
thinking compared to males; i.e. the null hypothesis is still Ho: μ1 = μ2, but
the alternative hypothesis is Ha: μ1 < μ2. A hypothesis test whose alternative
hypothesis has this form is called a LEFT-TAILED TEST.

In your study, you want to determine if females are better in spatial thinking
compared to males; i.e. the null hypothesis is still Ho: μ1 = μ2. The
alternative hypothesis is Ha: μ1 > μ2. A hypothesis test whose alternative
hypothesis has this form is called a RIGHT-TAILED TEST.

Note:
A hypothesis test is called a ONE-TAILED TEST if it is either left-tailed or
right-tailed; i.e. if it is not TWO-TAILED.

4.4.1 Two-tailed Test

EXAMPLE:
You conducted a study to determine if there is a difference in spatial thinking
between male and female adolescents. Your sample consists of 40 male and 42
female adolescents. You administer a 30-item spatial thinking test to the
sample and the results show that males scored 23.4 and females scored 24.1.

Step 1:
You want to test the following null and alternative hypotheses:

Ho: μ1 = μ2
Ha: μ1 ≠ μ2


Step 2:
Using the t-test for independent means (which we will discuss in detail in
Topic 5), you obtained a t-value of 1.50. Based on the alternative hypothesis,
you decide that you are going to use a two-tailed test.

Step 3:
If you are using an alpha (α) of .05 for a two-tailed test, you have to divide
.05 by 2 and you get 0.025 for each side of the rejection area.

Figure 4.3: Step 3

Step 4:
The df = n1 + n2 − 2 = (40 + 42) − 2 = 80. Look up the t table in Table 4.1 and
you will find that the critical value is 1.990; the graph in Figure 4.3 shows
that the Do Not Reject area ranges from −1.990 to +1.990.


Table 4.1: The t table

Table of Critical Values for Student's t-test

 df   One: 0.250  0.100  0.050  0.025  0.010  0.005
      Two: 0.500  0.200  0.100  0.050  0.020  0.010
 50        0.679  1.299  1.676  2.009  2.403  2.678
 60        0.679  1.296  1.671  2.000  2.390  2.660
 70        0.678  1.294  1.667  1.994  2.381  2.648
 80        0.678  1.292  1.664  1.990  2.374  2.639
 90        0.677  1.291  1.662  1.987  2.368  2.632

Step 5:
The t-value you obtained is 1.50 (we will discuss the formula for computing the
t-value in Topic 5). This value does not fall in the Rejection Region. What is
your conclusion? You do not reject Ho. In other words, you conclude that there
is NO SIGNIFICANT DIFFERENCE in spatial thinking between male and female
adolescents. You could also say that the test results are not statistically
significant at the 5% level and provide at most weak evidence against the null
hypothesis.

At α = 0.05, the data does not provide sufficient evidence to conclude that the
mean spatial thinking scores of males and females differ, even though the mean
score obtained by females is higher than that of males.
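The table lookup in Step 4 can be reproduced with SciPy's t distribution (an illustrative sketch; `t.ppf` returns the critical value directly, so no printed table is needed):

```python
from scipy import stats

alpha = 0.05
df = 80  # (40 + 42) - 2, read from the nearest row of the table

# Two-tailed test: split alpha across both tails
critical_value = stats.t.ppf(1 - alpha / 2, df)
print(round(critical_value, 3))  # matches the table's 1.990

t_value = 1.50  # the t-value reported in the example
# Falls inside the Do Not Reject region, so we do not reject Ho
print(abs(t_value) > critical_value)  # False
```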
ACTIVITY 4.4
1. How would you have concluded if the t-value obtained is 2.243?
2. Explain how you might commit a Type I or Type II error.

4.4.2 One-tailed Test

EXAMPLE:
You conduct a study to determine if students taught to use mind maps are better
at recalling concepts and principles in economics. A sample of 10 students was
administered a 20-item economics test before the treatment (i.e. pretest). The
same test was administered after the treatment (i.e. posttest), which lasted
six weeks.

Step 1:
The null and alternative hypotheses are:

Ho: μ1 = μ2 (The mean scores on the economics tests are the same)
Ha: μ1 > μ2 (The mean score of the posttest is greater than the mean score of
the pretest)

Step 2:
Decide on the significance level (alpha). Here, you have set it at the 5%
significance level, or alpha (α) = 0.05.

Step 3:
Computation of the test statistic. Using the dependent t-test formula, you
obtained a t-value of 4.711.

Step 4:
The critical value for the right-tailed test is tα with df = n − 1. The number
of subjects is n = 10 and α = 0.05. You check the Table of Critical Values for
Student's t-test and it reveals that for df = 10 − 1 = 9, the critical value is
1.833 (Figure 4.4).


Figure 4.4: Step 4

Step 5:
You find that the t-value obtained is 4.711. It falls in the Rejection Region.
What is your conclusion? You reject Ho. In other words, you conclude that there
is a SIGNIFICANT DIFFERENCE in performance in economics before and after the
treatment. You could also say that the test results are statistically
significant at the 5% level. Put another way, the p-value is less than the
specified significance level of 0.05. (The p-value is provided in most outputs
of statistical packages such as SPSS.)

At α = 0.05, the data provides sufficient evidence to conclude that the mean
scores on the posttest are superior to the mean scores obtained on the pretest.
Evidently, teaching students mind mapping enhances their recall of concepts and
principles in economics.
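The right-tailed critical value in Step 4 can likewise be checked with SciPy (illustrative sketch):

```python
from scipy import stats

alpha = 0.05
df = 9  # n - 1 = 10 - 1

# One-tailed (right-tailed) test: all of alpha sits in one tail
critical_value = stats.t.ppf(1 - alpha, df)
print(round(critical_value, 3))  # matches the table's 1.833

t_value = 4.711  # obtained from the dependent t-test
print(t_value > critical_value)  # True -> reject Ho
```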
ACTIVITY 4.5
A researcher conducted a study to determine the effectiveness of
immediate feedback on the recall of information in biology. The
experimental group of 30 students was provided with immediate
feedback on the questions that were asked. The control group consisted
of 30 students who were given delayed feedback on the questions asked.
1. Determine the null hypothesis for the hypothesis test.
2. Determine the alternative hypothesis for the hypothesis test.
3. Classify the hypothesis test as two-tailed, left-tailed or right-tailed.
Explain your answer.

Inferential statistics are used in making inferences from sample observations
to the relevant population.

Hypothesis testing allows us to use sample data to test a claim about a
population, such as testing whether a population proportion or population mean
equals some value.

There are two types of hypotheses: null and alternative.

Statistical inference using hypothesis testing involves procedures for testing
the significance of hypotheses using data collected from samples.

Drawing the wrong conclusion is referred to as an error of inference.

There are two types of error: Type I and Type II errors. Both relate to the
rejection or acceptance of the null hypothesis.

Type I error is committed when the researcher rejects the null when the null is
indeed true; in other words, incorrectly rejecting the null.

The probability level at which the null is incorrectly rejected is called the
significance level, denoted by the symbol α, a value set a priori (before even
conducting the research) by the researcher.

Type II error is committed when the researcher fails to reject the null when
the null is indeed false; in other words, wrongly accepting the null.

Type II error is often denoted as β.

In any research, the intention of the researcher is to correctly reject the
null; if the design is carefully selected and the samples represent the
population, the chances of achieving this objective are high. Thus, the power
of the study is defined as 1 − β.


Alternative hypothesis
Hypothesis
Inferential statistics
Null hypothesis

Power
Type I error
Type II error

Topic 5 t-Test

LEARNING OUTCOMES
By the end of this topic, you will be able to:
1. Explain what a t-test is and its use in hypothesis testing;
2. Demonstrate using the t-test for independent means;
3. Identify the assumptions for using the t-test; and
4. Demonstrate the use of the t-test for dependent means.

INTRODUCTION

This topic explains what the t-test is and its use in hypothesis testing. It also
highlights the assumptions for using the t-test. Two types of t-test are elaborated
in the topic. The first is t-test for independent means while the second is the t-test
for dependent means. Computation of the t-statistic using formulae as well as
SPSS procedures is also explained.

5.1

WHAT IS t-TEST?

The t-test was developed by a statistician, W.S. Gossett (see Figure 5.1), who
worked at a brewery in Dublin, Ireland. He published under the pen name
"Student"; hence the term "Student's t-test." The test was published in the
scientific journal Biometrika in 1908. The t-test is a statistical tool used to
infer differences between small samples based on the mean and standard
deviation.


TOPIC 5 T-TEST

Figure 5.1: W.S. Gossett (1878-1937)

In many educational studies, the researcher is interested in testing the
differences between means on some variable. The researcher is keen to determine
whether the differences observed between two samples represent a real
difference between the populations from which the samples were drawn. In other
words, did the observed difference just happen by chance when, in reality, the
two populations did not differ at all on the variable studied?
For example, a teacher wants to find out whether the Discovery Method of
teaching science to primary schoolchildren is more effective than the Lecture
Method. She conducts an experiment involving 70 primary school children of
whom 35 are taught using the Discovery method and 35 are taught using the
Lecture method. Subjects in the Discovery group score 43.0 marks, while subjects
in the Lecture method group score 38.0 marks on the science test. The Discovery
group does better than the Lecture group. Does the difference between the two
groups represent a real difference or is it due to chance? To answer this question,
the t-test is often used by researchers.

5.2

HYPOTHESIS TESTING USING t-TEST

How do we go about establishing whether the differences in the two means are
statistically significant or due to chance? You begin by formulating a hypothesis
about the difference. This hypothesis states that the two means are equal or the
difference between the two means is zero and is called the null hypothesis.
Using the null hypothesis, you begin testing the significance by saying: "There is
no difference in the score obtained in science between subjects in the Discovery
group and the Lecture group."

More commonly, the null hypothesis may be stated as follows:

Ho: μ1 = μ2  (a)     OR     Ho: μ1 − μ2 = 0  (b)

If you reject the null hypothesis, it means the difference between the two
means has statistical significance. On the other hand, if you do not reject the
null hypothesis, it means the difference between the two means is NOT
statistically significant and the difference is due to chance.
Note:
For a null hypothesis to be accepted, the difference between the two means need
not be equal to zero since sampling may account for the departure from zero.
Thus, you can accept the null hypothesis even if the difference between the two
means is not zero provided the difference is likely to be due to chance. However,
if the difference between the two means appears too large to have been brought
about by chance, you reject the null hypothesis and conclude that a real difference
exists.
ACTIVITY 5.1

1. State TWO null hypotheses in your area of interest that can be tested using
the t-test.

2. What do you mean when you reject or do not reject a null hypothesis?

5.3

t-TEST FOR INDEPENDENT MEANS

The t-test is a powerful statistical tool that enables you to determine whether
the difference obtained between two groups is statistically significant. When
two groups are independent of each other, it means the samples drawn came from
two populations. In other words, the two groups are independent, belonging to
"unpaired" and "unpooled" groups.

(a) Illustration
Say, for example, you conduct a study to determine the spatial reasoning
ability of 70 ten-year-old children in Malaysia. The sample consisted of 35
males and 35 females (see Figure 5.2). The sample of 35 males was drawn
from the population of ten-year-old males in Malaysia and the sample of 35
females was drawn from the population of ten-year-old females in Malaysia.

Note that they are independent samples because they come from two completely
different populations.

Figure 5.2: Independent Samples

Research Question:
"Is there a significant difference in spatial reasoning between male and
female ten-year-old children?"
Null Hypothesis or Ho:
"There is no significant difference in spatial reasoning between male and
female ten-year-old children."

(b) Formula for Independent t-test


Note that the formula for the t-test shown below is a ratio:

t = (X̄1 − X̄2) / SE(X̄1 − X̄2)

The top part of the equation is the difference between the two means. The
bottom part is the Standard Error (SE), which is a measure of the variability
or dispersion of the scores.

(c) Computation of the Standard Error
To compute the standard error (SE), take the variance (i.e. the standard
deviation squared) for Group 1 and divide it by the number of subjects in that
group minus 1. Do the same for Group 2. Then add these two values and take the
square root.

This is the formula for the Standard Error:

SE(X̄1 − X̄2) = √[ var1/(n1 − 1) + var2/(n2 − 1) ]

Combine the two formulas and you get this version of the t-test formula:

t = (X̄1 − X̄2) / √[ var1/(n1 − 1) + var2/(n2 − 1) ]

The results of the study are as follows:

                    Mean   SD    N    Variance
Group 1: Males      12     2.0   35   4.0
Group 2: Females    10     2.0   35   4.0

Let's try using the formula:

t = (12 - 10) / sqrt[ 4.0/(35 - 1) + 4.0/(35 - 1) ]
  = 2 / sqrt[ 0.1177 + 0.1177 ]
  = 2 / 0.485
  = 4.124

Note: The t-value will be positive if the mean for Group 1 is larger (>) than the mean of Group 2, and negative if it is smaller (<).
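The hand calculation above can be replayed in a few lines of Python (a minimal sketch for checking the arithmetic; the module itself works with SPSS, and the function name here is ours):

```python
import math

def independent_t(mean1, mean2, var1, var2, n1, n2):
    # Standard error as defined in the text: sqrt(var1/(n1-1) + var2/(n2-1))
    se = math.sqrt(var1 / (n1 - 1) + var2 / (n2 - 1))
    # The t-ratio is the difference between the means over the standard error
    return (mean1 - mean2) / se

# Worked example from the text: males (mean 12, variance 4.0, n = 35)
# versus females (mean 10, variance 4.0, n = 35)
t = independent_t(12, 10, 4.0, 4.0, 35, 35)
print(round(t, 2))  # 4.12, matching the 4.124 computed by hand
```

Swapping the two groups flips the sign of t, just as the note above says.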
(d) What do you do after computing the t-value?

Once you have computed the t-value (which is 4.124), look up the t-value in the Table of Critical Values for Student's t-test, which tells us whether the ratio is large enough to say that the difference between the groups is significant. In other words, the difference observed is not likely due to chance or sampling error.

(e) Alpha Level

As with any test of significance, you need to set the alpha level. In most educational and social research, the "rule of thumb" is to set the alpha level at .05. This means that 5% of the time (five times out of a hundred) you would find a statistically significant difference between the means even if there is none ("chance").

(f) Degrees of Freedom

The t-test also requires that we determine the degrees of freedom (df) for the test. In the independent t-test, the degrees of freedom are the sum of the subjects or persons in both groups minus 2. Given the alpha level, the df and the t-value, you look up the Table of Critical Values for Student's t-test (available as an appendix at the back of most statistics texts) to determine whether the t-value is large enough to be significant.

(g) Look up the Table of Critical Values for Student's t-test shown in Table 5.1 (Note: Only part of the table is given here.)

The df is 70 minus 2 = 68. Take the nearest df, which is 70, and read the column for the two-tailed alpha of 0.050.
The t-value you obtained is 4.124. The critical value shown is 1.994. Since the t-value is greater than the critical value of 1.994, you reject Ho and conclude that the difference between the means for the two groups is significant. In other words, males scored significantly higher than females on the spatial reasoning test.
However, you do not have to go through this tedious process, as statistical
computer programs such as SPSS provide the significance test results,
saving you from looking them up in a table.
Table 5.1: Table of Critical Values for Student's t-test

df     One-tail   0.250   0.100   0.050   0.025   0.010   0.005
       Two-tail   0.500   0.200   0.100   0.050   0.020   0.010
21                0.686   1.323   1.721   2.080   2.518   2.831
22                0.686   1.321   1.717   2.074   2.508   2.819
23                0.685   1.319   1.714   2.069   2.500   2.807
24                0.685   1.318   1.711   2.064   2.492   2.797
25                0.684   1.316   1.708   2.060   2.485   2.787
26                0.684   1.315   1.706   2.056   2.479   2.779
27                0.684   1.314   1.703   2.052   2.473   2.771
28                0.683   1.313   1.701   2.048   2.467   2.763
29                0.683   1.311   1.699   2.045   2.462   2.756
30                0.683   1.310   1.697   2.042   2.457   2.750
40                0.681   1.303   1.684   2.021   2.423   2.704
50                0.679   1.299   1.676   2.009   2.403   2.678
60                0.679   1.296   1.671   2.000   2.390   2.660
70                0.678   1.294   1.667   1.994   2.381   2.648
80                0.678   1.292   1.664   1.990   2.374   2.639
90                0.677   1.291   1.662   1.987   2.368   2.632
100               0.677   1.290   1.660   1.984   2.364   2.626
∞                 0.674   1.282   1.645   1.960   2.326   2.576
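The lookup and decision just described reduce to a simple comparison (a sketch, with names of our own choosing; 1.994 is the tabled two-tailed 0.05 critical value for df = 70):

```python
def reject_ho(t_value, critical_value):
    # Reject the null hypothesis when the t-value exceeds the critical value
    return abs(t_value) > critical_value

print(reject_ho(4.124, 1.994))  # True: reject Ho at alpha = .05, df = 70
print(reject_ho(1.200, 1.994))  # False: do not reject Ho
```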


ACTIVITY 5.2

1. Would you reject Ho if you had set the alpha at 0.01 for a two-tailed test?
2. When do you use the one-tailed test and the two-tailed t-test?

(h) Assumptions for Independent t-test


While the t-test has been described as a robust statistical tool, it is based on a
model that makes several assumptions about the data that must be met prior to
analysis. These assumptions need to be evaluated because the accuracy of your
interpretation of the data depends on whether the assumptions are violated.
The following are five main assumptions that are generic to all t-tests:
(i) Scale of Measurement
The data that you collect for the dependent variable should be based on an instrument or scale that is continuous or ordinal, for example, scores obtained from a 5-point Likert scale (1, 2, 3, 4, 5), marks obtained in a mathematics test, or scores obtained on an IQ test or an aptitude test.

(ii) Random Sampling
The sample of subjects should be randomly sampled from the population of interest.

(iii) Normality
The data come from a distribution that has one of those nice bell-shaped curves known as a normal distribution. Refer to Topic 3: The Normal Distribution, which provides both graphical and statistical methods for assessing the normality of a sample or samples.
(iv) Sample Size
Fortunately, it has been shown that if the sample size is reasonably large, quite severe departures from normality do not seem to affect the conclusions reached. Then again, what is a reasonable sample size? It has been argued that as long as you have enough people in each group (typically 30 or more cases) and the groups are close to equal in size, you can be confident that the t-test will be a good and strong tool for reaching the correct conclusions. Statisticians say that the t-test is a "robust" test. Departure from normality is most serious when sample sizes are small. As sample sizes increase, the sampling distribution of the mean approaches a normal distribution regardless of the shape of the original population.
(v) Homogeneity of Variance
It has often been suggested by some researchers that homogeneity of variance, or equality of variance, is actually more important than the assumption of normality. In other words, are the standard deviations of the two groups pretty close to equal? Most statistical software packages provide a "test of equality of variances" along with the results of the t-test, the most common being Levene's test of homogeneity of variance. Refer to Table 5.2.
Table 5.2: Levene's Test of Homogeneity of Variance

                              Levene's Test of
                              Equality of Variances   t-test for Equality of Means
                              F      Sig.    t      df      Sig.        Mean         Std. Error   95% Confidence Interval
                                                           (2-tailed)  Difference   Difference   Lower     Upper
Equal variances assumed       3.39   .080    .848   20      .047        1.00         1.18         -1.46     3.46
Unequal variances assumed                    .848   16.70   .049        1.00         1.18         -1.49     3.40

Begin by putting forward the null hypothesis that "there are no significant differences between the variances of the two groups", and set the significance level at .05.

If the Levene statistic is significant, i.e. LESS than the .05 level (p < .05), then the null hypothesis is REJECTED; you accept the alternative hypothesis and conclude that the VARIANCES ARE UNEQUAL. (The unequal variances row in the SPSS output is used.)

If the Levene statistic is not significant, i.e. MORE than the .05 level (p > .05), then you DO NOT REJECT (or you accept) the null hypothesis and conclude that the VARIANCES ARE EQUAL. (The equal variances row in the SPSS output is used.)
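The decision rule above can be sketched as a small helper function (ours, for illustration only; SPSS prints both rows and you simply read the appropriate one):

```python
def levene_decision(levene_p, alpha=0.05):
    # p < alpha: reject the null of equal variances -> variances unequal
    if levene_p < alpha:
        return "unequal variances assumed"
    # p >= alpha: do not reject -> variances treated as equal
    return "equal variances assumed"

print(levene_decision(0.030))  # 'unequal variances assumed'
print(levene_decision(0.080))  # 'equal variances assumed'
```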

The Levene test is based on deviations from the group mean, and it is more robust in the face of departures from normality than more traditional tests such as Bartlett's test.

SPSS provides two options, i.e. "homogeneity of variance assumed" and "homogeneity of variance not assumed" (see Table 5.2).
ACTIVITY 5.3
Refer to Table 5.2. Based on Levene's Test of Homogeneity of Variance, what is your conclusion? Explain.

Let's examine an EXAMPLE:

In the CoPs Project, an Inductive Reasoning scale consisting of 11 items was administered to 946 eighteen-year-old respondents. One of the research questions put forward is:

"Is there a significant difference in inductive reasoning between male and female subjects?"

To establish the statistical significance of the difference between the means of these two groups, the t-test is used. Use SPSS.

5.4 T-TEST FOR INDEPENDENT MEANS USING SPSS

SPSS PROCEDURES for independent groups t-test:


1. Select the Analyze menu.
2. Click on Compare Means and then Independent-Samples T Test... to open the Independent-Samples T Test dialogue box.
3. Select the test variable(s) [i.e. Inductive Reasoning] and then click the arrow button to move the variable into the Test Variable(s): box.
4. Select the grouping variable [i.e. gender] and click the arrow button to move the variable into the Grouping Variable box.
5. Click the Define Groups... command pushbutton to open the Define Groups sub-dialogue box.
6. In the Group 1 box, type the lowest value for the variable [i.e. 1 for 'males']. Enter the second value for the variable [i.e. 2 for 'females'] in the Group 2 box.
7. Click Continue and then OK.
Output #1:
The Group Statistics in Table 5.3 report the mean values on the variable (inductive reasoning) for the two groups (males and females). Here we see that the 495 females in the sample had a mean score of 8.99, while the 451 males had a mean score of 7.95 on inductive reasoning. The standard deviation for the males is 3.46 while that for the females is 3.14. The scores for the females are less dispersed compared to those for the males.
Table 5.3: Mean Values on the Variable (Inductive Reasoning) for the Two Different Groups (Males and Females)

Group Statistics
                      Gender   N     Mean     Std. Deviation   Std. Error Mean
INDUCTIVE REASONING   Male     451   7.9512   3.4618           .1630
                      Female   495   8.9980   3.1427           .1413


The question remains: Is this sample difference in inductive reasoning large enough to convince us that there is a real significant difference in inductive reasoning ability between the population of 18-year-old females and the population of 18-year-old males?
Output #2:
Let's examine this output in two parts.

Firstly, determine whether the data meet the "homogeneity of variance" assumption. You can use Levene's test, with alpha set at 0.05. The significance value obtained is 0.030, which is less than (<) 0.05, so you reject Ho and conclude that the variances are not equal. Hence, the "homogeneity of variance" assumption has been violated, and the Unequal Variances Assumed row of the output should be used. Refer to Figure 5.3.
                              Levene's Test of
                              Equality of Variances   t-test for Equality of Means
                              F       Sig.    t        df      Sig.        Mean         Std. Error   95% Confidence Interval
                                                              (2-tailed)   Difference   Difference   Lower      Upper
Equal variances assumed       4.720   .030    -4.875   944     .000        -1.0468      .2147        -1.4682    -.6254
Unequal variances assumed                     -4.853   911.4   .000        -1.0468      .2146        -1.4701    -.6234

Figure 5.3: Levene's Test of Equality of Variances

Secondly, examine the following:

The SPSS output in Figure 5.3 also displays the results of the t-test, which tests whether the difference between the two sample means is significantly different from zero. Remember, the null hypothesis states that there is no real difference between the means (Ho: μ1 = μ2); any observed difference just occurred by chance.
Interpretation:
t-value
This "t" value tells you how far away from 0, in terms of the number of standard errors, the observed difference between the two sample means falls. The "t" value is obtained by dividing the mean difference (-1.0468) by the std. error (.2146), which is equal to -4.878.
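That division is easy to verify (the numbers are taken from the SPSS output above):

```python
mean_difference = -1.0468  # male mean 7.9512 minus female mean 8.9980
std_error = 0.2146         # std. error of the difference from the output

t = mean_difference / std_error
print(round(t, 3))  # -4.878
```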

p-value
If the p-value shown in the "Sig. (2-tailed)" column is smaller than your chosen alpha level, you reject the null hypothesis and argue that there is a real difference between the populations. In other words, we can conclude that the observed difference between the samples is statistically significant.
Mean Difference
This is the difference between the means (labelled "Mean Difference"), i.e. 7.9512 - 8.9980 = -1.0468.

5.5 T-TEST FOR DEPENDENT MEANS

The dependent means t-test, also called the paired t-test or the repeated measures t-test, is used when you have data from only one group of subjects, i.e. each subject obtains two scores under different conditions. For example, you give a pretest and, after a particular treatment or intervention, give the same subjects a posttest. In this form of design, the same subjects obtain a score on the pretest and, after some intervention or manipulation, obtain a score on the posttest. Your objective is to determine whether the difference between the means for the two sets of scores is the same or different.
Example:

Research Questions:
- Is there a significant difference in pretest and posttest scores in social studies for subjects in the discovery method group?
- Is there a significant difference in pretest and posttest scores in social studies for subjects in the chalk and talk group?

Null Hypotheses:
- There is no significant difference between the pretest and the posttest for the discovery method group.
- There is no significant difference between the pretest and the posttest for the chalk and talk group.


Formula of the Dependent t-test

t = D̄ / ( SD_D / √N ),   where   SD_D = sqrt[ ( ΣD² - (ΣD)²/N ) / (N - 1) ]

Where:
t       = t-ratio
D̄       = average difference
ΣD²     = difference scores squared, then summed
(ΣD)²   = difference scores summed, then squared
N       = number of pairs

EXAMPLE:
A researcher conducted a study on personality changes in 15 college women from
Year 1 to Year 4. A 30-item personality test was administered in Year 1 and then
again in Year 4 to the same 15 women. The results of the study are shown in
Table 5.4.
Table 5.4: Results of the Study

Subject   Year 1 Test   Year 4 Test
Number    (Pretest)     (Posttest)     D         D²
1         21            24             +3        +9
2         18            20             +2        +4
3         13            15             +3        +9
4         10            15             +5        +25
5         22            20             -2        +4
6         15            19             +4        +14
7         17            18             +1        +1
8         24            22             -2        +4
9         25            28             +3        +9
10        20            23             +3        +9
11        21            25             +4        +16
12        19            22             +3        +9
13        17            16             -1        +1
14        20            26             +6        +36
15        16            19             +3        +9

          X̄ = 18.5      X̄ = 20.8      ΣD = 35   ΣD² = 159

Step 1:
Calculate the mean score for the Year 1 Test by adding up all the Year 1 Test scores and dividing by the number of subjects. This gives you a mean score of 18.5. Similarly, calculate the mean score for the Year 4 Test; this gives you a mean score of 20.8.
Step 2:
Next, calculate the standard deviation of the difference scores using the following formula:

SD = sqrt[ ( ΣD² - (ΣD)²/N ) / (N - 1) ]
   = sqrt[ ( 159 - 35²/15 ) / (15 - 1) ]
   = sqrt[ ( 159 - 81.67 ) / 14 ]
   = sqrt( 5.52 )
   = 2.35

Step 3:
Calculate the effect size, i.e. the mean difference divided by the standard deviation (D̄/SD), applying the t-test for dependent means formula.

The mean difference is 20.8 - 18.5 = 2.3 and the standard deviation is 2.35. Substituting these values gives 2.3 / 2.35 = 0.979.

To determine the likelihood that the effect size is a function of chance, calculate the t-ratio by multiplying the effect size by the square root of the number of pairs. In this example, t = (0.979)(√15) = (0.979)(3.87) = 3.79.
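Steps 1 to 3 can be replayed in Python from the summary values above (a sketch using the module's rounded figures, so the result matches the hand calculation):

```python
import math

mean_pre, mean_post = 18.5, 20.8  # Step 1: the two means
sd_d = 2.35                       # Step 2: SD of the difference scores
n_pairs = 15

effect_size = (mean_post - mean_pre) / sd_d  # Step 3: 2.3 / 2.35
t = effect_size * math.sqrt(n_pairs)         # t = effect size x sqrt(N)
print(round(effect_size, 3), round(t, 2))  # 0.979 3.79
```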


Table 5.5: Table of Critical Values for Student's t-test

df     One-tail   0.100   0.050   0.025   0.010   0.005
       Two-tail   0.200   0.100   0.050   0.020   0.010
10                1.372   1.812   2.228   2.764   3.169
11                1.363   1.796   2.201   2.718   3.106
12                1.356   1.782   2.179   2.681   3.055
13                1.350   1.771   2.160   2.650   3.012
14                1.345   1.761   2.145   2.624   2.977
15                1.341   1.753   2.131   2.602   2.947
16                1.337   1.746   2.120   2.583   2.921
17                1.333   1.740   2.110   2.567   2.898
18                1.330   1.734   2.101   2.552   2.878
19                1.328   1.729   2.093   2.539   2.861
20                1.325   1.725   2.086   2.528   2.845

Step 4:
Having computed the t-value (which is 3.79), you look up the t-value in the Table of Critical Values for Student's t-test (or Table of Significance), which tells us whether the ratio is large enough to say that the difference between the groups is significant. In other words, the difference observed is not likely due to chance or sampling error. Refer to Table 5.5.
Alpha Level
The researchers set the alpha level at 0.05. This means that 5% of the time (five times out of a hundred) you would find a statistically significant difference between the means even if there is none ("chance"). Dividing 0.05 by 2 gives 0.025, so the critical value is read from the 0.025 one-tail column (which is equivalent to the 0.050 two-tail column).
Degrees of Freedom
The t-test also requires that we determine the degrees of freedom (df) for the test. In the dependent t-test, the degrees of freedom are the number of pairs minus 1, which is 15 - 1 = 14. Given the alpha level, the df and the t-value, you look up the table (available as an appendix at the back of most statistics texts) to determine whether the t-value is large enough to be significant.


Step 5:
The t-value obtained is 3.79, which is greater than the critical value of 2.145. Hence, the null hypothesis [Ho] is rejected and Ha, which states that the posttest mean is greater than the pretest mean, is accepted. It can be concluded that the difference between the means is significant. In other words, there is overwhelming evidence that a "gain" has taken place on the personality inventory from Year 1 to Year 4 among the women undergraduates.

Again, you do not have to go through this tedious process, as statistical computer programs such as SPSS provide the significance test results, saving you from looking them up in a table.
Misapplication of the Formula
A common error made by some research students is the misapplication of the formula. Researchers who have dependent samples fail to recognise this fact and inappropriately apply the t-test for independent groups to test the hypothesis that μ1 = μ2. If an independent groups t-test is performed with dependent groups, the standard error will be greatly overestimated, and significant differences between the two means may be judged "non-significant" (a Type 2 error).

The opposite error, mistaking non-significant differences for significant ones (a Type 1 error), may be made if the dependent groups t-test is applied to independent groups. Thus, when using the t-test, you need to recognise and distinguish independent and dependent samples.
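A quick numerical illustration of why this matters, using made-up pretest/posttest scores (hypothetical data, ours): when the two sets of scores are correlated, the paired standard error is much smaller than the independent-groups one, so the independent formula overstates the SE and understates t.

```python
import math
from statistics import stdev

# Hypothetical scores for five subjects measured twice (illustration only)
pre  = [10, 12, 14, 16, 18]
post = [12, 13, 16, 18, 21]
n = len(pre)

# Standard error if the two columns are (wrongly) treated as independent groups
se_independent = math.sqrt(stdev(pre) ** 2 / n + stdev(post) ** 2 / n)

# Standard error based on the difference scores, as the paired t-test uses
diffs = [b - a for a, b in zip(pre, post)]
se_paired = stdev(diffs) / math.sqrt(n)

# The inflated SE shrinks t and can make a real gain look "non-significant"
print(round(se_independent, 2), round(se_paired, 2))  # 2.17 0.32
print(se_paired < se_independent)  # True
```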

5.6 T-TEST FOR DEPENDENT MEANS USING SPSS

EXAMPLE:
In a study, the researcher was keen to determine if teaching note-taking techniques improved achievement in history. A sample of 22 students was selected for the study and taught note-taking techniques for a period of four weeks. The research question put forward is:

"Is there a significant difference in performance in history before and after the treatment?", i.e. you wish to determine whether the difference between the means for the two sets of scores is the same or different.


To establish the statistical significance of the means obtained on the pretest and posttest, the repeated measures t-test (also called the dependent-samples or paired-samples t-test) was computed using SPSS. Data were collected from the same group of subjects under both conditions: each subject obtains a score on the pretest and, after the treatment (or intervention or manipulation), a score on the posttest.

Ho: μ1 = μ2     Ha: μ1 ≠ μ2

SPSS PROCEDURES for the dependent groups t-test:

1. Select the Analyze menu.
2. Click Compare Means and then Paired-Samples T Test... to open the Paired-Samples T Test dialogue box.
3. Select the test variables [i.e. the pretest and posttest History Test scores] and then click the arrow button to move them into the Paired Variables box.
4. Click Continue and then OK.
The following Table 5.6 shows the SPSS outputs.
Table 5.6: SPSS Outputs

Paired Samples Statistics
                   Mean    N    Std. Deviation   Std. Error Mean
Pair 1  Pretest    8.50    22   3.34             .71
        Posttest   13.86   22   2.75             .59

The Paired Samples Statistics table above reports the mean values on the variable (history test) for the pretest and posttest. The posttest mean (13.86) is higher than the pretest mean (8.50), indicating improved performance in the history test after the treatment. The standard deviation for the pretest is 3.34, which is fairly close to the standard deviation for the posttest, 2.75.


The question remains: Is this mean difference large enough to convince us that there is a significant difference in performance in history as a consequence of teaching note-taking techniques?
Paired Differences
                      Mean         Std.        Std. Error   95% Confidence Interval
                      Difference   Deviation   Mean         Lower     Upper          t       df   Sig. (2-tailed)
Pair 1  Pretest -
        Posttest      -5.36        2.90        .62          -6.65     -4.08          -8.65   21   .000

Figure 5.4: Paired Differences

t-Value
This "t" value tells you how far away from 0, in terms of the number of standard errors, the observed difference between the two sample means falls. The "t" value is obtained by dividing the mean difference (-5.36) by the std. error (.62), which is equal to -8.65. Refer to Figure 5.4.
p-value
The p-value shown in the "Sig. (2-tailed)" column is smaller than your chosen alpha level (0.05), so you reject the null hypothesis and argue that there is a real difference between the pretest and posttest.

In other words, we can conclude, that the observed difference between the two
means is statistically significant.
Mean Difference
This is the difference between the means 43.15 63.98 = 20.83.


ACTIVITY 5.4
t-test for Dependent Means or Groups

Case Study 1:
In a study, a researcher was interested in finding out whether attitude towards science would be enhanced when students are taught science using the Inquiry Method. A sample of 22 students was administered an attitude towards science scale before the experiment. The treatment was conducted for one semester, after which the same attitude scale was administered to the same group of students.

ATTITUDE
                  N    Mean    Std. Deviation   Std. Error Mean
Pair   Pretest    22   8.50    3.33             .71
       Posttest   22   13.86   2.75             .59

Paired Differences
                   Mean    Std.        Std. Error
                           Deviation   Mean         Lower    Upper    t       df   Sig. (2-tailed)
Pair   Pretest -
       Posttest    -5.36   2.90        .62          -6.65    -4.08    -8.66   21   .000

Answer the following questions and discuss online:

1. State a null hypothesis for the above study.
2. State an alternative hypothesis for the above study.
3. Briefly describe the 'Paired Samples Statistics' table with regards to the means and variability of scores.
4. What is the conclusion of the null hypothesis stated in (1)?
5. What is the conclusion of the alternative hypothesis stated in (2)?


ACTIVITY 5.5
t-test for Independent Means or Groups

Case Study 2:
A researcher was interested in finding out about the creative thinking skills of secondary school students. He administered a 10-item creative thinking test to a sample of 4,404 sixteen-year-old students drawn from all over Malaysia.

GENDER    N      Mean     Std. Deviation   Std. Error Mean
Male      1966   6.9410   2.2858           5.155E-02
Female    2438   6.8351   2.4862           5.035E-02

           Levene's Test for
           Equality of Variances   t-test for Equality of Means
           F        Sig.           t       df        Sig. 2-tailed   Mean Difference   Std. Error Difference
Equal      19.408   .000           1.456   4402      .145            .1059             7.271E-02
Unequal                            1.469   4327.13   .142            .1059             7.206E-02

Answer the following questions and discuss online:

1. State a null hypothesis for the above study.
2. State an alternative hypothesis for the above study.
3. Briefly describe the 'Group Statistics' table with regards to the means and variability of scores.
4. Is there evidence for homogeneity of variance? Explain.
5. What would you do if the significance level is 0.053?
6. What is the conclusion of the null hypothesis stated in (1)?
7. What is the conclusion of the alternative hypothesis stated in (2)?


- The independent t-test is used to determine whether the difference observed between two unrelated groups is statistically significant. It is a parametric test and requires the following assumptions: a normal distribution of the test parameter, and data measured as interval or ratio data.

- In order to have a meaningful interpretation, the independent t-test also requires a large sample size.

- Homogeneity of variance is another required assumption, but SPSS does offer options for determining the p-value when this requirement is not met.

- The paired t-test is used when you have before and after data from a single group of subjects. In this test, the t-statistic is computed using the mean difference rather than the difference in the means between the two groups. As such, all subjects must have both pretest and posttest data.

Critical value
Homogeneity of variance
Independent sample
p-value
Related sample (paired sample)
Significance level


Topic 6: One-Way Analysis of Variance (One-Way ANOVA)

LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Define One-way ANOVA;
2. Explain the logic of One-way ANOVA;
3. Compute One-way ANOVA using a formula;
4. Identify the assumptions for using the One-way ANOVA; and
5. Compute One-way ANOVA using SPSS and interpret the results.

INTRODUCTION
This topic explains what One-way Analysis of Variance (ANOVA) is about and the assumptions for using ANOVA in hypothesis testing. It demonstrates how ANOVA can be computed using the formula and the SPSS procedures. Also explained are the interpretation of the related statistical results and the use of post-hoc comparison tests.

6.1 WHAT IS THE ANOVA TEST?

In educational research, we are often involved in finding out whether there are differences between groups. For example, is there a difference between male and female students, or between rural and urban students, and so forth? As we discussed in Topic 5, the t-test is used to compare differences of means between two groups, such as comparing outcomes between a control and a treatment group in an experimental study. Suppose you are interested in comparing the means of three groups (i.e. k = 3) rather than two.
You might be tempted to use multiple t-tests and compare the means separately, i.e. compare the means of Groups 1 and 2, followed by Groups 1 and 3, and so forth. What is the danger of doing this? Multiple t-tests increase the likelihood of committing a Type 1 error (i.e. claiming that two means are not equal when in fact they are equal). In other words, you reject a null hypothesis when it is TRUE. On a practical level, using the t-test to compare many means is also a cumbersome process in terms of the calculations involved.
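The inflation of the Type 1 error rate is easy to quantify if the comparisons are treated as independent (a sketch; three groups require 3 pairwise t-tests, four groups require 6):

```python
# Chance of at least one Type 1 error across k comparisons, each at alpha = .05
alpha = 0.05
for k in [1, 3, 6]:
    familywise = 1 - (1 - alpha) ** k
    print(k, round(familywise, 3))
# One comparison keeps the error rate at .05, but three pairwise
# comparisons already push it to about .143
```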
Example
Let us look at the following example, which shows the results of a study on attitudes towards homework among students of varying ability levels. Subjects were divided into three groups: high ability, average ability and low ability. The total sample size is 505 students. To analyse such data, you need a special class of statistical techniques called the One-way Analysis of Variance, or One-way ANOVA, which we will discuss here.
Table 6.1: Attitudes toward Homework among 14-Year-Old Students

Group             N     Mean    Std. Deviation   Std. Error   95 Pct Conf. Int for Mean
High ability      220   13.03   3.17             0.12         12.79 to 13.27
Average ability   212   11.99   2.93             0.11         11.77 to 12.21
Low ability       73    9.54    3.50             0.40         8.73 to 10.36

Interpretation of Table 6.1

- What do the three means tell you? High ability students have the highest mean (13.03), while low ability students have the lowest mean (9.54). Meanwhile, average ability students fall in the middle, with a mean of 11.99.

- What do the three standard deviations tell you? Note that the standard deviations for high ability (3.17) and average ability (2.93) students are fairly close, while low ability students have a somewhat bigger standard deviation of 3.50.

- What do the three standard errors tell you? Refer to Table 6.1 and you will notice a column called 'standard error'. What is the standard error? It is a measure of how much the sample means would vary if you were to take repeated samples from the same population. The first two groups contain more than 200 students each, so the standard error of the mean for each of these groups is fairly small: 0.12 for high ability students and 0.11 for average ability students. However, the standard error for the low ability group is comparatively high at 0.40. Why? The smaller number of low ability students (n = 73) and the larger standard deviation explain why the standard error is larger.

- What does "95 Pct Conf. Int for Mean" mean? The last column displays the confidence interval. What is the confidence interval? It is the range which is likely to contain the true population value or mean. If you take repeated samples of 14-year-old students from the same population of 14-year-old students in the country and calculate their means, 95% of the resulting intervals should include the unknown population value or mean. For example, you can be 95% confident that, in the population, the mean of high ability students is somewhere between 12.79 and 13.27. Similarly, you can be 95% confident that, in the population, the mean of low ability students is somewhere between 8.73 and 10.36.

- You will notice that the confidence interval is wider for low ability students (i.e. 1.63) than for high ability students (i.e. 0.48). Why? This is due to the larger standard error (0.40) obtained by low ability students. Since the confidence interval depends on the standard error of the mean, the confidence interval for low ability students is wider than for high ability students. So, the larger the standard error, the wider the confidence interval. Makes sense, right?
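The standard error and confidence interval for the low ability group can be reproduced from the table's mean, SD and n (a sketch; small rounding differences from the printed table are expected):

```python
import math

# Low ability group from Table 6.1
mean, sd, n = 9.54, 3.50, 73

se = sd / math.sqrt(n)        # standard error of the mean
lower = mean - 1.96 * se      # 95% confidence interval bounds
upper = mean + 1.96 * se
print(round(se, 2), round(lower, 2), round(upper, 2))  # 0.41 8.74 10.34
```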

At the heart of ANOVA is the concept of variance. What is variance? Most of you would say it is the standard deviation squared! Yes, that is correct. The focus is on two types of variance:

- Between-Group Variance, i.e. if there are three groups, it is the variance between the three groups.
- Within-Group Variance, i.e. if in each group there are 30 subjects, it is the variance of the scores among the subjects within that group.

The F-value is a ratio of the Between-Group Variance and the Within-Group Variance.

If the F-value is significant, it tells us that the population means are probably not all equal, and you reject the null hypothesis. Next, you have to locate where the significance lies, or which of the means are significantly different. You have to use post-hoc analysis to determine this.
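The two variances and their ratio can be computed from scratch for a toy data set (hypothetical numbers, ours; a sketch of the arithmetic behind the F-value):

```python
def one_way_f(groups):
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    k, n_total = len(groups), len(all_scores)

    # Between-group sum of squares: group means versus the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: each score versus its own group mean
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

    ms_between = ss_between / (k - 1)      # between-group variance
    ms_within = ss_within / (n_total - k)  # within-group variance
    return ms_between / ms_within

# Three toy groups whose means pull apart
f = one_way_f([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
print(round(f, 2))  # 7.0
```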


ACTIVITY 6.1
1. What is the standard error? Why does the standard error vary?
2. Explain "95 Pct Conf. Int for Mean".

6.2 LOGIC OF THE ONE-WAY ANOVA

A researcher was interested in finding out whether there are differences in creative thinking among 12-year-old students from different socio-economic backgrounds. Creative thinking was measured using the Torrance Test of Creative Thinking consisting of five items, while socio-economic status (SES) was measured using household income. SES was divided into three groups (high, middle and low). The null hypothesis generated is that all three groups will have the same mean score on the creativity test. In formula terms, if we use the symbol μ (pronounced "mew") to represent the average score, the null hypothesis is expressed through the following notation:

The null hypothesis is represented in Figure 6.1 as follows:

Ho: μ1 = μ2 = μ3

Figure 6.1: Null hypothesis

The null hypothesis states that the means of the high, middle and low SES groups are the same, i.e. each is equal to, say, 4.00.

To test the null hypothesis, the One-way Analysis of Variance is used. The One-way ANOVA is a statistical technique used to test the null hypothesis that several population means are equal. The word 'variance' is used because it examines the variability in the sample. In other words, how much do the scores of individual students vary from the mean? Based on the variability or variance, it determines whether there is reason to believe that the population means are not equal. In our example, does creativity vary between the three groups of 12-year-old students?


The alternative hypothesis is represented in Figure 6.2 as follows:

Ha: The mean of at least one group is different from the others

Figure 6.2: Alternative hypothesis

The alternative hypothesis states that there is a difference between the three groups
of students (see Figure 6.2). However, the alternative hypothesis does not state
which groups differ from one another. It just says that the means of each group are
not all the same; or at least one of the groups differs from the others.
Are the means really different? We need to figure out whether the observed
differences in the sample means can be attributed to just the natural
variability among sample means, or whether there is reason to believe that
the three groups of students have different means in the population. In
other words, are the differences due to chance, or is there a 'real'
difference?

6.3 BETWEEN-GROUP AND WITHIN-GROUP VARIANCE

As mentioned earlier, the researcher was interested in determining whether
there were differences in creativity between students from different
socio-economic backgrounds; i.e. High SES, Middle SES and Low SES. To
determine if there are significant differences between the three means,
you have to compute the F-ratio or F-test. To compute the F-ratio you have
to use two types of variances:

Between-Group Variance, or the variability between group means.

Within-Group Variance, or the variability of the observations (or scores)
within a group (around that group's mean).

(a) Between-Group Variance
The diagram in the previous Figure 6.2 presents the results of the study. Let
us look more closely at the two types of variability or variance. Note that
each of the three groups has a mean which is also known as the
sample mean.

The high SES group has a mean of 4.12 for the creativity test

The middle SES group has a mean of 4.37 for the creativity test

The low SES has a mean of 3.99 for the creativity test

(b) Within-Group Variance
Within-group variance or variability is a measure of how much the
observations or scores within a group vary. It is simply the variance of the
observations or scores within a group or sample, and it is used to estimate the
variance within a group in the population. Remember, ANOVA requires the
assumption that all of the groups have the same variance in the population.
Since you do not know if all of the groups have the same mean, you cannot
just calculate the variance for all of the cases together. You must calculate
the variance for each of the groups individually and then combine these into
an "average" variance.
Within-group variance for the example shows that the 313 students within
the high SES group have different scores, the 297 students within the middle
SES group have different scores and the 340 students within the low SES
also have different scores. Among the three groups, there is slightly greater
variability or variance among Low SES subjects (SD = 1.31) compared to
High SES subjects with a SD of 1.28.
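The "average" variance mentioned above is the pooled within-group variance. As a sketch, it can be computed directly from the group sizes and standard deviations reported for this example:

```python
# Pooled ("average") within-group variance, combining the three group
# variances weighted by their degrees of freedom (n - 1).
sds = [1.28, 1.30, 1.31]   # High, Middle, Low SES standard deviations
ns  = [313, 297, 340]      # group sizes

pooled_var = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds)) / (sum(ns) - len(ns))
print(round(pooled_var, 2))  # about 1.68
```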

6.4 COMPUTING THE F-STATISTIC

The F-test or the F-ratio is a measure of how different the means are relative to the
variability or variance within each sample. The larger the F value, the greater the
likelihood that the differences between means are due to something other than
chance alone; i.e. real effects or the means are significantly different from one
another.


The following is the summarised formula for computing the F-statistic or
F-ratio:

F = Between Mean Squares / Within Mean Squares

Based on the study (see Table 6.2 for results) about the relationship between
creativity and socio-economic status of the subject, computation of the F-statistics
is as follows:
Table 6.2: Results

            High SES   Middle SES   Low SES
  Mean  =   4.12       4.37         3.99
  SD    =   1.28       1.30         1.31
  n     =   313        297          340

Steps for Computing the F-Statistic or F-Ratio:


Step 1: Computation of the Between Sum of Squares (BSS)
The first step is to calculate the variation between groups by comparing the mean
of each SES group with the Mean of the Overall Sample (the mean score on the
test for all students in this sample is 4.00).
BSS = n1(x̄1 - x̄)² + n2(x̄2 - x̄)² + n3(x̄3 - x̄)²
This measure of between group variance is referred to as "Between Sum of
Squares" (or BSS). This is calculated by adding up (for all the three groups), the
difference between the group's mean and the overall population mean (4.00),
multiplied by the number of cases (i.e. n) in each group.
Between Sum of Squares (BSS)
= No. of students in Group 1 x (Mean of Group 1 - Overall Mean)²
+ No. of students in Group 2 x (Mean of Group 2 - Overall Mean)²
+ No. of students in Group 3 x (Mean of Group 3 - Overall Mean)²
= 313 (4.12 - 4.00)² + 297 (4.37 - 4.00)² + 340 (3.99 - 4.00)²
= 4.51 + 40.66 + 0.034 = 45.21


Degrees of freedom:
This sum of squares has a number of degrees of freedom equal to the number
of groups minus 1. In this case, df = (3-1) = 2
Step 2: Computation of the Between Mean Squares (BMS)
Between Mean Squares (BMS) = BSS / df = 45.21 / 2 = 22.61

Divide the BSS figure (45.21) by the number of degrees of freedom (2) to get
our estimate of the variation between groups, referred to as "Between Mean
Squares".
Step 3: Computation of the Within Sum of Squares (WSS)
To measure the variation within groups, we find the sum of the squared
deviation between scores on the Torrance Creative Test and the group
average, calculating separate measures for each group, and then summing the
group values. This is a sum referred to as the "Within Sum of Squares" (or
WSS).

WSS = (n1 - 1)SD1² + (n2 - 1)SD2² + (n3 - 1)SD3²

Within Sum of Squares (WSS)
= (No. of students in Group 1 - 1) x SD1²
+ (No. of students in Group 2 - 1) x SD2²
+ (No. of students in Group 3 - 1) x SD3²
= (313 - 1) 1.28² + (297 - 1) 1.30² + (340 - 1) 1.31²
= 511.18 + 500.24 + 581.76
= 1593.18
Degrees of freedom:
As in Step 1, we need to adjust the WSS to transform it into an estimate of
population variance, an adjustment that involves a value for the number of
degrees of freedom within. To calculate this, we take a value equal to the
number of cases in the total sample (N = 950), minus the number of groups
(k = 3), i.e. 950 - 3 = 947
Step 4: Computation of the Within Mean Squares (WMS)
Divide the WSS figure (1593.18) by the degrees of freedom (N - k = 947) to
get an estimate of the variation within groups referred to as "Within Mean
Squares".


Within Mean Squares (WMS) = WSS / df = 1593.18 / 947 = 1.68

Step 5: Computation of the F-test statistic


This calculation is relatively straightforward. Simply divide the Between
Mean Squares (BMS), the value obtained in Step 2, by the Within Mean
Squares (WMS), the value calculated in Step 4.

F = Between Mean Squares / Within Mean Squares = 22.61 / 1.68 = 13.46
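The five steps above can be sketched in Python from the summary statistics in Table 6.2 alone. Note that the value of 13.46 comes from rounding BMS and WMS to two decimal places first; unrounded arithmetic gives roughly 13.43.

```python
# Steps 1-5 of the F-ratio computation, using only the group summaries.
means = [4.12, 4.37, 3.99]   # High, Middle, Low SES group means
sds   = [1.28, 1.30, 1.31]   # group standard deviations
ns    = [313, 297, 340]      # group sizes
grand_mean = 4.00            # overall sample mean given in the text
k, N = len(means), sum(ns)

# Step 1: Between Sum of Squares
bss = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
# Step 2: Between Mean Squares, df = k - 1
bms = bss / (k - 1)
# Step 3: Within Sum of Squares
wss = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
# Step 4: Within Mean Squares, df = N - k
wms = wss / (N - k)
# Step 5: the F-ratio
f_ratio = bms / wms
```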

Step 6: To Reject or Not Reject the Hypothesis


To determine if the F-statistic is sufficiently large to reject the null
hypothesis, you have to determine the critical value for the F-statistic
by referring to the F-distribution. There are two degrees of freedom:

(a) k - 1, the numerator [i.e. three groups minus one = 3 - 1 = 2]

(b) N - k, the denominator [i.e. number of subjects minus number of
    groups = 950 - 3 = 947]

The critical value is 3.070, read at 2 df by 120 df (the distribution
table provided in most textbooks has a maximum of 120 df; you use it for
any denominator exceeding 120 df).
Extract from Table of Critical Values for the F-Distribution (α = 0.05)

  df2     df1 = 1   df1 = 2   df1 = 3   df1 = 4
   96     3.940     3.091     2.699     2.466
   97     3.939     3.090     2.698     2.465
   98     3.938     3.089     2.697     2.465
   99     3.937     3.088     2.696     2.464
  100     3.936     3.087     2.696     2.463
  120     3.920     3.070     2.680     2.450


Finally, compare the F-statistic (13.46) with the critical value 3.07. At
p = 0.05, the F-statistic is larger (>) than the critical value, and hence
there is strong evidence to reject the null hypothesis, indicating that
there is a significant difference in creativity among the three groups of
students. While the F-statistic assesses the null hypothesis of equal
means, it does not address the question of which means are different. For
example, all three groups may differ significantly, or two may be equal
but differ from the third. To establish which of the three groups are
different, you have to follow up with post-hoc comparisons or tests.
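If SciPy is available, the critical value can also be obtained directly instead of from a printed table; with the exact denominator df of 947 it comes out slightly below the tabled 3.07 for 120 df.

```python
from scipy.stats import f

alpha = 0.05
df_between, df_within = 2, 947   # k - 1 and N - k

# Critical F value: the point with alpha probability in the upper tail
critical_f = f.ppf(1 - alpha, df_between, df_within)

# Reject H0 when the computed F-statistic exceeds the critical value
reject_null = 13.46 > critical_f
```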
Step 7: Post-Hoc Comparisons or Tests
There are many techniques available for post-hoc comparisons, including:

  Least Square Difference (LSD)
  Duncan
  Dunnett
  Tukey's Honestly Significant Difference (HSD)
  Scheffe

Tukey's HSD

            Mean 1    Mean 2    Mean 3
  Mean 1      -
  Mean 2                -
  Mean 3      *                   -

Tukey's HSD runs a series of post-hoc tests, which are like a series of
t-tests. However, the post-hoc tests are more stringent than regular
t-tests. The HSD value indicates how large an observed difference must be
for the multiple comparison procedure to call it significant: any
absolute difference between means has to exceed the value of HSD to be
statistically significant. Most statistical programmes will give you an
output in the form of a table as shown above. Group means are listed as a
matrix, and an asterisk (*) indicates which pairs of means are
significantly different.


Note that only the mean of Group 3 is significantly different from Group 1.
In other words, High SES (Mean = 4.12) subjects scored significantly higher
on creativity than Low SES (Mean = 3.99) subjects. There was no significant
difference between High SES and Middle SES subjects, nor was there a
significant difference between Middle SES and Low SES subjects.

6.5 ASSUMPTIONS FOR USING ONE-WAY ANOVA


Just like all statistical tools, there are certain assumptions that have to be met for
their usage. The following are several assumptions for using the One-way
ANOVA:
(a) Independent Observations or Subjects
Are the observations in each of the groups independent? This means that
the data must be independent. In other words, a particular subject should
belong to only one group. If there are three groups, they should be made
up of separate individuals so that the data are truly independent.

If the same subject belongs to the same group and is tested twice, such
as in a pretest and posttest design, you should instead use the Repeated
Measures One-way ANOVA (see Topic 7).
(b) Simple Random Samples
The samples taken from the population under consideration are randomly
selected (refer to Topic 1 for random selection techniques).

(c) Normal Populations
For each population, the variable under consideration is normally distributed
(Refer to Topic 2 for techniques to determine normality of distribution). In
other words, to use the One-way ANOVA you have to ensure that the
distributions for each of the groups are normal. The analysis of variance is
robust if each of the distributions is symmetric or if all the distributions are
skewed in the same direction. This assumption can be tested by running
several normality tests as stated next:

(i) Normality Tests Using Skewness
Refer to Table 6.3, which shows the means, skewness and kurtosis for the
three groups. The skewness and kurtosis scores indicate that the scores
in Group 2 and Group 3 are normally distributed, while there is some
positive skewness in Group 1.


Table 6.3: Means, Skewness and Kurtosis for the Three Groups

  Group     Statistic    Value     Std. Error
  Group 1   Mean         43.82     2.20
            Skewness      .973      .491
            Kurtosis      .341      .953
  Group 2   Mean         60.14     2.71
            Skewness     -.235      .597
            Kurtosis    -1.066     1.154
  Group 3   Mean         64.75     3.61
            Skewness     -.407      .564
            Kurtosis    -1.289     1.091

(ii) Normality Tests Using Kolmogorov-Smirnov Statistic

Figure 6.3: Test of normality

The Shapiro-Wilk normality tests indicate that the scores are normally
distributed in each of the three conditions. The Kolmogorov-Smirnov
statistic is significant for Group 1, but that statistic is more
appropriate for larger sample sizes. Refer to Figure 6.3.
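The same kind of normality check can be run outside SPSS. The sketch below applies SciPy's Shapiro-Wilk test to a small made-up set of scores (not the module's data) purely to show the mechanics:

```python
from scipy.stats import shapiro

# Hypothetical scores for one group, for illustration only
group_scores = [43, 41, 48, 45, 39, 44, 46, 42, 47, 40]

stat, p_value = shapiro(group_scores)
# p > 0.05: no evidence against normality, so the assumption is retained
normal_enough = p_value > 0.05
```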

(d) Homogeneity of Variance

Figure 6.4: Test of homogeneity of variances

Just like the t-test, Levene's test of homogeneity of variance is used
for the One-way ANOVA and is shown in Figure 6.4. The p-value of 0.113 is
greater than the alpha of 0.05. Hence, it can be concluded that the
variances are homogeneous, which is reported as Levene (2, 49) = 2.28,
p = .113.
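Levene's test can likewise be reproduced with SciPy. The three groups below are illustrative stand-ins, not the study's data:

```python
from scipy.stats import levene

# Hypothetical scores for three groups with similar spread (illustration)
group1 = [4, 5, 3, 6, 4, 5, 4, 6]
group2 = [5, 4, 6, 5, 3, 5, 6, 4]
group3 = [3, 5, 4, 6, 5, 4, 3, 6]

stat, p_value = levene(group1, group2, group3)
# p > 0.05 means the equal-variance assumption is not violated
variances_equal = p_value > 0.05
```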
ACTIVITY 6.2

1. When would you use One-way ANOVA instead of the t-test to compare
   means?

2. What are the assumptions that must be met when using ANOVA?

6.6 USING SPSS TO COMPUTE ONE-WAY ANOVA


In the COPs study in 2006, a team of researchers administered an Inductive
Reasoning Test to a sample of 946 18-year-old Malaysians. One of the
independent variables examined was socio-economic status (SES). There were
four SES groups: Very High SES, High SES, Middle SES and Low SES.
Researchers were interested in answering the following research question:
Is there a significant difference in inductive reasoning ability between adolescents
of different socio-economic status?
Null Hypothesis:        Ho: μ1 = μ2 = μ3 = μ4

Alternative Hypothesis: Ha: The mean of at least one group is different
from the others


Procedure for the One-way ANOVA with Post-Hoc Analysis Using SPSS

1.  Select the Analyze menu.
2.  Click Compare Means and One-Way ANOVA... to open the One-Way ANOVA
    dialogue box.
3.  Select the dependent variable (i.e. inductive reasoning) and click
    the arrow button to move the variable into the Dependent List box.
4.  Select the independent variable (i.e. SES) and click the arrow button
    to move the variable into the Factor box.
5.  Click the Options... command push button to open the One-Way ANOVA:
    Options sub-dialogue box.
6.  Click the check boxes for Descriptive and Homogeneity-of-variance.
7.  Click Continue.
8.  Click the Post Hoc... command push button to open the One-Way ANOVA:
    Post Hoc Multiple Comparisons sub-dialogue box. You will notice that
    a number of multiple comparison options are available. In this
    example you will use the Tukey's HSD multiple comparison test.
9.  Click the check box for Tukey.
10. Click Continue and then OK.

(a) Testing for Homogeneity of Variance

Before you conduct the One-way ANOVA, you have to make sure that your
data meet the relevant assumptions of using One-way ANOVA. Let's first
look at the test of homogeneity of variances, since satisfying this
assumption is necessary for interpreting ANOVA results.
Levene's test for homogeneity of variances assesses whether the population
variances for the groups are significantly different from each other. The null
hypothesis states that the population variances are equal.
The following Figure 6.5 shows the SPSS output for Levene's test. Note
that the Levene F-statistic has a value of 0.383 and a p-value of 0.765.
Since p is greater than α = 0.05 (i.e. 0.765 > 0.05), we do not reject
the null hypothesis. Hence, we can conclude that the data do not violate
the homogeneity-of-variance assumption.


Figure 6.5: SPSS output for the Levene's Test

(b) Means and Standard Deviations

Another SPSS output is the "Descriptives" table which presents the means
and standard deviations of each group (see Figure 6.6). You will notice that
the means are not all the same. However, this relatively simple conclusion
actually raises more questions. See if you can answer these questions in
Figure 6.6.

Figure 6.6: "Descriptives" Table

As you may have realised, just by looking at the Descriptives table, the
group means cannot tell us decisively if significant differences exist. What is
the next step?
(c) Significant Differences

Having concluded that the assumption of homogeneity of variance has been
met, and having computed the means and standard deviations of each of the
four groups, the next step is to determine whether SES influences
inductive reasoning. You are seeking to establish whether the four means
are 'equal'. Look at Figure 6.7.


Figure 6.7: Significant differences

What does the table in Figure 6.7 indicate?

The Between Groups row shows that the df is 3 (i.e. k - 1 = 4 - 1 = 3)
and the mean square is 33.445.

The Within Groups row shows that the df is 942 (i.e. N - k = 946 - 4 =
942) and the mean square is 11.072.

If you divide 33.445 by 11.072 you will get the F value of 3.021, which
is significant at 0.029.

Since 0.029 is less than α = 0.05, we can reject the null hypothesis and
accept the alternative hypothesis. You can conclude that there is a
significant difference in inductive reasoning between the four SES
groups. But which group?
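The arithmetic in Figure 6.7 can be checked by hand; assuming SciPy, this sketch recovers the F value and its p-value from the two mean squares reported in the table:

```python
from scipy.stats import f

between_ms, within_ms = 33.445, 11.072   # mean squares from Figure 6.7
df_between, df_within = 3, 942

f_value = between_ms / within_ms                 # about 3.021
p_value = f.sf(f_value, df_between, df_within)   # upper-tail probability
```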

(d) Multiple Comparisons

Having obtained a significant result, you can go further and determine,
using a post-hoc test, where the significance lies. There are many
different kinds of post-hoc tests that examine which means are different
from each other. One commonly used procedure is Tukey's HSD test. The
Tukey test compares all pairs of group means and the results are shown in
the "Multiple Comparisons" table in Figure 6.8 (Dependent Variable:
Inductive Reasoning Ability; Tukey HSD).


Figure 6.8: Multiple comparisons table

Note that each mean is compared with every other mean, so each comparison
appears twice in the table. Interpreting the table reveals that:

There is a significant difference only between Low SES subjects (Mean =
8.01) and Very High SES subjects (Mean = 8.49) at p = 0.047; i.e. Very
High SES scored significantly higher than Low SES.

However, there are no significant differences between the other groups.


ACTIVITY 6.3

A study was conducted to determine the effectiveness of the collaborative
method in teaching primary school mathematics among pupils of varying
ability levels. The performance of 18 pupils on a mathematics posttest is
presented in Table 6.4 below.
Table 6.4: The Performance of 18 Pupils on a Mathematics Posttest

  Low Ability Pupils   Middle Ability Pupils   High Ability Pupils
          45                    55                      59
          58                    42                      54
          61                    41                      62
          59                    48                      57
          49                    36                      48
          63                    44                      65

Based on the output, answer the following questions:


(a) Comment on the mean and standard deviation for the three groups.
(b) Is there a significant difference in the mathematics performance
between pupils of different ability levels?
(c) What is the p-value?
(d) What is the F-ratio or F-value?
(e) Interpret the Tukey HSD.
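One way to generate the ANOVA output this activity asks about, outside SPSS, is SciPy's one-way ANOVA; the lists below are the three ability columns of Table 6.4, taking the flattened values row by row.

```python
from scipy.stats import f_oneway

# The three ability columns of Table 6.4
low    = [45, 58, 61, 59, 49, 63]
middle = [55, 42, 41, 48, 36, 44]
high   = [59, 54, 62, 57, 48, 65]

f_value, p_value = f_oneway(low, middle, high)
```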


ACTIVITY 6.4

A researcher conducted a study to assess the level of knowledge possessed
by university students of their rights and responsibilities as citizens.
Students completed a standardised test, and each student's major was also
recorded. Scores (in percentages) were recorded for 32 students. Compute
the One-way ANOVA test for the data provided in Table 6.5:
Table 6.5: Data

  Education   Business/Management   Social Science   Computer Science
     62               42                  80                81
     49               52                  57                75
     63               31                  87                58
     68               80                  64                67
     39               22                  28                48
     79               71                  29                26
     40               68                  62                36
     15               76                  45

Based on the output, answer the following questions:

1. What is your computed answer?
2. What would be the null hypothesis in this study?
3. What would be the alternate hypothesis?
4. What probability level did you choose and why?
5. What were your degrees of freedom?
6. Is there a significant difference between the four testing conditions?
   Interpret your answer.
7. If you have made an error, would it be a Type I or a Type II error?
   Explain your answer.


The one-way ANOVA is used to compare the differences between more than
two groups of samples from unrelated populations.

Even though ANOVA is used to compare the mean, this test uses the variance
in computing the test statistics.

This test requires large samples. Other assumptions needed are normal
distribution of the population, variables measured at least at the
interval level, and equality of variance between the groups.
Test statistic: F = Between Mean Squares / Within Mean Squares

Between group variances are due to the differences between the groups (could
be due to different treatment etc.), while within group variances are due to
sampling (the differences among the members of the same group).

Technically, for any comparison between groups, the between-group
variance should be large simply because they are different groups, while
within the group itself the variance should be low (assuming the members
are homogeneous).

The F-statistic is based on the premise that if different treatments have
different effects (or different groups respond differently due to their
inherent differences), the between-group variance is large while the
within-group variance (also called the residual variance) is low. If
there is any difference between the groups, the F-value will be high,
causing the null hypothesis to be rejected.

Analysis of variance
F-test
Between group variance
Within group variance

Sum of squares
Between mean squares
Within mean squares
Post-hoc comparisons


Topic 7: Analysis of Covariance (ANCOVA)

LEARNING OUTCOMES
By the end of this topic, you should be able to:
1. Define Analysis of Covariance (ANCOVA);
2. Explain the logic of ANCOVA;
3. Identify the assumptions for using ANCOVA;
4. Compute ANCOVA using SPSS; and
5. Interpret ANCOVA using SPSS.

INTRODUCTION

This topic explains what analysis of covariance (ANCOVA) is about and the
assumptions for using it in hypothesis testing. It also demonstrates how to
compute and interpret ANCOVA using SPSS.

7.1 WHAT IS ANALYSIS OF COVARIANCE (ANCOVA)?

The Analysis of Covariance, or ANCOVA, is a powerful statistical
procedure that is used in educational research to remove the effects of
pre-existing individual differences among subjects in a study. Due to
sampling error, the two (or more) groups that you are comparing may not
start on an equal footing with respect to one or more factors. Examples
of such factors are relevant prior knowledge, motivation,
self-regulation, self-efficacy and intelligence.

For instance, when a researcher wants to compare the effectiveness of two
teaching methods, he is most concerned about the relevant prior knowledge
that students bring with them before the experiment begins. For example,
it could happen purely by chance in the sorting process that students in
either the lecture method or discussion method group start out with more
prior knowledge about the subject or content that they are studying.
Besides prior knowledge, other factors that could complicate the situation include
level of intelligence, attitude, motivation and self-efficacy. The Analysis of
Covariance (ANCOVA) provides a way of measuring and removing the effects of
such initial systematic differences between groups or samples.
EXAMPLE:
A researcher conducted a study with the aim of comparing the effectiveness of the
lecture method and the discussion method in teaching geography (see
Figure 7.1). One group received instruction using the lecture method and another
group received instruction using the discussion method.
For illustration purposes, only four students were randomly assigned to the two
groups (in real-life research, you will certainly have more subjects). The result is
two sets of bivariate measures, one set for each group.

Figure 7.1: Lecture method group versus Discussion method group


The data in Figure 7.2 explains the following features of covariance:

Firstly, there is a considerable range of individual differences within
each group for both attitude scores and geography scores. For example,
student #1 obtained 66, while student #2 obtained 85 for the geography
test in the lecture method group, and so forth.
Secondly, there is a strong positive correlation between attitude and
geography scores for both the groups (i.e. 0.95 for the lecture method
group and 0.92 for the discussion method group). In other words, it is not
surprising that the more positive the attitude towards geography, the more
likely it is that a student does well in the subject regardless of the method of
instruction.

The high correlation also means that a large portion of the variance
found in the geography test is actually contributed by the covariable or
covariate 'Attitude' and would show up as error variance.

What should you do? You should remove the covariance from the geography
test, thereby removing a substantial portion of the extraneous variance
of individual differences; i.e. you want to "subtract out" or "remove"
the effect of the Attitude scores, and you will be left with the
"residual" (what is left over). When you subtract, you reduce the
variability or variance of the geography scores while maintaining the
group difference.

Put another way, you use ANCOVA to "reduce noise" to produce a more
efficient and powerful estimate of the treatment effect. In other words,
you adjust geography scores for variability on the covariate (attitude
scores).

As a rule, you should select a covariable or covariate (in this case, it
is 'attitude') that is highly correlated to the dependent or outcome
variable (i.e. geography scores).

If you have two or more covariables or covariates, make sure that among
themselves there is little intercorrelation (otherwise you are introducing
redundant covariates and end up losing precision). For example, you surely
would not want to use both family income and father's occupation as
covariates because it is likely that they are both highly correlated.


Figure 7.2: Data explaining features of covariance

7.2 ASSUMPTIONS FOR USING ANCOVA

There are a number of assumptions that underlie the analysis of covariance. Most
of the assumptions apply to One-way ANOVA, with the addition of two more
assumptions. As stated by Coakes and Steed (2000), the assumptions are:
(a) Normality: The dependent variable should have a normal distribution
    for participants with the same score on the covariate and in the same
    group. You want to obtain normality at each score on the covariate.
    If the scores for the covariate alone are normally distributed, then
    ANCOVA is robust to this assumption.

(b) Linearity: A linear relationship should exist between the dependent
    variable and covariate for each group. This can be verified by
    inspecting scatter plots for each group. If you have more than one
    covariate, they should not be substantially correlated with each
    other. If they are, they do not add significantly to the reduction of
    error.

(c) Independence: Each subject's scores on the dependent variable and the
    covariate should be independent of those scores for all the other
    subjects.

(d) Homogeneity of Variance: Like ANOVA, ANCOVA assumes homogeneity of
    variance. In other words, the variance of Group 1 is equal to the
    variance of Group 2 and so on.

(e) Homogeneity of Regression: ANCOVA assumes homogeneity of regression
    exists. That is, the correlation between the dependent variable and
    the covariate in each group should be the same. In other words, the
    regression lines (or slopes) of each plot should be similar (i.e.
    parallel) across groups. The hypothesis tested is that the slopes do
    not differ from each other.

(f) Reliability of the Covariate: The instrument used to measure the
    covariate should be reliable. In the case of variables such as gender
    and age, this assumption can usually be easily met. However, with
    other types of variables such as self-efficacy, attitudes,
    personality, etc., meeting this assumption can be more difficult.

What is the Homogeneity of Regression Assumption?

One of the assumptions for using ANCOVA is homogeneity of regression,
which means that the slopes of the regression lines should be parallel
for the groups studied. Imagine a case with three groups of people where
we wish to test the hypothesis that the higher the Qualification, the
higher the Knowledge of Current Events. It may be the general belief that
knowledge of current events is associated with qualification level. Can
you think of other variables that might be related to the dependent
variable (Knowledge of Current Events)? We will select one covariate,
i.e. Age. Assume that age is positively related to Knowledge of Current
Events.

Figure 7.3: Regression lines for the three groups

Look at the graph in Figure 7.3, which shows regression lines for each group
separately. Look to see how each group differs on mean age. The
Graduates, for instance, have a mean age of 38 and a score of 14 on
knowledge of current events, while the Diploma holders have a mean age of
45 and a score of 12.5. The mean age for the subjects with High school
qualifications is 50 and their score on the knowledge of current events
test is 11.5. What does
this tell you? It is probably obvious to you that part of the differences in
knowledge of current events is due to the groups having a different mean age.
So you decide to include Age as a covariate and use ANCOVA.

(a) ANCOVA reduces the error variance by removing the variance due to the
    relationship between age (covariate) and the dependent variable
    (knowledge of current events).

(b) ANCOVA adjusts the means on the covariate for all of the groups,
    leading to the adjustment of the means of the dependent variable
    (knowledge of current events).

In other words, what ANCOVA does is answer the question: what would the
means for the three groups be on knowledge of current events (y, or DV)
if the means of the three groups for age (x, or covariate) were all the
same?

ANCOVA adjusts the knowledge of current events means (y means) to what
they would be if the three groups had the same mean on age (x, or the
covariate).
While ANOVA uses the actual mean of each group to determine whether the differences are significant, ANCOVA uses the grand mean. The grand mean is the mean of the group means, i.e. the sum of the group means divided by the number of groups: (38 + 45 + 50)/3, which is approximately 44.3. Now we can see how far each group mean is from the grand mean. So for the Graduates group, ANCOVA does not use the mean age of 38 to find the mean knowledge of current events. Instead, it estimates what the mean knowledge of current events would be if age were held constant (i.e. if the mean ages of the groups were all the same, in this case about 44.3).
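The adjustment described above can be sketched numerically. A commonly used form of the ANCOVA adjusted mean is: adjusted mean = observed group mean − b(group covariate mean − grand covariate mean), where b is the pooled within-group regression slope. The group means below come from the example in the text; the slope b = 0.1 is a hypothetical value chosen purely for illustration, not a figure from this module.

```python
# Sketch of how ANCOVA adjusts group means toward a common covariate value.
# Group means (age, knowledge) are from the example in the text; the pooled
# within-group slope b_w = 0.1 is a HYPOTHETICAL value for illustration.
group_means = {
    "Graduate":    {"age": 38, "knowledge": 14.0},
    "Diploma":     {"age": 45, "knowledge": 12.5},
    "High school": {"age": 50, "knowledge": 11.5},
}
b_w = 0.1  # assumed pooled within-group slope of knowledge on age

# Grand mean of the group mean ages: (38 + 45 + 50) / 3
grand_mean_age = sum(g["age"] for g in group_means.values()) / len(group_means)

# Adjusted mean: observed mean minus the part attributable to the group's
# departure from the grand mean age.
adjusted = {
    name: g["knowledge"] - b_w * (g["age"] - grand_mean_age)
    for name, g in group_means.items()
}
print(round(grand_mean_age, 2))  # 44.33
for name, m in adjusted.items():
    print(name, round(m, 2))
```

Note how the youngest group (Graduates) is adjusted upwards and the oldest group downwards when the covariate is positively related to the outcome.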
Hence, you have to ensure that the regression slopes for the groups are parallel. If the slopes are not parallel, a procedure that adjusts the group means towards an average (the grand mean) does not make sense. Is it possible to have a sensible grand mean from three very different slopes, as shown in Figure 7.4? The answer is NO, because the differences between the groups are not the same for each value of the covariate. In this case, the use of ANCOVA would not be sensible.


Figure 7.4: Regression lines for the three groups

SPSS PROCEDURES TO OBTAIN SCATTER DIAGRAM AND REGRESSION LINE FOR EACH GROUP

1. Select Graphs, then Scatter. (If you are using SPSS 16, select Graphs, Legacy Dialogs and then Scatter/Dot.)
2. Make sure Simple is selected, and then choose Define.
3. Move the dependent variable (i.e. Knowledge of current events) to the Y-Axis box.
4. Move the covariate (i.e. Age) to the X-Axis box.
5. Move the grouping variable (i.e. Qualification level) to the Set Markers by box.
6. Click OK. [Note that this will give you the scatter diagram of all the groups together.]
7. Once you have done the above, double-click on the graph, which opens up the SPSS Chart Editor.
8. Choose Chart and then Options, which opens the Scatter Plot Options dialogue box.
9. Check the Subgroups box.
10. Click on the Fit Options button, which opens the Fit Line dialogue box.
11. Click on Linear Regression and ensure the box is highlighted.
12. In Regression Prediction, check the Mean box.
13. Click on Continue, then OK. [This will give you the regression line of each of the groups separately.]
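The same visual check on the slopes can be approximated outside SPSS by fitting a separate least-squares line to each group and comparing the slopes; roughly parallel slopes support the homogeneity-of-regression assumption. The (age, knowledge) data below are invented purely for illustration.

```python
import numpy as np

# Hypothetical (age, knowledge) pairs for each qualification group,
# invented purely to illustrate the slope comparison.
groups = {
    "Graduate":    ([30, 35, 40, 45], [13.0, 13.8, 14.2, 15.1]),
    "Diploma":     ([38, 43, 48, 52], [11.9, 12.4, 12.8, 13.5]),
    "High school": ([44, 48, 52, 56], [10.9, 11.4, 11.9, 12.3]),
}

slopes = {}
for name, (age, knowledge) in groups.items():
    # np.polyfit with degree 1 returns (slope, intercept) of the least-squares line
    slope, intercept = np.polyfit(age, knowledge, 1)
    slopes[name] = slope
    print(f"{name}: slope = {slope:.3f}")

# Crude check: are the three slopes roughly parallel?
spread = max(slopes.values()) - min(slopes.values())
print("slope spread:", round(spread, 3))
```

A formal test would add a group-by-covariate interaction term to the model; a non-significant interaction supports using ANCOVA.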


7.3 USING ANCOVA IN THE PRETEST-POSTTEST DESIGN

One of the most common designs in which ANCOVA is used is the pretest-posttest design. This consists of a test given BEFORE an experimental condition is carried out, followed by the same test AFTER the experimental condition. In this case, the pretest score is used as a covariate. In the pretest-posttest design, the researcher seeks to partial out (remove or hold constant) the effect of the pretest, in order to focus on possible changes following the intervention or treatment.

A researcher wanted to find out whether the critical thinking skills of students can be improved using the inquiry method when teaching science. A sample of 30 students was selected and divided into the following groups: 13 high ability subjects, 8 average ability subjects and 9 low ability subjects. A 10-item critical thinking test was developed by the researcher and administered before and after the intervention.

7.3.1 Before Including a Covariate

A one-way ANOVA was conducted on the data and the results are shown in Table 7.1.
Table 7.1: Test of Homogeneity of Variance

Levene Statistic    df1    df2    Sig.
.711                2      27     .500

The homogeneity of variance test (Table 7.1) indicates that the variances of the three groups are similar: the null hypothesis of equal variances is NOT rejected, since the p-value of 0.500 is greater than .05. Hence, you have not violated one of the assumptions for using ANOVA.
Table 7.2: Means and Standard Deviations

Ability    N     Mean    Std. Deviation
Low        9     3.22    1.78
Average    8     4.87    1.45
High       13    4.84    2.11
Total      30    4.37    1.95


Table 7.2 shows the means and standard deviations for the three groups of subjects: low, average and high ability. Although the high ability subjects scored 4.84 and the low ability subjects scored only 3.22, the difference between the ability levels is not significant. Therefore, teaching students using the inquiry method seems to have no significant effect on critical thinking.
Table 7.3: ANOVA Table

Dependent Variable: Critical Thinking

Source             Sum of Squares    df    Mean Square    F          Sig.
Corrected Model    16.844a           2     8.422          2.416      .108
Intercept          535.184           1     535.184        153.522    .000
Between Groups     16.844            2     8.422          2.416      .108
Within Groups      94.123            27    3.486
Total              583.000           30
Corrected Total    110.967           29

a. R Squared = .152 (Adjusted R Squared = .089)


Since the reported p-value of .108 is greater than .05, and Tukey's post hoc comparison test revealed no significant differences between the three groups of students, it is concluded that teaching science using the inquiry method seems to have no significant effect on critical thinking.
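A one-way ANOVA like the one above can be reproduced outside SPSS. The sketch below uses scipy.stats.f_oneway on made-up posttest scores for three ability groups; the module prints only summary statistics, so these raw scores are invented for illustration.

```python
from scipy import stats

# Hypothetical posttest critical-thinking scores for three ability groups.
# Invented for illustration only (the module does not list the raw data).
low     = [2, 3, 1, 4, 3, 5, 2, 4, 3]
average = [5, 4, 6, 3, 5, 6, 4, 6]
high    = [4, 6, 3, 5, 7, 4, 5, 3, 6, 5, 4, 6, 5]

f_stat, p_value = stats.f_oneway(low, average, high)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")

# As in Table 7.3, a p-value above .05 means the group differences
# are not statistically significant at the 5% level.
print("significant at .05:", p_value < 0.05)
```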

7.3.2 After Including a Covariate

The same critical thinking test was administered before the commencement of the experiment and served as the pretest. What happens when the scores of the pretest are included in the model as a covariate?

See the ANOVA table with the covariate included (Table 7.4) and compare it with the ANOVA table when the covariate was not included. The format of the table is largely the same as without the covariate, except that there is an additional row of information about the covariate (pretest).


Table 7.4: ANOVA Table

Dependent Variable: Critical Thinking

Source             Type III Sum of Squares    df    Mean Square    F         Sig.
Corrected Model    31.920                     3     10.640         3.500     .030
Intercept          76.069                     1     76.069         25.020    .000
PRETEST            15.076                     1     15.076         4.959     .035
Between Groups     25.185                     2     12.593         4.142     .037
Within Groups      79.047                     26    3.040
Total              683.000                    30
Corrected Total    110.967                    29

Table 7.5: Adjusted Means and Standard Errors

Ability    N     Mean    Std. Error
Low        9     2.92    .59
Average    8     4.71    .62
High       13    5.15    .50

Table 7.6: Pairwise Comparisons

           Low    Average    High
Low        -
Average           -
High       *                 -

* Significant at p = .05
Looking first at the significance values, it is clear that the covariate (the pretest) significantly influenced the dependent variable (the posttest), because its significance value is less than .05. Therefore, performance in the pretest had a significant influence on the posttest. What is more interesting is that when the effect of the pretest is removed, the effect of teaching science using the inquiry method becomes significant (p = .037, which is less than .05). There was a significant effect of the inquiry method of teaching on critical thinking after controlling for the effect of the pretest, F(2,26) = 4.14, p < .05.

Table 7.5 shows the adjusted means (the Sidak test was used to obtain the adjusted means). These values should be compared with Table 7.2 to see the effect of the covariate on the means of the three groups. The results show that low ability subjects differed significantly from high ability subjects on the critical thinking test (see Table 7.6). However, there were no significant differences between average and high ability subjects.
CONCLUSION
This example illustrates how ANCOVA can help us exert stricter experimental control by taking into account confounding variables, giving us a purer measure of the effect of the experimental manipulation. Without taking the pretest into account, we would have concluded that the inquiry method of teaching science had no effect on the critical thinking of subjects, yet clearly it does.
SPSS PROCEDURES TO CONDUCT AN ANALYSIS OF COVARIANCE (ANCOVA)

1. Select the Analyze menu.
2. Click on General Linear Model and then Univariate to open the Univariate dialogue box.
3. Select the dependent variable (e.g. geography test) and click on the arrow button to move it into the Dependent Variable: box.
4. Select the independent variable (e.g. treatment) and click on the arrow button to move it into the Fixed Factor(s): box.
5. Select the covariate (e.g. attitude) and click on the arrow button to move it into the Covariate(s): box.
6. Click on the Options command push button to open the Univariate: Options sub-dialogue box.
7. In the Display box, check the Descriptive statistics, Estimates of effect size, Observed power and Homogeneity tests check boxes.
8. Click on Continue and then OK.
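Outside SPSS, the covariate adjustment amounts to a regression model comparison: fit a model with the covariate alone, then add the group dummies, and test the drop in residual sum of squares with an F statistic. The sketch below uses numpy with simulated data; all variable names and values are illustrative, not from the module.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pretest scores and group labels (0, 1, 2) for 30 subjects.
n = 30
group = np.repeat([0, 1, 2], 10)
pretest = rng.normal(5, 2, n)
# Simulated posttest: depends on pretest plus a group effect plus noise.
posttest = 1.0 + 0.8 * pretest + np.array([0.0, 1.0, 2.0])[group] + rng.normal(0, 1, n)

def sse(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

ones = np.ones(n)
d1 = (group == 1).astype(float)  # dummy for group 1
d2 = (group == 2).astype(float)  # dummy for group 2

# Reduced model: covariate only.  Full model: covariate + group dummies.
X_reduced = np.column_stack([ones, pretest])
X_full    = np.column_stack([ones, pretest, d1, d2])

sse_r, sse_f = sse(X_reduced, posttest), sse(X_full, posttest)
df1, df2 = 2, n - 4  # 2 group dummies; n minus 4 estimated coefficients
F = ((sse_r - sse_f) / df1) / (sse_f / df2)
print(f"F({df1},{df2}) = {F:.2f}")  # tests the group effect after adjusting for pretest
```

This is the same partitioning of variance that SPSS reports in the "Between Groups" row of Table 7.4.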


ACTIVITY 7.1

A researcher conducted a study on the memory of four groups of people of different age groups. Since memory may be related to IQ, the researcher decided to control for it.

1. What is the covariate?
2. What would his analysis show?
3. State a hypothesis for the study.

ACTIVITY 7.2

Refer to Table 7.7, which is an SPSS output, and answer the following questions:

1. State the independent variable. Give reasons.
2. Which is the covariate? Explain.
3. State the dependent variable. Give reasons.
4. State a hypothesis for the above results.
5. Do you reject or fail to reject the hypothesis stated above?

Table 7.7: SPSS Output

Dependent Variable: Reaction Time

Source             Type III Sum of Squares    df    Mean Square    F        Sig.
Corrected Model    76.252                     3     25.417         3.647    .064
Intercept          4.792                      1     4.792          .688     .431
Age                4.252                      1     4.252          .610     .457
Group              41.974                     2     20.987         3.012    .106
Error              55.748                     8     6.969
Total              1860.000                   12
Corrected Total    132.000                    11


• The Analysis of Covariance, often referred to as ANCOVA, is a powerful statistical procedure used in educational research to remove the effects of pre-existing individual differences among subjects in a study.

• ANCOVA provides a way of measuring and removing the effects of such initial systematic differences between groups or samples.

• It is a parametric procedure that requires the following assumptions to be met: (i) normality, (ii) linearity, (iii) independence, (iv) homogeneity of variance, (v) homogeneity of regression and (vi) reliability of the covariate.

Covariate
Homogeneity of regression
Homogeneity of variance
Independence
Linearity
Normality
Reliability of the covariate


Topic 8  Correlation

LEARNING OUTCOMES

By the end of this topic, you should be able to:

1. Explain the concept of relationship between variables;
2. Discuss the use of the statistical tests to determine correlation; and
3. Interpret SPSS outputs on correlation tests.

INTRODUCTION

This topic explains the concept of relationship between variables. It discusses the use of statistical tests to determine the strength and direction of the correlation between two variables. It also demonstrates how to run correlation analysis using SPSS and interpret the results.

8.1 WHAT IS A CORRELATION COEFFICIENT?

Researchers are often concerned with the way two variables relate to each other for given groups of persons, such as students in schools or workers in a factory or office. For example, do students who have higher scores in mathematics also have higher scores in science? Is there a relationship between a person's self-esteem and his personality? Is there a relationship between attitudes towards reading and the number of books read? Is there a relationship between years of experience as a teacher and attitudes towards teaching?

The correlation coefficient is a number between −1 and +1. If there is no relationship between the variables, the correlation coefficient is 0 or very close to 0. As the strength of the relationship increases, so does the absolute value of the correlation coefficient. Thus, the larger the coefficient in absolute value, the stronger the relationship.
8.2 PEARSON PRODUCT-MOMENT CORRELATION COEFFICIENT

Pearson's product-moment correlation coefficient (also known as Pearson r), usually denoted by r, is one example of a correlation coefficient. It is a measure of the linear association between two variables that have been measured on interval or ratio scales, such as the relationship between the amount of education and income level. If there is a relationship between the amount of education and income level, the two variables co-vary.
(a) Assumptions Testing

Correlational analysis has the following underlying assumptions (Coakes & Steed, 2002, SPSS: Analysis Without Anguish, Brisbane: John Wiley & Sons):

(i) Related pairs: the data must be collected from related pairs, i.e. if you obtain a score on an X variable, there must be a score on the Y variable from the same subject.

(ii) Scale of measurement: the data should be interval or ratio in nature.

(iii) Normality: the scores for each variable should be normally distributed.

(iv) Linearity: the relationship between the two variables must be linear.

(v) Homogeneity of variance: the variability in scores for one variable is roughly the same at all values of the other variable, i.e. the scores cluster uniformly about the regression line.

(b) Strength of the Correlation

The strength of a relationship is indicated by the size of the correlation coefficient: the larger the correlation, the stronger the relationship. A strong relationship exists where cases with a particular value on the X variable usually have a particular value on the Y variable, while those with a different value on X have a different value on Y.

For example, if people who exercise regularly nearly always have better health than those who do not, then exercise and health are strongly correlated. If those who exercise regularly are just a little more likely to be healthy than non-exercisers, then the two variables are only weakly related. The scale in Figure 8.1 shows how the strength of the correlation coefficient can be described.

Figure 8.1: The Strength of the Correlation Coefficient

Trivial                       0.01 to 0.09
Low to Moderate               0.10 to 0.29
Moderate to Substantial       0.30 to 0.49
Substantial to Very Strong    0.50 to 0.69
Very Strong                   0.70 to 0.89
Near Perfect                  0.90 and above

How high does a correlation coefficient have to be to be called strong? How small is a weak correlation? The answer to these questions varies with the variables being studied. For example, if the literature shows that previous research found a correlation of 0.51 between variable X and variable Y, but in your study you obtained a correlation of 0.60, then you might conclude that the correlation between variable X and Y in your study is relatively strong.
However, Cohen (1988) has provided some guidelines to determine the strength
of the relationship between two variables by providing descriptors for the
coefficients. Keep in mind that in education and psychology, it is rare that the
coefficients will be very strong or near perfect since the variables measured
are constructs involving human characteristics, which are subject to wide
variation.
Example:
Data was gathered for two variables (an IQ test and a science test) from a sample of 12 students. Refer to Table 8.1 below.

Table 8.1: Data of Two Variables (IQ Test and Science Test)

Student No.    IQ Test (X)    Science Test (Y)
1              120            31
2              112            25
3              110            19
4              120            24
5              103            17
6              126            28
7              113            18
8              114            20
9              106            16
10             108            15
11             128            27
12             109            19

Each unit or student is represented by a point on the scatter diagram (see Figure 8.2). A dot is placed for each student at the point of intersection of a straight line drawn through his IQ score perpendicular to the X-axis and a straight line drawn through his science score perpendicular to the Y-axis. For example, a student who obtained an IQ score of 120 also obtained a science score of 24; the intersection between these lines is represented by the dot 'A'.

The scatter diagram shows a moderate positive relationship between IQ scores and science scores. However, it does not give us a summarised measure of this relationship. There is a need for a more precise, numerical descriptive measure of the correlation between IQ scores and science scores, which will be discussed later.

Figure 8.2: Scatter Diagram Showing the Relationship between IQ Scores (X-axis) and Science Scores (Y-axis) for 12 Students

8.2.1 Range of Values of rxy

Note that rxy can never take on a value less than −1 nor a value greater than +1 (r refers to the correlation coefficient, x to the X variable and y to the Y variable). The following three graphs show various values of rxy and the type of linear relationship that exists between X and Y for the given values of rxy.
(a) Positive Correlation

Value of rxy = +1.00 = Perfect and Direct Relationship.

Figure 8.3: Perfect Correlation

See Figure 8.3. If Attitudes (x) and English Achievement (y) have a positive relationship, then the slope will be a positive number. Lines with positive slopes go from the bottom left toward the upper right; for example, an increase from 1 to 2 on the X-axis is accompanied by an increase from 3 to 3.5 on the Y-axis.
(b) Negative Correlation

Value of rxy = −1.00 = Perfect Inverse Relationship.

Figure 8.4: Negative Correlation

See Figure 8.4. If Attitudes (x) and English Achievement (y) have a negative relationship, then the slope will be a negative number. Lines with negative slopes go from the upper left to the lower right. In the graph above, an increase of 1 on the X-axis is associated with a decrease of 0.5 on the Y-axis, i.e. an increase from 1 to 2 on the X-axis is followed by a decrease from 5 to 4.5 on the Y-axis, giving a slope of −0.5.
(c) Zero Correlation

Value of rxy = .00 = No Relationship.

Figure 8.5: No Correlation

If Attitudes (x) and English Achievement (y) have zero relationship (as shown in Figure 8.5), then there is NO SYSTEMATIC RELATIONSHIP between X and Y. Here, some students with high attitude scores have low English scores, while some students with low attitude scores have high English scores.

8.3 CALCULATION OF THE PEARSON CORRELATION COEFFICIENT (r OR rxy)

A researcher conducted a study to determine the relationship between verbal and spatial ability. She was interested in finding out whether students who scored high on verbal ability also scored high on spatial ability. She administered two 15-item tests measuring verbal and spatial ability to a sample of 12 primary school pupils. The results of the study are shown in Table 8.2.
Table 8.2: Results of the Study

Name         Verbal Test (x)    Spatial Test (y)    x²     y²     xy
Seng Huat    13                 7                   169    49     91
Fauzul       10                 6                   100    36     60
Shalini      12                 9                   144    81     108
Tajang       14                 10                  196    100    140
Sheela       10                 7                   100    49     70
Kumar        12                 11                  144    121    132
Mei Ling     13                 12                  169    144    156
Azlina       9                  10                  81     100    90
Ganesh       14                 13                  196    169    182
Ahmad        11                 12                  121    144    132
Kong Beng    8                  9                   64     81     72
Ningkan      9                  8                   81     64     72

             Σx = 135           Σy = 114            Σx² = 1565    Σy² = 1138    Σxy = 1305

Illustration of the Calculation of the Correlation Coefficient (r or rxy) for the Data in Table 8.2

The Pearson correlation coefficient (also called the Pearson r) is the most commonly used formula for computing the correlation between two variables. It measures the strength and direction of the linear relationship between variable X and variable Y. The sample correlation coefficient is denoted by r and is computed as:

r = SSxy / √(SSxx × SSyy)

where

SSxy = Σxy − (Σx)(Σy)/n
SSxx = Σx² − (Σx)²/n
SSyy = Σy² − (Σy)²/n

For the data in Table 8.2:

SSxy = 1305 − (135)(114)/12 = 1305 − 1282.50 = 22.50
SSxx = 1565 − (135)²/12 = 1565 − 1518.75 = 46.25
SSyy = 1138 − (114)²/12 = 1138 − 1083.00 = 55.00

Using the formula to obtain the correlation coefficient:

r = 22.50 / √((46.25)(55.00)) = 22.50 / 50.44 = 0.446
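The hand computation can be checked in a few lines of Python. The script below recomputes r from the raw scores in Table 8.2 using the same sums-of-squares formulas; small discrepancies with the printed working can arise from rounding in the squared-score columns.

```python
from math import sqrt

# Raw verbal (x) and spatial (y) scores from Table 8.2
x = [13, 10, 12, 14, 10, 12, 13, 9, 14, 11, 8, 9]
y = [7, 6, 9, 10, 7, 11, 12, 10, 13, 12, 9, 8]
n = len(x)

# Sums of squares and cross-products, exactly as in the worked example
ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
ss_xx = sum(a * a for a in x) - sum(x) ** 2 / n
ss_yy = sum(b * b for b in y) - sum(y) ** 2 / n

r = ss_xy / sqrt(ss_xx * ss_yy)
print(round(ss_xy, 2), round(ss_xx, 2), round(ss_yy, 2))  # 22.5 46.25 55.0
print(round(r, 3))
```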

8.4 PEARSON PRODUCT-MOMENT CORRELATION USING SPSS

A study was conducted to determine the relationship between reading ability and performance in science. A reading ability test and a science test were administered to 200 lower secondary students. The Pearson product-moment correlation was used to determine the significance of the relationship. The steps for using SPSS are as follows:


SPSS Procedures:

1. Select the Analyze menu.
2. Click on Correlate and then Bivariate to open the Bivariate Correlations dialogue box.
3. Select the variables you require (i.e. reading and science) and click on the arrow button to move the variables into the Variables: box.
4. Ensure that the Pearson correlation option has been selected.
5. In the Test of Significance box, select the One-tailed radio button.
6. Click on OK.

8.4.1 SPSS Output

Refer to Figure 8.6.

Figure 8.6: SPSS Output

To interpret the correlation coefficient, examine the coefficient and its associated significance value (p). The output shows that the relationship between reading and science scores is significant, with a correlation coefficient of r = 0.63 and p < .05. Thus, higher reading scores are associated with higher scores in science.

8.4.2 Significance of the Correlation Coefficient

We introduced the Pearson correlation as a measure of the strength of a relationship between two variables. But any relationship should be assessed for its significance as well as its strength. The significance of the relationship is expressed in probability levels, p (e.g., significant at p = .05). This tells how unlikely a given correlation coefficient, r, would be if there were no relationship in the population. It assumes that you have a sample of cases from a population. The question is whether the statistic observed in your sample is likely, given some assumption about the corresponding population parameter. If your observed statistic does not exactly match the population parameter, perhaps the difference is due to sampling error.

To be useful, a correlation coefficient needs to be accompanied by a test of statistical significance, and it is also important to know the sample size. Generally, a strong correlation in a small sample may be statistically non-significant, while a much weaker correlation in a large sample may be statistically significant. For example, in a large sample, even low correlations (as low as 0.06) can be statistically significant, whereas similar-sized correlations are not significant in smaller samples. This is because with smaller samples the likelihood of sampling error is higher.

8.4.3 Hypothesis Testing for Significant Correlation

The null hypothesis (H0) states that the correlation between X and Y is 0. What is the probability that the correlation obtained in the sample came from a population where the parameter is 0? The t-test for the significance of a correlation coefficient is used. Note that the correlation between reading and science (r = 0.630) is significant at p < 0.05.

Hence, the null hypothesis is REJECTED, which affirms that the two variables are positively related in the population.

Coefficient of Determination:
r = the correlation between X and Y = 0.630, and r² = the coefficient of determination = (0.630)² = 0.3969.
Hence, about 39.7% of the variance in Y can be explained by X.
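The t statistic behind this significance test can be computed directly: t = r·√(n − 2) / √(1 − r²), referred to a t distribution with n − 2 degrees of freedom. The sketch below plugs in r = 0.63 and n = 200 from the reading/science example.

```python
from math import sqrt
from scipy import stats

r, n = 0.63, 200  # correlation and sample size from the reading/science example
df = n - 2

# t statistic for testing H0: population correlation = 0
t = r * sqrt(df) / sqrt(1 - r ** 2)

# two-tailed p-value from the t distribution
p = 2 * stats.t.sf(abs(t), df)
print(f"t({df}) = {t:.2f}, p = {p:.2e}")

r_squared = r ** 2  # coefficient of determination, about 0.397
```

With n = 200, the t value is large and the p-value is far below .05, which is why even a moderate r is highly significant in a large sample.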

8.4.4 To Obtain a Scatter Plot Using SPSS

SPSS Procedures:

1. Select the Graphs menu.
2. Click on Scatter to open the Scatterplot dialogue box.
3. Ensure the Simple Scatterplot option is selected.
4. Click on the Define command push button to open the Simple Scatterplot sub-dialogue box.
5. Select the first variable (i.e. science) and click on the arrow button to move it into the Y Axis: box.
6. Select the second variable (i.e. reading) and click on the arrow button to move it into the X Axis: box.
7. Click on OK.

Figure 8.7: Scatter Plot



As you can see from the scatter plot (Figure 8.7), there is a linear relationship between reading and science scores. Given that the scores cluster uniformly around the regression line, the assumption of homogeneity of variance has not been violated.

8.5 SPEARMAN RANK ORDER CORRELATION COEFFICIENT

This is the alternative to use when the assumptions for the Pearson correlation are not met. In this case, the variables are converted into ranks and the correlation coefficient is computed using the ranked data. Table 8.3 illustrates how the Spearman rank order correlation is computed for the sales and expenditure-on-advertisement data by converting the scores into ranks.
Table 8.3: Computation of Spearman Rank Order Correlation

Month    Sales       Advertisement    Rank     Rank             Rank difference    d²
         (mil) X     Y                Sales    Advertisement    d
1        157.5       47.7             2.5      1                1.5                2.25
2        157.5       52.2             2.5      2.5              0                  0
3        160.0       52.2             4.5      2.5              2                  4
4        160.0       54.5             4.5      4.5              0                  0
5        167.6       54.5             7        4.5              2.5                6.25
6        154.9       59.0             1        6                -5                 25
7        167.6       61.3             7        7.5              -0.5               0.25
8        172.7       61.3             10       7.5              2.5                6.25
9        167.6       64.5             7        9                -2                 4
10       170.2       65.8             9        10               -1                 1
11       175.3       66.8             11       11               0                  0
12       182.9       68.1             12       12               0                  0
                                                                Σd² =              49

Note: Ranks are assigned by giving rank 1 to the smallest score, rank 2 to the next value, and so on. Scores with the same value share the (averaged) rank.

rs = 1 − 6Σd² / [n(n² − 1)] = 1 − 6(49) / [12(144 − 1)] = 1 − 294/1716 = 0.829

8.6 SPEARMAN RANK ORDER CORRELATION USING SPSS

The Spearman rank order correlation is used here to determine the relationship between the two survey items listed as follows:

(a) Employees are knowledgeable (rq2)
(b) Performs the service right the first time (rq6)
SPSS Procedures:

1. Select the Analyze menu.
2. Click on Correlate and then Bivariate to open the Bivariate Correlations dialogue box.
3. Select the variables you require (i.e. rq2 and rq6) and click on the arrow button to move the variables into the Variables: box.
4. Ensure that the Spearman correlation option has been selected.
5. In the Test of Significance box, select the Two-tailed radio button.
6. Click on OK.

Results

Correlations (Spearman's rho)

                                     rq2        rq6
rq2    Correlation Coefficient      1.000      .507**
       Sig. (2-tailed)              .          .000
       N                            203        203
rq6    Correlation Coefficient      .507**     1.000
       Sig. (2-tailed)              .000       .
       N                            203        203

**. Correlation is significant at the 0.01 level (2-tailed).

The correlation coefficient of 0.507 indicates a moderate positive relationship between 'Employees are knowledgeable' (rq2) and 'Performs the service right the first time' (rq6).

The p-value of 0.000 (less than 0.05) shows that the relationship is a true reflection of the phenomenon in the population. In other words, the relationship seen in the sample is NOT due to mere chance.
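The rank-difference computation in Table 8.3 can be cross-checked with scipy, which computes Spearman's rho as the Pearson correlation of the tie-averaged ranks; with tied scores this exact method can differ slightly from the simpler Σd² shortcut. The sales and advertisement figures are the twelve monthly values used in the sales/advertisement example.

```python
from scipy import stats

# Monthly sales (RM mil) and advertisement expenditure figures
# from the sales/advertisement example
sales = [157.5, 157.5, 160.0, 160.0, 167.6, 154.9,
         167.6, 172.7, 167.6, 170.2, 175.3, 182.9]
advert = [47.7, 52.2, 52.2, 54.5, 54.5, 59.0,
          61.3, 61.3, 64.5, 65.8, 66.8, 68.1]

rho, p = stats.spearmanr(sales, advert)
print(f"rho = {rho:.3f}, p = {p:.4f}")
# With ties handled exactly, rho is about 0.826; the d-squared shortcut,
# which ignores the tie correction, gives a broadly similar value.
```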


NOTE: CAUSATION AND CORRELATION

Causation and correlation are two concepts that have been wrongly interpreted by some researchers. The presence of a correlation between two variables does not necessarily mean there exists a causal link between them. Say, for instance, that there is a correlation of 0.60 between teachers' salary and the academic performance of students. Does this imply that well-paid teachers 'cause' better academic performance of students? Would students' academic performance improve if we increased the pay of teachers? It is dangerous to conclude causation just because there is a correlation or relationship between the two variables. The correlation by itself tells us nothing about whether teachers' salary causes achievement.

ACTIVITY 8.1

A researcher conducted a study which aimed to determine the relationship between self-efficacy and academic performance in geography. A 20-item self-efficacy scale and a 25-item geography test were administered to a group of 12 students. The following are the results of the study:

Self-Efficacy Scale    Geography Test
15                     22
13                     17
14                     20
12                     18
16                     23
12                     21
11                     19
17                     24
15                     19
13                     16

(a) Compute the Pearson correlation coefficient.
(b) What is the mean of the self-efficacy scale and the mean of the geography test?
(c) Plot a scatter plot for the data.
(d) Comment on the scatter plot.
(e) Compute the Spearman rank order correlation coefficient.
(f) Perform a significance test for the correlation coefficient.


• The linear relationship between two variables is evaluated from two aspects: the strength of the relationship (correlation) and the cause-effect association (regression).

• In statistics, correlation is used to denote the association between two quantitative variables, assuming that the association is linear.

• The value of the correlation coefficient ranges from −1 to +1. A value close to either extreme indicates a strong linear relationship, in the positive or negative direction respectively.

• There are two common methods for computing the correlation coefficient: the Pearson correlation and the Spearman rank order correlation. The latter is the non-parametric equivalent of the former and is used when the data is measured at an ordinal level or when the sample size is small.

• The correlation coefficient computed from the sample indicates the strength of the relationship in the sample. To generalise a linear relationship to the population, a significance test needs to be performed.

Coefficient of determination
Linear relationship
Pearson's product-moment correlation
Scatter diagram
Spearman rank order correlation


Topic 9  Linear Regression

LEARNING OUTCOMES

By the end of this topic, you should be able to:

1. Explain the concept of relationship between variables;
2. Determine the slope and intercept of a regression equation;
3. Discuss the use of the statistical tests to determine cause-effect relationship between variables; and
4. Interpret SPSS outputs on regression analysis.

INTRODUCTION

This topic explains the concept of causal relationship between variables. It discusses the use of statistical tests to determine the slope, the intercept and the regression equation. It also demonstrates how to run regression analysis using SPSS and interpret the results.

9.1 WHAT IS SIMPLE LINEAR REGRESSION?

Correlation describes the strength of an association between two variables. If the two variables are related, changes in one will be accompanied by changes in the corresponding variable. If the researcher can identify the cause and the effect variable, the relationship can be represented in the form of an equation:

Y = a + bX

where Y is the dependent variable, X is the independent variable, and a and b are two constants to be estimated.

Basically, regression is a technique for fitting the best straight line to represent a cluster of points defined in a two-dimensional plane (see Figure 9.1). The straight line expresses the linear association between the variables studied. It is a useful technique for establishing a cause-effect relationship between variables and for forecasting future results or outcomes. An important consideration in linear regression analysis is that the researcher must identify the independent and dependent variables prior to the analysis.

9.2 ESTIMATING THE REGRESSION COEFFICIENTS

Y = a + bX

Slope
The inclination of the regression line as compared to a base line:

b = [nΣXY − (ΣX)(ΣY)] / [nΣX² − (ΣX)²]

Y-intercept
The point at which the regression line intercepts the Y-axis:

a = Ȳ − bX̄

Figure 9.1: Slope and Intercept of a Regression Line



Example:
A study was conducted at TESCO Hypermarket to determine if there is a cause-effect relationship between sales and expenditure on advertisements. Table 9.1 illustrates the computation of the regression coefficients.
Table 9.1: Computation of Regression Coefficients

Month   Sales (mil) (X)   Advertisement (hundred thousand) (Y)   X*Y         X^2
1       157.5             47.7                                   7507.07     24799.95
2       157.5             52.2                                   8222.03     24799.95
3       160.0             52.2                                   8354.64     25606.40
4       160.0             54.5                                   8717.89     25606.40
5       167.6             54.5                                   9133.03     28103.17
6       154.9             59.0                                   9144.56     24006.40
7       167.6             61.3                                   10274.66    28103.17
8       172.7             61.3                                   10586.01    29832.20
9       167.6             64.5                                   10812.78    28103.17
10      170.2             65.8                                   11202.95    28961.23
11      175.3             66.8                                   11707.37    30716.07
12      182.9             68.1                                   12454.13    33445.09
Total   1993.9            707.9                                  118117.11   332083.21
Mean    166.2             59.0

b = [n ΣXY - (ΣX)(ΣY)] / [n ΣX² - (ΣX)²]
  = [12(118117.11) - (1993.9)(707.9)] / [12(332083.21) - (1993.9)²]
  = 0.63

a = Ȳ - bX̄ = -46.86
The regression equation for the relationship between sales and expenditure on advertisements is:

Sales = 0.63 (Expenditure on advertisement) - 46.86

This means that, on average, every increase of RM100,000 in advertisement expenditure will lead to an increase of RM0.63 million in sales.
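The slope and intercept formulas translate directly into code. A minimal Python sketch of the summation formulas (the data below are small illustrative numbers, not the TESCO figures):

```python
def fit_line(x, y):
    """Estimate a (intercept) and b (slope) for Y = a + bX
    using the summation formulas for simple linear regression."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # b = [n*sum(XY) - sum(X)*sum(Y)] / [n*sum(X^2) - (sum(X))^2]
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # a = mean(Y) - b * mean(X)
    a = sum_y / n - b * (sum_x / n)
    return a, b

# Illustrative data: Y is exactly 1 + 2X, so we expect a = 1, b = 2
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```

Because the illustrative Y values lie exactly on the line Y = 1 + 2X, the function recovers a = 1 and b = 2; with noisy data such as the advertising figures, it returns the least-squares estimates.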


9.3 SIGNIFICANCE TEST FOR REGRESSION COEFFICIENTS

The computed slope simply shows the degree of the relationship between the variables in the sample observed. Whether this is due to chance, or there is a true relationship between these two variables, can only be determined through a significance test for the regression coefficient.

Example
If the researcher would like to test the hypothesis that there is a true relationship between sales and expenditure on advertising, the following procedures need to be adhered to.

9.3.1 Testing the Assumption of Linearity

Prior to proceeding with the significance test for the slope, the assumption of linearity needs to be tested first. This is simply to gather statistical evidence that the proposed linear regression model is an appropriate model for the relationship between the variables. The linearity test is also called the global test.

The Hypothesis
Ho: The variation in the dependent variable is not explained by the linear model (R² = 0).
Ha: A significant portion of the variation in the dependent variable is explained by the linear model (R² ≠ 0).

The level of significance is set at 0.05 (α = 0.05).

The researcher performs the ANOVA for the linear relationship between sales and expenditure on advertising. The result is shown in Table 9.2.
Table 9.2: The Results of the ANOVA for Simple Linear Regression between Sales and Expenditure on Advertising

Source       df    SS       MS       F       p-value
Regression   1     254.65   254.65   13.46   0.01
Residual     9     170.22   18.91
Total        10    424.88

The F-value is 13.46 and the p-value is 0.01.


Since the p-value is smaller than 0.05, reject the null hypothesis and conclude the alternative hypothesis: there is a linear relationship between the variables studied. From the data it is evident that there is a linear relationship between sales and expenditure on advertising.

Now, we can proceed to the test of significance for the regression slope.
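The global test rests on partitioning the total sum of squares into regression and residual components and forming F = MSR/MSE. A hedged sketch of that decomposition in Python, using small made-up data rather than the advertising figures:

```python
def regression_anova(x, y):
    """Partition total variation: SST = SSR + SSE, then F = MSR / MSE.
    Returns (SSR, SSE, SST, F) for a simple linear regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                       # slope
    a = mean_y - b * mean_x             # intercept
    y_hat = [a + b * xi for xi in x]    # fitted values
    sst = sum((yi - mean_y) ** 2 for yi in y)                   # total SS
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))       # residual SS
    ssr = sst - sse                     # regression SS (df = 1)
    mse = sse / (n - 2)                 # residual mean square
    f = ssr / mse                       # F statistic with (1, n - 2) df
    return ssr, sse, sst, f

ssr, sse, sst, f = regression_anova([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(ssr, 3), round(sse, 3), round(f, 3))  # 3.6 2.4 4.5
```

A large F (relative to the F distribution with 1 and n - 2 degrees of freedom) leads to rejecting the global null hypothesis, exactly as in Table 9.2.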

9.3.2 Testing the Significance of the Slope

The next step is testing the significance of the slope. This is to test whether there is a significant contribution of the predictor variable to the changes in the dependent variable. In our case, it is to test the significance of the contribution of expenditure on advertising to sales.

Note: For simple linear regression, where there is only one independent variable, if a linear relationship is proven, the significance test for the slope will show a significant departure from zero.

Requirements
Parameter to be tested: regression slope, β
Normality: the sample statistic (in this case, b) resembles a normal distribution
Sample size: large
Recommended test: t-test for regression slope
Test statistic: t = b / SE(b)

The Hypothesis
H0: The regression slope is equal to zero.
Ha: The regression slope is not equal to zero.

The level of significance is set at 0.05 (α = 0.05).


The researcher performs the t-test for the regression slope for the linear relationship between expenditure on advertisement and sales. The result is shown in Table 9.3.

Table 9.3: The Results of the t-test to Test the Significance of the Regression Slope

            Coefficients   Standard Error   t-Stat   P-value
Intercept   -46.86         14.77            -3.17    0.006
Slope       0.633          0.1656           3.82     0.005


Since the p-value is smaller than 0.05, reject the null hypothesis and conclude the alternative hypothesis: the regression slope is not equal to zero. There is a true relationship between the variables studied; sales is linearly related to expenditure on advertisement. The regression equation for this relationship is:

Sales = -46.86 + 0.633 (Expenditure on advertisement) + Error

The R² is 0.599, meaning that 59.9% of the variation in sales is attributed to the variation in expenditure on advertising.
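The t statistic in Table 9.3 is the slope divided by its standard error, with SE(b) = sqrt(MSE / Σ(x - x̄)²). A minimal sketch, again with illustrative data rather than the advertising data:

```python
import math

def slope_t_statistic(x, y):
    """Compute the slope b, its standard error SE(b), and t = b / SE(b)
    for a simple linear regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)             # residual variance estimate
    se_b = math.sqrt(mse / sxx)     # standard error of the slope
    return b, se_b, b / se_b

b, se_b, t = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b, 2), round(t, 3))  # 0.6 2.121
```

The resulting t is compared against the t distribution with n - 2 degrees of freedom; with one predictor, t² equals the global F statistic.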

9.4 SIMPLE LINEAR REGRESSION USING SPSS

Linear regression is used to determine the causal relationship between the dependent and independent variables listed below:

•	Employees' knowledge (independent)
•	Customer satisfaction (dependent)

NOTE: Before proceeding with the regression analysis, the following assumptions need to be checked:

•	Linear relationship
•	Normal error
•	Homoscedasticity


SPSS Procedures:

1.	Select the Analyze menu.
2.	Click on Regression and then Linear to open the Linear Regression dialogue box.
3.	Select the dependent variable and push it into the Dependent box.
4.	Select the independent variable and push it into the Independent box.
5.	Click Statistics and tick Estimates, Model fit and Descriptive.
6.	Click Continue.
7.	Click on OK.

Results
The first step in regression analysis is the global hypothesis:

Ho: The variation in the dependent variable is not explained by the linear model (R² = 0).
Ha: A significant portion of the variation in the dependent variable is explained by the linear model (R² ≠ 0).

Refer to Figure 9.2.

Figure 9.2: ANOVA


Since the p-value is less than 0.05, reject the null hypothesis and conclude that a significant portion of the variation in the dependent variable is explained by the linear model. Refer to Figure 9.3.

Figure 9.3: Model Summary

The R² is 0.306, indicating that about 30.6% of the variation in customers' satisfaction can be attributed to changes in the respondents' perception of employees' knowledge.

The next step is to test the significance of the slope. In simple linear regression, if the global hypothesis shows that there is a significant linear relationship between the dependent and independent variable, the significance test for the slope will also provide evidence that it is significantly different from zero.
The Hypothesis
H0: The regression slope is equal to zero.
Ha: The regression slope is not equal to zero.

Refer to Figure 9.4.

Figure 9.4: Coefficients


Since the p-value is less than 0.05, reject the null hypothesis and conclude that the regression slope is not equal to zero. Thus,

Customers' Satisfaction = 0.553 (Employees' knowledge) + 2.596 + Error
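The R² that SPSS reports in the Model Summary is simply 1 - SSE/SST for the fitted line. A small sketch with hypothetical scores (the module's survey data are not reproduced here):

```python
def r_squared(x, y):
    """Fit Y = a + bX by least squares and return R^2 = 1 - SSE/SST,
    the proportion of variation in Y explained by the model."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst

print(round(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 2))  # 0.6
```

For these illustrative scores, 60% of the variation in Y is explained by the line; an R² of 0.306 as in Figure 9.3 would be interpreted the same way.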

9.5 MULTIPLE REGRESSION

Multiple regression is an extension of simple linear regression. It uses the same principle of placing the best-fitting straight line to represent a cluster of points, BUT the consideration is not two dimensions but many.
Example
A researcher is interested in determining the various factors that contribute to the
sales of a newly introduced hair shampoo. Among the crucial factors that he
wishes to study are cost for TV advertisement, training of sales executives,
employing promoters, distribution of free samples, and leasing the prime spots at
hypermarkets and supermarkets.

The variables involved in the study are:

TV           : TV advertisement cost
Train        : Training of sales executives cost
Promoters    : Cost for employing promoters
Free samples : Cost for distributing free samples
Prime spot   : Cost for leasing prime spots at hypermarkets and supermarkets

(a)	Testing the assumption of linearity
	The purpose is to determine whether the factors are linearly related to the sales of the newly introduced hair shampoo.

(b)	The Hypothesis
	Ho: The variation in the sales is not explained by the linear model comprising the costs for TV advertisement, training of sales executives, employing promoters, distributing free samples, and leasing prime spots (R² = 0).
	Ha: A significant portion of the variation in the sales is explained by the linear model comprising the costs for TV advertisement, training of sales executives, employing promoters, distributing free samples, and leasing prime spots (R² ≠ 0).

	The level of significance is set at 0.05 (α = 0.05).


The researcher performs the ANOVA for the linear relationship between sales and all the defined predictor variables. The result is shown in Table 9.4.

Table 9.4: The Results of the ANOVA for Multiple Regression

Model        Sum of Squares   df     Mean Square   F        Sig.
Regression   30.866           5      6.173         318.33   .0000
Residual     90.216           4652   0.019
Total        121.082          4657

a. Predictors: (Constant), TV, Train, Promoters, Free sample, Prime spot
b. Dependent Variable: sales

Since the p-value is smaller than 0.05, reject the null hypothesis and conclude the alternative hypothesis: there is a linear relationship between the variables studied. From the analysis it is evident that there is a linear relationship between the sales and the combination of the predictor variables.

The next step is the test of significance for the regression slope of every independent (predictor) variable. This is to determine the contribution of each predictor variable independently.
(c)	Requirements
	Parameter to be tested: regression slope, β
	Normality: the sample statistic resembles a normal distribution
	Sample size: large
	Recommended test: t-test for regression slope
	Test statistic: t = b / SE(b)

(d)	The Hypothesis
	H0: The regression slope is equal to zero.
	Ha: The regression slope is not equal to zero.

	The level of significance is set at 0.05 (α = 0.05).


The researcher performs the t-test for the regression slopes for the linear relationship between sales and the following variables:

(i)	Costs for TV advertisements;
(ii)	Training of sales executives;
(iii)	Employing promoters;
(iv)	Distributing free samples; and
(v)	Leasing prime spots.

The result is shown in Table 9.5.

Table 9.5: The Results of the t-test to Test the Significance of the Regression Slopes

Model          B (Unstandardised)   Std. Error   t        Sig.
(Constant)     3.5373               0.4038       8.76     .000
TV ads         0.1214               0.0261       4.650    .000
Train          -0.1247              0.0944       -1.321   0.429
Promoters      0.2626               0.0138       19.095   .000
Free samples   0.05965              0.0114       5.208    .000
Prime spots    0.2163               0.1531       1.413    0.115

a. Dependent Variable: sales

The p-value is smaller than 0.05 for (i) costs for TV advertisements, (ii) employing promoters and (iii) distributing free samples, so these slopes are significant. The regression model for the relationship between sales and the advertising costs is:

Sales = 3.54 + 0.1214 (TV) + 0.2626 (Promoters) + 0.0597 (Free Samples) + Error

The adjusted R² is 0.254, meaning that 25.4% of the variation in the sales is attributed to the combined variation in the costs for TV advertisement, employing promoters and distributing free samples.
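The multiple-regression coefficients SPSS reports solve the normal equations (X'X)β = X'y. A self-contained sketch using Gaussian elimination; the data are synthetic (y is built exactly from two hypothetical predictors), not the shampoo study:

```python
def multiple_regression(rows, y):
    """Least-squares coefficients for y = b0 + b1*x1 + ... + bk*xk,
    solving the normal equations (X'X) beta = X'y by Gaussian elimination."""
    X = [[1.0] + list(r) for r in rows]   # prepend an intercept column
    k = len(X[0])
    # Build X'X and X'y
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Forward elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[pivot] = xtx[pivot], xtx[col]
        xty[col], xty[pivot] = xty[pivot], xty[col]
        for r in range(col + 1, k):
            f = xtx[r][col] / xtx[col][col]
            for c in range(col, k):
                xtx[r][c] -= f * xtx[col][c]
            xty[r] -= f * xty[col]
    # Back substitution
    beta = [0.0] * k
    for r in range(k - 1, -1, -1):
        s = sum(xtx[r][c] * beta[c] for c in range(r + 1, k))
        beta[r] = (xty[r] - s) / xtx[r][r]
    return beta

# y = 1 + 2*x1 + 3*x2 exactly, so the fit should recover (1, 2, 3)
rows = [(1, 1), (2, 1), (3, 2), (4, 3), (5, 5), (6, 8)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
print([round(b, 6) for b in multiple_regression(rows, y)])  # [1.0, 2.0, 3.0]
```

In practice one would use a statistics package rather than hand-rolled elimination, but the sketch makes clear that each slope is estimated jointly, holding the other predictors constant.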

9.6 MULTIPLE REGRESSION USING SPSS

In a study on hospital service quality, the researcher classified service quality into the following dimensions: assurance, reliability, service policy, tangibles, problem solving and convenience. Apart from this, he also assessed the overall patients' satisfaction with the services. The following is the description of the hospital service quality dimensions.

Dimension         Number of Items
Assurance
Reliability
Service Policy
Tangibles
Problem Solving
Convenience

He wanted to determine how patients' perception of the service performance of the hospital on the six dimensions of service quality influenced their overall satisfaction.

The Hypothesis
Ho: The variation in patients' overall satisfaction is not explained by the linear model comprising patients' assessment of assurance, reliability, service policy, tangibles, problem solving and convenience (R² = 0).
Ha: A significant portion of the variation in patients' overall satisfaction is explained by the linear model comprising patients' assessment of assurance, reliability, service policy, tangibles, problem solving and convenience (R² ≠ 0).


SPSS Procedures:

1.	Select the Analyze menu.
2.	Click on Regression and then Linear to open the Linear Regression dialogue box.
3.	Select the dependent variable and push it into the Dependent box.
4.	Select the independent variables and push them into the Independent box.
5.	Click Statistics and tick Estimates, Model fit and Descriptive.
6.	Click Continue.
7.	Click on OK.

Results

Refer to Figure 9.5.

Figure 9.5: ANOVA

Since the p-value is less than 0.05, reject the null hypothesis and conclude that a significant portion of the variation in the dependent variable is explained by the linear model.

The next step is the test of significance for the regression slope of every independent (predictor) variable. This is to determine the contribution of each predictor variable independently.


The Hypothesis
H0: The regression slope is equal to zero.
Ha: The regression slope is not equal to zero.

Refer to Figure 9.6.

Figure 9.6: Coefficients

Overall satisfaction is linearly related to patients' perception of assurance, reliability and convenience. The regression equation for this relationship is:

Overall satisfaction = 0.638 + 0.204 (Assurance) + 0.263 (Reliability) + 0.382 (Convenience) + Error
Model Summary

Model   R       R Square   Adjusted R Square   R Square Change   Durbin-Watson
1       .790a   .624       .619                .624              1.952

a. Predictors: (Constant), Convenience, Assurance, Reliability
b. Dependent Variable: Overall satisfaction

Figure 9.7: Model Summary

Copyright Open University Malaysia (OUM)

TOPIC 9 LINEAR REGRESSION

151

Refer to Figure 9.7. The adjusted R² is 0.619, meaning that 61.9% of the variation in the overall satisfaction is attributed to the combined variation in patients' perception of the assurance, reliability and convenience of services provided by the hospital.
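Adjusted R² penalises R² for the number of predictors: adj R² = 1 - (1 - R²)(n - 1)/(n - p - 1). A quick sketch; the sample size n below is hypothetical, as the module does not report n for the hospital study:

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for a model with p predictors fitted to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# With R^2 = 0.624 and three retained predictors, a hypothetical n = 100 gives:
print(round(adjusted_r_squared(0.624, 100, 3), 3))  # 0.612
```

The penalty shrinks as n grows, which is why with the study's large sample the adjusted value (.619) sits close to the raw R² (.624).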
ACTIVITY 9.1

A researcher conducted a study which aimed to determine the relationship between self-efficacy and academic performance in geography. A 20-item self-efficacy scale and a 25-item geography test were administered to a group of 12 students. The following data are the results of the study:

Self-Efficacy Score   Geography Test Score
15                    22
13                    17
14                    20
12                    18
16                    23
12                    21
11                    19
17                    24
15                    19
13                    16

(a)	Compute the Coefficient of Determination.
(b)	Interpret the Coefficient of Determination.
(c)	Plot a Scatter Plot for the data and find the best fitting line.
(d)	Determine the Regression Equation and explain it.
(e)	Predict the marks for the Geography test if the Self-Efficacy Score is 18.


•	The linear relationship between two variables is evaluated from two aspects: the strength of the relationship (correlation) and the cause-effect association (regression).

•	In statistics, correlation is used to denote association between two quantitative variables, assuming that the association is linear.

•	Linear regression is a technique to establish the cause-effect relationship between two variables. If the two variables are related, the changes in one will lead to some changes in the corresponding variable. If the researcher can identify the cause and effect variable, the relationship can be represented in the form of the following equation:

	Y = a + bX

	where Y is the dependent variable, X is the independent variable, and a and b are two constants to be estimated.

Intercept
Linear regression
Multiple regression
Regression coefficient
Regression equation
Significance test for slope
Slope


Topic 10  Non-parametric Tests

LEARNING OUTCOMES
By the end of this topic, you should be able to:

1.	Identify the differences between parametric and non-parametric tests;

2.	Explain the concept of the chi-square, Mann-Whitney and Kruskal-Wallis tests;

3.	Discuss the procedure in using the chi-square, Mann-Whitney and Kruskal-Wallis tests; and

4.	Interpret SPSS outputs on chi-square, Mann-Whitney and Kruskal-Wallis tests.

INTRODUCTION

This topic provides a brief explanation of parametric and non-parametric tests. Detailed descriptions are given of the chi-square, Mann-Whitney and Kruskal-Wallis tests. Besides that, the assumptions underlying these statistical techniques are provided to facilitate student learning. The topic demonstrates how non-parametric statistical procedures can be computed using formulae as well as SPSS, and how the statistical results should be interpreted.

10.1 PARAMETRIC VERSUS NON-PARAMETRIC TESTS

Descriptive statistics are used to compute summary statistics (e.g. mean, median,
standard deviation) to describe the samples, while statistical tests are used for
making inference from sample to the intended population. The following diagram
in Figure 10.1 illustrates this.

Figure 10.1: Descriptive Statistics and Statistical Tests

There are two categories of statistical tests:

(i)	The parametric test
(ii)	The non-parametric test

The parametric, or distribution-constrained, test is a statistical test that requires the distribution of the population to be specified. Thus, parametric inferential methods assume that the distributions of the variables being assessed belong to some known probability distribution (e.g. the assumption that the observed data are sampled from a normal distribution).

In contrast, for a non-parametric test (also known as a distribution-free test) the distribution is not specified prior to the research but is instead determined from the data. Thus, this family of tests does not require assumptions about the distribution. Most commonly used non-parametric tests rank the outcome variable from low to high and then analyse the ranks rather than the actual observations.

Choosing the right test will contribute to the validity of the research findings. Improper use of statistical tests will not only cause the validity of the test results to be questioned and do little justice to the research, but at times it can be a serious error, especially if the results have major implications, for example, when they are used in policy formulation.

Parametric tests have greater statistical power than their non-parametric equivalents. However, parametric tests cannot be used all the time. Instead, they should be used only if the researcher is sure that the data are sampled from a population that follows a normal distribution (at least approximately).


On the other hand, non-parametric tests should be used if:

•	The outcome is a rank (e.g. brand preference);
•	The scores in the population are not normally distributed; or
•	There is a significant number of outliers.

Sometimes, it is not easy to decide whether a sample comes from a normal population. The following clues can be used to make decisions on normality:

•	Construct a histogram with a normal curve overlaid; it will be fairly obvious whether the distribution is approximately bell-shaped.

•	For a large data set, use the Kolmogorov-Smirnov test (sample > 100) or the Shapiro-Wilk test (sample < 100) to test whether the distribution of the data differs significantly from normal. These tests can be found in most statistical software.

•	Examine the literature; what matters is the distribution of the overall population, not the distribution of the sample. In deciding whether a population is normal, look at all available data, not just the data in the current experiment.

•	When in doubt, use a non-parametric test; you may have less statistical power but at least the result is valid.

Sample size plays a crucial role in deciding the family of statistical tests:
parametric or non-parametric. In a large sample, the central limit theorem ensures
that parametric tests work well even if the population is not normal. Parametric
tests are robust to deviations from normal distributions, when the sample size is
large. The issue here is how large is large enough; a rule of thumb suggests that a
sample size of about 30 or more for each category of observation is sufficient to
use the parametric test. The non-parametric tests also work well with large
samples. The non-parametric tests are only slightly less powerful than parametric
tests with large samples.
On the other hand, if the sample size is small we cannot rely on the central limit
theorem; thus, the p value may be inaccurate if the parametric tests were to be
used. The non-parametric test suffers greater loss of statistical power with small
sample size. Table 10.1 summarises some of the commonly used parametric and
non-parametric tests but not all of them are explained in this module.


Table 10.1: Commonly Used Parametric and Non-parametric Tests

PARAMETRIC
Requirements:
•	Random sampling
•	Large sample size
•	Level of measurement at least interval
•	Population parameter is normally distributed

One-sample tests:
•	Z-test for population proportion
•	Z-test for population mean
•	t-test for population mean

Two-sample tests:
•	Z-test for equality of two proportions
•	t-test for population mean
•	Paired t-test

Tests involving more than two groups:
•	One-way ANOVA

NON-PARAMETRIC
Requirements:
•	Random sampling
•	Small sample size (less than 30)
•	Level of measurement can be lower than interval
•	Distribution of the population parameter is not important

One-sample tests:
•	χ² goodness-of-fit test
•	Sign test for population median
•	Sign test for population mean

Two-sample tests:
•	χ² test for differences between two populations
•	Fisher's exact test
•	McNemar's test
•	χ² test of independence
•	Wilcoxon signed rank test
•	Mann-Whitney U test

Tests involving more than two groups:
•	χ² test for differences between more than two populations
•	Cochran test
•	χ² test of independence
•	Friedman's test
•	Kruskal-Wallis rank sum test


10.2 CHI-SQUARE TESTS


In some situations, you need to use non-parametric statistics because the variables
measured are not intervals or ratios but are categorical such as religion, ethnic
origin, socioeconomic class, political preference and so forth. To examine
hypotheses using such variables, the chi-square test has been widely used. In this
section, we will discuss these popular non-parametric tests called the CHISQUARE (pronounced as kai-square) and denoted by this symbol: 2 .
(a)	Assumptions
	Even though certain assumptions are not critical for using the chi-square, you need to address a number of generic assumptions:

	•	Random sampling: Observations should be randomly sampled from the population of all possible observations.

	•	Independence of observations: Each observation should be generated by a different subject and no subject is counted twice. In other words, each subject should appear in only one group and the groups are not related in any way.

	•	Size of expected frequencies: When the number of cells is less than 10 and particularly when the total sample size is small, the lowest expected frequency required for a chi-square test is 5.

(b)	Types of Chi-Square Tests
	We will discuss the use of the chi-square for:

	1.	One-variable χ² (goodness-of-fit test): used when we have one variable only.

	2.	χ² test for independence (2 x 2): used when we are looking for an association between two variables, each with two levels.

10.2.1 One-Variable χ² or Goodness-of-Fit Test

This test enables us to find out whether a set of Obtained (or Observed)
Frequencies differs from a set of Expected Frequencies. Usually the Expected
Frequencies are the ones that we expect to find if the null hypothesis is true. We
compare our Observed Frequencies with the Expected Frequencies and see how
good the fit is.

Copyright Open University Malaysia (OUM)

158 TOPIC 10 NON-PARAMETRIC TESTS

Example:
A sample of 110 teenagers was asked which of four hand phone brands they preferred. The number of people choosing the different brands is recorded in Table 10.2.

Table 10.2: Preferences for Brands of Hand Phones

Brand A        Brand B        Brand C        Brand D
20 teenagers   60 teenagers   10 teenagers   20 teenagers

We want to find out if one or more brands are preferred over others. If they are
not, then we should expect roughly the same number of people in each category.
There will not be exactly the same number of people in each category, but they
should be near equal.
Another way of saying this is: If the null hypothesis is TRUE, and some brands
are not preferred more than others, then all brands should be equally represented.
We expect roughly EQUAL NUMBERS IN EACH CATEGORY, if the NULL
HYPOTHESIS is TRUE.
Expected Frequencies
There are 110 people, and there are four categories. If the null hypothesis is true,
then we should expect 110 / 4 = 27.5 teenagers to be in each category. This is
because, if all brands of hand phones are equally popular, we would expect
roughly equal numbers of people in each category. In other words, the number of
teenagers should be evenly distributed among the four brands.

The numbers that we find in the four categories, if the null hypothesis is true
are called the EXPECTED FREQUENCIES (i.e. all brands are equally
popular).

The numbers that we find in the four categories are called the OBSERVED
FREQUENCIES (i.e. based on the data we collected).

See Table 10.3. What χ² does is compare the Observed Frequencies with the Expected Frequencies.

•	If all brands of hand phones are equally popular, the Observed Frequencies will not differ from the Expected Frequencies.

Copyright Open University Malaysia (OUM)

TOPIC 10 NON-PARAMETRIC TESTS

159

•	If the Observed Frequencies differ greatly from the Expected Frequencies, then it is likely that the four brands of hand phones are not equally popular.

Table 10.3 shows the observed and expected frequencies for the four brands of hand phones. It is often difficult to tell just by looking at the data, which is why you have to use the χ² test.
Table 10.3: Expected and Observed Frequencies and the Differences

Column 1   Column 2       Column 3       Column 4           Column 5    Column 6
           Observed (O)   Expected (E)   Difference (O-E)   (O-E)²      (O-E)²/E
Brand A    20             27.5           -7.5               56.25       2.05
Brand B    60             27.5           32.5               1056.25     38.41
Brand C    10             27.5           -17.5              306.25      11.14
Brand D    20             27.5           -7.5               56.25       2.05
TOTAL                                                                   53.65

HOW DO YOU DETERMINE IF THE OBSERVED AND EXPECTED FREQUENCIES ARE SIMILAR?

Step 1:
Calculate the differences between the Expected Frequencies and Observed Frequencies (see Column 4). Do not worry about the plus and minus signs!

Step 2:
Square the differences (see Column 5) to obtain the absolute value of the difference.

Step 3:
Divide the squared difference by the measure of variance (see Column 6). The measure of variance is the Expected Frequency (i.e. 27.5). For Brand A, it is 56.25 / 27.5 = 2.05; do the same for the other brands.

Step 4:
Add up the figures you obtained in Column 6 and you get 53.65. So the χ² is 53.65.

The formula for the χ² which you computed above is:

χ² = Σ [(observed frequency - expected frequency)² / expected frequency]

Step 5:
The degrees of freedom (DF) is one less than the number of categories. In this case, DF = 4 categories - 1 = 3. We need to know this, for it is usual to report the DF along with the χ² and the associated probability level.
SPSS Output

Hand phones
Chi-Square    53.65a
df            3
Asymp. Sig.   .0000

a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 27.5.

The χ² value of 53.65 (rounded to 53.6) is compared with the value that would be expected for a χ² with 3 DF if the null hypothesis were true (i.e. all brands of hand phones are preferred equally). [SPSS will compute this comparison.] The SPSS output shows that with a χ² value of 53.6 the associated probability value is 0.0001. This means that the probability that this difference was due to chance is very small. We can conclude that there is a significant difference between the Observed and Expected Frequencies; i.e. the four brands of hand phones are not equally popular. More people prefer Brand B (60) than the other hand phone brands.
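Steps 1 to 4 above condense to a few lines of code. A sketch reproducing the hand-phone calculation (the hand-worked total of 53.65 differs slightly because each term in Table 10.3 was rounded to two decimals):

```python
def chi_square_gof(observed, expected=None):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories.
    If no expected counts are given, assume equal frequencies."""
    if expected is None:
        e = sum(observed) / len(observed)   # e.g. 110 / 4 = 27.5
        expected = [e] * len(observed)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi2 = chi_square_gof([20, 60, 10, 20])   # 110 teenagers, 4 brands
df = 4 - 1                                 # categories minus one
print(round(chi2, 2), df)  # 53.64 3
```

The statistic would then be compared against the χ² distribution with 3 degrees of freedom, exactly as SPSS does in the output above.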


SPSS PROCEDURES FOR THE CHI-SQUARE TEST FOR GOODNESS OF FIT

1.	Select the Data menu.
2.	Click on Weight Cases to open the Weight Cases dialogue box.
3.	Click on the Weight cases by radio button.
4.	Select the variable you require and click on the right arrow button to move the variable into the Frequency Variable: box.
5.	Click on OK. The message Weight On should appear on the status bar at the bottom of the application window.
6.	Select the Analyze menu.
7.	Click on Nonparametric Tests and then Chi-Square to open the Chi-Square Test dialogue box.
8.	Select the variable you require and click on the right arrow button to move the variable into the Test Variable List: box.
9.	Click on OK.

10.2.2 χ² Test for Independence: 2 x 2

Chi-square (χ²) enables you to discover whether there is a relationship or association between two categorical variables. For example, is there an association between students who smoke cigarettes and those who do not, and students who are active in sports and those who are not? This is categorical data, because we are asking whether they smoke or do not smoke (not how many cigarettes they smoke), and whether they are active or not active in sports. The design of the study is shown in Table 10.4, which is called a contingency table; it is a 2 x 2 table because there are two rows and two columns.

Table 10.4: 2 x 2 Contingency Table

                       Smoke   Do not Smoke
Not Active in Sports   50      15
Active in Sports       20      25


Example
A researcher is interested in finding out whether male students from high-income or low-income families get into trouble more often in school. Table 10.5 shows the frequencies of male students from low- and high-income families who have discipline problems in school:

Table 10.5: Observed Frequencies

              Discipline Problems   No Discipline Problems   Total
Low Income    46                    71                       117
High Income   37                    83                       120
Total         83                    154                      237

To examine statistically whether boys get into trouble in school more often, we need to frame the question in terms of hypotheses.

Step 1: Establish Hypotheses
The first step of the chi-square test for independence is to establish hypotheses. The null hypothesis is that the two variables are independent, or, in this particular case, that the likelihood of getting into discipline problems is the same for high-income and low-income students. The alternative hypothesis to be tested is that the likelihood of getting into discipline problems is not the same for high-income and low-income students.

It is important to keep in mind that the chi-square test only tests whether two variables are independent. It cannot address questions of which is greater or less. Using the chi-square test, we cannot directly evaluate the hypothesis that low-income students get into trouble more than high-income students; rather, the test (strictly speaking) can only assess whether the two variables are independent or not.
Step 2: Calculate the Expected Value for Each Cell of the Table
As with the goodness-of-fit example described earlier, the key idea of the chi-square test for independence is a comparison of observed and expected values. How many observations were expected in each category, and how many were actually observed? In the case of tabular data, however, we usually do not know what the distribution should look like. Rather, in this use of the chi-square test, expected values are calculated based on the row and column totals from the table.


The expected value for each cell of the table can be calculated using the following formula:

    Expected frequency = (Row total × Column total) / Total for table

For example, in the table comparing high-income and low-income students involved in disciplinary problems, the expected count for the number of low-income students with discipline problems is:

    Expected Frequency (E1) = (117 × 83) / 237 = 40.97

Similarly, for high-income students with no discipline problems:

    Expected Frequency (E4) = (120 × 154) / 237 = 77.97

Use the formula to compute the expected frequencies E2 and E3. Table 10.6 shows the completed expected frequencies for all four cells.
Table 10.6: Observed and Expected Frequencies

|             | Discipline Problems | No Discipline Problems | Total |
| Low Income  | O = 46, E1 = 40.97  | O = 71, E2 = 76.03     | 117   |
| High Income | O = 37, E3 = 42.03  | O = 83, E4 = 77.97     | 120   |
| Total       | 83                  | 154                    | 237   |
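The row-total × column-total / grand-total rule applied above can be computed for every cell at once as an outer product. A minimal NumPy sketch (an illustration, not part of the SPSS workflow) using the observed counts from Table 10.5:

```python
import numpy as np

# Observed frequencies from Table 10.5
# rows: low income, high income; columns: discipline, no discipline
observed = np.array([[46, 71],
                     [37, 83]])

row_totals = observed.sum(axis=1)   # [117, 120]
col_totals = observed.sum(axis=0)   # [ 83, 154]
grand_total = observed.sum()        # 237

# E_ij = (row total i) * (column total j) / grand total
expected = np.outer(row_totals, col_totals) / grand_total
print(np.round(expected, 2))  # [[40.97 76.03] [42.03 77.97]]
```

The printed matrix matches the four expected frequencies E1 to E4 in Table 10.6.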

Step 3: Calculate the Chi-square Statistic

With these sets of figures, we calculate the chi-square statistic as follows:

    χ² = Σ (Observed Frequency − Expected Frequency)² / Expected Frequency


In the example above, we get a chi-square statistic equal to:

    χ² = (46 − 40.97)²/40.97 + (37 − 42.03)²/42.03 + (71 − 76.03)²/76.03 + (83 − 77.97)²/77.97
    χ² ≈ 1.87

Step 4: Assess the Significance Level

(a) Degrees of Freedom
Before we can proceed, we need to know how many degrees of freedom we have. For a contingency table, a simple rule is that the degrees of freedom equal (Number of columns − 1) × (Number of rows − 1), not counting the totals for rows or columns.

For our data, this gives (2 − 1) × (2 − 1) = 1.


(b)

Statistical Significance

We now have our chi-square statistic (2 = 1.87), our predetermined


alpha level of significance (0.05), and our degrees of freedom (df =1).
Refer to the chi square distribution table with 1 degree of freedom and
reading along the row, we find our value of 2 = 1.87 is below 3.841 (see
Table 10.7).

When the computed 2 statistic is less than the critical value in the table
for a 0.05 probability level, then we DO NOT reject the null hypothesis
of equal distributions.

Since our 2 = 1.87 statistic is less than the critical value for 0.05
probability level (3.841) we DO NOT reject the null hypothesis and
conclude that students from low income families are NOT
SIGNIFICANTLY more likely to have discipline problems than students
from high income families.
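Steps 2 to 4 can be verified end to end with Python's scipy.stats.chi2_contingency (an illustration outside SPSS). Note that for 2 × 2 tables SciPy applies Yates' continuity correction by default, so correction=False is needed to reproduce the hand-computed χ² of 1.87.

```python
from scipy.stats import chi2_contingency

# Observed frequencies from Table 10.5
observed = [[46, 71],
            [37, 83]]

# correction=False disables Yates' continuity correction so the result
# matches the hand calculation above
chi2, p, dof, expected = chi2_contingency(observed, correction=False)

print(round(chi2, 2))  # 1.87
print(dof)             # 1
print(round(p, 3))     # about 0.17; p > 0.05, so do not reject the null
```

The function also returns the matrix of expected frequencies, matching Table 10.6.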


Table 10.7: Extract from the Table of χ² Critical Values

Probability Level (alpha)

| df | 0.50  | 0.10  | 0.05   | 0.02   | 0.01   | 0.001  |
| 1  | 0.455 | 2.706 | 3.841  | 5.412  | 6.635  | 10.827 |
| 2  | 1.386 | 4.605 | 5.991  | 7.824  | 9.210  | 13.815 |
| 3  | 2.366 | 6.251 | 7.815  | 9.837  | 11.345 | 16.268 |
| 4  | 3.357 | 7.779 | 9.488  | 11.668 | 13.277 | 18.465 |
| 5  | 4.351 | 9.236 | 11.070 | 13.388 | 15.086 | 20.517 |

Note:

The 2 × 2 contingency table can be extended to larger tables such as 3 × 2 or 4 × 3, depending on the number of categories in the independent and dependent variables. The formulae and the computation procedure are similar to those of the 2 × 2 contingency table.
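The critical values in Table 10.7 are quantiles of the χ² distribution, so they can also be looked up programmatically instead of from a printed table; a short sketch with scipy.stats.chi2:

```python
from scipy.stats import chi2

# Critical value for alpha = 0.05 with df = 1: the point with 5% of the
# distribution to its right, i.e. the 95th percentile
print(round(chi2.ppf(0.95, df=1), 3))  # 3.841, as in Table 10.7

# A computed statistic of 1.87 is below this critical value,
# so the null hypothesis is not rejected at the 0.05 level
print(chi2.ppf(0.95, df=1) > 1.87)     # True
```

Changing the alpha level or the degrees of freedom reproduces any other entry in the table, e.g. chi2.ppf(0.99, 2) gives 9.210.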

SPSS PROCEDURES FOR THE CHI-SQUARE TEST FOR RELATEDNESS OR INDEPENDENCE

Select the Analyze menu.

Click on Descriptive Statistics and then on Crosstabs to open the Crosstabs dialogue box.

Select a row variable and click on the right arrow button to move the variable into the Row(s): box.

Select a column variable and click on the right arrow button to move the variable into the Column(s): box.

Click on the Statistics command push button to open the Crosstabs: Statistics sub-dialogue box.


Copyright Open University Malaysia (OUM)

166 TOPIC 10 NON-PARAMETRIC TESTS

Click on the Chi-square box.

Click on Continue.

Click on the Cells command push button to open the Crosstabs: Cell Display sub-dialogue box.

In the Counts box, click on the Observed and Expected check boxes.

In the Percentages box, click on the Row, Column and Total check boxes.

Click on Continue and then OK.

ACTIVITY 10.1

Look at the following table. What is the value of the expected frequencies?

| Age group | 10-14 years | 15-19 years | 20-24 years | 25-29 years |
| Observed  | 72          | 31          | 15          | 50          |

ACTIVITY 10.2

A study was conducted to determine if science and mathematics should be taught in English. A total of 105 parents were asked to respond Yes or No. The data (shown in the following table) were categorised according to whether they were from an urban or rural area:

|       | Yes | No | Total |
| Urban | 36  | 14 | 50    |
| Rural | 30  | 25 | 55    |
| Total | 66  | 39 | 105   |


Questions:

(a) What is the null hypothesis? What is the alternative hypothesis?

(b) How many degrees of freedom are there?

(c) What is the value of the chi-square statistic for this table?

(d) What is the p-value of this statistic?

10.3 THE MANN-WHITNEY U TEST

The Mann-Whitney U test is used to compare the differences between two groups of samples from unrelated populations. This test uses the median as the parameter for comparison. The Mann-Whitney U test is applied when the sample size is small (fewer than 30 per group) and/or when the level of measurement is ordinal. Refer to Figure 10.2.

Figure 10.2: Mann-Whitney U Test

The Mann-Whitney U test tests for a significant difference between two independent groups. This test requires the dependent variable to be measured at the ordinal level. For example, it can compare the IQ scores of males and females (the IQ score is considered an ordinal-level measurement because an individual with an IQ score of 100 is not twice as intelligent as one with a score of 50). The Mann-Whitney U test is also used for interval data when the sample size is small.

Requirements for the test:

Parameter to be tested: Median

Normality: No assumption of normality

Sample characteristics: Unrelated samples

Sample size: Small


Test statistic: T = S − n1(n1 + 1)/2, where S is the sum of ranks of Population 1 and n1 is the sample size of Population 1. Population 1 is the population with the smaller sum of ranks.

The Mann-Whitney test uses the rank sum as the test statistic. The procedure is as follows:

The two independent samples are combined and ranks are assigned to the scores (they can be mean scores).

The rank sum of Population 1 (usually the population of interest, decided based on the null hypothesis) is computed.

This rank sum is then used to compute the test statistic.

Some crucial assumptions of the Mann-Whitney test:

The data consist of random samples of observations from two unrelated populations with unknown medians.

The two samples are independent.

The variable observed is a continuous random variable (usually a mean).

The distribution functions of the two populations differ only with respect to location, if they differ at all.
Example:

In assessing the effect of TV advertisements on buyers' brand preference, a simple experiment was carried out. A group of adults was selected to participate in this experiment. One group was subjected to behaviour modification psychotherapy using a series of television advertisements, while another formed the control group. 17 adults were given the treatment, while 10 others did not receive any treatment. After the treatment period, both the experimental and the control group were rated for their brand preference using the brand preference scale. Refer to Figure 10.3.


Figure 10.3: Processes in the experiment

The results of the experiment can be seen in Table 10.8 below.

Table 10.8: Brand Preference Scores

| BMP (treatment) | Ctrl |
| 11.9            | 6.6  |
| 11.7            | 5.8  |
| 9.5             | 5.4  |
| 9.4             | 5.1  |
| 8.7             | 5.0  |
| 8.2             | 4.3  |
| 7.7             | 3.9  |
| 7.4             | 3.3  |
| 7.4             | 2.4  |
| 7.1             | 1.7  |
| 6.9             |      |
| 6.8             |      |
| 6.3             |      |
| 5.0             |      |
| 4.2             |      |
| 4.1             |      |
| 2.2             |      |

We wish to know whether these data provide sufficient evidence to indicate that
behaviour modification psychotherapy using TV advertisements improves the
brand preference among adult shoppers.
The Hypothesis

Ho: There is no difference in the brand preference between the group that
received behaviour modification therapy and the control group.
Ha: There is a difference in the brand preference between the group that received
behaviour modification therapy and the control group.
The level of significance is set at 0.05 (α = 0.05). Table 10.9 presents the results of the analysis of the brand preference scores of the treatment and control groups.


Table 10.9: Results of the Analysis (Brand Preference Scores and Ranks)

| BMP  | Rank | Ctrl | Rank |
| 11.9 | 27   | 6.6  | 15   |
| 11.7 | 26   | 5.8  | 13   |
| 9.5  | 25   | 5.4  | 12   |
| 9.4  | 24   | 5.1  | 11   |
| 8.7  | 23   | 5.0  | 9.5  |
| 8.2  | 22   | 4.3  | 8    |
| 7.7  | 21   | 3.9  | 5    |
| 7.4  | 19.5 | 3.3  | 4    |
| 7.4  | 19.5 | 2.4  | 3    |
| 7.1  | 18   | 1.7  | 1    |
| 6.9  | 17   |      |      |
| 6.8  | 16   |      |      |
| 6.3  | 14   |      |      |
| 5.0  | 9.5  |      |      |
| 4.2  | 7    |      |      |
| 4.1  | 6    |      |      |
| 2.2  | 2    |      |      |

The scores are ranked by arranging all the scores from both groups in ascending order. A rank of 1 is given to the smallest score, and tied scores share the average of their ranks.

    T = S − n1(n1 + 1)/2 = 81.5 − 10(10 + 1)/2 = 81.5 − 55 = 26.5

    p = 0.003

An example of the SPSS output of the Mann-Whitney test is shown in Figure 10.4.

Figure 10.4: SPSS Output of the Mann-Whitney Test

| Group | N  | Mean Rank | Sum of Ranks |
| BMP   | 17 | 17.44     | 296.5        |
| Ctrl  | 10 | 8.15      | 81.5         |

Since the p-value is smaller than 0.05, we reject the null hypothesis in favour of the alternative hypothesis: there is a difference in brand preference between the group that received behaviour modification therapy and the group that did not. From the mean ranks, it is evident that the brand preference score of the group that received behaviour modification therapy is higher. In other words, behaviour modification psychotherapy using TV advertisements enhances brand preference among adults.
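The brand-preference example can be replicated outside SPSS with Python's scipy.stats.mannwhitneyu (a sketch for illustration). Recent SciPy versions report U for the first sample: U1 = 296.5 − 17(18)/2 = 143.5; the companion statistic for the control group is 17 × 10 − 143.5 = 26.5.

```python
from scipy.stats import mannwhitneyu

# Brand preference scores from Table 10.8
bmp = [11.9, 11.7, 9.5, 9.4, 8.7, 8.2, 7.7, 7.4, 7.4, 7.1,
       6.9, 6.8, 6.3, 5.0, 4.2, 4.1, 2.2]                    # treatment, n = 17
ctrl = [6.6, 5.8, 5.4, 5.1, 5.0, 4.3, 3.9, 3.3, 2.4, 1.7]    # control, n = 10

res = mannwhitneyu(bmp, ctrl, alternative="two-sided")

print(res.statistic)         # 143.5 with SciPy >= 1.7 (U for the first sample)
print(round(res.pvalue, 3))  # well below 0.05: reject the null hypothesis
```

Because the data contain ties (two 5.0s and two 7.4s), SciPy uses the tie-corrected asymptotic method here, giving a p-value around 0.003, in line with the hand calculation.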
Example: Mann-Whitney Test Using SPSS

The Mann-Whitney test can also be used to compare the ratings of two distinct groups (e.g. male and female) on a particular phenomenon. In a service quality survey carried out at the Kuching General Hospital, the researcher gauged the knowledge of hospital staff using a specially designed questionnaire. He would like to test whether the knowledge levels of male and female staff are similar or differ significantly. Table 10.10 provides the mean score and standard deviation of respondents' assessments of the knowledge of the hospital staff.
Table 10.10: Hospital Staff Knowledge

|        | N  | Mean | Std. Deviation | Minimum | Maximum |
| Male   | 24 | 4.58 | 1.213          |         |         |
| Female | 31 | 5.00 | 1.065          |         |         |

The Hypotheses

Ho: There is no difference between the male and female hospital staff's knowledge.

Ha: There is a significant difference between the male and female hospital staff's knowledge.


SPSS Command
SPSS PROCEDURES FOR THE MANN-WHITNEY TEST

Open the Analyze menu.

Select Nonparametric Tests.

Select 2 Independent Samples.

Select the variable you require and click on the right arrow button to move the variable into the Test Variable List: box.

Move gender into the Grouping Variable: box.

Click on OK.

An example of the SPSS results:

Ranks (Knowledge of hospital staff)

| Gender | N  | Mean Rank | Sum of Ranks |
| Male   | 24 | 25.46     | 611.00       |
| Female | 31 | 29.97     | 929.00       |
| Total  | 55 |           |              |

Test Statistics (Grouping Variable: gender)

| Mann-Whitney U         | 311.000 |
| Wilcoxon W             | 611.000 |
| Z                      | -1.099  |
| Asymp. Sig. (2-tailed) | .272    |

Decision: The p-value (.272) is greater than 0.05, so we do not reject the null hypothesis; there is not enough evidence to conclude the alternative. There is no significant difference between the male and female hospital staff's knowledge. Any difference observed could be due to chance.


10.4 THE KRUSKAL-WALLIS RANK SUM TEST

The Kruskal-Wallis test serves the same purpose as the one-way ANOVA, comparing the differences among more than two groups of samples from unrelated populations. This test is less stringent than the ANOVA and uses the median as the parameter for comparison. The Kruskal-Wallis test is used when the sample size is small (fewer than 30 per group) and/or when the level of measurement is ordinal. Refer to Figure 10.5.

Figure 10.5: The Kruskal-Wallis Test

The Kruskal-Wallis test tests for significant differences among independent groups (if the number of independent groups is two, the appropriate test is the Mann-Whitney U test). This test requires the dependent variable to be measured at the ordinal level. For example, it can compare the IQ scores of Malay, Chinese and Indian youths (the IQ score is considered an ordinal-level measurement because an individual with an IQ score of 100 is not twice as intelligent as one with a score of 50). The Kruskal-Wallis test is also used for interval data when the sample size is small.
Requirements for the test:

Parameter to be tested: Median

Normality: No assumption of normality

Sample size: Small

Sample characteristics: Unrelated samples

Recommended test: Kruskal-Wallis test


Test statistic:

    H = 12 / (N(N + 1)) × Σ (Ri² / ni) − 3(N + 1), summing over i = 1, …, k, where

    N = total sample size of all the groups
    ni = sample size of each group
    Ri = rank sum of each group
The procedure for the Kruskal-Wallis test is as follows:

The independent samples are combined and ranks are assigned to the scores (they can be mean scores).

The rank sums of the different populations are computed.

These rank sums are then used to compute the test statistic.

Some crucial assumptions of the Kruskal-Wallis test:

The data consist of k random samples of sizes n1, n2, …, nk.

The samples are independent.

The variable observed is a continuous random variable (usually a mean).

The populations are identical except for a possible difference in location for at least one population.

Example:

In studying the average amount spent on mobile phone usage, a researcher collected the average monthly mobile phone bills from three groups of adults: clerical staff, supervisors and managers. Table 10.11 presents the data.

Table 10.11: Average Monthly Expenditure on Mobile Phone Bill

| Clerical   | 257 302 206 318 449 334 299 149 282 351 |
| Supervisor | 460 496 450 350 463 357                 |
| Manager    | 338 767 202 833 632                     |

Objective:

To determine whether there is any difference in the average monthly mobile phone expenditure among the three populations.


The Hypotheses

Ho: There is no difference in the average monthly expenditure on mobile phone usage among clerks, supervisors and managers.

H1: There are differences in the average monthly expenditure on mobile phone usage among clerks, supervisors and managers.

The level of significance is set at 0.05 (α = 0.05). Table 10.12 shows the results of the analysis.
Table 10.12: Results of the Analysis

| Clerk | 257 | 302 | 206 | 318 | 449 | 334 | 299 | 149 | 282 | 351 |              |
| Rank  | 4   | 7   | 3   | 8   | 14  | 9   | 6   | 1   | 5   | 12  | Sum of ranks = 69 |

| Supervisor | 460 | 496 | 450 | 350 | 463 | 357 |              |
| Rank       | 16  | 18  | 15  | 11  | 17  | 13  | Sum of ranks = 90 |

| Manager | 338 | 767 | 202 | 833 | 632 |              |
| Rank    | 10  | 20  | 2   | 21  | 19  | Sum of ranks = 72 |

The Kruskal-Wallis statistic is computed using the formula:

    H = 12 / (N(N + 1)) × Σ (Ri² / ni) − 3(N + 1)
      = 12 / (21 × 22) × (69²/10 + 90²/6 + 72²/5) − 3(21 + 1)
      = 8.36


SPSS Output

Refer to Table 10.13.

Table 10.13: SPSS Output

Ranks (Average monthly expenditure on mobile phone bill)

| Group      | N  | Mean Rank |
| Clerk      | 10 | 6.90      |
| Supervisor | 6  | 15.00     |
| Manager    | 5  | 15.67     |
| Total      | 21 |           |

Test Statistics (Average monthly expenditure)

| Chi-Square  | 8.361 |
| df          | 2     |
| Asymp. Sig. | 0.015 |

The Kruskal-Wallis χ² value is 8.361 and the p-value is 0.015. Since the p-value is smaller than 0.05, we reject the null hypothesis in favour of the alternative: the average monthly expenditure on mobile phone usage is not the same among the three groups. Even though the test statistic does not tell us where the differences lie, judging from the mean ranks, clerks spend the least compared to supervisors and managers.
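The same analysis can be reproduced outside SPSS with Python's scipy.stats.kruskal (an illustrative sketch), which returns the H statistic (reported by SPSS as a chi-square) and its p-value:

```python
from scipy.stats import kruskal

# Average monthly mobile phone bills from Table 10.11
clerk      = [257, 302, 206, 318, 449, 334, 299, 149, 282, 351]
supervisor = [460, 496, 450, 350, 463, 357]
manager    = [338, 767, 202, 833, 632]

h, p = kruskal(clerk, supervisor, manager)

print(round(h, 3))  # 8.361, matching the SPSS chi-square value
print(round(p, 3))  # 0.015 < 0.05, so reject the null hypothesis
```

With no tied scores in these data, the tie correction has no effect, so the hand-computed H of 8.36 agrees with the output.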
Example: Kruskal-Wallis Test Using SPSS

With reference to the hospital service quality survey, the management wanted to see how respondents' employment influenced their assessment of the knowledge of hospital staff. Respondents were grouped into three categories of employment (public, private and students), while knowledge of hospital staff was rated on a five-point scale (assumed ordinal). The hospital administrator wanted to know who gave better ratings: public sector employees, private sector employees or students.


The Hypotheses

Ho: There is no difference in the assessment of hospital staff knowledge among public sector employees, private sector employees, and students.

H1: There are differences in the assessment of hospital staff knowledge among public sector employees, private sector employees, and students.
SPSS Command
SPSS PROCEDURES FOR THE KRUSKAL-WALLIS TEST

Open the Analyze menu.

Select Nonparametric Tests.

Select K Independent Samples.

Select the variable you require and click on the right arrow button to move the variable into the Test Variable List: box.

Move the independent variable into the Grouping Variable: box.

Click Define Range and define the groups.

Tick Kruskal-Wallis H for Test Type.

Click on OK.


Results

Ranks (Knowledge of staff: assessment before attending seminar)

| Employment | N  | Mean Rank |
| Government | 1  | 18.00     |
| Private    | 5  | 9.60      |
| Students   | 17 | 12.35     |
| Total      | 23 |           |

Test Statistics (Kruskal-Wallis Test; Grouping Variable: Employment)

| Chi-Square  | 1.694 |
| df          | 2     |
| Asymp. Sig. | .429  |

Since the p-value is 0.429, which is greater than 0.05, there is no difference in the assessment of hospital staff knowledge among public sector employees, private sector employees, and students.

ACTIVITY 10.3

The following data summarise students' PASS or FAIL results in a mathematics test on fractions and the method used to teach the concept:

| Group    | Pass | Fail |
| Method X | 21   |      |
| Method Y | 29   |      |

(a) Determine the expected frequencies and the degrees of freedom.

(b) Formulate the hypotheses to test whether performance in the mathematics test is associated with the teaching method.

(c) Compute the chi-square statistic and state your conclusion.


There are two categories of statistical tests: (i) parametric and (ii) non-parametric tests.

Parametric (distribution-constrained) tests are statistical tests that require the distribution of the population to be specified.

Parametric inferential methods assume that the distribution of the variables being assessed belongs to some known probability distribution.

Among the commonly used non-parametric tests are the chi-square test, the Mann-Whitney test and the Kruskal-Wallis test.

The chi-square test tests for significant differences in proportions and is very useful when the variable measured is nominal.

The chi-square test is very flexible and is mainly used in two forms: (i) comparing observed proportions with some known values, and (ii) comparing the difference in the distribution of proportions between two groups, whereby each group can have two or more categories.

Thus, even though the chi-square test is often used with a 2 × 2 contingency table, it can be extended to an n × m table.

The Mann-Whitney U test is used to compare the differences between two


groups of samples from unrelated populations. It uses the median as the
parameter for comparisons and the test is used when the sample size is small
(less than 30 per group) and/or when the level of measurement is ordinal.

The Kruskal-Wallis test serves the same purpose as the one way ANOVA,
comparing the differences between more than two groups of samples from
unrelated populations. This test uses the median as the parameter for
comparisons.

The Kruskal-Wallis test is used when the sample size is small and/or when the
level of measurement is ordinal.


Chi-square test
Contingency table
Degree of freedom
Kruskal-Wallis test

Mann-Whitney test
Mean rank
Non-parametric
Parametric


APPENDIX


Appendix A
Creating an SPSS Data File
After you have developed your questionnaire, you need to create an SPSS data file to enable you to enter data in a format which can be read by SPSS. You can do this via the SPSS Data Editor, which is built into the SPSS package. When creating an SPSS data file, the items/questions in your questionnaire have to be translated into variables. For example, suppose you have a question "What is your occupation?" with several response options such as 1. Salesman 2. Clerk 3. Teacher 4. Accountant 5. Others; what you need to do is translate your question into a variable name, perhaps called occu. In the context of SPSS data entry, these response options are called value labels; for example, Salesman is assigned a value label of 1, Clerk 2, Teacher 3, Accountant 4 and Others 5. If the respondent is a teacher, you enter 3 when inputting data into the variable occu in your data file. Sometimes you may have a question which requires the respondent to answer in absolute terms, such as "Your annual salary is _________". In this case, you can create a variable name called salary. Since this variable only requires the respondent to state his/her salary, you do not need to create response options; just enter the actual salary figure.
When defining the variable name, you have to consider the following:

(i) It can only have a maximum of 8 characters (however, SPSS version 12.0 and above allows up to 64 characters);

(ii) It must begin with a letter;

(iii) It cannot end with a full stop or underscore;

(iv) It must be unique, i.e. no duplication is allowed;

(v) It cannot include blanks or special characters such as !, ? and *.

When defining a variable name, an uppercase character does not differ from a
lower case character.
Besides understanding the variable name convention and value labels, you will also need to know other variable definitions such as variable label, variable type, missing values, column format and measurement level. A variable label describes the variable name; for example, if the variable name is occu, the variable label can be "Respondent's occupation". You need not specify the variable label if you do not wish to, but variable labels improve the interpretability of your output, especially if you have many variables. Missing values can also be assigned to a variable. It is

rare for one to obtain a questionnaire without any item being left blank. By convention, a missing value is usually assigned a value of 9, but for statistical analysis it would be preferable to fill the missing values with a value equivalent to the mean of the variable. However, this can only be done for interval or ratio level variables. For example, if you have the variable income and data were derived from 150 respondents of whom 20 did not provide their income information, then compute the mean income via SPSS for the 130 respondents who responded and recode all missing values as that computed mean.
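The mean-imputation step described above can be sketched in Python with pandas (used here purely as an illustration; the module itself works in SPSS). The column name income and the figures are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical income data with two missing responses (NaN)
df = pd.DataFrame({"income": [2500, 3200, np.nan, 4100, np.nan, 2800]})

# Compute the mean from the respondents who answered (NaN is ignored),
# then recode the missing values as that mean
mean_income = df["income"].mean()
df["income"] = df["income"].fillna(mean_income)

print(mean_income)                # 3150.0
print(df["income"].isna().sum())  # 0: no missing values remain
```

Note that, as the text warns, this kind of imputation only makes sense for interval or ratio variables, not for nominal codes.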
The type of variable relates closely to your items in the questionnaire. For example, the item age is a numeric variable, meaning you input the variable using only numbers; if a person's age is 34, you type 34 under the age variable column for that particular case. However, sometimes there is a need to use alphanumeric characters to input data into a variable. A good example is the respondent's address. In this case, alphanumeric characters constitute what is called a string variable type. For example, a short open-ended question would be "Please state your address." The respondent will write his/her address using alphanumeric characters, such as 23 Jalan SS2/75, 47301 Petaling Jaya, Selangor. This address is a combination of letters and numbers.
The column format in the data editor allows you to specify the alignment of your data in a column, for example left, centre or right. Measurement in the SPSS variable definition convention differs slightly from that used in statistics textbooks, as SPSS uses scale to refer to both interval and ratio measurement. Ordinal and nominal levels of measurement are maintained as they are. In statistical analysis, it is extremely important to know what the level of measurement for a particular variable is. A nominal variable (also called a categorical variable) classifies persons or objects into two or more categories; for example, the variable gender is coded as 1 for Male and 2 for Female, and marital status as 1 for Single, 2 for Married and 3 for Divorced. Numbering in nominal variables does not indicate that one category is higher or better than another: coding 1 for Male and 2 for Female does not mean that male is lower than female by virtue of the number being smaller. In nominal measurement the numbers are only labels. On the other hand, an ordinal variable not only classifies persons or objects; it also ranks them in terms of degree. Ordinal variables put persons or objects in order from highest to lowest or from most to least. In an ordinal scale, the intervals between ranks are not equal; for example, the difference between rank 1 and rank 2 is not necessarily the same as the difference between rank 2 and rank 3. A person (A) with a height of 5'10" who falls under rank 1, a person (B) with a height of 5'5" who is ranked 2, and another person (C) with a height of 4'8" who is ranked 3 are not separated by equal intervals. The difference in height among the three

persons is not equal but there is an order, i.e. A is taller than B and B is taller
than C.
Interval variables have all the characteristics of nominal and ordinal variables but also have equal intervals. For example, an achievement test score is treated as an interval variable: the difference between a score of 50 and a score of 60 is essentially the same as the difference between a score of 80 and a score of 90. Interval scales, however, do not have a true zero point. Thus, if Ahmad has a score of 0 for Mathematics it does not mean he has no knowledge of mathematics at all, nor does Muthu scoring 100 mean he has total knowledge of Mathematics. If a person scores 90 marks, we know he scores twice as high as one who scores 45, but we cannot say that a person scoring 90 knows twice as much as a person scoring 45.
Ratio variables are the highest, most precise level of measurement. This type of variable has all the properties of the other types of variables above. In addition, it has a true zero point. For example, consider a person's height: a person who is 6 feet tall is twice as tall as a person who is 3 feet tall. A person who weighs 50 kg is one third the weight of another who is 150 kg. Since ratio scales encompass mostly physical measures, they are not used very often in social science research.
In SPSS, interval and ratio measurements are classified as scale variables.
Nominal and ordinal measurements remain as they are, i.e. nominal and ordinal
variables respectively.
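SPSS's value labels have a direct analogue in pandas, where integer codes can be mapped to nominal categories. A small hypothetical sketch (the codes and labels mirror the gender example above, but this is an illustration, not part of the SPSS workflow):

```python
import pandas as pd

# Coded data as it would be entered in the SPSS Data Editor:
# 1 = Male, 2 = Female (nominal: the numbers are only labels)
gender_codes = pd.Series([1, 2, 2, 1, 2])

# Attach the value labels, producing a nominal (categorical) variable
value_labels = {1: "Male", 2: "Female"}
gender = gender_codes.map(value_labels).astype("category")

print(gender.tolist())                  # ['Male', 'Female', 'Female', 'Male', 'Female']
print(gender.value_counts().to_dict())  # {'Female': 3, 'Male': 2}
```

Treating the variable as categorical, rather than numeric, prevents meaningless arithmetic (such as averaging gender codes) in later analysis.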
A good understanding of the level of measurement will be useful when defining the variables via the SPSS Data Editor and in the data analysis process. But before you proceed to the next phase of data analysis, you need to enter data in a format which can be read by SPSS. There are several ways to do this, using (i) the SPSS Data Editor, (ii) Excel, (iii) Access or (iv) Word. The steps to enter data via the SPSS Data Editor are described below.
How to Define Variables and Enter Data Using the SPSS Data Editor

Steps
1. Click Start, then All Programs, then SPSS for Windows, then SPSS 12.0 for Windows; select Type in data, click OK, and click Variable View. Start defining your variables by specifying the following:

   (a) Name: Type Gender <Enter>
   (b) Type: Select Numeric, then OK
   (c) Width: 8
   (d) Decimal: 0
   (e) Label: Respondent's gender
   (f) Values: Under Value, type 1; under Value Label, type Male; click Add
   (g) Under Value again, type 2; under Value Label, type Female
   (h) Click Add
   (i) Missing: No missing values, then OK
   (j) Columns: 8
   (k) Align: Right
   (l) Measure: Nominal

2. Proceed to define the second variable and so forth until you have completed all the variables in your questionnaire. Do note that certain variables, such as ID, do not have value labels. If you are not sure what the level of measurement for a particular variable is, you may want to keep the default, which is Scale. If the variable you are defining shares the same specification (such as the variable label) as a variable you have already defined, you may simply copy it into the relevant cells.

3. After you have completed defining all your variables, the next step is to enter data into the data cells by doing the following:

   (a) Click Data View.
   (b) Click row 1, column 1 (note the variable name as shown).
   (c) Type in the data, e.g. if the respondent's gender is male, type 1 and then proceed to the next variable by pressing the right arrow key on your keyboard.
   (d) Input the next variable and so on until you have completed all your data input.


MODULE FEEDBACK
MAKLUM BALAS MODUL

If you have any comment or feedback, you are welcome to:


1.

E-mail your comment or feedback to modulefeedback@oum.edu.my

OR
2.

Fill in the Print Module online evaluation form available on myVLE.

Thank you.
Centre for Instructional Design and Technology
(Pusat Reka Bentuk Pengajaran dan Teknologi)
Tel No.:

03-27732578

Fax No.:

03-26978702
