
Unit I

DESCRIPTIVE STATISTICS

INTRODUCTION

An old Chinese proverb says:

“To guess is cheap,
To guess wrongly is expensive.”

Many times in life, we are forced into the position of making a guess, but as
the Chinese proverb implies, a blind guess is not the best solution. Important
decisions often involve many complicated factors, and sometimes a complete analysis
of those factors is not practical or even possible. This course covers statistical
methods that can help anyone make the best “educated guess”.

OBJECTIVES

In this unit, you will learn:

1. Meaning of descriptive statistics
2. Frequency distribution, measures of central tendency, and other position
measures.
3. Measures of dispersion or variability with graphs and other diagrammatic
presentation.
4. The normal distribution and sampling designs.

1 | Unit I - D e s c r i p t i v e S t a t i s t i c s
There are seven modules in Unit I. I advise you to start studying Module 1 and
follow this with the succeeding modules, namely:
Module 1 – Overview of Descriptive Statistics
Module 2 – Frequency Distribution
Module 3 – Measures of Central Tendency
Module 4 – Measures of Dispersion or Variability
Module 5 – Graphs and Other Diagrammatic Presentation
Module 6 – The Normal Distribution
Module 7 – Applications of Descriptive Statistics

Please read one module at a time, let the lesson sink in, and
review the text before working on the exercises. Be sure to do the exercises; they
will sharpen your skills and, hopefully, lead you to mastery of descriptive
statistics.

1
Introduction to Statistics

INTRODUCTION
Welcome to the world of statistics! You are about to encounter numbers, tables,
names, graphs, probabilities, and trends; in other words, everything that statistics is about.

The module will teach you what descriptive statistics is all about. Statistics is an
orderly science; hence it can be understood easily. A conceptual understanding of
the statistical procedures used in nursing as well as the computational skills to carry
out these procedures is given in this module. At the end of the module, some
activities and exercises are given. Please do the activities and answer the questions
because they will enhance your mastery of the lesson. Approach this module with an
open and positive mind. You will like statistics because it is a very useful course.

OBJECTIVES

At the end of this module, you will be able to:

1. Discuss the science of statistics;
2. Explain the fundamental elements of statistics;
3. Explain the role of statistics in critical thinking in nursing situations.

1.1 THE SCIENCE OF STATISTICS

Statistics is the science of data. It is a meaningful and useful science with a broad
scope of application to nursing and other health sciences, to government, to
business, and to other physical and biopsychosocial sciences. What about
you? What comes to mind when you think of statistics? Does it bring to mind
unemployment figures, election returns, or basketball scores? Or is it simply a
graduate course requirement you have to complete?

Statistics is logical. It has a key role in critical thinking in the classroom, in the
hospital, on the job, or in everyday life. Thus, the time you spend in studying the
subject will repay you in many ways later.

Each of us has a built-in system of reference that helps us make decisions.
However, we also have a built-in set of prejudices that may affect our decisions. One
definite advantage of statistics is that it can help us make decisions without
prejudice. Moreover, statistics can be used for making decisions when faced with
uncertainties. For example, suppose you want to estimate the proportion of
nurses enrolled in this course who will finish the course on time. You
would need statistics to predict the number of those who will finish versus those
who will not.

The general prerequisite for statistical decision-making is the gathering of
numerical facts or information. Procedures for evaluating numerical data, together
with rules of inference, are prime topics in the study of statistics.

In this line of work, statisticians are trained in collecting, evaluating, and drawing
conclusions from numerical information. More importantly, statisticians determine
what information is relevant in a given problem and whether the conclusions drawn
from the study are to be trusted.

Statistical methods by themselves have no power to work miracles; however, these
methods can help us make some decisions. Furthermore, the statistical results
should be interpreted by one who understands not only the methods but also the
subject matter, especially the conceptual or theoretical framework to which
statistics have been applied.

Thus, statistics is the science of data that involves collecting, classifying,
summarizing, organizing, analyzing, and interpreting numerical information or data.

1.2 THE FUNDAMENTAL ELEMENTS OF STATISTICS

1.2.1 Population and Sample

Statistical methods are useful for studying, analyzing, and learning about
populations. A population is a set of units, such as people, objects, transactions, or
events, that we are interested in studying. For example, populations may include:

1. People
1.1 all Filipino women working in foreign countries
1.2 all registered nurses in the Philippines
1.3 everyone who is enrolled in nursing in the WCC Antipolo.

2. Objects
2.1 all theses and dissertations done in 1998
2.2 all stores selling Filipino products
2.3 all shoes manufactured in Marikina

3. Transactions
3.1 all memos of agreement signed by the WCC Antipolo administration in
1998
3.2 all sales of Jollibee foods delivered to the WCC College of Nursing from
Antipolo branch in January-February 1999
3.3 all promotions of the WCC Antipolo faculty in 1997

4. Events
4.1 all victims of fireworks accidents brought to PGH emergency room in
December 1998 and January 1999
4.2 all birthday celebrations of graduating students in April 1999
4.3 all births registered at all Manila hospitals on February 14, 1999

In the above examples, you will notice that each set includes all the units in the
population.

1.2.2 Variables and Sample

According to McClave and Sincich (1997), it is possible to measure a characteristic
for every unit in the population if the population you wish to study is small. For
example, if you are measuring the high school GPA of all incoming first year
students at WCC Antipolo, it is feasible to obtain these data. When we measure a
characteristic for every unit of a population, the result is a census of the
population.

Oftentimes it is not feasible to study the entire population. For instance, how would
you measure the weight and height of every 5-year-old boy in the Philippines? For
such a population, conducting a census would be prohibitively time-consuming and
very costly. A reasonable alternative is to select and study a subset or a portion of
the population.

A sample is a subset of a population. It is a finite number of units selected from the
population. Thus, a sample is simply a part of the population. But not every sample is
representative of the population. To be representative, the sample must be
selected randomly. A random sample is determined completely by chance.
According to Brase and Brase (1983), in simple random sampling every member
or unit of the population has an equal probability or chance of being included in
the sample.

For example, instead of polling all 139,000 registered nurses in the Philippines
regarding who they voted for during the 1998 presidential election, a pollster can
just randomly select a sample of 1,000 registered nurses to represent all the
registered nurses in the Philippines.
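
The pollster's draw can be sketched in Python. This is only an illustration under stated assumptions: the nurses are represented by hypothetical ID numbers 1 to 139,000 (a real poll would use an actual registry), and the seed value is arbitrary and used only so the draw can be repeated. `random.sample` selects without replacement, so every nurse has the same chance of being included.

```python
import random

POPULATION_SIZE = 139_000  # registered nurses in the example
SAMPLE_SIZE = 1_000        # nurses the pollster will actually poll

# Hypothetical ID numbers standing in for the real list of nurses.
population = range(1, POPULATION_SIZE + 1)

# Fixed seed (an arbitrary choice) so the same sample can be reproduced.
rng = random.Random(1998)

# Draw 1,000 distinct IDs; each ID is equally likely to be chosen.
sample = rng.sample(population, SAMPLE_SIZE)

print(len(sample))       # number of nurses selected
print(len(set(sample)))  # same number: no nurse is selected twice
```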

In studying a population, we focus on one or more characteristics or properties of
the units in the population. Such characteristics are called variables.

A variable is a characteristic or property of an individual population or sample
unit. For example, we may be interested in the variables age, gender, and number
of years of education of the unemployed residents of Manila. The name variable is
derived from the fact that any particular characteristic may vary among the units in
the population or sample.

Let us have some examples.

Example 1

A PhD student in Nursing investigated the number of children per household
in Quezon City. A sample of 500 households in Quezon City was randomly
selected to determine the number of children per family.
a. Describe the population
b. Describe the sample
c. Describe the variable of interest

Solution

a. The population of interest is all the households in Quezon City.
b. The sample includes the 500 households randomly selected by the
investigator.
c. The total number of children per household is the variable of
interest.

Example 2 (adapted from McClave & Sincich, 1997)

“Cola wars” is the popular term for the intense competition between
Coca Cola and Pepsi Cola displayed in their marketing campaigns. Their
campaigns have featured movie and television stars, rock videos, athletic
endorsements, and claims of consumer preference based on taste tests.
Suppose, as part of a Pepsi marketing campaign, 1,000 cola consumers are
given a blind taste test (i.e. a taste test in which the two brand names are
disguised). Each consumer is asked to state a preference for brand A
or brand B.
a. Describe the population
b. Describe the sample
c. Describe the variable of interest
Solution
a. The population of interest is the collection or set of all cola consumers.
b. The sample is the 1,000 consumers selected from the population
of all cola consumers.
c. The characteristic that Pepsi wants to measure is the consumer’s
cola preference.

1.2.3 Measurement

Statistics can be applied in the analysis of a variable if the variable can be
represented numerically. We do this through the process of measurement.
Measurement is the process we use to assign numbers to variables of individual
population units. For example, we can measure the teaching performance of a
faculty member by asking all his/her students to rate his/her performance on a
scale from 1 to 10. Or, we can measure research assistants’ ages by simply asking
them their actual age. To gather data for a variable, we can use either quantitative
measurements or qualitative measurements.

Quantitative measurements use a naturally occurring numerical scale to describe
the size of a particular observation.

Examples:
1. The temperature (in degrees Celsius) at which 20 pieces of heat-resistant
plastic begin to melt.
2. The current unemployment rate (measured as a percentage) for each
province and city of the Philippines.
3. The NMAT scores of a sample of 150 medical school applicants, with the test
administered nationwide.
4. The number of master’s students who successfully finished the degree over a
ten-year period.

Qualitative measurements involve classification of observations into categories.

Examples:
1. The political party affiliation (Lakas-NUCD, Laban, Peoples’ Party, Masang
Makabayan, or Independent) of 100 voters from Parañaque.
2. The academic status (pass or fail) on the comprehensive exam of 20
doctoral students.
3. The size of the refrigerators (big, medium, small) rented by each of a
sample of 30 transient boarders.
4. A taste tester’s ranking (best, worst, average) of four brands of salad
dressing for a panel of 10 testers.

After the variables of interest for every unit in the sample or population are
measured, the data are analyzed either by descriptive or inferential statistical
methods.

Descriptive statistics utilizes numerical and graphical methods to look for patterns
in a data set, to summarize the information revealed in a data set, and to present
that information in a convenient form.

Inferential statistics utilizes sample data to make estimates, decisions, predictions,
or other generalizations about a population. In this unit, we will focus only on
descriptive statistics.

Let us now pause for some activities and exercises. Compare your responses with
the answers given at the end of this module. Do not skip these exercise questions;
they are important.

SAQ 1-1
Define statistics. Why is it a science?

SAQ 1-2

Differentiate between descriptive statistics and inferential statistics.

SAQ 1-3
What guideline should we follow in interpreting statistical results?

SAQ 1-4
Chemical and manufacturing plants sometimes discharge toxic-waste materials such
as Chloro-fluorocarbons (CFC) into nearby rivers and streams. These toxins can
adversely affect the plants and animals inhabiting the river and riverbank. The
Philippine Army Corps of Engineers recently conducted a study of fish in Dicayo
River in Zamboanga del Norte and its three tributary creeks: Biniray Creek, Bolarot
Creek, and Matam Creek. A total of 144 fish were captured and the following
variables were measured for each:

1. River/creek where each fish was captured
2. Species (bangus, tulingan, mangsi, and tilapia)
3. Length (centimeters)
4. Weight (grams)
5. Chloro-fluorocarbons (CFC) concentration (parts per million)

Classify each of the variables measured as quantitative or qualitative.

SAQ 1-5

A group of students from UP Manila is concerned about the rising student fees at
universities and colleges nationwide. So the group selected a random sample of 30
colleges and universities throughout the country to obtain information about
their respective student fees.
a. What is the population?
b. What is the sample?

ACTIVITY 1-1

Make a report on these:

1. Read a newspaper and take note of articles or displays using statistics.
2. Go to the library and browse through a journal in a field that interests you.
Note the use of statistics.
3. Next time you watch TV, listen to the ads and see how statistics are used
to convince you to buy a product.

COMMENTS ON ACTIVITY 1-1

The student’s report should reflect the various ways statistics are used to
support reports, such as percentages, frequencies, and averages.

1.3 ROLE OF STATISTICS IN CRITICAL THINKING

As evidenced by media today, there is a need to evaluate the flood of information
reaching our homes. Each day the media present us with published results on
economic, health, social, and other concerns. The growth in data collection
associated with scientific phenomena, business operations, and government
activities (quality control, statistical auditing, forecasting, etc.) has been
remarkable in the 1990s. This scenario demands that each of us develop a
discerning sense, an ability to use rational thought to interpret the meaning of
data. This ability can help us think critically and make intelligent decisions,
inferences, and generalizations. This is possible with the use of statistics.

Statistical thinking involves applying rational thought to assess data and the
inferences made from them critically.

Are you still with me? Let us pause and do some activities.

SAQ 1-6
Pollsters regularly conduct opinion polls to determine the popularity rating of the
current president. Suppose a poll is to be conducted tomorrow in which 2,000
individuals (18 years old and above) will be asked whether the president is doing a
good job in running the country. The 2,000 individuals will be selected by random
digit telephone dialing and asked the question over the phone.

a) What is the relevant population?
b) What is the sample?
c) What is the variable of interest? Is it quantitative or qualitative?
d) How likely is the sample to be representative?

SAQ 1-7

What is statistical thinking?

1.4 SUMMATION NOTATION
In statistics, it is necessary to work with sums of numerical values. To express these,
we make use of standard notation. Let us consider the exam scores of Bertha Pila on
9 statistics exams.

Exam 1 – 88    Exam 4 – 55    Exam 7 – 78
Exam 2 – 6     Exam 5 – 28    Exam 8 – 64
Exam 3 – 46    Exam 6 – 9     Exam 9 – 16

In mathematical notation, the letter X denotes a score in a data set. From Bertha’s
scores, we have the following data:

𝑋1 = score on Exam 1 = 88
𝑋2 = score on Exam 2 = 6
𝑋3 = score on Exam 3 = 46
𝑋4 = score on Exam 4 = 55
𝑋5 = score on Exam 5 = 28
𝑋6 = score on Exam 6 = 9
𝑋7 = score on Exam 7 = 78
𝑋8 = score on Exam 8 = 64
𝑋9 = score on Exam 9 = 16

The numbers 1-9 written beside the Xs are called subscripts. They represent the
first to the 9th observed score in a given data set. In this case, 𝑋1 represents Bertha’s
score on the first exam while 𝑋9 represents her score on the ninth exam. In general,
𝑋𝑖 denotes the ith value in a data set. Using this notation, the sum of Bertha’s exam
scores can be expressed symbolically as:
𝑋1 + 𝑋2 + 𝑋3 + 𝑋4 + 𝑋5 + 𝑋6 + 𝑋7 + 𝑋8 + 𝑋9

But instead of writing down all these Xs, we can simply express this sum as

$$\sum_{i=1}^{9} X_i$$

where the symbol ∑ (Greek capital letter “sigma”) is the summation notation used in
statistics. Thus, $\sum_{i=1}^{9} X_i$ means to get the sum of the first, second, third,
and so on up to the ninth values.

In statistics, we usually compute the total sum and not a partial sum, and so
$\sum_{i=1}^{9} X_i$ can be further simplified to $\sum X$, which means “summation of
all the scores” in a data set.

Applying this now to Bertha’s exam scores:

$$\begin{aligned}
\sum_{i=1}^{9} X_i &= X_1 + X_2 + X_3 + X_4 + X_5 + X_6 + X_7 + X_8 + X_9 \\
&= 88 + 6 + 46 + 55 + 28 + 9 + 78 + 64 + 16 \\
&= 390
\end{aligned}$$
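
As a quick check, the same sum can be sketched in Python. The list name `scores` is only an illustrative choice; the loop mirrors the index i running from 1 to 9, and the built-in `sum` plays the role of ∑X.

```python
# Bertha's nine exam scores, X1 through X9, in exam order.
scores = [88, 6, 46, 55, 28, 9, 78, 64, 16]

# Explicit version: add X_i for i = 1, 2, ..., 9.
total = 0
for i in range(1, len(scores) + 1):
    total += scores[i - 1]  # Python lists start at index 0, so X_i is scores[i - 1]

print(total)        # 390
print(sum(scores))  # 390, the shorthand "summation of all the scores"
```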

Some Rules of Summation

Rule 1: $\sum XY$ is not equal to $\sum X \sum Y$

Example:
    X            Y             XY
    1            4             4
    2            5             10
    3            6             18
$\sum X = 6$   $\sum Y = 15$   $\sum XY = 32$

Steps:
• Multiply each X value by its paired Y value.
• Get $\sum XY$, $\sum X$, and $\sum Y$.
• Check whether $\sum XY$ is equal to $\sum X \sum Y$.

$\sum XY = 32$, while $\sum X \sum Y = (6)(15) = 90$, and $32 \neq 90$.
Therefore, $\sum XY \neq \sum X \sum Y$

Rule 2: $\sum (X + C)$ is not equal to $\sum X + C$, where C is a constant

Example: Let C = 5
    X            X + 5
    6            11
    7            12
    8            13
$\sum X = 21$   $\sum (X + 5) = 36$

Steps:
• Add 5 to each X value.
• Get $\sum X$ and $\sum (X + 5)$.
• Check whether $\sum (X + 5)$ is equal to $\sum X + C$.

$\sum (X + C) = 36$, while $\sum X + C = 21 + 5 = 26$, and $36 \neq 26$.
Therefore, $\sum (X + C) \neq \sum X + C$

Rule 3: $(\sum X)^2$ is not equal to $\sum X^2$

Example:
    X             $X^2$
    2             4
    4             16
    6             36
$\sum X = 12$   $\sum X^2 = 56$

Steps:
• Multiply each X value by itself.
• Get $\sum X$ and $\sum X^2$.
• Check whether $(\sum X)^2$ is equal to $\sum X^2$.

$(\sum X)^2 = (12)^2 = (12)(12) = 144$, while $\sum X^2 = 56$, and $144 \neq 56$.
Therefore, $(\sum X)^2 \neq \sum X^2$
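
The three rules can be checked with a short Python sketch that reuses the example values above. The variable names are illustrative choices. The last line of the Rule 2 part uses the standard identity $\sum (X + C) = \sum X + NC$, which is not stated in the text above but explains where the 36 comes from: the constant is added once for each of the N values.

```python
# Rule 1: the sum of products is not the product of sums.
X = [1, 2, 3]
Y = [4, 5, 6]
sum_xy = sum(x * y for x, y in zip(X, Y))  # 4 + 10 + 18 = 32
product_of_sums = sum(X) * sum(Y)          # (6)(15) = 90
print(sum_xy, product_of_sums)             # 32 90

# Rule 2: adding a constant to every value is not adding it once.
X2 = [6, 7, 8]
C = 5
sum_shifted = sum(x + C for x in X2)       # 11 + 12 + 13 = 36
sum_plus_c = sum(X2) + C                   # 21 + 5 = 26
sum_plus_nc = sum(X2) + len(X2) * C        # 21 + (3)(5) = 36
print(sum_shifted, sum_plus_c, sum_plus_nc)  # 36 26 36

# Rule 3: the square of the sum is not the sum of the squares.
X3 = [2, 4, 6]
square_of_sum = sum(X3) ** 2               # (12)(12) = 144
sum_of_squares = sum(x ** 2 for x in X3)   # 4 + 16 + 36 = 56
print(square_of_sum, sum_of_squares)       # 144 56
```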

1.5 ROUNDING NUMBERS

Numbers with decimals are rounded according to this rule (shown here for rounding
to one decimal place):

• if the first digit to be dropped is 5 or more, as in 791.5601, round up: the number is
rounded to 791.6
• if the first digit to be dropped is less than 5, as in 8230.1410, keep the last retained
digit unchanged: the number is rounded to 8230.1
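
A minimal Python sketch of rounding to one decimal place, assuming the usual "round half up" convention (a dropped digit of 5 or more rounds up). The helper name `round_one_decimal` is hypothetical. The `decimal` module is used because Python's built-in `round` applies round-half-to-even on binary floats, which can differ from the classroom rule in borderline cases. Under this convention, 791.5601 rounds to 791.6 and 8230.1410 rounds to 8230.1.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_one_decimal(text):
    """Round a number given as a string to one decimal place, half up."""
    return Decimal(text).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

print(round_one_decimal("791.5601"))   # 791.6 (first dropped digit is 6)
print(round_one_decimal("8230.1410"))  # 8230.1 (first dropped digit is 4)
```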

SUMMARY

In this module, we saw that statistics is the study of how to collect, organize, analyze,
and interpret numerical information. We investigated some types of problems where
statistics can be used. In these situations, we saw examples of populations and
samples. It is important to remember that the main role of inferential statistics is to
draw conclusions about a population based on information obtained from a sample,
whereas the main role of descriptive statistics is to present or summarize a large
mass of data in a manageable form. We also saw in this module the elements of
statistics, and finally we saw the role of statistics in critical thinking. With all this, let
us cultivate a liking for this course. We shall learn more as we study the other
modules. Keep up the good work of reading your modules. Statistics is a skill, and you
will soon have it.

ASAQs

ASAQ 1-1

Statistics is the science of data. It is a science because it is logical and follows a
discipline. Its usefulness and broad scope apply to health, social, business, and
other fields.

ASAQ 1-2

Descriptive statistics utilizes numerical and graphical methods to look for patterns in
a data set, to summarize the information revealed in a data set, and to present that
information in a convenient form.

Inferential statistics utilizes sample data to make estimates, decisions, predictions, or
other generalizations about a population.

The main difference between these two major statistical methods is that descriptive
statistics mainly describe and present the data whereas inferential statistics uses
the data to make estimates, predictions, and conclusions.

ASAQ 1-3

The main guideline we should follow in interpreting results is that the statistical
result should be interpreted by one who understands not only the method but also
the subject matter, especially the conceptual or theoretical framework. Hence, the
consultation of statisticians should start at the conceptualization of the problem.

ASAQ 1-4

Variables like length, weight, and CFC concentration are quantitative because each is
measured on a numerical scale: length in centimeters, weight in grams, and CFC
concentration in parts per million. In contrast, the river or creek of capture and the
species cannot be measured quantitatively. They can only be classified into
categories: Dicayo River, Biniray Creek, Bolarot Creek, or Matam Creek for the
location, and bangus, tulingan, mangsi, or tilapia for the species. Consequently,
data on location and species are qualitative.

ASAQ 1-5

a) The population is the various colleges and universities of the Philippines.
b) The sample is the 30 randomly selected colleges and universities of the
nation.

ASAQ 1-6

a) The whole population (18 years old and above) of the Philippines
b) The 2,000 individuals who will respond to the phone survey
c) The variable of interest is each respondent’s opinion of the president’s
performance. If the performance of the president is categorized as good or bad, then this
variable is qualitative because it is not measured on a numerical scale.
d) The selected sample is not likely to be representative because individuals
with no telephones will not have the chance to be selected.

ASAQ 1-7

Statistical thinking involves applying rational thought to assess data and the
inferences made from them critically.

2
Frequency Distributions

INTRODUCTION

The initial step in the descriptive process, that is, describing the data and the cases
that are represented by those data, is the organization of otherwise disorganized
information and the condensation of otherwise unmanageably large quantities of
information.

The large mass of data may be organized by creating a frequency distribution table
containing the following components: frequency, percentage, cumulative frequency,
and cumulative percentage. This module discusses ungrouped frequency
distributions first and grouped frequency distributions later.

OBJECTIVES

At the end of this module study, you will:

1. Be familiar with the organization of data according to:
a. Frequency
b. Percentage
c. Cumulative frequency
d. Cumulative percentage
2. Organize a given set of data using the different types of frequency
distributions
3. Discuss the significance of the results obtained from the ungrouped and
grouped frequency distributions.

2.1 UNGROUPED FREQUENCY DISTRIBUTION

Basically, frequency distributions show in tabular form the number of times each score or
category appears in a data set. Scores in their original form are called raw scores or
raw data. Raw scores are usually not arranged in any particular order, thus making it
difficult for readers to see clearly the features of the data. See, for example, Table 2.1,
which lists the raw scores of 40 master’s students in their statistics final
examination for their N-298 class in UP Manila. These scores are not arranged in any
particular order, making it hard to examine clearly how well the students performed as
a group, or how varied the scores are from one student to the next.

TABLE 2.1 Raw Scores on the Statistics Final Examination of Masters’ Students

81 94 90 80 87 80 85 95
83 92 87 70 96 76 87 89
86 79 75 83 84 75 81 81
81 84 70 78 96 94 88 78
80 77 93 87 77 78 79 72

Table 2.2, on the other hand, presents another version of the data in Table 2.1. Notice
that the final examination scores are now arranged in order from highest to lowest
in the first column, labeled X. Frequencies are then listed in the second column,
labeled f, showing how many students received each listed score. When data are
organized this way, we can see at a glance that the scores ranged from a low of 70 to
a high of 96, or that four students had a score of 81 and another four had a score of
87. Such a presentation is called an ungrouped frequency distribution. Ungrouped
frequency distributions begin the process of organizing the data into a meaningful
form. You can incorporate in the ungrouped frequency distribution table columns
for raw score (X), frequency (f), percentage (%), cumulative frequency (cf), and
cumulative percentage (c%).

2.1.1 Frequencies

To determine the frequencies of the scores in the data set, first arrange the raw
scores in ascending or descending order (as shown in Table 2.2). Then, under the f
column, indicate the number of times each score appeared in the data set (see Table
2.1). Notice that the sum of all the frequency values (∑f) is equal to N, the total
number of observations or scores in the data set.
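
The tallying step can be sketched in Python. This is one possible approach, not the only one: `collections.Counter` counts how many times each raw score from Table 2.1 appears, and a score that nobody obtained simply has a count of 0. The variable names are illustrative.

```python
from collections import Counter

# Raw scores copied row by row from Table 2.1.
raw_scores = [
    81, 94, 90, 80, 87, 80, 85, 95,
    83, 92, 87, 70, 96, 76, 87, 89,
    86, 79, 75, 83, 84, 75, 81, 81,
    81, 84, 70, 78, 96, 94, 88, 78,
    80, 77, 93, 87, 77, 78, 79, 72,
]

freq = Counter(raw_scores)  # maps each score X to its frequency f
N = sum(freq.values())      # the frequencies add up to the number of scores

# List X and f from the highest score down to the lowest,
# including scores with a frequency of 0.
for score in range(max(raw_scores), min(raw_scores) - 1, -1):
    print(score, freq[score])

print("N =", N)  # N = 40
```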

TABLE 2.2 Ungrouped Frequency Distribution of the Statistics Final Examination
Scores of 40 Master’s Students

X     f     %      cf    c%
96    2     5.0    40    100.0
95    1     2.5    38    95.0
94    2     5.0    37    92.5
93    1     2.5    35    87.5
92    1     2.5    34    85.0
91    0     0.0    33    82.5
90    1     2.5    33    82.5
89    1     2.5    32    80.0
88    1     2.5    31    77.5
87    4     10.0   30    75.0
86    1     2.5    26    65.0
85    1     2.5    25    62.5
84    2     5.0    24    60.0
83    2     5.0    22    55.0
82    0     0.0    20    50.0
81    4     10.0   20    50.0
80    3     7.5    16    40.0
79    2     5.0    13    32.5
78    3     7.5    11    27.5
77    2     5.0    8     20.0
76    1     2.5    6     15.0
75    2     5.0    5     12.5
74    0     0.0    3     7.5
73    0     0.0    3     7.5
72    1     2.5    3     7.5
71    0     0.0    2     5.0
70    2     5.0    2     5.0
∑f = N = 40

2.1.2 Percentages

The percentage associated with each score can be computed using this equation:

$$\text{Percentage (\%)} = \frac{f}{N} \times 100$$

where f = each score’s frequency of occurrence
N = total number of scores in the distribution

Percentages have one advantage over frequencies. It is often easier to compare two
or more percentages than frequencies. This is particularly true in instances when 2
or more different distributions have different sample sizes.
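
The percentage formula can be sketched as a small Python function. The function name `percentage` is a hypothetical choice, and the two calls use frequencies that appear in Table 2.2 (N = 40).

```python
def percentage(f, n):
    """Return the frequency f as a percentage of the n scores."""
    return 100 * f / n

N = 40  # total number of scores in Table 2.2

print(percentage(4, N))  # 10.0, e.g. the four students who scored 87
print(percentage(1, N))  # 2.5, e.g. the one student who scored 95
```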

2.1.3 Cumulative Frequencies

Cumulative frequencies show the number of cases scoring at or below each listed
score. Cumulative frequencies are determined by adding the frequency listed for a
given score and the frequencies listed for all lower scores.

2.1.4 Cumulative Percentages

Cumulative frequencies become useful when they are converted to cumulative
percentages. A cumulative percentage shows the percentage of cases scoring at or
below each score. Each of these percentages represents the percentile rank of a
particular score. The percentile rank is useful for quickly determining the relative
locations of individual scores. Thus, a score’s percentile rank tells us how high or
how low, how good or how bad a given score is by locating this score relative to the
other scores that were obtained.

The cumulative percentage for any given score is computed using this equation:

$$c\% = \frac{cf}{N} \times 100$$

where cf = the cumulative frequency listed for a score
N = total number of scores in the distribution
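
Putting the four columns together, the following Python sketch builds f, %, cf, and c% for the raw scores in Table 2.1. It is a sketch under one simplifying assumption: only scores that actually occur are listed, so rows with a frequency of 0 are left out. The names `table`, `scores`, and `cum_freq` are illustrative.

```python
from collections import Counter
from itertools import accumulate

# Raw scores copied row by row from Table 2.1.
raw_scores = [
    81, 94, 90, 80, 87, 80, 85, 95,
    83, 92, 87, 70, 96, 76, 87, 89,
    86, 79, 75, 83, 84, 75, 81, 81,
    81, 84, 70, 78, 96, 94, 88, 78,
    80, 77, 93, 87, 77, 78, 79, 72,
]

N = len(raw_scores)
freq = Counter(raw_scores)

# Cumulative frequencies are running totals starting from the lowest score.
scores = sorted(freq)
cum_freq = list(accumulate(freq[s] for s in scores))

# For each score X, store (f, %, cf, c%).
table = {}
for s, cf in zip(scores, cum_freq):
    table[s] = (freq[s], 100 * freq[s] / N, cf, 100 * cf / N)

# Print from the highest score down.
print("  X   f      %   cf     c%")
for s in reversed(scores):
    f, pct, cf, cpct = table[s]
    print(f"{s:>3} {f:>3} {pct:>6.1f} {cf:>4} {cpct:>6.1f}")
```

Read this way, a score of 87 has a cumulative percentage of 75.0, so 75 percent of the class scored 87 or lower.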

ACTIVITY 2-1
Below are scores of 60 students in Mathematics.
19 31 36 26 34 32
44 33 37 39 45 21
24 38 40 42 39 32
43 18 24 32 49 33

33 33 40 24 46 22
29 33 37 30 43 43
26 39 57 30 40 33
25 33 48 39 34 29
29 37 39 35 41 29
23 32 48 28 45 19

a. What is the highest score?
b. What is the lowest score?
c. Construct an ungrouped frequency distribution table with the following
elements: X, f, %, cf, c%.

2.2 GROUPED FREQUENCY DISTRIBUTIONS

It is very tedious to list all individual scores in an ungrouped frequency distribution
table when you have a large number of scores. It is best to present scores in groups
or intervals, thus creating a grouped frequency distribution table. This table
also consists of columns for frequencies, percentages, cumulative frequencies, and
cumulative percentages.

To construct a grouped frequency distribution for the data set in Table 2.1, do the
following steps:

1. Find the range (R).
   R = highest score - lowest score + 1
   Example: R = 96 - 70 + 1 = 27

2. Determine the class width (i) by dividing the range by the desired
   number of class intervals.
   i = R / (number of class intervals)
   Example: i = 27 / 6 = 4.5 or 5
   a. If the series contains fewer than 50 cases, 10 classes or fewer are
      enough.
   b. If the series contains 50 to 100 cases, 10 to 15 classes are enough.
   c. If the series contains more than 100 cases, 15 or more classes are good.

3. List the class intervals, making sure that the lowest and highest
   scores of the data set are included in the bottom and top class
   intervals, respectively.
   Note:
   a. All class intervals must have the same class width.
   b. For the bottom class interval, start with a score or number that is a
      multiple of the class width.
   Example:
   95-99 (96, the highest score, is included in this interval)
   90-94
   85-89
   80-84
   75-79
   70-74 (70, the lowest score, is included in this interval)

4. Determine f, %, cf, and c%. See Table 2.3.

TABLE 2.3 Grouped Frequency Distribution of Statistics Final Exam Scores of 40
Nursing Master’s Students

Class Interval f % cf c%
95-99 3 7.5 40 100.0
90-94 5 12.5 37 92.5
85-89 8 20.0 32 80.0
80-84 11 27.5 24 60.0
75-79 10 25.0 13 32.5
70-74 3 7.5 3 7.5
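
The grouping steps can be sketched in Python for the raw scores in Table 2.1. The class width of 5 and the bottom lower limit of 70 are taken from the worked example; the variable names and the way each score is assigned to its interval are illustrative assumptions, not the only way to do it.

```python
# Raw scores copied row by row from Table 2.1.
raw_scores = [
    81, 94, 90, 80, 87, 80, 85, 95,
    83, 92, 87, 70, 96, 76, 87, 89,
    86, 79, 75, 83, 84, 75, 81, 81,
    81, 84, 70, 78, 96, 94, 88, 78,
    80, 77, 93, 87, 77, 78, 79, 72,
]

WIDTH = 5          # class width from step 2
BOTTOM_LOWER = 70  # lower limit of the bottom interval, a multiple of the width

N = len(raw_scores)

# Tally each score into its class interval (lower limit, upper limit).
counts = {}
for score in raw_scores:
    lower = BOTTOM_LOWER + ((score - BOTTOM_LOWER) // WIDTH) * WIDTH
    interval = (lower, lower + WIDTH - 1)
    counts[interval] = counts.get(interval, 0) + 1

# Build rows of (interval, f, %, cf, c%) from the bottom interval up.
rows = []
running_total = 0
for interval in sorted(counts):
    f = counts[interval]
    running_total += f
    rows.append((interval, f, 100 * f / N, running_total, 100 * running_total / N))

# Print from the top interval down.
for (low, high), f, pct, cf, cpct in reversed(rows):
    print(f"{low}-{high} {f:>3} {pct:>6.1f} {cf:>4} {cpct:>6.1f}")
```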

Comparing Table 2.2 with Table 2.3 shows that the grouped frequency
distribution table lists class intervals while the ungrouped table lists individual
scores. Furthermore, grouped frequency distributions provide a simpler, more
economical description of the data than do ungrouped frequency distributions. By
combining several scores into one class interval, grouped frequency distributions
reduce the total amount of information that must be digested by someone reading the table.

Again, take a look at the class intervals in Table 2.3. Each class interval is bounded
by numbers called real limits or exact limits. For each class interval, there is a lower
and an upper exact limit. Thus, the lower and upper exact limits of the class interval 85-89
are 84.5 and 89.5, respectively. Furthermore, each class interval can be represented
by one value, and that is the midpoint. A midpoint is the middle value in a class
interval. For the class interval 80-84, the midpoint is 82.
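
A small Python sketch of exact limits and midpoints, assuming scores are recorded as whole numbers so that the exact limits lie half a unit beyond the stated limits. The function names are hypothetical.

```python
def exact_limits(lower, upper):
    """Return the lower and upper exact (real) limits of a class interval."""
    return lower - 0.5, upper + 0.5

def midpoint(lower, upper):
    """Return the middle value of a class interval."""
    return (lower + upper) / 2

print(exact_limits(85, 89))  # (84.5, 89.5)
print(midpoint(80, 84))      # 82.0
```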

ACTIVITY 2-2

Construct a grouped frequency distribution table for the data set in Activity 2-1.
Include columns for f, %, cf, c%, exact limits, and midpoints.

SAQ 2-1

Why is it important to have frequency distributions? In how many ways can we
present a data set?

ACTIVITY 2-3

At the World Citi Colleges, College of Nursing, 25 faculty members gave the
following information about the total number of hours they spent in various
committee meetings. The hours are totals for one month.

20 22 18 16 25 15 23
21 22 22 20 23 25 22
20 18 18 22 24 25
25 24 16 25 10

1. Find the longest hours and the shortest hours.
2. Find the range.
3. Construct ungrouped and grouped frequency distribution tables.

SAQ 2-2
What’s the advantage of creating a grouped frequency distribution table over an
ungrouped one?

SUMMARY

This module showed you the importance of arranging data and presenting them in
distribution tables that show the frequency, percentage, cumulative frequency, and
cumulative percentage.

One application of a frequency distribution is that it can give us an idea of how many
students performed below a given passing score. It can give us the picture of how
well or how badly a student performed in a class relative to the scores of the other
students.

In the succeeding modules, you will have more of this frequency distribution theme
presented in graphs, histograms, and other position measures. I wish to encourage
you to go on – statistics is not really hard because it is a science of order and logic.

So, until next time, keep on doing the activities because they will build your
statistical skills.

ASAQs

ASAQ 2-1

Frequency distributions are important in assessing important characteristics of a
large mass of data through the determination of the number of observations that fall
into each score or group of scores. You can present a data set by constructing an
ungrouped or grouped frequency distribution table that incorporates the following
characteristics.

a. Frequency
b. Percentage
c. Cumulative frequency
d. Cumulative percentage

ASAQ 2-2

In a grouped frequency distribution, the rank or order of the numbers is available, as
well as the range and the highest and lowest class limits.

COMMENTS ON ACTIVITY 2-1

a. The highest score is 57.
b. The lowest score is 18.
c. Ungrouped frequency distribution table with the following elements: X, f, %,
cf, c%.

Ungrouped Frequency Distribution of Scores in Mathematics of 60 Students

X     f     %     cf    c%
18    1     2     1     2
19    2     3     3     5
21    1     2     4     7
22    1     2     5     8
23    1     2     6     10
24    3     5     9     15
25    1     2     10    17
26    2     3     12    20
28    1     2     13    22
29    4     7     17    28
30    2     3     19    32
31    1     2     20    33
32    4     7     24    40
33    7     12    31    52
34    2     3     33    55
35    1     2     34    57
36    1     2     35    58
37    3     5     38    63
38    1     2     39    65
39    5     8     44    73
40    3     5     47    78
41    1     2     48    80
42    1     2     49    82
43    3     5     52    87
44    1     2     53    88
45    2     3     55    92
46    1     2     56    93
48    2     3     58    97
49    1     2     59    98
57    1     2     60    100

(c% = cf ÷ N × 100, rounded to the nearest whole number.)

COMMENTS ON ACTIVITY 2-2
Grouped frequency distribution table for the data set in Activity 2-1.

Grouped Frequency Distribution of Scores in Mathematics of 60 Students

Class Interval    f     %     cf    c%
18-20             3     5     3     5
21-23             3     5     6     10
24-26             6     10    12    20
27-29             5     8     17    28
30-32             7     12    24    40
33-35             10    17    34    57
36-38             5     8     39    65
39-41             9     15    48    80
42-44             5     8     53    88
45-47             3     5     56    93
48-50             3     5     59    98
51-53             0     0     59    98
54-56             0     0     59    98
57-59             1     2     60    100

COMMENTS ON ACTIVITY 2-3


1. Longest hours: 25; shortest hours: 10
2. The hours range from 10 to 25.
3. The ungrouped and grouped frequency distribution tables are as follows.

Ungrouped Frequency Distribution of Hours Spent in Committee Meetings

X (hours)    f     %     cf    c%
10           1     4     1     4
15           1     4     2     8
16           2     8     4     16
18           3     12    7     28
20           3     12    10    40
21           1     4     11    44
22           5     20    16    64
23           2     8     18    72
24           2     8     20    80
25           5     20    25    100

Grouped Frequency Distribution of Hours Spent in Committee Meetings

Class Interval    f     %     cf    c%
10-15             2     8     2     8
16-20             8     32    10    40
21-25             15    60    25    100
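
If you want to check an ungrouped frequency distribution table by computer, the columns X, f, %, cf and c% can be rebuilt from the raw data. The sketch below uses the 25 committee-meeting hours from Activity 2-3. The function name `frequency_table` is only illustrative, and the choice to compute c% from the cumulative frequency (rather than by adding up rounded percentages) is an assumption made to avoid accumulating rounding error.

```python
from collections import Counter

# Hours spent in committee meetings by the 25 faculty members (Activity 2-3)
hours = [20, 22, 18, 16, 25, 15, 23,
         21, 22, 22, 20, 23, 25, 22,
         20, 18, 18, 22, 24, 25,
         25, 24, 16, 25, 10]


def frequency_table(data):
    """Return rows of (X, f, %, cf, c%) in ascending order of X."""
    counts = Counter(data)      # f for each distinct score
    n = len(data)               # N, the total number of cases
    rows = []
    cf = 0
    for x in sorted(counts):
        f = counts[x]
        cf += f                 # running (cumulative) frequency
        rows.append((x, f, round(100 * f / n), cf, round(100 * cf / n)))
    return rows


for row in frequency_table(hours):
    print(*row, sep="\t")
```

The last row should always show cf equal to N and c% equal to 100; if it does not, a score has been miscounted.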

3
Measures of Central Tendency

INTRODUCTION

Measures of central tendency are also referred to as “averages”. They indicate where
the center, the middle, or the most typical value of a data set lies.

Why is the concept of measures of central tendency important? Because when data
have been collected, they have to be put into a form that will make it possible to
summarize and interpret them easily. The concept is also important because a
measure of central tendency is one of the first statistics computed for a set of data.

An average is a single figure that stands for or represents a group of figures. For
example, the average contribution made to a certain fund drive to some extent gives
an indication of the amount paid by each contributor. We can also compare the
average maximum or minimum temperature in April with that in December.

In this module, we shall study three measures of central tendency namely, the mean,
median, and mode.

OBJECTIVES

At the end of this module study, you will be able to:

1. Explain the characteristics of different measures of central tendency, namely:
   1.1 mean
   1.2 median
   1.3 mode
   1.4 other position measures
2. Determine the mean, median, and mode of both ungrouped and grouped
   data.
3. Apply the correct concept of measures of central tendency.

3.1 ARITHMETIC MEAN

We shall take up in this section the arithmetic mean or computed average of a
distribution. The mean is the sum of all the values in a data set divided by the total
number of values. The symbol for the mean of a population is μ, while the symbol for
the mean of a sample is $\bar{X}$.

3.1.1 Computation of the Mean for Ungrouped Data

The mean is determined by adding the scores and dividing the sum by the total
number of scores. Symbolically, this is written as:

$$\bar{X} = \frac{\sum X}{n} = \frac{\text{sum of the scores}}{\text{number of scores in the data set}}$$

where $\bar{X}$ = mean
      X = value of each item
      n = number of items
      ∑ = “the sum of”

For a population, the mean is computed as:

$$\mu = \frac{\sum X}{N}$$

where μ = arithmetic mean of a population
      X = value of each item
      N = number of items in the population
      ∑ = “the sum of”

For example, given a sample of scores 3, 5, 7, and 9, compute the mean.

$$\bar{X} = \frac{\sum X}{n} = \frac{3+5+7+9}{4} = \frac{24}{4} = 6$$
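
The same computation can be written as a short Python function. This is a minimal sketch for checking hand computations; the function name `mean` is our own choice, not something the module prescribes.

```python
def mean(scores):
    """Arithmetic mean: the sum of the scores divided by the number of scores."""
    return sum(scores) / len(scores)


sample = [3, 5, 7, 9]
print(mean(sample))  # (3 + 5 + 7 + 9) / 4
```

For the sample above, the function returns 6.0, the same value obtained by hand.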

3.1.2 The Mean from an Ungrouped Frequency Distribution

However, when your data are arranged in an ungrouped frequency distribution
table, the mean is computed as:

$$\bar{X} = \frac{\sum fX}{N}$$

where f = frequency associated with each value of the variable X
      X = value of the variable
      N = total number of cases

In words, each value of X is multiplied by its frequency of occurrence, these products
are then summed, and the sum is divided by the total number of values in the distribution.

To illustrate, look at Table 3.1 showing the different weight measurements of
newborns in the nursery of the Philippine General Hospital (PGH).

Table 3.1 Frequency Distribution of Weight Measurements (in gms) of Newborns
in the Nursery of PGH

X (weight) f fX
3500 1 3500
3400 1 3400
3350 2 6700
3300 2 6600
3250 1 3250
3200 1 3200

To compute the mean, first multiply each frequency by the
corresponding X value, then get ∑fX and divide it by N.

$$\bar{X} = \frac{\sum fX}{N} = \frac{58400}{20} = 2920 \text{ gms}$$

Based on the computation, we can say that the mean weight of the 20 newborns
confined in the nursery of PGH on March 20, 1999 is 2,920 gms.
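
The ∑fX/N procedure can also be sketched in Python. To keep the example fully checkable, it uses the ungrouped distribution of committee-meeting hours given in the comments on Activity 2-3 rather than the newborn weights; the function name `mean_from_frequencies` and the dictionary layout {X: f} are illustrative assumptions.

```python
def mean_from_frequencies(distribution):
    """Mean of an ungrouped frequency distribution given as {X: f}."""
    n = sum(distribution.values())                        # N = sum of f
    total = sum(x * f for x, f in distribution.items())   # sum of fX
    return total / n


# X (hours) : f, from the comments on Activity 2-3
hours = {10: 1, 15: 1, 16: 2, 18: 3, 20: 3,
         21: 1, 22: 5, 23: 2, 24: 2, 25: 5}

print(mean_from_frequencies(hours))  # sum of fX = 521, N = 25
```

Here ∑fX = 521 and N = 25, so the mean is 20.84 hours.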

Let us pause for some exercises

SAQ 3-1

Go back to the scores obtained by Bertha Pila on her statistics exam.

1. What is the mean of the scores?
2. The mean passing mark is 40. Did Bertha pass statistics? How far is her
average from the mean passing mark?

SAQ 3-2

Consider the daily earnings of the employees of a buy and sell firm:
210, 210, 850, 360, 310, 210, 210, 960, 210

1. Construct an ungrouped frequency distribution table which at least includes
information on X and f.
2. Find the mean daily earnings of the employees of the buy and sell firm.

SAQ 3-3

Last summer, six salesmen in a heating and air-conditioning firm sold the following
number of air-conditioning units: 16, 9, 11, 6, 10, and 8. Find the average number of
units sold and show your solution.

3.1.3 Computation of the Mean for Grouped Data

When the number of cases becomes large, the computation of the mean may
become tedious. It is then useful to group the data into categories and to compute
the mean from the resulting frequency distribution. Sometimes, we find data already
given to us in grouped form, and it will be either impossible or impractical to go
back to the original data for purposes of computation. Census data are usually given
in grouped form, for example. We only know that there are a number of persons
aged 0 to 4 or 5 to 9, but the exact age of each individual is unknown.

In computing the mean for a grouped frequency distribution, the formula to use is:

$$\bar{X} = \frac{\sum fm}{N}$$

where f = frequency associated with each class interval
      m = midpoint of each class interval (i.e., the middle value in each class
          interval)
      N = total number of cases

The mean computed by this formula is only an estimate since an exact mean cannot
be computed for a grouped distribution. Generally, the wider the interval width the more
error we can expect from estimating the mean from a grouped frequency
distribution.

Table 3.2 shows the age distribution of patients at St. Henri Hospital.

Table 3.2 Age Distribution of Patients at St. Henri Hospital

Age      Frequency (f)    Midpoint (m)    fm
60-64    14               62              868
55-59    17               57              969
50-54    21               52              1092
45-49    19               47              893
40-44    13               42              546
35-39    30               37              1110
30-34    26               32              832
25-29    15               27              405
20-24    31               22              682
15-19    27               17              459
10-14    29               12              348
5-9      24               7               168
0-4      34               2               68
         ∑f = N = 300                     ∑fm = 8440

To compute the mean:
(1) Determine the midpoint of each class interval. For example, for 60-64:

$$m = \frac{60+64}{2} = 62$$

(2) Multiply each frequency by the corresponding midpoint.
(3) Compute ∑f and ∑fm.
(4) Compute $\bar{X}$:

$$\bar{X} = \frac{\sum fm}{N} = \frac{8440}{300} = 28.13 \text{ or } 28$$

Therefore, the average age of the patients admitted at St. Henri Hospital is 28 years.
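
The four steps above can be sketched in Python using the figures in Table 3.2. The function name `grouped_mean` and the way the table is stored, as a list of ((lower limit, upper limit), frequency) pairs, are assumptions made for this illustration.

```python
# Table 3.2: ((lower limit, upper limit), frequency)
age_table = [((60, 64), 14), ((55, 59), 17), ((50, 54), 21), ((45, 49), 19),
             ((40, 44), 13), ((35, 39), 30), ((30, 34), 26), ((25, 29), 15),
             ((20, 24), 31), ((15, 19), 27), ((10, 14), 29), ((5, 9), 24),
             ((0, 4), 34)]


def grouped_mean(table):
    """Estimate the mean of a grouped frequency distribution as sum(fm) / N."""
    n = sum(f for _, f in table)                                  # N = sum of f
    sum_fm = sum(f * (lo + hi) / 2 for (lo, hi), f in table)      # sum of f * midpoint
    return sum_fm / n


print(round(grouped_mean(age_table), 2))  # 8440 / 300, rounded to two decimals
```

Remember that the result is an estimate: every patient in an interval is treated as if his or her age were equal to the midpoint.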

You must now be familiar with the arithmetic mean. Let’s do the following SAQ.

SAQ 3-4

Mr. Allan Gali, a fabric store manager, is eager to see if the latest patterns for size 12
dresses show a longer hemline than last year’s. If this is so, he can then expect to sell
more fabric since each pattern will call for more material. He took a random sample
of ten dress patterns and measured in inches the length of each pattern from the
neckline to the hemline. The ten dress patterns have the following lengths:

41.5 42 39 44 43.5 45 43 45 42 46

1. Compute n, ∑X, and $\bar{X}$.
2. Last year, the mean length of size 12 dresses was 36 inches. Can Mr. Gali
expect to sell more material per dress this year?

The mean has one major disadvantage; its value can be strongly influenced by
extreme scores. Specifically, the mean is pulled toward the outliers in an
exaggerated fashion. When there are extreme scores in a distribution, it is best to
use other measures of central tendency, such as the median.

3.2 MEDIAN
Let us now study the second indicator of central tendency, the median. The median
of a quantitative data set is the middle number of that set when the measurements
are arranged in ascending or descending order. Once the scores are ordered, the
median is determined by simply counting the scores until you reach the middle
value. Therefore, the median (Md) is the value in the (N + 1)/2 th position in a given
data set, counting either from the top or the bottom of the scale.

1. If N is odd, the median is exactly the middle number.
2. If N is even, the median is the average of the middle two numbers.

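Examples:

N = odd number (scores: 7, 4, 5, 6, 2)
Scores in order: 2, 4, 5, 6, 7
The middle value, 5, is the median.

N = even number (scores: 8, 4, 6, 7)
Scores in order: 4, 6, 7, 8

$$Md = \frac{6+7}{2} = 6.5$$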
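
The two rules for odd and even N can be sketched in Python as follows. The function name `median` is our own, and the second data set (2, 4, 6, 8) is a made-up illustration rather than data from the module. Note that the scores must be put in order before the middle position is located.

```python
def median(values):
    """Return the median: the middle value, or the average of the two middle values."""
    ordered = sorted(values)        # the scores must be arranged in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                  # N is odd: exactly the middle number
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # N is even: average the middle two


print(median([7, 4, 5, 6, 2]))  # odd N
print(median([2, 4, 6, 8]))     # even N (hypothetical scores)
```

For the odd-numbered set the median is 5; for the hypothetical even-numbered set it is the average of 4 and 6, which is 5.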

The median is a positional measure: it depends on the position of the items in the
ordered distribution, not on the size of every individual value. It is not influenced
by the extreme values. Neither the highest nor the lowest value in the distribution
enters into the computation of the median. Thus, when there are extreme values in a
distribution, it is better to compute the median rather than the mean. This is the
advantage of the median over the mean.

In computing the median for a grouped frequency distribution, there is a need to
interpolate to find the exact position of the median. The information needed in
determining the median for a grouped frequency distribution is the frequencies
and the cumulative frequencies.

Table 3.3 shows the age distribution of patients at St. Henri Hospital and their
corresponding frequencies and cumulative frequencies.

Table 3.3 Age Distribution of Patients at St. Henri Hospital

AGE      FREQUENCY (f)    CUMULATIVE FREQUENCY (cf)
60-64    14               300
55-59    17               286
50-54    21               269
45-49    19               248
40-44    13               229
35-39    30               216
30-34    26               186
25-29    15               160
20-24    31               145
15-19    27               114
10-14    29               87
5-9      24               58
0-4      34               34
         ∑f = N = 300

To determine the median of this distribution, use the following formula:

$$Md = L + \left[\frac{N/2 - cf}{f}\right] i$$

where cf = cumulative frequency of the class interval below the class
           interval containing the median
      f = frequency of the interval containing the median
      L = lower exact limit of the interval containing the median
      i = width of the interval containing the median
      N = total number of scores or ∑f

It is important that we first list the cumulative frequencies and then locate the
interval containing the middle value, or the (N/2)th case. Thus, for Table 3.3, 300
divided by 2 is 150, so we are looking for the interval containing the 150th case. Now
under the column for cf, look for the first value that is greater than 150. That value
is 160, and the corresponding interval is 25-29. For this interval, L = 24.5, cf = 145
(the cumulative frequency of the interval below it), f = 15, and i = 5. Then apply the
formula.

$$Md = 24.5 + \left[\frac{150 - 145}{15}\right] 5$$

$$Md = 24.5 + \frac{5(5)}{15}$$

$$Md = 24.5 + \frac{25}{15}$$

$$Md = 24.5 + 1.67$$

$$Md = 26.17 \text{ or } 26$$

Thus, the median age of the patients at St. Henri Hospital is 26, which means that
50% of the patients are 26 years old or younger.
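
The interpolation can be sketched in Python with the figures from Table 3.3. The function name `grouped_median` is illustrative, the class intervals are listed from lowest to highest so that frequencies can be accumulated, and the code assumes whole-number class limits (so that the lower exact limit is the stated limit minus 0.5).

```python
# Table 3.3 in ascending order: ((lower limit, upper limit), frequency)
age_table = [((0, 4), 34), ((5, 9), 24), ((10, 14), 29), ((15, 19), 27),
             ((20, 24), 31), ((25, 29), 15), ((30, 34), 26), ((35, 39), 30),
             ((40, 44), 13), ((45, 49), 19), ((50, 54), 21), ((55, 59), 17),
             ((60, 64), 14)]


def grouped_median(table):
    """Md = L + [(N/2 - cf) / f] * i for a grouped frequency distribution."""
    n = sum(f for _, f in table)
    target = n / 2                       # position of the middle case
    cf_below = 0                         # cf of the interval below the current one
    for (lo, hi), f in table:
        if f > 0 and cf_below + f >= target:
            lower_exact = lo - 0.5       # L, assuming whole-number limits
            width = hi - lo + 1          # i
            return lower_exact + (target - cf_below) / f * width
        cf_below += f
    raise ValueError("empty frequency table")


print(round(grouped_median(age_table), 2))  # 24.5 + (150 - 145) / 15 * 5
```

The loop stops at the interval 25-29, where the cumulative frequency first reaches 150, and returns the same 26.17 obtained by hand.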

SAQ 3-5

Consider the following sample of measurements:

5 7 4 5 20 6 2

1. Calculate the median (Md) of this sample.
2. Eliminate the last measurement (the 2) and calculate the median of the
remaining measurements.
3. Is the median affected by the measurement 20? Why?

In certain situations, the median may be a better measure of central tendency than
the mean. In particular, the median is less sensitive than the mean to extremely large
or small measurements, as shown in SAQ 3-5.

3.3 MODE

The third measure of central tendency is the mode (Mo). By definition, the mode is
the measurement that occurs most frequently in a data set. In an ungrouped frequency
distribution, it is easily identified by merely looking for the score or item which
occurs most frequently.

In the case of grouped frequency distributions, the mode may be estimated as equal to the
midpoint of the class interval showing the highest frequency. However, this value is
only an estimate of the true mode of the distribution. The true mode from grouped data
cannot be computed because information is lost when scores are combined into
class intervals.

Let us look for the mode of the following:

Example 1:

3 4 7 7 7 8 11 11 14 18 19

In this set of numbers, the mode is 7 because it is the most frequently occurring
number.

Example 2:

What is the mode of the following values?

6 6 6 9 9 9 9 12 12 12
12 12 12 15 15 15 15 15 15 21
21 21 35 35

Mode = 12, 15

We have two modes for this set because both 12 and 15 occur 6 times.

The most frequently occurring score is usually somewhere near the center of a
distribution. When this happens, we can say that the mode is a legitimate index of
central tendency. Experience shows that the mode sometimes does not occur near
the center of a distribution and hence we cannot rely on it to accurately reflect the
center of a set of scores. This makes the mode an unreliable measure of central
tendency.

Furthermore, there is no mode in instances when all scores occur with equal
frequency, as in the following:

4 5 6 7 8 9

When there are two modes, the distribution is described as bimodal and when
there are more than 2 modes, the distribution is multimodal.
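
The three situations described in this section (one mode, two or more modes, and no mode) can be sketched in Python. The function name `modes` is our own, and returning an empty list when every score occurs equally often is a convention chosen for this sketch to match the "no mode" case in the text.

```python
from collections import Counter


def modes(values):
    """Return the list of modes; an empty list means there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # If every distinct score occurs equally often, the text says there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


example_1 = [3, 4, 7, 7, 7, 8, 11, 11, 14, 18, 19]
example_2 = [6, 6, 6, 9, 9, 9, 9, 12, 12, 12,
             12, 12, 12, 15, 15, 15, 15, 15, 15, 21,
             21, 21, 35, 35]
no_mode = [4, 5, 6, 7, 8, 9]

print(modes(example_1))  # one mode
print(modes(example_2))  # bimodal
print(modes(no_mode))    # all scores equally frequent
```

A result with two values indicates a bimodal distribution, and a result with more than two values indicates a multimodal one.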

Modes are especially useful to designers, salesmen, business people, producers,
merchants, and others who are in the business of selling products at specific outlets
or markets. These individuals are interested in knowing the most frequently bought
sizes of shirts or shoes, or the most frequently bought flavor of drinks or biscuits. Such
information guides people in planning and making decisions for the production of such
frequently bought commodities.

Let us have some exercises to apply your knowledge about the three measures of
central tendency.

SAQ 3-6
The College of Nursing, UP Manila makes a report to the Finance and Scholarship
Committee about the average credit hour load a full-time student takes. A 12-credit
hour load is the minimum requirement for full-time status. For the same tuition,
students may take up to 21 credit hours. A random sample of 40 students yielded
the following information in credit hours.

17 12 14 17 13 16 18 20 13 12
12 17 16 15 14 12 12 13 17 14
15 12 15 16 12 18 20 19 12 15
18 14 16 17 15 19 12 13 12 15

1. Create an ungrouped frequency distribution table.
2. What is the mode of this distribution? Is it different from the mean and the
median?
3. If the Finance and Scholarship Committee is going to fund the College
according to the average student credit hour load, which of the three
measures of central tendency do you think the Committee should use and
why?

SAQ 3-7

The faculty of the College of Medicine had registered the following weights in
kilograms in March 1999.

74 82 78 72 78 73 78 73 78 72 78 81

Find the mean, median and mode.

Did you get it right? You will find that constant reading and faithfully doing the
exercises will help you come to like statistics.

Here is another activity.

SAQ 3-8

A random sample of 12 people gave their opinions about making age 50 a
compulsory retirement age for government employees. Opinions were given on a
scale of 1-10 where 1 = strongly disagree and 10 = strongly agree. Here is the result of
the survey.

3 1 3 2 3 3 5 5 3 4 4 1

1. What is the mean?
2. What is the median?
3. What is the mode?
4. What is the interpretation?

3.4 COMPARISON OF MEAN, MEDIAN, AND MODE

Which among the three measures of central tendency is the best? To some extent,
the answer depends on the scale used to measure the variable. If the data are
nominal, only the mode is appropriate. If the data are ordinal, both the median and
the mode may be appropriate. All three measures of central tendency may be used if
the data are either interval or ratio.

Characteristics of the mean

The mean is by far the most common and frequently used of the three measures of
central tendency because it is a stable measure to use when sample data are
used to make inferences about populations. It is the point that balances all the
values on either side. A notable feature of the mean from an applied standpoint is
that it is strongly influenced by extreme scores. Specifically, the mean is pulled
toward the extreme scores in an exaggerated fashion. This instability makes the mean
an inappropriate measure of central tendency when the distribution contains extreme
scores or, in the absence of additional information, open-ended intervals. When the
distribution is symmetrical, the mean is the best measure, and it is a useful measure
for inferential statistics.

In choosing the most appropriate measure of central tendency, we should also
consider how the measure is to be used. If we wish to infer from samples to
populations, the mean usually has a distinct advantage. The mean can be
manipulated mathematically in ways that are inappropriate to the median or the
mode. But if the purpose is primarily descriptive, then the measure that best
describes the data should be used.

Let us compare the properties of the mean and the median. The mean uses more
information than the median in the sense that all exact scores are
used in computing the mean, whereas the median only uses the relative position of
the scores. Another important difference is that the mean is affected by extreme
values whereas the median is not.

The important differences between the mean and the median enable us to decide in
most instances which will be the more appropriate. Ordinarily, we desire our
measure to make use of all information available. We somehow have more intuitive
faith in such a measure. Although at this point it is impossible to bolster our faith
with a sound statistical argument, some justification for the preference for the mean
under ordinary circumstances can be given. It turns out that the mean is generally a
more stable measure than the median in the sense that it varies less from sample to
sample. When we turn our attention to inferential statistics, we shall see that we are
ordinarily much more interested in generalizing about a population than we are in a
particular sample. We are well aware of the fact that had another sample been
taken, the results would not have been quite the same. Had a very large number of
samples of the same size been drawn, we would be able to see just how much the
sample means differed among themselves. In other words, the sample medians will
differ from one sample to the next more than will the means. Since, in actuality, we
usually draw only one sample, it is important to know that the measure we use will
give reliable results in that there will be a minimal variability from one sample to
the next. We can therefore state the following rule of thumb: When in doubt, use the
mean in preference to the median.

Because the mean uses all the data, whereas the median does not
depend upon the extreme values, the mean may give very misleading results under
some circumstances. You must keep in mind that in making use of a measure of central
tendency, you are attempting to obtain a simple description of what is typical of your
scores. Thus, whenever a distribution is highly skewed, i.e., whenever there are
considerably more extreme cases in one direction than in the other, the median will
generally be more appropriate than the mean.
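
A small numerical experiment makes the point concrete. The sketch below uses Python's standard `statistics` module; the scores are hypothetical values invented for this illustration, with one extreme case added to an otherwise compact set.

```python
from statistics import mean, median

typical = [20, 22, 25, 27, 30]      # hypothetical scores with no extreme case
with_outlier = typical + [500]      # the same scores plus one extreme value

print("without extreme case:", mean(typical), median(typical))
print("with extreme case:   ", mean(with_outlier), median(with_outlier))
```

Adding the single extreme score moves the mean from 24.8 to 104, while the median moves only from 25 to 26. The median therefore remains a better description of what is typical in the skewed set.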

Another difference is that the computation of the mean requires an interval or ratio
scale. Without an interval or ratio scale, it would be meaningless, of course, to talk
about summing scores. The median, on the other hand, can be used for ordinal as
well as interval or ratio scale. The actual numerical score of the median will be
meaningless unless we have an interval or ratio scale, but it will certainly be
possible to locate the middle score.

3.5 SHAPE OF DISTRIBUTION AND MEASURES OF CENTRAL TENDENCY

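3.5.1 Symmetrical Distribution

3.5.1.1 Normal

The mean, median and mode fall on the same value under the normal curve.

[Figure: normal curve with frequency (f) on the vertical axis; $\bar{X}$, Md, and Mo marked at the same point]

3.5.1.2 Bimodal

[Figure: bimodal curve; labels Mo, $\bar{X}$, and Md]

3.5.1.3 Rectangular

[Figure: rectangular distribution; labels $\bar{X}$, Md, and Mo]

3.5.2 Skewed Distribution

3.5.2.1 Positively skewed

[Figure: positively skewed curve with frequency (f) on the vertical axis; Mo, Md, and $\bar{X}$ marked in that order from left to right]

3.5.2.2 Negatively skewed

[Figure: negatively skewed curve with frequency (f) on the vertical axis]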

SUMMARY

This module has shown us the three measures of central tendency, namely the
mean, the median, and the mode. In the next module, we shall study the measures of
dispersion or variability. The exercises incorporated in the text shall guide you to
develop your skills. Carry on this interest and thank you again.

ASAQs

ASAQ 3-1

1. $\bar{X} = \frac{390}{9} = 43.33$
2. Yes, because the passing mean is 40 and she got 43.33. She is 3.33 points
above the passing mean.

ASAQ 3-2

Salary (X)    Frequency (f)    fX
960           1                960
850           1                850
360           1                360
310           2                620
210           5                1050
              ∑f = N = 10      ∑fX = 3840

$$\bar{X} = \frac{\text{P}3840}{10} = \text{P}384.00$$

The average daily earnings of the employees of the buy and sell firm is P384.00.

ASAQ 3-3

$$\bar{X} = \frac{60}{6} = 10 \text{ units}$$

ASAQ 3-4

1. n = 10, ∑X = 431, $\bar{X}$ = 43.1
2. Last year’s mean length was 36; this year it is 43.1. Thus there is an increase
of 7.1 inches in the mean length of the dresses. We can then expect Mr. Gali to
sell more fabric this year.

ASAQ 3-5

1. Md = 5 (in order: 2, 4, 5, 5, 6, 7, 20; the middle, or fourth, value is 5)
2. Md = 5.5
3. No, because the median considers only the middle value; thus, it is not affected
by extreme values.

ASAQ 3-6

1. 12 12 12 12 12 12 12 12 12 12
13 13 13 13 14 14 14 14 15 15
15 15 15 15 16 16 16 16 17 17
17 17 17 18 18 18 19 19 20 20

2. Mean = 14.975
Median = 15
Mode = 12

3. The mode is 12. It is different from the median. Since the median is higher, the
College will most probably use it and indicate that the average being used is the
median. Why? Because the median is a more stable average than the mode. If more
money is given for a higher credit hour load, then use the higher average, which is the
median, to benefit the students of the College.

ASAQ 3-7

Mean = 917/12 = 76.42
Md = 78
Mo = 78

ASAQ 3-8

1 1 2 3 3 3 3 3 4 4 5 5

1. Mean is: $\frac{37}{12} = 3.083$

2. Median is: $\frac{3+3}{2} = \frac{6}{2} = 3$

3. Mode is: 3 (it has a frequency of 5)

4. Based on the mean, median, and mode analysis, the 12 people have varied
responses, but the general theme is that they are not so enthusiastic about
the too-early retirement age. By mean, people are playing safe by being
neutral in their responses (mean = 3). By median analysis, only 3 are
agreeing to the early retirement age. The mode just shows that there are 5
people who are not sold on the idea of early retirement.

4
Measures of Dispersion or Variability

INTRODUCTION

Module 3 discussed the measures of central tendency. Through those measures, a
given set of data could be described by indicating the points where the items are
centrally located. In terms of distribution, however, we do not know how far or how
close the data are to each other. We need to know further how the observations
spread out from the average. We need a statistical cross-reference.

This cross-reference should be a measure of the variance, or spread, of the data.
Descriptive measures that are used to indicate the amount of variation in a data set
are called measures of dispersion, variability, or spread. When descriptive
statistics are presented, there is usually at least one measure of central tendency
and at least one measure of variability.

A measure of dispersion or variability supplements a measure of central
tendency, giving meaning to the measure of central tendency. The measures of
dispersion or variability indicate the nature or degree of clustering. The more
concentrated the values are about the mean or average, the more meaningful is the
average as a measure of location. There is low variability if the scores tend to
crowd around the same point. On the other hand, if the scores are widely
scattered, the data indicate high variability.

In this module, we will study three measures of dispersion or variability: the range,
the semi-interquartile range and the standard deviation. Furthermore, this module
will also discuss z-scores or standard scores. Take your time and, with a relaxed
mind, study this module well.

OBJECTIVES

At the end of this module study, you will be able to:

1. Determine the variability of scores in terms of:
   1.1 range
   1.2 semi-interquartile range
   1.3 standard deviation
2. Standardize scores.
3. Interpret the computation or results obtained.

4.1 THE RANGE

The range is the simplest index of variability. Described as the distance between the
highest score and the lowest score in the distribution, the range is the least stable
measure of variability because it is completely determined by the two extreme
scores.

The range R is computed as: R = highest score – lowest score + 1

Example 1: Given the following test scores, compute the range.

85 79 86 84 92 97
R = (97 – 79) + 1 = 19

For a grouped distribution, the range is the difference between the midpoints of the
extreme categories plus one. See Example 2 (Table 4.1) for illustration.

Table 4.1 Monthly Salary of Health Personnel at Our Father Hospital

Salary Interval    Frequency (f)    Midpoint (m)
P7001-8000         6                P7500.50
P6001-7000         11               P6500.50
P5001-6000         42               P5500.50
P4001-5000         21               P4500.50
P3001-4000         17               P3500.50
P2001-3000         13               P2500.50
                   N = 110

To compute the range, first get the midpoint of the lowest interval (P2001-3000)
and the midpoint of the highest interval (P7001-8000).
Then, R = P7500.50 – P2500.50 + 1
= P5001
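
Both versions of the range can be sketched in Python. The function names are illustrative only. The first call uses the units sold by the six salesmen in SAQ 3-3, and the second uses the midpoints from Table 4.1; both follow the module's convention of adding 1 to the difference.

```python
def score_range(scores):
    """R = highest score - lowest score + 1 (the convention used in this module)."""
    return max(scores) - min(scores) + 1


def grouped_range(midpoints):
    """Range of a grouped distribution: difference of the extreme midpoints, plus one."""
    return max(midpoints) - min(midpoints) + 1


units_sold = [16, 9, 11, 6, 10, 8]     # SAQ 3-3
salary_midpoints = [7500.50, 6500.50, 5500.50, 4500.50, 3500.50, 2500.50]  # Table 4.1

print(score_range(units_sold))           # 16 - 6 + 1
print(grouped_range(salary_midpoints))   # 7500.50 - 2500.50 + 1
```

Only the largest and smallest values enter either computation, which is why the range changes so much from one sample to the next.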

Simple, isn’t it?

The extreme simplicity of the range as a measure of dispersion is both an advantage
and a disadvantage. The range may prove useful if it is desirable to obtain some very
quick calculation that can give a rough indication of dispersion or if computations
must be made by persons unacquainted with statistics. If the data are to be
presented to a relatively unsophisticated audience, the range may be the only
measure of dispersion that will be readily interpreted.

The disadvantage of the range is obvious. It is based on only two cases: the two
extreme cases. Since extremes are likely to be the rare or unusual cases in most
empirical problems, it should be recognized that it is usually a matter of chance if
one happens to get one or two extreme values in a sample.

Suppose, for example, that there is one millionaire in the community sampled. If we
choose 10 persons at random, he or she will probably not be included. But suppose
he or she is. The range in income will then be extremely large and very misleading
as a measure of dispersion. If you use the range as your measure, you know nothing
about the variability of scores between the two extreme values except that the
scores lie somewhere within the range. And, as implied in the above example, the
range will vary considerably from one sample to the next. Furthermore, the range
will ordinarily be greater for large samples than small ones simply because in large
samples, you have a better chance of including the most extreme individuals. For
these reasons, the range is not ordinarily used in behavioral research except at the
most exploratory levels.

4.2 THE SEMI-INTERQUARTILE RANGE

Recall the discussion of the median in Module 3. The median was described as the
middle value in a distribution, thus cutting the distribution in half. In a similar
manner, we can also divide a distribution into four equal parts. The values that
divide a distribution into four equal parts are called quartiles.

4.2.1 Quartiles

The first quartile (Q1) is the score that separates the lower 25% of the distribution
from the rest. The second quartile (Q2) is the score that has 50% of the distribution
below it; Q2 is actually the median of the distribution. Finally, the third quartile (Q3)
separates the lower 75% of the distribution from the rest. If you recall the discussion
on percentile ranks in Module 2, each percentile rank has a corresponding score and
this score is called the percentile. Thus, in terms of quartiles, the 25th percentile
(P25) is actually the first quartile (Q1), P50 = Q2 = Md and P75 = Q3. See Figure 4.1
for illustration.

[Figure: a distribution divided into four equal parts by Q1 (P25), Q2 (P50, the median Md), and Q3 (P75)]

Fig. 4.1 Parts in a Distribution

Keep in mind that quartiles are a natural extension of the median concept because
they are values which divide a set of data into equal parts. The difference lies in the
fact that the median divides the distribution into two parts, while the quartiles
divide the distribution into four equal parts. Since a quartile is an extension of the
median, both basically use the same formula.

$$Q_3 = L + \left[\frac{3N/4 - cf}{f}\right] i$$

$$Q_2 = Md = L + \left[\frac{N/2 - cf}{f}\right] i$$

$$Q_1 = L + \left[\frac{N/4 - cf}{f}\right] i$$

where L = lower exact limit of the interval that contains Q1, Q2, or Q3
      cf = cf value of the interval below the selected interval
      f = f value of the selected interval
      i = width of the selected interval
      N = total number of cases in the distribution

4.2.1.1 Computation of the quartiles for ungrouped frequency distribution

For example, consider the first and third quartiles of the distribution.

Solving for Q1:

N = 40
N/4 = 40/4 = 10

$$Q_1 = 77.5 + \frac{10 - 8}{3}$$

$$Q_1 = 77.5 + \frac{2}{3}$$

$$Q_1 = 77.5 + 0.67$$

$$Q_1 = 78.17$$

Solving for Q3:

3N/4 = 3(40)/4 = 30

$$Q_3 = 86.5 + 1$$

$$Q_3 = 87.5$$

Interpretation: Seventy-five percent of the 40 master’s students have a score less
than or equal to 87.5; thus 25% have scores greater than 87.5. Similarly, 25% of the
students have a score less than or equal to 78.17 while 75% scored higher than
78.17.

4.2.1.2 Computation of the quartiles for grouped frequency distribution

Given the distribution below, compute the first and third quartiles.

Interval f % cf c%
95-99 3 7.5 40 100.0
90-94 5 12.5 37 92.5
85-89 8 20.0 32 80.0
80-84 11 27.5 24 60.0
75-79 10 25.0 13 32.5
70-74 3 7.5 3 7.5

Computation of Q1

N = 40, N/4 = 40/4 = 10

Q1 class = 75 – 79
L = 74.5
i = 5
cf = 3
f = 10

$$Q_1 = L + \left[\frac{N/4 - cf}{f}\right] i$$

$$Q_1 = 74.5 + \left[\frac{10 - 3}{10}\right] 5$$

$$Q_1 = 74.5 + 3.5$$

$$Q_1 = 78$$

Interpretation: Twenty-five percent of the students have a score of 78 or
below.

Follow a similar procedure to compute Q3.
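
The interpolation steps above can also be sketched in Python. This is only an illustration, not part of the module: the function name `grouped_quantile` and the (lower exact limit, width, frequency) layout are assumptions made for the example. It applies L + [(target - cf)/f] i to the distribution given in this section, so it can be used to check the Q1 worked out by hand and to obtain Q3.

```python
def grouped_quantile(classes, fraction):
    """Interpolated quantile for a grouped frequency distribution.

    classes: list of (lower_exact_limit, width, frequency), lowest class first.
    fraction: 0.25 for Q1, 0.50 for Q2 (the median), 0.75 for Q3.
    """
    n = sum(f for _, _, f in classes)
    target = n * fraction          # N/4, N/2 or 3N/4
    cf_below = 0                   # cumulative frequency below the current class
    for lower, width, f in classes:
        if cf_below + f >= target:
            # L + [(target - cf) / f] * i
            return lower + (target - cf_below) / f * width
        cf_below += f
    raise ValueError("fraction must be between 0 and 1")


# The grouped distribution from this section, lowest interval first.
scores = [
    (69.5, 5, 3),    # 70-74
    (74.5, 5, 10),   # 75-79
    (79.5, 5, 11),   # 80-84
    (84.5, 5, 8),    # 85-89
    (89.5, 5, 5),    # 90-94
    (94.5, 5, 3),    # 95-99
]

q1 = grouped_quantile(scores, 0.25)
q2 = grouped_quantile(scores, 0.50)
q3 = grouped_quantile(scores, 0.75)
print(f"Q1 = {q1:.2f}, Q2 = {q2:.2f}, Q3 = {q3:.2f}")
```

Calling the same function with a fraction of 0.50 gives the median, which mirrors the point that the quartiles and the median share one formula.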

4.2.2 The Interquartile Range and the Semi-Interquartile Range

The distance between the first and the third quartiles is called the interquartile
range (IQR). IQR = Q3 - Q1

Half of the interquartile range is the semi-interquartile range, also called the
quartile deviation. It is a type of range, but instead of representing the difference
between extreme values, it is defined as half the distance between the
first and third quartiles. The formula for the semi-interquartile range is:

$$\text{Semi-interquartile range} = \frac{Q_3 - Q_1}{2}$$

Using the data presented in Table 4.1, find the semi-interquartile range of the
distribution.

$$\text{Semi-interquartile range} = \frac{\text{P}6000 - \text{P}2000}{2} = \frac{\text{P}4000}{2} = \text{P}2000$$

Notice that the quartile deviation is one half of the range covered by the middle half
of the cases. Since Q1 and Q3 vary less from sample to sample than the
most extreme cases do, the quartile deviation is a far more stable measure than the
range. But it does not take advantage of all the information. We are not measuring
the variability among the middle cases, nor are we taking into consideration what is
happening at the extremes of the distribution.
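
As a small illustration, the two range measures in this section can be written as Python functions. The function names are hypothetical, chosen only for this sketch; the quartile values are the ones quoted in the worked example above.

```python
def interquartile_range(q1, q3):
    """Distance between the first and third quartiles."""
    return q3 - q1


def semi_interquartile_range(q1, q3):
    """Half the interquartile range (the quartile deviation)."""
    return interquartile_range(q1, q3) / 2


# Quartiles quoted in the worked example above (in pesos).
q1, q3 = 2000, 6000
print(interquartile_range(q1, q3))        # 4000
print(semi_interquartile_range(q1, q3))   # 2000.0
```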

4.3 STANDARD DEVIATION AND VARIANCE

4.3.1 Variance

The variance as a measure of variability takes the mean as the reference point
taking into account the deviations of the individual observations from the mean.
Conceptually, the variance is the average of the squared deviations from the mean.

In short, the variance can be stated as:

$$\text{Variance} = s^2 = \frac{\text{sum of the squared deviations}}{\text{number of observations}}$$

The formula of the variance for grouped frequency distributions can be presented
as:

$$s^2 = \frac{\sum fm^2 - \left(\sum fm\right)^2 / n}{n}$$

Where f = frequency of each category
m = midpoint of each category
n = total number of observations

Note that (∑ fm)² is the square of the total of the fm column, not the sum of the
individual squared fm values.

Before applying the formula to compute the variance for the
distribution in Table 4.1, first compute m², fm, and fm², as presented in
Table 4.2.

Table 4.2 Computation of Variance for Grouped Data

Salary Interval    f        m           m²            fm            fm²
P7001-8000         6     7500.50    56,257,500     45,003.0     337,545,000
P6001-7000        11     6500.50    42,256,500     71,505.5     464,821,500
P5001-6000        42     5500.50    30,255,500    231,021.0   1,270,731,000
P4001-5000        21     4500.50    20,254,500     94,510.5     425,344,500
P3001-4000        17     3500.50    12,253,500     59,508.5     208,309,500
P2001-3000        13     2500.50     6,252,500     32,506.5      81,282,500
Total          n = 110                             534,055.0   2,788,034,000
                                                     ∑ fm          ∑ fm²

Apply the formula using the following substitutions: ∑ fm² = 2,788,034,000;
∑ fm = 534,055, so that (∑ fm)² = 285,214,743,025; and n = 110.

Hence,

$$s^2 = \frac{2,788,034,000 - 285,214,743,025 / 110}{110}$$

$$s^2 = \frac{2,788,034,000 - 2,592,861,300.23}{110}$$

$$s^2 = \frac{195,172,699.77}{110}$$

$$s^2 = 1,774,297.27$$

It can therefore be concluded that the average of the squared deviations from the mean is
1,774,297.27 (in squared pesos).
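
As a cross-check on the hand computation, the sketch below recomputes the grouped variance directly from the f and m columns of Table 4.2, using the computational formula s² = [∑fm² - (∑fm)²/n] / n. It is an illustration only: the function name `grouped_variance` and its `ddof` parameter are assumptions made for this example, not part of the module. Because it keeps the exact midpoints, its totals can differ slightly from hand totals that use rounded m² values.

```python
def grouped_variance(classes, ddof=0):
    """Variance of grouped data from (frequency, midpoint) pairs.

    ddof=0 divides the sum of squares by n; ddof=1 divides by n - 1.
    """
    n = sum(f for f, _ in classes)
    sum_fm = sum(f * m for f, m in classes)
    sum_fm2 = sum(f * m * m for f, m in classes)
    # Sum of squares: sum(f*m^2) minus (sum(f*m))^2 / n.
    ss = sum_fm2 - sum_fm ** 2 / n
    return ss / (n - ddof)


# (f, m) pairs from Table 4.2, highest salary interval first.
salaries = [
    (6, 7500.5), (11, 6500.5), (42, 5500.5),
    (21, 4500.5), (17, 3500.5), (13, 2500.5),
]

variance = grouped_variance(salaries)
print(f"n = {sum(f for f, _ in salaries)}")
print(f"variance = {variance:,.2f}")
print(f"standard deviation = {variance ** 0.5:,.2f}")
```

Passing `ddof=1` switches the denominator to n - 1, which is the corrected variance discussed later in this module.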

Meanwhile, the formula for the variance for ungrouped data is:

$$s_x^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Where X = values of the variable x
X̄ = the mean
n = number of cases
∑(X − X̄)² = sum of the squared deviations

4.3.1.1 Sum of Squares

The sum of the squared deviations can be shortened to the sum of squares and can
be symbolized as SS. Thus,

$$SS = \sum fm^2 - \frac{\left(\sum fm\right)^2}{n} \quad \text{for grouped data}$$

$$SS = \sum (X - \bar{X})^2 \quad \text{for ungrouped data}$$

Notice that SS is the numerator in the variance formula. Thus,

$$s^2 = \frac{SS}{n - 1}$$

The sum of squares for the data in Table 4.2 needs no separate computation, since
one cannot arrive at the variance without first computing the sum of squares. For
the monthly salary of health personnel at Our Father Hospital, the SS value is
simply the numerator obtained in the variance computation presented earlier,
that is, ∑ fm² − (∑ fm)²/n.

Sums of squares have one great advantage over variances: they can be treated
algebraically, that is, added to and subtracted from one another. This is particularly
useful in the analysis of variance, in which you try to break down the total variability
of a set of data into various types and sources of variability. This may sound confusing
to you, but don’t worry; the analysis of variance will be discussed in Module 12.

Going back to variance as a descriptive statistic for variability, the variance changes
in value as a function of the amount of variability seen in the data. When all scores
are identical (and thus fall exactly at the mean) such that there is no variability,
𝐬2 =0. As scores become more and more dispersed around the mean, this increased
variability will be reflected in the 𝐬2 value.

The variance of a sample of scores is represented by the symbol s². The variance of a
population, as opposed to that of a sample, is represented by the lowercase Greek
letter sigma squared, σ².

4.3.1.2 Estimating the Population Variance

The variance computed according to the equation above is a sample variance. It tells
us the average squared deviation of scores around the mean in a sample drawn from
some larger population. This s² will usually be somewhat smaller than σ². It is
easy enough to understand why this is the case. A population consists of more
cases than are found in a sample drawn from that population, and it is likely that any
given sample will not include some of the more deviant cases that are included in
the population. These extreme cases add to the population’s variance but, not being
included in the sample, do not influence the sample variance.

There are occasions when we wish to estimate a population variance from a sample
drawn from the population, but we know that s² tends to give a low estimate of σ².
To give an unbiased estimate (corrected variance) of σ², the formula we should
use is:

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \quad \text{for ungrouped data}$$

$$s^2 = \frac{\sum fm^2 - \left(\sum fm\right)^2 / n}{n - 1} \quad \text{for grouped data}$$

Where X = values of the variable x
X̄ = the mean of x
n = number of cases

As you can see, this unbiased estimate is inflated slightly relative to the biased
estimate by using a denominator of n-1 rather than n. This inflation brings the
unbiased estimate closer into line with the larger population variance.

Large samples show little difference between the biased and unbiased estimates
because the difference between n and n-1 is negligible when n is large. On the
other hand, when n is small, the difference between n and n-1 is proportionally
much larger, and the difference between biased and unbiased estimates becomes
quite noticeable. Thus, from now on, we will use the unbiased formula instead of the
biased one. In fact, many statistical software packages report the corrected
variance by default.
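
The effect of the two denominators is easy to see on a small data set. The sketch below is an illustration only; it borrows the ten weights from SAQ 4-1 and compares the biased (n) and corrected (n - 1) variances, checking them against `statistics.pvariance` and `statistics.variance` from the Python standard library.

```python
import statistics

# Weights (in kilos) of the 10 students in SAQ 4-1.
weights = [50, 55, 48, 60, 54, 48, 57, 45, 52, 63]

n = len(weights)
mean = sum(weights) / n
ss = sum((x - mean) ** 2 for x in weights)   # sum of squares

biased = ss / n            # denominator n
unbiased = ss / (n - 1)    # denominator n - 1 (corrected variance)

# The standard library offers both versions under different names.
assert abs(biased - statistics.pvariance(weights)) < 1e-9
assert abs(unbiased - statistics.variance(weights)) < 1e-9

print(f"SS = {ss:.1f}")
print(f"biased variance   = {biased:.4f}")
print(f"unbiased variance = {unbiased:.4f}")
```

With only ten cases the two estimates differ visibly; repeating the comparison on a much larger data set would show the gap shrinking, as described above.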

4.3.2 Standard Deviation

Having learned some measures of variability, you can now turn your attention to the
most useful and frequently used measure, the standard deviation. It is defined as the
square root of the arithmetic mean of the squared deviations from the
mean.

When data have been grouped, you may simplify your work considerably by treating
each case as though it were at the midpoint of the interval, even though this is not
the case. Of course, this introduces certain inaccuracies, but the
saving in time will be substantial if computations must be carried out by hand.

The basic formula for the standard deviation using grouped data is:

$$s = \sqrt{\frac{\sum fm^2 - \left(\sum fm\right)^2 / n}{n - 1}}$$

The formula for ungrouped data is:

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

Notice that this formula is similar to the variance formula. The only difference is
that, with the standard deviation, you obtain the square root of the computed value.
So, the standard deviation can also be stated as:

Standard deviation = square root of the variance

$$s = \sqrt{s^2}$$

In our previous example, the standard deviation is simply the square root of the
corrected variance, SS/(n − 1). From the f and m columns of Table 4.2, ∑ fm² =
2,788,034,000, ∑ fm = 534,055 and n = 110, so that
SS = 2,788,034,000 − (534,055)²/110 = 195,172,699.77. Hence,

$$s = \sqrt{\frac{195,172,699.77}{109}}$$

$$s = \sqrt{1,790,575.23}$$

$$s = 1,338.12$$

The standard deviation tells us that the monthly salaries of the Our Father
Hospital’s health personnel deviate from the mean monthly salary by about
P1,338.12. The larger the standard deviation, the farther the salaries generally
are from the average salary. Of all the measures of variability, the standard
deviation is the most useful, especially because it is an important measure for
inferential purposes.

4.4 Z-SCORES OR STANDARD SCORES

When you wish to compare two different distributions, you may do so by
standardizing them, which results in a single standardized distribution
where each value of X has a standardized value denoted by Z, defined by the
following formula:

$$Z = \frac{X - \mu}{\sigma} \quad \text{or} \quad z = \frac{X - \bar{X}}{s}$$

Where
X = raw score
µ = population mean
X̄ = sample mean
s = sample standard deviation
σ = population standard deviation

Properties of Z-score

1. The sign of the Z-score indicates the location of the corresponding raw score
relative to the mean. If Z is positive, the score is above the mean, and if Z is
negative, the score is below the mean.
2. The Z-score can be directly transformed to a percentile score when a
distribution is normal.

Let us take as an example Brenda Tag’s final examination results in three subjects of
her nursing course:

Subject            Brenda’s Grade    Class Mean    Standard Deviation
Pathophysiology          86              81               5.75
Theories                 76              73               6.00
Statistics               91              93               6.50

In which subject did Brenda perform best? Worst? As is, you cannot answer these
questions. Transform Brenda’s scores to Z-scores, then compare.

Solution:

1. Pathophysiology:

$$Z = \frac{X - \bar{X}}{s} = \frac{86 - 81}{5.75} = 0.87$$

2. Theories:

$$Z = \frac{X - \bar{X}}{s} = \frac{76 - 73}{6.00} = 0.50$$

3. Statistics:

$$Z = \frac{X - \bar{X}}{s} = \frac{91 - 93}{6.50} = -0.31$$

The z-score indicates the location of the score relative to the mean.

Interpretation: Among her three subjects, Brenda Tag performed best in
Pathophysiology and worst in Statistics.
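
The comparison of Brenda's three subjects can be reproduced with a few lines of Python. This is a minimal sketch; the function name `z_score` and the dictionary layout are illustrative choices, while the grades, class means and standard deviations are taken from the table above.

```python
def z_score(x, mean, sd):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd


# (grade, class mean, standard deviation) for each subject.
results = {
    "Pathophysiology": (86, 81, 5.75),
    "Theories": (76, 73, 6.00),
    "Statistics": (91, 93, 6.50),
}

z = {subject: z_score(*values) for subject, values in results.items()}
for subject, value in z.items():
    print(f"{subject}: z = {value:.2f}")

best = max(z, key=z.get)
worst = min(z, key=z.get)
print(f"best: {best}, worst: {worst}")
```

Ranking by z-score rather than by raw grade is what allows subjects with different class means and standard deviations to be compared on one scale.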

Here are some exercises for you to do in order to apply what you learned in this
module.

SAQ 4-1

Find the range and standard deviation of the following weights (in kilos) of 10
students:
50, 55, 48, 60, 54, 48, 57, 45, 52, 63

SAQ 4-2

Find the range, the semi-interquartile range and the standard deviation of the
following distribution:

Weekly hours    No. of Workers
50-54                  4
45-49                 12
40-44                 15
35-39                 13
30-34                  6
                   N = 50

SAQ 4-3

On each of two final examinations (Anatomy and Pathophysiology), the class mean grade
was 76 and the standard deviation was 7.6. A nursing student scored 71 in Anatomy
and 75 in Pathophysiology. In which examination was the student’s standing higher?

SAQ 4-4

A master’s student received a grade of 84 on a final examination in Research, for
which the class mean grade was 76 and the standard deviation was 10. On the final
examination in Statistics, for which the class mean grade was 82 and the
standard deviation was 8, the student received a grade of 92. In which subject
was the student’s standing higher?

Summary

This module discussed the measures of dispersion or variability. These measures
supplement the measures of central tendency.
The SAQs given in this module will develop your skills. Go back to the text, understand
the illustrations, and soon you will master the measures of dispersion or variability.
Keep on reading and doing the SAQs. Our next module will be on presentation
schemes.

ASAQs
ASAQ 4-1

Range:
R = 63-45+1 = 18+1 = 19

Standard Deviation:
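The formula for ungrouped data is:

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

With a mean of 53.2, the sum of the squared deviations is 293.6:

$$s = \sqrt{\frac{293.6}{10 - 1}}$$

$$s = \sqrt{32.6222}$$

$$s = 5.7116$$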

ASAQ 4-2

Weekly hours     f      m      m²      fm      fm²
50-54            4     52    2704     208    10816
45-49           12     47    2209     564    26508
40-44           15     42    1764     630    26460
35-39           13     37    1369     481    17797
30-34            6     32    1024     192     6144
Total       n = 50                    2075    87725
                                      ∑ fm    ∑ fm²

The basic formula for the standard deviation using grouped data is:

$$s = \sqrt{\frac{\sum fm^2 - \left(\sum fm\right)^2 / n}{n - 1}}$$

$$s = \sqrt{\frac{87725 - (2075)^2 / 50}{50 - 1}}$$

$$s = \sqrt{\frac{87725 - 86112.5}{49}}$$

$$s = \sqrt{\frac{1612.5}{49}}$$

$$s = \sqrt{32.9082}$$

$$s = 5.7366$$

ASAQ 4-3

Since the two final examinations share a common mean and standard
deviation, the raw scores can be compared directly. The student’s standing was
higher in Pathophysiology.

ASAQ 4-4

Subject       Student’s Grade    Class Mean    Standard Deviation
Research            84               76                10
Statistics          92               82                 8

Transform the student’s grades to z-scores, then compare.

1. Research:

$$Z = \frac{X - \bar{X}}{s} = \frac{84 - 76}{10} = 0.80$$

2. Statistics:

$$Z = \frac{X - \bar{X}}{s} = \frac{92 - 82}{8} = 1.25$$

The student performed better in Statistics.

5
Graphs and Other Diagrammatic Presentations

INTRODUCTION

When raw data have been collected, there is a need to organize them to show their
significant characteristics. In this module you will learn the different forms of
data presentation. Having learned them, you should be able to manage the
data you will collect later.

Raw data in themselves are not interesting to readers. Because data are not
collected solely for the investigator’s use, they must be presented in ways
that are understandable, meaningful, and attractive.

OBJECTIVES

At the end of this module, you will be able to:

1. Discuss the presentation of data through:
1.1 Textual method
1.2 Tabular method
1.3 Graphical method

2. Apply these presentations in the exercises and in your own data set later on.

5.1 THE TEXTUAL PRESENTATION

Basically, data can be presented meaningfully in three forms, namely, through
textual, tabular and graphical methods. Many people cannot appreciate
data, even when presented in tables, unless they are textually explained. In
textual presentation, the investigator or writer can emphasize the importance of
some figures. Narration is done in storytelling fashion and can call the attention of
the readers to the relevance of other figures. The following is an example of a textual
presentation of data (excerpts from Kuan (1999), “Enhancing self-reliance among
religious older persons,” Philippine Journal of Nursing, Vol. 69, Nos. 1-2, pp. 21-22):

…“Enhancing self-reliance among older persons, aged 66 to 80 years
old, is promoting wellness and adding joys to their daily quality
living. Among the 200 older persons observed and interviewed, 115
or 57.5% are female and 85 or 42.5% are male. Of the 115 women,
30 or 26% have never married, 20 or 17% are widowed, and 50 or
43% have remarried twice and 15 or 13% have remarried three
times.

Among the men, 12 or 14% are still living with their original
spouses; while 9 or 10% have been widowed twice. Sixty-four or 75%
are living with spouse of the third nuptial.

In enhancing self-reliance, an indicator has shown that those
men and women, past 66 years old, who continued to be occupied
with work had higher self-esteem. This numbered 120 or 60% as
compared to the 80 or 40% have nothing to do and felt bored with
old age. Among the 120 self-employed older person; 72 or 60% are
on 75 to 80 age-bracket. When compared to those who are not self-
employed, 52 or 65% percent are in 61 to 74 age group.

Age bracket is not a very significant variance when it comes
to maintaining for self-reliance living for this particular group. What
mattered is that those 72 or 60% falling on the 75 to 80 age group
prepared for their old age days before they required. It is important
indeed to prepare oneself early enough for older years ahead…”

5.2 THE TABULAR PRESENTATION

Tabulation is the process of condensing classified data and arranging them in a
table (similar to the frequency distribution tables presented in Module 2).
Comparisons between groups become easier because data are more understandable
when presented in a table. To tabulate data, we must first classify them according to
characteristics such as occupation, sex, height, income, nationality, age, education,
religion and so on.

Tables have advantages over narratives in that they provide a compact way of
presenting large sets of detailed information. Through well-made tables, one can
readily see trends and comparisons. Interrelationships among different variables
become clearer when seen in tables rather than in textual or narrative form. Tables
must always be simple, direct and clear and must be appropriately titled or
labeled.

Let us consider again the narrative on the demographic profile of older persons in
Kuan’s study. One way to present this in tabular form is to create a table showing the
frequency counts and corresponding percentages, as shown in Table 5.1. See how
easy it is to compare the respondents across age brackets, gender, and civil status.
Look closely at Table 5.1 and study the data.

Table 5.1 Contingency Table of Age Bracket, Civil Status, and Gender of the “Old”
Respondents

Age        Male                                  Female                                 Total
Bracket    Never      Widowed    Remarried       Never        Widowed    Remarried      f      %
           Married                               Married
66-70      8 (4%)     2 (1%)     30 (15%)        2 (1%)       3 (1.5%)   6 (3%)         51     25.5
71-75      2 (1%)     1 (0.5%)   4 (2%)          7 (3.5%)     5 (2.5%)   10 (5%)        29     14.5
76-80      2 (1%)     6 (3%)     30 (15%)        21 (10.5%)   12 (6%)    49 (24.5%)     120    60
Total      12 (6%)    9 (4.5%)   64 (32%)        30 (15%)     20 (10%)   65 (32.5%)     200    100

Source: Kuan, L. (1999) “Enhancing self-reliance among religious older persons” Journal of Nursing.
Vol. 69, Nos. 1-2, pp. 21-22

Do the figures in Table 5.1 help you understand the data better and compare across
gender, age bracket and civil status?

A tabular presentation is much briefer than a text statement. However, although
tables are easier to understand than text, each table must still be accompanied by an
explanation. Nevertheless, this type of presentation facilitates analysis of
relationships between rows and columns. This systematic arrangement is termed a
statistical table. As a general rule, statistical tables should be brief and easy to
understand for many readers.

Parts of a Statistical Table

A statistical table has four essential parts: the table heading, stubs, box head, and the
body.
1. The table heading details the table number (in Arabic numerals), and the title of
the table.
e.g. Table 5.1 Contingency Table of Age Bracket, Civil Status and Gender of
“Old” respondents

2. Stubs are classifications or categories usually found at the extreme left side of
the body of the table. Each stub comprises the caption for a row.
e.g.
Age Bracket
66-70     (stub)
71-75     (stub)
76-80     (stub)
Total     (stub)
3. The box head identifies what is contained in each column. Included in the box head
are the stub head and the column captions.
e.g.
Age Bracket (stub head)     Male     Female     Total     (box head)

4. The body is the main part of the table. This contains the substance or the
corresponding figures for each category.

For Table 5.1, the parts are labeled accordingly:

[heading]
Table 5.1 Contingency Table of Age Bracket, Civil Status, and Gender of the
“Old” Respondents

[box head]
Age        Male                                  Female                                 Total
Bracket    Never      Widowed    Remarried       Never        Widowed    Remarried      No.    %
           Married                               Married

[stubs]    [body]
66-70      8 (4%)     2 (1%)     30 (15%)        2 (1%)       3 (1.5%)   6 (3%)         51     25.5
71-75      2 (1%)     1 (0.5%)   4 (2%)          7 (3.5%)     5 (2.5%)   10 (5%)        29     14.5
76-80      2 (1%)     6 (3%)     30 (15%)        21 (10.5%)   12 (6%)    49 (24.5%)     120    60
Total      12 (6%)    9 (4.5%)   64 (32%)        30 (15%)     20 (10%)   65 (32.5%)     200    100

The table, which has just been presented, is also called a master table. By definition,
a master table is a mono or single table that shows the distribution of observations
across several variables. Each observation is cross-classified with other variables.
The master table is important because from this table, various simple or summary
tables may be derived such as frequency of those belonging to certain age groups, or
civil status across gender and age bracket.

There are also tables called dummy tables. Dummy tables are plain, drawn-out
tables without any figures, constructed as a preview of what the
investigator will later fill in with figures.

Dummy tables are comparable to the nests that an expectant bird prepares for her
eggs and, later on, her chicks. Hence, the sizes and divisions of the dummy table
should accord with the variables being studied.

5.3 THE GRAPHICAL PRESENTATION

The graphical presentation makes use of any of the following modes:

a) Line
b) Bar
c) Pie
d) Pictograph
e) Statistical map
f) Figures of nature such as flowers, vegetables, trees, plants or human beings
g) Diagrams

The graphical presentation, through its varied ways of representing data,
offers several advantages over data presented in tabular form.
Graphical presentations are far more attractive and can entice the audience to read.

However, despite all the modern presentations available, the real substance of
data reporting is in the statistical table. Statistical tables are especially important
when large sets of detailed information and precision of information are required.
Thus, in the majority of research reports, what you will see are statistical tables
because there is no substitute for them. But these tables can be supplemented by
graphical presentations, which we will discuss in this section.

5.3.1 How Does One Construct A Graph?

When constructing graphs, there are essential items to consider and these are:

1. Graphs must be self-explanatory, meaning graphs must have a clear and
precise title stating what is being represented by the graph, the time element
involved, and the place of the source. Graphs and charts should be labeled
and numbered below the graphic presentation. Be consistent in labeling and
numbering your graphs.

2. The scales should be properly labeled, identifying the entities being
measured along the horizontal and the vertical axes. A layout that is pleasing
to the eyes has a well-proportioned appearance that is neither
compressed nor extended.

3. Guide rulings or grids should be drawn neatly and lightly to guide the eye.
Hence the guide ruling should be minimal and should be pleasant to look at.

4. The horizontal scale carries the basis of classification. The vertical scale
carries the frequencies; the representation may be absolute or relative and
should always start with zero. Equal distances between tick marks on an
axis should represent equal numerical units.

5. Graphs should be simple, neat and have a business-like quality; hence avoid
too many trimmings and confusing colors and designs.

6. Graphs should be in accord with the type of data represented so that the
substance is seen at once.

Table 5.2 shows a list of the different modes of graphs and on what type of variable
each mode should be used.

Table 5.2 Types of Graphs Commonly Used in Processing Research Data

Type                          Nature of Variable         Function
1. Bar graph/chart/diagram,   Qualitative;               For comparison of absolute or relative
   lines (horizontal or       discrete quantitative      counts, rates, ranks, etc. between categories
   vertical)                                             of a qualitative or discrete quantitative
                                                         variable.
2. Pie chart                  Qualitative                Shows a breakdown of a group in terms of
                                                         percentages. This is appropriate when the
                                                         number of categories is not too many.
3. Component bar diagram      Qualitative                Same as for the pie chart.
4. Histogram                  Continuous quantitative    Graphic presentation of the frequency
                                                         distribution of a continuous variable or
                                                         measurement, including age groups.
5. Frequency polygon          Quantitative               Same as the histogram.
6. Line diagram               Time series                Shows trend data or changes with time or age
                                                         with respect to some other variable.
Source: Teodoro L. Sevilla “Geographical presentation” CPH, Module No. III, 1997, p 153

A discrete variable has a finite number of possible values, such as male and female
or short and tall. In a continuous variable, the number of possible values is
infinite: any number of points can be drawn on a line segment, as with age, height,
weight, and temperature. In
most practical problems in nursing, continuous variables represent measured data;
discrete variables represent count data such as the number of men and women
participating in clinical research on asthma, or the number of accidents due to
vehicle collisions per year.

5.3.2 Advantages of the Graphical Methods

1. Graphs enable students, readers, professionals, and busy executives to easily
grasp the essential facts that numerical data intend to convey.
2. Graphs can easily attract attention and are more easily and readily
understood because they are very visual. Many times it is easier to go
through graphs than through narratives or tables. Facility and attraction in
the “reading” of the graphs can be bolstered by the use of colors and pictorial
diagrams.
3. Graphs simplify concepts that would otherwise have been expressed in so
many words and so much space.

5.3.3 Illustrations of Some Graphical Presentations

5.3.3.1 The Bar Graph


The bar graph is one of the most common and widely used graphical
devices. It is used to portray absolute or relative frequencies, population rates or
other numerical measurements across the categories of a qualitative variable or a
discrete (discontinuous) quantitative variable. This graph consists of bars or heavy
lines of equal widths, either all vertical or all horizontal. The lengths of the bars
represent the magnitudes of the variables being measured. Here is an illustration:

[Figure 5.1: vertical bar graph. Horizontal axis: census years from 1903 to 1980;
vertical axis: population in millions, scaled from 0 to 60. The value of each bar is
printed on top of it, from 7.8 for 1903 up to 48 for 1980.]

Source: National Statistics Office

Figure 5.1 Population of the Philippine Census From 1903 to 1980

Reading Interpretation

Figure 5.1 shows the continuous growth of the Philippine population. In 1903,
the population was 7.8 million; by 1980, it had swelled to 48.1 million.
You will notice that the actual values of the population are printed on top of the bars.
This is done to give more precision and accuracy to the numerical values that the
bars represent. By looking at the bar diagram in Figure 5.1, is it not easy to detect
the progressive population increase?

Bar graphs can also be constructed horizontally as shown in Figure 5.2.

[Figure 5.2: horizontal bar graph. Categories: Heart Disease, Pneumonias, Vascular
Sys., Tuberculosis, Mal. Neoplasm, Accidents, Septicemia, Diarrheal Dis., Nephritis,
Resp. Cond. Horizontal axis: rate per 100,000 population, scaled from 0 to 100.]

Figure 5.2 Ten Leading Causes of Mortality in the Philippines according to the
1991 Department of Health Report

Reading Interpretation

Figure 5.2 shows that the leading cause of death was heart disease. There are 80
people who die of heart disease for every 100,000 population, or the equivalent of 8
persons per 10,000 population. You can go on reading the rest of the data
in the horizontal bar graph. Just note well the rate per 100,000 population of each
disease listed as a cause of mortality for the year 1991.

The choice of whether bar graphs should be drawn vertically or horizontally will
depend on the available space or the number of categories or variables being depicted.
Variables of the qualitative kind are usually presented on horizontal bar graphs. In
many instances, variables of the discrete quantitative type are drawn on vertical bar
diagrams.

5.3.3.2 The Pie Chart

The pie chart is another way of presenting your data. It shows how a whole is
divided into its component parts. Division into parts is made through the use of
wedge-shaped figures. The area of each wedge or slice is
proportional to the contribution of that component to the whole pie. Take Figure 5.3 as an
example.

[Figure 5.3: pie chart with four slices: double master's degree (25%), Master's (37%),
Bachelor's degree w/ master units (25%), Ph.D. holders (13%).]

Figure 5.3 Highest Educational Attainment of the Faculty at Yin-Yang University

Reading Interpretation

The whole pie or complete circle has 360 degrees or 100%. To slice the pie into its
components, a protractor is necessary. The percentage of each component is
multiplied by 3.6 to determine the number of degrees of its
slice.

In the illustration above, you can see that the faculty profile (in terms of educational
background) at Yin-Yang University is as follows: faculty with doctoral degrees
comprise 12.5% of the total faculty population. Those with double master’s degrees
are 25% of the faculty; 37.5% have only one master’s degree, and 25% of the faculty
are bachelor’s degree holders with master’s units.

You can also color each slice of the pie. What is important to keep in mind is that
each component must be sliced correctly as part of the 360 degrees and labeled
correctly.
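
The multiply-by-3.6 rule can be sketched in a few lines of Python. This is an illustration only; the function name `slice_angles` is a hypothetical choice, and the percentages are the faculty figures quoted above for Figure 5.3.

```python
def slice_angles(percentages):
    """Convert each component's percentage to degrees of a 360-degree circle."""
    total = sum(percentages.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"percentages add up to {total}, not 100")
    return {label: pct * 3.6 for label, pct in percentages.items()}


# Faculty profile described for Figure 5.3.
faculty = {
    "Doctoral degree": 12.5,
    "Double master's degree": 25.0,
    "One master's degree": 37.5,
    "Bachelor's degree with master's units": 25.0,
}

angles = slice_angles(faculty)
for label, degrees in angles.items():
    print(f"{label}: {degrees:.0f} degrees")
```

Checking that the percentages add up to 100 before converting is a simple guard against slices that would not close the circle.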

Are you ready for some activities and exercises?

SAQ 5-1

The following data come from a study by Dr. L.G. Kuan (1999) on the functional
ability of patients with myasthenia gravis. Write a textual presentation for these data.

Sample: Selected patients
Sample size: 10
Sex: all Female
Education: 3 High School Students
7 College Students
Treatment: 4 Thymectomized
6 not Thymectomized
Medication: Mestinon 1 tab BID
Prednisone 2 tab OD
Nebulizer PRN
Complaints: Ptosis in one of the eyes
Weakness
Once in a while respiratory complaints
Problems: Financial
Anxiety
Helplessness
Worry

SAQ 5-2

Present the data in SAQ 5-1 in tabular form. Be sure you label your table.

SAQ 5-3

Construct a vertical bar graph for the following table that shows the need for health
care in 1999. Include a write-up.
Needing Health Care     Amount of care needed (in %)
Older persons 98%
Old adult 62%
Young adult 20%
Adolescents 30%
Children 75%
Infants 100%
Neonates 100%

SAQ 5-4

Construct a horizontal bar chart for the following data on faculty savings as of July
1999. Include a write-up.
Faculty Amount Saved
Dr. Lido 201,000
Dr. Nola 122,600
Dr. Mino 117,200
Dr. Temon 160,000
Dr. Liton 365,000
Dr. Wajo 180,000

SAQ 5-5

The following table classifies enrollment at the Yin-Yang University for SY 1998-
1999. Draw a pie chart. (To find the percent equivalent, divide each entry by the total
enrollment and round off your answers.)

Major            Enrollment
Nursing              606
Medicine             859
Dentistry            702
Pharmacy             495
Allied Health        527
Total              3,189

5.3.3.3. The Component Bar Diagram

The component bar diagram breaks down total quantities into their component
parts. When a comparison is made between two or more different groups, the
component bar diagram is preferable to the pie chart. Descriptions of the items
involved may be written on or beside the bar. Actual amounts or percentages may
also be written in the same manner. The area of each smaller rectangle is
proportional to its relative contribution to the whole.

Let us see the following example in Fig. 5.4

[Figure: component bar chart showing, for the Control and Treatment groups, the percentage (0% to 100%) of subjects in each monthly income bracket: 0 - 12,499; 12,500 - 24,999; 25,000 & Above]

Source: Villamiel, J.W. (1999). “Physiologic responses of comatose patients to
liturgical reading as biobehavioral nursing intervention”. Master’s thesis.
Manila: College of Nursing, University of the Philippines.

Figure 5.4. Monthly Family Income

Reading Interpretation

Figure 5.4 compares the monthly family income of the treatment and control
groups. One can see right away the differences between these two groups in terms
of income bracket. The control group has almost the same number of individuals in
the 0 - 12,499 bracket and the 25,000 and above bracket. The treatment group, on the
other hand, has more individuals belonging to the 12,500 - 24,999 income bracket.

To detect differences or similarities between or among groups, the component bar is
very appropriate.

Figure 5.5 is another version of a component bar:

[Figure: component bar chart showing, for the Control and Treatment groups, the percentage of participants (vertical axis from 0 to 80) with each coma type: Anoxic, Traumatic, Toxicologic]

Source: Villamiel, J.W. (1999). “Physiologic responses of comatose patients to liturgical reading as
biobehavioral nursing intervention”. Master’s thesis. Manila: College of Nursing, University of
the Philippines.

Figure 5.5 Percentage Distribution of Participants According to Coma Type

Reading Interpretation

You can easily see in Figure 5.5 that the majority of people in both the treatment and
control groups have the anoxic type of coma. The toxicologic type is the least frequent
type in both groups.

By this time, you might already have some ideas on how to present your data when you
write your master's thesis.

5.3.3.4 The Histogram

The histogram is frequently used in statistical work. A histogram differs from a bar
chart in that the histogram is used for a continuous quantitative variable, while the bar
graph is generally used for qualitative or categorized data. Furthermore, each
bar in the histogram has real limits as its base. Let us illustrate this difference by
looking at Figures 5.6 and 5.7.

[Figure: bar chart with separated bars; the horizontal axis shows the grade equivalents 1, 1.5, 2, 2.5, and 3, and the vertical axis shows frequency from 0 to 30]

Figure 5.6 Histogram for N-298 Grades

Notice that there are spaces between the bars in Figure 5.6 because we are dealing
with grade equivalents and not with actual scores. If we convert these
grade equivalents to actual scores, we get a score interval for each equivalent,
and we can then present this data set as a histogram, as shown in Figure
5.7.

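[Figure: histogram with adjoining bars labeled with the grade equivalents 1.0 to 3.0; the horizontal axis shows the exact limits 49.5, 59.5, 69.5, 79.5, 89.5, and 99.5, and the vertical axis shows frequency from 0 to 30]

Figure 5.7 Grade Profile of N-298 Students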

Reading Interpretation

Figure 5.6 shows that most of the N-298 students received a final grade of 2.5, which
corresponds to the score interval with lower exact limit 59.5 in Figure 5.7.

How Do You Construct Histograms?

Remember these two things:

1. There should be no gaps between the rectangles of a histogram. In
contrast, bar diagrams have spaces between rectangles. The exact
limits or boundaries are the numerical bases of the rectangles in the
histogram. In a histogram, what is primarily shown is the distribution of a
continuous variable.

2. Keep in mind that the area of each rectangle is proportional to both the
frequency and the width of the class being shown. This necessitates that the
widths of the rectangles be equal, so that the height of each rectangle
corresponds to the frequency of the interval the rectangle represents.

Remember, histograms consist of a set of rectangles having bases on the horizontal
axis and whose base widths correspond to the class size while the heights of the
rectangles correspond to the class frequencies.
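
The two rules can be tried out with a short Python sketch. The scores and the helper name `tally_classes` are hypothetical; the sketch assumes whole-number scores and equal class widths, so the exact limits lie half a unit beyond the stated class limits.

```python
def tally_classes(scores, lowest, width, n_classes):
    """Count scores per class and return (lower exact limit, upper exact limit, f)."""
    table = []
    for k in range(n_classes):
        lower = lowest + k * width          # stated lower class limit
        upper = lower + width - 1           # stated upper class limit
        freq = sum(lower <= s <= upper for s in scores)
        # Exact limits extend half a unit beyond the stated limits,
        # so neighboring rectangles share a boundary (no gaps).
        table.append((lower - 0.5, upper + 0.5, freq))
    return table


scores = [31, 45, 47, 52, 58, 58, 63, 64, 66, 71, 75, 88]  # hypothetical data
hist = tally_classes(scores, lowest=30, width=10, n_classes=6)

for low, high, f in hist:
    print(f"{low:5.1f} - {high:5.1f} | {'#' * f} ({f})")
```

Because every class has the same width, the length of each row of `#` symbols plays the role of the rectangle height.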

5.3.3.5 Frequency Polygon

[Figure: histogram of frequency (f) against class intervals with exact limits 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5, and 99.5]

Figure 5.8 A Sample Histogram
A frequency polygon is a line graph of class frequencies plotted against class marks
or midpoints. There are two ways to construct a frequency polygon. First, make a
histogram, then connect the midpoints of the intervals (each represented by a dot at the
top of its bar) with line segments. Second, write the midpoints directly on the
horizontal or x-axis, plot the corresponding frequencies or percentages, and connect
the points with line segments. See Figures 5.8 and 5.9 for illustrations.

[Figure: frequency polygon of frequency (f) plotted against the midpoints 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, and 94.5]

Figure 5.9 Frequency Polygon

5.3.3.6 Line Graphs

To construct line graphs:

1. Draw two perpendicular lines called axes: the vertical and the horizontal line. The
horizontal line is labeled as the x-axis and the vertical line as the y-axis. The point at
which the two axes meet or intersect is called the point of origin.
2. From the point of origin going to the right (on the horizontal axis), mark off
equal spaces to denote the corresponding numbers.
3. Plot the corresponding values as dots. Use different colors to
indicate group membership.
4. Connect the dots with lines.

See Figures 5.10 and 5.11 for illustrations.

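[Figure: line graph for the Control, Liturgical, and Non-Liturgical groups; the vertical axis runs from 80 to 105, and the horizontal axis shows 0, 10, 15, and 30 minutes in the morning and again in the afternoon (PM)]

Figure 5.10 Mean Heart Pattern of Subjects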

Notice that Figures 5.10 and 5.11 each show three groups of people. Their
heartbeats and diastolic BP, respectively, were measured over a period of time.
The vertical axis of Figure 5.10 shows the number of heartbeats, while in Figure 5.11
the y-axis shows diastolic BP. The horizontal axis shows the time points of 10, 15, and 30
minutes in the morning and the same time points in the afternoon.

When there are two or more groupings of subjects, adding colors to designate
each group makes the graphical presentation more attractive and makes the groups
easier to tell apart.

[Figure: line graph for the Control, Liturgical, and Non-Liturgical groups; the vertical axis runs from 66 to 78, and the horizontal axis shows 0, 10, 15, and 30 minutes in the morning and again in the afternoon (PM)]

Figure 5.11 Mean Diastolic BP Pattern of Subjects

In Figures 5.4 and 5.5, you will notice that percentages were used instead of
frequencies. You can do the same for the histogram and the frequency polygon.
Recall that in Module 2, we discussed how to compute the percentage that
corresponds to each frequency. Now, let us do another histogram and
frequency polygon, similar to Figure 5.9, but this time we will use percentages.
Figures 5.12 and 5.13 illustrate the proportion, in percent, of each frequency in
relation to the total frequency.

Class Interval   Exact Limits    m      f     (%)
90 – 99          89.5 – 99.5     94.5    2     5.0
80 – 89          79.5 – 89.5     84.5    4    10.0
70 – 79          69.5 – 79.5     74.5    5    12.5
60 – 69          59.5 – 69.5     64.5   12    30.0
50 – 59          49.5 – 59.5     54.5    9    22.5
40 – 49          39.5 – 49.5     44.5    5    12.5
30 – 39          29.5 – 39.5     34.5    3     7.5
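
The midpoint and percentage columns can be recomputed from the class limits and frequencies. The short Python sketch below does this for the table above; the variable names are illustrative only. With a total frequency of 40, a class with f = 4 works out to 10.0%.

```python
# (lower limit, upper limit, frequency), taken from the table above
classes = [
    (90, 99, 2), (80, 89, 4), (70, 79, 5), (60, 69, 12),
    (50, 59, 9), (40, 49, 5), (30, 39, 3),
]

total = sum(f for _, _, f in classes)

rows = []
for lower, upper, f in classes:
    midpoint = (lower + upper) / 2      # class mark, m
    percent = 100 * f / total           # share of the total frequency
    rows.append((midpoint, f, percent))
    print(f"{lower} - {upper}   m = {midpoint}   f = {f:2d}   {percent:.1f}%")

print("Total frequency:", total)
```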

The presentation of the frequency distribution may be graphically shown through:

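a. histogram:

[Figure: histogram of the class intervals 30-39, 40-49, 50-59, 60-69, 70-79, 80-89, and 90-99]

b. frequency polygon:

[Figure: frequency polygon of the percentages (vertical axis from 0% to 30%) plotted against the midpoints 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, and 94.5]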

5.3.3.7 Pictograph

Another interesting way to present numerical values is through the use of
pictographs, which are also known as pictograms. In this kind of presentation,
picture symbols are used to represent values. For example, to represent numerical
data on banana production, pictures of bananas are drawn. To represent
population statistics of children, pictures of children are drawn. What is important
to keep in mind is that the picture should appropriately fit the data being
represented.

To use pictographs as representations of data, it is good to remember the following
pointers:

1. The symbols used must be clear, appropriate, and self-explanatory. For
example, if the data are about beds, pictures of beds must be drawn;
if about trees, pictures of trees; if about population, pictures of persons.

2. Use legends to state the number of units that one picture represents. For example,
one banana may represent 10 tons of banana production. Be numerically
creative.

3. Because pictographs show only approximate values, it is good to write the explicit
numerical values below the symbols that represent them. Let us have some
illustrations.

[Pictograph: one row of banana symbols each for Davao, Dipolog, Cebu, Catanduanes, Novaliches, and Negros]

Legend: One Banana = 100 Tons

Figure 5.14 Banana Production in Tons for the Month of May 1999

Reading Interpretation

Figure 5.14 shows that Davao produced 1,400 tons of bananas in May, followed by
Cebu with 900 tons and Negros with 700 tons. Dipolog and Novaliches produced
500 tons each, and Catanduanes produced the least, with only 400 tons of bananas.
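
The legend arithmetic behind a pictograph can be sketched in Python. The production figures and the legend value come from the reading of Figure 5.14; the asterisk standing in for a banana picture and the variable names are only illustrative.

```python
TONS_PER_SYMBOL = 100  # legend of Figure 5.14: one banana = 100 tons

production = {
    "Davao": 1400, "Cebu": 900, "Negros": 700,
    "Dipolog": 500, "Novaliches": 500, "Catanduanes": 400,
}

# Number of symbols to draw for each place.
symbol_counts = {place: tons // TONS_PER_SYMBOL for place, tons in production.items()}

for place, tons in production.items():
    # Each '*' stands in for one banana picture; the explicit value is
    # written beside the symbols, as pointer 3 recommends.
    print(f"{place:<12} {'*' * symbol_counts[place]} ({tons:,} tons)")
```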

5.3.3.8. Map Graph or Cartogram

When geographical data are presented, the map graph or cartogram is one of the best
ways to do so, because the regions on the drawn map can be distinguished through
colors or other creative symbols.

A legend always accompanies a map graph. The legend explains or elucidates the
meaning of the colors, the lines or any creative symbols that may be used.

Let us take two examples from NEDA Statistical Year Book (see Figures 5.15 and
5.16).

[Figure: map graph of visitor arrivals by nationality]

Japanese              180,350
Other Europeans        96,037
American              235,681
British               118,151
Overseas Filipinos    111,705
Stateless                 209
Other Nationalities    76,006
Other Asians          181,203
Australian             49,573

Legend: one symbol = approximately 10,000 visitors

Figure 5.15 Visitors Arrival by Country 1988

Region 1 = 164.2
Region 2 = 60.9
Region 3 = 263.4
Region 4 = 130.4
Region 5 = 197.2
Region 6 = 223.8
Region 7 = 253.3
Region 8 = 130.6
Region 9 = 135.3
Region 10 = 97.4
Region 11 = 105.6
Region 12 = 97.5
NCR = 9,317.4

Legend: one symbol = 100 persons/sq. km

Figure 5.16 Population Density by Region as of May 1, 1988

Reading Interpretation

You can see from Figure 5.16 that NCR, Region 3, Region 6, and Region 7 have higher
population density than the other regions.

SUMMARY

This module has taught you the different ways of presenting data, such as creating
textual, tabular, and graphical representations. You have also learned the points to
consider when using different forms of graphs. Please apply them to your own data
when you do your own projects or thesis. I hope you enjoyed learning the various
ways of making your data look attractive.

Goodbye for now and see you again in Module 6.

6
The Normal Distribution
INTRODUCTION

In this module, we shall talk about the normal distribution. The term normal distribution is
synonymous with the term normal curve. The French-English mathematician
Abraham de Moivre (1667-1754) presented an equation showing that the data in a frequency
distribution cluster around a central point. Later on, the German mathematician,
physicist, and astronomer Carl Friedrich Gauss (1777-1855) applied this equation
to model observational errors in astronomy. In some texts, you will read the
normal distribution described as the Gaussian distribution.

The normal distribution is a concept of great significance in statistics. It is a
theoretical model in which a curve is drawn over a frequency polygon that is
perfectly symmetrical and smooth. It is bell-shaped and unimodal, with its tails
extending infinitely in both directions.

You will learn its properties and uses. As you go along your research work, you will
appreciate the role of normal distribution in your data. So, study well the module
and establish the habit of analyzing data through the eyes of a normal distribution.
Take your time and study well to understand the concept. Do the activities to
enhance your skills.

OBJECTIVES

At the end of this module study, you will be able to:

1. understand what a normal distribution is;
2. learn the important aspects of the normal distribution; and
3. apply the normal distribution to data.

6.1 NORMAL DISTRIBUTION

The normal distribution or normal curve is a theoretically perfect frequency
polygon. Its mean, median, and mode all coincide in the center. It takes the form of a
symmetrical bell-shaped curve, as seen in Figure 6.1.

Figure 6.1 The Normal Curve

The crucial point about the normal distribution is that a distance along the abscissa
(horizontal axis) of the distribution, when measured in standard deviations from the
mean, always encompasses the same proportion of the total area under the curve. In
other words, regardless of the precise shape of the normal curve and the nature of
the underlying distribution, the distance from any given point to the mean (when
measured in standard deviations) will cut off exactly the same proportion of the
total area.

The normal distribution is used to describe games of chance and human traits such as
attitudes, intelligence, and personality because it is assumed that these are distributed
among the population in a fairly “normal” way. For example, if you measure the IQ of
a representative sample of sufficient size, the resulting scores will assume a
distribution that is quite similar to the normal curve. It is expected that the majority of
the scores will fall around the center or mean. The Wechsler IQs of the population are
said to be normally distributed with a mean of 100 and a standard deviation of 15.
Figure 6.2 shows that the IQs of the majority of the population cluster around the
mean and that only a few individuals have IQs below 55 or above 145.

55 70 85 100 115 130 145

Figure 6.2 IQ Distribution

You will find that when you do hypothesis testing, the concept of the normal
distribution or normal curve is going to help you a lot. In hypothesis testing, you will
talk about the probability or likelihood that a given difference or relationship
could have occurred by chance alone. Understanding the normal distribution or
normal curve prepares you for understanding the concepts behind the testing of
hypotheses because you will be dealing with probabilities, that is, the likelihood of
something happening by chance.

Look at Figure 6.3. The baseline of the normal curve is measured off in standard
deviation units. These units are indicated by the small letter z. A score that is one
standard deviation above the mean is symbolized by + 1 z and, correspondingly, - 1 z
means a score that is one standard deviation below the mean. Let us take the Wechsler
IQ test. The mean score is 100 and the standard deviation is 15. So, to follow
the discussion, one standard deviation above the mean (+ 1 z) is 115 and one
standard deviation below the mean (- 1 z) is 85.

55 70 85 100 115 130 145 IQ


-3 -2 -1 0 +1 +2 +3 Z

Figure 6.3 IQ Distribution showing standard deviation

It is given that in a normal distribution, 34.13% of the scores fall between the mean
and one standard deviation above the mean. You learned that the curve is
symmetrical; therefore, 34.13% also fall between the mean and one standard
deviation below the mean. Summing up, 34.13% + 34.13% = 68.26% of the
scores fall between – 1 z and + 1 z. With the Wechsler IQ test, this means that about
two-thirds of the scores will fall between 85 and 115. It follows that of the one-third of the
scores remaining, one-sixth will fall below 85 and one-sixth will be above 115. See
Figure 6.4 for an illustration.

[Figure: normal curve divided at – 3 z, – 2 z, – 1 z, the mean, + 1 z, + 2 z, and + 3 z; the areas on each side of the mean are 34.13%, 13.59%, and 2.15%, with .13% beyond 3 z]

Figure 6.4 Areas under the normal curve

Furthermore, of the total distribution, 27.18% falls between one and two standard
deviations from the mean: 13.59% falls between one and two standard
deviations above the mean, and 13.59% falls between one and two standard
deviations below the mean. Thus, 95.44% of the scores (13.59 + 34.13 + 34.13 +
13.59) fall between plus and minus two standard deviations from the mean. For the
IQ test mentioned here, this means that 95.44% of the population receives scores
between 70 and 130.

In addition, 4.30% of the scores fall between two and three standard deviations from
the mean, 2.15% on each side. So, 99.74% of the population has IQ scores between
55 and 145.

Let us summarize the areas under the normal curve (refer to Figure 6.4):
A. 34.13% of scores fall between the mean and + 1 z
   13.59% of scores fall between + 1 z and + 2 z
   2.15% of scores fall between + 2 z and + 3 z
B. 34.13% of scores fall between the mean and – 1 z
   13.59% of scores fall between – 1 z and – 2 z
   2.15% of scores fall between – 2 z and – 3 z
C. 68.26% of scores fall between – 1 z and + 1 z
   95.44% of scores fall between – 2 z and + 2 z
   99.74% of scores fall between – 3 z and + 3 z
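
These areas can be checked without a printed table. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `area_between` is illustrative. The computed values may differ from the table-based figures above in the second decimal place (for example, 68.27% rather than 68.26%) because the printed table entries are rounded.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal curve: mean 0, standard deviation 1


def area_between(z_low, z_high):
    """Proportion of the total area under the curve between two z values."""
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)


print(f"mean to +1 z: {100 * area_between(0, 1):.2f}%")
print(f"+1 z to +2 z: {100 * area_between(1, 2):.2f}%")

for k in (1, 2, 3):
    print(f"within -{k} z and +{k} z: {100 * area_between(-k, k):.2f}%")
```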

Let us borrow the example used by Munro (1986). To get into graduate studies in
the United States of America, one must take and pass the Graduate Record
Examination (GRE). GRE scores are said to be normally distributed with a mean of
500 and a standard deviation of 100 (see Figure 6.5). If 68.26% of the scores fall
within ± 1 z, then 68.26% of the scores fall between 400 and 600. Did you get
that? In addition, 95.44% of the scores fall between 300 and 700, while 99.74% fall
between 200 and 800.

[Figure: normal curve of GRE scores with nested bands marking the central 68.26%, 95.44%, and 99.74% of scores]

Figure 6.5 GRE Distribution

In Module 2, we mentioned percentile ranks. The importance of a percentile rank lies
in its ability to describe a given score in relation to the other scores in a distribution.

The percentile rank of a given score can be determined by looking at the area under
the normal curve and applying the formula for Z-scores.

Recall from Module 4:

$$Z = \frac{X - \bar{X}}{S}$$

where X = raw score
      $\bar{X}$ = mean
      S = standard deviation

The area under the normal curve can be determined by using the unit normal table
which includes a listing of all Z-scores and their corresponding areas (see table at
the end of this module).

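We have mentioned earlier that the scores on the GRE are normally distributed with
a mean of 500 and a standard deviation of 100. If Betty scored 630, what is her
percentile rank? To solve this, first draw the normal curve to get a clearer picture of the
problem (see Figure 6.6).

[Figure: normal curve of GRE scores with the mean at 500 and Betty's score at 630; the area below 500 is .5000, and the area between 500 and 630 is marked with a question mark]

Figure 6.6 GRE Distribution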

Percentile rank refers to the percentage of scores that are at or below a given
score. So in our problem, we are looking for the percentage of scores at or below 630,
and thus we are looking for the area to the left of 630.

The area under the normal curve is expressed as a proportion. A normal curve has a
total area of 1.0. Since the normal curve is symmetrical, when cut in the middle, each
half has an area of .5000. Back to our problem, the area below 500 is .5000. We
now need to determine the area between 500 and 630. To do this, first convert the
raw scores to Z-scores and then use the unit normal table.

For X = 500: Z = 0

For X = 630:

$$Z = \frac{630 - 500}{100} = \frac{130}{100} = 1.3$$

Look for Z = 1.3 in the unit normal table and find the corresponding area, which is
.4032. Thus, the total area below the score of 630 is .9032 (.5000 + .4032). Just
multiply this area by 100 and you will get Betty's percentile rank. Therefore, the
answer is:

Betty’s percentile rank is 90.32%, meaning about 90% of the students who
took the GRE scored 630 or below.
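
The same steps can be sketched in Python. The function name `percentile_rank` is illustrative, and `NormalDist().cdf` from the standard `statistics` module takes the place of the unit normal table; it returns the whole area to the left of a z value, so the .5000 for the lower half is already included.

```python
from statistics import NormalDist


def percentile_rank(score, mean, sd):
    """Return (z, percentile rank) for a score from a normal distribution."""
    z = (score - mean) / sd
    # cdf(z) is the area to the left of z under the standard normal curve.
    return z, 100 * NormalDist().cdf(z)


z, rank = percentile_rank(630, mean=500, sd=100)
print(f"Z = {z:.2f}, percentile rank = {rank:.2f}%")
```

For Betty's score, this gives Z = 1.30 and a percentile rank of 90.32%, in agreement with the table-based solution.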

Let us do some practice activities.

SAQ 6-1

Scores on the Graduate Record Examination (GRE) are normally distributed
with a mean of 500 and a standard deviation of 100. Suppose you received a score
of 650. What is your Z-score and percentile rank?

SAQ 6-2

Suppose your classmate obtained a score of 48 on a test in which the class
mean was 35 and the standard deviation was 5. What is the Z-score? What is the
percentile rank of the classmate?

6.2 NON-NORMAL DISTRIBUTION

There are instances when a distribution does not have relatively equal numbers on
each side of the distribution, but has a large number of scores on one side. This
particular distribution is referred to as skewed. The disproportionate hump of
scores causes a “tail” to be formed at the opposite end of the distribution.

Figure 6.7 Positively Skewed Distribution

Sometimes the hump of scores lies toward the other end of the distribution, as in
Figure 6.8.

Figure 6.8 Negatively Skewed Distribution

To judge the skewness of a distribution, always compare it with the normal curve shown
earlier. A distribution is called negatively skewed when its tail
extends toward the left or negative side. Correspondingly,
when the tail extends toward the right side, the distribution is positively skewed.

There are terms you must be familiar with in relation to this topic of normal
distribution.

(a) Skewed distribution – a distribution that is out of balance in one direction,
either negatively or positively.
(b) Kurtosis – the measure of the relative peakedness or flatness of the curve.
In computer printouts, a value of zero indicates a normal curve.
(c) Leptokurtic – a narrow, peaked curve, indicated by a positive kurtosis value.
(d) Platykurtic – a flatter curve, indicated by a negative kurtosis value.
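
Skewness and kurtosis can be computed directly from data. The sketch below is a minimal illustration that assumes the simple moment-based definitions (the average of the cubed z-scores for skewness, and the average of the fourth powers of the z-scores minus 3 for kurtosis); statistical packages may apply small-sample adjustments, so their printouts can differ slightly. The data sets and the function name `shape_measures` are hypothetical.

```python
from statistics import fmean, pstdev


def shape_measures(data):
    """Return (skewness, excess kurtosis) using population moments."""
    mean = fmean(data)
    sd = pstdev(data)
    z = [(x - mean) / sd for x in data]
    skewness = fmean([v ** 3 for v in z])
    excess_kurtosis = fmean([v ** 4 for v in z]) - 3  # 0 for a normal curve
    return skewness, excess_kurtosis


right_tailed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]   # hypothetical: tail on the right
left_tailed = [13 - x for x in right_tailed]     # mirror image: tail on the left
flat = [1, 2, 3, 4, 5]                           # hypothetical: evenly spread

print("right-tailed:", shape_measures(right_tailed))
print("left-tailed: ", shape_measures(left_tailed))
print("flat:        ", shape_measures(flat))
```

The right-tailed data give a positive skewness, the mirror-image data give a negative skewness of the same size, and the evenly spread data give a negative (platykurtic) kurtosis value.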

Things to remember
1. The height of the curve is not the important element; what is important
is how the frequencies are distributed within the three standard deviations
on either side of the center, where the mean, median, and mode lie.
2. Z-scores cannot be converted to percentile ranks with the unit normal table if
the curve varies greatly from a normal curve.

6.3 IMPORTANT ASPECTS OF NORMAL DISTRIBUTION


In summary, the normal distribution or normal curve has the following important
characteristics (refer to Figure 6.6)

1. It is bell-shaped and symmetrical about the mean.

2. The mean is always located at the center of the curve, cutting the curve into two
equal parts. The left half is a mirror image of the right half since the curve is
symmetrical.

3. The mean, median, and mode are of equal value, thus these three measures of
central tendency are located at the same point along the x-axis. The mean is the
average of scores in the normal distribution. The median is the middle value
that divides the distribution into two equal parts. The mode is the value with
the highest frequency.

4. The tails or ends are asymptotic to the horizontal axis, meaning the tails will
never touch the baseline or the x-axis as they extend to infinity.

5. The total area under the normal curve is equal to 1 (in terms of proportion) or
100% (in terms of percentage). The area can be interpreted as a probability
value.

6. The normal curve areas may be sub-divided into at least three major standard
scores or z scores; that is, three z-scores to the left of the mean and another
three to the right.

7. The distance from one integral standard score to the next integral standard
score is measured by the standard deviation.

SAQ 6-3

What are the steps involved in finding areas under the normal curve?

SAQ 6-4

Given a normal distribution with mean $\bar{X}$ = 50 and S = 5, what is the corresponding
Z-score for X = 60?

6.4 APPLICATIONS OF THE NORMAL DISTRIBUTION


The normal distribution is important in nursing because the majority of the
measurements in health care are approximately normally distributed. Variables such
as heart rate, height, weight, blood pressure, and chemistry levels approximately
follow a normal distribution. The normal distribution is important in explaining many
health care phenomena through statistical inference.

Areas under the normal curve denote probability. The larger the area, the greater is
the probability of occurrence.

Let us have some data so you can apply the normal distribution. Please do all the
exercises to sharpen your statistical skills. Remember that practice makes perfect,
especially since this is a skills exercise.

SAQ 6-5

The heights of 500 students are normally distributed with a mean of 160.5 cm and a
standard deviation of 8.8 cm. Assuming that the heights are recorded to the nearest
half cm, how many of these students would you expect to have heights:

(a) greater than 165 cm?
(b) less than 155 cm?
(c) between 158.5 and 170.5 cm, inclusive?

Note: remember to look at the Unit Normal Table to guide your values of the normal distribution.

6.5 TRANSFORMING Z-SCORES
Let us remember that in interpreting a single score, we want to place it in some
position relative to a collection of scores from some reference group. Recall that the
percentile rank of a score tells us the percentage of scores that are at or below a
given score value. We can also have another approach to view a specific score with
some control index in mind such as the mean. So, for instance a score of 33.0 in a
distribution with a mean of 30.0 might be reported as 3.0, the difference between
the observed score and the mean. The problem with presenting the absolute
difference is that it is hard to know if the difference is relatively large or relatively
small. Hence, we need some mechanism to report the relative difference between
the observed score and the mean.

This can be done by dividing the distance of a score from the mean by the standard
deviation. So, if the standard deviation is 1.5, the score of 33.0 will be 2.0 standard
deviations above the mean:

$$\frac{33 - 30}{1.5} = 2.0$$

Similarly, a score of 27 will be 2.0 standard deviations below the mean:

$$\frac{27 - 30}{1.5} = -2.0$$

So, the distance of a score above the mean divided by the standard deviation always
yields a positive result, and one below the mean always yields a negative
one. In this manner, we can define the z-score as

(a) $z = \frac{x - \bar{x}}{s}$ for data in a sample

(b) $z = \frac{x - \mu}{\sigma}$ for data in a population

Let us recall that µ is the mean of a population, $\bar{x}$ is the mean of a sample, σ is the
standard deviation of a population, and s is the standard deviation of a sample.

The importance of a z-score is that it provides an answer to the question: How many
standard deviations away from the mean is a given score? Through the use of the z-
score we can express the relative difference between the observed score and the
mean and whether the score falls above or below the mean.
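
As a small illustration, the z-score formula can be written as a Python function. The name `z_score` is illustrative; the numbers are the ones used in this section (a mean of 30.0 and a standard deviation of 1.5).

```python
def z_score(score, mean, sd):
    """Number of standard deviations that a score lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("the standard deviation must be positive")
    return (score - mean) / sd


above = z_score(33.0, mean=30.0, sd=1.5)
below = z_score(27.0, mean=30.0, sd=1.5)
print(above)   # 2.0  -> two standard deviations above the mean
print(below)   # -2.0 -> two standard deviations below the mean
```

The sign of the result shows whether the score falls above or below the mean, and its size shows how far.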

The z-score is a descriptive statistic that represents the distance between an
observed score and the mean relative to the standard deviation:

$$z = \frac{x - \bar{x}}{s} = \frac{\text{distance of a score above or below the mean}}{\text{standard deviation}}$$

What is the value of transforming to z-scores? The transformation of observed
scores, which are also known as raw scores, to z-scores converts a distribution with
any mean and standard deviation to a distribution where the mean is 0.0 and the
standard deviation is 1.0. This is the reason why the z-score transformation is often
called a standardized transformation. It converts all scales to a
standard scale. Remember that if all z-scores in a distribution are squared, their sum,
Σz², will be equal to N, the number of scores in the distribution (when the standard
deviation is computed with N in the denominator).

The conversion to z-scores always yields a mean of 0.0 and a standard deviation of
1.0; however, this does not normalize a non-normal distribution. The shape of the
distribution is not changed at all: if the distribution is skewed to begin with,
transforming the scores to z-scores will not affect the skew. But when the
population of scores on a given variable is normal, then any score can be expressed
as a percentile rank by comparing the z-score to the standard normal distribution.
Moreover, because z-scores are free of the original units of measurement (such as
inches, pounds, IQ scores, etc.), an individual’s position on one variable can be
compared with his/her position on a different variable. For this, we need to refer to
the standard normal distribution.
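
These properties of the z-score transformation can be verified on a small data set. The sketch below uses hypothetical scores and an illustrative function name, `standardize`. It assumes the population standard deviation (`pstdev`, which divides by N), since that is the version for which the squared z-scores sum exactly to N.

```python
from statistics import fmean, pstdev


def standardize(scores):
    """Convert raw scores to z-scores using the population standard deviation."""
    mean = fmean(scores)
    sd = pstdev(scores)
    return [(x - mean) / sd for x in scores]


raw = [55, 65, 70, 74, 80, 86, 95]   # hypothetical raw scores
z = standardize(raw)

print("mean of z-scores:      ", round(fmean(z), 10))
print("SD of z-scores:        ", round(pstdev(z), 10))
print("sum of squared z:      ", round(sum(v * v for v in z), 10))
print("number of scores (N):  ", len(raw))
```

The order and relative spacing of the scores stay the same after the transformation, which is why a skewed set of raw scores remains skewed after conversion to z-scores.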

The z-score has many uses:

(1) z-scores are very useful descriptive statistics that can help you
understand your data better.
(2) z-scores have important implications for advanced parametric
statistics. The z-scores are used to calculate correlation coefficients
and some aspects of probability.

SAQ 6-6

Can you do this exercise to test your understanding of the z-score transformation?
(a) Suppose the center of a basketball team is 86.0 inches tall and the average height of
the team members is 74.0 inches. How many inches is the center player
above the mean?
(b) In the normal curve below, can you show where this
distance lies?

50 65 74 86 98

Were you able to locate the distance?

SUMMARY

This module brought you the normal distribution, its aspects, its importance, and its
application to data. When you work on the normal distribution, automatically, you
standardize your scores into z scores. You have learned the z scores in earlier
modules, but this particular module made you realize the usability of the z score in
inferring the meaning of your data in the light of the normal distribution. Go on, and
enjoy your statistics!

ASAQs

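ASAQ 6-1

$$Z = \frac{650 - 500}{100} = \frac{150}{100} = 1.5$$

From the unit normal table, the area between the mean and Z = 1.5 is .4332, so the
percentile rank is (.5000 + .4332) × 100 = 93.32%.

ASAQ 6-2

$$Z = \frac{48 - 35}{5} = \frac{13}{5} = 2.6$$

From the unit normal table, the area between the mean and Z = 2.6 is .4953, so the
percentile rank is (.5000 + .4953) × 100 = 99.53%.

ASAQ 6-3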

Steps:
1. Convert the raw score to a Z-score using the formula

$$Z = \frac{X - \bar{X}}{S}$$

where:
Z = standard score
X = raw score
$\bar{X}$ = mean
S = standard deviation

2. Look at the unit normal table to determine the corresponding area of the
computed Z-score.

3. If possible, sketch the curve and shade the area under investigation to get a
clearer picture of the problem.

ASAQ 6-4

$$z = \frac{X - \bar{X}}{S} = \frac{60 - 50}{5} = \frac{10}{5} = 2$$

This number 2 means that the score of 60 lies two standard deviations (S = 5)
above the mean of 50.

ASAQ 6-5
Given: N = 500, σ = 8.8 cm, μ = 160.5 cm

(a) X > 165 cm

$$z = \frac{165 - 160.5}{8.8} = 0.51$$

P (X > 165) = .5000 - .1950 = 0.3050

Hence, 500 (0.3050) = 152.5 or 152 students

(b) X < 155 cm

$$z = \frac{155 - 160.5}{8.8} = -0.625$$

P (X < 155) = .5000 - .2324 = 0.2676 (using the table entry for z = 0.62)

Hence, 500 (0.2676) = 133.8 or 134 students

(c) 158.5 < X < 170.5 cm

For X = 158.5:

$$z = \frac{158.5 - 160.5}{8.8} = -0.23$$

For X = 170.5:

$$z = \frac{170.5 - 160.5}{8.8} = 1.14$$

P (158.5 < X < 170.5) = .0910 + .3729 = 0.4639

Hence, 500 (0.4639) = 231.95 or 232 students

ASAQ 6-6

(a) The center is 12 inches above the mean; that is, the center stands 1 foot taller
than the team average.

(b)

50 65 74 86 98

μ = 74 σ = 12

In this distribution, a score of 86 is 12 points above the mean, which is a distance of
1.0 standard deviation. A score of 98 is 24.0 points above the mean; when 24.0 is divided
by 12.0, the result is 2.0 standard deviations above the mean. For a score that is 6.0
points below the mean, x – $\bar{x}$ = -6.0, and the standard against which it is compared is
12.0. Its z-score is therefore -0.5, that is, 0.5 standard deviation below the mean. From
this illustration, you can see that a z-score looks at the distance of a score from the
mean relative to the standard deviation.

$$z = \frac{\text{distance of a score above or below the mean}}{\text{standard deviation}}$$

7
Applications of
Descriptive Statistics

INTRODUCTION

This module will put into actual practice the descriptive statistics you have
learned from Module 1 to Module 6. You should not feel bad if you cannot answer
all the exercises correctly. Remember, this is an exercise where you apply all the
statistical concepts and knowledge you have studied so far.

Unlike the first six modules, you will not find the answers to the exercises here.
However, samples of how the exercises are solved are given. You can follow the
flow of the topic because statistics is a very logical course.

Take your time and please do all the exercises. Doing them will enhance your
knowledge and skills in statistics. Feel free to go over the first six modules.
The key to appreciating statistics is a clear understanding, gained through regular study
and reading, of what statistics is all about.

OBJECTIVES

At the end of this module, you will be able to apply the following to your data:

1. frequency distributions
2. measures of central tendency
3. graphs, scattergrams, and pie charts
4. the normal distribution and study designs

SAQ 7-1
The following are the test scores of 50 students. Construct a grouped frequency
distribution table for this data set. Your table should have the following
components: interval, exact limits, midpoint, f, %, cf, and cumulative %.

73 74 74 80 71

65 76 93 74 80

80 65 95 84 75

85 92 75 76 85

71 85 84 94 72

90 97 95 75 90

76 82 86 89 76

89 70 55 59 88

82 86 99 95 73

73 75 79 78 65
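
One possible way to set up the table is sketched below in Python. The class width of 5 and the starting lower limit of 55 (the lowest score) are assumptions chosen for illustration, and other reasonable choices give a different but equally valid table. The function name and the dictionary keys are likewise hypothetical.

```python
scores = [
    73, 74, 74, 80, 71, 65, 76, 93, 74, 80,
    80, 65, 95, 84, 75, 85, 92, 75, 76, 85,
    71, 85, 84, 94, 72, 90, 97, 95, 75, 90,
    76, 82, 86, 89, 76, 89, 70, 55, 59, 88,
    82, 86, 99, 95, 73, 73, 75, 79, 78, 65,
]

WIDTH = 5   # assumed class width
START = 55  # assumed lower limit of the first interval (the lowest score)


def grouped_table(data, start, width):
    """Return one row per class interval, from the lowest interval upward."""
    n = len(data)
    rows, cf = [], 0
    lower = start
    while lower <= max(data):
        upper = lower + width - 1
        f = sum(lower <= x <= upper for x in data)  # frequency in this interval
        cf += f                                     # cumulative frequency so far
        rows.append({
            "interval": f"{lower}-{upper}",
            "exact_limits": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,
            "f": f,
            "pct": 100 * f / n,
            "cf": cf,
            "cum_pct": 100 * cf / n,
        })
        lower += width
    return rows


table = grouped_table(scores, START, WIDTH)
for row in table:
    print(row)
```

As a quick check on any table built this way, the f column should add up to 50 and the last cumulative percentage should be 100.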

SAQ 7 – 2
Represent the test scores of the 50 students graphically by creating a histogram and a
frequency polygon.

SAQ 7 – 3
You are a nurse manager of a large general hospital. You wish to investigate the presence of bed
sores or pressure sores among patients. You decide to arrange for a nurse assessment of the
presence and severity of pressure sores of every patient in the hospital at some time between
9 a.m. and 6 p.m. on a specific day.

What is the population in this project?


What factors will make it difficult to assess every patient whom medical records report as
occupying a bed on that particular day?

Even if every patient in the hospital on that day is assessed, why are the data collected
still only a sample?
SAQ 7 – 4

A study auditing pressure sores among older persons was conducted in a certain
tertiary hospital. The tool used for this study was the Waterlow Pressure Sore Risk Assessment
Scale, a 10-item questionnaire that provides a measure of the risk of developing pressure
sores. The maximum possible score is 50. A score of 10 indicates some risk of a pressure sore; a
score of 15 or more indicates high risk; and a score of 20 or more indicates very high risk.
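
The risk bands just described can be written as a small Python sketch. The function name `waterlow_risk` is hypothetical. Two points are assumptions rather than statements from the text: scores of 10 to 14 are treated as "some risk", and scores below 10 are given the label "below the at-risk threshold".

```python
def waterlow_risk(score):
    """Map a Waterlow score to the risk bands quoted in the text."""
    if score < 0 or score > 50:
        # The text gives 50 as the maximum; a minimum of 0 is assumed here.
        raise ValueError("Waterlow score must be between 0 and 50")
    if score >= 20:
        return "very high risk"
    if score >= 15:
        return "high risk"
    if score >= 10:
        return "some risk"  # assumed to cover scores 10 to 14
    return "below the at-risk threshold"  # assumed label, not from the text


for score in (22, 16, 13, 9):
    print(score, waterlow_risk(score))
```

Checking the highest band first keeps the conditions simple, because a score of 20 or more also satisfies the lower cut-offs.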

Below is a table of Waterlow scores from a pressure sore prevalence audit. For your
guidance, the columns are as follows: column 3 = age group; column 4 = support surface
(mattress type), coded as 01 = ordinary mattress, 02 = Vapem, 03 = Spenco, 09 = other types;
column 5 = number of pressure sores; column 6 = Waterlow score.

Table 8-1. Waterlow Scores of 40 Patients

PATIENT NO.  GENDER  AGE GROUP  SUPPORT SURFACE  NO. OF SORES  WATERLOW SCORE
1 M 87-97 02 1 22
2 M 65-75 01 0 13
3 M 65-75 01 0 9
4 F 65-75 02 0 18
5 F 76-86 09 0 19
6 F 76-86 02 1 20
7 F 76-86 02 0 21
8 M 65-75 02 0 14
9 M 65-75 02 0 16
10 M 65-75 02 0 13
11 F 76-86 09 0 15
12 F 65-75 09 0 16
13 F 76-86 09 0 9
14 F 76-86 09 0 15
15 F 87-97 03 4 27
16 F 65-75 02 0 25
17 F 87-97 03 1 23
18 F 65-75 09 4 31
19 F 65-75 01 1 16
20 F 65-75 01 0 9
21 F 87-97 03 2 24
22 F 76-86 01 0 11
23 F 65-75 03 0 11
24 F 65-75 03 1 14
25 F 65-75 01 0 6
26 F 76-86 03 2 14
27 F 65-75 01 3 23
28 F 65-75 01 0 16
29 F 65-75 01 2 17
30 F 65-75 01 0 12
31 F 65-75 01 0 10
32 F 65-75 01 0 14
33 F 76-86 03 0 18
34 F 65-75 01 0 15
35 M 65-75 03 1 17
36 F 65-75 01 0 11
37 F 65-75 03 0 12
38 F 65-75 01 0 6
39 F 65-75 01 0 13
40 M 65-75 01 0 6

Questions:

1. Identify the level of measurement of each variable (nominal, ordinal, or
interval/ratio).
2. Construct a frequency distribution table, grouped if possible, with f and % as
components, for each variable:
a. Gender
b. Age group
c. Support surface
d. Waterlow score
3. Comment on any broad patterns or features noticeable in each case.
4. Calculate the proportion of males and females with pressure sores.
5. What age range has more females? Males?
6. Display the four frequency distributions determined in question number 2 as (a)
simple bar graphs, (b) frequency polygon, (c) pie charts.
7. Comment on the usefulness of the graphs and charts.
8. Determine the ratio of males to females with pressure sores.
