
BUSINESS STATISTICS
Chapter 1 – Basic Concepts
1.1 Introduction
Without noticing it, we often apply statistics in our daily lives.
 When we record the number of clothing items (shirts, blouses, pants, skirts, etc.) and
their colors in our cabinets, we are recording data about our collection of clothing
for future use.
 When we record our monthly expenses, we have a basis for determining our
monthly budget for the succeeding months.
 When a nurse records the results of a patient's physical examination, the record is a
collection of data that helps the doctor/physician diagnose the type of illness
the patient has and determine the appropriate medical treatment that must
be prescribed to the patient.
 When spectators at a basketball game talk about the number of rebounds and
assists made by a player, they are talking about some vital statistics of the
game.
 When a teacher records the examination scores of the students in a particular
subject, it is a recording of data which can be used later on to determine if the
method of teaching has been effective, to find out how the students performed
in the test, or to know the degree of ease or difficulty of the test.
1.2 Definition
Statistics
 Is a branch of applied mathematics concerned with scientific methods for the
collection, organization, presentation, analysis, and interpretation of data.
 Its essential purpose is to describe and draw inferences about the
numerical properties of populations.
 Is a science of learning from data.
 The data are numerical or qualitative descriptions of the objects that
we want to study (examples: averages and population statistics).
 The term “data” means factual information or observations that may
either be quantitative or qualitative.
 The collection of data entails gathering of information through
interview schedules, structured questionnaires, observations,
experimentations, use of existing records and other methods.
 The data are then organized in an orderly fashion as a requisite for
data presentation.
 More often than not, the data gathered are presented in graphs and
tables to give the readers a quick picture of the data distribution.
 Textual presentation is also used when only a few data are to be
presented and explained.
 Data analysis comes after the processing of data, as guided by statistical
principles.
 This may involve the use of any method of statistics, the choice of
which depends upon the nature or purpose of the statistical
problem at hand.
 Drawing valid conclusions and making reasonable decisions are
based on such analysis.
 For example, a political analyst can use data from a portion of the
voting population to predict the political preferences of the entire
voting population.
Importance / Value of Statistics
 Familiarity with statistics enables one to make sense of the
many things one reads in newspapers and magazines and
on the internet about:
 health and medicine,
 sports,
 finance,
 education, and
 other areas.
A statistical inquiry is a process of transforming raw data into
useful information that can tell us more about a subject and
allow us to make recommendations and possibly make
predictions of future outcomes. It consists of six stages:
1. posing questions - involves coming up with questions that, if
answered, would lead to meaningful information that would
allow us to draw a conclusion and to make recommendations.
For example,
 suppose you were in charge of the school’s funds that have been set
aside for the development of a new sports field, but aren’t sure which
type of field (e.g. cricket pitch, basketball court) would be of greatest
benefit to students.
 To investigate this issue, you would need to ask questions
such as
“What is the most popular sport among students?”
 (you want to construct a type of field that would satisfy the
majority of students),
“Are there enough funds to construct the students’ preferred
type of field?”
 (you can’t construct a type of field that you can’t afford) and
“How long will it take to construct?”
 The military’s research and development team would like to find out both
the ideal type of uniform for its soldiers to wear in combat and the ideal
type for them to wear in parades. Pose some questions that would need to
be answered by the team in order to draw a conclusion and to make
recommendations.
 A restaurant owner would like to know how many chefs, waiters, cashiers
and managers to hire for his new restaurant and would like to know how
many staff to roster on during each time of the day. Pose some questions
that would need to be answered by the investigation.
2. collecting data - Once we have posed questions,
we need to collect data to answer them.
 Before we do the actual collecting, we have to
decide on how we will collect the data, the type of
data we will collect and the sources from which we
will collect them.
 The sources can be either primary or secondary.
 Collecting from a primary source involves
collecting the data directly yourself by interviewing
or observing others or even conducting
experiments.
 When collecting data using any such methods, it is
important to ensure that the data to be collected
can be organised easily.
 For example, when creating a questionnaire, it would be better to include
questions that are not open-ended, but rather have a limited number of options
from which participants can choose their answers.
 This way, the answers collected can be easily tallied and organised.
 For instance, instead of asking someone “What is your favourite colour?”,
 it would be better to ask “Which of the following colours is your favourite?”
 and to list a few common colours that they can choose from, including an option
of “Other” in case they would like to answer with a colour that is not one of
those listed.
 Using a secondary source involves gathering data that has
already been collected or generated by others.
 This could involve gathering data from books or the internet. It is
important that the data to be collected are from a reliable source
and not from some obscure website or outdated book, otherwise
the data may not be accurate.
 Some reliable sources of note are government organisations
such as the Australian Bureau of Statistics and the Bureau of
Meteorology, which have strict data collection methodologies in
place to ensure the accuracy and reliability of their data.
 Determine whether the data to be gathered to investigate each of the
following would come from a primary or a secondary source.
 If the source is primary, also state the method you would use to gather
the data (e.g. questionnaire, interview, observation, experiment).
 If the source is secondary, state the source you would use (e.g. books,
newspapers, internet).
1. the most popular subject among students at school
2. the average daily temperature in Sydney over the last month
3. the number of traffic accidents in the country each year
4. the number of visitors to the local library in an afternoon
5. the average daily temperature in your home over the past week
6. the number of goals scored by the Socceroos since the last World Cup
7. students’ main qualm with the school principal
3. organizing data
 Arrange the data we have collected into a form that gives structure
and order to the data.
 A common way of accomplishing this is to use a table, e.g. a
frequency table.
 How the data will be organized depends on the nature
of the statistical investigation.
 For example,
if the data collected were the incomes of a group of workers, it
would make more sense to organize the data into categories
of income ranges (i.e. to tally up the number of workers within
certain income ranges such as $50,000-$60,000) rather than to
tally up the number of workers with an income of a particular
value (e.g. the number of workers with an income of $54,682).
The following are the HSC results of a class of 30 Year 12 physics
students.
81 90 93 79 71 88 64 75 59 80
84 72 77 80 73 67 85 76 71 91
78 82 70 75 89 83 74 72 81 80
 Draw up a frequency table of the results with suitable groupings.
(HINT: HSC results are usually grouped into bands.)
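The tallying step behind a frequency table can be sketched in a few lines of Python. This is only an illustration: the class intervals of width 10 are one possible choice of grouping, the names `results`, `class_label` and `frequency` are made up for this sketch, and the three marks that the slide splits across two lines are assumed to read 75, 76 and 72.

```python
from collections import Counter

# HSC results of the 30 students. The three values that the slide splits
# across lines are assumed to read 75, 76 and 72.
results = [
    81, 90, 93, 79, 71, 88, 64, 75, 59, 80,
    84, 72, 77, 80, 73, 67, 85, 76, 71, 91,
    78, 82, 70, 75, 89, 83, 74, 72, 81, 80,
]


def class_label(mark, width=10):
    """Return the class interval (e.g. '70-79') that contains a mark."""
    lower = (mark // width) * width
    return f"{lower}-{lower + width - 1}"


# Tally the number of marks that fall in each class interval.
frequency = Counter(class_label(mark) for mark in results)

# Print the frequency table, one class interval per row.
for interval in sorted(frequency):
    print(f"{interval:>7}  {frequency[interval]:>3}")
```

A different class width, or the official band boundaries, would only require changing `class_label`.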
4. summarizing and displaying data
 Once we have organized the data, we need to present the
data in a form that will be easy to read, understand and
analyze.
 Most often this will be accomplished by using a graph such
as a column graph, bar graph, pie chart, dot plot or line
chart.
 The particular type of graph to be used will depend on the
purpose of the investigation.
 For example,
 in order to present data on the proportion of
students with a particular type of favourite sport,
it may be more appropriate to use a pie chart
than a dot plot. Besides displaying the data in a
graph, it may also be beneficial to summarize
the data using statistical quantities such as the
mean, median, mode and range.
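As a rough illustration of these summary quantities, the sketch below computes the mean, median, mode and range with Python's standard `statistics` module. The scores are hypothetical values invented for this example, not data from the slides.

```python
import statistics

# Hypothetical examination scores, invented for illustration only.
scores = [45, 52, 56, 56, 60, 63, 67]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequently occurring value
score_range = max(scores) - min(scores)   # largest value minus smallest value

print("mean:", mean_score)
print("median:", median_score)
print("mode:", mode_score)
print("range:", score_range)
```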
5. analyzing data and drawing conclusions
 After we have finished summarizing and displaying the data, it is time to
examine and interpret the data, to decide on what it means and to ultimately
draw conclusions from it.
 This may involve identifying trends and patterns from the graph, and
identifying how those trends and patterns change over time or across
categories (such as across different populations).
 From these trends, we can then draw conclusions and possibly make
predictions about future outcomes.

6. writing a report
 Once we have finished analyzing the data, it is time to put everything together
in a written report.
 Any report should address the background and aim of the statistical inquiry and
the questions it sought to answer, detail the data collection method (including
sources and type of data), involve a thorough discussion of the findings, list
and explain the reasoning behind the conclusions, and, if appropriate, include
recommendations for the future.
 It should also include the tables and graphs from steps 3 and 4 of the inquiry
(even if only as part of the appendix).
1.2.1 Categories of Statistics:
1. Descriptive Statistics
 Deals with the collection and presentation of data and the summarizing
values that describe the group’s characteristics.
 Most common summarizing values are the measures of central
tendency and variation.
 This means that no attempt is being made to generalize to a larger set
of data.
 This branch of statistics lays the foundation for all statistical knowledge.
 For example, if we measure the heights of the complete population of
students in a particular FEU and compute the mean height, that mean is
a descriptive measure because it describes a characteristic of the
complete population. If, on the other hand, we measure the heights of a
sample of 100 students and compute the mean height for the sample,
that mean is also a descriptive statistic because it describes a
characteristic of the sample.
 Example:
 Describing the performance of a class of 40 students using the mean
score in a given examination in statistics. Suppose the mean score is 56
and the passing score is 45; this suggests that most of the students
passed the test.
If only 12 out of 60 students obtained scores above 45, then either the
exam was too difficult or the teaching strategy was not effective, so the
teacher has to either give a less difficult exam or reteach the topic.
2. Inferential Statistics
 Deals with predictions and inferences based on the analysis and
interpretation of the results of the information gathered by the
statistician.
 Common statistical tools of inferential statistics are the t-test, z-test,
analysis of variance, chi-square, and Pearson r.
 Patterns in the data may be modeled, in a way that accounts for
randomness and uncertainty in the observations, to draw inferences
about the process or population being studied.
 Suppose we wish to make a statement about the
mean height in the complete population of
students in a particular FEU from the knowledge
of the mean computed on the sample of 100 and
to estimate the error involved in this statement;
then we should use procedures from
inferential statistics. The application of these
procedures provides information about the
accuracy of the sample mean as an estimate of
the population mean; that is, it indicates the
degree of assurance we may place in the
inferences we draw from the sample to the
population.
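One common way to quantify the accuracy of a sample mean is its standard error, the sample standard deviation divided by the square root of the sample size. The sketch below assumes a simple random sample and uses a handful of hypothetical heights; it is an illustration of the idea, not the procedure the course prescribes.

```python
import math
import statistics

# Hypothetical heights (in cm) of a small random sample of students.
sample_heights = [158.0, 162.5, 165.0, 170.0, 171.5, 160.0, 167.5, 169.5]

n = len(sample_heights)
sample_mean = statistics.mean(sample_heights)  # estimate of the population mean
sample_sd = statistics.stdev(sample_heights)   # sample standard deviation (divisor n - 1)
standard_error = sample_sd / math.sqrt(n)      # typical error of the sample mean

print(f"sample mean    = {sample_mean:.2f} cm")
print(f"standard error = {standard_error:.2f} cm")
```

A smaller standard error means the sample mean is a more precise estimate of the population mean; it shrinks as the sample size grows.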
Examples: gender, eye color, political preferences, religion, blood
type, civil status, year level, course, profession and
socioeconomic status
 Categories may be ordered but specific numerical values may
or may not be assigned.
 Example: Performance rating (poor, fair, good, very good,
excellent), Score (low, average, high), Public opinion about the
performance of the President (1 means Poor, 2 means Average,
and 3 means Good).
1.2.2 Types of Data
 Data or information sources can be classified into two types: primary
and secondary.
a. Primary Data: data that have been acquired directly from the
source.
 They are also called eyewitness accounts: they are written by
people who experienced the particular event or behavior
and are collected especially for the task at hand. Thus, we
collect primary data when, for example, we observe a certain
enrollment backlog and determine what causes it.
 Examples:
 minutes of a meeting
 office memos
 financial records
 membership lists
 data obtained by measuring the height of students in a
Statistics class
b. Secondary Data: non-primary data or existing records.
 For example:
 census data (published statistics) on demographics from the
National Statistical Coordination Board (NSCB) represent a
secondary source of data.
1.2.3 Variable
 is a characteristic or attribute associated with the
population being studied.
 a particular attribute of interest that is measurable or
observable on each and every individual or object.
 A variable can be either qualitative or quantitative.
 This indicates that measurement is not confined to
numerical or quantitative specification.
Different types of variables
a. Categorical or qualitative variables
 Are variables that are classified according to some attribute or
category.
 Have labels or names, rather than numbers, assigned to their
categories.
 Used extensively in observational studies.
 Examples of categorical or qualitative variables:
 gender (male, female)
 courses enrolled (Mathematics, English, Filipino, Science,
etc.)
 religion (Catholic, Protestant, Iglesia ni Cristo, etc.)
 college program (BSCS, ACT, BSCOE, BSN, etc.)
b. Numerical-valued or quantitative variables.
 Are variables that are classified according to numerical
characteristics.
 are measured in numbers
 Examples: height, weight, age, pulse rate, number of
children, speed, grade point average (GPA), and
number of academic units enrolled.
 Can be treated as categorical variables when they are
grouped into class intervals.
Examples of quantitative variables grouped into class intervals:
 Age in years (5-9, 10-14, 15-19, and 20 and above), Height in cm
(100-149, 150-199, 200-249), Grade in Math (1.00-1.49, 1.50-1.99,
2.00-2.49, 2.50-2.99, 3.00-3.49, 3.50-3.99, and 4.00-5.00)
 Numerical-valued variables can be classified as follows:
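Grouping a quantitative variable into class intervals can be sketched as a simple lookup. The function below uses the age intervals listed above; the function name, the extra "under 5" label, and the sample ages are assumptions made for this illustration.

```python
def age_group(age):
    """Map an age in years to one of the class intervals listed above."""
    if age < 5:
        return "under 5"  # not covered by the listed intervals; added for completeness
    if age <= 9:
        return "5-9"
    if age <= 14:
        return "10-14"
    if age <= 19:
        return "15-19"
    return "20 and above"


# Hypothetical ages, invented for illustration.
ages = [6, 12, 17, 25, 9, 14, 19, 20]
groups = [age_group(age) for age in ages]

for age, group in zip(ages, groups):
    print(age, "->", group)
```

Once grouped this way, the variable carries only a category label for each person, so it is handled like a categorical variable.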
b.1 Discrete Variables
 assume only a finite or countable number of values.
 variables whose values are obtained by counting
 Examples:
 number of subjects/courses enrolled
 number of students per class
 number of children
 number of persons with blue eyes
 number of patients with TB
 number of males and females in a Statistics class
b.2 Continuous Variables
 variables whose values are obtained by measuring
 may take on any value in a given interval or
continuum of values
 Examples:
 temperature, distance, area, density, age,
height, weight, and GPA, all of which cannot be put into
a list because they can have any value in some
interval of real numbers
 The room number is a numerical variable that is treated as a categorical
variable because the numbers are assigned only as codes or as identifiers.
 The same is true with civil status.
 Age is a continuous numerical variable that is usually rounded off to the
nearest whole number.
 Gender, job title, and type of illness are categorical variables.
1.2.4 Levels of Measurement
 Categorical Data
 Nominal: ordering does not exist, e.g. gender, SSN, eye
color
 Ordinal: ordering does exist, e.g. military rank, class levels,
rating scales
 Numerical Data
 Interval:
 distance exists but ratios do not
 zero is arbitrary and does not indicate an absence of
the measurement
 e.g. temperature scale, IQ scores, GPA
 Ratio:
 ratios exist and zero indicates an absence of the
measurement
1.2.5 Scales of Measurement
In selecting the statistical tool to be used for drawing inferences from a random
sample, the measurement scale of the data must be carefully considered.
Measurements are classified into four scales:
a. Nominal Scale
 Is a measurement scale that classifies elements into two or more
categories or classes.
 The numbers indicate that the elements are different, but the
difference is not according to order or magnitude.
 data collected are simply labels or names or categories
without any implicit or explicit ordering of the labels;
 observations with the same label belong to the same
category;
 lowest level of measurement;
 frequencies or counts of observations belonging to the
same category can be obtained.
 arithmetic operations cannot be performed.
 Example:

Variable                 Possible Data Values
1. Educational Program   BSCS, BSCOE, BSBA, BSN, BSOA
2. Gender                Male, Female
3. Color                 Black, Red, Yellow, ..., etc.
4. Math Course           College Algebra, Probability and Statistics,
                         Differential Calculus, Trigonometry, ..., etc.
b. Ordinal Scale
 It is a measurement scale that ranks individuals in terms of the
degree to which they possess a characteristic.
 data collected are labels or classes with an implied ordering in
these labels
 distance between two labels cannot be quantified
 a level of measurement higher than nominal
 ranking can be done on the data
 Example:

Variable                    Possible Data Values
1. Educational Attainment   Baccalaureate, MS or MA, Ed.D. or Ph.D.
2. Academic Rank            Assistant Instructor, Instructor, Assistant Professor,
                            Associate Professor, Full Professor
3. Job Position             President, Vice-President, Manager, Department Head
4. Performance Rating       Excellent, Good, Fair, Poor
c. Interval Scale
 It is a measurement scale that, in addition to ordering scores
from highest to lowest, establishes a uniform unit in the scale so
that any distance between two consecutive scores is of equal
magnitude.
 data collected can be ordered and, in addition, may be added or
subtracted, but not divided or multiplied
 distances between any two numbers on the scale are of known
size; the unit of measurement is constant (but arbitrary), and the
zero point is arbitrary
 a level of measurement higher than ordinal
 Example: the difference between aptitude scores of 80 and 90 is
the same as the difference between aptitude scores of 90 and 100
(both being equal to 10).
 There is no absolute zero in this scale.
 Example: a place where the temperature reading is 0
degrees Celsius does not mean that there is no temperature
in that place.
 Example:

Variable                        Possible Data Values
1. Intelligence Quotient (IQ)   80, 85, 93, ...
2. Emotional Quotient           80, 85, 93, ...
3. Temperature                  -10°C, 0°C, 15°C

d. Ratio Scale
 is a measurement scale that, in addition to being an
interval scale, also has an absolute zero on the scale.
 data collected have all the properties of the interval scale and,
in addition, can be multiplied and divided
 has a true zero point
 is the highest level of measurement
 Examples: height, weight, area, volume, speed, rate of doing
work, amount of money deposited in a bank.

Variable     Possible Data Values
1. Length    10 m to 20 m
2. Height    4’ to 7’
3. Area      12 m² to 25 m²
4. Weight    20 kg to 50 kg
1.2.6 Population
 It is defined as a group of people, animals, places,
things or ideas.
 The sum total of all units of analysis.
 Ideally we would like to study the entire population, to
give more weight to our findings. Often, however, we
are unable to study the entire population and must
settle for a sample.
1.2.7 Sample
 It is a subgroup, subset, or portion of the total population.
 The sample must always be viewed as an
approximation of the whole rather than as a whole in
itself. A 100 percent sample would be the entire
population; a 1 percent sample would consist of only 1
out of every 100 units in the population.
1.2.8 Parameter
 A parameter is a numerical measure that describes a
characteristic of a population.
 Example:
a. The population mean of the electricity bills of the
residents of a certain city is P1,500.00.
b. The population mean IQ of the students in a certain
university is 105.
1.2.9 Statistic
 A statistic is a numerical measure that describes a
characteristic of a sample.
 Example:
a. The sample mean of the electricity bills of 20
residents of a certain city is P1,450.00.
b. The sample mean IQ of 35 students in a certain
university is 105.
Chapter 2 – Collection of Data
2.1 Introduction
 There is no formula for selecting the best method to be used
when collecting/gathering data.
 It depends on the researcher’s design of the study, the type of
data, the time allotted to complete the study, and the
researcher’s financial capacity.
 Some common methods of data collection are: interview
method, questionnaire, observation, test, experiment, registration,
and the use of mechanical devices.
2.2 Methods of Collecting Data
 There are several ways by which data may be collected for scientific
inquiry.
 These are the objective method, the subjective method, and the use of
existing records. The first two give rise to what we call primary data, or
data which have been acquired directly from the source of information;
 the third gives secondary data, or data which have been acquired through
secondary means such as publications or existing records.
1) objective method - by measurement, counting, or observation.
 requires the use of measuring or counting instruments such as a meter stick,
weighing scale, thermometer, or any counting device.
 Thus, one may collect data by objective means either by measurement or
counting, or by observation.
2) subjective method - provided by respondent
 The subjective method relies on the information provided by identified
respondents.
 The instrument used to gather data is usually in the form of a questionnaire.
Thus, one may collect data by conducting an interview (which may come in the
form of a personal interview or a telephone interview) or by gathering
respondent-administered questionnaires.
3) use of existing records - published statistics
 The method of using existing records entails the use of data that
have been previously collected by another person or institution for some
other purpose.
 Thus, one may collect and use these data as long as the source of
the data is appropriately acknowledged.
Methods of Presenting Data: textual, tabular, and graphical presentation.
 Once data have been collected, they must be processed in some way so that any
important patterns or trends become apparent.
 Simple inspection of data in its raw form will ordinarily communicate very little to
the understanding of the investigator.
 Some form of data presentation and description is required to facilitate meaningful
interpretation and to extract the maximum amount of information from them.
 Organized data may be presented in three ways:
1) textual presentation – a narrative that may be written to describe the
characteristics of the population based on the data collected and organized.
 There are surveys where the numbers are compared and
commented upon.
 The textual form of presentation serves this purpose very
well.
 Example:
In a span of four years, the country will be populated with over 94 million
people. This translates to a yearly population growth of 1.95 percent from
2005 to 2010. Women are expected to live longer than men by 5.5 years, but
less by half a year compared to five years earlier. Males (47.3M) will still
outnumber the females (46.7M).
 These results were taken from the 2000 census-based national, regional and
provincial population projections released by the National Statistics Office on
April 4, 2006.
2) tabular presentation – data are arranged and entered into the appropriate row
and/or column categories.
 A table consists of the table heading, the stub, the caption, the body, and the
source. Every table should be easy to understand.
 The table heading includes the table number followed by the title.
The title should adequately express in a continuous sense the nature of the
data presented.
 The stubs, given at the left, describe the data found in the rows of the table;
they give the classifications or categories into which the figures fall.
 At the top of the column in the table is the caption or box heading; this gives the
designation of the column, or identifies the figures found in that column. The
body constitutes the main part of the table.

Source: National Statistical Coordination Board (NSCB) posted on May
06, 2006.
The different methods of collecting data are as follows:
2.2.1 Interview Method

a. Direct Method - the researcher personally interviews the respondents.
- This method is appropriate if the information needed is minimal and
the number of respondents is small (fewer than 30 individuals).
- It is very costly and time-consuming if the number of respondents is
very large and they are very far apart.
b. Indirect Method - the researcher uses a telephone to interview the
respondents.
- It is also quite expensive if there are many respondents.
- The researcher can never be sure that he or she is interviewing the
right person, since there is no personal contact or personal exchange
of ideas.
- This method is biased because people without telephones have no
chance of being included in the study.
2.2.2 Questionnaire Method
 A questionnaire is a list of well-planned questions written on
paper, which can be either personally administered or mailed by
the researcher to the respondents using any of the following
forms:
a. Guided-Response Type
b. Recall Type
c. Recognition Type
d. Dichotomous Type
e. Multiple-Choice Type
f. Multiple-Response Type
g. Free-Response Type
h. Rating Scale Type
a. Guided-Response Type
The respondent is guided in making his or her reply.
Example:
1. Have you been convicted of any crime?
Yes _______ No _______. If your answer is yes, go to
the next question. If your answer is no, go to question
number 3.
2. Attitude towards mathematics
b. Recall Type
Example:
a) Age
b) Sex
c) Civil Status
d) Length of stay in a community
e) Number of times you have been hospitalized due to a serious illness

c. Recognition Type
Example:
d. Dichotomous Type
Example:
Do you live alone? Yes ____ No ____
e. Multiple-Choice Type
Example:
Which of the following means abattoir?
a) dungeon
b) cave
c) house
d) chateau
e) none of these
f. Multiple-Response Type
Example:
What appliances/devices do you have at home? Encircle the
numbers.
1. Television
2. Refrigerator
3. DVD/VCD player
4. Piano/organ
5. Electric stove
6. Gas range
7. Vacuum cleaner
8. Personal computer
9. Fax machine
10. Telephone
g. Free-Response Type
The respondent is not guided in giving his or her reply and can answer in
his or her own style and way.
h. Rating Scale Type
Example:
2.2.3 Empirical Observation Method
 is a method of obtaining data by seeing, hearing,
testing, touching, and smelling.
 This method is commonly used in psychological
and anthropological studies.
 Through observation, additional information,
which cannot be obtained using the other
methods like the questionnaire, may be gathered.
 The observer may participate in the activities of
the group being studied (participant observation)
or he may just be a bystander (nonparticipant
observation).
 When an observation is done in a laboratory,
as in the case of experimental studies, the
type of observation is controlled
observation.
2.2.4 Test Method
This method is widely used in
psychological research and psychiatry.
Standardized tests are used because of their
validity, reliability, and usability.
2.2.5 Registration Method
 Examples of data gathered using this method are those obtained from the
National Statistics Office, Department of Education, CHED, SEC, Supreme Court,
and other government agencies.

2.2.6 Mechanical Devices


 The devices that can be used when gathering data for social and educational
research are the camera, projector, videotape, tape recorder, etc.
 In chemical, biological, and medical research, the common devices are the X-ray
machine, microscope, ultrasound, weighing scale, CT scan, etc.
 In astronomy and atmospheric research, the telescope, barometer, computer, radar
machines, camera, and satellite are commonly used.
2.3 Sampling Techniques
 Before the collection of data, it is necessary to determine the
sample size if the population is very large.
 Example: If a researcher wants to find the average IQ of Filipino
children aged 5 to 7 years in the rural areas and he has only a
few months to spend for collecting data, sampling is allowed to
save time and money.
 To compute the sample size, Slovin’s formula is used:
n = N / (1 + N·e²)
where: n = sample size
N = number of cases (population size)
e = margin of error
Example:
A researcher wants to know the average income of the families
living in Barangay A which has 2,500 residents. Calculate the
sample size the researcher will need if a 5% margin of error is
allowed.
Given: N = 2,500
e = 0.05
Find: the sample size
Solution:
n = 2,500 / (1 + 2,500 × 0.05²) = 2,500 / 7.25 ≈ 345
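The computation can be sketched in Python. Slovin's formula is commonly stated as n = N / (1 + N·e²); the function name below is made up for this illustration, and rounding the result up to the next whole unit is an assumption of this sketch rather than something the slides specify.

```python
import math


def slovin_sample_size(population_size, margin_of_error):
    """Sample size n = N / (1 + N * e**2), rounded up to a whole unit."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)  # rounding up is a choice made for this sketch


# Barangay A example: N = 2,500 and e = 0.05
print(slovin_sample_size(2500, 0.05))  # 2500 / 7.25 = 344.83, so 345
```

Calling the same function with N = 3,000 and e = 0.05 gives 353, since 3,000 / 8.5 is about 352.94.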
Several sampling techniques can be applied in selecting the
sample so that the selection is not biased.
2.3.1 Random Sampling
 This method gives all members of the population an equal
chance of being included in the study.
 Applicable if the target population is not classified into
clusters, sections, levels, or classes.
 This method is easy to use, but not when the population is very
large, such as a thousand or more.
a. Lottery Method
 Most common and the easiest method of random
sampling.
 How to use the table of random numbers:
Nine individuals are assigned the following
numbers, and five are to be selected as members of the
experimental group.
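A software stand-in for drawing lots can be sketched with Python's standard `random` module. This replaces the paper slips or the table of random numbers with a pseudo-random draw; numbering the individuals 1 to 9 and the seed value are assumptions of this illustration.

```python
import random

# Nine individuals, assumed here to be numbered 1 to 9.
individuals = list(range(1, 10))

# A fixed seed is used only so that the draw can be repeated exactly.
rng = random.Random(2024)

# Draw five individuals without replacement; each has an equal chance.
experimental_group = rng.sample(individuals, 5)

print(sorted(experimental_group))
```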
2.3.2 Systematic Sampling
a. Stratified Random Sampling
 Applied when the population is divided into different strata or
classes, wherein each class must be represented in the study.
 Example:
Suppose a researcher wants to determine the average
income of the families in a barangay having 3,000 families,
distributed in five puroks. Compute the sample size n at
a 5% margin of error.
Given: N = 3,000 and e = 0.05
Solution:
n = 3,000 / (1 + 3,000 × 0.05²) = 3,000 / 8.5 ≈ 353
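Once the overall sample size is known, it is usually divided among the strata in proportion to their sizes. The sketch below shows this proportional allocation; the number of families in each purok is hypothetical (the slides do not give these figures), and the sample size is computed with the usual statement of Slovin's formula, n = N / (1 + N·e²), rounded up.

```python
import math

# Hypothetical number of families in each of the five puroks (total 3,000).
purok_families = {
    "Purok 1": 500,
    "Purok 2": 700,
    "Purok 3": 600,
    "Purok 4": 400,
    "Purok 5": 800,
}

total_families = sum(purok_families.values())
margin_of_error = 0.05

# Overall sample size from Slovin's formula, rounded up to a whole family.
sample_size = math.ceil(total_families / (1 + total_families * margin_of_error ** 2))

# Proportional allocation: each purok gets its share of the total sample.
allocation = {
    purok: round(sample_size * families / total_families)
    for purok, families in purok_families.items()
}

for purok, count in allocation.items():
    print(purok, count)
print("total sampled:", sum(allocation.values()))
```

With other stratum sizes the rounded shares may not add up exactly to the overall sample size, in which case one or two strata are adjusted by hand.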
b) Cluster Sampling
 When the geographical area where the study will be done is too
big and the target population is too large, the cluster sampling
technique may be appropriate.
 The selection of sample units is not by individual but by groups
called clusters.
 The area will be divided into clusters, then a desired number of
clusters will be selected at random.
 Example:
A doctor wants to make a nationwide study on the correlation
between smoking and death rate. He decides to focus on the 13
regions of the country, which can be considered as the clusters. If
three of the 13 clusters or regions are the desired sample units, the
names of the 13 clusters will be written on small pieces of paper,
then three will be picked at random using the lottery method. All
the residents of the selected three clusters will be included in the
study.
Examining Distribution
 Statistical analysis starts with data.
 Cases are the objects described by a set of data.
 Cases may be customers, companies, subjects in a study, or other objects.
 A label is a special variable used in some data sets to distinguish the
different cases.
 A variable is a characteristic of a case.
 Different cases can have different values for the variables.
 Example: Restaurant Discount Coupons
A website offers coupons that can be used to get discounts for various
items at local restaurants. Coupons for food are very popular. Figure 1 gives
information for seven restaurant coupons that were available for a recent
weekend. These are the cases. Data for each coupon are listed on a
different line, and the first column has the coupons numbered from 1 to 7.
The next columns give the type of restaurant, the name of the restaurant,
the item being discounted, the regular price, and the discount price.
 Some variables, like the type of restaurant, the name of the
restaurant, and the item simply place coupons into categories.
The regular price and discount price columns have numerical
values for which we can do arithmetic.
 It makes sense to give an average of the regular prices, but it
does not make sense to give an “average’’ type of restaurant.
 We can, however, do arithmetic to compare the regular prices
classified by type of restaurant
Apply your knowledge
1. How much is the discount worth?
2. Read the spreadsheet.
Give the regular price and the discount price for the
Smokey Grill ribs coupon.
3. What case do the data describe?
4. How many cases are there?
5. How many variables are there?
6. What are their definitions and units of measurement?
7. What purpose do the data have?
Exercise:
1. Compute for the margin of error to be used if 800
sample units are required from a population of 2,400.
2. A researcher plans to get 588 sample units from a
population N using a 4% margin of error. What is the
value of the population N?
1. Compute for the margin of error to be used if 800 sample units are
required from a population of 2,400.
Given: N = 2,400 and n = 800
Find: e
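Formula:
\[ e = \sqrt{\frac{N - n}{N n}} \]
Solution:
\[ e = \sqrt{\frac{2400 - 800}{(2400)(800)}} = \sqrt{\frac{1600}{1920000}} \approx 0.0289 \]
e ≈ 0.0289 or about 2.89% f.a.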
2. A researcher plans to get 588 sample units from a population N
using a 4% margin of error. What is the value of the population N?
Given: n =588 ; e = 4% or 0.04
Find: N
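Formula:
\[ N = \frac{n}{1 - n e^{2}} \]
Solution:
\[ N = \frac{588}{1 - 588(0.04)^{2}} = \frac{588}{0.0592} \approx 9932.4 \]
N = 9,932 f.a.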
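Both exercises rearrange Slovin's formula, n = N / (1 + N e^2), to solve for a different unknown. The sketch below shows the two rearrangements. The function names are illustrative, not from the text, and the rearranged forms are derived here by algebra rather than quoted from the slides.

```python
import math


def margin_of_error(population: int, sample_size: int) -> float:
    """Solve n = N / (1 + N e^2) for e: e = sqrt((N - n) / (N n))."""
    if not 0 < sample_size <= population:
        raise ValueError("sample_size must be between 1 and population")
    return math.sqrt((population - sample_size) / (population * sample_size))


def population_size(sample_size: int, margin: float) -> float:
    """Solve n = N / (1 + N e^2) for N: N = n / (1 - n e^2)."""
    denominator = 1 - sample_size * margin ** 2
    if denominator <= 0:
        raise ValueError("no finite population gives this sample size and margin")
    return sample_size / denominator


print(round(margin_of_error(2400, 800), 4))   # Exercise 1
print(round(population_size(588, 0.04)))      # Exercise 2
```

The second function raises an error when n e^2 is 1 or more, because no finite population can produce that combination of sample size and margin of error.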
2.3.3 Purposive Sampling
 The respondents of the study will be chosen based
on their knowledge of the information required by
the researcher.
 Example:
Suppose a researcher wants to make a historical
study about Town A. The target population will be the
senior citizens of the town, since they are the most reliable
persons who know the history of the town. If there are
2,000 senior citizens and a 3% margin of error is
allowed, the sample size will be 714. They will be
chosen using any of the methods discussed
previously.
2.3.4 Quota Sampling
 This technique is commonly used in opinion polls.
 Suppose a salesman is required to gather information on the
most common hair shampoo used by female Filipino clients. If
he wants 2,000 sample units and he needs to do the survey
within a short timetable, he can station himself at a public place,
such as a park or mall, then ask the females what shampoo
they usually use. After meeting the required number of sample
units, the salesman is through with his collection of data.
2.3.5 Convenience Sampling
 This technique is resorted to by researchers who need the
information in the fastest way possible.
 The telephone can be used to interview the respondents about
their opinions on a certain issue.
 This method may be fast but it is also biased because those who
have no telephones do not have a chance to be included in the
study.
 Another example is the case of a teacher who makes a research
which requires the inclusion of students as respondents.
Conveniently, the teacher may use his own students as
respondents.
Chapter 3 – Organization and Presentation of Data
3.1 Introduction
 Data gathered can be made more interesting and organized by
presenting them in the form of graphs and tables.
 Readers may not appreciate a statistical report on the current
population of different countries in the world if the report is just a list
of numbers from one paragraph to another.
 The data types are:
 Frequency distribution
 Correlated data and
 Time series.
 There is no need to construct the frequency distribution table if the
number of observations is less than 30.
 The data that are presented in a frequency distribution table are
called grouped data while those that are not are called ungrouped
data.
Distribution
 shows a pattern of variation of a variable
 displays how often each value occurs.
 In its simplest form, a distribution is just a list of the individual
measures that are taken on some particular variable.
 Take a close look at the texture of interrelationships among these
scores—in particular, how they spread out and how they cluster
together—and you are in effect examining their distribution. It is a
very simple concept, made even simpler by the fact that
distributions are very easy to visualize.
 Indeed, in most cases a single glimpse of a graphic representation
of a distribution will tell you more about it than several minutes of
staring at a bare list of numbers. One very simple form of graphic
representation is shown below in Figure 1.1.
 Suppose, for example, that 12 students in a statistics course have achieved the
following scores on their first exam, arranged in order from lowest to highest: 61, 69,
72, 76, 78, 83, 85, 85, 86, 88, 93, 97
Figure 1.1 Distribution of the Scores of 12 Students
on a Statistics Exam

 The horizontal axis lays out that portion of the scale of exam scores that includes all
12 of the listed values, and each student's individual score on the exam is
represented by a box placed at the appropriate point on the scale.
 Thus, the box for student 'a' is placed at 61 on the scale, the box representing
student 'b' falls at 69, and so forth.
 The type of graph shown in Figure 1.1 is useful when you are interested in
conveying detailed information about each and every measure in the distribution,
though with larger numbers of measures it can become quite cumbersome.
 Besides, for most practical statistical purposes your interest is not so much in the
individual identity of your measures as in the overall shape and texture of the
distribution that they compose.
3.2 Raw Data and Frequency Distribution
 A special table that may be constructed for any variable is the
Frequency Distribution Table (FDT).
 Whether the variable is quantitative or qualitative, an FDT may be
constructed for it.
 Thus we may have a quantitative FDT and qualitative FDT
 Constructing a qualitative FDT only requires identifying the categories
where each datum may be classified and counting how many of the
data belong to each category.
 Constructing a quantitative FDT, on the other hand, requires the
creation of classes where each datum may be classified as belonging to
one of these classes.
 Raw data are not yet organized in any meaningful manner.
 When data are quantitative, we can use the frequency distribution
and histogram.
 Frequency distribution is a table that divides the data values
into classes and shows the number of observed values that fall
into each class.
 By converting data to a frequency distribution, we gain
perspective that helps us see the forest instead of the individual
trees.
 A histogram is a more visual representation.
 It describes a frequency distribution by using a series of adjacent
rectangles, each of which has a length that is proportional to
the frequency of the observations within the range of values it
represents.
The Frequency Distribution
 A frequency distribution table is a device for organizing and presenting
data.
 When the set of data contains more than 30 cases, a frequency
distribution table may be constructed to make the task more manageable
and to save time in calculating different statistics.
 Key Terms:
 Class: each category of the frequency distribution.
 Frequency: the number of data values falling within each class.
 Class limits: the boundaries for each class.
 These determine which data values are assigned to that class.

 Class size: the width of each class.
 This is the difference between the lower limit of the class and the lower limit of the
next higher class.
 When a frequency distribution is to have equally wide classes, the approximate width of each class
is
\[ \text{approximate class width} = \frac{\text{largest value} - \text{smallest value}}{\text{number of classes}} \]
 Class mark: the midpoint of each class.
 This is midway between the upper and lower limits.

 Steps in constructing frequency distribution table:


1. Find the range R using the formula:
R = Highest score – Lowest score
2. Compute the number of class intervals or classes, k, by using the formula:
2^k ≥ N
where: k = number of class intervals or classes
N = population or total number of observations.
For N ≥ 30: use 2^k ≥ N and take the smallest whole number k that satisfies it.
For N < 30: k = 1 + 3.3 log N
3. Compute the class size, c, using the formula:
c = R / k
4. Construct the classes as follows. Each class is an interval of
values defined by its lower and upper class limits.
a. The lower limit (LL) of the lowest class is conventionally
taken to be the lowest value. The lower limits of the
succeeding classes are obtained by simply adding the class size c to the
lower limit of the preceding class.
b. The upper limit (UL) of the lowest class can then be easily
obtained by subtracting one unit of measure from the lower
limit of the next class. Using the lowest score as lower limit,
add (c – 1) to it to obtain the higher limit of desired class
interval.
5. The lower limit of the second interval may be obtained by
adding the class size to the lower limit of the first interval.
 Add (c – 1) to the result to obtain the higher limit of the
second interval.
6. Repeat step 5 to obtain the third interval, and so on, so forth.
7. When the k class intervals are completed, determine the
frequency for each class interval by counting the elements.
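The seven steps above can be sketched in Python for whole-number data. This is a minimal sketch under stated assumptions: the function names are illustrative, `log` is taken to be the base-10 logarithm, the class size is rounded up to a whole number, and classes keep being added until the highest value is covered. The last assumption can yield one class more than k when c = R/k divides exactly.

```python
import math
from typing import List, Tuple


def number_of_classes(n_obs: int) -> int:
    """Number of classes k, following the two rules in step 2."""
    if n_obs < 1:
        raise ValueError("n_obs must be positive")
    if n_obs >= 30:
        k = 1
        while 2 ** k < n_obs:  # smallest k with 2^k >= N
            k += 1
        return k
    # For N < 30: k = 1 + 3.3 log N; rounding to the nearest whole
    # number is an assumption, since the text does not specify it.
    return max(1, round(1 + 3.3 * math.log10(n_obs)))


def build_classes(data: List[int]) -> List[Tuple[int, int]]:
    """Return (lower limit, upper limit) pairs for whole-number data."""
    lowest, highest = min(data), max(data)
    value_range = highest - lowest            # step 1: R
    k = number_of_classes(len(data))          # step 2: k
    c = max(1, math.ceil(value_range / k))    # step 3: c = R / k, rounded up
    classes: List[Tuple[int, int]] = []
    lower = lowest
    # Steps 4 to 6: upper limit = lower limit + (c - 1),
    # next lower limit = previous lower limit + c.
    while not classes or classes[-1][1] < highest:
        classes.append((lower, lower + c - 1))
        lower += c
    return classes


def tally(data: List[int], classes: List[Tuple[int, int]]) -> List[int]:
    """Step 7: count the observations falling in each class."""
    return [sum(lo <= x <= hi for x in data) for lo, hi in classes]


# Hypothetical data: 36 scores with lowest 6 and highest 60.
scores = [6, 60] + list(range(10, 44))
classes = build_classes(scores)
print(classes)
print(tally(scores, classes))
```

With these hypothetical scores, R = 54, k = 6 and c = 9, so the first two classes are 6-14 and 15-23. A seventh class is needed to hold the score of 60.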
Solution:
Step:
1. R = 60 − 6 = 54
2. N = 36. Thus, use 2^k ≥ N:
2^6 = 64, hence 2^6 ≥ 36
k = 6
3. c = R/k = 54/6 = 9
4. Lowest score = 6, this becomes the lower limit of the first
interval.
Higher limit = 6 + (9 – 1) = 14
5. Lower limit of the second interval = 9 + 6 = 15
Higher limit of second interval = 15 + (9-1) = 23
so on, and so forth
Example 2:
Solution to Exercise 1
Solution:
1. R = 9 − 2 = 7
2. N = 77. Hence, use 2^k ≥ N:
2^7 = 128 ≥ 77, thus k = 7
3. c = R/k = 7/7 = 1
a. Class Mark: the midpoint of a class interval.
b. Class Boundaries: also known as exact limits; they can be
obtained by subtracting 0.5 from the lower limit of an
interval and adding 0.5 to the upper limit of the
interval.
3.3 Correlated Data
 A correlated set of data is a table where two or more
frequencies are shown for easy comparison.
 In table 3.7, if one wants to compare the number of cases
according to age, then the total frequency will be the sum of
the frequencies of the male and female patients per group.
3.4 Time Series Data
 This data shows the change of a variable over a period of time.
 Example: the annual population growth rate in a certain place.
3.5 Graph
 After the data has been collected and tabulated, the next step is to
sketch the graph to make the data more presentable, easier to
understand, and more appealing and pleasing to the reader.
3.5.1 Line Graph
 The line graph is capable of simultaneously showing values of two quantitative
variables (y or vertical axis, and x or horizontal axis).
 It consists of linear segments connecting points observed or measured for each
variable.
 When x represents time, the result is a time series view of the y variable.
Example 3:
3.5.2 Bar Graph
 Usually presented to compare data or to determine which class or interval
is common or appears frequently in the text.
 Rectangular figures or bars are used to show variations in the frequencies
of observation.
3.5.3 Pie Chart
 The pie chart is a circular display divided into sections based on either the
number of observations within or the relative values of the segments.
 It is useful when presenting the sizes of components that make up a certain
whole entity.
Example 4: How do you do online research?
A study of 552 first-year college students asked about their
preference for the online resources. One question asked them to
pick their favorite. Here are the results:

Note that the last value of the variable resource is “Other”, which
includes all other online resources that were given as selection
options. For data sets that have a large number of values for a
categorical variable, we often create a category such as this that
includes categories that have relatively small counts or percents.
Careful judgment is needed when doing this. You don’t want to cover
up some important piece of information contained in the data by
combining data in this way.
3.5.4 Frequency Histogram
 One of the kinds of graphs that can be applied to grouped
data.
 The frequency will be represented by points in the
3.5.6 Frequency Polygon
 Unlike in the frequency histogram where bars drawn by side are used, points
connected by line segment are utilized in the frequency polygon.
3.5.7 Cumulative Frequency Ogive
 Used in statistical reports.
 Determine the cumulative frequencies (CF):
a. “Less than” cumulative frequency (<CF) is the total number of
observations whose values do not exceed the upper limit of the class.
 For <CF (read as “less than cumulative frequency”), each entry in the <CF column is
obtained by accumulating the frequencies, starting from the frequency of the lowest
score interval up to the highest score interval.
b. “Greater than” cumulative frequency (>CF) is the total number of
observations whose values are not less than the lower limit of the class.
 For >CF (read as “greater than cumulative frequency”), accumulate the frequencies
starting from the highest score interval down to the lowest score interval.

Figure 3.7 < Cumulative Frequency (Ogive)


3.5.8 Relative Frequency Graph
 Known as percentage frequency.
 Compute the relative cumulative frequencies (RCF):
a. “Less than” relative cumulative frequency (<RCF)
\[ {<}RCF = \frac{{<}CF}{N} \times 100\% \]
b. “Greater than” relative cumulative frequency (>RCF)
\[ {>}RCF = \frac{{>}CF}{N} \times 100\% \]
3.5.9 Dot Plot
 One of the simplest graphical summaries of data.
 A horizontal axis shows the range for the data.
 Each data value is represented by a dot placed above the axis.
 Dot plots show the details of the data and are useful for comparing the
distribution of the data for two or more variables.
Seat Work 1:
Consider the quantitative data in Table 2.4.
These data show the time in days required to complete
year-end audits for a sample of 20 clients of Sanderson
and Clifford, a small public accounting firm. Use 5
classes, a class size of 5, and 12 as the lowest value. Construct
a frequency distribution table.
Seat work 2. Grocery Shoppers Survey – Analysis of Grocery Spend
Refer to Table 2.1. Use 400 as the lowest limit.
1. Compute the numeric frequency distribution and percentage
frequency distribution for the amount spent on groceries last month
by grocery shoppers.
2. Compute the cumulative frequency distribution and its graph, the
ogive, for the amount spent on groceries last month.
Management Questions
1. What percentage of shoppers spent less than R1 200 last month?
2. What percentage of shoppers spent R1 600 or more last month?
3. What percentage of shoppers spent between R800 and R1 600 last
month?
4. What was the maximum amount spent last month by the 20% of
shoppers who spent the least on groceries? Approximate your answer.
5. What is the approximate minimum amount spent on groceries last
month by the top-spending 50% of shoppers?
Table 2.1
Answer to Seat work 1:
Given: k = 5 , c = 5 and lowest value is 12:
Answer to Seat work 2:
1. The numeric frequency distribution for amount spent is
computed using the construction steps outlined earlier.
R = 2,136 − 456 = 1,680
2^k ≥ N, since N = 30:
2^5 = 32 ≥ 30
Thus, k = 5
c = R/k = 1680/5 = 336
2.The cumulative frequency distribution (ogive) for amount spent on
groceries last month is computed using the construction guidelines
outlined above for the ogive.
Based on the numeric frequency distribution in Table 3.15, an additional
interval (0 – < 400) is included. The cumulative frequency count for this
interval is zero, since no shopper spent less than 400 on groceries last
month. Referring to the upper limits for each successive interval above
400, the following cumulative counts are derived:
7 shoppers spent up to 800
21 (= 7 + 14) shoppers spent up to 1200
26 (= 21 + 5) shoppers spent up to 1600
29 (= 26 + 3) shoppers spent up to 2000
all 30 shoppers (= 29 + 1) spent no more than 2400 on groceries
last month.
The ogives for both the frequency counts and percentages are shown in
Table 3.15.
 Figure 3.9 shows the percentage ogive graph. Note that the %
cumulative frequency is 0% at 400 (the upper limit of the extra
interval) and 100% at the upper limit of 2400 for the last interval.
 This means that no shopper spent less than 400 or more than 2400
last month on groceries.
Management Interpretation

1. 70% of shoppers spent less than 1200 on groceries last month.
2. 13.3% (100% − 86.7%) of shoppers spent R1 600 or more on
groceries last month.
3. 63.4% (86.7% − 23.3% or 46.7% + 16.7%) of shoppers spent
between 800 and 1600 on groceries last month.
4. The bottom 20% of shoppers spent no more than 770
(approximately) on groceries last month. (Using the
percentage cumulative frequency polygon, this answer is
found by projecting 20% from the y-axis to the polygon graph
and reading off the amount spent on the x-axis.)
5. From the y-axis value at 50%, the minimum amount spent on
groceries by the top-spending 50% of shoppers is (approximately)
1 000.

Note: The ogive is a less than cumulative frequency graph, but it can
also be used to answer questions of a more than nature (by
subtracting the less than cumulative percentage from 100%, or the
cumulative count from n, the sample size).
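The ogive readings can be approximated numerically. The sketch below uses the interval limits and frequency counts stated in the answer (7, 14, 5, 3 and 1 shoppers, plus the empty extra interval). The function names are illustrative, and straight-line interpolation between plotted points is an assumption about how the graph is read.

```python
from typing import List

# Upper limits and counts as stated in the worked answer.
upper_limits = [400, 800, 1200, 1600, 2000, 2400]
frequencies = [0, 7, 14, 5, 3, 1]  # the extra interval (0 to < 400) has count 0


def cumulative_percentages(freqs: List[int]) -> List[float]:
    """Less-than cumulative percentages, one per upper limit."""
    total = sum(freqs)
    running, result = 0, []
    for f in freqs:
        running += f
        result.append(100 * running / total)
    return result


def amount_at_percentage(target: float, limits: List[float],
                         cum_pct: List[float]) -> float:
    """Read the ogive by interpolating linearly between plotted points."""
    for i in range(1, len(limits)):
        low, high = cum_pct[i - 1], cum_pct[i]
        if high > low and low <= target <= high:
            share = (target - low) / (high - low)
            return limits[i - 1] + share * (limits[i] - limits[i - 1])
    raise ValueError("target percentage is outside the ogive")


cum_pct = cumulative_percentages(frequencies)
print([round(p, 1) for p in cum_pct])
print(round(100 - cum_pct[3], 1))  # share spending 1600 or more
print(round(amount_at_percentage(20, upper_limits, cum_pct)))
print(round(amount_at_percentage(50, upper_limits, cum_pct)))
```

Interpolation gives roughly 743 for the bottom 20% and roughly 1029 for the 50% mark. Both are in the same neighbourhood as the values read off the graph by eye.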
3.5.10 The Scatter Diagram
 A scatter diagram is a graphical presentation of the
relationship between two quantitative variables, and a
trendline is a line that provides an approximation of the
relationship.
 The diagram represents pairs of known or observed
values of two variables, generally referred to as x and y.
 The two variables are referred to as the dependent (y)
and independent (x) variables.
 The typical purpose for this type of analysis is to
estimate or predict what y will be for a given value of x.
3.5.11 Stem-and-Leaf Diagram
 A stem-and-leaf diagram presents a graphical
display of the ungrouped data.
 It is also called a stemplot.
 The data are arranged by its stems and leaves.
 The leading digits are called stems, the final digits are the leaves.
 This form is best for small number of observations with values
greater than 0.
2. For each datum, identify its leaf (the units
digit) and its stem (all other digits except the
last or units digit).
Example: 24 → 4 is the leaf, 2 is the stem
79 → 9 is the leaf, 7 is the stem
3. List the stems vertically in increasing order
from top to bottom.
4. Draw a vertical line to the right of the
stems.
5. List the leaves of the corresponding stem to
the right of the line in increasing order.
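The steps for building the diagram can be sketched in Python. This is a minimal sketch for non-negative whole numbers. The function names are illustrative, and apart from 24 and 79 the sample values are made up for demonstration.

```python
from collections import defaultdict
from typing import Dict, List


def stem_and_leaf(values: List[int]) -> Dict[int, List[int]]:
    """Group non-negative whole numbers by stem (all digits but the last)."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for value in values:
        if value < 0:
            raise ValueError("this sketch handles non-negative values only")
        stem, leaf = divmod(value, 10)  # leaf = units digit
        groups[stem].append(leaf)
    # Stems in increasing order, leaves in increasing order within a stem.
    return {stem: sorted(groups[stem]) for stem in sorted(groups)}


def render(plot: Dict[int, List[int]]) -> str:
    """Draw each stem, a vertical bar, then its leaves."""
    return "\n".join(
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in plot.items()
    )


# 24 and 79 come from the example; the other values are hypothetical.
sample = [24, 79, 21, 35, 72, 38, 30]
print(render(stem_and_leaf(sample)))
```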
Chapter 4: Descriptive Measures
For large data (N ≥ 30), use 2^k ≥ N
where k = the number of class intervals
N = total number of observations
Descriptive Measures:
1. Central tendency
2. Dispersion
3. Location
4. Skewness
5. Kurtosis
Central Tendency:
 A measure of central tendency or location describes the “center” of a
given set of data.
 A value within the range of data set which describes its location or
position relative to the entire set of data.
 It is referred to as either measure of central tendency or measure of other
position or location.
 It is a single value about which the observations tend to cluster.
 The common measures of location are :
1. Arithmetic Mean
2. Median
3. Mode
Ungrouped Data
4.1 Mean
 Arithmetic mean is defined as the sum of the data values
divided by the number of observations.
 It is one of the most common measures of central tendency.
 Also referred to as arithmetic average or simply mean.
 Expressed as μ (the population mean, pronounced
“myew”) or x̄ (the sample mean, “x bar”).
4.1.1 Population Mean
\[ \mu = \frac{\sum x_i}{N} \]
where μ = population mean
x_i = the ith data value in the population
Σ = the sum of
N = number of data values in the population
Example:
Suppose that Chris is a college junior majoring in business. This semester
Chris is taking five classes and the numbers of students enrolled in the
classes (that is, the class sizes) are as follows:
4.1.2 Sample Mean
\[ \bar{x} = \frac{\sum x_i}{n} \]
where x̄ = sample mean
x_i = the ith data value in the sample
Σ = the sum of
n = number of data values in the sample
Example: Sample Mean
Solution:
1. Of course, intuitively, we are likely to obtain a more accurate point
estimate of the population mean by using all of the available sample
information. The sum of all 50 mileages can be verified to be
Therefore, the mean of the sample of 50 mileages is
4.1.3 Weighted Mean
 Sometimes referred to as a weighted average.
 Each data value is weighted according to its relative importance.
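 The formula for the weighted mean for a population or a sample:
\[ \mu_w \text{ or } \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} \]
where w_i = the weight assigned to the ith data value x_i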
Example: Weighted Mean
Consider the following data for shipments of peanuts from a hypothetical
U.S. exporter to five Canadian cities.
a. Calculate the population arithmetic mean.
b. Calculate the weighted mean.
Solution:
a.

b.

= $14.04 per thousand bags


4.2 Median
 The Median is the value that has just as many values above it as
below it.
 Example 1 : The number of bags of peanuts (in thousands)
shipped by the U.S. exporters to the five Canadian cities were:
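 Example 2: Ryder System, Inc. reported the following data for
percentage return on average assets over an 8-year period.
Raw Data 2.8 7.0 1.6 0.4 1.9 2.6 3.8 3.8
Solution:
Arrange the data in increasing order:
0.4 1.6 1.9 2.6 2.8 3.8 3.8 7.0
Since there are 8 values, the median is the average of the 4th and 5th values:
Median = (2.6 + 2.8)/2 = 2.7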
4.3 The Mode
 The Mode is the value that occurs with the greatest frequency.
 Example: Ryder System, Inc. reported the following data for
percentage return on average assets over an 8-year period.
Raw Data 2.8 7.0 1.6 0.4 1.9 2.6 3.8 3.8
Solution:
Arranged in increasing order:
0.4 1.6 1.9 2.6 2.8 3.8 3.8 7.0

Mode = 3.8
Example: Consider the score listed below.
Table 1:

The use of array serves as a very effective tool in facilitating the construction of
an FDT. One thing which might be done is to rewrite these 120 post test scores in
order of magnitude from highest to lowest or, if preferred, from lowest to highest.
Table 1a:

Construct the Frequency Distribution Table


1. R = 48 – 15 = 33
2. Since 2^7 = 128 ≥ 120, then k = 7
3. c = R/k = 33/7 ≈ 5
4. Write down the classes in the first column and tally on the second
column.
5. The third column is the frequency.
6. Obtain the relative frequency column.
RF of the lowest class:
\[ RF = \frac{7}{120} \times 100\% = 5.8\% \]
7. Compute the class mark, X_m.
\[ \text{Class mark of the lowest class} = \frac{LL + UL}{2} = \frac{15 + 19}{2} = 17 \]
Obtain the succeeding class marks by adding c to the preceding
class mark.
8. Compute the Class Boundary
Lower Class boundary = LL – ½ = 15 – 0.5 = 14.5
Upper Class boundary = UL + ½ = 19 + 0.5 = 19.5
Note: Obtain the succeeding class boundary adding c to the preceding
class boundary.
9. Compute the cumulative frequencies (CF)
Example:
for “Less than” cumulative frequency (<CF)
<CF of the lowest class = number of observations
less than or equal to the upper limit of the
lowest class, which is 19.
<CF = 7
<CF of the second class = number of observations less
than or equal to 24.
<CF = 19
for “Greater than” cumulative frequency (>CF)
>CF of the lowest class = number of observations
greater than or equal to the lower limit of the
lowest class, which is 15.
>CF = 120
>CF of the second class = number of observations
greater than or equal to 20.
>CF = 113
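10. Compute the relative cumulative frequencies (RCF)
for “Less than” relative cumulative frequency (<RCF)
<RCF of the lowest class = (7/120) × 100% = 5.8%
for “Greater than” relative cumulative frequency (>RCF)
>RCF of the lowest class = (120/120) × 100% = 100%
>RCF of the second class = (113/120) × 100% = 94.2%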
Frequency Distribution Table

Class    f   RF (%)   CM   Class Boundary   <CF   >CF   <RCF (%)   >RCF (%)
15-19    7     5.8    17     14.5-19.5        7   120      5.8      100
20-24   12    10.0    22     19.5-24.5       19   113     15.8       94.2
25-29   21    17.5    27     24.5-29.5       40   101     33.3       84.2
30-34   35    29.2    32     29.5-34.5       75    80     62.5       66.7
35-39   22    18.3    37     34.5-39.5       97    45     80.8       37.5
40-44   14    11.7    42     39.5-44.5      111    23     92.5       19.2
45-49    9     7.5    47     44.5-49.5      120     9    100          7.5
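The derived columns of the table can be reproduced from the class limits and frequencies alone. The sketch below is illustrative: the function name and dictionary keys are not from the text, and the frequency of the 30-34 class is taken as 35 so that the frequencies sum to N = 120.

```python
from typing import Dict, List

lower_limits = [15, 20, 25, 30, 35, 40, 45]
upper_limits = [19, 24, 29, 34, 39, 44, 49]
frequencies = [7, 12, 21, 35, 22, 14, 9]


def build_fdt(lower: List[int], upper: List[int], freqs: List[int]) -> List[Dict]:
    """Compute RF, class mark, boundaries, CF and RCF for each class."""
    total = sum(freqs)
    rows = []
    less_than = 0          # running <CF
    greater_than = total   # running >CF
    for ll, ul, f in zip(lower, upper, freqs):
        less_than += f
        rows.append({
            "class": f"{ll}-{ul}",
            "f": f,
            "RF": round(100 * f / total, 1),
            "CM": (ll + ul) / 2,
            "boundary": (ll - 0.5, ul + 0.5),
            "<CF": less_than,
            ">CF": greater_than,
            "<RCF": round(100 * less_than / total, 1),
            ">RCF": round(100 * greater_than / total, 1),
        })
        greater_than -= f
    return rows


rows = build_fdt(lower_limits, upper_limits, frequencies)
for row in rows:
    print(row)
```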
a. Arithmetic Mean (or simply Mean) – the sum of the
observations divided by the total number of observations,
denoted by µ (for the population mean).
1. For ungrouped data:
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]
where
x_i = the ith observation
N = the total number of observations
Example: Consider the score listed below.

Ungrouped Data computation:
\[ \mu = \frac{19 + 24 + 30 + \cdots + 26}{120} = \frac{3958}{120} \]
μ ≈ 33
For Grouped Mean:
\[ \bar{X} = \frac{\sum_{i=1}^{k} f_i X_m}{\sum_{i=1}^{k} f_i} \]
where:
X_m = class mark
f_i = ith class frequency
k = number of classes
Σ f_i = total number of observations (N)
X̄ = 32.46
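The grouped mean can be checked with a few lines of Python, using the class marks and frequencies of the frequency distribution table for the 120 scores. The function name is illustrative.

```python
from typing import List

class_marks = [17, 22, 27, 32, 37, 42, 47]
frequencies = [7, 12, 21, 35, 22, 14, 9]


def grouped_mean(marks: List[float], freqs: List[int]) -> float:
    """Sum of f_i * X_m divided by the sum of f_i."""
    total = sum(freqs)
    if total == 0:
        raise ValueError("frequencies must not sum to zero")
    return sum(f * x for f, x in zip(freqs, marks)) / total


print(round(grouped_mean(class_marks, frequencies), 2))
```

The weighted sum is 3895, so the grouped mean is 3895/120, which rounds to 32.46.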
b. Median is the middle value of an array, denoted by
Md (for population median)

Using table 1: Arrange the data in increasing order.


Table 1a:
Table 1b: Arrange data of table 1a
For Ungrouped Median:
Since the total number of observations is even (N = 120), then
\[ Md = \frac{y_{N/2} + y_{N/2 + 1}}{2} = \frac{y_{120/2} + y_{120/2 + 1}}{2} = \frac{y_{60} + y_{61}}{2} = \frac{33 + 33}{2} = 33 \]
For Grouped data:
\[ Md_G = LCB_{Md} + \left( \frac{\frac{N}{2} - F_b}{f_{Md}} \right) c \]
where:
LCB_Md = lower class boundary of the median class
c = class size
F_b = <CF immediately before the median class
f_Md = frequency of the median class
Note:
The median class is the class that contains the (N/2)th value of the arranged data set.
 Consider the FDT, where F_b = 40:
<CF column: 7, 19, 40, 75 (median class), 97, 111, 120
N/2 = 120/2 = 60th observation
Lower class boundary = 30 − 0.5 = 29.5
Thus, the grouped median is
\[ Md_G = LCB_{Md} + \left( \frac{\frac{N}{2} - F_b}{f_{Md}} \right) c = 29.5 + \left( \frac{\frac{120}{2} - 40}{35} \right)(5) = 29.5 + (0.5714)(5) = 29.5 + 2.86 \]
Md_G ≈ 32.36
c. Mode is the observation that appears most frequently in the data set,
denoted by Mo.
In the example above, the mode is
Mo = 33 (the value that occurs with the greatest frequency)
For Grouped Data:
\[ Mo_G = LCB_{Mo} + \left( \frac{f_{Mo} - f_b}{2 f_{Mo} - f_b - f_a} \right) c \]
where:
LCB_Mo = lower class boundary of the modal class
c = class size
f_Mo = frequency of the modal class
f_b = frequency immediately before the modal class
f_a = frequency immediately after the modal class
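Using the FDT from the previous example:
The modal class is the class with the highest frequency.
\[ Mo_G = LCB_{Mo} + \left( \frac{f_{Mo} - f_b}{2 f_{Mo} - f_b - f_a} \right) c = 29.5 + \left( \frac{35 - 21}{2(35) - 21 - 22} \right)(5) = 29.5 + \left( \frac{14}{27} \right)(5) \]
Mo_G = 32.09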
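The grouped median and grouped mode formulas can be sketched in Python for the same frequency distribution. This sketch makes three assumptions: classes have whole-number limits, so the lower class boundary is the lower limit minus 0.5; the median class is the first class whose <CF reaches N/2; and a missing neighbouring class counts as frequency 0. The function names are illustrative.

```python
from typing import List

lower_limits = [15, 20, 25, 30, 35, 40, 45]
frequencies = [7, 12, 21, 35, 22, 14, 9]
class_size = 5


def grouped_median(lower: List[int], freqs: List[int], c: int) -> float:
    """LCB + ((N/2 - F_b) / f_Md) * c for the median class."""
    half = sum(freqs) / 2
    cum_before = 0  # F_b: <CF of the classes before the current one
    for ll, f in zip(lower, freqs):
        if f > 0 and cum_before + f >= half:
            return (ll - 0.5) + (half - cum_before) / f * c
        cum_before += f
    raise ValueError("empty frequency table")


def grouped_mode(lower: List[int], freqs: List[int], c: int) -> float:
    """LCB + ((f_Mo - f_b) / (2 f_Mo - f_b - f_a)) * c for the modal class."""
    m = max(range(len(freqs)), key=freqs.__getitem__)  # modal class index
    f_mo = freqs[m]
    f_b = freqs[m - 1] if m > 0 else 0
    f_a = freqs[m + 1] if m + 1 < len(freqs) else 0
    denominator = 2 * f_mo - f_b - f_a
    if denominator == 0:
        raise ValueError("modal class is not distinct from its neighbours")
    return (lower[m] - 0.5) + (f_mo - f_b) / denominator * c


print(round(grouped_median(lower_limits, frequencies, class_size), 2))
print(round(grouped_mode(lower_limits, frequencies, class_size), 2))
```

Carrying the fraction 20/35 without rounding gives a grouped median of about 32.36. Hand calculations that round the fraction first can differ in the last digit.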
Graphical Presentation of the given data
Histogram It is a bar diagram where the bars are adjacent, and the
base extends from the lower true class boundary to upper true class
boundary. The height of the bar represents the number of cases within the
interval. The class boundaries are marked off along the horizontal axis and
the scale of frequency is shown on the vertical axis.
A curve drawn over the figure approximates the trend (Symmetric or normal
or bell-shaped) of the distribution of the data.
Measures of Dispersion
 A quantitative measure that describes the extent to
which the data are dispersed is generally known as a
measure of dispersion.
 It is a single value that describes how widely dispersed or
spread out the data are.
a. Variance
 Variance is the mean of the squared differences of the
observations from their mean and is denoted by s².
b. Standard Deviation
 Standard Deviation is the positive square root of the
variance, denoted by s.
 For Ungrouped Data:
Ungrouped Variance: s²
\[ s^{2} = \frac{\sum_{i=1}^{N} (x_i - \mu)^{2}}{N} = \frac{\sum_{i=1}^{N} x_i^{2}}{N} - \mu^{2} \]
where x_i = ith observation
N = total number of observations
µ = mean of the ungrouped data
Standard Deviation: s
\[ s = \sqrt{s^{2}} \]
Example:
Consider the previous example; compute the ungrouped
variance and standard deviation. Using the table:
Solution:
a. Ungrouped variance:
\[ s^{2} = \frac{\sum_{i=1}^{N} x_i^{2}}{N} - \mu^{2} = \frac{15^{2} + 16^{2} + 17^{2} + \cdots + 48^{2}}{120} - 33^{2} = \frac{137782}{120} - 1089 = 1148.18 - 1089 \]
s² = 59.18 f.a.
b. Ungrouped standard deviation:
\[ s = \sqrt{s^{2}} = \sqrt{59.18} \]
s = 7.69 ≈ 7.7 f.a.
For Grouped Data:
a. Variance: s²
\[ s^{2} = \frac{\sum_{i=1}^{k} f_i (X_m - \bar{X})^{2}}{N} = \frac{\sum_{i=1}^{k} f_i X_m^{2}}{\sum_{i=1}^{k} f_i} - \bar{X}^{2} \]
where f_i = ith frequency
X_m = class mark
X̄ = mean of the grouped data
Σ f_i = total number of observations (N)
b. Standard Deviation: s
\[ s = \sqrt{s^{2}} \]
Example:
Consider the following table to compute the variance and standard
deviation.

Class    f_i   X_m   f_i X_m²
15-19      7    17      2023
20-24     12    22      5808
25-29     21    27     15309
30-34     35    32     35840
35-39     22    37     30118
40-44     14    42     24696
45-49      9    47     19881
N = 120               Σ f_i X_m² = 133675

 Thus, Variance: s² = 133675/120 − (32.46)² = 1113.96 − 1053.65 ≈ 60.3 f.a.
 And Standard Deviation: s = √60.3 ≈ 7.8 f.a.
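The grouped variance and standard deviation can be checked numerically with the shortcut form of the formula. The function name is illustrative. The result is sensitive to how the grouped mean is rounded before squaring, so hand-computed values may differ from the unrounded figure of about 60.4.

```python
import math
from typing import List, Tuple

class_marks = [17, 22, 27, 32, 37, 42, 47]
frequencies = [7, 12, 21, 35, 22, 14, 9]


def grouped_variance_sd(marks: List[float], freqs: List[int]) -> Tuple[float, float]:
    """Population variance (sum f*X_m^2 / N - mean^2) and its square root."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequencies must not sum to zero")
    mean = sum(f * x for f, x in zip(freqs, marks)) / n
    mean_of_squares = sum(f * x ** 2 for f, x in zip(freqs, marks)) / n
    variance = mean_of_squares - mean ** 2
    return variance, math.sqrt(variance)


variance, sd = grouped_variance_sd(class_marks, frequencies)
print(round(variance, 2), round(sd, 2))
```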


 A value within the range of set of data which
describes its location or position relative to
the entire set of data.
 Percentile
 Decile
 Quartile
1. Percentiles are numerical quantities which divide the array of data
into 100 equal parts. The jth percentile, P_j, is the number that
separates the bottom j% of the data from the top (100 – j)%.
 Percentiles are best applied to data sets that are sufficiently large
(N ≥ 100).
 Finding the jth Percentile (P_j):
1) Rank the observations in increasing order of magnitude.
2) Evaluate N × (j/100), where N is the total number of observations.
3) If N × (j/100) is a whole number, take P_j as the average of the
values located in the [N × (j/100)]th position and the [N × (j/100) + 1]th position.
Otherwise, take P_j as the value in the position given by the next higher
whole number to N × (j/100) in the array.
4) That is, if N × (j/100) is fractional, take the next higher position.
Example 1:
Consider the scores of 120 BSCS students in the
Probability and Statistics post test, find the 20𝑡ℎ percentile
(𝑃20 ).
Solution:
First, arrange the observations in increasing order, then
evaluate N × (j/100):
\[ N \times \frac{j}{100} = 120 \times \frac{20}{100} = 24 \]
Since N × (j/100) = 24 is a whole number, we take P_20 as the
average of the values located in the 24th position and the 25th
position.
Therefore, P_20 = (26 + 26)/2 = 26
Example 2:
Consider the scores of 120 BSCS students in the
Probability and Statistics post test, find the 13th percentile
(P_13).
Solution:
\[ N \times \frac{j}{100} = 120 \times \frac{13}{100} = 15.6 \]
Since 15.6 is fractional, take the next higher
position, which is the 16th position.
Thus, P_13 is located in the 16th position.
Therefore,
P_13 = 24
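The position rule for ungrouped percentiles can be sketched as follows. The function name is illustrative, and the 20 scores are hypothetical stand-ins for a data set. Deciles and quartiles follow by calling the function with j = 10i or j = 25k.

```python
from typing import List


def percentile(data: List[float], j: int) -> float:
    """jth percentile (0 < j < 100) using the position rule N * j / 100."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < j < 100:
        raise ValueError("j must be between 1 and 99")
    ordered = sorted(data)           # step 1: rank in increasing order
    product = len(ordered) * j       # step 2: N * j, kept as a whole number
    if product % 100 == 0:
        position = product // 100    # whole-number position (1-based)
        # step 3: average the values at this position and the next one
        return (ordered[position - 1] + ordered[position]) / 2
    position = product // 100 + 1    # step 4: next higher position
    return ordered[position - 1]


# Hypothetical data: the scores 1 to 20 (N = 20).
scores = list(range(1, 21))
print(percentile(scores, 20))  # position 4 is whole: average of 4th and 5th
print(percentile(scores, 13))  # position 2.6 is fractional: take the 3rd
print(percentile(scores, 75))  # third quartile via the 75th percentile
```

Whole-number arithmetic (`N * j` checked against 100) avoids floating-point surprises when testing whether the position is whole.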
2. Deciles are numerical quantities which divide the array
of data into ten (10) equal parts.
 Finding the ith Decile (D_i):
 The 1st decile is the 10th percentile, the 2nd decile is
the 20th percentile, …, the 5th decile is the 50th
percentile (also the median), … and the 10th decile is
the 100th percentile.
 Example:
Consider the scores of 120 BSCS students in the
Probability and Statistics post test. Find the 6th
Decile (D_6).
Take note that D_6 = P_60, thus N × (j/100)
will be evaluated:
\[ N \times \frac{j}{100} = 120 \times \frac{60}{100} = 72 \]
Since N × (j/100) = 72 is a whole number, we take
P_60 as the average of the values located in the 72nd position and the 73rd position.
P_60 = (33 + 33)/2 = 33
D_6 = 33

3. Quartiles are numerical quantities which divide the array of data into four (4) equal parts.
Finding the kth Quartile (Q_k):
The 1st quartile is the 25th percentile, the 2nd quartile is the 50th percentile (also the 5th decile and the
median), the 3rd quartile is the 75th percentile, and the 4th quartile is the 100th percentile.
Example:
Find the 3rd Quartile (Q_3).
Solution:
Take note that Q_3 = P_75, thus N × (j/100) will be evaluated.
\[ N \times \frac{j}{100} = 120 \times \frac{75}{100} = 90 \]
Since N × (j/100) = 90 is a whole number, we take P_75 as
the average of the values located in the 90th position and
the 91st position. Therefore,
P_75 = (39 + 39)/2 = 39
Q_3 = 39
Grouped Data
1. Percentile
$$P_j = LCB_{P_j} + \left( \frac{\frac{jN}{100} - {<}CF_b}{f_{P_j}} \right) c$$
where: $P_j$ = $j$th percentile
$LCB_{P_j}$ = lower class boundary of the $j$th percentile class
${<}CF_b$ = less-than cumulative frequency before the $j$th percentile class
$f_{P_j}$ = frequency of the $j$th percentile class
$c$ = class size
$N$ = total number of observations
Location: the $\frac{jN}{100}$th position
2. Decile
$$D_i = LCB_{D_i} + \left( \frac{\frac{iN}{10} - {<}CF_b}{f_{D_i}} \right) c$$
where: $D_i$ = $i$th decile
$LCB_{D_i}$ = lower class boundary of the $i$th decile class
${<}CF_b$ = less-than cumulative frequency before the $i$th decile class
$f_{D_i}$ = frequency of the $i$th decile class
Location: the $\frac{iN}{10}$th position

3. Quartile
$$Q_k = LCB_{Q_k} + \left( \frac{\frac{kN}{4} - {<}CF_b}{f_{Q_k}} \right) c$$
where: $Q_k$ = $k$th quartile
$LCB_{Q_k}$ = lower class boundary of the $k$th quartile class
${<}CF_b$ = less-than cumulative frequency before the $k$th quartile class
$f_{Q_k}$ = frequency of the $k$th quartile class
Location: the $\frac{kN}{4}$th position
Example:
Consider the scores of 120 BSCS students in the Probability and Statistics post test. Find the following for the grouped data:
a) 20th percentile ($P_{20}$)
b) 6th decile ($D_6$)
c) 3rd quartile ($Q_3$)
Solution:
a) $P_{20}$
Location: $\frac{jN}{100} = \frac{20(120)}{100} = 24$, the 24th observation.
The class containing the 24th observation is the assumed $P_{20}$ class, with $LCB_{P_{20}} = 24.5$, ${<}CF_b = 19$, $f_{P_{20}} = 21$, and $c = 5$.
$$P_j = LCB_{P_j} + \left( \frac{\frac{jN}{100} - {<}CF_b}{f_{P_j}} \right) c$$
$$P_{20} = 24.5 + \left( \frac{24 - 19}{21} \right)(5) = 24.5 + 0.2381(5) = 24.5 + 1.19$$
$P_{20} = 25.69$
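b) $D_6$
Location: $\frac{iN}{10} = \frac{6(120)}{10} = 72$, the 72nd observation.
The class containing the 72nd observation is the assumed 6th decile class, with $LCB_{D_6} = 29.5$, ${<}CF_b = 40$, $f_{D_6} = 35$, and $c = 5$.
$$D_i = LCB_{D_i} + \left( \frac{\frac{iN}{10} - {<}CF_b}{f_{D_i}} \right) c$$
$$D_6 = 29.5 + \left( \frac{72 - 40}{35} \right)(5) = 29.5 + 4.57$$
$D_6 = 34.07$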
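c) 3rd Quartile ($Q_3$)
Location: $\frac{kN}{4} = \frac{3(120)}{4} = 90$, the 90th observation.
The class containing the 90th observation is the assumed 3rd quartile class, with $LCB_{Q_3} = 34.5$, ${<}CF_b = 75$, $f_{Q_3} = 22$, and $c = 5$.
$$Q_k = LCB_{Q_k} + \left( \frac{\frac{kN}{4} - {<}CF_b}{f_{Q_k}} \right) c$$
$$Q_3 = 34.5 + \left( \frac{90 - 75}{22} \right)(5) = 34.5 + 3.41$$
$Q_3 = 37.91$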
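The three grouped-data formulas share one interpolation step, so a single helper can reproduce all three worked answers. This is a minimal sketch: the function name `grouped_quantile` and its parameter names are hypothetical, and the class values (lower class boundary, cumulative frequency before the class, class frequency, class size) are taken from the computations in the example rather than from the frequency table, which is not reproduced here.

```python
def grouped_quantile(position, lcb, cf_before, freq, width):
    """Interpolate a quantile inside its class.

    position  -- location of the quantile (jN/100, iN/10, or kN/4)
    lcb       -- lower class boundary of the quantile class
    cf_before -- less-than cumulative frequency before the quantile class
    freq      -- frequency of the quantile class
    width     -- class size c
    """
    if freq <= 0:
        raise ValueError("class frequency must be positive")
    return lcb + (position - cf_before) / freq * width


N = 120
p20 = grouped_quantile(20 * N / 100, lcb=24.5, cf_before=19, freq=21, width=5)
d6 = grouped_quantile(6 * N / 10, lcb=29.5, cf_before=40, freq=35, width=5)
q3 = grouped_quantile(3 * N / 4, lcb=34.5, cf_before=75, freq=22, width=5)
print(round(p20, 2), round(d6, 2), round(q3, 2))  # 25.69 34.07 37.91
```

Passing the location as a separate argument keeps the helper independent of whether the quantile is a percentile, decile, or quartile.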
Chebyshev’s Theorem
• Chebyshev (1821–1894) was a Russian mathematician who worked primarily on the theory of prime numbers, among a wide range of other subjects.
• One of those subjects was probability, and his theorem applies to any data set, not only normally distributed data sets.
• His theorem states that for any set of data (population or sample) and any constant k > 1, the proportion of the data that must lie within k standard deviations on either side of the mean is at least
$$1 - \frac{1}{k^2}$$
• For k = 2, $1 - \frac{1}{k^2} = 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4}$, so at least 75% of the data must always be within two (2) standard deviations of the mean.
• For k = 3, $1 - \frac{1}{k^2} = 1 - \frac{1}{3^2} = 1 - \frac{1}{9} = \frac{8}{9}$, so at least 88.9% of the data must always be within three (3) standard deviations of the mean.
• Chebyshev’s rule: for any data set with mean µ and standard deviation σ,
1. At least 75% of the observations are within 2σ of the mean µ.
2. At least 88.9% of the observations are within 3σ of the mean µ.
Empirical Rule
• For a normally distributed data set with mean (µ) and standard deviation (σ), the empirical rule states that:
1. About 68% of the observations are within 1σ of the mean.
2. About 95% of the observations are within 2σ of the mean.
3. About 99.7% of the observations are within 3σ of the mean.
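Chebyshev's bound $1 - \frac{1}{k^2}$ can be checked numerically on data that are far from normal. The sketch below is illustrative only: the function names and the small right-skewed data set are hypothetical, and the population standard deviation (`statistics.pstdev`) is assumed.

```python
import statistics


def chebyshev_bound(k):
    """Minimum proportion of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


def proportion_within(data, k):
    """Observed proportion of values within k population standard deviations of the mean."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)
    inside = sum(1 for x in data if abs(x - mu) <= k * sigma)
    return inside / len(data)


# Hypothetical right-skewed data with one extreme value.
skewed = [1, 1, 2, 2, 2, 3, 3, 4, 5, 20]
for k in (2, 3):
    # The observed proportion should never fall below the Chebyshev bound.
    print(k, round(chebyshev_bound(k), 3), proportion_within(skewed, k))
```

The bound is a guaranteed minimum, so the observed proportions are typically larger than 75% and 88.9%, even for skewed data.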
Measure of Skewness
• A measure of skewness describes the extent of deviation of the data distribution from symmetry.
• It is measured by the coefficient of skewness, denoted by SK and defined as:
$$SK = \frac{3(\text{Mean} - \text{Median})}{\sigma}$$
• If the distribution of the data is symmetric, SK is zero.
• If the distribution is not symmetric and has a “tail” on the right, SK is positive and the distribution is said to be positively skewed.
• If the distribution is not symmetric and has a “tail” on the left, SK is negative and the distribution is said to be negatively skewed.
Example:
Determine the coefficient of skewness of the scores of 120 BSCS students in the Probability and Statistics post test.
Solution:
For Ungrouped Data:
$$SK = \frac{3(\mu - Md)}{\sigma} = \frac{3(33 - 33)}{7.7} = 0$$
The value SK = 0 indicates that the distribution of the ungrouped data is symmetric.
For Grouped Data:
$$SK = \frac{3(\bar{X} - Md)}{\sigma} = \frac{3(32.46 - 32.35)}{8} = \frac{0.33}{8} = 0.04125$$
SK is positive but very close to zero, which indicates that the distribution is only slightly positively skewed.
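The coefficient of skewness is a one-line computation once the mean, median, and standard deviation are known. A minimal sketch, with the hypothetical function name `pearson_skewness`, applied to the summary values quoted in the example:

```python
def pearson_skewness(mean, median, std_dev):
    """Coefficient of skewness SK = 3 * (mean - median) / standard deviation."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return 3 * (mean - median) / std_dev


sk_ungrouped = pearson_skewness(mean=33, median=33, std_dev=7.7)
sk_grouped = pearson_skewness(mean=32.46, median=32.35, std_dev=8)
print(sk_ungrouped)           # zero: mean and median coincide
print(round(sk_grouped, 5))   # small positive value: slight right skew
```

The sign of the result carries the interpretation: zero for symmetry, positive for a right tail, negative for a left tail.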
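Measure of Kurtosis
• A measure of kurtosis describes the degree of peakedness of a distribution.
• It is measured by the coefficient of kurtosis, denoted by K and defined as:
For Ungrouped Data:
$$K = \frac{\sum_{i=1}^{N} (x_i - \bar{X})^4}{N\sigma^4} - 3$$
For Grouped Data:
$$K = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{X})^4}{N\sigma_G^4} - 3$$
where $f_i$ is the frequency and $x_i$ the class mark of the $i$th class, $k$ is the number of classes, and $\sigma_G$ is the standard deviation of the grouped data.
• If the distribution of the data is bell-shaped, K = 0.
• If the shape of the distribution is relatively peaked, K > 0.
• If the shape is relatively flat, K < 0.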
Example:
Consider the scores of 120 BSCS students in the Probability and Statistics post test. Determine the coefficient of kurtosis.
Solution:
For Ungrouped Data:
$$K = \frac{\sum_{i=1}^{N} (x_i - \bar{X})^4}{N\sigma^4} - 3$$
$$K = \frac{(15 - 33)^4 + (16 - 33)^4 + (17 - 33)^4 + \cdots + (48 - 33)^4}{(120)(7.7)^4} - 3$$
$$K = \frac{1078582}{421836.49} - 3 \approx 2.56 - 3$$
$K \approx -0.44$
Therefore, the distribution of the ungrouped data is relatively flat since K < 0.
Example:
For the grouped data, determine the coefficient of kurtosis given the following quantities:
$\sum_{i=1}^{k} f_i (x_i - \bar{X})^4 = 1089648.70$; N = 120; σ = 8
Then,
$$K = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{X})^4}{N\sigma^4} - 3 = \frac{1089648.70}{(120)(8)^4} - 3 \approx 2.22 - 3$$
$K \approx -0.78$
Therefore, the distribution of the grouped data is relatively flat since K < 0.
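Both kurtosis computations can be reproduced from the quoted sums of fourth-power deviations. The sketch below uses hypothetical function names; `kurtosis_from_sum` takes the summary quantities given in the examples, while `kurtosis_from_data` applies the ungrouped formula to raw values using the population standard deviation.

```python
def kurtosis_from_sum(fourth_power_sum, n, std_dev):
    """K = (sum of fourth-power deviations) / (N * sigma^4) - 3."""
    return fourth_power_sum / (n * std_dev**4) - 3


def kurtosis_from_data(data):
    """Ungrouped formula applied to raw values (population variance)."""
    n = len(data)
    mean = sum(data) / n
    variance = sum((x - mean) ** 2 for x in data) / n
    fourth_power_sum = sum((x - mean) ** 4 for x in data)
    # sigma^4 equals the squared variance.
    return fourth_power_sum / (n * variance**2) - 3


k_ungrouped = kurtosis_from_sum(1078582, n=120, std_dev=7.7)
k_grouped = kurtosis_from_sum(1089648.70, n=120, std_dev=8)
print(round(k_ungrouped, 2), round(k_grouped, 2))  # both negative: relatively flat
```

A negative result in both cases matches the conclusion that the distributions are relatively flat.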
Comparison of Mean, Median, and Mode
• In a bell-shaped (symmetric or normal) curve, the mean, median, and mode coincide.
• However, the mean, median, and mode are affected by what is called skewness (i.e., lack of symmetry) in the data.
[Figure: a symmetric (normal) curve, a negatively skewed curve, and a positively skewed curve]
Take note that when a variable is normally distributed, the mean, median, and mode are the same number.
When the variable is skewed to the left (i.e., negatively skewed), the mean shifts to the left the most, the median shifts to the left the second most, and the mode is the least affected by the presence of skew in the data.
An example of a negatively skewed graph would be the graph of the scores for a test that was too easy for the students.
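The ordering of the three measures under negative skew can be seen on a small example. The scores below are hypothetical, chosen to mimic a test that was too easy (most scores high, a few low), and the sketch relies only on Python's standard `statistics` module.

```python
import statistics

# Hypothetical scores for a test that was too easy: most are high, a few are low.
easy_test_scores = [40, 55, 70, 80, 85, 90, 90, 95, 95, 95]

mean = statistics.mean(easy_test_scores)
median = statistics.median(easy_test_scores)
mode = statistics.mode(easy_test_scores)

# With a tail on the left, the mean is pulled furthest left,
# the median less so, and the mode stays at the peak.
print(mean, median, mode)
print(mean < median < mode)
```

The few low scores drag the mean down the most, which is the pattern described for a negatively skewed distribution.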
Seat work:
The following are the entrance test scores of 100 freshmen students in ABC College.
