
HANDOUTS IN MATHED 2

INTRODUCTION
Definitions of Statistics:
A branch of science which deals with the collection, presentation, analysis, and
interpretation of data.
Recorded data such as the number of business permits issued, number of customers
eating at a restaurant, the size of enrollment at USLS, and so on.
Numerical characteristics calculated for a set of data (e.g., mean, median, mode)
The backbone of Research
Two Branches of Statistics
1. Descriptive Statistics
- deals with organizing and summarizing observations so that they are easier to
comprehend
- used to describe the basic features of the data in a study
- provide simple summaries about the sample and the measures
2. Inferential Statistics
- deals with the formulation of inferences about conditions that exist in a population
from study of a sample drawn from a population.
- make inferences from the data to more general conditions
The Research Process:
Why do research?
Formulate the problem
Specific
Measurable
Attainable
Realistic
Time-bound
Define the population of the study
o Population - all subjects under investigation;
the set of all elements of interest in a particular study
o Sample - a subset of the population
Identify the variable/s of the study
o Variable - a measurable characteristic of the subject;
any entity that can take on different values
Example:
Problem: What is the average weekly allowance of a USLS BMath 2 student for the first semester
of AY 2012-2013?
Population of study:
All USLS BMath 2 students for the first semester, AY 2012 - 2013
Variable/s:

weekly allowance of a BMath 2 student


:
:
(Anticipated) Conclusion:
The average weekly allowance of a USLS BMath 2 student for the first semester of
AY 2012-2013 is ________.
1.

2.

Types of Variables:
Qualitative/Categorical
Attributes are in terms of categories
Examples:
a. sex: Male / Female
b. religious affiliation: Roman Catholic / INC / Baptist / Islam / etc.
Quantitative/Numerical
Attributes are in terms of counts or measurements

Distinctions:
a. Discrete Variable

uses the process of counting to generate data

values of attributes are in terms of whole numbers only


Examples:
a. Number of students
b. Number of cars
b. Continuous Variable

uses the process of measuring to generate data

values of attributes may have fractional or decimal parts


Examples:
a. Weight of a package
b. Volume of water

Functions of variables:
Important if the investigation is about cause and effect
Distinctions:
a. Independent Variable
what the researcher (or nature) manipulates -- a treatment or program or cause
b. Dependent Variable
what is affected by the independent variable -- the effects or outcomes
Example:
Study/Problem: the effects of a new educational program on student achievement
Independent variable - the program
Dependent variables - measures of achievement
Defn: Measurement - the process of assigning numbers to observations
Levels of Measurement
1. Nominal Level
Consists of numbers which indicate categories for purely classification or identification
purposes
The categories are mutually exclusive (the observations cannot fall into more than one
category)
The categories are exhaustive (there must be enough categories for all the
observations)
Examples: gender, religious affiliation, citizenship
2. Ordinal Level
Possesses rank order characteristics
the categories must still be mutually exclusive and exhaustive, but they also indicate
the order of magnitude of some variable
Examples: military rank, size of T-shirts (small, medium, large)
3. Interval Level
Has all the properties of the ordinal scale
A given interval (distance) between scores has the same meaning anywhere on the
scale
Intervals provide information about how much better one value is compared with
another
Has no absolute zero
Examples: temperature measured on Celsius or Fahrenheit, test scores
4. Ratio Level
Possesses all the characteristics of the interval scale
Has a true or absolute zero point
The ratio of two values is meaningful
Examples: distance, height, weight, time, cost of an automobile
EXERCISES
1. Indicate whether each of the following examples refers to a population or to a sample.
a. A group of 25 customers selected to taste a new soft drink
b. Salaries of all CEOs in the pharmaceutical industry
c. Customer satisfaction ratings of all clients of a local bank
d. Monthly phone expenses of selected Globe subscribers
2. Indicate whether the following are qualitative (QL), quantitative discrete (QD) or
quantitative continuous (QC) variables.
a. Brand of jeans you prefer
b. Ratio of current assets to current liabilities
c. Number of text messages received per day
d. Rating of the management skills of a company president
e. Number of banks in the municipalities and cities of Negros Occidental
f. Ranking of professional tennis players
g. Scores of freshmen college students on an attitude towards math scale
h. Time required to complete a Sudoku puzzle
i. Effectiveness of a drug for headache, measured in minutes
j. Earnings per share
k. Age
l. Number of leaves
m. Weekly allowance
n. Distance of the student's house from school
o. Color of the hair
p. Zip code
q. Number of sacks of rice
3. Identify the level of measurement of the following variables.
a. Age
b. Place of birth
c. Number of children in the family
d. Grade in Math 1
e. Height (in cm)
f. Favorite TV show
g. Shoe size
h. High school GPA
i. Family monthly income
j. Travel time (in minutes) from USLS to residence
4. A researcher measures two individuals and then uses the resulting scores to make a
statement comparing the two individuals. For each of the following statements, identify the
scale of measurement (nominal, ordinal, interval, ratio) that the researcher used.
a. I can only say that the two individuals are different.
b. I can say that one individual scored 6 points higher than the other.
c. I can say that one individual scored higher than the other, but I cannot specify how
much higher.
d. I can say that the score for one individual is twice as large as the score for the other
individual.
5. A firm is interested in testing the advertising effectiveness of a new television commercial.
As part of the test, the commercial is shown on a 6:30 PM local news program in Bacolod
City. Two days later, a market research firm conducts a telephone survey to obtain
information on recall rates (percentage of viewers who recall seeing the commercial) and
impressions of the commercial.
a. What is the population for this study?
b. What is the sample for this study?
c. Why would a sample be used in this situation? Explain.
SAMPLING TECHNIQUES
Defn: Sampling - the process of selecting the subjects of the population to be included in the
sample
Types of Sampling:
A. Probability sampling
each element of the population is given a chance of being included in the sample
minimizes, if not eliminates, selection bias
1. Simple Random
Each element of the population is given an equal chance of being included in the sample
Most basic probability sampling procedure
Foundation of all probability sampling procedures
When to use:
The population is homogeneous
A sampling frame is available
Procedure:
Lottery
Use of random number generators
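
A random number generator can stand in for the lottery. The following is a minimal sketch, assuming a hypothetical sampling frame of 250 numbered elements and a sample of 15; both figures are made up for illustration.

```python
import random

# Hypothetical sampling frame: elements numbered 1..250 (assumed for illustration)
frame = list(range(1, 251))
sample_size = 15  # assumed sample size

rng = random.Random(2012)  # fixed seed only so the draw can be repeated
# random.sample draws without replacement, so every element has an equal chance
sample = rng.sample(frame, sample_size)
print(sorted(sample))
```

Dropping the fixed seed gives a different draw on each run, which is what an actual lottery would do.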
2. Systematic Random
Selecting every kth element of the population
When to use:
When the population is homogeneous and there is no suspicion of a
trend or pattern in the frame or geographical layout
A sampling frame is available
Procedure:
i. Determine the sampling interval, k
ii. Identify the random start, rs, where 1 ≤ rs ≤ k
iii. Determine the positions of the elements to be included in the sample:
rs, rs + k, rs + 2k, ...
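
The three steps can be sketched in a few lines. This is only an illustration: it assumes the common choice k = N / n for the sampling interval (the procedure above does not say how k is set), and the population and sample sizes are hypothetical.

```python
import random

def systematic_sample(population_size, sample_size, rng=random):
    """Return the 1-based positions chosen by systematic random sampling."""
    k = population_size // sample_size   # sampling interval (assumes N is a multiple of n)
    rs = rng.randint(1, k)               # random start, 1 <= rs <= k
    return [rs + j * k for j in range(sample_size)]  # rs, rs + k, rs + 2k, ...

# Hypothetical frame of N = 100 elements, sample of n = 10, so k = 10
positions = systematic_sample(100, 10, rng=random.Random(1))
print(positions)
```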

3. Stratified Random
Selecting random samples from mutually exclusive subpopulations, or strata, of the population.
When to use:
When the population is heterogeneous but can be subdivided into
homogeneous subgroups or strata
A sampling frame is available for each stratum
Procedure:
i. Determine the proportion of each stratum relative to the population
ii. Identify the stratum sample sizes using proportional allocation
iii. Select the samples from each stratum using either simple or
systematic random sampling
Example: Among the 250 employees of the local office of an international insurance
company, 182 are Filipinos, 51 are Chinese, and 17 are Americans. If we use
proportional allocation to select a stratified random grievance committee of 15
employees, how many employees must we take from each race?
Solution:
Race (i) | Ni | Percent of N | ni
Filipino | 182 | |
Chinese | 51 | |
American | 17 | |
Total | 250 | 100 | 15
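
One way to check the allocation is to compute ni = (Ni / N) x n for each stratum. The sketch below uses the figures from the example; rounding each result to the nearest whole person is an assumption, and with other data the rounded sizes may need adjusting so that they add up to n.

```python
# Stratum sizes from the example
strata = {"Filipino": 182, "Chinese": 51, "American": 17}
n = 15                        # size of the grievance committee
N = sum(strata.values())      # 250 employees

allocation = {}
for race, N_i in strata.items():
    exact = N_i / N * n                  # proportional share of the sample
    allocation[race] = round(exact)      # nearest whole person (assumed rounding rule)
    print(f"{race}: {N_i / N:.1%} of N -> {exact:.2f} -> {allocation[race]}")

print("Total:", sum(allocation.values()))
```

With these figures the exact shares are 10.92, 3.06, and 1.02, so the committee would have 11 Filipinos, 3 Chinese, and 1 American.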

4. Cluster Random
Selecting clusters of elements rather than individual elements
When to use:
When "natural" groupings are evident in a statistical population
A sampling frame is not available
Procedure:
i. Divide the population into clusters (M = total number of clusters)
ii. Randomly select m clusters
iii. Include all elements within the selected clusters to form the resulting
sample
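
The cluster procedure can be illustrated as follows. Everything in this sketch is hypothetical: M = 5 class sections serve as the clusters, each with four made-up members, and m = 2 clusters are drawn.

```python
import random

# Hypothetical clusters: M = 5 class sections with made-up member labels
clusters = {f"Section {s}": [f"{s}{i}" for i in range(1, 5)] for s in "ABCDE"}
m = 2  # number of clusters to select (assumed)

rng = random.Random(7)  # fixed seed for a repeatable draw
chosen = rng.sample(sorted(clusters), m)             # step ii: randomly select m clusters
sample = [e for c in chosen for e in clusters[c]]    # step iii: take every element in them
print(chosen, sample)
```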
5. Multi-stage random sampling
Repeated cluster sampling
B. Non-probability sampling
not all elements of the population are given a chance of being included in the sample
prone to selection bias

1. Convenience / Voluntary / Haphazard / Accidental
Sample elements are selected because they are available
2. Judgmental / Purposive
The researcher selects the sample based on his judgment as to who best fits
the established criteria
3. Quota
Selecting sample elements nonrandomly according to some fixed quota
4. Snowball
Especially useful when you are trying to reach populations that are inaccessible or hard to find
DATA COLLECTION PROCEDURES
1. Interview
There is interaction between interviewer and respondent
Most important method of data collection
Some advantages:
o Clarifications about ambiguous questions/answers can be made
o More in-depth information can be generated
Some disadvantages:
o Time-consuming
o Costly
o Responses may be influenced by the interviewer


2. Questionnaire
No interaction between facilitator and respondent about the subject matter
Respondent personally answers the questions on survey forms
Some advantages:
Less costly
Less time-consuming
Responses are not influenced by the interviewer
Respondents answer the questions with relative anonymity; may answer more truthfully
Some disadvantages:
Not effective if the respondent is illiterate
Clarifications about vague questions cannot be made
Respondents may misinterpret the questions
Intended respondents may not personally answer the forms; may request other people to
respond
Low rate of returns
3. Experimentation

a controlled study in which the researcher attempts to understand cause-and-effect relationships


The study is "controlled" in the sense that the researcher controls
(1) how subjects are assigned to groups and
(2) which treatments each group receives.

4. Observation
Like experiments, observational studies attempt to understand cause-and-effect relationships
Unlike experiments, the researcher is not able to control (1) how subjects are assigned to groups and/or (2)
which treatments each group receives.


Also used for behavioral, attitudinal studies

ORGANIZATION AND PRESENTATION OF DATA

SUMMARIZING QUALITATIVE DATA

Frequency Distribution - A tabular summary of data showing the number (frequency) of items
in each of several non-overlapping classes.
Example: The following data were obtained from a sample of 50 soft drink purchases. Construct
a frequency distribution to summarize the data.
Coke          Coke Zero     Pepsi Max     Pepsi         Pepsi
Coke Zero     Coke Zero     Sprite        Coke          Coke
Pepsi         Coke Zero     Pepsi Max     Coke Zero     Pepsi Max
Pepsi Max     Sprite        Sprite        Coke Zero     Pepsi Max
Pepsi Max     Coke          Coke          Pepsi         Coke
Sprite        Coke          Coke          Mountain Dew  Mountain Dew
Mountain Dew  Coke          Pepsi Max     Coke          Pepsi
Mountain Dew  Pepsi         Pepsi Max     Mountain Dew  Pepsi Max
Coke          Pepsi         Coke          Pepsi Max     Sprite
Coke          Pepsi         Coke          Sprite        Mountain Dew


Table 1. Frequency Distribution of Soft Drink Purchases
Soft Drink | Frequency (f)
Coke |
Coke Zero |
Pepsi |
Pepsi Max |
Sprite |
Mountain Dew |
Total (n) | 50

Relative Frequency - the fraction or proportion of items belonging to a class:

rf = f / n

Percent = relative frequency x 100


Table 2. Relative Frequency and Percent Distribution of Soft Drink Purchases
Soft Drink | Relative Frequency | Percent
Coke | |
Coke Zero | |
Pepsi | |
Pepsi Max | |
Sprite | |
Mountain Dew | |
Total | |
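
Tables 1 and 2 can be checked by tallying the 50 purchases with a counter. The sketch below is one possible way to do this; the variable names are illustrative, and the data are typed in from the listing in the example.

```python
from collections import Counter

# The 50 soft drink purchases from the example
purchases = (
    "Coke,Coke Zero,Pepsi,Pepsi Max,Pepsi Max,Sprite,Mountain Dew,Mountain Dew,Coke,Coke,"
    "Coke Zero,Coke Zero,Coke Zero,Sprite,Coke,Coke,Coke,Pepsi,Pepsi,Pepsi,"
    "Pepsi Max,Sprite,Pepsi Max,Sprite,Coke,Coke,Pepsi Max,Pepsi Max,Coke,Coke,"
    "Pepsi,Coke,Coke Zero,Coke Zero,Pepsi,Mountain Dew,Coke,Mountain Dew,Pepsi Max,Sprite,"
    "Pepsi,Coke,Pepsi Max,Pepsi Max,Coke,Mountain Dew,Pepsi,Pepsi Max,Sprite,Mountain Dew"
).split(",")

freq = Counter(purchases)   # frequency f of each class
n = len(purchases)          # total number of purchases

for drink, f in freq.items():
    rf = f / n              # relative frequency = f / n
    print(f"{drink:<13}{f:>3}{rf:>7.2f}{rf * 100:>6.0f}%")
```

The frequencies should add up to n = 50 and the relative frequencies to 1.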
Graphical presentations of qualitative data:
1. Bar graph - a graphical device for depicting qualitative data that have been summarized in a
frequency, relative frequency, or percent distribution

2. Pie chart - a graphical device for presenting data summaries based on subdivision of a circle
into sectors that correspond to the relative frequency for each class

USING EXCEL: Watch Excel Statistics 15: Category Frequency Distribution w Pivot Table & Pie Chart by
ExcellsFun at http://www.youtube.com/watch?v=-ERARVSfeuw
SUMMARIZING QUANTITATIVE DATA
Constructing a Frequency Distribution for Quantitative Data
1. Determine the number of non-overlapping classes.
use between 5 and 20 classes.
use enough classes to show the variation in the data, but not so many that some contain only a
few items.
2. Determine the width of each class (also called interval size).
Class width (i) = range / no. of classes
Range = highest value - lowest value
3. Determine the class limits.
Lower class limit - identifies the smallest possible data value assigned to the class
Upper class limit - identifies the largest possible data value assigned to the class
4. Count the number of data values belonging to each class.



Example: These data show the time in days required to complete year-end audits for a sample of
30 clients of a small accounting firm. Develop a frequency distribution for the data.
12  14  19  18  16  30
15  15  18  17  21  31
20  27  22  23  15  25
22  21  33  28  14  22
14  18  16  13  27  18


Steps in Constructing a Frequency Distribution:


Step 1: Number of classes = 6
Step 2: Range = ________
Class width = range / no. of classes = __________
Step 3:
Lower class limit of first interval = lowest value in the data set = _______
Lower class limit of second interval = lower class limit of 1st interval + class width
= ___________
What is the upper class limit of the first interval?
Table 4. Frequency Distribution of Audit Times
Audit Time (in days) | Tally | Frequency

Total
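
The steps can also be traced in code. The sketch below is one possible reading of them: it assumes the class width (21 / 6 = 3.5) is rounded up to the next whole day, and that each upper class limit is one less than the next lower limit because the data are whole days.

```python
import math

# Audit times (in days) for the 30 clients in the example
audit_days = [12, 15, 20, 22, 14, 14, 15, 27, 21, 18, 19, 18, 22, 33, 16,
              18, 17, 23, 28, 13, 16, 21, 15, 14, 27, 30, 31, 25, 22, 18]

num_classes = 6
data_range = max(audit_days) - min(audit_days)   # highest value - lowest value
width = math.ceil(data_range / num_classes)      # rounded up (assumed rounding rule)

table = []
lower = min(audit_days)                          # lower limit of the first class
for _ in range(num_classes):
    upper = lower + width - 1                    # upper limit for whole-day data
    f = sum(lower <= x <= upper for x in audit_days)
    table.append((lower, upper, f))
    print(f"{lower}-{upper}: {f}")
    lower += width
```

A different rounding of the class width or a different starting point gives a different, equally valid table.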
In two to three sentences, describe how the audit time data is distributed.
__________________________________________________________________________________________________
__________________________________________________________________________________________________
__________________________________________________________________________________________________
Other Components of a Frequency Distribution
Class Boundaries - the true or real limits of an interval
the specific points that serve to separate adjoining classes along a measurement scale for
continuous variables
can be determined by identifying the points that are halfway between the upper and lower stated
class limits, respectively, of adjoining classes
1. Class Marks or Class Midpoints - the value halfway between the lower and upper class
limits
2. Relative frequencies - obtained by dividing the class frequency by the total frequency
3. Percentages - obtained by multiplying the relative frequencies by 100%
4. Cumulative frequencies - the number of data items with values less than or equal to the
upper class limit of each class; obtained by summing the frequencies
5. Cumulative percentages - obtained by dividing the cumulative frequencies by the total
number of cases and then multiplying the result by 100. Cumulative percentages provide
information on the percentage of values less than or equal to a specified value.
Example: Using the audit time data, complete the following table.
Audit Time | Frequency | Class Boundaries | Class Marks | Relative Frequency | Percentage | Cumulative Frequency | Cumulative Percentage

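
The remaining columns follow mechanically from the class limits and frequencies. The sketch below assumes one possible set of classes for the audit time data (6 classes of width 4 starting at 12); a table built with other classes would be filled in the same way.

```python
# (lower limit, upper limit, frequency) for the audit-time data, under the assumed classes
classes = [(12, 15, 8), (16, 19, 8), (20, 23, 7), (24, 27, 3), (28, 31, 3), (32, 35, 1)]
n = sum(f for _, _, f in classes)   # total frequency

rows = []
cum_f = 0
for lower, upper, f in classes:
    boundaries = (lower - 0.5, upper + 0.5)   # halfway to the adjoining class limits
    mark = (lower + upper) / 2                # class mark (midpoint)
    rf = f / n                                # relative frequency
    cum_f += f                                # cumulative frequency
    cum_pct = cum_f / n * 100                 # cumulative percentage
    rows.append((boundaries, mark, rf, cum_f, cum_pct))
    print(f"{lower}-{upper}  {f:>2}  {boundaries[0]}-{boundaries[1]}  {mark}  "
          f"{rf:.3f}  {rf * 100:.1f}%  {cum_f:>2}  {cum_pct:.1f}%")
```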

Graphical Representations of Quantitative Frequency Distributions:

1. Histogram - a graph consisting of a series of vertical columns or rectangles with no gaps
between bars
each bar is drawn with a base equal to the class boundaries and a height corresponding to the
class frequency
a suitable graph for representing data obtained from continuous variables.
2. Frequency Polygon - constructed by plotting class marks (X) against class frequencies (Y)
and connecting the consecutive points by straight lines
to close the frequency polygon, an additional class interval is added to both ends of the
distribution, each with zero frequency.
3. Ogive - a graph of a cumulative frequency distribution plotting the upper class boundaries
(X) against the cumulative frequencies (Y)
the lower end of the graph is connected to the X-axis by adding another interval.

USING EXCEL: Watch the following videos:
A. by DannyRocksExcels:
1. Two Ways to Create a Frequency Distribution Report in Excel, http://www.youtube.com/watch?v=nh5ObAKfj1o&feature=fvsr
B. by ExcellsFun:
1. Excel Statistics 20: P1 Quantitative Freq. Dist. w Formulas, http://www.youtube.com/watch?v=ERARVSfeuw
2. Excel Statistics 21: P2 Quantitative Freq. Dist. w Formulas, http://www.youtube.com/watch?v=vCUMqHKwFn8&feature=BFa&list=ULx8ePdM9LquM
3. Excel Statistics 22: Histogram & Ogive Charts & % Cumulative Frequency, http://www.youtube.com/watch?v=x8ePdM9LquM&feature=BFa&list=ULvCUMqHKwFn8

Stem and Leaf Plots


a type of graph that is similar to a histogram but shows more information.
summarizes the shape of a set of data (the distribution) and provides extra detail
regarding individual values.
the data are arranged by place value:
o Stems - the digits in the largest place
o Leaves - the digits in the smallest place
Example: The following data are the result of a 150-question aptitude test given to 50
individuals who were interviewed for a position at a manufacturing company.
112   72   69   97  107
 73   92   76   86   73
126  128  118  127  124
 82  104  132  134   83
 92  108   96  100   92
115   76   91  102   81
 95  141   81   80  106
 84  119  113   98   75
 68   98  115  106   95
100   85   94  106  119


Procedure:
1. Arrange the leading digits of each data value to the left of a vertical line.
2. To the right of the vertical line, record the last digit for each data value corresponding to its
first digit.
3. Sort the digits on each line in rank order in order to obtain a stem-and-leaf display.
Stem and Leaf Plot
6
7
8
9
10
11
12
13
14
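
The procedure can be sketched with a dictionary that maps each stem to its leaves. The data below are the 50 aptitude scores from the example; one entry is taken as 107, matching the listing of the same data set in the exercises (an assumption where the example listing shows 10).

```python
from collections import defaultdict

scores = [112, 73, 126, 82, 92, 115, 95, 84, 68, 100,
          72, 92, 128, 104, 108, 76, 141, 119, 98, 85,
          69, 76, 118, 132, 96, 91, 81, 113, 115, 94,
          97, 86, 127, 134, 100, 102, 80, 98, 106, 106,
          107, 73, 124, 83, 92, 81, 106, 75, 95, 119]  # 107 assumed, see note above

stems = defaultdict(list)
for x in sorted(scores):             # sorting first keeps each row of leaves in rank order
    stems[x // 10].append(x % 10)    # stem = leading digits, leaf = last digit

for stem in range(min(stems), max(stems) + 1):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem:>2} | {leaves}")
```

Turning the printed display on its side gives the same shape as a histogram with classes 60-69, 70-79, and so on.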

Shapes of Distributions
1. Symmetric - the shape of the left side of the distribution is a mirror image of the right side
2. Skewed - the two sides of the distribution are not mirror images of each other
a. Positively skewed (skewed to the right) - scores tend to cluster toward the lower end
of the scale (i.e., the smaller numbers) with increasingly fewer scores at the upper end
of the scale (the larger numbers)
b. Negatively skewed (skewed to the left) - most of the scores tend to occur toward the
upper end of the scale while increasingly fewer scores occur toward the lower end
EXERCISES
1. Maris Steakhouse uses a questionnaire to ask customers how they rate the server, food
quality, cocktails, prices, and atmosphere at the restaurant. Each characteristic is rated
on a scale of outstanding (O), very good (V), good (G), average (A), and poor (P).
Construct a frequency distribution, bar graph, and pie chart to summarize the following
data collected on food quality. What is your feeling about the food quality ratings at the
restaurant?
G  O  V  G  A  O  V  O  V  G
O  V  A  V  O  P  V  O  G  A
O  O  O  G  O  V  V  A  G  O
V  P  V  O  O  G  O  O  V  O
G  A  O  V  O  O  G  V  A  G
2. The following are the final examination test scores of 50 statistics students.
68  45  38  52  54  43  69  44  52  64
55  56  50  54  38  40  54  55  51  55
65  59  37  57  46  29  64  58  53  37
42  56  42  49  49  43  41  55  49  47
64  42  53  63  33  60  63  41  48  50

a. Construct a frequency distribution using 7 classes.
b. Develop a histogram and an ogive for the frequency distribution you constructed.
c. Make a stem-and-leaf plot for the above data set.
d. What do these descriptive statistics tell you about the performance of the students
in the exam?

3. The following data are the scores of 50 individuals who answered a 150-item aptitude test
as a requirement for a job application.
112   73  126   82   92  115   95  107   73  124
 83   92   81  106   97   86  127  134  100  102
 80   69   76  118  132   96   91   81   72   92
128  104  108   76  141  100  119  106   94   85
 68   95  115   98   84   75   98  113  119  106

a. Construct a frequency distribution for this data set using 8 classes.


b. Construct a histogram and an ogive.
c. What can you say about the performance of the 50 job applicants who took the aptitude
test? Use the graphs to explain your answer.
4. The number of friend requests confirmed during a week by 37 Facebook users were:
 6  14  22  17  25  13   0  13   9   7
14  17  15  12  18  11  23  10  13  17
 8  20  18  13  16  15   0  15  14  15
13   3  15   7  23  10  15

a. Present this set of data in the form of a frequency distribution. Use 7 classes.
b. Plot a frequency polygon of the distribution. What is the shape of the distribution?
c. In not more than 5 sentences, describe the frequency distribution and polygon that you
made.
BASIC SUMMATION NOTATION
In Statistics, it is frequently necessary to work with sums of numerical values. We use the
symbol $\Sigma$ (capital Greek letter sigma) to represent the sum of a set of numbers. Given a set of
n observations represented by $X_1$ as the first value, $X_2$ as the second value, and so on up to $X_n$,
the sum can be expressed as

$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$

When we are summing over all the values of $X_i$ that are available, the limits of summation are
often omitted and we simply write $\sum X_i$. In fact, some authors even drop the subscript and let
$\sum X$ represent the sum of all available data.

Example 1. If $x_1 = 3$, $x_2 = 5$, and $x_3 = 7$, find
a) $\sum x_i = x_1 + x_2 + x_3 = 3 + 5 + 7 = 15$
b) $\sum x_i^2$
c) $\sum (x_i - 2)^2$
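
Sums like these are easy to check numerically. The sketch below uses the three values from Example 1 and also contrasts the sum of squares with the square of the sum, two quantities that are often confused; that comparison is an added illustration, not one of the items in the example.

```python
x = [3, 5, 7]   # x1, x2, x3 from Example 1

total = sum(x)                              # 3 + 5 + 7
sum_of_squares = sum(xi ** 2 for xi in x)   # 9 + 25 + 49
square_of_sum = sum(x) ** 2                 # (3 + 5 + 7) squared

print(total, sum_of_squares, square_of_sum)
```

Here the sum of squares is 83 while the square of the sum is 225, so the order of squaring and summing matters.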

Example 2. Given $x_1 = 2$, $x_2 = 3$, $x_3 = 1$, $y_1 = 4$, $y_2 = 2$, and $y_3 = 5$, evaluate

a) $\sum x_i y_i$
b)
c)

xi yi

x y
i

DATA ANALYSIS
Measure - a number that summarizes a particular characteristic of a given data set.
Parameter - a measure of the population; usually represented by lowercase Greek letters
Statistic - a measure of the sample; usually represented by lowercase letters of the English
alphabet
MEASURES FOR QUALITATIVE DATA
Summarized using the following measures:
proportions (relative frequencies)
percentages
Example: gender coded as M = 0, F = 1
Not appropriate to get the average gender
But: percentage of females in the group; proportion of males
MEASURES FOR QUANTITATIVE DATA

MEASURES OF CENTRAL TENDENCY


ARITHMETIC MEAN
(or simply, mean) is computed by summing all the observations in the sample and
dividing the sum by the number of observations.

Population Mean: $\mu = \frac{\sum x_i}{N}$, where $x_i$ = ith score or observation; N = population size

Sample Mean: $\bar{X} = \frac{\sum x_i}{n}$, where $x_i$ = ith score or observation; n = sample size

Example 1: During a particular summer month, the eight salespeople in an appliance store sold
the following number of central air-conditioning units: 8, 11, 5, 14, 8, 11, 16, 11. Considering this
month as the statistical population of interest, the mean number of units sold is

$\mu = \frac{\sum x_i}{N} = \frac{84}{8} = 10.5$ units

Note: For reporting purposes, one generally reports the measures of location to one additional
digit beyond the original level of measurement.
WEIGHTED MEAN
also called weighted average
an arithmetic mean in which each value is weighted according to its importance in the
overall group
formulas for the population and sample weighted means are identical:

$\mu_w$ or $\bar{X}_w = \frac{\sum wX}{\sum w}$

each value in the group (X) is multiplied by the appropriate weight factor (w), and
the products are then summed and divided by the sum of the weights.
Example 2: In a multiproduct company, the profit margins for the company's four product lines
during the past fiscal year were: line A, 4.2 percent; line B, 5.5 percent; line C, 7.4 percent;
and line D, 10.1 percent.
The unweighted mean profit margin is

$\mu = \frac{\sum x}{N} = \frac{27.2}{4} = 6.8\%$

However, unless the four products are equal in sales, this unweighted average is incorrect.
Assuming the sales totals in the following table, the weighted mean correctly describes the
overall average.
Product Line | Profit Margin, X (%) | Sales, in Php (w) | wX
A | 4.2 | 30,000,000 | 126,000,000
B | 5.5 | 20,000,000 | 110,000,000
C | 7.4 | 5,000,000 | 37,000,000
D | 10.1 | 3,000,000 | 30,300,000
Total | | Php58,000,000 | Php303,300,000
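
The totals in the table give the weighted mean directly, as the sum of wX divided by the sum of w. A short sketch of the computation, using the figures from the table (variable names are illustrative):

```python
margins = [4.2, 5.5, 7.4, 10.1]                          # X, profit margin in percent
sales = [30_000_000, 20_000_000, 5_000_000, 3_000_000]   # w, sales in Php

unweighted = sum(margins) / len(margins)                 # 27.2 / 4
weighted = sum(w * x for w, x in zip(sales, margins)) / sum(sales)  # 303,300,000 / 58,000,000

print(f"unweighted mean = {unweighted:.1f}%")
print(f"weighted mean   = {weighted:.1f}%")
```

The weighted mean (about 5.2%) is lower than the unweighted mean (6.8%) because the two lines with the largest sales have the smallest margins.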

MEDIAN
the value of the middle item of an array (arrangement of the values in either ascending
or descending order)
If N or n is odd, the median is the middle value of the array
If N or n is even, the median is the mean of the two middle values.
When N or n is large, the following procedure is used:
o Find the position of the median value in the array: $\frac{N+1}{2}$ or $\frac{n+1}{2}$

Population Median: $\tilde{\mu} = x_{(N+1)/2}$

Sample Median: $\tilde{x} = x_{(n+1)/2}$

Example 3: The eight salespeople described in Example 1 sold the following number of central
air-conditioning units, in ascending order: 5, 8, 8, 11, 11, 11, 14, 16. The value of the median is

$\tilde{x} = x_{(n+1)/2} = x_{4.5}$

Remark: The value of the median is between the fourth and fifth values in the ordered
group. Since both these values equal 11 in this case, the median equals 11.0.
MODE
the observation that occurs most frequently; in a frequency polygon, the value
corresponding to the highest peak
not necessarily unique, unlike the mean and the median
o does not always exist; in a rectangular distribution where all the frequencies are
equal, there is no mode
o may correspond to multiple values; there may be two or more scores with the
same highest frequency.
Unimodal - the distribution has a single mode
Bimodal - the distribution has two modes
Polymodal - the distribution has multiple modes
Example 4: The eight salespeople described in Example 1 sold the following number of central
air-conditioning units: 8, 11, 5, 14, 8, 11, 16, and 11. The mode for this group of values is the
value with the greatest frequency, or
mode=
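
The three measures for the air-conditioning data in Examples 1, 3, and 4 can be verified with Python's standard `statistics` module. This is a quick check, not part of the original examples.

```python
import statistics

units_sold = [8, 11, 5, 14, 8, 11, 16, 11]   # the eight salespeople's sales

mean_units = statistics.mean(units_sold)      # sum of the values divided by 8
median_units = statistics.median(units_sold)  # mean of the 4th and 5th ordered values
modes = statistics.multimode(units_sold)      # every value tied for the highest frequency

print(mean_units, median_units, modes)
```

`multimode` returns a list, so it also handles bimodal and polymodal data.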
RELATIONSHIP BETWEEN THE MEAN AND THE MEDIAN

symmetrical distribution: mean = median = mode


positively skewed distribution: mean > median
negatively skewed distribution: mean < median
REMARK: The latter two relationships are always true, regardless of whether or not the
distribution is unimodal.

USE OF THE MEAN, MEDIAN, AND MODE

For representing population data:


o The Mode: indicates where most of the observed values, such as hourly wage rates in
a company, are located. It can be useful as a descriptive measure for a population
group, but only if there is one clear mode.
o The Median: always an excellent measure by which to represent the typical level of
observed values, such as wage rates, in a population. This is true regardless of whether
there is more than one mode or whether the population distribution is skewed or
symmetrical. The lack of symmetry is no special problem because the median wage
rate, for example, is always the wage rate of the middle person when the wage rates
are listed in order of magnitude.
o The Mean: also an excellent representative value for a population, but only if the
population is fairly symmetrical. For nonsymmetrical data, the extreme values (for
instance, a few very high wage rates for technical specialists) will serve to distort the
value of the mean as a representative value.

Thus, the median is generally the best measure of data location for describing
population data.

For representing sample data:


Recall: the purpose of statistical inference with sample data is to make generalizations about
the population from which the sample was selected.
o The mode is not a good measure of location with respect to sample data because its
value can vary greatly from sample to sample.
o The median is better than the mode because its value is more stable from sample to
sample.
o However, the value of the mean is the most stable of the three measures.

Thus, for sample data, the best measure of location generally is the arithmetic mean.


EXERCISES
1. The following are scores of 50 high school students in a 150-item achievement test in
Mathematics.
112  107   97   69   72  115   81  102   91   76
 73   73   86   76   92   95  106   80   81  141
126  124  127  118  128   84   75   98  113  119
 82   83  134  132  104   68   95  106  115   98
 92   92  100   96  108  100  119  106   94   85

a. Find the mean, median, and mode.
b. What is the shape of the distribution?
2. According to a survey, the average person spends 45 minutes a day listening to recorded
music. The following data were obtained for the number of minutes spent listening to
recorded music for a sample of 30 individuals.
88.3    4.3    4.6    7.0    9.2
 0.0   99.2   34.9   81.7    0.0
85.4    0.0   17.5   45.0   53.3
29.1   28.8    0.0   98.9   64.5
 4.4   67.9   94.2    7.6   56.6
52.9  145.6   70.4   65.1   63.6

a. Compute the mean. Do these data appear to be consistent with the average reported
by the newspaper? Explain your answer.
b. Compute the median. Between the mean and the median, which measure do you think
is more appropriate to use for this data set? Why?
3. During a 30-day period, the daily numbers of cars rented out by a car rental company are as
follows:
7  10   6   7   9   4   7   9   9   8
5   5   7   8   4   6   9   7  12   7
9  10   4   7   5   9   8   9   5   7

a. Find the mean, median, and mode.


b. If the break-even point for the company is 8 cars per day, is the company doing well?
Explain.
4. Find the preferred measure of central location for the sample whose observations 18, 10, 11,
98, 22, 15, 11, 25, and 17 represent the number of automobiles sold during this past month
by 9 different automobile agencies. Justify your choice.
5. For a sample of 15 students at an elementary-school snack bar, the following sales amounts
arranged in ascending order of magnitude are observed: Php10, 10, 25, 25, 27, 30, 33, 35,
40, 43, 45, 45, 50, 55, 60.
a. Determine the mean, median, and mode for these sales amounts.
b. How would you describe the distribution from the standpoint of skewness?
6. The following table shows the percentage of defective items in an assembly department.
Determine the overall percentage defective of all items assembled during the sampled week.
Shift | Percentage defective | Number of Items, in thousands
1 | 1.1 | 210
2 | 1.5 | 120
3 | 2.3 | 50

7. The average IQ of 10 students in a mathematics course is 114. If 9 of the students have IQs
of 101, 125, 118, 128, 106, 115, 99, 118, and 109, what must be the other IQ?


8. What is the average for a student who received grades of 85, 76, and 82 on 3 tests and a 79
on the final examination in a certain course if the final examination counts three times as
much as each of the 3 tests?
MEASURES OF NON-CENTRAL POSITION

describe or locate the position of certain noncentral pieces of data relative to the entire
set of data
often referred to as fractiles or quantiles
values below which a specific fraction or percentage of the observations in a given set
must fall

PERCENTILES
values that divide a set of observations into 100 equal parts
denoted by P1, P2, ..., P99, such that 1% of the data falls below P1, 2% falls below P2, ..., and 99%
falls below P99.
Steps in Finding Percentiles:
1. Rank the given data in increasing order of magnitude.
2. Find the position of the ith percentile:

$k = \frac{i}{100} \cdot n$, where k = the position of the ith percentile in the ordered data set;
i = the percentile being sought (e.g., i = 85 for P85);
n = the number of observations in the data set
3. If k is a whole number, the ith percentile is the average of the kth observation and the
(k+1)th observation.
4. If k is a fractional value, round k up to the next whole number; the ith percentile is the
observation in that position.
Example: The following are the lives of 40 car batteries (in years).
1.6  2.6  3.1  3.2  3.4  3.7  3.9  4.3
1.9  2.9  3.1  3.3  3.4  3.7  3.9  4.4
2.2  3.0  3.1  3.3  3.5  3.7  4.1  4.5
2.5  3.0  3.2  3.3  3.5  3.8  4.1  4.7
2.6  3.1  3.2  3.4  3.6  3.8  4.2  4.7

Find P85.
DECILES

values that divide a set of observations into 10 equal parts


denoted by D1, D2, ..., D9, such that 10% of the data falls below D1, 20% falls below D2,
..., and 90% falls below D9.
Deciles are found in exactly the same way that we found percentiles.

Example: Use the data on car battery lives to find D7.

QUARTILES

values that divide a set of observations into 4 equal parts


denoted by Q1, Q2, and Q3, are such that 25% of the data falls below Q1, 50% falls below
Q2, and 75% falls below Q3
also found in exactly the same way that we solved for percentiles and deciles.

Example: Use the data on car battery lives to find Q3.
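
Because deciles and quartiles are found the same way as percentiles, each can be treated as a percentile: the jth decile is the (10j)th percentile and the jth quartile is the (25j)th percentile. The sketch below assumes this reading; the function names are illustrative, and the fractional-k case again takes the next whole position above k (an assumption).

```python
def percentile(data, i):
    """i-th percentile by the position rule k = i * n / 100."""
    ordered = sorted(data)
    whole, remainder = divmod(i * len(ordered), 100)
    if remainder == 0:
        # Whole k: average the k-th and (k+1)-th observations (1-based).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Fractional k (assumed reading): next whole position above k.
    return ordered[whole]


def decile(data, j):
    """D_j is the (10 * j)-th percentile."""
    return percentile(data, 10 * j)


def quartile(data, j):
    """Q_j is the (25 * j)-th percentile."""
    return percentile(data, 25 * j)


battery_lives = [
    1.6, 1.9, 2.2, 2.5, 2.6, 2.6, 2.9, 3.0, 3.0, 3.1,
    3.1, 3.1, 3.1, 3.2, 3.2, 3.2, 3.3, 3.3, 3.3, 3.4,
    3.4, 3.4, 3.5, 3.5, 3.6, 3.7, 3.7, 3.7, 3.8, 3.8,
    3.9, 3.9, 4.1, 4.1, 4.2, 4.3, 4.4, 4.5, 4.7, 4.7,
]

d7 = decile(battery_lives, 7)      # k = 70 * 40 / 100 = 28
q3 = quartile(battery_lives, 3)    # k = 75 * 40 / 100 = 30
print(f"D7 = {d7:.2f}, Q3 = {q3:.2f}")   # prints "D7 = 3.75, Q3 = 3.85"
```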

MEASURES OF VARIATION


Given the following data sets:

Set A:   3   4   5   6   8   9  10  12  15
Set B:   3   7   7   7   8   8   8   9  15

Find the mean and median values.


Remarks:
The measures of central location do not give an adequate description of a given
distribution.
These measures only describe the typical or representative values; these do not describe
how the observations spread out from the average.
Measures Of Variation describe the degree of dispersion, scatter or spread of scores in a
distribution.
RANGE

difference in value between the highest (maximum) and the lowest (minimum)
observation
can be computed very quickly but is not very useful
considers only the extremes and does not take into consideration the bulk of the
observations.

The range is used when:


1. the data are too scant or too scattered to justify the computation of a more precise measure
of variability.
2. a knowledge of extreme scores or a total spread is all that is wanted.
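
Sets A and B given earlier show why the range says little about the bulk of the observations. The following is a minimal sketch; the helper names `value_range` and `count_near_mean` are illustrative, and counting the values within 1 unit of the mean is only one informal way to look at spread, not a measure defined in this handout.

```python
from statistics import mean, median

set_a = [3, 4, 5, 6, 8, 9, 10, 12, 15]
set_b = [3, 7, 7, 7, 8, 8, 8, 9, 15]


def value_range(data):
    """Range = highest observation minus lowest observation."""
    return max(data) - min(data)


def count_near_mean(data, width=1):
    """Count observations lying within `width` units of the mean."""
    centre = mean(data)
    return sum(1 for x in data if abs(x - centre) <= width)


for name, data in (("Set A", set_a), ("Set B", set_b)):
    print(
        name,
        "mean:", mean(data),
        "median:", median(data),
        "range:", value_range(data),
        "within 1 of mean:", count_near_mean(data),
    )
```

Both sets have a mean of 8, a median of 8, and a range of 12, yet only 2 of the 9 values in Set A lie within 1 unit of the mean, compared with 7 of the 9 values in Set B. The range cannot detect this difference because it uses only the two extreme values.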
VARIANCE

a measure of variability that is based on the difference between the value of each
observation (xi) and the mean
deviation about the mean = the difference between each xi and the mean

Population Variance:

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

Sample Variance:

$$s^2 = \frac{\sum (x_i - \bar{X})^2}{n - 1}$$

STANDARD DEVIATION

defined to be the positive square root of the variance

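Population Standard Deviation:

$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$$

Sample Standard Deviation:

$$s = \sqrt{\frac{\sum (x_i - \bar{X})^2}{n - 1}}$$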

REMARKS:
The sample variance may be thought of as the average of the squared deviations from the
mean
The greater the deviations, the greater the variance
The variance is of little use in descriptive statistics because its calculated value is
expressed in square units of measurement
the standard deviation is more widely used; it has the same unit of measurement as the
raw data
Calculation of the Variance and Standard Deviation: Raw Score Method

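$$s^2 = \frac{n \sum x_i^2 - \left(\sum x_i\right)^2}{n(n - 1)}$$
(Raw score formula)

Example:

x_i:    32     71     64     50     48     63     38     41     47     52      (sum = 506)
x_i^2:  1,024  5,041  4,096  2,500  2,304  3,969  1,444  1,681  2,209  2,704   (sum = 26,972)

$$s^2 = \frac{10(26{,}972) - (506)^2}{10(9)} = \frac{269{,}720 - 256{,}036}{90} = \frac{13{,}684}{90} = 152.04$$

$$s = \sqrt{152.04} = 12.33$$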
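
The raw score computation can be checked with a short Python sketch. The function names are illustrative and not part of the handout; the second function applies the definition of the sample variance (squared deviations from the mean divided by n - 1) so the two results can be compared.

```python
from math import sqrt


def sample_variance_raw(data):
    """Sample variance by the raw score formula."""
    n = len(data)
    sum_x = sum(data)                      # sum of x_i
    sum_x2 = sum(x * x for x in data)      # sum of x_i squared
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


def sample_variance_deviations(data):
    """Sample variance from squared deviations about the mean."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


scores = [32, 71, 64, 50, 48, 63, 38, 41, 47, 52]

s2 = sample_variance_raw(scores)
s = sqrt(s2)
print(f"s^2 = {s2:.2f}, s = {s:.2f}")      # prints "s^2 = 152.04, s = 12.33"
```

Both functions give the same variance; the raw score form only needs the two running totals, which is why it is convenient for hand calculation.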
The standard deviation is used when:
1. the statistic having the greatest stability is desired.
2. coefficients of correlation and other statistics are to be computed later.
3. the mean is the preferred measure of central tendency.
APPLICATIONS OF THE STANDARD DEVIATION
COEFFICIENT OF VARIATION
a measure of relative variability
expresses the standard deviation as a percentage of the mean
expressed in percent
can be used to compare the variability of two or more distributions even when the
observations are expressed in different units of measurement: the smaller the CV the less
variable the values of a given set compared to another data set
formula:

$$CV = \frac{s}{\bar{X}} \times 100\%$$

Remarks: In the investing world, the coefficient of variation allows you to determine how much
volatility (risk) you are assuming in comparison to the amount of return you can expect from
your investment. In simple language, the lower the ratio of standard deviation to mean return,
the better your risk-return tradeoff.
Example: Consider two investment proposals, A and B, with the following data:

Proposal    Mean return    Standard deviation
A           $230           $107.70
B           $250           $208.57
The coefficient of variation for each proposal is:


For A: $107.70/$230 x 100% = 47%
For B: $208.57/$250 x 100% = 83%

Therefore, because the coefficient is a relative measure of risk, B is considered more risky than A.
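
The comparison of the two proposals can be reproduced with a minimal Python sketch. The function name `coefficient_of_variation` is illustrative; the figures are the standard deviations and mean returns used in the computation above.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (s / mean) * 100, expressed in percent."""
    return std_dev / mean * 100


cv_a = coefficient_of_variation(107.70, 230)
cv_b = coefficient_of_variation(208.57, 250)
print(f"A: {cv_a:.0f}%  B: {cv_b:.0f}%")   # prints "A: 47%  B: 83%"

# The proposal with the larger CV carries more risk per unit of mean return.
riskier = "B" if cv_b > cv_a else "A"
print("Riskier proposal:", riskier)         # prints "Riskier proposal: B"
```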
STANDARD SCORE

tells the relative location of a particular raw score with regard to the mean of all the scores in
a series.
is a transformed raw score.
expressed in terms of standard deviation units from the mean.
Has a mean of zero.
o a positive standard score indicates that the transformed raw score is above or higher
than the mean
o a negative standard score shows that the given raw score is below or lower than the
mean.
The formula for transforming a raw score to a standard score, represented by z, is

$$z = \frac{x - \bar{X}}{s}$$

usually used to compare observations in two or more different distributions of raw scores
which have different means and/or different standard deviations.

Example: Ruben got a final grade of 85 in both English and Physics. The mean final grades of
his class in these two courses are 80 in English and 75 in Physics with standard deviations of 12
and 10, respectively. In which subject was his academic performance better in relation to his
class?
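
Ruben's two grades can be compared by converting each to a standard score. The following is a minimal sketch; the function name `standard_score` is illustrative.

```python
def standard_score(x, mean, std_dev):
    """z = (x - mean) / s: distance from the mean in standard deviation units."""
    return (x - mean) / std_dev


z_english = standard_score(85, 80, 12)   # (85 - 80) / 12
z_physics = standard_score(85, 75, 10)   # (85 - 75) / 10
print(f"English z = {z_english:.2f}, Physics z = {z_physics:.2f}")
# prints "English z = 0.42, Physics z = 1.00"
```

Although the raw grade is 85 in both subjects, the grade in Physics lies 1 standard deviation above the class mean while the grade in English lies only about 0.42 standard deviations above it, so Ruben performed better in Physics relative to his class.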

EMPIRICAL RULE
When the data are believed to approximate a bell-shaped distribution, the empirical rule can
be used to determine the percentage of data values that must be within a specified number of
standard deviations of the mean, that is,
Approximately 68% of the data values will be within 1 standard deviation of the mean.
Approximately 95% of the data values will be within 2 standard deviations of the mean.
Approximately 99.7% of the data values will be within 3 standard deviations of the mean.
Example: Liquid detergent cartons are filled automatically on a production line. Filling weights
frequently have a bell-shaped distribution. If the mean filling weight is 16 ounces and the
standard deviation is 0.25 ounces, use the empirical rule to draw conclusions about the
distribution of filling weights.
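
The intervals implied by the empirical rule for this example can be listed with a short Python sketch. The names `EMPIRICAL_RULE` and `empirical_intervals` are illustrative, and the percentages are the approximate values stated in the rule above.

```python
# Approximate percentage of values within k standard deviations of the mean.
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}


def empirical_intervals(mean, std_dev):
    """Return {k: (low, high, percent)} for k = 1, 2, 3 standard deviations."""
    return {
        k: (mean - k * std_dev, mean + k * std_dev, percent)
        for k, percent in EMPIRICAL_RULE.items()
    }


intervals = empirical_intervals(16, 0.25)
for k, (low, high, percent) in intervals.items():
    print(f"About {percent}% of cartons weigh between "
          f"{low:.2f} and {high:.2f} ounces")
```

For the filling weights this gives approximately 68% of cartons between 15.75 and 16.25 ounces, 95% between 15.50 and 16.50 ounces, and 99.7% between 15.25 and 16.75 ounces.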

EXERCISES
1. A goal of management is to help their company earn as much as possible relative to the
capital invested. One measure of success is return on equity, the ratio of net income to
stockholders' equity. Shown here are return on equity percentages for 25 companies.
Find the range, variance, and standard deviation.

9.0    19.6   22.9   41.6   11.4
15.8   52.7   17.3   12.3   5.1
17.3   31.1   9.6    8.6    11.2
12.8   12.2   14.5   9.2    16.6
5.0    30.3   14.7   19.2   6.2

2. During a 30-day period, the daily number of cars rented from a car rental company was as
follows:

7   10   6   7   9   4   7   9   7    9
5   5    7   8   4   6   9   9   12   5
9   10   4   7   5   9   8   8   7    7

Find the range, variance, and standard deviation.

3. Many national academic achievement and aptitude tests, such as the SAT, report
standardized test scores with the mean for the normative group used to establish scoring
standards converted to 500 with a standard deviation of 100. Suppose that the distribution
of scores for such a test is known to be approximately normally distributed. Determine the
approximate percentage of reported scores that would be
a. between 400 and 600
b. between 500 and 700
c. greater than 700
d. less than 200
4. A manufacturing firm regularly places orders with two different suppliers, A and B. The
following data are the number of days required to fill orders for these suppliers.

Supplier A:  11  10   9  10  11  11  10  11  10  10
Supplier B:   8  10  13   7  10  11  10   7  15  12
Use the range and standard deviation to determine which supplier provides the more
consistent and reliable delivery times.
5. A production department uses a sampling procedure to test the quality of newly produced
items. The department employs the following decision rule at an inspection station: If a
sample of 14 items has a variance of more than .005, the production line must be shut
down for repairs. Suppose the following data have been collected:
3.43 3.45 3.43 3.48 3.52 3.50 3.39
3.48 3.41 3.38 3.49 3.45 3.51 3.50
Should the production line be shut down? Why or why not?
6. Two friends want to take a summer holiday before going to college in the autumn. They
are looking for somewhere with plenty of clubs where they can party all night.
Unfortunately they have left it rather late to book and there are only two resorts, Medlena
and Bistry, available within their budget. When they ask about the ages of the holidaymakers at these resorts, their travel agent says the only thing he can tell them is that
the mean age of people going to Medlena is 19, whereas the mean age of visitors to Bistry
is 22. Just as they are about to book holidays in Medlena because it seems to attract the
sort of young crowd they want to be with, the travel agent says, "I've got some more
figures: the standard deviation of the ages of visitors to Medlena is 8 and the standard
deviation of the ages of visitors to Bistry is 2." Should they change their minds on the basis
of this new information, and if so, why?
