STATISTICS
Prepared by:
Ms. KAREN S. TAFALLA
INTRODUCTION
STATISTICS is a collection of methods for planning experiments,
obtaining data, and then organizing, summarizing, analyzing,
interpreting, and drawing conclusions based on the data.
A. Types of variables:
A qualitative variable measures a quality or characteristic on
each experimental unit.
Ex. - taste ranking: excellent, good, fair, poor
- color of M&M candy: brown, yellow, red, orange,
green, blue
A quantitative variable measures a numerical quantity or
amount on each experimental unit.
Ex. - weight of package ready to be shipped
- volume of orange juice in a glass
B. Identify each variable as quantitative or qualitative:
1. Amount of time it takes to assemble a simple puzzle
2. Number of students in a first grade classroom
3. Rating of a newly elected politician (excellent, good,
fair, poor)
4. State in which a person lives
C. Identify the following quantitative variables as discrete
or continuous:
1. Population in a particular area of the Philippines
2. Weight of newspapers recovered for recycling on a
single day.
3. Time to complete a probability exam
D. A data set consists of the ages at death of each of the
41 past presidents of the United States.
1. Is this set of measurements a population or a
sample?
2. What is the variable being measured?
3. Is the variable in part 2 quantitative or qualitative?
E. Determine which of the four levels of measurement
(nominal, ordinal, interval, ratio) is most appropriate:
1. Weights of a sample of M&M candies
2. Instructors rated as superior, above average, average,
or poor
3. Lengths (in minutes) of movies
4. Zip codes
5. Movies listed according to their genre, such as comedy,
adventure, and romance
FREQUENCY DISTRIBUTION
When a set of data includes a large number of
observed values, it becomes practical to group the data into
classes or categories with the corresponding number of
items falling into each class. The result is a tabular
arrangement called a frequency distribution.
Definition of terms:
A frequency table lists categories (or classes) of scores,
along with counts (or frequencies) of the number of scores
that fall into each category.
The frequency for a particular class is the number of
original scores that fall into that class.
Lower class limits are the smallest numbers that can actually
belong to the different classes.
Upper class limits are the largest numbers that can actually
belong to the different classes.
Class boundaries are the numbers used to separate
classes, but without the gaps created by the class limits.
They are obtained by increasing the upper class limits and
decreasing the lower class limits by the same amount so
that there are no gaps between consecutive classes. The
amount to be added or subtracted is one-half the difference
between the upper limit of one class and the lower limit of
the following class.
Class marks are the midpoints of the classes. They can be
found by adding the lower class limit to the upper class limit
of each class and dividing by 2.
Step 7: List the lower class limits in a vertical column,
and enter the upper class limits, which can be easily
identified at this stage.
Step 8: Represent each score by a tally in the
appropriate class.
Step 9: Replace the tally marks in each class with the
total frequency count for that class.
51  61  74  68  78  62  71  88  72  66
77  82  68  68  73  56  82  66  71  58
75  67  75  86  66  70  71  64  73  85
74  62  84  66  92  91  57  61  78  63
73  58  79  61  83  88  81  75  57  68
70  54  79  62  78  59  70  66  81
CLASS INTERVAL   CLASS BOUNDARIES   MIDPOINT   TALLY   FREQUENCY
50 - 55          49.5 - 55.5
56 - 61          55.5 - 61.5
62 - 67          61.5 - 67.5
68 - 73          67.5 - 73.5
74 - 79          73.5 - 79.5
80 - 85          79.5 - 85.5
86 - 91          85.5 - 91.5
92 - 97          91.5 - 97.5
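The tallying steps can be sketched in Python. This is only an illustration, assuming that the 59 scores listed above are the data for this table and that the classes start at 50 with a class size of 6. The function name `frequency_distribution` is made up for this sketch.

```python
# Scores as they appear in the notes (59 values survive in this copy).
scores = [
    51, 61, 74, 68, 78, 62, 71, 88, 72, 66,
    77, 82, 68, 68, 73, 56, 82, 66, 71, 58,
    75, 67, 75, 86, 66, 70, 71, 64, 73, 85,
    74, 62, 84, 66, 92, 91, 57, 61, 78, 63,
    73, 58, 79, 61, 83, 88, 81, 75, 57, 68,
    70, 54, 79, 62, 78, 59, 70, 66, 81,
]

def frequency_distribution(data, first_lower, class_size, n_classes):
    """Return one row per class: limits, boundaries, class mark, frequency."""
    rows = []
    for c in range(n_classes):
        lower = first_lower + c * class_size
        upper = lower + class_size - 1            # integer data: limits are 1 unit apart
        rows.append({
            "limits": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),   # half of the 1-unit gap
            "class_mark": (lower + upper) / 2,          # midpoint of the class
            "frequency": sum(lower <= x <= upper for x in data),
        })
    return rows

table = frequency_distribution(scores, first_lower=50, class_size=6, n_classes=8)
for row in table:
    low, high = row["limits"]
    print(f"{low}-{high}", row["boundaries"], row["class_mark"], row["frequency"])
```

The frequencies should add up to the number of scores, which is a quick check that no observation was missed or counted twice.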
3.
7.5   9.5   6.5   8.0   4.0   5.5   6.0   5.6   12.5
3.5   3.0   2.4   3.8   4.5   8.0   2.5   7.5   5.0
10.0  8.0   3.5   2.6   8.5   2.5   6.4   7.6   9.0
2.0   6.5   5.0   7.7   9.3   6.5   8.2   8.8   1.0
b. FREQUENCY POLYGON
The frequency polygon is a modification of the histogram:
it is a line graph in which the class
frequencies are plotted against the class marks. To close the
polygon, an extra class mark with zero frequency must be added
at each end. The frequency polygon can also be obtained by
connecting the midpoints of the tops of the rectangles in the histogram.
c. OGIVES
A line graph showing the cumulative frequencies of a distribution
is called an ogive. For the less than ogive, the less than
cumulative frequencies are plotted against the upper class
boundaries. For the greater than ogive, the greater than
cumulative frequencies are plotted against the lower
class boundaries. These graphs are useful in estimating the
number of observations that are less than or more than a
specified value.
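The points plotted on the two ogives can be computed as in the sketch below. It borrows the frequencies of the final-examination example given later in these notes (classes 10-19 through 90-99). The variable names are assumptions of this sketch, and no plotting library is used.

```python
from itertools import accumulate

# Final-examination example: classes 10-19, 20-29, ..., 90-99
lower_limits = [10, 20, 30, 40, 50, 60, 70, 80, 90]
frequencies = [3, 2, 3, 4, 5, 11, 14, 14, 4]
class_size = 10

lower_boundaries = [low - 0.5 for low in lower_limits]
upper_boundaries = [low + class_size - 0.5 for low in lower_limits]

# Less than CF: running total from the lowest class upward.
less_than_cf = list(accumulate(frequencies))
# Greater than CF: running total from the highest class downward.
greater_than_cf = list(accumulate(reversed(frequencies)))[::-1]

# (x, y) pairs for each ogive
less_than_points = list(zip(upper_boundaries, less_than_cf))
greater_than_points = list(zip(lower_boundaries, greater_than_cf))

print(less_than_points)
print(greater_than_points)
```

Reading the less than ogive at 69.5, for instance, gives the number of scores below 69.5.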
Steps in constructing a stem and leaf plot:
1. Divide each measurement into two parts: the stem and
the leaf.
2. List the stems in a column, with a vertical line to their right.
3. For each measurement, record the leaf portion in the
same row as its corresponding stem.
4. Order the leaves from lowest to highest in each stem.
5. Provide a key to your stem and leaf coding so that the
reader can recreate the actual measurements if
necessary.
Example:
The data below are the GPAs of 30 Adamson University
freshmen, recorded at the end of the freshman year.
Construct a stem and leaf plot to display the distribution
of the data.
2.0  3.1  1.9  2.5  1.9  2.3  2.6  3.1  2.5  2.1
2.9  3.0  2.7  2.5  2.4  2.7  2.5  2.4  3.0  3.4
2.6  2.8  2.5  2.7  2.9  2.7  2.8  2.2  2.7  2.1
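One way to carry out the five steps on the GPA data is sketched below, taking the units digit as the stem and the tenths digit as the leaf. The function name `stem_and_leaf` is an assumption of this sketch.

```python
from collections import defaultdict

gpas = [
    2.0, 3.1, 1.9, 2.5, 1.9, 2.3, 2.6, 3.1, 2.5, 2.1,
    2.9, 3.0, 2.7, 2.5, 2.4, 2.7, 2.5, 2.4, 3.0, 3.4,
    2.6, 2.8, 2.5, 2.7, 2.9, 2.7, 2.8, 2.2, 2.7, 2.1,
]

def stem_and_leaf(values):
    """Map each stem (units digit) to its sorted leaves (tenths digit)."""
    plot = defaultdict(list)
    for v in values:
        tenths = round(v * 10)            # 2.5 -> 25; rounding avoids float truncation
        stem, leaf = divmod(tenths, 10)   # step 1: split into stem and leaf
        plot[stem].append(leaf)           # step 3: record the leaf beside its stem
    # steps 2 and 4: stems in order, leaves ordered within each stem
    return {stem: sorted(leaves) for stem, leaves in sorted(plot.items())}

plot = stem_and_leaf(gpas)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 2 | 5 means 2.5")             # step 5: the key
```

Most of the leaves fall on stem 2, so a plot with split stems (2 low, 2 high) may show the shape more clearly.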
DESCRIPTIVE STATISTICS
MEASURES OF CENTRAL TENDENCY
A measure of central tendency gives a single
value that acts as a representative average of
the values of all the outcomes of your
experiment. Three parameters that measure the
center of the distribution in some sense are of
interest. These parameters are called the
population mean, the population median and the
population mode.
A. THE MEAN
For Ungrouped Data:
Let x1, x2, x3, ..., xn be n observations of a random variable X. The
sample mean, denoted by x̄ (x-bar), is the arithmetic average of these
values. That is,
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n}$$
For Grouped Data:
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
Where:
fi = frequency of the ith class
xi = class mark of the ith class
k = number of classes
B. THE MEDIAN
For Ungrouped Data:
Let x1, x2, x3, ..., xn be sample observations arranged in order from smallest to largest. The
sample median for this collection is given by the middle observation if n is odd. If n is even, the
sample median is the average of the two middle observations.
For Grouped Data:
When the data are grouped into a frequency distribution, the median is obtained by finding the cell
that contains the middle observation and then interpolating within the cell.
$$\tilde{x} = L_{b_i} + \frac{n/2 - {<}cf_{i-1}}{f_i}\,(i) \qquad \text{OR} \qquad \tilde{x} = U_{b_i} - \frac{n/2 - {>}cf_{i-1}}{f_i}\,(i)$$
where:
Lbi = lower class boundary of the interpolated interval
Ubi = upper class boundary of the interpolated interval
<cfi-1 = less than cumulative frequency of the class before the interpolated interval
>cfi-1 = greater than cumulative frequency of the class before the interpolated interval (counting down from the highest class)
fi = frequency of the interpolated interval
i = class size
n = number of data points
C. THE MODE
The last measure of central tendency is the mode. For a finite
population, the population mode is the value of X that occurs most
often. The mode of a sample is the value that occurs most often in
the sample. The drawback to this measure is that there might not be
a unique mode. There might be no single number that occurs more
often than any other. For this reason, the mode is not a particularly
useful descriptive measure.
When the data are grouped into a frequency distribution, the
midpoint of the cell with the highest frequency is the mode, since
this point represents the highest point (greatest frequency).
EXAMPLES:
1. The reaction times for a random sample of 9 subjects to a stimulant
were recorded as 2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1 and 4.3
seconds. Calculate the mean, median and mode.
$$\text{Mean} = \frac{2.5 + 3.6 + 3.1 + 4.3 + 2.9 + 2.3 + 2.6 + 4.1 + 4.3}{9} = \frac{29.7}{9} = 3.3$$
Median: arrange the values in order: 2.3, 2.5, 2.6, 2.9, 3.1, 3.6, 4.1, 4.3, 4.3
Median = 3.1 (the 5th of the 9 ordered values)
Mode = 4.3
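The three results can be checked with Python's standard `statistics` module. This is a quick verification sketch, not part of the original solution.

```python
import statistics

times = [2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, 4.3]

mean = statistics.mean(times)          # arithmetic average
median = statistics.median(times)      # middle value of the sorted data (n is odd)
modes = statistics.multimode(times)    # list of all most frequent values

print(round(mean, 2), median, modes)
```

`multimode` returns a list, which makes it easy to see whether the mode is unique.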
2. The frequency table below represents the final
examination scores for a statistics course. Find the mean, the
median and the mode.
Class Interval   Frequency   Class mark   Cumulative Frequency (<CF)
10 - 19          3           14.5         3
20 - 29          2           24.5         5
30 - 39          3           34.5         8
40 - 49          4           44.5         12
50 - 59          5           54.5         17
60 - 69          11          64.5         28
70 - 79          14          74.5         42
80 - 89          14          84.5         56
90 - 99          4           94.5         60
$$\text{Mean} = \frac{\sum f_i x_i}{\sum f_i}$$
$$\text{Mean} = \frac{(3)(14.5) + (2)(24.5) + (3)(34.5) + (4)(44.5) + (5)(54.5) + (11)(64.5) + (14)(74.5) + (14)(84.5) + (4)(94.5)}{3 + 2 + 3 + 4 + 5 + 11 + 14 + 14 + 4} = \frac{3960}{60}$$
Mean = 66
$$\text{Median} = L_b + \frac{n/2 - {<}cf_{i-1}}{f_i}\,(i)$$
Since n/2 = 30, the median falls in the class 70 - 79 (cumulative frequency 42), so Lb = 69.5, <cfi-1 = 28 and fi = 14.
$$\text{Median} = 69.5 + \frac{60/2 - 28}{14}\,(10)$$
Median = 70.93
Mode = Class mark with the highest frequency
Mode = 74.5 and 84.5
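The grouped-data computations can be reproduced with a short script. It is a sketch under the assumption that the class frequencies are 3, 2, 3, 4, 5, 11, 14, 14, 4, as used in the mean computation. The variable names are illustrative.

```python
lower_limits = [10, 20, 30, 40, 50, 60, 70, 80, 90]
frequencies = [3, 2, 3, 4, 5, 11, 14, 14, 4]
class_size = 10

# Class marks: midpoint of each class (10-19 -> 14.5, ...)
class_marks = [low + (class_size - 1) / 2 for low in lower_limits]
n = sum(frequencies)

# Grouped mean: sum of f*x divided by sum of f
mean = sum(f * x for f, x in zip(frequencies, class_marks)) / n

# Grouped median: find the class holding observation n/2, then interpolate
cumulative = 0                      # less than CF of the classes before the current one
for low, f in zip(lower_limits, frequencies):
    if cumulative + f >= n / 2:
        lower_boundary = low - 0.5
        median = lower_boundary + (n / 2 - cumulative) / f * class_size
        break
    cumulative += f

# Grouped mode: class mark(s) with the highest frequency
highest = max(frequencies)
modes = [x for f, x in zip(frequencies, class_marks) if f == highest]

print(mean, round(median, 2), modes)
```

Two classes tie for the highest frequency, so the script reports both class marks as modes.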
MEASURES OF VARIABILITY
Measures of variability refer to the extent of scatter or dispersion around the
zone of central tendency.
A. RANGE
One measure of variation is the range, which has the advantage of
being very easy to compute. The range, R, of a set of n measurements is
defined as the difference between the largest and smallest
measurements.
Formula:
Range = Highest score - Lowest score, or R = H - L
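B. VARIANCE and STANDARD DEVIATION
The variance of a population of N measurements is defined to be the
average of the squares of the deviations of the measurements about their
mean μ. The population variance is denoted by σ² and is given by the
formula
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} \quad \text{for ungrouped data}$$
$$\sigma^2 = \frac{\sum f\,(x - \mu)^2}{N} \quad \text{for grouped data}$$
where, for grouped data, x is the class mark, f is the class frequency and N is the sum of the frequencies.
The standard deviation σ is the positive square root of the variance.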
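1. For the reaction-time sample (n = 9, mean = 3.3), the sum of the squared deviations is 5.06, so the sample variance, which divides by n - 1, is s² = 5.06 / 8 = 0.6325.
$$s = \sqrt{0.6325} = 0.795298686 \text{ or } 0.80 \text{ (sample standard deviation)}$$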
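The value 0.6325 matches the sample variance of the nine reaction times from the earlier example, computed with the divisor n - 1. The sketch below checks this with the standard `statistics` module. That the example refers to this data set is an inference from the numbers, not something the notes state.

```python
import statistics

times = [2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, 4.3]

sample_variance = statistics.variance(times)   # divides by n - 1
sample_sd = statistics.stdev(times)            # square root of the sample variance

print(round(sample_variance, 4), round(sample_sd, 9))
```

`statistics.pvariance` and `statistics.pstdev` give the population versions, which divide by N instead.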
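2.
Class Interval   Frequency   Class mark   Cumulative Frequency
10 - 19          3           14.5         3
20 - 29          2           24.5         5
30 - 39          3           34.5         8
40 - 49          4           44.5         12
50 - 59          5           54.5         17
60 - 69          11          64.5         28
70 - 79          14          74.5         42
80 - 89          14          84.5         56
90 - 99          4           94.5         60
$$\sigma = \sqrt{\frac{\sum f\,(x - 66)^2}{60}} = \sqrt{\frac{25965}{60}} = \sqrt{432.75} = 20.80264406 \text{ or } 20.80$$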
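The figure 20.80 can be reproduced from the frequency table by dividing the weighted sum of squared deviations by N = 60. The sketch below assumes the class frequencies 3, 2, 3, 4, 5, 11, 14, 14, 4 used in the grouped-mean example.

```python
import math

class_marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [3, 2, 3, 4, 5, 11, 14, 14, 4]

n = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, class_marks)) / n

# Grouped population variance: sum of f * (x - mean)^2, divided by N
sum_sq = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, class_marks))
variance = sum_sq / n
sd = math.sqrt(variance)

print(sum_sq, variance, round(sd, 2))
```

Dividing by n - 1 = 59 instead would give about 20.98, so the value in the notes corresponds to the population formula.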
Measures of Shape
Skewness
refers to the symmetry of a
distribution. A distribution
which is not symmetric with
respect to its mean can be
termed as either positively-skewed or negatively-skewed.
Kurtosis
refers to the flatness or
peakedness of a particular
distribution.
Skewness
$$SK = \frac{\sum \left[ (X_i - \mu)/\sigma \right]^3}{N}$$
where:
Xi = individual reading
σ = standard deviation
μ = mean
N = population size
SK = 0: Symmetric (Normal)
SK > 0: Positively Skewed
SK < 0: Negatively Skewed
Negative skew: The left tail is longer than the right tail. It
has relatively few low values. The distribution is said to
be left-skewed or "skewed to the left". Example
(observations): 1, 1000, 1001, 1002, 1003.
Positive skew: The right tail is longer than the left tail. It has
relatively few high values. The distribution is said to be
right-skewed or "skewed to the right". Example
(observations): 1, 2, 3, 4, 100.
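The sign of the skewness for the two example data sets can be checked numerically. The sketch below assumes the moment-based measure (the average cubed standardized deviation, with the population standard deviation). The function name `skewness` is illustrative.

```python
import math

def skewness(values):
    """Average of the cubed standardized deviations (population form)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return sum(((x - mean) / sd) ** 3 for x in values) / n

left_skewed = skewness([1, 1000, 1001, 1002, 1003])   # one very low value
right_skewed = skewness([1, 2, 3, 4, 100])            # one very high value
symmetric = skewness([1, 2, 3, 4, 5])                 # evenly spaced values

print(left_skewed, right_skewed, symmetric)
```

A single extreme low value pulls the measure below zero, and a single extreme high value pushes it above zero.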
Kurtosis
$$k = \frac{\sum \left[ (X_i - \mu)/\sigma \right]^4}{N}$$
where:
Xi = individual reading
σ = standard deviation
μ = mean
N = population size
k = 3: MesoKurtic (Normal)
k > 3: LeptoKurtic
k < 3: PlatyKurtic
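A small sketch of the kurtosis computation and its three-way classification follows. It assumes the fourth-power moment form with the population standard deviation. The two data sets are made-up illustrations, not data from these notes.

```python
def kurtosis(values):
    """Average of the fourth powers of the standardized deviations."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    fourth_moment = sum((x - mean) ** 4 for x in values) / n
    return fourth_moment / variance ** 2      # same as averaging ((x - mean)/sd)^4

def classify(k, tol=1e-9):
    """Label a kurtosis value relative to the normal value of 3."""
    if abs(k - 3) < tol:
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"

flat = kurtosis([1, 2, 3, 4, 5])                          # hypothetical flat data
peaked = kurtosis([-10, 0, 0, 0, 0, 0, 0, 0, 0, 10])      # hypothetical peaked data

print(flat, classify(flat))
print(peaked, classify(peaked))
```

Evenly spread values give a kurtosis below 3, while a tall center with two far outliers gives a value above 3.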
Examples
Class Boundaries   Midpoint   Frequency
23.5 - 26.5        25.0
26.5 - 29.5        28.0       36
29.5 - 32.5        31.0       51
32.5 - 35.5        34.0       63
35.5 - 38.5        37.0       58
38.5 - 41.5        40.0       52
41.5 - 44.5        43.0       34
44.5 - 47.5        46.0       16
47.5 - 50.5        49.0       11
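B. EMPIRICAL RULE
Another rule helpful in interpreting a value for a
standard deviation is the Empirical rule, which
applies to a data set having a distribution that is
approximately bell-shaped. The empirical rule is
often stated in abbreviated form, sometimes
called the 68-95-99.7 rule: about 68% of the values
fall within 1 standard deviation of the mean, about 95%
within 2 standard deviations, and about 99.7% within
3 standard deviations.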
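The three percentages can be recovered from the normal distribution itself. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) and assumes an exactly normal distribution, whereas the rule is applied to data that are only approximately bell-shaped.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of values within k standard deviations of the mean
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, proportion in within.items():
    print(f"within {k} standard deviation(s): {proportion:.4f}")
```

For real data the observed proportions will differ somewhat, and large departures suggest the distribution is not bell-shaped.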
MEASURES OF POSITION
A. PERCENTILE
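A set of n measurements on the variable
x has been arranged in order of
magnitude. The pth percentile is the value
that separates the bottom p% of the ranked
scores from the top (100-p)%.
For Ungrouped Data (p written as a proportion, so that np is a position in the ordered list):
$$\text{Any percentile} = \begin{cases} \dfrac{X_{np} + X_{np+1}}{2} & \text{if } np \text{ is an integer} \\[2ex] X_{\lceil np \rceil} & \text{if } np \text{ is a non-integer} \end{cases}$$
For Grouped Data:
$$\text{Any percentile} = L_b + \frac{np - {<}cf_i}{f_i}\,(i) \qquad \text{OR} \qquad \text{Any percentile} = U_b - \frac{n(1-p) - {>}cf_i}{f_i}\,(i)$$
where:
Lb = lower class boundary of the interpolated interval
Ub = upper class boundary of the interpolated interval
<cfi = less than cumulative frequency of the class before the interpolated interval
>cfi = greater than cumulative frequency of the class before the interpolated interval (counting down from the highest class)
fi = frequency of the interpolated interval
i = class size
n = number of data points
p = the percentile expressed as a proportion (for example, 0.25 for the 25th percentile)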
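Both percentile computations are sketched below. For ungrouped data the code averages two neighbors when np is an integer and otherwise takes the value at the next higher position; the second rule is an assumption here, since that part of the formula did not survive in these notes. The function names are illustrative, and the percentile is passed as a whole number between 1 and 99 so that the integer test is exact.

```python
def percentile_ungrouped(data, percent):
    """Percentile of raw data; percent is an integer with 0 < percent < 100."""
    ordered = sorted(data)
    n = len(ordered)
    position, remainder = divmod(n * percent, 100)    # np and whether it is whole
    if remainder == 0:
        # np is an integer: average the np-th and (np+1)-th ordered values
        return (ordered[position - 1] + ordered[position]) / 2
    # np is not an integer: take the value at the next higher position
    return ordered[position]

def percentile_grouped(lower_limits, frequencies, class_size, percent):
    """Interpolated percentile from a frequency table with integer class limits."""
    n = sum(frequencies)
    target = n * percent / 100                        # np
    cumulative = 0                                    # less than CF before the class
    for low, f in zip(lower_limits, frequencies):
        if cumulative + f >= target:
            return (low - 0.5) + (target - cumulative) / f * class_size
        cumulative += f
    raise ValueError("percent must be between 0 and 100")

times = [2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, 4.3]
p50_times = percentile_ungrouped(times, 50)

lower_limits = [10, 20, 30, 40, 50, 60, 70, 80, 90]
frequencies = [3, 2, 3, 4, 5, 11, 14, 14, 4]
p50_exam = percentile_grouped(lower_limits, frequencies, 10, 50)

print(p50_times, round(p50_exam, 2))
```

The 50th percentile agrees with the median in both earlier examples, which is a useful consistency check on the formulas.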