
Math 103
Statistics and Probability
Frequency Distributions
CJD
Raw Data
60 74 74 58 72
58 82 52 26 72
66 66 60 92 78
46 38 50 66 50
62 64 68 62 84
54 66 66 44 60
84 70 76 72 66
70 64 52 40 78
76 42 50 64 48
64 40 82 54 74
Raw Data: Test Scores in a Statistics Test
Minimum=26 Maximum=92
Summarize using a frequency distribution with 8 classes.
class width = ceiling((92-26)/8) = 9
Summary of the raw data: mean $\bar{x} = 62.7$, median $\tilde{x} = 64$, standard deviation $s = 13.7$
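The raw-data summary can be checked with Python's standard `statistics` module. This is a minimal sketch and the variable names are only illustrative. Note that `statistics.pstdev` (divisor N) reproduces the 13.7 quoted here, while `statistics.stdev` (divisor n - 1) gives about 13.8.

```python
import math
import statistics

# Test scores from the Raw Data table (50 observations).
scores = [
    60, 74, 74, 58, 72,
    58, 82, 52, 26, 72,
    66, 66, 60, 92, 78,
    46, 38, 50, 66, 50,
    62, 64, 68, 62, 84,
    54, 66, 66, 44, 60,
    84, 70, 76, 72, 66,
    70, 64, 52, 40, 78,
    76, 42, 50, 64, 48,
    64, 40, 82, 54, 74,
]

n_classes = 8
lo, hi = min(scores), max(scores)
width = math.ceil((hi - lo) / n_classes)  # ceiling(66 / 8) = ceiling(8.25) = 9

mean = statistics.mean(scores)      # 62.72
median = statistics.median(scores)  # average of the 25th and 26th sorted values
sd_pop = statistics.pstdev(scores)  # divisor N: about 13.69
sd_samp = statistics.stdev(scores)  # divisor n - 1: about 13.83

print(lo, hi, width)
print(round(mean, 1), median, round(sd_pop, 1), round(sd_samp, 1))
```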

Constructing a Frequency Table
1. Decide on the number of classes.
2. Determine the class width:
a. Divide the range by the number of classes, keeping one decimal more than the given data.
b. The class width is the next higher number with the same accuracy as the given data.
3. The first lower class limit = lowest score - excess/2
(excess = class width * number of classes - range; for the test scores, excess = 9*8 - 66 = 6, so the first lower class limit is 26 - 3 = 23).
4. Add the class width to the previous lower class limit to get the next lower class limit.
5. The upper class limit = lower class limit + class width - accuracy of the given data (for the first class, 23 + 9 - 1 = 31).
6. Tally the data in the appropriate class interval.
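A minimal Python sketch of steps 2 to 6, assuming whole-number data (accuracy 1) and using the ceiling rule from the class-width calculation for the test scores. The function names `class_limits` and `tally` are hypothetical, and rounding an odd excess down is an assumption the steps do not cover.

```python
import math

# Raw test scores (50 observations).
scores = [
    60, 74, 74, 58, 72, 58, 82, 52, 26, 72,
    66, 66, 60, 92, 78, 46, 38, 50, 66, 50,
    62, 64, 68, 62, 84, 54, 66, 66, 44, 60,
    84, 70, 76, 72, 66, 70, 64, 52, 40, 78,
    76, 42, 50, 64, 48, 64, 40, 82, 54, 74,
]

def class_limits(data, n_classes):
    """Steps 2-5 for whole-number data (accuracy = 1)."""
    rng = max(data) - min(data)
    width = math.ceil(rng / n_classes)      # step 2: ceiling(66 / 8) = 9
    excess = width * n_classes - rng        # 9*8 - 66 = 6
    first_lower = min(data) - excess // 2   # step 3: 26 - 3 = 23 (odd excess rounded down: assumption)
    limits = []
    for i in range(n_classes):
        lower = first_lower + i * width     # step 4: add the width each time
        upper = lower + width - 1           # step 5: + width - accuracy
        limits.append((lower, upper))
    return limits

def tally(data, limits):
    """Step 6: count the observations falling in each class interval."""
    return [sum(lower <= x <= upper for x in data) for lower, upper in limits]

limits = class_limits(scores, 8)
freqs = tally(scores, limits)
for (lower, upper), f in zip(limits, freqs):
    print(f"{lower}-{upper}: {f}")
```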
Frequency Distribution
Frequency Table: group the data into classes and show, in a table, the number of observations in each class.
Class      Lower Class   Class      Class       Relative
Interval   Boundary      Midpoint   Frequency   Frequency
14-22      13.5          18          0          0.00
23-31      22.5          27          1          0.02
32-40      31.5          36          3          0.06
41-49      40.5          45          4          0.08
50-58      49.5          54          9          0.18
59-67      58.5          63         15          0.30
68-76      67.5          72         11          0.22
77-85      76.5          81          6          0.12
86-94      85.5          90          1          0.02
95-103     94.5          99          0          0.00
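The boundary, midpoint and relative-frequency columns follow mechanically from the class limits and frequencies. A sketch, assuming whole-number scores so that each boundary sits half a unit outside the limit; it covers the eight non-empty classes only, and the list names are illustrative.

```python
limits = [(23, 31), (32, 40), (41, 49), (50, 58),
          (59, 67), (68, 76), (77, 85), (86, 94)]
freqs = [1, 3, 4, 9, 15, 11, 6, 1]
accuracy = 1  # scores are whole numbers

n = sum(freqs)                                          # 50
lower_bounds = [lo - accuracy / 2 for lo, _ in limits]  # 22.5, 31.5, ...
upper_bounds = [hi + accuracy / 2 for _, hi in limits]  # 31.5, 40.5, ...
midpoints = [(lo + hi) / 2 for lo, hi in limits]        # class marks 27.0, 36.0, ...
rel_freqs = [f / n for f in freqs]                      # relative frequency distribution
# Class width = upper class boundary - lower class boundary (same for every class).
widths = {u - l for l, u in zip(lower_bounds, upper_bounds)}

for (lo, hi), lb, m, f, r in zip(limits, lower_bounds, midpoints, freqs, rel_freqs):
    print(f"{lo}-{hi}  {lb:5.1f}  {m:5.1f}  {f:3d}  {r:.2f}  {100 * r:.0f}%")
```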
Key Terms
Classes and Number of Classes
Upper and Lower Class Limits
Upper and Lower Class Boundaries
Class Mark or Class Midpoint
Class Width = Upper Class Boundary - Lower Class Boundary
Class Frequency
Total Frequency
Relative Frequency Distribution
Percentage Distribution
Mean from a Frequency Table
Use the class midpoints of the classes for the variable x.
x = class midpoint
f = frequency
$\sum f = n$
$\bar{x} = \dfrac{\sum (f \cdot x)}{\sum f}$
Class        Frequency   Midpoint *
Midpoint x   f           Frequency
27            1             27
36            3            108
45            4            180
54            9            486
63           15            945
72           11            792
81            6            486
90            1             90
Sum          50          3,114
Mean = 3,114 / 50 = 62.3
Weighted Mean
$\bar{x} = \dfrac{\sum (w \cdot x)}{\sum w}$
Each individual value x may have a weight w associated with it.
In computing the mean from a frequency table, the x's are the class midpoints and the w's are the frequencies.
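Since the grouped mean is a weighted mean with the class midpoints as the x's and the frequencies as the w's, one small function covers both slides. `weighted_mean` is an illustrative name, not a library function.

```python
def weighted_mean(values, weights):
    """x-bar = sum(w * x) / sum(w)."""
    total_w = sum(weights)
    if total_w == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_w

# Class midpoints (x) and frequencies (w) from the test-score table.
midpoints = [27, 36, 45, 54, 63, 72, 81, 90]
freqs = [1, 3, 4, 9, 15, 11, 6, 1]

sum_fx = sum(f * x for x, f in zip(midpoints, freqs))  # 3,114
grouped_mean = weighted_mean(midpoints, freqs)         # 3114 / 50
print(sum_fx, sum(freqs), round(grouped_mean, 1))
```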
Quantiles from a Frequency Table
Assume the observations are uniformly distributed within each class.
$P_{94}$: 94/100*50 = 47 observations
$P_{94} = 76.5 + 9(47 - 43)/6 = 76.5 + 6 = 82.5$
Lower Class   Upper Class   Frequency   Cumulative
Boundary      Boundary                  Frequency
22.5          31.5           1           1
31.5          40.5           3           4
40.5          49.5           4           8
49.5          58.5           9          17
58.5          67.5          15          32
67.5          76.5          11          43
76.5          85.5           6          49
85.5          94.5           1          50
$D_4$: 4/10*50 = 20 observations
$D_4 = 58.5 + 9(20 - 17)/15 = 58.5 + 1.8 = 60.3$
$Q_3$: 3/4*50 = 37.5 observations
$Q_3 = 67.5 + 9(37.5 - 32)/11 = 67.5 + 4.5 = 72.0$
$Q_2$: 2/4*50 = 25 observations
$Q_2 = 58.5 + 9(25 - 17)/15 = 58.5 + 4.8 = 63.3$
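These calculations all follow one pattern: find the first class whose cumulative frequency reaches the target position, then interpolate linearly inside it, Quantile = L + w*(position - cumulative frequency below the class)/f, where L is that class's lower boundary, w the class width and f its frequency. A sketch under the same uniform-within-class assumption; `grouped_quantile` is a hypothetical helper, and the quantile is passed as a ratio k/m (94/100, 4/10, ...) to mirror the slides.

```python
from itertools import accumulate

lower_bounds = [22.5, 31.5, 40.5, 49.5, 58.5, 67.5, 76.5, 85.5]
freqs = [1, 3, 4, 9, 15, 11, 6, 1]
width = 9

def grouped_quantile(k, m):
    """Value with a fraction k/m of the observations below it, assuming the
    observations are spread uniformly within each class."""
    n = sum(freqs)
    pos = k * n / m                              # e.g. 94/100 * 50 = 47
    cum = list(accumulate(freqs))                # 1, 4, 8, 17, 32, 43, 49, 50
    for i, c in enumerate(cum):
        if c >= pos:                             # first class reaching the position
            below = cum[i - 1] if i > 0 else 0   # cumulative frequency below the class
            return lower_bounds[i] + width * (pos - below) / freqs[i]
    raise ValueError("k/m must be between 0 and 1")

for name, k, m in [("P94", 94, 100), ("D4", 4, 10), ("Q3", 3, 4), ("Q2", 2, 4)]:
    print(name, round(grouped_quantile(k, m), 1))
```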
Standard Deviation from Freq Table
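Sample: $s = \sqrt{\dfrac{n\sum_{i=1}^{k} f_i x_i^2 - \left(\sum_{i=1}^{k} f_i x_i\right)^2}{n(n-1)}}$
Use the class midpoints of the classes for the variable $x_i$.
$i$ indexes the classes from 1 to $k$.
$n$ is the sum of the frequencies $f_i$ of each class.
Population: $\sigma = \dfrac{\sqrt{N\sum_{i=1}^{k} f_i x_i^2 - \left(\sum_{i=1}^{k} f_i x_i\right)^2}}{N}$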
Example
Class        Freq   f*x     x^2     f*(x^2)
Midpoint x   f
27            1      27      729        729
36            3     108    1,296      3,888
45            4     180    2,025      8,100
54            9     486    2,916     26,244
63           15     945    3,969     59,535
72           11     792    5,184     57,024
81            6     486    6,561     39,366
90            1      90    8,100      8,100
Sum          50   3,114             202,986

$\sigma = \dfrac{\sqrt{50(202{,}986) - (3{,}114)^2}}{50} = 13.45071 \approx 13.5$
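A sketch that rebuilds the f*x and f*(x^2) columns and evaluates both formulas; the population version (divisor N) is the one that gives 13.45071, while the sample version comes out slightly larger. Variable names are illustrative.

```python
import math

midpoints = [27, 36, 45, 54, 63, 72, 81, 90]
freqs = [1, 3, 4, 9, 15, 11, 6, 1]

n = sum(freqs)                                               # 50
sum_fx = sum(f * x for x, f in zip(midpoints, freqs))        # 3,114
sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))   # 202,986

numerator = n * sum_fx2 - sum_fx ** 2
sigma = math.sqrt(numerator) / n               # population formula (divisor N)
s = math.sqrt(numerator / (n * (n - 1)))       # sample formula (divisor n(n - 1))

print(n, sum_fx, sum_fx2)
print(round(sigma, 5), round(s, 2))
```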
Frequency Bar Chart
[Figure: Frequency Bar Chart. Horizontal axis: Test Scores, by class limits 23-31 through 86-94; vertical axis: Frequency (0 to 16).]
Label with Class Limits
Show gaps between bars
Frequency Histogram
[Figure: Frequency Histogram. Horizontal axis: Test Scores, by class boundaries 22.5 through 94.5; vertical axis: Frequency (0 to 16).]
Label with Class Boundaries
No gaps between bars
Frequency Polygon
[Figure: Frequency Polygon. Horizontal axis: Test Scores, by class marks 18 through 99; vertical axis: Frequency (0 to 16).]
Label with Class Marks
Add an extra (empty) lower class and an extra higher class (class marks 18 and 99), so the polygon starts and ends at zero frequency.
Cumulative Distributions
Class            Cumulative   Cumulative
Boundaries       Frequency    Percent
less than 22.5    0             0%
less than 31.5    1             2%
less than 40.5    4             8%
less than 49.5    8            16%
less than 58.5   17            34%
less than 67.5   32            64%
less than 76.5   43            86%
less than 85.5   49            98%
less than 94.5   50           100%
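The cumulative columns are running sums of the class frequencies. A minimal sketch using `itertools.accumulate`; the list names are illustrative.

```python
from itertools import accumulate

boundaries = [22.5, 31.5, 40.5, 49.5, 58.5, 67.5, 76.5, 85.5, 94.5]
freqs = [1, 3, 4, 9, 15, 11, 6, 1]

# Number of observations less than each class boundary (0 below the first class).
cum_freq = [0] + list(accumulate(freqs))
n = cum_freq[-1]
cum_pct = [100 * c / n for c in cum_freq]

for b, c, pct in zip(boundaries, cum_freq, cum_pct):
    print(f"less than {b}: {c:2d}  {pct:5.1f}%")
```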
Frequency Ogive
Label with Class Boundaries
The cumulative frequency plotted at a class boundary is the number of observations less than or equal to that boundary.
[Figure: Frequency Ogive. Horizontal axis: Test Scores, by class boundaries 22.5 through 94.5; vertical axis: Cumulative Frequency (0 to 60).]
Qualitative Data
Maker        Model    Sales (MM)
Toyota       Vios     602
Toyota       Prius    128
Toyota       Altis    548
Toyota       Innova   375
Honda        Civic    439
Honda        City     567
Honda        CRV      380
Jeep         Sarao    423
Mitsubishi   Lancer   314
Mitsubishi   Pajero   256
Classes: by Model or by Maker
Pie Charts
[Figure: pie chart, Vehicle Sales by Maker. Toyota 41%, Honda 34%, Mitsubishi 14%, Jeep 10%.]
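The pie-chart percentages come from totalling sales by maker and dividing by the overall total. A small sketch using the table's figures; the (maker, model, sales) tuple layout is just an assumption of this example.

```python
from collections import defaultdict

sales = [  # (maker, model, sales in MM) from the qualitative-data table
    ("Toyota", "Vios", 602), ("Toyota", "Prius", 128),
    ("Toyota", "Altis", 548), ("Toyota", "Innova", 375),
    ("Honda", "Civic", 439), ("Honda", "City", 567), ("Honda", "CRV", 380),
    ("Jeep", "Sarao", 423),
    ("Mitsubishi", "Lancer", 314), ("Mitsubishi", "Pajero", 256),
]

# Classes by maker: sum the sales of each maker's models.
by_maker = defaultdict(int)
for maker, _model, amount in sales:
    by_maker[maker] += amount

total = sum(by_maker.values())
shares = {maker: 100 * amt / total for maker, amt in by_maker.items()}
for maker, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{maker:<10} {by_maker[maker]:5d}  {pct:4.1f}%")
```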
Scatter Diagram
[Figure: scatter diagram of TAR (vertical axis, 0 to 20) against NICOTINE (horizontal axis, 0.0 to 1.5).]
Bell Shaped Distributions
[Figure: Bell Shaped (Normal) Distribution, with Mode = Mean = Median at the center.]
Symmetry and Skewness
[Figure: three distribution shapes. Skewed left (negatively): the mean lies to the left of the median. Symmetric: mean = median. Skewed right (positively): the mean lies to the right of the median.]
Measure of Skewness
$SK = \dfrac{3(\bar{x} - \tilde{x})}{s}$ (sample)
$SK = \dfrac{3(\mu - \tilde{\mu})}{\sigma}$ (population)
Pearsonian Coefficient of Skewness
SK is positive if the distribution is positively skewed and negative if it is negatively skewed.
In general, SK ranges from -3 to 3.
Test Scores   grouped                          ungrouped
mean          62.3                             62.7
median        63.3                             64.0
std dev       13.45                            13.7
SK            3*(62.3 - 63.3) / 13.5 = -0.22   3*(62.7 - 64) / 13.7 = -0.28
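The Pearsonian coefficient for both columns, as a one-line function (`pearson_skewness` is an illustrative name). Both results are negative, matching a slight left skew.

```python
def pearson_skewness(mean, median, sd):
    """Pearsonian coefficient of skewness: SK = 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd

grouped = pearson_skewness(62.3, 63.3, 13.5)    # grouped-data column
ungrouped = pearson_skewness(62.7, 64.0, 13.7)  # raw-data column
print(round(grouped, 2), round(ungrouped, 2))
```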
Empirical Rule
[Figure: Empirical Rule for a bell-shaped distribution.] Note: In this figure, S = SD.
Empirical Rule Example
92 74 66 60 50
84 74 66 60 50
84 72 66 60 48
82 72 66 58 46
82 72 64 58 44
78 70 64 54 42
78 70 64 54 40
76 68 64 52 40
76 66 62 52 38
74 66 62 50 26
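In our example of test scores, $\bar{x} = 62.7$ and $s = 13.7$:
$\bar{x} - s = 49.0$ to $\bar{x} + s = 76.4$ (empirical rule: about 68%)
$\bar{x} - 2s = 35.3$ to $\bar{x} + 2s = 90.1$ (empirical rule: about 95%)
$\bar{x} - 3s = 21.6$ to $\bar{x} + 3s = 103.8$ (empirical rule: about 99.7%)
w/in 1 SD: 35 observations (70%); w/in 2 SD: 48 observations (96%); w/in 3 SD: 50 observations (100%).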
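Counting observations within k standard deviations can be done directly on the raw scores. A sketch using `statistics.pstdev`, which matches the 13.7 used here; the variable names are illustrative.

```python
import statistics

# Raw test scores (50 observations).
scores = [
    60, 74, 74, 58, 72, 58, 82, 52, 26, 72,
    66, 66, 60, 92, 78, 46, 38, 50, 66, 50,
    62, 64, 68, 62, 84, 54, 66, 66, 44, 60,
    84, 70, 76, 72, 66, 70, 64, 52, 40, 78,
    76, 42, 50, 64, 48, 64, 40, 82, 54, 74,
]

mean = statistics.mean(scores)   # 62.72
sd = statistics.pstdev(scores)   # about 13.69 (divisor N)

counts = []
for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    inside = sum(low <= x <= high for x in scores)
    counts.append(inside)
    print(f"within {k} SD ({low:.1f} to {high:.1f}): "
          f"{inside} observations ({100 * inside / len(scores):.0f}%)")
```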
Chebyshev's Theorem
Applies to distributions of ANY shape.
The fraction of any set of data lying within k standard deviations of the mean is always at least
$1 - \dfrac{1}{k^2}$
of all data, where k is any number > 1.
At least 3/4 (75%) of all values lie within 2 standard deviations of the mean.
At least 8/9 (89%) of all values lie within 3 standard deviations of the mean.
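Chebyshev's bound as a function of k, using `fractions.Fraction` so that 3/4 and 8/9 come out exactly. A sketch; `chebyshev_bound` is a hypothetical name.

```python
from fractions import Fraction

def chebyshev_bound(k):
    """Minimum fraction of ANY data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / Fraction(k) ** 2

for k in (2, 3):
    bound = chebyshev_bound(k)
    print(f"k = {k}: at least {bound} ({float(bound):.0%})")
```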
Chebyshev Example
92 74 66 60 50
84 74 66 60 50
84 72 66 60 48
82 72 66 58 46
82 72 64 58 44
78 70 64 54 42
78 70 64 54 40
76 68 64 52 40
76 66 62 52 38
74 66 62 50 26
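$\bar{x} = 62.7$, $s = 13.7$
$\bar{x} - s = 49.0$ to $\bar{x} + s = 76.4$
$\bar{x} - 2s = 35.3$ to $\bar{x} + 2s = 90.1$ (Chebyshev: at least 75%)
$\bar{x} - 3s = 21.6$ to $\bar{x} + 3s = 103.8$ (Chebyshev: at least 89%)
w/in 1 SD: 35 observations; w/in 2 SD: 48 observations; w/in 3 SD: 50 observations.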
Chebyshev's Theorem is conservative: the observed 96% (48 of 50) within 2 SD and 100% within 3 SD exceed the guaranteed 75% and 89%.
But it applies to all distributions, regardless of shape.
Exercise
Question: If the mean height of buildings in a city is 16.5
meters with a standard deviation of 4.2 meters, what
interval of building heights would contain at least 80% of
the population?
Answer:
By Chebyshev's theorem,
$1 - 1/k^2 = 0.8$, so $1/k^2 = 0.2$, so $k^2 = 5$, so $k = 2.24$
So at least 80% would lie within 2.24 SDs of the mean:
$16.5 \pm 2.24(4.2) = 16.5 \pm 9.4$
So the building heights h would be in the interval
$7.1 \text{ m} \le h \le 25.9 \text{ m}$
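The exercise inverts the bound: solve 1 - 1/k^2 = p for k, then step k standard deviations either side of the mean. A sketch; `chebyshev_interval` is a hypothetical helper. It keeps the unrounded k = sqrt(5), which gives the same interval to one decimal.

```python
import math

def chebyshev_interval(mean, sd, proportion):
    """Interval that Chebyshev's theorem guarantees holds at least `proportion` of values."""
    if not 0 < proportion < 1:
        raise ValueError("proportion must be strictly between 0 and 1")
    k = 1 / math.sqrt(1 - proportion)   # solve 1 - 1/k^2 = proportion
    return k, mean - k * sd, mean + k * sd

k, low, high = chebyshev_interval(16.5, 4.2, 0.8)
print(f"k = {k:.2f}; heights from {low:.1f} m to {high:.1f} m")
```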
End
