STATISTICS
BFC 34303
Chapter 1 :
Review on Descriptive Statistics
INTRODUCTION
These are Mathematics marks for 30
students who are taking Test 1
WHAT IS A STATISTIC?
~ A statistic is a numerical measurement describing some characteristic of a sample
~ E.g. the sample mean, the sample variance
WHAT IS A VARIABLE?
~ Any measured characteristic or attribute that differs for different elements
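MEAN
For n ungrouped observations:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1 + 4 + 2 + \dots + 3 + 1 + 2}{20} = \frac{41}{20} = 2.05$$
OR, from the frequency table
x | 0 | 1 | 2 | 3 | 4 | 5
f | 2 | 5 | 7 | 3 | 2 | 1
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{2(0) + 5(1) + 7(2) + 3(3) + 2(4) + 1(5)}{20} = \frac{41}{20} = 2.05$$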
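A minimal Python sketch of the two mean formulas above, assuming the frequency table as given. The function names `mean_raw` and `mean_from_frequencies` are illustrative, not from the notes; expanding the table back into its 20 raw values shows that both formulas give the same answer.

```python
def mean_raw(xs):
    """Sample mean of ungrouped data: sum(x_i) / n."""
    if not xs:
        raise ValueError("need at least one observation")
    return sum(xs) / len(xs)


def mean_from_frequencies(values, freqs):
    """Mean of a frequency table: sum(f_i * x_i) / sum(f_i)."""
    if len(values) != len(freqs) or sum(freqs) == 0:
        raise ValueError("values/freqs must match and total frequency must be > 0")
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


x = [0, 1, 2, 3, 4, 5]
f = [2, 5, 7, 3, 2, 1]
# Expand the table back into raw observations: value v repeated f times.
raw = [v for v, c in zip(x, f) for _ in range(c)]
print(mean_from_frequencies(x, f), mean_raw(raw))  # 2.05 2.05
```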
MEDIAN
The median is the middle value of a set of data that is arranged in
order of magnitude.
Let x_(k) be the k-th observation in a set of data which has been
arranged in ascending or descending order.
For example, consider the following set of numbers
9 2 7 10 5 16
After arrangement, it becomes
2 5 7 9 10 16
The median lies between x_(3) = 7 and x_(4) = 9, so the median is
(7 + 9)/2 = 8.
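The median of a set of data x_1, x_2, ..., x_n is denoted by x_(m), and
x_(m) may be calculated as:
$$x_{(m)} = \begin{cases} x_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd} \\ \frac{1}{2}\left[x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}\right], & \text{if } n \text{ is even} \end{cases}$$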
Example :
Find the median for the following sets of data
a) 21, 24, 17, 28, 36, 20, 32
b) 3.56, 2.7, 5.48, 8.61, 4.35, 6.22
Solution:
a) The data arranged in ascending order:
17, 20, 21, 24, 28, 32, 36
Since n = 7, which is odd, the median is
$$x_{(m)} = x_{\left(\frac{7+1}{2}\right)} = x_{(4)} = 24$$
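b) The data arranged in ascending order:
2.7, 3.56, 4.35, 5.48, 6.22, 8.61
Since n = 6, which is even, the median is
$$x_{(m)} = \frac{1}{2}\left[x_{\left(\frac{6}{2}\right)} + x_{\left(\frac{6}{2}+1\right)}\right] = \frac{1}{2}\left[x_{(3)} + x_{(4)}\right] = \frac{1}{2}(4.35 + 5.48) = 4.915$$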
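The odd/even median rule can be sketched in Python as follows. The function name `median` is illustrative; the only subtlety is that the formula counts positions from 1 while Python lists are indexed from 0.

```python
def median(data):
    """Median via the odd/even rule; formula positions are 1-based, Python is 0-based."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]           # x_((n+1)/2)
    return (xs[n // 2 - 1] + xs[n // 2]) / 2  # (x_(n/2) + x_(n/2 + 1)) / 2


print(median([9, 2, 7, 10, 5, 16]))                  # 8.0
print(median([21, 24, 17, 28, 36, 20, 32]))          # 24
print(median([3.56, 2.7, 5.48, 8.61, 4.35, 6.22]))   # about 4.915 (floating point)
```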
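MODE
The mode is the value that occurs most frequently in a set of data.
Example: Find the mode of the following sets of data: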
a) 2, 3, 3, 4, 5, 28, 5, 5
b) 2, 3, 5, 8, 10
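A small Python sketch for finding the mode of ungrouped data. The name `modes` is illustrative, and the convention that a data set in which every value occurs once has no mode (returned as an empty list) is an assumption of this sketch; ties return every most-frequent value.

```python
from collections import Counter


def modes(data):
    """Return all values that occur most often.

    Assumed convention: if every value occurs exactly once there is no mode,
    so an empty list is returned.
    """
    if not data:
        raise ValueError("mode of empty data")
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 3, 3, 4, 5, 28, 5, 5]))  # [5]
print(modes([2, 3, 5, 8, 10]))           # []
```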
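QUARTILES
Quartiles divide a set of data which are arranged in ascending order into 4 equal parts.
To find quartile (Qk):
$$r = \frac{k}{4}\, n$$
where: n = number of observations
k = 1, 2, 3 for Q1, Q2, Q3
(i) If r is an integer:
$$Q_k = \frac{1}{2}\left[r\text{-th observation} + (r+1)\text{-th observation}\right]$$
(ii) If r is not an integer, then round up to the next integer; Qk is the observation in that position.
Q2 is also called the median.
Interquartile Range = Q3 - Q1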
PERCENTILES
Percentiles divide a set of data which are arranged in
ascending order into 100 equal parts.
To find percentile (Pk):
$$r = \frac{k}{100}\, n$$
where: n = number of observations
k = percentile for Pk
(i) If r is an integer:
$$P_k = \frac{1}{2}\left[r\text{-th observation} + (r+1)\text{-th observation}\right]$$
(ii) If r is not an integer, then round up to the next integer; Pk is the observation in that position.
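The position rule above can be sketched in Python. The names `percentile_by_rule` and `quartile_by_rule` are hypothetical; k is assumed to be a whole number with 0 < k < 100, and r is kept as the integer k·n so that the "is r an integer?" test is exact. This reproduces the notes' rule, which differs from the interpolating defaults of many software packages.

```python
def percentile_by_rule(data, k):
    """P_k using r = (k/100) n.

    If r is an integer, average the r-th and (r+1)-th ordered observations;
    otherwise round r up and take that observation. k: integer, 0 < k < 100.
    """
    if not 0 < k < 100:
        raise ValueError("k must lie strictly between 0 and 100")
    xs = sorted(data)
    if not xs:
        raise ValueError("empty data")
    kn = k * len(xs)            # r = kn / 100, kept as an integer to avoid rounding error
    if kn % 100 == 0:           # r is an integer
        r = kn // 100
        return (xs[r - 1] + xs[r]) / 2
    r = (kn + 99) // 100        # round r up to the next integer
    return xs[r - 1]


def quartile_by_rule(data, k):
    """Q_k for k = 1, 2, 3: r = (k/4) n, which is the same as P_(25k)."""
    return percentile_by_rule(data, 25 * k)


ordered = [17, 20, 21, 24, 28, 32, 36]
print(quartile_by_rule(ordered, 3), percentile_by_rule(ordered, 40))  # 32 21
```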
Example:
Using the ordered data 17, 20, 21, 24, 28, 32, 36 (n = 7):
Third quartile, Q3
k = 3, r = (3/4)(7) = 5.25 (not an integer)
Q3 = 6th observation = 32
40th percentile, P40
k = 40, r = (40/100)(7) = 2.8 (not an integer)
P40 = 3rd observation = 21
Variability
The goal for variability is to obtain a measure
of how spread out the scores are in a
distribution.
A measure of variability usually accompanies a
measure of central tendency as basic
descriptive statistics for a set of scores.
MEASURES OF DISPERSION
REMARK
Range is not a good measure of dispersion because it is influenced by
extreme values and its calculation does not use all the observations.
Variance and standard deviation are the most useful and widely used
measures of dispersion. Although they are influenced by extreme
values, their calculations use all the observations.
REMARK
Standard deviation measures how spread out the values in a data set are.
If the data points are all close to the mean, then the standard deviation is
close to zero.
If many data points are far from the mean, then the standard deviation is
far from zero.
If all the data values are equal, then the standard deviation is zero.
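VARIANCE
Mean:
$$\bar{X} = \frac{\sum x_i}{n} \quad \text{(ungrouped data)}, \qquad \bar{X} = \frac{\sum f_i x_i}{\sum f_i} \quad \text{(frequency data)}$$
Variance:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}, \quad \text{for } i = 1, 2, \dots, n$$
Commonly used formulae:
$$S^2 = \frac{\sum x_i^2 - n\bar{X}^2}{n - 1} = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}$$
$$S^2 = \frac{\sum f_i x_i^2 - n\bar{X}^2}{n - 1} = \frac{\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{n}}{n - 1}, \quad n = \sum f_i$$
STANDARD DEVIATION
$$S = \sqrt{\text{VARIANCE}} = \sqrt{S^2}$$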
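A short Python sketch comparing the definitional variance formula with the shortcut form, applied to the two data sets in the next example. The function names are illustrative; both divide by n - 1 (sample variance), and the printed values should agree with the hand calculations (S² ≈ 24.5714 and ≈ 182.265).

```python
import math


def sample_variance(data):
    """Definition: S^2 = sum((x_i - mean)^2) / (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def sample_variance_shortcut(data):
    """Computational form: (sum(x_i^2) - (sum(x_i))^2 / n) / (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    s = sum(data)
    s2 = sum(x * x for x in data)
    return (s2 - s * s / n) / (n - 1)


set1 = [16, 10, 9, 2, 5, 2, 7]
set2 = [10, 32, 8, 12, 14, 36, 20, 8, 40, 4, 32, 1]
for name, d in (("Set 1", set1), ("Set 2", set2)):
    v = sample_variance(d)
    print(f"{name}: S^2 = {v:.4f}, S = {math.sqrt(v):.3f}")
```

The shortcut form needs only Σx and Σx², which is why it is convenient by hand; for very large or nearly constant data the definitional form is numerically safer.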
Example:
Calculate the variance and standard deviation for the
following sets of sample data. Hence, determine which data
set is more dispersed about the mean.
Set 1: 16, 10, 9, 2, 5, 2, 7
Set 2: 10, 32, 8, 12, 14, 36, 20, 8, 40, 4, 32, 1
For Set 1: 16, 10, 9, 2, 5, 2, 7
x | x²
2 | 4
2 | 4
5 | 25
7 | 49
9 | 81
10 | 100
16 | 256
Σx_i = 51, Σx_i² = 519, n = 7
$$S^2 = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1} = \frac{519 - \frac{51^2}{7}}{6} = 24.5714$$
$$S = \sqrt{24.5714} = 4.957$$
For Set 2: 10, 32, 8, 12, 14, 36, 20, 8, 40, 4, 32, 1
Σx_i = 217, Σx_i² = 5929, n = 12
$$S^2 = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1} = \frac{5929 - \frac{217^2}{12}}{11} = 182.265$$
$$S = \sqrt{182.265} = 13.5$$
Hence, Set 2 is more dispersed than Set 1.
STEM-AND-LEAF DIAGRAMS
Used to display every data value in a data set.
The digit(s) in the greatest place value(s) of the data
values (i.e. all digits except the last) are the stems.
The digits in the next greatest place value (usually the last
digit of the value) are the leaves.
To construct a stem-and-leaf diagram:
1. Place the stems in order vertically from smallest to
largest.
2. Place the leaves in order in each row from smallest
to largest.
3. Create a key for the stem-and-leaf diagram so that
people know how to interpret the diagram.
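The three construction steps above can be sketched in Python for whole-number data, assuming stem = all digits except the last (x // 10) and leaf = last digit (x % 10). The function names are illustrative; empty stems are kept so that gaps in the data stay visible, as in the marks example below.

```python
def stem_and_leaf(data):
    """Map each stem (x // 10) to its sorted leaves (x % 10).

    Assumes non-negative whole numbers; stems with no leaves between the
    smallest and largest stem are kept as empty rows.
    """
    if not data:
        raise ValueError("empty data")
    rows = {}
    for x in sorted(data):                        # steps 1-2: ordered stems and leaves
        rows.setdefault(x // 10, []).append(x % 10)
    return {s: rows.get(s, []) for s in range(min(rows), max(rows) + 1)}


def print_stem_and_leaf(data):
    for stem, leaves in stem_and_leaf(data).items():
        print(f"{stem} | {' '.join(map(str, leaves))}".rstrip())
    top = max(data)                               # step 3: a key, using the largest value
    print(f"Key: {top // 10} | {top % 10} means {top}")


print_stem_and_leaf([73, 42, 67, 78, 99, 84, 91, 82, 86, 94])
```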
Shape of distribution
A perfectly symmetric curve is one in which both sides of
the distribution would exactly match the other if the figure
were folded over its central point.
An example is shown below:
The distribution shows that most data are clustered at the right.
The left tail extends farther from the data centre than the right
tail. Therefore, the distribution is skewed to the left or
negatively skewed.
Example :
Marks of a recent Mathematics test are as given below:
73, 42, 67, 78, 99, 84, 91, 82, 86, 94
Based on the marks given:
(a) Construct a stem-and-leaf diagram.
(b) What are the highest and lowest marks?
(c) Interpret the distribution.
Solution:
(a) Mathematics Test Marks
Stem | Leaf
4 | 2
5 |
6 | 7
7 | 3 8
8 | 2 4 6
9 | 1 4 9
Key: 9 | 9 means 99 marks
(b) Highest mark = 99, Lowest mark = 42
(c) Negatively skewed
BOX-AND-WHISKER PLOTS
[Figure: horizontal and vertical box-and-whisker plots marking min, Q1, Q2, Q3 and max]
To construct a box-and-whisker plot:
[Figure: box-and-whisker plots drawn against a scale from 10 to 100, marking min, Q1, Q2, Q3 and max]
The lower and upper inner fences are Q1 - 1.5(Q3 - Q1) and Q3 + 1.5(Q3 - Q1); values outside the inner fences are outliers.
The data lie within the lower and upper inner fences, so the data have no outliers.
SHAPE OF DATA DISTRIBUTION
(SYMMETRY AND SKEWNESS)
[Figure: box-and-whisker plots marking min, Q1, Q2, Q3 and max for distributions of different shapes]
Example :
Data :
40, 32, 61, 52, 65, 68, 41, 61, 70, 66, 57, 55, 45,
51, 62, 69, 31, 50, 72, 66, 41, 54, 65, 79, 66
(a) Display the data in a stem and leaf diagram.
(b) Find the first, second and third quartiles, and the upper and lower inner
fences.
(c) Construct a box-and-whisker plot for the above data.
Solution :
(a) Stem | Leaf
3 | 1 2
4 | 0 1 1 5
5 | 0 1 2 4 5 7
6 | 1 1 2 5 5 6 6 6 8 9
7 | 0 2 9
Key: 5 | 4 means 54
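(b) Number of observations, n = 25, min = 31, max = 79
r = (1/4)(25) = 6.25, Q1 = the 7th observation = 50
r = (2/4)(25) = 12.5, Q2 = the 13th observation = 61
r = (3/4)(25) = 18.75, Q3 = the 19th observation = 66
Interquartile range = Q3 - Q1 = 66 - 50 = 16
Lower inner fence = Q1 - 1.5(16) = 50 - 24 = 26
Upper inner fence = Q3 + 1.5(16) = 66 + 24 = 90
(c) [Figure: box-and-whisker plot on a scale from 10 to 100 with min = 31, Q1 = 50, Q2 = 61, Q3 = 66, max = 79]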
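A Python sketch of the numbers needed for the box-and-whisker plot in this example. It reuses the notes' r = (k/100) n position rule (re-implemented here so the block is self-contained) and assumes inner fences at 1.5 × IQR beyond the quartiles. The names `position_rule` and `box_plot_summary` are hypothetical.

```python
def position_rule(xs, k):
    """P_k from already-sorted xs with the r = (k/100) n rule (integer k, 0 < k < 100)."""
    kn = k * len(xs)
    if kn % 100 == 0:                      # r is an integer: average r-th and (r+1)-th
        r = kn // 100
        return (xs[r - 1] + xs[r]) / 2
    return xs[(kn + 99) // 100 - 1]        # otherwise round r up


def box_plot_summary(data):
    xs = sorted(data)
    q1, q2, q3 = (position_rule(xs, k) for k in (25, 50, 75))
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr           # assumed convention: inner fences at 1.5 * IQR
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in xs if x < lower_fence or x > upper_fence]
    return {"min": xs[0], "Q1": q1, "Q2": q2, "Q3": q3, "max": xs[-1],
            "lower_fence": lower_fence, "upper_fence": upper_fence,
            "outliers": outliers}


data = [40, 32, 61, 52, 65, 68, 41, 61, 70, 66, 57, 55, 45,
        51, 62, 69, 31, 50, 72, 66, 41, 54, 65, 79, 66]
print(box_plot_summary(data))
```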
MEAN of a frequency distribution
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
where Σf_i = total number of frequencies
x_i = class mark
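Example:
Find the mean for the following data
Class | Frequency, f_i
0 ≤ x < 10 | 2
10 ≤ x < 20 | 17
20 ≤ x < 30 | 26
30 ≤ x < 40 | 10
40 ≤ x < 50 | 5
SOLUTION:
Class mark, x_i = (lower limit + upper limit)/2, e.g. (0 + 10)/2 = 5
Class | Class mark, x_i | Frequency, f_i | f_i x_i
0 ≤ x < 10 | 5 | 2 | 10
10 ≤ x < 20 | 15 | 17 | 255
20 ≤ x < 30 | 25 | 26 | 650
30 ≤ x < 40 | 35 | 10 | 350
40 ≤ x < 50 | 45 | 5 | 225
Total | | Σf_i = 60 | Σf_i x_i = 1490
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{1490}{60} = 24.83$$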
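A minimal Python sketch of the grouped-data mean, assuming each class is given by its (lower, upper) limits and the class mark is their midpoint. The name `grouped_mean` is illustrative.

```python
def grouped_mean(classes, freqs):
    """Mean of grouped data: sum(f_i * x_i) / sum(f_i), x_i = class mark."""
    if len(classes) != len(freqs) or sum(freqs) == 0:
        raise ValueError("classes/freqs must match and total frequency must be > 0")
    marks = [(lo + hi) / 2 for lo, hi in classes]   # class mark = midpoint of the class
    return sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)


classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [2, 17, 26, 10, 5]
print(round(grouped_mean(classes, freqs), 2))  # 24.83
```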
MODE of a frequency distribution
$$\text{mode} = L_m + \left(\frac{d_1}{d_1 + d_2}\right) c$$
L_m = lower boundary of the class containing the mode
d_1 = the difference between the frequency of the mode class and the frequency of the class immediately before it
d_2 = the difference between the frequency of the mode class and the frequency of the class immediately after it
c = size of the mode class
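Example:
Find the mode of the frequency distribution given below:
Class | Frequency
15 - 19 | 1
20 - 24 | 4
25 - 29 | 22
30 - 34 | 35
35 - 39 | 20
40 - 44 | 8
SOLUTION:
The mode class is 30 - 34 (highest frequency, 35).
L_m = 29.5, d_1 = 35 - 22 = 13, d_2 = 35 - 20 = 15, c = 5
$$\text{mode} = L_m + \left(\frac{d_1}{d_1 + d_2}\right) c = 29.5 + \left(\frac{13}{13 + 15}\right)(5) = 31.8$$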
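A Python sketch of the grouped mode formula. The name `grouped_mode` is hypothetical, and two behaviours are assumptions of this sketch rather than rules from the notes: a missing neighbouring class (modal class first or last) is treated as frequency 0, and if several classes tie for the highest frequency the first one is used.

```python
def grouped_mode(boundaries, freqs):
    """mode = L_m + d1 / (d1 + d2) * c for the class with the highest frequency.

    boundaries: (lower_boundary, upper_boundary) for each class.
    """
    i = max(range(len(freqs)), key=lambda j: freqs[j])    # first modal class
    before = freqs[i - 1] if i > 0 else 0                 # assumed 0 if no class before
    after = freqs[i + 1] if i + 1 < len(freqs) else 0     # assumed 0 if no class after
    d1, d2 = freqs[i] - before, freqs[i] - after
    if d1 + d2 == 0:
        raise ValueError("d1 + d2 = 0: the formula is undefined here")
    lower, upper = boundaries[i]
    return lower + d1 / (d1 + d2) * (upper - lower)


limits = [(15, 19), (20, 24), (25, 29), (30, 34), (35, 39), (40, 44)]
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in limits]  # e.g. 30 - 34 -> 29.5 - 34.5
freqs = [1, 4, 22, 35, 20, 8]
print(round(grouped_mode(boundaries, freqs), 1))          # 31.8
```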
Mode from histogram
[Figure: histogram with frequency axis from 5 to 30, showing the two construction lines below]
1. Draw a line from the upper left corner of the highest vertical bar to the upper left corner of the vertical bar next to it.
2. Draw a line from the upper right corner of the highest vertical bar to the upper right corner of the vertical bar before it.
3. The mode is estimated from the intersection point of both lines.
The histogram should be drawn on graph paper in order to obtain an accurate answer.
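NOTE: The median of a frequency distribution is
$$m = L_m + \left(\frac{\frac{n}{2} - F_L}{f_m}\right) c$$
L_m = lower boundary of the median class
n = total no. of frequency
F_L = cumulative frequency of the class before the median class
f_m = frequency of the median class
c = size of the median class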
Example:
Calculate the median for the following data
Class | Frequency, f
0 ≤ x < 5 | 7
5 ≤ x < 10 | 27
10 ≤ x < 15 | 35
15 ≤ x < 20 | 54
20 ≤ x < 25 | 63
25 ≤ x < 30 | 43
30 ≤ x < 35 | 25
35 ≤ x < 40 | 17
40 ≤ x < 45 | 9
45 ≤ x < 50 | 4
SOLUTION:
Class | Frequency, f | Cumulative frequency
0 ≤ x < 5 | 7 | 7
5 ≤ x < 10 | 27 | 34
10 ≤ x < 15 | 35 | 69
15 ≤ x < 20 | 54 | 123
20 ≤ x < 25 | 63 | 186
25 ≤ x < 30 | 43 | 229
30 ≤ x < 35 | 25 | 254
35 ≤ x < 40 | 17 | 271
40 ≤ x < 45 | 9 | 280
45 ≤ x < 50 | 4 | 284
Σf = 284
n/2 = 284/2 = 142. The median class is 20 ≤ x < 25, the first class whose cumulative frequency (186) reaches 142; its frequency is 63.
Hence, with L_m = 20, F_L = 123, f_m = 63 and c = 5, the median is
$$m = L_m + \left(\frac{\frac{n}{2} - F_L}{f_m}\right) c = 20 + \left(\frac{\frac{1}{2}(284) - 123}{63}\right)(5) = 21.51$$
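A Python sketch of the grouped median: it walks the cumulative frequencies to find the median class, then applies the formula. The name `grouped_median` is illustrative, and classes are assumed to be given by their boundaries (here the classes 0 ≤ x < 5, ... already have boundaries equal to their limits).

```python
def grouped_median(boundaries, freqs):
    """m = L_m + (n/2 - F_L) / f_m * c; the median class is the first class
    whose cumulative frequency reaches n/2."""
    n = sum(freqs)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    half = n / 2
    cum = 0                                   # F_L: cumulative frequency before the class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cum + f >= half:
            return lower + (half - cum) / f * (upper - lower)
        cum += f
    raise ValueError("median class not found")


boundaries = [(5 * i, 5 * i + 5) for i in range(10)]   # 0 <= x < 5, ..., 45 <= x < 50
freqs = [7, 27, 35, 54, 63, 43, 25, 17, 9, 4]
print(round(grouped_median(boundaries, freqs), 2))    # 21.51
```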
Quartile
Quartiles divide a set of data which are
arranged in ascending order into 4 equal
parts
Percentile
Percentiles divide a set of data which are
arranged in ascending order into 100 equal
parts
Decile
Deciles divide a set of data which are
arranged in ascending order into 10 equal
parts
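For grouped data:
$$Q_k = L_k + \left(\frac{\frac{k}{4}n - F_L}{f_k}\right) c_k, \quad k = 1, 2, 3$$
$$P_k = L_k + \left(\frac{\frac{k}{100}n - F_L}{f_k}\right) c_k, \quad k = 1, 2, 3, \dots, 99$$
$$D_k = L_k + \left(\frac{\frac{k}{10}n - F_L}{f_k}\right) c_k, \quad k = 1, 2, 3, \dots, 9$$
L_k = lower boundary of the class where Qk, Pk, Dk lies
n = total number of observations
F_L = cumulative frequency before the class where Qk, Pk, Dk lies
f_k = frequency of the class where Qk, Pk, Dk lies
c_k = class width of the class where Qk, Pk, Dk lies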
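Because the three formulas differ only in the fraction of n (k/4, k/10 or k/100), one Python sketch covers all of them. The name `grouped_quantile` and the parameter `p` (the fraction) are hypothetical; class boundaries are taken as the limits ± 0.5, as in the height example that follows.

```python
def grouped_quantile(boundaries, freqs, p):
    """L_k + (p*n - F_L) / f_k * c_k with p = k/4 (Q_k), k/10 (D_k) or k/100 (P_k)."""
    n = sum(freqs)
    if n <= 0 or not 0 < p < 1:
        raise ValueError("need positive total frequency and 0 < p < 1")
    target = p * n
    cum = 0                                   # F_L: cumulative frequency before the class
    for (lower, upper), f in zip(boundaries, freqs):
        if f > 0 and cum + f >= target:
            return lower + (target - cum) / f * (upper - lower)
        cum += f
    raise ValueError("class not found")


limits = [(3, 5), (6, 8), (9, 11), (12, 14), (15, 17), (18, 20)]
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in limits]   # e.g. 9 - 11 -> 8.5 - 11.5
freqs = [1, 2, 11, 10, 5, 1]
q1 = grouped_quantile(boundaries, freqs, 1 / 4)
q3 = grouped_quantile(boundaries, freqs, 3 / 4)
print(round(q1, 2), round(q3, 2), round(q3 - q1, 2))      # Q1, Q3 and the interquartile range
```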
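Example:
Height (cm) | 3-5 | 6-8 | 9-11 | 12-14 | 15-17 | 18-20
Frequency | 1 | 2 | 11 | 10 | 5 | 1
n = 30
Q1 = P25: (1/4)(30) = 7.5, so Q1 is in the third class with boundaries (8.5-11.5)
Thus, L_k = 8.5, f_k = 11, F_L = 3, c = 3
$$Q_1 = 8.5 + \left(\frac{7.5 - 3}{11}\right)(3) = 9.73$$
Q3 = P75: (3/4)(30) = 22.5, so Q3 is in the fourth class with boundaries (11.5-14.5)
Thus, L_k = 11.5, f_k = 10, F_L = 14, c = 3
$$Q_3 = 11.5 + \left(\frac{22.5 - 14}{10}\right)(3) = 14.05$$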
INTERQUARTILE RANGE
Defined as the difference between the
third quartile and the first quartile
Interquartile range = Q3 - Q1
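Variance of a frequency distribution:
$$S^2 = \frac{\sum fx^2 - \frac{\left(\sum fx\right)^2}{\sum f}}{\sum f - 1}$$
Given Σf = 21, Σfx = 204 and Σfx² = 2676:
Solution:
Range = upper boundary of the last class - lower boundary of the first class
= 18.5 - 0.5 = 18
$$S^2 = \frac{2676 - \frac{204^2}{21}}{21 - 1} = 34.71$$
$$S = \sqrt{34.71} = 5.892$$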
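A Python sketch of the grouped variance formula. Since only the column totals Σf, Σfx and Σfx² are needed, `variance_from_sums` works directly from them; `variance_from_table` (also illustrative) shows how those totals would be built from class marks and frequencies. With the totals above it should give S² ≈ 34.71 and S ≈ 5.892.

```python
import math


def variance_from_sums(sum_f, sum_fx, sum_fx2):
    """S^2 = (sum(f x^2) - (sum(f x))^2 / sum(f)) / (sum(f) - 1)."""
    if sum_f < 2:
        raise ValueError("total frequency must be at least 2")
    return (sum_fx2 - sum_fx ** 2 / sum_f) / (sum_f - 1)


def variance_from_table(marks, freqs):
    """Same formula, building the three column totals from class marks x and frequencies f."""
    sum_f = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(marks, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(marks, freqs))
    return variance_from_sums(sum_f, sum_fx, sum_fx2)


s2 = variance_from_sums(21, 204, 2676)
print(round(s2, 2), round(math.sqrt(s2), 3))
```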
Example of the use of quartiles in spot speed
study: Frequency histogram example
[Figure: Northbound frequency histogram of frequency (f_i) against speed class (mph, 22 to 44), annotated Modal Speed = 31 mph and Modal Speed = 33 mph]
Frequency distribution curve
[Figure: frequency distribution curve against speed (mph, 10 to 50), with the pace marked at 28-34 mph]
Cumulative frequency distribution curve
[Figure: cumulative frequency curve of frequency (%) against speed (mph), with median speed P50 = 31 mph and 85th percentile P85 = 35.5 mph]