
UECM2623/UCCM2623 Numerical Methods and Statistics/UECM1693 Mathematics for Physics II

Chapter 1: Descriptive Statistics


1.1 Some terms

Raw data
Raw data are data recorded in the sequence in which they were collected, before they are processed or ranked.
Table 1: The weights of 20 students in kg (Quantitative raw data)
61, 66, 68, 65, 65, 62, 67, 67, 68, 60, 71, 73, 69, 69, 63, 70, 74, 70, 64, 71

Table 2: The grades of UCCM2623 of 20 students (Qualitative raw data)
B, D, C, B, A, B, B, A, C, B, A, B, C, B, B, A, B, C, A, D

Arrays
An arrangement of numerical raw data in ascending or descending order of magnitude. For example, the weights in Table 1 arranged in ascending order are:
60, 61, 62, 63, 64, 65, 65, 66, 67, 67, 68, 68, 69, 69, 70, 70, 71, 71, 73, 74
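Forming an array is simply a sort. Below is a minimal Python sketch that uses the Table 1 weights. The variable names are our own.

```python
# Weights (kg) of the 20 students in Table 1, in the order recorded
weights = [61, 66, 68, 65, 65, 62, 67, 67, 68, 60,
           71, 73, 69, 69, 63, 70, 74, 70, 64, 71]

ascending = sorted(weights)                  # array in ascending order
descending = sorted(weights, reverse=True)   # array in descending order

print(ascending)
print(descending)
```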

Ungrouped data
Contains information on each member of a sample or population individually
Examples: Data presented in Table 1 and Table 2
Grouped data
Data presented in classes or intervals.
Example:

UCCM2623 Scores | Number of students
10-12 | 4
13-15 | 12
16-18 | 20
19-21 | 14

1.2 Organizing and Graphing Qualitative Data

1.2.1 Frequency distributions for qualitative data


A tabular arrangement that lists all categories and the number of elements that belong to each of the
categories.
Example 1.1. A sample of 25 students who were planning to go to college was asked which course each intended to choose. The responses were:
Engineering, Infotech, Engineering, Business, Business, Business, Business, Other, Biotech, Biotech, Biotech, Biotech, Infotech, Biotech, Biotech, Other, Business, Engineering, Business, Other, Engineering, Biotech, Biotech, Other, Infotech
Construct a frequency distribution table for these data.

Solution.

Course | Frequency
Biotech | 8
Business | 6
Engineering | 4
Infotech | 3
Others | 4
Total | 25
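The tallying step can be sketched in Python with `collections.Counter`, which counts how many times each category occurs. This is only an illustration. The list is the Example 1.1 data typed in by hand, and it keeps the label `Other` as it appears in the raw data.

```python
from collections import Counter

# Intended courses of the 25 students in Example 1.1
courses = ["Engineering", "Infotech", "Engineering", "Business", "Business",
           "Business", "Business", "Other", "Biotech", "Biotech",
           "Biotech", "Biotech", "Infotech", "Biotech", "Biotech",
           "Other", "Business", "Engineering", "Business", "Other",
           "Engineering", "Biotech", "Biotech", "Other", "Infotech"]

freq = Counter(courses)          # category -> frequency
for course in sorted(freq):      # print the categories alphabetically
    print(f"{course:<12}{freq[course]:>3}")
print(f"{'Total':<12}{sum(freq.values()):>3}")
```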

1.2.2 Relative frequency and percentage distributions


Tabular arrangement that lists the relative frequencies and percentages for all categories.
$$\text{relative frequency of a category} = \frac{\text{frequency of that category}}{\text{sum of all frequencies}} = \frac{f}{\sum f}$$

$$\text{Percentage} = \text{relative frequency} \times 100\%$$

Example 1.2. Determine the relative frequency and percentage distributions for the data in Example 1.1.
Solution.

Course | Relative Frequency | Percentage
Biotech | 0.32 | 32%
Business | 0.24 | 24%
Engineering | 0.16 | 16%
Infotech | 0.12 | 12%
Others | 0.16 | 16%
Total | 1.00 | 100%
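Here is a short Python sketch of the relative frequency and percentage formulas. It starts from the Example 1.1 frequencies, and the dictionary layout is our own choice.

```python
# Frequencies from the solution of Example 1.1
freq = {"Biotech": 8, "Business": 6, "Engineering": 4, "Infotech": 3, "Others": 4}
total = sum(freq.values())

rel_freq = {c: f / total for c, f in freq.items()}    # f / sum(f)
percent = {c: 100 * r for c, r in rel_freq.items()}   # relative frequency x 100%

for c in freq:
    print(f"{c:<12}{rel_freq[c]:.2f}{percent[c]:>6.0f}%")
```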

1.2.3 Graphical presentation of qualitative data


Bar Graphs (bar chart)
A graph made of bars whose heights represent the frequencies of respective categories.

Example 1.3. Construct a bar chart for the data in Example 1.1.
Solution.
[Figure: Bar chart of frequency (vertical axis, 0 to 8) against course (Biotech, Business, Engineering, Infotech, Others).]

1.3 Organizing and graphing quantitative data

1.3.1 Frequency Distribution for quantitative data

Lists all the classes and the number of values that belong to each class.
Data presented in the form of a frequency distribution are called grouped data.

Note:
1. Generally, the grouping process destroys some of the original information.
2. The classes are non-overlapping, i.e. each value belongs to one and only one class.
Class
An interval that includes all the values that fall between two numbers, the lower and upper class limits.
Class limits
Endpoints of each interval
Class Boundary
The class boundary is the dividing line between two classes. It is given by the midpoint between the upper limit of one class and the lower limit of the next higher class.
Class width / class size
Class width is the difference between the upper and lower class boundaries.
$$\text{class width} = \text{upper boundary} - \text{lower boundary}$$
Class mark / class midpoint
Class mark is the midpoint of the class interval
$$\text{class mark} = \frac{\text{lower class limit} + \text{upper class limit}}{2}$$
Constructing frequency distribution tables
1. Determine the number of classes. This usually varies from 5 to 20, depending mainly on the number of observations in the data set. One rule (the 2^k rule): the number of classes is k, where k is the smallest integer such that 2^k is greater than the number of observations n.
2. Determine the class interval or width (i). The classes must cover at least the distance from the smallest value (L) in the raw data up to the largest value (H):
$$\text{approximate class width} = \frac{\text{largest value }(H) - \text{smallest value }(L)}{\text{number of classes}}$$
The class width is usually rounded to some convenient number. This rounding may slightly change the number of classes initially intended.
3. Determine the lower limit of the first class, or the starting point. Any convenient number that is equal to or less than the smallest value in the data set can be used as the lower limit of the first class.
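Steps 1 and 2 can be sketched in Python as below. The helper names `number_of_classes` and `approx_class_width` are hypothetical. The inputs (n = 50, H = 146, L = 84) come from the birth-weight data of Example 1.4.

```python
def number_of_classes(n):
    """Smallest integer k such that 2**k is greater than n (the 2^k rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def approx_class_width(largest, smallest, k):
    """Approximate class width (H - L) / number of classes."""
    return (largest - smallest) / k

# Birth-weight data of Example 1.4: n = 50, H = 146, L = 84
k = number_of_classes(50)                # 2**6 = 64 > 50, so k = 6
width = approx_class_width(146, 84, k)   # 62 / 6 = 10.33...
print(k, round(width, 2))
```

Rounding 10.33 to the convenient width 10 and starting at 80 gives the seven classes 80-89, ..., 140-149 used in Example 1.4. That is one more class than the six first intended, as step 2 warns can happen.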


Example 1.4. A sample of birth-weights (oz) from 50 consecutive deliveries is given below. Construct a frequency distribution table.
86, 120, 123, 104, 121, 111, 91, 128, 133, 104
118, 89, 134, 132, 98, 121, 122, 115, 106, 115
92, 115, 84, 98, 107, 124, 138, 138, 125, 127
108, 118, 140, 146, 122, 104, 99, 105, 108, 135
132, 95, 124, 132, 126, 125, 115, 144, 98, 89

Solution.

Birth-weight (oz) | Frequency, f
80-89 | 4
90-99 | 7
100-109 | 8
110-119 | 7
120-129 | 13
130-139 | 8
140-149 | 3
Total | 50
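The tallying in Example 1.4 can be checked with a short Python sketch. The sketch assumes integer data and equal-width classes starting at 80 with width 10. Under those assumptions, the class index of a value w is (w - 80) // 10.

```python
# Birth-weights (oz) of the 50 deliveries in Example 1.4
weights = [86, 120, 123, 104, 121, 111, 91, 128, 133, 104,
           118, 89, 134, 132, 98, 121, 122, 115, 106, 115,
           92, 115, 84, 98, 107, 124, 138, 138, 125, 127,
           108, 118, 140, 146, 122, 104, 99, 105, 108, 135,
           132, 95, 124, 132, 126, 125, 115, 144, 98, 89]

start, width, n_classes = 80, 10, 7   # classes 80-89, 90-99, ..., 140-149

freq = [0] * n_classes
for w in weights:
    freq[(w - start) // width] += 1   # index of the class that contains w

for i, f in enumerate(freq):
    lower = start + i * width
    print(f"{lower}-{lower + width - 1}: {f}")
```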

1.3.2 Relative frequency and percentage distributions


$$\text{relative frequency of a class} = \frac{\text{frequency of that class}}{\text{sum of all frequencies}} = \frac{f}{\sum f}$$

$$\text{Percentage} = \text{relative frequency} \times 100\%$$

Example 1.5. Calculate the relative frequency and percentage distributions for the data in Example 1.4.
Solution.
Birth-weight (oz) | Class Boundaries | Relative Frequency | Percentage
80-89 | 79.5 - 89.5 | 0.08 | 8%
90-99 | 89.5 - 99.5 | 0.14 | 14%
100-109 | 99.5 - 109.5 | 0.16 | 16%
110-119 | 109.5 - 119.5 | 0.14 | 14%
120-129 | 119.5 - 129.5 | 0.26 | 26%
130-139 | 129.5 - 139.5 | 0.16 | 16%
140-149 | 139.5 - 149.5 | 0.06 | 6%
Total | | 1.00 | 100%
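Below is a Python sketch of the class boundaries and relative frequencies in Example 1.5. It assumes, as in this example, a gap of 1 oz between consecutive classes, so each boundary lies 0.5 beyond the class limit.

```python
# Class limits (oz) and frequencies of Example 1.4
limits = [(80, 89), (90, 99), (100, 109), (110, 119),
          (120, 129), (130, 139), (140, 149)]
freq = [4, 7, 8, 7, 13, 8, 3]
n = sum(freq)

gap = 1  # upper limit of one class to lower limit of the next
boundaries = [(lo - gap / 2, hi + gap / 2) for lo, hi in limits]
rel = [f / n for f in freq]          # relative frequency of each class

for (lb, ub), r in zip(boundaries, rel):
    print(f"{lb} - {ub}  {r:.2f}  {r * 100:.0f}%")
```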

Grouped (quantitative) data can be displayed in a histogram or a polygon.

1.3.3 Histogram
Three types of histogram:
1. Frequency histogram
2. Relative frequency histogram
3. Percentage histogram
A frequency histogram consists of a set of rectangles having
a) bases on a horizontal axis, with centres at the class marks and lengths equal to the class interval sizes;
b) areas proportional to the class frequencies.
If all the class intervals have equal size, the heights of the rectangles are proportional to the class frequencies; otherwise, the heights of the rectangles must be adjusted.
Procedure to draw a histogram:
1. Mark the class boundaries of each interval on the horizontal axis.
2. Mark the frequencies (or relative frequencies, or percentages) on the vertical axis.
3. Draw a bar for each class so that its height represents the frequency of that class. (There are no gaps between the bars.)
4. Label the histogram.

1.3.4 Polygon
A polygon is a line graph formed by joining the midpoints of the tops of successive bars in a histogram. Two more classes (with zero frequencies) are then added, one at each end, and their midpoints are marked, so that the polygon starts and ends on the horizontal axis.
Three types of polygon:
1. Frequency polygon
2. Relative frequency polygon
3. Percentage polygon
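The vertices of the frequency polygon can be listed with a small Python sketch. The sketch assumes the equal-width classes of Example 1.4 (80-89, ..., 140-149). It computes the class marks with the formula from 1.3.1 and adds an empty class at each end.

```python
# Frequency polygon vertices for Example 1.4
lower_limits = [80, 90, 100, 110, 120, 130, 140]
freq = [4, 7, 8, 7, 13, 8, 3]
width = 10

# class mark = (lower limit + upper limit) / 2, e.g. (80 + 89) / 2 = 84.5
marks = [lo + (width - 1) / 2 for lo in lower_limits]
# add a zero-frequency class at each end so the polygon meets the axis
marks = [marks[0] - width] + marks + [marks[-1] + width]
heights = [0] + freq + [0]

points = list(zip(marks, heights))
print(points)
```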

Example 1.6. Reconsider the data in Example 1.4 and draw
i) the frequency histogram and frequency polygon
ii) the relative frequency histogram and relative frequency polygon
iii) the percentage histogram and percentage polygon

[Figure: The frequency histogram and frequency polygon; frequency (vertical axis) against birth-weight (oz), with class boundaries 79.5, 89.5, ..., 149.5 on the horizontal axis.]

[Figure: The relative frequency histogram and relative frequency polygon; relative frequency (vertical axis, 0 to 0.30) against birth-weight (oz), with class boundaries 79.5 to 149.5.]

[Figure: The percentage histogram and percentage polygon; percentage (vertical axis, 0 to 30) against birth-weight (oz), with class boundaries 79.5 to 149.5.]

Example 1.7. The frequency distribution below gives the weights of 35 objects, measured to the nearest kg. Draw a histogram to illustrate the data.

Weight (kg) | Frequency
6-8 | 4
9-11 | 6
12-17 | 10
18-20 | 3
21-29 | 12

Solution.
The class widths are unequal, so the heights of the rectangles are adjusted:
$$\text{adjusted frequency} = \frac{\text{frequency}}{\text{class width}} \times \text{standard class width}$$
Here the standard class width is taken as 3.

Weight (kg) | Class width | Frequency | Height of rectangle (adjusted frequency)
6-8 | 3 | 4 | 4
9-11 | 3 | 6 | 6
12-17 | 6 | 10 | 5
18-20 | 3 | 3 | 3
21-29 | 9 | 12 | 4

[Figure: Histogram of adjusted frequency (vertical axis, 0 to 6) against weight (kg), with the horizontal axis marked from 5.5 to 29.5 in steps of 3.]
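The adjusted heights in Example 1.7 can be reproduced with the Python sketch below. It assumes integer weights recorded to the nearest kg, so a class such as 12-17 has boundaries 11.5 and 17.5 and therefore width 6. It uses 3 as the standard class width, as in the table above.

```python
# Weight classes (kg) and frequencies of Example 1.7
classes = [(6, 8), (9, 11), (12, 17), (18, 20), (21, 29)]
freq = [4, 6, 10, 3, 12]

widths = [hi - lo + 1 for lo, hi in classes]   # boundary to boundary
standard = 3                                   # standard class width

# adjusted frequency = frequency / class width x standard class width
adjusted = [f * standard / w for f, w in zip(freq, widths)]
print(widths)
print(adjusted)
```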

1.3.5 Cumulative frequency distribution


A table that presents the total number of values that fall below the upper boundary of each class. It is constructed for quantitative data only.
$$\text{cumulative relative frequency} = \frac{\text{cumulative frequency of a class}}{\text{sum of all frequencies in the data set}}$$
$$\text{cumulative percentage} = \text{cumulative relative frequency} \times 100\%$$

Example 1.8. Refer to the data in Example 1.4 and construct the cumulative frequency, cumulative relative frequency and cumulative percentage distributions.

Birth-weight (oz) | Cumulative frequency | Cumulative relative frequency | Cumulative percentage, %
< 79.5 | 0 | 0 | 0%
< 89.5 | 4 | 0.08 | 8%
< 99.5 | 11 | 0.22 | 22%
< 109.5 | 19 | 0.38 | 38%
< 119.5 | 26 | 0.52 | 52%
< 129.5 | 39 | 0.78 | 78%
< 139.5 | 47 | 0.94 | 94%
< 149.5 | 50 | 1 | 100%
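Cumulative frequencies are running totals of the class frequencies. Here is a minimal Python sketch that applies `itertools.accumulate` to the Example 1.4 frequencies.

```python
from itertools import accumulate

# Upper class boundaries (oz) and class frequencies of Example 1.4
upper = [89.5, 99.5, 109.5, 119.5, 129.5, 139.5, 149.5]
freq = [4, 7, 8, 7, 13, 8, 3]
n = sum(freq)

cum = list(accumulate(freq))     # running totals of the frequencies
cum_rel = [c / n for c in cum]   # cumulative relative frequency

for u, c, r in zip(upper, cum, cum_rel):
    print(f"< {u}: {c:>2}  {r:.2f}  {r * 100:.0f}%")
```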

1.3.6 Ogive / Cumulative frequency curve


A curve drawn for the cumulative frequency distribution by joining the dots marked above the upper
boundaries of classes at heights equal to the cumulative frequencies of respective classes.

Note:
1. The ogive starts at the lower boundary of the first class and ends at the upper boundary of the last class.
2. If cumulative relative frequencies (or cumulative percentages) are used in place of cumulative frequencies, the graph is called a relative cumulative frequency curve (or percentage ogive).

Example 1.9. Draw an ogive for the data in Example 1.4. Estimate from the ogive
a) the number of deliveries with birth-weights less than 95 oz;
b) the value of X, if 20% of the deliveries had birth-weights of X oz or more.

Solution.
[Figure: Ogive; cumulative frequency (vertical axis, 0 to 55) against birth-weight (oz), with class boundaries 79.5, 89.5, ..., 149.5 on the horizontal axis.]
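Reading values off the ogive amounts to linear interpolation between the plotted points, assuming the points are joined by straight line segments. The Python sketch below gives the values that a carefully drawn ogive should approximate for Example 1.9. The helper `interp` is our own.

```python
from bisect import bisect_left

# Ogive points for Example 1.4: class boundaries and cumulative frequencies
bounds = [79.5, 89.5, 99.5, 109.5, 119.5, 129.5, 139.5, 149.5]
cum = [0, 4, 11, 19, 26, 39, 47, 50]

def interp(x, xs, ys):
    """Linearly interpolate y at x along the segments joining (xs[i], ys[i]).

    xs must be strictly increasing; x outside the range is clamped.
    """
    i = bisect_left(xs, x)
    if i == 0:
        return ys[0]
    if i == len(xs):
        return ys[-1]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

# a) read up from 95 oz to the cumulative frequency axis
below_95 = interp(95, bounds, cum)

# b) 20% weigh X oz or more, so 80% (0.8 x 50 = 40) lie below X:
#    read across from 40 to the birth-weight axis
target = 0.8 * cum[-1]
x_value = interp(target, cum, bounds)

print(round(below_95, 2), round(x_value, 2))
```

This gives about 7.85, so roughly 8 deliveries weighed less than 95 oz, and X is about 130.75 oz. Readings taken from a hand-drawn ogive will differ slightly.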

1.4 Measures of central tendency

A measure of central tendency represents a data set by a single typical value that summarizes the data. It locates the centre of the values, that is, the centre of a histogram or a frequency distribution curve.

Three measures will be considered here:
1. Median
2. Mode
3. Mean

1.4.1 Median
The median is the value of the middle term in a data set that has been ranked in increasing or decreasing order. It is the value of the
$$\left(\frac{n+1}{2}\right)\text{th term in a ranked data set},$$
where n is the total number of elements in the set.

Note:
1. If n is odd, the median is the value of the middle term in the ranked data.
2. If n is even, the median is the average of the two middle terms.
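The median rule translates directly into code. Below is a minimal Python sketch; the function name `median` is our own. Python's `statistics.median` follows the same odd/even convention.

```python
def median(values):
    """Median of ungrouped data: the ((n + 1)/2)th term of the ranked data."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                        # odd n: the middle term
    return (ranked[mid - 1] + ranked[mid]) / 2    # even n: mean of the two middle terms

print(median([10, 5, 19, 8, 3]))    # set A of Example 1.10
print(median([2, 7, 3, 6, 4, 5]))   # set B of Example 1.10
```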

Example 1.10. Find the median of set A = { 10, 5, 19, 8, 3 } and set B = { 2, 7, 3, 6, 4, 5 }
Solution.

Note:
The median is not influenced by extreme values. (Extreme values are values that are very small or very large relative to the majority of the values in a data set.)

For grouped data in the form of a frequency distribution of single-valued classes
The median can be found either from the ungrouped frequency distribution or from the cumulative frequency distribution.

Example 1.11. Find the median of the following frequency distribution.

No. of children | 0 | 1 | 2 | 3 | 4 | 5
Frequency | 3 | 5 | 12 | 9 | 4 | 2

Solution.
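For a frequency distribution of single values, cumulative frequencies locate the median without writing out every observation. Here is a Python sketch of that approach applied to Example 1.11. The helper names are our own.

```python
def median_from_frequency(values, freqs):
    """Median of a single-valued frequency distribution (values ascending)."""
    n = sum(freqs)

    def term(position):
        # value of the position-th term (1-based) of the ranked data
        running = 0
        for v, f in zip(values, freqs):
            running += f                 # cumulative frequency so far
            if running >= position:
                return v
        raise ValueError("position exceeds n")

    if n % 2 == 1:
        return term((n + 1) // 2)
    return (term(n // 2) + term(n // 2 + 1)) / 2

children = [0, 1, 2, 3, 4, 5]
frequency = [3, 5, 12, 9, 4, 2]
print(median_from_frequency(children, frequency))   # n = 35: the 18th term
```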

1.4.2 Mode
Mode is the value that occurs with the highest frequency in a data set.

Example 1.12. Find the mode of each of the following data sets.
i) 74, 9, 5, 8, 3, 8, 8
ii) 2, 2, 6, 6, 8, 8, 9, 9
iii) 2, 6, 6, 6, 3, 8, 8, 8, 3
iv) B, C, D, A, A, C, C, C, B, A
Solution.

Note:
1. The mode is not influenced by extreme values.
2. A data set may have no mode, one mode (unimodal), two modes (bimodal) or more than two modes (multimodal).
3. The mode can be used for both quantitative and qualitative data.
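Below is a Python sketch of the mode that reports every most frequent value. It returns no mode when all values occur equally often. The empty-list convention for "no mode" is our own assumption. Python's `statistics.multimode` would instead return all four values for data set ii).

```python
from collections import Counter

def modes(data):
    """All values with the highest frequency; [] if every value is equally frequent."""
    counts = Counter(data)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []                                   # no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([74, 9, 5, 8, 3, 8, 8]))        # i)
print(modes([2, 2, 6, 6, 8, 8, 9, 9]))      # ii)
print(modes([2, 6, 6, 6, 3, 8, 8, 8, 3]))   # iii)
print(modes(list("BCDAACCCBA")))            # iv) works for qualitative data too
```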

Example 1.13. Find the mode of the following frequency distribution.

No. of children | 0 | 1 | 2 | 3 | 4
Frequency | 3 | 5 | 12 | 9 | 4

Solution.

1.4.3 Mean
The mean for population data $x_1, x_2, \ldots, x_N$ is denoted by $\mu$ and is defined as
$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
The mean for sample data $x_1, x_2, \ldots, x_n$ is denoted by $\bar{X}$ and is defined as
$$\bar{X} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Example 1.14. Find the arithmetic mean for the data set { 158, 189, 265, 127, 191 }
Solution.

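Note:
1. The mean is not necessarily one of the values in the original data.
2. The mean is influenced by extreme values.

For grouped data in the form of a frequency distribution of single-valued classes
If the distinct values $x_1, x_2, \ldots, x_k$ occur with frequencies $f_1, f_2, \ldots, f_k$ and $n = \sum f_i$, then
$$\bar{X} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_k x_k}{n} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i = \frac{\sum f_i x_i}{\sum f_i}$$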

Example 1.15. Find the mean of the following frequency distribution.

x_i | 2 | 5 | 6 | 8
f_i | 1 | 3 | 4 | 2

Solution.

x_i | f_i | f_i x_i
2 | 1 | 2
5 | 3 | 15
6 | 4 | 24
8 | 2 | 16
Total | 10 | 57
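Both mean formulas can be checked with a few lines of Python. The function names are our own, and the data come from Examples 1.14 and 1.15.

```python
def mean(xs):
    """Arithmetic mean: sum of the values divided by their number."""
    return sum(xs) / len(xs)

def frequency_mean(xs, fs):
    """Mean of a single-valued frequency distribution: sum(f*x) / sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)

print(mean([158, 189, 265, 127, 191]))              # Example 1.14
print(frequency_mean([2, 5, 6, 8], [1, 3, 4, 2]))   # Example 1.15
```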

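For grouped data in the form of a frequency distribution
Suppose data are grouped into k class intervals, and
$f_i$ = the frequency of class i
$m_i$ = the midpoint of class i
$$\text{mean for population data: } \mu = \frac{\sum f_i m_i}{N}, \quad \text{where } N = \sum f_i = \text{population size}$$
$$\text{mean for sample data: } \bar{X} = \frac{\sum f_i m_i}{n}, \quad \text{where } n = \sum f_i = \text{sample size}$$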

Example 1.16. Find the mean of the following frequency distribution.

Weight (kg) | 6-8 | 9-11 | 12-17 | 18-20 | 21-29
Frequency | 4 | 6 | 10 | 3 | 12

Solution.

Class interval | Class midpoint (m_i) | Frequency (f_i) | f_i m_i
6-8 | 7 | 4 | 28
9-11 | 10 | 6 | 60
12-17 | 14.5 | 10 | 145
18-20 | 19 | 3 | 57
21-29 | 25 | 12 | 300
Total | | 35 | 590
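The grouped mean uses the class midpoints m_i in place of the unknown individual values. Here is a Python sketch for Example 1.16; the variable names are our own.

```python
# Classes (kg) and frequencies of Example 1.16
classes = [(6, 8), (9, 11), (12, 17), (18, 20), (21, 29)]
freq = [4, 6, 10, 3, 12]

mids = [(lo + hi) / 2 for lo, hi in classes]      # class midpoints m_i
sum_fm = sum(f * m for f, m in zip(freq, mids))   # sum of f_i * m_i
n = sum(freq)
mean = sum_fm / n
print(mids, sum_fm, n, round(mean, 2))
```

This gives 590 / 35, approximately 16.86 kg.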

1.5 Measures of dispersion

Sometimes the measures of central tendency alone are not enough to reveal the whole picture of a data set, because they do not describe how the data are spread out.
Data set | Data | Mean | Median | Mode
A | 1, 2, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11 | 6 | 6 | 6
B | 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8 | 6 | 6 | 6

[Figure: Dot plots of set A (horizontal axis 1 to 11) and set B (horizontal axis 4 to 8).]

Note: The mean, median and mode are the same for data sets A and B, but the distributions of the data are different.

1.5.1 Measures of dispersion for ungrouped data


Range
The range for a data set $\{x_1, x_2, \ldots, x_n\}$ is defined to be the difference between the largest value and the smallest value.
$$\text{Range} = \text{largest value} - \text{smallest value}$$

Example 1.17. Find the range for data set A and data set B above.

Variance
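The variance is the average of the squared deviations of the data from the mean.

Consider a population of N measurements $x_1, x_2, \ldots, x_N$:
$$\text{Population mean} = \mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
$$\text{Population variance} = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$$

Consider a sample of n measurements $x_1, x_2, \ldots, x_n$:
$$\text{Sample mean} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
$$\text{Sample variance} = s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{X})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right]$$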

Standard Deviation
The standard deviation is the positive square root of the variance

$$\text{Sample standard deviation} = s = \sqrt{s^2}$$
$$\text{Population standard deviation} = \sigma = \sqrt{\sigma^2}$$
Note: 1. A small standard deviation means that the data are clustered closely around their mean.
2. A large standard deviation means that the data are widely scattered about their mean.
3. The standard deviation is influenced by extreme values.

Example 1.18. The data below show the daily salaries of all 6 employees of a small company.
29.50, 16.50, 35.40, 21.30, 49.70, 24.60
Calculate the variance and standard deviation for these data.
Solution.
$$\text{Mean}, \quad \mu = \frac{\sum x_i}{N} = \frac{177.00}{6} = 29.50$$

x_i | x_i - μ | (x_i - μ)^2 | x_i^2
29.50 | 0.00 | 0.00 | 870.25
16.50 | -13.00 | 169.00 | 272.25
35.40 | 5.90 | 34.81 | 1253.16
21.30 | -8.20 | 67.24 | 453.69
49.70 | 20.20 | 408.04 | 2470.09
24.60 | -4.90 | 24.01 | 605.16
Total: 177.00 | 0.00 | 703.10 | 5924.60

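Method 1:
$$\text{Population variance} = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 = \frac{703.10}{6} \approx 117.18$$
$$\text{Population standard deviation} = \sigma = \sqrt{\frac{703.10}{6}} \approx 10.83$$

Method 2:
$$\sum x_i^2 = 5924.60$$
$$\text{Population variance} = \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2 = \frac{5924.60}{6} - 29.50^2 \approx 117.18$$
$$\text{Population standard deviation} = \sigma \approx 10.83$$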
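The two methods in Example 1.18 can be compared numerically. Below is a minimal Python sketch that treats the six salaries as a population, dividing by N.

```python
import math

# Daily salaries of all 6 employees (Example 1.18): a population
salaries = [29.50, 16.50, 35.40, 21.30, 49.70, 24.60]
N = len(salaries)
mu = sum(salaries) / N

# Method 1: average squared deviation from the mean
var1 = sum((x - mu) ** 2 for x in salaries) / N
# Method 2: shortcut formula (1/N) * sum(x^2) - mu^2
var2 = sum(x * x for x in salaries) / N - mu ** 2

sigma = math.sqrt(var1)
print(round(mu, 2), round(var1, 2), round(var2, 2), round(sigma, 2))
```

Both methods agree up to floating-point rounding. Method 2 avoids a second pass over the deviations, but it subtracts two large numbers, so Method 1 is numerically safer when the data are large relative to their spread.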

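Example 1.19. A sample consists of 5 data values: 72, 49, 79, 55 and 57. Calculate the variance and standard deviation.
Solution.
$$n = 5, \quad \sum x_i = 312, \quad \sum x_i^2 = 20100$$
$$\text{Sample variance} = s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right] = \frac{1}{4}\left[20100 - \frac{312^2}{5}\right] = 157.8$$
$$\text{Sample standard deviation} = s = \sqrt{157.8} \approx 12.56$$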
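Here is a Python sketch of the sample-variance shortcut for Example 1.19. It is cross-checked against `statistics.variance`, which also divides by n - 1.

```python
import math
import statistics

sample = [72, 49, 79, 55, 57]          # Example 1.19
n = len(sample)
sum_x = sum(sample)                    # 312
sum_x2 = sum(x * x for x in sample)    # 20100

# s^2 = [sum(x^2) - (sum x)^2 / n] / (n - 1)
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s = math.sqrt(s2)

# cross-check with the standard library's sample variance
assert math.isclose(s2, statistics.variance(sample))
print(sum_x, sum_x2, round(s2, 2), round(s, 2))
```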

1.5.2 Measures of dispersion for grouped data


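Variance
$$\text{Population variance} = \sigma^2 = \frac{1}{N}\sum_{i=1}^{k} f_i (m_i - \mu)^2 = \frac{\sum f_i m_i^2 - \frac{\left(\sum f_i m_i\right)^2}{N}}{N}$$
$$\text{Sample variance} = s^2 = \frac{1}{n-1}\sum_{i=1}^{k} f_i (m_i - \bar{X})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i m_i^2 - \frac{1}{n}\left(\sum_{i=1}^{k} f_i m_i\right)^2\right]$$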

Example 1.20. Find the variance of the following frequency distribution if it represents
a) a population
b) a sample.

Height (m) | 20-22 | 23-25 | 26-28 | 29-31 | 32-34
Frequency | 3 | 6 | 12 | 9 | 2

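Solution.

Height | Midpoint, m | Frequency, f | fm | fm^2
20-22 | 21 | 3 | 63 | 1323
23-25 | 24 | 6 | 144 | 3456
26-28 | 27 | 12 | 324 | 8748
29-31 | 30 | 9 | 270 | 8100
32-34 | 33 | 2 | 66 | 2178
Total: | | 32 | 867 | 23805

a) Population:
$$\sigma^2 = \frac{\sum f_i m_i^2 - \frac{\left(\sum f_i m_i\right)^2}{N}}{N} = \frac{23805 - \frac{867^2}{32}}{32} \approx 9.83$$
b) Sample:
$$s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{k} f_i m_i^2 - \frac{1}{n}\left(\sum_{i=1}^{k} f_i m_i\right)^2\right] = \frac{1}{31}\left[23805 - \frac{867^2}{32}\right] \approx 10.15$$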
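Here is the grouped variance computation of Example 1.20 as a Python sketch. It uses the shortcut form of both formulas, and the variable names are our own.

```python
# Height classes (m) and frequencies of Example 1.20
classes = [(20, 22), (23, 25), (26, 28), (29, 31), (32, 34)]
freq = [3, 6, 12, 9, 2]

mids = [(lo + hi) / 2 for lo, hi in classes]          # 21, 24, ..., 33
n = sum(freq)                                          # 32
sum_fm = sum(f * m for f, m in zip(freq, mids))        # 867
sum_fm2 = sum(f * m * m for f, m in zip(freq, mids))   # 23805

ss = sum_fm2 - sum_fm ** 2 / n   # sum of f * (m - mean)^2, shortcut form
pop_var = ss / n                 # a) treated as a population
sample_var = ss / (n - 1)        # b) treated as a sample
print(round(pop_var, 2), round(sample_var, 2))
```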

1.6 Measures of position

Measures of position determine the position of a single value in relation to other values in a sample or a
population data set.

1.6.1 Quartiles
Quartiles are three summary measures that divide a ranked data set into four equal parts.
The second quartile (Q2) is the median of the data set.
The first quartile (Q1) is the value of the middle term among the observations that are less than the median.
The third quartile (Q3) is the value of the middle term among the observations that are greater than the median.

To Find The Quartiles of Ungrouped Data

Consider n items arranged in ascending order. Then
$$\text{The first quartile} = \text{Lower quartile} = Q_1 = \left(\tfrac{1}{4}(n+1)\right)\text{th value}$$
$$\text{The second quartile} = \text{Median} = Q_2 = \left(\tfrac{1}{2}(n+1)\right)\text{th value}$$
$$\text{The third quartile} = \text{Upper quartile} = Q_3 = \left(\tfrac{3}{4}(n+1)\right)\text{th value}$$

When n is odd, these rules give the exact position of each quartile (a whole-number or .5 position).
When n is even:
a) If n/2 is even, round every decimal value of $\frac{1}{4}(n+1)$ or $\frac{3}{4}(n+1)$ to the .5 value, for example: 2.25 → 2.5 and 6.75 → 6.5.
b) If n/2 is odd, round up a decimal value of $\frac{1}{4}(n+1)$ or $\frac{3}{4}(n+1)$ that is greater than .5, and round down a value that is smaller than .5, for example: 3.75 → 4 and 2.25 → 2.

A .5 position refers to the average of the two neighbouring values; for example, the 2.5th value is the average of the 2nd and 3rd values.
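The position rules above, including rounding rules a) and b), can be sketched in Python as follows. The helper names are hypothetical. The sketch assumes at least two observations and reads a .5 position as the average of its two neighbouring values.

```python
import math

def value_at(ranked, pos):
    """Value at a 1-based position that is a whole number or ends in .5."""
    whole = math.floor(pos)
    if pos == whole:
        return ranked[whole - 1]
    return (ranked[whole - 1] + ranked[whole]) / 2   # average the two neighbours

def quartile_position(n, q):
    """Position of Q_q (q = 1, 2, 3): q(n + 1)/4, adjusted by rules a) and b)."""
    pos = q * (n + 1) / 4
    frac = pos - math.floor(pos)
    if n % 2 == 0 and frac not in (0.0, 0.5):
        if (n // 2) % 2 == 0:
            pos = math.floor(pos) + 0.5                       # rule a): n/2 even
        else:
            pos = math.floor(pos) + (1 if frac > 0.5 else 0)  # rule b): n/2 odd
    return pos

def quartiles(data):
    ranked = sorted(data)
    n = len(ranked)
    return [value_at(ranked, quartile_position(n, q)) for q in (1, 2, 3)]

scores = [75, 80, 68, 53, 99, 58, 76, 73, 85, 88, 91, 79]   # Example 1.21
q1, q2, q3 = quartiles(scores)
iqr = q3 - q1
print(q1, q2, q3, iqr, iqr / 2)
```

For the scores of Example 1.21, n = 12 and n/2 = 6 is even, so rule a) applies. The sketch gives Q1 = 70.5, Q2 = 77.5 and Q3 = 86.5, which places a score of 88 above the upper quartile. The resulting interquartile range is 16 and the quartile deviation is 8.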

To Find The Quartiles of Grouped Data (from Ogive)


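$$\text{The first quartile} = \text{Lower quartile} = Q_1 = \left(\tfrac{n}{4}\right)\text{th value}$$
$$\text{The second quartile} = \text{Median} = Q_2 = \left(\tfrac{n}{2}\right)\text{th value}$$
$$\text{The third quartile} = \text{Upper quartile} = Q_3 = \left(\tfrac{3n}{4}\right)\text{th value}$$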

1.6.2 Interquartile Range(IQR)


$$\text{Interquartile range, IQR} = Q_3 - Q_1$$
$$\text{The semi-interquartile range} = \text{The quartile deviation} = \frac{Q_3 - Q_1}{2}$$

1.6.3 Percentiles
The (approximate) value of the kth percentile, denoted by $P_k$, is
$$P_k = \text{value of the }\left(\frac{kn}{100}\right)\text{th term in a ranked data set},$$
where k denotes the number of the percentile and n represents the sample size. Note that $\frac{kn}{100}$ is rounded to the nearest integer or .5 value, for example: 2.2 → 2.0, 2.3 → 2.5, 2.7 → 2.5, 2.8 → 3.0.
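Below is a Python sketch of the approximate percentile rule. It rounds kn/100 to the nearest integer or .5 value by computing floor(2x + 0.5)/2. The text does not specify two details, so both are our own assumptions: exact ties are rounded up, and the position is clamped to the range 1 to n.

```python
import math

def round_to_half(x):
    """Round to the nearest integer or .5 value (exact ties go up: an assumption)."""
    return math.floor(2 * x + 0.5) / 2

def percentile(data, k):
    """Approximate P_k: the (kn/100)th term of the ranked data."""
    ranked = sorted(data)
    n = len(ranked)
    pos = round_to_half(k * n / 100)
    pos = min(max(pos, 1), n)        # keep the position inside the data (assumption)
    whole = math.floor(pos)
    if pos == whole:
        return ranked[whole - 1]
    return (ranked[whole - 1] + ranked[whole]) / 2   # .5 position: average the neighbours

print([round_to_half(x) for x in (2.2, 2.3, 2.7, 2.8)])   # the examples in the text
scores = [75, 80, 68, 53, 99, 58, 76, 73, 85, 88, 91, 79]  # Example 1.21
print(percentile(scores, 62))
```

For Example 1.21, kn/100 = 62 × 12/100 = 7.44, which rounds to 7.5. P62 is therefore the average of the 7th and 8th ranked scores (79 and 80), which is 79.5.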

Example 1.21. The following are the scores of 12 students in a mathematics class.
75, 80, 68, 53, 99, 58, 76, 73, 85, 88, 91, 79
a) Find the values of the three quartiles. Where does the score of 88 lie in relation to these quartiles?
b) Find the interquartile range.
c) Find the quartile deviation.
d) Find the value of the 62nd percentile.
Solution.
