
CHAPTER 2

DESCRIPTIVE STATISTICS

L1 - Numerical Summary of Data

L2 - Data Display and Summary

Learning Objectives:
At the end of the lesson, students should be able to:
- Explain the concepts of
  - sample mean, population mean,
  - sample variance, population variance, sample standard deviation.
- Compute and interpret the sample mean, sample variance, sample standard deviation, sample median, and sample range.

Descriptive Statistics
Methods of organizing, displaying, and describing important features of data by
* tables,
* graphs, and
* summary (numerical) measures

Population and Sample (Definition)

Population:
A collection, or set, of individuals, objects, or events whose properties are to be analyzed.
(Example: all UTP students)

Sample:
A subset of the population. The number of individuals in a sample is called the sample size.
(Example: the engineering students in UTP)

Illustration of selection of a sample from a population

Definition
Variable:
A characteristic of the objects in a population. Its value may change from one object to another in the population.
- CGPA of UTP students (number)
- Gender of an engineering graduate (category: male or female)

A data set consists of one or more variables.

Numerical Descriptive Measures

Used to identify important features of a distribution:
- the center
- the spread

Measure of central tendency: gives the center of a histogram or frequency distribution curve.
Common measures of central tendency: mean, median.

Measure of dispersion: gives the spread of the data.
Common measures of dispersion: range, variance, and standard deviation.

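Measures of central tendency: (i) Mean

Population mean ($\mu$): the sum of all values in the population divided by the population size $N$.

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

Sample mean ($\bar{x}$): the sum of all values in the sample divided by the sample size $n$.

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$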

Measures of central tendency: (ii) Median

Median ($\tilde{x}$): the value of the middle term in a data set that has been ranked in increasing order.

$$\tilde{x} = \begin{cases} x_{((n+1)/2)} & \text{if } n \text{ is odd} \\ \dfrac{x_{(n/2)} + x_{(n/2+1)}}{2} & \text{if } n \text{ is even} \end{cases}$$

Calculation of the median:
- Rank the data set in increasing order.
- Find the middle term.
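The two steps above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides; it assumes the usual convention of averaging the two middle terms when the number of observations is even, and the function name `median` is chosen here for illustration only.

```python
def median(values):
    """Return the middle term of the ranked data.

    When n is even there is no single middle term, so the two middle
    terms are averaged (assumed convention).
    """
    ranked = sorted(values)              # Step 1: rank in increasing order
    n = len(ranked)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                       # n odd: the ((n + 1) / 2)-th term
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2   # n even: average of the two middle terms


print(median([3, 1, 2]))       # odd number of observations
print(median([4, 1, 3, 2]))    # even number of observations
```

Python's standard library offers the same calculation as `statistics.median`, which may be preferable in practice.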

Consider the following data sets (ages of workers):

Company 1: 47 38 35 40 36 45
Company 2: 70 33 18 52 27 49

For the values listed: $\bar{x}_1 \approx 40.17$, $\tilde{x}_1 = 39$; $\bar{x}_2 = 41.5$, $\tilde{x}_2 = 41$.

The mean or median alone is usually not sufficient to reveal the shape of a distribution.

Measures of dispersion provide information about the variation of a data set.
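To illustrate the point, the short sketch below compares the two companies using the ages exactly as listed in this section. The helper name `summarize` is hypothetical, and the range is used only as the simplest possible measure of spread.

```python
from statistics import mean, median

# Ages of workers as listed in this section
company_1 = [47, 38, 35, 40, 36, 45]
company_2 = [70, 33, 18, 52, 27, 49]


def summarize(ages):
    """Return (mean, median, range) for a list of ages."""
    return mean(ages), median(ages), max(ages) - min(ages)


for name, ages in [("Company 1", company_1), ("Company 2", company_2)]:
    m, med, rng = summarize(ages)
    print(f"{name}: mean={m:.2f}, median={med}, range={rng}")
```

For these values the means and medians of the two companies are close to each other, while the ranges differ widely (12 years against 52 years), which is the kind of difference a measure of centre cannot show.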


Measures of dispersion:

Range = Largest value - Smallest value

Standard deviation:
- The most-used measure of dispersion.
- Tells how closely the values of the data set are clustered around the mean.

Variance: the square of the standard deviation.

The values of the variance and standard deviation are never negative.

Population variance and mean: parameters

Sample variance and mean: statistics


Measures of dispersion:

Population variance:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

The population standard deviation is

$$\sigma = \sqrt{\sigma^2}$$


Measures of dispersion:

Sample variance:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Sample standard deviation:

$$s = \sqrt{s^2}$$
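The difference between the population and sample definitions is only the divisor ($N$ against $n - 1$). The sketch below writes both out by hand and cross-checks them against Python's `statistics` module. The function names are hypothetical, and the data are the Company 2 ages from earlier in the lesson, used here purely for illustration.

```python
import math
import statistics


def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the sample mean, dividing by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


ages = [70, 33, 18, 52, 27, 49]   # illustrative data (Company 2 ages)

s2 = sample_variance(ages)
print("sample variance:", s2)
print("sample standard deviation:", math.sqrt(s2))
print("population variance:", population_variance(ages))

# Cross-check against the standard library implementations
assert math.isclose(s2, statistics.variance(ages))
assert math.isclose(population_variance(ages), statistics.pvariance(ages))
```

Dividing by $n - 1$ always gives a slightly larger value than dividing by $N$ for the same data, so it matters which of the two a calculator or library function uses.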


Example 1:
Find the mean, variance, and standard deviation for the following observations:
55 68 90 42 89 70
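A worked sketch of the arithmetic follows. It assumes the six observations are treated as a sample, so the divisor is $n - 1$; the exercise itself does not say whether they form a sample or a population.

```latex
\begin{align*}
\bar{x} &= \frac{55 + 68 + 90 + 42 + 89 + 70}{6} = \frac{414}{6} = 69 \\
\sum_{i=1}^{6} (x_i - \bar{x})^2 &= (-14)^2 + (-1)^2 + 21^2 + (-27)^2 + 20^2 + 1^2 = 1768 \\
s^2 &= \frac{1768}{6 - 1} = 353.6 \\
s &= \sqrt{353.6} \approx 18.80
\end{align*}
```

If the observations were instead treated as a whole population, the same sum of squares would be divided by 6 rather than 5.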


Pictorial & Tabular Methods


1. Stem-and-Leaf Displays:
How to construct a Stem-and-Leaf Display:
1. Divide each numerical observation into two parts: the leading digit(s) become the stem, and the remaining digit(s) become the leaf.
2. List the stem values in a vertical column.
3. Record the leaf for each observation beside its stem.
4. Write the units for stems and leaves on the display.

Stem & Leaf Display

Example:
Number of hours that 30 students spent working on computers:
75 52 80 96 65
79 71 87 93 95
69 72 81 61 76
86 79 68 50 92
83 84 77 64 71
87 72 92 57 98

Stem: tens digit
Leaf: ones digit
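The display for this example can be generated with a few lines of Python. This is a minimal sketch that assumes every observation is a non-negative two-digit integer, so the stem is the tens digit and the leaf is the ones digit; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# Hours that 30 students spent working on computers (from the example)
hours = [
    75, 52, 80, 96, 65,
    79, 71, 87, 93, 95,
    69, 72, 81, 61, 76,
    86, 79, 68, 50, 92,
    83, 84, 77, 64, 71,
    87, 72, 92, 57, 98,
]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (ones digits) in increasing order."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # split into tens digit and ones digit
        stems[stem].append(leaf)
    return dict(stems)


display = stem_and_leaf(hours)
for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
print("Stem: tens digit, Leaf: ones digit")
```

Sorting the values first means the leaves on each row appear in increasing order, which makes the shape of the distribution easier to read from the display.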

How to construct a box plot

Step 1: Arrange the numbers from smallest to largest.
Step 2: Find the median, Q2, the lower quartile, Q1, and the upper quartile, Q3, of the data set.
Step 3: Find the interquartile range (IQR). The IQR is the difference between the upper quartile and the lower quartile.
Step 4: Draw the box plot either horizontally or vertically.
Step 5: Calculate 1.5 × IQR and mark the range that extends 1.5 × IQR below the lower quartile and 1.5 × IQR above the upper quartile.
Values that fall outside the 1.5 × IQR range are called outliers. Values that fall outside the 3 × IQR range are called extreme outliers.
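The numerical part of these steps (everything except the drawing) can be sketched in Python. The slides do not say which quartile convention is intended, so this sketch assumes that the lower and upper quartiles are the medians of the lower and upper halves of the ranked data; other conventions give slightly different quartiles. The function name `box_plot_summary` is hypothetical, and the demonstration data are the ignition times listed in Example 2 of this lesson.

```python
from statistics import median


def box_plot_summary(values):
    """Quartiles, IQR, and outliers for a box plot.

    Assumption: the lower and upper quartiles are the medians of the lower
    and upper halves of the ranked data (the overall median is left out of
    both halves when n is odd). Needs at least two observations.
    """
    ranked = sorted(values)                        # Step 1
    n = len(ranked)
    half = n // 2
    q1 = median(ranked[:half])                     # Step 2
    q2 = median(ranked)
    q3 = median(ranked[n - half:])
    iqr = q3 - q1                                  # Step 3
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)       # Step 5: 1.5 x IQR range
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)           # 3 x IQR range
    extreme = [x for x in ranked if x < outer[0] or x > outer[1]]
    outliers = [x for x in ranked
                if (x < inner[0] or x > inner[1]) and x not in extreme]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr,
            "outliers": outliers, "extreme_outliers": extreme}


times = [1.75, 1.91, 1.92, 2.35, 2.53, 2.62, 3.09, 5.15]
summary = box_plot_summary(times)
for key, value in summary.items():
    print(key, value)
```

Under this quartile convention the value 5.15 lies beyond the 1.5 × IQR range but inside the 3 × IQR range, so it is reported as an outlier and not as an extreme outlier.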

WHAT IS IMPORTANT IN A GRAPHICAL DISPLAY:

- Shape of the distribution (symmetrical or non-symmetrical)
- Presence of outliers


Example 1:
The cold start ignition times of an automobile engine obtained for a test vehicle are as follows:
1.75 1.91 1.92 2.35 2.53 2.62 3.09 3.15
a) Calculate the sample median, the quartiles, and the IQR.
b) Construct a box plot of the data. Comment on your plot.

Example 2:
The cold start ignition times of an automobile engine obtained for a test vehicle are as follows:
1.75 1.91 1.92 2.35 2.53 2.62 3.09 5.15
Construct a box plot of the data.

Example 3:
Given a list of marks on a recent quiz for 10 students:
51, 47, 55, 49, 55, 46, 55, 89, 51, 54
i) Determine the mode, the median, and the mean of the data.
ii) Construct a box plot of the data and comment on the plot.
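Part (i) can be checked with Python's standard `statistics` module. This is a quick sketch for verifying hand calculations, not a substitute for working the example; `multimode` is used so that ties for the most frequent value would all be reported.

```python
from statistics import mean, median, multimode

# Quiz marks for the 10 students in Example 3
marks = [51, 47, 55, 49, 55, 46, 55, 89, 51, 54]

print("mode(s):", multimode(marks))   # most frequent value(s)
print("median:", median(marks))       # average of the 5th and 6th ranked marks
print("mean:", mean(marks))           # sum of the marks divided by 10
```

For these marks the mean (55.2) is higher than the median (52.5) because the single mark of 89 pulls the mean upward, which is worth keeping in mind when commenting on the box plot in part (ii).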
