
SUMMARIZATION OF DATA

• What are statistics and biostatistics?
• What is data?
• Data types
• Descriptive statistics
• Frequency tables
• Graphical techniques

MEASURES OF CENTRAL TENDENCY
One of the most useful summary numbers is the central value, which tells us the middle or average value of a data set. It can be calculated as the:

• Mean
• Median
• Mode



Criteria of a Satisfactory Central Value:


• It should be based on all observations.
• It should be quick and easy to calculate.
• It should not be unduly influenced by abnormally large or small values.
• It should be relatively stable in repeated samples.

MEAN
• The MEAN (or arithmetic mean) is also known as the AVERAGE. It is calculated by totaling the results of all the observations and dividing by the total number of observations.

Mean = Sum of all observations / Number of observations, i.e. $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$

MEAN (cont'd)

• Can only be calculated for numerical data, e.g. heights of 7 women: 141, 141, 143, 144, 145, 146, 155 cm. Total = 1015 cm; Mean = 1015/7 = 145 cm.
• Is affected by extreme values, e.g. if the tallest value is 185 cm instead of 155 cm: 141, 141, 143, 144, 145, 146, 185 cm. Total = 1045 cm; Mean = 1045/7 ≈ 149.3 cm.
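The two mean calculations above can be reproduced with a minimal Python sketch. The function name `mean` is our own illustrative choice, not part of the slides; note that 1045/7 is 149.29 before rounding.

```python
def mean(values):
    """Arithmetic mean: sum of all observations / number of observations."""
    if not values:
        raise ValueError("mean requires at least one observation")
    return sum(values) / len(values)


heights = [141, 141, 143, 144, 145, 146, 155]        # cm, from the example
with_extreme = [141, 141, 143, 144, 145, 146, 185]   # tallest value replaced by 185 cm

print(mean(heights))                  # 145.0
print(round(mean(with_extreme), 1))   # 149.3
```

Changing a single value shifts the mean by more than 4 cm, which is the sensitivity to extreme values described above.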

MEDIAN
The MEDIAN is the value that divides a distribution into two equal halves.

• The median is useful when some measurements are much bigger or much smaller than the rest, because the mean of such data will be biased toward these extreme values.
• The median is not influenced by extreme values.

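MEDIAN (cont'd)
• First arrange the data in order. If the number of observations n is odd, the median is the middle value, at position (n+1)/2. If n is even, the median is the average of the two middle values, at positions n/2 and n/2 + 1.
• e.g. weights of 7 pregnant women (in kg): 40, 41, 42, 43, 44, 47, 72. Position of the median = (7+1)/2 = 4th value, i.e. median = 43 kg.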
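The odd/even rule for the median can be sketched in Python as follows. This is an illustrative implementation (the name `median` is ours); the even-length call simply reuses the first six weights to show the averaging case and is not data from the slides.

```python
def median(values):
    """Median of a list of numbers (works on a sorted copy; input is not modified)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # odd n: the ((n + 1) / 2)th value in 1-based counting is ordered[mid]
        return ordered[mid]
    # even n: average of the (n/2)th and (n/2 + 1)th values
    return (ordered[mid - 1] + ordered[mid]) / 2


weights = [40, 41, 42, 43, 44, 47, 72]   # kg, from the example
print(median(weights))       # 43
print(median(weights[:6]))   # 42.5  (illustrative even-length case)
```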

Mean & Median


When there are no extreme values in a sample, the mean and the median of the sample will be close in value.

MODE
Most frequently occurring value in a set of observations, e.g. 2, 5, 7, 5, 8, 5, 3, 6, 7, 9: the value 5 occurs three times, more often than any other value, so mode = 5.
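Counting how often each value occurs is all the mode requires. The sketch below uses `collections.Counter` from the Python standard library; the helper name `modes` is our own, and it returns a list because a data set can have more than one most frequent value.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, in sorted order."""
    if not values:
        raise ValueError("mode requires at least one observation")
    counts = Counter(values)               # value -> number of occurrences
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


data = [2, 5, 7, 5, 8, 5, 3, 6, 7, 9]
print(modes(data))   # [5]
```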

Symmetric distribution:
• A symmetrical distribution has the same shape on both sides of its center: equal numbers of values lie above and below the mean, which is located at the peak of the curve.
• The mean and median are equal (mean = median).

Skewed distributions:
Skewness is the degree to which a distribution departs from symmetry. A positively skewed distribution has a "tail" that is pulled in the positive direction; a negatively skewed distribution has a "tail" that is pulled in the negative direction.
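One quick way to see a positive tail at work is to compare the mean and median of the pregnant-women weights from the median example, where 72 kg sits far above the rest. This is a sketch using Python 3.8+ `statistics.fmean` and `statistics.median`. The "mean above median suggests positive skew" check is a common rule of thumb, not a formal skewness measure and not guaranteed for every distribution.

```python
import statistics

weights = [40, 41, 42, 43, 44, 47, 72]   # kg; 72 kg forms a long right-hand tail

mean_kg = statistics.fmean(weights)      # 329 / 7 = 47.0
median_kg = statistics.median(weights)   # middle (4th) value = 43

# Rule of thumb only: the tail pulls the mean toward it, away from the median.
if mean_kg > median_kg:
    hint = "suggests positive skew"
elif mean_kg < median_kg:
    hint = "suggests negative skew"
else:
    hint = "no skew suggested"

print(mean_kg, median_kg, hint)   # 47.0 43 suggests positive skew
```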


So, Mean, Median or Mode?

• The mean is used for numerical data with a symmetrical distribution.
• The median is used for ordinal data, or for numerical data whose distribution is skewed.
• The mode is used primarily when measurements are repeated, i.e. when the same value occurs several times.

MEASURES OF VARIATION
• Range
• Standard Deviation
• Quartiles
• Percentiles
• Coefficient of Variation


Range:
The range is the difference between the highest (maximum) and the lowest (minimum) observation, e.g. heights of 7 women are 142, 141, 143, 144, 145, 146, 155 cm. Range = 155 − 141 = 14 cm.
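The range definition translates directly into code. In this minimal sketch the function is named `data_range` (our own choice) so that it does not shadow Python's built-in `range`.

```python
def data_range(values):
    """Range = highest (maximum) observation minus lowest (minimum) observation."""
    if not values:
        raise ValueError("range requires at least one observation")
    return max(values) - min(values)


heights = [142, 141, 143, 144, 145, 146, 155]   # cm, from the example
print(data_range(heights))   # 14
```

Because only the two extreme values enter the calculation, a single outlier changes the range completely.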


Standard Deviation
• The STANDARD DEVIATION is a measure that describes how much individual measurements differ, on average, from the mean.
• A large standard deviation shows that there is a wide scatter of measured values around the mean, while a small standard deviation shows that the individual values are concentrated around the mean with little variation among them.


STANDARD DEVIATION (SD) (cont'd)

$SD = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n}}$, where $\bar{X}$ is the mean and $n$ is the number of observations.

Steps to calculate SD:
1. Calculate the mean of all observations.
2. Calculate the difference between each individual measurement and the mean.
3. Square all these differences.
4. Take the sum of all squared differences.
5. Divide this sum by the number of measurements.
6. Finally, take the square root of this value.
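The six steps map one-to-one onto the lines of a short Python function. This is a sketch (the name `standard_deviation` is ours) that, like the steps, divides by n. It is applied here to the 20 values used in the worked example on the next slide.

```python
import math


def standard_deviation(values):
    """SD following the six listed steps (divides by n, not n - 1)."""
    n = len(values)
    if n == 0:
        raise ValueError("SD requires at least one observation")
    mean = sum(values) / n                     # step 1: mean
    deviations = [x - mean for x in values]    # step 2: differences from the mean
    squared = [d * d for d in deviations]      # step 3: square each difference
    total = sum(squared)                       # step 4: sum of squared differences
    variance = total / n                       # step 5: divide by number of measurements
    return math.sqrt(variance)                 # step 6: square root


X = [3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 9, 10, 10, 11]
print(round(standard_deviation(X), 2))   # 2.31
```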

Example: Standard Deviation


Mean X̄ = 127/20 = 6.35

X     X − X̄    (X − X̄)²
3     −3.35    11.22
3     −3.35    11.22
4     −2.35     5.52
4     −2.35     5.52
4     −2.35     5.52
5     −1.35     1.82
5     −1.35     1.82
5     −1.35     1.82
6     −0.35     0.12
6     −0.35     0.12
6     −0.35     0.12
6     −0.35     0.12
7      0.65     0.42
7      0.65     0.42
8      1.65     2.72
8      1.65     2.72
9      2.65     7.02
10     3.65    13.32
10     3.65    13.32
11     4.65    21.62
Sum: 127   0   106.55

(Squared deviations are shown rounded to two decimals; the sum 106.55 uses the unrounded values.)

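• Standard deviation:

$SD = \sqrt{\dfrac{\sum (X - \bar{X})^2}{n}} = \sqrt{\dfrac{106.55}{20}} = \sqrt{5.33} \approx 2.31$

Note that 106.55/20 = 5.33 is the variance; the SD is its square root, about 2.31.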
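Python's standard `statistics` module (3.x) can check this example. A hedged sketch: `pvariance` and `pstdev` divide by n, as in the steps above, while `stdev` divides by n − 1, the version usually reported for a sample. The slides do not discuss the n − 1 form; it is shown only for comparison.

```python
import statistics

X = [3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 9, 10, 10, 11]

variance_n = statistics.pvariance(X)    # sum of squared deviations / n
sd_n = statistics.pstdev(X)             # square root of the above
sd_n_minus_1 = statistics.stdev(X)      # divides by n - 1 instead (sample SD)

print(f"variance (divide by n): {variance_n:.4f}")    # 5.3275
print(f"SD (divide by n):       {sd_n:.2f}")          # 2.31
print(f"SD (divide by n - 1):   {sd_n_minus_1:.2f}")  # 2.37
```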


QUARTILES
• Quartiles are the points that divide the ordered data into four equal parts.
• e.g. the points below which 25% and 50% of the values of the distribution lie are called the first and second quartiles; the point below which 75% lie is the third quartile.
• The second quartile is equal to the median of the data.
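Python 3.8+ provides `statistics.quantiles`, which returns the three quartile cut points when called with `n=4`. Textbooks and software use slightly different interpolation rules for quartiles, so other tools may give somewhat different Q1 and Q3. This sketch assumes the `"inclusive"` method and reuses the 20 values from the SD example.

```python
import statistics

X = [3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 9, 10, 10, 11]

# Three cut points dividing the ordered data into four equal parts.
q1, q2, q3 = statistics.quantiles(X, n=4, method="inclusive")
print(q1, q2, q3)             # 4.75 6.0 8.0
print(statistics.median(X))   # 6.0, the same as the second quartile
```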


PERCENTILES:
• Points that divide all the measurements into 100 equal parts, e.g.
  • 3rd percentile (P3): the value below which 3% of measurements lie.
  • 50th percentile (P50), or median: the value below which 50% of measurements lie.
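The same standard-library function produces the 99 percentile cut points when called with `n=100`. The helper `percentile` below is hypothetical (not a library function) and assumes the `"inclusive"` interpolation method. With only 20 observations, extreme percentiles such as P3 are rough estimates.

```python
import statistics


def percentile(values, p):
    """Hypothetical helper: the p-th percentile (1 <= p <= 99)."""
    if not 1 <= p <= 99:
        raise ValueError("p must be between 1 and 99")
    cut_points = statistics.quantiles(values, n=100, method="inclusive")  # 99 points
    return cut_points[p - 1]


X = [3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 8, 8, 9, 10, 10, 11]
print(percentile(X, 3))    # 3.0
print(percentile(X, 50))   # 6.0, equal to the median
```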


COEFFICIENT OF VARIATION (C.V.)


• Ratio of the SD to the mean, expressed as a percentage:

$CV = \dfrac{SD}{\text{mean}} \times 100\%$

• CV is used to overcome the difficulty of comparing frequency distributions measured in different units or with widely different means.
• CV depicts the size of variation relative to the mean.
• It is a measure that is independent of the units of measurement.
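The CV formula as a small Python function, a sketch with a name of our own choosing. It is checked against the adult and child height figures from the example that follows. The guard against a zero mean reflects that the ratio is undefined there.

```python
def coefficient_of_variation(sd, mean):
    """CV = SD / mean x 100, returned as a percentage."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return sd / mean * 100


print(f"{coefficient_of_variation(10, 160):.2f}%")   # 6.25%
print(f"{coefficient_of_variation(5, 60):.2f}%")     # 8.33%
```

Because SD and mean share the same unit, the unit cancels, which is why CV can compare characteristics measured in different units.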


EXAMPLE:
In two series, one of adults and one of children, the following values were obtained for height. Which series shows greater variation?

Persons        Adults    Children
Mean height    160 cm    60 cm
SD             10 cm     5 cm


EXAMPLE (cont'd):

CV for adults = 10/160 × 100 = 6.25%
CV for children = 5/60 × 100 = 8.33%

Conclusion: Thus, we find that heights in children show greater variation than in adults.


EXAMPLE 2:

In a sample of boys, systolic blood pressure (SBP) and weight were measured as follows. Which characteristic shows greater variation?

Characteristic    SBP         Weight
Mean              120 mmHg    60 kg
SD                10 mmHg     4 kg


Solution of Example 2


CV of SBP = 10/120 × 100 = 8.33%
CV of weight = 4/60 × 100 = 6.67%

Conclusion: Thus, SBP is a more variable characteristic than weight, by a factor of 8.33/6.67 ≈ 1.25.
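When several characteristics are compared, the CVs can be computed together and the most variable one picked out. This is a minimal sketch using Example 2's SBP and weight summaries; `cv_percent` and the dictionary layout are our own illustrative choices.

```python
def cv_percent(sd, mean):
    """Coefficient of variation in percent (SD / mean x 100)."""
    return sd / mean * 100


# (mean, SD) for each characteristic, from Example 2
characteristics = {"SBP": (120, 10), "weight": (60, 4)}

cvs = {name: cv_percent(sd, mean) for name, (mean, sd) in characteristics.items()}
most_variable = max(cvs, key=cvs.get)   # characteristic with the largest CV
ratio = cvs["SBP"] / cvs["weight"]

for name, value in cvs.items():
    print(f"CV of {name}: {value:.2f}%")
print(most_variable, round(ratio, 2))   # SBP 1.25
```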
