
MEASURES OF CENTRAL TENDENCY

Central tendency is the middle point of a distribution, and measures of central tendency describe a data set in terms of the central location of its values. The three main measures of central tendency are the mean (average), the median and the mode. They can be calculated for both ungrouped and grouped data, and the formulae differ accordingly. Please refer to the slides for the relevant formulae.

Ungrouped data

Ungrouped data is raw data that has not been organised into groups; it is simply a list of numbers. Examples include the daily prices of stocks listed on stock exchanges such as the Bombay Stock Exchange (BSE) or the National Stock Exchange (NSE), monthly values of the Wholesale Price Index (WPI), and the monthly wages of workers.

Example: The marks of seven students in an economics test (out of a total of 20 marks) are:

Roll No.   Marks in Economics (out of 20)
1          19
2          18
3          15
4          13
5          17
6          12
7          11

Grouped data

Data that has been organised into groups, as a frequency distribution, is grouped data. For example, a large set of ungrouped monthly WPI data (e.g. 1,000 months) can be grouped into class intervals of index values.

Example: The frequency distribution of marks obtained by 200 students in an economics test (out of 20 marks):

Marks interval   Number of students
1-5              59
6-10             39
11-15            42
16-20            60
Total            200
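The idea of grouping can be sketched in a few lines of Python. This is only an illustration: the helper names frequency_table and grouped_mean are hypothetical, the first class is read as 1-5, and the grouped mean uses the common class-midpoint approximation (every value in a class is treated as the class midpoint), which is assumed here because the slides' exact formula is not reproduced in these notes.

```python
def frequency_table(values, intervals):
    # Count how many values fall in each closed class interval [lo, hi].
    return [(lo, hi, sum(1 for v in values if lo <= v <= hi)) for lo, hi in intervals]

def grouped_mean(intervals, frequencies):
    # Hypothetical helper using the midpoint approximation:
    # each observation in a class is represented by the class midpoint.
    midpoints = [(lo + hi) / 2 for lo, hi in intervals]
    total = sum(frequencies)
    return sum(m * f for m, f in zip(midpoints, frequencies)) / total

# Class intervals from the grouped-data example (marks out of 20).
INTERVALS = [(1, 5), (6, 10), (11, 15), (16, 20)]

# Grouping the seven ungrouped marks into these classes.
marks = [19, 18, 15, 13, 17, 12, 11]
for lo, hi, count in frequency_table(marks, INTERVALS):
    print(f"{lo}-{hi}: {count}")

# Estimated mean of the 200-student frequency distribution.
frequencies = [59, 39, 42, 60]
print(grouped_mean(INTERVALS, frequencies))
```

Grouping the seven marks puts four students in 11-15 and three in 16-20; for the 200-student table the midpoint estimate of the mean works out to 2115/200 = 10.575 marks.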

Arithmetic Mean (or Average)

The arithmetic mean is the average of a set of numbers. It is found by dividing the sum of the values by the number of values in the set, and it can be calculated for both ungrouped and grouped data.

Median

The median is the value that divides a numerical data set so that equal numbers of values lie above and below it. Before calculating the median of ungrouped data, the values must be arranged in ascending order. When the number of values is even, the median is the average of the two middle values.

Mode

The mode is the value that occurs most frequently in a data set.

Relationship between Mean, Median and Mode

When the mean, median and mode of a data set coincide (mean = median = mode), the distribution of the data is symmetric, i.e. evenly balanced around its centre. Everyday examples of symmetry include one's reflection in a mirror, which reproduces one's appearance exactly, or a giant wheel that looks the same as it rotates. Similarly, in statistics a symmetric distribution is one whose two halves on either side of the centre mirror each other.

The relationship between the mean, median and mode can also be used to determine whether the data is asymmetric, or skewed. When the mean is greater than the median, which in turn is greater than the mode (mean > median > mode), the distribution is positively skewed: its longer tail points in the positive direction (i.e. to the right). For example, if a test was difficult and almost everyone in the class performed poorly, with only a few high scores, the resulting distribution would most likely be positively skewed. When the mean is less than the median, which in turn is less than the mode (mean < median < mode), the distribution is negatively skewed: its longer tail points in the negative direction (i.e. to the left). For example, if most students performed well in an essay test while very few performed poorly, the distribution would be negatively skewed.

For a moderately skewed distribution, the empirical relationship between the mean, median and mode is:

Mean - Mode = 3(Mean - Median)

MEASURES OF DISPERSION

Dispersion refers to the variation of values within a data set. Measures of dispersion therefore indicate how much the values in a distribution differ from one another and from the centre. They build on the measures of central tendency, and the tools considered here are the range, the interquartile range, the variance and the standard deviation.

Range

The range is the quickest measure of dispersion to calculate: it is the difference between the maximum (highest) and minimum (lowest) values in a data set. However, because it uses only these two values, the range ignores how the rest of the data is distributed and can give a distorted or incomplete picture of the data.

Interquartile range

The interquartile range extends the idea of the range by using the quartiles of a data set. The quartiles are three points that divide the ordered data into four equal parts: the first quartile (Q1) marks off the lowest 25% of the data, the second quartile (Q2, which is the median) the lowest 50%, and the third quartile (Q3) the lowest 75%. The interquartile range is the difference between Q3 and Q1 (IQR = Q3 - Q1). It summarises the spread of the middle 50% of the values, i.e. the variation around the median.
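The definitions above can be sketched in plain Python and checked against the data in these notes. This is a minimal illustration, not the slides' method: the function names are hypothetical, the skewness check simply applies the mean-median-mode ordering, the six-value right_tail list is made-up data, and quartiles follows the convention used in the worked examples that follow (Q1 and Q3 are medians of the two halves, with the median included in both halves when the count is odd).

```python
from collections import Counter

def mean(values):
    # Arithmetic mean: sum of the values divided by the number of values.
    return sum(values) / len(values)

def median(values):
    # Arrange in ascending order, then take the middle value,
    # or the average of the two middle values when the count is even.
    s = sorted(values)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def modes(values):
    # Every value that occurs with the greatest frequency; an empty list
    # means no value repeats, so the data set has no mode.
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

def skew_direction(mean_, median_, mode_):
    # Classify using the mean-median-mode ordering.
    if mean_ > median_ > mode_:
        return "positively skewed"
    if mean_ < median_ < mode_:
        return "negatively skewed"
    if mean_ == median_ == mode_:
        return "symmetric"
    return "no clear ordering"

def value_range(values):
    # Range: highest value minus lowest value.
    return max(values) - min(values)

def quartiles(values):
    # Q1 and Q3 as medians of the lower and upper halves of the sorted data;
    # when the count is odd, the median is included in both halves.
    s = sorted(values)
    n = len(s)
    lower = s[:(n + 1) // 2]
    upper = s[n // 2:]
    return median(lower), median(upper)

marks = [19, 18, 15, 13, 17, 12, 11]   # ungrouped-data example
print(mean(marks), median(marks), modes(marks), value_range(marks))

right_tail = [1, 2, 2, 3, 4, 9]        # hypothetical data with one high value
print(skew_direction(mean(right_tail), median(right_tail), modes(right_tail)[0]))

q1, q3 = quartiles([6, 47, 49, 15, 42, 41, 7, 39, 43, 40, 36])
print(q1, q3, q3 - q1)
```

For the seven economics marks the mean and median are both 15, no mark repeats (so there is no mode) and the range is 19 - 11 = 8. In the hypothetical right_tail data the single high value pulls the mean (3.5) above the median (2.5), which is above the mode (2), so it is classified as positively skewed.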
However, like the range, the interquartile range provides incomplete information about the data.

Example (with an even number of observations):
Data set: 59, 60, 64, 67, 68, 69, 70, 71, 72, 73 (already in ascending order)
Step 1: Split the data into two halves: the 1st half is 59, 60, 64, 67, 68 and the 2nd half is 69, 70, 71, 72, 73.
Step 2: Identify the mid-point of each half: 64 in the 1st half and 71 in the 2nd half.
Step 3: Q1 = 64 and Q3 = 71, so the interquartile range is 71 - 64 = 7.

Example (with an odd number of observations):
Data set: 6, 47, 49, 15, 42, 41, 7, 39, 43, 40, 36
Data in ascending order: 6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49
Step 1: Find the median: the 6th of the 11 observations, which is 40.
Step 2: Find Q1 = 25.5. The lower half, from the 1st observation 6 up to and including the median 40, is 6, 7, 15, 36, 39, 40; its mid-point is (15+36)/2 = 25.5.
Step 3: Find Q3 = 42.5. The upper half, from the median 40 up to the last observation 49, is 40, 41, 42, 43, 47, 49; its mid-point is (42+43)/2 = 42.5.
The interquartile range is 42.5 - 25.5 = 17. (This method includes the median in both halves; methods that exclude it give Q1 = 15 and Q3 = 43 for the same data.)

Variance and Standard Deviation

The variance and standard deviation describe how far the observations in a data set lie from the mean (average). The variance is the average of the squared deviations of the data points from their mean, i.e. the sum of the squared deviations divided by the number of observations (for a sample, the sum is usually divided by n - 1 instead). The standard deviation is the square root of the variance. Because it is expressed in the same units as the data, it indicates the typical size of the deviation from the average of the data set. For example, if the average wage earned by a group of 100 workers is Rs 20,000 per month and the standard deviation is Rs 5,000, then workers' wages typically lie about Rs 5,000 above or below the average wage. Here the standard deviation measures the level of disparity in wages among the 100 workers.

To measure how far an individual worker's wage deviates from the average, we calculate the standard score: the difference between that worker's wage and the average wage across all workers, divided by the standard deviation, i.e. standard score = (wage - average wage) / standard deviation.
For example, if a worker's wage is Rs 17,000, the standard score is (17,000 - 20,000)/5,000 = -0.6. This means the wage lies 0.6 standard deviations below the mean: -0.6 multiplied by the standard deviation of Rs 5,000 gives -Rs 3,000, i.e. the wage is Rs 3,000 less than the average wage.

Chebyshev's Theorem

Please refer to the slides for the explanation.
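The variance, standard deviation and standard-score calculations can be sketched as follows. The helper names are hypothetical, the variance of the seven economics marks is computed only for illustration, and chebyshev_min_share encodes the standard textbook statement of Chebyshev's theorem (at least 1 - 1/k^2 of any data set lies within k standard deviations of the mean, for k > 1), which is assumed here since its explanation is left to the slides.

```python
import math

def population_variance(values):
    # Average of the squared deviations from the mean (divide by n).
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

def sample_variance(values):
    # Sample version: divide the same sum of squares by n - 1.
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

def standard_deviation(values, sample=False):
    # Square root of the variance, in the same units as the data.
    var = sample_variance(values) if sample else population_variance(values)
    return math.sqrt(var)

def standard_score(x, mean, sd):
    # How many standard deviations x lies above (+) or below (-) the mean.
    return (x - mean) / sd

def chebyshev_min_share(k):
    # Assumed standard statement of Chebyshev's theorem: for any data set,
    # at least 1 - 1/k**2 of the observations lie within k standard
    # deviations of the mean, for k > 1.
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

# Seven economics marks from the ungrouped-data example.
marks = [19, 18, 15, 13, 17, 12, 11]
print(population_variance(marks))   # 58/7, about 8.29
print(standard_deviation(marks))    # about 2.88

# Wage example: mean Rs 20,000, standard deviation Rs 5,000.
z = standard_score(17000, 20000, 5000)
print(z, z * 5000)

# k = 2: at least 75% of workers earn between Rs 10,000 and Rs 30,000.
print(chebyshev_min_share(2))
```

The marks deviate from their mean of 15 by 4, 3, 0, -2, 2, -3 and -4, whose squares sum to 58; dividing by n = 7 or by n - 1 = 6 gives the population or sample variance respectively.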

Coefficient of Variation (CV)

The coefficient of variation is a relative measure of dispersion used to compare data sets that have different means and standard deviations. It is calculated as the standard deviation divided by the mean, multiplied by 100. The CV therefore measures the amount of variation relative to the mean, which makes groups with different means comparable. Suppose a teacher wishes to compare the relative variation in marks (out of 100) in the Business Environment subject for two classes of students, Class A and Class B. Class A's average mark is 40 with a standard deviation of 5, whereas Class B's average mark is 70 with a standard deviation of 7.

Coefficient of Variation for Class A = (5/40)*100 = 12.5%
Coefficient of Variation for Class B = (7/70)*100 = 10.0%

Class B shows less relative variation in marks than Class A (10.0% versus 12.5%): although its standard deviation is larger in absolute terms (7 versus 5), it is smaller relative to its higher average mark.
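The Class A / Class B comparison can be reproduced with a short sketch; coefficient_of_variation is a hypothetical helper name. Multiplying by 100 before dividing keeps both results exact for these inputs.

```python
def coefficient_of_variation(sd, mean):
    # Standard deviation expressed as a percentage of the mean.
    return sd * 100 / mean

cv_a = coefficient_of_variation(5, 40)  # Class A
cv_b = coefficient_of_variation(7, 70)  # Class B

# The class with the smaller CV has less variation relative to its mean.
if cv_a < cv_b:
    less_variable = "Class A"
elif cv_b < cv_a:
    less_variable = "Class B"
else:
    less_variable = "neither (equal CVs)"

print(f"CV A = {cv_a:.1f}%, CV B = {cv_b:.1f}%, less relative variation: {less_variable}")
```

The comparison is made on the CVs rather than on the raw standard deviations, which would wrongly suggest Class B (SD 7) is more variable than Class A (SD 5).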
