
Central Tendency

Any set of measurements has two important properties: the central or typical value and the spread about that value. A measure of central tendency for a collection of data values is a number that is meant to convey the idea of centralness for the data set. The most commonly used measures of central tendency are the mean, the median, and the mode.

Mean
The mean for a population is denoted by $\mu$ (mu), while the mean of a sample is represented by $\bar{X}$. The mean, or average, is obtained by adding up all the scores and dividing the result by the number of scores:

$$\mu = \frac{\sum X}{N}$$

where

$\mu$ = population mean

$\sum X$ = sum of the X scores (see lesson 1)

$N$ = total number of scores in the population

or,

$$\bar{X} = \frac{\sum X}{n}$$

where

$\bar{X}$ = sample mean

$\sum X$ = sum of the X scores (see lesson 1)

$n$ = total number of scores in the sample

Example 1 The following data give the IQ scores of five UWF students:

90, 110, 105, 95, 85

Find the mean IQ for these students.

Solution: The variable in this example is the IQ of students. Let's denote it by X. Then the five values of X are

$X_1 = 90$, $X_2 = 110$, $X_3 = 105$, $X_4 = 95$, $X_5 = 85$
by C. Pearson & W. Moomaw 2002. All rights reserved Basic/Central Tendency/Dr. Carolyn Pearson

Here $X_1$ represents the IQ of the first student, $X_2$ represents the IQ of the second student, and so forth. The sum of the IQs of these five students is:

$$\sum X = X_1 + X_2 + X_3 + X_4 + X_5 = 90 + 110 + 105 + 95 + 85 = 485$$

Note that the given information covers only five students. Therefore, it represents a sample. Because the data contain five values, $n = 5$. Substituting the values of $\sum X$ and $n$ into the sample formula, the mean IQ of these five students is

$$\bar{X} = \frac{\sum X}{n} = \frac{485}{5} = 97$$

Thus, the five students had an average IQ of 97, which according to Figure 1 would place the mean IQ in the average range.
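The arithmetic in Example 1 can be checked with a few lines of Python. This is only a minimal sketch: the function name `sample_mean` and the variable names are chosen here for illustration and do not come from the lesson.

```python
# IQ scores of the five students in Example 1 (a sample, so n = 5).
scores = [90, 110, 105, 95, 85]


def sample_mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)


total = sum(scores)            # sum of X
n = len(scores)                # number of scores in the sample
mean_iq = sample_mean(scores)  # sum of X divided by n

print(total, n, mean_iq)       # 485 5 97.0
```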

Standard Deviation    IQ            Percent of Population
-4 to -2              40 to 70      2.27% in the Mentally Impaired range
-2 to -1              70 to 85      13.59% Borderline
-1 to +1              85 to 115     68.26% in the Average range
+1 to +2              115 to 130    13.59% High Average
+2 to +4              130 to 160    2.27% in the "Gifted" range

Figure 1 The distribution of IQ scores in the general population.

The value of $\bar{X}$ for a particular sample depends on which values of the population are included in that sample. Sometimes a data set may contain a few very small or a few very large values. Such values are called outliers. A major shortcoming of the mean as a measure of central tendency is that it is very sensitive to outliers. Example 2 illustrates this point.

Example 2 The following data give the IQ scores of five college students:

90, 25, 105, 95, 85

Find the mean IQ for these students.



Solution: The variable in this example is the IQ of students. Let's denote it by X. Then the five values of X are

$X_1 = 90$, $X_2 = 25$, $X_3 = 105$, $X_4 = 95$, $X_5 = 85$

where $X_1$ represents the IQ of the first student, $X_2$ represents the IQ of the second student, and so forth. The sum of the IQs of these five students is:

$$\sum X = X_1 + X_2 + X_3 + X_4 + X_5 = 90 + 25 + 105 + 95 + 85 = 400$$

Note that the given information covers only five students. Therefore, it represents a sample. Because the data contain five values, $n = 5$. Substituting the values of $\sum X$ and $n$ into the sample formula, the mean IQ of these five students is

$$\bar{X} = \frac{\sum X}{n} = \frac{400}{5} = 80$$

Thus, the five students had an average IQ of 80, which according to Figure 1 would place the students' mean IQ in the borderline range. The only difference from Example 1 is that one score of 110 has become an outlier of 25, yet the mean has dropped from 97 to 80. This example encourages us to be cautious. We should remember that the mean is not always the best measure of central tendency because it is heavily influenced by outliers. Sometimes other measures of central tendency give a more accurate impression of a population or sample.
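To make the effect of the outlier concrete, the sketch below compares the data from Example 1 and Example 2. It also reports the median (defined later in this lesson) as one of the "other measures", using `statistics.median` from the Python standard library. The helper `mean` and the variable names are illustrative choices, not part of the lesson.

```python
from statistics import median


def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


example_1 = [90, 110, 105, 95, 85]  # Example 1
example_2 = [90, 25, 105, 95, 85]   # Example 2: one score is the outlier 25

mean_1, mean_2 = mean(example_1), mean(example_2)
median_1, median_2 = median(example_1), median(example_2)

print(mean_1, mean_2)      # 97.0 80.0  (the mean falls by 17 points)
print(median_1, median_2)  # 95 90      (the median falls by only 5 points)
```

One extreme score moves the mean much further than it moves the median, which is the caution the example is making.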

Figure 1 (Repeated) The distribution of IQ scores in the general population.

Example 3 Find the mean of the IQ scores for the following frequency table, to one decimal place.

Table 3 Frequency distribution for student IQ

IQ (X)    Frequency (f)
125       1
120       1
115       4
110       8
105       6
100       10
95        6
90        5
85        3
80        1
Total     45

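Solution: Notice that the frequencies for the different IQ scores are not the same. You cannot simply add up the ten different IQ values and divide by 10 (1025/10 = 102.5). The frequency of each score must be considered: multiply each score by its frequency, add the products, and divide by the total number of scores $n$ (the sum of the frequencies).

$$\bar{X} = \frac{\sum fX}{n} = \frac{125(1) + 120(1) + 115(4) + 110(8) + 105(6) + 100(10) + 95(6) + 90(5) + 85(3) + 80(1)}{1 + 1 + 4 + 8 + 6 + 10 + 6 + 5 + 3 + 1} = \frac{4570}{45} \approx 101.6$$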
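The frequency-weighted calculation can be sketched in Python as follows. The list-of-pairs layout for Table 3 and the variable names are assumptions made for this illustration.

```python
# Table 3 as (IQ score, frequency) pairs.
freq_table = [(125, 1), (120, 1), (115, 4), (110, 8), (105, 6),
              (100, 10), (95, 6), (90, 5), (85, 3), (80, 1)]

# n is the total number of scores, i.e. the sum of the frequencies.
n = sum(f for _, f in freq_table)

# Each score counts once per student who obtained it.
weighted_sum = sum(x * f for x, f in freq_table)

mean_iq = weighted_sum / n
print(n, weighted_sum, round(mean_iq, 1))  # 45 4570 101.6

# Ignoring the frequencies treats every IQ value as if it occurred once,
# which gives a different (and wrong) answer: 1025 / 10 = 102.5.
unweighted = sum(x for x, _ in freq_table) / len(freq_table)
```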

Example 4 Adding or subtracting a constant to a mean.

Use the IQ scores of the five college students from Example 1:

90, 110, 105, 95, 85

$$\bar{X} = \frac{\sum X}{n} = \frac{90 + 110 + 105 + 95 + 85}{5} = \frac{485}{5} = 97$$

Add 10 points to each of the scores. What does it do to the mean? The scores become:

100, 120, 115, 105, 95


$$\bar{X} = \frac{\sum X}{n} = \frac{100 + 120 + 115 + 105 + 95}{5} = \frac{535}{5} = 107$$

It is no accident that the new mean is 10 points higher than the original. If a constant $c$ is added to each score in a distribution that has a mean of $\bar{X}$, the revised scores will have a mean of $\bar{X} + c$. This principle holds for subtraction as well, with the revised mean becoming $\bar{X} - c$.

Example 5 Multiplying or dividing a mean by a constant.

Use the IQ scores and mean of the five college students from Example 1:

90, 110, 105, 95, 85

$$\bar{X} = \frac{\sum X}{n} = \frac{90 + 110 + 105 + 95 + 85}{5} = \frac{485}{5} = 97$$

Multiply each score by 2 (disregard the fact that these are not legitimate IQ scores). What does it do to the mean? The scores and mean become:

180, 220, 210, 190, 170

$$\bar{X} = \frac{\sum X}{n} = \frac{180 + 220 + 210 + 190 + 170}{5} = \frac{970}{5} = 194$$

Once again, it is no accident that the revised mean is twice the original mean. If every score in a distribution is multiplied by a constant $c$, the mean of the revised scores is $c\bar{X}$. Recall from algebra that dividing by $c$ is the same as multiplying by $1/c$, so dividing every score by $c$ gives a revised mean of $\bar{X}/c$.
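Both properties from Examples 4 and 5 can be verified numerically. The following is a small sketch using the Example 1 scores; the helper `mean` and the variable names are illustrative only.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


scores = [90, 110, 105, 95, 85]  # Example 1, mean 97
c = 10

shifted = [x + c for x in scores]  # add a constant to every score
doubled = [2 * x for x in scores]  # multiply every score by 2
halved = [x / 2 for x in scores]   # dividing by 2 is multiplying by 1/2

print(mean(scores))   # 97.0
print(mean(shifted))  # 107.0 = 97 + 10
print(mean(doubled))  # 194.0 = 2 * 97
print(mean(halved))   # 48.5  = 97 / 2
```

The same check works for any constant, because adding $c$ to each of the $n$ scores adds $nc$ to the sum, and multiplying each score by $c$ multiplies the sum by $c$.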

Median
The median is the value of the middle term in a data set that has been ranked in increasing order.

Example 6 Use the IQ scores of the five college students from Example 1:

90, 110, 105, 95, 85

Find the median IQ for these students.

Solution: First the scores must be ranked in increasing order:

85, 90, 95, 105, 110

When the number of values in the data set is odd, the median is the middle value in the ordered array. When the number of values in the data set is even, the median is the average of the two middle values in the ordered array. Our example has five values, so the median is the third value in the ordered array, 95.

Example 7 Find the median for the ages of the following eight high school students:

19, 14, 15, 18, 16, 15, 17, 18

Solution: First the ages must be ranked in increasing order:

14, 15, 15, 16, 17, 18, 18, 19

Since there is an even number of ages, the median is the average of the two middle numbers. The two middle numbers are located in the fourth and fifth positions, so

$$\text{Median} = \frac{16 + 17}{2} = 16.5$$
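The odd and even cases from Examples 6 and 7 can be sketched as a single Python function. The function below is written for illustration; Python's standard library also provides `statistics.median`, which follows the same rule.

```python
def median(values):
    """Middle value of the ranked data, or the average of the two
    middle values when the number of values is even."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)  # rank in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]   # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count


iq_scores = [90, 110, 105, 95, 85]        # Example 6
ages = [19, 14, 15, 18, 16, 15, 17, 18]   # Example 7

print(median(iq_scores))  # 95
print(median(ages))       # 16.5
```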

Mode
The mode is the value that occurs with the highest frequency in a data set.

Example 8 Find the mode for the ages of the following five high school students:

16, 15, 16, 17, 18

Solution: In this data set, 16 occurs twice and each of the other values occurs only once. Because 16 occurs with the highest frequency, it is the mode.

A major shortcoming of the mode is that a data set may not have a mode or may have more than one mode.

Example 9 A data set with no mode (every value occurs once):

14, 15, 16, 17, 18

Example 10 A data set with more than one mode (15 and 18 each occur twice):

14, 15, 15, 16, 17, 18, 18, 19
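The three situations in Examples 8 to 10 can be sketched with `collections.Counter` from the Python standard library. The function name `modes` is hypothetical, and the rule that a data set has no mode when every value occurs equally often is an assumption taken from Example 9.

```python
from collections import Counter


def modes(values):
    """Return all values tied for the highest frequency.

    Assumption (following Example 9): if there are several distinct
    values and all of them occur equally often, the data set has no
    mode, so an empty list is returned.
    """
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if len(counts) > 1 and highest == min(counts.values()):
        return []
    return sorted(v for v, count in counts.items() if count == highest)


print(modes([16, 15, 16, 17, 18]))              # [16]
print(modes([14, 15, 16, 17, 18]))              # []
print(modes([14, 15, 15, 16, 17, 18, 18, 19]))  # [15, 18]
```

Returning a list rather than a single number keeps the "no mode" and "more than one mode" cases explicit.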

Relationship between the Mean, Median, and Mode


Frequency distributions can take on three important shapes: symmetrical, positively skewed, and negatively skewed. Knowing the values of the mean, median, and mode can give us some idea about the shape of a frequency distribution. For a symmetric histogram and frequency curve with one peak (see Figure 2), the values of the mean, median, and mode are fairly close or often exactly the same, and they lie at the center of the distribution.

Figure 2 Mean, median, and mode for a symmetric histogram and frequency curve.

For a histogram and frequency curve skewed to the right (see Figure 3), the value of the mean is the largest, that of the mode is the smallest, and the value of the median lies between the two. (Notice that the mode always occurs at the peak point.) The value of the mean is the largest in this case because it is sensitive to outliers that occur in the right tail (positively skewed). These outliers pull the mean to the right.


Figure 3 Mean, median, and mode for a histogram and frequency curve positively skewed.

If a histogram and frequency curve is skewed to the left (see Figure 4), the value of the mean is the smallest and that of the mode the largest, and the value of the median lies between the two. In this case, the outliers in the left tail pull the mean to the left (negatively skewed).


Figure 4 Mean, median, and mode for a histogram and frequency curve negatively skewed.

Which measure of central tendency is best?

The mean is the most frequently used measure of central tendency. It is the most precise for inferential purposes, and it is the foundation for statistical concepts that will be introduced in later lessons. Because the mean is influenced by the value of every score in a distribution, it lies far from the bulk of the scores in extremely skewed distributions. Consequently, the mean is drawn toward the elongated tail more than is the median or mode. Hence, the median is usually the preferred measure of central tendency with highly skewed distributions. If the variables under consideration represent only a nominal scale, the only choice of measure of central tendency is the mode.
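The pull of the mean toward the longer tail can be illustrated by comparing the mean with the median. The sketch below is a rough heuristic only, not a formal test of skewness, and the function name `tail_direction` is hypothetical. It uses the Example 2 IQ scores and the household income data from the boxplot example later in this lesson.

```python
from statistics import mean, median


def tail_direction(values):
    """Rough heuristic: report which side the mean is pulled toward,
    relative to the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "right"  # mean above median: longer right tail suggested
    if m < med:
        return "left"   # mean below median: longer left tail suggested
    return "none"


iq_low_outlier = [90, 25, 105, 95, 85]  # Example 2
incomes = [23, 17, 32, 60, 22, 52, 29, 38, 42, 92, 27, 46]

print(tail_direction(iq_low_outlier))  # left  (mean 80, median 90)
print(tail_direction(incomes))         # right (mean 40, median 35)
```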


Figure 5 Selecting Among the Mean, Median, and Mode

This should hold you for a few minutes. The concepts of mean, median, and mode are extremely important in the field of statistics. If there is any confusion at this point, please review before proceeding with the rest of this week's material.

Box-And-Whisker Plots
The Box-and-Whisker plot (or Boxplot for short) is a simple and useful graphical display for summarizing and exploring a frequency distribution. It shows the range of the data and flags possible outliers. It shows the median as a measure of central location, together with the lower quartile, upper quartile, and interquartile range (IQR) as measures of dispersion, which is useful for comparing sets of data. It also gives an indication of the symmetry or skewness of the distribution. The main reason for the popularity of boxplots is that they offer a lot of information in a compact way.


Steps to Construct a Boxplot:

The boxplot is based on the interquartile range (IQR). Recall from Lesson 2 that quartiles are values that divide a given set of observations into four equal parts. These values are denoted by Q1, Q2, and Q3, where 25% of the data falls below Q1, 50% of the data falls below Q2, and 75% of the data falls below Q3. To find the IQR, divide the data into four equal groups (Data set #1, Data set #2, Data set #3, Data set #4):

1. Put the data in numerical order.
2. Divide the data into two equal High and Low groups at the median. (If the median is a data point, include it in both the high and low groups.)
3. Find the median of the low group. This is the first quartile (Q1).
4. Find the median of the high group. This is the third quartile (Q3).

Figure 6 The IQR is the distance between Q1 and Q3.

Once the interquartile range is determined, it is an easy task to convert it to a box-and-whisker plot.

5. Horizontal lines are drawn at the median and at the upper and lower quartiles. These horizontal lines are joined by vertical lines to produce the box.


6. A vertical line is drawn up from the upper quartile to the most extreme data point that is within a distance of 1.5 × IQR of the upper quartile. A similarly defined vertical line is drawn down from the lower quartile.

7. Each data point beyond the end of a vertical line is an outlier and is marked with an asterisk (*) or the case number.

Example 11 demonstrates the steps in making a boxplot.

Example 11 The following data give the income (in thousands of dollars) for a sample of 12 households:

23, 17, 32, 60, 22, 52, 29, 38, 42, 92, 27, 46

Construct a box-and-whisker plot for these data.

Solution:

1. Put the data in numerical order:

17 22 23 27 29 32 38 42 46 52 60 92

2. Divide the data into two equal High and Low groups at the median. (If the median is a data point, include it in both the high and low groups.)


$$\text{Median} = \frac{32 + 38}{2} = 35$$

The low group is 17, 22, 23, 27, 29, 32 and the high group is 38, 42, 46, 52, 60, 92.

3. Find the median of the low group (Q1):

$$\text{Median} = \frac{23 + 27}{2} = 25$$

4. The median of the high group is the third quartile (Q3):

$$\text{Median} = \frac{46 + 52}{2} = 49$$

5. Horizontal lines are drawn at the median and at the upper and lower quartiles. These horizontal lines are joined by vertical lines to produce the box.

6. A vertical line is drawn up from the upper quartile to the most extreme data point that is within a distance of 1.5 × IQR of the upper quartile. A similarly defined vertical line is drawn down from the lower quartile.

IQR = 49 - 25 = 24
1.5 × 24 = 36
Minimum lower vertical line = 25 - 36 = -11
Smallest value = 17; therefore, draw the lower vertical line to here
Maximum upper vertical line = 49 + 36 = 85
Largest value within the maximum = 60; therefore, draw the upper vertical line to here


7. Each data point beyond the end of a vertical line is an outlier and is marked with an asterisk (*) or the case number. Here 92 lies above the upper limit of 85, so it is the only outlier.

Figure 7 is an SPSS generated Box-and-Whisker Plot of the sample data.

Figure 7 Box-and-Whisker Plot of the household income example
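The numbers behind the plot can be reproduced with a short Python sketch that follows the seven steps literally. The function names `median` and `boxplot_summary` and the dictionary keys are hypothetical, and statistical packages may compute quartiles with slightly different rules, so their values need not match this median-split method exactly.

```python
def median(values):
    """Middle value of the ranked data (average of two middles if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def boxplot_summary(values):
    """Quartiles, whisker ends, and outliers by the median-split method."""
    ordered = sorted(values)            # step 1
    n = len(ordered)
    half = n // 2
    # Step 2: if n is odd, the median is a data point and is included
    # in both the low group and the high group.
    low = ordered[:half + n % 2]
    high = ordered[half:]
    q1 = median(low)                    # step 3
    q3 = median(high)                   # step 4
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr        # step 6: farthest a whisker may reach
    upper_limit = q3 + 1.5 * iqr
    inside = [x for x in ordered if lower_limit <= x <= upper_limit]
    outliers = [x for x in ordered if x < lower_limit or x > upper_limit]
    return {
        "q1": q1, "median": median(ordered), "q3": q3, "iqr": iqr,
        "lower_limit": lower_limit, "upper_limit": upper_limit,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,           # step 7
    }


incomes = [23, 17, 32, 60, 22, 52, 29, 38, 42, 92, 27, 46]
summary = boxplot_summary(incomes)
print(summary["q1"], summary["median"], summary["q3"])    # 25.0 35.0 49.0
print(summary["whisker_low"], summary["whisker_high"])    # 17 60
print(summary["outliers"])                                # [92]
```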
