
Statistics Measures of Central Tendency Unit Plan

I. Introduction

Review: A measure is a number that represents a characteristic of an entire data set, or the relation of a
specific data point to such a characteristic.

A measure can be either a parameter or a statistic.

There are three main classes of measure:

- Measures of Central Tendency (or Centrality), abbreviated CT
- Measures of Variation
- Measures of Position

A measure of central tendency is a number that attempts to express the average, middle or center of
a data set, or what is a typical value in a data set.

However, these are vague notions, and when we try to express them mathematically, they result in several
rival measures. Which measure is most appropriate to use depends on the context and is often a point of
controversy.

The relationships and differences between these rival measures are very interesting.

II. The Basic Measures of Central Tendency

The mean is what people most commonly have in mind when they speak of an "average." Because of issues
that arise later in this course, we must distinguish between the mean of a sample and the mean of a
population and give them different symbols, although they are calculated the same way.

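Let $x$ = a datum in a given data set. Then:

Sample Mean: $\bar{x} = \dfrac{\sum x}{n}$, where $n$ is the number of data points in the sample. The symbol $\bar{x}$ is read "x bar."

Population Mean: $\mu = \dfrac{\sum x}{N}$, where $N$ is the number of data points in the population. The symbol $\mu$ is read "mu."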

The median is an alternative way of trying to express average. The median is the middle number of an
ordered data set (that is, the numbers are in order from least to greatest). The median is less susceptible to
extreme changes due to outliers (data points that are very atypical). Since outliers are often the result of
errors in data collection, the median is often preferred over the mean.

[Give the Nine Broke Guys and Bill Gates example, Republican vs. Democrat income example]
* If the set has no middle element (which happens whenever n is even), take the two data points next to
the middle line, find their mean, and let this serve as the median.*
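
The contrast between mean and median can be sketched in Python. This is only an illustration: the function name `median_by_hand` and the income figures are made up for this sketch, not taken from the class examples.

```python
from statistics import mean

def median_by_hand(data):
    """Median as described above: order the data, then take the middle
    value, or the mean of the two middle values when n is even."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical incomes: nine low earners and one extreme outlier.
incomes = [10_000] * 9 + [1_000_000_000]

print(mean(incomes))            # pulled far upward by the single outlier
print(median_by_hand(incomes))  # stays at a typical value for the group
```

Because n = 10 is even here, the median is the mean of the fifth and sixth ordered values, which are both 10,000.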

The median is often represented by the symbol $Q_2$. The reason for this will become clear later in the chapter.

The mode of a data set is the entry that occurs the most often in that set. Each set has a unique mean and
median, but this is not true of mode. A set may have more than one mode, or no mode at all.

If all the entries in a data set only appear once, we say there is no mode, rather than saying that every entry
is a mode.

It is usually inappropriate to use mode as a measure of CT, but not always. For example, it is the only such
measure that can be calculated from qualitative data.
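
A minimal Python sketch of the mode, following the convention above that a set whose entries all appear once has no mode. The function name `modes` and the sample data are hypothetical.

```python
from collections import Counter

def modes(data):
    """Return a list of all modes. An empty list means 'no mode',
    which is what we report when no entry appears more than once."""
    counts = Counter(data)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    return [value for value, count in counts.items() if count == top]

print(modes([1, 2, 2, 3, 3]))             # two modes
print(modes([1, 2, 3]))                   # no mode
print(modes(["red", "blue", "red"]))      # works on qualitative data too
```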

[Give example of teacher experience article.]

Later on we will learn a more sophisticated definition of mode as a maximum of a probability density
function. This makes mode far more interesting and useful than it is in Chapter 2.

The midrange is found by adding the lowest data point (LDP) and the highest data point (HDP) and dividing
the sum by 2. The midrange has several advantages in advanced statistics, and several severe drawbacks
as well; it is very sensitive to outliers, for example. For now it is enough to know how to calculate it.
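
As a quick sketch, the midrange calculation and its sensitivity to outliers might look like this in Python. The function name `midrange` and the sample sets are made up for illustration.

```python
def midrange(data):
    """Midrange: (lowest data point + highest data point) / 2."""
    return (min(data) + max(data)) / 2

print(midrange([1, 2, 3, 4, 5]))    # 3.0
print(midrange([1, 2, 3, 4, 500]))  # 250.5, dragged upward by one outlier
```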

HW: p. 67-69 #1-4, #15-31 odds (part a's only), p. 73 #54

III. Special Measures of Central Tendency and Weighted Tables

1.) Weighted Means

A weighted mean is used when we want to express that some data points are more important (given
greater weight) than others. To find a weighted mean, organize the data set into a table where the first
column consists of the data points. Then, in the next column, assign each data point its appropriate weight.
Make a third column that is the product of the first two. The weighted mean then is:

$$\bar{x} = \frac{\sum (x \cdot w)}{\sum w}$$

where $x$ is a data point and $w$ is its weight.
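
The three-column table procedure described above can be sketched in Python. The function name `weighted_mean` and the sample scores and weights are hypothetical, chosen only for illustration.

```python
def weighted_mean(values, weights):
    """Sum of (data point * weight), divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("each data point needs exactly one weight")
    # Third column of the table: product of the first two columns.
    products = [x * w for x, w in zip(values, weights)]
    return sum(products) / sum(weights)

# Hypothetical example: three scores, the last two counting double.
scores = [80, 90, 70]
weights = [1, 2, 2]
print(weighted_mean(scores, weights))  # 80.0
```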

Often, but not always, the weights are arranged such that their sum is one. In that case, the
formula reduces to:

$$\bar{x} = \sum (x \cdot w)$$

Sometimes, for example when reading a scientific paper, we are presented with a frequency table but not
with the original data set (especially if that data set is very large). We can use the frequencies to get an
estimate of the mean in the following manner: we assume that each data point in a class falls at the
midpoint of the class (maybe now you see why we chose to represent the midpoint with $x$ in the last
section). This isn't true, of course, but since some data points in a class will most likely be above the
midpoint and some below it, it is a fairly reliable estimator.

The formula then is quite similar to that for weighted mean:

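$$\bar{x} \approx \frac{\sum (x \cdot f)}{\sum f}$$

where $x$ is the midpoint of a class and $f$ is the frequency of that class.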
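
A minimal sketch of this estimate in Python, assuming each class is given as a (lower limit, upper limit, frequency) triple. The function name `estimated_mean` and the table values are invented for illustration.

```python
def estimated_mean(classes):
    """Estimate the mean of a frequency table by treating every data
    point in a class as if it sat at the class midpoint."""
    total = 0
    n = 0
    for lower, upper, freq in classes:
        midpoint = (lower + upper) / 2
        total += midpoint * freq
        n += freq
    return total / n

# Hypothetical frequency table: (lower limit, upper limit, frequency)
table = [(10, 20, 6), (20, 30, 10), (30, 40, 4)]
print(estimated_mean(table))  # (15*6 + 25*10 + 35*4) / 20 = 24.0
```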

2.) Finding the Mean of Grouped Data

This is similar to estimating the mean of a frequency table, except we know the values of the individual data
points. Grouped data is a type of data set where large numbers of data points all have the same value.

Ex: 110 people were asked to rate their mood on a scale of 1 to 5. Three people said 1, 10 people said 2, 22
people said 3, 42 people said 4, and 33 people said 5. What was the mean mood of the people in the
room?

The repeated values go in the x column and the number of times each value appears in the set goes in the
frequency column. The mean of grouped data is calculated in exactly the same way that you would
estimate the mean of a frequency table.
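
The mood example can be worked through with a short Python sketch. It uses the frequencies exactly as listed in the example, which sum to 110, and divides by that total. The variable names are arbitrary.

```python
# x column -> frequency column, taken from the example above
moods = {1: 3, 2: 10, 3: 22, 4: 42, 5: 33}

n = sum(moods.values())                      # total number of responses
total = sum(x * f for x, f in moods.items()) # sum of x * f
mean_mood = total / n

print(n, total, round(mean_mood, 2))
```

Here the sum of the x·f column is 422, so the mean mood is 422/110, or about 3.84.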

3.) Estimating the Median of a Frequency Table

We can also estimate the median of a frequency table. There are two ways to do this. One is to let the
midpoints stand in for data points, and have each midpoint appear $f$ times in a synthetic data set, where
$f$ is the frequency of its class.

For example, if the frequency of class [10, 20) is 6, the midpoint of that class, 15, would appear in our
synthetic set 6 times.

The other method is to draw an ogive from the table, then draw a horizontal line coming out of the vertical
axis at $n/2$, and see where it intersects the ogive line. The horizontal coordinate of the intersection is an
estimator of the median.

Note that these two methods might give somewhat different results, but since they are estimates, that
should not overly concern us.
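
Both methods can be sketched in Python. This sketch assumes the classes are listed in increasing order with no gaps between them, and it models the ogive as straight line segments between class boundaries. The function names and the table are hypothetical.

```python
from statistics import median

def median_from_midpoints(classes):
    """Method 1: build a synthetic data set in which each class
    midpoint appears as many times as the class frequency."""
    synthetic = []
    for lower, upper, freq in classes:
        synthetic.extend([(lower + upper) / 2] * freq)
    return median(synthetic)

def median_from_ogive(classes):
    """Method 2: find where the cumulative-frequency line reaches n/2,
    interpolating linearly inside the class where that happens."""
    n = sum(freq for _, _, freq in classes)
    target = n / 2
    cumulative = 0
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= target:
            fraction = (target - cumulative) / freq
            return lower + fraction * (upper - lower)
        cumulative += freq
    raise ValueError("the table has no data")

# Hypothetical frequency table: (lower limit, upper limit, frequency)
table = [(10, 20, 6), (20, 30, 10), (30, 40, 4)]
print(median_from_midpoints(table))  # 25.0
print(median_from_ogive(table))      # about 24
```

For this table the two estimates differ by about one unit, which illustrates the note above.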

HW: Weighted Tables Topic Practice


IV. Skew

Skew is the relationship between mean and median. In the next section we will learn a precise numerical
expression of skew, but for now we will approach it on the ordinal level.

If mean > median, we say that a data set has positive skew, or is skew right.

If mean < median, we say that a data set has negative skew, or is skew left.

If mean = median, we say that a data set is symmetric.

Sometimes if the mean and the median are very close, but not exactly equal, we say that the set is roughly
symmetric.

A special type of symmetric set is called a uniform set. In such a set, the data points are spread evenly
through the interval covered by the set. For example, {1, 2, 3, 4, 5} is a uniform set, whereas {1, 4, 5, 6, 9} is
symmetric, but not uniform.

Hint: To remember the left vs. right distinction, remember that it is based on mean. When a set is skew
left, the mean will be to the left of median on the number line. When a set is skew right, the mean will be
to the right of median on the number line.
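
The mean-versus-median comparison can be sketched in Python. The function name `skew_direction` and the `tolerance` parameter (a stand-in for "very close, but not exactly equal") are assumptions made for this sketch, as are the sample sets other than the two given above.

```python
from statistics import mean, median

def skew_direction(data, tolerance=0):
    """Classify skew by comparing mean and median, as in this section."""
    m, med = mean(data), median(data)
    if m == med:
        return "symmetric"
    if abs(m - med) <= tolerance:
        return "roughly symmetric"
    return "skew right" if m > med else "skew left"

print(skew_direction([1, 2, 3, 4, 5]))        # symmetric
print(skew_direction([1, 4, 5, 6, 9]))        # symmetric
print(skew_direction([1, 2, 3, 4, 100]))      # skew right
print(skew_direction([1, 97, 98, 99, 100]))   # skew left
```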

[Return to the Income Example and integrate with the notion of Skew.]

HW: Skew Topic Practice

V. Applications

We can use the ideas in this section and a bit of Algebra to answer questions such as these:

Katie's grades in the first three quarters of the year were 86, 92, and 88. What grade does she need to get
in the fourth quarter in order to earn a 90 average (mean) for the year? ______

A data set has a minimum data point of 20. What would the maximum data point have to be in order for
the midrange to be 50? _______

Sam has two brothers. His younger brother is six years younger than he is and his older brother is three
years older than he is. The average (mean) of their ages is 36. How old is each brother?

______, ______, ______

Sam has two brothers. His younger brother is six years younger than he is and his older brother is three
years older than he is. The average (median) of their ages is 36. How old is each brother?

______, ______, ______


These might seem like frivolous questions, but they have applications in data recovery and other fields.
Also, they show up a lot on the SAT!
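
As a sketch of the algebra involved, the first two questions above can be set up as follows. The letters $g$ (Katie's fourth-quarter grade) and $H$ (the maximum data point) are names introduced here for illustration.

```latex
% Katie's fourth-quarter grade g, using the definition of the mean:
\frac{86 + 92 + 88 + g}{4} = 90
\;\Rightarrow\; 266 + g = 360
\;\Rightarrow\; g = 94

% Maximum data point H, using the definition of the midrange:
\frac{20 + H}{2} = 50
\;\Rightarrow\; 20 + H = 100
\;\Rightarrow\; H = 80
```

The two questions about Sam work the same way: write each age in terms of Sam's age, then apply the definition of the mean or the median.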

HW: MCT Worksheets

MCT Test
