CHAPTER 1
Introduction to Data
Chapter Outline
1.1 WHY STUDY STATISTICS?
1.2 CLASSIFYING VARIABLES
1.3 LEVELS OF MEASUREMENT
www.ck12.org
What is Statistics?
Statistics is one of the most honestly useful math topics you are likely to study. Nearly every kind of occupation
and human activity can benefit from an application of statistics. In the most general sense, statistics describes a
set of tools and techniques that can be used to describe, organize, and interpret information or data. What are
those data? They could be the scores of students on their final exam in history, the speed with which a new drug
relieves headaches, the number of complaints received from customers, the free-throw percentage for different college
basketball players, or the average price of going out for pizza on a Friday night.
Statistics is a tool that helps us understand the world around us and make better decisions with the information we
have available to us. By knowing which products sell better at a particular time of year, for example, a business can
make the best use of product placement and advertising. Knowing what time(s) of the day, week, or year are the
busiest, a restaurant manager can efficiently schedule her employees so as not to waste labor costs. Coaches who
know the statistics on their players can use that information to help them field stronger offenses or defenses.
Applications
One type of business that makes extensive use of statistics is insurance sales. Insurance companies are just like most
companies from the standpoint that they are in business to make a profit for their investors. That does not mean that
buying insurance is a bad idea for individual people, or that the companies deliberately overcharge their customers,
but it does mean that the companies are very careful to charge enough for each policy to ensure that the company
makes money overall.
How can the companies know for certain how many people are going to make claims against their insurance policies?
Or how big their claims will be? They can't know for certain since they don't have a way to see the future, but they
can get a very reliable idea of the average number of claims from a specific population of people through the use of
sample groups and the application of probability and statistics.
Example A
Predicting the weather is a tricky job. There is a nearly infinite number of possible variables that can affect the
temperature and chance of precipitation for any given day. Of course, a weatherman cannot possibly take all of these
variables into account every time he/she makes a prediction, so he/she must identify the most influential variables
and just watch them closely for each prediction. Suppose that according to records, it has rained an average of 5
days during the month of April for each year over the last 15 years. If it is currently April 25th and there has been
no rain, should the weatherman warn everyone to bring an umbrella to work for the next five days?
Maybe. However, the information regarding the average number of rainy days in April over the last 15 years
probably won't have much to do with it. Although the history may be suggestive of a particular number of rainy
days, it is certainly no guarantee of a specific result. If the weather conditions such as temperature, cloud cover,
relative humidity, etc., are all conducive to rain, then he/she is likely to predict rain, but the fact that there are only
five days left is certainly no assurance that there must be rain all five final days so that the average will be fulfilled.
Example B
Suppose a car insurance company reviews the police records for thousands of speeding tickets and minor car
accidents over a ten-year period, and notes the following:
TABLE 1.1:
 | Boys ages 16 - 23 | Girls ages 16 - 28
Speeding Tickets | 4,532 | 1,242
Fender Benders | 1,725 | 1,715
Would it make sense for the company to charge the same rates for boys and girls?
It certainly does not look like it. According to the statistics, boys are nearly four times as likely to drive over the
speed limit, and although there were slightly fewer recorded accidents for girls than boys, note that the age range for
the girls was greater than for the boys. The greater age range suggests that there may have been more girls actually
driving than boys, yet they ended up in nearly the same number of accidents!
However, it is extremely important to note that without data regarding the actual number of boys and girls in each
group, we can't really get a good feel for the overall increased likelihood of boys making claims.
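The comparison above can be checked with a few lines of Python. This is a minimal sketch: the counts come from Table 1.1, but the numbers of drivers in each group are hypothetical, invented only to show that per-driver rates (which the table cannot give us) depend entirely on how many boys and girls were actually driving.

```python
# Counts from Table 1.1.
tickets = {"boys": 4532, "girls": 1242}
fender_benders = {"boys": 1725, "girls": 1715}

# Ratios of the raw counts, boys to girls.
ticket_ratio = tickets["boys"] / tickets["girls"]
accident_ratio = fender_benders["boys"] / fender_benders["girls"]
print(f"speeding tickets: boys had {ticket_ratio:.2f} times as many as girls")
print(f"fender benders:   boys had {accident_ratio:.2f} times as many as girls")

def per_1000(count, drivers):
    """Incidents per 1,000 drivers. Needs the group size, which Table 1.1 lacks."""
    return 1000 * count / drivers

# Hypothetical group sizes: the same accident counts give very different
# per-driver rates depending on how many drivers are in each group.
for n_boys, n_girls in [(20_000, 20_000), (20_000, 40_000)]:
    boys_rate = per_1000(fender_benders["boys"], n_boys)
    girls_rate = per_1000(fender_benders["girls"], n_girls)
    print(f"{n_boys} boys, {n_girls} girls: "
          f"{boys_rate:.1f} vs {girls_rate:.1f} fender benders per 1,000 drivers")
```

With equal (hypothetical) group sizes the two accident rates are nearly identical, about 86 per 1,000; if twice as many girls had been driving, the girls' rate would fall to about 43 per 1,000, roughly half the boys' rate.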
Lesson Summary
Statistics is about how to think clearly about data. There is no question that a little practice in learning how to think
statistically will help you see the world more clearly and accurately, at least when it comes to making sense of
the data that surround us. Our objectives in this course are to help you identify the kinds of questions that can be
answered by statistics, to show you the tools you can use to organize and summarize your data, and to help you
practice the most important skill of all: to clearly interpret what those data are saying.
Learning Objectives
Introduction
Data in their original form (just a list of numbers, names, letters, colors, etc.) are known as raw data and are often not
particularly useful without some kind of organization. Without some sort of context and some level of organization,
data can seem like just a bunch of meaningless values.
Data can be classified into two general types: quantitative and qualitative. There are a number of ways to group or
organize each type of data to make it more useful.
In this lesson, you will be introduced to some basic vocabulary of statistics and learn how to distinguish between
different types of variables. We will use the real-world example of information about the Galapagos Giant Tortoise.
The Galapagos Islands, off the coast of Ecuador in South America, are famous for the amazing diversity and
uniqueness of life they possess. One of the most famous Galapagos residents is the Galapagos Giant Tortoise,
which is found nowhere else on earth. Charles Darwin's visit to the islands in the 19th century and his observations
of the tortoises were extremely important in the development of his theory of evolution.
The tortoises lived on nine of the Galapagos Islands, and each island developed its own unique species of tortoise.
In fact, on the largest island, there are four volcanoes, and each volcano has its own species. When first discovered,
it was estimated that the tortoise population of the islands was around 250,000. Unfortunately, once European ships
and settlers started arriving, those numbers began to plummet. Because the tortoises could survive for long periods
of time without food or water, expeditions would stop at the islands and take the tortoises to sustain their crews
with fresh meat and other supplies for the long voyages. Also, settlers brought in domesticated animals like goats
and pigs that destroyed the tortoises' habitat. Today, two of the islands have lost their species, a third island has no
remaining tortoises in the wild, and the total tortoise population is estimated to be around 15,000. The good news is that
there have been massive efforts to protect the tortoises. Extensive programs to eliminate the threats to their habitat,
as well as breed and reintroduce populations into the wild, have shown some promise.
TABLE 1.2:
Island or Volcano | Species | Climate Type | Shell Shape | Estimate of Total Population | Population Density (per km²) | Number of Individuals Repatriated*
Wolf | becki | semi-arid | intermediate | 1,139 | 228 | 40
Darwin | microphyes | semi-arid | dome | 818 | 205 | 0
Alcedo | vandenburghi | humid | dome | 6,320 | 799 | 0
Sierra Negra | guntheri | humid | flat | 694 | 122 | 286
Cerro Azul | vicina | humid | dome | 2,574 | 155 | 357
Santa Cruz | nigrita | humid | dome | 3,391 | 730 | 210
Española | hoodensis | arid | saddle | 869 | 200 | 1,293
San Cristóbal | chathamensis | semi-arid | dome | 1,824 | 559 | 55
Santiago | darwini | humid | intermediate | 1,165 | 124 | 498
Pinzón | ephippium | arid | saddle | 532 | 134 | 552
Pinta | abingdoni | arid | saddle | 1 | Does not apply | 0
*Repatriation is the process of raising tortoises and releasing them into the wild when they are grown, to avoid local
predators that prey on the hatchlings.
Classifying Variables
Statisticians refer to an entire group that is being studied as a population. Each member of the population is called a
unit, or subject. In this example, the population is all Galapagos Tortoises, and the units are the individual tortoises.
It is not necessary for a population or the units to be living things, like tortoises or people. For example, an airline
employee could be studying the population of jet planes in her company by studying individual planes.
A researcher studying Galapagos Tortoises would be interested in collecting information about different characteristics of the tortoises. Those characteristics are called variables. Each column of Table 1.2 contains a
variable. In the first column, the tortoises are labeled according to the island (or volcano) where they live, and in the
second column, by the scientific name for their species. When a characteristic can be neatly placed into well-defined
groups, or categories, that do not depend on order, it is called a categorical variable, or qualitative variable.
The last three columns of Table 1.2 provide information in which the count, or quantity, of the characteristic
is most important. For example, we are interested in the total number of each species of tortoise, or how many
individuals there are per square kilometer. This type of variable is called a numerical variable, or quantitative
variable. The table below explains the remaining variables in Table 1.2 and labels them as categorical or
numerical.
TABLE 1.3:
Variable | Explanation | Type
Climate Type | Many of the islands and volcanic habitats have three distinct climate types. | Categorical
Shell Shape | Over many years, the different species of tortoises have developed differently shaped shells as an adaptation to assist them in eating vegetation that varies in height from island to island. | Categorical
Number of Individuals Repatriated | There are two tortoise breeding centers on the islands. Through these programs, many tortoises have been raised and then reintroduced into the wild. | Numerical
We have already defined a population as the total group being studied. Most of the time, it is extremely difficult or
very costly to collect all the information about a population. In the Galapagos, it would be very difficult and perhaps
even destructive to search every square meter of the habitat to be sure that you counted every tortoise. In an example
closer to home, it is very expensive to get accurate and complete information about all the residents of the United
States to help effectively address the needs of a changing population. This is why a complete counting, or census, is
only attempted every ten years. Because of these problems, it is common to use a smaller, representative group from
the population, called a sample.
Errors in Sampling
We have to accept that estimates derived from using a sample have a chance of being inaccurate. This cannot be
avoided unless we measure the entire population. The researcher has to accept that there could be variations in the
sample due to chance that lead to changes in the population estimate. A statistician would report the estimate of the
parameter in two ways: as a point estimate (e.g., 915) and also as an interval estimate. For example, a statistician
would report: "I am 95% confident that the true number of tortoises is actually between 561 and 1,075." This range
of values is the unavoidable result of using a sample, and not due to some mistake that was made in the process
of collecting and analyzing the sample. The difference between the true parameter and the statistic obtained by
sampling is called sampling error. It is also possible that the researcher made mistakes in her sampling methods in
a way that led to a sample that does not accurately represent the true population. For example, she could have picked
an area to search for tortoises where a large number tend to congregate (near a food or water source, perhaps). If
this sample were used to estimate the number of tortoises in all locations, it may lead to a population estimate that is
too high. This type of systematic error in sampling is called bias. Statisticians go to great lengths to avoid the many
potential sources of bias. We will investigate this in more detail in a later chapter.
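The difference between sampling error and bias can be seen in a small simulation. This Python sketch uses an entirely hypothetical habitat (1,000 equal survey plots, with tortoises clustered near a water source); none of its numbers come from the Galapagos data. Random samples give estimates that scatter around the true total by chance, while a sample taken only near water overestimates the total every time.

```python
import random
import statistics

# Hypothetical habitat: 1,000 survey plots of equal size. The first 100 lie
# near a water source and hold 5 tortoises each; the other 900 hold 1 each.
N_PLOTS = 1000
plots = [5] * 100 + [1] * 900
TRUE_TOTAL = sum(plots)  # the population parameter: 1,400 tortoises

def estimate_total(sample):
    """A statistic: mean tortoises per sampled plot, scaled up to all plots."""
    return round(statistics.mean(sample) * N_PLOTS)

rng = random.Random(2024)  # fixed seed so the output is repeatable

# 200 simple random samples of 50 plots: the estimates scatter around the
# true total purely by chance (sampling error).
random_estimates = [estimate_total(rng.sample(plots, 50)) for _ in range(200)]

# A biased sample: the researcher searches only the 50 plots near water.
biased_estimate = estimate_total(plots[:50])

print("true total:", TRUE_TOTAL)
print("random-sample estimates ranged from", min(random_estimates),
      "to", max(random_estimates))
print("average of random-sample estimates:",
      round(statistics.mean(random_estimates)))
print("biased estimate:", biased_estimate)
```

Averaging many random-sample estimates brings you close to the true total, but no amount of repetition fixes the biased design: it lands on the same inflated value every time.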
Lesson Summary
In statistics, the total group being studied is called the population. The individuals (people, animals, or things) in
the population are called units. The characteristics of those individuals of interest to us are called variables. Those
variables are of two types: numerical, or quantitative, and categorical, or qualitative.
Because of the difficulties of obtaining information about all units in a population, it is common to use a small,
representative subset of the population, called a sample. An actual value of a population variable (for example,
number of tortoises, average weight of all tortoises, etc.) is called a parameter. An estimate of a parameter derived
from a sample is called a statistic.
Whenever a sample is used instead of the entire population, we have to accept that our results are merely estimates,
and therefore, have some chance of being incorrect. This is called sampling error.
Points to Consider
How do we summarize, display, and compare categorical and numerical data differently?
What are the best ways to display categorical and numerical data?
Is it possible for a variable to be considered both categorical and numerical?
How can you compare the effects of one categorical variable on another or one quantitative variable on
another?
Review Questions
1. In each of the following situations, identify the population, the units, and each variable, and tell if the variable
is categorical or quantitative.
(a) A quality control worker with Sweet-Tooth Candy weighs every 100th candy bar to make sure it is very
close to the published weight.
(b) Doris decides to clean her sock drawer out and sorts her socks into piles by color.
(c) A researcher is studying the effect of a new drug treatment for diabetes patients. She performs an
experiment on 200 randomly chosen individuals with type II diabetes. Because she believes that men
and women may respond differently, she records each person's gender, as well as the person's change in
blood sugar level after taking the drug for a month.
2. In Physical Education class, the teacher has the students count off by twos to divide them into teams. Is this
a categorical or quantitative variable?
3. A school is studying its students' test scores by grade. Explain how the characteristic "grade" could be
considered either a categorical or a numerical variable.
Learning Objective
Understand the difference between the levels of measurement: nominal, ordinal, interval, and ratio.
Introduction
This lesson is an overview of the basic considerations involved with collecting and analyzing data.
Levels of Measurement
In the first lesson, you learned about the different types of variables that statisticians use to describe the characteristics of a population. Some researchers and social scientists use a more detailed distinction, called the levels of
measurement, when examining the information that is collected for a variable. This widely accepted (though not
universally used) theory was first proposed by the American psychologist Stanley Smith Stevens in 1946. According
to Stevens's theory, the four levels of measurement are nominal, ordinal, interval, and ratio.
Each of these four levels refers to the relationship between the values of the variable.
Nominal measurement
A nominal measurement is one in which the values of the variable are names. The names of the different species
of Galapagos tortoises are an example of a nominal measurement.
Ordinal measurement
An ordinal measurement involves collecting information in which the order is somehow significant. The name of
this level is derived from the use of ordinal numbers for ranking (1st, 2nd, 3rd, etc.). If we ranked the different
species of tortoise from the largest population to the smallest, this would be an example of ordinal measurement. In
ordinal measurement, the distance between two consecutive values does not have meaning. Using the estimates in
Table 1.2, the 1st and 2nd largest tortoise populations by species differ by almost 3,000 individuals, while the 7th
and 8th differ by only about 50.
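Ranking the population estimates from Table 1.2 turns counts into ordinal ranks and shows how unequal the gaps between neighboring ranks can be. A minimal Python sketch:

```python
# Estimated total populations from Table 1.2.
populations = {
    "Wolf": 1139, "Darwin": 818, "Alcedo": 6320, "Sierra Negra": 694,
    "Cerro Azul": 2574, "Santa Cruz": 3391, "Española": 869,
    "San Cristóbal": 1824, "Santiago": 1165, "Pinzón": 532, "Pinta": 1,
}

# Ordinal measurement keeps only the order: rank 1 is the largest population.
ranked = sorted(populations, key=populations.get, reverse=True)
rank = {name: position for position, name in enumerate(ranked, start=1)}

# Gaps between consecutive ranks; gaps[i] compares rank i+1 with rank i+2.
gaps = [populations[a] - populations[b] for a, b in zip(ranked, ranked[1:])]

for name in ranked:
    print(rank[name], name, populations[name])
print("ranks 1 and 2 differ by", gaps[0])
print("ranks 7 and 8 differ by", gaps[6])
```

Both pairs are exactly one rank apart, yet the first gap is 2,929 tortoises (Alcedo and Santa Cruz) and the second is 51 (Española and Darwin). The ranks alone cannot reveal that difference.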
Interval measurement
With interval measurement, the distance between any two values has a specific meaning. An example commonly
cited for interval measurement is temperature (either degrees Celsius or degrees Fahrenheit). A change of 1 degree
is the same if the temperature goes from 0 °C to 1 °C as it is when the temperature goes from 40 °C to 41 °C. In
addition, there is meaning to the values between the whole numbers. That is, half a degree has meaning.
Ratio measurement
A ratio measurement is the estimation of the ratio between a magnitude of a continuous quantity and a unit
magnitude of the same kind. A variable measured at this level not only includes the concepts of order and interval,
but also adds the idea of nothingness, or absolute zero. With the temperature scale of the previous example, 0 °C
is really an arbitrarily chosen number (the temperature at which water freezes) and does not represent the absence
of temperature. As a result, the ratio between temperatures is relative, and 40 °C, for example, is not twice as hot as
20 °C. On the other hand, for the Galapagos tortoises, the idea of a species having a population of 0 individuals is all
too real! As a result, the estimates of the populations are measured on a ratio level, and a species with a population
of about 3,300 really is approximately three times as large as one with a population near 1,100.
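The contrast between an interval scale and a ratio scale can be checked numerically. This Python sketch converts the two Celsius readings to kelvins (a scale with a true zero) and compares two population estimates from Table 1.2; picking Santa Cruz and Wolf is simply one illustration of the "about 3,300 versus about 1,100" comparison.

```python
def celsius_to_kelvin(celsius):
    """Kelvin starts at absolute zero, so ratios of kelvin values are meaningful."""
    return celsius + 273.15

# Interval scale: 0 °C is an arbitrary zero, so this ratio is not meaningful.
naive_ratio = 40.0 / 20.0

# The same two temperatures on a ratio scale (kelvin).
kelvin_ratio = celsius_to_kelvin(40.0) / celsius_to_kelvin(20.0)

# Ratio scale: a population of 0 really means no tortoises.
santa_cruz, wolf = 3391, 1139  # estimates from Table 1.2
population_ratio = santa_cruz / wolf

print(f"40 °C / 20 °C looks like {naive_ratio:.1f}")
print(f"in kelvins the ratio is only {kelvin_ratio:.3f}")
print(f"Santa Cruz / Wolf population ratio: {population_ratio:.2f}")
```

Measured from absolute zero, 40 °C is only about 7% hotter than 20 °C, while the Santa Cruz estimate really is about three times the Wolf estimate.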
Comparing the Levels of Measurement
Using Stevens's theory can help make distinctions in the type of data that the numerical/categorical classification
could not. Let's use an example from the previous section to help show how you could collect data at different levels
of measurement from the same population. Assume your school wants to collect data about all the students in the
school.
If we collect information about the students' gender, race, political opinions, or the town or subdivision in which
they live, we have a nominal measurement.
If we collect data about the students' year in school, we are now ordering that data numerically (9th, 10th, 11th, or
12th grade), and thus, we have an ordinal measurement.
If we gather data for students' SAT math scores, we have an interval measurement. There is no absolute 0, as SAT
scores are scaled. The ratio between two scores is also meaningless. A student who scored a 600 did not necessarily
do twice as well as a student who scored a 300.
Data collected on a student's age, height, weight, and grades will be measured on the ratio level, so we have a ratio
measurement. In each of these cases, there is an absolute zero that has real meaning. Someone who is 18 years old
is twice as old as a 9-year-old.
It is also helpful to think of the levels of measurement as building in complexity, from the most basic (nominal) to
the most complex (ratio). Each higher level of measurement includes aspects of those before it. The diagram below
is a useful way to visualize the different levels of measurement.
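The nesting of the levels can also be sketched in code. This Python example assumes the usual summary of what each level adds (category, order, meaningful differences, meaningful ratios), which matches the descriptions in this lesson; the `Level` class and `supported` function are illustrative names, not part of any library.

```python
from enum import IntEnum

class Level(IntEnum):
    """Stevens's levels, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# What each level adds on top of the levels below it.
ADDS = {
    Level.NOMINAL: "same or different category",
    Level.ORDINAL: "order (greater or less)",
    Level.INTERVAL: "meaningful differences",
    Level.RATIO: "meaningful ratios (true zero)",
}

def supported(level):
    """Every comparison available at `level`, including those inherited from lower levels."""
    return [ADDS[lv] for lv in Level if lv <= level]

# Variables from the school example above.
school = {
    "gender": Level.NOMINAL,
    "year in school": Level.ORDINAL,
    "SAT math score": Level.INTERVAL,
    "height": Level.RATIO,
}

for name, level in school.items():
    print(f"{name}: {level.name.lower()} -> {', '.join(supported(level))}")
```

Because each level is an integer, "higher level includes lower level" becomes a simple `<=` comparison.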
Lesson Summary
Data can be measured at different levels, depending on the type of variable and the amount of detail that is collected.
A widely used method for categorizing the different types of measurement breaks them down into four groups.
Nominal data is measured by classification or categories. Ordinal data uses numerical categories that convey a
meaningful order. Interval measurements show order, and the spaces between the values also have significant
meaning. In ratio measurement, the ratio between any two values has meaning, because the data include an absolute
zero value.
Point to Consider
1. In each of the following situations, identify the level(s) at which each of these measurements has been
collected.
a. Lois surveys her classmates about their eating preferences by asking them to rank a list of foods from
least favorite to most favorite.
b. Lois collects similar data, but asks each student what her favorite thing to eat is.
c. In math class, Noam collects data on the Celsius temperature of his cup of coffee over a period of several
minutes.
d. Noam collects the same data, only this time using the Kelvin scale.
2. Which of the following statements is not true?
(a) All ordinal measurements are also nominal.
(b) All interval measurements are also ordinal.
(c) All ratio measurements are also interval.
(d) Stevens's levels of measurement is the one theory of measurement that all researchers agree on.
3. Look at Table 1.2 in Section 1. What is the highest level of measurement that could be correctly applied to
the variable "Population Density"?
(a) Nominal
(b) Ordinal
(c) Interval
(d) Ratio
NOTE: If you are curious about the "does not apply" in the last row of Table 1.2, read on! There is only one
known individual Pinta tortoise, and he lives at the Charles Darwin Research Station. He is affectionately known
as "Lonesome George." He is probably well over 100 years old and will most likely signal the end of the species, as
attempts to breed have been unsuccessful.
CHAPTER 2
Chapter Outline
2.1 FREQUENCY TABLES
2.2
2.3
2.4 SHAPES OF DISTRIBUTIONS
Charts and graphs, when created carefully, can provide instantaneous information about a data set without having
to calculate or even have knowledge of statistical measures. This chapter will concentrate on frequency tables, a
precursor to graphing. When we look at all the values in a sample or a dataset, we are looking at a distribution.
Often it is the distribution of the data, and not the individual data points themselves, that is of interest to us. You
may have earned a B- on your most recent philosophy exam, for example, but you want to know how you performed
relative to your classmates. To answer that question, you first need to know what the distribution of exam scores
looks like.
Frequency Tables
As an example of how organizing data can help us better understand the world around us, let's take a look at the
impact of limited resources and recycling challenges on our planet. The earth has seemed so large in scope for
thousands of years that it is only recently that many people have begun to take seriously the idea that we live on
a planet of limited and dwindling resources. This is something that residents of the Galapagos Islands are also
beginning to understand. Because of its isolation and lack of resources to support large, modernized populations of
humans, the problems that we face on a global level are magnified in the Galapagos. Basic human resources such
as water, food, fuel, and building materials must all be brought to the islands. As the human population grows
exponentially, the islands are confronted with the problem of what to do with all the waste.
Let's look specifically at one resource: bottled water. Bottled water consumption worldwide has grown and continues
to grow at a phenomenal rate. According to the Earth Policy Institute, 154 billion liters were consumed in 2004.
While there are places in the world where safe water supplies are unavailable, most of the growth in consumption
has been due to other reasons. The largest consumer of bottled water is the United States, which arguably could
be the country with the best access to safe, convenient, and reliable sources of tap water. Only a small fraction
of the plastic used for these bottles is recycled. In addition, huge volumes of carbon emissions are created when these bottles are
manufactured using oil and transported great distances by oil-burning vehicles.
Frequency
1
1
3
4
6
8
7
2
When creating a frequency table, it is often helpful to use tally marks as a running total to avoid missing a value or
over-representing another. But those tally marks typically don't appear in the final table.
Tally        Frequency
|            1
|            1
|||          3
||||         4
||||/ |      6
||||/ |||    8
||||/ ||     7
||           2
TABLE 2.3:
Country
Italy
Mexico
United Arab Emirates
Belgium and Luxembourg
France
Below is a grouped frequency distribution for the water-consumption data. A bracket, [ or ], indicates that the
endpoint of the interval is included in the class. A parenthesis, ( or ), indicates that the endpoint is not included.
It is common practice in statistics to place a value that falls on the boundary between two classes in the higher
of the two classes. For example, [80, 90) means this class includes everything from 80 up to, but not including, 90.
A value of exactly 90 is included in the next class, [90, 100).
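The half-open class convention can be written as a small function. This Python sketch assumes classes of width 10 starting at 80, matching Table 2.4; the sample values are hypothetical (the raw data behind Table 2.4 are not shown here) and were chosen to include boundary cases such as 90 and 100. Values outside the eleven classes are simply not counted in this sketch.

```python
import math
from collections import Counter

def class_of(value, start=80, width=10):
    """Return the half-open class (lower, upper) that contains value.

    Classes are [start, start + width), [start + width, start + 2*width), ...
    so a value on a boundary (such as 90) lands in the higher class.
    """
    k = math.floor((value - start) / width)
    lower = start + k * width
    return (lower, lower + width)

def grouped_frequency(values, start=80, width=10, n_classes=11):
    """Count values per class; values outside the n_classes classes are ignored."""
    counts = Counter(class_of(v, start, width) for v in values)
    rows = []
    for k in range(n_classes):
        lower = start + k * width
        rows.append(((lower, lower + width), counts[(lower, lower + width)]))
    return rows

# Hypothetical liters-per-person values, not the data behind Table 2.4.
values = [80, 89.9, 90, 99.5, 100, 145]
for (lower, upper), freq in grouped_frequency(values):
    print(f"[{lower}, {upper})  {freq}")
```

Here 89.9 is counted in [80, 90), while 90 and 100 move up to [90, 100) and [100, 110), exactly as the bracket-and-parenthesis notation describes.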
TABLE 2.4:
Liters per Person | Frequency
[80, 90) | 4
[90, 100) | 3
[100, 110) | 1
[110, 120) | 0
[120, 130) | 1
[130, 140) | 1
[140, 150) | 2
[150, 160) | 0
[160, 170) | 2
[170, 180) | 0
[180, 190) | 1
Relative Frequencies
If you were evaluating a set of data describing the numbers of A's, B's, C's, and D's that students earned on
a particular test, and needed to display the data on a relative frequency table, how would you go about it?
A relative frequency table is specifically designed to display the ratio of each individual frequency to the total
frequency of the data. Sometimes these may be represented as percentages.
Example A
The students in a class were asked what kind of music they liked. 18 liked rock, 11 liked pop, 5 liked hip hop, and 8
liked country. Create a frequency and relative frequency table using this information.
To create the frequency table, we just need one column for each category:
TABLE 2.5:
Rock | Pop | Hip Hop | Country
18 | 11 | 5 | 8
To convert to a relative frequency table, just divide each frequency by the total:
TABLE 2.6:
Rock | Pop | Hip Hop | Country
18/42 = .43 | 11/42 = .26 | 5/42 = .12 | 8/42 = .19
To build a relative frequency table, start by grouping the values into categories or bins, depending on the type of data
you have. You should try to limit the number of groups to fewer than a dozen in most cases. Once you have all of
your data separated into groups, tally the number of values in each group.
To calculate the relative frequency of each category, divide the number of values in that group by the total number of values.
The decimal you get represents the proportion of the entire sample that falls in that category. Once you have
calculated all of the relative frequencies for every category, add them up to make sure they total 1.0.
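The steps above can be sketched in a few lines of Python, here applied to the music data from Example A. The function name `relative_frequencies` is just an illustrative choice.

```python
def relative_frequencies(counts):
    """Divide each category's frequency by the total frequency."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

# Frequencies from Example A (music preferences).
music = {"Rock": 18, "Pop": 11, "Hip Hop": 5, "Country": 8}
total = sum(music.values())
rel = relative_frequencies(music)

for category, r in rel.items():
    print(f"{category:<8} {music[category]}/{total} = {r:.2f}")

# Sanity check from the steps above: the relative frequencies add up to 1
# (up to floating-point rounding).
assert abs(sum(rel.values()) - 1.0) < 1e-9
```

The same function handles the marble problem in Example B: `relative_frequencies({"red": 25, "yellow": 22, "green": 17, "blue": 28})`.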
NOTE: If you are graphing the relative frequencies of a continuous variable, you will need to specify how to
handle any values that fall right on the edge of a bin. Here are a couple of ways to do this:
You can specify on your table that values equal to lower class limits are included in the bin, but values equal to
upper class limits are not (this is the conventional method). This means that a value of 5 would be considered
part of a 5-10 class, but not part of a 1-5 class.
You can also define each category so that there are no overlapping values.
Example B
You are given a bag of marbles in multiple colors. If there are 25 red, 22 yellow, 17 green, and 28 blue marbles, what
is the relative frequency of each color?
Solution
Start by totaling the number of marbles: 25 + 22 + 17 + 28 = 92 total marbles
25 red marbles / 92 total marbles = .272
22 yellow marbles / 92 total marbles = .239
17 green marbles / 92 total marbles = .185
28 blue marbles / 92 total marbles = .304
Note that each of the relative frequencies can also be understood as percentages:
.272 = 27.2% red marbles
.239 = 23.9% yellow marbles
.185 = 18.5% green marbles
.304 = 30.4% blue marbles
27.2% + 23.9% + 18.5% + 30.4% = 100%
Example C
A police officer is reviewing accident statistics for her city. She notes that there were a total of 23 incidents involving
young drivers aged sixteen through twenty-one, 19 incidents involving drivers aged twenty-two through twenty-six, 19 involving twenty-seven- to forty-year-olds, and 18 for drivers aged forty-one and above.
What are the relative frequencies for each age range?
Solution
The total number of accidents is:
23 + 19 + 19 + 18 = 79 total accidents
Ages 16-21: 23/79 = .291
Ages 22-26: 19/79 = .241
Ages 27-40: 19/79 = .241
Ages 41 and above: 18/79 = .228
Lesson Summary
A frequency table is useful to organize data into classes according to the number of occurrences, or frequency, of
each class. Relative frequency shows the percentage of data in each class.
Vocabulary
A relative frequency table compares the number of entries in each of several categories to the number of
entries in the total sample size.
Binning is the common term for the process of dividing data up into multiple categories, classes, or intervals
in preparation for graphing.
A continuous variable is a variable that can represent any value between a given minimum and maximum.
Age is a common continuous variable, since age can be measured in infinitely small increments. By contrast,
a discrete variable can only represent specific values in a given range. The number rolled on a standard die
is a discrete variable, since it can only be one of the whole numbers 1 through 6, with no partial values or fractions.
Review Questions
1. Lois was gathering data on the plastic beverage bottle consumption habits of her classmates, but she ran out
of time as class was ending. When she arrived home, something had spilled in her backpack and smudged the
data for the 2s. Fortunately, none of the other values was affected, and she knew there were 30 total students
in the class. Complete her frequency table.
TABLE 2.7:
Number of Plastic Beverage Bottles per Week   Tally       Frequency
1                                             ||
2
3                                             |||
4                                             ||
5                                             |||
6                                             ||||/ ||
7                                             ||||/ |
8                                             |
2. The following frequency table contains exactly one data value that is a positive multiple of ten. What must
that value be?
TABLE 2.8:
Class | Frequency
[0, 5) | 4
[5, 10) | 0
[10, 15) | 2
[15, 20) | 1
[20, 25) | 0
[25, 30) | 3
[30, 35) | 0
[35, 40) | 1
(a) 10
(b) 20
(c) 30
(d) 40
(e) There is not enough information to determine the answer.
3. The following table includes the data from the same group of countries from the earlier bottled water consumption example, but is for the year 1999, instead. Create a frequency table for this data set.
TABLE 2.9:
Country
Italy
Mexico
United Arab Emirates
Belgium and Luxembourg
France
Spain
Germany
Lebanon
Switzerland
Cyprus
United States
Saudi Arabia
Czech Republic
Austria
Portugal
4. The following table shows the potential energy that could be saved by manufacturing each type of material
using the maximum percentage of recycled materials, as opposed to using all new materials. Construct a
frequency table, including the actual frequency, the relative frequency (round to the nearest tenth of a percent),
and the relative cumulative frequency. Assume a bin width of 25 million BTUs.
TABLE 2.10:
Manufactured Material
Aluminum Cans
Copper Wire
Steel Cans
LDPE Plastics (e.g., trash bags)
PET Plastics (e.g., beverage bottles)
HDPE Plastics (e.g., household cleaner bottles)
Personal Computers
Carpet
Glass
Corrugated Cardboard
Newspaper
Phone Books
Magazines
Office Paper
Guided Practice
1. The Sackmore and Headbut village football teams have played each other 50 times. Sackmore has won 10
times, Headbut has won 35 times, and the teams have drawn 5 times. Based on past performance, what is the
probability that Sackmore will win the next match?
2. Tony estimates that the probability that there will be an empty space in the car park when he arrives at work
is 4/5. His estimate is based on 50 observations. On how many of these 50 days was he unable to find an empty
space in the car park?
3. A pair of dice (one red, one green) is cast 30 times, and on 4 of these occasions, the sum of the numbers facing
up is 7. What is the relative frequency that the sum is 7?
4. In 1990, there were approximately 10,000 fast food outlets in the US that specialized in Mexican food. Of
these, the largest were Taco Bell with 4809 outlets, Taco Johns with 430 outlets and Del Taco with 275 outlets.
The relative frequency that a fast food outlet that specializes in Mexican food is none of the above is:
Solutions
1. So far, Sackmore has won 10 out of the 50 matches. We can write this as a fraction, which reduces to 1/5.
This fraction isn't really the probability of Sackmore winning, but it is an estimate of that probability. We say
that the relative frequency of Sackmore winning is 1/5.
2. If Tony has figured that he is able to find a space 4 of every 5 times he arrives, then he is not able to find a
space 1 in every 5 times. If we set up the ratio 1/5 = x/50, we can solve for x to find that he did not have a space
on 10 days.
3. Out of thirty throws, four of them were 7s. The relative frequency is 4/30, or 2/15.
4. The likelihood that a restaurant is not one of the top three equals the number of Mexican fast food
restaurants that are not one of the three, 10,000 - 4,809 - 430 - 275 = 4,486, divided by the total number of
Mexican fast food restaurants, 10,000:
4,486/10,000 = 0.4486, or 44.86%
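All four solutions above reduce to dividing a count by a total. The Python sketch below reproduces them with the standard-library `fractions.Fraction` type, so the reduced fractions match the hand calculations; the helper name `relative_frequency` is illustrative and not from the text.

```python
from fractions import Fraction

def relative_frequency(count: int, total: int) -> Fraction:
    """Return count/total as an exact, reduced fraction (an estimate of a probability)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return Fraction(count, total)

# Problem 1: Sackmore won 10 of 50 matches.
sackmore = relative_frequency(10, 50)

# Problem 2: a space was found 4/5 of the time, so it was NOT found
# 1 - 4/5 = 1/5 of the time; scale that to the 50 observed days.
days_without_space = (1 - Fraction(4, 5)) * 50

# Problem 3: a sum of 7 appeared 4 times in 30 throws.
sum_is_seven = relative_frequency(4, 30)

# Problem 4: outlets that are not one of the three largest chains.
other_outlets = 10_000 - 4809 - 430 - 275
other_share = relative_frequency(other_outlets, 10_000)

print(sackmore, days_without_space, sum_is_seven, float(other_share))
# prints: 1/5 10 2/15 0.4486
```

Using exact fractions rather than floats keeps answers such as 2/15 in the same reduced form a student would write by hand.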
More Practice
Thirty students in a class surveyed each other to find out their favorite movie series, and recorded the results in a table
like the one shown below.
TABLE 2.11:
Movie Series                 Number of Likes
Twilight                     7
Lord of the Rings            5
Pirates of the Caribbean     9
Harry Potter                 6
Narnia                       2
High School Musical          1
3. 100 people were asked whether they were left-handed. 8 people answered yes. What is the relative frequency
of left-handed people in the survey?
4. The relative frequency of getting a white candy from a particular bag is 0.3. If the bag contains 100 candies,
estimate the number of whites.
5. Kyle observed 80 cars as they drove by his bedroom window. 24 of them were red. What is the relative
frequency of red cars?
6. The relative frequency of rain in April is 0.6. There are 30 days in April. Estimate the number of days of rain
expected in April.
Use the table below listing the heights of 100 male semiprofessional soccer players.
TABLE 2.12:
Heights (Inches)
59.95-61.95
61.95-63.95
63.95-65.95
65.95-67.95
67.95-69.95
69.95-71.95
71.95-73.95
73.95-75.95
7.
8.
9.
10.
11.
12.
Frequency of Students
5
3
40
17
12
1
Total = 100
Relative Frequency
3/100 = 0.03
15/100 = 0.15
40/100 = 0.40
12/100 = 0.12
7/100 = 0.07
1/100 = 0.01
Total = 1.00
Identify and translate data sets to and from a bar graph and a pie graph.
Categorical Variables: Bar Graphs and Pie Graphs
We live in an age of unprecedented access to increasingly sophisticated and affordable personal technology. Cell
phones, computers, and televisions now improve so rapidly that, while they may still be in working condition, the
drive to make use of the latest technological breakthroughs leads many to discard usable electronic equipment. Much
of that ends up in a landfill, where the chemicals from batteries and other electronics add toxins to the environment.
Approximately 80% of the electronics discarded in the United States is also exported to third world countries, where
it is disposed of under generally hazardous conditions by unprotected workers[1]. The following table shows the
tonnage of the most common types of electronic equipment discarded in the United States in 2005.
TABLE 2.13:
Electronic Equipment
Cathode Ray Tube (CRT) TVs
CRT Monitors
Printers, Keyboards, Mice
Desktop Computers
Laptop Computers
Projection TVs
Cell Phones
LCD Monitors
The type of electronic equipment is a categorical variable, and therefore, this data can easily be represented using
the bar graph below:
The bars in a bar graph are usually separated slightly. The graph is just a series of disjoint categories, all represented
along the same axis. The height of each bar tells you the frequency of that particular value in the data set. It doesn't
make sense to talk about the shape of this distribution of values: if we rearranged the categories in a different order,
the same data set could be made to look different. Do not try to infer shape from a bar graph!
Pie Graphs
Usually, data that can be represented in a bar graph can also be shown using a pie graph (also commonly called
a circle graph or pie chart). In this representation, we convert the count into a percentage so we can show each
category relative to the total. Each percentage is then converted into a proportionate sector of the circle. To make
this conversion, simply multiply the percentage by 3.6, which represents 360 (the total number of degrees in a circle)
divided by 100% (the total percentage available).
Here is a table with the percentages and the approximate angle measure of each sector:
TABLE 2.14:
Electronic Equipment
Cathode Ray Tube (CRT) TVs
CRT Monitors
389.8
4.5
16.2
Lesson Summary
Bar graphs are used to represent categorical data. Pie (or circle) graphs are also useful ways to display categorical
variables, especially when it is important to show how percentages of an entire data set fit into individual categories.
Points to Consider
What characteristics of quantitative data make it easier or harder to graph than categorical data?
Review Questions
1. Computer equipment contains many elements and chemicals that are either hazardous, or potentially valuable when recycled. The following data set shows the contents of a typical desktop computer weighing
approximately 27 kg. Some of the more hazardous substances, like mercury, have been included in the "Other
elements and chemicals" category, because they occur in relatively small amounts that are still dangerous and toxic.
TABLE 2.15:
Material                         Kilograms
Plastics                         6.21
Lead                             1.71
Aluminum                         3.83
Iron                             5.54
Copper                           2.12
Tin                              0.27
Zinc                             0.60
Nickel                           0.23
Barium                           0.05
Other elements and chemicals     6.44
TABLE 2.16:
Material                         Kilograms
Plastics                         6.21
Lead                             1.71
Aluminum                         3.83
Iron                             5.54
Copper                           2.12
Tin                              0.27
Zinc                             0.60
Nickel                           0.23
Barium                           0.05
Other elements and chemicals     6.44
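As a sketch of the percentage-to-degrees conversion described under Pie Graphs, the Python below applies it to the desktop-computer data in Table 2.16. The helper `pie_sectors` is a hypothetical name; it divides each amount by the total, multiplies by 100 to get a percentage, and then multiplies by 3.6 to get the sector angle.

```python
# Table 2.16: contents of a typical desktop computer, in kilograms.
materials = {
    "Plastics": 6.21, "Lead": 1.71, "Aluminum": 3.83, "Iron": 5.54,
    "Copper": 2.12, "Tin": 0.27, "Zinc": 0.60, "Nickel": 0.23,
    "Barium": 0.05, "Other elements and chemicals": 6.44,
}

def pie_sectors(amounts: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Map each category to (percentage of the total, sector angle in degrees)."""
    total = sum(amounts.values())
    sectors = {}
    for name, value in amounts.items():
        percent = 100 * value / total
        sectors[name] = (percent, percent * 3.6)  # 3.6 = 360 degrees / 100%
    return sectors

sectors = pie_sectors(materials)
for name, (percent, degrees) in sectors.items():
    print(f"{name:<30} {percent:5.1f}%  {degrees:6.1f} degrees")
```

The ten amounts add up to 27.00 kg, matching the "approximately 27 kg" in the question, and the sector angles sum to 360 degrees. Plastics, for example, work out to 23.0% of the total and an 82.8-degree sector.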
Identify and translate data sets to and from a histogram, a relative frequency histogram, and a frequency
polygon.
Identify histogram distribution shapes as skewed or symmetric and understand the basic implications of these
shapes.
Displaying Univariate Data
Dot Plots
A dot plot is one of the simplest ways to represent numerical data. After choosing an appropriate scale on the axes,
each data point is plotted as a single dot. Multiple points at the same value are stacked on top of each other using
equal spacing to help convey the shape and center.
Example
The following is a data set representing the percentage of paper packaging manufactured from recycled materials for
a select group of countries.
TABLE 2.17:
Country
Estonia
New Zealand
Poland
Cyprus
Portugal
United States
Italy
Spain
Australia
Greece
Finland
Ireland
Netherlands
Sweden
France
Germany
Austria
Belgium
Japan
The dot plot for this data would look like this:
Notice that this data set is centered at a manufacturing rate for using recycled materials of between 65% and 70%.
It is spread from 34% to 98%, and appears very roughly symmetric, perhaps even slightly skewed left. Dot plots
have the advantage of showing all the data points and giving a quick and easy snapshot of the shape, center, and
spread. Dot plots are not much help when there is little repetition in the data. They can also be very tedious if you
are creating them by hand with large data sets, though computer software can make quick and easy work of creating
dot plots from such data sets.
Stem-and-Leaf Plots
One of the shortcomings of dot plots is that they do not show the actual values of the data. You have to read or infer
them from the graph. From the previous example, you might have been able to guess that the lowest value is 34%,
but you would have to look in the data table itself to know for sure. A stem-and-leaf plot is a similar plot in which
it is much easier to read the actual data values. In a stem-and-leaf plot, each data value is represented by two digits:
the stem and the leaf. In this example, it makes sense to use the tens digits for the stems and the ones digits for the
leaves. The stems are on the left of a dividing line as follows:
Once the stems are decided, the leaves representing the ones digits are listed in numerical order from left to right:
It is important to explain the meaning of the data in the plot for someone who is viewing it without seeing the original
data. For example, you could place the following sentence at the bottom of the chart:
NOTE: 5|69 means 56% and 59% are the two values in the 50s.
If you could rotate this plot on its side, you would see the similarities with the dot plot. The general shape and center
of the plot is easily found, and we know exactly what each point represents. This plot also shows the slight skewing
to the left that we suspected from the dot plot. Stem plots can be difficult to create, depending on the numerical
qualities and the spread of the data. If the data values contain more than two digits, you will need to remove some
of the information by rounding. A data set that has large gaps between values can also make the stem plot hard to
create and less useful when interpreting the data.
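A stem-and-leaf plot for two-digit values can be sketched in a few lines of Python. Because the paper-packaging percentages from Table 2.17 are not reproduced here, the sample list below is hypothetical apart from the values 34, 56, 59, and 98 mentioned in the text, and the function name `stem_and_leaf` is illustrative. Stems with no leaves are still printed so that gaps in the data stay visible.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return one 'stem|leaves' row per stem for whole-number data.

    Tens digits are the stems and ones digits are the leaves; leaves are
    listed in increasing order. Stems with no leaves are still included.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    first, last = min(leaves), max(leaves)
    return [
        f"{stem}|" + "".join(str(leaf) for leaf in leaves[stem])
        for stem in range(first, last + 1)
    ]

# Hypothetical percentages (only 34, 56, 59, and 98 come from the text).
sample = [34, 56, 59, 62, 65, 66, 68, 71, 74, 77, 83, 98]
for row in stem_and_leaf(sample):
    print(row)
```

The third row printed is `5|69`, read exactly like the note above: 56% and 59% are the two values in the 50s. The empty `4|` row is kept rather than skipped.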
Stem plots can also be a useful tool for comparing two distributions when placed next to each other. These are
commonly called back-to-back stem plots.
In a previous example, we looked at recycling in paper packaging. Here are the same countries and their percentages
of recycled material used to manufacture glass packaging:
TABLE 2.18:
Country
Cyprus
In a back-to-back stem plot, one of the distributions simply works off the left side of the stems. In this case, the
spread of the glass distribution is wider, so we will have to add a few extra stems. Even if there are no data values in
a stem, you must include it to preserve the spacing, or you will not get an accurate picture of the shape and spread.
We have already mentioned that the spread was larger in the glass distribution, and it is easy to see this in the
comparison plot. You can also see that the glass distribution is more symmetric and is centered lower (around the
mid-50s), which seems to indicate that overall, these countries manufacture a smaller percentage of glass from
recycled material than they do paper. It is interesting to note in this data set that Sweden actually imports glass from
other countries for recycling, so its effective percentage is actually more than 100%.
Histograms
Once you create a frequency table, you are ready to create a graphical representation called a histogram. Let's
revisit our data about student bottled beverage habits.
Number of Plastic Bottles     Frequency
[1, 2)                        1
[2, 3)                        1
[3, 4)                        3
[4, 5)                        4
[5, 6)                        6
[6, 7)                        8
[7, 8)                        7
[8, 9)                        2
In this case, the horizontal axis represents the variable (number of plastic bottles of water consumed), and the vertical
axis is the frequency, or count. Each vertical bar represents the number of people in each class (range) of bottle counts.
For example, in the range of consuming [1, 2) bottles, there is only one person, so the height of the bar is at 1. We
can see from the graph that the most common class of bottles used by people each week is the [6, 7) range, or six
bottles per week.
A histogram is for numerical data. With histograms, the different sections are referred to as bins. Think of a column,
or bin, as a vertical container that collects all the data for that range of values. If a value occurs on the border between
two bins, it is commonly agreed that this value will go in the larger class, or the bin to the right. It is important when
drawing a histogram to be certain that there are enough bins so that the last data value is included. Often this means
you have to extend the horizontal axis beyond the value of the last data point. In this example, if we had stopped
the graph at 8, we would have missed that data, because the 8s actually appear in the bin between 8 and 9. Very
often, when you see histograms in newspapers, magazines, or online, they may instead label the midpoint of each
bin. Some graphing software will also label the midpoint of each bin, unless you specify otherwise.
A relative frequency histogram is just like a regular histogram, but instead of labeling the frequencies on the
vertical axis, we use the percentage of the total data that is present in that bin. For example, there is only one data
value in the first bin. This represents 1/32, or approximately 3%, of the total data. Thus, the vertical bar for the bin
extends upward to 3%.
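The border convention and the relative-frequency calculation can be checked with a short Python sketch. Since the individual survey responses are not listed in this section, the list `bottles` is rebuilt from the frequency table by repeating each whole-number count (1 through 8) as many times as its frequency; the function name `bin_frequencies` is hypothetical.

```python
import math

def bin_frequencies(data, start, width):
    """Count values in half-open bins [start + k*width, start + (k+1)*width).

    A value on a border goes into the bin to its right, and enough bins are
    created that the largest value is included. Assumes every value >= start.
    """
    n_bins = math.floor((max(data) - start) / width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[math.floor((x - start) / width)] += 1
    return counts

# Rebuild whole-number responses consistent with the frequency table.
table_frequencies = [1, 1, 3, 4, 6, 8, 7, 2]   # bins [1, 2) ... [8, 9)
bottles = [value for value, freq in zip(range(1, 9), table_frequencies)
           for _ in range(freq)]

counts = bin_frequencies(bottles, start=1, width=1)
relative = [c / len(bottles) for c in counts]
print(counts)                 # [1, 1, 3, 4, 6, 8, 7, 2]
print(f"{relative[0]:.1%}")   # 3.1%
```

Because the 8s fall in the bin [8, 9), the function creates eight bins; ending the axis at 8 would have dropped them, which is the mistake the text warns about.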
Frequency Polygons
A frequency polygon is similar to a histogram, but instead of using bins, a polygon is created by plotting the
frequencies and connecting those points with a series of line segments.
To create a frequency polygon for the bottle data, we first find the midpoints of each classification, plot a point at
the frequency for each bin at the midpoint, and then connect the points with line segments. To close the polygon on
the horizontal axis, also plot a point with a frequency of 0 at the midpoint of the empty class just above the maximum
of the data and at the midpoint of the empty class just below the minimum.
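The midpoint-and-endpoints recipe translates directly into code. This Python sketch, with a hypothetical function name, computes only the vertices of the polygon for the bottle data; joining them with line segments is left to whatever plotting tool you use.

```python
def frequency_polygon_points(start, width, counts):
    """Return (midpoint, frequency) vertices of a frequency polygon.

    Adds a zero-frequency point at the midpoint of the empty class just
    below the first bin and just above the last bin, so the polygon
    closes on the horizontal axis.
    """
    points = [(start - width / 2, 0)]
    for k, freq in enumerate(counts):
        points.append((start + k * width + width / 2, freq))
    points.append((start + len(counts) * width + width / 2, 0))
    return points

# Frequencies of the bottle data in bins [1, 2), [2, 3), ..., [8, 9).
bottle_counts = [1, 1, 3, 4, 6, 8, 7, 2]
for x, y in frequency_polygon_points(1, 1, bottle_counts):
    print(x, y)
```

The first vertex is (0.5, 0) and the last is (9.5, 0), the midpoints of the empty classes on either side of the data.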
Here is a frequency polygon constructed directly from the previously-shown histogram:
Frequency polygons are helpful in showing the general overall shape of a distribution of data. They can also be
useful for comparing two sets of data. Imagine how confusing two histograms would look graphed on top of each
other!
Shape, Center, Spread
Center and spread are important descriptors of a data set. The shape of a distribution of data is very important as
well. Shape, center, and spread should always be your starting point when describing a data set.
Referring to our imaginary student poll on using plastic beverage containers, we notice that the data are spread out
from 0 to 9. The graph for the data illustrates this concept, and the range quantifies it. Look back at the graph and
notice that there is a large concentration of students in the 5, 6, and 7 region. This would lead us to believe that the
center of this data set is somewhere in this area. It is also important to notice that the center of the distribution lies
near this large concentration of data; recognizing that relationship is a matter of shape.
Shape is harder to describe with a single statistical measure, so we will describe it in less quantitative terms. A very
important feature of this data set, as well as many that you will encounter, is that it has a single large concentration
of data that appears like a mountain. A data set that is shaped in this way is typically referred to as mound-shaped.
Mound-shaped data will usually look like one of the following three pictures:
Think of these graphs as frequency polygons that have been smoothed into curves. In statistics, we refer to these
graphs as density curves. The most important feature of a density curve is symmetry. The first density curve
above is symmetric and mound-shaped. Notice the second curve is mound-shaped, but the center of the data is
concentrated on the left side of the distribution. The right side of the data is spread out across a wider area. This
type of distribution is referred to as skewed right. It is the direction of the long, spread out section of data, called
the tail, that determines the direction of the skewing. For example, in the 3rd curve, the left tail of the distribution is
stretched out, so this distribution is skewed left. Our student bottle data set has this skewed-left shape.
Lesson Summary
A frequency table is useful to organize data into classes according to the number of occurrences, or frequency, of
each class. Relative frequency shows the percentage of data in each class. A histogram is a graphical representation
of a frequency table (either actual or relative frequency). A frequency polygon is created by plotting the midpoint of
each bin at its frequency and connecting the points with line segments. Frequency polygons are useful for viewing
the overall shape of a distribution of data, as well as comparing multiple data sets. For any distribution of data, you
should always be able to describe the shape, center, and spread. A data set that is mound-shaped can be classified
as either symmetric or skewed. Distributions that are skewed left have the bulk of the data concentrated on the
higher end of the distribution, and the lower end, or tail, of the distribution is spread out to the left. A skewed-right
distribution has a large portion of the data concentrated in the lower values of the variable, with the tail spread out
to the right.
Points to Consider
What characteristics of a data set make it easier or harder to represent it using frequency tables, histograms,
or frequency polygons?
What characteristics of a data set make representing it using frequency tables, histograms, or frequency
polygons, more or less useful?
What effects does the shape of a data set have on the statistical measures of center and spread?
How do you determine the most appropriate classification to use for a frequency table or the bin width to use
for a histogram?
Review Questions
TABLE 2.20:
Country
Italy
Mexico
United Arab Emirates
Belgium and Luxembourg
France
Spain
Germany
Lebanon
Switzerland
Cyprus
United States
Saudi Arabia
Czech Republic
Austria
Portugal
a.
2. The following table shows the potential energy that could be saved by manufacturing each type of material
using the maximum percentage of recycled materials, as opposed to using all new materials.
TABLE 2.21:
Manufactured Material
Aluminum Cans
Copper Wire
Steel Cans
LDPE Plastics (e.g., trash bags)
PET Plastics (e.g., beverage bottles)
HDPE Plastics (e.g., household cleaner bottles)
Personal Computers
Carpet
Glass
Corrugated Cardboard
Newspaper
Phone Books
Magazines
Office Paper
a. Construct a frequency table, including the actual frequency, the relative frequency (round to the nearest
tenth of a percent), and the relative cumulative frequency. Assume a bin width of 25 million BTUs.
Distribution Shapes
Histograms are a very common method of visualizing quantitative data, and that means that understanding how
to interpret histograms is a valuable and important skill in virtually any career. There are a number of things to
pay particular attention to when reading a histogram, including the range of the data and the size of the bins. It is
particularly useful to recognize the shape of a histogram because that understanding can lead to valuable conclusions
about the nature of the data. In this section, we focus on naming common shapes of distributions and exploring what
we can say about the data that have these shapes.
Bell-Shaped
A bell-shaped histogram has a prominent mound in the center and similar tapering to the left and right. One indication of
this shape is that the data is unimodal, meaning that the data has a single mode, identified by the peak of the
curve. Note that a normally distributed data set creates a symmetric histogram that looks like a bell, leading to the
common term for a normal distribution: a bell curve.
Uniform
A uniform-shaped histogram indicates data that is very consistent; the frequency of each class is very similar to
that of the others. A data set with a uniform-shaped histogram may be multimodal, having multiple intervals
with the maximum frequency. A uniform appearance may also indicate that the data has not been split into enough
separate intervals or classes. Another possibility is that the scale of the histogram may need to be adjusted in order
to offer meaningful observations.
Right-Skewed
A right-skewed histogram has a peak that is left of center and a more gradual tapering to the right side of the graph.
This is a unimodal data set. This shape indicates that there are a number of data points, perhaps outliers, that are
greater than the mode.
Left-Skewed
A left-skewed histogram has a peak to the right of center and more gradual tapering to the left side. It is also
unimodal. This shape indicates that the outliers may be smaller in value than the cluster of more typical values
around the mode.
Undefined Bimodal
This shape is not specifically defined, but we can note regardless that it is bimodal, having two separated classes or
intervals that share the maximum frequency of the distribution.
Example A
Describe the shape of the histogram and state a few notable characteristics:
Solution
This is a right-skewed distribution. If the modal class of 80-85 kg represents a healthy normal weight, this graph
would suggest a sample that tended toward being overweight.
Example B
Identify the general shape of the histogram and what the shape indicates about the data:
Solution
This is a slightly tricky one. The overall shape appears somewhat left-skewed and obviously unimodal at first glance.
However, a closer look tells a different story. The shape is deceiving in large part because the vertical axis does not
start at 0, which exaggerates the differences between the classes.
Look what happens if we re-draw the histogram with the same data but with the vertical axis at 0:
Pretty huge difference, isn't it? Now it is apparent that this is really a pretty uniform distribution, and that there is
not a very meaningful difference in frequency between the classes.
Example C
The image below represents data on the relative masses of a number of sampled black holes.
Solution
Most of the individual histograms are clearly unimodal, and all are clustered rather closely around a single peak,
with the exception of GRS 1915. Most of the graphs appear largely symmetrical, with the others being right-skewed.
The sharp and narrow peaks in most of the plots suggest that the mass measurements are generally consistent. The
fact that most of the peaks sit at the same general location on the scale suggests that the masses of the different
black holes appear similar at this scale. The tendency of the non-symmetrical plots to be right-skewed suggests that
it would be more reasonable to favor slightly greater mass estimates than slightly lesser ones.
The GRS 1915 plot is notably different, and the broad peak suggests that perhaps clear data on the mass of that
particular black hole is difficult to come by.
Vocabulary
Multimodal histograms have more than one peak in the data. Recall that the mode is the most common
value, so a multimodal histogram represents data with multiple classes that have a frequency equal to the
greatest single frequency in the data.
Unimodal histograms have a single peak, and represent data with a single most common value or class.
Outliers are unusual data values occurring some distance from the peak. We will learn how to identify
these in later units.
A normal distribution creates a histogram in the shape of a bell. This bell curve makes it clear that the
majority of the data lies close to the mean.
Guided Practice
TABLE 2.22:
approximate min:
approximate max:
approximate range:
More Practice
Identify which images show symmetric distributions and which show skewed distributions. Identify what type of
symmetric or skewed distributions are displayed.
1.
2.
3.
4.
5. What do you think is the shape of the distribution of the age at which a child takes its first steps? Why?
(a) Symmetric Uniform
(b) Skewed left
(c) Skewed right
(d) Symmetric Unimodal
(e) Symmetric Bimodal
6. What do you think is the shape of the distribution of the results of rolling a 6-sided die 1,000 times? Why?
(a) Symmetric Uniform
(b) Skewed left
(c) Skewed right
(d) Symmetric Unimodal
(e) Symmetric Bimodal
(a)
(b)
(c)
(d)
8.
9.
10.
11.
CHAPTER
Describing Distributions
Chapter Outline
3.1
MEASURES OF CENTER
3.2
3.3
3.4
3.5
REFERENCES
It makes sense to summarize a data set by identifying a value around which the data is centered. Three commonly
used statistics that quantify the idea of center are the mode, median and mean. This lesson is an overview of these
three basic statistics that are used to measure the center of a set of data.
Mode
The mode is defined as the most frequently occurring value in a data set. While many elementary school children
learn the mode as their first introduction to measures of center, as you delve deeper into statistics, you will most likely
encounter it less frequently. The mode really only has significance for data measured at the most basic of levels. The
mode is most useful in situations that involve categorical (qualitative) data that is measured at the nominal level. For
example, the mode might be used to describe the most common M&M color.
Example A
The students in a statistics class were asked to report the number of children that live in their house (including
brothers and sisters temporarily away at college). The data is recorded below:
1, 3, 4, 3, 1, 2, 2, 2, 1, 2, 2, 3, 4, 5, 1, 2, 3, 2, 1, 2, 3, 6
In this example, the mode could be a useful statistic that would tell us something about the families of statistics
students in our school. In this case, 2 is the mode as it is the most frequently occurring number of children in the
sample, telling us that a large number of students in our class have two children in their home.
Notice how careful we are to NOT apply this to a larger population and assume that this will be true for any
population other than our class! In a later chapter, you will learn how to correctly select a sample that could represent
a broader population.
the mode becomes. In those cases, we would most likely search for a different statistic to describe the center
of such data.
b. If each data value occurs an equal number of times, we usually say, "There is no mode." Again, this is a case
where the mode is not at all useful in helping us to understand the behavior of the data.
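Python's standard `statistics` module can find the mode of the Example A data. The sketch uses `statistics.multimode` (available in Python 3.8 and later), which returns every value tied for the highest count; when all values occur equally often it returns all of them, which is the situation the text describes as having no mode.

```python
from collections import Counter
from statistics import multimode

# Number of children in each student's home (Example A).
children = [1, 3, 4, 3, 1, 2, 2, 2, 1, 2, 2, 3, 4, 5, 1, 2, 3, 2, 1, 2, 3, 6]

print(Counter(children).most_common(1))  # [(2, 8)]
print(multimode(children))               # [2]

# Every value occurs once, so every value ties: "there is no mode."
print(multimode([4, 7, 9]))              # [4, 7, 9]
```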
You are probably comfortable calculating averages. The average is a measure of center that statisticians call the
mean. Most students learn early on in their studies that you calculate the mean by adding all of the numbers and
dividing by the number of numbers. While you are expected to be able to perform this calculation, most real data sets
that statisticians deal with are so large that they very rarely calculate a mean by hand. It is much more critical that
you understand why the mean is such an important measure of center. The mean is actually the numerical balancing
point of the data set. A certain math teacher might refer to the mean as being "the center of mass".
We can illustrate this physical interpretation of the mean. Below is a graph of the class data from the last example.
FIGURE 3.1
If you have snap cubes like you used to use in elementary school, you can make a physical model of the graph, using
one cube to represent each student's family and a row of six cubes at the bottom to hold them together like this:
There are 22 students in this class and the total number of children in all of their houses is 55, so the mean of this
data is 55 ÷ 22 = 2.5 children in each student's family. Statisticians use the symbol x̄ to represent the mean when x
is the symbol for a single measurement. It is pronounced "x bar."
It turns out that the model that you created balances at 2.5. In the pictures below, you can see that a block placed at
3 causes the graph to tip left, while one placed at 2 causes the graph to tip right. However, if you place the block
at about 2.5, it balances perfectly!
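The balancing behavior can be checked numerically: for each pivot, add up the signed distances (value minus pivot) of all the cubes. In the Python sketch below, the name `sum_of_deviations` is our own. A positive total means more weight lies to the right of the pivot, a negative total means more lies to the left, and the total is zero only at the mean.

```python
# Number of children in each student's home (Example A).
children = [1, 3, 4, 3, 1, 2, 2, 2, 1, 2, 2, 3, 4, 5, 1, 2, 3, 2, 1, 2, 3, 6]

def sum_of_deviations(data, pivot):
    """Sum of (value - pivot); zero means the model balances at the pivot."""
    return sum(x - pivot for x in data)

mean = sum(children) / len(children)   # 55 / 22 = 2.5

for pivot in (2, mean, 3):
    print(pivot, sum_of_deviations(children, pivot))
# 2 11      -> more weight to the right: the model tips right
# 2.5 0.0   -> balanced
# 3 -11     -> more weight to the left: the model tips left
```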
The median is simply the middle number in a set of data. Think of five students seated in a row in statistics class:
Aliyah Bob Catalina David Elaine
Which student is sitting in the middle? If there were only four students, what would be the middle of the row? These
are the same issues you face when calculating the numeric middle of a data set using the median.
Let's say that Ron has taken five quizzes in his statistics class and received the following grades:
Before finding the median, you must put the data in order. The median is the numeric middle. Placing the data in
order from least to greatest yields:
The middle number in this case is the third grade, or 90, so the median of this data is 90. Notice that just by
coincidence, this was also the third quiz that he took, but this will usually not be the case.
Of course, when there is an even number of numbers, there is no true value in the middle. In this case we take the
two middle numbers and find their mean. If there are four students sitting in a row, the middle of the row is halfway
between the second and third students.
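The odd/even rule for the median can be sketched as a small Python function. Ron's and Rhonda's actual grades are not reproduced in this section, so the quiz scores below are hypothetical; the result is cross-checked against the standard library's `statistics.median`.

```python
from statistics import median as stdlib_median

def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)          # the data must be put in order first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the middle two

five_quizzes = [84, 93, 90, 75, 95]   # hypothetical grades
four_quizzes = [88, 92, 80, 96]       # hypothetical grades

print(median(five_quizzes), median(four_quizzes))  # 90 90.0
assert median(five_quizzes) == stdlib_median(five_quizzes)
assert median(four_quizzes) == stdlib_median(four_quizzes)
```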
Example B
Take Rhonda's quiz grades:
The second and third numbers straddle the middle of this set. The mean of these two numbers is 90, so the median
of the data is 90.
Both the mean and the median are important and widely used measures of center. So you might wonder why we
need them both. There is an important difference between them that can be explained by the following example.
Suppose there are 5 houses in a community with the following prices:
$55,000 $58,000 $60,000 $61,000 $3,200,000
The "median housing price" is far more useful in real estate because it tells us that half of the houses on the market
are priced below it and half are priced above it. The mean of this data set is $686,800, and it is not at all descriptive
of the market. The important thing to understand is that extremely large or small values in a data set can have a large
influence on the mean.
FIGURE 3.2
FIGURE 3.3
Outliers
So why are the mean and median so different in our earlier example about home values? It is because there is one
price that is extremely different from the rest of the data. In statistics, we call such extreme values outliers. The
mean is affected by the presence of an outlier; however, the median is not. You can see this in the graphs shown
above. A statistic that is not affected by outliers is called resistant. We say that the median is a resistant measure of
center, and the mean is not resistant. In a sense, the median is able to resist the pull of a far away value, but the mean
is drawn to such values. It cannot resist the influence of outlier values. Remember the balancing point example?
If you added another data value that was far away, you would be forced to move the block toward it to keep the model
balanced.
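The effect of the outlier in the housing example can be checked with `statistics.mean` and `statistics.median` from Python's standard library. This sketch computes both measures with and without the $3,200,000 house.

```python
from statistics import mean, median

prices = [55_000, 58_000, 60_000, 61_000, 3_200_000]
without_outlier = prices[:-1]   # drop the $3,200,000 house

print("with outlier:   ", mean(prices), median(prices))
print("without outlier:", mean(without_outlier), median(without_outlier))
```

Removing the single extreme price moves the mean from $686,800 down to $58,500, while the median only shifts from $60,000 to $59,000: the median resists the outlier, and the mean does not.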
Population Mean vs. Sample Mean
Now that we understand some basic concepts about the mean, it is important to be able to represent and understand
the mean symbolically. When you are calculating the mean as a statistic from a finite sample of data, we call this the
sample mean and, as we have already mentioned, the symbol for this is x̄. Written symbolically, then, the formula
for a sample mean is:
$$\bar{x} = \frac{\sum x}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
You may remember seeing the symbol Σ before on a calculator or in another mathematics class. It is called
sigma, the Greek capital S. In mathematics, we use this symbol as a shortcut for "the sum of." So, the formula is
the sum of all the data values (x₁, x₂, etc.) divided by the number of observations (n).
Recall that the mean of an entire population is a parameter. The symbol for a population mean is another Greek
letter, μ. It is the lowercase Greek m and is called "mu" (pronounced "mew," like the sound a cat makes). In this
case the symbolic representation would be:
$$\mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$
The formula is very much the same, because we calculate the mean the same way, but we typically use capital X for
the individuals in the population and capital N to represent the size of the population.
In general, statisticians say that x̄, the mean of a sample drawn from the population, is an estimate of μ, the mean of the
population, which is usually unknown. In this course you will learn to determine how good that estimate is.
Lesson Summary
When examining a set of data, we use descriptive statistics to provide information about where the data is centered.
The mode is a measure of the most frequently occurring number in a data set and is most useful for categorical data
and data measured at the nominal level. The mean and median are two of the most commonly used measures of
center. The mean, or average, is the sum of the data points divided by the total number of data points in the set. In a
data set that is a sample from a population, the sample mean is notated as $\bar{x}$. When the entire population is involved,
the population mean is $\mu$. The median is the numeric middle of a data set. If there is an odd number of values,
this middle value is easy to find. If there is an even number of data values, however, the median is the mean of the
middle two values. The median is resistant; that is, it is not affected by the presence of outliers. An outlier is a
number that has an extreme value when compared with most of the data. The mean is not resistant, and therefore
the median tends to be a more appropriate measure of center to use in examples that contain outliers. Because the
mean is the numerical balancing point for the data, it is an extremely important measure of center that is the basis
for many other calculations and processes necessary for making useful conclusions about a set of data.
Points to Consider
a. How do you determine which measure of center best describes a particular data set?
b. What are the effects of outliers on the various measures of spread?
c. How can we represent data visually using the various measures of center?
Review Questions
1. In Lois's second-grade class, all of the students are between 45 and 52 inches tall, except one boy, Lucas, who
is 62 inches tall. Which of the following statements is true about the heights of all of the students?
(a) The mean height and the median height are about the same.
(b) The mean height is greater than the median height.
(c) The mean height is less than the median height.
(d) More information is needed to answer this question.
(e) None of the above is true.
2. Enrique has a 91, 87, and 95 for his statistics grades for the first three quarters. His mean grade for the year
must be a 93 in order for him to be exempt from taking the final exam. Assuming grades are rounded following
valid mathematical procedures, what is the lowest whole number grade he can get for the 4th quarter and still
be exempt from taking the exam?
3. The chart below shows the data from the Galapagos tortoise preservation program with just the number of
individual tortoises that were bred in captivity and reintroduced into their native habitat.
TABLE 3.1:
Island or Volcano
Wolf
Darwin
Alcedo
Sierra Negra
Cerro Azul
Santa Cruz
Española
San Cristóbal
Santiago
Pinzón
Pinta
(a) Find the mode.
(b) Find the median.
(c) Find the mean.
(d) Explain the difference between your answers to (b) and (c).
Review Answers
1. (b) There is an outlier that is larger than most of the data. This outlier will pull the mean toward it, while the
median tends to stay in the center of the data, clustered somewhere between 45 and 52.
2. His mean for all four quarters would need to be at least 92.5 in order to round to the necessary grade of 93. Multiplying 92.5 by 4 yields 370 as the necessary total. His existing grades total 273, and $370 - 273 = 97$.
3.
(a) 0
(b) 210
(c) 299
(d) There is one extreme point, 1293, which causes the mean to be greater than the median.
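The arithmetic in answer 2 can be checked with a short Python sketch. It assumes the usual round-half-up rule, so a mean of 92.5 rounds to 93. Python's built-in `round` rounds halves to the nearest even number, which would turn 92.5 into 92. For that reason the sketch compares against 92.5 directly instead of calling `round`. `lowest_grade_needed` is an illustrative name.

```python
import math

def lowest_grade_needed(grades, target=93):
    """Smallest whole-number grade that lifts the mean of all grades
    (the existing ones plus one more) to a value that rounds half-up to target."""
    count = len(grades) + 1
    needed_total = (target - 0.5) * count   # mean must be at least target - 0.5
    return math.ceil(needed_total - sum(grades))

print(lowest_grade_needed([91, 87, 95]))
```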
In the previous lesson, we concentrated on statistics that provided information about the way in which a data set is
centered. Another important feature that can help us understand more about a data set is the manner in which the
data is distributed or spread. There are several numbers we can calculate that help us understand how data is spread.
This section will focus on the measures of variability (or spread) that can be used to describe distributions of any
shape.
Range
For most students, their first introduction to a statistic that measures spread is the range. The range is simply the
difference between the smallest value (minimum) and the largest value (maximum) in the data. Let's return to the
data set used in the previous lesson:
The quartiles divide the data into four approximately equal groups. The lower quartile, sometimes abbreviated as Q1,
is also known as the 25th percentile. A percentile is a statistic that identifies the percentage of the data that is less
than the given value. Technically, the median is the middle quartile and is referred to as Q2. Because it is the numeric
middle of the data, half of the data is below the median and half is above. The upper quartile, or Q3, is also known
as the 75th percentile.
Your first exposure to percentiles was most likely as a baby. To check a child's physical development, pediatricians
use height and weight charts that help them to know how the child compares to children of the same age. A child
whose height is in the 70th percentile is taller than 70% of the children of the same age.
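The lesson's definition of a percentile, the percentage of the data less than a given value, can be computed directly. In this minimal sketch, `percent_below` is a hypothetical helper. The example uses the Mobile rainfall values from Table 3.2 later in this lesson. Some textbooks and software count values less than or equal to the given value instead, so percentile figures can differ slightly between sources.

```python
def percent_below(values, value):
    """Percentage of the data that is strictly less than `value`,
    following the lesson's definition of a percentile."""
    below = sum(1 for v in values if v < value)
    return 100 * below / len(values)

# Annual rainfall in Mobile, Alabama (inches), from Table 3.2 in this lesson.
rainfall = [90, 56, 60, 59, 74, 76, 81, 91, 47, 59]
print(percent_below(rainfall, 81))
```

Seven of the ten years had less than 81 inches of rain. Under this definition, 81 inches sits at the 70th percentile.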
Returning to a previous data set:
1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5, 6
Recall that the median (50th percentile) of this dataset is 2. The quartiles can be thought of as the medians of the
upper and lower halves of the data.
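Finding quartiles as medians of the two halves can be sketched in Python. This is a minimal illustration of the convention described in this lesson: for an odd count, the middle value goes in neither half. `quartiles` is a hypothetical helper. `median` comes from Python's standard `statistics` module.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3). Q2 is the median; Q1 and Q3 are the medians of the
    lower and upper halves. When the count is odd, the middle value is left
    out of both halves."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]          # first half
    upper = s[(n + 1) // 2 :]    # second half (skips the middle value if n is odd)
    return median(lower), median(s), median(upper)

data = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 5, 6]
print(quartiles(data))
```

For the 22 values above, each half has 11 values. That gives Q1 = 2, a median of 2, and Q3 = 3.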
In this case, there is an odd number of values in each half. If there were an even number of values, then we
would follow the procedure for medians and average the middle two numbers of each half. Look at the following set
of data:
The median in this set is 90. Because it is the middle number, it is not technically part of either the lower or upper
halves of the data, so we do not include it when calculating the quartiles. However, not all statisticians agree that this
is the proper way to calculate the quartiles in this case. As we mentioned in the last section, some things in statistics
are not quite as universally agreed upon as in other branches of mathematics. The exact method for calculating
quartiles is another one of those topics.
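Because quartile conventions differ, it can help to compare them on one small data set. In the illustrative sketch below, `halves_quartiles` is a hypothetical helper that takes medians of the lower and upper halves. Its flag chooses whether the middle value of an odd-sized set is excluded (as this lesson does) or included in both halves. For comparison, the sketch also calls `statistics.quantiles` from Python's standard library (Python 3.8 or later), which interpolates instead and offers `"exclusive"` and `"inclusive"` methods. The data set is made up.

```python
import statistics

def halves_quartiles(values, include_median):
    """Return (Q1, Q3) as medians of the lower and upper halves.
    For an odd count, include_median decides whether the middle value
    belongs to both halves (True) or to neither (False)."""
    s = sorted(values)
    n = len(s)
    half = (n + 1) // 2 if include_median else n // 2
    lower, upper = s[:half], s[n - half:]
    return statistics.median(lower), statistics.median(upper)

data = [1, 2, 3, 4, 5, 6, 7]  # hypothetical data set with an odd count

print(halves_quartiles(data, include_median=False))
print(halves_quartiles(data, include_median=True))
print(statistics.quantiles(data, n=4, method="exclusive"))
print(statistics.quantiles(data, n=4, method="inclusive"))
```

Excluding the median gives Q1 = 2 and Q3 = 6, while including it gives 2.5 and 5.5. The two standard-library methods happen to reproduce those same pairs here, with 4, the median, as their middle value. On other data sets, the interpolating methods and the halves methods can disagree.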
Interquartile Range
The interquartile range (IQR) is the range of the data that contains the middle 50% of cases. Recall that you find
the range by subtracting the minimum value from the maximum value in the dataset. You calculate the IQR in a
similar way, except that you find the difference between the 1st quartile (Q1) and the 3rd quartile (Q3).
Therefore,
$$\text{IQR} = Q_3 - Q_1$$
Example
A recent study proclaimed Mobile, Alabama the wettest city in America. The following table lists a measurement
of the approximate annual rainfall in Mobile for the last 10 years. Find the range and IQR for this data.
TABLE 3.2:
Year | Rainfall (inches)
1998 | 90
1999 | 56
2000 | 60
2001 | 59
2002 | 74
2003 | 76
2004 | 81
2005 | 91
2006 | 47
2007 | 59
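First, place the data in order from smallest to largest: 47, 56, 59, 59, 60, 74, 76, 81, 90, 91. The range is the difference between the minimum and
maximum rainfall amounts: $91 - 47 = 44$ inches.
To find the IQR, first identify the quartiles, and then compute Q3 − Q1. With 10 values, the lower half is 47, 56, 59, 59, 60, so Q1 = 59, and the upper half is 74, 76, 81, 90, 91, so Q3 = 81. Therefore, $\text{IQR} = 81 - 59 = 22$ inches.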
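The same steps can be written as a minimal Python sketch. The rainfall values are the ones in Table 3.2. The quartiles are found as medians of the lower and upper halves, the convention used in this lesson. `data_range` is just an illustrative variable name, chosen so that it does not shadow Python's built-in `range`.

```python
from statistics import median

# Approximate annual rainfall in Mobile, Alabama, 1998-2007 (inches), Table 3.2.
rainfall = [90, 56, 60, 59, 74, 76, 81, 91, 47, 59]

s = sorted(rainfall)
n = len(s)
data_range = s[-1] - s[0]         # maximum minus minimum
q1 = median(s[: n // 2])          # median of the lower half
q3 = median(s[(n + 1) // 2 :])    # median of the upper half
iqr = q3 - q1

print("sorted:", s)
print("range:", data_range, "Q1:", q1, "Q3:", q3, "IQR:", iqr)
```

This reproduces the range of 44 inches and the IQR of 22 inches.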
Even though we are doing easy calculations, statistics is never about meaningless arithmetic and you should always
be thinking about what a particular statistical measure means in the real context of the data. In this example, the
range tells us that there is a difference of 44 inches of rainfall between the wettest and driest years in Mobile. The
IQR shows that there is a difference of 22 inches of rainfall even in the middle 50% of the data. It appears that
Mobile experiences wide fluctuations in yearly rainfall totals, which might be explained by its position near the Gulf
of Mexico and its exposure to tropical storms and hurricanes. The IQR will be useful in the next section because it
allows us to visualize whether the data are bunched up or spread out.
Points to Consider
a. How do you determine which measure of center best describes a particular data set?
1. The chart below shows the data from the Galapagos tortoise preservation program with just the number of
individual tortoises that were bred in captivity and reintroduced into their native habitat.
TABLE 3.3:
Island or Volcano
Wolf
Darwin
Alcedo
Sierra Negra
Cerro Azul
Santa Cruz
Española
San Cristóbal
Santiago
Pinzón
Pinta
(a) Find the mode.
(b) Find the median.
(c) Find the mean.
(d) Find the upper and lower quartiles.
(e) Find the percentile for the number of Santiago tortoises reintroduced.
Review Answers
1. 270
(a) 0
(b) 210
(c) 222
(d) Q1: 0, Q3: 498
(e) 72.7%
The five-number summary is a numerical description of a data set consisting of the following measures (in order):
minimum value, lower quartile, median, upper quartile, maximum value. When you are asked to summarize what
you know about a distribution of data, it is often a good starting point to report the five-number summary along with
the shape of the distribution.
Example
The huge population growth in the western United States in recent years, along with a trend toward less annual
rainfall in many areas and even drought conditions in others, has put tremendous strain on the water resources
available now and has increased the need to protect them in the years to come. Here is a listing of the amount of water held by
each major reservoir in Arizona, stated as a percentage of that reservoir's total capacity.
TABLE 3.4:
Lake/Reservoir | % of Capacity
Salt River System | 59
Lake Pleasant | 49
Verde River System | 33
San Carlos | 9
Lyman Reservoir | 3
Show Low Lake | 51
Lake Havasu | 98
Lake Mohave | 85
Lake Mead | 95
Lake Powell | 89
This data set was collected in 1998, and since then the water levels in many states have taken a dramatic turn for the worse. For
example, Lake Powell is currently at less than 50% of capacity.¹
Placing the data in order from smallest to largest gives the following:
3, 9, 33, 49, 51, 59, 85, 89, 95, 98
Since there are 10 numbers, the median is the average of 51 and 59, which is 55. Recall that the lower quartile is
the 25th percentile, or the value below which 25% of the data fall. In this data set, that number is 33. Also, the upper
quartile is 89. Therefore, the five-number summary is as follows:
minimum: 3, lower quartile: 33, median: 55, upper quartile: 89, maximum: 98
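The five-number summary can also be computed with a short Python sketch. `five_number_summary` is a hypothetical helper that uses the same halves-based quartile convention as this lesson. The data are the Arizona percentages from Table 3.4.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum). Quartiles are medians of the
    lower and upper halves; for an odd count the middle value is excluded."""
    s = sorted(values)
    n = len(s)
    q1 = median(s[: n // 2])
    q3 = median(s[(n + 1) // 2 :])
    return s[0], q1, median(s), q3, s[-1]

# Arizona reservoirs, percentage of capacity (Table 3.4, 1998 data).
arizona = [59, 49, 33, 9, 3, 51, 98, 85, 95, 89]
print(five_number_summary(arizona))
```

For the Arizona data it returns 3, 33, 55, 89, and 98. The median prints as 55.0 because it is the average of two values.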
A box-and-whisker plot is a very convenient and informative way to display the information captured in the
five-number summary. A box-and-whisker plot shows the center and spread of the values on a single quantitative variable.
To create the box part of the plot, first draw a rectangle that extends from the lower (first) quartile to the upper
(third) quartile. Then draw a line through the interior of the rectangle at the median. Finally, connect the ends of the
box to the minimum and maximum values using line segments to form the whiskers.
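A box plot is usually drawn by hand or with graphing software. The hypothetical helper `text_boxplot` below is a rough, text-only illustration of the construction steps just described. It scales values onto a line of characters, marks the whiskers with `-` and `|`, the box with `[`, `=`, and `]`, and the median with `M`. It is a sketch for seeing the structure, not a replacement for a real plotting library.

```python
def text_boxplot(summary, low, high, width=61):
    """Draw a one-line, text-only box plot of a five-number summary.
    low and high set the ends of the number line; width is in characters."""
    minimum, q1, med, q3, maximum = summary

    def col(value):
        # Position of a value on the character line (0 .. width - 1).
        return round((value - low) / (high - low) * (width - 1))

    cells = [" "] * width
    for c in range(col(minimum), col(maximum) + 1):
        cells[c] = "-"                               # whiskers run from min to max
    cells[col(minimum)] = cells[col(maximum)] = "|"  # whisker ends
    for c in range(col(q1), col(q3) + 1):
        cells[c] = "="                               # box runs from Q1 to Q3
    cells[col(q1)], cells[col(q3)] = "[", "]"        # box edges at the quartiles
    cells[col(med)] = "M"                            # median line inside the box
    return "".join(cells).rstrip()

# Five-number summary of the Arizona reservoir data: 3, 33, 55, 89, 98.
print(text_boxplot((3, 33, 55, 89, 98), low=0, high=100, width=101))
```

With `width=101` and a number line from 0 to 100, each character stands for one percentage point. This makes the long left whisker (3 to 33) easy to compare with the short right whisker (89 to 98).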
Here is the box plot for this data:
The plot divides the data into quarters. You can also usually learn something about the shape of the distribution
from the sections of the plot. If each of the four sections of the plot is about the same length, then the data will
be roughly symmetric. In this example, the different sections are not exactly the same length. The left whisker is noticeably
longer than the right, and the right half of the box is slightly longer than the left. We would most likely say that
this distribution is moderately symmetric. Keep in mind that each section contains roughly the same amount of data, about 25%,
so the different lengths of the sections tell us how the data are spread in each section. The numbers in the left whisker
(lowest 25% of the data) are spread more widely than those in the right whisker.
Here is the box plot (as the name is sometimes shortened) for reservoirs and lakes in Colorado:
In this case, the third quarter of data (between the median and upper quartile) appears to be a bit more densely
concentrated in a smaller area. The data values in the lower whisker also appear to be much more widely spread than
in the other sections. Looking at the dot plot for the same data shows that this spread in the lower whisker gives the
data a slightly skewed-left appearance (though it is still roughly symmetric).
Box-and-whisker plots are often used to get a quick and efficient comparison of the general features of multiple data
sets. In the previous example, we looked at data for both Arizona and Colorado. How do their reservoir capacities
compare? You will often see multiple box plots either stacked on top of each other, or drawn side-by-side for easy
comparison. Here are the two box plots:
The plots seem to be spread the same if we just look at the range, but with the box plots, we have an additional
indicator of spread if we examine the length of the box (or interquartile range). This tells us how the middle 50%
of the data is spread, and Arizona's data values appear to have a wider spread. The center of the Colorado data
(as evidenced by the location of the median) is higher, which would tend to indicate that, in general, Arizona's
reservoirs are less full, as a percentage of their individual capacities, than Colorado's. Recall that the median is a
resistant measure of center, because it is not affected by outliers. The mean is not resistant, because it will be pulled
toward outlying points. When a data set is skewed strongly in a particular direction, the mean will be pulled in the
direction of the skewing, but the median will not be affected. For this reason, the median is a more appropriate
measure of center to use for strongly skewed data.
Even though we wouldn't characterize either of these data sets as strongly skewed, this effect is still visible. Here
are both distributions with the means plotted for each.
Notice that the long left whisker in the Colorado data causes the mean to be pulled toward the left, making it lower
than the median. In the Arizona plot, you can see that the mean is slightly higher than the median, due to the slightly
elongated right side of the box. If these data sets were perfectly symmetric, the mean would be equal to the median
in each case.
Here are the reservoir data for California (the names of the lakes and reservoirs have been omitted):
80, 83, 77, 95, 85, 74, 34, 68, 90, 82, 75
At first glance, the 34 should stand out. It appears as if this point is different from the rest of the data. Notice that
without the outlier, the distribution is roughly symmetric.
This data set had one obvious outlier, but when is a point far enough away to be called an outlier? We need a standard,
accepted practice for defining an outlier in a box plot. The convention, though somewhat arbitrary, is that any point that is more
than 1.5 times the IQR outside the box is considered an outlier. Because the IQR is the same as the length of
the box, any point that is more than one-and-a-half box lengths below Q1 or above Q3 is plotted as a separate point
and not included in the whisker.
The calculations for determining the outlier in this case are as follows:
Lower Quartile: 74
Upper Quartile: 85
Interquartile range (IQR): $85 - 74 = 11$
$1.5 \times \text{IQR} = 16.5$
Cut-off for outliers in left whisker: $74 - 16.5 = 57.5$. Thus, any value less than 57.5 is considered an outlier.
Notice that we did not even bother to test the calculation on the right whisker, because it should be obvious from
a quick visual inspection that there are no points that are farther than even one box length away from the upper
quartile.
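The 1.5 × IQR test can be automated in a few lines. In this minimal sketch, `outlier_fences` is a hypothetical helper. It finds the quartiles as medians of the lower and upper halves, excluding the middle value for an odd count as in this lesson, and returns the two cut-offs. Unlike the visual shortcut in the text, it checks both whiskers.

```python
from statistics import median

def outlier_fences(values, k=1.5):
    """Return (lower cut-off, upper cut-off): k IQRs below Q1 and above Q3."""
    s = sorted(values)
    n = len(s)
    q1 = median(s[: n // 2])
    q3 = median(s[(n + 1) // 2 :])
    step = k * (q3 - q1)
    return q1 - step, q3 + step

# California reservoir data from the text (% of capacity).
california = [80, 83, 77, 95, 85, 74, 34, 68, 90, 82, 75]
low, high = outlier_fences(california)
outliers = [x for x in california if x < low or x > high]
print(low, high, outliers)
```

For the California data, the cut-offs are 57.5 and 101.5, and only 34 is flagged.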
Lesson Summary
The five-number summary is a useful collection of statistical measures consisting of the following in ascending order:
minimum, lower quartile, median, upper quartile, maximum. A box-and-whisker plot is a graphical representation
of the five-number summary showing a box bounded by the lower and upper quartiles and the median as a line in
the box. The whiskers are line segments extended from the quartiles to the minimum and maximum values. Each
whisker and section of the box contains approximately 25% of the data. The width of the box is the interquartile
range, or IQR, and shows the spread of the middle 50% of the data. Box-and-whisker plots are effective at giving
an overall impression of the shape, center, and spread of a data set. While an outlier is simply a point that is not
typical of the rest of the data, there is an accepted definition of an outlier in the context of a box-and-whisker plot.
Any point that is more than 1.5 times the length of the box (IQR) from either end of the box is considered to be an
outlier.
Points to Consider
What characteristics of a data set make it easier or harder to represent it using dot plots, stem-and-leaf plots,
histograms, and box-and-whisker plots?
Which plots are most useful to interpret the ideas of shape, center, and spread?
Review Questions
1. Here are the 1998 data on the percentage of capacity of reservoirs in Idaho.
70, 84, 62, 80, 75, 95, 69, 48, 76, 70, 45, 83, 58, 75, 85, 70
62, 64, 39, 68, 67, 35, 55, 93, 51, 67, 86, 58, 49, 47, 42, 75
1.
a.
b.
c.
d.
e.
2. Here are the 1998 data on the percentage of capacity of reservoirs in Utah.
80, 46, 83, 75, 83, 90, 90, 72, 77, 4, 83, 105, 63, 87, 73, 84, 0, 70, 65, 96, 89, 78, 99, 104, 83, 81
2.
a.
b.
c.
d.
e.
5. The following table contains recent data on the average price of a gallon of gasoline for states that share a
border crossing into Canada.
a. Find the five-number summary for this data.
b. Show all work to test for outliers.
c. Graph the box-and-whisker plot for this data.
TABLE 3.5:
State
Alaska
Washington
Idaho
Montana
North Dakota
Minnesota
Michigan
New York
Vermont
New Hampshire
Maine
Box-and-whisker plots (or box plots) are commonly used to compare a single value or range of values for easier,
more effective decision-making. Box and whisker plots are very effective and easy to read, and can summarize data
from multiple sources and display the results in a single graph.
Use box and whisker plots when you have multiple data sets from independent sources that are related to each other
in some way. Examples include comparing test scores between schools or classrooms, and exploring data from
before and after a process change.
Remember that the line inside the box represents the middle value when the data points are arranged numerically.
Because the median is only identified by location in a series, it can sometimes be very indicative of the trend or
average of the data set as a whole, and sometimes is not useful for that purpose at all (see Example A).
Recall that skewed data appear as a longer tail in one direction on a histogram; the same is true on a box plot. If the
box in a box plot is stretched in one direction or the other, then the data are skewed in that direction. Data skewed
right have their values concentrated on the left, with the remaining values strung out toward the
right side.
A longer box indicates a greater interquartile range, since the sides of the box mark the 1st and 3rd quartiles.
A greater interquartile range indicates data that are more spread out, which makes any single central value a less reliable summary. Since the interquartile range
covers the middle 50% of the data, a greater range in this section of the plot suggests that the
median may not be a great indicator of central tendency.
A plot with long whiskers represents a greater range for the overall sample than simply a longer box itself does.
Data covering a greater range is naturally less reliable as an indicator of highly probable values, but given the option,
longer whiskers are less of a concern than a long box. A broad range of possibilities but a strong likelihood of central
values is more reliable to use for prediction than a moderate overall range with little concentration at the median.
Example A
Identify the five-number summary and any outliers depicted in the box plot below:
Solution
The five-number summary is depicted by the vertical bars in the box and by the endpoints of the whiskers:
Minimum: 13
1st Quartile: 16
Median: 19
3rd Quartile: 22
Maximum: 24
Outliers (depicted by open circles disconnected from the box and whiskers): 4 and 30
Example B
What is indicated by the shape of the box plot below?
Solution
The box in the plot extends nearly to the lower extreme, indicating that the data less than the median is likely at least
relatively consistent, since there is not a large jump between the lower 25% and the minimum. The longer whisker
on the upper side suggests that there may be larger variance among the greater values, since there is a greater distance
from the 3rd quartile to the upper extreme than from the median to the 3rd quartile.
Lesson Summary
If you were asked to evaluate a box plot to find the median, quartiles, extremes and outliers, would you know how?
What does it mean if the box in a box plot is unusually long or short? Does a long whisker on one or both sides
mean something important?
With the practice you have had now, these questions should be easy!
Median: the center vertical line in the box
1st and 3rd Quartiles: the leftmost and rightmost vertical lines of the box
Lower and Upper Extremes: the endpoints of the whiskers
Vocabulary
The interquartile range is calculated by subtracting the 1st quartile from the 3rd quartile and represents the
middle 50% of the sample.
Guided Practice
1. Make a Box and Whisker plot from the following data sets.
(a) Initial weight (December) of 14 women in a weight-loss study (pounds): 190, 175, 187, 199, 205, 187, 176, 180, 187, 191
(b) Weights of the same women one month later (January): 187, 174, 181, 189, 196, 178, 174, 176, 181, 186, 188, 191, 183, 1
(c) Weights of the same women in February: 181, 165, 176, 182, 190, 176, 171, 170, 171, 185, 187, 181, 179, 186
2.
3.
4.
5.
6.
7.
Solutions
1. For all three sets, first organize the data by increasing numerical order and identify the five-number summary
(FNS). Once you have the FNS, create the box plot for each just as in the examples above. The three plots
should resemble the images below:
(a)
(b)
(c)
2. If we compare the boxplots for (a) and (c), we can see that the median weight has dropped by about 9 pounds.
In addition, there are essentially no individuals in boxplot (c) that weigh above the median in boxplot (a).
3. The median in December was 189, and in February it was 180.
4. The maximum in December was 205, and went down to 190 by February.
5. The minimum weight in December was 175, and it also went down, to 165 by February.
6. The range decreased notably, from 30 pounds in December, to 25 pounds in February.
7. It would appear that the method was effective, at least in the short term.
More Practice
1. What is the five number summary of the following box and whisker plot?
2. The box plot shows the heights in inches of boys on a High School Baseball Team. What is the 5 number
summary of the plot?
3. Listed are the heights in inches of girls on a High School Ski Team. Make a plot of the girls' heights.
58, 59, 59, 60, 62, 65, 68, 69, 70, 70, 71
4. Comparing the heights between the two teams, which has the taller players on average? How do you know?
Use the box and whisker plot below to examine scores received on an English GED Test to answer questions
5-9.
5.
6.
7.
8.
9.
Use the graph below that shows how much girls spent on average per month on clothes during August.
10. How many girls shop for clothes? (Hint: can you answer this question?)
11. What percent of girls spent less than $85.00 in August on clothes?
12. Would you expect the mean number of dollars spent to be higher or lower than the median? Explain.
Use the graphs below to compare the amount of time a teenager spends in the bathroom getting ready for
school and the amount of time they spend in the bathroom getting ready to go to a party.
TIME SPENT GETTING READY FOR SCHOOL
13. What percent of teenagers spend at least 15 minutes getting ready for a party?
14. What is the 3rd Quartile for the time spent getting ready for a party?
15. Is it more common for a teenager to spend more than 1 hour getting ready for school or between 1 and 2 hrs
getting ready for a party? Explain.
CHAPTER
Normal Distributions
Chapter Outline
4.1
4.2
STANDARD DEVIATION
4.3
THE EMPIRICAL RULE
4.4
Z-SCORES
Introduction
Most high schools have a set amount of time between classes during which students must get to their next class. If
you were to stand at the door of your statistics class and watch the students coming in, think about how the students
would enter. Usually, one or two students enter early, then more students come in, then a large group of students
enter, and finally, the number of students entering decreases again, with one or two students barely making it on
time, or perhaps even coming in late!
Now consider this. Have you ever popped popcorn in a microwave? Think about what happens in terms of the
rate at which the kernels pop. For the first few minutes, nothing happens, and then, after a while, a few kernels
start popping. This rate increases to the point at which you hear most of the kernels popping, and then it gradually
decreases again until just a kernel or two pops.
Here's something else to think about. Try measuring the height, shoe size, or the width of the hands of the students in
your class. In most situations, you will probably find that there are a couple of students with very low measurements
and a couple with very high measurements, with the majority of students centered on a particular value.
All of these examples show a typical pattern that seems to be a part of many real-life phenomena. In statistics,
because this pattern is so pervasive, it seems fitting to call it normal, or more formally, the normal distribution.
Examples of values that typically follow a normal distribution include:
The normal distribution is an extremely important concept. It occurs often in the data we collect from the natural
world, and it is a critical component of many of the more theoretical ideas that are the foundation of statistics. This
chapter explores the details of the normal distribution.
When graphing the data from each of the examples in the introduction, the distributions from each of these situations
would be mound-shaped and mostly symmetric. A normal distribution is a perfectly symmetric, mound-shaped
distribution. It is commonly referred to as a normal curve, or bell curve.
Because so many real data sets closely approximate a normal distribution, we can use the idealized normal curve to
learn a great deal about such data. With a practical data collection, the distribution will never be exactly symmetric,
so just like situations involving probability, a true normal distribution only results from an infinite collection of data.
Also, it is important to note that the normal distribution describes a continuous random variable.
Center
Due to the exact symmetry of a normal curve, the center of a normal distribution, or a data set that approximates a
normal distribution, is located at the highest point of the distribution, and all the statistical measures of center we
will study (the mean, median, and mode) are equal.
It is also important to realize that this center peak divides the data into two equal parts.
Knowing that the values in a dataset are exactly or approximately normally distributed allows you to get a feel for
how common a particular value might be in that set. Because the values of a normal distribution are predictably
clustered around the middle of the distribution, you can estimate the rarity of a given value in the set.
Spread
Let's go back to our popcorn example. The bag advertises a certain time, beyond which you risk burning the popcorn.
From experience, the manufacturers know when most of the popcorn will stop popping, but there is still a chance that
there are those rare kernels that will require more (or less) time to pop than the time advertised by the manufacturer.
The directions usually tell you to stop when the time between pops is a few seconds, but aren't you tempted to
keep going so you don't end up with a bag full of un-popped kernels? Because this is a real, and not theoretical,
situation, there will be a time when the popcorn will stop popping and start burning, but there is always a chance, no
matter how small, that one more kernel will pop if you keep the microwave going. In an idealized normal distribution
of a continuous random variable, the distribution continues infinitely in both directions.
Because of this infinite spread, the range would not be a useful statistical measure of spread. The most common way
to measure the spread of a normal distribution is with the standard deviation, or the typical distance away from the
mean. Because of the symmetry of a normal distribution, the standard deviation indicates how far away from the
maximum peak the data will be. With a smaller standard deviation, the data appear heavily concentrated around the
mean. If a distribution has a larger standard deviation, the data are spread farther from the mean value.
Assessing Normality
The best way to determine if a data set approximates a normal distribution is to look at a visual representation.
Histograms and box plots can be useful indicators of normality, but they are not always definitive. It is often easier
to tell if a data set is not normal, as shown in these plots:
Example
The following data set tracked high school seniors' involvement in traffic accidents. The participants were asked the
following question: During the last 12 months, how many accidents have you had while you were driving (whether
TABLE 4.1:
Year
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
The histogram appears to show a roughly mound-shaped and symmetric distribution. The box plot does not appear
to be significantly skewed. We would conclude that the distribution is reasonably normal.
Lesson Summary
A normal distribution is a perfectly symmetric, mound-shaped distribution that appears in many practical and real
data sets. It is an especially important foundation for making conclusions, or inferences, about data.
Points to Consider
How can we use normal distributions to make meaningful conclusions about samples and experiments?
Review Questions
1. Which of the following data sets is most likely to be normally distributed? For the other choices, explain why
you believe they would not follow a normal distribution.
(a) The hand span (measured from the tip of the thumb to the tip of the extended 5th finger) of a random
sample of high school seniors
(b) The annual salaries of all employees of a large shipping company
(c) The annual salaries of a random sample of 50 CEOs of major companies, 25 women and 25 men
(d) The dates of 100 pennies taken from a cash drawer in a convenience store
If a vertical line is drawn from an inflection point to the x-axis, you would be marking the location of the score in
the distribution that is one standard deviation from the mean (the center) of the distribution. For all normal
distributions, approximately 68% of all the data is located within 1 standard deviation of the mean.
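The 68% figure is the area under a normal curve between one standard deviation below and one above the mean. Python's standard library (3.8 or later) provides `statistics.NormalDist`, whose `cdf` method gives the area to the left of a value. The minimal sketch below therefore takes a difference of two areas. It also prints the corresponding shares for 2 and 3 standard deviations, which come up again with the Empirical Rule.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

for k in (1, 2, 3):
    # Area between k standard deviations below and k above the mean.
    share = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} SD of the mean: {share:.1%}")
```

This prints about 68.3%, 95.4%, and 99.7%. Every normal distribution is a shifted and rescaled copy of this one, so the same shares hold for any mean and standard deviation.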
If we consider using the unit of standard deviation as a step along the x-axis, then 1 step to the right or 1 step to the
left is considered 1 standard deviation away from the mean. 2 steps to the left or 2 steps to the right are considered 2
standard deviations away from the mean. Likewise, 3 steps to the left or 3 steps to the right are considered 3 standard
deviations away from the mean.
The larger the size of that one step, the larger the standard deviation's numerical value. Hence, in a distribution
where all the scores are tightly clustered around the mean, the standard deviation is small. When the scores are more
spread out, the standard deviation is larger.
Once the value of the standard deviation has been calculated, you can determine the data values that are exactly one,
two, or three standard deviations from the mean. For example, if the value of the mean for this distribution was 58,
and the value of the standard deviation was 5, then you could identify the values that fall each step away: 53 and 63 are one standard deviation from the mean, 48 and 68 are two, and 43 and 73 are three.
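Listing the values that lie whole steps from the mean is simple arithmetic: add or subtract multiples of the standard deviation. The hypothetical helper below does this for the example's mean of 58 and standard deviation of 5.

```python
def steps_from_mean(mean, sd, max_steps=3):
    """Map k to the data value k standard deviations from the mean
    (negative k is below the mean, positive k is above it)."""
    return {k: mean + k * sd for k in range(-max_steps, max_steps + 1)}

for k, value in steps_from_mean(58, 5).items():
    print(k, value)
```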
Now that you understand how the data in a distribution spread out from the mean, you are ready to calculate the standard deviation of a data set. To keep the calculation steps organized, a table is used to record the results of each step. The table will consist of 3 columns. The first column will contain the data and will be labeled x. The second column will contain the differences between the data values and the mean of the data set and will be labeled (x − x̄). The final column will be labeled (x − x̄)², and it will contain the square of each of the values recorded in the second column. Note that we are using the symbols for sample statistics, rather than population parameters.
Example A
Calculate the standard deviation of the following numbers:
2, 7, 5, 6, 4, 2, 6, 3, 6, 9
Solution
Step 1: Write the list of data values in a column. It is not necessary to organize the data.
Step 2: Calculate the mean of the data values.
$$\bar{x} = \frac{2 + 7 + 5 + 6 + 4 + 2 + 6 + 3 + 6 + 9}{10} = \frac{50}{10} = 5.0$$
Step 3: Calculate the differences between the data values and the mean. Each of these values is called a deviation and is represented symbolically as (x − x̄). The farther a data point is from the mean, the larger its deviation in absolute value. Enter the deviation for each data value in the second column.
TABLE 4.2:
x | (x − x̄)
2 | -3
7 | 2
5 | 0
6 | 1
4 | -1
2 | -3
6 | 1
3 | -2
6 | 1
9 | 4
Step 4: Now square each deviation score and put that value in the third column.
Step 5: Calculate the mean of the third column and then take the square root of the answer. This value is the
standard deviation (s) of the data set.
TABLE 4.3:
(x − x̄)²
9
4
0
1
1
9
1
4
1
16
$$s^2 = \frac{9 + 4 + 0 + 1 + 1 + 9 + 1 + 4 + 1 + 16}{10} = \frac{46}{10} = 4.6$$
$$s = \sqrt{4.6} \approx 2.1$$
The standard deviation of the data set is approximately 2.1.
Step 5 can be written using the formula
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
Now that you have completed all the steps, here is the table that was used to record the results of each
calculation step along the way.
TABLE 4.4:
x | (x − x̄) | (x − x̄)²
2 | -3 | 9
7 | 2 | 4
5 | 0 | 0
6 | 1 | 1
4 | -1 | 1
2 | -3 | 9
6 | 1 | 1
3 | -2 | 4
6 | 1 | 1
9 | 4 | 16
Interpretation
Once you've calculated the standard deviation, how do you interpret it? The standard deviation can be thought of as roughly the typical distance between a data point and the mean. In other words, in this data set, a "typical" difference between a score and the mean is about 2.1. As we saw earlier, for normally distributed data about two-thirds (68%) of the values lie within one standard deviation of the mean, which here means between 2.9 and 7.1 (the mean plus or minus one standard deviation). In this data set, 7 of the 10 values fall in that range.
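The five steps can be mirrored in a few lines of Python. This is a sketch that, like the text, divides by n in Step 5; the variable names are our own, and `statistics.pstdev` from the standard library is used only as a cross-check.

```python
import math
import statistics

data = [2, 7, 5, 6, 4, 2, 6, 3, 6, 9]

mean = sum(data) / len(data)              # Step 2: the mean
deviations = [x - mean for x in data]     # Step 3: x - mean for each value
squared = [d ** 2 for d in deviations]    # Step 4: square each deviation
s = math.sqrt(sum(squared) / len(data))   # Step 5: divide by n, then take the square root

print(mean, sum(squared), round(s, 1))
# The standard library's population standard deviation gives the same value.
print(math.isclose(s, statistics.pstdev(data)))
```

Note that `statistics.stdev` (without the `p`) divides by n − 1 instead, so it would give a slightly larger value for this data set.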
Example B
A company wants to test its exterior house paint to determine how long it will retain its original color before fading.
The company mixes two brands of paint by adding different chemicals to each brand. Six one-gallon cans are made for each brand, and the results are recorded for every gallon of each brand of paint. The following are the results obtained in the laboratory:
TABLE 4.5:
Brand A (Time in months) | Brand B (Time in months)
15 | 40
65 | 50
55 | 35
35 | 40
45 | 45
25 | 30
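Solution
Brand A:
TABLE 4.6:
x | (x − x̄) | (x − x̄)²
15 | -25 | 625
65 | 25 | 625
55 | 15 | 225
35 | -5 | 25
45 | 5 | 25
25 | -15 | 225
$$\bar{x} = \frac{15 + 65 + 55 + 35 + 45 + 25}{6} = \frac{240}{6} = 40$$
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{625 + 625 + 225 + 25 + 25 + 225}{6}} = \sqrt{\frac{1750}{6}} \approx \sqrt{291.67} \approx 17.1$$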
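Brand B:
TABLE 4.7:
x | (x − x̄) | (x − x̄)²
40 | 0 | 0
50 | 10 | 100
35 | -5 | 25
40 | 0 | 0
45 | 5 | 25
30 | -10 | 100
$$\bar{x} = \frac{40 + 50 + 35 + 40 + 45 + 30}{6} = \frac{240}{6} = 40$$
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{0 + 100 + 25 + 0 + 25 + 100}{6}} = \sqrt{\frac{250}{6}} \approx \sqrt{41.67} \approx 6.5$$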
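As a quick check of Example B, Python's standard-library `statistics.pstdev` divides by n, as the example does, and should reproduce both standard deviations. The Brand B values are the ones listed in Table 4.7; the loop and variable names are our own.

```python
import statistics

brand_a = [15, 65, 55, 35, 45, 25]
brand_b = [40, 50, 35, 40, 45, 30]

for name, times in (("Brand A", brand_a), ("Brand B", brand_b)):
    mean = statistics.mean(times)
    sd = statistics.pstdev(times)  # population formula: divide by n
    print(f"{name}: mean = {mean}, s = {sd:.1f}")
```

Both brands have the same mean of 40 months, so the standard deviation is what distinguishes them: Brand B's fading times are much more consistent.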
Another measure of spread that is used to describe normally distributed data is variance. Variance is simply the square of the standard deviation (σ² or s²). Although the variance does not have as nice an interpretation as the standard deviation, it does have important mathematical properties that make it useful in some situations.
To calculate the variance (σ²) for a population of normally distributed data:
Step 1: Determine the mean of the data values.
Step 2: Subtract the mean of the data from each value in the data set to determine the difference between the data value and the mean: (x − μ).
Step 3: Square each of these differences and determine the total of these squared results.
Step 4: Divide this sum by the number of values in the data set.
These steps for calculating the variance of a data set for a population can be summarized in the following formula:
$$\sigma^2 = \frac{\sum (x - \mu)^2}{n}$$
where:
x is a data value.
μ is the population mean.
n is the number of data values (population size).
These steps for calculating the variance of a data set for a sample can be summarized in the following formula:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
where:
x is a data value.
x̄ is the sample mean.
n is the number of data values (sample size).
The only difference between the formulas is the number by which the sum is divided. For a population, it is divided by n, and for a sample, it is divided by n − 1.
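The two formulas differ only in the divisor, which a short sketch makes concrete. The function names are hypothetical; the standard library's `statistics.pvariance` and `statistics.variance` implement the same population and sample formulas and are used here only as a check.

```python
import math
import statistics


def population_variance(values):
    """sigma^2 = sum((x - mu)^2) / n"""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """s^2 = sum((x - x_bar)^2) / (n - 1)"""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


brand_a = [15, 65, 55, 35, 45, 25]
print(round(population_variance(brand_a), 2))  # 1750 / 6
print(sample_variance(brand_a))                 # 1750 / 5
print(math.isclose(population_variance(brand_a), statistics.pvariance(brand_a)))
print(math.isclose(sample_variance(brand_a), statistics.variance(brand_a)))
```

Dividing by the smaller number n − 1 always makes the sample variance a little larger than the population variance computed from the same values.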
Example C
Calculate the variance of the two brands of paint in Example B. These are both small populations.
TABLE 4.8:
Brand A (Time in months) | Brand B (Time in months)
15 | 40
65 | 50
55 | 35
35 | 40
45 | 45
25 | 30
Solution
Brand A:
TABLE 4.9:
x | (x − μ) | (x − μ)²
15 | -25 | 625
65 | 25 | 625
55 | 15 | 225
35 | -5 | 25
45 | 5 | 25
25 | -15 | 225
$$\mu = \frac{15 + 65 + 55 + 35 + 45 + 25}{6} = \frac{240}{6} = 40$$
$$\sigma^2 = \frac{\sum (x - \mu)^2}{n} = \frac{625 + 625 + 225 + 25 + 25 + 225}{6} = \frac{1750}{6} \approx 291.67$$
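Brand B:
TABLE 4.10:
x | (x − μ) | (x − μ)²
40 | 0 | 0
50 | 10 | 100
35 | -5 | 25
40 | 0 | 0
45 | 5 | 25
30 | -10 | 100
$$\mu = \frac{40 + 50 + 35 + 40 + 45 + 30}{6} = \frac{240}{6} = 40$$
$$\sigma^2 = \frac{\sum (x - \mu)^2}{n} = \frac{0 + 100 + 25 + 0 + 25 + 100}{6} = \frac{250}{6} \approx 41.67$$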
From the calculations done in Example B and in Example C, you should have noticed that the square root of the
variance is the standard deviation, and the square of the standard deviation is the variance. Taking the square root of
the variance will put the standard deviation in the same units as the given data. The variance is simply the average of
the squares of the distance of each data value from the mean. If these data values are close to the value of the mean,
the variance will be small. This was the case for Brand B. If these data values are far from the mean, the variance
will be large, as was the case for Brand A.
The variance and the standard deviation of a data set are never negative; they equal zero only when every data value is the same.
Example D
The following data represent the morning temperatures (°C) and the monthly rainfall (mm) in July for all the Canadian cities east of Toronto:
Temperature (°C) | Precipitation (mm)
11.7 | 18.6
13.7 | 37.1
10.5 | 70.9
14.2 | 102.0
13.9 | 59.9
14.2 | 58.0
10.4 | 73.0
16.1 | 77.6
16.4 | 89.1
4.8 | 86.6
15.2 | 40.3
13.0 | 119.5
14.4 | 36.2
12.7 | 85.5
8.6 | 59.2
12.9 | 97.8
11.5 | 122.2
14.6 | 82.6
Which data set is more variable? Calculate the standard deviation for each data set.
Solution
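Temperature (deviations computed from the rounded mean, 12.7):
x | (x − x̄) | (x − x̄)²
11.7 | -1.0 | 1.00
13.7 | 1.0 | 1.00
10.5 | -2.2 | 4.84
14.2 | 1.5 | 2.25
13.9 | 1.2 | 1.44
14.2 | 1.5 | 2.25
10.4 | -2.3 | 5.29
16.1 | 3.4 | 11.56
16.4 | 3.7 | 13.69
4.8 | -7.9 | 62.41
15.2 | 2.5 | 6.25
13.0 | 0.3 | 0.09
14.4 | 1.7 | 2.89
12.7 | 0.0 | 0.00
8.6 | -4.1 | 16.81
12.9 | 0.2 | 0.04
11.5 | -1.2 | 1.44
14.6 | 1.9 | 3.61
$$\bar{x} = \frac{\sum x}{n} = \frac{228.8}{18} \approx 12.7$$
$$s^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{136.86}{18} \approx 7.6$$
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{136.86}{18}} \approx 2.8$$
The variance of the data set is approximately 7.6 °C², and the standard deviation of the data set is approximately 2.8 °C.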
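Precipitation (deviations computed from the rounded mean, 73.1):
x | (x − x̄) | (x − x̄)²
18.6 | -54.5 | 2970.25
37.1 | -36.0 | 1296.00
70.9 | -2.2 | 4.84
102.0 | 28.9 | 835.21
59.9 | -13.2 | 174.24
58.0 | -15.1 | 228.01
73.0 | -0.1 | 0.01
77.6 | 4.5 | 20.25
89.1 | 16.0 | 256.00
86.6 | 13.5 | 182.25
40.3 | -32.8 | 1075.84
119.5 | 46.4 | 2152.96
36.2 | -36.9 | 1361.61
85.5 | 12.4 | 153.76
59.2 | -13.9 | 193.21
97.8 | 24.7 | 610.09
122.2 | 49.1 | 2410.81
82.6 | 9.5 | 90.25
$$\bar{x} = \frac{\sum x}{n} = \frac{1316.1}{18} \approx 73.1$$
$$s^2 = \frac{\sum (x - \bar{x})^2}{n} \approx \frac{14016}{18} \approx 778.67$$
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \approx \sqrt{\frac{14016}{18}} \approx 27.9$$
The variance of the data set is approximately 778.67 mm², and the standard deviation of the data set is approximately 27.9 mm.
Therefore, the data values for the precipitation are more variable. This is indicated by the much larger variance and standard deviation of the data set.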
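A sketch that reproduces Example D with Python's standard `statistics` module (`pvariance` and `pstdev` divide by n, as the example does). Because the code uses the unrounded means rather than 12.7 and 73.1, the variances may differ slightly in the last digits from the hand calculations; the rounded standard deviations, 2.8 °C and 27.9 mm, should match. The list names are our own.

```python
import statistics

temperature_c = [11.7, 13.7, 10.5, 14.2, 13.9, 14.2, 10.4, 16.1, 16.4,
                 4.8, 15.2, 13.0, 14.4, 12.7, 8.6, 12.9, 11.5, 14.6]
precipitation_mm = [18.6, 37.1, 70.9, 102.0, 59.9, 58.0, 73.0, 77.6, 89.1,
                    86.6, 40.3, 119.5, 36.2, 85.5, 59.2, 97.8, 122.2, 82.6]

for name, values, unit in (("Temperature", temperature_c, "°C"),
                           ("Precipitation", precipitation_mm, "mm")):
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values)  # divides by n, like Example D
    sd = statistics.pstdev(values)
    print(f"{name}: mean {mean:.1f} {unit}, "
          f"variance {variance:.2f} {unit}^2, sd {sd:.1f} {unit}")
```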
Lesson Summary
In this lesson, you learned that the standard deviation of a set of data is a value that represents a measure of the
spread of the data from the mean. You also learned that the variance of the data from the mean is the square of the
standard deviation. You also learned how to calculate the standard deviation manually.
Points to Consider
Does the value of standard deviation stand alone, or can it be displayed with a normal distribution?
Are there defined increments for how data spreads away from the mean?
Can the standard deviation of a set of data be applied to real-world problems?
Review Questions
1. Without using technology, calculate the variance and the standard deviation of each of the following sets of numbers.
(a) 175 cm, 179 cm, 179 cm, 181 cm, 183 cm, 183 cm, 184 cm, 184 cm, 185 cm, 187 cm
2. Without using technology, calculate the standard deviation of this set of data.
3. A group of grade 10 students at one high school were asked to record the number of hours they watched television per week. The results are recorded in the table shown below:
TABLE 4.13:
2.5 | 8
3 | 9
4.5 | 9.5
4.5 | 10
5 | 10.5
5 | 11
5.5 | 13
6 | 16
6 | 26
7 | 28
Learning Objectives
Apply the Empirical Rule to questions about normal distributions.
Use the percentages associated with normal distribution to solve problems.
The following graph shows a normal distribution. Notice that vertical lines are drawn at points that are exactly one
standard deviation to the left and right of the mean. We have described standard deviation as a measure of the typical
distance away from the mean. But how much of the data is actually within one standard deviation of the mean? To
answer this question, think about the space, or area, under the curve. The entire data set, or 100% of it, is contained
under the whole curve. What percentage would you estimate is between the two lines?
For this normal distribution, 68% of the data values are located between 53 and 63 (within 1 standard deviation of
the mean); 95% of the data values are located between 48 and 68 (within 2 standard deviations of the mean); and
finally, 99.7% of the data values are located between 43 and 73 (within 3 standard deviations of the mean).
Percentages Under the Normal Curve
The empirical rule can be used to answer real-world problems when both the mean and the standard deviation of a normally distributed data set are known.
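The 68%, 95%, and 99.7% figures are rounded. As a sketch of where they come from: the share of a normal distribution that lies within k standard deviations of the mean equals erf(k/√2), which Python's standard `math.erf` can evaluate. The helper name `within_k_sd` is our own.

```python
import math


def within_k_sd(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {100 * within_k_sd(k):.2f}%")
```

The printed values (about 68.27%, 95.45%, and 99.73%) round to the familiar 68-95-99.7 percentages.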
Example A
The lifetimes of a certain type of calculator battery are normally distributed. The mean life is 400 hours, and the
standard deviation is 50 hours. For a group of 5000 batteries, how many are expected to last...
a. between 350 hours and 450 hours?
b. more than 300 hours?
c. less than 300 hours?
Solutions
a. 68% of the batteries are expected to last between 350 hours and 450 hours (within one standard deviation of the mean). This means that 5000 × 0.68 = 3400 batteries are expected to last between 350 and 450 hours.
b. 300 hours is two standard deviations below the mean. Half of the batteries last longer than the mean, and another 47.5% (half of 95%) last between 300 hours and the mean, so 50% + 47.5% = 97.5% of the batteries are expected to last more than 300 hours. This means that 5000 × 0.975 = 4875 of the batteries will last longer than 300 hours.
c. Only 100% − 97.5% = 2.5% of the batteries (2.35% + 0.15% in the lower tail) are expected to last less than 300 hours. This means that 5000 × 0.025 = 125 of the batteries will last less than 300 hours.
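The arithmetic can be checked with a short sketch that uses only the 68% and 95% figures and the symmetry of the normal curve; the constant names are our own. Because the 5% outside two standard deviations splits evenly between the two tails, the counts for "more than 300 hours" and "less than 300 hours" add back up to all 5000 batteries.

```python
N_BATTERIES = 5000
WITHIN_1_SD = 0.68  # share within mu +/- 1 sigma (350 to 450 hours)
WITHIN_2_SD = 0.95  # share within mu +/- 2 sigma (300 to 500 hours)

between_350_and_450 = N_BATTERIES * WITHIN_1_SD
# By symmetry, the 5% outside mu +/- 2 sigma splits evenly between the two tails.
below_300 = N_BATTERIES * (1 - WITHIN_2_SD) / 2
above_300 = N_BATTERIES - below_300

print(round(between_350_and_450), round(above_300), round(below_300))  # 3400 4875 125
```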
Example B
A bag of chips has a mean mass of 70 g with a standard deviation of 3 g. Assuming a normal distribution, create a normal curve, including all necessary values.
a. If 1250 bags are processed each day, how many bags will have a mass between 67g and 73g?
b. What percentage of chips will have a mass greater than 64g?
Solutions
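a. 68% of the data lies between 67 g and 73 g (within one standard deviation of the mean). If 1250 bags of chips are processed, 1250 × 0.68 = 850 bags will have a mass between 67 and 73 grams.
b. 64 g is two standard deviations below the mean, so 50% + 47.5% = 97.5% of the bags of chips will have a mass greater than 64 grams.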
Now you can represent the data that your teacher gave to you for your recent Math test on a normal distribution
curve. The mean mark was 61 and the standard deviation was 15.6.
From the normal distribution curve, you can say that your mark of 71 is above the mean and within one standard deviation of it (61 ± 15.6, or 45.4 to 76.6). You can also say that your mark is within the middle 68% of the data. You did very well on your test.
Lesson Summary
You have learned the significance of standard deviation. You are now able to represent data on the bell-curve and to
interpret a given normal distribution curve. In addition, you can calculate the standard deviation of a given data set
both manually and by using technology. You can now apply all of this knowledge to answer real-world problems.
Points to Consider
Is the normal distribution curve the only way to represent data?
The normal distribution curve shows the spread of the data but does not show the actual data values. Do other
representations of data show the actual data values?
Vocabulary
Normal Distribution: A symmetric bell-shaped curve with tails that extend infinitely in both directions from
the mean of a data set.
68-95-99.7 Rule: The percentages of a normal distribution that lie within one, two, and three standard deviations of the mean; also known as the Empirical Rule.
Guided Practice
1. If the depth of the snow in my yard is normally distributed, with μ = 2.5″ and σ = 0.25″, what is the probability that a randomly chosen location will have a snow depth between 2.25 and 2.75 inches?
2. If the height of women in the U.S. is normally distributed with μ = 5′8″ and σ = 1.5″, what is the probability that a randomly chosen woman in the U.S. is shorter than 5′5″?
Solutions
1. 2.25 inches is μ − 1σ, and 2.75 inches is μ + 1σ, so the area encompassed represents approximately 34% + 34% = 68%.
The probability that a randomly chosen location will have a depth between 2.25 and 2.75 inches is about 68%.
2. This one is slightly different, since we aren't looking for the probability of a limited range of values. We want to evaluate the probability of a value occurring anywhere below 5′5″. Since the domain of a normal distribution is infinite, we can't directly state the probability of the portion of the distribution on that end, because it has no end! What we need to do is add up the probabilities that we do know and subtract them from 100% to get the remainder.
Here is that normal distribution graphic again, with the height data inserted:
Recall that a normal distribution always has 50% of the data on each side of the mean. That indicates that 50% of U.S. females are taller than 5′8″, and gives us a solid starting point to calculate from. There is another 34% between 5′6.5″ and 5′8″ and a final 13.5% between 5′5″ and 5′6.5″. Altogether that totals:
50% + 34% + 13.5% = 97.5%.
Since 97.5% of U.S. females are 5′5″ or taller, that leaves 2.5% who are shorter than 5′5″.
Review Questions
1. Ninety-five percent of all cultivated strawberry plants grow to a mean height of 11.4 cm with a standard
deviation of 0.25 cm.
a. If the growth of the strawberry plant is a normal distribution, draw a normal curve showing all the values.
b. If 225 plants in the greenhouse have a height between 11.15 cm and 11.65 cm, how many plants were in
the greenhouse?
c. How many plants in the greenhouse would we expect to be shorter than 10.9 cm?
2. A survey was conducted at a local high school to determine the number of hours that a student studied for the
final Math 10 exam. To achieve a normal distribution, 325 students were surveyed. The results showed that
the mean number of hours spent studying was 4.6 hours with a standard deviation of 1.2 hours.
a.
b.
c.
d.
1
2
hour. Is Harry a
3. The average life expectancy for a dog is 10 years 2 months with a standard deviation of 9 months.
a. If a dog's life expectancy is a normal distribution, draw a normal curve showing all values.
b. What would be the lifespan of almost all dogs? (99.7%)
c. In a sample of 825 dogs, how many dogs would have life expectancy between 9 years 5 months and 10
years 11 months?
d. How many dogs, from the sample, would we expect to live beyond 10 years 11 months?
4. Ninety-five percent of all Marigold flowers have a height between 10.9 cm and 119.0 cm and their height is
normally distributed.
a.
b.
c.
d.
e.
5. Use the 68-95-99.7 rule on a normal distribution of data with a mean of 185 and a standard deviation of 10, to
answer the following questions. What percentage of the data would measure
a.
b.
c.
d.
e.
4.4 Z-Scores
Learning Objective
z-Scores
A z-score is a measure of the number of standard deviations a particular data point is away from the mean. For example, let's say the mean score on a test for your statistics class was an 82, with a standard deviation of 7 points. If your score was an 89, it is exactly one standard deviation to the right of the mean; therefore, your z-score would be 1. If, on the other hand, you scored a 75, your score would be exactly one standard deviation below the mean, and your z-score would be −1. All values that are below the mean have negative z-scores, while all values that are above the mean have positive z-scores. A z-score of −2 would represent a value that is exactly 2 standard deviations below the mean, so in this case, the value would be 82 − 14 = 68.
To calculate a z-score for which the numbers are not so obvious, you take the deviation and divide it by the standard deviation.
$$z = \frac{\text{Deviation}}{\text{Standard Deviation}}$$
You may recall that deviation is the mean value of the variable subtracted from the observed value, so in symbolic terms, the z-score would be:
$$z = \frac{x - \mu}{\sigma}$$
As previously stated, since σ is always positive, z will be positive when x is greater than μ and negative when x is less than μ. A z-score of zero means that the term has the same value as the mean. The value of z represents the number of standard deviations the given value of x is above or below the mean.
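The formula translates directly into code. A minimal sketch, assuming a positive σ (the helper name `z_score` is hypothetical), checked against the test example above with a mean of 82 and a standard deviation of 7:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean mu."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma


print(z_score(89, 82, 7))  # 1.0
print(z_score(75, 82, 7))  # -1.0
print(z_score(68, 82, 7))  # -2.0
```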
Example A
What is the z-score for an A on the test described above, which has a mean score of 82? (Assume that an A is a 93.)
The z-score can be calculated as follows:
$$z = \frac{x - \mu}{\sigma} = \frac{93 - 82}{7} = \frac{11}{7} \approx 1.57$$
If we know that the test scores from the last example are distributed normally, then a z-score can tell us something
about how our test score relates to the rest of the class. From the Empirical Rule, we know that about 68% of the
students would have scored between z-scores of −1 and 1, or between a 75 and an 89, on the test. If 68% of the
data is between these two values, then that leaves the remaining 32% in the tail areas. Because of symmetry, half of
this, or 16%, would be in each individual tail.
Example B
On a nationwide math test, the mean was 65 and the standard deviation was 10. If Robert scored 81, what was his
z-score?
$$z = \frac{x - \mu}{\sigma} = \frac{81 - 65}{10} = \frac{16}{10} = 1.6$$
Example C
On a college entrance exam, the mean was 70, and the standard deviation was 8. If Helen's z-score was −1.5, what was her exam score?
Solving the z-score formula for x:
$$z = \frac{x - \mu}{\sigma} \quad\Longrightarrow\quad x = \mu + z\sigma$$
$$x = 70 + (-1.5)(8) = 58$$
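Rearranged as x = μ + zσ, the same formula recovers a raw score from a z-score. A one-line sketch, with a hypothetical helper name:

```python
def value_from_z(z, mu, sigma):
    """Raw score that lies z standard deviations from the mean mu."""
    return mu + z * sigma


print(value_from_z(-1.5, 70, 8))  # Helen's exam score: 58.0
```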
z-Scores and Probability
Knowing the z-score of a given value is great, but what can you do with it? How does a z-score relate to probability?
For example, how likely (or unlikely) is an occurrence of a z-score of 2.47 or greater?
Remember that the area under a normal curve follows the empirical rule. Here is that graphic again showing what
proportion of the scores in a distribution fall within one, two or three standard deviations of the mean:
It is easy enough to see from the curve above that about 84% of all scores will fall below a z-score of 1. What do
we do when we want to know the proportion of scores that are less than a z-score of 1.2? The area can be calculated
using calculus, but we can also use a z-table to look up the area, as shown below.
TABLE 4.14:
z | 0.00 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.06 | 0.07 | 0.08 | 0.09
-3.0 | 0.0013 | 0.0013 | 0.0013 | 0.0012 | 0.0012 | 0.0011 | 0.0011 | 0.0011 | 0.0010 | 0.0010
-2.9 | 0.0019 | 0.0018 | 0.0017 | 0.0017 | 0.0016 | 0.0016 | 0.0015 | 0.0015 | 0.0014 | 0.0014
-2.8 | 0.0026 | 0.0025 | 0.0024 | 0.0023 | 0.0023 | 0.0022 | 0.0021 | 0.0021 | 0.0020 | 0.0019
-2.7 | 0.0035 | 0.0034 | 0.0033 | 0.0032 | 0.0031 | 0.0030 | 0.0029 | 0.0028 | 0.0027 | 0.0026
-2.6 | 0.0047 | 0.0045 | 0.0044 | 0.0043 | 0.0041 | 0.0040 | 0.0039 | 0.0038 | 0.0037 | 0.0036
-2.5 | 0.0062 | 0.0060 | 0.0059 | 0.0057 | 0.0055 | 0.0054 | 0.0052 | 0.0051 | 0.0049 | 0.0048
-2.4 | 0.0082 | 0.0080 | 0.0078 | 0.0075 | 0.0073 | 0.0071 | 0.0070 | 0.0068 | 0.0066 | 0.0064
-2.3 | 0.0107 | 0.0104 | 0.0102 | 0.0099 | 0.0096 | 0.0094 | 0.0091 | 0.0089 | 0.0087 | 0.0084
-2.2 | 0.0139 | 0.0136 | 0.0132 | 0.0129 | 0.0126 | 0.0122 | 0.0119 | 0.0116 | 0.0113 | 0.0110
-2.1 | 0.0179 | 0.0174 | 0.0170 | 0.0166 | 0.0162 | 0.0158 | 0.0154 | 0.0150 | 0.0146 | 0.0143
-2.0 | 0.0228 | 0.0222 | 0.0217 | 0.0212 | 0.0207 | 0.0202 | 0.0197 | 0.0192 | 0.0188 | 0.0183
-1.9 | 0.0287 | 0.0281 | 0.0274 | 0.0268 | 0.0262 | 0.0256 | 0.0250 | 0.0244 | 0.0238 | 0.0233
-1.8 | 0.0359 | 0.0352 | 0.0344 | 0.0336 | 0.0329 | 0.0322 | 0.0314 | 0.0307 | 0.0301 | 0.0294
-1.7 | 0.0446 | 0.0436 | 0.0427 | 0.0418 | 0.0409 | 0.0401 | 0.0392 | 0.0384 | 0.0375 | 0.0367
-1.6 | 0.0548 | 0.0537 | 0.0526 | 0.0516 | 0.0505 | 0.0495 | 0.0485 | 0.0475 | 0.0465 | 0.0455
-1.5 | 0.0668 | 0.0655 | 0.0643 | 0.0630 | 0.0618 | 0.0606 | 0.0594 | 0.0582 | 0.0571 | 0.0559
-1.4 | 0.0808 | 0.0793 | 0.0778 | 0.0764 | 0.0749 | 0.0735 | 0.0721 | 0.0708 | 0.0694 | 0.0681
-1.3 | 0.0968 | 0.0951 | 0.0934 | 0.0918 | 0.0901 | 0.0885 | 0.0869 | 0.0853 | 0.0838 | 0.0823
-1.2 | 0.1151 | 0.1131 | 0.1112 | 0.1094 | 0.1075 | 0.1057 | 0.1038 | 0.1020 | 0.1003 | 0.0985
-1.1 | 0.1357 | 0.1335 | 0.1314 | 0.1292 | 0.1271 | 0.1251 | 0.1230 | 0.1210 | 0.1190 | 0.1170
-1.0 | 0.1587 | 0.1563 | 0.1539 | 0.1515 | 0.1492 | 0.1469 | 0.1446 | 0.1423 | 0.1401 | 0.1379
-0.9 | 0.1841 | 0.1814 | 0.1788 | 0.1762 | 0.1736 | 0.1711 | 0.1685 | 0.1660 | 0.1635 | 0.1611
-0.8 | 0.2119 | 0.2090 | 0.2061 | 0.2033 | 0.2005 | 0.1977 | 0.1949 | 0.1922 | 0.1894 | 0.1867
-0.7 | 0.2420 | 0.2389 | 0.2358 | 0.2327 | 0.2297 | 0.2266 | 0.2236 | 0.2207 | 0.2177 | 0.2148
-0.6 | 0.2743 | 0.2709 | 0.2676 | 0.2644 | 0.2611 | 0.2579 | 0.2546 | 0.2514 | 0.2483 | 0.2451
-0.5 | 0.3085 | 0.3050 | 0.3015 | 0.2981 | 0.2946 | 0.2912 | 0.2877 | 0.2843 | 0.2810 | 0.2776
-0.4 | 0.3446 | 0.3409 | 0.3372 | 0.3336 | 0.3300 | 0.3264 | 0.3228 | 0.3192 | 0.3156 | 0.3121
-0.3 | 0.3821 | 0.3783 | 0.3745 | 0.3707 | 0.3669 | 0.3632 | 0.3594 | 0.3557 | 0.3520 | 0.3483
-0.2 | 0.4207 | 0.4168 | 0.4129 | 0.4091 | 0.4052 | 0.4013 | 0.3974 | 0.3936 | 0.3897 | 0.3859
-0.1 | 0.4602 | 0.4562 | 0.4522 | 0.4483 | 0.4443 | 0.4403 | 0.4364 | 0.4325 | 0.4286 | 0.4247
-0.0 | 0.5000 | 0.4960 | 0.4920 | 0.4880 | 0.4841 | 0.4801 | 0.4761 | 0.4721 | 0.4681 | 0.4641
0.0 | 0.5000 | 0.5040 | 0.5080 | 0.5120 | 0.5160 | 0.5199 | 0.5239 | 0.5279 | 0.5319 | 0.5359
0.1 | 0.5398 | 0.5438 | 0.5478 | 0.5517 | 0.5557 | 0.5597 | 0.5636 | 0.5675 | 0.5714 | 0.5754
0.2 | 0.5793 | 0.5832 | 0.5871 | 0.5910 | 0.5948 | 0.5987 | 0.6026 | 0.6064 | 0.6103 | 0.6141
0.3 | 0.6179 | 0.6217 | 0.6255 | 0.6293 | 0.6331 | 0.6368 | 0.6406 | 0.6443 | 0.6480 | 0.6517
0.4 | 0.6554 | 0.6591 | 0.6628 | 0.6664 | 0.6700 | 0.6736 | 0.6772 | 0.6808 | 0.6844 | 0.6879
0.5 | 0.6915 | 0.6950 | 0.6985 | 0.7019 | 0.7054 | 0.7088 | 0.7123 | 0.7157 | 0.7190 | 0.7224
0.6 | 0.7258 | 0.7291 | 0.7324 | 0.7357 | 0.7389 | 0.7422 | 0.7454 | 0.7486 | 0.7518 | 0.7549
0.7 | 0.7580 | 0.7612 | 0.7642 | 0.7673 | 0.7704 | 0.7734 | 0.7764 | 0.7794 | 0.7823 | 0.7852
0.8 | 0.7881 | 0.7910 | 0.7939 | 0.7967 | 0.7996 | 0.8023 | 0.8051 | 0.8079 | 0.8106 | 0.8133
0.9 | 0.8159 | 0.8186 | 0.8212 | 0.8238 | 0.8264 | 0.8289 | 0.8315 | 0.8340 | 0.8365 | 0.8389
1.0 | 0.8413 | 0.8438 | 0.8461 | 0.8485 | 0.8508 | 0.8531 | 0.8554 | 0.8577 | 0.8599 | 0.8621
1.1 | 0.8643 | 0.8665 | 0.8686 | 0.8708 | 0.8729 | 0.8749 | 0.8770 | 0.8790 | 0.8810 | 0.8830
1.2 | 0.8849 | 0.8869 | 0.8888 | 0.8907 | 0.8925 | 0.8944 | 0.8962 | 0.8980 | 0.8997 | 0.9015
1.3 | 0.9032 | 0.9049 | 0.9066 | 0.9082 | 0.9099 | 0.9115 | 0.9131 | 0.9147 | 0.9162 | 0.9177
1.4 | 0.9192 | 0.9207 | 0.9222 | 0.9236 | 0.9251 | 0.9265 | 0.9279 | 0.9292 | 0.9306 | 0.9319
1.5 | 0.9332 | 0.9345 | 0.9357 | 0.9370 | 0.9382 | 0.9394 | 0.9406 | 0.9418 | 0.9430 | 0.9441
1.6 | 0.9452 | 0.9463 | 0.9474 | 0.9485 | 0.9495 | 0.9505 | 0.9515 | 0.9525 | 0.9535 | 0.9545
1.7 | 0.9554 | 0.9564 | 0.9573 | 0.9582 | 0.9591 | 0.9599 | 0.9608 | 0.9616 | 0.9625 | 0.9633
1.8 | 0.9641 | 0.9649 | 0.9656 | 0.9664 | 0.9671 | 0.9678 | 0.9686 | 0.9693 | 0.9700 | 0.9706
1.9 | 0.9713 | 0.9719 | 0.9726 | 0.9732 | 0.9738 | 0.9744 | 0.9750 | 0.9756 | 0.9762 | 0.9767
2.0 | 0.9773 | 0.9778 | 0.9783 | 0.9788 | 0.9793 | 0.9798 | 0.9803 | 0.9808 | 0.9812 | 0.9817
2.1 | 0.9821 | 0.9826 | 0.9830 | 0.9834 | 0.9838 | 0.9842 | 0.9846 | 0.9850 | 0.9854 | 0.9857
2.2 | 0.9861 | 0.9865 | 0.9868 | 0.9871 | 0.9875 | 0.9878 | 0.9881 | 0.9884 | 0.9887 | 0.9890
2.3 | 0.9893 | 0.9896 | 0.9898 | 0.9901 | 0.9904 | 0.9906 | 0.9909 | 0.9911 | 0.9913 | 0.9916
2.4 | 0.9918 | 0.9920 | 0.9922 | 0.9925 | 0.9927 | 0.9929 | 0.9931 | 0.9932 | 0.9934 | 0.9936
2.5 | 0.9938 | 0.9940 | 0.9941 | 0.9943 | 0.9945 | 0.9946 | 0.9948 | 0.9949 | 0.9951 | 0.9952
2.6 | 0.9953 | 0.9955 | 0.9956 | 0.9957 | 0.9959 | 0.9960 | 0.9961 | 0.9962 | 0.9963 | 0.9964
2.7 | 0.9965 | 0.9966 | 0.9967 | 0.9968 | 0.9969 | 0.9970 | 0.9971 | 0.9972 | 0.9973 | 0.9974
2.8 | 0.9974 | 0.9975 | 0.9976 | 0.9977 | 0.9977 | 0.9978 | 0.9979 | 0.9980 | 0.9980 | 0.9981
2.9 | 0.9981 | 0.9982 | 0.9983 | 0.9983 | 0.9984 | 0.9984 | 0.9985 | 0.9985 | 0.9986 | 0.9986
3.0 | 0.9987 | 0.9987 | 0.9987 | 0.9988 | 0.9988 | 0.9989 | 0.9989 | 0.9989 | 0.9990 | 0.9990
The z-score table above provides the area under the standard normal distribution that falls to the left of each particular
z value. That is the value shaded in the diagram below. The area can be interpreted as the probability that a score in
the distribution is less than the score that corresponds to z.
FIGURE 4.1
For example, a z-score of zero (remember that this is the z-score that corresponds to the mean) has a probability of 0.5
because half of the scores in the normal distribution are lower than the mean. Although the table only provides the
area to the left of each z value, remember that the area under the entire standard normal distribution is equal to one.
So to find the probability of getting a value greater than z, look up the probability for z in the table and subtract it
from one.
Example A
What is the probability that a value with a z-score less than 2.47 will occur in a normal distribution?
Solution
Scroll up to the table above and find 2.4 on the left or right side. Now move across the table to 0.07 on the top
or bottom, and record the value in the cell: 0.9932. That tells us that 99.32% of values in the set are at or below a
z-score of 2.47.
Example B
What is the probability that a value with a z-score greater than 1.53 will occur in a normal distribution?
Solution
Scroll up to the table of z-score probabilities again and find the intersection between 1.5 on the left and 0.03 on the top, and record the value in the cell: 0.9370.
That decimal lets us know that 93.7% of values in the set are below the z-score of 1.53. To find the percentage that is above that value, we subtract 0.937 from 1.0 (or 93.7% from 100%) to get 0.063, or 6.3%.
Example C
What is the probability of a random selection being less than 3.65, given a normal distribution with μ = 5 and σ = 2.2?
Solution
This question requires us to first find the z-score for the value 3.65, then calculate the percentage of values below that z-score from a reference.
1. Find the z-score for 3.65, using the z-score formula:
$$z = \frac{x - \mu}{\sigma} = \frac{3.65 - 5}{2.2} = \frac{-1.35}{2.2} \approx -0.61$$
2. Now we can scroll up to our z-score reference above and find the intersection of −0.6 and 0.01, which should be 0.2709.
There is approximately a 27.09% probability that a value less than 3.65 would occur from a random selection of a
normal distribution with mean 5 and standard deviation 2.2.
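Instead of a printed table, the area to the left of z can be computed from the error function, Φ(z) = ½(1 + erf(z/√2)). The sketch below uses only `math.erf` from Python's standard library; the name `normal_cdf` is our own. Its results agree with the table to about four decimal places. For Example C it comes out slightly below 0.2709 (roughly 0.27), because the code does not round z to −0.61 first.

```python
import math


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


print(round(normal_cdf(2.47), 4))      # Example A: P(z < 2.47)
print(round(1 - normal_cdf(1.53), 4))  # Example B: P(z > 1.53)
z = (3.65 - 5) / 2.2                   # Example C, z left unrounded
print(round(normal_cdf(z), 4))
```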
Finding the Probability Between Two z-Scores
How would you calculate the proportion of scores that fall between z-scores of -0.08 and +1.92? We can do this by
looking up each probability separately and subtracting.
Example A
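What is the probability that a value falls between 8.45 and 10.25 in a normal distribution with mean 10 and standard deviation 2?
Solution
FIGURE 4.2
1. Find the z-score for 8.45:
$$z = \frac{x - \mu}{\sigma} = \frac{8.45 - 10}{2} = \frac{-1.55}{2} \approx -0.78$$
2. Find the z-score for 10.25:
$$z = \frac{x - \mu}{\sigma} = \frac{10.25 - 10}{2} = \frac{0.25}{2} \approx 0.13$$
3. Now find the percentages for each, using a reference (don't forget that the table gives the probability of values less than each score, so we subtract to find the values between). The area to the left of z = −0.78 is 0.2177, and the area to the left of z = 0.13 is 0.5517:
$$0.5517 - 0.2177 = 0.3340$$
There is approximately a 33.4% probability that a value between 8.45 and 10.25 would result from a random selection of a normal distribution with mean 10 and standard deviation 2.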
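Python's standard library also provides `statistics.NormalDist` (Python 3.8 and later), whose `cdf` method returns the area to the left of a value, so the subtraction in step 3 can be sketched as below. With unrounded z-scores the result is about 0.33; rounding z to two decimals first, as a table lookup requires, reproduces the 33.4% figure.

```python
from statistics import NormalDist

dist = NormalDist(mu=10, sigma=2)
# Area to the left of 10.25 minus the area to the left of 8.45.
p_exact = dist.cdf(10.25) - dist.cdf(8.45)

# The same subtraction with z rounded to two decimals, as a table lookup does.
std = NormalDist()  # standard normal: mu = 0, sigma = 1
p_table = std.cdf(0.13) - std.cdf(-0.78)

print(round(p_exact, 3), round(p_table, 3))
```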
Vocabulary
A z-score is a measure of how many standard deviations there are between a data value and the mean.
A z-score probability table is a table that associates z-scores with areas under the normal curve. The table may be used to associate a z-score with a percent probability.
Guided Practice with z-Score Calculations
1. What is the z-score of the price of a pair of skis that cost $247, if the mean ski price is $279, with a standard
deviation of $16?
2. What is the z-score of a 5-scoop ice cream cone if the mean number of scoops is 3, with a standard deviation
of 1 scoop?
3. What is the z-score of the weight of a cow that tips the scales at 825 lbs, if the mean weight for cows of her
type is 1150 lbs, with a standard deviation of 77 lbs?
4. What is the z-score of a measured value of 0.0034, given μ = 0.0041 and σ = 0.0008?
Solutions
1. First find the difference between the measured value and the mean, then divide that difference by the standard
deviation:
$$z = \frac{247 - 279}{16} = \frac{-32}{16} = -2$$
2. This one is easy: The difference between 5 scoops and 3 scoops is +2, and we divide that by the standard
deviation of 1, so the z-score is +2.
3. First find the difference between the measured value and the mean, then divide that difference by the standard
deviation:
$$z = \frac{825 - 1150}{77} = \frac{-325}{77} \approx -4.22$$
4. First find the difference between the measured value and the mean, then divide that difference by the standard
deviation:
$$z = \frac{0.0034 - 0.0041}{0.0008} = \frac{-0.0007}{0.0008} = -0.875$$
1. What is the probability of occurrence of a value with z-score greater than 1.24?
2. What is the probability of z < −0.23?
3. What is P(Z < 2.13)?
Solutions
1. The table gives the area to the left of z = 1.24, which is 0.8925 (row 1.2, column 0.04). The probability of a value greater than 1.24 is therefore 1 − 0.8925 = 0.1075, or 10.75%.
2. This is a negative z-score; from the table (row −0.2, column 0.03), P(z < −0.23) = 0.4091, or about 40.9%.
3. This is a positive z-score, and we need the percentage of values below it, so we can use the value associated with z = +2.13 directly from the table: 0.9834, or 98.34%.
1. Given a distribution with a mean of 70 and standard deviation of 62, find a value with a z-score of −1.82.
2. What does a z-score of 3.4 mean?
3. Given a distribution with a mean of 60 and standard deviation of 98, find the z-score of 120.76.
4. Given a distribution with a mean of 60 and standard deviation of 21, find a value with a z-score of 2.19.
5. Find the z-score of 187.37, given a distribution with a mean of 185 and standard deviation of 1.
6. What is the probability of a z-score between +1.99 and +2.02?
7. What is the probability of a z-score between −1.99 and +2.02?
8. What is the probability of a z-score between −1.20 and −1.97?
9. What is the probability of a z-score between +2.33 and −0.97?
10. What is the probability of a z-score greater than +0.09?
11. What is the probability of a z-score greater than −0.02?
12. What is P(1.42 < Z < 2.01)?
13. What is the probability of the random occurrence of a value between 56 and 61 from a normally distributed population with mean 62 and standard deviation 4.5?
14. What is the probability of a value between 301 and 329, assuming a normally distributed set with mean 290 and standard deviation 32?
15. What is the probability of getting a value between 1.2 and 2.3 from the random output of a normally distributed set with μ = 2.6 and σ = 0.9?
CHAPTER
Relationships between Quantitative Variables
Chapter Outline
5.1
Understand the concepts of bivariate data and correlation, and the use of scatterplots to display bivariate data.
Understand when the terms positive, negative, strong, and perfect apply to the correlation between two
variables in a scatterplot graph.
Calculate the linear correlation coefficient and coefficient of determination.
Understand properties and common errors of correlation.
Introduction
So far we have learned how to describe distributions of a single variable. But what if we notice that two variables
seem to be related? We may notice that the values of two variables, such as verbal SAT score and GPA, behave in
the same way and that students who have a high verbal SAT score also tend to have a high GPA (see table below).
In this case, we would want to study the nature of the connection between the two variables.
TABLE 5.1: A Table of Verbal SAT Values and GPAs for Seven Students
Student | SAT Score | GPA
1 | 595 | 3.4
2 | 520 | 3.2
3 | 715 | 3.9
4 | 405 | 2.3
5 | 680 | 3.9
6 | 490 | 2.5
7 | 565 | 3.5
These types of studies are quite common, and we can use the concept of correlation to describe the relationship
between the two variables.
Correlation measures the linear relationship between two quantitative variables. Correlation is possible when we
have bivariate data. In other words, when the subjects in our dataset have scores on two separate quantitative
variables, we have bivariate data. In our example above, we notice that there are two observations (verbal SAT score
and GPA) for each subject (in this case, a student). Can you think of other scenarios when we would use bivariate
data?
If we carefully examine the data in the example above, we notice that those students with high SAT scores tend to
have high GPAs, and those with low SAT scores tend to have low GPAs. In this case, there is a tendency for students
to score similarly on both variables, and the performance between variables appears to be related.
Scatterplots, like the one below, display bivariate data and provide a visual representation of the relationship
between the two variables. In a scatterplot, each point represents a paired measurement of two variables for a
specific subject, and each subject is represented by one point on the scatterplot.
Direction of Relationship
Examining a scatterplot graph allows us to obtain some idea about the relationship between two variables. When the
points on a scatterplot graph produce a lower-left-to-upper-right pattern (see below), we say that there is a positive
correlation between the two variables. This pattern means that when the score of one observation is high, we expect
the score of the other observation to be high as well, and vice versa.
When the points on a scatterplot graph produce an upper-left-to-lower-right pattern (see below), we say that there is a
negative correlation between the two variables. This pattern means that when the score of one observation is high,
we expect the score of the other observation to be low, and vice versa.
When all the points on a scatterplot lie on a straight line, you have what is called a perfect correlation between the
two variables (see below).
When the points on a scatterplot do not show a linear trend (either positive or negative), we say that there is a zero
correlation or a near-zero correlation between the two variables (see below).
Magnitude of Relationship
When examining scatterplots, we want to look not only at the direction of the relationship (positive, negative,
or zero), but also at its magnitude. If we drew an imaginary oval around all of the points on the scatterplot, we
would be able to see the extent, or the magnitude, of the relationship. If the points are close to one another and the
width of the imaginary oval is small, there is a strong correlation between the variables (see below).
However, if the points are far away from one another, and the imaginary oval is very wide, this means that there is a
weak correlation between the variables (see below).
Correlation Coefficient
While examining scatterplots gives us some idea about the relationship between two variables, we use a statistic
called the Pearson correlation coefficient to give us a more precise measurement of the relationship between the
two variables. We use r to denote the correlation coefficient, and r has the following properties:
r is always a value between -1 and +1
The further an r value is from zero, the stronger the relationship between the two variables. The absolute value
of the coefficient indicates the magnitude, or strength, of the relationship.
The sign of r indicates the nature of the relationship: A positive r indicates a positive relationship, and a
negative r indicates a negative relationship.
If two variables have a perfect linear relationship (meaning they fall on a straight line), then r is equal to 1.0 or -1.0,
depending on the direction of the relationship. When there is no linear relationship between two variables, r=0. It is
important to remember that a correlation coefficient of 0 means that there is no linear relationship, but there may still
be a relationship between the two variables. For example, there could be a quadratic relationship between them.
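The first two properties can be checked numerically. The sketch below is a minimal Python illustration, assuming a helper `pearson_r` (a hypothetical name, not from the text) that follows the z-score definition of r given later in this lesson; the data points are invented so that they fall exactly on straight lines.

```python
import math

def pearson_r(xs, ys):
    """Pearson r: sum of z-score cross products divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum((x - mean_x) / s_x * (y - mean_y) / s_y
               for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4, 5, 6]
up = [2 * x + 1 for x in xs]     # points on a rising straight line
down = [-3 * x + 4 for x in xs]  # points on a falling straight line

print(round(pearson_r(xs, up), 10))    # 1.0
print(round(pearson_r(xs, down), 10))  # -1.0
```

Any data that do not lie exactly on a line give a value strictly between -1 and +1.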
The full name of this statistic is the Pearson product-moment correlation coefficient, and it is symbolized by the letter r.
Generally speaking, you may think of the values of r in the following manner:
Coefficient of Determination
At the risk of overloading you with new terms, there is one more that is worth learning in this lesson: the coefficient
of determination. The coefficient of determination is very simple to calculate if you know the correlation coefficient,
since it is just $r^2$. The coefficient of determination can be interpreted as the percentage of variation in the y variable
that can be attributed to the relationship. In other words, a value of $r^2 = 0.63$ can be interpreted as "63% of the
variation in Y can be attributed to the variation in X."
Thus, the correlation coefficient not only provides a measure of the relationship between the variables, but it also
gives us an idea about how much of the total variance of one variable can be associated with the variance of the
other. The higher the correlation we have between two variables, the larger the portion of the variance that can be
explained by the independent variable.
Example A
Elaina is curious about the relationship between the weight of a dog and the amount of food it eats. Specifically, she
wonders if heavier dogs eat more food, or if age and size factor in. She works at the Humane Society, and does some
research. After some calculation, she determines that dog weight and food weight exhibit an r-value of 0.73. What
can Elaina say about the relationship, based on her research? What percentage of the increases in food intake can
she attribute to weight, according to her research?
Solution
The calculated r-value of 0.73 tells us that Elaina's data demonstrate a moderate to strong correlation between the
variables.
Since the coefficient of determination tells us the percentage of changes in the output variable that can be attributed
to the input variable, we need to calculate $r^2$:
$$r^2 = (0.73)^2 = 0.5329$$
Approximately 53% of increases in food intake can be attributed to the linear relationship between food intake and
the weight of the dog. Since weight explains less than 100% of the difference in food intake, this suggests that other
factors, perhaps age and size, are also involved.
Example B
Tuscany wonders if barrel racing times are related to the age of the horse. Specifically, she wonders if older horses
take longer to complete a barrel racing run. As a member of the Pony Club, she does some research, and determines
that horse age and barrel run time exhibit an r-value of 0.52.
What can Tuscany say about horse age vs barrel race time, according to her research?
Solution
Tuscany's research suggests that there is a moderate to weak correlation between horse age and barrel run time. In
other words, the research suggests that $(0.52)^2 \approx 0.27$, or about 27%, of the differences between barrel run times
could be attributable to the linear relationship between barrel run time and the age of the horse.
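Squaring r is the only computation needed for the coefficient of determination. A minimal Python sketch reproducing the two examples above, assuming a hypothetical helper name `coefficient_of_determination`:

```python
def coefficient_of_determination(r):
    """Return r squared, the proportion of variation in Y associated with X."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    return r ** 2

examples = [("Elaina: dog weight vs. food intake", 0.73),
            ("Tuscany: horse age vs. barrel run time", 0.52)]
for label, r in examples:
    r2 = coefficient_of_determination(r)
    print(f"{label}: r = {r}, r^2 = {r2:.4f} (about {r2:.0%})")
```

The range check simply reflects the rule that r always lies between -1 and +1; note that a negative r gives the same $r^2$ as the positive r of equal size.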
How to Calculate r
To understand how this coefficient is calculated, let's suppose that there is a positive relationship between two
variables, X and Y . If a subject has a score on X that is above the mean, we expect the subject to have a score on Y
that is also above the mean. Pearson developed his correlation coefficient by computing the sum of cross products.
He multiplied the two scores, X and Y , for each subject and then added these cross products across the individuals.
Next, he divided this sum by the number of subjects minus one. This coefficient is, therefore, the mean of the cross
products of scores.
Pearson used standard scores (z-scores, t-scores, etc.) when determining the coefficient.
Therefore, the formula for this coefficient is as follows:
$$r_{XY} = \frac{\sum z_X z_Y}{n - 1}$$
In other words, the coefficient is expressed as the sum of the cross products of the standard z-scores divided by the
number of degrees of freedom. For correlation, df = n-1, where n is the number of bivariate data points in your
analysis.
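As a sketch of the z-score formula, the Python snippet below standardizes each variable with its mean and sample standard deviation (dividing by n - 1, matching df = n - 1) and averages the cross products over the degrees of freedom. It uses the SAT and GPA data from Table 5.1; the function names are illustrative assumptions.

```python
import math

# Table 5.1: verbal SAT scores (X) and GPAs (Y) for seven students.
sat = [595, 520, 715, 405, 680, 490, 565]
gpa = [3.4, 3.2, 3.9, 2.3, 3.9, 2.5, 3.5]

def z_scores(values):
    """Standardize using the mean and the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pearson_r_from_z(xs, ys):
    """r = (sum of z_X * z_Y cross products) / (n - 1)."""
    n = len(xs)
    cross_products = [zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))]
    return sum(cross_products) / (n - 1)

r = pearson_r_from_z(sat, gpa)
print(f"r = {r:.3f}")  # r = 0.947
```

Computed from the unrounded data, r comes out at about 0.947. The worked example later in this lesson reports 0.948 (or 0.95) because its xy column is rounded to whole numbers.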
If you have the raw scores for your data, along with the mean and standard deviation of each variable, you can
also calculate r using the following formula. This formula is less computationally intensive when you are trying to
calculate r by hand:
$$r_{XY} = \frac{SP}{(n - 1) s_X s_Y}$$
where SP stands for sum of products. To calculate SP, use the following formula:
$$SP = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}$$
An equivalent formula that uses the raw scores only is called the raw score formula and is written as follows:
$$r_{XY} = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
Note that n is used instead of n - 1, because we are using actual data and not z-scores. Let's use our example from
the introduction to demonstrate how to calculate the correlation coefficient using the raw score formula.
Example C
What is the Pearson product-moment correlation coefficient for the two variables represented in the table below?
SAT Score   GPA
595         3.4
520         3.2
715         3.9
405         2.3
680         3.9
490         2.5
565         3.5
In order to calculate the correlation coefficient, we need to calculate several pieces of information, including $xy$, $x^2$,
and $y^2$. Therefore, the values of $xy$, $x^2$, and $y^2$ have been added to the table.
TABLE 5.3:
Student   SAT Score (X)   GPA (Y)   xy      x^2       y^2
1         595             3.4       2023    354025    11.56
2         520             3.2       1664    270400    10.24
3         715             3.9       2789    511225    15.21
4         405             2.3       932     164025    5.29
5         680             3.9       2652    462400    15.21
6         490             2.5       1225    240100    6.25
7         565             3.5       1978    319225    12.25
Sum       3970            22.7      13262   2321400   76.01
Method 1
We can calculate SP from the raw data as follows:
$$SP = 13262 - \frac{(3970)(22.7)}{7} = 387.86$$
After calculating the values of $s_X = 107.89$ and $s_Y = 0.632$, we then calculate r as:
$$r_{XY} = \frac{387.86}{(6)(107.89)(0.632)} = 0.948$$
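Method 1 can be sketched in Python as follows, assuming hypothetical helper names `sum_of_products`, `sample_sd`, and `pearson_r_from_sp`. It works from the raw SAT and GPA scores rather than from the rounded column totals.

```python
import math

# Raw scores from Table 5.3.
sat = [595, 520, 715, 405, 680, 490, 565]
gpa = [3.4, 3.2, 3.9, 2.3, 3.9, 2.5, 3.5]

def sum_of_products(xs, ys):
    """SP = sum(xy) - (sum x)(sum y) / n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def pearson_r_from_sp(xs, ys):
    """r = SP / ((n - 1) * s_x * s_y)."""
    n = len(xs)
    return sum_of_products(xs, ys) / ((n - 1) * sample_sd(xs) * sample_sd(ys))

print(f"SP  = {sum_of_products(sat, gpa):.2f}")    # SP  = 387.36
print(f"s_x = {sample_sd(sat):.2f}")               # s_x = 107.89
print(f"s_y = {sample_sd(gpa):.3f}")               # s_y = 0.632
print(f"r   = {pearson_r_from_sp(sat, gpa):.3f}")  # r   = 0.947
```

The exact sum of the xy products is 13261.5 (for example, 715 × 3.9 = 2788.5 appears in the table as 2789), so SP comes out near 387.36 instead of 387.86, and r near 0.947 instead of 0.948. Both round to 0.95.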
Method 2
If we were to apply the raw formula to solve this problem, we find the following:
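$$r_{XY} = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = \frac{(7)(13262) - (3970)(22.7)}{\sqrt{\left[(7)(2321400) - 3970^2\right]\left[(7)(76.01) - 22.7^2\right]}} = \frac{2715}{2864.22} \approx 0.95$$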
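Method 2 uses only column totals, so it is easy to check by machine. This Python sketch (the function name `pearson_r_raw` is an assumption) plugs in the Sum row of Table 5.3:

```python
import math

def pearson_r_raw(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Raw score formula for Pearson's r, computed from column totals."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Sum row of Table 5.3.
r = pearson_r_raw(n=7, sum_x=3970, sum_y=22.7, sum_xy=13262,
                  sum_x2=2321400, sum_y2=76.01)
print(f"r = {r:.3f}")  # r = 0.948
```

Passing the exact total `sum_xy=13261.5` instead gives roughly 0.947, consistent with Method 1 carried out on unrounded data.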
Correlation is a measure of the linear relationship between two variables; it does not necessarily mean that one variable
is caused by another. For example, a third variable or a combination of other things may be causing the two correlated
variables to relate as they do. Therefore, it is important to remember that we are interpreting the variables and the
variance not as causal, but instead as relational.
When examining correlation, there are four things that could affect our results: lack of linearity, outliers, homogeneity of the group, and sample size.
As mentioned, the correlation coefficient is the measure of the linear relationship between two variables. However,
while many pairs of variables have a linear relationship, some do not. For example, let's consider performance anxiety. As a person's anxiety about performing increases, so does his or her performance, up to a point. (We sometimes call this "good stress.") However, at some point, the increase in anxiety may cause a person's performance to go
down. We call these non-linear relationships curvilinear relationships. We can identify curvilinear relationships
by examining scatterplots (see below). One may ask why curvilinear relationships pose a problem when calculating
the correlation coefficient. The answer is that if we use the traditional formula to calculate these relationships,
it will not be an accurate index, and we will be underestimating the relationship between the variables. If we
graphed performance against anxiety, we would see that anxiety has a strong effect on performance. However, if we
calculated the correlation coefficient, we would arrive at a figure around zero. Therefore, the correlation coefficient
is not always the best statistic to use to understand the relationship between variables.
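To see the problem numerically, the Python sketch below builds a hypothetical, perfectly curvilinear data set: performance rises with anxiety up to a level of 5 and then falls symmetrically. Although performance is completely determined by anxiety, r is zero (up to rounding error). The helper `pearson_r` is an illustrative name; it uses the deviation-score form $\frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$, which is algebraically equivalent to the z-score formula.

```python
import math

def pearson_r(xs, ys):
    """Pearson r in deviation-score form (equivalent to the z-score formula)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical inverted-U data: performance peaks at an anxiety level of 5.
anxiety = list(range(1, 10))
performance = [20 - (a - 5) ** 2 for a in anxiety]

r = pearson_r(anxiety, performance)
print(f"|r| = {abs(r):.3f}")  # |r| = 0.000
```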
Outliers
The correlation coefficient is also very sensitive to outliers, or points that fall far away from the general trend of
the data. A single point can greatly change the value of r, which means that the existence of outliers can either
mask or inflate the apparent strength of the linear relationship between two variables. In the graph below, r = 0.84
if you include the outlier that falls well above the general trend of the rest of the data. However, if you ignore that
single point, r increases to 0.99. Because of this sensitivity, it is always important to plot your data before running a
correlation analysis and investigate any outlying points carefully.
Sample Size
Finally, we should consider sample size. One may assume that the number of observations used in the calculation
of the correlation coefficient may influence the magnitude of the coefficient itself. However, this is not the case. Yet
while the sample size does not affect the magnitude of the correlation coefficient, it does affect how accurately the
coefficient reflects the relationship. The larger the sample, the more accurately the correlation coefficient calculated
from the sample describes the relationship between the two variables.
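A rough way to see the effect of sample size is to simulate it. The Python sketch below assumes a hypothetical population in which y = 0.5x plus random noise, draws many samples of size 10 and of size 100, and compares how much r varies from sample to sample. The seed, population, and number of trials are arbitrary choices for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson r in deviation-score form (equivalent to the z-score formula)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def sample_rs(n, trials, rng):
    """Return r for each of `trials` random samples of size n from one population."""
    rs = []
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [0.5 * x + rng.gauss(0, 1) for x in xs]  # hypothetical linear population
        rs.append(pearson_r(xs, ys))
    return rs

def spread(values):
    """Sample standard deviation of a list of numbers."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

rng = random.Random(1)  # fixed seed so the run is repeatable
for n in (10, 100):
    rs = sample_rs(n, trials=1000, rng=rng)
    print(f"n = {n:3d}: average r = {sum(rs) / len(rs):.2f}, spread of r = {spread(rs):.2f}")
```

With larger samples, the values of r cluster much more tightly around the same central value, which is what it means for the coefficient to describe the relationship more accurately.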
Lesson Summary
Bivariate data are data sets with two observations that are assigned to the same subject. Correlation measures the
direction and magnitude of the linear relationship between bivariate data. When examining scatterplot graphs, we
can determine if correlations are positive, negative, perfect, or zero. A correlation is strong when the points in the
scatterplot lie generally along a straight line.
The correlation coefficient is a precise measurement of the relationship between the two variables. This index can
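take on values between and including -1.0 and +1.0.
To calculate the correlation coefficient by hand, we most often use one of the following two formulas:
$$r_{XY} = \frac{SP}{(n - 1) s_X s_Y}$$
where
$$SP = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}$$
or the raw score formula:
$$r_{XY} = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$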
When calculating the correlation coefficient, there are several things that could affect our computation, including
curvilinear relationships, outliers, homogeneity of the group, and the size of the group.
Review Questions
1. Give 2 scenarios or research questions where you would use bivariate data sets.
2. In the space below, draw and label four scatterplot graphs, one showing each of the following:
a. a positive correlation
b. a negative correlation
c. a perfect correlation
d. a zero correlation
3. In the space below, draw and label two scatterplot graphs, one showing each of the following:
a. a weak correlation
b. a strong correlation
4. What does the correlation coefficient measure?
5. The following observations were taken for five students measuring grade and reading level.
TABLE 5.4: A Table of Grade and Reading Level for Five Students
Student Number   Grade   Reading Level
1                2       6
2                6       14
3                5       12
4                4       10
5                1       4
a. Draw a scatterplot for these data. What type of relationship does this correlation have?
b. Use the raw score formula to compute the Pearson correlation coefficient.
6. A teacher gives two quizzes to his class of 10 students. The following are the scores of the 10 students.
Quiz 1   Quiz 2
15       20
12       15
10       12
14       18
10       10
8        13
6        12
15       10
16       18
13       15
a. Compute the Pearson correlation coefficient, r, between the scores on the two quizzes.
b. Find the percentage of the variance, $r^2$, in the scores of Quiz 2 associated with the variance in the scores
of Quiz 1.
c. Interpret both r and $r^2$ in words.
7. What four factors can affect the magnitude and accuracy of the Pearson correlation coefficient?
CHAPTER
Relationships between Categorical Variables
Chapter Outline
6.1 CONTINGENCY TABLES
6.2
6.3 CONDITIONAL PROBABILITY
6.4
Suppose you wanted to evaluate how gender affects the type of movie chosen by movie-goers. How might you
organize data on male and female watchers, and action, romance, comedy, and horror movie types, so it would be
easy to compare the different combinations?
Contingency tables are used to evaluate the interaction of two different categorical variables. Contingency tables
are sometimes called two-way tables because they are organized with the outputs of one variable across the top, and
another down the side. Consider the table below:
TABLE 6.1:
                  Male   Female
Chocolate Candy   42     77
Fruit Candy       58     23
This is a contingency table comparing the variable Gender with the variable Candy Preference. You can see that,
across the top of the table are the two gender options for this particular study: male students and female students.
Down the left side are the two candy preference options: chocolate and fruit. The data in the center of the table
indicate the reported candy preferences of the 200 students polled during the study.
Commonly, there will be one additional row and column for totals, like this:
TABLE 6.2:
                  Male   Female   TOTAL
Chocolate Candy   42     77       119
Fruit Candy       58     23       81
TOTAL             100    100      200
Notice that you can run a quick check on the calculation of totals, since the total of totals should be the same from
either direction: 119 + 81 = 200 = 100 + 100.
A marginal distribution is how many overall responses there were for each category of the variable. The marginal
distribution of a variable can be determined by looking at the "Total" column (for type of candy) or the "Total" row
(for gender). For example, we can see that the marginal totals for type of candy are 119 chocolate and 81 fruit.
Similarly, the marginal total for gender tells us there were an equal number of males and females in the study.
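Computing margins is a matter of summing rows and columns. A minimal Python sketch using the counts from Table 6.2 (the dictionary layout is an illustrative choice):

```python
# Table 6.2 counts: rows are candy preference, columns are gender.
counts = {
    "Chocolate Candy": {"Male": 42, "Female": 77},
    "Fruit Candy": {"Male": 58, "Female": 23},
}

# Marginal distribution of candy preference: add across each row.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}

# Marginal distribution of gender: add down each column.
col_totals = {}
for cells in counts.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count

grand_total = sum(row_totals.values())
assert grand_total == sum(col_totals.values())  # same total from either direction

print(row_totals)   # {'Chocolate Candy': 119, 'Fruit Candy': 81}
print(col_totals)   # {'Male': 100, 'Female': 100}
print(grand_total)  # 200
```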
Example A
Construct a contingency table to display the following data:
250 mall shoppers were asked if they intended to eat at the in-mall food court or go elsewhere for lunch. Of the 117
male shoppers, 68 intended to stay, compared to only 62 of the 133 female shoppers.
Solution
First, let's identify our variables and set up the table with the appropriate row and column headers.
The variables are gender and lunch location choice:
TABLE 6.3:
              Male   Female   TOTAL
Food Court
Out of Mall
TOTAL
Now we can fill in the values we have directly from the text:
TABLE 6.4:
              Male   Female   TOTAL
Food Court    68     62
Out of Mall
TOTAL         117    133      250
Finally, we can calculate the missing values using addition or subtraction:
TABLE 6.5:
              Male   Female   TOTAL
Food Court    68     62       130
Out of Mall   49     71       120
TOTAL         117    133      250
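The fill-in step is simple arithmetic. The following Python sketch (variable names are illustrative) derives the four missing cells of Table 6.5 from the values given in the problem and checks that both sets of totals agree:

```python
# Values stated in Example A.
male_total, female_total, grand_total = 117, 133, 250
male_food_court, female_food_court = 68, 62

# Each remaining cell follows by addition or subtraction.
male_out = male_total - male_food_court          # 117 - 68
female_out = female_total - female_food_court    # 133 - 62
food_court_total = male_food_court + female_food_court
out_of_mall_total = male_out + female_out

# Check: row totals and column totals lead to the same grand total.
assert food_court_total + out_of_mall_total == grand_total == male_total + female_total

print(male_out, female_out, food_court_total, out_of_mall_total)  # 49 71 130 120
print(f"{female_food_court / food_court_total:.1%} of food court diners are female")  # 47.7%
```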
Example B
Referencing data from Example A, answer the following:
a.
b.
c.
d.
e.
Solution
a. If we read across the row Food Court, we see that there were a total of 130 shoppers eating in, and that 62
of them were female. To calculate the percentage, we simply divide: 62/130 ≈ 0.477, or 47.7%.
b. The male shoppers were distributed as 68 food court and 49 out of mall.
c. The marginal distribution is the distribution of data in the margin, or in the TOTAL column. In this case,
we are interested in the data on lunch location preference, which is found in the far right column: 130 food
court and 120 out of mall.
Example C
Using the given data:
a. Construct a contingency table
b. Identify the marginal distributions
c. Identify 3 different percentage-based observations
Out of 213 polled amateur drag racers, 37 drove cars with turbo-chargers, 59 had superchargers, and the rest were
normally aspirated. The racers themselves were split between 102 rookies and 111 veterans. The rookies evidently
preferred turbos, since 29 of them had turbo-charged vehicles, and avoided superchargers, since there were only 12
of them.
Solution
a. Set up the table with the appropriate headers, and fill in the data we know. Note that this time we will need
a 3 × 2 table instead of a 2 × 2 (it is still a two-way table, though, as there are only two variables: engine
aspiration and driver experience):
TABLE 6.6:
                    Rookie   Veteran   TOTAL
Turbocharger        29                 37
Supercharger        12                 59
Normal Aspiration                      117
TOTAL               102      111       213
Now we can update the table with the missing data, calculated using addition or subtraction:
TABLE 6.7:
                    Rookie   Veteran   TOTAL
Turbocharger        29       8         37
Supercharger        12       47        59
Normal Aspiration   61       56        117
TOTAL               102      111       213
b. The marginal distribution refers to the overall data for each of the two variables:
Aspiration type is distributed as follows: 37 Turbos, 59 Superchargers, and 117 normally aspirated.
Driver experience distribution: 102 Rookies and 111 Veterans.
c. Three percentage-based observations:
61/102 = 0.598, or 59.8%, of Rookies drive normally aspirated cars.
47/59 = 0.7966, or 79.66%, of the Superchargers were in cars driven by Veterans.
47/111 = 0.4234, or 42.34%, of Veterans use Superchargers.
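The same fill-in-and-divide process for the drag racing data can be sketched in Python. The dictionaries below are an illustrative layout, with rows keyed by aspiration type:

```python
# Values stated in Example C.
totals = {"Turbocharger": 37, "Supercharger": 59}
rookie = {"Turbocharger": 29, "Supercharger": 12}
rookie_total, veteran_total, grand_total = 102, 111, 213

# Missing cells by subtraction.
totals["Normal Aspiration"] = grand_total - sum(totals.values())   # 213 - 96
rookie["Normal Aspiration"] = rookie_total - sum(rookie.values())  # 102 - 41
veteran = {kind: totals[kind] - rookie[kind] for kind in totals}

assert sum(veteran.values()) == veteran_total  # consistency check

print(f"{rookie['Normal Aspiration'] / rookie_total:.1%} of Rookies are normally aspirated")     # 59.8%
print(f"{veteran['Supercharger'] / totals['Supercharger']:.2%} of Superchargers: Veterans")     # 79.66%
print(f"{veteran['Supercharger'] / veteran_total:.2%} of Veterans use Superchargers")           # 42.34%
```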
Vocabulary
A contingency table or two-way table is used to organize data from multiple categories of two variables so
that various assessments may be made.
A marginal distribution is the distribution of data in the margin of a table. It may also be described as the
distribution of the data for a single variable.
Guided Practice
TABLE 6.8:
        A      B      TOTAL
X       47
Y              32     100
TOTAL   115           200
TABLE 6.9:
        A               B                 TOTAL
X       47              100 - 47 = 53     200 - 100 = 100
Y       100 - 32 = 68   32                100
TOTAL   115             200 - 115 = 85    200
3. There are 32 Bs that are also Ys, out of the total of 100 Ys: 32/100 = 32%
4. 47 of the 100 Xs are As: 47/100 = 0.47
More Practice
TABLE 6.10:
                 Sports Cars   Pickup Trucks   Luxury Cars   TOTAL
Male Drivers     72            67              36            175
Female Drivers   36            71              68            175
TOTAL            108           138             104           350
1.
2.
3.
4.
5.
6.
7.
Introduction
The concept of probability plays an important role in our daily lives. Assume you have an opportunity to invest some
money in a software company. Suppose you know that the company's records indicate that in the past five years, its
profits have been consistently decreasing. Would you still invest your money in it? Do you think the chances are
good for the company in the future?
Here is another illustration. Suppose that you are playing a game that involves tossing a single die. Assume that
you have already tossed it 10 times, and every time the outcome was the same, a 2. What is your prediction of the
eleventh toss? Would you be willing to bet $100 that you will not get a 2 on the next toss? Do you think the die is
loaded?
Notice that the decision concerning a successful investment in the software company and the decision of whether or
not to bet $100 on the next outcome of the die are both based on probabilities of certain sample results. Namely, the
software company's profits have been declining for the past five years, and the outcome of rolling a 2 ten times in a
row seems strange. From these sample results, we might conclude that we are not going to invest our money in the
software company or bet on this die. In this lesson, you will learn mathematical ideas and tools that can help you
understand such situations.
An event is something that occurs, or happens. For example, flipping a coin is an event, and so is walking in the
park and passing by a bench. Anything that could possibly happen is an event.
Every event has one or more possible outcomes. While tossing a coin is an event, getting tails is the outcome of that
event. Likewise, while walking in the park is an event, finding your friend sitting on the bench is an outcome of that
event.
Suppose a coin is tossed once. There are two possible outcomes, either heads, H, or tails, T . Notice that if the
experiment is conducted only once, you will observe only one of the two possible outcomes. An experiment is the
process of taking a measurement or making an observation. These individual outcomes for an experiment are each
called simple events.
Example A
A die has six possible outcomes: 1, 2, 3, 4, 5, or 6. When we toss it once, only one of the six outcomes of this
experiment will occur. The one that does occur is called a simple event.
Example B
Suppose that two pennies are tossed simultaneously. We could have both pennies land heads up (which we write as
HH), or the first penny could land heads up and the second one tails up (which we write as HT ), etc. We will see
that there are four possible outcomes for each toss, which are HH, HT, T H, and T T .
What we have accomplished so far is a listing of all the possible events of an experiment. This collection is called
the sample space of the experiment. The sample space is the set of all possible outcomes of an experiment, or the
collection of all the possible simple events of an experiment. We will denote a sample space by S.
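A sample space like this can be listed mechanically. A minimal Python sketch using `itertools.product` from the standard library, treating the two pennies as ordered (first, second):

```python
from itertools import product

# Every ordered pair of faces: first penny, then second penny.
sample_space = ["".join(faces) for faces in product("HT", repeat=2)]
print(sample_space)  # ['HH', 'HT', 'TH', 'TT']
```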
Example C
We want to determine the sample space of throwing a die and the sample space of tossing a coin.
Solution
As we know, there are 6 possible outcomes for throwing a die. We may get 1, 2, 3, 4, 5, or 6, so we write the sample
space as the set of all possible outcomes:
S = {1, 2, 3, 4, 5, 6}
Similarly, the sample space of tossing a coin is either heads, H, or tails, T , so we write S = {H, T }.
Example D
Suppose a box contains three balls, one red, one blue, and one white. One ball is selected, its color is observed, and
then the ball is placed back in the box. The balls are scrambled, and again, a ball is selected and its color is observed.
What is the sample space of the experiment?
It is probably best if we draw a tree diagram to illustrate all the possible selections.
As you can see from the tree diagram, it is possible that you will get the red ball, R, on the first drawing and then
another red one on the second, RR. You can also get a red one on the first and a blue on the second, and so on. From
the tree diagram above, we can see that the sample space is as follows:
S = {RR, RB, RW, BR, BB, BW, WR, WB, WW}
Example E
Consider the same experiment as in the last example. This time we will draw one ball and record its color, but we
will not place it back into the box. We will then select another ball from the box and record its color. What is the
sample space in this case?
Solution
The tree diagram below illustrates this case:
You can clearly see that when we draw, say, a red ball, the blue and white balls will remain. So on the second
selection, we will either get a blue or a white ball. The sample space in this case is as shown:
S = {RB, RW, BR, BW, WR, WB}
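Examples D and E differ only in whether the first ball is replaced. In Python's standard library, `itertools.product` lists ordered draws with replacement and `itertools.permutations` lists ordered draws without replacement, so a short sketch can generate both sample spaces:

```python
from itertools import permutations, product

balls = "RBW"  # red, blue, white

# Example D: the ball is replaced, so the same color can be drawn twice.
with_replacement = ["".join(p) for p in product(balls, repeat=2)]
# Example E: the first ball stays out, so the two draws must differ.
without_replacement = ["".join(p) for p in permutations(balls, 2)]

print(len(with_replacement), with_replacement)
# 9 ['RR', 'RB', 'RW', 'BR', 'BB', 'BW', 'WR', 'WB', 'WW']
print(len(without_replacement), without_replacement)
# 6 ['RB', 'RW', 'BR', 'BW', 'WR', 'WB']
```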
$$P(A) = \frac{\text{number of outcomes in event } A}{\text{number of outcomes in the sample space } S}$$
Example F
When tossing two coins, what is the probability of getting a head on both coins, HH? Is the probability classical?
Since there are 4 elements (outcomes) in the sample space set, {HH, HT, T H, T T }, its size is 4. Furthermore, there
is only 1 HH outcome that can occur. Therefore, using the formula above, we can calculate the probability as shown:
$$P(A) = \frac{1}{4} = 0.25$$
Notice that each of the 4 possible outcomes is equally likely. The probability of each is 0.25. Also notice that the
total probability of all possible outcomes in the sample space is 1.
Example G
What is the probability of throwing a die and getting A = 2, 3, or 4?
There are 6 possible outcomes when you toss a die. Thus, the total number of outcomes in the sample space is 6.
The event we are interested in is getting a 2, 3, or 4, and there are three ways for this event to occur.
$$P(A) = \frac{3}{6} = \frac{1}{2} = 0.5$$
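When all outcomes are equally likely, the classical formula just counts. A minimal Python sketch (the helper name `classical_probability` is an assumption) that reproduces Examples F and G with exact fractions:

```python
from fractions import Fraction
from itertools import product

def classical_probability(event, sample_space):
    """P(A) = (outcomes in A) / (outcomes in S), assuming equally likely outcomes."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

two_coins = ["".join(p) for p in product("HT", repeat=2)]
die = [1, 2, 3, 4, 5, 6]

print(classical_probability(lambda o: o == "HH", two_coins))  # 1/4
print(classical_probability(lambda o: o in (2, 3, 4), die))   # 1/2
```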
Example H
Consider tossing two coins. Assume the coins are not balanced. The design of the coins is such that they produce
the probabilities shown in the table below:
TABLE 6.11:
Outcome       HH    HT    TH    TT
Probability   4/9   2/9   2/9   1/9
What is the probability of observing exactly one head, and what is the probability of observing at least one head?
Notice that the simple events HT and T H each contain only one head. Thus, we can easily calculate the probability
of observing exactly one head by simply adding the probabilities of the two simple events:
$$P = P(HT) + P(TH) = \frac{2}{9} + \frac{2}{9} = \frac{4}{9}$$
Similarly, the probability of observing at least one head is:
$$P = P(HH) + P(HT) + P(TH) = \frac{4}{9} + \frac{2}{9} + \frac{2}{9} = \frac{8}{9}$$
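When outcomes are not equally likely, an event's probability is the sum of the probabilities of its simple events. A Python sketch using the values in Table 6.11 (the helper name `event_probability` is illustrative):

```python
from fractions import Fraction

# Table 6.11: probabilities for the two unbalanced coins.
probabilities = {
    "HH": Fraction(4, 9),
    "HT": Fraction(2, 9),
    "TH": Fraction(2, 9),
    "TT": Fraction(1, 9),
}
assert sum(probabilities.values()) == 1  # the simple events cover the sample space

def event_probability(event):
    """Add up the probabilities of the simple events that make up the event."""
    return sum(p for outcome, p in probabilities.items() if event(outcome))

exactly_one_head = event_probability(lambda o: o.count("H") == 1)
at_least_one_head = event_probability(lambda o: o.count("H") >= 1)
print(exactly_one_head, at_least_one_head)  # 4/9 8/9
```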
Vocabulary
An event is something that occurs, or happens, with one or more possible outcomes.
An experiment is the process of taking a measurement or making an observation.
A simple event is the simplest outcome of an experiment.
The sample space is the set of all possible outcomes of an experiment, typically denoted by S.
Review Questions
i.
ii.
iii.
iv.
2. The Venn diagram below shows an experiment with six simple events. Events A and B are also shown. The
probabilities of the simple events are:
$$P(1) = P(2) = P(4) = \frac{2}{9}$$
$$P(3) = P(5) = P(6) = \frac{1}{9}$$
1.
a. Find P(A)
b. Find P(B)
2. A box contains two blue marbles and three red ones. Two marbles are drawn randomly without replacement.
Refer to the blue marbles as B1 and B2 and the red ones as R1, R2, and R3.
a. List the outcomes in the sample space.
b. Determine the probability of observing each of the following events:
(a)
Calculate the conditional probability that event A occurs, given that event B has occurred.
Understand the difference between independent and dependent events.
Introduction
In this lesson, you will learn about the concept of conditional probability and be presented with some examples
of how conditional probability is used in the real world. Once we understand the basics of conditional probability
in this chapter, we can begin to think about how these concepts can be used to determine whether two categorical
variables are related or, in the language of probability, whether they are independent.
Notation
We know that the probability of observing an even number on a throw of a die is 0.5. Let the event of observing an
even number be event A. Now suppose that we throw the die, and we know that the result is a number that is 3 or
less. Call this event B. Would the probability of observing an even number on that particular throw still be 0.5? The
answer is no, because with the introduction of event B, we have reduced our sample space from 6 simple events to 3
simple events. In other words, since we have a number that is 3 or less, we now know that we have a 1, 2 or 3. This
becomes, in effect, our sample space. Now the probability of observing a 2 is $\frac{1}{3}$. With the introduction of a particular
condition (event B), we have changed the probability of a particular outcome. The Venn diagram below shows the
reduced sample space for this experiment, given that event B has occurred:
The only even number in the sample space for B is the number 2. We conclude that the probability that A occurs,
given that B has occurred, is 1 out of 3, or $\frac{1}{3}$. We write this with the notation P(A|B), which reads "the probability of A,
given B." So for the die toss experiment, we would write $P(A|B) = \frac{1}{3}$.
Conditional Probability of Two Events
If A and B are two events, then the probability of event A occurring, given that event B has occurred, is called
conditional probability. We write it with the notation P(A|B), which reads the probability of A, given B.
To calculate the conditional probability that event A occurs, given that event B has occurred, take the ratio of the
probability that both A and B occur to the probability that B occurs. That is:
$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$
For our example above, the die toss experiment, we proceed as shown below:
$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(2)}{P(1) + P(2) + P(3)} = \frac{\frac{1}{6}}{\frac{3}{6}} = \frac{1}{3}$$
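The die example can be checked by enumerating the sample space. A minimal Python sketch, assuming a fair die and illustrative helper names `prob` and `conditional`:

```python
from fractions import Fraction

die = range(1, 7)
p = {face: Fraction(1, 6) for face in die}  # fair die: each face has probability 1/6

A = {face for face in die if face % 2 == 0}  # even number
B = {face for face in die if face <= 3}      # 3 or less

def prob(event):
    """Sum the probabilities of the simple events in the event."""
    return sum(p[face] for face in event)

def conditional(a, b):
    """P(A|B) = P(A and B) / P(B)."""
    return prob(a & b) / prob(b)

print(prob(A), conditional(A, B))  # 1/2 1/3
```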
Example
A medical research center is conducting experiments to examine the relationship between cigarette smoking and
cancer in a particular city in the USA. Let A represent an individual who smokes, and let C represent an individual
who develops cancer. This means that AC represents an individual who smokes and develops cancer, AC' represents
an individual who smokes but does not develop cancer, and so on. We have four different possibilities, or simple
events, and they are shown in the table below, along with their associated probabilities.
TABLE 6.12:
Simple Events   AC     AC'    A'C    A'C'
Probabilities   0.10   0.30   0.05   0.55
These simple events can be studied, along with their associated probabilities, to examine the relationship between
smoking and cancer.
We have:
A : individual smokes
C : individual develops cancer
$$P(C|A) = \frac{P(C \cap A)}{P(A)}$$
Before we can use this relationship, we need to calculate the value of the denominator. P(A) is the probability of an
individual being a smoker in the city under consideration. To calculate it, remember that the probability of an event
is the sum of the probabilities of all its simple events. A person can smoke and have cancer, or a person can smoke
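and not have cancer. That is:
$$P(A) = P(AC) + P(AC') = 0.10 + 0.30 = 0.40$$
$$P(C|A) = \frac{P(AC)}{P(A)} = \frac{0.10}{0.40} = 0.25$$
Similarly, the probability that a nonsmoker develops cancer is:
$$P(C|A') = \frac{P(A'C)}{P(A')} = \frac{0.05}{0.60} \approx 0.08$$
In this calculation, $P(A') = P(A'C) + P(A'C') = 0.05 + 0.55 = 0.60$. $P(A')$ can also be found by using the Complement Rule as shown: $P(A') = 1 - P(A) = 1 - 0.40 = 0.60$.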
From these calculations, we can clearly see that a relationship exists between smoking and cancer. The probability
that a smoker develops cancer is 25%, and the probability that a nonsmoker develops cancer is only 8%. Keep in
mind, though, that these data alone do not show that smoking causes cancer. However, our findings do suggest a
strong link between smoking and cancer.
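The same arithmetic in Python, using the probabilities from Table 6.12 (keying each simple event by a pair of True/False values is an illustrative choice):

```python
# Table 6.12, keyed by (smokes, develops_cancer).
p = {
    (True, True): 0.10,    # AC
    (True, False): 0.30,   # AC'
    (False, True): 0.05,   # A'C
    (False, False): 0.55,  # A'C'
}

p_smoker = p[(True, True)] + p[(True, False)]       # P(A)
p_nonsmoker = p[(False, True)] + p[(False, False)]  # P(A'), equal to 1 - P(A)

p_cancer_given_smoker = p[(True, True)] / p_smoker         # P(C|A)
p_cancer_given_nonsmoker = p[(False, True)] / p_nonsmoker  # P(C|A')

print(f"P(C|A)  = {p_cancer_given_smoker:.2f}")     # P(C|A)  = 0.25
print(f"P(C|A') = {p_cancer_given_nonsmoker:.3f}")  # P(C|A') = 0.083
```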
Independence
Suppose you are flipping a coin and at the same time rolling a die. Obviously, the probability of rolling a 3 has
nothing to do with whether the coin lands heads or tails. Such events are known as independent.
Event B is said to be independent of event A if P(B|A) = P(B). Equivalently, P(A|B) = P(A).
If the above is not true, then the events are said to be dependent. There are other, less obvious examples that we
frequently encounter. Suppose your math teacher was recently at an event featuring door prizes. The prizes varied
in value from water bottles to a kayak. Suppose there are 200 names in the drawing. If you are like your math teacher
and never win anything, would you prefer them to start the drawing with the kayak or the water bottles? How does
the probability of getting your name drawn change as they draw more names?
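Both situations in the paragraphs above can be checked by counting. This sketch assumes a fair coin, a fair six-sided die, and a drawing in which every remaining name is equally likely to be picked; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: a fair coin flipped together with a fair six-sided die (12 outcomes).
outcomes = list(product(["H", "T"], range(1, 7)))


def prob(event, space):
    """Probability of `event` (a predicate) when every outcome in `space` is equally likely."""
    return Fraction(sum(1 for o in space if event(o)), len(space))


def rolled_three(outcome):
    return outcome[1] == 3


# Independent events: restricting the sample space to heads leaves P(roll a 3) unchanged.
p_three = prob(rolled_three, outcomes)
heads_only = [o for o in outcomes if o[0] == "H"]
p_three_given_heads = prob(rolled_three, heads_only)
print("P(3) =", p_three, "  P(3 | heads) =", p_three_given_heads)


# Dependent events: 200 names drawn without replacement.
# If your name has not come up in the first k draws, 200 - k names remain,
# so your chance of being picked on the next draw is 1 / (200 - k).
def chance_next_draw(k, total=200):
    return Fraction(1, total - k)


for k in (0, 50, 150, 199):
    print(f"after {k} draws: {chance_next_draw(k)}")
```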
Lesson Summary
If A and B are two events, then the probability of event A occurring, given that event B has occurred, is called conditional probability. We write it with the notation P(A|B), which reads "the probability of A, given B."
Conditional probability can be found with the equation $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$.
Review Questions
1. If P(A) = 0.3, P(B) = 0.7, and $P(A \cap B) = 0.15$, find P(A|B) and P(B|A).
2. Two fair coins are tossed.
i. List the possible outcomes in the sample space.
ii. Two events are defined as follows:
A: {At least one head appears}
B: {Only one head appears}
Find P(A), P(B), $P(A \cap B)$, P(A|B), and P(B|A).
3. A box of six marbles contains two white, two red, and two blue. Two marbles are randomly selected without
replacement and their colors are recorded.
i. List the possible outcomes in the sample space.
ii. Let the following events be defined:
A: {Both marbles have the same color}
B: {Both marbles are red}
C: {At least one marble is red or white}
Find P(B|A), P(B|A'), P(B|C), P(A|C), and P(C|A').
Review Answers
1. 0.21, 0.5
2. 3/4, 1/2, 1/2, 1, 2/3
3. 1/3, 0, 1/14, 1/7, 1
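The answers to Question 3 can be checked by listing the sample space. This is a minimal sketch that assumes each marble is labeled individually (W1, W2, R1, ... are hypothetical labels), so that all 15 unordered pairs are equally likely.

```python
from fractions import Fraction
from itertools import combinations

# Six marbles: two white (W), two red (R), two blue (B), each with its own label.
marbles = ["W1", "W2", "R1", "R2", "B1", "B2"]
space = list(combinations(marbles, 2))  # 15 equally likely unordered pairs


def color(marble):
    return marble[0]


def same_color(pair):      # event A
    return color(pair[0]) == color(pair[1])


def both_red(pair):        # event B
    return color(pair[0]) == "R" and color(pair[1]) == "R"


def red_or_white(pair):    # event C
    return any(color(m) in ("R", "W") for m in pair)


def not_(event):
    """Complement of an event."""
    return lambda pair: not event(pair)


def cond(event, given):
    """P(event | given), found by restricting the sample space to outcomes in `given`."""
    restricted = [o for o in space if given(o)]
    return Fraction(sum(1 for o in restricted if event(o)), len(restricted))


print("P(B|A)  =", cond(both_red, same_color))
print("P(B|A') =", cond(both_red, not_(same_color)))
print("P(B|C)  =", cond(both_red, red_or_white))
print("P(A|C)  =", cond(same_color, red_or_white))
print("P(C|A') =", cond(red_or_white, not_(same_color)))
```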
Apply what you have learned about conditional probabilities to determine if two categorical variables influence
each other or not.
Bivariate Relationships for Categorical Data
Suppose you conducted a survey where you asked each person two questions: "Do you have cable TV?" and "Did
you go on vacation in the past year?" You now have data on two categorical variables for each person. As we have
seen, whenever you have two pieces of data from each person, you can organize the data into a two-way frequency
table, or contingency table.
Here are the data collected from a group of individuals who answered those two questions, organized in a contingency table:
TABLE 6.13:
                | Have Cable TV | Don't Have Cable TV | Total
Took a Vacation | 97            | 14                  | 111
No Vacation     | 38            | 17                  | 55
Total           | 135           | 31                  | 166
The numbers in the frequency table show the numbers of people that fit each pair of responses. For example, 97
people have cable TV and took a vacation last year. 38 people have cable TV but did not take a vacation last year.
The totals of the rows and columns have been added to the frequency table for convenience. From the bottom
row you can see that 135 people have cable TV and 31 people don't have cable TV, for a total of 166 people
surveyed. From the far right column you can see that 111 people took a vacation and 55 people did not take a vacation,
for a total of 166 people surveyed.
You can use the two-way frequency table to calculate probabilities about the people surveyed. For example, you
could find:
a. The probability that a random person selected from this group took a vacation last year.
b. The probability that a random person from this group who has cable TV took a vacation last year.
c. Whether or not choosing a person with cable TV and choosing a person who took a vacation are independent events for this population of 166 people.
Example A
Suppose you choose a person at random from the group surveyed below. Let A be the event that the person chosen
took a vacation last year. Find P(A).
TABLE 6.14:
                | Have Cable TV | Don't Have Cable TV | Total
Took a Vacation | 97            | 14                  | 111
No Vacation     | 38            | 17                  | 55
Total           | 135           | 31                  | 166
Solution
There were 166 people surveyed, so there are 166 outcomes in the sample space. 111 people took a vacation last
year.
$P(A) = \frac{111}{166} \approx 0.67$ or 67%
Example B
Suppose you choose a person at random from the group surveyed below. Let A be the event that the person chosen
took a vacation last year. Let B be the event that the person chosen has cable TV. Find P(A|B).
TABLE 6.15:
                | Have Cable TV | Don't Have Cable TV | Total
Took a Vacation | 97            | 14                  | 111
No Vacation     | 38            | 17                  | 55
Total           | 135           | 31                  | 166
Solution
You are looking for the probability that the person took a vacation given that they have cable TV. Since you know
that the person has cable TV, the sample space has been restricted to the 135 people with cable TV. 97 of those
people took a vacation.
$P(A \mid B) = \frac{97}{135} \approx 0.72$ or 72%
Suppose you wanted to use the conditional probability formula for this calculation.
$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{97/166}{135/166} = \frac{97}{135} \approx 0.72$ or 72%
With the conditional probability formula, each probability is calculated with the sample space of 166. The two
166s cancel each other out, and the result is the same. Sometimes it makes sense to use the conditional probability
formula, and sometimes it is easier to think logically about what is being asked.
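Examples A and B can be checked with exact fractions. This is a minimal sketch using the counts from the two-way table; the dictionary layout and variable names are chosen here for illustration.

```python
from fractions import Fraction

# Counts from the two-way frequency table (vacation status, cable TV status).
counts = {
    ("vacation", "cable"): 97,
    ("vacation", "no cable"): 14,
    ("no vacation", "cable"): 38,
    ("no vacation", "no cable"): 17,
}
total = sum(counts.values())  # 166 people surveyed

took_vacation = counts[("vacation", "cable")] + counts[("vacation", "no cable")]  # 111
has_cable = counts[("vacation", "cable")] + counts[("no vacation", "cable")]      # 135
both = counts[("vacation", "cable")]                                              # 97

# Example A: P(A) = 111/166.
p_A = Fraction(took_vacation, total)

# Example B, method 1: restrict the sample space to the people with cable TV.
p_A_given_B_restricted = Fraction(both, has_cable)

# Example B, method 2: conditional probability formula P(A and B) / P(B).
p_A_given_B_formula = Fraction(both, total) / Fraction(has_cable, total)

print(f"P(A)   = {p_A} ~ {float(p_A):.2f}")
print(f"P(A|B) = {p_A_given_B_restricted} ~ {float(p_A_given_B_restricted):.2f}")
print("Both methods agree:", p_A_given_B_restricted == p_A_given_B_formula)
```

Using Fraction keeps the two 166s exact, so the cancellation described above shows up as an exact equality rather than a floating-point near match.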
Example C
Suppose you choose a person at random from the group surveyed below. Let A be the event that the person chosen
took a vacation last year. Let B be the event that the person chosen has cable TV. Are events A and B independent?
TABLE 6.16:
                | Have Cable TV | Don't Have Cable TV | Total
Took a Vacation | 97            | 14                  | 111
No Vacation     | 38            | 17                  | 55
Total           | 135           | 31                  | 166
Solution
Let's remind ourselves what it means for two events to be independent. As we learned earlier in this chapter, two events
are independent if the following statement is true:
$P(A \mid B) = P(A)$
From Example A, you know that P(A) ≈ 67%. From Example B, you know that P(A|B) ≈ 72%. Because these
probabilities are not equal, the two events are NOT independent (they are dependent). People with cable TV are
more likely to have taken a vacation than people without cable TV, so knowing that a person has cable TV
increases the probability that they took a vacation.
Vocabulary
A group of 110 students was surveyed about what grade they were in and whether they preferred dogs or cats. 20 9th
graders preferred dogs, 5 9th graders preferred cats, 16 10th graders preferred dogs, 4 10th graders preferred cats,
22 11th graders preferred dogs, 6 11th graders preferred cats, 30 12th graders preferred dogs, and 7 12th graders
preferred cats.
1. Construct a two-way frequency table to organize this data.
2. Suppose a person is chosen at random from this group. Let C be the event that the student prefers cats. Let T
be the event that the student is in 10th grade. Find P(C) and P(C|T).
3. Are events C and T independent?
Solutions
1.
TABLE 6.17:
      | 9th Grade | 10th Grade | 11th Grade | 12th Grade | Total
Dogs  | 20        | 16         | 22         | 30         | 88
Cats  | 5         | 4          | 6          | 7          | 22
Total | 25        | 20         | 28         | 37         | 110
2. There are 110 students total. 22 of them prefer cats. $P(C) = \frac{22}{110} = 20\%$. P(C|T) means the probability that the
student prefers cats given that they are in 10th grade. Restrict the sample space to the 20 10th-grade students.
4 of them prefer cats. $P(C \mid T) = \frac{4}{20} = 20\%$.
3. The events are independent because P(C) = P(C|T ). Being in 10th grade does not affect the probability of the
student preferring cats.
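Example C and Question 3 both apply the same test: compare P(A | B) with P(A) using counts from a two-way table. A minimal sketch follows; the function name and parameter names are illustrative, and exact fractions are used so that a tie such as 4/20 = 22/110 is not obscured by floating-point rounding.

```python
from fractions import Fraction


def is_independent(joint, row_total, col_total, grand_total):
    """Compare P(A | B) with P(A) exactly, using counts from a two-way table.

    joint:       number of people in both category A and category B
    row_total:   number of people in category A
    col_total:   number of people in category B
    grand_total: number of people surveyed
    """
    p_a = Fraction(row_total, grand_total)
    p_a_given_b = Fraction(joint, col_total)
    return p_a_given_b == p_a


# Example C: A = took a vacation, B = has cable TV (97/135 versus 111/166).
vacation_cable = is_independent(joint=97, row_total=111, col_total=135, grand_total=166)

# Question 3: C = prefers cats, T = in 10th grade (4/20 versus 22/110).
cats_tenth = is_independent(joint=4, row_total=22, col_total=20, grand_total=110)

print("vacation and cable TV independent?", vacation_cable)
print("cats and 10th grade independent?", cats_tenth)
```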
More Practice
TABLE 6.18:
      | 6th Grade | 7th Grade | 8th Grade | Total
Walk  | 30        | 25        | 40        | 95
Bus   | 120       | 170       | 130       | 420
Car   | 65        | 25        | 41        | 131
Total | 215       | 220       | 211       | 646
6. If a student is chosen at random from this group, what is the probability that he or she is a 6th grade student
who takes the bus?
7. If a 6th grade student is chosen at random from this group, what is the probability that he or she takes the bus?
8. If a student who takes the bus is chosen at random from this group, what is the probability that he or she is in
6th grade?
9. The previous three questions each have to do with 6th grade and taking the bus. Why are the answers to these
questions different?
10. Are the events being in 6th grade and taking the bus independent? Justify your answer.
For 11-15, use the following information:
A hospital runs a test to determine whether or not patients have a particular disease. The test is not always accurate.
The two-way table below summarizes the numbers of patients in the past year that received each result.
TABLE 6.19:
      | Has Disease | Does Not Have Disease | Total
Total | 104         | 572                   | 676
11. If a patient is chosen at random from this group, what is the probability that he or she has the disease?
12. A patient from this group received a positive test result. What is the probability that he or she has the disease?
13. A patient from this group has the disease. What is the probability that he or she received a positive result on
the test?
14. A false positive is when a patient receives a positive result on the test, but does not actually have the disease.
What is the probability of a false positive for this sample space?
15. How many of the 676 patients received accurate test results?
CHAPTER
Functions as Models
Chapter Outline
7.1 REVIEW OF FUNCTIONS
7.2
Learning Objectives
Definition
We start by recalling the definition of a function. A function is a set of ordered pairs in which the first
coordinate, usually x, matches with exactly one second coordinate, y. Equations that follow this definition can be
written in function notation. The y-coordinate represents the dependent variable, meaning the values of this variable
depend upon what is substituted for the other variable.
A function can be expressed as an equation, as shown below. In the equation, f represents the function name and
x, written in parentheses, represents the independent variable. In this case the parentheses do not mean multiplication; rather, they separate the function
name from the independent variable.
You may have seen functions represented by a function machine. These emphasize the fact that functions are rules
that explain how the input and output are related. For example, the function below triples the value of the input (x)
and subtracts 1 from it. If 3 is fed into the machine, 3(3) - 1 = 8 comes out.
When naming a function, the symbol f(x) is often used. The symbol f(x) is pronounced as "f of x." This means that
the equation is a function that is written in terms of the variable x. So, for example, if the function above is named
f, then we write the function as f(x) = 3x - 1.
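A function machine maps naturally onto a function in a programming language: each input produces exactly one output. A minimal Python sketch of the machine for f(x) = 3x - 1 (the name `f` simply mirrors the notation above):

```python
def f(x):
    """Function machine: triple the input, then subtract 1."""
    return 3 * x - 1


print(f(3))                      # 3(3) - 1 = 8
print([f(x) for x in range(4)])  # outputs for the inputs 0, 1, 2, 3
```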
Functions as Relations
Functions may be presented in many ways. Some of the most common ways to represent functions include:
a graph
ordered pairs
an equation
a table of values
an arrow or mapping diagram
Example A
TABLE 7.1:
Representation       | Example
Set of ordered pairs | (1, 3), (2, 6), (3, 9), (4, 12) (a subset of the ordered pairs for this function)
Equation             | y = 3x
Graph                | (graph of y = 3x)
Solution
The figure above actually shows the same function depicted in three different ways!
In the first representation, we are given a set of ordered pairs. To verify that this is a function, we must ensure
that each x-value is associated with a single y-value. In this example, the first number in each pair (the x-value) is
different, so we can be certain that there are no cases where a particular x is associated with more than one y.
In the second representation, the equation of a line, it is apparent that any number put in place of x will result in
exactly one y, since the x-value is simply being multiplied by 3.
The third representation above is a graph. A good way to determine whether a relation is a function when looking
at a graph is by doing a "vertical line test". If a vertical line can be drawn anywhere on the graph such that the line
crosses the relation in two places, then the relation is not a function. If all possible vertical lines will only cross the
relation in one place, then the relation is a function.
The vertical line test works because if a vertical line crosses a relation in more than one place it means that there
must be two y values corresponding to one x value in that relation.
The graph above of y = 3x shows it is a function because any vertical line that is drawn only crosses the relation in
one place.
Conversely, the graph below of $x = y^2$ shows it is not a function because a vertical line can be drawn that crosses the
relation in two places.
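For a relation given as a list of ordered pairs, the vertical line test becomes a simple check: no x-value may appear with two different y-values. A minimal sketch (the function name `is_function` is illustrative):

```python
def is_function(pairs):
    """Return True if no x-value in the relation is paired with two different y-values.

    A repeated x with two different y's would be two points on the same vertical line,
    so this is the ordered-pair version of the vertical line test.
    """
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False
        seen[x] = y
    return True


print(is_function([(1, 3), (2, 6), (3, 9), (4, 12)]))  # pairs from y = 3x
print(is_function([(4, 2), (4, -2), (1, 1)]))          # points on x = y^2
```

Note that two different x-values sharing the same y-value, such as (1, 4) and (2, 4), do not break the rule.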
Example B
Solutions
a. This relation is not a function because 5 is paired with 11 and with 12.
b. This relation is a function because every x is paired with only one y. A vertical line through the graph will
always only encounter a single point.
Function Families
Functions come in all different shapes, and there is great value in being able to recognize a function pattern. If
mathematicians are cooks, then families of functions are their ingredients. Each family of functions has its own
flavor and personality. Before you learn to combine functions to create an infinite number of potential models, you
need to get a clear idea of the name of each function family and how it acts.
The following are functions that you may have seen in previous math courses:
The Squaring Function: $f(x) = x^2$
The graph of the squaring function is called a parabola, and it is useful for modeling the motion of falling objects. All
parabolas are transformations of this squaring function.
The Reciprocal Function: $f(x) = x^{-1} = \frac{1}{x}$
The graph of the reciprocal function is a hyperbola, and the function is an example of a rational function. It has two parts that are disconnected
and is not defined at zero. Simple electric circuits are modeled with the reciprocal function.
In this course, we will be working with three function families that are quite common in the kind of data that we
would like to model. These include linear, exponential and logistic functions.
The Identity (Linear) Function: $f(x) = x$
The identity function is the simplest function and all straight lines are transformations of the identity function family.
The Exponential Function Family: $f(x) = e^x$
The exponential function family is one of the first you see where x is in the exponent rather than in the base. This
function eventually grows much faster than any power function. $f(x) = 2^x$ is a very common exponential function
as well. Many applications, in fields such as biology and finance, require the use of exponential growth.
The Logistic Function: $f(t) = \frac{C}{1 + ab^{-t}}$ or $f(t) = \frac{C}{1 + ae^{-kt}}$
The logistic function is a combination of the exponential function and the reciprocal function. This curve is
very powerful because it models population growth where the maximum population is limited by environmental
resources.
Vocabulary
Guided Practice
Solutions
1. There are two different outputs (or y-values) for the input (or x-value) of 1. Because we cannot know
whether 1 should go with 5 or 7 at any given time, this relation is not a function.
2. Since y = x, any time a number is chosen to represent x, that, and only that, number becomes y. From this it is
apparent that each input has one and only one output: This relation is a function.
3. Don't be fooled! This is a function; there is only one unique output for each input. The fact that both x-values
2.1 and 1 are associated with the y-value 4 does not mean that 2.1 and 1 do not have a specific associated value.
Also, no matter how close two x's (2 and 2.1, for instance) may be, if they are not exactly the same, they don't
affect the definition of a function.
4. This is a function, very similar to #2. Any value chosen for x has one and only one associated value for y (4
times as big).
5. This is not a function. This graph looks like a "<", with the point on the origin. Any positive value chosen for x will
have 2 associated y values. For instance: 4 = |-4| and 4 = |4|.
More Practice
1.
2.
3.
4.
5.
For Questions 6 - 14, identify each relation as either a function, or not a function:
6. (2, 4) (4, 6) (6, 8) (3, 4) (5, 7) (8, 2)
7. (-1, 6) (0, 4) (-4, 0) (-1, -6) (-3, -8)
8.
9.
12.
13.
14.
15. At a prom, each boy pins a corsage on his date. Is this an example of a function?
16. Later, at the same dance, Cory shows up with two dates. Does this change the answer?
As you learn more and more mathematical methods and skills, it is important to think about the purpose of mathematics and how it works as part of a bigger picture. Mathematics is used to solve problems that often arise from
real-life situations. Mathematical modeling is a process by which we start with a real-life situation and arrive at a
quantitative solution using the tools of mathematics.
Modeling involves creating a set of mathematical equations that describes a situation, solving those equations, and
using them to understand the real-life problem. Often the model needs to be adjusted because it does not describe
the situation as well as we wish.
A mathematical model can be used to gain understanding of a real-life situation by learning how the system works,
which variables are important in the system, and how they are related to each other. Models can also be used
to predict what a system will do for different values of the independent variable. Lastly, a model can be used to
estimate quantities that are difficult to evaluate exactly.
Mathematical models are like other types of models. The goal is not to produce an exact copy of the real object
but rather to give a representation of some aspect of the real thing. One of the most difficult parts of the modeling
process is determining which function best describes a situation. We often find that the function we choose is not
appropriate. Then we must choose a different one. Keep this in mind: if we try to fit a function to real data, it will
rarely fit perfectly. However, if the function fits the data fairly well, we can choose it as our model. We can then use
the properties of that function to describe what is happening in our data. That is what it means to model data.
Examples of How Models are Fit
Below are several examples of how mathematical models can be fit to data to help explain the behavior of the data
and allow us to predict what may happen if the values were to change. For now, read along and follow the thought
process that is used to select a model and evaluate its fit. In the next few sections, you will be learning how to fit
linear, exponential and logistic models to data and interpret what those models tell us.
Example A
You have a cylinder that is filled with water to a height of 50 centimeters. The cylinder has a hole at the bottom
which is covered with a stopper. The stopper is released at time t = 0 seconds and allowed to empty. The following
data shows the height of the water in the cylinder at different times.
Time (sec) | Height (cm)
0          | 50
2          | 42.5
4          | 35.7
6          | 29.5
8          | 23.8
10         | 18.8
12         | 14.3
14         | 10.5
16         | 7.2
18         | 4.6
20         | 2.5
22         | 1.1
24         | 0.2
a. Find the height (in centimeters) of water in the cylinder as a function of time in seconds.
b. Find the height of the water when t = 5 seconds.
c. Find the height of the water when t = 13 seconds.
Solution
Begin by graphing to get a visual image of what the relationship looks like. Let's begin by defining the variables.
Define x = the time in seconds
y = height of the water in centimeters
Notice that most of the points seem to fit on a straight line when the water level is high. Assume that a function
relating the height of the water to the time is linear.
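Use the points (0, 50) and (4, 35.7) to find the slope and the y-intercept of the line:
$m = \frac{35.7 - 50}{4 - 0} = \frac{-14.3}{4} \approx -3.58$
$35.7 = -3.58(4) + b$
$b \approx 50$
So the linear model is $y = -3.58x + 50$.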
We can conclude that when the water level is high, the relationship between the height of the water and the time
is a linear function. When the water level is low, we must change our assumption. There must be a non-linear
relationship between the height and the time.
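The conclusion above can be checked by comparing the linear model with every reading. This sketch assumes readings every 2 seconds from t = 0 to t = 24 (one time for each of the 13 heights in the table) and builds the line through (0, 50) and (4, 35.7); the function name is illustrative.

```python
# Water height data from Example A (time in seconds, height in centimeters).
times = list(range(0, 26, 2))  # 0, 2, ..., 24
heights = [50, 42.5, 35.7, 29.5, 23.8, 18.8, 14.3, 10.5, 7.2, 4.6, 2.5, 1.1, 0.2]

# Line through (0, 50) and (4, 35.7).
slope = (35.7 - 50) / (4 - 0)   # about -3.58 cm per second
intercept = 35.7 - slope * 4    # about 50 cm


def linear_height(t):
    return slope * t + intercept


# The gap between data and model is small while the water level is high,
# then grows; late in the run the linear model even predicts negative heights.
for t, h in zip(times, heights):
    print(f"t = {t:2d} s   data = {h:5.1f} cm   linear model = {linear_height(t):6.1f} cm")
```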
We conclude that a quadratic function represents the situation more accurately than a linear function. However, for
high water levels the linear function is an equally good representation.
Example B
A scientist counts 2,000 fish in a lake. The fish population increases by a factor of 1.5 per generation, but the lake
has space and food for only 2,000,000 fish. The following table gives the number of fish (in thousands) in each
generation.
Generation | Number (thousands)
0          | 2
4          | 15
8          | 75
12         | 343
16         | 1139
20         | 1864
24         | 1990
28         | 1999
Solution
We will define the variables and graph the relationship again.
Define x = the generation number
y = the number of fish in the lake.
We know that a population can increase exponentially. So, let's assume that we can use an exponential function
to describe the relationship between the generation number and the number of fish.
Solve
a. Since the population increases by a factor of 1.5 per generation, assume the function $y = 2(1.5)^x$.
b. The number of fish in generation 10 is: $y = 2(1.5)^{10} \approx 115$ thousand fish
c. The number of fish in generation 25 is: $y = 2(1.5)^{25} \approx 50{,}502$ thousand fish
Check
To check the validity of the solutions, let's plot the answers to b) and c) on the scatter plot (see the red dots above).
We see that the answer to b) fits the data well but the answer to c) does not seem to follow the trend very closely.
The result is not even on our graph!
When the population of fish is high, the fish compete for space and resources so they do not increase as fast. We
must change our assumptions.
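Assume instead that the relationship is logistic. A logistic model for these data is:
a. $y = \frac{2023.6}{1 + 1706.3(2.71)^{-0.484x}}$
b. The number of fish in generation 10 is: $y = \frac{2023.6}{1 + 1706.3(2.71)^{-0.484(10)}} \approx 138$ thousand fish
c. The number of fish in generation 25 is: $y = \frac{2023.6}{1 + 1706.3(2.71)^{-0.484(25)}} \approx 2{,}004$ thousand fish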
Check
To check the validity of the solutions, let's plot the answers to b) and c) on the scatter plot. We see that the answers
to both b) and c) are close to the rest of the data.
We conclude that a logistic function represents the situation more accurately than an exponential function. However,
for small populations the exponential function is an equally good representation, and it is much easier to use in most
cases.
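The two models in this example can be compared side by side. This sketch assumes the data were recorded at generations 0, 4, ..., 28 (starting from 2 thousand fish) and that the exponent in the logistic model is -0.484x, with 2.71 used as the rounded value of e; the function names are illustrative.

```python
# Fish counts in thousands at generations 0, 4, ..., 28.
generations = list(range(0, 32, 4))
counts = [2, 15, 75, 343, 1139, 1864, 1990, 1999]


def exponential(x):
    """Exponential model: 2 thousand fish, multiplied by 1.5 each generation."""
    return 2 * 1.5 ** x


def logistic(x):
    """Logistic model; its numerator 2023.6 is the level the curve approaches."""
    return 2023.6 / (1 + 1706.3 * 2.71 ** (-0.484 * x))


for x in (10, 25):
    print(f"generation {x}: exponential {exponential(x):9.1f}, logistic {logistic(x):7.1f} (thousands)")

# Compare the logistic model with each data point.
for x, n in zip(generations, counts):
    print(f"generation {x:2d}: data {n:5d}, logistic {logistic(x):7.1f}")
```

The logistic curve levels off near 2023.6 thousand fish, close to the lake's stated capacity of 2,000 thousand, while the exponential model keeps growing without bound.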
Example C
Independent Variable (x) | Dependent Variable (y)
-2                       | 3.31
-1                       | 3.64
0                        | 4
1                        | 4.4
2                        | 4.84
3                        | 5.324
Solution
Start by graphing the points, to get a sense of the shape. This can help you rule out certain models, and lead you in
the direction of which models might be a good fit.
The graph almost looks straight, but has a slight curve, so it can't be linear. It could be quadratic or cubic, but let's
check if it's exponential by dividing each y-value by the one before it:
$\frac{3.64}{3.31} \approx 1.1 \qquad \frac{4}{3.64} \approx 1.1 \qquad \frac{4.4}{4} = 1.1 \qquad \frac{4.84}{4.4} = 1.1 \qquad \frac{5.324}{4.84} = 1.1$
Since the ratios are all (approximately) equal to 1.1, it is an exponential function with a growth factor of 1.1. Given the
point (0, 4), the initial value is 4. The function is:
$f(x) = 4(1.1)^x$.
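The ratio test above is easy to automate. This sketch assumes the x-values run from -2 to 3 in steps of 1, which is consistent with the point (0, 4) and the function f(x) = 4(1.1)^x; the ratio test only makes sense when the x-values are equally spaced.

```python
x_values = [-2, -1, 0, 1, 2, 3]
y_values = [3.31, 3.64, 4, 4.4, 4.84, 5.324]

# For equally spaced x-values, an exponential function has a constant ratio
# between consecutive y-values: the growth factor.
ratios = [later / earlier for earlier, later in zip(y_values, y_values[1:])]
print([round(r, 2) for r in ratios])


def f(x):
    return 4 * 1.1 ** x


# Compare the model's values with the table.
for x, y in zip(x_values, y_values):
    print(f"x = {x:2d}: table {y:6.3f}, model {f(x):6.3f}")
```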
Review Questions
A scientist counts 2,000 fish in a lake. The fish population increases by a factor of 1.5 per generation, but the lake
has space and food for only 2,000,000 fish. The following table gives the number of fish (in thousands) in each
generation.
Generation | Number (thousands)
0          | 2
4          | 15
8          | 75
12         | 343
16         | 1139
20         | 1864
24         | 1990
28         | 1999
1. Which function seems to best fit the data: linear, quadratic, or exponential?
2. Find the model for the function of best fit.
3. Find the number of fish as a function of generation.
CHAPTER
Linear Models
Chapter Outline
8.1
8.2
Introduction
We come across many examples of slope in everyday life. For example, slope appears in the pitch of a roof, the grade or
incline of a road, and the slant of a ladder leaning on a wall. In math, we use the word slope to define steepness in a
particular way.
$\text{Slope} = \frac{\text{rise}}{\text{run}}$
$\text{Slope} = \frac{\text{rise}}{\text{run}} = \frac{\Delta y}{\Delta x} = \frac{3}{4} = 0.75$
If the car were driving to the right it would climb the hill. We say this is a positive slope. Anytime you see the graph
of a line that goes up as you move to the right, the slope is positive.
If the car were to keep driving after it reached the top of the hill, it may come down again. If the car is driving to the
right and descending, then we would say that the slope is negative. The picture above has a negative slope of -0.75.
So as we move from left to right, lines with positive slopes rise while lines with negative slopes fall.
The slope of a function that describes real, measurable quantities is often called a rate of change. In that case, the
slope refers to a change in one quantity (y) per unit change in another quantity (x).
Example A
Andrea has a part time job at the local grocery store. She saves for her vacation at a rate of $15 every week. Express
this rate as money saved per day and money saved per year.
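Converting rates of change is fairly straightforward so long as you remember the equations for rate (i.e., the equations
for slope) and know the conversions. In this case 1 week = 7 days and 52 weeks = 1 year.
$\text{rate} = \frac{\$15}{1 \text{ week}} \times \frac{1 \text{ week}}{7 \text{ days}} = \frac{\$15}{7 \text{ days}} = \frac{15}{7} \text{ dollars per day} \approx \$2.14 \text{ per day}$
$\text{rate} = \frac{\$15}{1 \text{ week}} \times \frac{52 \text{ weeks}}{1 \text{ year}} = \$15 \times \frac{52}{\text{year}} = \$780 \text{ per year}$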
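The same conversions can be written as a short calculation. This is a minimal sketch; it uses Python's Fraction so that 15/7 stays exact until it is rounded for display, and the variable names are illustrative.

```python
from fractions import Fraction

weekly_savings = 15   # dollars per week
days_per_week = 7
weeks_per_year = 52

# Multiply by conversion factors so that the unwanted units cancel.
per_day = Fraction(weekly_savings, days_per_week)  # ($15 / week) * (1 week / 7 days)
per_year = weekly_savings * weeks_per_year         # ($15 / week) * (52 weeks / year)

print(f"${float(per_day):.2f} per day")
print(f"${per_year} per year")
```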
Example B
A candle has a starting length of 10 inches. Thirty minutes after lighting it, the length is 7 inches. Determine the rate
of change in length of the candle as it burns. Determine how long the candle takes to completely burn to nothing.
In this case, we will graph the function to visualize what is happening.
We have two points to start with. We know that at the moment the candle is lit (time = 0) the length of the candle is
10 inches. After thirty minutes (time = 30) the length is 7 inches. Since the candle length is a function of time we
will plot time on the horizontal axis, and candle length on the vertical axis. Here is a graph showing this information.
The rate of change of the candle is simply the slope. Since we have our two points $(x_1, y_1) = (0, 10)$ and $(x_2, y_2) = (30, 7)$, we can move straight to the formula.
$\text{Rate of change} = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1} = \frac{(7 \text{ inches}) - (10 \text{ inches})}{(30 \text{ minutes}) - (0 \text{ minutes})} = \frac{-3 \text{ inches}}{30 \text{ minutes}} = -0.1 \text{ inches per minute}$
The slope is negative. A negative rate of change means that the quantity is decreasing with time.
We can also convert our rate to inches per hour.
$\text{rate} = \frac{-0.1 \text{ inches}}{1 \text{ minute}} \times \frac{60 \text{ minutes}}{1 \text{ hour}} = -6 \text{ inches per hour}$
To find the point when the candle burns to nothing, or reaches zero length, we can read off the graph (100 minutes).
We can use the rate equation to verify this algebraically.
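The rate calculation and the algebraic check of the 100-minute burnout time can be sketched as follows. This assumes, as the example does, that the candle burns at a constant rate, so its length is a linear function of time; the helper name `slope` is illustrative.

```python
def slope(p1, p2):
    """Rate of change between two points: (y2 - y1) / (x2 - x1)."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)


# Candle data: (minutes after lighting, length in inches).
start = (0, 10)
later = (30, 7)

rate = slope(start, later)   # inches per minute (negative: the candle gets shorter)
rate_per_hour = rate * 60    # 60 minutes in 1 hour

# Length after t minutes: L(t) = 10 + rate * t.
# Setting L(t) = 0 and solving gives t = -10 / rate.
burn_out_minutes = -start[1] / rate

print(f"rate: {rate} in/min, {rate_per_hour} in/hr, burns out after {burn_out_minutes} minutes")
```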
Slope is a measure of change in the vertical direction for each step in the horizontal direction.
$\text{Slope} = \frac{\text{rise}}{\text{run}}$ or $\text{slope} = \frac{\Delta y}{\Delta x}$
The slope between two points $(x_1, y_1)$ and $(x_2, y_2)$ is $\frac{y_2 - y_1}{x_2 - x_1}$.
Review Questions
Use the slope formula to find the slope of the line that passes through each pair of points.
1. (-5, 7) and (0, 0)
2. (-3, -5) and (3, 11)
3.
4.
5.
6.
7.
Learning Objectives
Introduction
Earlier we learned that correlation could be used to assess the strength and direction of a linear relationship between
two variables. We illustrated the concept of correlation through scatterplot graphs. We saw that when variables were
correlated, the points on a scatterplot graph tended to follow a straight line. If we could draw this straight line, it
would, in theory, represent the change in one variable associated with the change in the other. This line is called the
least squares line, or the linear regression line, or the line of best fit.
Linear regression involves using data to calculate a line that best fits that data, and then using that line to predict
scores on one variable from another. Prediction is simply the process of estimating scores of the outcome (or
dependent) variable based on the scores of the predictor (or independent) variable.
To generate the regression line, we look for a line of best fit. There are many ways one could define this best fit.
Statisticians define the best-fit line to be the one that minimizes the sum of the squared distances from the observed
data to the line. This method of fitting the data line so that there is minimal difference between the observations and
the line is called the method of least squares.
Our goal in the method of least squares is to fit the regression line to the data by having the smallest sum of squared
distances possible from each of the data points to the line. In the example below, you can see the calculated distances,
or residual values, from each of the observations to the regression line. The smaller the residuals, the better the fit.
As you can see, the regression line is a straight line that expresses the relationship between two variables. Since the
regression line is used to predict the value of Y for any given value of X, all predicted values will be located on the
regression line itself.
When predicting one score by using another, we use an equation such as the following, which is equivalent to the
slope-intercept form of the equation for a straight line:
$\hat{Y} = bX + a$
where:
$\hat{Y}$ (pronounced "Y-hat") is the score that we are trying to predict.
b is the slope of the line; it is also called the regression coefficient.
a is the y-intercept; it is also called the regression constant. It is the value of $\hat{Y}$ when the value of X is 0.
Linear Models
In the previous chapter, we discussed the average rate of change of a function on an interval. For many functions,
the average rate of change is different on different intervals. A linear function, however, has the same average rate
of change on every interval. When a linear model is used to describe data, it is assuming a constant rate of change.
The general formula of the linear model allows for easy calculation and interpretation of both the slope (the constant
rate of change) and the y-intercept. In this model:
$\hat{Y} = bX + a$
the slope, b, tells us the change in the dependent (Y) variable for every unit change in the independent variable (X),
and the y-intercept, a, tells us the value of Y when X=0.
Example A
Imagine that a town of 30,000 people grows by 2,000 people each year. Since the population, P, is growing at a
constant rate of 2,000 people per year, P is a linear function of time, t. To generate the equation of this model, we
calculate the slope and y-intercept.
Solution
What is the slope? Remember that the slope of a linear function is the average rate of change. We know that average
rate of change of the population is 2,000 people per year. Therefore, b=2,000.
What is the y-intercept? The y-intercept is the size of the population at time t = 0, which we treat as the initial
size of the town's population. So, in this problem, a = 30,000.
This means our equation looks like this:
$P = 2{,}000t + 30{,}000$
We can then easily calculate the size of the population in 10 years: $P = 2{,}000(10) + 30{,}000 = 50{,}000$. At a steady growth rate of 2,000 people per year,
the town will grow from 30,000 to 50,000 over 10 years.
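To calculate the regression coefficient b (or slope), we use the following formula:
$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$
or
$b = r\frac{s_Y}{s_X}$
where:
r is the correlation between the variables X and Y.
$s_Y$ is the standard deviation of the Y scores.
$s_X$ is the standard deviation of the X scores.
To calculate the regression constant a (or y-intercept), we use the following formula:
$a = \frac{\sum y - b\sum x}{n}$
or
$a = \bar{y} - b\bar{x}$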
Example B
Find the least squares line (also known as the linear regression line or the line of best fit) for the example measuring
the verbal SAT scores and GPAs of students that was used in the previous section.
TABLE 8.1: SAT and GPA data including intermediate computations for computing a linear regression.
Student | SAT Score (X) | GPA (Y) | xy    | x²      | y²
1       | 595           | 3.4     | 2023  | 354025  | 11.56
2       | 520           | 3.2     | 1664  | 270400  | 10.24
3       | 715           | 3.9     | 2789  | 511225  | 15.21
4       | 405           | 2.3     | 932   | 164025  | 5.29
5       | 680           | 3.9     | 2652  | 462400  | 15.21
6       | 490           | 2.5     | 1225  | 240100  | 6.25
7       | 565           | 3.5     | 1978  | 319225  | 12.25
Sum     | 3970          | 22.7    | 13262 | 2321400 | 76.01
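Using these data points, we first calculate the regression coefficient and the regression constant as follows:
$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{(7)(13{,}262) - (3{,}970)(22.7)}{(7)(2{,}321{,}400) - 3{,}970^2} = \frac{2715}{488{,}900} \approx 0.0056$
$a = \frac{\sum y - b\sum x}{n} \approx 0.094$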
NOTE: If you performed the calculations yourself and did not get exactly the same answers, it is probably due to
rounding in the table for xy.
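The regression calculation can be reproduced from the raw data. This sketch assumes the SAT scores are the square roots of the x² column in Table 8.1 (595, 520, ..., 565); it computes b with the sums formula and cross-checks it with b = r(s_Y / s_X), using `statistics.stdev` from Python's standard library. Because the code uses unrounded products (for example 715 × 3.9 = 2788.5 rather than 2789), it gives b ≈ 0.0055 and a ≈ 0.097, slightly different from the rounded 0.0056 and 0.094 above; as the note says, such differences come from rounding.

```python
import math
import statistics

# SAT verbal scores (X) and GPAs (Y) for the seven students.
# Assumption: X values recovered as square roots of the table's x^2 column.
x = [595, 520, 715, 405, 680, 490, 565]
y = [3.4, 3.2, 3.9, 2.3, 3.9, 2.5, 3.5]
n = len(x)

sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

# Regression coefficient (slope) and regression constant (intercept).
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Cross-check: b = r * (s_Y / s_X), with r computed from the same sums.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
b_alt = r * statistics.stdev(y) / statistics.stdev(x)

print(f"sum of x = {sum_x}, sum of x^2 = {sum_x2}")
print(f"b = {b:.4f}, a = {a:.3f}, r = {r:.3f}")
print("both slope formulas agree:", math.isclose(b, b_alt))
```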
Now that we have the equation of this line, it is easy to plot on a scatterplot. To plot this line, we simply substitute
two values of X and calculate the corresponding $\hat{Y}$ values to get two pairs of coordinates. Let's say that we wanted
to plot this example on a scatterplot. We would choose two hypothetical values for X (say, 400 and 500) and then
solve for $\hat{Y}$ in order to identify the coordinates (400, 2.334) and (500, 2.89). From these pairs of coordinates, we can
draw the regression line on the scatterplot.
FIGURE 8.1
One of the uses of a regression line is to predict values. After calculating this line, we are able to predict values by simply substituting a value of a predictor variable, X, into the regression equation and solving the equation for the outcome variable, Y. In our example above, we can predict the students' GPAs from their SAT scores by plugging the desired values into our regression equation, \( \hat{Y} = 0.0056X + 0.094 \).
For example, say that we wanted to predict the GPA for two students, one who had an SAT score of 500 and the
other who had an SAT score of 600. To predict the GPA scores for these two students, we would simply plug the
two values of the predictor variable into the equation and solve for Y (see below).
TABLE 8.2: GPA/SAT data, including predicted GPA values from the linear regression.
Student        SAT (X)   GPA (Y)   Predicted GPA (Ŷ)
1              595       3.4       3.4
2              520       3.2       3.0
3              715       3.9       4.1
4              405       2.3       2.3
5              680       3.9       3.9
6              490       2.5       2.8
7              565       3.5       3.2
Hypothetical   600                 3.4
Hypothetical   500                 2.9
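A sketch of the prediction step in Python. The predicted-GPA column of Table 8.2 is reproduced when the unrounded coefficients from Example B are used (b = 2715/488900 and a = (22.7 − 3970b)/7); plugging into the rounded equation \( \hat{Y} = 0.0056X + 0.094 \) instead gives values that can differ by a few hundredths.

```python
# Unrounded coefficients from the Example B hand calculation (column totals of Table 8.1)
b = 2715 / 488900            # regression coefficient, about 0.00555
a = (22.7 - b * 3970) / 7    # regression constant, about 0.093

def predict_gpa(sat_score):
    """Predicted GPA (Y-hat) for a given verbal SAT score (X)."""
    return a + b * sat_score

# The seven students from the table, then the two hypothetical students
sat_scores = [595, 520, 715, 405, 680, 490, 565, 600, 500]
print([round(predict_gpa(x), 1) for x in sat_scores])
```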
As you can see, we are able to predict the value for Y for any value of X within a specified range.
An outlier is an extreme observation that does not fit the general correlation or regression pattern (see figure
below). In the regression setting, outliers will be far away from the regression line in the y-direction. Since it
is an unusual observation, the inclusion of an outlier may affect the slope and the y-intercept of the regression line.
When examining a scatterplot graph and calculating the regression equation, it is worth considering whether extreme
observations should be included or not. In the following scatterplot, the outlier has approximate coordinates of (30,
6,000).
Let's use our example above to illustrate the effect of a single outlier. Say that we have a student who has a high GPA but who suffered from test anxiety the morning of the verbal SAT and scored a 410. Using our original regression equation, we would expect the student to have a GPA of about 2.4 (0.0056 × 410 + 0.094 ≈ 2.39). But, in reality, the student has a GPA equal to 3.9. Including this value would change the slope of the regression equation from 0.0055 to 0.0032, which is quite a large difference.
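To see the outlier's effect numerically, the following sketch refits the slope with and without the hypothetical test-anxious student (SAT 410, GPA 3.9). It uses the raw-sum formula for b and the unrounded data from Table 8.1; the helper name `ls_slope` is illustrative.

```python
def ls_slope(xs, ys):
    """Least-squares regression coefficient b from the raw-sum formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

sat = [595, 520, 715, 405, 680, 490, 565]
gpa = [3.4, 3.2, 3.9, 2.3, 3.9, 2.5, 3.5]

slope_without = ls_slope(sat, gpa)
slope_with = ls_slope(sat + [410], gpa + [3.9])   # add the outlying student
print(f"slope without outlier: {slope_without:.4f}")
print(f"slope with outlier:    {slope_with:.4f}")
```

A single point lowers the slope by roughly 40%, which is why the decision about including it matters.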
There is no set rule when trying to decide whether or not to include an outlier in regression analysis. This decision depends on the sample size, how extreme the outlier is, and the normality of the distribution. For univariate data, we can use the IQR rule to determine whether or not a point is an outlier: values that lie more than 1.5 times the interquartile range (IQR = Q3 − Q1) below the first quartile or above the third quartile are considered outliers. Extreme outliers are values that lie more than 3.0 times the interquartile range below the first quartile or above the third quartile.
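A minimal sketch of the IQR rule in Python. Quartiles can be computed in several slightly different ways; this sketch assumes the `method="inclusive"` option of the standard library's `statistics.quantiles` (Python 3.8+), so fences computed by hand with another convention may differ a little. The data set and function names are made up for illustration.

```python
import statistics

def iqr_fences(data, k=1.5):
    """Lower and upper fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, k=1.5):
    """Values outside the fences: k = 1.5 for outliers, k = 3.0 for extreme outliers."""
    low, high = iqr_fences(data, k)
    return [x for x in data if x < low or x > high]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # made-up data
print(find_outliers(data))          # outliers
print(find_outliers(data, k=3.0))   # extreme outliers
```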
An influential point in regression is one whose removal would greatly impact the equation of the regression line.
Usually, an influential point will be separated in the x direction from the other observations. It is possible for an
outlier to be an influential point. However, there are some influential points that would not be considered outliers.
These will not be far from the regression line in the y-direction (a value called a residual, discussed later) so you
must look carefully for them. In the following scatterplot, the influential point has approximate coordinates of (85,
35,000).
It is important to determine whether influential points are 1) correct and 2) belong in the population. If they are not
correct or do not belong, then they can be removed. If, however, an influential point is determined to indeed belong
in the population and be correct, then one should consider whether other data points need to be found with similar
x-values to support the data and regression line.
Calculating Residuals and Understanding their Relation to the Regression Equation
Recall that the linear regression line is the line that best fits the given data. Ideally, we would like to minimize the
distances of all data points to the regression line. These distances are called the error, e, and are also known as the
residual values. As mentioned, we fit the regression line to the data points in a scatterplot using the least-squares
method. A good line will have small residuals. Notice in the figure below that the residuals are the vertical distances
between the observations and the predicted values on the regression line:
To find the residual values, we subtract the predicted values from the actual values, so \( e = y - \hat{y} \). For the least-squares line, the sum of all residual values is zero: the line passes through the point \( (\bar{x}, \bar{y}) \), so the positive and negative residuals cancel each other out exactly. For this reason, the sum of the residuals cannot be used as an indicator of fit. Instead, we minimize the sum of the squared residuals, \( \sum (y - \hat{y})^2 \).
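The following sketch, using the Table 8.1 data, checks both claims numerically: the residuals from the least-squares line sum to zero up to floating-point error, and moving the line away from its least-squares position makes the sum of squared residuals larger. It uses the equivalent mean-centered form of the slope formula; the helper name `sse` is illustrative.

```python
sat = [595, 520, 715, 405, 680, 490, 565]
gpa = [3.4, 3.2, 3.9, 2.3, 3.9, 2.5, 3.5]
n = len(sat)
mean_x, mean_y = sum(sat) / n, sum(gpa) / n

# Least-squares slope and intercept (mean-centered form)
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(sat, gpa))
     / sum((x - mean_x) ** 2 for x in sat))
a = mean_y - b * mean_x

def sse(a, b):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(sat, gpa))

residuals = [y - (a + b * x) for x, y in zip(sat, gpa)]
print(sum(residuals))                # zero apart from rounding error
print(sse(a, b), sse(a, b * 1.01))   # nudging the slope increases the SSE
```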
Example C
Calculate the residuals for the predicted and the actual GPAs from our sample above.
Student   SAT (X)   GPA (Y)   Predicted GPA (Ŷ)   Residual (y − ŷ)   Residual squared (y − ŷ)²
1         595       3.4       3.4                 0                  0
2         520       3.2       3.0                 0.2                0.04
3         715       3.9       4.1                 -0.2               0.04
4         405       2.3       2.3                 0                  0
5         680       3.9       3.9                 0                  0
6         490       2.5       2.8                 -0.3               0.09
7         565       3.5       3.2                 0.3                0.09
Sum                                                                  0.26
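The residual columns can be reproduced from the rounded GPA values in the table with a short Python sketch; rounding to two decimals only hides floating-point noise.

```python
actual = [3.4, 3.2, 3.9, 2.3, 3.9, 2.5, 3.5]       # GPA (Y)
predicted = [3.4, 3.0, 4.1, 2.3, 3.9, 2.8, 3.2]    # predicted GPA, rounded as in the table

residuals = [round(y - y_hat, 2) for y, y_hat in zip(actual, predicted)]   # e = y - y-hat
squared = [round(e * e, 2) for e in residuals]
print(residuals)
print(squared, round(sum(squared), 2))
```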
To test for linearity and to determine if we should drop extreme observations (or outliers) from our analysis, it is
helpful to plot the residuals. When plotting, we simply plot the x-value for each observation on the x-axis and then
plot the residual score on the y-axis. When examining this scatterplot, the data points should appear to have no
correlation, with approximately half of the points above 0 and the other half below 0. In addition, the points should
be evenly distributed along the x-axis. Below is an example of what a residual scatterplot should look like if there
are no outliers and a linear relationship.
FIGURE 8.2
If the scatterplot of the residuals does not look similar to the one shown, we should look at the situation a bit more
closely. For example, if more observations are below 0, we may have a positive outlying residual score that is
skewing the distribution, and if more of the observations are above 0, we may have a negative outlying residual
score. If the points are clustered close to the y-axis, we could have an x-value that is an outlier. If this occurs, we
may want to consider dropping the observation to see if this would impact the plot of the residuals. If we do decide to drop the observation, we will need to recalculate the regression line without it. After this recalculation, we will have a regression line that better fits the majority of the data.
Lesson Summary
Prediction is simply the process of estimating scores of one variable based on the scores of another variable. We use
the least-squares regression line, or linear regression line, to predict the value of a variable.
Using this regression line, we are able to use the regression coefficient (the slope) and the regression constant (the y-intercept) to predict the scores of a variable. The predictions are represented by the variable \( \hat{y} \).
The differences between the actual and the predicted values are called residual values. We can construct scatterplots
of these residual values to examine outliers and test for linearity.
Review Questions
1. A school nurse is interested in predicting scores on a memory test from the number of times that a student
exercises per week. Below are her observations:
TABLE 8.4: A table of memory test scores compared to the number of times a student exercises
per week.
Student: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
a. Plot this data on a scatterplot, with the x-axis representing the number of times exercising per week and
the y-axis representing memory test score.
b. Does this appear to be a linear relationship? Why or why not?
c. What regression equation would you use to construct a linear regression model?
d. What is the regression coefficient in this linear regression model and what does this mean in words?
e. Calculate the regression equation for these data.
CHAPTER
Exponential Models
Chapter Outline
9.1 EXPONENTIAL GROWTH
9.2 EXPONENTIAL DECAY
9.3
9.4
9.5
Exponential functions are different from other functions you have seen before because now the variable appears in the exponent (or power) instead of the base. In this section, we will be working with functions where the base is a constant number and the exponent is the variable.
In general, the exponential function takes the form:
\[ y = A \cdot b^x \]
where A is the initial value, and b is the growth factor, the amount that y gets multiplied by each time the value of x increases by 1. The growth factor can also be expressed as
\[ b = 1 + r \]
where r is the rate of change. A population that grows by 5% each year would have a growth factor of b = 1 + 0.05 = 1.05.
Example A
A colony of bacteria has a population of three thousand at noon on Sunday. During the next week, the colony's population doubles every day. What is the population of the bacteria colony at noon on Saturday? In this example, we want to describe something that doubles every time x increases by one. You could also think about this problem as the bacteria having a rate of change of 100% every day, so b = 1 + 1 = 2.
The exponential part of the function, therefore, is \( 2^x \).
Let's make a table of values and calculate the population each day.
TABLE 9.1:
Day         Population (in thousands)
0 (Sun)     3
1 (Mon)     6
2 (Tues)    12
3 (Wed)     24
4 (Thurs)   48
5 (Fri)     96
6 (Sat)     192
To get the population of bacteria for the next day we simply multiply the current day's population by 2.
P=3
P = 32
P = 322
P = 3222
P = 32222
P = 322222
P = 3222222
You can see that this function describes a population that is multiplied by 2 each time a day passes.
If we define x as the number of days since Sunday at noon, then we can write the following.
\[ P = 3 \cdot 2^x \]
This is a formula that we can use to calculate the population on any day.
For instance, the population on Saturday at noon will be \( P = 3 \cdot 2^6 = 3 \cdot 64 = 192 \) (thousand) bacteria.
We used x = 6, since Saturday at noon is six days after Sunday at noon.
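The formula \( P = 3 \cdot 2^x \) can be tabulated directly. In this minimal Python sketch the growth factor is built from the rate of change as b = 1 + r, with r = 1 (100% per day); the function name is illustrative.

```python
def bacteria_population(x, initial=3, rate=1.0):
    """Population in thousands x days after noon on Sunday: P = initial * (1 + rate)**x."""
    growth_factor = 1 + rate        # b = 1 + r = 2
    return initial * growth_factor ** x

days = ["Sun", "Mon", "Tues", "Wed", "Thurs", "Fri", "Sat"]
for x, day in enumerate(days):
    print(x, day, bacteria_population(x))
```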
Let's start this section by graphing some exponential functions. Since we don't yet know any special properties of exponential functions, we will graph using a table of values.
Example B
Graph the equation \( y = 2^x \).
Solution
Let's begin by making a table of values that includes both negative and positive values of x.
To evaluate the positive values of x, we just plug into the function and evaluate.
\( x = 1: \quad y = 2^1 = 2 \)
\( x = 2: \quad y = 2^2 = 2 \cdot 2 = 4 \)
\( x = 3: \quad y = 2^3 = 2 \cdot 2 \cdot 2 = 8 \)
\( x = 0: \quad y = 2^0 = 1 \)
To evaluate the negative values of x, we must remember that a number raised to a negative power equals one over that number raised to the same positive power.
\( x = -1: \quad y = 2^{-1} = \frac{1}{2^1} = \frac{1}{2} \)
\( x = -2: \quad y = 2^{-2} = \frac{1}{2^2} = \frac{1}{4} \)
\( x = -3: \quad y = 2^{-3} = \frac{1}{2^3} = \frac{1}{8} \)
TABLE 9.2:
x    -3     -2     -1     0    1    2    3
y    1/8    1/4    1/2    1    2    4    8
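Python's `fractions.Fraction` keeps negative powers as exact fractions, which makes it easy to check Table 9.2. A short sketch:

```python
from fractions import Fraction

for x in range(-3, 4):
    y = Fraction(2) ** x            # exact value of 2**x, e.g. 2**-3 is 1/8
    # A negative power is one over the same positive power
    assert Fraction(2) ** -x == 1 / y
    print(x, y)
```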
When we plot the points on the coordinate axes we get the graph below. Exponential growth functions always have this basic shape. That is, they start very small and then, once they start growing, they grow faster and faster, and soon they become extremely big!
You may have heard people say that something is growing exponentially. This implies that the growth is very quick. An exponential function actually starts slow, but then grows faster and faster all the time. Specifically, our function y above doubled each time we increased x by one.
This is the defining feature of exponential growth: over every step of the same size in x, the function is multiplied by the same factor (it doubles, or triples, or quadruples, and so on). The change is always a fixed proportion.
Let's graph a few more exponential functions and see what happens as we change the constants A and b in the functions. The basic shape of the exponential function should stay the same, but the curve may become steeper or shallower depending on the constants we are using.
We mentioned that the general form of the exponential function is \( y = A \cdot b^x \), where A is the initial amount and b is the factor that the amount gets multiplied by each time x is increased by one. Let's see what happens for different values of A.
Example C
Graph the exponential function \( y = 3 \cdot 2^x \) and compare with the graph of \( y = 2^x \).
Solution
Let's make a table of values for \( y = 3 \cdot 2^x \).
TABLE 9.3:
x     y = 3 · 2^x
-2    3 · 2^(-2) = 3 · 1/2^2 = 3/4
-1    3 · 2^(-1) = 3 · 1/2 = 3/2
0     3 · 2^0 = 3
1     3 · 2^1 = 6
2     3 · 2^2 = 3 · 4 = 12
3     3 · 2^3 = 3 · 8 = 24
We can see that the function \( y = 3 \cdot 2^x \) is bigger than the function \( y = 2^x \). In both functions, the value of y doubles every time x increases by one. However, \( y = 3 \cdot 2^x \) starts with a value of 3, while \( y = 2^x \) starts with a value of 1, so it makes sense that \( y = 3 \cdot 2^x \) would be bigger as its values of y keep getting doubled.
You might think that if the initial value A is less than one, then the corresponding exponential function would be less than \( y = 2^x \). This is indeed correct. Let's see how the graphs compare for \( A = \frac{1}{3} \).
Example D
Graph the exponential function \( y = \frac{1}{3} \cdot 2^x \) and compare with the graph of \( y = 2^x \).
Solution
Let's make a table of values for \( y = \frac{1}{3} \cdot 2^x \).
TABLE 9.4:
x     y = (1/3) · 2^x
-2    (1/3) · 2^(-2) = (1/3) · (1/4) = 1/12
-1    (1/3) · 2^(-1) = (1/3) · (1/2) = 1/6
0     (1/3) · 2^0 = 1/3
1     (1/3) · 2^1 = 2/3
2     (1/3) · 2^2 = (1/3) · 4 = 4/3
3     (1/3) · 2^3 = (1/3) · 8 = 8/3
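Changing A simply rescales every y-value by the same factor. This sketch uses exact fractions to tabulate \( y = 2^x \), \( y = 3 \cdot 2^x \), and \( y = \frac{1}{3} \cdot 2^x \) side by side; the helper name `exp_value` is illustrative.

```python
from fractions import Fraction

def exp_value(A, b, x):
    """Exact value of A * b**x for rational A and b and integer x."""
    return Fraction(A) * Fraction(b) ** x

for x in range(-2, 4):
    base = exp_value(1, 2, x)                  # y = 2^x
    tripled = exp_value(3, 2, x)               # y = 3 * 2^x
    third = exp_value(Fraction(1, 3), 2, x)    # y = (1/3) * 2^x
    print(x, base, tripled, third)
```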
Example E
Now let's explore what happens when we change the value of b. Graph the following exponential functions on the same graph: \( y = 2^x \), \( y = 3^x \), \( y = 5^x \), \( y = 10^x \).
Solution
To graph these functions we should start by making a table of values for each of them.
TABLE 9.5:
x     y = 2^x   y = 3^x   y = 5^x   y = 10^x
-2    1/4       1/9       1/25      1/100
-1    1/2       1/3       1/5       1/10
0     1         1         1         1
1     2         3         5         10
2     4         9         25        100
3     8         27        125       1000
Notice that for x = 0 the values for all the functions are equal to 1. This means that the initial value of the functions
is the same and equal to 1. Even though all the functions start at the same value, they increase at different rates. We
can see that the bigger the base is the faster the values of y will increase. It makes sense that something that triples
each time will increase faster than something that just doubles each time.
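A quick way to rebuild Table 9.5 and confirm both observations (every curve passes through 1 at x = 0, and for positive x a larger base gives larger values) is the following sketch:

```python
from fractions import Fraction

bases = [2, 3, 5, 10]
for x in range(-2, 4):
    row = [Fraction(b) ** x for b in bases]   # exact values of b**x
    print(x, [str(v) for v in row])
    if x > 0:
        # For positive x, a bigger base gives a bigger value
        assert row == sorted(row)
```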
Solution
As you would expect, since 2 < e ≈ 2.718 < 3, the graph of \( y = e^x \) will curve between the graphs of \( y = 2^x \) and \( y = 3^x \).
The asymptote is y = 0 and the y-intercept is (0, 1) because any nonzero number raised to the zero power is one. The domain is all real numbers and the range is all positive real numbers: y > 0.
We will now examine some real-world problems where exponential growth occurs.
Example F
The population of a town is estimated to increase by 15% per year. The population today is 20 thousand. Make a
graph of the population function and find out what the population will be ten years from now.
Solution
First, we need to write a function that describes the population of the town. The general form of an exponential function is:
\[ y = A \cdot b^x \]
Define y as the population of the town.
Define x as the number of years from now.
A is the initial population, so A = 20 (thousand)
Finally, we must find what b (the growth factor) is. We are told that the population increases by 15% each year. This
means that the value of b in our exponential equation would be
b = 1 + r = 1 + 0.15 = 1.15
Here is another way to reach the value of b: In order to get the total population for the following year, we must add
the current population to the increase in population. In other words A + 0.15A = 1.15A. We see from this that the
population must be multiplied by a factor of 1.15 each year.
This means that the base of the exponential is b = 1.15.
The formula that describes this problem is \( y = 20 \cdot (1.15)^x \).
TABLE 9.6:
x      y = 20 · (1.15)^x
-10    4.9
-5     9.9
0      20
5      40.2
10     80.9
Notice that we used negative values of x in our table of values. Does it make sense to think of negative time? In this case x = −5 represents what the population was five years ago, so it can be useful information. The question asked in the problem was "What will the population of the town be ten years from now?"
To find the population exactly, we use x = 10 in the formula. We find
\[ y = 20 \cdot (1.15)^{10} \approx 80.911 \text{ thousand} \]
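Table 9.6 and the ten-year value can be generated with a short sketch of \( y = 20 \cdot (1.15)^x \); the function name is illustrative.

```python
def town_population(years, initial=20.0, rate=0.15):
    """Population in thousands `years` from now: initial * (1 + rate)**years."""
    return initial * (1 + rate) ** years

for x in (-10, -5, 0, 5, 10):
    print(x, round(town_population(x), 1))

print(round(town_population(10), 3))   # population ten years from now, in thousands
```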
Example G
Peter earned $1500 last summer. If he deposited the money in a bank account that earns 5% interest compounded
yearly, how much money will he have after five years?
Solution
This problem deals with interest which is compounded yearly. This means that each year the interest is calculated on
the amount of money you have in the bank. That interest is added to the original amount and next year the interest is
calculated on this new amount. In this way, you get paid interest on the interest.
Let's write a function that describes the amount of money in the bank. The general form of an exponential function is:
\[ y = A \cdot b^x \]
Define y as the amount of money in the bank.
Define x as the number of years from now.
A is the initial amount, so A = 1500.
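The text stops after identifying A. Following the same reasoning as Example F, a 5% yearly rate would give a growth factor b = 1 + 0.05 = 1.05, so the balance would be \( y = 1500 \cdot (1.05)^x \). A minimal sketch that evaluates it year by year:

```python
def balance(years, principal=1500.0, rate=0.05):
    """Balance with interest compounded once per year: principal * (1 + rate)**years."""
    return principal * (1 + rate) ** years

for year in range(6):
    print(year, round(balance(year), 2))
```

After five years the balance is about $1914.42.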
Example H
Gianna opens a savings account with $1000 and it accrues interest continuously at a rate of 5%. What is the balance
in the account after 6 years?
Solution
When solving a problem that involves continuous growth, you use the base e. In this example, the equation for continuous growth is \( A = Pe^{rt} \), where A is the balance in the account, P is the amount put into the account when it was opened, r is the continuous rate of change, and t is the time the account was open. Therefore, the equation for this problem is \( A = 1000e^{0.05(6)} \), and the account will have $1349.86 in it.
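A minimal sketch of \( A = Pe^{rt} \) using `math.exp`. For comparison it also computes the same deposit compounded once per year, which comes out a little lower.

```python
import math

def continuous_balance(principal, rate, years):
    """Balance with continuous compounding: A = P * e^(r t)."""
    return principal * math.exp(rate * years)

def yearly_balance(principal, rate, years):
    """Balance with interest compounded once per year, for comparison."""
    return principal * (1 + rate) ** years

print(f"continuous: ${continuous_balance(1000, 0.05, 6):.2f}")
print(f"yearly:     ${yearly_balance(1000, 0.05, 6):.2f}")
```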
Example I
The population of Springfield is growing exponentially. The growth can be modeled by the function \( P = Ie^{0.055t} \), where P represents the projected population, I represents the current population of 100,000 in 2012, and t represents the number of years after 2012.
a. To the nearest person, what will the population be in 2022?
b. In what year will the population double in size if this growth rate continues?
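One way to work with this model in Python is sketched below (an illustration, not the book's worked solution). Part (a) evaluates the formula at t = 10; for part (b), the population doubles when \( e^{0.055t} = 2 \), that is, when \( t = \ln 2 / 0.055 \).

```python
import math

I = 100_000      # population in 2012
k = 0.055        # continuous growth rate

def projected_population(t):
    """P = I * e^(k t), t years after 2012."""
    return I * math.exp(k * t)

# (a) 2022 is t = 10 years after 2012
pop_2022 = round(projected_population(2022 - 2012))

# (b) doubling: e^(k t) = 2, so t = ln 2 / k
doubling_time = math.log(2) / k
print(pop_2022, round(doubling_time, 1))
```

The doubling time comes out to about 12.6 years after 2012.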
Example J
Naya invests $7500 in an account which accrues interest continuously at a rate of 4.5%.
a. Write an exponential growth function to model the value of her investment after t years.
b. How much interest does Naya earn in the first six months to the nearest dollar?
c. How much money, to the nearest dollar, is in the account after 8 years?
Review Questions
\( y = 3^x \)
\( y = 5 \cdot 3^x \)
\( y = 40 \cdot 4^x \)
\( y = 3 \cdot 10^x \)
In the last section, we looked at graphs of exponential functions. We saw that exponential functions describe a quantity that doubles, triples, quadruples, or more generally gets multiplied by the same factor each time. All the functions we looked at in the last section were exponentially increasing functions. They started small and then became large very fast. In this section, we are going to look at exponentially decreasing functions. An example of such a function is a quantity that gets cut in half each time. Let's look at a specific example.
For her fifth birthday, Nadia's grandmother gave her a full bag of candy. Nadia counted her candy and found out that there were 160 pieces in the bag. As you might suspect, Nadia loves candy, so she ate half the candy on the first day. Her mother told her that if she eats it at that rate it will be all gone the next day and she will not have any more until her next birthday. Nadia devised a clever plan. She will always eat half of the candy that is left in the bag each day. She thinks that she will get candy every day and her candy will never run out. How much candy does Nadia have at the end of the week? Would the candy really last forever?
Day    No. of Candies
0      160
1      80
2      40
3      20
4      10
5      5
6      2.5
7      1.25
You can see that if Nadia eats half the candies each day, then by the end of the week she only has 1.25 candies left
in her bag.
Let's write an equation for this exponential function.
\( y = 160 \)
\( y = 160 \cdot \frac{1}{2} \)
\( y = 160 \cdot \frac{1}{2} \cdot \frac{1}{2} \)
You see that in order to get the amount of candy left at the end of each day we keep multiplying by \( \frac{1}{2} \). Another way of thinking about this is that the rate of change is −50%, or −0.5. This means \( b = 1 + r = 1 - 0.5 = 0.5 \).
\[ y = 160 \cdot \left(\frac{1}{2}\right)^x \]
Notice that this is the same general form as the exponential functions in the last section.
\[ y = A \cdot b^x \]
Here A = 160 is the initial amount and \( b = \frac{1}{2} \) is the factor that the quantity gets multiplied by each time. The difference is that now b is a fraction that is less than one, instead of a number that is greater than one.
Let's now graph the candy problem function. The resulting graph is shown below.
So, will Nadia's candy last forever? We saw that by the end of the week she has 1.25 candies left, so there does not seem to be much hope for that. But if you look at the graph you will see that it never actually reaches zero. Theoretically there will always be some candy left, but she will be eating very tiny fractions of a candy every day after the first week!
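The candy model \( y = 160 \cdot (1/2)^x \) in a few lines of Python. The amount keeps halving but stays positive. With floating-point numbers the result would eventually underflow to 0.0 after roughly a thousand halvings, which is a limit of the computer rather than of the exponential function.

```python
def candies_left(day, initial=160):
    """Candies left after `day` days if half of what remains is eaten each day."""
    return initial * 0.5 ** day

for day in range(8):
    print(day, candies_left(day))

# Much later the amount is tiny but still positive
print(candies_left(60))
```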
This is a fundamental feature of an exponential decay function. Its values get smaller and smaller and approach zero, but they never quite get there. In mathematics we say that the function asymptotes to the value zero. This means that it approaches that value closer and closer without ever actually getting there.
Graph an Exponential Decay Function
The graph of an exponential decay function will always take the same basic shape as the one in the previous figure.
Let's graph another example by making a table of values.
Example A
Graph the exponential function \( y = 5 \cdot \left(\frac{1}{2}\right)^x \).
Solution
Let's start by making a table of values.
TABLE 9.7:
x     y = 5 · (1/2)^x
-3    5 · (1/2)^(-3) = 5 · 2^3 = 40
-2    5 · (1/2)^(-2) = 5 · 2^2 = 20
-1    5 · (1/2)^(-1) = 5 · 2^1 = 10
0     5 · (1/2)^0 = 5 · 1 = 5
1     5 · (1/2)^1 = 5/2
2     5 · (1/2)^2 = 5/4
Remember that a fraction to a negative power is equivalent to its reciprocal to the same positive power.
We said that an exponential decay function has the same general form as an exponentially increasing function, but
that the base b is a positive number less than one. When b can be written as a fraction, we can use the Property of
Negative Exponents that we discussed in Section 8.3 to write the function in a different form.
For instance, \( y = 5 \cdot \left(\frac{1}{2}\right)^x \) is equivalent to \( y = 5 \cdot 2^{-x} \).
These two forms are both commonly used so it is important to know that they are equivalent.
Example B
Graph the exponential function \( y = 8 \cdot 3^{-x} \).
Solution
Here is our table of values and the graph of the function.
TABLE 9.8:
x     y = 8 · 3^(-x)
-3    8 · 3^3 = 216
-2    8 · 3^2 = 72
-1    8 · 3^1 = 24
0     8 · 3^0 = 8
1     8 · 3^(-1) = 8/3
2     8 · 3^(-2) = 8/9
You might have noticed that an exponentially decaying function is very similar to an exponentially increasing function. The two types of functions behave similarly, but they are backwards from each other.
The increasing function starts very small, increases very quickly, and ends up very, very big, while the decreasing function starts very big and decreases very quickly to soon become very, very small. Let's graph two such functions together on the same graph and compare them.
Example C
Graph the functions \( y = 4^x \) and \( y = 4^{-x} \) on the same coordinate axes.
Solution
Here is the table of values and the graph of the two functions.
Looking at the values in the table we see that the two functions are backwards of each other in the sense that the
values for the two functions are reciprocals.
TABLE 9.9:
x     y = 4^x            y = 4^(-x)
-3    4^(-3) = 1/64      4^3 = 64
-2    4^(-2) = 1/16      4^2 = 16
-1    4^(-1) = 1/4       4^1 = 4
0     4^0 = 1            4^0 = 1
1     4^1 = 4            4^(-1) = 1/4
2     4^2 = 16           4^(-2) = 1/16
3     4^3 = 64           4^(-3) = 1/64
Here is the graph of the two functions. Notice that the two functions are mirror images of each other if the mirror is placed vertically on the y-axis.
Exponential decay appears in many applications, such as half-life problems and depreciation problems. Let's solve an example of each of these problems.
Example D: Half-Life
A radioactive substance has a half-life of one week. In other words, at the end of every week the level of radioactivity
is half of its value at the beginning of the week. The initial level of radioactivity is 20 counts per second.
a. Draw the graph of the amount of radioactivity against time in weeks.
b. Find the formula that gives the radioactivity in terms of time.
c. Find the radioactivity left after three weeks.
Solution
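Let's start by making a table of values and then draw the graph.
TABLE 9.10:
time (weeks)   radioactivity (counts per second)
0              20
1              10
2              5
3              2.5
4              1.25
5              0.625
a. [Figure: graph of radioactivity against time in weeks]
b. Exponential decay fits the general formula \( y = A \cdot b^x \).
In this case:
y is the amount of radioactivity
x is the time in weeks
A = 20 is the starting amount
\( b = \frac{1}{2} \), since the radioactivity halves each week
So the formula is \( y = 20 \cdot \left(\frac{1}{2}\right)^x \).
c. Finally, to find out how much radioactivity is left after three weeks, we use x = 3 in the formula we just found.
\[ y = 20 \cdot \left(\frac{1}{2}\right)^3 = \frac{20}{8} = 2.5 \]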
Example E: Depreciation
The cost of a new car is $32,000. It depreciates at a rate of 15% per year. This means that it loses 15% of its value each year.
TABLE 9.11:
Time (years)   Value (thousands of dollars)
0              32
1              27.2
2              23.1
3              19.7
4              16.7
5              14.2
a. [Figure: graph of the car's value against time in years]
b. Let's start with the general formula
\[ y = A \cdot b^x \]
In this case:
y is the value of the car
x is the time in years
A = 32 is the starting amount in thousands
b = 0.85, since we multiply the amount by this factor to get the value of the car next year
The formula for this problem is \( y = 32 \cdot (0.85)^x \).
c. Finally, to find the value of the car when it is four years old, we use x = 4 in the formula we just found: \( y = 32 \cdot (0.85)^4 \approx 16.7 \) thousand dollars, or $16,704 if we don't round.
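For depreciation the factor is b = 1 − r rather than 1 + r. The sketch below rebuilds Table 9.11 and the unrounded four-year value; the function name is illustrative.

```python
def car_value(years, price=32_000, rate=0.15):
    """Value after `years` years when the car loses `rate` of its value each year."""
    decay_factor = 1 - rate          # b = 1 - r = 0.85
    return price * decay_factor ** years

for t in range(6):
    print(t, round(car_value(t) / 1000, 1))   # value in thousands, as in Table 9.11

print(round(car_value(4)))   # four-year value to the nearest dollar
```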
Review Questions
\( y = \left(\frac{1}{5}\right)^x \)
\( y = 4 \cdot \left(\frac{2}{3}\right)^x \)
\( y = 3^{-x} \)
\( y = \frac{3}{4} \cdot 6^{-x} \)
Draw the graph of the size of the bacteria population against time in days.
Find the formula that gives the size of the bacteria population in terms of time.
Find the size of the bacteria population ten days after the drug was first taken.
Find the size of the bacteria population after 2 weeks (14 days).
Two girls in a small town once shared a secret, just between the two of them. They couldn't stand it though, and each of them told three friends. Of course, their friends couldn't keep secrets, either, and each of them told three of their friends. Those friends told three friends, and those friends told three friends, and so on... and pretty soon the whole town knew the secret. There was nobody else to tell!
These girls experienced the startling effects of an exponential function.
If you start with the two girls who each told three friends, you can see that they told six people, or 2 · 3.
Those six people each told three others, so they told 6 · 3, or 2 · 3 · 3 = 18 people.
Those 18 people each told 3, so now 18 · 3, or 2 · 3 · 3 · 3 = 54 people have been told.
As we did with linear functions, we could make a table of values and calculate the number of people told after each round of gossip.
x (rounds of gossip)   y (people told)
1                      6
2                      18
3                      54
4                      162
5                      486
This is clearly not a linear (constant) rate of change. But it does have a characteristic rate of change that identifies it as an exponential function, as we'll learn below.
One method for identifying functions is to look at the rate of change in the dependent variable. If the difference
between values of the dependent variable is constant each time we change the independent variable by the same
amount, then the function is linear.
If we take the difference between consecutive y-values, we see that each time the x-value increases by one, the y-value always increases by 3.
NOTE: When using this approach, make sure that the difference between consecutive x-values is constant.
In mathematical notation, we can write the linear property as follows:
If \( \frac{y_2 - y_1}{x_2 - x_1} \) is always the same for values of the dependent and independent variables, then the points are on a line. Notice that the expression we wrote is the definition of the slope of a line.
There is also a specific rate of change pattern that will help you identify exponential functions. If the ratio between
values of the dependent variable is constant each time we change the independent variable by the same amount, then
the function is exponential.
a.
If we take the ratio of consecutive y-values, we see that each time the x-value increases by one, the y-value is multiplied by 3.
Since the ratio is always the same, the function is exponential.
b.
If we take the ratio of consecutive y-values, we see that each time the x-value increases by one, the y-value is multiplied by \( \frac{1}{2} \).
Since the ratio is always the same, the function is exponential.
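The two tests can be combined into a small Python function. This is a sketch that assumes the x-values are equally spaced (as the NOTE above requires) and uses `math.isclose` so that decimal values do not fail an exact comparison; the function name is illustrative.

```python
import math

def classify(ys):
    """Return 'linear', 'exponential', or 'neither' for y-values at equally spaced x-values."""
    diffs = [y1 - y0 for y0, y1 in zip(ys, ys[1:])]
    if all(math.isclose(d, diffs[0]) for d in diffs):
        return "linear"              # constant difference
    if all(y != 0 for y in ys):
        ratios = [y1 / y0 for y0, y1 in zip(ys, ys[1:])]
        if all(math.isclose(q, ratios[0]) for q in ratios):
            return "exponential"     # constant ratio
    return "neither"

print(classify([6, 18, 54, 162, 486]))   # rounds of gossip
print(classify([1, -3, -7, -11]))        # constant difference of -4
print(classify([1, 4, 9, 16]))           # made-up: neither
```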
Write Equations for Functions
Once we identify which type of function fits the given values, we can write an equation for the function by starting
with the general form for that type of function.
Example C
Determine what type of function represents the values in the following table.
TABLE 9.12:
x
y
3
1
-3
-7
-11
Solution
Let's first check the difference of consecutive values of y. (NOTE: the consecutive values of x are changing by the same amount each time.)
If we take the difference between consecutive y-values, we see that each time the x-value increases by one, the y-value always decreases by 4. Since the difference is always the same, the function is linear.
To find the equation for the function that represents these values, we start with the general form of a linear function.
\[ y = mx + b \]
Here m is the slope of the line and is defined as the quantity by which y increases every time the value of x increases by one, so m = −4. The constant b is the value of the function when x = 0. Therefore, the function is
\[ y = -4x + 5 \]
Example D
Determine what type of function represents the values in the following table.
TABLE 9.13:
x    0      1      2     3       4
y    400    100    25    6.25    1.5625
Solution
Let's check the ratio of consecutive values of y.
If we take the ratio of consecutive y-values, we see that each time the x-value increases by one, the y-value is multiplied by \( \frac{1}{4} \).
Since the ratio is always the same, the function is exponential.
To find the equation for the function that represents these values, we start with the general form of an exponential function:
\[ y = A \cdot b^x \]
b is the ratio between the values of y each time that x is increased by one. The constant A is the value of the function when x = 0. Therefore, our answer is
\[ y = 400 \cdot \left(\frac{1}{4}\right)^x \]
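Extending the idea, the sketch below reads off the equation once the type is known: for a linear table it returns m and b, and for an exponential table it returns A and the base. It assumes equally spaced x-values and, for the exponential case, a positive ratio; the function name and return format are illustrative. For Example C, the x-values 1 to 4 are an assumption consistent with y = −4x + 5.

```python
import math

def fit_equation(xs, ys):
    """Return ('linear', m, b) for y = m*x + b, ('exponential', A, base) for y = A*base**x, or None."""
    dx = xs[1] - xs[0]                                  # assumed constant step in x
    diffs = [y1 - y0 for y0, y1 in zip(ys, ys[1:])]
    if all(math.isclose(d, diffs[0]) for d in diffs):
        m = diffs[0] / dx                               # slope
        return ("linear", m, ys[0] - m * xs[0])         # b = y - m*x at the first point
    if all(y != 0 for y in ys):
        ratios = [y1 / y0 for y0, y1 in zip(ys, ys[1:])]
        if all(math.isclose(q, ratios[0]) for q in ratios):
            base = ratios[0] ** (1 / dx)                # growth factor per unit of x
            return ("exponential", ys[0] / base ** xs[0], base)
    return None

print(fit_equation([1, 2, 3, 4], [1, -3, -7, -11]))                  # Example C
print(fit_equation([0, 1, 2, 3, 4], [400, 100, 25, 6.25, 1.5625]))   # Example D
```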
Review Questions
1. Determine whether the data in the following tables can be represented by a linear function.
TABLE 9.14:
x    -4    -3    -2    -1    0     1
y    10    7     4     1     -2    -5
TABLE 9.15:
x    -2    -1    0    1    2    3
y    4     3     2    3    6    11
TABLE 9.16:
x    0     1     2      3      4      5
y    50    75    100    125    150    175
2. Determine whether the data in the following tables can be represented by an exponential function.
TABLE 9.17:
x    0      1      2       3       4        5
y    200    300    1800    8300    25800    62700
TABLE 9.18:
x    0      1      2      3      4        5
y    120    180    270    405    607.5    911.25
TABLE 9.19:
x    0       1       2       3      4        5
y    4000    2400    1440    864    518.4    311.04
3. Determine what type of function represents the values in the following table and find the equation of the function.
TABLE 9.20:
x    0      1      2      3         4
y    400    500    625    781.25    976.5625
TABLE 9.21:
x    -9    -7    -5    -3    -1    1
y    -3    -2    -1    0     1     2
TABLE 9.22:
x    -3    -2    -1    0     1     2    3
y    14    4     -2    -4    -2    4    14
4. The following table shows the rate of pregnancies (per 1000) for US women aged 15 to 19. (source: US
Census Bureau). Make a scatterplot with the rate of pregnancies as the dependent variable and the number of
years since 1990 as the independent variable. Find which curve fits this data the best and predict the rate of
teen pregnancies in the year 2010.
TABLE 9.23:
Year: 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002
What is a log?
When working with exponential functions, you will often be asked to solve an equation using logs. In general, to solve an equation means to find the value(s) of the variable that make the equation a true statement. For example, if you were asked to solve the equation \( \log_2 x = 5 \) for x, how would you do that?
First, we have to think about what log means. A logarithm (or log for short) is an exponent.
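Because a logarithm is an exponent, \( \log_2 x = 5 \) means \( 2^5 = x \), so x = 32. A short Python check using `math.log2`, which returns the base-2 exponent:

```python
import math

x = 2 ** 5                 # rewrite log_2(x) = 5 in exponential form: x = 2**5
print(x, math.log2(x))     # the base-2 log of 32 recovers the exponent 5
```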
Example A
Rewrite each exponential expression as a log expression.
a. \( 3^4 = 81 \)
b. \( b^{4x} = 52 \)
194
www.ck12.org
Solution
a. In order to rewrite an expression, you must identify its base, its exponent, and its power. The 3 is the base, so
it is placed as the subscript in the log expression. The 81 is the power, and so it is placed after the log. Thus
\( 3^4 = 81 \) is the same as \( \log_3 81 = 4 \).
To read this expression, we say "the logarithm base 3 of 81 equals 4." This is equivalent to saying "3 to the
4th power equals 81."
b. The b is the base, and the expression 4x is the exponent, so we have \( \log_b 52 = 4x \). We say "log base b of 52
equals 4x."
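Here is a short Python sketch of the same rewrite. It uses `math.log(x, base)` to recover the exponent from the power, and the helper name `exponential_to_log` is illustrative. The second call also answers the opening question: since 2^5 = 32, x = 32 solves log_2 x = 5.

```python
import math

def exponential_to_log(base, exponent):
    """Rewrite base**exponent = power as log_base(power) = exponent."""
    power = base ** exponent
    recovered = math.log(power, base)   # floating point, so the last digit may be off
    return f"{base}^{exponent} = {power}  <=>  log_{base}({power}) = {recovered:.6g}"

print(exponential_to_log(3, 4))   # 3^4 = 81  <=>  log_3(81) = 4
print(exponential_to_log(2, 5))   # 2^5 = 32  <=>  log_2(32) = 5
```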
Example B
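a. \( \log 1 \)
b. \( \log 10 \)
c. \( \log \sqrt{10} \)
Solution
Remember that \( \log x \) (with no base specified) commonly refers to \( \log_{10} x \):
a. \( \log 1 = 0 \) because \( 10^0 = 1 \).
b. \( \log 10 = 1 \) because \( 10^1 = 10 \).
c. \( \log \sqrt{10} = \frac{1}{2} \) because \( \sqrt{10} = 10^{1/2} \).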
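Python's `math.log10` computes the common logarithm, so the three values in Example B can be checked directly:

```python
import math

# "log" with no base written means base 10, which is what math.log10 computes.
for label, value in [("1", 1), ("10", 10), ("sqrt(10)", math.sqrt(10))]:
    print(f"log({label}) = {math.log10(value):.4f}")
```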
Example C
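b. If x = 1, we have:
\[ \begin{aligned} f(x) &= \log_2 x \\ f(1) &= \log_2 1 \end{aligned} \]
As we did in (a), we can consider the exponential form: \( 2^{?} = 1 \). The missing exponent is 0. So we have \( f(1) = \log_2 1 = 0 \).
c. If x = -2, we have:
\[ \begin{aligned} f(x) &= \log_2 x \\ f(-2) &= \log_2(-2) \end{aligned} \]
Again, consider the exponential form: \( 2^{?} = -2 \). There is no such exponent, because every power of 2 is positive. Therefore \( f(-2) = \log_2(-2) \) does not
exist.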
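Python's `math.log2` makes the same point about the domain. It returns an exponent for positive inputs and raises `ValueError` for zero or negative ones. Here is a small sketch; the wrapper name `log2_or_none` is only for illustration:

```python
import math

def log2_or_none(x):
    """Return log base 2 of x, or None when no real exponent exists (x <= 0)."""
    try:
        return math.log2(x)
    except ValueError:        # math.log2 raises ValueError for x <= 0
        return None

print(log2_or_none(1))    # 0.0, since 2**0 == 1
print(log2_or_none(-2))   # None: 2**y is positive for every real y
```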
Argument: The expression inside a logarithm. The argument represents the power in the
exponential relationship.
Exponential functions are functions with the input variable (the x term) in the exponent.
Logarithmic functions are the inverses of exponential functions. Recall: \( \log_b n = a \) is equivalent to \( b^a = n \).
log: The shorthand term for "the logarithm of," as in: "\( \log_b n \)" = "the logarithm, base b, of n."
Euler's Number (the natural base): The number e such that \( \left(1 + \frac{1}{n}\right)^n \to e \) as \( n \to \infty \); \( e \approx 2.71828 \).
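You can watch the limit that defines e numerically. This Python sketch evaluates (1 + 1/n)^n for increasing n. For large n, the gap to e shrinks roughly like e/(2n).

```python
import math

# (1 + 1/n)**n approaches e from below as n grows.
for n in (1, 10, 100, 10_000, 1_000_000):
    approx = (1 + 1 / n) ** n
    print(f"n = {n:>9,}: {approx:.6f}   gap to e = {math.e - approx:.1e}")
```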
Guided Practice
Solutions
1. Change 16 to \( 4^2 \) and set the exponents equal to each other.
\[ \begin{aligned} 4^{x-8} &= 16 \\ 4^{x-8} &= 4^2 \\ x - 8 &= 2 \\ x &= 10 \end{aligned} \]
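2. Divide both sides by 2 and then take the log of both sides. Here we choose to use the natural log (ln).
\[ \begin{aligned} 2(7)^{3x+1} &= 48 \\ 7^{3x+1} &= 24 \\ \ln 7^{3x+1} &= \ln 24 \\ (3x + 1)\ln 7 &= \ln 24 \\ 3x + 1 &= \frac{\ln 24}{\ln 7} \\ 3x &= -1 + \frac{\ln 24}{\ln 7} \\ x &= -\frac{1}{3} + \frac{\ln 24}{3 \ln 7} \approx 0.211 \end{aligned} \]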
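3. Subtract 9 from both sides and multiply both sides by \( \frac{3}{2} \). Then, take the log of both sides.
\[ \begin{aligned} \tfrac{2}{3}\,5^{x+2} + 9 &= 21 \\ \tfrac{2}{3}\,5^{x+2} &= 12 \\ 5^{x+2} &= 18 \\ (x + 2)\log 5 &= \log 18 \\ x &= \frac{\log 18}{\log 5} - 2 \approx -0.204 \end{aligned} \]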
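4. Add 4 to both sides and then take the log of both sides.
\[ \begin{aligned} 8^{2x-3} - 4 &= 5 \\ 8^{2x-3} &= 9 \\ \log 8^{2x-3} &= \log 9 \\ (2x - 3)\log 8 &= \log 9 \\ 2x - 3 &= \frac{\log 9}{\log 8} \\ 2x &= 3 + \frac{\log 9}{\log 8} \\ x &= \frac{3}{2} + \frac{\log 9}{2 \log 8} \approx 2.028 \end{aligned} \]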
Notice that in these problems, we did not find the numeric value of any of the logs until the very end. This
will reduce rounding errors and ensure that we have the most accurate answer.
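The Python sketch below checks the four solutions above. It evaluates each closed-form answer only at the very end, as the text recommends, and then substitutes it back into the original equation. The variable names are illustrative.

```python
import math

# Closed-form answers; logs are evaluated only here, at the end.
x1 = 8 + math.log(16, 4)                        # 4^(x-8) = 16
x2 = (math.log(24) / math.log(7) - 1) / 3       # 2(7)^(3x+1) = 48
x3 = math.log10(18) / math.log10(5) - 2         # (2/3)5^(x+2) + 9 = 21
x4 = (3 + math.log10(9) / math.log10(8)) / 2    # 8^(2x-3) - 4 = 5

# Substitute back into each left-hand side as a check.
lhs = [4 ** (x1 - 8), 2 * 7 ** (3 * x2 + 1), (2 / 3) * 5 ** (x3 + 2) + 9, 8 ** (2 * x4 - 3) - 4]
rhs = [16, 48, 21, 5]
for x, left, right in zip((x1, x2, x3, x4), lhs, rhs):
    print(f"x = {x:.3f}   left side = {left:.6f}   right side = {right}")
```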
More Practice
Use logarithms and a calculator to solve the following equations for x. Round answers to three decimal places.
1. \( 5^x = 65 \)
2. \( 7^x = 75 \)
3. \( 2^x = 90 \)
4. \( 3^{x-2} = 43 \)
5. \( 6^{x+1} + 3 = 13 \)
6. \( 6\left(11^{3x-2}\right) = 216 \)
7. \( 8 + 13^{2x-5} = 35 \)
8. \( \frac{1}{2} \cdot 7^{x-3} - 5 = 14 \)
When a real-life quantity increases by a percentage over a period of time, the final amount can be modeled by the
equation \( A = P(1 + r)^t \), where A is the final amount, P is the initial amount, r is the rate (as a decimal), and t is the
time (in years). 1 + r is known as the growth factor. Note that the growth factor is equivalent to b in the formulas
introduced previously.
Conversely, a real-life quantity can decrease by a percentage over a period of time. The final amount can be modeled
by the same equation, but recall that the rate of change will be negative, so the value of b will be smaller than one.
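The formula translates into a one-line Python function. This is a minimal sketch. The name `final_amount` and the $1000 sample values are hypothetical. A negative rate gives decay.

```python
def final_amount(initial, rate, years):
    """A = P(1 + r)^t; use a positive rate for growth and a negative rate for decay."""
    growth_factor = 1 + rate          # the "b" of y = A*b^x
    return initial * growth_factor ** years

# Hypothetical values, for illustration only:
print(round(final_amount(1000, 0.05, 2), 2))    # 5% growth for 2 years -> 1102.5
print(round(final_amount(1000, -0.05, 2), 2))   # 5% decay for 2 years  -> 902.5
```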
Examples of Exponential Growth
Example A
The population of Coleman, Texas grows at a 2% rate annually. If the population in 2000 was 5981, what was the
population in 2010? Round up to the nearest person.
Solution
First, set up an equation using the growth factor, with r = 0.02, t = 10, and P = 5981.
\[ \begin{aligned} A &= P(1 + r)^t \\ &= 5981(1 + 0.02)^{10} \\ &= 5981(1.02)^{10} \\ &\approx 7291 \text{ people} \end{aligned} \]
Example B
You deposit $1000 into a savings account that pays 2.5% annual interest. Find the balance after 3 years if the interest
rate is compounded a) annually, b) monthly, c) daily.
Solution
For part a, we will use \( A = 1000(1.025)^3 \approx 1076.89 \), following the same method as Example A.
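Parts b and c need the amount compounded more than once a year. The text works only the annual case here, so this Python sketch assumes the standard compound-interest formula A = P(1 + r/n)^(nt), where n is the number of compounding periods per year. The function name `compound` is illustrative.

```python
def compound(principal, annual_rate, years, periods_per_year):
    """A = P(1 + r/n)^(n*t), with n compounding periods per year (assumed formula)."""
    n = periods_per_year
    return principal * (1 + annual_rate / n) ** (n * years)

for label, n in [("annually", 1), ("monthly", 12), ("daily", 365)]:
    print(f"{label:>8}: {compound(1000, 0.025, 3, n):.2f}")
```

More frequent compounding adds a little more interest each time, but the gains shrink as n grows.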
The previous examples provided us with a way to find the exponential equation from the information given. In
a previous chapter, we were told we had a town with an initial population of 20,000 with an estimated growth of
15% per year. With that information, we were able to answer a question like, "How big will the population be in
10 years?" But what if we wanted to ask the question: "When will the population reach 100,000?" We could
use the graph of our exponential function and guess at the value on the x-axis when the population is equal to
100,000. But that would only be a guess. To find the actual value of x in our exponential equation, we must use our
knowledge of logs or natural logs.
Here was our equation for the population growth in this town, with f(x) measured in thousands of people:
\[ f(x) = 20(1.15)^x \]
We want to know when the population will reach 100,000, that is, when f(x) = 100:
\[ 100 = 20(1.15)^x \]
Let's solve for x:
\[ \begin{aligned} \frac{100}{20} &= 1.15^x \\ 5 &= 1.15^x \end{aligned} \]
Now we can use logs:
\[ \begin{aligned} \log 5 &= \log\left(1.15^x\right) \\ \log 5 &= x \log 1.15 \\ \frac{\log 5}{\log 1.15} &= x \\ 11.52 &\approx x \end{aligned} \]
So, when will the population reach 100,000? In approximately 11 and a half years, or about 11.52 years. Let's hope there is enough space in town.
If you prefer, you can also use natural logs to find the same value. Natural logs follow the same rules as common logs, and they have the added benefit of handling the number e directly.
\[ \begin{aligned} \frac{100}{20} &= 1.15^x \\ 5 &= 1.15^x \end{aligned} \]
Now, we use the natural log: \( \ln 5 = \ln\left(1.15^x\right) = x \ln 1.15 \)
\[ \begin{aligned} \frac{\ln 5}{\ln 1.15} &= x \\ 11.52 &\approx x \end{aligned} \]
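The quotient of two logs does not depend on the base, so common and natural logs give the same x. A quick Python check:

```python
import math

# 5 = 1.15**x  ->  x = log(5) / log(1.15) in any base
x_common = math.log10(5) / math.log10(1.15)
x_natural = math.log(5) / math.log(1.15)
print(round(x_common, 2), round(x_natural, 2))   # 11.52 11.52
```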
Examples of Exponential Decay
Example C
You buy a new car for $35,000. If the value of the car decreases by 12% each year, what will the value of the car be
in 5 years?
Solution
This is a decay function because the value decreases.
\[ \begin{aligned} A &= 35000(1 - 0.12)^5 \\ &= 35000(0.88)^5 \\ &\approx 18470.62 \end{aligned} \]
The car would be worth $18,470.62 after five years.
Example D
The half-life of an isotope of barium is about 10 years. The half-life of a substance is the amount of time it takes for
half of that substance to decay. If a nuclear scientist starts with 200 grams of barium, how many grams will remain
after 100 years?
This is an example of exponential decay. Half-life refers to a 50% decay, so b = 1 - 0.5 = 0.5. Our starting value
is 200 grams, and we know that it takes 10 years for half of the isotope to decay, so the exponent counts 10-year periods, t/10. Therefore, the equation should
read:
\[ A = 200\left(\frac{1}{2}\right)^{t/10} \]
After t = 100 years, which is 10 half-lives:
\[ A = 200\left(\frac{1}{2}\right)^{10} = \frac{200}{1024} \approx 0.195 \]
Therefore, about 0.195 grams of the barium still remain 100 years later.
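The half-life calculation works for any elapsed time. The exponent is the number of half-lives, which is the elapsed time divided by the half-life. Here is a minimal Python sketch; the function name `remaining` is illustrative:

```python
def remaining(initial, half_life, elapsed):
    """Amount left: initial * (1/2)**(elapsed / half_life)."""
    return initial * 0.5 ** (elapsed / half_life)

print(round(remaining(200, 10, 100), 3))   # 0.195 (grams of barium after 100 years)
```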
Finding the Value of x in an Exponential Decay Model
The previous examples provided us with a way to find the exponential decay equation from the information given.
In a previous chapter, we had a car with an initial value of $32,000 that depreciated at 15% per year. We were able
to answer a question like, "How much will my car be worth when it is four years old?" But what if we wanted to ask,
"When will my car be worth only $10,000?" We could use a graph of our exponential function and guess at
the value on the x-axis when the value of my car is $10,000. But that would only be a guess. To find the actual value
of x in our exponential equation, we must use our knowledge of logs or natural logs.
Here is our equation for the car, with f(x) measured in thousands of dollars:
\[ f(x) = 32(0.85)^x \]
We want to know when the car will be worth $10,000:
\[ 10 = 32(0.85)^x \]
Let's solve for x:
\[ \begin{aligned} \frac{10}{32} &= 0.85^x \\ 0.3125 &= 0.85^x \\ \ln 0.3125 &= \ln\left(0.85^x\right) \\ \ln 0.3125 &= x \ln 0.85 \\ \frac{\ln 0.3125}{\ln 0.85} &= x \\ 7.16 &\approx x \end{aligned} \]
So when will our car be worth $10,000? In approximately 7 years, or about 7.16 years.
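Both "when will it reach ..." questions follow the same steps: divide by the initial value, take logs, and divide by the log of the factor. Here is a Python sketch of that recipe. The helper name `years_to_reach` is hypothetical. It works for growth factors above 1 and for decay factors between 0 and 1.

```python
import math

def years_to_reach(initial, factor, target):
    """Solve target = initial * factor**x for x using natural logs."""
    return math.log(target / initial) / math.log(factor)

print(round(years_to_reach(32, 0.85, 10), 2))    # 7.16 (car, thousands of dollars)
print(round(years_to_reach(20, 1.15, 100), 2))   # 11.52 (town, thousands of people)
```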
Extension: Transformations to Achieve Linearity
We can transform an exponential relationship between X and Y into a linear relationship. We commonly use
transformations in everyday life. For example, the Richter scale, which measures earthquake intensity, is an example
of making transformations of non-linear data.
Consider the following exponential relationship, and take the log of both sides as shown:
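\[ \begin{aligned} y &= ab^x \\ \log y &= \log\left(ab^x\right) \\ \log y &= \log a + \log b^x \\ \log y &= \log a + x \log b \end{aligned} \]
In this example, a and b are real numbers (constants), so this is now a linear relationship between the variables x and log y, with slope log b and intercept log a.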
Thus, you can find a least squares line for these variables.
Let's take a look at an example to help clarify this concept. Say that we were interested in making a case for investing
and examining how much return on investment one would get on $100 over time. Let's assume that we invested $100
in the year 1900 and that this money accrued 5% interest every year. The table below details how much we would
have each decade:
TABLE 9.24: Table of account growth assuming $100 invested in 1900 at 5% annual growth.
Year | 1900 | 1910 | 1920 | 1930 | 1940 | 1950 | 1960
If we graphed these data points, we would see that we have an exponential growth curve.
Say that we wanted to fit a linear regression line to these data. First, we would transform these data using logarithmic
transformations as follows:
TABLE 9.25: Account growth data and values after a logarithmic transformation.
Year | Log of amount
1900 | 2
1910 | 2.211893
1920 | 2.423786
1930 | 2.635679
1940 | 2.847572
1950 | 3.059465
1960 | 3.271358
1970 | 3.483251
1980 | 3.695144
1990 | 3.907037
2000 | 4.118930
2010 | 4.330823
If we plotted these transformed data points, we would see that we have a linear relationship as shown below:
We can now perform a linear regression on (year, log of amount), and we will find the following relationship:
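\[ Y \approx 0.02119X - 38.26 \]
where X represents the year and Y represents the log of the amount. (The slope is \( \log 1.05 \approx 0.0211893 \), the log of the annual growth factor.)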
These transformed models are trickier to interpret, but they are often useful for modeling.
Vocabulary
Growth Factor: The amount, (1 + r), an exponential function grows by. Populations and interest commonly
use growth factors.
Decay Factor: The amount, (1 - r), an exponential function decreases by. Populations, depreciated values,
and radioactivity commonly use decay factors.
Guided Practice
1. Tommy bought a truck 7 years ago that is now worth $12,348. If the value of his truck decreased 14% each
year, how much did he buy it for? Round to the nearest dollar.
2. The Wetakayomoola credit card company charges an Annual Percentage Rate (APR) of 21.99%, compounded
monthly. If you have a balance of $2000 on the card, what would the balance be after 4 years (assuming you
do not make any payments)? If you pay $200 a month to the card, how long would it take you to pay it off?
You may need to make a table to help you with the second question.
3. As the altitude increases, the atmospheric pressure (the pressure of the air around you) decreases. For every
1000 feet up, the atmospheric pressure decreases about 4%. The atmospheric pressure at sea level is 101.3 kPa.
If you are on top of Heavenly Mountain at Lake Tahoe (elevation about 10,000 feet), what is the atmospheric
pressure?
Solutions
1. Tommy needs to use the formula \( A = P(1 + r)^t \) and solve for P (remember that r is negative).
2. With interest compounded monthly, use \( A = P\left(1 + \frac{r}{n}\right)^{nt} \):
\[ \begin{aligned} A &= 2000\left(1 + \frac{0.2199}{12}\right)^{12 \cdot 4} \\ &= 2000(1.018325)^{48} \\ &\approx 4781.65 \end{aligned} \]
To determine how long it will take you to pay off the balance, you need to find how much interest is
compounded in one month, subtract $200, and repeat. A table might be helpful. For each month after the first,
we will use the equation \( B = R\left(1 + \frac{0.2199}{12}\right)^{12\left(\frac{1}{12}\right)} = R(1.018325) \), where B is the current balance and R is the
remaining balance from the previous month. For example, in month 2, the balance (including interest) would
be \( B = 1800\left(1 + \frac{0.2199}{12}\right)^{12\left(\frac{1}{12}\right)} = 1800(1.018325) \approx 1832.99 \).
TABLE 9.26:
Month | Balance | Payment | Remainder
1 | 2000.00 | 200.00 | 1800.00
2 | 1832.99 | 200.00 | 1632.99
3 | 1662.91 | 200.00 | 1462.91
4 | 1489.72 | 200.00 | 1289.72
5 | 1313.35 | 200.00 | 1113.35
6 | 1133.75 | 200.00 | 933.75
7 | 950.86 | 200.00 | 750.86
8 | 764.62 | 200.00 | 564.62
9 | 574.97 | 200.00 | 374.97
10 | 381.84 | 200.00 | 181.84
11 | 185.17 | 185.17 | 0
It is going to take you 11 months to pay off the balance, and you are going to pay $185.17 in interest, making
your total payment $2185.17.
3. The equation will be \( A = 101.3(1 - 0.04)^{10} \approx 67.35 \) kPa. The decay factor is raised to the power of 10
because the pressure decreases by 4% for every 1000 feet, and \( 10{,}000 \div 1000 = 10 \). This lower atmospheric pressure
at high altitude is what can make you feel light-headed.
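You can check the three solutions above numerically with a Python sketch. The payoff loop applies the R(1.018325) monthly step from solution 2 and rounds each balance half-up to the cent, which matches the month-2 example (1832.985 becomes 1832.99). It uses Python's `decimal` module so that the cents are exact. The helper name `payoff_schedule` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

# 1. Tommy: solve A = P(1 + r)^t for P, with r = -0.14 and t = 7.
price_new = 12348 / (1 - 0.14) ** 7
print(f"Truck price when new: ${price_new:,.0f}")

# 2a. Balance after 4 years at 21.99% APR compounded monthly, no payments.
balance_4y = 2000 * (1 + 0.2199 / 12) ** (12 * 4)
print(f"Balance after 4 years: ${balance_4y:,.2f}")

# 2b. Pay $200 a month. Month 1 has no interest; after that the remainder
#     grows by 1.018325 and is rounded half-up to the cent.
def payoff_schedule(balance, payment, factor=Decimal("1.018325"), max_months=600):
    rows, due = [], balance
    for month in range(1, max_months + 1):
        paid = min(payment, due)
        remainder = due - paid
        rows.append((month, due, paid, remainder))
        if remainder == 0:
            return rows
        due = (remainder * factor).quantize(CENT, rounding=ROUND_HALF_UP)
    raise ValueError("payment never clears the balance")

rows = payoff_schedule(Decimal("2000"), Decimal("200"))
total_paid = sum(paid for _, _, paid, _ in rows)
print(f"Paid off in {len(rows)} months; total paid ${total_paid}")

# 3. Pressure drops 4% per 1000 ft, so 10,000 ft is 10 periods.
pressure = 101.3 * (1 - 0.04) ** (10_000 // 1000)
print(f"Pressure at 10,000 ft: {pressure:.2f} kPa")
```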
More Practice
Use an exponential growth or exponential decay function to model the following scenarios and answer the questions.
1. Sonya's salary increases at a rate of 4% per year. Her starting salary is $45,000. What is her annual salary, to
the nearest $100, after 8 years of service?
2. The value of Sam's car depreciates at a rate of 8% per year. The initial value was $22,000. What will his car
be worth after 12 years to the nearest dollar?
3. Rebecca is training for a marathon. Her weekly long run is currently 5 miles. If she increases her mileage each
week by 10%, will she complete a 20-mile training run within 15 weeks?
4. An investment grows at a rate of 6% per year. How much, to the nearest $100, should Noel invest if he wants
to have $100,000 at the end of 20 years?
5. Charlie purchases a 7 year old used RV for $54,000. If the rate of depreciation was 13% per year during those
7 years, how much was the RV worth when it was new? Give your answer to the nearest one thousand dollars.
6. Homes in a neighborhood increase in value by an average of 3% per year. What will a home
purchased for $180,000 be worth in 25 years, to the nearest one thousand dollars?
7. The population of a community is decreasing at a rate of 2% per year. The current population is 152,000. How
many people lived in the town 5 years ago?
8. The value of a particular piece of land worth $40,000 is increasing at a rate of 1.5% per year. Assuming the
rate of appreciation continues, how long will the owner need to wait to sell the land if he hopes to get $50,000
for it? Give your answer to the nearest year.
CHAPTER
10
Logistic Models
Chapter Outline
10.1
10.2
10.3
REFERENCES
Learning Objectives
Recognize logistic functions.
Identify the carrying capacity and inflection point of a logistic function.
Logistic Models
Exponential growth increases without bound. This is reasonable for some situations; however, for populations there
is usually some type of upper bound. This can be caused by limitations on food, space or other scarce resources. The
effect of this limiting upper bound is a curve that grows exponentially at first and then slows down and hardly grows
at all. This is characteristic of a logistic growth model.
The logistic equation is of the form:
\[ f(t) = \frac{C}{1 + a b^{-t}} = \frac{C}{1 + a e^{-kt}} \]
The above equations represent the logistic function, which contains three important pieces: C, a, and b. C determines
the maximum value of the function, also known as the carrying capacity. C is represented by the dashed line in
the graph below.
The constant a in the logistic function is used much like a in the exponential function: it helps determine the value
of the function at t = 0. Specifically:
\[ f(0) = \frac{C}{1 + a} \]
The constant b also follows a similar concept to the exponential function: it helps dictate the rate of change at the
beginning and the end of the function. Just as in the exponential case, b must be positive: when b is greater than 1,
we have growth, and when 0 < b < 1, the function is decaying. Remember that \( b = e^k \).
Inflection Point
All logistic functions have a point at which things turn over, called the inflection point, indicated by the horizontal
dotted line on the graph. It is at the inflection point that the graph transitions from curving up (concave up) to curving
down (concave down). This point occurs halfway to the carrying capacity:
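\[ f(t) = \frac{C}{2} \]
The vertical dotted line on the graph shows the corresponding value of t for the inflection point. Since \( f(t) = \frac{C}{2} \) exactly when \( a b^{-t} = 1 \), we can find this
value of t by using the equation:
\[ t = \frac{\ln a}{\ln b} = \frac{\ln a}{k} \]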
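These formulas translate directly into Python. The minimal sketch below writes the logistic function as f(t) = C / (1 + a·b^(-t)) and uses hypothetical parameters C = 100, a = 9, and b = 3. It checks that f(0) = C/(1 + a) and that the curve reaches C/2 at t = ln a / ln b.

```python
import math

def logistic(t, C, a, b):
    """Logistic curve f(t) = C / (1 + a*b**(-t)); C is the carrying capacity."""
    return C / (1 + a * b ** (-t))

def inflection_time(a, b):
    """f(t) = C/2 exactly when a*b**(-t) = 1, i.e. t = ln(a) / ln(b)."""
    return math.log(a) / math.log(b)

# Hypothetical parameters: C = 100, a = 9, b = 3.
t_star = inflection_time(9, 3)
print(logistic(0, 100, 9, 3))                                     # 10.0, which is C / (1 + a)
print(round(t_star, 6), round(logistic(t_star, 100, 9, 3), 6))    # 2.0 50.0
```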
Model Parameters
Let's take a closer look at the parameters a and b. Imagine the following: A sociology major is researching the
spread of rumors on campus. This campus has 50,000 students, and when the rumor starts, only 500 students know
about it. In this example, C is 50,000 and 500 is the number of students that know the rumor at time 0.
The a parameter effectively controls the starting point of the logistic model, just as it did in the exponential model.
For a given value of C, a tells us where the function crosses the y-axis. Knowing the logistic equation, our C
value, and the initial value of y, we can solve for a when time is 0:
\[ \begin{aligned} f(t) &= \frac{C}{1 + a b^{-t}} \\ 500 &= \frac{50{,}000}{1 + a b^{-0}} \\ 500 &= \frac{50{,}000}{1 + a} \\ a &= \frac{50{,}000}{500} - 1 = 99 \end{aligned} \]
When C and b stay constant, a will shift the graph horizontally. The figure below shows what happens if we change
the value of a for the same carrying capacity and b = 10. For values of a equal to 1, 9, and 99, the starting values are
25,000, 5,000, and 500, respectively. We see that all three curves have the same shape and scale; they're just shifted
horizontally. The larger the value of a is, the farther to the right the curve is shifted.
If we take a closer look at the graph and extend the x-axis to the left, we can see that the three lines are actually the
same shape (because they all have the same C and b parameter values), but simply moved over horizontally:
Now, let's do the same examination of the parameter b. If we keep the C and a values constant, what happens to
the logistic model as we alter the value of b? The graph below shows just that:
From this figure, we can see some of the properties of b mentioned earlier. The dashed line (b = 2)
doesn't even reach the carrying capacity by the end of 5 weeks. The other lines get progressively steeper as b
increases. This means that higher values of b produce quicker growth to the carrying capacity, assuming C and a are
constant.
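A one-line identity explains why a shifts the curve sideways: a·b^(-t) = b^(-(t - log_b a)). So the curve for any a is the curve for a = 1 moved right by log_b a. The Python sketch below uses the rumor example's C = 50,000 and the figure's b = 10. The helper names are illustrative.

```python
import math

C, b = 50_000, 10    # C from the rumor example; b = 10 as in the figure

def logistic(t, C, a, b):
    """f(t) = C / (1 + a*b**(-t))."""
    return C / (1 + a * b ** (-t))

def a_from_start(C, y0):
    """Solve y0 = C / (1 + a) for a (y0 is the value of f at t = 0)."""
    return C / y0 - 1

print(a_from_start(C, 500))   # 99.0
for a in (1, 9, 99):
    shift = math.log(a) / math.log(b)   # log_b(a): how far right the a = 1 curve moves
    print(f"a = {a:>2}: f(0) = {logistic(0, C, a, b):>6,.0f}, shifted right by {shift:.3f}")
```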
Example A
A rumor is spreading at a school that has a total student population of 1200. Four people know the rumor when it
starts and three days later 300 people know the rumor. About how many people at the school know the rumor by the
fourth day?
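Solution
In a limited population, the count of people who know a rumor is an example of a situation that can be modeled
using the logistic function. The population is 1200, so this will be the carrying capacity.
Identifying information: C = 1200, and the curve passes through (0, 4) and (3, 300). Use the point (0, 4) to solve for a:
\[ \begin{aligned} 4 &= \frac{1200}{1 + a b^{-0}} \\ 1 + a b^{-0} &= \frac{1200}{4} = 300 \end{aligned} \]
(Remember, any number raised to an exponent of zero, even a negative zero, is one.)
\[ \begin{aligned} 1 + a &= 300 \\ a &= 299 \end{aligned} \]
Next, use the point (3, 300) to solve for b.
\[ \begin{aligned} 300 &= \frac{1200}{1 + 299 b^{-3}} \\ 1 + 299 b^{-3} &= 4 \\ 299 b^{-3} &= 3 \\ b^{-3} &= \frac{3}{299} \end{aligned} \]
(Knowing \( b^{-x} = \frac{1}{b^x} \) helps here.)
\[ \begin{aligned} b^3 &= \frac{299}{3} \approx 99.667 \\ b &\approx 4.636 \end{aligned} \]
The modeling equation is:
\[ f(t) = \frac{1200}{1 + 299 \cdot 4.636^{-t}} \]
Substituting 4 for t in the logistic equation, we get 728.470, or approximately 729 people.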
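Alternate solution, using the equivalent formula and solving for k:
\[ \begin{aligned} 4 &= \frac{1200}{1 + a e^{-k \cdot 0}} \\ 1 + a e^{-k \cdot 0} &= \frac{1200}{4} \\ 1 + a e^{-k \cdot 0} &= 300 \end{aligned} \]
(Remember, any number, even the number e, raised to an exponent of zero, even a negative zero, is one.)
\[ \begin{aligned} 1 + a &= 300 \\ a &= 299 \end{aligned} \]
Next, use the point (3, 300):
\[ \begin{aligned} 300 &= \frac{1200}{1 + 299 e^{-3k}} \\ 1 + 299 e^{-3k} &= 4 \\ 299 e^{-3k} &= 3 \\ e^{-3k} &= \frac{3}{299} \\ -3k &= \ln\frac{3}{299} \approx -4.602 \\ k &\approx 1.534 \end{aligned} \]
Now, if you substitute the values for a and k into the original equation and evaluate it at t = 4, you will get the same
solution: approximately 729 people!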
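The two-point fit in Example A is the same every time: the point at t = 0 gives a, and the second point gives b. Here is a Python sketch of the steps, with the model written as f(t) = C / (1 + a·b^(-t)). The helper name `fit_logistic` is hypothetical. It keeps b unrounded, so f(4) comes out slightly different from the hand calculation but still rounds to about 729.

```python
import math

def fit_logistic(C, y0, t1, y1):
    """Fit f(t) = C / (1 + a*b**(-t)) through (0, y0) and (t1, y1)."""
    a = C / y0 - 1                          # from f(0) = C / (1 + a)
    b = ((C / y1 - 1) / a) ** (-1 / t1)     # from b**(-t1) = (C/y1 - 1) / a
    return a, b

C = 1200
a, b = fit_logistic(C, 4, 3, 300)
k = math.log(b)                             # same model written with e**(-k t)
print(f"a = {a:.0f}, b = {b:.3f}, k = {k:.3f}, f(4) = {C / (1 + a * b ** -4):.1f}")
```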
Example B
Long Island has roughly 8 million people. A hundred years ago, it had 2 million people. Suppose that the resources
and infrastructure of the island could only support 20 million people. When will the population reach ten million
inhabitants?
Solution
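Identify known points and the carrying capacity: (0, 8,000,000), (-100, 2,000,000), and C = 20,000,000. First solve for a:
\[ \begin{aligned} 8{,}000{,}000 &= \frac{20{,}000{,}000}{1 + a b^{-0}} \\ 1 + a b^{-0} &= \frac{20{,}000{,}000}{8{,}000{,}000} = 2.5 \\ a b^{-0} &= 1.5 \\ a &= 1.5 \end{aligned} \]
Now use the other point to solve for b:
\[ \begin{aligned} 2{,}000{,}000 &= \frac{20{,}000{,}000}{1 + 1.5 b^{-(-100)}} \\ 1 + 1.5 b^{100} &= \frac{20{,}000{,}000}{2{,}000{,}000} = 10 \\ 1.5 b^{100} &= 9 \\ b^{100} &= \frac{9}{1.5} = 6 \\ b &\approx 1.018 \end{aligned} \]
The question asks for the value of t when f(t) = 10,000,000:
\[ \begin{aligned} 10{,}000{,}000 &= \frac{20{,}000{,}000}{1 + 1.5 \cdot 1.018^{-t}} \\ 1 + 1.5 \cdot 1.018^{-t} &= \frac{20{,}000{,}000}{10{,}000{,}000} = 2 \\ 1.5 \cdot 1.018^{-t} &= 1 \\ 1.018^{-t} &\approx 0.667 \\ \ln\left(1.018^{-t}\right) &= \ln 0.667 \\ -t \ln 1.018 &= \ln 0.667 \\ t &= -\frac{\ln 0.667}{\ln 1.018} \approx 22.70 \end{aligned} \]
This means that, according to your assumptions and the two population data points you used, the population of Long Island is predicted to reach 10 million inhabitants about 22.7 years from now. (Carrying the unrounded values of b and 2/3 through the calculation gives about 22.6 years.)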
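Rounding b to 1.018 and 2/3 to 0.667 moves the answer by about a tenth of a year. The Python sketch below repeats Example B without intermediate rounding, under the same assumed carrying capacity of 20 million. In closed form, the answer is t = 100·ln 1.5 / ln 6.

```python
import math

C = 20_000_000                               # assumed carrying capacity
a = C / 8_000_000 - 1                        # point (0, 8 million): 1 + a = 2.5
b = ((C / 2_000_000 - 1) / a) ** (1 / 100)   # point (-100, 2 million): b**100 = 6
# 10 million = C / (1 + a*b**(-t))  ->  b**(-t) = (C/10_000_000 - 1) / a
t = -math.log((C / 10_000_000 - 1) / a) / math.log(b)
print(f"a = {a}, b = {b:.5f}, t = {t:.2f} years")
```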
Example C
A special kind of algae is grown in giant clear plastic tanks and can be harvested to make biofuel. The algae are
given plenty of food, water and sunlight to grow rapidly and the only limiting resource is space in the tank. The
algae are harvested when 95% of the tank is full, leaving the tank 5% full of algae to reproduce and refill the tank.
Currently the time between harvests is twenty days and the payoff is a 90% harvest. Would you recommend a more
optimal harvest schedule?
Solution
Identify known quantities and model the growth of the algae.
Known quantities: the tank's capacity is C = 1 (a full tank), the tank is 5% full at t = 0, and it is 95% full at t = 20 days.
The question asks about an optimal harvest schedule. Currently the harvest is 90% per 20 days, or a unit rate of 4.5% per
day. If you shorten the time between harvests to the period when the algae are growing most efficiently, then this
unit rate might be higher. Suppose you leave 15% of the algae in the tank and harvest when it reaches 85%. How
much time will that take to yield 70%?
Solve for a:
\[ \begin{aligned} 0.05 &= \frac{1}{1 + a b^{-0}} \\ 1 + a b^{-0} &= \frac{1}{0.05} \\ 1 + a b^{-0} &= 20 \\ a &= 19 \end{aligned} \]
Solve for b:
\[ \begin{aligned} 0.95 &= \frac{1}{1 + 19 b^{-20}} \\ 1 + 19 b^{-20} &= \frac{1}{0.95} \\ 19 b^{-20} &\approx 0.053 \\ b^{-20} &\approx \frac{0.053}{19} \\ b^{20} &\approx \frac{19}{0.053} \approx 358.491 \\ b &\approx 1.34 \end{aligned} \]
Model for algae growth is:
\[ f(t) = \frac{1}{1 + 19 \cdot 1.34^{-t}} \]
The day at which you have 15% algae:
\[ 0.15 = \frac{1}{1 + 19 \cdot 1.34^{-t_1}} \quad\Rightarrow\quad t_1 \approx 4.134 \]
The day at which you have 85% algae (so that you have a 70% yield):
\[ 0.85 = \frac{1}{1 + 19 \cdot 1.34^{-t_2}} \quad\Rightarrow\quad t_2 \approx 15.987 \]
\[ t_2 - t_1 \approx 15.987 - 4.134 \approx 11.85 \]
It takes about 12 days for the batches to yield a 70% harvest, which is a unit rate of about 6% per day. This is a
significant increase in efficiency. A harvest schedule that maximizes the time where the logistic curve is steepest
creates the fastest overall algae growth.
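The harvest-window calculation can be repeated in Python without rounding b. Since 1/0.95 - 1 = 1/19, the data give b^20 = 361 exactly. Under that assumption, the window comes out near 11.8 days, which is still roughly 12 days. The window is also centered on day 10, where the tank is half full and the curve is steepest. The helper name `day_reaching` is illustrative.

```python
import math

C = 1.0                                  # tank capacity, as a fraction of a full tank
a = C / 0.05 - 1                         # 5% full at t = 0  ->  a = 19
b = (a / (C / 0.95 - 1)) ** (1 / 20)     # 95% full at t = 20  ->  b**20 = 361

def day_reaching(fraction):
    """Solve fraction = C / (1 + a*b**(-t)) for t."""
    return math.log(a / (C / fraction - 1)) / math.log(b)

t1, t2 = day_reaching(0.15), day_reaching(0.85)
print(f"b = {b:.4f}; harvest window = {t2 - t1:.2f} days; {70 / (t2 - t1):.2f}% per day")
print(f"inflection (50% full) on day {day_reaching(0.5):.2f}")
```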
Vocabulary
Carrying capacity is the maximum sustainable population that the environmental factors will support. In
other words, it is the population limit.
The logistic model is appropriate whenever the total count has an upper limit and the initial growth is
exponential. Examples are the spread of rumors and disease in a limited population and the growth of bacteria
or human population when resources are limited.
Guided Practice
1. Determine the logistic model given
2. Determine the logistic model given
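Solutions
1. \( f(t) = \frac{12}{1 + a b^{-t}} \)
\[ \begin{aligned} 1 + a &\approx 1.333 \\ a &\approx 0.333 \\ 11 &= \frac{12}{1 + 0.333 b^{-1}} \\ 1 + 0.333 b^{-1} &= \frac{12}{11} \\ 0.333 b^{-1} &\approx 0.091 \\ b^{-1} &\approx \frac{0.091}{0.333} \\ b &\approx 3.659 \end{aligned} \]
The approximate model is:
\[ f(t) = \frac{12}{1 + 0.333 \cdot 3.659^{-t}} \]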
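2. The two points give two equations, and the logistic model has two unknowns, a and b.
\[ \begin{aligned} 2 &= \frac{7}{1 + a b^{-0}} \\ a &= 2.5 \end{aligned} \]
Then,
\[ \begin{aligned} 5 &= \frac{7}{1 + 2.5 b^{-3}} \\ b &\approx 1.842 \end{aligned} \]
The approximate model is:
\[ f(t) = \frac{7}{1 + 2.5 \cdot 1.842^{-t}} \]
3. \( f(t) = \frac{20}{1 + 4 \cdot 1.11^{-t}} \)
\[ \begin{aligned} 14 &= \frac{20}{1 + 4 \cdot 1.11^{-t}} \\ 1 + 4 \cdot 1.11^{-t} &= \frac{20}{14} \\ 4 \cdot 1.11^{-t} &\approx 0.429 \\ 1.11^{-t} &\approx 0.107 \\ -t \ln 1.11 &= \ln 0.107 \\ t &= -\frac{\ln 0.107}{\ln 1.11} \approx 21.416 \end{aligned} \]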
More Practice
For 1-5, determine the logistic model given the carrying capacity and two points.
1.
2.
3.
4.
5.
\[ f(t) = \frac{32}{1 + 20e^{-0.45t}} \]
25
1+45t
Imagine a huge bowl of your favorite potato salad, ready for a picnic on a beautiful, hot, midsummer day. The cook
was careful to prepare it under strictly sanitary conditions, using fresh eggs, clean organic vegetables, and new jars of
mayonnaise and mustard. Familiar with food poisoning warnings, s/he was so thorough that only a single bacterium
made it into that vast amount of food. While such a scenario is highly unrealistic without authentic canning, it will
serve as an example as we begin our investigation of how populations change, or population dynamics. Because
potato salad provides an ideal environment for bacterial growth, just as your mother may have warned, we can use
this single bacterial cell in the potato salad to ask:
How Do Populations Grow Under Ideal Conditions?
Given food, warm temperatures, moisture, and oxygen, a single aerobic bacterial cell can grow and divide by binary
fission to become two cells in about 20 minutes. The two new cells, still under those ideal conditions, can each repeat
this performance, so that after 20 more minutes, four cells constitute the population. Given this modest doubling,
how many bacteria do you predict will be happily feeding on potato salad after five hours at the picnic? After you've
thought about this, compare your prediction with the data below.
Like many populations under ideal conditions, bacteria show exponential growth. Each bacterium can undergo binary fission
every 20 minutes. After 5 hours (15 doublings), a single bacterium can produce a population of 32,768 descendants.
TABLE 10.1:
Time (Hours and Minutes)
0
20 minutes
40 minutes
1 hour
1 hour 20 minutes
1 hour 40 minutes
2 hours
2 hours 20 minutes
2 hours 40 minutes
3 hours
3 hours 20 minutes
3 hours 40 minutes
4 hours
4 hours 20 minutes
4 hours 40 minutes
Are you surprised? This phenomenal capacity for growth of living populations was first described by Thomas
Robert Malthus in his 1798 Essay on the Principle of Population. Although Malthus focused on human populations,
biologists have found that many populations are capable of this explosive reproduction, if provided with ideal
conditions. This pattern of growth is exponential: as the population grows larger, the rate of growth increases.
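You can check the doubling arithmetic in the potato-salad example with a small Python sketch. It assumes the idealized scenario in which every cell divides exactly every 20 minutes. The function name `population` is illustrative.

```python
DOUBLING_MINUTES = 20   # one binary fission every 20 minutes (ideal conditions)

def population(minutes, start=1):
    """Cells after `minutes`, assuming every cell divides every DOUBLING_MINUTES."""
    return start * 2 ** (minutes // DOUBLING_MINUTES)

for hours in range(6):
    print(f"{hours} h: {population(60 * hours):,} cells")   # 5 h: 32,768 cells
```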
If you have worked compound interest problems in math or played with numbers for estimating the interest in your
savings account, you can compare the growth of a population under ideal conditions to the growth of a savings
account under a constant rate of compound interest. The graph below, using potato salad bacterial data, shows the
pattern of exponential growth: the population grows very slowly at first, but more and more rapidly as time passes.
FIGURE 10.1
Exponential or geometric growth is very slow at first, but accelerates as the population grows.
Of course, if bacterial populations always grew exponentially, they would long ago have covered the Earth many
times over. While Thomas Malthus emphasized the importance of exponential growth on population, he also
stated that ideal conditions do not often exist in nature. A basic limit for all life is energy. Growth, survival,
and reproduction require energy. Because energy supplies are limited, organisms must spend them wisely. We
will end this lesson with a much more realistic model of population growth and the implications of its limits, but
first, let's look more carefully at the characteristics of populations which allow them to grow.
You learned above that populations can grow exponentially if conditions are ideal. While exponential growth occurs
when populations move into new or unfilled environments or rebound after catastrophes, most organisms do not
live in ideal conditions very long, if at all. Let's look at some data for populations growing under more realistic
conditions.
Biologist Georgyi Gause studied the population growth of two species of Paramecium in laboratory cultures. Both
species grew exponentially at first, as Malthus predicted. However, as each population increased, rates of growth
slowed and eventually leveled off. Each species reached a different maximum, due to differences in size of individuals and space and nutrient needs, but both showed the same, S-shaped growth pattern.
FIGURE 10.2
Two species of Paramecium illustrate logistic growth, with different plateaus due to differences in size and space
and nutrient requirements. The growth pattern resembles the letter S and is often called an S-curve. Slow but exponential
growth at low densities is followed by faster growth and then leveling.
Perhaps even more realistic is the growth of a sheep population, observed after the introduction of fourteen sheep to
the island of Tasmania in 1800. Like the lab Paramecia, the sheep population at first grew exponentially. However,
over the next 20 years, the population sharply declined by 1/3. Finally, the number of sheep increased slowly to a
plateau. The general shape of the growth curve matched the S-shape of Paramecium growth, except that the sheep
overshot their plateau at first.
FIGURE 10.3
Sheep introduced to Tasmania show logistic growth, except that they overshoot
their carrying capacity before stabilizing.
As Malthus realized, no population can maintain exponential growth indefinitely. Inevitably, limiting factors such as
a reduced food supply or limited space lower birth rates, increase death rates, or lead to emigration, all of which lower
the population growth rate. After reading Malthus's work, Pierre Verhulst in 1838 derived a mathematical model of population
growth which closely matches the S-curves observed under realistic conditions. In this logistic (S-curve) model,
growth rate is proportional to the size of the population but also to the amount of available resources. At higher
population densities, limited resources lead to competition and lower growth rates. Eventually, the growth rate
declines to zero and the population becomes stable.
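Verhulst's model is usually written as the differential equation dN/dt = rN(1 - N/K), where r is the intrinsic growth rate and K is the carrying capacity. The text does not give this form; it is the standard statement of the logistic model. The Python sketch below steps the equation forward with Euler's method, using hypothetical values for r, K, and the starting population. Growth is nearly exponential while N is small and flattens as N nears K.

```python
def simulate(n0, r, K, years, steps_per_year=100):
    """Euler steps of dN/dt = r*N*(1 - N/K); returns N at the end of each year.

    n0, r and K are hypothetical inputs chosen for illustration.
    """
    dt = 1 / steps_per_year
    n, history = n0, [n0]
    for _ in range(years):
        for _ in range(steps_per_year):
            n += r * n * (1 - n / K) * dt   # growth slows as N approaches K
        history.append(n)
    return history

for year, n in enumerate(simulate(n0=10, r=0.5, K=1000, years=20)):
    if year % 5 == 0:
        print(year, round(n))
```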
FIGURE 10.4
Growth of populations according to Malthus's exponential model (A) and Verhulst's logistic model (B). Both models
assume that population growth is proportional to population size, but the logistic model also assumes that growth
depends on available resources. A models growth under ideal conditions and shows that all populations have a
capacity to grow infinitely large. B limits exponential growth to low densities; at higher densities, competition for
resources or other limiting factors inevitably cause the growth rate to slow to zero. At that point, the population
reaches a stable plateau, the carrying capacity (K).
The logistic model describes population growth for many populations in nature. Some, like the sheep in Tasmania,
overshoot the plateau before stabilizing, and some fluctuate wildly above and below a plateau average. A few
may crash and disappear. However, the plateau itself has become a foundational concept in population biology
known as carrying capacity (K). Carrying capacity is the maximum population size that a particular environment
can support without habitat degradation. Limiting factors determine carrying capacity, and often these interact. In
the next section, we will explore in more detail the kinds of factors which restrict populations to specific carrying
capacities and some adaptations that limit growth.
Lesson Summary
Few populations in nature grow exponentially. No population can continue such growth indefinitely.
The logistic (S-curve) model best describes the growth of many populations in nature.
In the logistic model, growth rate depends on both population size and availability of resources. Growth is
slow at first, but as size increases, growth accelerates. At higher densities, limited resources cause growth rate
to decline, and populations stabilize at carrying capacity.
CHAPTER
Chapter Outline
11.1
11.2
SAMPLING DISTRIBUTION
11.3
11.4
CONFIDENCE INTERVALS
A sample is a representative subset of a population. If a statistician or other researcher wants to know some
information about a population, the only way to be truly sure is to conduct a census. In a census, every unit in
the population being studied is measured or surveyed. If we really wanted to know the true approval rating of the
president, for example, we would have to ask every single American adult his or her opinion. There are some obvious
reasons why a census is impractical in this case, and in most situations.
Why is this impractical? First, it would be extremely expensive for the polling organization. They would need an
extremely large workforce to try and collect the opinions of every American adult. Also, it would take many workers
and many hours to organize, interpret, and display this information. Even if it could be done in several months, by
the time the results were published, it would be very probable that recent events had changed people's opinions and
that the results would be obsolete.
In addition, a census has the potential to be destructive to the population being studied. For example, many
manufacturing companies test their products for quality control. A padlock manufacturer might use a machine
to see how much force it can apply to the lock before it breaks. If they did this with every lock, they would have
none left to sell! Likewise, it would not be a good idea for a biologist to find the number of fish in a lake by draining
the lake and counting them all!
The term most frequently applied to a non-representative sample is bias. Bias has many potential sources. It is
important when selecting a sample or designing a survey that a statistician make every effort to eliminate potential
sources of bias. In this section, we will discuss some of the most common types of bias. While these concepts are
universal, the terms used to define them here may be different than those used in other sources.
Sampling Bias
In general, sampling bias refers to the methods used in selecting the sample. The sampling frame is the term we
use to refer to the group or listing from which the sample is to be chosen. If you wanted to study the population of
students in your school, you could obtain a list of all the students from the office and choose students from the list.
This list would be the sampling frame.
Incorrect Sampling Frame
If the list from which you choose your sample does not accurately reflect the characteristics of the population, this
is called an incorrect sampling frame. A sampling frame error occurs when some group from the population does not
have the opportunity to be represented in the sample. For example, surveys are often done over the telephone. You
could use the telephone book as a sampling frame by choosing numbers from the telephone book. However, some
phone numbers are not listed in the telephone book. In addition, younger adults in particular tend to only use their
cell phones or computer-based phone services and may not even have traditional phone service. Even if you picked
phone numbers randomly, the sampling frame could be incorrect, because there are also people, especially those
who may be economically disadvantaged, who have no phone. There is absolutely no chance for these individuals
to be represented in your sample. A term often used to describe the problems when a group of the population is not
represented in a survey is undercoverage. Undercoverage can result from all of the different sampling biases.
You may have heard of one of the most famous examples of sampling frame error. It occurred during the 1936 U.S.
presidential election. The Literary Digest, a popular magazine at the time, conducted a poll and predicted that Alf
Landon would win the election. As it turned out, the election was won in a landslide by Franklin Delano Roosevelt.
The magazine obtained a huge sample of ten million people, and from that pool, 2 million replied. With these
numbers, you would typically expect very accurate results. However, the magazine used its subscription list as
its sampling frame. During the Depression, these subscribers would have been mostly among the wealthiest Americans,
who tended to vote Republican, and the majority of typical voters were left under-covered.
Convenience Sampling
Suppose your statistics teacher gave you an assignment to perform a survey of 20 individuals. You would most
likely tend to ask your friends and family to participate, because it would be easy and quick. This is an example of
convenience sampling, or convenience bias. While it is not always true, your friends are usually people who share
common values, interests, and opinions. This could cause those opinions to be over-represented in relation to the
true population. Also, have you ever been approached by someone conducting a survey on the street or in a mall?
If such a person were just to ask the first 20 people they found, there is the potential that large groups representing
various opinions would not be included, resulting in undercoverage.
Judgment Sampling
Judgment sampling occurs when an individual or organization that is usually considered an expert in the field
being studied chooses the individuals or group of individuals to be used in the sample. Because it is based on a
subjective choice, even by someone considered an expert, it is very susceptible to bias. In some sense, this is what
those responsible for the Literary Digest poll did. They incorrectly chose groups they believed would represent the
population. If a person wants to do a survey on middle-class Americans, how would this person decide who to
include? It would be left to this person's own judgment to create the criteria for those considered middle-class. This
individual's judgment might result in a different view of the middle class that might include wealthier individuals that
others would not consider part of the population. Similar to judgment sampling, in quota sampling, an individual or
organization attempts to include the proper proportions of individuals of different subgroups in their sample. While
it might sound like a good idea, it is subject to an individual's prejudice and is, therefore, prone to bias.
Size Bias
If one particular subgroup in a population is likely to be over-represented or under-represented due to its size, this is
sometimes called size bias. If we chose a state at random from a map by closing our eyes and pointing to a particular
place, larger states would have a greater chance of being chosen than smaller ones. As another example, suppose
that we wanted to do a survey to find out the typical size of a student's math class at a school. The chances are
greater that we would choose someone from a larger class for our survey. To understand this, say that you went to
a very small school where there are only four math classes, with one class having 35 students, and the other three
classes having only 8 students each. If you simply choose students at random, it is more likely you will select students
for your sample who will say the typical size of a math class is 35, since there are more students in the larger class.
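The size of this effect can be computed directly. The Python sketch below uses the class sizes from the example (one class of 35 and three classes of 8). It compares two averages: the one you get when each class is counted once, and the one you get when each student reports the size of their own class. The variable names are illustrative only.

```python
from fractions import Fraction

# Class sizes from the example: one class of 35 students and three classes of 8.
class_sizes = [35, 8, 8, 8]
total_students = sum(class_sizes)  # 59

# Chance that a randomly chosen *student* is in the class of 35.
p_large = Fraction(class_sizes[0], total_students)

# Average class size when each *class* is counted once.
mean_per_class = Fraction(total_students, len(class_sizes))

# Average class size reported when each *student* describes their own class:
# a class of size s is reported s times, so it is weighted by s.
mean_per_student = Fraction(sum(s * s for s in class_sizes), total_students)

print(f"P(student is in the class of 35) = {float(p_large):.3f}")
print(f"Average class size (per class)   = {float(mean_per_class):.2f}")
print(f"Average reported by students     = {float(mean_per_student):.2f}")
```

The per-class average is 14.75, but students sampled at random report an average of about 24. That gap is the size bias the paragraph describes.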
Response Bias
The term response bias refers to problems that result from the ways in which the survey or poll is actually presented
to the individuals in the sample.
Voluntary Response Bias
Television and radio stations often ask viewers/listeners to call in with opinions about a particular issue they are
covering. The websites for these and other organizations also usually include some sort of online poll question of
the day. Reality television shows and fan balloting in professional sports to choose all-star players make use of these
types of polls as well. All of these polls usually come with a disclaimer stating, "This is not a scientific poll."
While perhaps entertaining, these types of polls are very susceptible to voluntary response, or self-selection, bias.
The people who respond to these types of surveys tend to feel very strongly one way or another about the issue in
question, and the results might not reflect the overall population. Those who still have an opinion, but may not feel
quite so passionately about the issue, may not be motivated to respond to the poll.
Non-Response Bias
One of the biggest problems in polling is that most people just don't want to be bothered taking the time to respond
to a poll of any kind. They hang up on a telephone survey, put a mail-in survey in the recycling bin, or walk quickly
past an interviewer on the street. We just don't know how much these individuals' beliefs and opinions reflect those
of the general population, and, therefore, almost all surveys could be prone to non-response bias.
Questionnaire Bias
Questionnaire bias occurs when the way in which the question is asked influences the response given by the
individual. It is possible to ask the same question in two different ways that would lead individuals with the same
basic opinions to respond differently. Consider the following two questions about gun control.
"Do you believe that it is reasonable for the government to impose some limits on purchases of certain types
of weapons in an effort to reduce gun violence in urban areas?"
"Do you believe that it is reasonable for the government to infringe on an individual's constitutional right to
bear arms?"
A gun rights activist might feel very strongly that the government should never be in the position of limiting guns in
any way and would answer no to both questions. Someone who is very strongly against gun ownership, on the other
hand, would probably answer yes to both questions. However, individuals with a more tempered, middle position on
the issue might believe in an individual's right to own a gun under some circumstances, while still feeling that there
is a need for regulation. These individuals would most likely answer these two questions differently.
You can see how easy it would be to manipulate the wording of a question to obtain a certain response to a poll
question. Questionnaire bias is not necessarily always a deliberate action. If a question is poorly worded, confusing,
or just plain hard to understand, it could lead to non-representative results. When you ask people to choose between
two options, it is even possible that the order in which you list the choices may influence their response!
Incorrect Response Bias
A major problem with surveys is that you can never be sure that the person is actually responding truthfully. When an
individual intentionally responds to a survey with an untruthful answer, this is called incorrect response bias. This
can occur when asking questions about extremely sensitive or personal issues. For example, a survey conducted
about illegal drinking among teens might be prone to this type of bias. Even if guaranteed their responses are
confidential, some teenagers may not want to admit to engaging in such behavior at all. Others may want to appear
more rebellious than they really are, but in either case, we cannot be sure of the truthfulness of the responses.
Identifying Sources of Bias
Example A
You are assisting with a study attempting to determine how satisfied the families of students who speak a second
language at home are with school communication. The plan is to send home a questionnaire to the parents of the
students, asking them about their opinion.
What kind(s) of bias is this survey method particularly prone to? How might they be addressed?
Solution
This method of sampling is liable to result in both non-response and undercoverage bias. Non-response bias is an
issue any time a sample population is expected to submit a questionnaire, as your results are going to include more
input from the type of person who is willing and able to complete and submit your survey. In this case, undercoverage
is a particular problem, since the population most affected by the study is also unusually liable to misinterpret the
questions or the reason for them due to the language barrier.
One possible solution might be to have a native speaker conduct a phone survey in the target language(s).
Example B
What type(s) of bias do the experiments below suggest?
a. An experiment to determine the danger of mixing household chemicals is conducted by collecting samples of
chemicals found under the experimenter's sink.
b. Mall shoppers are asked to fill out and return a form rating their shopping experiences at each of the 26 stores
to identify the most popular stores in each of 4 categories.
c. A study of the average grades of mathematics students polls 16 Algebra I students, 14 Geometry students, 7
Calculus students, and 19 Statistics students.
Solution
a. Undercoverage bias: This experiment is a prime example of the problems associated with convenience
sampling. Since the only chemicals used were the ones conveniently found in one location, the results could
not be assumed to be the same as with chemicals found under other sinks.
b. Non-response bias: Since the results are dependent on the shoppers turning in a response form on their own,
the results will be biased toward a specific type of personality and will not reflect a true cross-section of
shoppers' experiences.
c. Undercoverage: The study includes only about half as many Calculus students as students from each of the other subjects.
Reducing Bias
Randomization
The best technique for reducing bias in sampling is randomization. When a simple random sample of size n
(commonly referred to as an SRS) is taken from a population, all possible samples of size n in the population have
an equal probability of being selected for the sample. For example, if your statistics teacher wants to choose a student
at random for a special prize, he or she could simply place the names of all the students in the class in a hat, mix
them up, and choose one. More scientifically, your teacher could assign each student in the class a number from 1 to
25 (assuming there are 25 students in the class) and then use a computer or calculator to generate a random number
to choose one student. This would be a simple random sample of size 1.
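The "computer or calculator" step can be sketched in Python with the standard library's `random.sample`, which draws without replacement. Each possible group of size n therefore has the same chance of being chosen. The helper name `simple_random_sample` and the seed value are assumptions added for illustration; the seed only makes the run reproducible.

```python
import random

def simple_random_sample(population, n, rng=random):
    """Draw an SRS of size n without replacement from population."""
    return rng.sample(population, n)

students = list(range(1, 26))  # students numbered 1 to 25, as in the example
rng = random.Random(2024)      # hypothetical seed, fixed only for reproducibility

winner = simple_random_sample(students, 1, rng)
print("Prize goes to student", winner[0])
```

Changing `1` to another value, such as `5`, gives a simple random sample of that size using the same rule.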
Systematic Sampling
There are other types of samples that are not simple random samples, and one of these is a systematic sample. In
systematic sampling, after choosing a starting point at random, subjects are selected using a jump number. If you
have ever chosen teams or groups in gym class by counting off by threes or fours, you were engaged in systematic
sampling. The jump number is determined by dividing the population size by the desired sample size to ensure
that the sample combs through the entire population. If we had a list of everyone in your class of 25 students in
alphabetical order, and we wanted to choose 5 of them, we would choose every 5th student. Let's try choosing a
starting point at random by generating a random number from 1 to 25. Assume we get the number 14 as our seed
value.
In this case, we would start with student number 14 and then select every 5th student until we had 5 in all. When
we came to the end of the list, we would continue the count at number 1. Thus, our chosen students would be: 14,
19, 24, 4, and 9. It is important to note that this is not a simple random sample, as not every possible sample of 5
students has an equal chance of being chosen. For example, it is impossible to have a sample consisting of students
5, 6, 7, 8, and 9.
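The selection rule just described, including the wrap-around to number 1, can be sketched in Python. The function name `systematic_sample` is hypothetical. Using integer division for the jump number is an assumption for cases where the population size is not a multiple of the sample size. The last lines count how many different samples the method can ever produce.

```python
import math

def systematic_sample(population_size: int, sample_size: int, start: int) -> list[int]:
    """Return the 1-based positions chosen by systematic sampling.

    The jump number is population_size // sample_size (integer division is an
    assumption for sizes that do not divide evenly). After the end of the list,
    the count continues at position 1.
    """
    jump = population_size // sample_size
    return [(start - 1 + k * jump) % population_size + 1 for k in range(sample_size)]

chosen = systematic_sample(25, 5, start=14)
print(chosen)  # [14, 19, 24, 4, 9]

# Only a handful of distinct samples can ever occur, unlike an SRS.
possible = {frozenset(systematic_sample(25, 5, s)) for s in range(1, 26)}
print(len(possible), "possible systematic samples vs", math.comb(25, 5), "possible SRSs")
```

Only 5 distinct samples are possible here, compared with 53,130 for a simple random sample of 5 from 25. This is why a sample such as students 5, 6, 7, 8, and 9 can never occur.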
Cluster Sampling
Cluster sampling is when a naturally occurring group is selected at random, and then either all of that group, or
randomly selected individuals from that group, are used for the sample. If we then select at random from within that
group, or divide the cluster into smaller subgroups and sample from those, this is referred to as multi-stage sampling. For example, to survey student
opinions or study their performance, we could choose 5 schools at random from your state and then use an SRS
(simple random sample) from each school. If we wanted a national survey of urban schools, we might first choose 5
major urban areas from around the country at random, and then select 5 schools at random from each of these cities.
This would be both cluster and multi-stage sampling. Cluster sampling is often done by selecting a particular block
or street at random from within a town or city. It is also used at large public gatherings or rallies. If officials take a
picture of a small, representative area of the crowd and count the individuals in just that area, they can use that count
to estimate the total crowd in attendance.
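The two-stage design in the urban-schools example (choose 5 areas at random, then 5 schools at random within each) can be sketched in Python. The sampling frame below is entirely hypothetical: 12 placeholder urban areas with 20 placeholder schools each. The function name `multistage_sample` is also illustrative.

```python
import random

def multistage_sample(clusters: dict[str, list[str]], n_clusters: int, n_per_cluster: int,
                      rng: random.Random) -> dict[str, list[str]]:
    """Stage 1: choose clusters at random. Stage 2: take an SRS within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {c: rng.sample(clusters[c], n_per_cluster) for c in chosen}

# Hypothetical frame: 12 urban areas, each with 20 schools (placeholder names).
frame = {f"city_{i}": [f"city_{i}_school_{j}" for j in range(1, 21)] for i in range(1, 13)}

rng = random.Random(7)  # fixed seed only for reproducibility
sample = multistage_sample(frame, n_clusters=5, n_per_cluster=5, rng=rng)
for city, schools in sample.items():
    print(city, schools)
```

Setting `n_per_cluster` equal to the size of each cluster would give plain cluster sampling, in which every unit of each chosen cluster is used.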
Stratified Sampling
In stratified sampling, the population is divided into groups, called strata (the singular term is stratum), that have
some meaningful relationship. Very often, different groups within a population respond differently to a
survey. In order to help reflect the population, we stratify to ensure that each opinion is represented in the sample.
For example, we often stratify by gender or race in order to make sure that the often divergent views of these different
groups are represented. In a survey of high school students, we might choose to stratify by school to be sure that the
opinions of different communities are included. If each school has an approximately equal number of students, then
we could simply choose to take an SRS of size 25 from each school. If the numbers in each stratum are different,
then it would be more appropriate to choose a fixed total sample size (100 students, for example) and take a
number from each school proportionate to that school's size.
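Proportional allocation across strata can be sketched in Python. The school names and enrollments are hypothetical. The largest-remainder rounding step is one common way (assumed here, not taken from the text) to keep the per-school counts whole while still adding up exactly to the chosen total.

```python
def proportional_allocation(stratum_sizes: dict[str, int], total_sample: int) -> dict[str, int]:
    """Split total_sample across strata in proportion to stratum size.

    Uses largest-remainder rounding so the allocations add up exactly to total_sample.
    """
    population = sum(stratum_sizes.values())
    exact = {s: total_sample * size / population for s, size in stratum_sizes.items()}
    alloc = {s: int(x) for s, x in exact.items()}
    leftover = total_sample - sum(alloc.values())
    # Give any remaining units to the strata with the largest fractional parts.
    for s in sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)[:leftover]:
        alloc[s] += 1
    return alloc

# Hypothetical enrollments for three schools; total sample of 100 students.
schools = {"School A": 1200, "School B": 600, "School C": 200}
allocation = proportional_allocation(schools, 100)
print(allocation)  # {'School A': 60, 'School B': 30, 'School C': 10}
```

Within each school, the allocated number of students would then be chosen by an SRS.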
Lesson Summary
If you collect information from every unit in a population, it is called a census. Because a census is so difficult to
do, we instead take a representative subset of the population, called a sample, to try and make conclusions about
the entire population. The downside to sampling is that we can never be completely sure that we have captured
the truth about the entire population, due to random variation in our sample that is called sampling error. The list
of the population from which the sample is chosen is called the sampling frame. Poor technique in surveying or
choosing a sample can also lead to incorrect conclusions about the population that are generally referred to as bias.
Selection bias refers to choosing a sample that results in a subgroup that is not representative of the population.
Incorrect sampling frame occurs when the group from which you choose your sample does not include everyone in
the population, or at least units that reflect the full diversity of the population. Incorrect sampling frame errors result
in undercoverage. This is where a segment of the population containing an important characteristic did not have an
opportunity to be chosen for the sample and will be marginalized, or even left out altogether.
Points to Consider
1. Brandy wanted to know which brand of soccer shoe high school soccer players prefer. She decided to ask the
girls on her team which brand they liked.
a. What is the population in this example?
Understand the inferential relationship between a sampling distribution and a population parameter.
Graph a frequency distribution of sample means using a data set.
Understand the relationship between sample size and the distribution of sample means.
Understand sampling error.
Introduction
Have you ever wondered how we can learn what is true in a population when it would be impossible to contact
everyone? Statistics allows us to make use of the tool of probability to estimate what is true from just a sample of
the subjects we are interested in.
Suppose, for example, that we want to know how much cash people carry around in their pockets, on average. To
make this simple, we are going to work with a very small population of people: ten people on a busy street corner.
The diagram below reveals the amount of money that each person in the group of ten has in his/her pocket.
Our Scenario
In this scenario, we have a population of size ten. One person has no money, another has $1.00, another has $2.00,
and so on, until we reach the person who has $9.00. Our goal is to determine the average amount of money per
person in this population. What is that true mean? If you total the money of the ten people, you will find that the
sum is $45.00, thus yielding a mean of $4.50. Of course, for the purpose of this exercise, we don't know this!
Suppose you couldn't count the money of all ten people at once. Let's say instead you had 10 different individuals
all taking samples of the population. To start, suppose each of the ten researchers were to randomly select a sample
of only one person from the ten. That makes 10 samples of 1 person each. In this example, we would say that n = 1
(the sample size is one). The graph below shows the mean of each possible sample of n = 1. (Since there is only one
person in each sample, the mean is the number of dollars in that person's pocket.) Each of the 10 individuals was
selected and constitutes their own sample of size n = 1.
The distribution of the dots on the graph is an example of a sampling distribution. As can be seen, selecting a
sample of one is not very good, since the range of sample means is anywhere from $0.00 to $9.00. The true mean of
$4.50 could be missed by quite a bit with any one given sample.
Now suppose each sample contains two people (n = 2). Increasing the sample size improves the estimates. There are
now 45 possible samples, such as ($0, $1), ($0, $2), ($7, $8), ($8, $9), and so on, and some of these samples
produce the same means. For example, ($0, $6), ($1, $5), and ($2, $4) all produce means of $3. The three dots above
the mean of 3 represent these three samples. In addition, the 45 means are not evenly distributed, as they were when
the sample size was one. Instead, they are more clustered around the true mean of $4.50. ($0, $1) and ($8, $9) are
the only two samples whose means deviate by as much as $4.00. Also, five of the samples yield the true estimate of
$4.50, and another eight deviate by only plus or minus 50 cents.
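These counts can be checked by listing every sample of two people. The Python sketch below enumerates all pairs of the ten pocket amounts ($0 through $9), computes each pair's mean, and tallies the means the paragraph mentions.

```python
from collections import Counter
from itertools import combinations

population = list(range(10))  # pocket amounts $0, $1, ..., $9

# Every possible sample of two different people, and the mean of each sample.
sample_means = [sum(pair) / 2 for pair in combinations(population, 2)]
counts = Counter(sample_means)

print("possible samples:", len(sample_means))                              # 45
print("mean of all sample means:", sum(sample_means) / len(sample_means))  # 4.5
print("samples with mean $3.00:", counts[3.0])                             # 3
print("samples with mean $4.50:", counts[4.5])                             # 5
print("off by exactly 50 cents:", counts[4.0] + counts[5.0])               # 8
print("off by $4.00:", counts[0.5] + counts[8.5])                          # 2
```

The mean of all 45 sample means is exactly the population mean of $4.50.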
Next, the sampling distributions for sample sizes of 4, 5, and 6 are shown:
Important Lessons
There are two important pieces to take away from this lesson. First, notice that the sample means become more and
more normally distributed around the true mean (the population parameter) as we increase our sample size. Second,
notice that the variability of the sample means decreases as sample size increases. The sample means are more tightly
clustered around the true mean. This variability of sample means is called the standard error, \(\sigma_{\bar{x}}\). You can think of
it as the standard deviation of a sampling distribution.
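The shrinking spread can be measured for the ten-person example. The Python sketch below lists every possible sample of each size, assuming sampling without replacement as in the lesson. It then takes the standard deviation of the resulting sample means, which is the standard error. Every sample of a given size is treated as equally likely, so the population standard deviation (`pstdev`) of the list of means is used.

```python
from itertools import combinations
from statistics import pstdev

population = list(range(10))  # the ten pocket amounts, $0 through $9

se = {}
for n in (1, 2, 4, 5, 6):
    # All possible samples of size n are equally likely, so the standard error
    # is the population SD of the complete list of sample means.
    means = [sum(s) / n for s in combinations(population, n)]
    se[n] = pstdev(means)
    print(f"n = {n}: {len(means):3d} samples, standard error = {se[n]:.3f}")
```

The standard error falls steadily as n grows, from about 2.87 for n = 1 to below 0.8 for n = 6.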
And here is one last piece of information to take away. The sampling distribution, as it becomes more normal
in shape, also adheres to the Empirical Rule. This means that certain proportions of the sample means will fall
within defined increments. In this case, each increment would be one standard error from the population parameter.
According to this rule, about 34% of the sample means will fall within one standard error above the population
parameter, and another 34% will fall within one standard error below the population parameter. In addition, about
95% of the sample means will fall within two standard errors of the true value, and 99.7% will fall within three
standard errors.
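The Empirical Rule percentages are areas under the standard normal curve, with distances measured in standard errors. The Python sketch below computes them with the standard library's `statistics.NormalDist`. The exact figures are about 34.1%, 95.4%, and 99.7%.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1 (units of standard error)

one_side = z.cdf(1) - z.cdf(0)   # between the parameter and +1 standard error
within_1 = z.cdf(1) - z.cdf(-1)
within_2 = z.cdf(2) - z.cdf(-2)
within_3 = z.cdf(3) - z.cdf(-3)

print(f"one side: {one_side:.4f}")
print(f"within 1 SE: {within_1:.4f}, within 2 SE: {within_2:.4f}, within 3 SE: {within_3:.4f}")
```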
Lesson Summary
In this lesson, we have learned about probability sampling, which is the key sampling method used in survey
research. In the example presented above, the elements were chosen for study from a population by random
sampling. The sample size had a direct effect on the distribution of estimates of the population parameter. The
larger the sample size, the closer the sampling distribution was to a normal distribution.
Points to Consider
Does the mean of the sampling distribution equal the mean of the population?
If the sampling distribution is normally distributed, is the population normally distributed?
Are there any restrictions on the size of the sample that is used to estimate the parameters of a population?
Are there any other components of sampling error estimates?
Review Questions
The following activity could be done in the classroom, with the students working in pairs or small groups. Before
doing the activity, students could put their pennies into a jar and save them as a class, with the teacher also
contributing. In a class of 30 students, groups of 5 students could work together, and the various tasks could be
divided among those in each group.
1. If you had 100 pennies and were asked to record the age of each penny, predict the shape of the distribution.
(The age of a penny is the current year minus the date on the coin.)
2. Construct a histogram of the ages of the pennies.
3. Calculate the mean of the ages of the pennies.
Have each student in each group randomly select a sample of 5 pennies from the 100 coins and calculate the mean
of the five ages of the coins chosen. Have the students then record their means on a number line. Have the students
repeat this process until all of the coins have been chosen.
4. How does the mean of the samples compare to the mean of the population (100 ages)? Repeat step 4 using a
sample size of 10 pennies. (As before, allow the students to work in groups.)
5. What is happening to the shape of the sampling distribution of the sample means as the sample size increases?
Understand one of the more remarkable theorems in all of mathematics, the Central Limit Theorem.
Recognize the relationship between the Normal distribution and the Central Limit Theorem.
Introduction
In the previous lesson, you learned that sampling is an important tool for determining the characteristics of a
population. When we constructed a distribution of sample means, we saw that the sample means clustered around
the true mean. As the sample size increased, the shape of that distribution became more and more Normal. Although
the true mean of the population was unknown, random sampling yielded a reliable estimate. It is now time to learn
how one of the most remarkable theorems in statistics will allow us to estimate what is true in a population without
having to repeatedly sample!
Central Limit Theorem
The Central Limit Theorem is perhaps the most important theorem in statistics. It basically confirms what might
be an intuitive truth to you by now: that as you increase the sample size for a random variable, the distribution of
the sample means better approximates a normal distribution.
Why is that idea so important? The reason is simple. Here is what this theorem allows us to do: If we can select a
single sample of a known size from our population and calculate its mean, we can use the Central Limit Theorem to
predict, within a defined degree of confidence, what the true population mean must be. And, more importantly, this
holds true no matter what the shape of the original distribution is (provided the population has a finite standard
deviation). That's pretty amazing.
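A short simulation can make the claim concrete. The Python sketch below assumes a strongly right-skewed population: an exponential distribution with mean 1 and standard deviation 1, chosen only for illustration. It draws 20,000 samples of size 36 and summarizes their means. The seed and the number of repetitions are arbitrary choices.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)  # fixed seed so the run is reproducible

# Assumed population for illustration: exponential with mean 1 and SD 1,
# which is strongly right-skewed (nothing like a normal curve).
def sample_mean(n: int) -> float:
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

n = 36
sample_means = [sample_mean(n) for _ in range(20_000)]

center = mean(sample_means)    # should be close to the population mean, 1
spread = pstdev(sample_means)  # should be close to 1 / sqrt(36), about 0.167
se = 1 / n ** 0.5
share = sum(abs(m - 1) < se for m in sample_means) / len(sample_means)

print(f"mean of sample means: {center:.3f}")
print(f"SD of sample means:   {spread:.3f} (sigma / sqrt(n) = {se:.3f})")
print(f"share within 1 SE:    {share:.3f} (about 0.68 for a normal curve)")
```

The population is far from normal. Even so, the sample means center on 1, their spread is close to \(\sigma/\sqrt{n}\), and roughly 68% of them land within one standard error, as a normal curve predicts.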
Before going any further, you should become familiar with (or reacquaint yourself with) the symbols that are
commonly used when dealing with population values, sample statistics, and statistics of a sampling distribution
of means. These symbols are shown in the table below. Note that the notation \(\bar{x}\) (x-bar) is used to represent each
value in a sampling distribution (rather than the random variable x) to indicate that each value is a sample mean.
TABLE 11.1:
                        Mean                 Standard Deviation                        Size
Population Parameter    \(\mu\)              \(\sigma\)
Sample Statistic        \(\bar{x}\)          \(s\)                                     \(n\)
Sampling Distribution   \(\mu_{\bar{x}}\)    \(S_{\bar{x}}\) or \(\sigma_{\bar{x}}\)
So how can we use the Central Limit Theorem to help us construct a sampling distribution without repeatedly
sampling? We use what we know about the population and our proposed sample size to sketch the theoretical
sampling distribution. Remember, to sketch a distribution we need to know its shape, center and spread.
Notes to remember:
Shape of the sampling distribution: As long as your sample size is 30 or greater, you may assume the
distribution of the sample means to be approximately normal. This rule of thumb applies regardless of the original
distribution of the random variable, although strongly skewed populations may need larger samples.
The mean of the distribution: The mean of a sampling distribution, as you saw in the last lesson, is the mean
of the population. Formally: \(\mu_{\bar{x}} = \mu\)
The standard error of the distribution: The standard deviation of the sample means can be estimated by dividing
the standard deviation of the population by the square root of the sample size. Formally: \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\)
So there you have it. You can use the three bullet points above to construct the sampling distribution. And then,
since the distribution will be approximately normal with a sample size of 30 or more, you can use what we've learned
about the area under a Normal curve to calculate the probability of observing a particular sample mean!
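The three notes translate almost directly into code. The Python sketch below builds the approximate sampling distribution as a normal curve with center \(\mu\) and spread \(\sigma/\sqrt{n}\), using the standard library's `statistics.NormalDist`. It then finds the chance that a sample mean exceeds a cutoff. The helper name `sampling_distribution` is hypothetical, and the usage plugs in the figures from Example A (\(\mu\) = 1 hr, \(\sigma\) = 1 hr, n = 64, cutoff 0.8 hr).

```python
from math import sqrt
from statistics import NormalDist

def sampling_distribution(mu: float, sigma: float, n: int) -> NormalDist:
    """Approximate sampling distribution of the sample mean.

    Shape: normal (the lesson's rule of thumb is n >= 30).
    Center: mu_xbar = mu.  Spread: sigma_xbar = sigma / sqrt(n).
    """
    return NormalDist(mu, sigma / sqrt(n))

# Figures from Example A: mu = 1 hr, sigma = 1 hr, n = 64 tests.
dist = sampling_distribution(mu=1.0, sigma=1.0, n=64)
p_over = 1 - dist.cdf(0.8)  # P(sample mean > 0.8 hr)

print("standard error:", dist.stdev)  # 0.125
print("P(mean > 0.8 hr):", round(p_over, 4))
```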
Example A
The time it takes a student to complete the mid-term for Algebra II is a bi-modal distribution with \(\mu = 1\) hr and
\(\sigma = 1\) hr. During the month of June, Professor Spence administers the test 64 times. What is the probability that the
average mid-term completion time for students during the month of June exceeds 48 minutes?
Solution
Important facts:
The sample size, n = 64, is greater than 30, so the Central Limit Theorem applies.
The mean of the sampling distribution equals the mean of the population; in other words, \(\mu_{\bar{x}} = \mu = 1\) hr.
The standard error of Professor Spence's sample mean, \(\sigma_{\bar{x}}\), can be calculated as \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\), where n = 64 (the
number of tests in the sample).
48 minutes is the same as \(\frac{48}{60} = 0.8\) hrs, so the range we are interested in is \(\bar{x} > 0.8\) hrs.
First calculate the standard error, using \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\):
\[\sigma_{\bar{x}} = \frac{1}{\sqrt{64}} = 0.125\]
Since the sampling distribution of the mean is approximately normal according to the CLT, we can use the standard
error to calculate the z-score of the minimum value in the relevant range, 0.80 hrs:
\[z = \frac{0.80 - 1}{0.125} = -1.60\]
Finally, we use the z-score probability table to find the probability of a value greater than z = -1.60. That
probability is approximately 0.9452, so there is about a 94.5% chance that the average completion time in June
exceeds 48 minutes.
Example B
Evan price-checked 123 online auction sellers to record their average asking price for his favorite game. According
to a major national price-checking site, the national average online auction cost for the game is $35.00 with a standard
deviation of $3.00. Evan found an average price of $34.86. How likely is a sample mean of $34.86 or less?
Solution
Since the sample size is greater than 30 (123 > 30), we can apply the CLT and treat the sampling distribution of the
mean as approximately normal.
The standard error is:
\[\sigma_{\bar{x}} = \frac{3}{\sqrt{123}} = \frac{3}{11.09} \approx 0.27\]
The z-score for Evan's price point of $34.86 is:
\[z = \frac{34.86 - 35}{0.27} = \frac{-0.14}{0.27} \approx -0.518\]
Consulting the z-score probability table, we learn that the area under the normal curve to the left of z = -0.52 is
0.3015, or 30.15%.
The probability that a sample of 123 sellers has a mean asking price of $34.86 or less is approximately 30.15%.
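The table calculation in Example B can be cross-checked in Python with `statistics.NormalDist`. Without rounding the standard error to 0.27, the z-score is about -0.518 and the probability comes out near 30.2%. That agrees closely with the table answer of 30.15%.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 35.00, 3.00, 123  # national mean and SD; Evan's 123 sellers
xbar = 34.86                     # Evan's sample mean

se = sigma / sqrt(n)             # standard error, about 0.27
z = (xbar - mu) / se             # about -0.518
p = NormalDist().cdf(z)          # P(sample mean <= $34.86)

print(f"SE = {se:.4f}, z = {z:.3f}, P = {p:.4f}")
```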
Example C
Mack asked 42 fellow high-school students how much they spent for lunch, on average. According to his research
online, the amount spent for lunch by high school students nationwide has \(\mu\) = $15, with \(\sigma\) = $9. We would assume
that Mack's random sample should fall within this sampling distribution. What is the probability that Mack's random
sample will have a mean that is within $0.01 of the national average?
Solution
There are a few important facts to note here:
Mack's sample is 42 students. Since 42 ≥ 30, he can safely assume that the sampling distribution of the sample
mean will be approximately normal, according to the Central Limit Theorem.
The range we are considering is $14.99 to $15.01, since that represents $0.01 above and below the mean.
The mean of the sampling distribution equals the mean of the population; in other words, \(\mu_{\bar{x}} = \mu\).
The standard error of Mack's sample mean, \(\sigma_{\bar{x}}\), can be calculated as \(\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\), where n = 42.
Let's start by finding the standard error, \(\sigma_{\bar{x}}\):
\[\sigma_{\bar{x}} = \frac{9}{\sqrt{42}} = \frac{9}{6.48} \approx 1.389\]
Since the sampling distribution of the mean for Mack's sample of 42 students can be assumed to be approximately
normal, and since we now know the standard error, 1.389, we can calculate the z-scores for the values at each end of
the range using \(z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}\):
\[z_1 = \frac{15.01 - 15.00}{1.389} \approx +0.01\]
\[z_2 = \frac{14.99 - 15.00}{1.389} \approx -0.01\]
Finally, we look up \(z_1\) and \(z_2\) in the z-score probability table and calculate the probability associated with the
range from z = -0.01 to z = 0.01. That value is 50.40% - 49.60% = 0.80%.
The probability that Mack's sample will have a mean within $0.01 of the population mean of $15.00 is a little less
than 1%.
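The answer to Example C is sensitive to rounding, which a quick Python check shows. The unrounded z-scores are only about ±0.0072. Using them gives a probability of roughly 0.57%, while rounding to ±0.01 for the table lookup gives about 0.80%. Both are "a little less than 1%", but the table method overstates the value somewhat.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 15.00, 9.00, 42
se = sigma / sqrt(n)     # standard error, about 1.389
dist = NormalDist(mu, se)  # approximate sampling distribution of the mean

# Probability that the sample mean lands between $14.99 and $15.01, no rounding.
p_exact = dist.cdf(15.01) - dist.cdf(14.99)

# Table method: z-scores rounded to two decimals (+/-0.01) before the lookup.
std_normal = NormalDist()
p_table = std_normal.cdf(0.01) - std_normal.cdf(-0.01)

print(f"SE = {se:.3f}, unrounded P = {p_exact:.4%}, table P = {p_table:.4%}")
```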
Lesson Summary
The Central Limit Theorem confirms the intuitive notion that as the sample size increases for a random variable, the
distribution of the sample means will begin to approximate a normal distribution, with the mean equal to the mean
of the underlying population and the standard deviation equal to the standard deviation of the population divided by
the square root of the sample size, n.
Vocabulary
The Central Limit Theorem states that if samples are drawn at random from any population with a finite
mean and standard deviation, then the sampling distribution of the sample means approximates a normal
distribution as the sample size increases; a common rule of thumb is that the approximation is adequate once the
sample size is 30 or more.
The sampling distribution of the sample means is a distribution of the means of multiple samples. It is
commonly treated as a normal distribution, though it is exactly normal only when the population itself is normally
distributed; otherwise, it is approximately normal when the sample size is large (30 or more, by the usual rule of thumb).
Points to Consider
1. The time it takes to drive from Cheyenne, WY, to Denver, CO, has a mean \(\mu\) of 1 hr and a standard deviation \(\sigma\) of 15 mins. Over the course of a month, a highway patrolman makes the trip 55 times. What is the probability that his average travel time exceeds 60 minutes?
2. Abbi polls 95 high school students for their G.P.A. According to the school, the G.P.A. of high school students has a mean of 3.0 and a standard deviation of 0.5. What is the probability that Abbi's random sample will have a mean within 0.01 of the population mean?
3. A recipe website has calculated that the time it takes to cook Sunday dinner has a mean \(\mu\) of 1 hr with a standard deviation \(\sigma\) of 25 mins. Over the course of a month, 172 users report their time spent cooking Sunday dinner. What is the probability that the average reported cooking time is less than 45 mins?
Solutions
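1. The mean of the sampling distribution, \(\mu_{\bar{x}}\), is the same as the population mean: 1 hr = 60 mins.
The standard deviation of the sampling distribution (the standard error) is
\[ \sigma_{\bar{x}} = \frac{15 \text{ mins}}{\sqrt{55}} = \frac{15}{7.42} = 2.02 \text{ min} \]
The 55 trips made by the patrolman exceed the rule-of-thumb minimum sample size of 30 for applying the CLT, so we may assume the sample means to be normally distributed.
\[ Z = \frac{60 - 60}{2.02} = \frac{0}{2.02} = 0 \]
Since half of a normal distribution lies above its mean, the probability that his average travel time exceeds 60 minutes is 0.5, or 50%.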
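2. The mean of the sampling distribution is the population mean, 3.0, and the standard error is \(\sigma_{\bar{x}} = \frac{0.5}{\sqrt{95}} = \frac{0.5}{9.75} \approx 0.05\). Since 95 exceeds 30, we may assume the sample means to be normally distributed.
The z-scores of the minimum and maximum values in the range of interest, 2.99 to 3.01, are:
\[ Z_1 = \frac{2.99 - 3.00}{0.05} = \frac{-0.01}{0.05} = -0.2 \]
\[ Z_2 = \frac{3.01 - 3.00}{0.05} = \frac{0.01}{0.05} = +0.2 \]
Referring to the z-score reference table, the z-scores -0.2 and 0.2 cover a range of approximately 15.86%.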
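3. The mean of the sampling distribution, \(\mu_{\bar{x}}\), is the same as the population mean: 1 hr = 60 mins.
The standard deviation of the sampling distribution (the standard error) is
\[ \sigma_{\bar{x}} = \frac{25 \text{ mins}}{\sqrt{172}} = \frac{25}{13.11} = 1.91 \text{ min} \]
The 172 users reporting cooking times exceed the rule-of-thumb minimum sample size of 30 for applying the CLT, so we may assume the sample means to be normally distributed.
\[ Z = \frac{45 - 60}{1.91} = \frac{-15}{1.91} = -7.85 \]
A z-score of -7.85 lies so far into the left tail that the probability of an average reported cooking time below 45 mins is practically zero.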
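The three solutions can be checked with the sketch below, again using `statistics.NormalDist` in place of the z-table. It keeps the standard errors unrounded, so Problem 2 comes out near 15.5% instead of the 15.86% obtained with a standard error rounded to 0.05, and Problem 3 gives a z-score near -7.87 rather than -7.85; the conclusions are unchanged.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def standard_error(sigma, n):
    return sigma / sqrt(n)


# 1. Drive time: mu = 60 min, sigma = 15 min, n = 55; P(mean > 60)
se1 = standard_error(15, 55)
p1 = 1 - Z.cdf((60 - 60) / se1)

# 2. G.P.A.: mu = 3.0, sigma = 0.5, n = 95; P(2.99 <= mean <= 3.01)
se2 = standard_error(0.5, 95)
p2 = Z.cdf((3.01 - 3.0) / se2) - Z.cdf((2.99 - 3.0) / se2)

# 3. Cooking time: mu = 60 min, sigma = 25 min, n = 172; P(mean < 45)
se3 = standard_error(25, 172)
z3 = (45 - 60) / se3
p3 = Z.cdf(z3)

print(f"1: SE = {se1:.2f}, P = {p1:.2f}")
print(f"2: SE = {se2:.4f}, P = {p2:.4f}")
print(f"3: SE = {se3:.2f}, z = {z3:.2f}, P = {p3:.2e}")
```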
1. A random sample of size 30 is selected from a known population with a mean of 13.2 and a standard deviation
of 2.1. Samples of the same size are repeatedly collected, allowing a sampling distribution of sample means
to be drawn.
a. What is the expected shape of the resulting distribution?
b. Where is the sampling distribution of sample means centered?
Learning Objectives
Introduction
This lesson introduces the branch of statistics called inferential statistics. Earlier, we used descriptive statistics to
organize and describe our data, or to explore relationships between quantitative and categorical variables. The goal
of inferential statistics is to use sample data to increase our knowledge about the entire population. The remainder
of this text deals with the different kinds of inferential statistical methods that you can use to test ideas about what
is true in a population. In this section, we will focus on estimation. Estimation is the inferential technique used to
estimate the true value of a population parameter, typically a mean, from a sample.
Confidence Intervals
A sample mean can be referred to as a point estimate of a population mean. We call a sample mean a point estimate because this single number is used as a plausible value of the population mean. Keep in mind that some error is associated with any estimate; the true population mean may be larger or smaller than the sample mean.
But not many of us would feel particularly confident in a point estimate. For example, let's say you wanted to know the average SAT score for students at a particular college. You asked a few students while visiting, and the sample mean was 1280. Would you feel comfortable saying, "The average SAT score at this school is 1280"? You would probably realize that there is some sampling error involved. The true average SAT score may be somewhat higher or somewhat lower.
An alternative to reporting a point estimate is identifying a range of possible values the parameter might take. This
range of possible values is known as a confidence interval. Associated with each confidence interval is a confidence
level. This level indicates the level of assurance you have that the resulting confidence interval encloses the unknown
population mean.
The general concept of confidence intervals is pretty intuitive: It is easier to predict that an unknown value will
lie somewhere within a wide range than to predict it will occur within a narrow range (a single value!).
A confidence interval is always centered on the mean of your sample. To construct the interval, you add and subtract a margin of error. The margin of error is found by multiplying the standard error of the mean, \(\frac{\sigma}{\sqrt{n}}\), by the critical z-value for the chosen confidence level, \(z_{\alpha/2}\):
\[ \text{margin of error} = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \]
The end result looks something like this: we are 95% confident that the true average SAT for this college falls
between 1210 and 1330. Sometimes, the confidence interval is expressed like this: (1210, 1330).
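One way to express this construction in Python, assuming the population standard deviation \(\sigma\) is known, is the hypothetical helper `z_confidence_interval` below; it finds \(z_{\alpha/2}\) with `NormalDist().inv_cdf` instead of a table. The usage line plugs in the numbers from Example A (n = 100, sample mean 65 in, \(\sigma\) = 1.5 in).

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Return (low, high) for x_bar +/- z_{alpha/2} * sigma / sqrt(n), sigma known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, about 1.96 for 95%
    margin = z * sigma / sqrt(n)              # margin of error
    return x_bar - margin, x_bar + margin


low, high = z_confidence_interval(65, 1.5, 100)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```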
What do we mean by confidence level? Common choices for the confidence level are 90%, 95%, and 99%. The
selection of a confidence level determines the probability that the confidence interval produced will contain the true
parameter value. So a confidence level of 99% is higher than a confidence level of 95%. The interval constructed
with 99% confidence will have a higher chance of containing the true mean than an interval constructed with 95%
confidence.
FIGURE 11.1
The confidence level derives from our understanding of the Central Limit Theorem. Think about the sampling distribution of the sample mean. If the sample size is at least thirty, the sampling distribution of the mean will be nearly normal. Therefore, any sample mean that we draw has a 95% chance of falling within 1.96 standard errors of the true population mean. (Remember your z-scores.)
So, if we want to construct an interval estimate for our population mean with 95% confidence, we want to add a
margin of error that corresponds to 1.96 standard errors.
So wouldn't we always want to construct an interval with the highest confidence level possible? Maybe not, and here is why. The more confidence in the interval, the wider it becomes. This means that you've lost precision, and this could be a problem. Let's read on to see why that is.
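To see the trade-off in numbers, the sketch below computes the critical value and the full interval width \(2 z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\) at 90%, 95%, and 99% confidence. The values \(\sigma\) = 1.5 and n = 100 are arbitrary illustrative choices; any fixed values show the same pattern of wider intervals at higher confidence.

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 1.5, 100          # illustrative values only
se = sigma / sqrt(n)         # standard error of the mean

widths = {}
for confidence in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    widths[confidence] = 2 * z * se                       # full width of the interval
    print(f"{confidence:.0%} confidence: z = {z:.3f}, width = {widths[confidence]:.3f}")
```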
Example A
Suppose the average height of a sample of 100 women is 5 ft 5 in (65 inches); in other words, \(\bar{X} = 65\) in. Within what range of heights can we expect the population mean to be, with 95% confidence? Assume a standard deviation for the population of 1.5 in.
Solution
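Here is what we know: \(n = 100\), \(\bar{X} = 65\) in, \(\sigma = 1.5\) in, and for 95% confidence \(z_{\alpha/2} = 1.96\).
\[ \text{margin of error} = 1.96 \cdot \frac{1.5}{\sqrt{100}} = 1.96 \cdot 0.15 \approx 0.29 \]
So we are 95% confident that the population mean height lies between about 64.71 and 65.29 inches.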
Suppose you had a sample of 40 bags of candy, each of which contains some number of pieces. The number of pieces per bag is said to be normally distributed. The mean number of candies in your sample is 38 pieces, and the standard deviation for bags in the population is 2 pieces. What is the average number of candies per bag in the population? To answer this question, you need to create a confidence interval. Let's assume that you have been asked to report a confidence interval with 99% confidence.
Solution
Since the population is normally distributed, the sampling distribution of the sample mean is also normal, so we can use z-scores.
The standard error of the mean is calculated as \(\frac{\sigma}{\sqrt{n}}\), so
\[ SEM = \frac{2}{\sqrt{40}} = \frac{2}{6.32} = 0.316 \]
Saying that you expect 99 out of every 100 intervals built this way to contain the population mean is the same as saying that the interval has a 99% confidence level.
\[ 38 \pm z_{0.005} \cdot 0.316 = 38 \pm 2.58 \cdot 0.316 = 38 \pm 0.81528 \]
Therefore, the confidence interval is approximately 37.18 to 38.82. We are 99% confident that the average
number of pieces in each bag of this candy in the population falls between 37 and 39 pieces (with rounding).
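The candy interval can be reproduced as follows. This sketch computes \(z_{0.005}\) directly with `NormalDist().inv_cdf(0.995)`, which gives about 2.576 instead of the rounded table value 2.58, so the endpoints come out near 37.19 and 38.81, essentially the same interval.

```python
from math import sqrt
from statistics import NormalDist

x_bar, sigma, n = 38, 2, 40          # sample mean, population sd, bags sampled
se = sigma / sqrt(n)                 # standard error, about 0.316
z = NormalDist().inv_cdf(0.995)      # z_{0.005}, about 2.576
margin = z * se                      # margin of error for 99% confidence

print(f"SE = {se:.3f}, z = {z:.3f}")
print(f"99% CI: ({x_bar - margin:.2f}, {x_bar + margin:.2f})")
```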
Well, not exactly
You might have noticed in all the examples of confidence intervals above that we used the population standard deviation (\(\sigma\)) to calculate the standard error. In practice, this almost never happens. In real life, you would usually know only the sample standard deviation (s). And what's the difference, you might ask? It's very simple: if you know only sample statistics, you can't use z in the confidence interval formula. You have to use a different statistic called t. You haven't been introduced to t yet, so we stuck with z. That meant we had to make the crazy assumption that you actually knew the standard deviation of the population! Think about how odd that would be: you are trying to estimate the unknown mean of a population, but somehow you know its standard deviation.
You will be introduced to the t statistic in the next chapter. The logic of the confidence interval will be the same, but the formula will change slightly:
\[ \text{confidence interval} = \bar{x} \pm \text{margin of error} \]
\[ \text{margin of error} = t \cdot \frac{s}{\sqrt{n}} \]
We wanted you to see this formula now, as it is the "real" formula for confidence intervals.
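As a preview, here is a sketch of the t-based formula. The ten measurements are made up for illustration, and because Python's standard library has no t distribution, the critical value t = 2.262 (95% confidence, 9 degrees of freedom) is taken from a standard t-table; with SciPy installed, `scipy.stats.t.ppf(0.975, 9)` returns the same value. Note that `statistics.stdev` computes the sample standard deviation s, dividing by n - 1.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of n = 10 measurements (not from the text).
data = [12.1, 11.4, 12.9, 12.3, 11.8, 12.6, 11.9, 12.2, 12.7, 11.6]

n = len(data)
x_bar = mean(data)
s = stdev(data)                  # sample standard deviation (divides by n - 1)
t_crit = 2.262                   # t-table value: 95% confidence, df = n - 1 = 9
margin = t_crit * s / sqrt(n)    # margin of error = t * s / sqrt(n)

print(f"x_bar = {x_bar:.3f}, s = {s:.3f}")
print(f"95% CI: ({x_bar - margin:.3f}, {x_bar + margin:.3f})")
```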
The most common mistake made when interpreting a confidence interval is claiming that the confidence level indicates the probability that the population mean lies within your particular interval! This is not true. Your interval either does, or does not, contain the true population mean.
What a 95% confidence interval means is that if you took 100 samples, all of the same size, and formed 100 confidence intervals, about 95 of these intervals would capture the population mean. The probability is attributed to the method, not to any particular confidence interval: in the long run, the confidence level is the percentage of intervals produced by this procedure that contain the population mean.
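The long-run interpretation can be demonstrated by simulation. The sketch below assumes a normal population with \(\mu\) = 65 and \(\sigma\) = 1.5 (values chosen only for illustration), draws 1,000 samples of size 60, builds a 95% z-interval from each, and reports the fraction of intervals that capture \(\mu\). That fraction should be close to 95%, while any single interval either contains \(\mu\) or it does not.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(2)  # fixed seed so the run is repeatable

mu, sigma, n = 65.0, 1.5, 60          # assumed population and sample size
z = NormalDist().inv_cdf(0.975)       # about 1.96 for 95% confidence
se = sigma / sqrt(n)                  # standard error of the mean

trials = 1000
hits = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(sample) / n
    # Count the interval as a "hit" if it contains the true population mean.
    if x_bar - z * se <= mu <= x_bar + z * se:
        hits += 1

print(f"percent hit: {hits / trials:.1%}")
```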
Suppose you take 50 samples of heights, plot the mean of each sample on a graph, and draw a line extending 2 standard errors to each side of each sample mean. If you were to do this for all 50 of the samples, you might end up with an image like the one below.
At the top of the image is a normal curve. Each of the lines below the curve has a length that represents a 95%
confidence interval, centered on the mean (in red) of the sample.
a. What is indicated by the lines that are all red in color?
b. What value is indicated by the vertical red center line on each interval?
c. What does the percent hit number mean? How would it change if you were to continue taking more and
more samples of 60 each?
Solutions
a. The lines that are colored entirely red belong to samples whose mean is more than 2 standard errors away from the population mean. In other words, the 95% confidence intervals from those two samples did not contain the population mean.
b. The vertical red center line represents the mean of each sample.
c. The percent hit number indicates the percentage of intervals that contained the population mean. If you were to continue plotting sample means and confidence intervals, the percent hit would approach 95%. In fact, here is the same graph after 1000 sample runs:
Lesson Summary
In this lesson, you learned that a sample mean is known as a point estimate, because this single number is used as a plausible value of the population mean. In addition to reporting a point estimate, you discovered how to calculate an interval of reasonable values based on the sample data. This interval estimator of the population mean is called the confidence interval. You can calculate this interval for the population mean by using the formula \(\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\). The value of \(z_{\alpha/2}\) is different for each confidence level of 90%, 95%, and 99%. You also learned that the probability is attributed to the method used to calculate the confidence interval.
Points to Consider
Does replacing \(\sigma\) with s change your chance of capturing the unknown population mean?
Is there a way to increase the chance of capturing the unknown population mean?
Vocabulary
A confidence interval is a range of values within which you expect to capture an unknown population parameter. The width of a confidence interval depends on the confidence level.
A confidence level is the probability value associated with a confidence interval.
More Practice
TABLE 11.2:
148, 298, 210, 213, 315, 129, 145, 148, 131, 281, 317
1. In a local teaching district, a technology grant is available to teachers in order to install a cluster of four
computers in their classrooms. From the 6,250 teachers in the district, 250 were randomly selected and asked
if they felt that computers were an essential teaching tool for their classroom. Of those selected, 142 teachers
felt that computers were an essential teaching tool.
(a) Calculate a 99% confidence interval for the proportion of teachers who felt that computers are an
essential teaching tool.
(b) How could the survey be changed to narrow the confidence interval but to maintain the 99%