Probability: The Mathematics of Chance
Random Phenomenon
DEFINITION
A phenomenon or trial is said to be random if individual outcomes are uncertain but the long-term pattern of many individual outcomes is predictable.
EXAMPLE 1

FIGURE 8.1 The proportion of tosses that give a head, plotted against the number of tosses (from 1 to 5000), for two trials of coin tossing.
Tossing a Coin
When you toss a coin, there are only two possible outcomes, heads or tails. Figure
8.1 shows the results of tossing a coin 5000 times twice. For each number of tosses
from 1 to 5000, we have plotted the proportion of those tosses that gave a head.
Trial A (red line) begins tail, head, tail, tail. You can see that the proportion of heads
for Trial A starts at 0 on the first toss, rises to 0.5 when the second toss gives a head,
then falls to 0.33 and 0.25 as we get two more tails. Trial B (blue line), on the other
hand, starts with five straight heads, so the proportion of heads is 1 until the sixth toss.
The proportion of tosses that produce heads is quite variable at first. Trial A
starts low and Trial B starts high. As we make more and more tosses, however, the
proportions of heads for both trials get close to 0.5 and stay there. If we made yet
a third trial at tossing the coin a great many times, the proportion of heads would
again settle down to 0.5 in the long run. We say that 0.5 is the probability of a head.
The probability 0.5 appears as a horizontal line on the graph.
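A short Python sketch (an illustration, not part of the text) that repeats this experiment. The running proportion of heads is erratic at first and settles near 0.5:

```python
import random

def proportion_of_heads(num_tosses, seed=0):
    """Simulate fair-coin tosses and return the running proportion of heads."""
    rng = random.Random(seed)
    heads = 0
    proportions = []
    for toss in range(1, num_tosses + 1):
        heads += rng.random() < 0.5  # True (1) counts as a head
        proportions.append(heads / toss)
    return proportions

props = proportion_of_heads(5000)
# The first proportion is 0 or 1; by toss 5000 it is close to 0.5.
print(props[0], props[9], props[-1])
```

Running this with different seeds reproduces the behavior of Trials A and B: wildly different starts, the same long-run destination.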
Probability
DEFINITION
The probability of any outcome of a random phenomenon is the proportion of times the outcome would occur in a very long series of repetitions.
The Probability applet (see Applet Exercise 1) animates Figure 8.1. It allows you
to choose the probability of a head and simulate any number of tosses of a coin
with that probability. Try it. You will see that the proportion of heads gradually settles down close to the probability. Equally important, you will also see that the proportion in a small or moderate number of tosses can be far from the probability.
Probability describes only what happens in the long run. Random phenomena are irregular and unpredictable in the short run.
We might suspect that a coin has probability 0.5 of coming up heads just because the coin has two sides. As Exercise 1 illustrates, such suspicions are not always correct. The idea of probability is empirical. That is, it is based on observation rather than theorizing. Probability describes what happens in very many trials,
and we must actually observe many trials to pin down a probability.
Gamblers have known for centuries that the fall of coins, cards, and dice displays clear patterns in the long run. In fact, a question about a gambling game
launched probability as a formal branch of mathematics. The idea of probability
rests on the observed fact that the average result of many thousands of chance outcomes can be known with near certainty. But a definition of probability as "long-run proportion" is vague. Who can say what "the long run" is? We can always toss
the coin another 1000 times. Instead, we give a mathematical description of how
probabilities behave, based on our understanding of long-run proportions. To see how
to proceed, think first about a very simple random phenomenon, tossing a coin
once. When we toss a coin, we cannot know the outcome in advance. What do we
know? We are willing to say that the outcome will be either heads or tails. We believe that each of these outcomes has probability 1/2. This description of coin tossing has two parts:
1. A list of possible outcomes: heads and tails.
2. A probability for each outcome: each outcome has probability 1/2.
This description is the basis for all probability models. Here is the vocabulary
we use.
Sample Space
DEFINITION
The sample space S of a random phenomenon is the set of all possible outcomes
that cannot be broken down further into simpler components.
Event
DEFINITION
An event is any outcome or collection of outcomes of a random phenomenon. That is, an event is a subset of the sample space.

Probability Model
DEFINITION
A probability model is a mathematical description of a random phenomenon consisting of two parts: a sample space S and a way of assigning probabilities to events.
The sample space S can be very simple or very complex. When we toss a coin
once, there are only two outcomes, heads and tails. So the sample space is S = {H, T}.
If we draw a random sample of 1000 U.S. residents age 18 and over, as opinion polls
often do, the sample space contains all possible choices of 1000 of the more than
230 million adults in the country. This S is extremely large: about 1.3 × 10^5794. Each member of S is a possible opinion poll sample, which explains the term sample space.
EXAMPLE 2
Probabilities can be hard to determine without detailing or diagramming the sample space. For example, E. P. Northrop notes that even the great eighteenth-century
French mathematician Jean le Rond d'Alembert tripped on the question: "In two
coin tosses, what is the probability that heads will appear at least once?" Because
the number of heads could be 0, 1 or 2, d'Alembert reasoned (incorrectly) that each
of those possibilities would have an equal probability of 1/3, and so he reached the
(wrong) answer of 2/3. What went wrong? Well, {0, 1, 2} could not be the fully-detailed sample space because "1 head" can happen in more than one way. For example, if you flip a dime and a penny once each, you could display the sample space
with a table:
                 Penny
                 H         T
    Dime    H    HH        HT
            T    TH        TT
Another way is with a tree diagram, in which all possible left-to-right pathways
through the branches generate outcomes.
Either way, we can see that the sample space has 4, not 3, equally likely outcomes:
{HH, HT, TH, TT}. With the table or tree diagram in front of us, you may already
see that the correct probability of at least 1 head is not 2/3, but 3/4.
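The same enumeration can be done in a few lines of Python (a sketch, not from the text):

```python
from itertools import product

# Sample space for tossing two coins (dime, penny): 4 equally likely outcomes.
sample_space = ["".join(pair) for pair in product("HT", repeat=2)]
print(sample_space)  # ['HH', 'HT', 'TH', 'TT']

# The event "at least one head" contains 3 of the 4 outcomes.
at_least_one_head = [o for o in sample_space if "H" in o]
prob = len(at_least_one_head) / len(sample_space)
print(prob)  # 0.75, not d'Alembert's 2/3
```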
EXAMPLE 3
Pair-a-Dice: Outcomes for Rolling Two Dice
Rolling two dice is a common way to lose money in casinos. There are 36 possible
outcomes when we roll two dice and record the up faces in order (first die, second
die). Figure 8.2 displays these outcomes. They make up the sample space S.
If the dice are carefully made, experience shows that each of the 36 outcomes
in Figure 8.2 comes up equally often. So a reasonable probability model assigns
probability 1/36 to each outcome.
In craps and most other games, all that matters is the sum of the spots on the
up faces. Let's change the random outcomes we are interested in: Roll two dice and
count the spots on the up faces. Now there are only 11 possible outcomes, from a
sum of 2 (for rolling a double 1) to a sum of 12 (for rolling a double 6). The sample space is now
S = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
Comparing this S with Figure 8.2 reminds us that we can change S by changing the detailed description of the random phenomenon we are describing. The outcomes in this new sample space are not equally likely, because there are six ways to
roll a 7 and only one way to roll a 12. The probability aspect of this example is developed further in Example 4.
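A quick Python sketch (an illustration, not from the text) that builds both sample spaces and shows why the sums are not equally likely:

```python
from itertools import product
from collections import Counter

# All 36 equally likely ordered outcomes (first die, second die).
ordered = list(product(range(1, 7), repeat=2))
print(len(ordered))       # 36

# Recording the sum of the spots instead changes S to 11 outcomes.
sums = Counter(a + b for a, b in ordered)
print(sorted(sums))       # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
print(sums[7], sums[12])  # 6 ways to roll a 7, only 1 way to roll a 12
```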
There are many ways to assign probabilities, so it is convenient to start with
some general rules that any assignment of probabilities to outcomes must obey.
These facts follow from the idea of probability as "the long-run proportion of
repetitions on which an event occurs." Some rules apply only to special kinds of
events, which we define here:
Complement of an Event
DEFINITION
The complement of an event A is the event that A does not occur, written as A^c.
Disjoint Events
DEFINITION
Two events are disjoint events if they have no outcomes in common. Disjoint
events are also called mutually exclusive events.
Independent Events
DEFINITION
Two events are independent events if the occurrence of one event has no effect on the probability of the occurrence of the other event.
1. The probability of any event is a number between 0 and 1. Any long-run
proportion is a number between 0 and 1, so any probability is also a number
between 0 and 1.
2. All possible outcomes together must have probability 1. Because some
outcome must occur on every trial, the probabilities of all the possible
outcomes must add to exactly 1.
3. The probability that an event does not occur is 1 minus the probability
that the event does occur. If an event occurs in (say) 70% of all trials, it fails
to occur in the other 30%. The probability that an event occurs and the
probability that it does not occur always add to 100%, or 1 (see Figure 8.3).
4. If two events are independent, then the probability that one event and the
other both occur is the product of their individual probabilities. Consider
event A = "red die is a 1 or 2" and event B = "green die is a 6." The red die
and green die logically have no influence over each other's outcomes, but we
can also look at Figure 8.2 and see that the chance of being in the top two
rows does not affect and is not affected by the chance of being in the sixth
column. And so Rule 4 for independent events applies, and the probability
that A and B both happen is the product (1/3)(1/6) = 1/18. Note that we can
also see from Figure 8.2 that the intersection or "overlap" of events A and
B happens in 2 of the 36 outcomes, and 2/36 = 1/18. Also, since A and B
overlap, they are not disjoint, even though the everyday use of the word
"independent" might (incorrectly) suggest that kind of separateness.
5. The probability that one event or the other occurs is the sum of their
individual probabilities minus the probability of their intersection. This
general addition rule makes sense if we look at Rule 5 in Figure 8.3. Simply
adding the probabilities of the two events would overshoot the answer
because we would be incorrectly "double-counting" the overlap. The way to
adjust for this is to subtract the overlap so that it is counted only once. Note
that the mathematical "or" is inclusive, which means that the event "A or B"
happens as long as at least one of the two events happens. In set theory,
it is the union of A and B, which includes A's and B's "separate property" as
well as their "community property." Consider event A = "red die is a perfect
square" (that is, 1 or 4), which has probability 2/6. Consider event B = "red die
is an odd number" (that is, 1, 3, or 5), which has probability 3/6. The intersection of
events A and B corresponds to rolling a 1, which has probability 1/6.
So the probability that A or B occurs is 2/6 + 3/6 - 1/6 = 4/6 = 2/3. Notice
that if events A and B had been disjoint, there would be no intersection to
worry about double-counting, and this rule would simply turn into the next one:
6. If two events are disjoint, the probability that one or the other occurs is
the sum of their individual probabilities. If one event occurs in 40% of all
trials, a different event occurs in 25% of all trials, and the two can never
occur together, then one or the other occurs on 65% of all trials because
40% + 25% = 65%.
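Rules 4 and 5 can be checked by brute-force enumeration of the 36 dice outcomes. A Python sketch (not from the text), using exact fractions to avoid rounding:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # (red, green): 36 equally likely

def prob(event):
    """Exact probability: the fraction of the 36 outcomes satisfying the event."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def A(o): return o[0] in (1, 2)   # red die shows 1 or 2
def B(o): return o[1] == 6        # green die shows 6

# Rule 4: A and B are independent, so P(A and B) = P(A)P(B) = (1/3)(1/6) = 1/18.
print(prob(lambda o: A(o) and B(o)), prob(A) * prob(B))
# Rule 5: P(A or B) = P(A) + P(B) - P(A and B).
print(prob(lambda o: A(o) or B(o)))
```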
We can use mathematical notation to state Rules 1 to 6 more concisely. We use
capital letters near the beginning of the alphabet to denote events. If A is any event,
we write its probability as P(A). Here are our probability facts in formal language.
As you apply these rules, remember that they are just another form of intuitively
true facts about long-run proportions.
Probability Rules

Rule 1. The probability P(A) of any event A satisfies 0 ≤ P(A) ≤ 1.
Rule 2. If S is the sample space in a probability model, then P(S) = 1.
Rule 3. The complement rule: P(A^c) = 1 − P(A).
Rule 4. The multiplication rule for independent events: if A and B are independent, then P(A and B) = P(A)P(B).
Rule 5. The general addition rule: P(A or B) = P(A) + P(B) − P(A and B).
Rule 6. The addition rule for disjoint events: if A and B are disjoint, then P(A or B) = P(A) + P(B).
EXAMPLE 4
Figure 8.2 displays the 36 possible outcomes of rolling two dice. For casino dice, it
is reasonable to assign the same probability to each of the 36 outcomes in Figure
8.2. Because all 36 outcomes together must have probability 1 (Rule 2), each outcome must have probability 1/36.
What is the probability of rolling a sum of 5? Because the event "roll a sum of
5" contains the four outcomes displayed in Figure 8.2, the addition rule for disjoint
events (Rule 6) says that its probability is
P(roll a sum of 5) = P(1 and 4) + P(2 and 3) + P(3 and 2) + P(4 and 1)
                   = 1/36 + 1/36 + 1/36 + 1/36
                   = 4/36 ≈ 0.111
Continue using Figure 8.2 in this way to get the full probability model (sample space
and assignment of probabilities) for rolling two dice and summing the spots on the
up faces. Here it is:
Outcome       2     3     4     5     6     7     8     9     10    11    12
Probability  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
This model assigns probabilities to individual outcomes. Note that Rule 2 is satisfied because all the probabilities add up to 1. To find the probability of an event,
just add the probabilities of the outcomes that make up the event. For example:
P(roll an odd sum) = P(3) + P(5) + P(7) + P(9) + P(11)
                   = 2/36 + 4/36 + 6/36 + 4/36 + 2/36
                   = 18/36 = 1/2
What is the probability of rolling any sum other than a 5? The "long way" to
find this would be
P(2) + P(3) + P(4) + P(6) + P(7) + P(8) + P(9) + P(10) + P(11) + P(12)

A much better way would be to use the complement rule (Rule 3):

P(roll a sum that is not 5) = 1 − P(roll a sum of 5)
                            = 1 − 4/36 = 32/36 ≈ 0.889
Another good time to use the complement rule would be to find the probability of getting a sum greater than 3. Compare the calculation of P(sum > 3) with
1 − P(sum ≤ 3).
For an example of Rule 5, let event A be "sum is odd" and event B be "sum is a
multiple of 3." We previously calculated P(A) = 1/2. You can verify that P(B) = 1/3
and P(A and B) = 1/6. And so, P(A or B) = 1/2 + 1/3 − 1/6 = 2/3.
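These values are easy to verify by enumerating all 36 rolls. A Python sketch (an illustration, not from the text):

```python
from fractions import Fraction
from itertools import product

sums = [a + b for a, b in product(range(1, 7), repeat=2)]  # all 36 sums

def prob(pred):
    """Exact probability that the sum satisfies the predicate."""
    return Fraction(sum(1 for s in sums if pred(s)), len(sums))

p_A = prob(lambda s: s % 2 == 1)                     # sum is odd
p_B = prob(lambda s: s % 3 == 0)                     # sum is a multiple of 3
p_both = prob(lambda s: s % 2 == 1 and s % 3 == 0)   # the intersection
print(p_A, p_B, p_both)        # 1/2 1/3 1/6
print(p_A + p_B - p_both)      # Rule 5: 2/3
```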
When the outcomes for a probability model are numbers, we can use a histogram to display the assignment of probabilities to the outcomes. Figure 8.4 is a
probability histogram of the probability model in Example 4. The height of each
bar shows the probability of the outcome at its base. Because the heights are probabilities, they add to 1. Think of Figure 8.4 as an idealized picture of the results
of very many rolls of two dice. As an idealized picture, it is perfectly symmetric.
FIGURE 8.4 Probability histogram showing the probability model for rolling two balanced dice and counting the spots on the up faces. (For example, the probability of an 8 is 5/36 ≈ 0.14.)
Example 4 illustrates one way to assign probabilities to events: Assign a probability to every individual outcome, then add these probabilities to find the probability of any event. This idea works well when there are only a finite (fixed and limited) number of outcomes.
Discrete Probability Model
DEFINITION
A probability model is called discrete if its sample space has a countable number
of outcomes. To assign probabilities in a discrete model, list the probability of
all the individual outcomes. By Rules 1 and 2, these probabilities must be numbers between 0 and 1 inclusive and must have sum 1.
The probability of any event is the sum of the probabilities of the outcomes making up the event.
EXAMPLE 5
Benford's Law
Faked numbers in tax returns, invoices, or expense account claims often display patterns that aren't present in legitimate records. Some patterns, like too many round numbers, are obvious and easily avoided by a clever crook. Others are more subtle. It is a
striking fact that the first (leftmost) digits of numbers in legitimate records often follow a model known as Benford's law. Here it is (note that a first digit can't be 0):
First digit    1      2      3      4      5      6      7      8      9
Probability  0.301  0.176  0.125  0.097  0.079  0.067  0.058  0.051  0.046
Check that the probabilities of the outcomes sum exactly to 1. This is therefore a legitimate discrete probability model. Investigators can detect fraud by comparing the
first digits in records such as invoices paid by a business with these probabilities. For
example, consider the events A = "first digit is 1" and B = "first digit is 2." Applying Rule 6 to the table of probabilities yields P(A or B) = 0.301 + 0.176, which is
0.477 (almost 50%). Crooks trying to "make up" the numbers probably would not
make up numbers starting with 1 or 2 this often.
Let us use some intuition about why first digits behave this way. Note that the
increase from 1 to 2 is an increase of 100%, but from 2 to 3 is only 50%, from 3
to 4 is only 33%, and so on. So data values that increase at an approximately constant percentage (which a lot of financial data does, for example) will naturally "spend
more time" (within any particular power of 10) taking on values whose left digit is
1, and successively less for larger left-digit numbers.
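This intuition can be tested with a small Python simulation (an illustration, not from the text): a hypothetical quantity that grows by a constant 1% per step sweeps through many powers of 10, and the tally of its leading digits comes out close to Benford's law.

```python
from collections import Counter

# Hypothetical data growing at a constant 1% per step.
value = 1.0
steps = 50000
counts = Counter()
for _ in range(steps):
    counts[str(value)[0]] += 1  # for value >= 1, the first character is the leading digit
    value *= 1.01

for digit in "123456789":
    print(digit, round(counts[digit] / steps, 3))
# The proportions come out close to 0.301 for 1, 0.176 for 2, ..., 0.046 for 9.
```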
As an aside, a less common way of expressing likelihood that you may encounter
in some gambling contexts is odds. The odds of an event A happening can be expressed as the ratio of the probability that A occurs to the probability that A does not occur.
EXAMPLE 6
You might think that first (leftmost) digits are distributed "at random" among the
digits 1 to 9. Under such a "discrete uniform distribution," the 9 possible outcomes
would then be equally likely. The sample space is S = {1, 2, 3, 4, 5, 6, 7, 8, 9}, and
the probability model is:
First digit    1    2    3    4    5    6    7    8    9
Probability   1/9  1/9  1/9  1/9  1/9  1/9  1/9  1/9  1/9
FIGURE 8.5 Probability histograms of two models for first digits: (a) digits equally likely, mean = 5; (b) Benford's law, mean = 3.441.
EXAMPLE 7
DNA Sequences

EXAMPLE 8
Baseball Lineups
A Major League Baseball team has 25 players on the active roster who are eligible
to play in a game. At the start of the game, the manager gives the officiating crew
a list of the team's 9 hitters who will begin the game and in what order they will
bat. Like Example 7, order matters here. Unlike Example 7, listing the same item
more than once is not allowed.
Any of the 25 players can be chosen to bat first, but only the remaining 24 players are available to be listed as the second batter, so that there are 25 X 24 choices
for the first two batters. Any of these choices leaves 23 batters for the third position, and so on. The number of different batting lineups is almost a trillion:
25 X 24 X 23 X 22 X 21 X 20 X 19 X 18 X 17 = 741,354,768,000
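This count can be reproduced in Python (a sketch, not part of the text), either with the standard library's permutation function or by multiplying the factors directly:

```python
import math

# Ordered lineups of 9 batters chosen from a 25-player roster, no repeats.
lineups = math.perm(25, 9)   # 25 × 24 × ... × 17
print(lineups)               # 741354768000

# The same count, built up one factor at a time.
product = 1
for factor in range(25, 25 - 9, -1):
    product *= factor
print(product == lineups)    # True
```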
Permutation
DEFINITION
A permutation is an ordered arrangement of k items that are chosen without replacement from a collection of n items. It can be notated as P(n, k) or nPk
and has the formula
nPk = n × (n − 1) × ⋯ × (n − k + 1), which is Rule B.
Examples 7 and 8 both involve counting the number of arrangements of distinct items. They can each be viewed as specific applications of the fundamental
principle of counting, and it is easier to think your way through the counting than
to memorize a recipe. Nevertheless, because these two situations occur so often, they
deserve to be given their own formal recognition as Rules A and B, respectively:
Counting Arrangements of Distinct Items

RULE A. Suppose we have a collection of n distinct items. We want to arrange k of these items in order, and any item can appear more than once in the arrangement. The number of possible arrangements is n × n × ⋯ × n = n^k.

RULE B. Suppose we have a collection of n distinct items. We want to arrange k of these items in order, and any item can appear no more than once in the arrangement. The number of possible arrangements is n × (n − 1) × ⋯ × (n − k + 1).

EXAMPLE 9
Four-Letter Words
Suppose you have 4 cards that are labeled T, S, O, and P. How many four-letter sequences can be created? Since there are only 4 cards, the only way to make a four-letter sequence is to use each letter exactly once, so there are no repeats. So this is
a permutation by Counting Rule B, with n and k both equal to 4. To think through
the problem, proceed like this: Any of the 4 letters can be chosen first; then any of
the 3 that remain can be chosen second; and so on. The number of permutations
is therefore 4 X 3 X 2 X 1 = 24.
It turns out that 6 of these 24 four-letter sequences are actually words in the
English language (see if you can find them all), so the probability that a permutation chosen at random will actually be a word would be 6/24 = 1/4.
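A Python sketch of the count (not from the text; the word set below is one possible answer to the exercise, so treat it as a spoiler):

```python
from itertools import permutations

letters = "TSOP"
sequences = {"".join(p) for p in permutations(letters)}
print(len(sequences))        # 4 × 3 × 2 × 1 = 24

# A hypothetical answer set for the six English words.
words = {"STOP", "SPOT", "TOPS", "POTS", "OPTS", "POST"}
print(len(words & sequences) / len(sequences))   # 6/24 = 0.25
```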
Example 9 shows us that the permutation of all n elements of a collection yields the
product of the first n positive integers. This expression is special enough to have its
own name, factorial, and is also used in Chapter 11.
Factorial
DEFINITION
For a positive integer n, "n factorial" is notated n! and equals the product of the
first n positive integers:
n! = n × (n − 1) × (n − 2) × ⋯ × 3 × 2 × 1
By convention, we define 0! to equal 1, not 0, which can be interpreted as saying there is one way to arrange zero items.
Factorial notation allows us to write a long string of multiplied factors very compactly.
For example, the expression for permutations in Rule B can now be rewritten as

nPk = n!/(n − k)!

(You can verify this is equivalent by "canceling" the factors common to the numerator and denominator. These common factors are the positive integers from 1 to n − k.)
EXAMPLE 10
In a typical state or multi-state lottery game, you win (at least a share of) the jackpot
as long as the collection of numbers you pick is the same collection that the Lottery
selects. Repetition is not allowed: The same number can't be picked twice in the same
drawing. Unlike permutations, order does not matter here. It doesn't matter what order the numbered ping pong balls come out of the mixing chamber; all that matters
is what numbers are selected to be in that drawing's group of winners.
So while we can't use the permutation approach of Example 8 here, we can use
a modification of it. The number of ordered sets will be much larger than the number of unordered sets since the lottery drawing {2, 14, 15, 21, 30, 33} is the same set
of balls as {15, 2, 30, 14, 33, 21}, for example. But from the technique of Example
9, we can see that there would be 6! ways to arrange any particular set of 6 distinct
balls. So the number of collections of lottery balls will simply be the number of permutations divided by k!. In a lottery where a jackpot requires choosing the right set
of 6 numbers out of a collection of 46 numbers, there are
(46)(45)(44)(43)(42)(41)/(6)(5)(4)(3)(2)(1) = 9,366,819

possible sets of numbers, and so the probability of your ticket winning (at least a share of) the jackpot is 1/9,366,819. The scenario of choosing an unordered subset of k balls from a collection of n different balls is called a combination.
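A Python sketch of the calculation (an illustration, not from the text):

```python
import math

# Unordered sets: divide the ordered draws by the 6! orderings of each set.
ordered_draws = math.perm(46, 6)             # 46 × 45 × 44 × 43 × 42 × 41
sets_of_numbers = ordered_draws // math.factorial(6)
print(sets_of_numbers)                       # 9366819
print(sets_of_numbers == math.comb(46, 6))   # True: the built-in combination count

print(1 / sets_of_numbers)                   # probability one ticket wins
```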
Combination
DEFINITION
A combination is an unordered collection of k items that are chosen without replacement from a collection of n distinct items. It can be notated as C(n, k) or nCk and has the formula
nCk = n!/(k!(n − k)!), which is Rule D.
If it's hard to remember the difference between combinations (Rule D) and permutations (Rule B), consider this: If you order a "combination platter" at a diner, you're
asking for a certain set of foods to be on your plate, but you don't care what order
they're in. Also, you can use this memory aid: "Permutations Presume Positions;
Combinations Concern Collections." For completeness, we also provide a formula
(Rule C) for unordered collections in which repetition is allowed, but we cannot
give a simple explanation in the space we have, and we will not emphasize it.
Counting Unordered Collections of Distinct Items
Rule C. Suppose we have a collection of n distinct items. We want to select k of
those items with no regard to order, and any item can appear more than once in
the collection. The number of possible collections is

(n + k − 1)!/(k!(n − 1)!)
This table summarizes all four ways we have seen of choosing k items from a collection of n distinct items:

Choosing k items from n distinct items

                         Repetition is allowed                 Repetition is not allowed
Order does matter        Rule A: n × n × ⋯ × n = n^k           Rule B (permutation): nPk = n!/(n − k)! = n × (n − 1) × ⋯ × (n − k + 1)
Order does not matter    Rule C: (n + k − 1)!/(k!(n − 1)!)     Rule D (combination): nCk = n!/(k!(n − k)!)

SPOTLIGHT 8.2
Combinatorics Calculations

Evaluating the expression n × (n − 1) × ⋯ × (n − k + 1) by hand is tedious for large values of n, but scientific and graphing calculators have built-in functions for these counts. For permutations, calculators typically provide an nPr function (often found under a PRB or probability menu); for combinations, use nCr instead of nPr.
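The same counts are available in Python's standard library; a sketch (not from the text) evaluating all four rules for a small case, n = 5 and k = 3:

```python
import math

n, k = 5, 3

rule_a = n ** k                                  # ordered, repetition allowed
rule_b = math.perm(n, k)                         # ordered, no repetition
rule_c = math.factorial(n + k - 1) // (
    math.factorial(k) * math.factorial(n - 1))   # unordered, repetition allowed
rule_d = math.comb(n, k)                         # unordered, no repetition

print(rule_a, rule_b, rule_c, rule_d)            # 125 60 35 10
```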
FIGURE 8.6 This spinner chooses a number between 0 and 1 at random. That is, it is equally likely to stop at any point on the circle.
Density Curve
DEFINITION
A density curve is a curve that is always on or above the horizontal axis and has total area exactly 1 underneath it. The probability of an event is the area under the curve and above the values on the horizontal axis that make up the event.
EXAMPLE 11
The random-number generator will spread its output uniformly across the entire interval from 0 to 1 if we allow it to generate many numbers. The results of many trials are represented by the density curve of a uniform probability model. This density
curve appears in red in Figure 8.7. It has height 1 over the interval from 0 to 1, and
height 0 everywhere else. The area under the density curve is 1, the area of a square
with base 1 and height 1. The probability of any event is the area under the density curve and above the event in question.
As Figure 8.7a illustrates, the probability that the random-number generator produces a number X between 0.3 and 0.7 inclusive is

P(0.3 ≤ X ≤ 0.7) = 0.4

because the rectangular area under the density curve and above the interval from
0.3 to 0.7 is 0.4. The area of a rectangle is the product of height and length, and the
height of this density curve is 1, so the probability of any interval of outcomes will
just be the length of the interval: 0.7 − 0.3 = 0.4.
Also, we can apply Probability Rule 6 to non-overlapping intervals such as:

P(X < 0.5 or X > 0.8) = P(X < 0.5) + P(X > 0.8)
                      = 0.5 + 0.2
                      = 0.7
The last event consists of two nonoverlapping intervals, so the total area above the
event is found by adding two areas, as illustrated by Figure 8.7b. This assignment
of probabilities obeys all of our rules for probability.
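Because the density has height 1, these probabilities reduce to interval lengths; a minimal Python sketch (not from the text):

```python
# For the uniform density on [0, 1], the probability of an event made of
# non-overlapping sub-intervals is just the total length of those intervals.
def uniform_prob(intervals):
    return sum(hi - lo for lo, hi in intervals)

print(round(uniform_prob([(0.3, 0.7)]), 10))              # 0.4
print(round(uniform_prob([(0.0, 0.5), (0.8, 1.0)]), 10))  # 0.7
```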
EXAMPLE 12

The standard deviation of the sampling distribution of the sample proportion is

√(p(1 − p)/n) = 0.01 (approximately)
The 68-95-99.7 rule now gives probabilities for the value of p̂ from a single SRS.
The probability is 0.95 that p̂ lies between 0.58 and 0.62. Figure 8.8 shows this probability as an area under the normal density curve.
All that is new is the language of probability. "Probability is 0.95" is shorthand
for "95% of the time in a very large number of samples."
FIGURE 8.8 The normal sampling distribution, with mean 0.60 and standard deviation 0.01. The probability 0.95 is the area between 0.58 and 0.62.
Bet A's average payoff works out to $5 per bet. Bet B, on the other hand, pays out $10,000 on 1/10 of all bets in the long run. So bet B's average payoff is

$10,000 × (1/10) + $0 × (9/10) = $1000
If you can place many bets, you should certainly choose B. Here is a general definition of the kind of "average outcome" we used to compare the two bets.
Mean of a Discrete Probability Model
DEFINITION
Suppose that the possible outcomes x1, x2, ..., xk in a sample space S are numbers, and that pj is the probability of outcome xj. The mean μ of this discrete probability model is

μ = x1p1 + x2p2 + ⋯ + xkpk
In Chapter 5, we met the mean x̄, the average of n observations that we actually have in hand. The mean μ, on the other hand, describes the probability model
rather than any one collection of observations. The Greek letter mu (μ) is pronounced
"myoo." You can think of μ as a theoretical mean that gives the average outcome
we expect in the long run. You will sometimes see the mean of a probability model
called the expected value. This isn't a very helpful name, because we don't necessarily expect the outcome to be close to the mean.
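As a concrete check of this weighted-average definition, a Python sketch (not from the text) computes the mean of the two-dice sum model of Example 4:

```python
from fractions import Fraction

# Probability model for the sum of two dice: P(s) = (6 - |s - 7|)/36.
model = {s: Fraction(6 - abs(s - 7), 36) for s in range(2, 13)}

print(sum(model.values()))             # 1 (Rule 2 is satisfied)
mu = sum(x * p for x, p in model.items())
print(mu)                              # 7
```

Each outcome is weighted by its probability, so the likely middle sums pull the mean to 7, the center of the symmetric histogram.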
EXAMPLE 13
First Digits
If first digits in a set of records appear "at random," the probability model for the
first digit is as in Example 6:
First digit    1    2    3    4    5    6    7    8    9
Probability   1/9  1/9  1/9  1/9  1/9  1/9  1/9  1/9  1/9

The mean of this model is

μ = (1)(1/9) + (2)(1/9) + ⋯ + (9)(1/9) = 45/9 = 5
If, on the other hand, the records obey Benford's law, the distribution of the first digit is
First digit    1      2      3      4      5      6      7      8      9
Probability  0.301  0.176  0.125  0.097  0.079  0.067  0.058  0.051  0.046
The mean is

μ = (1)(0.301) + (2)(0.176) + (3)(0.125) + (4)(0.097) + (5)(0.079) +
    (6)(0.067) + (7)(0.058) + (8)(0.051) + (9)(0.046)
  = 3.441
The means reflect the greater probability of smaller first digits under Benford's law.
We have marked the means on the probability histograms in Figure 8.5. Because the
histogram for random digits is symmetric, the mean lies at the center of symmetry.
We can't locate the mean of the right-skewed Benford's law model by eye; calculation is needed.
What about continuous probability models? Think of the area under a density
curve as being cut out of solid homogeneous material. The mean μ is the point at
which the shape would balance. Figure 8.9 illustrates this interpretation of the mean.
The mean lies at the center of symmetric density curves such as the uniform density
in Figure 8.7 and the normal curve in Figure 8.8. Exact calculation of the mean of a
distribution with a skewed density curve requires advanced mathematics. The idea that
the mean is the balance point of the probabilities applies to discrete models as well
(see Section 5.4), but in the discrete case we have a formula that gives us this point.
The mean μ is an average outcome in two senses. The definition for discrete
models says that it is the average of the possible outcomes not weighted equally but
weighted by their probabilities. More likely outcomes get more weight in the average. An important fact of probability, the law of large numbers, says that μ is the
average outcome in another sense as well.
Law of Large Numbers
THEOREM
Observe any random phenomenon having numerical outcomes with finite mean
μ. According to the law of large numbers, as the random phenomenon is repeated a large number of times:

• The proportion of trials on which each outcome occurs gets closer and closer
to the probability of that outcome, and
• The mean x̄ of the observed values gets closer and closer to μ.
These facts can be stated more precisely and then proved mathematically. The
law of large numbers brings the idea of probability to a natural completion. We first
observed that some phenomena are random in the sense of showing long-run regularity. Then we used the idea of long-run proportions to motivate the basic laws of
probability. Those laws are mathematical idealizations that can be used without interpreting probability as proportion in many trials. Now the law of large numbers
tells us that in many trials the proportion of trials on which an outcome occurs will
always approach its probability.
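A small simulation makes the law of large numbers visible. This Python sketch (not from the text) rolls two dice many times; the observed average approaches the model mean of the two-dice sum, which works out to μ = 7:

```python
import random

rng = random.Random(42)

# The average of many two-dice rolls drifts toward the model mean mu = 7.
for n in (100, 10000, 100000):
    total = sum(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n))
    print(n, total / n)
```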
The law of large numbers also explains why gambling can be a business. The
winnings (or losses) of a gambler on a few plays are uncertain; that's why gambling
is exciting. It is only in the long run that the mean outcome is predictable. The house
plays many tens of thousands of times. So the house, unlike individual gamblers,
can count on the long-run regularity described by the law of large numbers. The average winnings of the house on tens of thousands of plays will be very close to the
mean of the distribution of winnings. Needless to say, gambling games have mean
outcomes that guarantee the house a profit.
We know that the simplest description of a distribution of data requires both a
measure of center and a measure of spread. The same is true for probability models. The mean is the average value for both a set of data and a discrete probability
model. All the observations are weighted equally in finding the mean x̄ for data, but
the values are weighted by their probabilities in finding the mean μ of a probability
model. The measure of spread that goes with the mean is the standard deviation.
For data, the standard deviation s is the square root of the average squared deviation
of the observations from their mean. We apply exactly the same idea to probability
models, using probabilities as weights in the average. Here is the definition.
Standard Deviation of a Discrete Probability Model
DEFINITION
Suppose that the possible outcomes x1, x2, ..., xk in a sample space S are numbers, and that pj is the probability of outcome xj. The standard deviation σ of a
discrete probability model with mean μ is

σ = √((x1 − μ)²p1 + (x2 − μ)²p2 + ⋯ + (xk − μ)²pk)

EXAMPLE 14
First Digits
If the first digits in a set of records obey Benford's law, the discrete probability model
is
First digit    1      2      3      4      5      6      7      8      9
Probability  0.301  0.176  0.125  0.097  0.079  0.067  0.058  0.051  0.046
We saw in Example 13 that the mean is μ = 3.441. To find the standard deviation,

σ = √((1 − 3.441)²(0.301) + (2 − 3.441)²(0.176) + ⋯ + (9 − 3.441)²(0.046)) ≈ 2.46
You can follow the same pattern to find the standard deviation of the equally likely
model and show that the Benford's law model is less spread out than the equally
likely model.
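A Python sketch (an illustration, not from the text) that carries out both calculations and confirms the comparison:

```python
import math

benford = dict(zip(range(1, 10),
               [0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046]))
uniform = {d: 1 / 9 for d in range(1, 10)}

def mean_sd(model):
    """Mean and standard deviation of a discrete probability model."""
    mu = sum(x * p for x, p in model.items())
    sd = math.sqrt(sum((x - mu) ** 2 * p for x, p in model.items()))
    return mu, sd

print(mean_sd(benford))   # mean 3.441, sd roughly 2.46
print(mean_sd(uniform))   # mean 5.0, sd roughly 2.58: more spread out
```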
Finding the standard deviation of a continuous probability model usually requires advanced mathematics (calculus). Chapter 5 told us the answer in one important case: The standard deviation of a normal curve is the distance from the center
(the mean) to the change-of-curvature points on either side.
SPOTLIGHT
Birthday Coincidences

In a group of 23 people, the probability that all 23 birthdays are different is

(364/365) × (363/365) × ⋯ × (343/365) = (365 × 364 × ⋯ × 343)/365^23

which works out to slightly less than 1/2, so the chance that at least two people share a birthday is better than 50-50.
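A one-line Python check of the birthday product (not from the text):

```python
from math import prod

# Chance that 23 people have 23 different birthdays (365-day year assumed).
p_all_different = prod((365 - i) / 365 for i in range(23))
print(round(p_all_different, 3))       # 0.493
print(round(1 - p_all_different, 3))   # 0.507: a match is more likely than not
```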
The word "limit" in Central Limit Theorem reflects that the normal curve is the
limit or target shape to which the sampling distribution gets closer and closer as the
sample size increases. The theorem also tells us the mean of the sampling distribution, and the mean is a measure of "central" tendency.
Central Limit Theorem
THEOREM
Draw an SRS of size n from any large population with mean μ and finite standard deviation σ. Then:

• The mean of the sampling distribution of x̄ is μ.
• The standard deviation of the sampling distribution of x̄ is σ/√n.
• The Central Limit Theorem says that the sampling distribution of x̄ is approximately normal when the sample size n is large (n ≥ 30).
The first two parts of this statement can be proved from the definitions of
the mean and the standard deviation. They are true for any sample size n. The
Central Limit Theorem is a much deeper result. Pay attention to the fact that the
standard deviation of a mean decreases as the number of observations n increases.
Together with the Central Limit Theorem, this makes exact two general statements
that help us understand a wide variety of random phenomena:
Averages are less variable than individual observations.
Averages are more normal than individual observations.
The Central Limit Theorem applet allows you to watch the Central Limit Theorem in action: It starts with a distribution that is strongly skewed, not at all normal.
As you increase the size of the sample, the distribution of the mean x̄ gets closer
and closer to the normal shape.
Consider dice. Rolls of a single die would have a uniformly flat probability histogram, with each of the six possible values having the probability 1/6. Now consider the mean of rolling a pair of dice. The probability model for the mean of two
dice simply divides by 2 the outcome sum in Example 4. (So, the probability that
the mean of two dice equals 4.5 must be the same as the probability that their sum
equals 9.) And the histogram in Figure 8.4 is certainly less variable and closer to
looking "normal" than is the flat histogram for rolling a single die.
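This claim about the two-dice mean is easy to verify by enumerating all 36 equally likely rolls. A short sketch in Python (exact fraction arithmetic; the variable names are mine):

```python
from collections import Counter
from fractions import Fraction

# Probability model for the mean of two fair dice: tally the average
# of every one of the 36 equally likely (die1, die2) pairs.
mean_model = Counter()
for d1 in range(1, 7):
    for d2 in range(1, 7):
        mean_model[Fraction(d1 + d2, 2)] += Fraction(1, 36)

# The mean equals 4.5 exactly when the sum equals 9, which happens for
# the four pairs (3,6), (4,5), (5,4), (6,3): probability 4/36.
print(mean_model[Fraction(9, 2)])  # 1/9
```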
EXAMPLE
15
For individual women's heights with standard deviation σ = 2.5 inches, an SRS of n = 25 women gives a sample mean x̄ whose standard deviation is

σ/√n = 2.5/√25 = 0.5 inch
The standard deviation σ describes the variation when we measure many individual women. The standard deviation σ/√n of the distribution of x̄ describes the variation in the average heights of samples of women when we take many samples. The average height is less variable than individual heights.
Figure 8.10 compares the two distributions: Both are normal and both have the
same mean, but the average height of 25 randomly chosen women is much less
spread out. For example, the 68-95-99.7 rule says that 95% of all averages x̄ lie between 63.5 and 65.5 inches, because 2 standard deviations of x̄ make 1 inch. This 2-inch span is just one-fifth as wide as the 10-inch span that catches the middle 95% of heights for individual women.
[Figure 8.10: Normal curves for individual heights and for average heights x̄ of samples of 25 women. Horizontal axis: Height (inches), from 57 to 72.]
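The comparison drawn from Figure 8.10 rests on quick arithmetic with the 68-95-99.7 rule. A sketch using the example's values (mean 64.5 inches and standard deviation 2.5 inches for individual heights):

```python
import math

mu, sigma, n = 64.5, 2.5, 25           # model for individual heights (inches)
sd_mean = sigma / math.sqrt(n)         # standard deviation of x-bar: 0.5 inch

# Middle 95% = within 2 standard deviations of the mean (68-95-99.7 rule)
lo_ind, hi_ind = mu - 2 * sigma, mu + 2 * sigma        # individuals: 59.5 to 69.5
lo_avg, hi_avg = mu - 2 * sd_mean, mu + 2 * sd_mean    # averages:   63.5 to 65.5
print((hi_ind - lo_ind) / (hi_avg - lo_avg))           # 5.0: one-fifth as wide
```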
The Central Limit Theorem says that in large samples the sample mean x̄ is approximately normal. In Figure 8.10, we show a normal curve for x̄ even though sample size 25 is not very large. Is that acceptable? How large a sample is needed for the Central Limit Theorem to work depends on how far the model we start with is from a normal curve. The closer to normality we start, the quicker the distribution of the sample mean becomes normal. In fact, if individual observations follow a normal curve, the sampling distribution of x̄ is exactly normal for any sample size. So Figure 8.10 is accurate. The Central Limit Theorem is a striking result because as n gets large it works for any model we may start with, no matter how far from normal. Here is an example that starts very far from normal.
EXAMPLE
16
An American roulette wheel has 38 slots, of which 18 are black, 18 are red, and 2 are
green. The dealer spins the wheel and whirls a small ball in the opposite direction
within the wheel. Gamblers bet on where the ball will come to rest (see Figure 8.11).
One of the simplest wagers chooses red (or black). A bet of $1 on red pays off an additional $1 if the ball lands in a red slot. Otherwise, the player loses his $1. The two
green slots always belong to the house.
FIGURE 8.11  A gambler may win or lose at roulette, but in the long run the casino always wins. (Ingram Publishing/PictureQuest.)
Lou bets on red. He wins if the ball stops in one of the 18 red slots. He loses if
it lands in one of the 20 slots that are black or green. Because casino roulette wheels
are carefully balanced so that all slots are equally likely, the probability model is
Net outcome for gambler:   Win $1    Lose $1
Probability:               18/38     20/38

The mean of this probability model is

μ = ($1)(18/38) + (−$1)(20/38) = −$2/38 = −$0.053 (a loss of 5.3 cents)
The law of large numbers says that the mean μ is the average outcome of a very large number of individual bets. In the long run, gamblers will lose (and the casino will win) an average of 5.3 cents per bet. We can similarly find the standard deviation for a single $1 bet on red:

σ = √[(1 − (−0.053))²(18/38) + (−1 − (−0.053))²(20/38)]
  = √[(1.053)²(18/38) + (−0.947)²(20/38)]
  = √0.9972 = 0.9986
Lou certainly starts far from any normal curve. The probability model for each
bet is discrete, with just two possible outcomes. Yet the Central Limit Theorem says
that the average outcome of many bets follows a normal curve. Lou is a habitual
gambler who places fifty $1 bets on red almost every night. Because we know the
probability model for a bet on red, we can simulate Lou's experience over many
nights at the roulette wheel. The histogram in Figure 8.12 shows Lou's average winnings for 1000 nights. As the Central Limit Theorem says, the distribution looks
normal.
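A simulation like the one behind Figure 8.12 takes only a few lines of Python. This is a sketch, not the book's code; the seed and the counts are arbitrary choices:

```python
import random
import statistics

random.seed(8)  # arbitrary seed, for a reproducible run

def night_average(bets=50):
    """Average outcome of `bets` $1 bets on red: +1 with prob 18/38, else -1."""
    return sum(1 if random.random() < 18 / 38 else -1 for _ in range(bets)) / bets

nightly = [night_average() for _ in range(1000)]  # 1000 nights, as in Figure 8.12
print(round(statistics.mean(nightly), 3))   # close to mu = -0.053
print(round(statistics.stdev(nightly), 3))  # close to 0.9986/sqrt(50) = 0.141
```

A histogram of `nightly` would look like Figure 8.12: roughly normal, centered a little below zero.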
EXAMPLE
17
The normal curve in Figure 8.12 comes from the Central Limit Theorem and the values of the mean μ and standard deviation σ in Example 16. It has

mean = μ = −0.053
standard deviation = σ/√n = 0.9986/√50 = 0.141
Apply the 99.7 part of the 68-95-99.7 rule: Almost all average nightly winnings
will fall within 3 standard deviations of the mean, that is, between
-0.053 - (3)(0.141) = -0.476
and
-0.053 + (3)(0.141) = 0.370
FIGURE 8.12  A gambler's winnings in a night of 50 bets on red or black in roulette vary from night to night. Here is the distribution for 1000 nights. It is approximately normal. [Horizontal axis: average winnings per bet, from −0.5 to 0.5.]
Lou's total winnings after 50 bets of $1 each will then almost surely fall between

(50)(−0.476) = −$23.80

and

(50)(0.370) = $18.50
Lou may win as much as $18.50 or lose as much as $23.80. Some find gambling exciting because the outcome, even after an evening of bets, is uncertain. It is
possible to walk away a winner. It's all a matter of luck.
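The range above is simple arithmetic and can be double-checked in a couple of lines (values taken from Example 17):

```python
mu, sd = -0.053, 0.141             # nightly-average model from Example 17
low, high = mu - 3 * sd, mu + 3 * sd
print(round(low, 3), round(high, 3))            # per-bet average: -0.476 0.37
print(round(50 * low, 1), round(50 * high, 1))  # 50-bet totals:   -23.8 18.5
```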
The casino, however, is in a different position. It doesn't want excitement, just
a steady income.
EXAMPLE
18
The casino bets with all its customers: perhaps 100,000 individual bets on black or red in a week. The Central Limit Theorem guarantees that the distribution of average customer winnings on 100,000 bets is very close to normal. The mean is still the mean outcome for one bet, −0.053, a loss of 5.3 cents per dollar bet. The standard deviation is much smaller when we average over 100,000 bets. It is

σ/√n = 0.9986/√100,000 = 0.003
Here is what the spread in the average result looks like after 100,000 bets:

Spread = mean ± 3 standard deviations
       = −0.053 ± (3)(0.003)
       = −0.053 ± 0.009
       = −0.062 to −0.044
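The same spread can be checked in code (values from Example 16; n = 100,000 is the assumed weekly bet count):

```python
import math

mu, sigma = -0.053, 0.9986     # mean and sd of one $1 bet on red (Example 16)
n = 100_000                    # assumed number of bets the casino covers
sd_avg = sigma / math.sqrt(n)  # about 0.00316

low, high = mu - 3 * sd_avg, mu + 3 * sd_avg
print(f"average winnings per bet: {low:.3f} to {high:.3f}")  # -0.062 to -0.044
```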
Because the casino covers so many bets, the standard deviation of the average
winnings per bet becomes very small. And because the mean is negative, almost all
outcomes will be negative. The gamblers' losses and the casino's winnings are almost certain to average between 4.4 and 6.2 cents for every dollar bet.
The gamblers who collectively place those 100,000 bets will lose money. The probable range of their losses is

(100,000)(−0.062) = −$6200   to   (100,000)(−0.044) = −$4400
The gamblers are almost certain to lose (and the casino is almost certain to take in) between $4400 and $6200 on those 100,000 bets. What's more, the range of average outcomes continues to narrow as more bets are made. That is how a casino
can make a business out of gambling. According to Forbes magazine, the third richest American (with an estimated worth of $28 billion) in 2007 was casino mogul
Sheldon Adelson.
In Chapter 7, we based a confidence interval for a population proportion p on the fact that the sampling distribution of a sample proportion p̂ is close to normal for large samples. The Central Limit Theorem applies to means. How can we apply it to proportions? By seeing that a proportion is really a mean. This is our final example of the Central Limit Theorem. While it is more theoretical than our other examples, it gives us an important foundation.
EXAMPLE
19
The Sampling Distribution of a Proportion
If we can express the sample proportion of successes as a sample mean, we can apply the tools we have learned to derive the formula (in Section 7.7) for the standard deviation of the sample proportion.

Consider an SRS of size n from a population in which a proportion p of individuals have a particular trait. For each of the n individuals, define a simple numerical variable xᵢ that equals 1 for a success and 0 for a failure. For example, if the third individual has the trait of interest, then x₃ = 1. The sum of all n of the xᵢ values is the total number of successes (that is, people who have the trait of interest). So the proportion p̂ of successes is given by

p̂ = (number of successes)/n = (x₁ + x₂ + ⋯ + xₙ)/n = x̄
So p̂ is really a mean, and so its sampling distribution (by the Central Limit Theorem) is close to normal when the sample size n is large (n > 30). Because p̂ is the mean of the xᵢ, we can find the mean and standard deviation of p̂ from the mean and standard deviation of one observation xᵢ. Each observation has probability p of being a success, so the probability model for one observation is
Outcome:       Success, xᵢ = 1    Failure, xᵢ = 0
Probability:   p                  1 − p

The mean of one observation is

μ = (1)(p) + (0)(1 − p) = p
In the same way, after a bit more algebra, the tools of Section 8.5 show that the standard deviation of one observation xᵢ is

σ = √[(1 − p)²(p) + (0 − p)²(1 − p)] = √(p(1 − p))
From the Central Limit Theorem (Section 8.6), the standard deviation of the mean of n observations is σ/√n, so we simply substitute in our expression for σ and obtain

σ/√n = √(p(1 − p))/√n = √(p(1 − p)/n)
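A simulation makes the conclusion concrete: sample proportions behave like means, with standard deviation close to √(p(1 − p)/n). A sketch, with p and n chosen arbitrarily:

```python
import math
import random
import statistics

random.seed(7)       # arbitrary seed, for a reproducible run
p, n = 0.45, 100     # assumed trait probability and sample size

# Each sample proportion p-hat is the mean of n zero/one observations.
phats = [sum(random.random() < p for _ in range(n)) / n for _ in range(5000)]

predicted = math.sqrt(p * (1 - p) / n)      # sqrt(p(1-p)/n), about 0.0497
print(round(statistics.mean(phats), 3))     # close to p = 0.45
print(round(statistics.stdev(phats), 3))    # close to 0.05
```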
σ = √[(x₁ − μ)²p₁ + (x₂ − μ)²p₂ + ⋯ + (xₖ − μ)²pₖ]  (p. 266)
SKILLS CHECK
1. You read in a book on poker that the probability of
being dealt three of a kind in a five-card poker hand is
1/50. What does this mean?
(a) If you deal thousands of poker hands, the fraction
of them that contain three of a kind will be very close
to 1/50.
(b) If you deal 50 poker hands, exactly one of them
will contain three of a kind.
(c) If you deal 10,000 poker hands, exactly 200 of them
will contain three of a kind.
2. If two coins are flipped and then a die is rolled, the sample space would have ________ different outcomes.
Exercises 3 to 5 use this probability model for the blood
type of a randomly chosen person in the United States:
Blood type     O      A      B      AB
Probability    0.45   0.40   0.11   ?
CHAPTER 8 EXERCISES
Discussion
Challenge