
Business Math and Statistics

Robert N. Holt, Ph.D., C.P.A.

Copyright 2015
Ivy Software

TABLE OF CONTENTS

Chapter One - Basic Business Math
Chapter Two - Forecasting
Chapter Three - Regression Analysis
Chapter Four - Probability
Appendix A - An Introduction to Sampling
Appendix B - Binomial Distribution Table
Appendix C - Normal Distribution Table
Chapter Five - Decision Analysis
Illustrations

Learning Objectives
Chapter 1 - Basic Business Math
After studying this chapter, you should be able to:
Solve an equation for an unknown.
Solve for unknown variables in the base and exponent of a
mathematical expression.
Classify a mathematical expression as a monomial or polynomial.
Transpose an equation and solve for the unknown.
Solve two simultaneous equations for two unknowns.
Determine whether costs are fixed or variable.
Describe and explain contribution margin and breakeven analysis.
Describe and illustrate Cartesian coordinates.
Use a production possibilities frontier illustration to define trade-offs.

CHAPTER 1
BASIC BUSINESS MATH
INTRODUCTION

Whether their jobs are in finance, human resources, marketing or any other function,
businesspeople need sound math skills. This package teaches essential basic and advanced
quantitative methods both to those who need to develop them and to those who need to brush
up on them, whether they are currently in business, about to enter business school or still in college. After you
read this text and complete the software, you will have the quantitative skills necessary to perform
operations critical to many business functions.

Before you start doing anything, however, it would be a good idea to have a pen or pencil
and plenty of paper available. You probably will also need a calculator to follow the examples in
the text or complete the problems on the computer.
FUNDAMENTAL MATHEMATICAL CONCEPTS
Basic Equations and Transposing Terms

An equation is a mathematical statement in which two expressions are set equal to each
other. Here is a simple example:
15 + 19 = 34

After performing whatever mathematical operations (addition, subtraction, multiplication,
or division) are required on each side of the equation (addition on the left-hand side of the equation
only in this instance), we find that the numbers on both sides of the equal sign are, in fact, equal.
As a matter of fact, you can subtract or multiply or do whatever you want to the numbers on one
side of the equation (except divide them by zero) and, as long as you do the same thing to the
numbers on the other side of the equation as well, the result on both sides will be equal. This idea
comes in very handy, especially when one of the numbers in the equation is not supplied and an x
(an unknown variable, often shortened to unknown) is put in its place:
15 + x = 34

Since we are naturally inquisitive, we would like to find out what x is (often called solving for x).
The easiest way to do this is to isolate x from the rest of the equation, to get it alone on its own side
of the equal sign and get everything else together on the other side. To do this requires a process
called transposition, which means changing a term's sign and moving it to the other side of the
equation. This is accomplished by performing the same mathematical function on both sides of
the equation, which we know we are allowed to do from the paragraph above. The mathematical
function we use depends on how we intend to isolate the unknown. For instance, in our example
above, to isolate x, we would want to eliminate the 15 from the left side of the equation. In order to
do that, we have to subtract 15 from the left side of the equation, which means subtracting it from
the right side as well.
15 + x -15 = 34 - 15


After we do the subtraction, the simplified equation looks like this:
x = 19

We have thus solved for the unknown variable x in the equation. Because the equation contains
an unknown that we solve for, it is called an algebraic equation. There is a more detailed process
for transposition which we will discuss shortly. First, however, we have to introduce you to
some other things you might run into which tend to make some equations more complex and thus
transposition much more useful.
Coefficients and Exponents

Often an unknown variable presented to us has a number in front of it (such as 5x). This
number is called the variable's coefficient and is multiplied by the unknown variable (5x means
multiply 5 by x, or add x to itself four times: x + x + x + x + x) when you work out the expression.
Other times, you may have a situation where a number or an unknown variable appears in superscript
above and to the right of another number. An example of this is:
6^4 = ?

With expressions like this, the large number is referred to as the base and the smaller number
as the exponent (often called the power). When you see an exponent, you know that the base is
going to be multiplied by itself a certain number of times. The exact number of times the base is
multiplied by itself is the exponent minus one if the exponent is positive. The case of the negative
exponent is covered in the next paragraph. In this example, since the exponent (4) is positive, one
would multiply 6 by itself three times (4 - 1 = 3). Upon doing this, we would see that 6 x 6 x 6 x 6,
or 6^4, equals 1296. When the exponent is one, it means to multiply the base by itself (1 - 1) = 0 times,
so the value of a base with an exponent of one is just the value of the base itself. When the exponent
is zero, the expression has a value of one for any nonzero base.


Sometimes you may encounter a base with a negative exponent. This simply means 1 divided
by that base with a positive exponent, as shown using unknown variables in the example below:
x^-3 = 1 / x^3

If there is a coefficient or any other variables attached to a variable with a negative exponent,
they will appear in the numerator as long as they themselves do not have negative exponents.
Example:

7x^-3 y^8 = 7y^8 / x^3

Solving for Unknown Variables in the Base and Exponent



To solve for an unknown variable in the base, you will have to move the exponent to the
other side of the equation. For example, say we were given this equation:
x^5 = 32

To separate an unknown variable from its exponent, we have to multiply that exponent by its
inverse, or the number which multiplied by the exponent equals one. In this case, the inverse would
be 1/5 (1/5 x 5 = 1). If we do this to the side of the equation that contains the unknown variable, we
have to do it to the other side as well, so the 32 on the right side of the equation (which can also be
denoted as 32^1) will have 1/5 as its exponent. Having 1/5 as an exponent is equivalent to having
to find the 5th root of the base, which means that you are trying to find what number multiplied by
itself four times equals the base. The necessary steps to solve for x would look like this:

x^(5 x 1/5) = 32^(1/5)

x^1 = 32^(1/5)

x = 32^(1/5)  (the 5th root of 32)

x = 2
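If you are following along with a computer rather than a calculator, you can check this result in Python (a sketch of ours, not part of the original text); the ** operator raises a number to a power, and a fractional exponent takes a root:

# Solve x^5 = 32 by raising both sides to the 1/5 power.
x = 32 ** (1 / 5)
print(x)  # 2.0, the fifth root of 32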


You may find instances where there is an unknown variable in the exponent rather than the
base. When the unknown variable is in the exponent, the procedure gets somewhat complicated.
Usually, the easiest way to solve for this is by using a calculator or a computer. While we will go
over an extended set of calculator functions later in this chapter, here is an example of the hand
calculations necessary to solve for an expression with an unknown as the exponent:
2^x = 16

2^(x x 1/x) = 16^(1/x)

2^1 = 16^(1/x)

2 = 16^(1/x)  (so 2 is the xth root of 16)

x = 4  (since the 4th root of 16 is 2)
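When the unknown is in the exponent, a logarithm gives the answer directly, which is what most calculators and computers do behind the scenes. A brief Python sketch (our illustration, not from the text):

import math

# Solve 2^x = 16 by taking logarithms of both sides: x = log(16) / log(2).
x = math.log(16) / math.log(2)
print(x)  # 4.0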

Monomials and Polynomials


A monomial is an expression that contains only one term (such as 7x^2), which often may
contain an unknown variable with both a coefficient and an exponent, although a term can be a
simple number like 7 or 9 as well. A polynomial is an expression that has more than one term, all
or some of which are unknown variables with non-zero coefficients and exponents. Binomials and
trinomials are examples of polynomials, because they both have more than one term: binomials
have two terms (such as 7x^4 + 10x^2), and trinomials have three (such as 7x^4 + 10x^2 + 8). Note that
in all of these examples, all like terms have been consolidated; the expression 7x^2 + 4x^2 + 5 is not
a trinomial but a binomial, because it can be consolidated as 11x^2 + 5.

There are also some rules regarding adding, subtracting, multiplying and dividing two or
more monomials when they are in the form of unknown variables with coefficients and/or exponents
attached to them. You can add and subtract two or more unknown variables, as long as they are
the same variable and have the same exponent, by adding or subtracting the coefficients just as you
would with simple numbers (i.e. you could combine 5x + 9x to equal 14x, but you could not add 5x
+ 9y, nor could you add 5x^2 + 9x without knowing what x was).

For multiplication and division, things get somewhat more complicated. When you divide
one monomial by another, it does not matter whether the coefficients or powers are the same,
but the unknown variables must be the same for the quotient to simplify. When you multiply two monomials,
however, it does not matter if the coefficients, powers or even the unknown variables are the same.
You can multiply two monomials no matter what their unknown variables and exponents are.


Coefficients get multiplied the same way they would be if they were just numbers separate
from the unknown variables. When the same unknown variables are multiplied together, exponents
of those unknown variables actually get added together.
Example:

3x^4 x 4x^5 = 12x^9

5x^7 x 9x^9 = 45x^16
When multiplication of different unknown variables is performed, the coefficients will get multiplied
normally just like simple numbers even though they are attached to different variables. The
unknown variables and the exponents assigned to each of them stay independent when multiplying
them together.
Example:

3x^8 x 9y^5 = 27x^8 y^5

Division of monomials is achieved in a similar fashion. Coefficients of different variables
are divided just as they would be if they were simple numbers. When you have the same unknown
variable in the numerator as in the denominator, the exponent of the variable in the denominator is
subtracted from the exponent of the same variable in the numerator.
Example:

30x^13 / 3x^9 = 10x^4

When you have different unknown variables in the numerator and the denominator, neither
the unknown variables nor their exponents can be divided by each other, but the coefficients can
be.
Example:

16x^4 / 4y^2 = 4x^4 / y^2
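If you have Python with the sympy library available, you can verify these monomial rules symbolically; this sketch is our own illustration, not part of the original text:

import sympy as sp

x, y = sp.symbols('x y')

print(3*x**4 * 4*x**5)                   # 12*x**9: coefficients multiply, exponents add
print(sp.simplify(30*x**13 / (3*x**9)))  # 10*x**4: coefficients divide, exponents subtract
print(sp.simplify(16*x**4 / (4*y**2)))   # 4*x**4/y**2: different variables stay separate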

Process of Transposing Equations



In the example of transposition on page 2, the 15 on the left side was eliminated by subtracting
15 from each side. The following is the process you would use to transpose a term using a slightly
more complex example than before, containing both coefficients and exponents:

Process and Example:

1. Put like terms on the same side of the equation:
   4x^2 - 16 = 28 - 7x^2 becomes 4x^2 + 7x^2 = 28 + 16

2. Combine those terms:
   11x^2 = 44

3. Divide by the coefficient of the unknown variable:
   11x^2 / 11 = 44 / 11, so x^2 = 4

4. Solve for x:
   x^(2 x 1/2) = 4^(1/2)
   x = 4^(1/2)  (the square root of 4)
   x = +2 or -2

Please note how there is more than one solution for x. This is usually the case when you are
solving an equation where there is only one unknown and it has an even-numbered exponent, such
as in the example above. Transposing equations makes solving for the unknown much simpler
than when you have that unknown on both sides of an equation. This is particularly true as the
expressions become more complicated with coefficients and exponents.
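The same transposition can be checked symbolically. A minimal sketch, assuming the sympy library is installed (our illustration):

import sympy as sp

x = sp.symbols('x')

# 4x^2 - 16 = 28 - 7x^2, transposed and solved for x
solutions = sp.solve(sp.Eq(4*x**2 - 16, 28 - 7*x**2), x)
print(solutions)  # [-2, 2] -- both solutions, as noted above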

Simultaneous Equations

Often you may have two or more variables to solve for, given two or more equations with which
to solve them (you cannot solve for two variables with fewer than two equations). This requires
solving for the unknowns in all of the equations, which are referred to as simultaneous equations.
Here is an example:
3x = 14 - 5y
7x + 2y = 23
To solve these equations for x and y, we follow this procedure:
Process and Example:

1. Arrange the equations so that like terms are in the same column. Adding 5y to both sides
   of the first equation (3x + 5y = 14 - 5y + 5y) gives:
   3x + 5y = 14
   7x + 2y = 23

2. Multiply both sides of one equation so that the coefficients of one of the unknowns will
   have the same absolute value (this means the + or - sign is irrelevant) as the coefficient
   of the same unknown in all of the other equations (sometimes you have to multiply each
   equation by something in order to achieve this, such as in this example):
   2 x (3x + 5y = 14), or 6x + 10y = 28
   5 x (7x + 2y = 23), or 35x + 10y = 115

3. To eliminate the unknown whose coefficients have the same absolute value:
   a. Add the equations together if their signs are unlike.
   b. Subtract one from the other if they have the same sign.
   Here the signs are the same, so we subtract:
    35x + 10y = 115
   -(6x + 10y = 28)
    29x = 87

4. Divide both sides by the coefficient of the other unknown to find its value:
   29x / 29 = 87 / 29
   x = 3

5. Find the value of the other unknown by putting the value of the variable you now know
   into the equations, and follow the process of transposition discussed earlier, if necessary:
   3x = 14 - 5y
   3(3) = 14 - 5y
   9 = 14 - 5y
   -5 = -5y
   y = 1

6. Check the common solution in each of the other original equations:
   7x + 2y = 23
   7(3) + 2(1) = 23
   23 = 23
That's it! You can now solve for two variables with two or more equations.
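Simultaneous linear equations are also exactly what the numpy library's linear-algebra solver handles; a sketch of ours (not part of the original text) using the same two equations:

import numpy as np

# 3x + 5y = 14 and 7x + 2y = 23, written as a coefficient matrix
# and a right-hand-side vector
A = np.array([[3.0, 5.0],
              [7.0, 2.0]])
b = np.array([14.0, 23.0])

x, y = np.linalg.solve(A, b)
print(x, y)  # 3.0 1.0, matching the answer found by elimination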

OTHER IMPORTANT MATHEMATICAL SKILLS


Rounding Rules

Often, when you solve for a variable or perform a mathematical procedure, the number you
come up with may be long and unwieldy, with many numbers to the right of the decimal point.
Sometimes you can express this as a fraction, but other times, such as on a spreadsheet, a certain
number of decimal places is necessary. For example:
7x = 23

7x / 7 = 23 / 7

x = 3.285714286

Since in many cases it is unnecessary to include the entire number, rounding is used. How
many decimal places the user rounds to depends on the good judgment of that person, but rules
govern how the number itself is rounded. Say, for example, that the person determining the number
above wants to round to the third decimal place (to the thousandth). Thus:
x = 3.286

One rounds up or down depending on which of the two candidate values the number is closer
to in the decimal place to which the user is rounding. Since in this example the number was closer
to 3.286 than to 3.285, it was rounded to 3.286. When a number is exactly
halfway between two possible answers, round it off to the larger. Thus, if x had equalled 3.2855 in
the above example, you would round to 3.286, but if it had equalled 3.28549, you would round to
3.285.
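One caution if you check this on a computer: Python's built-in round() uses a different "round half to even" rule, so to reproduce the round-half-up rule described above you can use the decimal module. A sketch of ours:

from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value, places="0.001"):
    # Quantize to the given decimal place, rounding halves upward.
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_UP)

print(round_half_up("3.285714286"))  # 3.286
print(round_half_up("3.2855"))       # 3.286 -- exactly halfway rounds to the larger
print(round_half_up("3.28549"))      # 3.285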
Parentheses, Brackets and Braces

If you saw the expression 5 x 8 + 19 and did not have any indication in what order to
perform the functions, you might come up with one of the two very different answers of 59 (if
you multiplied the 5 and the 8 first and then added the 19), or 135 (if you added the 8 and 19 first
and then multiplied the sum by 5). Without any guidelines as to which operation, addition or
multiplication, to perform first, you have to guess which one should be done first. You often need
a way to indicate to your readers in which order to perform mathematical operations when there
are two or more in your expressions. Using parentheses to identify those operations that should be
done first is the easiest way to achieve clarity and understanding with your readers. For example,
if the writer had simply written the expression (5 x 8) + 19, it would cut down on the guessing, and
the reader would know the answer is 59.

Sometimes as expressions become more complicated, there will be operations within
operations that need to be separated. If you used parentheses exclusively, things might get
confusing very quickly. Instead, there is a hierarchy of notations that can be used. The
order in which they should be used is this: parentheses ( ) around those operations that

should be performed initially, then brackets [ ] around subsequent operations that include
parentheses, and then braces { } around operations which include brackets. For example:
{[7 x (10 + 8)] ÷ (8 - 2)} = 21

Thus, what would be a very formidable expression to try to decipher without the use of
parentheses, brackets and braces is made much simpler with them. For very long expressions, you
can repeat this order again if necessary, using parentheses around the last set of braces, followed by
brackets, and so on.
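Programming languages follow the same idea, except that they typically nest plain parentheses rather than alternating brackets and braces. A short Python sketch (our illustration):

print(5 * 8 + 19)                # 59: by convention, multiplication is done first
print((5 * 8) + 19)              # 59: same result, but unambiguous to the reader
print(5 * (8 + 19))              # 135: grouping the addition first changes the answer
print((7 * (10 + 8)) / (8 - 2))  # 21.0: the nested example above, with parentheses only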
Calculator Functions
A calculator is an electronic device which performs mathematical functions and makes life
much easier. Most come with instruction manuals, but we will describe how to perform several
basic mathematical functions on your handheld calculator.
Objective and Process:

Adding Two Numbers: Enter the first number; press the {+} key; enter the number to add to it; press the {=} key. (More than two numbers can be added together by repeating the middle two steps for every number that you wish to add, then pressing the {=} key.)

Subtracting One Number From Another: Enter the number to subtract from; press the {-} key; enter the number to subtract; press the {=} key. (Two or more numbers can be subtracted from one by repeating the middle two steps for every number that you wish to subtract, then pressing the {=} key.)

Multiplying Two Numbers: Enter a number; press the {* or x} key; enter the number to multiply by; press the {=} key. (More than two numbers can be multiplied together by repeating the middle two steps for every other number that you wish to multiply by, then pressing the {=} key.)

Dividing Two Numbers: Enter the number to divide; press the {/ or ÷} key; enter the number to divide by; press the {=} key. (The same number can be further divided by repeating the middle two steps for every other number by which you wish to divide it, then pressing the {=} key.)

Squaring a Number: Enter the number; press the {x^2} key; press the {=} key.

Raising a Number to a Power: Enter the number; press the {y^x} key; enter the exponent; press the {=} key.

Finding the Square Root of a Number: Press the {√} or {x^1/2} key; enter the number; press the {=} key.

Finding the nth Root of a Number: Enter n; press the {x√y} or {y^1/x} key; enter the number; press the {=} key.

Finding the Inverse of a Number: Enter the number; press the {x^-1} or {1/x} key; press the {=} key.
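Each of these calculator operations has a one-line equivalent in a programming language. A Python sketch (ours, not part of the original text):

import math

print(15 + 19)          # adding two numbers
print(34 - 15)          # subtracting one number from another
print(5 * 8)            # multiplying two numbers
print(23 / 7)           # dividing two numbers
print(6 ** 2)           # squaring a number
print(6 ** 4)           # raising a number to a power (1296)
print(math.sqrt(1296))  # finding a square root (36.0)
print(32 ** (1 / 5))    # finding the nth root, here the 5th root (2.0)
print(1 / 8)            # finding the inverse of a number (0.125)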

FUNCTIONAL BUSINESS MATH



Right about now, you are probably saying, "Well, I've learned all this stuff, but where has
it gotten me? What are the practical applications? How is this going to help me be a better
manager?" Though currently you may doubt it, the algebra and other skills you have learned so far
are necessities for many different functional areas of business. Let's look at a few applications of
these principles in business in order to see the truth of this statement.
Fixed and Variable Costs

A central tenet of many business fields, among them managerial accounting, operations and
marketing, is the idea of fixed and variable costs. Quite simply, a variable cost is one that is
incurred by a company depending on how much is produced. For example, rubber is a variable cost
in a tire factory; if the factory does not produce any tires, it will not need to buy any rubber. The
total cost of the rubber varies with the level of production, whereas the variable cost for each unit
is unchanged. Often certain labor, such as the wages of workers on an assembly line, is considered
a variable cost when the need for that labor is dependent on whether a product is built or not; this
is called direct labor and is assumed to only be needed and used when a product is built. On the
other hand, labor that is not dependent on a product being built is referred to as indirect labor.
Examples of indirect labor include the salaries of the accounting staff: their job is not linked to
the production of any specific product in a factory, but to all of them. For this reason, accounting
staff salaries and other indirect labor are referred to as fixed costs. A fixed cost is one that will be
incurred regardless of the level of production. Examples include rent on a factory and the cost of
new equipment. Regardless of whether the factory produces one unit or a million, the fixed costs
will still be incurred. Collectively, a company's fixed and variable costs make up its total costs.

Managers like knowing how much things are going to cost before incurring those
costs and having to dip into the company's coffers in order to pay for them. The problem is

that often total costs will change as the number of units produced changes. The reason for this is
that the variable costs change as the production level increases or decreases. However, when a
manager knows what both the variable costs per unit and the fixed costs will be, he or she can set
up a simple algebraic equation with one unknown variable and substitute different unit amounts of
production into that variable to find what the total costs will be for any production level. Putting
those words into an equation form, that is:
Total Costs = (Variable Costs Per Unit x Units of Production) + Fixed Costs

Let's say the manager finds out that the fixed costs for a certain project will be $1 million and
the variable cost per unit is $5. He or she can set up an equation like this:
Total Costs = ($5 x x) + $1,000,000
where x is the unknown for which the manager can substitute different unit levels of production.
Now suppose this manager wants to find out what the total costs will be for 100,000 units. To do
this, he or she substitutes 100,000 for the x in the equation:
Total Costs = ($5 x 100,000) + $1,000,000 = $1,500,000
The total costs for the project will be $1,500,000.
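The same total-cost equation is easy to express as a small function, so a manager could try many production levels quickly. A Python sketch of ours, using the figures from this example:

def total_costs(units, variable_cost_per_unit=5, fixed_costs=1_000_000):
    # Total Costs = (Variable Costs Per Unit x Units of Production) + Fixed Costs
    return variable_cost_per_unit * units + fixed_costs

print(total_costs(100_000))  # 1500000, matching the calculation above
print(total_costs(250_000))  # 2250000, another production level tried instantly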

This may seem relatively easy, but it gets somewhat more complicated when all you have is
a sheet with the costs and you have to determine which are fixed and which are variable. Let's look at
a situation where this is true.

Example: Tubetime Industries builds televisions sold in the American market. Management
is thinking about creating a new television that will fit on a watch. The president of Tubetime has to
meet with the board tomorrow morning and introduce the proposal to fund this project. She needs
to know how much it is going to cost so that she can ask the board for the right amount of money.
The production manager gives her the following costs:




Material                        $75/television
Direct Labor                    $40/television
New Manufacturing Equipment     $1,500,000
Modifications to Factory        $3,000,000


If Tubetime is considering building 10,000 of the televisions this year (and needs money for
the entire year allotted to it now for this purpose), for how much should the president ask?

To set up this problem, you have to figure out which are the variable costs. Remember that
our definition of variable costs stated that these are costs which vary by units of production, and
since we are producing televisions, we want to separate out those costs which are charged per
television. Material and Direct Labor are the two variable costs given here. The other two costs
must be fixed costs, because they do not vary by level of production. Substituting in the units of
production (10,000) for each of the variable costs, we can determine the total costs:


Material = $75/television x 10,000 televisions = $750,000
Direct Labor = $40/television x 10,000 televisions = $400,000
New Manufacturing Equipment = $1,500,000
Modifications to Factory = $3,000,000
Total Costs = $5,650,000
The president should ask for $5,650,000. Below are the collective fixed and variable costs.
Variable Costs = Material + Direct Labor (costs that vary by level of production) = $1,150,000
Fixed Costs = New Manufacturing Equipment + Modifications to Factory = $4,500,000


Generally speaking, as units of production increase, the collective fixed costs stay the same
and the collective variable costs increase. On a unit basis, as the number of production units increases,
fixed costs decrease (since the same costs are now allocated to a larger number of units) and variable
costs stay the same. There may be some exceptions to this, since unit variable costs may decrease
somewhat with greater volumes due to material purchasing discounts, and total fixed costs may rise
as volume increases because of the need for increased floor space or new machinery, but in general
this is a helpful statement to remember.
The Concept of Contribution Margin

Just as managers want to know how much something will cost, they also would
like to know how much money the company will make after subtracting the variable costs
(sometimes called the cost to produce) from each unit it sells. This is so because managers
need to pay off the project's and eventually the company's fixed costs, whether it be the
salaries of employees in the finance department (indirect labor), the new factory they built
to produce the product or the new advertising campaign designed to increase its sales. Each
unit sold hopefully makes a contribution towards the company's fixed costs (except with
a distressed product or company, when the company may sell a product at less than it costs to
produce). The contribution margin is defined as the amount the company receives from the sale
of its product minus the product's variable costs, and thus how much can be contributed to fixed
costs.

For example, say a manufacturer of oak desks can produce its basic model for $125 (including
materials, labor, and all other variable costs), and it can sell the model to a chain of office furniture
stores for $175. The contribution margin would be:
$175 - $125 = $50

Thus, for every basic desk it sells, the company makes a $50 contribution towards its fixed
costs. Please note that the contribution margin is the amount the company receives, not necessarily
the amount for which it is sold to the end consumer. The office furniture store might sell the
desk to the consumer for $250, but the amount used to figure out the contribution margin is $175,
how much the company receives from its sale of the product to the office furniture store. This is
especially true when there are many intermediaries between the producer and the end user, and
when a company has several distribution channels through which to sell its product.
The Concept of Breakeven

Breakeven is the volume (in units or dollars) of sales needed to cover fixed costs after the
variable costs have been subtracted. Exactly what those fixed costs are may vary from company
to company, product to product. More often than not, a product must not only make a contribution
to those fixed costs which are directly involved in its production (such as machinery purchased
specifically to produce that product), but also an allocation of costs relating to the rest of the company,
which may seem largely unrelated (e.g. administrative costs at headquarters, salaries of the officers
of the firm, etc.). Otherwise, the product will be dropped. Contribution margin per unit is used to
determine breakeven level, the formula for which is shown below.
Breakeven Level = Fixed Costs / Contribution Margin Per Unit


Let's examine the use of breakeven with a simple example. Let's suppose a company is
making a product and has invested $100,000 on a slick advertising campaign and $75,000 each on
five slick salespeople (annual salary, including benefits and support). These are the only fixed costs
that the company has assigned to the product, and the contribution margin per unit is $50. To cover
those fixed costs, the company would like to determine how many units it would have to sell.

$100,000 + (5 x $75,000) = $475,000 (total fixed costs)

$475,000 / $50 = 9,500 (breakeven level in units)


The company would therefore have to sell 9,500 units to break even or cover its fixed
costs.

Typically, the concept of breakeven is used in an incremental sense (i.e. how many additional
units would an advertising campaign have to sell, above what would normally sell, in order to be
worth spending the company's money). In the example above, if this company did not have a
salesforce before and was not considering using an advertising campaign, 9,500 units would be the
number of additional sales necessary for this investment to be worthwhile.
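The breakeven formula also translates directly into code. A minimal Python sketch (our illustration), using the advertising-and-salespeople example above:

def breakeven_units(fixed_costs, contribution_margin_per_unit):
    # Breakeven Level = Fixed Costs / Contribution Margin Per Unit
    return fixed_costs / contribution_margin_per_unit

fixed = 100_000 + 5 * 75_000       # campaign plus five salespeople
print(fixed)                       # 475000 (total fixed costs)
print(breakeven_units(fixed, 50))  # 9500.0 units to break even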
GRAPHING

In the previous sections, you have learned a great deal about equations, monomials and
polynomials and how to apply them to real business problems. You found out what total costs and
breakeven levels were for a certain level of production. In the real world, however, you may want
to examine the costs associated with many production levels, and you might wonder if there is a
better way to do this than working out the same equation multiple times. In fact, there is. You can
make your work much simpler by building a graph and plotting a line or curve on that graph to
represent an equation. Depending on how precisely the line and the graph are drawn, you can then
estimate to some degree of accuracy at what level one parameter will be for a given level of another.
For example, you could see how total costs would vary with changes in the level of production by
plotting the equation of the line on a graph and then finding the point on the line which corresponds
with the total cost for a given level of production. Graphing is also an easy way to communicate
plans, ideas and historical trends to others in your office or elsewhere.
Basics of Graphing

What we will be using in the following few pages are technically called Cartesian Coordinate
Systems, but for simplicity we will call them graphs. Physically, a graph generally represents
two perpendicular lines, with numbers marked alongside each of those lines, such as in the example
on the next page.


Figure 1-1
A Basic Graph Form

[Two perpendicular axes, x (horizontal) and y (vertical), with numbered hash marks, dividing the plane into four quadrants labeled I through IV.]

The main horizontal and vertical lines are called axes (the singular form is axis) and may or
may not have arrows on the ends. Each axis is identified by letter, as identified above: the horizontal
axis is the x-axis, the vertical axis is the y-axis. To the right of the y-axis, the x coordinates are
positive; to the left, negative. Above the x-axis, the y coordinates are positive; below it, they are
negative. The point where the axes intersect is called the origin; there, both x and y coordinates are
zero. The axes divide the graph into four sections or quadrants, numbered for easy identification
by the Roman numerals you see above: Quadrant I, Quadrant II, etc. You are not obliged to put
either the axis letters or the quadrant numbers in your graph each time you make one. However, it
is helpful to your reader if you include the hash marks with numbers below or to the side of them;
these give the reader some sense of proportion and allow the user to quickly find a specific location.
When all of your data fall in one quadrant, you can just graph that one quadrant using an L-shaped graph, as shown on the next page.


Figure 1-2
An L-Shaped Graph Form

[A single quadrant: an L-shaped pair of axes, x horizontal and y vertical.]

Plotting a Line

Now that we know what the form of a graph looks like, we should understand what we have
to put on it in order to make it useful to us. The basic unit of all graphing is the point. The point is
a location in space which can be defined by a set of coordinates: the coordinates are the numbers
on the axes which correspond to the point. In a two-dimensional graph, which is identified by
having two axes such as those above, a point is a location on a plane, or two-dimensional surface,
which can be defined by two coordinates: (x,y). The x coordinate of a point is called the abscissa;
the coordinate tells its straight-line distance from the y, or vertical, axis, and on which side of that
axis it sits. The y coordinate of a point is called the ordinate; the number that is the coordinate
tells how many units away it is from the x, or horizontal, axis, and on which side of that axis it sits.
Using this information, the user of a two-dimensional graph can find in what quadrant a point sits
and then find or define that point.
A line is a series of points adjacent to one another. If you know you are graphing a line, you
do not have to graph all of the points on that line to get a line. You can estimate what a line will
look like just by finding two points far enough apart, plotting them on a graph, connecting them
with a straight line and then continuing that line as far as you would like beyond those two points.
You can determine two points using a linear equation, an equation whose graph is a straight line.
The basic form of a linear equation is given below:

y = a + bx



Several texts show this equation as y = mx + b; we prefer y = a + bx. An equation is linear
as long as the exponent of x is 1 (which is usually denoted simply by x). The x is often referred
to as the independent variable (a term you will see later in the regression analysis chapter), the y
as the dependent variable. The b is called the coefficient of x just like it was before, and its value
determines the slope of the line, or the lines vertical rise divided by its horizontal run. The slope
can be denoted as:

b = Δy / Δx

where Δ means the change in the variable that follows between two points. While positive
slopes (the slopes of positive coefficients) curve upward, left to right, negative slopes (the slopes
of negative coefficients) curve downward, left to right. The farther away from zero the coefficient
is, the steeper the slope. A line with a slope of zero or no slope is horizontal. The value given to
a, the constant, shows the location at which the line that is graphed from this equation crosses the
y-axis (where x = 0). It is called the y-intercept. If the constant is negative (such as y = - 3 + 5x),
then it is a negative intercept, and is below the x-axis. A linear equation has only one coefficient,
and only one y-intercept. If an equation has an x with an exponent of any number other than one, it
is not a linear equation, although a may be represented occasionally as ax^0, which is the same thing
as saying a, as for example 9x^0, which equals 9.

Once you are given a linear equation, you can graph its line just by substituting in different
values for x and through that generating different values of y. We can use a fixed/variable cost
example from our functional business math section in order to demonstrate this.
Example:

The fixed costs for a certain project are $10 (it is a very inexpensive project) and the variable
costs are $2 per unit. We want to build an equation and then graph the line. We can easily determine
the equation to be:

y = 10 + 2x
where y is the total costs, and
x is the units of production.

Now, in order to build our graph, we have to compile the information that we already know.
We know that the slope is 2 (the value of the coefficient, b), and that the y-intercept is 10 (the value
of a). At the y-intercept, x = 0, so we already know one of our two points (0,10). We can find
another point just by picking an arbitrary x far enough away from the y-intercept (0,10) to draw a
reasonably distinguishable line. When x = 20, for example, y = 50, so we could graph (20,50) and
connect it to (0,10), as below, then extend the line as far upwards as we wanted.


Figure 1-3
Graph of Total Costs vs Units of Production

[A straight line through (0,10) and (20,50); the horizontal axis shows Units of Production and the vertical axis shows Total Costs, each marked in increments of 10.]

Now we can estimate a y (Total Costs) for any x (Units of Production) by finding an x on the line
and then seeing what point that corresponds to on the y-axis.
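If you would rather let the computer draw the line, a plotting library will do it from the same two points. A sketch of ours, assuming Python's matplotlib library is installed:

import matplotlib.pyplot as plt

# Plot y = 10 + 2x: total costs against units of production.
xs = [0, 20]                   # two points are enough to determine a line
ys = [10 + 2 * x for x in xs]  # (0, 10) and (20, 50)

plt.plot(xs, ys, marker="o")
plt.xlabel("Units of Production")
plt.ylabel("Total Costs")
plt.show()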
Solving for Simultaneous Equations with Graphs

Just as you can find any point on a line after graphing it with only two points, graphs also
allow you to quickly find the point where two equations are equal, perhaps even more quickly than
using the method previously supplied to solve for simultaneous equations, by graphing the two
lines together and noting the point where they intersect.


To illustrate, suppose we were given two equations and asked to find the point at which they
are equal (i.e. where the xs of both lines are equal and the ys are equal as well). The two equations
are:

y = 11 + 2x
y = 16 + x


Putting these two equations together on the same graph, they look like this:
Figure 1-4
Graph of Two Equations

[The lines y = 16 + x and y = 11 + 2x plotted on the same axes, intersecting at the point (5,21).]


You can see that the point at which the two lines intersect is (5,21). You therefore know that
these two equations are equal when x = 5 and y = 21. Graphing is helpful when it is either difficult
to solve for simultaneous equations or you do not like solving for them.

So far in this chapter, you have learned to set up a problem as an equation and then graph
it. Now we will examine what happens as that equation becomes more complex and exponents are
added to the variables.


Quadratic Equations

Previously, we said that a polynomial is an expression which contains two or more terms
(like 5x^2 + 9). A quadratic equation is an equation containing a polynomial on one of its sides that
has one or more terms containing the unknown variable x, where one of those x's has an exponent
of two and none of the other x's has an exponent above two or below zero. The polynomial is set
equal to a number or variable on the other side of the equation. If this description confuses you, a
quadratic equation can be denoted this way:

y = ax^2 + bx + c

where x and y are variables,
a and b are coefficients, and
c is the constant.

While b and c can be any number, including zero, fractions, and negative numbers, a can
be any number but zero. When you are given a quadratic equation, often it will be in a form where
you can simplify it. For example, y = 3x^2 + 18x + 3 can be simplified to
y = 3(x^2 + 6x + 1), and then the whole equation can be simplified by dividing both sides by 3. If
you have fractions in the coefficients or constants, you can multiply both sides of the equation
to simplify them. For example, y = (1/2)x^2 + (1/4)x + 5 can be simplified by multiplying both sides of the
equation by 4, which results in 4y = 2x^2 + x + 20.

While a quadratic equation can only contain x's up to the exponent of two, a cubic equation
(such as y = 4x^3 + 3x^2 - 8x - 7) contains an x with an exponent of three and no x's with exponents
below zero, which can be denoted as:

y = ax^3 + bx^2 + cx + d


Though we will preoccupy ourselves mainly with quadratic equations in this chapter, it is
important to know that these other sorts of equations exist as well and can be graphed.

While we spent some time in the past section discussing y-intercepts in graphs of linear
equations, in the graph of a quadratic equation of the form y = ax^2 + bx + c sometimes you may
have two x-intercepts (where the graph crosses the x-axis) and sometimes you may have none at
all. The reason for this is that an x in a polynomial can take on several values and still result in the
same y: for instance, in the equation y = 5x^2 - 3, x can take on a value of 2 or -2 and y will still equal
17. Thus, you can have two x's for a given y in an equation of the form y = ax^2 + bx + c, and the result is
a curve called a parabola. A parabola is symmetric around an invisible line that cuts it in half; this
is called its axis of symmetry. The point at which the axis of symmetry intersects the parabola is
called its vertex.

Let's look at a basic example. The equation y = 4 + 2x^2 is a quadratic equation even though
it may not look like it: it follows the form for quadratic equations given earlier, except in this case
the coefficient b is zero, so bx completely disappears. The graph of this equation is the upper one
in Figure 1-5:


Figure 1-5
Graph of a Quadratic Equation I


The larger the absolute value of the coefficient of the x variable with the exponent of two, the
steeper the curve. A negative coefficient indicates a downward-sloping curve. For example, the graph of y = 4 - 2x^2 is
the mirror image of the graph above, sloping downward towards the negative side of the y-axis. In
the upward-sloping parabola, the graph of the equation has one y-intercept, but no x-intercepts. Its
axis of symmetry is also the y-axis; this is not always the case with quadratic equations. A parabola
has a maximum or a minimum point: when it is sloping upward or downward, this point is the
vertex. The upper parabola in Figure 1-5 has a minimum at (0,4). In the graph of y = 4 - 2x^2, the
downward-sloping parabola in Figure 1-5, the vertex of (0,4) would be a maximum because
the parabola is sloping downward.

When you are given a quadratic equation, you must find the vertex and two points equidistant
from that vertex in order to graph it. An easy way to start is to see if you have any points where the
graph of the equation will cross the x or y axes, or in other words, any x or y intercepts. For example,
suppose we wanted to graph the equation y = x^2 - 6x + 8. To see if we had any intercepts, we would
set x equal to zero; when we do this, we see that y equals 8. Thus, we know the y-intercept is (0,8).
To find any x-intercepts, we would set y equal to zero.

The easiest way to solve for the x-intercepts is to divide the polynomial into two or
more expressions which are multiplied by each other. When one of these expressions has
a value of zero, the whole polynomial has a value of zero (since the product of two
numbers multiplied together when one of those numbers is zero is also zero), so y is zero as
well. We can break x^2 - 6x + 8 into (x - 2) x (x - 4), set both expressions equal to zero, and
solve for x. When one of these expressions is equal to zero, the whole polynomial is equal


to zero. Thus, when x equals either 2 or 4, y = 0. The x-intercepts are thus (2,0) and (4,0), and that
gives us three points already for our graph. We still have to find the vertex, however.

Because we know that a parabola is symmetric around its axis of symmetry, with quadratic
equations in the form y = ax^2 + bx + c or x = ay^2 + by + c, we know that any time we can find two
intercepts, they are equidistant from that axis. If our intercepts are the points (2,0) and (4,0), we
know that our axis must run midway between these two points, through (3,0). Then, if we substitute
the 3 for the x in our equation, we can find the only point of the parabola that actually lies on that
axis, the vertex.

y = (3)^2 - (6 x 3) + 8 = 9 - 18 + 8 = -1

Accordingly, the vertex of the parabola is located at (3,-1). Now, with four points, we can
graph the equation.
Figure 1-6
Graph of a Quadratic Equation II

[A parabola with y-intercept (0,8), x-intercepts (2,0) and (4,0), and vertex (3,-1).]


The process of graphing a quadratic equation often means finding more than two points,
usually at least three, and then fitting a curve between those points and estimating what the rest
of the curve will look like from that. This can be considerably more difficult than drawing a line
between two points. Some computer programs and calculators are able to draw curves quite quickly
and accurately. If doing it by hand, however, notice what the trend of the line is, and try a few
points farther away to see whether it continues or not; this is especially true for complex curves,
which may change their steepness and direction several times. Most importantly, when you draw
a line or a curve, you would like it to be as close as possible to as many points as possible through
which it is supposed to pass.
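The factoring, intercepts, and vertex found above can all be checked symbolically. A sketch of ours, assuming the sympy library:

import sympy as sp

x = sp.symbols('x')
quadratic = x**2 - 6*x + 8

print(sp.factor(quadratic))    # (x - 2)*(x - 4), possibly printed in another order
print(sp.solve(quadratic, x))  # [2, 4] -- the x-intercepts

# The axis of symmetry runs midway between the roots, at x = 3;
# substituting x = 3 gives the vertex.
vertex_x = sp.Rational(2 + 4, 2)
print((vertex_x, quadratic.subs(x, vertex_x)))  # (3, -1)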
CONCLUSION

This chapter has been an introduction to the concepts of basic business math which you will
need to know both to perform functions in business and to understand many of the other concepts
throughout the rest of this package. However, only those concepts that have been determined to be
most useful have been presented. In no way can this be considered a comprehensive look at these
subjects. For a more thorough review, please see a text devoted to algebra and either geometry
or graphing. The material given here, however, will provide a good background for the other
mathematical techniques we will be teaching throughout this text.


Learning Objectives
Chapter 2 - Forecasting
After studying this chapter you should be able to:
Describe and explain time series analysis.
Compute a simple average.
Compute a moving average.
Compute a weighted average.
Compute a weighted moving average.
Calculate the slope and intercept of a trend line.

CHAPTER 2
FORECASTING
INTRODUCTION

The major stumbling block in any decision-making process is uncertainty -- lack of knowledge
about what the future will bring. We often speculate on the decisions we would have made if only
we knew then what we know now. Hindsight is perfect, according to a popular cliché, but what
about foresight? Foresight may be less than perfect, but it need not be completely worthless. There
are many techniques available to decision makers that will help them deal with uncertainty and
improve their predictions about the future. We will discuss some of the more popular methods in
this chapter.

Methods of predicting the future are called forecasting in business and government. Taken
literally, the term forecasting means to cast forward -- to look into the future. Quantitative
methods of forecasting are used when historical data are (or can be expressed) in a numerical
form and either follow detectable patterns or the causes of their scatter are known and can be
mathematically derived. Annual earnings, receivables, and students' test scores all may be forecast
using quantitative methods. This chapter and the following one deal with these methods of
forecasting.

Other data cannot be expressed numerically, or have no discernible pattern and their scatter
cannot be explained; these must be forecast by qualitative methods. Examples include public
perception of a change in corporate strategy or the effect of a product-tampering incident on later
sales of that product. Qualitative methods of forecasting are beyond the scope of this text.
TIME SERIES ANALYSIS

Time series refers to data collected by periods of time -- days, weeks, months, years, and so
on. For example, Table 2-1 shows the number of customers per week at a fast-food restaurant.

Table 2-1
Customers at Burger City

Week    Number
1       3208
2       3067
3       3165
4       3025
5       3154
6       3033
7       2988
8       2972
9       3196
10      3041


A forecast using a time series attempts to use past data to project the time sequence into
the future. There are two distinct approaches to forecasting with time series: averaging and trend
analysis. Averaging is appropriate when data values are relatively constant over time, which is
reflected by a flat, horizontal pattern when the data points are graphed. Figure 2-1 shows the
Burger City data plotted against time. (In plotting time series data, it is conventional to use the
horizontal axis for time and the vertical axis for the variable you want to predict, like number of
customers). The pattern of the Burger City data suggests that using an averaging approach for the
time series would be sufficient for a good forecast.
Figure 2-1
Customers at Burger City

[Weekly customer counts plotted against week number; the points scatter around a flat, horizontal pattern between roughly 2950 and 3250 customers.]


If the data show a steady increase or decrease over time, it is more appropriate to forecast
with trend analysis, detailed later in this chapter. Table 2-2 gives the annual earnings of the Alpha
Company between the years 2012 and 2015. Figure 2-2 shows the steady upward trend of Alpha
Co.'s earnings over this period. The pattern of the Alpha Co. data argues for a trend analysis.
Table 2-2
Alpha Co. Earnings, 2012-2015

Year    Earnings (Millions)
2012    $62
2013    $79
2014    $93
2015    $117

Figure 2-2
Alpha Co.'s Earnings, 2012-2015

[Annual earnings plotted against year, rising steadily from $62 million in 2012 to $117 million in 2015.]

Types of Averages


Going back to Figure 2-1, if the manager of Burger City wishes to forecast the number of
customers during Week 11, she might first consider using the average of the previous weeks for
which data are available (for simplicity's sake, we will use the term average throughout this chapter,
though you should get accustomed to using the term arithmetic mean, since average is used
in a different context in many statistics
texts). There are several variations of averages with which we can make our forecast: besides the
familiar simple average, there is the weighted average, the moving average, and combinations of
these latter two. The different averaging techniques are defined over the next several pages for the
Burger City example.
Simple Average

The simple average is found by adding the values of all the data samples and dividing the
sum by the number of samples. A formula to represent this process looks like this:

Ȳ = ΣY / n

where Ȳ is the simple average,

Σ represents the process of addition or summation (adding up everything immediately to the
right of this sign, in this case all the Y samples you have),

Y is the value of each sample of the variable (like number of weekly customers) you want to
average, and

n is the number of samples of the variable to be averaged.

For example, the simple average of 10, 16, and 19 is (10 + 16 + 19) / 3, or 45 / 3 = 15. In the Burger
City example, there are 10 weeks of data and thus 10 samples of the variable (n = 10). The simple
average forecast for Week 11 is given by:

Ȳ = (3208 + 3067 + 3165 + 3025 + 3154 + 3033 + 2988 + 2972 + 3196 + 3041) / 10

= 30849 / 10 = 3084.9 ≈ 3085

Thus, the simple average tells us that an average of 3085 customers per week visited Burger City
over the past ten weeks. Using it as a forecasting method, the manager would predict
that 3085 customers will patronize Burger City during Week 11 and will make plans to serve that
number.

Advantages and Disadvantages: The simple average considers all data samples equally. It
is easy to understand and to apply. It is a good method to use when data samples are recent and do
not vary greatly, when little is known about the trend of that data and when accuracy is not a critical
factor. It is not, however, a very sophisticated technique and should not be used when additional
information suggests a different approach or when the cost of a faulty forecast is high.
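For those working at a computer, the simple average is a one-liner; a Python sketch of ours using the Burger City data:

from statistics import mean

customers = [3208, 3067, 3165, 3025, 3154, 3033, 2988, 2972, 3196, 3041]
print(mean(customers))  # 3084.9, rounded to 3085 for the Week 11 forecast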


Moving Average


As mentioned above, the simple average is most accurate for recent data which exhibits
little variation from period to period. But if the data extend very far back in time or are subject to
frequent, unpredictable change over several periods, it may be more appropriate to use a moving
average, which incorporates only the most recent data samples. For example, if we had several
hundred weeks of data we might choose to use only the last 10 or 15. If we had only yearly data,
we might restrict our forecast solely to the last five years and, for a very trendy or seasonal variable
such as hula hoop sales, we might use data only from the past three or four months.

The concept of moving is introduced by adding the data from the most recent period and
dropping the data from the oldest period (in your last forecast) for each new forecast. A 3-month
moving average of microcomputer sales in a discount store is illustrated in Table 2-3.
Table 2-3
Microcomputer Sales



The mathematical formula for the moving average is identical to that for the simple average
except that, while a simple average would consider all the data and n (the number of samples)
would be constantly increasing as time progressed and more data became available, a moving
average only considers a certain number of data samples, so n is fixed and only the last n items are
considered.

Advantages and Disadvantages: Because the moving average is merely a special case
of the simple average in which only the more recent data are considered for the forecast, it has
the same advantages and disadvantages as the simple average. However, it does incorporate one
improvement over the simple average: very old data (where very old can be defined by the
forecaster) are not considered and the forecast is generally more up-to-date than a simple average
of the same data would be.
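A moving average needs only a slice of the most recent data. A Python sketch (our illustration) applying a 5-week moving average to the Burger City data:

def moving_average(data, n):
    # Simple average of only the last n samples.
    return sum(data[-n:]) / n

customers = [3208, 3067, 3165, 3025, 3154, 3033, 2988, 2972, 3196, 3041]
print(moving_average(customers, 5))  # 3046.0 -- the average of Weeks 6-10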
Weighted Average


In computing an average, the forecaster faces a dilemma in selecting the number of data
samples to use. As a general rule, the more samples considered, the more representative is the


average. But a large number of samples invariably contains some old data that bear little relationship
to the forecasted period. The weighted average gives the forecaster an opportunity both to include as
many samples as desired and to control the contribution of less recent data by assigning a weight,
or measure of importance, to each sample. In most instances, the weights decrease with the age of
the data, although this need not always be the case. The last five periods of the Burger City example
are shown in Table 2-4 with weights that decrease gradually with age.
Table 2-4
Weighted Customer Data for Burger City

Week    Number    Weight
6       3033      .10
7       2988      .15
8       2972      .20
9       3196      .25
10      3041      .30

The formula for computing the weighted average is slightly more complex than that for the simple
average, but it uses the same symbols:

Ȳ = ΣwY / Σw

where Ȳ is the weighted average,

w is the weight given to each individual data sample (third column in Table 2-4),

Σ represents the process of addition or summation (note that the w and Y are multiplied
together before all of the values are summed together), and

Y is the value of each data sample to be averaged.

For the above example, the weighted average of the number of customers at Burger City over the
last five weeks is given by:

Ȳ = [(.10)(3033) + (.15)(2988) + (.20)(2972) + (.25)(3196) + (.30)(3041)] / (.10 + .15 + .20 + .25 + .30)

= (303.3 + 448.2 + 594.4 + 799.0 + 912.3) / 1.00

= 3057.2 ≈ 3057


Thus, using the weighted average as a forecasting method, the manager should plan for 3057
customers during Week 11. Although forecasters are free to use any weights they want, the formula
for the weighted average may be simplified somewhat when the sum of the weights is 1.00 as in the
example above. Since dividing by one does not change the value of the numerator, the denominator
can be omitted. The formula then becomes:

Ȳ = ΣwY

It is common, although sometimes tedious, to force the weights to total 1.00 when using a weighted
average.
Advantages and Disadvantages: The weighted average gives the forecaster an opportunity to apply
additional judgment in deciding how much influence older data should have on the forecast. In a
moving average, the forecaster merely selects a cutoff point before which data are not considered.
In a weighted average, the forecaster can gradually phase out older or unrepresentative data by
a judicious selection of weights. The disadvantages of the weighted-average technique are that
the forecaster's judgment may introduce personal biases into the forecast and that the selection of
weights to total one (if that is desired) is often difficult.
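The weighted-average formula carries over directly to code. A Python sketch of ours, reproducing the Table 2-4 forecast:

def weighted_average(values, weights):
    # Sum of weight-times-value, divided by the sum of the weights.
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

numbers = [3033, 2988, 2972, 3196, 3041]   # Weeks 6 through 10
weights = [0.10, 0.15, 0.20, 0.25, 0.30]
print(weighted_average(numbers, weights))  # 3057.2, rounded to 3057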
Combinations of Averages


It is also common to combine the weighted average with the moving average. Suppose for
example that your weighted-average forecast for the number of Burger City customers who came
in during Week 11 was fairly accurate (you expected 3057 customers and 3006 showed up) and
that for Week 12 you wanted to keep the same basic forecasting method but include only the most
recent five weeks of data (adding Week 11 and dropping Week 6). You would thus be creating your
forecast for Week 12 using both a weighted and a 5-week moving average. You could keep the
same weights as in your Week 11 forecast above, only each week's data would now shift to the next lowest
weight, with Week 6 dropped and Week 11 given the highest weight. Thus, for your forecast, Week 7
would be given the lowest weight (.10) since it is the oldest data, and Week 11 would be given the
highest (.30), because it is the newest data, as shown in Table 2-5.


Table 2-5
Data for Week 12 Forecast for Burger City

Week    Number    Weight
7       2988      .10
8       2972      .15
9       3196      .20
10      3041      .25
11      3006      .30

Now you compute your forecast just like you did in the weighted-average example above:

[(.10)(2988) + (.15)(2972) + (.20)(3196) + (.25)(3041) + (.30)(3006)] / (.10 + .15 + .20 + .25 + .30)

= (298.8 + 445.8 + 639.2 + 760.25 + 901.8) / 1.00

= 3045.85 ≈ 3046

Thus, the manager should plan for 3046 customers in Week 12 when using a combination of the
weighted-average and moving-average forecasting techniques.

TREND ANALYSIS

Like other time series techniques, trend analysis uses past actual values of the variable the
forecaster wants to predict as the basis for the forecast. As noted earlier, unlike averaging, which is
best used with horizontal patterns (meaning little change) and constant variation, trend analysis is used
when there is a perceptible change over time (e.g., sales growth at an ice cream stand between January
and August). If time-series data show a steady increase or decrease, a forecast based on trend analysis
should project a similar change. Figure 2-3 shows the annual heating oil expense for a business firm.
A straight line has been drawn through the plotted points to represent the upward trend of the expense.


Figure 2-3
Heating-Oil Expenses, 2009-2015


The line shown in Figure 2-3 is called a trend line. Like other straight lines, a trend line
follows the general equation

Y = a + bX

where Y is the forecasted variable,



a is the y-intercept (the value of Y when X = 0),

b is the slope of the line (the change in Y for each unit increase in X), and

X is the period of the forecast.

This should be familiar to you since it was covered in Chapter 1. Thus, if Y = 22 + 3X, the
forecast for period 8 (2016) will be Y = 22 + 3(8), or 46.

There are several methods for fitting a trend line to a set of data. For the data in Figure 2-3,
an oval (shown by the broken line) was sketched around the plotted points and the trend line was
assumed to coincide with its long axis. In practice, this method is not sufficiently accurate.

A more precise method of fitting the trend line to the data is called least squares because
it attempts to find the line which minimizes the sum of the squared deviations (something we'll
discuss later in the probability chapter, but for now you can substitute the word distance) of the
data points to the trend line. To illustrate this method, look at the data points in Figure 2-4 and
draw (or visualize) a straight line through the pattern of dots in the figure. Please note that Figure
2-4 shows the same data as Figure 2-2, except we have increased the size in order to present an
explanation of least squares.


Figure 2-4
Alpha Co.'s Earnings, 2012-2015

Those dots that do not fall on the line have a deviation from the line (which is usually measured
parallel to the vertical axis). Figure 2-5 illustrates this concept with the data point X = 2014,
Y = $93 million.


Figure 2-5
One Data Point for Alpha

Notice that the trend line indicates a value for Y of $97 million. The deviation is the distance
(measured along the vertical axis) between the trend line and the data point. In this case, the
deviation is $4 million ($93 million - $97 million). Do not be concerned about the sign at this
point - we are measuring distance. The slope and position of a least-squares line can be found by
computing the value of a (the y-intercept) and b (the slope) in a special manner. The value of b in
a least-squares trend line is given by the equation

b = [ΣXY − (ΣX)(ΣY)/n] / [ΣX² − (ΣX)²/n]

Please note that XY means X multiplied by Y. This equation may look a little scary, but it is really
just using the same notation from some of the formulas in this chapter with a few more variables
put in for good measure.

To find the least-squares trend line for Figure 2-4, we need to compute the value of b from
the data in Table 2-2. In order to simplify the computations, we will use X-values of 1, 2, 3, ...
instead of 2012, 2013, and so on. As long as the incremental difference between adjacent periods
remains the same (1, in this case), the results will not be altered. The computations are shown in
Table 2-6.


Table 2-6
Preliminary Trend Line Computations
[Table body not reproduced; its column sums, used below, are n = 4, ΣX = 10, ΣY = 351, ΣXY = 967, and ΣX² = 30]

The sums from Table 2-6 can now be substituted into the formula for b:

b = [967 − (10)(351)/4] / [30 − (10)²/4]
  = (967 − 877.5) / (30 − 25)
  = 89.5 / 5
  = 17.90


The value of b (the slope) can be interpreted as the expected change (an increase of 17.9
million dollars) in the earnings of the Alpha Company over the period of one year. The increase is
indicated by the positive sign of b as well as by the obvious upward trend shown in Figure 2-4.

The value of a (the y-intercept) can also be computed directly, but it is more common to take
advantage of the fact that the trend line always passes through the point given by the means of X
and Y. Mathematically,

Y = a + bX

Since we already know (or can easily determine) three of the four variables in this equation, we can
find the fourth with little difficulty:


Ȳ = ΣY/n = 351/4 = 87.75,
X̄ = ΣX/n = 10/4 = 2.50, and
Ȳ = a + bX̄, therefore
87.75 = a + (17.90)(2.50)
Rearranging the equation to solve for the unknown:
a = 87.75 − 44.75 = 43

The value of a can be interpreted as the earnings of Alpha Co. in year 0, or, by the scale used
in this problem, the year prior to the first data point -- 2011.

The trend line equation now can be written as:

Y = 43 + 17.90X

A future value of Y can be found simply by substituting the appropriate value for X. To forecast
the annual earnings of Alpha Co. for 2016, X = 5 and

Y = 43 + (17.90)(5)
Y = 43 + 89.5 = 132.5

Thus, in 2016, we expect Alpha Co.'s earnings to be 132.5 million dollars. This process of
forecasting with a trend line is illustrated graphically for the Alpha example in Figure 2-6. This
example is quite unusual in that all data points fall either on or very close to the trend line. In an
imperfect world, this happens very rarely, and thus our predictions carry much greater uncertainty.
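
Here is a minimal Python sketch of the least-squares fit. Since Table 2-6 is not reproduced above, the four yearly earnings figures below are illustrative values chosen only so that the sums match those used in the text (ΣX = 10, ΣY = 351, ΣXY = 967, ΣX² = 30); the function name fit_trend_line is our own:

def fit_trend_line(xs, ys):
    """Least-squares fit of Y = a + bX; returns (a, b)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = sum_y / n - b * sum_x / n    # the line passes through (X-bar, Y-bar)
    return a, b

years = [1, 2, 3, 4]             # 2012-2015 recoded as 1-4
earnings = [58, 85, 93, 115]     # illustrative values; sums match the text
a, b = fit_trend_line(years, earnings)
print(round(a, 2), round(b, 2))  # 43.0 and 17.9
print(round(a + b * 5, 2))       # 132.5, the 2016 forecast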


Figure 2-6
Trend Analysis of Alpha Company's Earnings, 2011-2016


Advantages and Disadvantages: Trend analysis has an advantage over other time-series
methods in that it is appropriate for data that show a consistent change over time. We have assumed
linearity for a limited distance beyond our range of data, but the reader should be cautious about
this assumption for predictions substantially beyond the data range. Like all time-series methods,
however, trend analysis is based solely on historical levels of the variable to be predicted and cannot
take into account environmental or other related variables. The important step of considering
related variables leads to the subject of regression analysis, covered in the next chapter.


Learning Objectives
Chapter 3- Regression Analysis
After studying this chapter, you should be able to:
Describe and explain simple regression.
Calculate the coefficient of determination and the coefficient of
correlation.
Explain a correlation matrix.
Describe and explain the variable mean and standard deviation.
Compute degrees of freedom for a multiple regression expression.
Identify zero-one variables.
Calculate the standard error of the estimate.

CHAPTER 3
REGRESSION ANALYSIS
INTRODUCTION

In the last chapter, we focused solely on predicting how long something will take or how
many customers we will need to serve based only on how long it has taken or how many people we
have served in the past. There may, however, have been conditions associated with each of those
past instances which may or may not be conditions in the new forecast. Say, for example, that you
are the box office manager for a football team. You would like to predict how many tickets will
be sold for the game next Saturday. You could average the attendance at the past ten games the
team has played, or you could use a weighted-average model, giving a lot of weight to the past few
games, to see how many people will show up at this one. But shouldn't you also consider other
things that may have been an issue at some, though not all, of those other games: Is rain expected?
Is the team doing well? Are they playing a rival? There are a great number of things which may
radically affect attendance each time the team plays a game, and it would be to the box office
manager's benefit to find out what they are, so that when analyzing this game, he or she can more
accurately predict attendance.

Regression analysis is the approach that is needed. This forecasting technique is similar
to the ones in Chapter Two in that the forecaster makes a recommendation about the unknown
from a collection of known data points. With regression, however, the forecaster is not limited to
predicting the unknown variable solely from historical levels of that same variable, but is able to
predict it based on one or a number of other conditions as well. If used properly, regression analysis
can be extremely accurate because it allows the forecaster to customize a prediction based on the
conditions that exist, granted those conditions have existed at some point in the past so their effect
on the predicted variable can be analyzed.


BASIC CONCEPTS OF REGRESSION




In computational terms, regression analysis is very closely related to trend analysis. In its
simplest form, regression analysis practically is trend analysis, except that time is replaced with
a different variable. By convention, the variable the forecaster wishes to predict is referred to as
the dependent variable and is represented, as in other forecasting techniques, by the letter Y. The
variable on which the forecast depends is called the independent variable and is represented by the
letter X. Using this notation, the regression line on which the forecast is based is given by the same
equation used in trend analysis:

Y = a + bX



A regression analysis using this equation is called linear because it assumes a straight-line
relationship between X and Y. Relationships of this type can be graphed easily and accurately by
hand or on a computer. There are other, non-linear relationships that are possible, but they are
beyond the scope of this discussion.

A regression analysis involving a single independent variable is further classified as a simple
regression, while one involving two or more independent variables is called a multiple regression.
We will examine both simple and multiple linear regression, but will start with an example based
on the data in Table 3-1 to build a simple regression model to predict family income.


Table 3-1
Education and Income of Family Heads

Years of Formal     Income
Education           (Thousands)
     12               $28
     13               $22
     11               $19
     16               $34
     12               $26
     12               $32
     15               $25
      9               $21


The first step in building the regression model is to plot the data in what is known as a scatter
diagram. On the surface, this plot is similar to those done in time series, with the exception that the
data follow no clear progression through time (if the data in your scatter diagram seem to follow a
path over time, do not use a regression model; a time series model is necessary). In fact, time is
not even an element of this analysis. The data may be from the same time period or from different
periods. The scatter diagram for the family income and formal education data is shown in Figure 3-1.


Figure 3-1
Scatter Diagram of Income versus Education
[Scatter plot of the Table 3-1 data with the fitted line Y': Income ($000) on the vertical axis, Education (Years) on the horizontal axis]


Since the regression line follows the same general equation of the trend line, we can use the
same procedures to compute a and b that we used in trend analysis. These computations are shown
in Table 3-2.
Table 3-2
Preliminary Regression Line Computations

X (Education)   Y (Income)     XY      X²
     12             28         336     144
     13             22         286     169
     11             19         209     121
     16             34         544     256
     12             26         312     144
     12             32         384     144
     15             25         375     225
      9             21         189      81
    100            207        2635    1284



There is another scary-looking formula coming up, but it is one that is extremely necessary
for our derivation of the regression line. Using it, you can find the slope of the regression line, b.
It is:

b = [ΣXY − (ΣX)(ΣY)/n] / [ΣX² − (ΣX)²/n]
  = [2635 − (100)(207)/8] / [1284 − (100)²/8]
  = (2635 − 2587.5) / (1284 − 1250)
  = 47.5 / 34
  = 1.3971

Thus you know b, the slope of the line, is approximately 1.40. Now you want to know where
on a graph this line would be, and the easiest way to do this is by determining where it crosses the
y-axis. You can find this by putting your newly derived b and the arithmetic means of X and Y (see
Chapter 2 for an introduction to averages) into the regression equation:

Ȳ = 207/8 = 25.8750,
X̄ = 100/8 = 12.5000,
Ȳ = a + bX̄, and
25.8750 = a + 1.3971(12.5000)
a = 25.8750 − 17.4638 = 8.4112
The regression equation, with a and b rounded to two decimals, can now be written as

Y = 8.41 + 1.40X


Taken literally, this equation says that with no formal education (X = 0), a family head's expected
annual income is $8,410 (a = 8.41 thousands of dollars) and that for each additional year of formal
education, income can be expected to increase by $1,400 (b = 1.40 thousands of dollars). We can
substitute any amount of schooling (the independent variable) we want into the equation to tell us
the income (the dependent variable) of someone with that education level.

The forecasted income for a person with two years of college (X = 14) therefore would be:

Y = 8.41 + 1.40(14)
  = 8.41 + 19.60
  = 28.01, or $28,010
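
A short Python sketch ties the pieces together. It reuses the fit_trend_line helper defined in the trend-analysis example (our own hypothetical function), since simple regression uses exactly the same computations with education in place of time:

education = [12, 13, 11, 16, 12, 12, 15, 9]   # X, years (Table 3-1)
income = [28, 22, 19, 34, 26, 32, 25, 21]     # Y, $ thousands

a, b = fit_trend_line(education, income)
print(round(a, 2), round(b, 2))   # 8.41 and 1.4
print(round(a + b * 14, 2))       # 27.97; the text's 28.01 comes from rounding a and b first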
CORRELATION ANALYSIS


The regression line reveals the nature of the relationship between X and Y, but it does not
tell how strong that relationship is. For example, the data in Figure 2-4 all fall very close to the
trend line while the data in Figure 3-1 are more dispersed around the regression line. Intuitively,
we suspect that the relationship between time and annual earnings for Alpha Co. is stronger than the
relationship between education and income. Although this comparison involves a trend analysis
for one set of data and a regression analysis for the other, the two methods are sufficiently similar
to illustrate the relative strengths of two regression analyses or, for that matter, two trend analyses.
There are two ways to measure the strength of a relationship between dependent and independent
variables: the coefficient of determination and the coefficient of correlation.
Coefficient of Determination

Simply put, the coefficient of determination tells us how much variation in the data can be
explained by the regression model we have built. The concept of variation is illustrated in Figure
3-2.


Figure 3-2
Explained and Total Variation
[Graph: the regression line Y' = a + bX, the horizontal mean line Ȳ = 25.875, and the data point (16, 34); total variation TV = Y − Ȳ, explained variation EV = Y' − Ȳ]

Figure 3-2 can be interpreted as follows: Without a regression analysis, we expect any given
Y-value, on average, to coincide with the mean, Ȳ. This makes sense, because the mean, Y bar or
Ȳ, is just the average of all of the past Y-values, and a good indication of future ones. This is the
basic premise underlying the time series technique of averaging. In this case Ȳ is 25.875 and is
shown as the horizontal line in Figure 3-2.

However, with regression analysis we can do much better about predicting Y. Given a value
of X, we would predict that Y would fall on the regression line Y' = a + bX. Notice in Figure 3-2
that our actual data point, (X = 16, Y = 34), does not fall on the regression line; it is above it. On the
other hand, the regression line does help us quite a bit in explaining the variation of the data point
(X = 16, Y = 34) from the simple average Ȳ. In other words, the regression line has explained a
significant portion of the variation of the data point from our uneducated expectation that Y is the
average for the data (uneducated, i.e., without regression). Mathematically, this explained variation
is noted as:

Y' − Ȳ

But our task is to account for the total variation of our data point Y from the average Ȳ. As
Figure 3-2 shows, the total variation is noted as:

Y − Ȳ

and is comprised of two elements. The first is Y' − Ȳ, the explained variation from the average,
and the second is

Y − Y'


which is the unexplained variation of the data point from the regression line. In summary,

Y − Ȳ   =   (Y' − Ȳ)   +   (Y − Y')
Total       Explained      Unexplained
Variation   Variation      Variation

This same equation holds true when the terms are squared and summed (this can be proven
mathematically, but is beyond the scope of this text). Therefore:

Σ(Y − Ȳ)²   =   Σ(Y' − Ȳ)²   +   Σ(Y − Y')²
Total           Explained        Unexplained
Variation       Variation        Variation


What does all this mean? It means that we have a basis for measuring the fit of the line to
the data. In other words, we know that all the data points will not fall exactly on the regression
line. But if our regression line has done the job, we can expect that most of the variation of a data
point can be explained by the regression line. The remaining unexplained variation is present, but
hopefully small.

The percentage of variation explained by the model is called the coefficient of determination
and is usually noted as r²:

r² = Explained Variation / Total Variation = EV / TV

The relationship shown in Figure 3-2 and the equation above is merely a pedagogical aid.
One does not actually compute EV and TV and divide to find r². The actual computation is as follows:

r² = [ΣXY − (ΣX)(ΣY)/n]² / ([ΣX² − (ΣX)²/n] [ΣY² − (ΣY)²/n])

If this seems a little bit imposing, do not worry about it; just stick with us.



At this point, it should be noted that there are many variations on this equation, just
as there are many variations of the equation for b given earlier. Algebraic manipulations of
the equation shown here will yield many equivalents, some of which are computationally
more or less convenient, depending on the organization of data. The equation shown here
was chosen for two reasons. First, two of the three major terms appear in the equation for
b and the third is similar to the denominator in the equation for b. As a result, the manual
computations are easier to follow. Second, this is a form frequently used in computer
analysis because it can be used with only one run through the data. Some other forms
require a preliminary run to compute the means of X and Y before r² can be computed.

The equation for r² can be solved using the table set up earlier to find b, with one
minor addition: we must also find the sum of Y². Table 3-3 shows the original computations
(from Table 3-2) with this addition.
Table 3-3
Preliminary r² Computations

X (Educ)   Y (Inc)     XY      X²      Y²
   12         28       336     144     784
   13         22       286     169     484
   11         19       209     121     361
   16         34       544     256    1156
   12         26       312     144     676
   12         32       384     144    1024
   15         25       375     225     625
    9         21       189      81     441
  100        207      2635    1284    5551


The value of r² can now be computed as:

r² = [2635 − (100)(207)/8]² / ([1284 − (100)²/8] [5551 − (207)²/8])
   = (2635 − 2587.5)² / [(1284 − 1250)(5551 − 5356.125)]
   = (47.5)² / [(34)(194.875)]
   = 2256.25 / 6625.75
   = .3405
Notice that several of the terms in this equation have already been computed (in the determination
of b), and the computation is not nearly as difficult as it might look at first glance.
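
As a check on the arithmetic, here is a small Python sketch of this one-pass form of the r² computation (the function name r_squared is our own):

def r_squared(xs, ys):
    """Coefficient of determination via the one-pass sums formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = (sum_xy - sum_x * sum_y / n) ** 2
    denominator = (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
    return numerator / denominator

education = [12, 13, 11, 16, 12, 12, 15, 9]
income = [28, 22, 19, 34, 26, 32, 25, 21]
print(round(r_squared(education, income), 4))   # 0.3405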

The answer of .3405 means that just over 34 percent of the variation in family
income can be explained by formal education. But what about the remaining 66 percent of
unexplained variation? Although it is not stated explicitly in this problem, common sense
tells us that experience, geography, economic conditions, good (or bad) fortune, choice of
occupation or profession, and many other factors also contribute to annual income. We shall
see later in multiple regression how additional factors can be incorporated into the analysis.
First, let us look at the second measure of the strength of the regression relationship.
Coefficient of Correlation


Although the coefficient of determination has a clearer and more useful interpretation,
the coefficient of correlation, the second measure of the strength of the regression relationship,
is used more commonly. Fortunately, the computation of this correlation coefficient is
quite simple -- assuming one has first computed the coefficient of determination. In a
simple linear model, the correlation coefficient is simply one of the two square roots of
the coefficient of determination and, logically, is labeled r. In mathematical notation, we
write:

r = ±√r²



Notice that there are both positive and negative roots of r². Recall, for example, that both
(+2)² and (-2)² equal 4. For our education-income problem, the roots are:

r = ±√.3405 = ±.5835

The convention in statistics is for the coefficient of correlation to take the sign of the slope
of the regression line. Since the slope of the regression line in this problem is positive (b is +1.40
and the regression line slopes upward from left to right), we will use +.5835 as the coefficient of
correlation. The correlation coefficient measures the degree of linear association between X and
Y. A positive association means that as X increases, so does Y, and vice-versa. As a rule of thumb,
statisticians like to see the coefficient of correlation be .8 or more or -.8 or less (the range for r is
between -1 and 1). In such cases, almost 65 percent of the movement in Y is explained by movement
in X. With its r-value of .5835 and r² of .3405, the effectiveness of our choice of an independent
variable in this regression analysis is questionable.
MULTIPLE REGRESSION


We should not give up on regression analysis in this problem too quickly, however. There
are ways, as suggested earlier, of incorporating additional independent variables to improve the
strength of the relationship and the accuracy of the forecast. We can add one or more independent
variables and use multiple regression to forecast family income.

The general form of a multiple regression equation is

Y = b0 + b1X1 + b2X2 + ... + bmXm

where b0 is similar to a, the y-intercept,
b1, ..., bm are the coefficients of X1, ..., Xm, respectively, and
m is the number of independent variables.

The computations in multiple regression are extremely complex and are rarely performed
without the aid of a computer. Graphing of multiple regression problems is not only initially
difficult but eventually impossible because a dimension is required for each variable. Thus, while
we can graph simple regression (X and Y only) on a flat, two-dimensional surface as in Figure 3-1,
and we can, with a little imagination, also represent a three-dimensional figure (X1, X2, and Y) on a
two-dimensional surface as in Figure 3-3, we cannot even visualize figures of four or more dimensions,
although we can calculate the equations of their regression lines. In order to keep our first example
of multiple regression down to dimensions we can understand, let us add just one more independent
variable, age, to the data originally presented in Table 3-1. The new data are shown in Table 3-4.


Table 3-4
Education, Age, and Income of Family Heads

Years of Formal     Age     Income
Education (X1)     (X2)      (Y)
      12            38        28
      13            35        22
      11            29        19
      16            32        34
      12            33        26
      12            40        32
      15            27        25
       9            23        21

Figure 3-3
Multiple Regression Plot
[Three-dimensional plot of the Table 3-4 data: Income ($000) on the vertical axis, Age (Years) and Education (Years) on the two horizontal axes]

A computer analysis of these data points is shown below:

CORRELATION MATRIX
            Education     Age      Income
Education     1
Age            .2001     .9999
Income         .5835     .5691    .9999

VARIABLE     MEAN      STANDARD DEVIATION
Education    12.5           2.2039
Age          32.125         5.6679
Income       25.875         5.2763

REGRESSION EQUATION
DEPENDENT VARIABLE: INCOME
INDEPENDENT              STANDARD
VARIABLE    COEFFICIENT  DEVIATION   T-RATIO
Education      1.1713      .7301      1.6043
Age             .4386      .2839      1.5449
CONSTANT      -2.8560    11.6330      -.2455

COEFFICIENT OF DETERMINATION  =   .5536
COEFFICIENT OF CORRELATION    =   .7440
DEGREES OF FREEDOM            =  5
STANDARD ERROR OF ESTIMATE    =  4.1711

TABLE OF RESIDUALS
ACTUAL    PREDICTED    RESIDUAL
  28       27.8660      - .1340
  22       27.7216       5.7216
  19       22.7474       3.7474
  34       29.9298      -4.0822
  26       25.6731      - .3269
  32       28.7432      -3.2568
  25       26.5555       1.5555
  21       17.7733      -3.2267

The interpretation of this computer analysis is detailed in the following sections.


Correlation Matrix

The rows and columns of the matrix represent the variables in order. The correlation matrix
is reproduced below:

            Education     Age      Income
Education     1
Age            .2001     .9999
Income         .5835     .5691    .9999

The values in the body of the matrix represent the correlation between row and column.
Thus, the intersection of row 1 and column 1 contains a 1 -- representing perfect correlation between
education and education. The correlation between education and income (row 3, column 1) is
.5835, a value determined earlier in the discussion of simple regression. The correlation between
age and income, .5691, is shown in row 3, column 2. The correlations of age with itself (row 2,
column 2) and income with itself (row 3, column 3) are not exactly 1 due to rounding errors as the
binary arithmetic values used in internal computer processing are converted to decimal arithmetic.
Finally, the correlation of age with education (row 2, column 1) is of interest only if it is high
(close to 1), which it is not. A strong correlation between independent variables is an indication
of multicollinearity, a condition in which one independent variable is largely determined by one or
more of the other independent variables rather than contributing independent information about the
dependent variable. In such cases, one of the related independent variables should be eliminated.
The correlation coefficient here, .2001, does not give such an indication.
The Variable Mean and Standard Deviation


Standard deviation is a calculation involving all observations equally, and the quantity
involved in the calculation is the squared distance from the mean. In a normal distribution (normal
distribution characteristics will be covered in Chapter 4), the standard deviation indicates distance
from the mean. The mean plus and minus approximately two standard deviations (actually, 1.96)
will include 95 percent of all values of the variable. Plus and minus three standard deviations will
include over 99.7 percent. Using the computer analysis of the data from Table 3-4, we can say that
(assuming a normal distribution) the mean education for these data is 12.5 years and that 95 percent
of all values for education are within 1.96 × 2.2039 or, roughly, 4.3 years of that mean. That is,
approximately 95 percent of the subjects in this study have between 8.2 and 16.8 years of education.
Generally, this usage of the standard deviation is more accurate for larger data sets, perhaps a few
hundred observations rather than the eight shown here.
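
A quick Python sketch of this 95 percent range, using the standard library's statistics module (stdev computes the sample standard deviation, which matches the 2.2039 reported in the output):

import statistics

education = [12, 13, 11, 16, 12, 12, 15, 9]
mean = statistics.mean(education)    # 12.5
sd = statistics.stdev(education)     # about 2.2039 (sample standard deviation)

# Approximate 95% range under a normality assumption.
print(mean - 1.96 * sd, mean + 1.96 * sd)   # roughly 8.2 to 16.8 years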


Regression Equation
The regression equation for this analysis is
Y = -2.8560 + 1.1713X1 + .4386X2
where Y is the dependent variable (income),
X1 is the first independent variable (education), and
X2 is the second independent variable (age).

The coefficients of these independent variables are estimates of the true coefficients b1 and b2,
which are the means of all possible estimates that we could have obtained with different samples.
The estimated coefficients may be anywhere in a normally distributed range around these means.
The t-ratio is a measure of the significance of each of these independent variables. It tells you how
many standard deviations away from zero the coefficient of your independent variable is. This is
important because if the true coefficient is zero, the variable is insignificant: it does not assist in
building your regression line. If the resultant ratio is between plus and minus 2 (or a different value
established by an experienced forecaster), it is possible that the value of the coefficient is actually
zero and the usefulness of the independent variable should be questioned. This is a good way to
check yourself to make sure you are not putting into your analysis an extraordinary number of
independent variables simply to get a higher r². The reader may wish to confirm that the ratios here,
both between 2 and -2, do in fact raise this possibility.
Coefficient of Multiple Determination

Because this is a multiple regression analysis, the coefficient of determination is now called
the coefficient of multiple determination and shows the amount of variability in the dependent
variable that is explained by both independent variables working together. In this example, 55.36
percent of the variation in income can be explained by variation in education and age.


Coefficient of Multiple Correlation



Just as it was in our simple regression problem, the coefficient of multiple correlation
is the square root of the coefficient of multiple determination; in this example, it is the square
root of .5536 or .7440. Negative roots are not considered in multiple correlation like they are in
simple regression because of the possibility that the coefficients of the independent variables may
have different signs. In such cases, it is not clear which sign to give the coefficient of multiple
correlation.
Degrees of Freedom

The improvement in the coefficient of determination achieved by adding a second independent
variable is not without cost. First, there are the obvious data collection and computational costs.
A more subtle cost is the added uncertainty of the forecast that results from an additional variable.
In statistical terminology, an additional independent variable lowers the degrees of freedom
of the dependent variable; some statisticians refer to them as free pieces of information.
Computationally, degrees of freedom are found by subtracting one plus the number of independent
variables from the number of observations. In this example, there are 8 - (2 + 1), or 5, degrees of
freedom. In your regression models, as long as you have plenty of observations and try not to cram
too many independent variables into your model, you should be fine.
Standard Error of the Estimate

The standard error of the estimate is similar to the standard deviation. While the standard
deviation is a measure of variation around the mean, the standard error of the estimate is a measure
of variation of points around the regression line. This is useful in the following way: given
the independent variables for our regression line, we can build bands of confidence around our
prediction of the dependent variable. To illustrate, consider a 40-year-old person with 14 years
of education. Referring back to the computer analysis, according to the regression equation we
would predict annual income to be:

Y = b0 + b1X1 + b2X2
  = -2.8560 + 1.1713(14) + .4386(40)
  = -2.8560 + 16.3982 + 17.5440
  = 31.0862



Assuming a normal population, we know that the standard error of the estimate is 4.1711,
and we know that there is a 68% probability (one standard deviation on each side of that point on
the regression line) that this person's income is between $26,915.10 and $35,257.30, and a 95%
probability (approximately two standard deviations on each side of that point on the regression
line) that it is between about $22,744.00 and $39,428.40. This quantifies how uncertain we may be
of any point on the regression line.
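
A brief Python sketch of this prediction and its confidence bands, using the coefficients and standard error from the computer output above:

# Coefficients from the two-variable regression output.
b0, b1, b2 = -2.8560, 1.1713, .4386
std_error = 4.1711

education, age = 14, 40
predicted = b0 + b1 * education + b2 * age   # 31.0862, i.e. $31,086.20

# Approximate 68% and 95% bands around the prediction.
print(predicted - std_error, predicted + std_error)           # about 26.92 to 35.26
print(predicted - 2 * std_error, predicted + 2 * std_error)   # about 22.74 to 39.43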

Table of Residuals

The table of residuals shows the actual, predicted, and residual values for each observation.
The residuals, or errors, can be used to compute the standard error of the estimate if it is not done
by the computer software. In our case, the table of residuals is of interest only to show how much
each data point deviates from the regression line.
ZERO-ONE VARIABLES

Up to this point we have dealt with variables that are continuous and can take on a wide
variety of values: for example, a person's income can be $20,000 or $50,000, any number within
that range, or any number above or below it (although, for our independent variables, we have
expressed age and education to the nearest year, and income to the nearest thousand dollars).
Now let us see how a binary variable -- one that can assume only one of two values -- can be
incorporated into regression analysis.

A binary, or dummy, variable is useful in regression analysis when you would like to put an
important independent variable into the model that cannot be given a numerical value. Examples
include account status (current or past due), employment status (full-time or part-time), and
inventory status (in- or out-of-stock). The value of a binary variable has no numerical significance;
it merely distinguishes one outcome from the other. It is common to use zero and one as the two
values of a binary variable, hence the name zero-one variable. Continuous variables can also
be expressed in this fashion if appropriate. For example, a social security application might use
age only as under 65 or 65 and over. We will illustrate the use of a binary variable by adding
gender to our household data: zero will represent a male, one a female.


Table 3-5
Education, Age, Gender, and Income of Family Heads

Years of Formal     Age     Gender    Income
Education (X1)     (X2)      (X3)      (Y)
      12            38         0        28
      13            35         1        22
      11            29         1        19
      16            32         0        34
      12            33         0        26
      12            40         0        32
      15            27         1        25
       9            23         0        21

The computer analysis of the data in Table 3-5 is shown below:


CORRELATION MATRIX
            Education     Age      Gender    Income
Education     1
Age            .2001     .9999
Gender         .1879    -.2618     .9999
Income         .5835     .5691    -.6082     .9999

VARIABLE     MEAN      STANDARD DEVIATION
Education    12.5           2.2039
Age          32.125         5.6679
Gender         .375          .5175
Income       25.875         5.2763

REGRESSION EQUATION
DEPENDENT VARIABLE: INCOME
INDEPENDENT              STANDARD
VARIABLE    COEFFICIENT  DEVIATION   T-RATIO
Education      1.5679      .3212      4.8817
Age             .2464      .1271      1.9389
Gender        -6.7479     1.3884     -4.8600
CONSTANT        .8898     5.0092       .1776

COEFFICIENT OF DETERMINATION  =   .9354
COEFFICIENT OF CORRELATION    =   .9671
DEGREES OF FREEDOM            =  4
STANDARD ERROR OF ESTIMATE    =  1.7747

TABLE OF RESIDUALS
ACTUAL    PREDICTED    RESIDUAL
  28       29.0693      -1.0693
  22       23.1500      -1.1500
  19       18.5356        .4644
  34       33.8624        .1376
  26       27.8371      -1.8371
  32       29.5622       2.4378
  25       24.3144        .6856
  21       20.6690        .3310


After analyzing this output, we see that there is a negative coefficient attached to the independent
variable gender. Since males were assigned a zero as their dummy variable, the negative coefficient
reduces only the predicted income of women, who were given a one.

The regression equation for this example is

Y = b0 + b1X1 + b2X2 + b3X3
  = .8898 + 1.5679X1 + .2464X2 - 6.7479X3

For female heads of household, 40 years old, with 14 years of formal education, the predicted
annual income is

Y = .8898 + 1.5679(14) + .2464(40) - 6.7479(1)
  = .8898 + 21.9506 + 9.8560 - 6.7479
  = 25.9485, or $25,948.50

Whereas for men of the same age and education level, the predicted annual income is

Y = .8898 + 1.5679(14) + .2464(40) - 6.7479(0)
  = .8898 + 21.9506 + 9.8560 - 0
  = 32.6964, or $32,696.40
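
The effect of the dummy variable is easy to see in code; this short Python sketch (the function name predict_income is our own) evaluates the fitted equation for both values of the dummy:

def predict_income(education, age, gender):
    """Fitted equation from the Table 3-5 output; gender is 0 (male) or 1 (female)."""
    return .8898 + 1.5679 * education + .2464 * age - 6.7479 * gender

print(predict_income(14, 40, 1))   # 25.9485 -> $25,948.50 (female)
print(predict_income(14, 40, 0))   # 32.6964 -> $32,696.40 (male)
# The two predictions differ by exactly the gender coefficient, 6.7479.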

We also note that 93.5 percent of the variation in income for this group of subjects is
explained by differences in age, gender, and education (the coefficient of determination is .9354).
Furthermore, the t-ratios of the regression equation coefficients are stronger than in the previous
analysis, which did not consider gender (generally, a t-ratio over four conveys significance of its
associated independent variable). Finally, all of the independent variables correlate substantially
with the dependent variable, and there is no evidence of multicollinearity among the independent
variables.


Learning Objectives
Chapter 4- Probability
After studying this chapter, you should be able to:
Distinguish simple and theoretical probability from classical
probability theory.
Discuss joint and conditional probabilities.
Identify a probability mass function.
Measure a probability distribution.
Describe a binomial distribution.
Calculate Z scores for a normal distribution.
Calculate the standard error of a sample.

CHAPTER 4
PROBABILITY
INTRODUCTION

Determining the probability that a possible event will or will not occur is a critical step in
making many business decisions. For example, if an oil engineer is deciding whether to drill for oil,
he would like to know the probability of striking oil before making an investment in the necessary
equipment and labor. Similarly, a product manager would like to know the probabilities associated
with different sales levels of a new product before deciding whether to produce it and how much to
spend on advertising and sales representatives. This chapter will introduce you to some of the basic
concepts of probability, and the next chapter will use probabilities in a framework designed to help
managers make better decisions.
TYPES OF PROBABILITY
Simple Probability

Simple theoretical probability is familiar to most people even though they may not be
aware of it. As children, most of us learned that the probability of rolling a 3 on a six-sided die is
1/6, since only one side of the die has three spots. We also learned that the probability of drawing
an ace from a deck of cards is 4/52 (or 1/13), since this number is the ratio of aces to the total
number of cards.

Classical probability theory allows us to make such calculations. If a situation can result
in y possible outcomes, and x of these are termed successes, then the probability of a success is x/y.
For this to be true, the list of possible outcomes must be mutually exclusive, collectively exhaustive,
and equally likely. We will define these terms shortly, but since they hold for our card-drawing
example, the probability of successfully drawing one of the four aces in a deck of 52 cards is 4/52,
or 1/13.

In order to define the three terms above, let us continue with the example of choosing an ace
from a deck of cards. Assuming we have a complete deck, there are four aces in that deck, thus
the probability of drawing an ace is 1/52 (the probability of drawing one particular card from the
deck) times 4 (the number of aces), or 4/52. The deck of 52 cards includes all possible selections,
making the process of drawing collectively exhaustive (we cannot choose a card from any other
deck besides the one in front of us).



Moreover, each card in the deck is designated by both suit and rank, so no two are the
same, making the individual outcomes of drawing cards mutually exclusive (there are no duplicate
outcomes as long as cards already drawn are not put back in the deck). Finally, a random draw
implies that the choice of one card carries the same chance as a choice of another. Thus, all of the
outcomes are equally likely. The conventional form for writing the simple probability above is:
P(Ace) = 4/52. Similarly, the form for writing the probability of rolling a three on a six-sided die
(assuming it is a fair die) is P(3) = 1/6. The event of interest (the one for which you would like to
know the probability of occurrence) is enclosed in parentheses.

A more practical approach to real-world probability is given by the relative frequency
method, which is based on experimentation and number of trials. If one were to roll a die 600
times and find that a three was rolled on exactly 100 of the 600 rolls, the relative frequency or
probability for a three, P(3), would be 100/600 or 1/6. Assuming a fair die, where all outcomes
are equally likely, we expect basically the same result as derived from the theoretical approach as
long as there are a sufficient number of trials (generally at least as many as the number of different
outcomes, and many more if possible). Relative frequency is useful in the real world when, unlike
throwing a fair die, different outcomes do not all have the same probabilities. It is most useful in
situations where a large enough number of trials can be made to generate reliable probabilities
for each outcome.

Probabilities may also be found through a combination of intuition, knowledge, research
and judgment. Probabilities arrived at in this manner are referred to as subjective probabilities.
Subjective probabilities are a way to express a person's assessment of an uncertainty. When the
weatherperson states that there is a 30% chance of showers, usually this has not been derived
using any standard statistical measure of probability; rather, she is assessing the probability of rain,
arrived at through intuition, knowledge, research and judgment.
Joint and Conditional Probabilities


As discussed above, P(ace) = 4/52 and P(3) = 1/6 are simple or unconditional probabilities.
So is P(spades) = 13/52, the probability of drawing a spade from a deck of cards. Joint probability
is the probability of two events occurring together, for example, an ace that is a spade, written
P(ace,spade) or P(spade,ace). The probability is equal to 1/52, which makes sense since there is
only one card in the deck that will satisfy both constraints, the Ace of Spades. The members or
events within the parentheses are interchangeable for joint probabilities.



Conditional probability is the probability of one event occurring conditional upon another
that is already known or has already occurred. For example, suppose we have a friend who has
drawn a random card from a full deck, not shown it to us, and told us it is a spade. We want to
determine the probability that it is the Ace of Spades. The probability of this would be 1/13 since
there are thirteen cards in the spades suit and only one of those cards is the Ace of Spades. The
form for this probability would be written as P(ace|spade), with the | translated as given. Unlike joint
probabilities, the events within the parentheses are not interchangeable. For example, P(spade|ace),
the probability of a spade given an ace, equals 1/4 (four aces, only one Ace of Spades) but is not
the same thing as P(ace|spade), the probability of an ace given a spade, which as we demonstrated
above equals 1/13.
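
These probabilities can be confirmed by brute-force enumeration of a deck. The following Python sketch (the card names and structure are our own) simply counts outcomes rather than using any probability formula:

from fractions import Fraction

suits = ["spades", "hearts", "diamonds", "clubs"]
ranks = ["ace"] + [str(n) for n in range(2, 11)] + ["jack", "queen", "king"]
deck = [(rank, suit) for suit in suits for rank in ranks]   # 52 cards

aces = [c for c in deck if c[0] == "ace"]
spades = [c for c in deck if c[1] == "spades"]
ace_of_spades = [c for c in deck if c == ("ace", "spades")]

print(Fraction(len(aces), len(deck)))             # P(ace) = 1/13
print(Fraction(len(ace_of_spades), len(deck)))    # P(ace, spade) = 1/52
print(Fraction(len(ace_of_spades), len(spades)))  # P(ace | spade) = 1/13
print(Fraction(len(ace_of_spades), len(aces)))    # P(spade | ace) = 1/4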
Probability Distributions

Unfortunately, in business the simple theoretical and relative frequency types of probabilities
we have discussed so far will not be sufficient to make most probability-related decisions. There
are several reasons for this: first, business decision makers are often faced with a shortage of data
or difficulties in measuring a probability; second, different outcomes have different probabilities of
occurring, which must be taken into account; and third, many business situations have not just a few
distinct possible outcomes but often an infinite number -- a new product may be bought by only one
person in a year, the first month's demand may outstrip yearly production, or anything in between
may occur. Drilling for oil could produce returns ranging from negative if no oil is recovered, to
billions of dollars if a North Slope-sized oil field is discovered. Managers must be able to deal with
this more realistic type of uncertainty, known as continuous uncertainty, as well as with decisions
that have a limited number of outcomes.

Probability distributions are a way to deal with these many-valued uncertainties because
they allow decision makers to order their judgments and present their assessment of the
probabilities in an easily interpreted and useful form. In the next few subsections, we will detail
several different probability distributions.
Probability Mass Function
A probability mass function (PMF) is a probability distribution useful when there are
a limited number of outcomes and the decision maker can assign probabilities to each of those
outcomes. As discussed in probability theory, these outcomes need to be mutually exclusive and
collectively exhaustive, but they do not necessarily have to be equally likely (since each outcome is
assigned its own probability which may be different from those of all the other outcomes). If these
two tenets hold true, all possibilities are covered with no overlap, and the sum of the probabilities
should add up to 1 or 100%, which means that the decision maker is 100% certain that one of the
outcomes he has chosen will occur.



For example, Jack Carr, the owner of Jack's Used Cars, wants to know how many cars he
will sell next week. He currently has only seven cars on the lot. From experience Jack knows he
sells three cars per week on average, so he could easily predict that he will sell three next week
and forget the matter. However, Jack realizes that some weeks he sells seven cars and some weeks
he sells none. Using a combination of intuition, knowledge and judgment, Jack sets up the table
below, which states the probability of selling a certain number of cars each week and is called a
simple probability distribution.
Table 4-1
Simple Probability Distribution of Car Sales

# of Cars    Probability
    0            6%
    1           11%
    2           15%
    3           20%
    4           19%
    5           15%
    6           10%
    7            4%

Jack cannot sell more than seven cars since all he has on the lot now are seven and he will not get
new ones until the following week. Assuming no customers return their cars, he cannot sell fewer
than zero either. The probability mass function for this distribution is a graph that looks like this:
Figure 4-1
Probability Mass Function of Car Sales
[Bar graph of the Table 4-1 probabilities: Probability (%) on the vertical axis, Number of Cars on the horizontal axis]


At points between whole numbers the probability drops to zero since it is impossible to sell a
fraction of a car.

The probability mass function is an easy way to graphically represent several outcomes with
known probabilities. It gives decision makers an indication of the most likely value and the spread
around it, which is a helpful tool in the decision process.
Cumulative Probability Distribution
The cumulative probability distribution uses the same data but instead considers the
possibility that the outcome will be less than or equal to a certain value. For example, Jack knows
that the probability of selling less than or equal to 7 cars will be one. Jack would translate the
probability mass function (PMF) into a cumulative distribution function (CDF) by adding the
probability assigned to each value to the sum of the probabilities assigned to the values below it, as
in the example below.
Table 4-2a
Simple Probability Distribution

Cars    Probability
 0          6%
 1         11%
 2         15%
 3         20%
 4         19%
 5         15%
 6         10%
 7          4%

Table 4-2b
Cumulative Probability Distribution of Car Sales

Cars    Cumulative Probability
 0          6%
 1         17%    (6% + 11%)
 2         32%    (6% + 11% + 15%)
 3         52%    (6% + 11% + 15% + 20%)
 4         71%    (6% + 11% + 15% + 20% + 19%)
 5         86%    (6% + 11% + 15% + 20% + 19% + 15%)
 6         96%    (6% + 11% + 15% + 20% + 19% + 15% + 10%)
 7        100%    (6% + 11% + 15% + 20% + 19% + 15% + 10% + 4%)

The first table is the simple probability distribution for Jack's Used Cars that he derived
previously, and using it we have constructed the cumulative distribution function in the second
table. A graph of the CDF is shown below.
Figure 4-2
Cumulative Distribution Function of Cars Sold
[Step graph of the Table 4-2b values: Probability (%) on the vertical axis, Number of Cars (0-7) on the horizontal axis]
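
Converting a PMF to a CDF is just a running sum, as this Python sketch shows (itertools.accumulate does the cumulative addition):

from itertools import accumulate

pmf = [6, 11, 15, 20, 19, 15, 10, 4]   # Table 4-1 probabilities, percent, for 0-7 cars
cdf = list(accumulate(pmf))            # running totals

for cars, cum in enumerate(cdf):
    print(cars, f"{cum}%")             # 0 6%, 1 17%, 2 32%, ..., 7 100%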



The areas where the line flattens out represent impossible outcomes (fractional sales, we said
before, are not possible). The cumulative probability distribution, since it deals with uncertainties
less than or equal to an amount, makes the idea of continuous probabilities much more manageable.
For example, an oil company geologist would find it easier to assess the probability of pumping
100,000 barrels per day or fewer from a well than to compute the probability of pumping every
amount from 0 to 100,000 barrels per day. However, for situations where the number of outcomes
is great, the cumulative probability attached to a specific outcome may be needed for planning
purposes. To satisfy this, a curve can be built and smoothed through the flats of the distribution
or between the known probabilities to give a good estimate of the cumulative probability assigned
to all possible outcomes. For example, the oil company geologist's CDF might look like this:
Figure 4-3
CDF of Barrels of Oil Per Day From One Well
[Smoothed cumulative curve: Probability (%) on the vertical axis, Barrels per Day (Thousands, 60-130) on the horizontal axis]

The geologist can estimate by smoothing a curve between the known points that there is a 40%
chance of pumping 86,000 or fewer barrels per day, although he did not actually assess a probability
for 86,000, only for 60, 70, 80, 90, 100, 110, 120, and 130 thousand.
Measurement of Probability Distribution

Probability distributions are characterized by measures of their centers and their dispersion,
or how spread out potential outcomes are. One measure of the center of a probability distribution is
its mean or expected value, designated by the Greek letter mu (). The mean is the weighted average
of the distribution, or the outcomes (number of cars sold or barrels produced) multiplied by their
respective probabilities and summed up. For example, the mean of Jacks Used Car distribution
is:
3.4 = (0x.06) + (1x.11) + (2x.15) + (3x.20) + (4x.19) + (5x.15) + (6x.10) + (7x.04)


The mode, another measure of the center of the distribution, is the most likely outcome. The
mode can be identified as the steepest point on the CDF or the highest point on the probability
mass function, signifying the outcome having the highest probability of occurring. The mode for
Jack's distribution is 3: Jack predicts that there is a 20% probability that he will sell exactly three
cars next week, which is a higher probability than for any other sales level. The median, our
final measure for distribution centers, divides the distribution into two equally likely areas. For
continuous uncertainties, there exists a 50% chance that the outcome will be less than or equal to,
and a 50% chance that the outcome will be greater than or equal to, the median. The median of
Jack's distribution is again 3. The median is also known as the .5 fractile. Other fractiles such as .1
indicate a .1 chance that the outcome will be less than or equal to, and a .9 chance that the outcome
will be greater than or equal to, the amount in question.
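
A Python sketch of these three center measures for Jack's distribution: the mean is a probability-weighted sum, the mode is the outcome with the largest probability, and the median is the first outcome whose cumulative probability reaches 50%:

probabilities = {0: .06, 1: .11, 2: .15, 3: .20, 4: .19, 5: .15, 6: .10, 7: .04}

mean = sum(cars * p for cars, p in probabilities.items())
mode = max(probabilities, key=probabilities.get)

cumulative = 0.0
for cars in sorted(probabilities):
    cumulative += probabilities[cars]
    if cumulative >= 0.5:      # first outcome at or past the 50% point
        median = cars
        break

print(round(mean, 2), mode, median)   # 3.4 3 3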
Assessing Probability Distributions


Suppose Z and Z Company is coming out with a new product for which no market data
exists. Zack, the new product manager, needs to know how many units he can expect to sell so that
he can decide whether the product is even worth making and, if it is, in what quantities it should be
produced. Zack decides to use the fractile method to assess demand.

Zack must use his knowledge, experience and judgment to assess the potential sales of the
company. He begins by choosing an upper and lower limit for expected sales, 400 and 50 units,
beyond which he seriously doubts actual sales will fall. Then he chooses the .5 fractile, the midpoint
of the distribution and the point at which there is an equal likelihood of the actual outcome falling
above or below it. Assume Zack feels that expected sales of 200 units is the appropriate .5
fractile. Zack now has divided the probability distribution into two halves, 50-200 and 200-400.
The next step is to divide each half in half by assessing the .25 and .75 fractiles. Assume he feels
the .25 fractile should be 150 units and the .75 fractile should be 275 units. Again, he is using his
knowledge and judgment along with his experience in new product introductions in assessing these
probabilities. Zack now defines the range of sales that he chose initially by assessing the .01 and
.99 fractiles at 50 and 400. By setting these sales figures at these extreme fractiles, Zack recognizes
that these are not the absolute limits, but he expects that there is only a one-in-a-hundred chance that
the actual result will be below the range he has set, and an equal chance that it will be above it. The
decision maker should ensure that the distribution accurately reflects his or her feelings. One way
to do this is to view a graph of the cumulative distribution function (CDF). When the data points
are graphed and a curve is drawn to connect them, assuming a normal distribution, the curve
should be S-shaped, with the most likely values clustered around the median. Zack's distribution is
shown below:


Figure 4-4
Cumulative Distribution Function
[S-shaped curve through Zack's assessed fractiles: Fractiles (Cumulative Probability, 0-1) on the vertical axis, Number of Units Sold (50-400) on the horizontal axis]

Using Historical Data to Assess Probability Distributions




Often, data collected by a business in the past can be used to help assess probability
distributions for uncertainties in the future. Certain information, such as past sales reports, can be
used to enhance the decision maker's judgment of the uncertainty.

For data from past experiences to be applicable to the situation at hand, the conditions under
which those situations occurred and the conditions under which the one you are trying to predict
will occur should be nearly indistinguishable. Often, this is rather difficult to achieve, for there are
many significant things which may differ even in seemingly common situations. For example, let's
say Sarah is determining how many pieces of chicken to order for the Friday Night Fried Chicken
Special at her restaurant. She has decided to use the figures from past weeks.


Table 4-3
Pieces of Chicken Consumed During Past Friday Night Specials

Week    Pieces of Chicken
  1           101
  2           105
  3           108
  4            96
  5           156
  6            97
  7           101
  8           100
  9           104
 10           102
 11           100

After reviewing this data table, Sarah remembers that in Week Five there was a convention at the
hotel next door, and many of the attendees came to the restaurant and had the special. Since there
is a distinguishable difference in Sarah's eyes between what occurred that Friday and what will
occur this Friday, she should not use data from Week Five in assessing her probabilities.

Sarah now has ten weeks of indistinguishable data. She assigns each data point a
probability of .10 since she does not want to say that any sales figure is better or more
useful than any other. The probability mass function of the data is shown in the following graph:
Figure 4-5
Probability Mass Function of Sarah's Data
[Bar graph: Probability (up to .20) on the vertical axis, Pieces of Chicken Consumed (94-108) on the horizontal axis]


The Cumulative Distribution Function is shown in the graph below:

Figure 4-6
CDF of Sarah's Data
[Step graph: Probability (.10 to 1) on the vertical axis, Pieces of Chicken Consumed (94-108) on the horizontal axis]

An S-shaped curve can be smoothed through the flats, since any whole number in this range would
be possible, as shown in the graph below:

Figure 4-7
CDF of Sarah's Data With Smoothed Curve
[The Figure 4-6 step graph with a smooth S-shaped curve drawn through it]


Once the decision maker has formulated his or her probability distribution and verified its
points' relevance by viewing the graph and remembering the history surrounding each of the data
points, he or she can use it as any other distribution and apply the fractile technique.


Theoretical Probability Distributions



It is not always practical or possible to subjectively assess probability distributions.
Sometimes, a manager may not have any information whatsoever with which to assign fractiles,
or may find the range is too broad. In such cases, a theoretical approach can be substituted for the
subjective one.
Binomial Distribution
A binomial distribution is the distribution of the number of successes in a string of Bernoulli
Trials. A Bernoulli Trial is an event that has two possible outcomes: one a success, the other a
failure. The probability of success is written as P(S) = P. Since the only other option is failure,
the probability of failure P(F) = 1 - P, so that P(S) + P(F) = P + (1 - P) = 1. Flipping a coin is one
example of a Bernoulli Trial: there are only two outcomes. Suppose that tossing a head would
be designated a success and tossing a tail a failure. If the coin is fair, P(S) = .5 and P(F) = 1 - .5, or
.5. If Jennifer has 3 chances to roll a die, and will receive $1 for each time she rolls a 3, her P(S)
for each individual attempt = 1/6 and P(F) = 1 - 1/6 = 5/6. Jennifer is interested in the probability
distribution of the amount she could win with her three rolls of the die. The probability of rolling
a three all three times (denoted (S,S,S)) is:

P(S) x P(S) x P(S) = 1/6 x 1/6 x 1/6 = 1/216

There are three sequences in which Jennifer can roll two 3s. They are: (S,S,F), (S,F,S), and
(F,S,S). The combined probability of these three scenarios is:

(1/6 x 1/6 x 5/6) + (1/6 x 5/6 x 1/6) + (5/6 x 1/6 x 1/6) = 15/216

There are also three sequences in which Jennifer can roll one 3: they are (S,F,F), (F,S,F), and (F,F,S).
The combined probability of these three scenarios is:

(1/6 x 5/6 x 5/6) + (5/6 x 1/6 x 5/6) + (5/6 x 5/6 x 1/6) = 75/216

Finally, there is the possibility that Jennifer has no success:

P(F,F,F) = 5/6 x 5/6 x 5/6 = 125/216


Jennifer's probability distribution is shown in the table below:

Table 4-4
Probability of a Certain Number of Successes on Three Rolls

S (# of Successes)    Probability
        0             125/216  (.58)
        1              75/216  (.35)
        2              15/216  (.07)
        3               1/216  (.005)


If the number of times Jennifer was able to roll the die increased, the number of scenarios
would increase greatly. For example, the possibility of two successes grows from three scenarios
in three trials to six scenarios in four trials: (S,S,F,F), (S,F,S,F), (S,F,F,S), (F,S,S,F), (F,S,F,S), and
(F,F,S,S).
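
The counting of orderings is exactly what the binomial coefficient does, so the whole distribution can be generated directly; this short Python sketch uses the standard library's math.comb (the function name binomial_pmf is our own):

from math import comb

def binomial_pmf(n, r, p):
    """Probability of exactly r successes in n Bernoulli trials with P(S) = p."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

print(comb(4, 2))   # 6 -- the six orderings of two successes in four trials

# Jennifer's three die rolls, P(S) = 1/6.
for r in range(4):
    print(r, round(binomial_pmf(3, r, 1/6), 3))   # 0.579, 0.347, 0.069, 0.005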

There are formulas beyond the scope of this text that define this relationship, but we have
included a table for you in Appendix B that will help you determine the probability of a number
of successes for many different individual probabilities in a number of Bernoulli Trials. To use it,
calculate the probability of an individual success and find the column heading that contains it in
Appendix B. Then find the row that contains the corresponding number of successes (r) and trials
(n), and, in the box where the column and row intersect, you will find the probability of that number
of successes in that number of trials given the individual chance of success during each trial. For
instance, if we know that the probability of an individual success is .166 (1/6), and want to know
the probability of two (r) successes in four (n) trials, we would find the column that contains .16,
and the row where n = 4 and r = 2, and the probability value in the box where they intersect is .115.
This table is very useful in business: Say that a product manager wants to know how successful
she will be over the next year so that she can determine her annual bonus. For one of her products,
senior management defines success as sales greater than $3,000 in one month. She finds out that
at current sales levels each month there is a 70% probability of having sales exceed $3,000, so
the manager selects .7 as the individual probability of success. She can determine a probability
distribution for her sales by using 12 months for n (the number of trials) and 0 to 12 for r (the
number of successes). The distribution is shown in the table below; each entry is the probability of exactly r successes in 12 trials at an individual success probability of .7, as found in the table in Appendix B:


Table 4-5
Probability of Monthly Sales Over $3,000 For a Year

# of Successes     Probability of Sales > $3,000
0                  0+
1                  0+
2                  0+
3                  .001
4                  .008
5                  .029
6                  .079
7                  .159
8                  .231
9                  .240
10                 .168
11                 .071
12                 .014


Note that what this table gives is the probability of exactly r successes. The entry for five successes, for example, is not the probability of at least five successes; it does not include the probability of more than five successes.
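
Although the text leaves the general formula to other references, the values in Appendix B (and therefore in Table 4-5) come from the standard binomial probability formula, P(exactly r successes in n trials) = C(n, r) × p^r × (1 - p)^(n - r), where C(n, r) is the number of ways to choose r trials out of n. A minimal sketch, which reproduces the .115 figure quoted earlier for two successes in four trials at p = .166, as well as the product manager's distribution:

    from math import comb

    def binomial_probability(n, r, p):
        # Probability of exactly r successes in n independent trials,
        # each with individual success probability p.
        return comb(n, r) * p ** r * (1 - p) ** (n - r)

    print(round(binomial_probability(4, 2, 0.166), 3))      # 0.115

    # The product manager's distribution: r successes in 12 months at p = .7
    for r in range(13):
        print(r, round(binomial_probability(12, r, 0.7), 3))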
Normal Distribution


The probability distribution for a great many random variables is a bell-shaped curve called the normal curve. This distribution is probably the most widely known and most useful distribution in statistics. A normal distribution is best used when each data point represents a sum or average of some other bits of data, allowing the user to infer something about this greater whole. Suppose we were a quality control inspector collecting data on a filling machine at a soda bottling plant. To get an indication of how accurate the machine is, we took ten seven-bottle samples off the line and weighed each bottle. Each sample of seven had the following average weight per bottle (in fluid ounces):


Table 4-6
Sample Test Results of Soda Filling Machine

Sample     Average Weight
1          11.8
2          12.1
3          12.2
4          12.0
5          11.9
6          12.0
7          12.1
8          12.0
9          11.9
10         12.0


Even though we would like more than ten data points if we could get them, we can still graph
the points to see if the data resembles a bell-shaped curve:
Figure 4-8
Graph of Filling Machine Sample Weights
(number of samples, 1 to 4, on the vertical axis; sample weight in fluid ounces, 11.8 to 12.2, on the horizontal axis)


A normal distribution can be described by its mean and standard deviation. The mean, as we
said earlier in the chapter, gives a measure of the average magnitude of the data being studied. The
standard deviation, symbolized by the Greek letter sigma (σ), and the variance, which is the square of the standard deviation (σ²), are measures of spread or dispersion. We studied deviations of forecasts from actual results in the latter part of Chapter 3, and the standard deviation is very similar. It is defined as the square root of the sum of the squared differences from the mean, each multiplied by the probability of its outcome. That's quite a few words, and it would probably be easier to demonstrate. Using the information supplied in Table 4-1 from the Jack's Used Cars problem, we could calculate the standard deviation of expected car sales like this:

√[{(3.4-0)² × .06} + {(3.4-1)² × .11} + {(3.4-2)² × .15} + {(3.4-3)² × .20} + {(3.4-4)² × .19} + {(3.4-5)² × .15} + {(3.4-6)² × .10} + {(3.4-7)² × .04}] = √3.3 = 1.817

As can be seen above, we have taken the mean amount of sales we expect next week (3.4 cars/week
-- though Jack cannot sell a fraction of a car, we are using the concept of the mean developed in
Chapter 2) and subtracted each possible different sales level (or outcome) from it, then squared the
result and multiplied it by the probability attached to the outcome we subtracted. Well, it has been
calculated, but what does it mean? As we told you above, the standard deviation tells us how spread
out our distribution of possible outcomes is. The larger the standard deviation, the more spread
out the distribution. If our standard deviation was very small, on the other hand, we would be
fairly certain about what the expected outcome would be (in this case, how many cars we thought
we were going to sell next week), because it would mean that most of the probability would be
surrounding several outcomes that were relatively close to each other.
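
A minimal sketch of the same calculation, assuming the outcomes (0 through 7 cars) and probabilities from Table 4-1 as quoted above:

    from math import sqrt

    outcomes      = [0, 1, 2, 3, 4, 5, 6, 7]                  # cars sold next week
    probabilities = [.06, .11, .15, .20, .19, .15, .10, .04]

    mean = sum(x * p for x, p in zip(outcomes, probabilities))
    variance = sum((x - mean) ** 2 * p for x, p in zip(outcomes, probabilities))
    std_dev = sqrt(variance)

    print(round(mean, 1), round(variance, 1), round(std_dev, 3))   # 3.4 3.3 1.817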

The standard deviation comes in especially handy when we use it with a normal distribution.
With the two together, we can tell what the probability is of future data being greater than a particular
amount or within a particular range. The area under the normal distribution curve totals one: this
indicates that there is a 100% probability that all data points will fall somewhere inside it. Once you
have the standard deviation of the data points around the mean, you can determine the probability
that new data points will fall above or below a certain point, or within a range between two points.
Probabilities are determined by measuring the difference between the mean and the amount in
question and dividing by the standard deviation, giving the number of standard deviations between
the point and the mean, or what is called a z-score. Once you have the z-score, you can go to a
table like the one given in Appendix C to find the probability that your next data point is that size
or smaller or that size or larger. The z-score is determined by the following formula:
z = (x - μ) / σ

where x is the point around which you want to determine the probability, μ is the mean, and σ is the standard deviation.


After you have determined the z-score, look up its absolute value in Appendix C. The row headings
give z-scores to the tenth, and to find the hundredth, look on the column headings, or if hundredths
are not important, just look under the first column, which is zero. Then match up the row and
column and find the box where they intersect. The number in that box is the probability that the
next data sample will be that point or less (if that point is below the mean) or that point or greater
(if it is above the mean). In a normal distribution, approximately 68% of previous data samples
as well as future ones (assuming nothing has significantly changed) will be within one standard
deviation on each side of the mean, 95% will be within two standard deviations, and 99.7% within
three standard deviations.

For example, suppose we have a normally distributed population with a mean age of 50 years
and a standard deviation of 20 years and are interested in the probability that the next data sample
is aged 40 or less (the shaded region in the following graph).
Figure 4-9
Population Distribution
(the region at or below age 40, to the left of the mean of 50, is shaded)

To find out what the probability of this is, we must determine how many standard deviations it is
from 40 to 50.
z = (40 - 50) / 20 = -10 / 20 = -.5

Looking up .5 in the table in Appendix C gives a probability of approximately .3085. This is
our answer: There is a 30.85% probability that the average age of our next data sample is 40 or less.
We would get the same answer if we wanted to know what the probability of our next data sample
being 60 or more was, since it is the same number of standard deviations from 40 to 50 as it is from
50 to 60. If we wanted to know the probability of getting a sample whose age is 40 or older, we
would simply subtract the .3085 from 1 (we use 1 because we are 100% sure that the next sample,
if it is not aged 40, will either be younger or older than 40).
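
Lookups like this can also be checked in code. The sketch below relies on the standard identity between the normal CDF and the error function, Φ(z) = (1 + erf(z / √2)) / 2, rather than on anything specific to this text:

    from math import erf, sqrt

    def normal_cdf(z):
        # Probability that a standard normal variable is z or less.
        return (1 + erf(z / sqrt(2))) / 2

    mean, std_dev = 50, 20
    z = (40 - mean) / std_dev            # -0.5
    print(round(normal_cdf(z), 4))       # 0.3085 -> P(next sample is aged 40 or less)
    print(round(1 - normal_cdf(z), 4))   # 0.6915 -> P(next sample is aged 40 or older)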



We could also find the probability that the next sample would be between 30 and 40. We
first must determine how many standard deviations away 30 is from the mean of 50:
z = (30 - 50) / 20 = -1 (a negative z-score simply means the point lies below the mean; standard deviations themselves are always positive). When you look up the z-score in the Appendix C table
(most z-tables only give positive z-scores, because the percentages do not change from positive to
negative), it gives a probability of .1587 (there is a 15.87% probability that the next sample will be
aged 30 or less). Since we have already determined that the probability of the next sample being
aged 40 or less is .3085, to figure out the probability that the next sample is between 30 and 40
is simply a matter of subtracting .1587 (probability that the next sample is aged 30 or less) from
.3085 (probability that the next sample is aged 40 or less) to determine that the area under the curve
between 30 and 40 = .3085 - .1587 = .1498 (the shaded region in the following graph).
Figure 4-10
Population Distribution
(the region between ages 30 and 40, below the mean of 50, is shaded)

If we are interested in determining the probability that a number between 50 and 70 will occur, we calculate z = (70 - 50) / 20 = 1, which gives a value of .1587 on the table. This is the probability that the next sample is 70 or greater. The area under this part of the curve, which flattens as it moves
away from the mean, is called the tail. Since we know that in a normal distribution half of the
probabilities will fall on each side of the mean, we know that the probability that the next sample
is 50 (the mean) or greater is .5 and, since we already know the probability that the next sample
is 70 or greater is .1587, the probability that the next sample is between 50 and 70 is .5 - .1587, or
.3413. This is also the area under the curve between 50 and 70, shown shaded in the diagram on
the following page.


Figure 4-11
Population Distribution
(the region between 50 and 70, above the mean, is shaded)


Now we have several methods for measuring uncertainty when making decisions. We are
now also able to assign probabilities to the different outcomes that may result from an uncertainty.
The following chapter shows how to make decisions among several alternatives when there is uncertainty about each outcome, using the methods of determining probability you have developed here.


APPENDIX A
AN INTRODUCTION TO SAMPLING

Success in business depends on being able to know the needs and demographics of both
your current and prospective customers. Unfortunately, many companies have far too many
customers and far too large a target market to get information from everybody, and it would be a
lengthy and expensive process for them to both gather and process that information if they tried (if
you doubt that, just look at the U.S. Census). Fortunately for them, a quantitative technique exists
that if used properly will give them a good idea of what that information (be it an opinion, level
of usage or expenditure, or something else) might be for an entire market based on only a small
proportion of responses. Such a technique is referred to as sampling theory.

While individuals such as baseball statisticians are concerned solely with the organization
and presentation of data, referred to as descriptive statistics, those who use sampling theory are
attempting to infer something about a greater whole beyond the data at hand. Because of this,
sampling theory is part of a body called inferential statistics. Sampling theory makes use of the
normal distribution, but in a somewhat different way than you learned in the previous chapter.
While before, you had the mean and standard deviation and could find out what the probability
was of the next data sample point being above or below a certain point or within a certain range,
with sampling theory all you have is a collection of random points and you have to infer from them
what the mean is of the entire group in which you are interested. Thus, it is the opposite of what
we did in this past chapter: there, you had the mean but you did not know what the next points were
going to be, while here you have just those points and do not know what the mean is. In its basic
form, sampling theory states that, as long as you take data randomly selected from the group you
are targeting and count the number of data points you have collected, you can calculate with some
certainty a range within which lies the mean for the entire target group.

In order to proceed further with sampling, some terminology is helpful. The sample is the
collective term for the data points you are gathering: if you wanted to find the mean shoe size of
students in a computer tutorial at a community college by asking seven people in that class, your
sample would be the collection of seven pieces of data (shoe sizes) and your sample size would be
seven. Note that some texts use sample to mean each data point and others use it for the group
of all the data points you collect. We will use the latter definition. The population is the total
collection of objects or people to be studied, from which a sample is to be drawn: in this example,
it would be the total number of students in the computer tutorial. The sample mean is the mean
value of the characteristic we are studying in all of our samples, determined using the formula for
the mean given in Chapter 2, while the population mean (sometimes called the true mean) is the
mean value of that characteristic in all of the objects or people in the population:


we are, however, unsure of its value, which is why we take a sample from which we can determine
a general range in which it lies.

Remember in the past chapter how our standard deviation showed the dispersion of data
points around the mean of our normal distribution? The scatter of points could be very close to the
mean, which would be reflected by a low standard deviation, or very spread out from the mean,
which would be shown by a very high one. With sampling theory, while we can build our sample
mean easily and accurately, we are usually unsure where the true mean lies, which is reflected by
the range of possible values around our sample mean, any of which may be the true mean. This
level of uncertainty around the sample mean is called the standard error, and it is measured in
standard deviations from the mean just like the normal distribution is, though it is derived somewhat
differently than the standard deviation. Generally, as the number of samples we have increases, the
range around our sample mean in which we believe our true mean lies becomes tighter, and thus
our standard error becomes lower. One can demonstrate why this is so fairly simply: suppose we
had a jar with thirty red marbles, thirty white marbles and thirty blue ones, but we did not know
this and could not see in the jar to count them. Now suppose we picked two marbles on which to base our estimate of the population; one was red and the other was blue. Based on this, we
would wrongly conclude that half of the marbles in the jar were red and the other half were blue.
If we took increasingly larger samples out of the jar (while putting our old samples back in the jar),
however, our estimate of the true population would get better, and by the time we took a sample that
included all of the marbles, it would be perfect.

The standard error is determined by this formula:

standard error = σ / √n

where σ is the standard deviation of the population, and n is the sample size.

The standard error is in the same units as the sample mean, which makes it easy to define the
range within which our true mean lies: just add or subtract the standard error from the mean like
you did with the standard deviation in the past chapter. As long as the sample is random and large
enough, we can also assume the range of values that could be the true mean is normally distributed
around our sample mean, even if the entire range of values that make up the population is not. If
you are dealing with a relatively small sample (generally fewer than 120 data points or, for small
populations, under 10% of the population size), your derivation of the standard deviation will be
slightly different, and a special distribution called the t-distribution should be used instead of the
normal. Estimating population means with small samples is beyond the scope of this text; please
consult another statistics reference in order to see how to do this.



While the sample mean is considered the most likely value for the true mean, there is about a
68% probability that the true mean lies within one standard error on each side of the sample mean,
slightly greater than a 95% probability that it lies within two standard errors, about a 99.7% chance
that it lies within three, and close to a 100% probability that it lies within four. As we have said
before and as you can see from the formula above, as the sample size grows, the denominator gets
larger, which usually has the effect of lowering the standard error.

As an example of using sampling theory to estimate a true mean, say that we manage a soft
drink company and are about to test market a new product, Cactus Cola, in Tucson, Arizona. It is
July, and we would like to get a sense of the number of cans of soft drinks purchased by families
of four or more in June. We could ask every family in Tucson, but this would take too much time
and money and we want to run our test before the summer ends. Instead, we poll one hundred and
fifty families selected randomly throughout the Tucson area. A portion of the data we receive from
them is found in Table 4A-1.
Table 4A-1
Soft Drink Purchases in June

Family     Cans Purchased
1          25
2          11
3          19
4          12
...        ...
146        32
147        10
148        47
149        20
150        37


The first thing we want to do is calculate the mean soft drink purchases from our hundred and fifty family sample:

(25 + 11 + 19 + 12 + 7 + ... + 32 + 10 + 47 + 20 + 37) / 150 = 19 cans



We know that the hundred and fifty families in our sample purchased on average 19 cans of
soft drinks in June, but we wonder how representative this is of all the families in Tucson, especially
since there are thousands of families of four in the area, and we have only surveyed a hundred and
fifty.

Looking at the calculations required for the standard error formula above, we find that we
need to determine the standard deviation of cans purchased. We know from the previous chapter
that the standard deviation is determined by squaring the differences between our mean of 19 cans (X̄) and each of the individual pieces of data (X), then multiplying each of these by the individual data points' probabilities of occurring, summing these totals and taking their square root. Since we
have no way to determine which data point is closest to the true mean, we assign probabilities of
.0067 (1/150 or .67%) to each of the points. The calculation of standard deviation follows:
Table 4A-2
Calculation of Standard Deviation for Tucson Sample

Individual Data      Difference from the     Squared Difference     Squared Difference
Points: Cans         Mean of 19 Cans         (X̄ - X)²               Divided by 150
Purchased (X)        (X̄ - X)                                        (# of Data Points)

25                   -6                       36                    0.24
11                    8                       64                    0.427
19                    0                        0                    0
12                    7                       49                    0.327
...                  ...                     ...                    ...
32                   -13                     169                    1.127
10                    9                       81                    0.54
47                   -28                     784                    5.227
20                   -1                        1                    0.007
37                   -18                     324                    2.16

Sum of the far right column for all 150 data points = 600.25

Standard deviation = √600.25 = 24.5

Thus, we know that the standard deviation of our sample is 24.5 cans and the sample size is 150.
Putting those figures into the standard error formula, you get an answer of:

24.5 / √150 ≈ 2 cans



Once we know the sample mean and the standard error, we can build interval estimates
(often called confidence intervals) which show how confident we are that the true mean is within
a certain range around the sample mean. We can do this by treating the standard error of a sample
mean the same way we treated the standard deviation of a normal distribution in the past chapter:
there is about a 68% probability that the population mean is within one standard error on each side
of the sample mean, approximately a 95% probability that it is within two standard errors on each
side, etc. Often, our assessment is couched in terms like these: "We are 95% confident that the true mean is between x and y," or "Our 95% confidence interval is between x and y." Using the standard error of 2 cans in our above example, we can say that we are approximately 68% confident that the true mean of soda consumption is between 17 and 21 cans, or one standard error on each side of the sample mean (19 ± 2), and slightly more than 95% confident that the true mean is between 15 and 23 cans, or two standard errors on each side of the sample mean [19 ± (2 × 2)].
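
The whole chain -- sample mean, standard deviation, standard error, and a rough 95% interval of two standard errors -- can be written out in a few lines. The numbers below are an illustrative subset only (a real run would use all 150 responses), and for a sample this small the t-distribution caveat above would apply:

    from math import sqrt

    cans = [25, 11, 19, 12, 32, 10, 47, 20, 37]   # illustrative subset, not all 150 families

    n = len(cans)
    mean = sum(cans) / n
    variance = sum((x - mean) ** 2 for x in cans) / n   # each point weighted 1/n, as in Table 4A-2
    std_error = sqrt(variance) / sqrt(n)

    low, high = mean - 2 * std_error, mean + 2 * std_error   # roughly 95% confidence
    print(round(mean, 1), round(std_error, 2), round(low, 1), round(high, 1))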

If we were to increase our sample size to 1,500, however, and the standard deviation of the sample did not change, our new standard error would be:

24.5 / √1500 ≈ .633 cans


As you can see, our standard error has gotten quite a bit smaller, and thus the range of
probable values of the true mean around the sample mean has tightened up considerably. Now, we
are approximately 68% confident that the true mean is between 18.367 and 19.633 cans (19 ± .633), and slightly over 95% confident that it is between 17.734 and 20.266 cans [19 ± (.633 × 2)]. As a
result, there is much less uncertainty concerning what our true mean is.

While this introduction to sampling has been mainly concerned with what you do with the
sampling data after you have gathered it, it is also important to note that there are crucial things that
you should watch out for when actually gathering that data. The highest priority is a completely
random sample of the population, and sometimes you may think you have one when truly you do
not. For example, suppose that we were trying to see what Americans think about pipe smoking.
We send a survey to 10,000 people to try to determine this. Unfortunately, the list of people to
whom we send the survey was purchased from Pipe Smoker magazine, and all of the people on that
list are subscribers to that magazine. Of course, most if not all of the respondents to our survey
would indicate that they enjoyed pipe smoking, and we might infer from this that everyone in the
country liked it as well, which is a questionable result. You may find this a silly example, but
there have been numerous examples throughout history that demonstrate nonrandom samples: for
instance, phone surveys of voters often do not result in a random sample of the voting population.
This was particularly true years ago when many people did not own phones; the sample was not
random precisely because it excluded non-phone owning voters. The way to avoid this and conduct
a random sample is to consider all of the parameters in your sampling technique (who you are
asking, and how, when and where you are asking them), separate out those which might detract
from its randomness, and change them.



Another important thing to consider which is often overlooked is to make sure all of the
sample data points you take belong to the population whose mean you are trying to estimate.
For instance, if you are conducting a sample of female homeowners, you obviously do not want
to include any males in your sample. Also important is that you try not to infer a characteristic
from your data about anything or anyone beyond the population you are estimating, because your
random sample is representative of only that population. For example, from your sample of female
homeowners, you should not try to infer anything about all homeowners in America. If you want to
say anything about all homeowners, you would have to conduct another random sample featuring
female and male homeowners.


APPENDIX B
BINOMIAL DISTRIBUTION PROBABILITY TABLE

(To use this table, locate the number of trials (n) you are conducting in the far left-hand column, and then find the corresponding number of successes (r) you would like to achieve in the column to its right. Then match the row that contains both of these figures with the column that contains the probability of success for each individual trial you are conducting; the box where they intersect is the probability of exactly that number of successes.)


                        Probability of Individual Success
  n   r      0.1      0.166     0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9

  1   0   0.90000  0.83400  0.80000  0.70000  0.60000  0.50000  0.40000  0.30000  0.20000  0.10000
  1   1   0.10000  0.16600  0.20000  0.30000  0.40000  0.50000  0.60000  0.70000  0.80000  0.90000

  2   0   0.81000  0.69556  0.64000  0.49000  0.36000  0.25000  0.16000  0.09000  0.04000  0.01000
  2   1   0.18000  0.27689  0.32000  0.42000  0.48000  0.50000  0.48000  0.42000  0.32000  0.18000
  2   2   0.01000  0.02756  0.04000  0.09000  0.16000  0.25000  0.36000  0.49000  0.64000  0.81000

  3   0   0.72900  0.58009  0.51200  0.34300  0.21600  0.12500  0.06400  0.02700  0.00800  0.00100
  3   1   0.24300  0.34639  0.38400  0.44100  0.43200  0.37500  0.28800  0.18900  0.09600  0.02700
  3   2   0.02700  0.06895  0.09600  0.18900  0.28800  0.37500  0.43200  0.44100  0.38400  0.24300
  3   3   0.00100  0.00457  0.00800  0.02700  0.06400  0.12500  0.21600  0.34300  0.51200  0.72900

  4   0   0.65610  0.48380  0.40960  0.24010  0.12960  0.06250  0.02560  0.00810  0.00160  0.00010
  4   1   0.29160  0.38518  0.40960  0.41160  0.34560  0.25000  0.15360  0.07560  0.02560  0.00360
  4   2   0.04860  0.11500  0.15360  0.26460  0.34560  0.37500  0.34560  0.26460  0.15360  0.04860
  4   3   0.00360  0.01526  0.02560  0.07560  0.15360  0.25000  0.34560  0.41160  0.40960  0.29160
  4   4   0.00010  0.00076  0.00160  0.00810  0.02560  0.06250  0.12960  0.24010  0.40960  0.65610

  5   0   0.59049  0.40349  0.32768  0.16807  0.07776  0.03125  0.01024  0.00243  0.00032  0.00001
  5   1   0.32805  0.40155  0.40960  0.36015  0.25920  0.15625  0.07680  0.02835  0.00640  0.00045
  5   2   0.07290  0.15985  0.20480  0.30870  0.34560  0.31250  0.23040  0.13230  0.05120  0.00810
  5   3   0.00810  0.03182  0.05120  0.13230  0.23040  0.31250  0.34560  0.30870  0.20480  0.07290
  5   4   0.00045  0.00317  0.00640  0.02835  0.07680  0.15625  0.25920  0.36015  0.40960  0.32805
  5   5   0.00001  0.00013  0.00032  0.00243  0.01024  0.03125  0.07776  0.16807  0.32768  0.59049

  6   0   0.53144  0.33651  0.26214  0.11765  0.04666  0.01563  0.00410  0.00073  0.00006  0.00000
  6   1   0.35429  0.40187  0.39322  0.30253  0.18662  0.09375  0.03686  0.01021  0.00154  0.00005
  6   2   0.09842  0.19997  0.24576  0.32414  0.31104  0.23438  0.13824  0.05954  0.01536  0.00122
  6   3   0.01458  0.05307  0.08192  0.18522  0.27648  0.31250  0.27648  0.18522  0.08192  0.01458
  6   4   0.00122  0.00792  0.01536  0.05954  0.13824  0.23438  0.31104  0.32414  0.24576  0.09841
  6   5   0.00005  0.00063  0.00154  0.01021  0.03686  0.09375  0.18662  0.30253  0.39322  0.35429
  6   6   0.00000  0.00002  0.00006  0.00073  0.00410  0.01563  0.04666  0.11765  0.26214  0.53144


APPENDIX B - BINOMIAL DISTRIBUTION PROBABILITY TABLE (CONTINUED)

                        Probability of Individual Success
  n   r      0.1      0.166     0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9

  7   0   0.47830  0.28065  0.20972  0.08235  0.02799  0.00781  0.00164  0.00022  0.00001  0.00000
  7   1   0.37201  0.39102  0.36700  0.24706  0.13064  0.05469  0.01720  0.00357  0.00036  0.00001
  7   2   0.12400  0.23349  0.27525  0.31765  0.26127  0.16406  0.07741  0.02500  0.00430  0.00017
  7   3   0.02296  0.07746  0.11469  0.22689  0.29030  0.27344  0.19354  0.09724  0.02867  0.00255
  7   4   0.00255  0.01542  0.02867  0.09724  0.19354  0.27344  0.29030  0.22689  0.11469  0.02296
  7   5   0.00017  0.00184  0.00430  0.02500  0.07741  0.16406  0.26127  0.31765  0.27525  0.12400
  7   6   0.00001  0.00012  0.00036  0.00357  0.01720  0.05469  0.13064  0.24706  0.36700  0.37201
  7   7   0.00000  0.00000  0.00001  0.00022  0.00164  0.00781  0.02799  0.08235  0.20972  0.47830

  8   0   0.43047  0.23406  0.16777  0.05765  0.01680  0.00391  0.00066  0.00007  0.00000  0.00000
  8   1   0.38264  0.37270  0.33554  0.19765  0.08958  0.03125  0.00786  0.00122  0.00008  0.00000
  8   2   0.14880  0.25964  0.29360  0.29648  0.20902  0.10938  0.04129  0.01000  0.00115  0.00002
  8   3   0.03307  0.10336  0.14680  0.25412  0.27869  0.21875  0.12386  0.04668  0.00918  0.00041
  8   4   0.00459  0.02572  0.04588  0.13614  0.23224  0.27344  0.23224  0.13614  0.04588  0.00459
  8   5   0.00041  0.00409  0.00918  0.04668  0.12386  0.21875  0.27869  0.25412  0.14680  0.03307
  8   6   0.00002  0.00041  0.00115  0.01000  0.04129  0.10938  0.20902  0.29648  0.29360  0.14880
  8   7   0.00000  0.00002  0.00008  0.00122  0.00786  0.03125  0.08958  0.19765  0.33554  0.38264
  8   8   0.00000  0.00000  0.00000  0.00007  0.00066  0.00391  0.01680  0.05765  0.16777  0.43047

  9   0   0.38742  0.19521  0.13422  0.04035  0.01008  0.00195  0.00026  0.00002  0.00000  0.00000
  9   1   0.38742  0.34969  0.30199  0.15565  0.06047  0.01758  0.00354  0.00041  0.00002  0.00000
  9   2   0.17219  0.27841  0.30199  0.26683  0.16124  0.07031  0.02123  0.00386  0.00029  0.00000
  9   3   0.04464  0.12930  0.17616  0.26683  0.25082  0.16406  0.07432  0.02100  0.00275  0.00006
  9   4   0.00744  0.03860  0.06606  0.17153  0.25082  0.24609  0.16722  0.07351  0.01652  0.00083
  9   5   0.00083  0.00768  0.01652  0.07351  0.16722  0.24609  0.25082  0.17153  0.06606  0.00744
  9   6   0.00006  0.00102  0.00275  0.02100  0.07432  0.16406  0.25082  0.26683  0.17616  0.04464
  9   7   0.00000  0.00009  0.00029  0.00386  0.02123  0.07031  0.16124  0.26683  0.30199  0.17219
  9   8   0.00000  0.00000  0.00002  0.00041  0.00354  0.01758  0.06047  0.15565  0.30199  0.38742
  9   9   0.00000  0.00000  0.00000  0.00002  0.00026  0.00195  0.01008  0.04035  0.13422  0.38742

 10   0   0.34868  0.16280  0.10737  0.02825  0.00605  0.00098  0.00010  0.00001  0.00000  0.00000
 10   1   0.38742  0.32404  0.26844  0.12106  0.04031  0.00977  0.00157  0.00014  0.00000  0.00000
 10   2   0.19371  0.29024  0.30199  0.23347  0.12093  0.04395  0.01062  0.00145  0.00007  0.00000
 10   3   0.05740  0.15405  0.20133  0.26683  0.21499  0.11719  0.04247  0.00900  0.00079  0.00001
 10   4   0.01116  0.05366  0.08808  0.20012  0.25082  0.20508  0.11148  0.03676  0.00551  0.00014
 10   5   0.00149  0.01282  0.02642  0.10292  0.20066  0.24609  0.20066  0.10292  0.02642  0.00149
 10   6   0.00014  0.00213  0.00551  0.03676  0.11148  0.20508  0.25082  0.20012  0.08808  0.01116
 10   7   0.00001  0.00024  0.00079  0.00900  0.04247  0.11719  0.21499  0.26683  0.20133  0.05740
 10   8   0.00000  0.00002  0.00007  0.00145  0.01062  0.04395  0.12093  0.23347  0.30199  0.19371
 10   9   0.00000  0.00000  0.00000  0.00014  0.00157  0.00977  0.04031  0.12106  0.26844  0.38742
 10  10   0.00000  0.00000  0.00000  0.00001  0.00010  0.00098  0.00605  0.02825  0.10737  0.34868


APPENDIX B - BINOMIAL DISTRIBUTION PROBABILITY TABLE (CONTINUED)

                        Probability of Individual Success
  n   r      0.1      0.166     0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9

 11   0   0.31381  0.13578  0.08590  0.01977  0.00363  0.00049  0.00004  0.00000  0.00000  0.00000
 11   1   0.38355  0.29728  0.23622  0.09322  0.02661  0.00537  0.00069  0.00005  0.00000  0.00000
 11   2   0.21308  0.29585  0.29528  0.19975  0.08868  0.02686  0.00519  0.00053  0.00002  0.00000
 11   3   0.07103  0.17666  0.22146  0.25682  0.17737  0.08057  0.02336  0.00371  0.00022  0.00000
 11   4   0.01578  0.07032  0.11073  0.22013  0.23649  0.16113  0.07007  0.01733  0.00173  0.00002
 11   5   0.00246  0.01960  0.03876  0.13208  0.22072  0.22559  0.14715  0.05661  0.00969  0.00027
 11   6   0.00027  0.00390  0.00969  0.05661  0.14715  0.22559  0.22072  0.13208  0.03876  0.00246
 11   7   0.00002  0.00055  0.00173  0.01733  0.07007  0.16113  0.23649  0.22013  0.11073  0.01578
 11   8   0.00000  0.00006  0.00022  0.00371  0.02336  0.08057  0.17737  0.25682  0.22146  0.07103
 11   9   0.00000  0.00000  0.00002  0.00053  0.00519  0.02686  0.08868  0.19975  0.29528  0.21308
 11  10   0.00000  0.00000  0.00000  0.00005  0.00069  0.00537  0.02661  0.09322  0.23622  0.38355
 11  11   0.00000  0.00000  0.00000  0.00000  0.00004  0.00049  0.00363  0.01977  0.08590  0.31381

 12   0   0.28243  0.11324  0.06872  0.01384  0.00218  0.00024  0.00002  0.00000  0.00000  0.00000
 12   1   0.37657  0.27047  0.20616  0.07118  0.01741  0.00293  0.00030  0.00001  0.00000  0.00000
 12   2   0.23013  0.29609  0.28347  0.16779  0.06385  0.01611  0.00249  0.00019  0.00000  0.00000
 12   3   0.08523  0.19645  0.23622  0.23970  0.14189  0.05371  0.01246  0.00149  0.00006  0.00000
 12   4   0.02131  0.08798  0.13288  0.23114  0.21284  0.12085  0.04204  0.00780  0.00052  0.00000
 12   5   0.00379  0.02802  0.05315  0.15850  0.22703  0.19336  0.10090  0.02911  0.00332  0.00005
 12   6   0.00049  0.00651  0.01550  0.07925  0.17658  0.22559  0.17658  0.07925  0.01550  0.00049
 12   7   0.00005  0.00111  0.00332  0.02911  0.10090  0.19336  0.22703  0.15850  0.05315  0.00379
 12   8   0.00000  0.00014  0.00052  0.00780  0.04204  0.12085  0.21284  0.23114  0.13288  0.02131
 12   9   0.00000  0.00001  0.00006  0.00149  0.01246  0.05371  0.14189  0.23970  0.23622  0.08523
 12  10   0.00000  0.00000  0.00000  0.00019  0.00249  0.01611  0.06385  0.16779  0.28347  0.23013
 12  11   0.00000  0.00000  0.00000  0.00001  0.00030  0.00293  0.01741  0.07118  0.20616  0.37657
 12  12   0.00000  0.00000  0.00000  0.00000  0.00002  0.00024  0.00218  0.01384  0.06872  0.28243

 13   0   0.25419  0.09444  0.05498  0.00969  0.00131  0.00012  0.00001  0.00000  0.00000  0.00000
 13   1   0.36716  0.24437  0.17867  0.05398  0.01132  0.00159  0.00013  0.00000  0.00000  0.00000
 13   2   0.24477  0.29183  0.26801  0.13881  0.04528  0.00952  0.00118  0.00007  0.00000  0.00000
 13   3   0.09972  0.21299  0.24567  0.21813  0.11068  0.03491  0.00648  0.00058  0.00001  0.00000
 13   4   0.02770  0.10598  0.15355  0.23371  0.18446  0.08728  0.02429  0.00338  0.00015  0.00000
 13   5   0.00554  0.03797  0.06910  0.18029  0.22135  0.15710  0.06559  0.01419  0.00108  0.00001
 13   6   0.00082  0.01008  0.02303  0.10302  0.19676  0.20947  0.13117  0.04415  0.00576  0.00009
 13   7   0.00009  0.00201  0.00576  0.04415  0.13117  0.20947  0.19676  0.10302  0.02303  0.00082
 13   8   0.00001  0.00030  0.00108  0.01419  0.06559  0.15710  0.22135  0.18029  0.06910  0.00554
 13   9   0.00000  0.00003  0.00015  0.00338  0.02429  0.08728  0.18446  0.23371  0.15355  0.02770
 13  10   0.00000  0.00000  0.00001  0.00058  0.00648  0.03491  0.11068  0.21813  0.24567  0.09972
 13  11   0.00000  0.00000  0.00000  0.00007  0.00118  0.00952  0.04528  0.13881  0.26801  0.24477
 13  12   0.00000  0.00000  0.00000  0.00000  0.00013  0.00159  0.01132  0.05398  0.17867  0.36716
 13  13   0.00000  0.00000  0.00000  0.00000  0.00001  0.00012  0.00131  0.00969  0.05498  0.25419

APPENDIX B - BINOMIAL DISTRIBUTION PROBABILITY TABLE (CONTINUED)

                        Probability of Individual Success
  n   r      0.1      0.166     0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9

 14   0   0.22877  0.07876  0.04398  0.00678  0.00078  0.00006  0.00000  0.00000  0.00000  0.00000
 14   1   0.35586  0.21948  0.15393  0.04069  0.00731  0.00085  0.00006  0.00000  0.00000  0.00000
 14   2   0.25701  0.28396  0.25014  0.11336  0.03169  0.00555  0.00055  0.00002  0.00000  0.00000
 14   3   0.11423  0.22607  0.25014  0.19433  0.08452  0.02222  0.00330  0.00022  0.00000  0.00000
 14   4   0.03490  0.12374  0.17197  0.22903  0.15495  0.06110  0.01360  0.00142  0.00004  0.00000
 14   5   0.00776  0.04926  0.08599  0.19631  0.20660  0.12219  0.04081  0.00662  0.00034  0.00000
 14   6   0.00129  0.01471  0.03224  0.12620  0.20660  0.18329  0.09182  0.02318  0.00202  0.00002
 14   7   0.00016  0.00335  0.00921  0.06181  0.15741  0.20947  0.15741  0.06181  0.00921  0.00016
 14   8   0.00002  0.00058  0.00202  0.02318  0.09182  0.18329  0.20660  0.12620  0.03224  0.00129
 14   9   0.00000  0.00008  0.00034  0.00662  0.04081  0.12219  0.20660  0.19631  0.08599  0.00776
 14  10   0.00000  0.00001  0.00004  0.00142  0.01360  0.06110  0.15495  0.22903  0.17197  0.03490
 14  11   0.00000  0.00000  0.00000  0.00022  0.00330  0.02222  0.08452  0.19433  0.25014  0.11423
 14  12   0.00000  0.00000  0.00000  0.00002  0.00055  0.00555  0.03169  0.11336  0.25014  0.25701
 14  13   0.00000  0.00000  0.00000  0.00000  0.00006  0.00085  0.00731  0.04069  0.15393  0.35586
 14  14   0.00000  0.00000  0.00000  0.00000  0.00000  0.00006  0.00078  0.00678  0.04398  0.22877

 15   0   0.20589  0.06569  0.03518  0.00475  0.00047  0.00003  0.00000  0.00000  0.00000  0.00000
 15   1   0.34315  0.19612  0.13194  0.03052  0.00470  0.00046  0.00002  0.00000  0.00000  0.00000
 15   2   0.26690  0.27325  0.23090  0.09156  0.02194  0.00320  0.00025  0.00001  0.00000  0.00000
 15   3   0.12851  0.23568  0.25014  0.17004  0.06339  0.01389  0.00165  0.00008  0.00000  0.00000
 15   4   0.04284  0.14073  0.18760  0.21862  0.12678  0.04166  0.00742  0.00058  0.00001  0.00000
 15   5   0.01047  0.06162  0.10318  0.20613  0.18594  0.09164  0.02449  0.00298  0.00010  0.00000
 15   6   0.00194  0.02044  0.04299  0.14724  0.20660  0.15274  0.06121  0.01159  0.00067  0.00000
 15   7   0.00028  0.00523  0.01382  0.08113  0.17708  0.19638  0.11806  0.03477  0.00345  0.00003
 15   8   0.00003  0.00104  0.00345  0.03477  0.11806  0.19638  0.17708  0.08113  0.01382  0.00028
 15   9   0.00000  0.00016  0.00067  0.01159  0.06121  0.15274  0.20660  0.14724  0.04299  0.00194
 15  10   0.00000  0.00002  0.00010  0.00298  0.02449  0.09164  0.18594  0.20613  0.10318  0.01047
 15  11   0.00000  0.00000  0.00001  0.00058  0.00742  0.04166  0.12678  0.21862  0.18760  0.04284
 15  12   0.00000  0.00000  0.00000  0.00008  0.00165  0.01389  0.06339  0.17004  0.25014  0.12851
 15  13   0.00000  0.00000  0.00000  0.00001  0.00025  0.00320  0.02194  0.09156  0.23090  0.26690
 15  14   0.00000  0.00000  0.00000  0.00000  0.00002  0.00046  0.00470  0.03052  0.13194  0.34315
 15  15   0.00000  0.00000  0.00000  0.00000  0.00000  0.00003  0.00047  0.00475  0.03518  0.20589

APPENDIX B - BINOMIAL DISTRIBUTION PROBABILITY TABLE (CONTINUED)

                        Probability of Individual Success
  n   r      0.1      0.166     0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9

 16   0   0.18530  0.05478  0.02815  0.00332  0.00028  0.00002  0.00000  0.00000  0.00000  0.00000
 16   1   0.32943  0.17447  0.11259  0.02279  0.00301  0.00024  0.00001  0.00000  0.00000  0.00000
 16   2   0.27452  0.26045  0.21111  0.07325  0.01505  0.00183  0.00012  0.00000  0.00000  0.00000
 16   3   0.14234  0.24192  0.24629  0.14650  0.04681  0.00854  0.00081  0.00003  0.00000  0.00000
 16   4   0.05140  0.15649  0.20011  0.20405  0.10142  0.02777  0.00396  0.00023  0.00000  0.00000
 16   5   0.01371  0.07476  0.12007  0.20988  0.16227  0.06665  0.01425  0.00130  0.00003  0.00000
 16   6   0.00279  0.02728  0.05503  0.16490  0.19833  0.12219  0.03918  0.00556  0.00021  0.00000
 16   7   0.00044  0.00776  0.01965  0.10096  0.18889  0.17456  0.08395  0.01854  0.00123  0.00001
 16   8   0.00006  0.00174  0.00553  0.04868  0.14167  0.19638  0.14167  0.04868  0.00553  0.00006
 16   9   0.00001  0.00031  0.00123  0.01854  0.08395  0.17456  0.18889  0.10096  0.01965  0.00044
 16  10   0.00000  0.00004  0.00021  0.00556  0.03918  0.12219  0.19833  0.16490  0.05503  0.00279
 16  11   0.00000  0.00000  0.00003  0.00130  0.01425  0.06665  0.16227  0.20988  0.12007  0.01371
 16  12   0.00000  0.00000  0.00000  0.00023  0.00396  0.02777  0.10142  0.20405  0.20011  0.05140
 16  13   0.00000  0.00000  0.00000  0.00003  0.00081  0.00854  0.04681  0.14650  0.24629  0.14234
 16  14   0.00000  0.00000  0.00000  0.00000  0.00012  0.00183  0.01505  0.07325  0.21111  0.27452
 16  15   0.00000  0.00000  0.00000  0.00000  0.00001  0.00024  0.00301  0.02279  0.11259  0.32943
 16  16   0.00000  0.00000  0.00000  0.00000  0.00000  0.00002  0.00028  0.00332  0.02815  0.18530

 17   0   0.16677  0.04569  0.02252  0.00233  0.00017  0.00001  0.00000  0.00000  0.00000  0.00000
 17   1   0.31501  0.15460  0.09570  0.01695  0.00192  0.00013  0.00000  0.00000  0.00000  0.00000
 17   2   0.28001  0.24618  0.19140  0.05811  0.01023  0.00104  0.00005  0.00000  0.00000  0.00000
 17   3   0.15556  0.24500  0.23925  0.12452  0.03410  0.00519  0.00039  0.00001  0.00000  0.00000
 17   4   0.06050  0.17067  0.20935  0.18678  0.07958  0.01816  0.00207  0.00009  0.00000  0.00000
 17   5   0.01748  0.08832  0.13608  0.20813  0.13793  0.04721  0.00807  0.00055  0.00001  0.00000
 17   6   0.00388  0.03516  0.06804  0.17840  0.18391  0.09442  0.02422  0.00258  0.00007  0.00000
 17   7   0.00068  0.01100  0.02673  0.12014  0.19267  0.14838  0.05709  0.00946  0.00042  0.00000
 17   8   0.00009  0.00274  0.00835  0.06436  0.16056  0.18547  0.10704  0.02758  0.00209  0.00001
 17   9   0.00001  0.00054  0.00209  0.02758  0.10704  0.18547  0.16056  0.06436  0.00835  0.00009
 17  10   0.00000  0.00009  0.00042  0.00946  0.05709  0.14838  0.19267  0.12014  0.02673  0.00068
 17  11   0.00000  0.00001  0.00007  0.00258  0.02422  0.09442  0.18391  0.17840  0.06804  0.00388
 17  12   0.00000  0.00000  0.00001  0.00055  0.00807  0.04721  0.13793  0.20813  0.13608  0.01748
 17  13   0.00000  0.00000  0.00000  0.00009  0.00207  0.01816  0.07958  0.18678  0.20935  0.06050
 17  14   0.00000  0.00000  0.00000  0.00001  0.00039  0.00519  0.03410  0.12452  0.23925  0.15556
 17  15   0.00000  0.00000  0.00000  0.00000  0.00005  0.00104  0.01023  0.05811  0.19140  0.28001
 17  16   0.00000  0.00000  0.00000  0.00000  0.00000  0.00013  0.00192  0.01695  0.09570  0.31501
 17  17   0.00000  0.00000  0.00000  0.00000  0.00000  0.00001  0.00017  0.00233  0.02252  0.16677

APPENDIX B - BINOMIAL DISTRIBUTION PROBABILITY TABLE (CONTINUED)

                        Probability of Individual Success
  n   r      0.1      0.166     0.2      0.3      0.4      0.5      0.6      0.7      0.8      0.9

 18   0   0.15009  0.03811  0.01801  0.00163  0.00010  0.00000  0.00000  0.00000  0.00000  0.00000
 18   1   0.30019  0.13652  0.08106  0.01256  0.00122  0.00007  0.00000  0.00000  0.00000  0.00000
 18   2   0.28351  0.23097  0.17226  0.04576  0.00691  0.00058  0.00002  0.00000  0.00000  0.00000
 18   3   0.16801  0.24519  0.22968  0.10460  0.02455  0.00311  0.00019  0.00000  0.00000  0.00000
 18   4   0.07000  0.18301  0.21533  0.16810  0.06139  0.01167  0.00106  0.00004  0.00000  0.00000
 18   5   0.02178  0.10199  0.15073  0.20173  0.11459  0.03268  0.00447  0.00023  0.00000  0.00000
 18   6   0.00524  0.04399  0.08165  0.18732  0.16552  0.07082  0.01453  0.00116  0.00002  0.00000
 18   7   0.00100  0.01501  0.03499  0.13762  0.18916  0.12140  0.03737  0.00464  0.00014  0.00000
 18   8   0.00015  0.00411  0.01203  0.08110  0.17340  0.16692  0.07707  0.01490  0.00075  0.00000
 18   9   0.00002  0.00091  0.00334  0.03862  0.12844  0.18547  0.12844  0.03862  0.00334  0.00002
 18  10   0.00000  0.00016  0.00075  0.01490  0.07707  0.16692  0.17340  0.08110  0.01203  0.00015
 18  11   0.00000  0.00002  0.00014  0.00464  0.03737  0.12140  0.18916  0.13762  0.03499  0.00100
 18  12   0.00000  0.00000  0.00002  0.00116  0.01453  0.07082  0.16552  0.18732  0.08165  0.00524
 18  13   0.00000  0.00000  0.00000  0.00023  0.00447  0.03268  0.11459  0.20173  0.15073  0.02178
 18  14   0.00000  0.00000  0.00000  0.00004  0.00106  0.01167  0.06139  0.16810  0.21533  0.07000
 18  15   0.00000  0.00000  0.00000  0.00000  0.00019  0.00311  0.02455  0.10460  0.22968  0.16801
 18  16   0.00000  0.00000  0.00000  0.00000  0.00002  0.00058  0.00691  0.04576  0.17226  0.28351
 18  17   0.00000  0.00000  0.00000  0.00000  0.00000  0.00007  0.00122  0.01256  0.08106  0.30019
 18  18   0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00010  0.00163  0.01801  0.15009

 19   0   0.13509  0.03178  0.01441  0.00114  0.00006  0.00000  0.00000  0.00000  0.00000  0.00000
 19   1   0.28518  0.12019  0.06845  0.00928  0.00077  0.00004  0.00000  0.00000  0.00000  0.00000
 19   2   0.28518  0.21530  0.15402  0.03580  0.00463  0.00033  0.00001  0.00000  0.00000  0.00000
 19   3   0.17956  0.24283  0.21820  0.08695  0.01750  0.00185  0.00009  0.00000  0.00000  0.00000
 19   4   0.07980  0.19333  0.21820  0.14905  0.04665  0.00739  0.00054  0.00001  0.00000  0.00000
 19   5   0.02660  0.11544  0.16365  0.19164  0.09331  0.02218  0.00243  0.00009  0.00000  0.00000
 19   6   0.00690  0.05362  0.09546  0.19164  0.14515  0.05175  0.00850  0.00051  0.00001  0.00000
 19   7   0.00142  0.01982  0.04432  0.15253  0.17971  0.09611  0.02366  0.00221  0.00004  0.00000
 19   8   0.00024  0.00592  0.01662  0.09805  0.17971  0.14416  0.05325  0.00772  0.00026  0.00000
 19   9   0.00003  0.00144  0.00508  0.05136  0.14643  0.17620  0.09762  0.02201  0.00127  0.00000
 19  10   0.00000  0.00029  0.00127  0.02201  0.09762  0.17620  0.14643  0.05136  0.00508  0.00003
 19  11   0.00000  0.00005  0.00026  0.00772  0.05325  0.14416  0.17971  0.09805  0.01662  0.00024
 19  12   0.00000  0.00001  0.00004  0.00221  0.02366  0.09611  0.17971  0.15253  0.04432  0.00142
 19  13   0.00000  0.00000  0.00001  0.00051  0.00850  0.05175  0.14515  0.19164  0.09546  0.00690
 19  14   0.00000  0.00000  0.00000  0.00009  0.00243  0.02218  0.09331  0.19164  0.16365  0.02660
 19  15   0.00000  0.00000  0.00000  0.00001  0.00054  0.00739  0.04665  0.14905  0.21820  0.07980
 19  16   0.00000  0.00000  0.00000  0.00000  0.00009  0.00185  0.01750  0.08695  0.21820  0.17956
 19  17   0.00000  0.00000  0.00000  0.00000  0.00001  0.00033  0.00463  0.03580  0.15402  0.28518
 19  18   0.00000  0.00000  0.00000  0.00000  0.00000  0.00004  0.00077  0.00928  0.06845  0.28518
 19  19   0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00006  0.00114  0.01441  0.13509

APPENDIX C
NORMAL DISTRIBUTION PROBABILITY TABLE

(To use this table, determine your z-score by taking the distance from the point you are trying to measure to the mean and dividing it by the standard deviation of your distribution. Then locate the row whose left-hand heading matches your z-score to the tenths place, and the column whose heading matches its hundredths digit. The number in the box in which they intersect is the probability that the next data sample will be less than the point you are measuring (if that point is below the mean), or greater than the point you are measuring (if that point is above the mean).)

z-score    0.00     0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09

0.0      0.50000  0.49601  0.49202  0.48803  0.48405  0.48006  0.47608  0.47210  0.46812  0.46414
0.1      0.46017  0.45620  0.45224  0.44828  0.44433  0.44038  0.43644  0.43251  0.42858  0.42465
0.2      0.42074  0.41683  0.41294  0.40905  0.40517  0.40129  0.39743  0.39358  0.38974  0.38591
0.3      0.38209  0.37828  0.37448  0.37070  0.36693  0.36317  0.35942  0.35569  0.35197  0.34827
0.4      0.34458  0.34090  0.33724  0.33360  0.32997  0.32636  0.32276  0.31918  0.31561  0.31207
0.5      0.30854  0.30503  0.30153  0.29806  0.29460  0.29116  0.28774  0.28434  0.28096  0.27760
0.6      0.27425  0.27093  0.26763  0.26435  0.26109  0.25785  0.25463  0.25143  0.24825  0.24510
0.7      0.24196  0.23885  0.23576  0.23270  0.22965  0.22663  0.22363  0.22065  0.21770  0.21476
0.8      0.21186  0.20897  0.20611  0.20327  0.20045  0.19766  0.19489  0.19215  0.18943  0.18673
0.9      0.18406  0.18141  0.17879  0.17619  0.17361  0.17106  0.16853  0.16602  0.16354  0.16109
1.0      0.15866  0.15625  0.15386  0.15151  0.14917  0.14686  0.14457  0.14231  0.14007  0.13786
1.1      0.13567  0.13350  0.13136  0.12924  0.12714  0.12507  0.12302  0.12100  0.11900  0.11702
1.2      0.11507  0.11314  0.11123  0.10935  0.10749  0.10565  0.10383  0.10204  0.10027  0.09853
1.3      0.09680  0.09510  0.09342  0.09176  0.09012  0.08851  0.08692  0.08534  0.08379  0.08226
1.4      0.08076  0.07927  0.07780  0.07636  0.07493  0.07353  0.07215  0.07078  0.06944  0.06811
1.5      0.06681  0.06552  0.06426  0.06301  0.06178  0.06057  0.05938  0.05821  0.05705  0.05592
1.6      0.05480  0.05370  0.05262  0.05155  0.05050  0.04947  0.04846  0.04746  0.04648  0.04551
1.7      0.04457  0.04363  0.04272  0.04182  0.04093  0.04006  0.03920  0.03836  0.03754  0.03673
1.8      0.03593  0.03515  0.03438  0.03362  0.03288  0.03216  0.03144  0.03074  0.03005  0.02938
1.9      0.02872  0.02807  0.02743  0.02680  0.02619  0.02559  0.02500  0.02442  0.02385  0.02330
2.0      0.02275  0.02222  0.02169  0.02118  0.02068  0.02018  0.01970  0.01923  0.01876  0.01831
2.1      0.01786  0.01743  0.01700  0.01659  0.01618  0.01578  0.01539  0.01500  0.01463  0.01426
2.2      0.01390  0.01355  0.01321  0.01287  0.01255  0.01222  0.01191  0.01160  0.01130  0.01101
2.3      0.01072  0.01044  0.01017  0.00990  0.00964  0.00939  0.00914  0.00889  0.00866  0.00842
2.4      0.00820  0.00798  0.00776  0.00755  0.00734  0.00714  0.00695  0.00676  0.00657  0.00639
2.5      0.00621  0.00604  0.00587  0.00570  0.00554  0.00539  0.00523  0.00508  0.00494  0.00480
2.6      0.00466  0.00453  0.00440  0.00427  0.00415  0.00402  0.00391  0.00379  0.00368  0.00357
2.7      0.00347  0.00336  0.00326  0.00317  0.00307  0.00298  0.00289  0.00280  0.00272  0.00264
2.8      0.00256  0.00248  0.00240  0.00233  0.00226  0.00219  0.00212  0.00205  0.00199  0.00193
2.9      0.00187  0.00181  0.00175  0.00169  0.00164  0.00159  0.00154  0.00149  0.00144  0.00139
3.0      0.00135  0.00131  0.00126  0.00122  0.00118  0.00114  0.00111  0.00107  0.00104  0.00100
3.1      0.00097  0.00094  0.00090  0.00087  0.00084  0.00082  0.00079  0.00076  0.00074  0.00071
3.2      0.00069  0.00066  0.00064  0.00062  0.00060  0.00058  0.00056  0.00054  0.00052  0.00050
3.3      0.00048  0.00047  0.00045  0.00043  0.00042  0.00040  0.00039  0.00038  0.00036  0.00035
3.4      0.00034  0.00032  0.00031  0.00030  0.00029  0.00028  0.00027  0.00026  0.00025  0.00024
3.5      0.00023  0.00022  0.00022  0.00021  0.00020  0.00019  0.00019  0.00018  0.00017  0.00017
3.6      0.00016  0.00015  0.00015  0.00014  0.00014  0.00013  0.00013  0.00012  0.00012  0.00011
3.7      0.00011  0.00010  0.00010  0.00010  0.00009  0.00009  0.00008  0.00008  0.00008  0.00008
3.8      0.00007  0.00007  0.00007  0.00006  0.00006  0.00006  0.00006  0.00005  0.00005  0.00005
3.9      0.00005  0.00005  0.00004  0.00004  0.00004  0.00004  0.00004  0.00004  0.00003  0.00003


Learning Objectives
Chapter 5- Decision Analysis
After studying this chapter, you should be able to:
Describe a decision node.
Describe an event node.
Calculate expected monetary value.
Use the fractile technique to determine expected values.
Diagram a decision tree showing continuous uncertainty.

CHAPTER 5
DECISION ANALYSIS
INTRODUCTION

The previous chapter dealt with probabilities that certain events will occur, along with
theoretical ways to ascertain probabilities that cannot be readily determined. This section presents
an analytical approach to decision-making through incorporating probability measurement into a
larger decision framework; in other words, using what you have learned about probability to make
the best decision possible with the information you have.

People make decisions every day, ranging in complexity from what to eat for breakfast to
whether to fund the million dollar development cost of a new product. Intuition, instinct and
judgment are the tools most people use to make decisions, just as they were in the forecasting chapter. For small, inconsequential decisions, these tools will suffice; the more important decisions, however, are better made using a systematic approach.
Decomposition

Decomposition, the process of breaking a decision down into manageable parts in order
to come to a conclusion, is integral to decision analysis. The process begins with defining the
problem, then analyzing the alternatives and considering the possible results of each alternative.
After all possible alternatives and their consequences have been studied, a decision can be made.

Mike is the procurement officer for a company that manufactures three products, A, B, and
C. He is in charge of purchasing the electrical power necessary to manufacture these products for
the coming month. Mike is able to buy power in two ways: with a one month, fixed-price contract
that allows the company unlimited power for $1,000/month, or on a unit basis at $2.50 per energy
unit. To perform a decomposition, Mike has to get an estimate of the number of each part his
company expects to produce and the power required to produce each one. Mike does some research
and compiles the information found in the table below.
Table 5-1
Power Procurement Data

Part     # to be Produced     Power to Produce a Single Unit
A        10                   7 power units
B        25                   5 power units
C        50                   3 power units



By multiplying the expected production of each part by the number of power units needed to produce one unit of that part and then summing those power requirements ((7 x 10) + (5 x 25) + (3 x 50)), Mike determines that he needs 345 power units for next month's production. Mike has now defined the problem, and his alternatives are given by the power company. His analysis of the alternatives and their consequences (how much the company will have to pay for the power) shows that, under the unit basis, Mike's company will have to pay $2.50/unit x 345 units = $862.50, which is less than the $1,000 fixed price. Therefore, Mike should probably buy power on a unit-by-unit basis.
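
As a minimal sketch (using the Table 5-1 figures), Mike's decomposition amounts to a few lines of arithmetic:

    units_to_produce = {"A": 10, "B": 25, "C": 50}
    power_per_unit   = {"A": 7,  "B": 5,  "C": 3}     # power units per part

    power_needed = sum(units_to_produce[part] * power_per_unit[part]
                       for part in units_to_produce)
    unit_basis_cost = 2.50 * power_needed             # $2.50 per energy unit

    print(power_needed)      # 345
    print(unit_basis_cost)   # 862.5 -- cheaper than the $1,000 fixed-price contract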

In this example, all of Mikes alternatives were supplied by the power company, but maybe
there were others which he did not consider. For instance, perhaps Mike could have negotiated
another agreement with the power company or made use of an alternative energy source. Creativity
and sound judgment are required when developing a complete set of alternatives. One should be
especially careful, however, not to let overanalysis prevent action (so-called paralysis by analysis)
nor to waste too much time considering alternatives to fairly minor decisions, for the decision maker's time has a cost which may be greater than the difference between two overanalyzed alternatives.

An important alternative not considered in the power procurement example is the do nothing
alternative. This is often used when one or more of the alternative actions has a cost, but the value
of the perceived benefits is uncertain. For example, suppose Carl the Cabdriver is considering the
purchase of snow tires in preparation for a big winter storm which may or may not hit Sunnyville,
where Carl lives and works. The snow tires cost $300, are useful only for this storm, and have no
value after it. Thus, if the snowstorm misses Sunnyville, the tires will be worthless if he buys them,
but if it does reach Sunnyville, the benefits will be worth much more than $300, since Carl can make
$700 more than he normally would driving his cab around the town. Clearly, whether or not Carl
buys the snow tires depends on his assessment of the probability of the storm hitting Sunnyville.
In this situation the do nothing alternative will have a greater return ($0) than buying the tires in the scenario where the storm does not occur (-$300), and thus it should be considered as
a possible option. Sometimes, however, doing nothing is not a foreseeable option, such as in the
power procurement problem, where doing nothing might mean the plant would not have any power
for the month and could not produce anything, even if it could sell everything it produced.
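
Carl's choice can also be framed in expected-value terms. The sketch below computes the expected return of buying the tires for a few storm probabilities; the break-even probability of 3/7 (about .43) is implied by, though never stated in, the example:

    tire_cost, extra_fares = 300, 700

    def expected_value_of_buying(p_storm):
        # Storm hits: $700 extra minus the $300 tires; storm misses: lose the $300.
        return p_storm * (extra_fares - tire_cost) + (1 - p_storm) * (-tire_cost)

    # "Do nothing" is always worth $0, so buying pays off once this is positive.
    for p in (0.2, 0.43, 0.6):
        print(p, round(expected_value_of_buying(p), 2))   # -160.0, then 1.0, then 120.0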



After making an extensive list of alternatives, the decision maker must choose among them.
To do so requires first linking consequences with alternatives. In Mike's case it was easy: a cost of $862.50 for paying by the unit basis versus one of $1,000 for paying with the fixed-price contract. In Carl the Cabdriver's case it was more difficult: the consequences may be a loss of $300 (if he
buys the tires and it does not snow), no cost (if he does not buy the tires and it does or does not
snow), or a profit of $400 (if he buys the tires for $300, it snows, and he makes $700 extra driving
his cab, $700 - $300 = $400). Consequences are best described by comparing levels of attributes
which they have in common; often in business, this is cost. Other attributes used to compare
alternatives include changes in profit and market share. If the attributes are incomparable, the
analysis is relatively useless and the decision maker may not be any better off for taking the time to
frame the decision. For example, if the consequences of one alternative are given as a percentage increase in market share while the consequences of the other are given as larger profits, the decision
maker is forced to choose between apples and oranges. Often, a little extra effort will help put the
consequences in similar terms, which is worth it if the decision maker has already put a lot of work
into framing the decision.

When making a decision, some degree of uncertainty usually exists. Mike faced no uncertainty
since he was given the production numbers, but Carl certainly did. The monetary consequences of
the alternative Carl chose depended on something as uncertain as the weather. To predict this level
of uncertainty, one should use the analysis developed in the probability chapter, because attaching
probabilities to each consequence is a helpful step in the quantitative portion of decision-making.
Decision Modeling


More complex decisions can get unwieldy when broken down into their components. If
Mike had seven other power procurement alternatives besides the two given, it would be difficult
to keep track of those alternatives without a concise way to organize them. A conventional way
to keep things in order is the decision diagram or decision tree. A decision tree is a graphic
representation of a problem, the possible consequences and the attributes of each consequence.
Each alternative developed is portrayed as a branch of the tree. Branches are connected by either
open circles or squares. The circles, known as event nodes, represent points of uncertainty where
one of two or more things will happen out of the control of the decision maker. The squares,
known as decision nodes, show where a decision must be made among two or more alternatives.
Mike's decision tree would begin with a square where he has to decide between the two purchasing
methods and would look like the figure on the following page.


Figure 5-1
Mike's Decision Tree
(a single decision node with two branches: Buy Per Unit and Buy Contract)


Taking a slightly more complex example, let's introduce George, who is an oil wildcatter
with two plots of land, A and B, in the Permian Basin of Texas. George only has enough money to
drill one well, but he suspects there is oil under both parcels. Assuming he does not want to borrow
any money to drill on both plots, his decision tree would begin with a square with three branches,
(one for drilling on Plot A, the other for Plot B, and the last being to do nothing), and would look
like this:
Figure 5-2
George's Decision Tree I
(a decision node with three branches: Drill A, Drill B, and Do Not Drill)


Connected to the two options that require drilling would be a circle with the consequences
Hit Oil or Dry Hole attached. There is no event node connected with the Do Not Drill
alternative: once George decides not to drill for oil, the decision has been made and there are
neither additional consequences nor more decisions to be made. You can signal this by putting a
vertical line at the end of the branch, such as in the diagram in Figure 5-3.
Figure 5-3
George's Decision Tree II
(Drill A and Drill B each lead to an event node with two branches, Hit Oil and Dry Hole; the Do Not Drill branch ends with a vertical line)



The alternatives depicted on the branches must be mutually exclusive (each alternative is
different from the others) and collectively exhaustive (all alternatives have been considered and
presented on the tree), two terms which we introduced in the previous chapter. Since George has
the funds to drill only one well, our decision tree above is complete. However, if George had the
funds to drill two wells, his tree would look like this:
Figure 5-4
George's Decision Tree III
[Four branches: "Drill A," "Drill B," "Drill A & B," and "Do Not Drill." The single-plot branches each lead to a "Hit Oil"/"Dry Hole" event node; the "Drill A & B" branch leads to an event node with four outcomes: A-Hit/B-Hit, A-Hit/B-Dry, A-Dry/B-Hit, and A-Dry/B-Dry.]


The branches of the decision tree should flow in chronological order from left to right: the
decision maker acts, something happens, the decision maker reacts. In other words, the decision
maker should draw the diagram in the order he or she expects the events to unfold. For example,
George has to drill before he finds out whether Plot A or Plot B will yield oil, and our decision tree
should follow this order. Determining what follows what can be a little tricky; in George's case,
the oil may have been under one or both of the plots millions of years before George even
considered drilling there. However, the events in a decision tree always unfold from left to
right chronologically through the eyes of the decision maker: George does not know if oil is present
until he drills.

Suppose we assume that if George hits oil on one of the plots, he has the option of drilling
on the other plot. One branch of his diagram would look like this:
Figure 5-5
Partial Decision Tree
["Drill A" leads to a "Hit"/"Dry" event node; after a hit, a second decision node offers "Drill B" or "Sell B," and "Drill B" leads to another "Hit"/"Dry" event node.]


George, for whatever reason, does not and cannot make the decision to drill on Plot B until
the uncertainty regarding Plot A is resolved. This decision, in turn, is followed by an uncertain
event. Thus, action leads to an event which leads to another decision which leads to another
event.



Although chronology should not be tampered with, some flexibility and simplification are
possible. Two decision nodes, if not separated by an event node, can be combined, as in the
example below.
Figure 5-6
Combining Decision Nodes
[A decision node whose "Drill" branch leads directly to a second decision node ("Drill A" or "Drill B") is collapsed into a single decision node with branches "Drill A," "Drill B," and "Do Not Drill."]


Another issue is how far into the future the decision tree should go. If George kept hitting
oil and wanted to include the possibility of buying and drilling new plots of land in his diagram,
his decision tree would soon grow unmanageable and useless. In George's case, the
natural cutoff point comes after he has drilled on either of the two plots he currently
owns. As with determining the correct chronology, it is the responsibility of the builder of the
diagram to determine where it should end. If a point exists beyond which decisions
or consequences have not been thought out, that should be the cutoff point for the diagram.
Otherwise, good judgment is necessary.
Evaluating Decision Diagrams


Constructing a decision tree parallels the process of decomposition: defining the problem,
listing alternatives, and matching alternatives to consequences. Evaluating the tree requires the
same process of assigning attributes to the different alternatives and their consequences, except
now in the form of assigning values to the various branches. In Mike the Power Procurer's case,
there are two branches, one having a value of $862.50 and the other a value of $1,000.00. Because
this was a payment the company would have to make, Mike chose the lower value, the lesser
payment.

As mentioned with decomposition, attributes are not always directly comparable. However,
by using techniques such as discounted cash-flow analysis, one is often able to escape this problem
and compare such apparent opposites as the value of a new machine today and sales growth over
the next five years. For a more thorough understanding of discounted cash-flow analysis and
an introduction to the time value of money, a concept which is integral to it, please see Capital
Budgeting or Understanding Corporate Finance, available where you purchased this program or
by calling Ivy Software directly at 1-800-342-5489. A section on the time value of money can
also be found in the appendices to either Financial Accounting: A Management Perspective or The
Financial Accounting Cycle With Supplements, available through these same channels.


Cash costs or profits are often the values a decision maker puts at the ends of the branches of the
tree, but they are not the only consideration when evaluating a decision. Other, less tangible costs
and benefits should be included if they are significant to the decision, such as loss of market share
or improvements in customer relations. Unfortunately, most of these are not easily quantifiable.
When a decision maker feels these are important factors in making the decision, he or she can either
take the time to quantify them or note them in the analysis at the ends of the branches alongside
the numerical consequences.

Now let's return to our oil prospector, George, for an example of evaluating a decision tree
once the value of each outcome is known. George knows that it will cost him $1,000,000 to drill an
oil well. A geologist has told George how much he will make if oil is struck on each of the plots.
Table 5-2
Oil Well Payout

Field   Payout
A       $20,000,000
B       $50,000,000
C       $30,000,000   (George decided he could drill on his house lot)

George's diagram with the values of each outcome would look like this:
Figure 5-7
George's Decision Tree IV
["Drill A," "Drill B," and "Drill C" each lead to a "Hit Oil"/"Dry Hole" event node with outcome values of +$19,000,000/-$1,000,000, +$49,000,000/-$1,000,000, and +$29,000,000/-$1,000,000 respectively; "Do Not Drill" ends with a value of $0.]

The clear choice seems to be to drill on Plot B because George can make the most money if he drills
there. However, the geologist also tells George the probability of striking oil on each of the plots.


Table 5-3
George's Probability of Striking Oil on Each Plot

Probability of striking oil on Plot A   P(A) = .8
Probability of striking oil on Plot B   P(B) = .2
Probability of striking oil on Plot C   P(C) = .5


Now, George's decision is not so easy. True, he would gross $50 million if he struck
oil on Plot B, but would he want to waste his money drilling there if only a 20% probability
of striking oil existed (and thus an 80% chance that all George would find is a dry hole)?
Probabilities are noted along the branches of decision trees, and the values of each outcome
appear at the ends of those branches, usually including both costs (given as negative
numbers) and benefits (given as positive ones), as in George's tree in Figure 5-8:
Figure 5-8
George's Decision Tree V
[The same tree as Figure 5-7 with probabilities along the branches: 0.8/0.2 for Plot A, 0.2/0.8 for Plot B, and 0.5/0.5 for Plot C, with the outcome values at the branch ends.]

George now calculates the weighted average of each branch as follows:


Table 5-4
Weighted Average Value

A: (.8 x $19,000,000) + (.2 x -$1,000,000) = $15,200,000 - $200,000 = $15,000,000
B: (.2 x $49,000,000) + (.8 x -$1,000,000) = $9,800,000 - $800,000 = $9,000,000
C: (.5 x $29,000,000) + (.5 x -$1,000,000) = $14,500,000 - $500,000 = $14,000,000

These figures are entered into the event nodes, and the branches with the lower returns are pruned.
You can signal this by putting two slash marks across a branch, as demonstrated in the decision tree
in Figure 5-9:


Figure 5-9
George's Decision Tree V (with Weighted Averages in Millions of Dollars)
[The event nodes now carry $15.0 (Plot A), $9.0 (Plot B), and $14.0 (Plot C); the Plot B and Plot C branches are marked with slash marks as pruned.]


George's best alternative is to drill on Plot A, since it has the highest weighted average, a
value of $15,000,000. This can be interpreted as follows: on average, George will make $15
million if he drills on Plot A, more than he would make on average by drilling on any other plot. This
number is called the alternative's expected monetary value, or EMV. A decision-framing concept
that uses averages can be hard to accept in cases like George's, where there
is only one opportunity to drill and he either makes a great deal of money or loses his drilling
cost, but EMV combines probabilities and payoffs in a way that makes analysis and decisions easier. In
such cases, one can interpret the EMV this way: imagine George had many plots that
were exactly like Plot A, and could drill on as many of them as he wanted. As long as he drilled
on a significant number of them, he would average $15 million per plot (on some he would lose his
$1 million drilling cost, and on others he would net $19 million). Then compare that to what he would
make doing the same thing on many plots just like Plot B, and many just like Plot C, both
of which produce less than $15 million on average.
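
The weighted averages in Table 5-4 can also be checked programmatically. Below is a minimal Python sketch, not part of the original text; the variable and function names are illustrative.

# EMV of each drilling alternative, reproducing Table 5-4.
# Each entry: (probability of a hit, net value of a hit, net value of a dry hole).
branches = {
    "Drill A": (0.8, 19_000_000, -1_000_000),
    "Drill B": (0.2, 49_000_000, -1_000_000),
    "Drill C": (0.5, 29_000_000, -1_000_000),
}

def emv(p_hit, hit_value, dry_value):
    # Probability-weighted average of the two outcomes.
    return p_hit * hit_value + (1 - p_hit) * dry_value

for name, (p, hit, dry) in branches.items():
    print(f"{name}: EMV = ${emv(p, hit, dry):,.0f}")
# Drill A: EMV = $15,000,000  (highest; the other branches are pruned)
# Drill B: EMV = $9,000,000
# Drill C: EMV = $14,000,000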
Continuous Uncertainty


For simplicity's sake, we assumed that George's geologist's estimate of oil production was
an all-or-nothing prediction. In reality, the oil could range from none to a great quantity. This
type of uncertainty is depicted on decision trees by a fan:
Figure 5-10
Decision Tree Showing Continuous Uncertainty
[The "Drill A" branch leads to a fan representing a continuous range of outcomes.]

The cumulative distribution function (CDF) described earlier can be used to evaluate the fan on a
tree through a method known as bracket medians. Suppose the graph in Figure 5-11 is the CDF
from George's geologist:


Figure 5-11
CDF of Barrels/Day from George's Well
[Cumulative probability, 0 to 100 percent, plotted against barrels of oil per day, in hundreds of thousands.]


Suppose George wants an easier way to analyze this data and to make a decision on pipeline
capacity for the amount of oil produced. George could divide the CDF into five ranges: the 0 to .2,
.2 to .4, .4 to .6, .6 to .8, and .8 to 1.0 fractiles. He then uses the midpoint of each range
(the .1, .3, .5, .7, and .9 fractiles) to represent that range, and gives each a branch on the
decision diagram.
Figure 5-12
George's Decision Tree VI
[The fan is replaced by five equally likely branches (probability 0.2 each) with bracket-median values of 250,000; 400,000; 500,000; 600,000; and 750,000 barrels per day; the expected value of 500,000 is entered at the event node.]


Because the fractile technique produces five ranges, each equally likely to contain the true
amount of oil, George assigns each branch the probability of its range, .2. He computes the
expected value of 500,000 barrels by multiplying each range midpoint by its .2 probability and
summing the products.
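
The same computation in a short Python sketch (again, not from the original text; the midpoint values are the ones shown in Figure 5-12):

# Bracket medians: midpoints (.1, .3, .5, .7, .9 fractiles) of five
# equally likely ranges, each assigned probability .2.
bracket_medians = [250_000, 400_000, 500_000, 600_000, 750_000]
probability = 0.2

expected_barrels = sum(probability * midpoint for midpoint in bracket_medians)
print(f"{expected_barrels:,.0f} barrels per day")  # 500,000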



A large tree can become unmanageable with five branches after every event node, as would
have happened if George had many plots on which to drill and used the above method for
each one. An alternative, the Pearson-Tukey method, provides a reasonable approximation
using only three branches per event node. The decision maker uses the .05, .5,
and .95 fractiles from the CDF and assigns them probabilities of .185, .63, and .185 respectively. This
method produces results comparable to the bracket-median technique.
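
As a sketch of the Pearson-Tukey weighting in Python: the three fractile values below are hypothetical numbers chosen only to illustrate the .185/.63/.185 weights, not figures read from George's CDF.

# Pearson-Tukey: approximate a continuous uncertainty with the .05, .5,
# and .95 fractiles, weighted .185, .63, and .185 respectively.
fractile_values = {0.05: 220_000, 0.50: 500_000, 0.95: 780_000}  # hypothetical
weights = {0.05: 0.185, 0.50: 0.63, 0.95: 0.185}

expected = sum(weights[f] * value for f, value in fractile_values.items())
print(f"{expected:,.0f} barrels per day")  # 500,000 for these assumed values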

The previous chapter dealt with probabilities. This chapter has dealt with analyzing decisions
using a framework that incorporates both probability and personal preferences. The skills you have
learned in these chapters should provide you with invaluable tools in making business decisions.


Illustration 1
(Chapter 1, Exercise 4, Question 2)
[A graph of Total Costs (vertical axis, 1 to 10) against # of Driveways Paved (horizontal axis, 10 to 130).]

Illustration 2
(Chapter 1, Exercise 4, Question 3)
[Four candidate graphs, labeled A through D, plotting Total Costs (10 to 130) against Lbs. of Asphalt or # of Driveways Paved, followed by the choice "None of the above."]


Illustration 3
(Chapter 1, Exercise 5, Question 4)
[Candidate graphs plotted through labeled points such as (-2,25), (0,21), (-7,0), (-3,0), (3,0), (7,0), (-5,-21), (0,-21), (5,-21), and (-2,-25), followed by the choice "None of the above."]


Illustration 4
(Chapter 4, Reading Comprehension Quiz, Question 9)
[A plot of Probability (vertical axis, .1 to .3) against # of Swordfish Caught (horizontal axis).]

Illustration 5
(Chapter 4, Exercise 4, Question 1)
[Distribution by Weight of Bags of Apples, centered at 10 lbs. with σ = 2.]


Illustration 6
(Chapter 5, Exercise 2, Question 2)
[Decision-tree answer choices, including a "Dry Hole" branch and the choice "NONE OF THE ABOVE."]

Illustration 7
(Chapter 5, Exercise 3, Questions 1, 2 & 3)
[Two event nodes. Project 1: a .40 chance of $70,000 and a .60 chance of $30,000. Project 2: a .55 chance of $100,000 and a .45 chance of -$15,000.]

Illustration 8
(Chapter 5, Exercise 4, Question 2)
[A CDF with cumulative probability (.1 to 1.0) on the vertical axis and Selling Price of House (in Thousands of Dollars, $120 to $200) on the horizontal axis.]

Illustration 9
(Chapter 5, Final Exam, Question 6)
[Three event nodes. Project A: a .4 chance of success (+$100,000) and a .6 chance of failure (-$50,000). Project B: a .7 chance of success (+$20,000) and a .3 chance of failure (-$5,000). Project C: a .6 chance of success (+$50,000) and a .4 chance of failure (-$17,000).]
