
Winter 2013 Math Review for FIN 411 Prof. J. Page
FIN 411: Derivatives and Fixed Income
Review of Math Concepts
1. Algebra
1.1. Variables
In finance, and in this class in particular, we will be dealing with unknown quantities that may take on any number of values, such as the price of a stock on some future date. We will typically represent such an unknown quantity, called a variable, by a letter. For example, we will generally use \(S_t\) to represent the price of a stock (or some other asset) at time \(t\). Often the value of a variable depends on the value of one or more other variables.
1.2. Functions
A function of one variable is a rule by which the numerical value of a single variable
(the independent variable) yields a numerical output value (the dependent variable). In
generic terms, the independent variable is usually called \(x\), and the dependent variable is usually called \(y\). The expression \(y = f(x)\) is used to indicate that the value of the dependent variable \(y\) is determined from the variable \(x\) by the function \(f\). However, these particular letters are not set in stone. The notation \(c = g(S)\) might mean, for example, that the value of a call option \(c\) is determined from the stock price \(S\) by the function \(g\).
To express the dependence of a dependent variable on more than one independent
variable, you need to use a multivariate function. For example, if y depends on the
independent variables \(x_1\), \(x_2\), and \(x_3\), you can write \(y = f(x_1, x_2, x_3)\).
Example 1.1. The number of lift tickets sold by Sundance in a year (let \(\ell\) = number of lift tickets sold in a year) may depend on the U.S. Gross National Product (let \(g\) = GNP in trillions of dollars) and the snowfall (let \(s\) = snowfall in inches), as represented by this multivariate function: \(\ell = f(g, s) = 10{,}000 + 1000g + 20s\). This function implies that a $1 trillion increase in GNP will increase Sundance's annual lift ticket count by 1000 tickets and that a 1 inch increase in the annual snowfall rate will increase Sundance's annual lift ticket count by 20. As a complete example, let's imagine a year in which the GNP is $14 trillion and the annual snowfall is 300 inches. The number of lift tickets would then be computed as follows:
\[
\ell = f(14, 300) = 10{,}000 + 1000(14) + 20(300) = 30{,}000
\]
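Coding a multivariate function directly is a quick way to check arithmetic like this. A minimal Python sketch; the function name `lift_tickets` is illustrative, not from the notes.

```python
def lift_tickets(g, s):
    """Annual lift tickets given GNP g ($ trillions) and snowfall s (inches)."""
    return 10_000 + 1000 * g + 20 * s

print(lift_tickets(14, 300))  # 30000

# Holding snowfall fixed, one more trillion dollars of GNP adds 1000 tickets
print(lift_tickets(15, 300) - lift_tickets(14, 300))  # 1000
```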
1.3. Equations
An equation is simply a mathematical statement that equates two quantities. For example, in economics class, you learned that the equilibrium price in a market is the price that makes supply and demand equal. By setting supply and demand equal to each other, we can find the equilibrium price.
An equation in a single variable is linear if each term in the equation is either a constant or the variable multiplied by a constant. For example, \(13x = 5x + 16\) is a linear equation of one variable, whereas \(9x^2 + 4 = 85\) is not a linear equation.
1.3.1 Solving Linear Equations in One Variable
Manipulate the equation using Rule 1 so that all the terms involving the variable (call it \(x\)) are on one side of the equation and all constants are on the other side. Then use Rule 2 to solve for \(x\).
Rule 1: Adding the same quantity to both sides of an equation does not change the set
of solutions to that equation.
Rule 2: Multiplying or dividing both sides of an equation by the same nonzero number
does not change the set of solutions to that equation.
Example 1.2. Finding the equilibrium price. Let \(p\) = the price in dollars per gallon of gasoline in Provo. Suppose that the daily demand for gas in Provo (in thousands of gallons) is \(500 - 10p\) and that the daily supply of gasoline in Provo is \(400 + 20p\). At what price for gasoline would daily supply and demand be equal?
To find out, solve for the value of \(p\) in the equation \(500 - 10p = 400 + 20p\). Begin by adding \(10p\) to both sides, which yields \(500 = 400 + 30p\). Next, subtract 400 from both sides of the equation, which yields \(100 = 30p\). Finally, divide both sides by 30 to obtain \(100/30 = p\), or \(p = 10/3\). Therefore, \(p\) is approximately $3.33.
You can check your solution by substituting \(10/3\) into the equation: \(500 - 10(10/3) = 400 + 20(10/3) = 1400/3\). The substitution verifies that a price per gallon of approximately $3.33 brings supply and demand into equilibrium; therefore, \(p = 10/3\) is the solution to the equation.
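Rules 1 and 2 reduce any equation of the form \(ax + b = cx + d\) to \(x = (d - b)/(a - c)\) when \(a \neq c\). A minimal Python sketch of that recipe using exact fractions; the helper name `solve_linear` is hypothetical.

```python
from fractions import Fraction

def solve_linear(a, b, c, d):
    """Solve a*x + b = c*x + d for x, assuming integer coefficients and a != c."""
    if a == c:
        raise ValueError("no unique solution when the x coefficients match")
    # Rule 1: collect x terms on the left, constants on the right: (a - c) x = d - b
    # Rule 2: divide both sides by the nonzero coefficient (a - c)
    return Fraction(d - b, a - c)

# Demand 500 - 10p equals supply 400 + 20p
p = solve_linear(-10, 500, 20, 400)
print(p, float(p))  # 10/3 3.3333333333333335
```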
1.3.2 Solving Two Linear Equations
In finance, you often have two variables or unknowns (call them \(x\) and \(y\)) that are related by two equations. To find the values of the unknowns, you must find the values of \(x\) and \(y\) that satisfy both equations. For example, in this class we will have to solve linear
equations to use the arbitrage approach for determining the price of put and call options
and of other securities.
Two linear equations in two variables (call them \(x_1\) and \(x_2\)) have either no solution, an infinite number of solutions, or a unique solution. You may solve two linear equations by substitution, by elimination, or with matrix algebra.
Substitution: Use one equation to solve for one variable in terms of the other (say, \(x_1\) in terms of \(x_2\)). Then substitute this relationship for each occurrence of \(x_1\) in the remaining equation. Now solve the remaining equation for \(x_2\). Given that you know \(x_1\) in terms of \(x_2\), you also know \(x_1\).
Elimination: Add a multiple of one equation to the other equation to eliminate a variable (say, \(x_1\)) from the other equation. Solve the resulting equation for the remaining variable (\(x_2\)). Substitute this value of \(x_2\) in either of the original equations to find \(x_1\).
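Matrix algebra: A system of linear equations
\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 &= b_1 \\
a_{21}x_1 + a_{22}x_2 &= b_2
\end{aligned}
\]
can be written in matrix form as \(AX = B\) where
\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad
X = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad
B = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}
\]
If \(A\) is invertible (which is exactly the case in which the system has a unique solution), we can then use simple matrix algebra to solve for \(x_1\) and \(x_2\) as
\[
X = A^{-1}B
\]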
This matrix approach is especially convenient for large systems of equations for
which the substitution or elimination approach becomes quite tedious.
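For the 2 × 2 case, the inverse has a well-known closed form, \(A^{-1} = \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}\), which exists only when the determinant \(a_{11}a_{22} - a_{12}a_{21}\) is nonzero. A minimal Python sketch of \(X = A^{-1}B\) with exact fractions, assuming integer coefficients; the function name `solve_2x2` and the sample system \(2x_1 + x_2 = 5\), \(x_1 + 3x_2 = 5\) are illustrative only.

```python
from fractions import Fraction

def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a11*x1 + a12*x2 = b1, a21*x1 + a22*x2 = b2 via X = A^{-1} B."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        # Singular A: the system has no solution or infinitely many
        raise ValueError("A is not invertible")
    # Closed-form inverse of a 2x2 matrix, kept exact with Fractions
    inv = [[Fraction(a22, det), Fraction(-a12, det)],
           [Fraction(-a21, det), Fraction(a11, det)]]
    # Matrix-vector product A^{-1} B
    x1 = inv[0][0] * b1 + inv[0][1] * b2
    x2 = inv[1][0] * b1 + inv[1][1] * b2
    return x1, x2

x1, x2 = solve_2x2(2, 1, 1, 3, 5, 5)
print(x1, x2)  # 2 1
```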
Example 1.3. Prof. Page has just returned from his flag football game, and Sis. Page asks what the score of the game was. Prof. Page is a bit annoyed that she did not attend the game to witness his glorious performance on the fields of friendly strife, and he responds with the following riddle: "In all, 45 points were scored. If you double my team's score and add 10 points, you get three times the other team's score."
How can Sis. Page determine the score of the game? While there are many points of influence she could use to get Prof. Page to simply tell her the score, if she chose to
solve the riddle herself she would start by defining variables to represent the unknown quantities. Let \(x_1\) = the number of points scored by Prof. Page's team, and \(x_2\) = the number of points scored by the other team. Given the information in the riddle, she knows that \(x_1 + x_2 = 45\) and that \(2x_1 + 10 = 3x_2\). That gives her two equations with two unknowns, which she can solve to determine \(x_1\) and \(x_2\).
By substitution: From the first equation, \(x_1 = 45 - x_2\). She can then plug this into the second equation to get \(2(45 - x_2) + 10 = 3x_2\), or \(100 = 5x_2\). Solving for \(x_2\) gives \(x_2 = 20\). Plugging this value back into the first equation gives \(x_1 + 20 = 45\), so that \(x_1 = 25\).
By elimination: First, rearrange the second equation as \(2x_1 - 3x_2 = -10\). Next, multiply the first equation by 2 to get \(2x_1 + 2x_2 = 90\), and subtract the second equation from (2 times) the first to get \(5x_2 = 100\). Dividing through by 5 gives \(x_2 = 20\). Substituting this value back into the first equation then yields \(x_1 = 25\) as before.
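By matrix algebra: Rearrange the second equation as before and define the matrices \(A\) and \(B\):
\[
A = \begin{bmatrix} 1 & 1 \\ 2 & -3 \end{bmatrix}, \quad
B = \begin{bmatrix} 45 \\ -10 \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & 1 \\ 2 & -3 \end{bmatrix}^{-1} \begin{bmatrix} 45 \\ -10 \end{bmatrix} = \begin{bmatrix} 25 \\ 20 \end{bmatrix}
\]
Multiply the inverse of \(A\) by \(B\) to get the column matrix \(X\) containing the values of \(x_1\) and \(x_2\). This can be done easily in Excel using the functions MINVERSE, which computes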
the inverse of a matrix, and MMULT, which performs matrix multiplication. If the elements
of A are entered into cells A1 through B2, and the elements of B are entered into D1 and
D2, then the expression
=MMULT(MINVERSE(A1:B2),D1:D2)
will yield the values of X. Note that to get Excel to display both elements of X, we need to
enter the above expression as an array formula. To do so, highlight two cells in a column,
say F1 and F2, and type the expression above. Then, instead of hitting ENTER to evaluate
the function, press CTRL+SHIFT+ENTER. This will fill cells F1 and F2 with the values 25
and 20, consistent with the answers obtained by substitution and elimination.
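Outside Excel, the same computation can be sketched in Python with NumPy (assumed to be installed): `np.linalg.inv` and the `@` matrix-multiplication operator play the roles of MINVERSE and MMULT, while `np.linalg.solve` solves the system without forming the inverse explicitly.

```python
import numpy as np

# Coefficients after rearranging: x1 + x2 = 45 and 2*x1 - 3*x2 = -10
A = np.array([[1.0, 1.0],
              [2.0, -3.0]])
B = np.array([45.0, -10.0])

# Analogue of =MMULT(MINVERSE(A1:B2),D1:D2)
X = np.linalg.inv(A) @ B
print(X)  # approximately [25. 20.]

# Same answer, without computing the inverse
print(np.linalg.solve(A, B))
```

For large systems, solving directly is generally preferred over inverting, since it does less work and is numerically more stable.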
1.4. Exponents
An exponent is shorthand for repeated multiplication. For example, \(2^4 = 2 \times 2 \times 2 \times 2\).
The following rules apply to exponents:
Rule 1: \(x^0 = 1\) (for \(x \neq 0\))
Rule 2: When multiplying like terms involving exponents, add the exponents. That is, \(x^a x^b = x^{a+b}\).
Rule 3: \((xy)^a = x^a y^a\)
Rule 4: \((x^a)^b = x^{ab}\)
Rule 5: \((x/y)^a = x^a / y^a\)
Rule 6: \(x^{-a} = 1/x^a\)
Rule 7: \(x^{a/b} = \sqrt[b]{x^a}\)
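Each exponent rule can be spot-checked numerically. A minimal Python sketch with arbitrary positive values for \(x\), \(y\), \(a\), and \(b\) (positive bases avoid complex results for fractional powers); `math.isclose` absorbs floating-point rounding.

```python
import math

x, y, a, b = 2.0, 3.0, 1.5, 4.0  # arbitrary positive test values

checks = {
    "Rule 1: x^0 = 1": (x ** 0, 1.0),
    "Rule 2: x^a x^b = x^(a+b)": (x ** a * x ** b, x ** (a + b)),
    "Rule 3: (xy)^a = x^a y^a": ((x * y) ** a, x ** a * y ** a),
    "Rule 4: (x^a)^b = x^(ab)": ((x ** a) ** b, x ** (a * b)),
    "Rule 5: (x/y)^a = x^a / y^a": ((x / y) ** a, x ** a / y ** a),
    "Rule 6: x^(-a) = 1 / x^a": (x ** -a, 1 / x ** a),
    "Rule 7: x^(a/b) = b-th root of x^a": (x ** (a / b), (x ** a) ** (1 / b)),
}
for name, (lhs, rhs) in checks.items():
    print(name, math.isclose(lhs, rhs))
```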
1.5. Exponential Function
The exponential function and its inverse, the natural logarithm function, are widely used in mathematics and in formulas that are encountered when working with derivatives and fixed income securities. The exponential function is closely related to the mathematical constant \(e\). This constant can be defined as an infinite series:
\[
e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \cdots
\]
where \(n! = n \times (n-1) \times (n-2) \times \cdots \times 3 \times 2 \times 1\). It can be calculated to any desired accuracy by evaluating enough terms in the series. Using the first four terms, we get
\[
e \approx 1 + 1 + \frac{1}{2} + \frac{1}{6} = 2.66667
\]
Using the first six terms, we get
\[
e \approx 1 + 1 + \frac{1}{2} + \frac{1}{6} + \frac{1}{24} + \frac{1}{120} = 2.71667
\]
Using the first ten terms, we get \(e \approx 2.71828\), which is accurate to five decimal places.
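The partial sums above are easy to reproduce. A minimal Python sketch; the helper name `e_partial_sum` is illustrative, and "first n terms" is read as \(1/0! + 1/1! + \cdots + 1/(n-1)!\), matching the sums shown.

```python
from math import e, factorial

def e_partial_sum(n_terms):
    """Sum of the first n_terms of 1/0! + 1/1! + 1/2! + ..."""
    return sum(1 / factorial(k) for k in range(n_terms))

for n in (4, 6, 10):
    print(n, round(e_partial_sum(n), 5))
# 4 2.66667
# 6 2.71667
# 10 2.71828
```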
The exponential function is \(e^x\). It is sometimes also written as \(\exp(x)\). It is calculated as \(e^x \approx 2.71828^x\). For example, \(e^3 \approx 2.71828^3 \approx 20.0855\). The exponential function has many interesting properties. One of these is that
\[
e^R = \lim_{m \to \infty} \left(1 + \frac{R}{m}\right)^m
\]
In other words, as the value of \(m\) is increased in the expression on the right-hand side, we get closer and closer to \(e^R\). This property of \(e\) leads directly to
\[
Ae^{Rn} = \lim_{m \to \infty} A\left(1 + \frac{R}{m}\right)^{mn}
\]
As we will see later in the course, this property allows us to use the exponential function
to work with continuously compounded rates of interest or returns. This allows us to apply
interest rates and other rates of return in a mathematically convenient way as we develop
formulas for pricing fixed income and derivative securities.
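The limit can be watched numerically by compounding more and more often. A minimal Python sketch under hypothetical inputs (\(A = 100\), \(R = 5\%\) per year, \(n = 2\) years); the helper name `compounded` is illustrative.

```python
import math

def compounded(A, R, n, m):
    """Value of A after n years at annual rate R compounded m times per year."""
    return A * (1 + R / m) ** (m * n)

A, R, n = 100.0, 0.05, 2  # hypothetical: $100 at 5% per year for 2 years

for m in (1, 4, 12, 365, 1_000_000):
    print(m, compounded(A, R, n, m))
print("continuous", A * math.exp(R * n))
```

The values rise with \(m\) and approach \(Ae^{Rn}\) from below.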
Another important property of the exponential function is
\[
e^x e^y = e^{x+y}
\]
(This is actually a property of exponents in general: see Rule 2 above.) As we will see later, it is this property that makes continuously compounded interest rates mathematically convenient, because it means we can simply add up the interest rates that accrue over successive periods of time.
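Concretely, growing an amount at one continuously compounded rate and then another is the same as growing it once at the sum of the rates. A minimal Python sketch with hypothetical rates of 3% and 5% for two successive years:

```python
import math

r1, r2 = 0.03, 0.05  # hypothetical continuously compounded rates for years 1 and 2
A = 100.0

two_steps = A * math.exp(r1) * math.exp(r2)  # one year at r1, then one year at r2
one_step = A * math.exp(r1 + r2)             # a single step at the summed rate
print(two_steps, one_step, math.isclose(two_steps, one_step))
```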
1.6. Natural logarithm
Logarithms are convenient tools that reduce a multiplication problem to an addition prob-
lem. They also simplify problems involving exponents to much simpler multiplication prob-
lems. Because of their convenience, in this course we will frequently express interest rates
and returns in terms of logarithms. This will allow us to use the exponential function as a
simple way to apply discount rates to cash flows. We will also encounter logarithms in the
formulas we derive for pricing options.
Assuming a positive number \(b \neq 1\), the logarithm of a number \(x\) to the base \(b\) is the power to which \(b\) must be raised to result in the number \(x\). In other words, if you write \(\log_b x = c\), then \(b^c = x\).
When the base \(b\) of the logarithm is \(e \approx 2.71828\), we call this the natural logarithm. The natural logarithm function, which we usually write as \(\ln(x)\) (or sometimes \(\log x\), where the unspecified base is understood to be \(e\)), is the inverse of the exponential function. This means that if we transform a variable using the exponential function, we can undo that transformation using the natural logarithm. That is, if \(y = e^x\), then \(x = \ln(y)\) and vice versa. For example, \(e^3 \approx 20.0855\), and \(\ln 20.0855 \approx 3\).
The following useful rules apply to working with logarithms (including the natural loga-
rithm):
Rule 1: \(\log_b x + \log_b y = \log_b(xy)\)
Rule 2: \(\log_b x - \log_b y = \log_b(x/y)\)
Rule 3: \(\log_b(x^c) = c \log_b x\)
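The inverse relationship and Rules 1 to 3 can be spot-checked numerically. A minimal Python sketch using the standard `math` module, whose `math.log(value, base)` takes an optional base (natural log when the base is omitted); the sample values are arbitrary.

```python
import math

x, y, c, b = 8.0, 2.0, 3.0, 10.0  # arbitrary positive values; base b != 1

# ln and exp undo each other (up to floating-point rounding)
print(math.log(math.exp(3.0)), math.exp(math.log(20.0855)))

# Rule 1: log_b x + log_b y = log_b(xy)
rule1 = math.isclose(math.log(x, b) + math.log(y, b), math.log(x * y, b))
# Rule 2: log_b x - log_b y = log_b(x/y)
rule2 = math.isclose(math.log(x, b) - math.log(y, b), math.log(x / y, b))
# Rule 3: log_b(x^c) = c log_b x
rule3 = math.isclose(math.log(x ** c, b), c * math.log(x, b))
print(rule1, rule2, rule3)
```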
2. Calculus
Differential calculus is the mathematics of calculating the derivative of a function. The derivative of a function is a measure of its slope, or rate of change, at a certain point. Both of these interpretations have important applications in finance. For example, in mean-variance portfolio theory and the CAPM, we are interested in the tangency portfolio, whose risk and return places it at the point on the efficient frontier where the tangent line passes through the point on the y axis that represents the risk-free rate. The slope of this tangent line is the Sharpe ratio of the market portfolio.
In this class, we will be interested in measuring risk. For example, the value of a bond is a function of interest rates. If we want to measure the interest rate risk of the bond, computing the derivative, or rate of change, of the bond price formula gives us a useful measure of how much the price of a bond changes for a given change in interest rates.
2.1. Derivatives and Rules for Finding Derivatives
The intuitive definition of the derivative as the slope, or rate of change, of a function at a given point leads naturally to its formal mathematical definition. Given a function \(y = f(x)\), the derivative (usually written as \(f'(x)\) or \(\frac{dy}{dx}\), or sometimes \(y'\) or \(\frac{df}{dx}\)) is defined as
\[
f'(x) = \lim_{\Delta \to 0} \frac{f(x + \Delta) - f(x - \Delta)}{2\Delta}
\]
where \(\Delta\) is a small increment. In other words, if we want to know the slope of \(f(x)\) at \(x\), we can evaluate \(f(x)\) at a value slightly larger than \(x\), \(x + \Delta\), and at a value slightly lower, \(x - \Delta\). The difference between the two function values, \(f(x + \Delta) - f(x - \Delta)\), divided by the difference between \(x + \Delta\) and \(x - \Delta\), which is \(2\Delta\), is the approximate slope of the function at \(x\). As we make the increment \(\Delta\) smaller and smaller, we get the exact slope, or instantaneous rate of change, of \(f(x)\).
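The difference quotient in the definition translates directly into code. A minimal Python sketch (the helper name `central_difference` is illustrative) that evaluates it for shrinking \(\Delta\), using \(f(x) = x^3\), whose exact derivative at \(x = 2\) is 12:

```python
def central_difference(f, x, delta):
    """Approximate f'(x) by [f(x + delta) - f(x - delta)] / (2 * delta)."""
    return (f(x + delta) - f(x - delta)) / (2 * delta)

def f(x):
    return x ** 3  # exact derivative is 3x^2, so f'(2) = 12

for delta in (1.0, 0.1, 0.01, 0.001):
    print(delta, central_difference(f, 2.0, delta))
```

For \(x^3\) the quotient works out algebraically to \(3x^2 + \Delta^2\), so the error shrinks with the square of \(\Delta\). In floating point, making \(\Delta\) extremely small eventually adds rounding noise, so tinier is not always better.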
While in principle one could compute the derivative of any function using the above definition, taking the limit can often be a bit difficult. In practice, we have a number of rules that allow us to find the derivative of most functions. Some of the most important rules are:
Rule 1: If \(f(x) = k\), where \(k\) is a constant, then \(\frac{dy}{dx} = 0\).
Rule 2: If \(f(x) = x^n\), where \(n\) is any number, then \(\frac{dy}{dx} = nx^{n-1}\).
Rule 3: If \(f(x) = e^x\), then \(f'(x) = e^x\). (That's easy!)
Rule 4: If \(f(x) = \ln(x)\), then \(f'(x) = 1/x\). (Also pretty easy.)
Rule 5: If \(f(x)\) can be written as the sum of two functions, \(g(x)\) and \(h(x)\), then \(f'(x) = g'(x) + h'(x)\).
Rule 6: (Chain Rule) If \(y = f(g(x))\), where \(f\) and \(g\) are two functions, then
\[
\frac{dy}{dx} = \frac{df}{dg}\,\frac{dg}{dx}
\]
In other words, find the derivative of \(f(g)\) as if \(g\) were a single variable, then find \(g'(x)\) and multiply: \(f'(g) \times g'(x)\).
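A useful habit is to check a rule-based derivative against the numerical slope from the definition. A minimal Python sketch that compares Rules 2 through 6 with a central-difference estimate at an arbitrary point; the chain-rule case uses \(f(g) = e^g\) with \(g(x) = x^2\), so \(dy/dx = e^{x^2} \cdot 2x\).

```python
import math

def numeric_derivative(f, x, delta=1e-5):
    # Central difference, as in the definition of the derivative
    return (f(x + delta) - f(x - delta)) / (2 * delta)

x = 1.3  # arbitrary point

cases = [
    # (description, function, derivative given by the rules)
    ("Rule 2: x^4", lambda t: t ** 4, 4 * x ** 3),
    ("Rule 3: e^x", math.exp, math.exp(x)),
    ("Rule 4: ln x", math.log, 1 / x),
    ("Rule 5: x^2 + ln x", lambda t: t ** 2 + math.log(t), 2 * x + 1 / x),
    ("Rule 6: e^(x^2)", lambda t: math.exp(t ** 2), math.exp(x ** 2) * 2 * x),
]
for name, func, rule in cases:
    print(name, rule, numeric_derivative(func, x))
```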
2.2. Second Derivatives
The second derivative of a function (written as \(f''(x)\) or \(\frac{d^2y}{dx^2}\)) is simply the derivative of the function's first derivative.
If the second derivative is less than or equal to 0 for a value of \(x\), then \(f(x)\) is concave at \(x\).
If the second derivative is greater than or equal to 0 for a value of \(x\), then \(f(x)\) is convex at \(x\).
If \(f''(x) \le 0\) for all \(x\), then \(f(x)\) is a concave function.
If \(f''(x) \ge 0\) for all \(x\), then \(f(x)\) is a convex function.
2.3. Partial Derivatives
For a function of multiple variables, we can compute the partial derivative with respect
to one variable by treating the other variables as constants. As such, the partial derivative
tells us how much a function changes for a small change in one variable while holding all
other variables constant. If \(y = f(x_1, x_2, x_3)\), we write the partial derivative with respect to \(x_1\) as \(\frac{\partial y}{\partial x_1}\), and the second partial derivative as \(\frac{\partial^2 y}{\partial x_1^2}\).
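Holding the other variables fixed is easy to see numerically. A minimal Python sketch that reuses the lift-ticket function from Example 1.1 and nudges one argument at a time; the helper name `partial` and the step size are illustrative choices.

```python
def lift_tickets(g, s):
    # Example 1.1: tickets as a function of GNP g ($ trillions) and snowfall s (inches)
    return 10_000 + 1000 * g + 20 * s

def partial(f, point, i, delta=1e-4):
    """Central-difference partial derivative of f with respect to argument i,
    holding all other arguments fixed."""
    up = list(point)
    down = list(point)
    up[i] += delta
    down[i] -= delta
    return (f(*up) - f(*down)) / (2 * delta)

point = (14, 300)
print(partial(lift_tickets, point, 0))  # close to 1000 tickets per $1 trillion of GNP
print(partial(lift_tickets, point, 1))  # close to 20 tickets per inch of snow
```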
2.4. Finding the Maximum or Minimum of a Function
It is often useful to find the maximum or minimum of some function. We can do this easily by taking the derivative and setting it equal to zero. Specifically, a concave function is maximized for any value of \(x\) where \(f'(x) = 0\). Similarly, a convex function is minimized for any value of \(x\) where \(f'(x) = 0\).
Example 2.1. An automobile manufacturer has opened a new manufacturing plant to
produce hybrid vehicles. The cost \(c(x)\) of producing \(x\) hybrids in a day is expressed by \(c(x) = x^3 - 20x^2 + 20{,}000x\). What production level minimizes the average cost of producing a hybrid?
The average cost of producing \(x\) units is \(g(x) = c(x)/x = (x^3 - 20x^2 + 20{,}000x)/x = x^2 - 20x + 20{,}000\). Therefore, \(g'(x) = 2x - 20\) and \(g''(x) = 2\). Given that \(g''(x) > 0\) for any value of \(x\), \(g(x)\) is a convex function. Thus, any value of \(x\) for which \(g'(x) = 0\) will minimize the average production cost.
To set \(g'(x) = 0\), you must solve \(2x - 20 = 0\), which yields \(x = 10\). Therefore, producing 10 cars per day will minimize the average cost per hybrid.
At this production level, the total cost is \(c(10) = 10^3 - 20(10^2) + 20{,}000(10) = 1{,}000 - 2{,}000 + 200{,}000 = 199{,}000\), or $199,000. The minimum average cost of producing a hybrid is therefore $199,000/10 = $19,900.
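The calculus answer can be cross-checked by brute force over whole numbers of cars. A minimal Python sketch; the function names are illustrative.

```python
def total_cost(x):
    # c(x) = x^3 - 20x^2 + 20,000x
    return x ** 3 - 20 * x ** 2 + 20_000 * x

def average_cost(x):
    # g(x) = c(x)/x = x^2 - 20x + 20,000
    return total_cost(x) / x

# g'(x) = 2x - 20 = 0 gives x = 10; g''(x) = 2 > 0, so this is a minimum
x_star = 20 / 2
print(x_star, total_cost(x_star), average_cost(x_star))  # 10.0 199000.0 19900.0

# Brute-force check over production levels of 1 to 100 cars per day
best = min(range(1, 101), key=average_cost)
print(best)  # 10
```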
3. Probability
Uncertainty and risk lie at the core of everything we do in finance. In this class in particular, we are concerned with measuring financial risk and using derivatives contracts to manage that risk. As such, we need ways to formally represent the uncertainty inherent in financial markets (probability distributions), as well as ways to quantify how we feel about that risk (utility functions). For this we look to the tools of probability and statistics, treating outcomes like future stock and commodities prices or other economic conditions as random events.
3.1. Probability Rules
We refer to the result of a random process (such as flipping a coin or rolling a die) as an
event. We measure the probability of a particular outcome or event on a scale from 0
to 1 (with 0 meaning certain not to occur and 1 meaning certain to occur), and write the
probability of a given event E as P(E).
Given the probabilities of individual events, there are a number of important rules for
determining the probability of multiple events and for describing how the probabilities of
different events relate to each other.
Probability of Two Events
Given two events \(E_1\) and \(E_2\), \(P(E_1 \text{ or } E_2) = P(E_1) + P(E_2) - P(E_1 \text{ and } E_2)\).
Probability of Complementary Events
For a given event E, the complementary event is the event that E does not occur. For
example, in the roll of a die, the complementary event to a roll of 3 is a roll of 1 or 2 or
4 or 5 or 6. Note that by definition, complementary events are mutually exclusive, and together account for the entire set of possible outcomes.
Thus for a given event \(E\), the probability of the complementary event (\(\bar{E}\)) may be found using the formula \(P(\bar{E}) = 1 - P(E)\).
Conditional Probability
Given two events \(A\) and \(B\), the conditional probability that \(A\) will occur, given that you know that \(B\) has occurred, may be computed (provided \(P(B) > 0\)) from
\[
P(A|B) = \frac{P(A \text{ and } B)}{P(B)}
\]
Joint Probabilities
Given two events A and B, the joint probability that events A and B both occur can be
computed from P(A and B) = P(B)P(A|B).
Independent Events
Two events A and B are independent events if knowledge that one event has occurred
does not change the probability that the other event has occurred. If a set of events is
independent, you can find the joint probability that a subset of the independent events
occurs simply by multiplying the probabilities of the individual events. This is consis-
tent with the rule for joint probabilities given above, because independence implies that
P(A|B) = P(A).
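All of these rules can be verified by counting outcomes for a single roll of a fair die. A minimal Python sketch with exact fractions; the events chosen ("even", "at least 4", "at most 2") are illustrative.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # one roll of a fair die

def P(event):
    """Probability of an event (a set of outcomes) when all six faces are equally likely."""
    return Fraction(len(event), 6)

E1 = {2, 4, 6}  # roll is even
E2 = {4, 5, 6}  # roll is at least 4

# Two events: P(E1 or E2) = P(E1) + P(E2) - P(E1 and E2)
assert P(E1 | E2) == P(E1) + P(E2) - P(E1 & E2)

# Complement of "roll a 3"
assert P(outcomes - {3}) == 1 - P({3})

# Conditional and joint probability
p_E1_given_E2 = P(E1 & E2) / P(E2)
assert P(E1 & E2) == P(E2) * p_E1_given_E2
print(p_E1_given_E2)  # 2/3

# Independence: "even" and "at most 2" satisfy P(A|B) = P(A)
A, B = {2, 4, 6}, {1, 2}
print(P(A & B) / P(B) == P(A))  # True
```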
3.2. Probability distributions
A random variable is a variable that may take on any of a set of possible random outcomes. Notationally, random variables are usually distinguished from nonrandom, or deterministic, variables by a tilde, such as \(\tilde{X}\).
A probability distribution is a mathematical function for describing random variables. In its simplest form, a probability distribution is simply a list of the possible outcomes of a random variable, together with the probability of each outcome. For example, the probability distribution for a coin toss is:
Outcome Probability
Heads 0.5
Tails 0.5
This type of probability distribution, where there are two possible outcomes (success
or failure, up or down, heads or tails, etc.), is referred to generally as a Bernoulli distri-
bution. It can be written in general terms as:

\[
\tilde{X} = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}
\]
3.2.1 Binomial Distribution
If a Bernoulli trial is repeated multiple times, we are often interested in the number of
successes (heads in a coin toss, etc.) out of n trials. For example, if we flip a coin 5
times, there are 6 possible outcomes: we could get heads 0, 1, 2, 3, 4, or 5 times. Each
of these outcomes has a certain probability of occurring, with 0 heads out of 5 coin ips
being less likely than 3 out of 5 and so forth. In general, there are n+1 possible outcomes
to a series of n Bernoulli trials.
The distribution of the number of successes in \(n\) independent* Bernoulli trials is called a binomial distribution, and the binomial function tells us the probability of each possible outcome in such a scenario. That is, if \(X\) is the random variable that represents the number of successful trials and \(p\) is the probability of success in each trial, then the probability of \(x\) successes in \(n\) trials is equal to
\[
P(X = x \mid n, p) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, \ldots, n
\]
where \(\binom{n}{x} = n!/[x!(n - x)!]\).
In Excel, the binomial function can be computed using
=BINOMDIST(number_s, trials, probability_s, cumulative)
where number_s is the number of successes \(x\), trials is the number of trials \(n\), probability_s is the probability of success in each trial \(p\), and cumulative is an indicator for whether you want the function to return the probability of exactly \(x\) successes (cumulative=0), or the probability of observing \(x\) or fewer successes (cumulative=1). To compute the probability of observing more than \(x\) successes, use 1-BINOMDIST(x,n,p,1).
* Independent trials means that the outcome of one trial has no effect on the outcome of other trials. In the coin flip example, it means that the probability of heads is 0.5 each time you flip the coin, regardless of what happened in the previous coin flips.
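The binomial formula and its cumulative version can be sketched in a few lines of Python. The helper names `binom_pmf` and `binom_cdf` are hypothetical, meant only to mirror BINOMDIST with cumulative=0 and cumulative=1; `math.comb` requires Python 3.8 or later.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p), like BINOMDIST(x, n, p, 0)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """P(X <= x), like BINOMDIST(x, n, p, 1)."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

n, p = 5, 0.5  # five fair coin flips
print([round(binom_pmf(k, n, p), 5) for k in range(n + 1)])
print(1 - binom_cdf(3, n, p))  # P(more than 3 heads) = 6/32 = 0.1875
```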
Example 3.1. We can use the binomial distribution for a simple model of the random fluctuation of a stock price over time. For example, assume a stock has an initial price \(S_0\) = $20. Furthermore, suppose that each day the stock price will either go up by $1 with probability \(p = 0.6\), or fall by $1 with probability \(1 - p = 0.4\). Thus the movement of the stock price each day is a Bernoulli trial with \(p = 0.6\). We can represent this visually
using a tree diagram as in Figure 1.
Figure 1: Numerical example of a one-period binomial tree (from \(S_0 = 20\) at \(t = 0\), the price moves to \(S_1 = 21\) or \(S_1 = 19\) at \(t = 1\))
Suppose we are interested in what the stock price may be two days from now. In our model, the price two days from now will depend on the outcome of two successive Bernoulli trials. That is, the price will be $22 if the price goes up both days; $20 if the price goes up then down or down then up; or $18 if the price goes down both days. The price two days from now depends on whether there are 2, 1, or 0 upward price movements over the two days, and thus the final price follows a binomial distribution. We can therefore use the binomial function to compute the probability of each of the three possible prices at the end of day 2. This is illustrated in Figure 2.
In fact, working through this example helps us understand a bit better what is happening in the binomial function. A final price of $22 requires two successive up-movements, the probability of which is \(0.6 \times 0.6 = 0.36\). A final price of $20 can result from either the price going up then down, which occurs with probability \(0.6 \times 0.4 = 0.24\), or from the price going down then up, also with probability \(0.4 \times 0.6 = 0.24\). Thus the total probability that we observe a final price of $20 is \(0.24 + 0.24 = 0.48\). Finally, a final price of $18 requires two down movements, which occurs with probability \(0.4 \times 0.4 = 0.16\).
Thus, the probability of a given final price is equal to the probability of each price path with the exact number of up movements necessary to reach that price, times the number of such possible paths. In the binomial function, \(\binom{n}{x}\) is the number of possible paths with \(x\) up movements, and \(p^x (1 - p)^{n - x}\) is the probability of each of those paths.
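Figure 2: Stock prices in a two-period binomial tree (from 20 at \(t = 0\), the price moves to 21 or 19 at \(t = 1\), and to 22, 20, or 18 at \(t = 2\)), with final-price probabilities
\[
\begin{aligned}
P(S_2 = 22) &= \binom{2}{2}\, 0.6^2\, 0.4^0 = 0.36 \\
P(S_2 = 20) &= \binom{2}{1}\, 0.6^1\, 0.4^1 = 0.48 \\
P(S_2 = 18) &= \binom{2}{0}\, 0.6^0\, 0.4^2 = 0.16
\end{aligned}
\]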
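The path-counting argument can be made concrete by listing every up/down path. A minimal Python sketch, under the example's assumptions (\(S_0 = 20\), $1 moves, \(p = 0.6\), two days), that adds up path probabilities for each final price and compares them with the binomial formula:

```python
from collections import defaultdict
from itertools import product
from math import comb

S0, p, days = 20, 0.6, 2

# Enumerate every up/down path and accumulate the probability of each final price
final_price_prob = defaultdict(float)
for path in product((+1, -1), repeat=days):
    ups = path.count(+1)
    final_price_prob[S0 + sum(path)] += p ** ups * (1 - p) ** (days - ups)

for price in sorted(final_price_prob):
    ups = (price - S0 + days) // 2  # number of up moves that reach this price
    formula = comb(days, ups) * p ** ups * (1 - p) ** (days - ups)
    print(price, round(final_price_prob[price], 4), round(formula, 4))
```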
3.2.2 Expected Value
In the example above, there were 3 possible outcomes for the final stock price in 2 days. While we have seen how to compute the probability of each possible outcome, we may be interested in determining a best guess of what the price will be. For this we can compute the expected value of the final price. Conceptually, we can think of the expected value as representing the average final price we would find if we were able to observe a large number of repetitions of two-day random stock price sequences. If we restarted our model at a price of $20 and let it run for 2 days over and over again, we would expect to see a final price of $22 about 36% of the time, a price of $20 about 48% of the time, and a price of $18 about 16% of the time. The average final price would be approximately \(22 \times 0.36 + 20 \times 0.48 + 18 \times 0.16 = 20.40\). This is the expected value of the final stock price after two days.
In general, we compute the expected value of a discrete random variable (i.e., one with a finite set of possible outcomes) \(X\) as
\[
E(X) = \sum_x x\, P(X = x)
\]
summed over all of the possible values of \(x\). For the binomial distribution, the expected
value of the number of successes \(X\) out of \(n\) trials is
\[
E(X) = \sum_{x=0}^{n} x \binom{n}{x} p^x (1 - p)^{n - x} = np
\]
In the example above, the expected number of up movements in two days is \(np = 2 \times 0.6 = 1.2\). The expected number of down movements is therefore \(2 - 1.2 = 0.8\), and thus the expected final price is the starting price of $20 plus $1 × 1.2 minus $1 × 0.8, which equals $20.40 as we had computed above.
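Both routes to the expected final price (summing price times probability, or using \(E(X) = np\) for the number of up moves) can be checked side by side. A minimal Python sketch under the example's assumptions:

```python
from math import comb

S0, p, n, step = 20, 0.6, 2, 1  # two days of $1 moves, as in Example 3.1

# Definition: sum of (final price) x (probability) over all outcomes k = number of ups
expected_direct = sum(
    (S0 + step * k - step * (n - k)) * comb(n, k) * p ** k * (1 - p) ** (n - k)
    for k in range(n + 1)
)

# Shortcut via E(number of ups) = n p
expected_ups = n * p
expected_shortcut = S0 + step * expected_ups - step * (n - expected_ups)

print(round(expected_direct, 2), round(expected_shortcut, 2))  # 20.4 20.4
```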
3.2.3 Normal Distribution
Another important probability distribution that we will use in this class is the normal dis-
tribution. The bell-shaped normal distribution appears naturally in many applications. In
fact, many variables that are the end result of multiple random influences will exhibit a
distribution that is approximately normal.
To see why the normal distribution is normal, let us return to our binomial stock price
example. As mentioned previously, two days of stock price movements can produce three
different possible outcomes, and in general, n days can produce n+1 possible outcomes.
The most likely outcome is one up day and one down day, while the two extreme outcomes
of two up days or two down days are somewhat less likely.
What is the distribution of the stock price after many days? For example, after 30 days
there are 31 possible outcomes, and again the mid-range outcomes are the most likely
because there are more price paths that lead to them. For example, there is only one
price path with 30 up days (for a final price of $50), but there are \(\binom{30}{15} = 155{,}117{,}520\) possible price paths with 15 up days and 15 down days (for a final price of $20). If you plot the possible outcomes on the x-axis against the probability of each outcome on the y-axis as in Figure 3, the plot will very closely approximate the bell-shaped curve of the normal distribution.
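A short Python sketch can confirm the path count and build the whole 30-day distribution under the same assumptions (\(S_0 = 20\), $1 moves, \(p = 0.6\)). Note that with \(p = 0.6\) the single most likely outcome is 18 up days (a final price of $26) rather than 15; the peak would sit at 15 up days only if \(p = 0.5\).

```python
from math import comb

S0, p, n = 20, 0.6, 30  # $1 up/down moves for 30 days

print(comb(30, 15))  # 155117520 paths with 15 ups and 15 downs

# Probability of each final price S0 + k - (n - k), for k = 0..n up moves
dist = {S0 + k - (n - k): comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}
print(len(dist))                      # 31 possible final prices
print(max(dist, key=dist.get))        # 26, the most likely final price
print(round(sum(dist.values()), 10))  # probabilities sum to 1
```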
Because random variables that are the end result of multiple random influences (such as fluctuations in stock or commodity prices) are usually well-described by the normal distribution, and because the normal distribution is mathematically convenient, we will frequently use the normal distribution to model the random fluctuations of asset prices.
The normal distribution is completely characterized by two parameters, the mean and
standard deviation. Figure 4 shows a graph of the standard normal distribution, with a
mean of 0 and standard deviation of 1. Using the standard normal distribution, we can
adapt the normal distribution to a given application by changing the mean (which will shift
the distribution up and down the number line) and/or standard deviation (which will make
the normal curve narrower or broader).
Figure 3: Probability Distribution of Stock Price After 30 Days (p = 0.6)
We can use the normal distribution functions in Excel to compute probabilities based
on the normal distribution. For example, suppose the monthly rate of return on the S&P
500 index is approximately normally distributed with a mean of 1% and standard deviation
of 6%. What is the probability that the return in a given month will be negative? We can
compute this using the normal distribution function in Excel,
=NORMDIST(x,mean,standard_dev,cumulative)
where x is the cutoff (0 in our case, for a negative return), mean and standard_dev are the mean and standard deviation of the distribution (1% and 6% in our case), and cumulative is an indicator of whether we want the probability of observing a value less than the cutoff x (cumulative=1), or simply the normal PDF evaluated at x (cumulative=0). In our case, we would compute the probability of a negative return as
NORMDIST(0,0.01,0.06,1) = 0.4338
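The same probability can be computed in Python with the standard library's `statistics.NormalDist` (Python 3.8 or later), whose `cdf` and `pdf` methods correspond to NORMDIST with cumulative=1 and cumulative=0. A minimal sketch using the text's assumed mean of 1% and standard deviation of 6%:

```python
from statistics import NormalDist

# Monthly S&P 500 return assumed ~ Normal(mean 1%, sd 6%), as in the text
monthly = NormalDist(mu=0.01, sigma=0.06)

print(round(monthly.cdf(0.0), 4))  # 0.4338, like NORMDIST(0,0.01,0.06,1)
print(monthly.pdf(0.0))            # density at 0, like NORMDIST(0,0.01,0.06,0)
```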
Figure 4: Standard normal distribution (density plotted from -4 to 4, with labels indicating that about 68.26% of the probability lies within 1 standard deviation of the mean, 95.44% within 2, and 99.74% within 3)
3.2.4 Lognormal Distribution
You may have noticed a problem with our representation of stock price movements in
the binomial model above. Once the number of days in our model extends beyond 20,
there is some probability (albeit very small) that the stock price will become negative. For
example, after 30 days, if the stock price has experienced more than 20 down movements,
then nal stock price will be negative. This obviously wont do if we want our model of
stock price uctuations to be reasonably realistic.
To address this problem, we can make a slight modification to our binomial model. Instead of characterizing the stock price as moving either up by $1 or down by $1 each day, let us say that the stock moves up or down by a certain percentage of the current price. To keep the magnitude of the daily changes comparable to the prior example, suppose that the stock price moves either up by 5% or down by 5% (since $1 is 5% of the initial $20 stock price). With this setup, the stock price can never go below zero, because
after any down movement, the new price is 95% of the previous day's price. On the other hand, because the percentage changes compound from period to period, a sequence of upward movements can cause the price to become quite high.
Figure 5: Probability Distribution of Stock Price After 30 Days With Percentage Changes (p = 0.6)
Figure 5 plots the distribution of possible nal prices after 30 days from our new bi-
nomial model against their corresponding probabilities. As expected and desired, we no
longer observe possible nal prices that are negative. However, the resulting curve is
noticeably asymmetric compared to the normal bell curve that arose when we modeled
price changes as a fixed, absolute dollar amount. Indeed, the binomial distribution, when
modeled as percentage changes that compound from period to period, converges to a
lognormal, rather than a normal, distribution. The lognormal describes the distribution
of a variable whose logarithm is normally distributed.
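A quick check of both versions of the model after 30 days, assuming \(S_0 = 20\) and 30 daily moves: with $1 moves the final price \(20 + u - d\) turns negative only when more than 25 of the 30 moves are down, while with ±5% moves every final price \(20 \times 1.05^{u} \times 0.95^{d}\) stays positive. A minimal Python sketch:

```python
S0, n = 20.0, 30

for downs in (0, 20, 25, 26, 30):
    ups = n - downs
    dollar_model = S0 + ups - downs               # +/- $1 per day
    pct_model = S0 * 1.05 ** ups * 0.95 ** downs  # +/- 5% per day
    print(downs, dollar_model, round(pct_model, 4))

# Which down-move counts give a negative final price in each model?
negative_dollar = [d for d in range(n + 1) if S0 + (n - d) - d < 0]
negative_pct = [d for d in range(n + 1) if S0 * 1.05 ** (n - d) * 0.95 ** d <= 0]
print(negative_dollar)  # [26, 27, 28, 29, 30]
print(negative_pct)     # []
```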
Thus, if we repeat our percent-based binomial model for \(T\) periods (where \(T\) is a large number), the final stock price \(\tilde{S}_T\) is approximately lognormally distributed, while the logarithm of the final price, \(\ln(\tilde{S}_T)\), is normally distributed. Note that we can add or subtract a constant from a normally distributed random variable, and the result is still
17
Winter 2013 Math Review for FIN 411 Prof. J. Page
normally distributed but with a with a shifted mean. That is, if we have a random variable

X N(, ), then if we have subtract a constant c from



X, then

X c N( c, ).
Therefore, if $\ln(\tilde{S}_T)$ is normally distributed, then we can subtract the log of the initial price, $\ln(S_0)$, so that the log stock return from time 0 to time T,
$$\ln(\tilde{S}_T) - \ln(S_0) = \ln(\tilde{S}_T / S_0),$$
is also normally distributed. This means that if we use a lognormal distribution to model the stock price on some future date, then the (log) stock return between today and that future date is normally distributed. Conversely, we can model stock returns as normally distributed, which implies that the future stock price is lognormally distributed.
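One way to see why the log return is approximately normal: in the percentage binomial model, $\ln(\tilde{S}_T/S_0) = k\ln(1.05) + (T-k)\ln(0.95)$ is a linear function of the binomial count $k$, and a binomial count is approximately normal for large $T$. The sketch below uses the text's parameters and assumes p = 0.6. It computes the exact mean and variance of the log return and checks them against the closed forms $T[p\ln u + (1-p)\ln d]$ and $Tp(1-p)(\ln u - \ln d)^2$, writing u = 1.05 and d = 0.95.

```python
from math import comb, log

def log_return_moments(n_days=30, p_up=0.6, up=0.05, down=0.05):
    """Exact mean and variance of ln(S_T / S_0) in the percentage binomial model.

    ln(S_T / S_0) = k*ln(1+up) + (n_days-k)*ln(1-down), with k ~ Binomial(n_days, p_up).
    """
    lu, ld = log(1 + up), log(1 - down)
    pmf = [comb(n_days, k) * p_up**k * (1 - p_up)**(n_days - k) for k in range(n_days + 1)]
    values = [k * lu + (n_days - k) * ld for k in range(n_days + 1)]
    mean = sum(v * q for v, q in zip(values, pmf))
    var = sum((v - mean) ** 2 * q for v, q in zip(values, pmf))
    return mean, var

mean, var = log_return_moments()
# Closed forms, valid because the log return is linear in the binomial count k
T, p, lu, ld = 30, 0.6, log(1.05), log(0.95)
print(abs(mean - T * (p * lu + (1 - p) * ld)) < 1e-12)   # True
print(abs(var - T * p * (1 - p) * (lu - ld) ** 2) < 1e-12)  # True
```

Note that $S_0$ never appears: the distribution of the log return does not depend on the starting price.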
4. Statistics
In this course, we will be dealing with risks that arise from the random fluctuations of financial variables such as asset prices and interest rates. We will use tools from statistics to analyze historical data on asset prices and interest rates so that we can characterize the empirical probability distribution of those variables. In particular, we will be interested in measures of central tendency, skewness, and variability, as well as the degree to which different financial variables move together.
4.1. Measures of Central Tendency
The mean, median, and mode are the primary measures used to summarize the typical
value or central location for a data set.
The mean is just the average of the numbers in a data set.
The median is the point in a data set where half the observations are less than
the median and half the observations are more than the median. If the data set
consists of an even number of data points, the median is the average of the two
middle observations. If the data set consists of an odd number of data points, the
median is the middle observation.
The mode is the most frequently occurring value in a data set.
Excel's AVERAGE, MEDIAN, and MODE functions can be used to compute these measures of central location.
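For readers who prefer code to spreadsheets, Python's standard `statistics` module offers analogues of Excel's AVERAGE, MEDIAN, and MODE. The data below are made up. The two samples illustrate the even-count and odd-count rules for the median.

```python
import statistics

# Hypothetical samples (illustrative numbers only)
even_sample = [3, 7, 7, 2, 9, 4]
odd_sample = [3, 7, 7, 2, 9]

print(statistics.mean(even_sample))    # like Excel AVERAGE
print(statistics.median(even_sample))  # even count: average of 4 and 7 -> 5.5
print(statistics.median(odd_sample))   # odd count: middle observation -> 7
print(statistics.mode(even_sample))    # most frequent value -> 7
```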
4.2. Skewness
Skewness is a measure of the asymmetry of a distribution. For example, the normal distribution in Figure 4 is perfectly symmetric (skewness = 0): outcomes a given distance below the mean are exactly as likely as outcomes the same distance above it. In contrast, the lognormal distribution in Figure 5 exhibits positive skewness, meaning that it has a long right tail: extremely high outcomes (in this case, stock prices) are more likely than equally extreme low outcomes. Negative skewness means the opposite: the distribution has a long left tail, so extremely low outcomes are more likely than equally extreme high outcomes. If a distribution exhibits significant skewness, the median may be a more useful summary of central tendency than the mean. The skewness of a sample of data may be computed in Excel using the SKEW function.
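Excel's documentation describes SKEW as the adjusted sample skewness $\frac{n}{(n-1)(n-2)}\sum_{i}\left(\frac{x_i - \bar{x}}{s}\right)^3$, where $s$ is the sample standard deviation. The Python sketch below implements that formula, assuming it matches SKEW as documented. The function name and sample values are hypothetical.

```python
import statistics

def sample_skew(data):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3).

    s is the sample standard deviation (n - 1 denominator). Needs at least
    3 observations and a nonzero spread.
    """
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in data)

print(sample_skew([1, 2, 3, 10]) > 0)   # True: one large value creates a long right tail
print(sample_skew([10, 9, 8, 1]) < 0)   # True: one small value creates a long left tail
```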
4.3. Measures of Variability
The sample variance and sample standard deviation are measures of a data set's spread about the mean, or variability. Given data points $x_1, x_2, \ldots, x_n$, the sample variance of the data set (written as $\sigma^2$) is defined as
$$\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2,$$
where $\bar{x}$ is the mean of the data points.
Standard deviation, denoted by $\sigma$, is the square root of the variance. If the observations on the variable are all the same, their standard deviation is zero. As they become more different, their standard deviation increases. Intuitively, we can think of standard deviation as a typical distance of the observations in the data set from their mean (roughly, the square root of the average squared distance).
In this class, we will be particularly interested in the volatility of financial variables such as asset prices, or in other words, how wildly they fluctuate from period to period. Because we will use standard deviation to measure volatility, we will use the two terms interchangeably in this class, and use $\sigma$ to denote both.
In Excel, the VAR function can be used to find the sample variance of a set of observations, and STDEV can be used to compute the sample standard deviation.
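The sample variance formula translates directly into code. This sketch computes it by hand and compares the result with Python's `statistics.variance` and `statistics.stdev`, which, like Excel's VAR and STDEV, use the n − 1 denominator. The data and the helper name are hypothetical.

```python
import math
import statistics

def sample_variance(xs):
    """sigma^2 = (1 / (n - 1)) * sum((x_i - xbar)^2)."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical observations (mean 5)
var = sample_variance(data)               # squared deviations sum to 32, so 32 / 7
sd = math.sqrt(var)
print(math.isclose(var, statistics.variance(data)))  # True (like Excel VAR)
print(math.isclose(sd, statistics.stdev(data)))      # True (like Excel STDEV)
```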
4.4. Covariance and Correlation
Covariance and correlation are measures of the strength of the relationship between two
random variables, such as the returns on two stocks.
4.4.1 Covariance
Given n points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, the covariance between data sets X and Y is given by
$$\mathrm{Cov}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}.$$
Suppose that X and Y tend to go up and down together. That is, when X is larger than average, Y is usually larger than average, and when X is smaller than average, Y is usually smaller than average. Then most of the terms in the numerator of our covariance formula will be positive and the covariance will be positive. Conversely, suppose that when X is larger than average, Y is usually smaller than average, and when X is smaller than average, Y is usually larger than average. Then most of the terms in the numerator of our covariance formula will be negative and the covariance will be negative.
In summary, a positive covariance indicates that X and Y tend to go up or down together, whereas a negative covariance indicates that X and Y tend to move in opposite directions (relative to their averages). Note that covariance is a measure of linear association between two variables only; it is not useful for detecting nonlinear relationships between them.
Given a sample of observations of two variables, the covariance between the two variables can be calculated in Excel using the COVAR function. Note that the formula used by Excel has $n$ in the denominator instead of $n-1$ (that is, Excel computes the population covariance instead of the sample covariance). Therefore, to get the correct sample covariance, we need to multiply the output of the COVAR function by $n/(n-1)$.
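A minimal Python sketch of the covariance formula, with a switch between the n − 1 (sample) and n (population) denominators. It also checks that rescaling the population value by n/(n − 1) recovers the sample covariance. The function name and data are hypothetical.

```python
def covariance(xs, ys, sample=True):
    """Covariance of paired observations.

    sample=True divides by n - 1 (sample covariance); sample=False divides by n
    (population covariance, which is what COVAR reports).
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    xbar, ybar = sum(xs) / n, sum(ys) / n
    total = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return total / (n - 1) if sample else total / n

x = [1.0, 2.0, 3.0, 4.0]   # hypothetical returns on stock X
y = [2.0, 4.0, 5.0, 9.0]   # hypothetical returns on stock Y
pop = covariance(x, y, sample=False)
n = len(x)
# Rescaling the population value by n / (n - 1) recovers the sample covariance
print(abs(pop * n / (n - 1) - covariance(x, y)) < 1e-12)  # True
```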
4.4.2 Correlation
Covariance depends on the units in which the data are measured, which can make interpreting its magnitude difficult. Instead, we often use the coefficient of correlation (usually denoted by $\rho$), which is a unit-free measure of the degree of linear association between two data sets X and Y. Given n points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, the correlation between data sets X and Y is given by
$$\rho_{xy} = \frac{\mathrm{Cov}(X, Y)}{\sigma_x \sigma_y}.$$
By dividing the covariance by the standard deviations of X and Y, we make the measure unit-free, so that $-1 \le \rho_{xy} \le 1$ regardless of the units in which X and Y are expressed.
A correlation of 1 indicates a perfect positive linear relationship: a change in X is accompanied by an exactly proportional change in Y in the same direction. Similarly, $\rho_{xy} = -1$ indicates a perfect negative linear relationship: a change in X is accompanied by an exactly proportional change in Y in the opposite direction. A correlation of 0 indicates that there is no linear relationship between the two variables (they may still be related nonlinearly). Between these values, correlations closer to 0 mean a weaker linear relationship between the two variables, while correlations closer to 1 or −1 mean a stronger one.
Correlation between two variables can be calculated in Excel using the CORREL function.
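The correlation formula can be sketched in Python the same way. Because the covariance and both standard deviations carry the same n − 1 factor, it cancels, so the sketch works with raw sums of deviations. The three made-up examples show perfect positive and negative linear relationships, plus a case where y is completely determined by x (y = x²) yet the correlation is zero because the relationship is not linear.

```python
import math

def correlation(xs, ys):
    """rho_xy = Cov(X, Y) / (sigma_x * sigma_y); the n - 1 factors cancel."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(correlation(x, [3 + 2 * v for v in x]))   # 1.0  (perfect positive linear)
print(correlation(x, [3 - 2 * v for v in x]))   # -1.0 (perfect negative linear)
print(correlation(x, [v ** 2 for v in x]))      # 0.0  (y depends on x, but not linearly)
```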
4.5. Linear Regression
Linear regression is used to estimate a best-fit linear relationship of the form
$$y = a + bx$$
between two variables y and x. The parameter b is the slope and the parameter a is the intercept. In other words, we want to find the equation of a line that best describes the relationship between x and y.
Given a sample of observations of y and x, it can be shown that the best (least-squares) estimate of the slope b is
$$b = \frac{\mathrm{Cov}(x, y)}{\mathrm{Var}(x)} = \rho_{xy}\,\frac{\sigma_y}{\sigma_x}.$$
The intercept is then estimated as $a = \bar{y} - b\bar{x}$, where $\bar{x}$ and $\bar{y}$ are the means of the observations of x and y, respectively.
Using Excel, the intercept and slope can be calculated using the INTERCEPT and SLOPE functions. If, for example, the x values are in cells B2 to B11 and the y values are in cells C2 to C11, the formulas are:
=INTERCEPT(C2:C11, B2:B11)
and
=SLOPE(C2:C11, B2:B11)
The $R^2$ of a linear regression of y on x is the proportion of the variance of y that is accounted for by x. It is the square of the coefficient of correlation, $\rho_{xy}^2$.
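The slope and intercept formulas can be reproduced in a few lines. This Python sketch uses hypothetical x and y values and plays the role of INTERCEPT and SLOPE. It computes $R^2$ as one minus the ratio of the residual to the total sum of squares, then checks that $R^2$ equals the squared correlation. The helper names are illustrative.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x: b = Cov(x, y) / Var(x), a = ybar - b * xbar."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx                 # the (n - 1) factors in Cov and Var cancel
    a = ybar - b * xbar
    return a, b

def r_squared(xs, ys, a, b):
    """Share of the variance of y accounted for by the fitted line."""
    ybar = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - ybar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

x = [1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical x values
y = [2.1, 3.9, 6.2, 7.8, 10.1]         # hypothetical y values
a, b = fit_line(x, y)                  # analogous to INTERCEPT and SLOPE
r2 = r_squared(x, y, a, b)

# Cross-check: R^2 should equal the squared correlation coefficient
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
rho = sum((u - xbar) * (v - ybar) for u, v in zip(x, y)) / math.sqrt(
    sum((u - xbar) ** 2 for u in x) * sum((v - ybar) ** 2 for v in y)
)
print(round(b, 4), round(a, 4))    # 1.99 0.05
print(math.isclose(r2, rho ** 2))  # True
```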
5. Risk aversion and the CAPM
We've talked about how we model risk (variability and uncertainty in the future values of assets that are of interest to firms or investors) using probability distributions described by their mean and variance.
Now we need a way to represent decision makers' preferences. That is, how do you feel about this risk?
5.1. Utility and Risk Aversion
Utility is a concept developed by economists to measure the relative satisfaction from, or desirability of, the consumption of goods and services. Given this measure, one may speak meaningfully of increasing or decreasing utility, and thereby explain economic behavior in terms of attempts to increase one's utility. We will be particularly concerned with the utility of money or wealth. We make some reasonable assumptions about human preferences, derived from observation and introspection:
We prefer more wealth to less wealth (non-satiability)
We get less satisfaction or utility out of an additional dollar if we are already relatively wealthy than if we are relatively poor (diminishing marginal utility)
We prefer a fixed sum of money to a gamble with the same expected payoff (risk aversion)
It turns out that we can capture all of these features by thinking of utility as a concave, increasing function of wealth.
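To see how concavity produces risk aversion, take one concrete concave, increasing utility function. The sketch below assumes U(W) = ln W purely for illustration. A 50/50 gamble between wealth of 50 and 150 has the same expected payoff as a sure 100, yet lower expected utility. The certainty equivalent, the sure amount with the same utility as the gamble, comes out below 100.

```python
import math

def expected_utility(outcomes, utility):
    """Probability-weighted average utility of a gamble given as (wealth, prob) pairs."""
    return sum(p * utility(w) for w, p in outcomes)

u = math.log                                # assumed utility: concave and increasing
gamble = [(50.0, 0.5), (150.0, 0.5)]       # expected payoff = 100
sure_thing = [(100.0, 1.0)]

eu_gamble = expected_utility(gamble, u)
eu_sure = expected_utility(sure_thing, u)
certainty_equivalent = math.exp(eu_gamble)  # invert ln to get the sure amount with equal utility
print(eu_sure > eu_gamble)                  # True: the sure 100 is preferred
print(round(certainty_equivalent, 2))       # 86.6
```

The gap between 100 and the certainty equivalent (about 13.4) is what this hypothetical investor would give up to avoid the gamble.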
5.2. Risk aversion and asset pricing
The prices (or equivalently, the expected returns) of risky financial securities like stocks and bonds are determined by investors' risk aversion. This is easily seen in the Capital Asset Pricing Model (CAPM).
5.2.1 A Quick Review of the CAPM
For a given set of risky assets, there is a unique portfolio on the efficient frontier that has the highest possible Sharpe ratio (the tangency portfolio). If everyone observes the same set of risky assets and has the same expectations about the returns on those assets (that is, everyone agrees on the expected returns and variances of the risky assets), then everyone chooses the same tangency portfolio. If everyone chooses the same risky portfolio, then that risky portfolio must be the market portfolio!
In equilibrium, prices adjust until supply equals demand. In this case, the supply is
the number of shares of each stock or security existing in the market. The demand is the
amount of each risky security that investors want to hold in their portfolio. Furthermore,
because the degree of risk aversion varies across investors, some will want to lend (hold
some of their wealth in the risk-free asset), while some will want to borrow at the risk-free
rate to invest more in the tangency portfolio. The risk-free rate will adjust until the amount
of lending and the amount of borrowing cancel out. Thus, the value of the aggregate risky
portfolio will equal the entire wealth of the economy!
5.2.2 The Market Price of Risk
Within the stylized framework of the CAPM, the market price of risk (also called the market risk premium, or the expected return on the market portfolio in excess of the risk-free rate) is directly determined by the aggregate risk aversion of investors in the market. To see this, suppose there are N investors in the economy, and each investor i has utility
$$U_i = E[r] - \tfrac{1}{2} A_i \sigma^2,$$
where $U_i$ is the utility of the ith investor, $E[r]$ is the expected return of the investor's portfolio, $\sigma^2$ is the portfolio's return variance, and $A_i$ is a parameter that captures the investor's degree of risk aversion.
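For concreteness, suppose that each investor has $1 to invest. How much will each investor put in the market portfolio? Investor i puts the fraction
$$y_i = \frac{E[r_M] - r_f}{A_i\,\sigma_M^2}$$
of her wealth in the market portfolio, where $r_M$ is the return on the market portfolio, $r_f$ is the risk-free rate, and $\sigma_M^2$ is the variance of the market return. The remainder is lent at the risk-free rate (or borrowed, if $y_i > 1$).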
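The expression for $y_i$ can be obtained by maximizing $U_i$ over the fraction $y$ invested in the market. The derivation below assumes, as in this setup, that each investor splits her wealth between the market portfolio and a risk-free asset whose return has zero variance:

```latex
\begin{aligned}
E[r] &= r_f + y\,(E[r_M] - r_f), \qquad \sigma^2 = y^2 \sigma_M^2, \\
U_i(y) &= r_f + y\,(E[r_M] - r_f) - \tfrac{1}{2} A_i\, y^2 \sigma_M^2, \\
\frac{dU_i}{dy} &= E[r_M] - r_f - A_i\, y\, \sigma_M^2 = 0
\;\Longrightarrow\; y_i = \frac{E[r_M] - r_f}{A_i\, \sigma_M^2}.
\end{aligned}
```

Since the second derivative, $-A_i \sigma_M^2$, is negative whenever $A_i > 0$, this first-order condition picks out a maximum rather than a minimum.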
If we add up the amount invested in the market portfolio by all investors, we get:
$$\$1 \times \frac{E[r_M] - r_f}{\sigma_M^2}\left(\frac{1}{A_1} + \frac{1}{A_2} + \cdots + \frac{1}{A_N}\right).$$
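In equilibrium, the total wealth invested in the stock market must be N dollars (one dollar per investor), because borrowing and lending at the risk-free rate cancel out, so all wealth ends up in the market portfolio. This implies
$$E[r_M] - r_f = \bar{A}\,\sigma_M^2,$$
where $\bar{A}$ is the average risk aversion of investors in the market (to be precise, it's actually the inverse of investors' average risk tolerance):
$$\bar{A} \equiv \left[\frac{1}{N}\left(\frac{1}{A_1} + \frac{1}{A_2} + \cdots + \frac{1}{A_N}\right)\right]^{-1}.$$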
In other words, the risk premium on the market portfolio is equal to investors' aggregate risk aversion times the variance of the market portfolio (i.e., how much investors dislike risk multiplied by how risky the market portfolio actually is).
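A numerical sketch of the equilibrium condition, using three hypothetical investors whose risk-aversion parameters are 2, 3, and 6, and a hypothetical market standard deviation of 20%. It computes $\bar{A}$, the implied market risk premium $\bar{A}\sigma_M^2$, and each investor's market weight $y_i$, then checks that the weights add up to N, so that the market clears. The function name is illustrative.

```python
def average_risk_aversion(risk_aversions):
    """A-bar = [ (1/N) * sum(1 / A_i) ]^(-1): inverse of average risk tolerance."""
    n = len(risk_aversions)
    return 1.0 / (sum(1.0 / a for a in risk_aversions) / n)

A = [2.0, 3.0, 6.0]        # hypothetical risk-aversion parameters for N = 3 investors
sigma_m = 0.20             # hypothetical market return standard deviation
a_bar = average_risk_aversion(A)
premium = a_bar * sigma_m ** 2          # E[r_M] - r_f in equilibrium

# Each investor's market weight y_i = premium / (A_i * sigma_M^2); with $1 each,
# the weights should add up to N dollars, so the market clears.
weights = [premium / (a_i * sigma_m ** 2) for a_i in A]
print(a_bar, premium)
print(abs(sum(weights) - len(A)) < 1e-12)   # True
```

With these numbers $\bar{A} = 3$ and the premium is about 0.12. The weights come out at roughly 1.5, 1.0, and 0.5, so the least risk-averse investor borrows 50 cents at the risk-free rate while the most risk-averse investor lends 50 cents, and the borrowing and lending cancel.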
5.2.3 The Main Idea
In market equilibrium, because investors hold assets as part of the market portfolio, the risk premium on an individual asset depends on how much it contributes to the variance of the overall market portfolio. We measure a stock's contribution to the riskiness of the market portfolio by its covariance with the market portfolio. Specifically, the expected return $E(r)$ on an asset is
$$E(r) = r_f + \beta\, E(r_M - r_f),$$
where $r_f$ is the risk-free rate of interest, $r_M$ is the return on the market portfolio, and $\beta = \mathrm{Cov}(r, r_M)/\mathrm{Var}(r_M)$ is a measure of the sensitivity of the stock's return to the return on the market.
According to the CAPM, the risk in the returns from an asset can be divided into two parts. Systematic risk is risk related to returns from the market and cannot be diversified away. Nonsystematic risk, or idiosyncratic risk, is risk that is unique to the asset and can be diversified away by choosing a large portfolio of different assets. The parameter $\beta$ is a measure of systematic risk. The CAPM equation above shows that, when an asset has no systematic risk ($\beta = 0$), its expected return is the risk-free rate. As $\beta$ increases, the expected return increases. When $\beta = 1$, so that the asset has the same systematic risk as the market, its expected return equals the expected return on the market.
Notice that the definition of $\beta$ is the same as the formula for computing the slope coefficient in a linear regression. We can therefore estimate the beta of a stock or portfolio of stocks by regressing its excess return (the return minus the risk-free rate) on the excess return of the market portfolio, which is usually approximated by the return on a well-diversified stock index such as the S&P 500.
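The regression interpretation of $\beta$ can be sketched in Python. The monthly excess returns below are hypothetical and constructed so that the stock's excess return is exactly 0.001 + 1.3 times the market's, so the estimated beta should be 1.3. The 3% risk-free rate and 6% market risk premium used in the CAPM line are also assumptions.

```python
def beta(asset_excess, market_excess):
    """beta = Cov(r - r_f, r_M - r_f) / Var(r_M - r_f), i.e. the regression slope."""
    n = len(market_excess)
    mx = sum(market_excess) / n
    my = sum(asset_excess) / n
    cov = sum((m - mx) * (a - my) for m, a in zip(market_excess, asset_excess)) / (n - 1)
    var = sum((m - mx) ** 2 for m in market_excess) / (n - 1)
    return cov / var

def capm_expected_return(rf, beta_value, market_premium):
    """E(r) = r_f + beta * E(r_M - r_f)."""
    return rf + beta_value * market_premium

# Hypothetical monthly excess returns on an index proxy and on one stock
market = [0.02, -0.01, 0.03, 0.00, -0.02, 0.015]
stock = [0.001 + 1.3 * m for m in market]     # constructed so that beta is exactly 1.3
b = beta(stock, market)
print(round(b, 6))                             # 1.3
print(capm_expected_return(0.03, b, 0.06))     # about 0.03 + 1.3 * 0.06 = 0.108
```

Real return data would not lie exactly on a line, so the estimated beta would carry sampling error; the constructed data only confirm that the slope formula recovers a known beta.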
When the asset is an individual stock, the expected return given by the CAPM equation is an unbiased, but not particularly good, predictor of the actual return. But when the asset is a well-diversified portfolio of stocks, it is a much better predictor. As a result, the equation
$$E(r_p) = r_f + \beta_p\, E(r_M - r_f),$$
where $r_p$ is the portfolio return, can be used to predict the expected return on a diversified portfolio. The $\beta_p$ in the equation is the beta of the portfolio and can be calculated as the weighted average of the betas of the stocks in the portfolio, using each stock's share of the portfolio's value as its weight.
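A sketch of the portfolio version, assuming hypothetical portfolio weights (value shares summing to one), stock betas, risk-free rate, and market risk premium. The portfolio beta is the weighted average of the individual betas, and it then feeds into the same CAPM equation. The function name is illustrative.

```python
def portfolio_beta(weights, betas):
    """beta_p = sum(w_i * beta_i); weights are portfolio value shares summing to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * b for w, b in zip(weights, betas))

weights = [0.5, 0.3, 0.2]     # hypothetical portfolio weights
betas = [1.2, 0.8, 1.5]       # hypothetical stock betas
bp = portfolio_beta(weights, betas)       # 0.6 + 0.24 + 0.3 = 1.14
rf, market_premium = 0.03, 0.06           # hypothetical r_f and E(r_M - r_f)
expected = rf + bp * market_premium       # E(r_p) = r_f + beta_p * E(r_M - r_f)
print(round(bp, 4), round(expected, 4))   # 1.14 0.0984
```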