
ECON 381 SC Foundations Of Economic Analysis

2003

John Hillas
University of Auckland
Contents

Chapter 1. Logic, Sets, Functions, and Spaces
  1. Introduction
  2. Logic
  3. Some Set Theory
  4. Functions
  5. Spaces
  6. Metric Spaces and Continuous Functions
  7. Open Sets, Compact Sets, and the Weierstrass Theorem
  8. Linear Spaces and Convex Sets

Chapter 2. Linear Algebra
  1. The Space Rn
  2. Linear Functions from Rn to Rm
  3. Matrices and Matrix Algebra
  4. Matrices as Representations of Linear Functions
  5. Linear Functions from Rn to Rn and Square Matrices
  6. Inverse Functions and Inverse Matrices
  7. Changes of Basis
  8. The Trace and the Determinant
  9. Calculating and Using Determinants
  10. Eigenvalues and Eigenvectors

Chapter 3. Choice, Preference, and Utility
  1. Formal Model of Motivated Choice
  2. The Properties of Choice Structures
  3. Preference Relations and Preference Based Choice
  4. The Relationship Between the Choice Based Approach and the Preference Based Approach
  5. Representing Preferences by Utility Functions
  6. Consumer Behaviour: Optimisation Subject to a Budget Constraint

Chapter 4. Consumer Behaviour: Optimisation Subject to the Budget Constraint
  1. Constrained Maximisation
  2. The Theorem of the Maximum
  3. The Envelope Theorem
  4. Applications to Microeconomic Theory

Chapter 5. Decision Making Under Uncertainty
  1. Revision of Probability Theory
  2. Preferences Over Lotteries
  3. Expected Utility Functions

Chapter 6. Multiple Agent Models I: Introduction to Noncooperative Game Theory
  1. Normal Form Games
  2. Extensive Form Games
  3. Existence of Equilibrium

Chapter 7. Multiple Agent Models II: Introduction to General Equilibrium Theory
  1. The basic model of a competitive economy
CHAPTER 1

Logic, Sets, Functions, and Spaces

1. Introduction
This chapter is very incomplete. It includes at the moment only some material that we
shall use in the next chapter, and not even all of that. Never mind, perhaps there will be
more soon.

2. Logic
Exercises.

3. Some Set Theory


Set theory was developed in the second half of the 19th century and is at the very
foundation of modern mathematics. We shall not be concerned here with the development
of this theory. Rather we shall only give the basic language of set theory and outline some
of the very basic operations on sets. Exercise 1 of this section gives a description of one
of the most famous paradoxes of what is called naive set theory. This paradox is known as
Russell’s Paradox.
We start by defining a set to be a collection of objects or elements. We shall, in this
section, denote sets by capital letters and elements of sets by lower case letters. If the
element a is in the set A we write a ∈ A. If every element of the set B is also in the set A we
call B a subset of the set A and write B ⊂ A. We shall also say that A contains B. If A and B
have exactly the same elements then we say they are equal or identical. Alternatively we
could say A = B if and only if A ⊂ B and B ⊂ A. If B ⊂ A and B ≠ A then we say that B is
a proper subset of A or that A strictly contains B.
In order to avoid the paradoxes such as the one referred to in the first paragraph we
shall always assume that in whatever situation we are discussing there is some given set U
called the universal set which contains all of the sets with which we shall deal. Exercise 1
asks you to show why such an assumption will avoid Russell’s Paradox.
We customarily enclose our specification of a set by braces. In order to specify a set we
may simply list the elements. For example to specify the set D which contains the numbers
1,2, and 3 we may write D = {1, 2, 3}. Alternatively we may define the set by specifying
a property that identifies the elements. For example we may specify the same set D by
D = {x | x is an integer and 0 < x < 4}. Notice that this second method is more powerful.
We could not, for example, list all the integers. (Since there are an infinite number of them
we would die before we finished.)
For any two sets A and B we define the union of A and B to be that set which contains
exactly all of the elements of A and all the elements of B. We denote the union of A and
B by A ∪ B. Similarly we define the intersection of A and B to be that set which contains
exactly those elements which are in both A and B. We denote the intersection of A and B
by A ∩ B. Thus we have

A ∪ B = {x | x ∈ A or x ∈ B}
A ∩ B = {x | x ∈ A and x ∈ B}.

Just as the number zero is extremely useful so the concept of a set that has no elements
is extremely useful also. This set we call the empty set or the null set and denote by ∅. To
see one use of the empty set notice that having such a concept allows the intersection of
two sets to be well defined whether or not the sets have any elements in common.
We also introduce the concept of a Cartesian product. If we have two sets, say A and B,
the Cartesian product, A × B, is the set of all ordered pairs, (a, b) such that a is an element
of A and b is an element of B. Symbolically we write
A × B = {(a, b) | a ∈ A and b ∈ B}.
Exercises.

4. Functions
Exercises.

5. Spaces
Exercises.

6. Metric Spaces and Continuous Functions


Exercises.

7. Open Sets, Compact Sets, and the Weierstrass Theorem


Exercises.

8. Linear Spaces and Convex Sets


Exercises.
CHAPTER 2

Linear Algebra

1. The Space Rn
In the previous chapter we introduced the concept of a linear space or a vector space.1
We shall now examine in some detail one example of such a space. This is the space
of all ordered n-tuples (x1 , x2 , . . . , xn ) where each xi is a real number. We call this space
n-dimensional real space and denote it Rn .
Remember from the previous chapter that to define a vector space we not only need
to define the points in that space but also to define how we add such points and how we
multiply such points by scalars. In the case of Rn we do this element by element in the
n-tuple or vector. That is,
(x1 , x2 , . . . , xn ) + (y1 , y2 , . . . , yn ) = (x1 + y1 , x2 + y2 , . . . , xn + yn )
and
α(x1 , x2 , . . . , xn ) = (αx1 , αx2 , . . . , αxn ).
Let us consider the case that n = 2, that is, the case of R2 . In this case we can visualise
the space as in the following diagram. The vector (x1 , x2 ) is represented by the point that
is x1 units along from the point (0, 0) in the horizontal direction and x2 units up from (0, 0)
in the vertical direction.
[Figure 1: the vector (1, 2) plotted in the plane, x1 units along the horizontal axis and x2 units up the vertical axis.]

1 Of course, we haven’t actually done any previous chapter from this “book” and, if you’ve checked the
previous chapter on the web site, you will know that the previous chapter is rather incomplete. Never mind.
Perhaps it will be there for next year’s students. Sigh. And even more sighing, since the previous was actually
written for last year’s notes.


Let us for the moment continue our discussion in R2 . Notice that we are implicitly
writing a vector (x1 , x2 ) as a sum x1 × v1 + x2 × v2 where v1 is the unit vector in the first
direction and v2 is the unit vector in the second direction. Suppose that instead we consid-
ered the vectors u1 = (2, 1) = 2 × v1 + 1 × v2 and u2 = (1, 2) = 1 × v1 + 2 × v2 . We could
have written any vector (x1 , x2 ) instead as z1 × u1 + z2 × u2 where z1 = (2x1 − x2 )/3 and
z2 = (2x2 − x1 )/3. That is, for any vector in R2 we can uniquely write that vector in terms
of u1 and u2 . Is there anything that is special about u1 and u2 that allows us to make this
claim? There must be since we can easily find other vectors for which this would not have
been true. (For example, (1, 2) and (2, 4).)
The property of the pair of vectors u1 and u2 is that they are independent. That is, we
cannot write either as a multiple of the other. More generally in n dimensions we would say
that we cannot write any of the vectors as a linear combination of the others, or equivalently
as the following definition.

Definition 1. The vectors x1 , . . . , xk all in Rn are linearly independent if it is not
possible to find scalars α1 , . . . , αk not all zero such that

α1 x1 + · · · + αk xk = 0.

Notice that we do not as a matter of definition require that k = n or even that k ≤ n. We
state as a result that if k > n then the collection x1 , . . . , xk cannot be linearly independent.
(In a real maths course we would, of course, have proved this.)
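For a quick numerical illustration of this definition (a minimal Python sketch using numpy; the particular vectors are arbitrary examples, not from the text), note that x1 , . . . , xk are linearly independent exactly when the matrix having those vectors as columns has rank k:

```python
import numpy as np

# Linear independence check: stack the vectors as the columns of a matrix
# and compare its rank with the number of vectors.
def linearly_independent(vectors):
    A = np.column_stack(vectors)                 # an n x k matrix
    return np.linalg.matrix_rank(A) == len(vectors)

print(linearly_independent([np.array([2, 1]), np.array([1, 2])]))   # True
print(linearly_independent([np.array([1, 2]), np.array([2, 4])]))   # False
print(linearly_independent([np.array([1, 0]), np.array([0, 1]),
                            np.array([1, 1])]))                     # False: k > n
```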

Comment 1. If you examine the definition above you will notice that there is nowhere
that we actually need to assume that our vectors are in Rn . We can in fact apply the same
definition of linear independence to any vector space. This allows us to define the concept
of the dimension of an arbitrary vector space as the maximal number of linearly indepen-
dent vectors in that space. In the case of Rn we obtain that the dimension is in fact n.

Exercise 1. Suppose that x1 , . . . , xk all in Rn are linearly independent and that the
vector y in Rn is equal to β1 x1 + · · · + βk xk . Show that this is the only way that y can be
expressed as a linear combination of the xi ’s. (That is show that if y = γ1 x1 + · · · + γk xk
then β1 = γ1 , . . . , βk = γk .)

The set of all vectors that can be written as a linear combination of the vectors x1 , . . . , xk
is called the span of those vectors. If x1 , . . . , xk are linearly independent and if the span of
x1 , . . . , xk is all of Rn then the collection { x1 , . . . , xk } is called a basis for Rn . (Of course,
in this case we must have k = n.) Any vector in Rn can be uniquely represented as a linear
combination of the vectors x1 , . . . , xk . We shall later see that it can sometimes be useful to
choose a particular basis in which to represent the vectors with which we deal.
It may be that we have a collection of vectors { x1 , . . . , xk } whose span is not all of
Rn . In this case we call the span of { x1 , . . . , xk } a linear subspace of Rn . Alternatively we

say that X ⊂ Rn is a linear subspace of Rn if X is closed under vector addition and scalar
multiplication. That is, if for all x, y ∈ X the vector x + y is also in X and for all x ∈ X
and α ∈ R the vector αx is in X. If the span of x1 , . . . , xk is X and if x1 , . . . , xk are linearly
independent then we say that these vectors are a basis for the linear subspace X. In this
case the dimension of the linear subspace X is k. In general the dimension of the span of
x1 , . . . , xk is equal to the maximum number of linearly independent vectors in x1 , . . . , xk .
Finally, we comment that Rn is a metric space with metric d : R2n → R+ defined by
d((x1 , . . . , xn ), (y1 , . . . , yn )) = √((x1 − y1 )2 + · · · + (xn − yn )2 ).

There are many other metrics we could define on this space but this is the standard one.

2. Linear Functions from Rn to Rm


In the previous section we introduced the space Rn . Here we shall discuss functions
from one such space to another (possibly of different dimension). The concept of continu-
ity that we introduced for metric spaces is immediately applicable here. We shall be mainly
concerned here with an even narrower class of functions, namely, the linear functions.
Definition 2. A function f : Rn → Rm is said to be a linear function if it satisfies
the following two properties:
(1) f (x + y) = f (x) + f (y) for all x, y ∈ Rn , and
(2) f (αx) = α f (x) for all x ∈ Rn and α ∈ R.
Comment 2. When considering functions of a single real variable, that is, functions
from R to R, functions of the form f (x) = ax + b where a and b are fixed constants are
sometimes called linear functions. It is easy to see that if b ≠ 0 then such functions do
not satisfy the conditions given above. We shall call such functions affine functions. More
generally we shall call a function g : Rn → Rm an affine function if it is the sum of a linear
function f : Rn → Rm and a constant b ∈ Rm . That is, if for any x ∈ Rn , g(x) = f (x) + b.
Let us now suppose that we have two linear functions f : Rn → Rm and g : Rn → Rm .
It is straightforward to show that the function ( f + g) : Rn → Rm defined by ( f + g)(x) =
f (x) + g(x) is also a linear function. Similarly if we have a linear function f : Rn → Rm
and a constant α ∈ R the function (α f ) : Rn → Rm defined by (α f )(x) = α f (x) is a
linear function. If f : Rn → Rm and g : Rm → Rk are linear functions then the composite
function g ◦ f : Rn → Rk defined by g ◦ f (x) = g( f (x)) is again a linear function. Finally,
if f : Rn → Rn is not only linear, but also one-to-one and onto so that it has an inverse
f −1 : Rn → Rn then the inverse function is also a linear function.
Exercise 2. Prove the facts stated in the previous paragraph.
Recall in the previous section we defined the notion of a linear subspace. A linear
function f : Rn → Rm defines two important subspaces, the image of f , denoted Im( f ) ⊂
Rm , and the kernel of f , denoted Ker( f ) ⊂ Rn . The image of f is the set of all vectors in
Rm such that f maps some vector in Rn to that vector, that is,
Im( f ) = { y ∈ Rm | ∃x ∈ Rn such that y = f (x) }.
The kernel of f is the set of all vectors in Rn that are mapped by the function f to the zero
vector in Rm , that is,
Ker( f ) = { x ∈ Rn | f (x) = 0 }.
The kernel of f is sometimes called the null space of f .
It is intuitively clear that the dimension of Im( f ) is no more than n. (It is of course no
more than m since it is contained in Rm .) Of course, in general it may be less than n, for
example if m < n or if f mapped all points in Rn to the zero vector in Rm . (You should
satisfy yourself that this function is indeed a linear function.) However if the dimension
of Im( f ) is indeed less than n it means that the function has mapped the n-dimensional
space Rn into a linear space of lower dimension and that in the process some dimensions
have been lost. The linearity of f means that a linear subspace of dimension equal to the
number of dimensions that have been lost must have been collapsed to the zero vector (and
that translates of this linear subspace have been collapsed to single points). Thus we can
say that
dim(Im( f )) + dim(Ker( f )) = n.
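This identity is easy to check numerically. The sketch below (Python with numpy; the matrix is an arbitrary example) recovers dim(Im( f )) as the number of nonzero singular values of the representing matrix and reads a basis of Ker( f ) off the same decomposition:

```python
import numpy as np

# Check dim(Im(f)) + dim(Ker(f)) = n for the map f(x) = A x.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])      # a rank-1 map from R^3 to R^2
n = A.shape[1]
U, s, Vt = np.linalg.svd(A)
dim_im = int(np.sum(s > 1e-10))      # number of nonzero singular values
kernel_basis = Vt[dim_im:]           # remaining right singular vectors span Ker(f)
print(dim_im + len(kernel_basis) == n)       # True
print(np.allclose(A @ kernel_basis.T, 0))    # the basis really lies in Ker(f)
```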
In the following section we shall introduce the notion of a matrix and define various
operations on matrices. If you are like me when I first came across matrices, these defini-
tions may seem somewhat arbitrary and mysterious. However, we shall see that matrices
may be viewed as representations of linear functions and that when viewed in this way the
operations we define on matrices are completely natural.

3. Matrices and Matrix Algebra


A matrix is defined as a rectangular array of numbers. If the matrix contains m rows
and n columns it is called an m × n matrix (read “m by n” matrix). The element in the ith
row and the jth column is called the ijth element. We typically enclose a matrix in square
brackets [ ] and write it as

[ a11 ... a1n ]
[ ...     ... ]
[ am1 ... amn ] .
In the case that m = n we call the matrix a square matrix. If m = 1 the matrix contains a
single row and we call it a row vector. If n = 1 the matrix contains a single column and we
call it a column vector. For most purposes we do not distinguish between a 1 × 1 matrix
[a] and the scalar a.
Just as we defined the operation of vector addition and the multiplication of a vector by
a scalar we define similar operations for matrices. In order to be able to add two matrices
we require that the matrices be of the same dimension. That is, if matrix A is of dimension
m × n we shall be able to add the matrix B to it if and only if B is also of dimension m × n.
If this condition is met then we add matrices simply by adding the corresponding elements
of each matrix to obtain the new m × n matrix A + B. That is,

[ a11 ... a1n ]   [ b11 ... b1n ]   [ a11 + b11 ... a1n + b1n ]
[ ...     ... ] + [ ...     ... ] = [ ...                 ... ]
[ am1 ... amn ]   [ bm1 ... bmn ]   [ am1 + bm1 ... amn + bmn ] .
We can see that this definition of matrix addition satisfies many of the same properties
of the addition of scalars. If A, B, and C are all m × n matrices then
(1) A + B = B + A,
(2) (A + B) + C = A + (B + C),
(3) there is a zero matrix 0 such that for any m × n matrix A we have A + 0 = 0 + A =
A, and
(4) there is a matrix −A such that A + (−A) = (−A) + A = 0.
Of course, the zero matrix referred to in 3 is simply the m × n matrix consisting of all
zeros (this is called a null matrix) and the matrix −A referred to in 4 is the matrix obtained
from A by replacing each element of A by its negative, that is,

  [ a11 ... a1n ]   [ −a11 ... −a1n ]
− [ ...     ... ] = [ ...       ... ]
  [ am1 ... amn ]   [ −am1 ... −amn ] .
Now, given a scalar α in R and an m × n matrix A we define the product of α and
A which we write αA to be the matrix in which each element is replaced by α times that
element, that is,

  [ a11 ... a1n ]   [ αa11 ... αa1n ]
α [ ...     ... ] = [ ...       ... ]
  [ am1 ... amn ]   [ αam1 ... αamn ] .
So far the definitions of matrix operations have all seemed the most natural ones. We
now come to defining matrix multiplication. Perhaps here the definition seems somewhat
less natural. However in the next section we shall see that the definition we shall give is in
fact very natural when we view matrices as representations of linear functions.
We define matrix multiplication of A times B written as AB where A is an m × n matrix
and B is a p × q matrix only when n = p. In this case the product AB is defined to be an
m × q matrix in which the element in the ith row and jth column is ∑_{k=1}^n aik bkj . That is,

to find the term to go in the ith row and the jth column of the product matrix AB we take
the ith row of the matrix A which will be a row vector with n elements and the jth column
of the matrix B which will be a column vector with n elements. We then multiply each
element of the first vector by the corresponding element of the second and add all these
products. Thus

[ a11 ... a1n ] [ b11 ... b1q ]   [ ∑_{k=1}^n a1k bk1 ... ∑_{k=1}^n a1k bkq ]
[ ...     ... ] [ ...     ... ] = [ ...                                 ... ]
[ am1 ... amn ] [ bn1 ... bnq ]   [ ∑_{k=1}^n amk bk1 ... ∑_{k=1}^n amk bkq ]

For example

[ a b c ] [ p q ]   [ ap + br + ct  aq + bs + cv ]
[ d e f ] [ r s ] = [ dp + er + ft  dq + es + fv ] .
          [ t v ]

We define the identity matrix of order n to be the n × n matrix that has 1’s on its main
diagonal and zeros elsewhere, that is, whose ijth element is 1 if i = j and zero if i ≠ j. We
denote this matrix by In or, if the order is clear from the context, simply I. That is,

    [ 1 0 ... 0 ]
I = [ 0 1 ... 0 ]
    [ ...       ]
    [ 0 0 ... 1 ] .

It is easy to see that if A is an m × n matrix then AIn = A and Im A = A. In fact, we could


equally well define the identity matrix to be that matrix that satisfies these properties for
all such matrices A in which case it would be easy to show that there was a unique matrix
satisfying this property, namely, the matrix we defined above.
Consider an m × n matrix A. The columns of A are m-dimensional vectors, that is,
elements of Rm and the rows of A are elements of Rn . Thus we can ask if the n columns
are linearly independent and similarly if the m rows are linearly independent. In fact we
ask: What is the maximum number of linearly independent columns of A? It turns out that
this is the same as the maximum number of linearly independent rows of A. We call the
number the rank of the matrix A.
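The equality of the maximum number of independent rows and independent columns is easy to test numerically. A minimal sketch (Python with numpy; the matrix is an arbitrary example):

```python
import numpy as np

# Row rank equals column rank: rank(A) = rank(A').
A = np.array([[1, 2, 3],
              [4, 5, 6]])
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T))   # 2 2
```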

4. Matrices as Representations of Linear Functions


Let us suppose that we have a particular linear function f : Rn → Rm . We have sug-
gested in the previous section that such a function can necessarily be represented as multi-
plication by some matrix. We shall now show that this is true. Moreover we shall do so by
explicitly constructing the appropriate matrix.
Let us write the n-dimensional vector x as a column vector

    [ x1 ]
x = [ x2 ]
    [ .. ]
    [ xn ] .

Now, notice that we can write the vector x as a sum ∑_{i=1}^n xi ei , where ei is the ith unit vector,
that is, the vector with 1 in the ith place and zeros elsewhere. That is,

[ x1 ]      [ 1 ]      [ 0 ]              [ 0 ]
[ x2 ] = x1 [ 0 ] + x2 [ 1 ] + · · · + xn [ 0 ]
[ .. ]      [ . ]      [ . ]              [ . ]
[ xn ]      [ 0 ]      [ 0 ]              [ 1 ] .
Now from the linearity of the function f we can write

f (x) = f (∑_{i=1}^n xi ei )
      = ∑_{i=1}^n f (xi ei )
      = ∑_{i=1}^n xi f (ei ).

But, what is f (ei )? Remember that ei is a unit vector in Rn and that f maps vectors in Rn
to vectors in Rm . Thus f (ei ) is the image in Rm of the vector ei . Let us write f (ei ) as

[ a1i ]
[ a2i ]
[ ... ]
[ ami ] .
Thus

f (x) = ∑_{i=1}^n xi f (ei )

            [ a11 ]      [ a12 ]              [ a1n ]
       = x1 [ a21 ] + x2 [ a22 ] + · · · + xn [ a2n ]
            [ ... ]      [ ... ]              [ ... ]
            [ am1 ]      [ am2 ]              [ amn ]

         [ ∑_{i=1}^n a1i xi ]
       = [ ∑_{i=1}^n a2i xi ]
         [ ...              ]
         [ ∑_{i=1}^n ami xi ]
and this is exactly what we would have obtained had we multiplied the matrices
[ a11 a12 ... a1n ] [ x1 ]
[ a21 a22 ... a2n ] [ x2 ]
[ ...             ] [ .. ]
[ am1 am2 ... amn ] [ xn ] .

Thus we have not only shown that a linear function is necessarily represented by multi-
plication by a matrix we have also shown how to find the appropriate matrix. It is precisely
the matrix whose n columns are the images under the function of the n unit vectors in Rn .
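This construction translates directly into code. In the sketch below (Python with numpy) the map f is an arbitrary illustrative linear function, not one from the text; its matrix is assembled column by column from the images of the unit vectors:

```python
import numpy as np

# Build the matrix of a linear f: R^n -> R^m from the images f(e_i).
def f(x):
    # an illustrative linear map from R^2 to R^3
    return np.array([x[0] + x[1], 2 * x[0], 3 * x[1]])

n = 2
A = np.column_stack([f(e) for e in np.eye(n)])   # columns are f(e_1), f(e_2)
x = np.array([5.0, -1.0])
print(np.allclose(A @ x, f(x)))                  # True: A represents f
```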

Exercise 3. Find the matrices that represent the following linear functions from R2
to R2 .

(1) a clockwise rotation of π/2 (90◦ ),
(2) a reflection in the x1 axis,
(3) a reflection in the line x2 = x1 (that is, the 45◦ line),
(4) a counter clockwise rotation of π/4 (45◦ ), and
(5) a reflection in the line x2 = x1 followed by a counter clockwise rotation of π/4.

Recall that in Section 2 we defined, for any f , g : Rn → Rm and α ∈ R, the functions


( f + g) and (α f ). In Section 3 we defined the sum of two m × n matrices A and B, and
the product of a scalar α with the matrix A. Let us instead define the sum of A and B as
follows.
Let f : Rn → Rm be the linear function represented by the matrix A and g : Rn → Rm
be the linear function represented by the matrix B. Now define the matrix (A + B) to be the
matrix that represents the linear function ( f + g). Similarly let the matrix αA be the matrix
that represents the linear function (α f ).

Exercise 4. Prove that the matrices (A + B) and αA defined in the previous para-
graph coincide with the matrices defined in Section 3.

We can also see that the definition we gave of matrix multiplication is precisely the
right definition if we mean multiplication of matrices to mean the composition of the linear
functions that the matrices represent. To be more precise let f : Rn → Rm and g : Rm → Rk
be linear functions and let A and B be the m × n and k × m matrices that represent them.
Let (g ◦ f ) : Rn → Rk be the composite function defined in Section 2. Now let us define
the product BA to be that matrix that represents the linear function (g ◦ f ).

Now since the matrix A represents the function f and B represents g we have
(g ◦ f )(x) = g( f (x))

              [ a11 a12 ... a1n ] [ x1 ]
        = g ( [ a21 a22 ... a2n ] [ x2 ] )
              [ ...             ] [ .. ]
              [ am1 am2 ... amn ] [ xn ]

              [ ∑_{i=1}^n a1i xi ]
        = g ( [ ∑_{i=1}^n a2i xi ] )
              [ ...              ]
              [ ∑_{i=1}^n ami xi ]

          [ b11 b12 ... b1m ] [ ∑_{i=1}^n a1i xi ]
        = [ b21 b22 ... b2m ] [ ∑_{i=1}^n a2i xi ]
          [ ...             ] [ ...              ]
          [ bk1 bk2 ... bkm ] [ ∑_{i=1}^n ami xi ]

          [ ∑_{j=1}^m b1j ∑_{i=1}^n aji xi ]
        = [ ∑_{j=1}^m b2j ∑_{i=1}^n aji xi ]
          [ ...                            ]
          [ ∑_{j=1}^m bkj ∑_{i=1}^n aji xi ]

          [ ∑_{i=1}^n ∑_{j=1}^m b1j aji xi ]
        = [ ∑_{i=1}^n ∑_{j=1}^m b2j aji xi ]
          [ ...                            ]
          [ ∑_{i=1}^n ∑_{j=1}^m bkj aji xi ]

          [ ∑_{j=1}^m b1j aj1  ∑_{j=1}^m b1j aj2  ...  ∑_{j=1}^m b1j ajn ] [ x1 ]
        = [ ∑_{j=1}^m b2j aj1  ∑_{j=1}^m b2j aj2  ...  ∑_{j=1}^m b2j ajn ] [ x2 ]
          [ ...                                                          ] [ .. ]
          [ ∑_{j=1}^m bkj aj1  ∑_{j=1}^m bkj aj2  ...  ∑_{j=1}^m bkj ajn ] [ xn ] .
And this last is the product of the matrix we defined in Section 3 to be BA with the column
vector x. As we have claimed the definition of matrix multiplication we gave in Section 3
was not arbitrary but rather was forced on us by our decision to regard the multiplication
of two matrices as corresponding to the composition of the linear functions the matrices
represented.
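A small numerical confirmation of this correspondence (Python with numpy; the matrices are arbitrary examples): applying g after f to a vector gives the same result as applying the single matrix BA.

```python
import numpy as np

# Matrix product = composition of linear maps: BA represents g o f.
A = np.array([[1, 2],
              [0, 1],
              [3, 0]])          # represents f: R^2 -> R^3
B = np.array([[1, 0, 2],
              [0, 1, 1]])       # represents g: R^3 -> R^2
x = np.array([4, -1])
print(np.allclose((B @ A) @ x, B @ (A @ x)))    # True
```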
Recall that the columns of the matrix A that represented the linear function f : Rn →
Rm were precisely the images of the unit vectors in Rn under f . The linearity of f means

that the image of any point in Rn is in the span of the images of these unit vectors and
similarly that any point in the span of the images is the image of some point in Rn . Thus
Im( f ) is equal to the span of the columns of A. Now, the dimension of the span of the
columns of A is equal to the maximum number of linearly independent columns in A, that
is, to the rank of A.

5. Linear Functions from Rn to Rn and Square Matrices


In the remainder of this chapter we look more closely at an important subclass of linear
functions and the matrices that represent them, viz the functions that map Rn to itself. From
what we have already said we see immediately that the matrix representing such a linear
function will have the same number of rows as it has columns. We call such a matrix a
square matrix.

If the linear function f : Rn → Rn is one-to-one and onto then the function f has
an inverse f −1 . In Exercise 2 you showed that this function too was linear. A matrix
that represents a linear function that is one-to-one and onto is called a nonsingular matrix.
Alternatively we can say that an n×n matrix is nonsingular if the rank of the matrix is n. To
see these two statements are equivalent note first that if f is one-to-one then Ker( f ) = {0}.
(This is the trivial direction of Exercise 5.) But this means that dim(Ker( f )) = 0 and so
dim(Im( f )) = n. And, as we argued at the end of the previous section this is the same as
the rank of matrix that represents f .
Exercise 5. Show that the linear function f : Rn → Rm is one-to-one if and only if
Ker( f ) = {0}.

Exercise 6. Show that the linear function f : Rn → Rn is one-to-one if and only if
it is onto.

6. Inverse Functions and Inverse Matrices


In the previous section we discussed briefly the idea of the inverse of a linear function
f : Rn → Rn . This allows us a very easy definition of the inverse of a square matrix A. The
inverse of A is the matrix that represents the linear function that is the inverse function of
the linear function that A represents. We write the inverse of the matrix A as A−1 . Thus a
matrix will have an inverse if and only if the linear function that the matrix represents has
an inverse, that is, if and only if the linear function is one-to-one and onto. We saw in the
previous section that this will occur if and only if the kernel of the function is {0} which
in turn occurs if and only if the image of f is of full dimension, that is, is all of Rn . This is
the same as the matrix being of full rank, that is, of rank n.
As with the ideas we have discussed earlier we can express the idea of a matrix inverse
purely in terms of matrices without reference to the linear function that they represent.
Given an n × n matrix A we define the inverse of A to be a matrix B such that BA = In
where In is the n × n identity matrix discussed in Section 3. Such a matrix B will exist if
and only if the matrix A is nonsingular. Moreover, if such a matrix B exists then it is also
true that AB = In , that is, (A−1 )−1 = A.
In Section 9 we shall see one method for calculating inverses of general n ×n matrices.
Here we shall simply describe how to calculate the inverse of a 2 × 2 matrix. Suppose that
we have the matrix

A = [ a b ]
    [ c d ] .

The inverse of this matrix is

(1/(ad − bc)) [  d  −b ]
              [ −c   a ] .
Exercise 7. Show that the matrix A is of full rank if and only if ad − bc ≠ 0.
Exercise 8. Check that the matrix given is, in fact, the inverse of A.

7. Changes of Basis
We have until now implicitly assumed that there is no ambiguity when we speak of
the vector (x1 , x2 , . . . , xn ). Sometimes there may indeed be an obvious meaning to such
a vector. However when we define a linear space all that are really specified are “what
straight lines are” and “where zero is.” In particular, we do not necessarily have defined in
an unambiguous way “where the axes are” or “what a unit length along each axis is.” In
other words we may not have a set of basis vectors specified.
Even when we do have, or have decided on, a set of basis vectors we may wish to re-
define our description of the linear space with which we are dealing so as to use a different

set of basis vectors. Let us suppose that we have an n-dimensional space, even Rn say, with
a given set of basis vectors v1 , v2 , . . . , vn and that we wish instead to describe the space in
terms of the linearly independent vectors b1 , b2 , . . . , bn where
bi = b1i v1 + b2i v2 + · · · + bni vn .
Now, if we had the description of a point in terms of the new coordinate vectors, e.g.,
as
z1 b1 + z2 b2 + · · · + zn bn
then we can easily convert this to a description in terms of the original basis vectors. We
would simply substitute the formula for bi in terms of the v j ’s into the previous formula
giving

(∑_{i=1}^n b1i zi ) v1 + (∑_{i=1}^n b2i zi ) v2 + · · · + (∑_{i=1}^n bni zi ) vn

or, in our previous notation

[ ∑_{i=1}^n b1i zi ]
[ ∑_{i=1}^n b2i zi ]
[ ...              ]
[ ∑_{i=1}^n bni zi ] .
But this is simply the product

[ b11 b12 ... b1n ] [ z1 ]
[ b21 b22 ... b2n ] [ z2 ]
[ ...             ] [ .. ]
[ bn1 bn2 ... bnn ] [ zn ] .
That is, if we are given an n-tuple of real numbers that describe a vector in terms of the
new basis vectors b1 , b2 , . . . , bn and we wish to find the n-tuple that describes the vector
in terms of the original basis vectors we simply multiply the n-tuple we are given, written
as a column vector, by the matrix whose columns are the new basis vectors b1 , b2 , . . . , bn .
We shall call this matrix B. We see among other things that changing the basis is a linear
operation.
Now, if we were given the information in terms of the original basis vectors and wanted
to write it in terms of the new basis vectors what should we do? Since we don’t have the
original basis vectors written in terms of the new basis vectors this is not immediately
obvious. However we do know that if we were to do it and then were to carry out the
operation described in the previous paragraph we would be back with what we started.
Further we know that the operation is a linear operation that maps n-tuples to n-tuples
and so is represented by multiplication by an n × n matrix. That is we multiply the n-
tuple written as a column vector by the matrix that when multiplied by B gives the identity
matrix, that is, the matrix B−1 . If we are given a vector of the form
x1 v1 + x2 v2 + · · · + xn vn
and we wish to express it in terms of the vectors b1 , b2 , . . . , bn we calculate

[ b11 b12 ... b1n ]−1 [ x1 ]
[ b21 b22 ... b2n ]   [ x2 ]
[ ...             ]   [ .. ]
[ bn1 bn2 ... bnn ]   [ xn ] .
Suppose now that we consider a linear function f : Rn → Rn and that we have origi-
nally described Rn in terms of the basis vectors v1 , v2 , . . . , vn where vi is the vector with 1
in the ith place and zeros elsewhere, that is, the standard basis vectors e1 , e2 , . . . , en .
Suppose that with these basis vectors f is represented by the matrix

    [ a11 a12 ... a1n ]
A = [ a21 a22 ... a2n ]
    [ ...             ]
    [ an1 an2 ... ann ] .
If we now describe Rn in terms of the vectors b1 , b2 , . . . , bn how will the linear function
f be represented? Let us think about what we want. We shall be given a vector described
in terms of the basis vectors b1 , b2 , . . . , bn and we shall want to know what the image of
this vector under the linear function f is, where we shall again want our answer in terms
of the basis vectors b1 , b2 , . . . , bn . We shall know how to do this when we are given the
description in terms of the vectors e1 , e2 , . . . , en . Thus the first thing we shall do with our
vector is to convert it from a description in terms of b1 , b2 , . . . , bn to a description in terms
of e1 , e2 , . . . , en . We do this by multiplying the n-tuple by the matrix B. Thus if we call our
original n-tuple z we shall now have a description of the vector in terms of e1 , e2 , . . . , en ,
viz Bz. Given this description we can find the image of the vector in question under f by
multiplying by the matrix A. Thus we shall have A(Bz) = (AB)z. Remember however this
will have given us the image vector in terms of the basis vectors e1 , e2 , . . . , en . In order to
convert this to a description in terms of the vectors b1 , b2 , . . . , bn we must multiply by the
matrix B−1 . Thus our final n-tuple will be (B−1 AB)z.
Recapitulating, suppose that we know that the linear function f : Rn → Rn is rep-
resented by the matrix A when we describe Rn in terms of the standard basis vectors
e1 , e2 , . . . , en and that we have a new set of basis vectors b1 , b2 , . . . , bn . Then when Rn
is described in terms of these new basis vectors the linear function f will be represented
by the matrix B−1 AB.
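The whole recipe fits in a few lines of code. In this sketch (Python with numpy; the matrices are arbitrary examples) we check that describing a vector in the new basis, applying B−1 AB, and translating back agrees with applying A directly in the standard basis:

```python
import numpy as np

# Change of basis: B^{-1} A B represents f in the basis given by B's columns.
A = np.array([[2.0, 0.0], [1.0, 1.0]])   # f in the standard basis
B = np.array([[1.0, 1.0], [0.0, 1.0]])   # new basis vectors as columns
z = np.array([1.0, -2.0])                # coordinates in the new basis
x = B @ z                                # the same vector, standard coordinates
new_rep = np.linalg.inv(B) @ A @ B
print(np.allclose(B @ (new_rep @ z), A @ x))   # True
```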
Exercise 9. Let f : Rn → Rm be a linear function. Suppose that with the standard
bases for Rn and Rm the function f is represented by the matrix A. Let b1 , b2 , . . . , bn be
a new set of basis vectors for Rn and c1 , c2 , . . . , cm be a new set of basis vectors for Rm .
What is the matrix that represents f when the linear spaces are described in terms of the
new basis vectors?
Exercise 10. Let f : R2 → R2 be a linear function. Suppose that with the standard
basis for R2 the function f is represented by the matrix

[ 3 1 ]
[ 1 2 ] .

Let

[ 3 ]     [ 1 ]
[ 2 ] and [ 1 ]

be a new set of basis vectors for R2 . What is the matrix that represents f when R2 is
described in terms of the new basis vectors?
Properties of a square matrix that depend only on the linear function that the matrix
represents and not on the particular choice of basis vectors for the linear space are called
invariant properties. We have already seen one example of an invariant property, the rank
of a matrix. The rank of a matrix is equal to the dimension of the image space of the
function that the matrix represents which clearly depends only on the function and not on
the choice of basis vectors for the linear space.
The idea of a property being invariant can be expressed also in terms only of matrices
without reference to the idea of linear functions. A property is invariant if whenever an
n × n matrix A has the property then for any nonsingular n × n matrix B the matrix B−1 AB
also has the property. We might think of rank as a function that associates to any square
matrix a nonnegative integer. We shall say that such a function is an invariant if the property

of having the function take a particular value is invariant for all particular values we may
choose.
Two particularly important invariants are the trace of a square matrix and the determi-
nant of a square matrix. We examine these in more detail in the following section.

8. The Trace and the Determinant


In this section we define two important real valued functions on the space of n × n
matrices, the trace and the determinant. Both of these concepts have geometric interpreta-
tions. However, while the trace is easy to calculate (much easier than the determinant) its
geometric interpretation is rather hard to see. Thus we shall not go into it. On the other
hand the determinant while being somewhat harder to calculate has a very clear geometric
interpretation. In Section 9 we shall examine in some detail how to calculate determinants.
In this section we shall be content to discuss one definition and the geometric intuition of
the determinant.
Given an n × n matrix A the trace of A, written tr(A) is the sum of the elements on the
main diagonal, that is,

   [ a11 a12 ... a1n ]
   [ a21 a22 ... a2n ]
tr [ ...             ] = ∑_{i=1}^n aii .
   [ an1 an2 ... ann ]
Exercise 11. For the matrices given in Exercise 10 confirm that tr(A) = tr(B−1 AB).
It is easy to see that the trace is a linear function on the space of all n × n matrices, that
is, that for all n × n matrices A and B and for all α ∈ R
(1) tr(A + B) = tr(A) + tr(B), and
(2) tr(αA) = α tr(A).
We can also see that if A and B are both n × n matrices then tr(AB) = tr(BA). In fact, if
A is an m × n matrix and B is an n × m matrix this is still true. This will often be extremely
useful in calculating the trace of a product.
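A numerical illustration (Python with numpy; the matrices are random examples): A below is 2 × 3 and B is 3 × 2, so AB is 2 × 2 while BA is 3 × 3, yet the traces agree.

```python
import numpy as np

# tr(AB) = tr(BA) even when the two products have different sizes.
rng = np.random.default_rng(0)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 2))
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))   # True
```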
Exercise 12. From the definition of matrix multiplication show that if A is an m × n
matrix and B is an n × m matrix then tr(AB) = tr(BA). [Hint: Look at the definition of
matrix multiplication in Section 3. Then write the trace of the product matrix using
summation notation. Finally change the order of summation.]
The determinant, unlike the trace is not a linear function of the matrix. It does however
have some linear structure. If we fix all columns of the matrix except one and look at
the determinant as a function of only this column then the determinant is linear in this
single column. Moreover this is true whatever the column we choose. Let us write the
determinant of the n × n matrix A as det(A). Let us also write the matrix A as [a1 , a2 , . . . , an ]
where ai is the ith column of the matrix A. Thus our claim is that for all n × n matrices A,
for all i = 1, 2, . . . n, for all n vectors b, and for all α ∈ R

(3) det([a1 , . . . , ai−1 , ai + b, ai+1 , . . . , an ]) = det([a1 , . . . , ai−1 , ai , ai+1 , . . . , an ])
                                                         + det([a1 , . . . , ai−1 , b, ai+1 , . . . , an ])

and

(4) det([a1 , . . . , ai−1 , αai , ai+1 , . . . , an ]) = α det([a1 , . . . , ai−1 , ai , ai+1 , . . . , an ]).
We express this by saying that the determinant is a multilinear function.

Also the determinant is such that any n × n matrix that is not of full rank, that is, not
of rank n, has a zero determinant. In fact, given that the determinant is a multilinear function
if we simply say that any matrix in which one column is the same as one of its neighbours
has a zero determinant this implies the stronger statement that we made. We already see
one use of calculating determinants. A matrix is nonsingular if and only if its determinant
is nonzero.
The two properties of being multilinear and zero whenever two neighbouring columns
are the same already almost uniquely identify the determinant. Notice however that if the
determinant satisfies these two properties then so does any constant times the determinant.
To uniquely define the determinant we “tie down” this constant by assuming that det(I) = 1.
Though we haven’t proved that it is so, these three properties uniquely define the deter-
minant. That is, there is one and only one function with these three properties. We call this
function the determinant. In Section 9 we shall discuss a number of other useful properties
of the determinant. Remember that these additional properties are not really additional facts
about the determinant. They can all be derived from the three properties we have given
here.
Let us now look to the geometric interpretation of the determinant. Let us first think
about what linear transformations can do to the space Rn . Since we have already said that
a linear transformation that is not onto is represented by a matrix with a zero determinant
let us think about linear transformations that are onto, that is, that do not map Rn into a
linear space of lower dimension. Such transformations can rotate the space around zero.
They can “stretch” the space in different directions. And they can “flip” the space over.
In the latter case all objects will become “mirror images” of themselves. We call linear
transformations that make such a mirror image orientation reversing and those that don’t
orientation preserving. A matrix that represents an orientation preserving linear function
has a positive determinant while a matrix that represents an orientation reversing linear
function has a negative determinant. Thus we have a geometric interpretation of the sign
of the determinant.
The absolute size of the determinant represents how much bigger or smaller the linear
function makes objects. More precisely it gives the “volume” of the image of the unit
hypercube under the transformation. The word volume is in quotes because it is the volume
with which we are familiar only when n = 3. If n = 2 then it is area, while if n > 3 then it
is the full dimensional analog in Rn of volume in R3 .
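For a concrete check in R2 (a Python sketch with numpy; the matrix is an arbitrary example chosen to have determinant 6), we can compute the area of the image of the unit square with the shoelace formula and compare it with |det(A)|:

```python
import numpy as np

# |det(A)| is the factor by which A scales areas in R^2.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])     # det = 6
# Images of the unit square's corners, taken in order around the square.
corners = np.array([[0, 0], [1, 0], [1, 1], [0, 1]]) @ A.T
x, y = corners[:, 0], corners[:, 1]
area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
print(area, abs(np.linalg.det(A)))   # 6.0 6.0
```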

Exercise 13. Consider the matrix

[ 3 1 ]
[ 1 2 ] .

In a diagram show the image under the linear function that this matrix represents of the
unit square, that is, the square whose corners are the points (0,0), (1,0), (0,1), and (1,1).
Calculate the area of that image. Do the same for the matrix

[ 4  1 ]
[ −1 1 ] .

In the light of Exercise 10, comment on the answers you calculated.

9. Calculating and Using Determinants


We have already used the concepts of the inverse of a matrix and the determinant
of a matrix. The purpose of this section is to cover some of the “cookbook” aspects of
calculating inverses and determinants.

Suppose that we have an n × n matrix

    [ a11 ... a1n ]
A = [ ...     ... ]
    [ an1 ... ann ]

then we shall use |A| or

| a11 ... a1n |
| ...     ... |
| an1 ... ann |

as an alternative notation for det(A). Always remember that

| a11 ... a1n |
| ...     ... |
| an1 ... ann |

is not a matrix but rather a real number. For the case n = 2 we define

det(A) = | a11 a12 |
         | a21 a22 |

as a11 a22 − a21 a12 . It is possible to also give a convenient formula for the determinant of
a 3 × 3 matrix. However, rather than doing this, we shall immediately consider the case of
an n × n matrix.
By the minor of an element of the matrix A we mean the determinant (remember a
real number) of the matrix obtained from the matrix A by deleting the row and column
containing the element in question. We denote the minor of the element ai j by the symbol
|Mi j |. Thus, for example,

         | a22 ... a2n |
|M11 | = | ...     ... | .
         | an2 ... ann |

Exercise 14. Write out the minors of a general 3 × 3 matrix.

We now define the cofactor of an element to be either plus or minus the minor of the
element, being plus if the sum of indices of the element is even and minus if it is odd. We
denote the cofactor of the element ai j by the symbol |Ci j |. Thus |Ci j | = |Mi j | if i + j is even
and |Ci j | = −|Mi j | if i + j is odd. Or,

|Ci j | = (−1)i+ j |Mi j |.

We now define the determinant of an n × n matrix A,

               | a11 ... a1n |
det(A) = |A| = | ...     ... |
               | an1 ... ann |

to be ∑_{j=1}^n a1 j |C1 j |. This is the sum of n terms, each one of which is the product of an
element of the first row of the matrix and the cofactor of that element.
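This definition translates directly into a recursive procedure. The sketch below (plain Python, computationally wasteful but faithful to the definition, applied to an arbitrary example matrix) expands along the first row:

```python
# Determinant by cofactor expansion along the first row.
def det(A):
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # the minor: delete the first row and the jth column
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        # (-1) ** j is the sign (-1)^(i+j) written with 0-based indices
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[2, 0, 1],
           [1, 3, 0],
           [0, 1, 1]]))   # 7
```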

Exercise 15. Define the determinant of the 1 × 1 matrix [a] to be a. (What else could
we define it to be?) Show that the definition given above corresponds with the definition
we gave earlier for 2 × 2 matrices.

Exercise 16. Calculate the determinants of the following 3 × 3 matrices.

    [ 1 2 3 ]     [ 1 5 2 ]     [ 1 1 0 ]
(a) [ 3 6 9 ] (b) [ 1 4 3 ] (c) [ 5 4 1 ]
    [ 4 5 7 ]     [ 0 1 2 ]     [ 2 3 2 ]

    [ 1 0 0 ]     [ 2 5 2 ]
(d) [ 0 1 0 ] (e) [ 1 5 3 ]
    [ 0 0 1 ]     [ 0 1 3 ]
Exercise 17. Show that the determinant of the identity matrix, det(In ), is 1 for all
values of n. [Hint: Show that it is true for I2 . Then show that if it is true for In−1 then it is
true for In .]
One might ask what was special about the first row, that we took elements of that row,
multiplied them by their cofactors, and added them up. Why not the second row, or the first
column? It will follow from a number of properties of determinants we list below that in
fact we could have used any row or column and we would have arrived at the same answer.
Exercise 18. Expand the determinant of the matrix given in Exercise 16(b) in terms of
the 2nd and 3rd rows and in terms of each column and check that the resulting answer agrees
with the answer you obtained originally.
We now have a way of calculating the determinant of any matrix. To find the deter-
minant of an n × n matrix we have to calculate n determinants of size (n − 1) × (n − 1).
This is clearly a fairly computationally costly procedure. However there are often ways to
economise on the computation.
Exercise 19. Evaluate the determinants of the following matrices

    [ 1  8 0  7 ]     [ 4  7 0 4 ]
(a) [ 2  3 4  6 ] (b) [ 5  6 1 8 ]
    [ 1  6 0 −1 ]     [ 0  0 9 0 ]
    [ 0 −5 0  8 ]     [ 1 −3 1 4 ]

[Hint: Think carefully about which column or row to use in the expansion.]
We shall now list a number of properties of determinants. These properties imply
that, as we stated above, it does not matter which row or column we use to expand the
determinant. Further these properties will give us a series of transformations we may
perform on a matrix without altering its determinant. This will allow us to calculate a
determinant by first transforming the matrix to one whose determinant is easier to calculate
and then calculating the determinant of the easier matrix.
Property 1. The determinant of a matrix equals the determinant of its transpose:
|A| = |A′ |.
Property 2. Interchanging two rows (or two columns) of a matrix changes its sign
but not its absolute value. For example,

| c d |
| a b | = cb − ad = −(ad − bc) = − | a b |
                                   | c d | .

Property 3. Multiplying one row (or column) of a matrix by a constant λ will
change the value of the determinant λ-fold. For example,

| λa11 ... λa1n |     | a11 ... a1n |
| ...       ... | = λ | ...     ... |
| an1  ...  ann |     | an1 ... ann |


Exercise 20. Check Property 3 for the cases n = 2 and n = 3.


Corollary 1. |λA| = λ^n |A| (where A is an n × n matrix).

Corollary 2. |−A| = |A| if n is even. |−A| = −|A| if n is odd.
Property 4. Adding a multiple of any row (column) to any other row (column) does
not alter the value of the determinant.
Exercise 21. Check that

| 1 5 2 |   | 1 5+3×2 2 |
| 1 4 3 | = | 1 4+3×3 3 |
| 0 1 2 |   | 0 1+3×2 2 |

            | 1+(−2)×1 5+(−2)×4 2+(−2)×3 |
          = | 1        4        3        | .
            | 0        1        2        |
Property 5. If one row (or column) is a constant times another row (or column) then
the determinant of the matrix is zero.
Exercise 22. Show that Property 5 follows from Properties 3 and 4.
We can strengthen Property 5 to obtain the following.
Property 5′. The determinant of a matrix is zero if and only if the matrix is not of
full rank.

Exercise 23. Explain why Property 5′ is a strengthening of Property 5, that is, why
5′ implies 5.
These properties allow us to calculate determinants more easily. Given an n × n matrix
A the basic strategy one follows is to use the above properties, particularly Property 4 to
find a matrix with the same determinant as A in which one row (or column) has only one
non-zero element. Then, rather than calculating n determinants of size (n − 1) × (n − 1)
one only needs to calculate one. One then does the same thing for the (n − 1) × (n − 1)
determinant that needs to be calculated, and so on.
There are a number of reasons we are interested in determinants. One is that they give
us one method of calculating the inverse of a nonsingular matrix. (Recall that there is no
inverse of a singular matrix.) They also give us a method, known as Cramer’s Rule, for
solving systems of linear equations. Before proceeding with this it is useful to state one
further property of determinants.

Property 6. If one expands a matrix in terms of one row (or column) and the cofac-
tors of a different row (or column) then the answer is always zero. That is

∑_{j=1}^n ai j |Ck j | = 0

whenever i ≠ k. Also

∑_{i=1}^n ai j |Cik | = 0

whenever j ≠ k.
Exercise 24. Verify Property 6 for the matrix

[ 4 1 2 ]
[ 5 2 1 ] .
[ 1 0 3 ]

Let us define the matrix of cofactors C to be the matrix [|Ci j |] whose ijth element is
the cofactor of the ijth element of A. Now we define the adjoint matrix of A to be the
transpose of the matrix of cofactors of A. That is

adj(A) = C′ .

It is straightforward to see (using Property 6) that A adj(A) = |A|In = adj(A)A. That is,
A−1 = (1/|A|) adj(A). Notice that this is well defined if and only if |A| ≠ 0. We now have a
method of finding the inverse of any nonsingular square matrix.
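The method is mechanical enough to code directly. A sketch (Python with numpy; the test matrix is an arbitrary example) that builds the matrix of cofactors, transposes it, and divides by the determinant:

```python
import numpy as np

# Inverse via the adjoint: A^{-1} = (1/|A|) adj(A), where adj(A) = C'.
def inverse_via_adjoint(A):
    n = A.shape[0]
    C = np.zeros((n, n))                 # the matrix of cofactors
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T / np.linalg.det(A)

A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 1.0]])
print(np.allclose(inverse_via_adjoint(A), np.linalg.inv(A)))   # True
```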
Exercise 25. Use this method to find the inverses of the following matrices

    [ 3 −1 2 ]     [ 4 −2 1 ]     [ 1 5 2 ]
(a) [ 1  0 3 ] (b) [ 7  3 3 ] (c) [ 1 4 3 ] .
    [ 4  0 2 ]     [ 2  0 1 ]     [ 0 1 2 ]
Knowing how to invert matrices we thus know how to solve a system of n linear
equations in n unknowns. For we can express the n equations in matrix notation as Ax = b
where A is an n × n matrix of coefficients, x is an n × 1 vector of unknowns, and b is an
n × 1 vector of constants. Thus we can solve the system of equations as x = A−1 Ax = A−1 b.
Sometimes, particularly if we are not interested in all of the x’s, it is convenient to use
another method of solving the equations. This method is known as Cramer’s Rule. Let us
suppose that we wish to solve the above system of equations, that is, Ax = b. Let us define
the matrix Ai to be the matrix obtained from A by replacing the ith column of A by the
vector b. Then the solution is given by

xi = |Ai | / |A| .
Exercise 26. Derive Cramer’s Rule. [Hint: We know that the system of equations
is solved by x = (1/|A|) adj(A) b. This gives a formula for xi . Show that this
formula is the same as that given by xi = |Ai |/|A|.]
Exercise 27. Solve the following system of equations (i) by matrix inversion and
(ii) by Cramer’s Rule

    2x1 − x2        = 2         −x1 + x2 + x3 = 1
(a)       3x2 + 2x3 = 16    (b)  x1 − x2 + x3 = 1 .
    5x1       + 3x3 = 21         x1 + x2 + x3 = 1
Exercise 28. Recall that we claimed that the determinant was an invariant. Confirm
this by calculating (directly) det(A) and det(B−1 AB) where

    [ 1  0  1 ]          [ 1 0 0 ]
B = [ 1 −1  2 ]  and A = [ 0 2 0 ] .
    [ 2  1 −1 ]          [ 0 0 3 ]
Exercise 29. An nth order determinant of the form

| a11 0   0   ... 0   |
| a21 a22 0   ... 0   |
| a31 a32 a33 ... 0   |
| ...                 |
| an1 an2 an3 ... ann |

is called triangular. Evaluate this determinant. [Hint: Expand the determinant in terms of
its first row. Expand the resulting (n − 1) × (n − 1) determinant in terms of its first row,
and so on.]

10. Eigenvalues and Eigenvectors


Suppose that we have a linear function f : Rn → Rn . When we look at how f deforms
Rn one natural question is: Where does f send some linear subspace? In particu-
lar we might ask if there are any linear subspaces that f maps to themselves. We call such
linear subspaces invariant linear subspaces. Of course the space Rn itself and the zero di-
mensional space {0} are invariant linear subspaces. The real question is whether there are
any others. Clearly, for some linear transformations there are no other invariant subspaces.
For example, a clockwise rotation of π/4 in R2 has no invariant subspaces other than R2
itself and {0}.
A particularly important class of invariant linear subspaces are the one dimensional
ones. A one dimensional linear subspace is specified by one nonzero vector, say x̄. Then
the subspace is {λ x̄ | λ ∈ R}. Let us call this subspace L(x̄). If L(x̄) is an invariant linear
subspace of f and if x ∈ L(x̄) then there is some value λ such that f (x) = λ x. Moreover the
value of λ for which this is true will be the same whatever value of x we choose in L(x̄).
Now if we fix the set of basis vectors and thus the matrix A that represents f we have
that if x is in a one dimensional invariant linear subspace of f then there is some λ ∈ R
such that
Ax = λ x.
Again we can define this notion without reference to linear functions. Given a matrix A if
we can find a pair x, λ with x ≠ 0 that satisfy the above equation we call x an eigenvector of
the matrix A and λ the associated eigenvalue. (Sometimes these are called characteristic
vectors and values.)
Exercise 30. Show that the eigenvalues of a matrix are an invariant, that is, that
they depend only on the linear function the matrix represents and not on the choice of basis
vectors. Show also that the eigenvectors of a matrix are not an invariant. Explain why the
dependence of the eigenvectors on the particular basis is exactly what we would expect and
argue that in some sense they are indeed invariant.
Now we can rewrite the equation Ax = λ x as
(A − λ In )x = 0.
If x, λ solve this equation and x ≠ 0 then we have a nonzero linear combination of the
columns of A − λ In equal to zero. This means that the columns of A − λ In are not linearly
independent and so det(A − λ In ) = 0, that is,

    [ a11 − λ  a12      ...  a1n     ]
det [ a21      a22 − λ  ...  a2n     ] = 0.
    [ ...                            ]
    [ an1      an2      ...  ann − λ ]
Now, the left hand side of this last equation is a polynomial of degree n in λ , that is, a
polynomial in λ in which n is the highest power of λ that appears with nonzero coefficient.
It is called the characteristic polynomial and the equation is called the characteristic equa-
tion. Now this equation may, or may not, have a solution in real numbers. In general, by
the fundamental theorem of algebra the equation has n solutions, perhaps not all distinct, in
the complex numbers. If the matrix A happens to be symmetric (that is, if ai j = a ji for all i
and j) then all of its eigenvalues are real. If the eigenvalues are all distinct (that is, different
from each other) then we are in a particularly well behaved situation. As a prelude we state
the following result.
Theorem 1. Given an n × n matrix A suppose that we have m eigenvectors of A,
x1 , x2 , . . . , xm , with corresponding eigenvalues λ1 , λ2 , . . . , λm . If λi ≠ λ j whenever i ≠ j then
x1 , x2 , . . . , xm are linearly independent.

An implication of this theorem is that an n × n matrix cannot have more than n eigen-
vectors with distinct eigenvalues. Further this theorem allows us to see that if an n × n
matrix has n distinct eigenvalues then it is possible to find a basis for Rn in which the lin-
ear function that the matrix represents is represented by a diagonal matrix. Equivalently
we can find a matrix B such that B−1 AB is a diagonal matrix.
To see this let b1 , b2 , . . . , bn be n linearly independent eigenvectors with associated
eigenvalues λ1 , λ2 , . . . , λn . Let B be the matrix whose columns are the vectors b1 , b2 , . . . , bn .
Since these vectors are linearly independent the matrix B has an inverse. Now
B−1 AB = B−1 [Ab1 Ab2 . . . Abn ]
       = B−1 [λ1 b1 λ2 b2 . . . λn bn ]
       = [λ1 B−1 b1 λ2 B−1 b2 . . . λn B−1 bn ]

         [ λ1 0  ... 0  ]
       = [ 0  λ2 ... 0  ] .
         [ ...          ]
         [ 0  0  ... λn ]
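Numerically (Python with numpy; the matrix is an arbitrary symmetric example, so its eigenvalues are real) the whole argument is a one-line check: with the eigenvectors as the columns of B, the matrix B−1 AB comes out diagonal.

```python
import numpy as np

# Diagonalisation: B^{-1} A B is diagonal when B's columns are eigenvectors.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])                 # symmetric => real eigenvalues
eigenvalues, B = np.linalg.eig(A)          # columns of B are eigenvectors
D = np.linalg.inv(B) @ A @ B
print(np.allclose(D, np.diag(eigenvalues)))   # True
```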
CHAPTER 3

Choice, Preference, and Utility

We shall start this course with an introduction to the formal modelling of motivated
choice. There is an abundance of experimental evidence to suggest that, at least in some
circumstances, the decision making of actual people is influenced by a myriad of appar-
ently irrelevant considerations. It is nevertheless traditional in economics—and I think for
very good reason—to start, and often to go no further than, what is usually called rational
choice, which I am calling here motivated choice.
The treatment here follows largely the approach and uses the notation of ? The style
shall be rather formal. For better or worse, modern economic theory is formal, sometimes,
perhaps, more formal than it needs to be. Whatever style you choose for your own
work, you will need to be able to read and work with formal models to access the literature
that exists.
Before getting into the formal modelling and assumptions of the next section I shall
discuss here an assumption that is buried in the setup. In the next section we shall define
a choice rule as a function that for any choice situation specifies which of the available
elements might be chosen. A choice situation will be defined by specifying which elements
are available to be selected, that is, by specifying the available set of choices. Notice that
there is already an assumption buried here. By specifying only the set, and not saying
how it is presented to the decision maker we are already making assumptions. And those
assumptions are not necessarily realistic. There is a good deal of evidence to suggest that
the manner in which a choice is presented to a decision maker might have a good deal of influence
on the actual decision she comes to.

1. Formal Model of Motivated Choice


We start by specifying the set of all alternatives the decision-maker might be asked to
choose among. For the moment we shall just specify it as an abstract set X. Each element
of the set corresponds to the completely specified consequence of a choice. These might
be very complicated consequences. One example of such a consequence is a consumption
plan, specifying how much of each good an individual will consume. Another is the pro-
duction plan of a firm. In another setting the consequences might be lotteries, giving some
outcome in one realisation and another outcome in a second. In other parts of the course
we shall look in more detail at some examples of sets of alternatives in which a bit more
structure is assumed. Here however we shall simply let X be an arbitrary set. We shall,
however, assume for the moment that X is finite.
While X is the set of all alternatives the decision-maker will typically be offered a
choice from some smaller subset, say B ⊂ X. We shall call such a set B a budget set. The
family of all such budget sets that the decision-maker might confront we shall denote B.
Notice that this might not be the set of all subsets. For example in choosing which subjects
to enrol in this year not all combinations were available to you. There were considerations
of prerequisites and pairs of subjects which could not both be taken. (If X is finite we could
actually let B be the family of all nonempty subsets of X. But we shall include the extra
generality for later use.)
For a given decision maker a choice structure is a pair (B,C(·)) where B is as de-
scribed above and C(·) is a choice rule that associates to every budget set B in B a chosen
nonempty subset of B. Formally we say that C is a correspondence from B to X, written C : B ⇉ X, such that for all B in B we have C(B) ⊂ B. Informally the choice rule C(·) tells us for each
budget set B which elements of B are (or could be) chosen by the decision-maker. Notice
that we make the requirements that C(B) be nonempty and that it be a subset of B part of
the definition of what a choice function is. In the next section we discuss an important
property that we do not make part of the definition. This reflects the view that it simply
would not make much sense to consider choice functions that did not choose anything or
that chose elements that were not available. On the other hand it is quite coherent to think
of a decision-maker choosing in ways that do not satisfy the property given in the next
section.

2. The Properties of Choice Structures


There are many ways that we might think of a decision-maker choosing. In order
that you should realise that there is something substantive about the properties that we
shall discuss we shall first give some examples that will not satisfy the property we shall
formulate below.
A decision-maker might choose the element that is most like the average of all the
elements with which he is confronted. (This, of course, assumes that we have some way
of averaging the elements.) Or he might choose the one most dissimilar from the rest.
(Similarly, this assumes we have some notion of similarity.)
Both of these methods of choosing might seem a little strange to you. This is probably
because you have in mind already some, perhaps informal, idea that the decision-maker’s
choice is based on some underlying preference of the decision-maker. We shall, in the
next section explicitly introduce preferences. Let’s see first however if we can do anything
without explicitly talking about preferences (at least in our formal model).
Suppose that we were observing the decision-maker and we saw her, when confronted with a particular budget set, choose a particular alternative x and reject another alternative y that was also available. If we knew already that her choice was based on some underlying preference over the alternatives then we might say that she had revealed a preference for x over y. Thus we would be a bit surprised if we were ever to see her choosing y when x was available to her.
We formalise this in the following definition.
D EFINITION 3. The choice structure (B, C(·)) satisfies the weak axiom of revealed preference if, whenever there is some budget set B in B containing both x and y with x in C(B), then for any B′ in B that contains both x and y, if y is in C(B′) then x is also in C(B′).
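For a finite choice structure Definition 3 can be checked mechanically. Here is a small sketch (in Python; the function name and the example data are my own) that searches for a violating pair of budget sets:

def satisfies_warp(budgets, choice):
    # budgets: a list of frozensets (the family B)
    # choice:  a dict mapping each budget B to its chosen set C(B)
    for B in budgets:
        for x in choice[B]:
            for y in B:                      # x is revealed at least as good as y
                for Bp in budgets:
                    if x in Bp and y in Bp and y in choice[Bp] and x not in choice[Bp]:
                        return False         # y chosen from Bp while x is rejected
    return True

B1, B2 = frozenset({'a', 'b'}), frozenset({'a', 'b', 'c'})
choice = {B1: {'a'}, B2: {'b'}}              # 'a' over 'b', then 'b' over 'a'
print(satisfies_warp([B1, B2], choice))      # False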

3. Preference Relations and Preference Based Choice


Another approach to choice starts by taking not the decision-maker’s choice behaviour
as a primitive but by positing from the start that the decision-maker has a preference re-
lation on the elements of X. A preference relation ≿ is a list of all those pairs (x, y) for which it is true that x is at least as good for the decision-maker as y. Thus if for a particular pair (x′, y′) it was the case that x′ was at least as good as y′ then (x′, y′) would be in the set ≿. We actually denote this by the somewhat more suggestive notation x′ ≿ y′, which we read "x′ is at least as good as y′."
We could do everything we want to about preferences in terms of the relation ≿, and from some points of view it might be good to do things that way. However it is very convenient, and sometimes instructive, to introduce two other binary relations on X:

(1) the strict preference relation ≻ defined by
\[
x \succ y \text{ if and only if } (x \succsim y \text{ and not } y \succsim x),
\]
which we read "x is preferred to y."
(2) the indifference relation ∼ defined by
\[
x \sim y \text{ if and only if } (x \succsim y \text{ and } y \succsim x),
\]
which we read "x is indifferent to y."
It is also possible to start with the strict preference relation as the primitive and to define the (weak) preference relation and the indifference relation in terms of that.
There is a bit more to what we would intuitively think of as preferences than we have so far (which, after all, is almost nothing). We call a preference relation rational if it is both complete and transitive.
D EFINITION 4. The preference relation ≿ is rational if
(1) (Completeness) for all x, y in X either x ≿ y or y ≿ x (or both), and
(2) (Transitivity) for all x, y, z in X, if x ≿ y and y ≿ z then x ≿ z.
The properties that define rational preference relations also have easily stated implications for the derived strict preference and indifference relations.
P ROPOSITION 1. If ≿ is rational then
(1) ≻ is both irreflexive (it is not the case that x ≻ x for any x in X) and transitive (for all x, y, z in X, if x ≻ y and y ≻ z then x ≻ z).
(2) ∼ is reflexive (x ∼ x for all x in X), transitive (for all x, y, z in X, if x ∼ y and y ∼ z then x ∼ z), and symmetric (for all x, y in X, if x ∼ y then y ∼ x).
(3) for all x, y, z in X, if x ≻ y and y ≿ z then x ≻ z.
(4) for all x, y, z in X, if x ≿ y and y ≻ z then x ≻ z.

4. The Relationship Between the Choice Based Approach and the Preference Based
Approach
We return to the choice based approach. Given a choice structure (B, C(·)) we can define a preference relation based on it, the so-called revealed preference relation, ≿*.
D EFINITION 5. Given a choice structure (B, C(·)) the revealed preference relation ≿* is defined by x ≿* y if and only if there is some B in B such that x, y are in B and x is in C(B).
We read x ≿* y as "x is revealed at least as good as y" or "x is revealed weakly preferred to y." Note that ≿* may not be either transitive or complete. We also say that "x is revealed (strictly) preferred to y" if there is some B in B such that x, y are in B and x is in C(B) and y is not. We can then restate Definition 3 as follows.
D EFINITION 3′. The choice structure (B, C(·)) satisfies the weak axiom of revealed preference if whenever x is revealed at least as good as y then y is not revealed (strictly) preferred to x.
There are two questions we might ask regarding the relationship between the two
approaches.
(1) If the decision-maker has a rational preference ordering do the preference max-
imising choices necessarily generate a choice structure that satisfies the weak
axiom?
(2) If the decision-maker's choices when facing a family of budget sets B are represented by a choice structure (B, C(·)) that satisfies the weak axiom, is there necessarily a rational preference relation that is consistent with these choices?

The answer to the first question is unambiguously “yes.” To answer the second ques-
tion in the affirmative requires some assumptions on B.
To analyse the problem formally we need a bit more notation.
Suppose that the decision-maker has a rational preference relation ≿ on X. For any nonempty subset of alternatives B ⊂ X we let C*(B, ≿) be her preference maximising choices, i.e.,
\[
C^*(B, \succsim) = \{x \in B \mid x \succsim y \text{ for every } y \in B\}.
\]
P ROPOSITION 2. Suppose that ≿ is a rational preference relation and that B is such that C*(B, ≿) is nonempty for all B in B. Then the choice structure (B, C*(·, ≿)) satisfies the weak axiom of revealed preference.
P ROOF. Suppose that for some B in B we have x, y in B and x in C*(B, ≿). By the definition of C*(B, ≿) this means that x ≿ y. Now consider some other B′ in B with x, y in B′ and y in C*(B′, ≿). Thus y ≿ z for all z in B′. Since x ≿ y and ≿ is rational, hence transitive, x ≿ z for all z in B′. And so x is in C*(B′, ≿) as well, as we require. □

The relation between C*(·, ≿) and the preferences ≿ is quite clear and is not problematic. The relation between a choice structure (B, C(·)) and the revealed preference relation ≿* is a little less clear, even if ≿* is rational. Can you think of an example in which ≿* is rational but doesn't seem to quite reflect the choice structure? [Hint: Try to have ≿* contain all ordered pairs, and yet not have the choice structure be that of complete indifference.] To say something systematic it is convenient to say precisely what we mean by a preference relation representing or rationalising a choice structure.
D EFINITION 6. Given a choice structure (B, C(·)), we say that the rational preference relation ≿ rationalises C(·) relative to B if C(B) = C*(B, ≿) for all B in B, that is, if ≿ (and B) generates the choice structure (B, C(·)).
We are now in a position to answer the second question raised earlier.
P ROPOSITION 3. If (B, C(·)) is a choice structure satisfying the weak axiom and if B includes all nonempty subsets of X containing three or fewer elements then there is a rational preference relation ≿ that rationalises C(·) relative to B, that is, C(B) = C*(B, ≿) for all B in B. Furthermore this rational preference relation is the only preference relation that does so.
The proof of this proposition is not too difficult. You can read it in ? if you are up to
it. (It’s Proposition 1.D.2 there.) We shall not prove it here.

5. Representing Preferences by Utility Functions


In the previous sections we have discussed choice and preference, two extremely important concepts in economics. However there is another concept that, if not more important, is more widely used and better known: that of utility. How is the idea of utility related to the two concepts that we have discussed already? Just as we saw in the previous section that preferences could rationalise or represent a choice structure, so too a utility function can represent a preference relation. In most modern choice theory this is all a utility function does.
A utility function u(x) assigns a real number to each element of the set of alternatives
X. We interpret the utility function to mean that the decision-maker prefers one alternative
to another if and only if the first is assigned a higher utility level.
D EFINITION 7. A function u : X → R is a utility function representing the preference relation ≿ if, for all x and y in X, x ≿ y if and only if u(x) ≥ u(y).

We can now ask about the relationship between the preference relation being rational
and it being represented by a utility function. Again, in one direction we get a very stark
answer.
P ROPOSITION 4. If a preference relation ≿ can be represented by a utility function then it is rational.
Again, we shall not prove this here. It’s not very difficult. You can find the proof in ?
and one of the homework exercises guides you through the proof.
There are a number of circumstances in which the converse is true. In particular, if the set of alternatives X is finite then any rational preference relation can be represented by a utility function. In cases in which X is not finite one needs additional conditions in order to guarantee that the preferences can be so represented. We shall return to this question when we discuss the consumer's decision problem in a little more detail in the next section.
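For finite X the proof of the converse is constructive, and the construction is short enough to state as code: assign to each alternative the number of alternatives it is at least as good as. A sketch (in Python; the particular preference below and the function name are my own; completeness and transitivity of the relation are what make this a representation):

def utility_from_preference(X, at_least_as_good):
    # u(x) = number of alternatives that x is weakly preferred to
    return {x: sum(1 for y in X if at_least_as_good(x, y)) for x in X}

rank = {'a': 2, 'b': 1, 'c': 1}                  # encodes a ≻ b ∼ c
u = utility_from_preference(['a', 'b', 'c'],
                            lambda x, y: rank[x] >= rank[y])
print(u)                                          # {'a': 3, 'b': 2, 'c': 2}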

6. Consumer Behaviour: Optimisation Subject to a Budget Constraint


We now look at a special case of our theory so far, that of a consumer choosing which
bundle of commodities to consume. For much of our discussion we shall focus on the
case in which there are only two commodities. When we look at the more general case we shall denote the number of commodities by L and index the commodities by ℓ = 1, …, L. Thus the vector (x_1, x_2, …, x_ℓ, …, x_L) represents the situation in which the consumer consumes (or perhaps, depending on our interpretation, plans to consume) the quantity x_1 of commodity 1, x_2 of commodity 2, and so on.
In discussing the consumer's problem we shall normally assume that the consumer must consume nonnegative amounts of each commodity. If we make no other assumptions then the set of alternatives X would be
\[
X = \mathbb{R}^L_+ = \{(x_1, x_2, \ldots, x_L) \mid x_\ell \ge 0 \text{ for all } \ell\}.
\]
If we made other restrictions on what the consumption bundles could be we might obtain a strict subset of R^L_+. A number of such examples are given in the homework.
Having discussed the set of alternatives we turn now to look at budget sets. Though
in many examples actual budget sets may well be more complicated we shall restrict our
attention here to what are known as Walrasian or competitive budget sets.
D EFINITION 8. Given prices of commodities p = (p_1, p_2, …, p_L) in R^L_{++} (the strictly positive L-vectors) and wealth w ≥ 0, the Walrasian, or competitive, budget set
\[
B(p, w) = \{x \in X \mid p \cdot x \le w\}
\]
is the set of all feasible consumption bundles for a consumer with wealth w facing prices p.
For the consumer's problem the family of budget sets B is precisely the family of Walrasian budget sets. That is,
\[
\mathcal{B} = \{B(p, w) \mid p \in \mathbb{R}^L_{++} \text{ and } w \ge 0\}.
\]
We shall define the Walrasian demand correspondence x(p, w) to be the rule that asso-
ciates to any price–wealth pair (p, w) those consumption bundles chosen by the consumer
given the budget set B(p, w). Now, since we are thinking of only the budget set as relevant
for the consumer and not anything else about the price–wealth pair this means that any
changes to the price-wealth pair that do not change the budget set should not change the
demanded bundles. This is reflected in the assumption that the Walrasian demand corre-
spondence is homogeneous of degree zero.
D EFINITION 9. The Walrasian demand correspondence is homogeneous of degree zero if x(αp, αw) = x(p, w) for any p ∈ R^L_{++}, w ≥ 0, and α > 0.

We think of the commodity space X as including all possible uses of the consumer's wealth. Thus if the consumer were not to spend all of her wealth she would be forgoing the possibility of increasing her consumption while having no alternative use for the unspent wealth. We could allow such situations, but it would limit what we could say about the consumer's behaviour. Recognising that it is a substantive assumption, though rather a mild one, we assume that the consumer spends all of her wealth. We call this assumption Walras' law.
D EFINITION 10. The Walrasian demand correspondence satisfies Walras' law if p · x = w for any p ∈ R^L_{++}, w ≥ 0, and x ∈ x(p, w).
We now assume that the Walrasian demand correspondence is single valued, that is, that it is a function. There are a number of essentially adding-up restrictions implied by homogeneity of degree zero and Walras' law. Consider the requirement of homogeneity, x(αp, αw) − x(p, w) = 0 for all α > 0. Suppose we differentiate this with respect to α and evaluate the result at α = 1. We obtain the following result.
P ROPOSITION 5. If the Walrasian demand function x(p, w) is homogeneous of degree zero then for all p and w
\[
\sum_{k=1}^{L} \frac{\partial x_\ell(p, w)}{\partial p_k} p_k + \frac{\partial x_\ell(p, w)}{\partial w} w = 0 \quad \text{for } \ell = 1, \ldots, L.
\]
If you know matrix notation you can say this a bit more simply:
\[
D_p x(p, w)\, p + D_w x(p, w)\, w = 0.
\]
Now consider Walras' law, p · x(p, w) = w for all p and w. Suppose we differentiate this with respect to price. We obtain the following result.
P ROPOSITION 6. If the Walrasian demand function x(p, w) satisfies Walras' law then for all p and w
\[
\sum_{\ell=1}^{L} p_\ell \frac{\partial x_\ell(p, w)}{\partial p_k} + x_k(p, w) = 0 \quad \text{for } k = 1, \ldots, L.
\]
Or in matrix notation
\[
p \cdot D_p x(p, w) + x(p, w)^T = 0^T.
\]
Consider again Walras' law p · x(p, w) = w for all p and w and differentiate this time with respect to w. We obtain the following result.
P ROPOSITION 7. If the Walrasian demand function x(p, w) satisfies Walras' law then for all p and w
\[
\sum_{\ell=1}^{L} p_\ell \frac{\partial x_\ell(p, w)}{\partial w} = 1.
\]
Or in matrix notation
\[
p \cdot D_w x(p, w) = 1.
\]
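These adding-up restrictions are easy to verify numerically for a concrete demand system. The sketch below (assuming Python with numpy; the Cobb–Douglas demand x_ℓ(p, w) = a_ℓ w/p_ℓ and the crude finite-difference derivatives are illustrative choices) checks Propositions 5, 6, and 7:

import numpy as np

a = np.array([0.3, 0.7])             # illustrative expenditure shares

def demand(p, w):
    return a * w / p                 # Cobb-Douglas demand x(p, w)

p, w, eps = np.array([2.0, 5.0]), 10.0, 1e-6
Dp = np.column_stack([(demand(p + eps * np.eye(2)[k], w) - demand(p, w)) / eps
                      for k in range(2)])           # Jacobian in prices
Dw = (demand(p, w + eps) - demand(p, w)) / eps       # derivative in wealth
print(Dp @ p + Dw * w)         # ≈ 0  (Proposition 5)
print(p @ Dp + demand(p, w))   # ≈ 0  (Proposition 6)
print(p @ Dw)                  # ≈ 1  (Proposition 7)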
D EFINITION 11. The Walrasian demand function x(p, w) satisfies the weak axiom of revealed preference if the following property holds for any two price–wealth situations (p, w) and (p′, w′):
\[
\text{If } p \cdot x(p', w') \le w \text{ and } x(p', w') \ne x(p, w) \text{ then } p' \cdot x(p, w) > w'.
\]


Let us describe this in words. Suppose that we have had a change in prices and wealth
(say from (p, w) to (p0 , w0 )). The weak axiom then says that if the newly chosen bundle is
different from the old bundle and the new bundle had been affordable in the old situation
then the previously chosen bundle must no longer be affordable.
CHAPTER 4

Consumer Behaviour: Optimisation Subject to the Budget Constraint

1. Constrained Maximisation
1.1. Lagrange Multipliers. Consider the problem of a consumer who seeks to distribute his income across the purchase of the two goods that he consumes, subject to the constraint that he spends no more than his total income. Let us denote the amount of the first good that he buys x_1 and the amount of the second good x_2, the prices of the two goods p_1 and p_2, and the consumer's income y. The utility that the consumer obtains from consuming x_1 units of good 1 and x_2 units of good 2 is denoted u(x_1, x_2). Thus the consumer's problem is to maximise u(x_1, x_2) subject to the constraint that p_1 x_1 + p_2 x_2 ≤ y. (We shall soon write p_1 x_1 + p_2 x_2 = y, i.e., we shall assume that the consumer must spend all of his income.) Before discussing the solution of this problem let's write it in a more "mathematical" way:
\[
\begin{aligned}
\max_{x_1, x_2} \;\; & u(x_1, x_2) \\
\text{subject to } \;\; & p_1 x_1 + p_2 x_2 = y
\end{aligned}
\tag{5}
\]
We read this "Choose x_1 and x_2 to maximise u(x_1, x_2) subject to the constraint that p_1 x_1 + p_2 x_2 = y."
Let us assume, as usual, that the indifference curves (i.e., the sets of points (x1 , x2 ) for
which u(x1 , x2 ) is a constant) are convex to the origin. Let us also assume that the indif-
ference curves are nice and smooth. Then the point (x1∗ , x2∗ ) that solves the maximisation
problem (5) is the point at which the indifference curve is tangent to the budget line as
given in Figure 1.
One thing we can say about the solution is that at the point (x1∗ , x2∗ ) it must be true that
the marginal utility with respect to good 1 divided by the price of good 1 must equal the
marginal utility with respect to good 2 divided by the price of good 2. For if this were
not true then the consumer could, by decreasing the consumption of the good for which
this ratio was lower and increasing the consumption of the other good, increase his utility.
Marginal utilities are, of course, just the partial derivatives of the utility function. Thus we
have
\[
\frac{\partial u(x_1^*, x_2^*)/\partial x_1}{p_1} = \frac{\partial u(x_1^*, x_2^*)/\partial x_2}{p_2}. \tag{6}
\]
The argument we have just made seems very “economic.” It is easy to give an alternate
argument that does not explicitly refer to the economic intuition. Let x_2^u be the function that defines the indifference curve through the point (x_1^*, x_2^*), i.e.,
\[
u(x_1, x_2^u(x_1)) \equiv \bar{u} \equiv u(x_1^*, x_2^*).
\]
Now, totally differentiating this identity gives
\[
\frac{\partial u}{\partial x_1}(x_1, x_2^u(x_1)) + \frac{\partial u}{\partial x_2}(x_1, x_2^u(x_1)) \frac{dx_2^u}{dx_1}(x_1) = 0.
\]

[Figure 1: the solution (x_1^*, x_2^*), where the indifference curve u(x_1, x_2) = ū is tangent to the budget line p_1 x_1 + p_2 x_2 = y.]

That is,
\[
\frac{dx_2^u}{dx_1}(x_1) = -\frac{\partial u/\partial x_1(x_1, x_2^u(x_1))}{\partial u/\partial x_2(x_1, x_2^u(x_1))}.
\]
Now x_2^u(x_1^*) = x_2^*. Thus the slope of the indifference curve at the point (x_1^*, x_2^*) is
\[
\frac{dx_2^u}{dx_1}(x_1^*) = -\frac{\partial u/\partial x_1(x_1^*, x_2^*)}{\partial u/\partial x_2(x_1^*, x_2^*)}.
\]
Also, the slope of the budget line is −p_1/p_2. Combining these two results again gives result (6).
Since we also have another equation that (x_1^*, x_2^*) must satisfy, viz.
\[
p_1 x_1^* + p_2 x_2^* = y \tag{7}
\]
we have two equations in two unknowns and we can (if we know what the utility function is and what p_1, p_2, and y are) go happily away and solve the problem. (This isn't quite true but we shall not go into that at this point.) What we shall develop is a systematic and useful way to obtain the conditions (6) and (7). Let us first denote the common value of the ratios in (6) by λ. That is,
\[
\frac{\partial u(x_1^*, x_2^*)/\partial x_1}{p_1} = \lambda = \frac{\partial u(x_1^*, x_2^*)/\partial x_2}{p_2}
\]
and we can rewrite this and (7) as
\[
\begin{aligned}
\frac{\partial u}{\partial x_1}(x_1^*, x_2^*) - \lambda p_1 &= 0 \\
\frac{\partial u}{\partial x_2}(x_1^*, x_2^*) - \lambda p_2 &= 0 \tag{8} \\
y - p_1 x_1^* - p_2 x_2^* &= 0.
\end{aligned}
\]

Now we have three equations in x_1^*, x_2^*, and the new artificial or auxiliary variable λ. Again we can, perhaps, solve these equations for x_1^*, x_2^*, and λ. Consider the following function
\[
\mathcal{L}(x_1, x_2, \lambda) = u(x_1, x_2) + \lambda(y - p_1 x_1 - p_2 x_2). \tag{9}
\]
This function is known as the Lagrangian. Now, if we calculate ∂L/∂x_1, ∂L/∂x_2, and ∂L/∂λ, and set the results equal to zero we obtain exactly the equations given in (8). We now describe this technique in a somewhat more general way.
Suppose that we have the following maximisation problem
\[
\max_{x_1, \ldots, x_n} f(x_1, \ldots, x_n) \quad \text{subject to } g(x_1, \ldots, x_n) = c \tag{10}
\]
and we let
\[
\mathcal{L}(x_1, \ldots, x_n, \lambda) = f(x_1, \ldots, x_n) + \lambda(c - g(x_1, \ldots, x_n)) \tag{11}
\]
then if (x_1^*, \ldots, x_n^*) solves (10) there is a value of λ, say λ^*, such that
\[
\frac{\partial \mathcal{L}}{\partial x_i}(x_1^*, \ldots, x_n^*, \lambda^*) = 0 \quad i = 1, \ldots, n \tag{12}
\]
\[
\frac{\partial \mathcal{L}}{\partial \lambda}(x_1^*, \ldots, x_n^*, \lambda^*) = 0. \tag{13}
\]
Notice that the conditions (12) are precisely the first order conditions for choosing
x1 , . . . , xn to maximise L , once λ ∗ has been chosen. This provides an intuition into this
method of solving the constrained maximisation problem. In the constrained problem we
have told the decision maker that he must satisfy g(x1 , . . . , xn ) = c and that he should choose
among all points that satisfy this constraint the point at which f (x1 , . . . , xn ) is greatest. We
arrive at the same answer if we tell the decision maker to choose any point he wishes but
that for each unit by which he violates the constraint g(x1 , . . . , xn ) = c we shall take away λ
units from his payoff. Of course we must be careful to choose λ to be the correct value. If
we choose λ too small the decision maker may choose to violate his constraint—e.g., if we
made the penalty for spending more than the consumer’s income very small the consumer
would choose to consume more goods than he could afford and to pay the penalty in utility
terms. On the other hand if we choose λ too large the decision maker may violate his
constraint in the other direction, e.g., the consumer would choose not to spend any of his
income and just receive λ units of utility for each unit of his income.
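To see this machinery at work on the two-good consumer problem of Section 1.1, here is a small symbolic sketch (assuming Python with sympy is available; the logarithmic utility function and the numbers are purely illustrative):

import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', positive=True)
p1, p2, y = 2, 5, 100                        # illustrative prices and income
u = sp.log(x1) + sp.log(x2)                  # an assumed utility function
L = u + lam * (y - p1 * x1 - p2 * x2)        # the Lagrangian, as in (9)
foc = [sp.diff(L, v) for v in (x1, x2, lam)] # the conditions (12) and (13)
print(sp.solve(foc, [x1, x2, lam], dict=True))
# x1 = 25, x2 = 10, lam = 1/50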
It is possible to give a more general statement of this technique, allowing for multiple
constraints. (Of course, we should always have fewer constraints than we have variables.)
Suppose we have more than one constraint. Consider the problem
\[
\begin{aligned}
\max_{x_1, \ldots, x_n} \;\; & f(x_1, \ldots, x_n) \\
\text{subject to } \;\; & g_1(x_1, \ldots, x_n) = c_1 \\
& \;\; \vdots \\
& g_m(x_1, \ldots, x_n) = c_m.
\end{aligned}
\]
Again we construct the Lagrangian
\[
\mathcal{L}(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_m) = f(x_1, \ldots, x_n) + \lambda_1(c_1 - g_1(x_1, \ldots, x_n)) + \cdots + \lambda_m(c_m - g_m(x_1, \ldots, x_n)) \tag{14}
\]

and again if (x_1^*, \ldots, x_n^*) solves this problem there are values of λ, say λ_1^*, \ldots, λ_m^*, such that
\[
\frac{\partial \mathcal{L}}{\partial x_i}(x_1^*, \ldots, x_n^*, \lambda_1^*, \ldots, \lambda_m^*) = 0 \quad i = 1, \ldots, n
\]
\[
\frac{\partial \mathcal{L}}{\partial \lambda_j}(x_1^*, \ldots, x_n^*, \lambda_1^*, \ldots, \lambda_m^*) = 0 \quad j = 1, \ldots, m. \tag{15}
\]

1.2. Caveats and Extensions. Notice that we have been referring to the set of condi-
tions which a solution to the maximisation problem must satisfy. (We call such conditions
necessary conditions.) So far we have not even claimed that there necessarily is a solution
to the maximisation problem. There are many examples of maximisation problems which
have no solution. One example of an unconstrained problem with no solution is
\[
\max_x 2x, \tag{16}
\]
that is, maximise over the choice of x the function 2x. Clearly the greater we make x the greater is 2x, and so, since there is no upper bound on x, there is no maximum. Thus we might want to restrict maximisation problems to those in which we choose x from some bounded set. Again, this is not enough. Consider the problem
\[
\max_{0 \le x \le 1} 1/x. \tag{17}
\]

The smaller we make x the greater is 1/x and yet at zero 1/x is not even defined. We could
define the function to take on some value at zero, say 7. But then the function would not
be continuous. Or we could leave zero out of the feasible set for x, say 0 < x ≤ 1. Then
the set of feasible x is not closed. Since there would obviously still be no solution to the
maximisation problem in these cases we shall want to restrict maximisation problems to those in which we choose x to maximise some continuous function over some set that is both closed and (because of the first example) bounded. (We call a set of numbers, or more generally
a set of vectors, that is both closed and bounded a compact set.) Is there anything else that
could go wrong? No! The following result says that if the function to be maximised is
continuous and the set over which we are choosing is both closed and bounded, i.e., is
compact, then there is a solution to the maximisation problem.
T HEOREM 2 (The Weierstrauss Theorem). Let S be a compact set. Let f be a continu-
ous function that takes each point in S to a real number. (We usually write: let f : S → R be
continuous.) Then there is some x∗ in S at which the function is maximised. More precisely,
there is some x∗ in S such that f (x∗ ) ≥ f (x) for any x in S.
Notice that in defining such compact sets we typically use inequalities, such as x ≥ 0.
However in Section 1 we did not consider such constraints, but rather considered only
equality constraints. However, even in the example of utility maximisation at the beginning
of Section 1.1, there were implicitly constraints on x1 and x2 of the form
x1 ≥ 0, x2 ≥ 0.
A truly satisfactory treatment would make such constraints explicit. It is possible to ex-
plicitly treat the maximisation problem with inequality constraints, at the price of a little
additional complexity. We shall return to this question later in the book.
Also, notice that had we wished to solve a minimisation problem we could have trans-
formed the problem into a maximisation problem by simply multiplying the objective func-
tion by −1. That is, if we wish to minimise f (x) we could do so by maximising − f (x).
As an exercise write out the conditions analogous to the conditions (8) for the case that we
wanted to minimise u(x). Notice that if x1∗ , x2∗ , and λ satisfy the original equations then x1∗ ,
x2∗ , and −λ satisfy the new equations. Thus we cannot tell whether there is a maximum
at (x1∗ , x2∗ ) or a minimum. This corresponds to the fact that in the case of a function of a
single variable over an unconstrained domain at a maximum we require the first derivative

to be zero, but that to know for sure that we have a maximum we must look at the second
derivative. We shall not develop the analogous conditions for the constrained problem with
many variables here. However, again, we shall return to it later in the book.

2. The Theorem of the Maximum


Often in economics we are not so much interested in what the solution to a particular
maximisation problem is but rather wish to know how the solution to a parameterised
problem depends on the parameters. Thus in our first example of utility maximisation
we might be interested not so much in what the solution to the maximisation problem is
when p1 = 2, p2 = 7, and y = 25, but rather in how the solution depends on p1 , p2 , and
y. (That is, we might be interested in the demand function.) Sometimes we shall also be
interested in how the maximised function depends on the parameters—in the example how
the maximised utility depends on p1 , p2 , and y.
This raises a number of questions. In order for us to speak meaningfully of a de-
mand function it should be the case that the maximisation problem has a unique solution.
Further, we would like to know if the “demand” function is continuous—or even if it is
differentiable. Consider again the constrained maximisation problem above, but this time let us explicitly add some parameters:
\[
\begin{aligned}
\max_{x_1, \ldots, x_n} \;\; & f(x_1, \ldots, x_n, a_1, \ldots, a_k) \\
\text{subject to } \;\; & g_1(x_1, \ldots, x_n, a_1, \ldots, a_k) = c_1 \\
& \;\; \vdots \\
& g_m(x_1, \ldots, x_n, a_1, \ldots, a_k) = c_m
\end{aligned}
\tag{18}
\]
In order to be able to say whether or not the problem has a unique solution it is useful to know something about the shape or curvature of the functions f and g. We say a function is concave if for any two points in the domain of the function the value of the function at a weighted average of the two points is at least as great as the weighted average of the values of the function at the two points. We say the function is convex if the value of the function at the average is no greater than the average of the values. The following definition makes this a little more explicit. (In both definitions x = (x_1, \ldots, x_n) is a vector.)
D EFINITION 12. A function f is concave if for any x and x′ with x ≠ x′ and for any t such that 0 < t < 1 we have f(tx + (1−t)x′) ≥ t f(x) + (1−t) f(x′). The function is strictly concave if f(tx + (1−t)x′) > t f(x) + (1−t) f(x′).
A function f is convex if for any x and x′ with x ≠ x′ and for any t such that 0 < t < 1 we have f(tx + (1−t)x′) ≤ t f(x) + (1−t) f(x′). The function is strictly convex if f(tx + (1−t)x′) < t f(x) + (1−t) f(x′).
The result we are about to give is most conveniently stated when our statement of the
problem is in terms of inequality constraints rather than equality constraints. As mentioned
earlier we shall examine this kind of problem later in this course. However for the moment
in order to proceed with our discussion of the problem involving equality constraints we
shall assume that all of the functions with which we are dealing are increasing in the x
variables. (See Exercise 1 for a formal definition of what it means for a function to be
increasing.) In this case if f is strictly concave and g j is convex for each j then the prob-
lem has a unique solution. In fact the concepts of concavity and convexity are somewhat
stronger than is required. We shall see later in the course that they can be replaced by the
concepts of quasi-concavity and quasi-convexity. In some sense these latter concepts are
the “right” concepts for this result.
T HEOREM 3. Suppose that f and g_j are increasing in (x_1, \ldots, x_n). If f is strictly concave in (x_1, \ldots, x_n) and g_j is convex in (x_1, \ldots, x_n) for j = 1, \ldots, m then for each value of the parameters (a_1, \ldots, a_k), if problem (18) has a solution (x_1^*, \ldots, x_n^*), that solution is unique.
Now let v(a_1, \ldots, a_k) be the maximised value of f when the parameters are (a_1, \ldots, a_k). Let us suppose that the problem is such that the solution is unique and that (x_1^*(a_1, \ldots, a_k), \ldots, x_n^*(a_1, \ldots, a_k)) are the values that maximise the function f when the parameters are (a_1, \ldots, a_k). Then
\[
v(a_1, \ldots, a_k) = f(x_1^*(a_1, \ldots, a_k), \ldots, x_n^*(a_1, \ldots, a_k), a_1, \ldots, a_k). \tag{19}
\]
(Notice however that the function v is uniquely defined even if there is not a unique maximiser.)
The Theorem of the Maximum gives conditions on the problem under which the func-
tion v and the functions x1∗ , . . . , xn∗ are continuous. The constraints in the problem (18)
define a set of feasible vectors x over which the function f is to be maximised. Let us call
this set G(a_1, \ldots, a_k), i.e.,
\[
G(a_1, \ldots, a_k) = \{(x_1, \ldots, x_n) \mid g_j(x_1, \ldots, x_n, a_1, \ldots, a_k) = c_j \text{ for all } j\}. \tag{20}
\]
Now we can restate the problem as
\[
\max_{x_1, \ldots, x_n} f(x_1, \ldots, x_n, a_1, \ldots, a_k) \quad \text{subject to } (x_1, \ldots, x_n) \in G(a_1, \ldots, a_k). \tag{21}
\]
Notice that both the function f and the feasible set G depend on the parameters a,
i.e., both may change as a changes. The Theorem of the Maximum requires both that the
function f be continuous as a function of x and a and that the feasible set G(a1 , . . . , ak )
change continuously as a changes. We already know—or should know—what it means for
f to be continuous but the notion of what it means for a set to change continuously is less
elementary. We call G a set valued function or a correspondence. G associates with any
vector (a1 , . . . , ak ) a subset of the vectors (x1 , . . . , xn ). The following two definitions define
what we mean by a correspondence being continuous. First we define what it means for
two sets to be close.
D EFINITION 13. Two sets of vectors A and B are within ε of each other if for any vector x in one set there is a vector x′ in the other set such that x′ is within ε of x.
We can now define the continuity of the correspondence G in essentially the same way that we define the continuity of a single valued function.
D EFINITION 14. The correspondence G is continuous at (a_1, \ldots, a_k) if for any ε > 0 there is δ > 0 such that if (a′_1, \ldots, a′_k) is within δ of (a_1, \ldots, a_k) then G(a′_1, \ldots, a′_k) is within ε of G(a_1, \ldots, a_k).
It is, unfortunately, not the case that the continuity of the functions g j necessarily im-
plies the continuity of the feasible set. (Exercise 2 asks you to construct a counterexample.)
R EMARK 1. It is possible to define two weaker notions of continuity, which we call
upper hemicontinuity and lower hemicontinuity. A correspondence is in fact continuous in
the way we have defined it if it is both upper hemicontinuous and lower hemicontinuous.
We are now in a position to state the Theorem of the Maximum. We assume that f is
a continuous function, that G is a continuous correspondence, and that for any (a1 , . . . , ak )
the set G(a1 , . . . , ak ) is compact. The Weierstrauss Theorem thus guarantees that there is a
solution to the maximisation problem (21) for any (a1 , . . . , ak ).
T HEOREM 4 (Theorem of the Maximum). Suppose that f(x_1, \ldots, x_n, a_1, \ldots, a_k) is continuous (in (x_1, \ldots, x_n, a_1, \ldots, a_k)), that G(a_1, \ldots, a_k) is a continuous correspondence, and that for any (a_1, \ldots, a_k) the set G(a_1, \ldots, a_k) is compact. Then
(1) v(a_1, \ldots, a_k) is continuous, and
(2) if (x_1^*(a_1, \ldots, a_k), \ldots, x_n^*(a_1, \ldots, a_k)) are (single valued) functions then they are also continuous.
Later in the course we shall see how the Implicit Function Theorem allows us to iden-
tify conditions under which the functions v and x∗ are differentiable.
Exercises.
E XERCISE 31. We say that the function f(x_1, \ldots, x_n) is nondecreasing if x′_i ≥ x_i for each i implies that f(x′_1, \ldots, x′_n) ≥ f(x_1, \ldots, x_n), is increasing if x′_i > x_i for each i implies that f(x′_1, \ldots, x′_n) > f(x_1, \ldots, x_n), and is strictly increasing if x′_i ≥ x_i for each i and x′_j > x_j for at least one j implies that f(x′_1, \ldots, x′_n) > f(x_1, \ldots, x_n). Show that if f is nondecreasing and strictly concave then it must be strictly increasing. [Hint: This is very easy.]
E XERCISE 32. Show by example that even if the functions g_j are continuous the correspondence G may not be continuous. [Hint: Use the case n = m = k = 1.]

3. The Envelope Theorem


In this section we examine a theorem that is particularly useful in the study of con-
sumer and producer theory. There is in fact nothing mysterious about this theorem. You
will see that the proof of this theorem is simply calculation and a number of substitutions.
Moreover the theorem has a very clear intuition. It is this: Suppose we are at a maximum
(in an unconstrained problem) and we change the data of the problem by a very small
amount. Now both the solution of the problem and the value at the maximum will change.
However at a maximum the function is flat (the first derivative is zero). Thus when we want to know by how much the maximised value has changed it does not matter (very much) whether or not we take account of how the maximiser changes. See Figure 2. The
intuition for a constrained problem is similar and only a little more complicated.
[Figure 2: the graphs of f(·, a) and f(·, a′). Because f(·, a′) is flat at its maximiser x*(a′), the values f(x*(a), a′) and f(x*(a′), a′) are close, so the change in the maximised value can be computed holding the maximiser fixed.]

To motivate our discussion of the Envelope Theorem we will first consider a particular
case, viz, the relation between short and long run average cost curves. Recall that, in
general we assume that the average cost of producing some good is a function of the amount
of the good to be produced. The short run average cost function is defined to be the function
which for any quantity, Q, gives the average cost of producing that quantity, taking as given
the scale of operation, i.e., the size and number of plants and other fixed capital which we
assume cannot be changed in the short run (whatever that is). The long run average cost
function on the other hand gives, as a function of Q, the average cost of producing Q units
of the good, with the scale of operation selected to be the optimal scale for that level of
production.
That is, if we let the scale of operation be measured by a single variable k, say, and
we let the short run average cost of producing Q units when the scale is k be given by
SRAC(Q, k) and the long run average cost of producing Q units by LRAC(Q), then we have
\[
LRAC(Q) = \min_k SRAC(Q, k).
\]

Let us denote, for a given value Q, the optimal level of k by k(Q). That is, k(Q) is the value
of k that minimises the right hand side of the above equation.
Graphically, for any fixed level of k the short run average cost function can be rep-
resented by a curve (normally assumed to be U-shaped) drawn in two dimensions with
quantity on the horizontal axis and cost on the vertical axis. Now think about drawing one
short run average cost curve for each of the (infinite) possible values of k. One way of
thinking about the long run average cost curve is as the “bottom” or envelope of these short
run average cost curves. Suppose that we consider a point on this long run or envelope
curve. What can be said about the slope of the long run average cost curve at this point? A little thought should convince you that it should be the same as the slope of the short run curve through the same point. (If it were not then that short run curve would come below the long run curve, a contradiction.) That is,
\[
\frac{d\,LRAC(Q)}{dQ} = \frac{\partial\,SRAC(Q, k(Q))}{\partial Q}.
\]
See Figure 3.

[Figure 3: the long run average cost curve LRAC as the lower envelope of the short run average cost curves; at Q̄ we have LRAC(Q̄) = SRAC(Q̄, k(Q̄)) and the two curves have the same slope.]

The envelope theorem is a general statement of the result of which this is a special
case. We will consider not only cases in which Q and k are vectors, but also cases in which
the maximisation or minimisation problem includes some constraints.
Let us consider again the maximisation problem (18). Recall:
\[
\begin{aligned}
\max_{x_1, \ldots, x_n} \;\; & f(x_1, \ldots, x_n, a_1, \ldots, a_k) \\
\text{subject to } \;\; & g_j(x_1, \ldots, x_n, a_1, \ldots, a_k) = c_j, \quad j = 1, \ldots, m.
\end{aligned}
\]
Again let L(x_1, \ldots, x_n, λ_1, \ldots, λ_m; a_1, \ldots, a_k) be the Lagrangian function:
\[
\mathcal{L}(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_m; a_1, \ldots, a_k) = f(x_1, \ldots, x_n, a_1, \ldots, a_k) + \sum_{j=1}^{m} \lambda_j \big(c_j - g_j(x_1, \ldots, x_n, a_1, \ldots, a_k)\big). \tag{22}
\]

Let (x_1^*(a_1, \ldots, a_k), \ldots, x_n^*(a_1, \ldots, a_k)) and (λ_1(a_1, \ldots, a_k), \ldots, λ_m(a_1, \ldots, a_k)) be the values of x and λ that solve this problem. Now let
\[
v(a_1, \ldots, a_k) = f(x_1^*(a_1, \ldots, a_k), \ldots, x_n^*(a_1, \ldots, a_k), a_1, \ldots, a_k). \tag{23}
\]
That is, v(a_1, \ldots, a_k) is the maximised value of the function f when the parameters are (a_1, \ldots, a_k). The envelope theorem says that the derivative of v is equal to the derivative of L at the maximising values of x and λ. Or, more precisely
T HEOREM 5 (The Envelope Theorem). If all functions are defined as above and the problem is such that the functions x^* and λ are well defined then
\[
\begin{aligned}
\frac{\partial v}{\partial a_h}(a_1, \ldots, a_k) &= \frac{\partial \mathcal{L}}{\partial a_h}\big(x_1^*(a_1, \ldots, a_k), \ldots, x_n^*(a_1, \ldots, a_k), \lambda_1(a_1, \ldots, a_k), \ldots, \lambda_m(a_1, \ldots, a_k), a_1, \ldots, a_k\big) \\
&= \frac{\partial f}{\partial a_h}\big(x_1^*(a_1, \ldots, a_k), \ldots, x_n^*(a_1, \ldots, a_k), a_1, \ldots, a_k\big) \\
&\quad - \sum_{j=1}^{m} \lambda_j(a_1, \ldots, a_k) \frac{\partial g_j}{\partial a_h}\big(x_1^*(a_1, \ldots, a_k), \ldots, x_n^*(a_1, \ldots, a_k), a_1, \ldots, a_k\big)
\end{aligned}
\]
for all h.
In order to show the advantages of using matrix and vector notation we shall restate
the theorem in that notation before returning to give a proof of the theorem. (In proving
the theorem we shall return to using mainly scalar notation.)
T HEOREM 5 (The Envelope Theorem). Under the same conditions as above
\[
\frac{\partial v}{\partial a}(a) = \frac{\partial \mathcal{L}}{\partial a}(x^*(a), \lambda(a), a) = \frac{\partial f}{\partial a}(x^*(a), a) - \lambda(a) \frac{\partial g}{\partial a}(x^*(a), a).
\]
P ROOF. From the definition of the function v we have
\[
v(a_1, \ldots, a_k) = f(x_1^*(a_1, \ldots, a_k), \ldots, x_n^*(a_1, \ldots, a_k), a_1, \ldots, a_k). \tag{24}
\]
Thus
\[
\frac{\partial v}{\partial a_h}(a) = \frac{\partial f}{\partial a_h}(x^*(a), a) + \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(x^*(a), a) \frac{\partial x_i^*}{\partial a_h}(a). \tag{25}
\]
Now, from the first order conditions (12) we have
\[
\frac{\partial f}{\partial x_i}(x^*(a), a) - \sum_{j=1}^{m} \lambda_j(a) \frac{\partial g_j}{\partial x_i}(x^*(a), a) = 0.
\]
Or
\[
\frac{\partial f}{\partial x_i}(x^*(a), a) = \sum_{j=1}^{m} \lambda_j(a) \frac{\partial g_j}{\partial x_i}(x^*(a), a). \tag{26}
\]
Also, since x^*(a) satisfies the constraints we have, for each j,
\[
g_j(x_1^*(a), \ldots, x_n^*(a), a_1, \ldots, a_k) \equiv c_j.
\]
And, since this holds as an identity, we may differentiate both sides with respect to a_h, giving
\[
\sum_{i=1}^{n} \frac{\partial g_j}{\partial x_i}(x^*(a), a) \frac{\partial x_i^*}{\partial a_h}(a) + \frac{\partial g_j}{\partial a_h}(x^*(a), a) = 0.
\]
Or
\[
\sum_{i=1}^{n} \frac{\partial g_j}{\partial x_i}(x^*(a), a) \frac{\partial x_i^*}{\partial a_h}(a) = -\frac{\partial g_j}{\partial a_h}(x^*(a), a). \tag{27}
\]

Substituting (26) into (25) gives
\[
\frac{\partial v}{\partial a_h}(a) = \frac{\partial f}{\partial a_h}(x^*(a), a) + \sum_{i=1}^{n} \Big[ \sum_{j=1}^{m} \lambda_j(a) \frac{\partial g_j}{\partial x_i}(x^*(a), a) \Big] \frac{\partial x_i^*}{\partial a_h}(a).
\]
Changing the order of summation gives
\[
\frac{\partial v}{\partial a_h}(a) = \frac{\partial f}{\partial a_h}(x^*(a), a) + \sum_{j=1}^{m} \lambda_j(a) \Big[ \sum_{i=1}^{n} \frac{\partial g_j}{\partial x_i}(x^*(a), a) \frac{\partial x_i^*}{\partial a_h}(a) \Big]. \tag{28}
\]
And now substituting (27) into (28) gives
\[
\frac{\partial v}{\partial a_h}(a) = \frac{\partial f}{\partial a_h}(x^*(a), a) - \sum_{j=1}^{m} \lambda_j(a) \frac{\partial g_j}{\partial a_h}(x^*(a), a),
\]
which is the required result. □
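The theorem is easy to check symbolically on a small example. The following sketch (assuming Python with sympy; the objective x_1 x_2 and the single constraint are illustrative) treats the constraint constant as the parameter and confirms that dv/dy equals the multiplier λ, exactly as ∂L/∂y = λ predicts:

import sympy as sp

x1, x2, lam, y = sp.symbols('x1 x2 lam y', positive=True)
f = x1 * x2                                    # an assumed objective
constraint = y - (2 * x1 + 5 * x2)             # g(x) = c, with c = y the parameter
L = f + lam * constraint                       # the Lagrangian
sol = sp.solve([sp.diff(L, x1), sp.diff(L, x2), constraint],
               [x1, x2, lam], dict=True)[0]    # x1 = y/4, x2 = y/10, lam = y/20
v = f.subs(sol)                                # maximised value v(y) = y**2/40
print(sp.simplify(sp.diff(v, y) - sol[lam]))   # 0: dv/dy equals lambda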

Exercises.
E XERCISE 33. Rewrite this proof using matrix notation. Go through your proof and identify the dimension of each of the vectors or matrices you use. For example f_x is a 1 × n vector, g_x is an m × n matrix.

4. Applications to Microeconomic Theory


4.1. Utility Maximisation. Let us again consider the problem given in (5)
\[
\max_{x_1, x_2} u(x_1, x_2) \quad \text{subject to } p_1 x_1 + p_2 x_2 - y = 0.
\]
Let v(p_1, p_2, y) be the maximised value of u when prices and income are p_1, p_2, and y. Let us consider the effect of a change in y with p_1 and p_2 remaining constant. By the Envelope Theorem
\[
\frac{\partial v}{\partial y} = \frac{\partial}{\partial y}\{u(x_1, x_2) + \lambda(y - p_1 x_1 - p_2 x_2)\} = 0 + \lambda \cdot 1 = \lambda.
\]
This is the familiar result that λ is the marginal utility of income.

4.2. Expenditure Minimisation. Let us consider the problem of minimising expenditure subject to attaining a given level of utility, i.e.,
\[
\min_{x_1, \ldots, x_n} \sum_{i=1}^{n} p_i x_i \quad \text{subject to } u(x_1, \ldots, x_n) - u_0 = 0.
\]
Let the minimised value of expenditure be denoted by e(p_1, \ldots, p_n, u_0). Then by the Envelope Theorem we obtain
\[
\frac{\partial e}{\partial p_i} = \frac{\partial}{\partial p_i}\Big\{\sum_{j=1}^{n} p_j x_j + \lambda\big(u_0 - u(x_1, \ldots, x_n)\big)\Big\} = x_i - \lambda \cdot 0 = x_i
\]
when evaluated at the point which solves the minimisation problem, which we write as h_i(p_1, \ldots, p_n, u_0) to distinguish this (compensated) value of the demand for good i as a function of prices and utility from the (uncompensated) value of the demand for good i as a function of prices and income. This result is known as Hotelling's Theorem.

4.3. The Hicks-Slutsky Equations. It can be shown that the compensated demand at utility u_0, i.e., h_i(p_1, \ldots, p_n, u_0), is equal to the uncompensated demand at income e(p_1, \ldots, p_n, u_0), i.e., x_i(p_1, \ldots, p_n, e(p_1, \ldots, p_n, u_0)). (This result is known as the duality theorem.) Thus totally differentiating the identity
\[
x_i(p_1, \ldots, p_n, e(p_1, \ldots, p_n, u_0)) \equiv h_i(p_1, \ldots, p_n, u_0)
\]
with respect to p_k we obtain
\[
\frac{\partial x_i}{\partial p_k} + \frac{\partial x_i}{\partial y}\frac{\partial e}{\partial p_k} = \frac{\partial h_i}{\partial p_k}
\]
which by Hotelling's Theorem gives
\[
\frac{\partial x_i}{\partial p_k} + \frac{\partial x_i}{\partial y} h_k = \frac{\partial h_i}{\partial p_k}.
\]
So
\[
\frac{\partial x_i}{\partial p_k} = \frac{\partial h_i}{\partial p_k} - h_k \frac{\partial x_i}{\partial y}
\]
for all i, k = 1, \ldots, n. These are the Hicks-Slutsky equations.

4.4. The Indirect Utility Function. Again let v(p_1, \ldots, p_n, y) be the indirect utility function, that is, the maximised value of utility as described in Section 4.1. Then by the Envelope Theorem
\[
\frac{\partial v}{\partial p_i} = \frac{\partial u}{\partial p_i} - \lambda x_i(p_1, \ldots, p_n, y) = -\lambda x_i(p_1, \ldots, p_n, y)
\]
since ∂u/∂p_i = 0. Now, since we have already shown that λ = ∂v/∂y (in Section 4.1) we have
\[
x_i(p_1, \ldots, p_n, y) = -\frac{\partial v/\partial p_i}{\partial v/\partial y}.
\]
This is known as Roy's Theorem.
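Roy's Theorem too can be verified symbolically. A minimal sketch (assuming sympy; the indirect utility below, which arises from the Cobb–Douglas utility u = √(x_1 x_2), is an assumed example):

import sympy as sp

p1, p2, y = sp.symbols('p1 p2 y', positive=True)
v = y / (2 * sp.sqrt(p1 * p2))           # indirect utility for u = sqrt(x1*x2)
x1 = -sp.diff(v, p1) / sp.diff(v, y)     # Roy's Theorem
print(sp.simplify(x1))                   # y/(2*p1), the Walrasian demand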



4.5. Profit Functions. Now consider the problem of a firm that maximises profits subject to technology constraints. Let x = (x_1, \ldots, x_n) be a vector of netputs, i.e., x_i is positive if the firm is a net supplier of good i, negative if the firm is a net user of that good. Let us assume that we can write the technology constraint as F(x) = 0. Thus the firm's problem is
\[
\max_{x_1, \ldots, x_n} \sum_{i=1}^{n} p_i x_i \quad \text{subject to } F(x_1, \ldots, x_n) = 0.
\]
Let φ_i(p) be the value of x_i that solves this problem, i.e., the net supply of commodity i when prices are p. (Here p is a vector.) We call the maximised value the profit function, which is given by
\[
\Pi(p) = \sum_{i=1}^{n} p_i \varphi_i(p).
\]
And so by the Envelope Theorem
\[
\frac{\partial \Pi}{\partial p_i} = \varphi_i(p).
\]
This result is known as Hotelling's lemma.
Exercises.
E XERCISE 34. Consider the direct utility function
\[
u(x) = \sum_{i=1}^{n} \beta_i \log(x_i - \gamma_i),
\]
where β_i and γ_i, i = 1, \ldots, n, are, respectively, positive and nonpositive parameters.
(1) Derive the indirect utility function and show that it is decreasing in its arguments.
(2) Verify Roy's Theorem.
(3) Derive the expenditure function and show that it is homogeneous of degree one and nondecreasing in prices.
(4) Verify Hotelling's Theorem.

E XERCISE 35. For the utility function defined in the previous exercise,
(1) Derive the Slutsky equation.
(2) Let d_i(p, y) be the demand for good i derived from the above utility function. Goods i and j are said to be gross substitutes if ∂d_i(p, y)/∂p_j > 0 and gross complements if ∂d_i(p, y)/∂p_j < 0. For this utility function are the various goods gross substitutes, gross complements, or can we not say?
(The two previous exercises are taken from R. Robert Russell and Maurice Wilkinson,
Microeconomics: A Synthesis of Modern and Neoclassical Theory, New York, John Wiley
& Sons, 1979.)
E XERCISE 36. An electric utility has two generating plants in which total costs per hour are c_1 and c_2 respectively, where
\[
c_1 = 80 + 2x_1 + 0.001\,b\,x_1^2, \qquad b > 0,
\]
\[
c_2 = 90 + 1.5x_2 + 0.002x_2^2,
\]
where x_i is the quantity generated in the i-th plant. If the utility is required to produce 2000 megawatts in a particular hour, how should it allocate this load between the plants so as to minimise costs? Use the Lagrangian method and interpret the multiplier. How do total costs vary as b changes? (That is, what is the derivative of the minimised cost with respect to b?)
CHAPTER 5

Decision Making Under Uncertainty

In this chapter we examine decision making under uncertainty. This is another special
case (like consumer theory) of the general theory of motivated decision making that we
developed in the previous chapter. As was the case with consumer theory the extra structure
that our model of decision making under uncertainty puts on the problem allows us to say
a bit more than we could in the very general case. It is well to remember however that
there are some facts about this model that follow simply from the fact that this model is a
special case of the general model of motivated decision making.
It is possible to divide the models of decision making under uncertainty into those in which the uncertainty is assumed to have an objective character, with uncertain events having a particular objectively given probability, and those in which the subjective probability that a decision maker assigns to a given event is derived from her preferences or choice behaviour along with her utility function. We shall develop here only the first kind of model.

1. Revision of Probability Theory


We start with a brief introduction to some of the essential elements of probability
theory. What we want to do is to associate a probability to each of a number of different events in which we are interested. Thus we might be interested in the probability that it will rain
tomorrow. We might also be interested in the probability that there will be an electrical failure. We might also be interested in the probability that both these events occur, or in the probability that either occurs. Moreover we might also be interested in knowing the probability that there will be an electrical failure tomorrow, given that it rains. This last kind of probability we call conditional probability.
We can start our analysis by specifying a set of atomistic states, a completely exhaus-
tive and mutually exclusive list of complete descriptions of all those aspects of reality in
which we are interested. Thus, once we have specified the state we know exactly what
has happened. There will thus, in general be a large number of states. For the purposes of
this section we shall assume that this set is finite. To each of these states is associated a
probability which, since we have assumed the set of states to be finite, we might as well assume to be strictly positive.
D EFINITION 15. A finite probability space is a pair (Ω, p) where Ω is a finite set and
p : Ω → R++ is a strictly positive function such that ∑ω∈Ω p(ω) = 1.
The content of the last part of the definition is that if we add up the probability of
each of the states we should get 1. This corresponds to the idea that the list of states is
completely exhaustive, that is, that it includes everything that might possibly happen.
An event is some subset of Ω. The probability of some event is the sum of the proba-
bilities of the states contained in the event. Let us denote the probability of the event E by
P(E). Thus
\[
P(E) = \sum_{\omega \in E} p(\omega).
\]
The idea of conditional probability of an event F conditional on another event E is the
probability of that part of E that is in F relative to the whole of E. Suppose that we know

that the event E has occurred (and that this is all we know). Thus we have essentially a
new probability distribution in which the states outside E are given probability zero and
those in E are given a proportionately greater probability so that the sum of the probability
of the states in E is 1. To calculate the probability of F in this new situation we would add
the new probabilities for all those states in F, or, since the states outside E have probability
zero, the probabilities of those states that are in both E and F. That is,

\[
P(F \mid E) = \frac{\sum_{\omega \in E \cap F} p(\omega)}{\sum_{\omega \in E} p(\omega)} = \frac{P(E \cap F)}{P(E)}.
\]

From these definitions it is possible to derive a number of results about the probability
of events.
First we see that
\[
P(E \cup F) = P(E) + P(F) - P(E \cap F).
\]
This is the easiest implication of the definitions and it is left as an exercise for you to convince yourself of its truth. (Constructing a proof of the claim is clearly a good way of doing this.)
Rewriting the definition of conditional probability gives
\[
P(E \cap F) = P(E) \cdot P(F \mid E). \tag{29}
\]
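These definitions translate directly into a few lines of code. Here is a sketch of a finite probability space (in Python; the two-coin example is of my own choosing) that checks the two results just stated:

from fractions import Fraction

omega = ['HH', 'HT', 'TH', 'TT']           # two fair coin tosses
p = {w: Fraction(1, 4) for w in omega}     # strictly positive, sums to 1

def prob(event):
    return sum(p[w] for w in event)

E = {'HH', 'HT'}                           # first toss is heads
F = {'HH', 'TH'}                           # second toss is heads
print(prob(E | F) == prob(E) + prob(F) - prob(E & F))  # True
print(prob(E & F) / prob(E))                           # 1/2 = P(F | E)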

1.1. Bayes’ Theorem. We turn now to something just a little less trivial. Consider a
situation in which we have some prior assessment of the uncertainty of a particular event.
For example we might believe that the chance that it will rain is 30% or 0.3. (If this seems
a bit unrealistic pretend that we are not in Auckland.)
Now we are sitting watching TV and the weather person tells us that it’s going to rain
tomorrow. How should we adjust our assessment of the probability that it will rain tomor-
row? Suppose that we know that the weather person is pretty good and makes mistakes
with probability only 10% or 0.1. That is, if it’s actually going to rain she says it’s going to
rain with probability 0.9 and says it’s not going to rain with probability 0.1. On the other
hand, if it’s not going to rain she says it’s going to rain with probability 0.1 and says it’s not
going to rain with probability 0.9. Think about this problem and make some calculation
of what you think your assessment of the probability of rain would be after hearing the
weather person’s prediction that it will rain.
As another example suppose that there was screening for some rare disease. Suppose that one person in ten thousand actually has the disease. Suppose that the test used to show if a person has the disease is quite good. If the person has the disease the test will be positive with certainty. If the person does not have the disease the test will give a negative result 99 times out of 100, but will one time out of 100, that is, with probability 0.01, give a positive result. (Such a result is called a false positive.) If a randomly selected person is tested and gives a positive result what would be your assessment of the probability that the person actually had the disease? Again, think about this problem and try to formulate an answer before reading further.
These problems are special cases of a more general situation. Let us suppose that we
have a list of mutually exclusive and completely exhaustive events E1 , . . . , ET . We know
the (unconditional) probability of each of these events. Now we observe the result of an
experiment, say that event F has occurred. Our analysis tells us, for each t the probability
of F conditional on Et . What can we say about the probability of Et conditional on the
result F?
Well, the definition of conditional probability tells us that
\[
P(E_t \mid F) = \frac{P(E_t \cap F)}{P(F)}. \tag{30}
\]

And it also tells us (in the form given in equation (29)) that
\[
P(E_t \cap F) = P(E_t)P(F \mid E_t). \tag{31}
\]
Now we don’t know P(F). But since the events E1 , . . . , ET are mutually exclusive so are
E1 ∩ F, E2 ∩ F, . . . , ET ∩ F. And since E1 , . . . , ET are completely exhaustive E1 ∪ E2 ∪ · · · ∪
ET = Ω. Thus
F = F ∩Ω
= F ∩ (E1 ∪ E2 ∪ · · · ∪ ET )
= (F ∩ E1 ) ∪ (F ∩ E2 ) ∪ · · · ∪ (F ∩ ET )
= (E1 ∩ F) ∪ (E2 ∩ F) ∪ · · · ∪ (ET ∩ F).
And so
P(F) = P((E1 ∩ F) ∪ (E2 ∩ F) ∪ · · · ∪ (ET ∩ F))
(32) = P(E1 ∩ F) + P(E2 ∩ F) + · · · + P(ET ∩ F)
since the events E1 ∩ F, E2 ∩ F, . . . , ET ∩ F are mutually exclusive.
Collecting these results we substitute (31) and (32) into (30) to get
\[
P(E_t \mid F) = \frac{P(E_t)P(F \mid E_t)}{P(E_1 \cap F) + P(E_2 \cap F) + \cdots + P(E_T \cap F)}
= \frac{P(E_t)P(F \mid E_t)}{P(E_1)P(F \mid E_1) + P(E_2)P(F \mid E_2) + \cdots + P(E_T)P(F \mid E_T)}. \tag{33}
\]
This result is called Bayes’ Theorem.
Let us go back to our examples. In the first example let E1 be the event that it rains
tomorrow and E2 the event that it doesn’t. Let F be the event that the weather person says
that it rains tomorrow. Thus we want to know P(E1 | F). Bayes’ Theorem tells us that
\[
P(E_1 \mid F) = \frac{P(E_1)P(F \mid E_1)}{P(E_1)P(F \mid E_1) + P(E_2)P(F \mid E_2)}
= \frac{0.3 \times 0.9}{0.3 \times 0.9 + 0.7 \times 0.1}
= \frac{0.27}{0.27 + 0.07}
= \frac{27}{34} \approx 0.79.
\]
That is, once we hear the weather person’s prediction we think there is about an 80%
chance of rain tomorrow.
In the second example let E1 be the event that the person has the disease and E2 the
event that they don’t. Let F be the event that the test result is positive. Thus, again, we
want to know P(E_1 | F) and Bayes' Theorem tells us that
\[
P(E_1 \mid F) = \frac{P(E_1)P(F \mid E_1)}{P(E_1)P(F \mid E_1) + P(E_2)P(F \mid E_2)}
= \frac{0.0001 \times 1}{0.0001 \times 1 + 0.9999 \times 0.01}
= \frac{0.0001}{0.0001 + 0.009999}
= \frac{100}{10099} \approx 0.01.
\]
In this case even people who have tested positive have only about a one in a hundred
chance of having the disease. Intuitively the reason for this perhaps initially surprising
result is that the disease is so rare that it is much more likely that the person does not have
the disease and the test has given a false positive than that the person actually has the disease.
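Both calculations are mechanical enough that it may help to see them carried out in a few lines of code. The following Python sketch simply implements equation 33; the numbers are those of the two examples above.

    # Bayes' Theorem: posterior probability of each event E_t given the
    # observation F, from the priors P(E_t) and the likelihoods P(F | E_t).
    def posterior(priors, likelihoods):
        joint = [p * l for p, l in zip(priors, likelihoods)]   # P(E_t and F)
        return [j / sum(joint) for j in joint]                 # sum(joint) = P(F)

    # The weather example: E1 = rain, E2 = no rain, F = a forecast of rain.
    print(posterior([0.3, 0.7], [0.9, 0.1])[0])          # about 0.794

    # The screening example: E1 = has the disease, F = a positive test.
    print(posterior([0.0001, 0.9999], [1.0, 0.01])[0])   # about 0.0099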
1.2. Random Variables. Another important concept is that of a random variable. A
random variable is defined to be a function on the probability space Ω. (If Ω is infinite
we add a technical requirement called measurability, but we won’t be concerned with that
here.) Thus a real-valued random variable is a function that maps Ω to R. For example the
real-valued random variable X would associate with each element ω of Ω a real number
X(ω).
For a real-valued random variable we also define two additional concepts, the expec-
tation or mean of the random variable and its variance. The expectation of the random
variable X is defined to be
E(X) = ∑_{ω∈Ω} p(ω)X(ω).
Often the notation µX might be used for E(X). Also the brackets are often not used so the
same concept is denoted EX.
The variance of X is defined to be the expectation of (X − E(X))2 , or
Var(X) = ∑_{ω∈Ω} p(ω)(X(ω) − E(X))².
Often the notation σ²_X might be used for Var(X).
Given two real-valued random variables X and Y we define Cov(X,Y ) to be the expec-
tation of (X − E(X))(Y − E(Y )), or
Cov(X,Y) = ∑_{ω∈Ω} p(ω)(X(ω) − E(X))(Y(ω) − E(Y)).
Often the notation σ_XY might be used for Cov(X,Y).
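On a finite probability space these definitions translate directly into code. Here is a minimal Python sketch; the space, the probabilities, and the two random variables are invented purely for illustration.

    # Expectation, variance, and covariance on a finite probability space.
    Omega = [0, 1, 2, 3]                     # the states
    p = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}     # p(w), the probability of state w
    X = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}     # a random variable: state -> real
    Y = {0: 4.0, 1: 1.0, 2: 2.0, 3: 3.0}

    def E(Z):
        return sum(p[w] * Z[w] for w in Omega)

    def Var(Z):
        return sum(p[w] * (Z[w] - E(Z)) ** 2 for w in Omega)

    def Cov(Z, W):
        return sum(p[w] * (Z[w] - E(Z)) * (W[w] - E(W)) for w in Omega)

    print(E(X), Var(X), Cov(X, Y))           # 3.0  1.0  0.2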
2. Preferences Over Lotteries
We are now ready to introduce the basic setup in which we shall discuss decision
making under uncertainty. Here the basic building block is the idea of a lottery. A lottery
might be thought of as a random variable in the language of the previous section. However
here we shall keep the notions of a probability space well in the background. Most of the
material (and all of the notation) in this section is taken from ?.
The situation we model is one in which a decision maker chooses among a number
of alternatives or acts. Each alternative is risky and may result in any one of a number of
outcomes or consequences. The decision maker does not know when she chooses which
of the outcomes will actually occur.
More formally we let C be the set of all possible outcomes. We shall assume that
C = {1, 2, . . . , N} is finite to avoid some technical difficulties.
A lottery is a random variable taking values in C. We can however define it without
reference to an underlying probability space.
DEFINITION 16. A simple lottery L is a list L = (p1 , p2 , . . . , pN ) with pn ≥ 0 for each
n and ∑n pn = 1.
The value pn is to be thought of as the probability of outcome n occurring.
We could also think of a situation in which the outcomes themselves could be lotteries.
This leads us to the notion of a compound lottery.
DEFINITION 17. Given K simple lotteries Lk = (pk1 , pk2 , . . . , pkN ), k = 1, 2, . . . , K, and
probabilities αk ≥ 0 with ∑k αk = 1, the compound lottery (L1 , L2 , . . . , LK ; α1 , α2 , . . . , αK ) is
the risky alternative that yields the simple lottery Lk with probability αk for k = 1, 2, . . . , K.
Given a compound lottery (L1 , L2 , . . . , LK ; α1 , α2 , . . . , αK ) we can calculate the re-
duced lottery that leads to the same distribution over outcomes. This is the simple lottery
(p1 , p2 , . . . , pN ) with
pn = α1 p1n + α2 p2n + · · · + αK pKn
for n = 1, 2, . . . , N, or, in vector notation,
L = α1 L1 + α2 L2 + · · · + αK LK .
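Computing a reduced lottery is thus just a matter of taking this convex combination component by component. A small Python sketch (the particular lotteries and weights are made up):

    # Reduce a compound lottery (L1,...,LK; a1,...,aK) to the simple lottery
    # with p_n = a1*p1n + ... + aK*pKn.
    def reduce_lottery(lotteries, alphas):
        N = len(lotteries[0])
        return [sum(a * L[n] for a, L in zip(alphas, lotteries)) for n in range(N)]

    L1 = [1.0, 0.0, 0.0]
    L2 = [0.25, 0.375, 0.375]
    print(reduce_lottery([L1, L2], [0.5, 0.5]))   # [0.625, 0.1875, 0.1875]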
We assume that the preferences of the decision maker are such that only the reduced
lotteries are relevant to the decision maker. That is if one simple lottery is at least as
good as another then any compound lottery that reduces to the first is at least as good
as any compound lottery that reduces to the second. Another way of saying this is that
only the consequences matter to the decision maker not how they are arrived at. Thus
this assumption is called the consequentialist assumption. (Or, perhaps more accurately a
consequentialist assumption since the same general idea has application in many settings.)
Given our consequentialist assumption we may take the set of alternatives to be L , the
set of all simple lotteries. A little thought should convince you that it would be the same to
consider the set of all compound lotteries and to add formally the assumption described in
the previous paragraph.
We assume that the preference relation is rational, as defined in Chapter 1. We in-
troduce two further assumptions on the preferences. The first is somewhat of a technical
nature.
DEFINITION 18. The preference relation % on the space of simple lotteries L is
continuous if for any L, L′, L″ in L , the sets
{α ∈ [0, 1] | αL + (1 − α)L′ % L″} ⊂ [0, 1]
and
{α ∈ [0, 1] | L″ % αL + (1 − α)L′} ⊂ [0, 1]
are closed.
DEFINITION 19. The preference relation % on the space of simple lotteries L satisfies
the independence axiom if for any L, L′, L″ in L and α in (0, 1)
L % L′ if and only if αL + (1 − α)L″ % αL′ + (1 − α)L″.
EXERCISE 37. Show that if the preference relation % on the space of simple lotteries
L satisfies the independence axiom then for any L, L′, L″ in L and α in (0, 1)
L ≻ L′ if and only if αL + (1 − α)L″ ≻ αL′ + (1 − α)L″
and
L ∼ L′ if and only if αL + (1 − α)L″ ∼ αL′ + (1 − α)L″.
Show also that if L ≻ L′ and L″ ≻ L‴ then αL + (1 − α)L″ ≻ αL′ + (1 − α)L‴.
In many of our examples we shall deal with the case in which there are only three pos-
sible outcomes. In that case the set of lotteries is the simplex {(p1 , p2 , p3 ) |
p1 + p2 + p3 = 1, pn ≥ 0} in three-dimensional space. This is shown graphically in Figure 1.
Now, the simplex is an equilateral triangle sitting there in the three dimensional space.
We can, and shall from now on, draw it flat on the two dimensional page instead, as in
Figure 2.
3. Expected Utility Functions
We shall state in this section one of the central results concerning decision making un-
der uncertainty. That is that if the preferences are continuous and satisfy the independence
axiom then they can be represented by a utility function whose form is the expected value
of the “utility” of the consequences. We say that such a utility function has an expected
utility form.
[Figure 1: the simplex of lotteries over three outcomes, drawn as a triangle with vertices (1, 0, 0), (0, 1, 0), and (0, 0, 1) in (p1 , p2 , p3 )-space.]
Figure 1
[Figure 2: the same simplex drawn flat on the page as an equilateral triangle with vertices (1, 0, 0), (0, 1, 0), and (0, 0, 1).]
Figure 2
DEFINITION 20. The utility function U : L → R has an expected utility form if there
is an assignment of numbers u1 , u2 , . . . , uN to the N outcomes such that for every simple
lottery L = (p1 , p2 , . . . , pN ) in L
U(L) = u1 p1 + u2 p2 + · · · + uN pN .
A utility function U : L → R with the expected utility form is called a von Neumann-
Morgenstern expected utility function.
PROPOSITION 8. A utility function U : L → R has an expected utility form if and
only if it is linear, that is, if and only if it satisfies the property that
U(∑_{k=1}^{K} αk Lk ) = ∑_{k=1}^{K} αk U(Lk )
for any K lotteries Lk in L , k = 1, 2, . . . , K and probabilities α1 , α2 , . . . , αK such that αk ≥ 0
for all k and ∑k αk = 1.
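The linearity property is easy to confirm numerically for any particular assignment of the numbers un; the following Python sketch uses invented values for three outcomes.

    # Check U(a1*L1 + a2*L2) = a1*U(L1) + a2*U(L2) for an expected utility form.
    u = [0.0, 1.0, 4.0]                       # invented utilities u_n

    def U(L):
        return sum(un * pn for un, pn in zip(u, L))

    L1, L2 = [0.2, 0.5, 0.3], [0.6, 0.4, 0.0]
    a1, a2 = 0.25, 0.75
    mix = [a1 * x + a2 * y for x, y in zip(L1, L2)]
    print(U(mix))                   # 0.725, the utility of the reduced lottery
    print(a1 * U(L1) + a2 * U(L2))  # 0.725 again, as Proposition 8 requires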
PROPOSITION 9. Suppose that U : L → R is a von Neumann-Morgenstern utility
function for the preference relation % on L . Then Ũ : L → R is another von Neumann-
Morgenstern utility function for % if and only if there are scalars β > 0 and γ such that
Ũ(L) = βU(L) + γ for every L in L .
EXERCISE 38. Show that if the preference relation % on L is represented by a utility
function U(·) that has the expected utility form, then % satisfies the independence axiom.
PROPOSITION 10 (Expected Utility Theorem). Suppose that the rational preference
relation % on the space of simple lotteries L satisfies the continuity and independence
axioms. Then % admits a utility representation of the expected utility form. That is, we
can assign a number un to each outcome n = 1, 2, . . . , N in such a manner that for any two
lotteries L = (p1 , p2 , . . . , pN ) and L′ = (p′1 , p′2 , . . . , p′N ) we have
L % L′ if and only if ∑_{n=1}^{N} un pn ≥ ∑_{n=1}^{N} un p′n .
In keeping with the style of this course we shall not prove this result. You can read the
proof in ? if you wish.
We shall however examine a little of the intuition as to why the result is true. The
roles of the two axioms are somewhat separate. The continuity axiom guarantees that the
preferences can be represented by a utility function U : L → R. The independence axiom
guarantees that the utility function may be chosen to have the expected utility form, that
is, to be linear in the probabilities. (Recall that if U : L → R represents the preferences
then so does any increasing transformation of U. Even if U is linear, a nonlinear increasing
transformation Ũ of U, for example Ũ = e^U , still represents the preferences but is not
linear. Thus we cannot say that U is linear but only that it may be chosen to be linear.)
The argument that the continuity axiom implies the existence of a utility function rep-
resenting the preferences is a bit technical and is essentially the same as the argument that
if a consumer’s preferences are continuous then they can be represented by a continuous
utility function. (Remember that we didn’t prove that either.)
We can get some intuition into the reason that the independence axiom implies the
existence of a linear utility function by examining the three outcome case. Recall that in
this case we can draw the three dimensional simplex as a triangle on the two dimensional
page. If the utility function is linear then
U(L) = U(p1 , p2 , p3 ) = u1 p1 + u2 p2 + u3 p3
and the indifference curves are parallel straight lines as shown in Figure 3a. (You should
prove this. It should either be very easy or very useful.)
The independence axiom implies both that the indifference curves are straight lines
and that they are parallel. To see that they are straight lines suppose that they are not. In
Figure 3b we show an indifference curve that is not straight. Notice that in the case shown
we can find L and L′ such that ½L + ½L′ ≻ L ∼ L′. But this contradicts the independence
axiom, which implies that since L % L′ we have L = ½L + ½L % ½L + ½L′.
In Figure 3c we see that the indifference curves must be parallel. An example is shown
in which the indifference curves are not parallel. We see that L ∼ L′ but that ⅓L′ + ⅔L″ ≻
⅓L + ⅔L″, contradicting the independence axiom.
[Figure 3: three copies of the simplex. (a) Indifference curves of a linear utility function: parallel straight lines, with the direction of increasing desirability marked. (b) An indifference curve that is not a straight line, with lotteries L, L′, and ½L + ½L′. (c) Straight but non-parallel indifference curves, with lotteries L ∼ L′ and the mixtures ⅓L + ⅔L″ and ⅓L′ + ⅔L″.]
Figure 3
CHAPTER 6

Multiple Agent Models I: Introduction to Noncooperative Game Theory
We have until now been dealing with problems concerning only a single decision
maker. Let us turn now to examining situations in which a number of decision makers
interact. There are a number of basic ways of modelling such situations. This chapter
examines one such method. We shall think of the decision makers as acting based on their
assessment of what each other individual decision maker will do. This approach is known
as game theory or occasionally, and more informatively, as interactive decision theory.
In some situations it is reasonable to assume that each decision maker reacts not to his
assessment of what each individual will do but rather the value of some aggregate statistic
which varies little with the choices of one individual. In such a case it is a reasonable mod-
elling strategy to model the decision makers as taking as given the value of the aggregate
variable. The main approach of this kind is called general equilibrium theory and is the
subject of Chapter 7.
There are two central models of interactive decision problems or games: the model
of normal form games, and the model of extensive form games. While there exist minor
variants we shall define in this section two rather standard versions of the models. We
shall assume that the number of decision makers or players is finite and that the number of
choices facing each player is also finite.
1. Normal Form Games
1.1. Definition. Let’s first give the definition of a finite normal form game and then
discuss what each of the various parts means.

DEFINITION 21. A (finite) normal form game is a triple (N, S, u) where N = {1, 2, . . . , n, . . . , N}
is the set of players, S = S1 × S2 × · · · × SN is the set of profiles of pure strategies with Sn
the finite set of pure strategies of player n, and u = (u1 , u2 , . . . , uN ) with un : S → R the
utility or payoff function of player n. We call the pair (N, S) the game form. Thus a game
is a game form together with a payoff function.
Thus we have specified a set of players and numbered them 1 through N. Somewhat
abusively we have also denoted this set by N. (This is a fairly common practice in math-
ematics and usually creates no confusion.) For each player we have specified a finite set
of actions or strategies that the player could take, which we denote Sn . We have denoted
the cartesian product of these sets by S. Thus a typical element of S is s = (s1 , s2 , . . . , sN )
where each sn is a pure strategy of player n, that is, an element of Sn . We call such an s a
pure strategy profile.
For each player n we have also specified a utility function un : S → R. We shall shortly
define also randomised or mixed strategies, so that each player will form a probabilistic
assessment over what the other players will do. Thus when a player chooses one of his
own strategies he is choosing a lottery over pure strategy profiles. So we are interpreting
the utility function as a representation of the player’s preferences over lotteries, that is, as
a von Neumann-Morgenstern utility function.
DEFINITION 22. A mixed strategy of player n is a lottery over the pure strategies of
player n. One of player n's mixed strategies is denoted σn and the set of all player n's mixed
strategies is denoted Σn . Thus σn = (σn (s1n ), σn (s2n ), . . . , σn (sKn n )), where Kn is the number of
pure strategies of player n, σn (sin ) ≥ 0 for i = 1, 2, . . . , Kn , and ∑_{i=1}^{Kn} σn (sin ) = 1. The
cartesian product Σ = Σ1 × Σ2 × · · · × ΣN is the set of all mixed strategy profiles.
The definition of a mixed strategy should by now be a fairly familiar kind of thing.
We have met such things a number of times already. The interpretation of the concept is
also an interesting question, though perhaps the details are better left for other places. The
original works on game theory treated the mixed strategies as literally randomisation by
the player in question, and in a number of places one finds discussions of whether and why
players would actually randomise.
Some more recent works interpret the randomised strategies as uncertainty in the
minds of the other players as to what the player in question will actually do. This in-
terpretation seems to me a bit more satisfactory. In any case one can quite profitably use
the techniques without worrying too much about the interpretation.
Perhaps more important for us at the moment as we start to learn game theory is the
idea of extending the utility function of a player from that defined on the pure strategy
profiles to that defined on mixed strategies. We shall continue to use the same symbol
un to represent the expected utility of player n as a function of the mixed strategy profile
σ = (σ1 , σ2 , . . . , σN ). Intuitively un (σ ) is just the expected value of un (s) when s is a
random variable with distribution given by σ . Thus
un (σ ) = ∑_{s1 ∈S1} · · · ∑_{sN ∈SN} σ1 (s1 ) · · · σN (sN ) un (s1 , . . . , sN ).
We can in a similar way define un on a more general profile where for some n we have
mixed strategies and for others pure strategies.
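In code un(σ) is a sum over all pure strategy profiles of the pure payoff weighted by the product of the players' probabilities. A Python sketch for an invented two-player game (the payoffs mean nothing in particular):

    import math
    from itertools import product

    # Expected payoff u_n(sigma), where u maps a pure profile (a tuple of
    # strategy indices) to a payoff and sigmas[m][i] is the probability that
    # player m plays his or her i-th pure strategy.
    def expected_payoff(u, sigmas):
        total = 0.0
        for profile in product(*(range(len(s)) for s in sigmas)):
            prob = math.prod(s[i] for s, i in zip(sigmas, profile))
            total += prob * u[profile]
        return total

    u1 = {(0, 0): 2, (0, 1): 0, (1, 0): 1, (1, 1): 3}     # invented payoffs
    print(expected_payoff(u1, [[0.5, 0.5], [0.5, 0.5]]))  # 1.5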
1.2. Examples. Let's look now at some examples.
EXAMPLE 1 (The Prisoner's Dilemma). Consider the situation in which two criminals
are apprehended as they are making off with their loot. The police clearly have enough
evidence to convict them of possession of stolen property but don’t actually have evidence
that they actually committed the crime. So they question the criminals in separate cells
and offer each the following deal: Confess to the crime. If the other doesn’t confess then
we shall prosecute him on the basis of your evidence and he’ll go to jail for 10 years. In
gratitude for your cooperation we shall let you go free. If the other also confesses then
we don’t actually need your evidence so we shall prosecute both of you, but since you
cooperated with us we shall arrange with the judge that you only get 9 years. We are
offering the other criminal the same deal. If neither of you confess then we shall be able to
convict you for possession of stolen goods and you will each get one year.
How should we model such a situation? In actuality, of course, even in such a situ-
ation a criminal would have many options. He could make counter proposals; he could
confess to many other crimes as well; he could claim that his partner had committed many
other crimes. We might however learn something about what to expect in this situation
by analysing a model in which each criminal’s choices were limited to confessing or not
confessing. We shall also take the actions of the police as given and part of the environ-
ment and consider only the criminals as players. We shall assume that the criminals care
only about how much time they themselves spend in jail and not about how much time
the other spends, and moreover that their preferences are represented by a von Neumann-
Morgenstern utility function that assigns utility 0 to getting 10 years, utility 1 to getting 9
years, utility 9 to getting 1 year and utility 10 to getting off free.
Thus we can model the situation as a game in which N = {1, 2}, S1 = S2 = {C, D} (for
(Confess, Don’t Confess)) and u1 (C,C) = 1, u1 (C, D) = 10, u1 (D,C) = 0, u1 (D, D) = 9,
and u2 (C,C) = 1, u2 (C, D) = 0, u2 (D,C) = 10, u2 (D, D) = 9.
Such a game is often represented as a labelled matrix as shown in Figure 1. Here
player 1’s strategies are listed vertically and player 2’s horizontally (and hence they are
sometimes referred to as the row player and the column player). Each cell in the matrix
contains a pair x, y listing first player 1's payoff in that cell and then player 2's.
1\2 C D
C 1, 1 10, 0
D 0, 10 9, 9

Figure 1
EXAMPLE 2 (Matching Pennies). This is a parlour game between two players in
which each player chooses to simultaneously announce either “Heads” or “Tails” (per-
haps by producing a coin with the appropriate face up). If the two announcements match
then player 1 receives $1 from player 2. If they don’t match then player 2 receives $1 from
player 1.
We again assume that the players have only the two options stated. (They cannot
refuse to play, go out and get drunk, or anything else equally enjoyable. They have to play
the silly game.) Moreover, since the stakes are so low it seems a reasonable approximation
to assume that their preferences are represented by a von Neumann-Morgenstern utility
function that assigns utility -1 to paying a dollar and utility 1 to getting a dollar.
Thus we can model the situation as a game in which N = {1, 2}, S1 = S2 = {H, T }
and u1 (H, H) = 1, u1 (H, T ) = −1, u1 (T, H) = −1, u1 (T, T ) = 1, and u2 (H, H) = −1,
u2 (H, T ) = 1, u2 (T, H) = 1, u2 (T, T ) = −1. The game is represented by the labelled
matrix shown in Figure 2.
1\2 H T
H 1, −1 −1, 1
T −1, 1 1, −1

Figure 2
1.3. Solution Concepts I: Pre-equilibrium Ideas. The central solution concept in
noncooperative game theory is that of Nash equilibrium or strategic equilibrium. Before
discussing the idea of equilibrium we shall look at a weaker solution concept. One way of
thinking of this concept is as the necessary implications of assuming that the players know
the game, including the rationality and knowledge of the others.
There is quite a bit entailed in such an assumption. Suppose that some player, say
player 1, knows some fact F. Now since we assume that the players know the knowledge
of the other players the other players both know F and know that player 1 knows it. But
then this is part of their knowledge and so they all know that they all know F. And they
all know this. And so on. Such a situation was formally analysed in the context of game
theory by ? who described F as being common knowledge.
Consider the problem of a player in some game. Except in the most trivial cases the
set of strategies that he will be prepared to play will depend on his assessment of what the
other players will do. However it is possible to say a little. If some strategy was strictly
preferred by him to another strategy s whatever he thought the other players would do, then
he surely would not play s. And this remains true if it was some lottery over his strategies
that was strictly preferred to s. We call a strategy such as s a strictly dominated strategy.
Thus we have identified a set of strategies that we argue a rational player would not
play. But since everything about the game, including the rationality of the players, is
assumed to be common knowledge no player should put positive weight, in his assessment
of what the other players might do, on such a strategy. And we can again ask: Are there
any strategies that are strictly dominated when we restrict attention to the assessments that
put weight only on those strategies of the others that are not strictly dominated. If so, a
rational player who knew the rationality of the others would surely not play such a strategy.
And we can continue for an arbitrary number of rounds. If there is ever a round in
which we don’t find any new strategies that will not be played by rational players com-
monly knowing the rationality of the others, we would never again “eliminate” a strategy.
Thus, since we start with a finite number of strategies, the process must eventually termi-
nate. We call the strategies that remain iteratively undominated or correlatedly rationalis-
able.
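The iterative procedure is easy to implement if one checks only for domination by other pure strategies; testing for domination by a lottery requires solving a small linear program, so the Python sketch below will in general delete fewer strategies than the definition allows. For the prisoner's dilemma of Example 1 the two notions happen to coincide.

    # Iterated elimination, for two players, of pure strategies strictly
    # dominated by other pure strategies. u1 and u2 map pure profiles
    # (s1, s2) to payoffs; S1 and S2 are lists of surviving strategies.
    def dominated(pay, mine, others, s):
        # is some t strictly better than s against every surviving opponent play?
        return any(all(pay(t, o) > pay(s, o) for o in others)
                   for t in mine if t != s)

    def iterated_elimination(u1, u2, S1, S2):
        changed = True
        while changed:
            changed = False
            for s in list(S1):
                if dominated(lambda a, b: u1[(a, b)], S1, S2, s):
                    S1.remove(s); changed = True
            for s in list(S2):
                if dominated(lambda a, b: u2[(b, a)], S2, S1, s):
                    S2.remove(s); changed = True
        return S1, S2

    # The prisoner's dilemma of Example 1 (C = 0, D = 1): only (C, C) survives.
    u1 = {(0, 0): 1, (0, 1): 10, (1, 0): 0, (1, 1): 9}
    u2 = {(0, 0): 1, (0, 1): 0, (1, 0): 10, (1, 1): 9}
    print(iterated_elimination(u1, u2, [0, 1], [0, 1]))   # ([0], [0])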
There is another related concept called rationalisable strategies that was introduced
by ? and ?. That concept is both a little more complicated to define and, in my view,
somewhat less well motivated so we won’t go into it here.
1.4. Solution Concepts II: Equilibrium. In some games the iterative deletion of
dominated strategies is reasonably powerful and may indeed let us say what will happen in
the game. In other games it says little or nothing.
A more widely used concept is that of Nash equilibrium or strategic equilibrium, first
defined by John Nash in the early 1950s. (See ??.) An equilibrium is a profile of mixed
strategies, one for each player, with the property that if each player’s uncertainty about
what the others will do is represented by the profile of mixed strategies then his mixed
strategy puts positive weight only on those pure strategies that give him his maximum
expected utility. We can state this in a little more detail using the notation developed
above.
DEFINITION 23. A strategic equilibrium (or Nash equilibrium) of a game (N, S, u) is
a profile of mixed strategies σ = (σ1 , σ2 , . . . , σN ) such that for each n = 1, 2, . . . , N for each
sn and tn in Sn if σn (sn ) > 0 then
un (σ1 , . . . , σn−1 , sn , σn+1 , . . . , σN ) ≥ un (σ1 , . . . , σn−1 , tn , σn+1 , . . . , σN ).
Remember how we extended the definition of un from the pure strategies to the mixed
strategies at the beginning of this chapter.
We can relate this concept to that discussed in the previous section.
PROPOSITION 11. Any strategic equilibrium profile consists of iteratively undomi-
nated strategies.
Let’s look back now to our two examples and calculate the equilibria. In both of these
examples there is a unique equilibrium. This is not generally the case.
In the prisoner’s dilemma the strategy of Don’t Confess is dominated. That is, Confess
is better whatever the other player is doing. Thus for each player Confess is the only
undominated strategy and hence the only iteratively undominated strategy. Thus by the
previous proposition (Confess,Confess) is the only equilibrium.
In matching pennies there are no dominated strategies. Examining the game we see
there can be no equilibrium in which either player’s assessment of the other’s choice is
a pure strategy. Let us suppose that player 1's assessment of player 2 was
that player 2 would play H. Then player 1 would strictly prefer to play H. But then
the definition of equilibrium would say that player 2 should put positive weight in his
assessment of what player 1 would play only on H. And in this case player 2 would strictly
prefer to play T , contradicting our supposition that player 1 assessed him as playing H. In
fact we could have started this chain of argument by supposing that player 1’s assessment
of player 2 put weight strictly greater than a half on the fact that player 2 would play H.
We could make a similar argument starting with the supposition that player 1’s assessment
of player 2 put weight strictly less than a half on the fact that player 2 would play H.
Thus player 1’s assessment of player 2’s choices must be ( 12 , 12 ). And similarly player 2’s
assessment of player 1’s choices must also be ( 12 , 12 ).
EXERCISE 39. Consider the variant of matching pennies given in Figure 3. Calculate
the equilibrium for this game.
1\2 H T
H 1, −1 −2, 3
T −1, 1 1, −1

Figure 3
2. Extensive Form Games
There is another model of interactive decision situations that is, in some respects, per-
haps a little more natural. In this model rather than listing all the plans that the decision
makers might have and associating expected payoffs with each profile of plans one de-
scribes sequentially what each player might do and what results. The process is modelled
as a multi-player decision tree. (You might possibly have come across the use of decision
trees to describe decision problems facing single decision makers.)
We shall define such game trees or extensive form games in the next section. For now
let us consider an example, that of Figure 4, that illustrates the essential ingredients.
[Figure 4: a game tree. Player 1 moves at the initial node, choosing T or B; after B he moves again, choosing U or D, and D ends the game with payoffs 2, 2; after T, and after B followed by U, Player 2 moves at a single information set (her two nodes are joined by a dotted line), choosing L or R; the four terminal payoffs following her moves are 4, 1; 1, 0; 0, 0; and 0, 1.]
Figure 4
The game starts at the bottom at Player 1’s first decision node. The first node of the
game is called the initial node or the root. Player 1 chooses whether to play T or B. If he
chooses B he moves again and chooses between U and D. If he chooses B and then D the
game ends. If he chooses either T or B and then U then player 2 gets a move and chooses
either L or R. Player 2 might be at either of two nodes when she chooses. The dotted line
between those nodes indicates that they are in the same information set and that Player 2
does not observe which of the nodes she is at when she moves. Traditionally information
sets were indicated by enclosing the nodes of the information set in a dashed oval. The
manner I have indicated is a newer notation and might have been introduced because it’s a
bit easier to generate on the computer, and looks a bit neater. (Anyway that’s why I do it
that way.)
The payoffs given at the terminal nodes give the expected payoffs (first for Player 1
then for Player 2) that generate the von Neumann-Morgenstern utility function that repre-
sents the players’ preferences over lotteries over the various outcomes or terminal nodes.
There are two further features that are not illustrated in this example. We often want
to include in our model some extrinsic uncertainty, that is some random event not under
the control of the players. We indicate this by allowing nodes to be owned by an artifi-
cial player that we call “Nature” and sometimes index as Player 0. Nature’s moves are
not labelled in the same way as the moves of the strategic players. Rather we associate
probabilities to each of nature’s moves. This is shown in the game of Figure 5 in which the
initial node is a move of nature. I shall indicate nodes where Nature moves by open circles
and nodes where real players move by filled circles.
[Figure 5: a game tree whose initial node is a move of Nature (drawn as an open circle), who gives the low card to Player 2 or to Player 1, each with probability ½. The player with the low card chooses Out (ending the game with payoffs 1, −1 if Player 2 ends it, or −1, 1 if Player 1 does) or In, after which Player 1 moves at a single information set joining his two nodes, choosing S or D; the terminal payoffs after his choice are −2, 2; 2, −2; 2, −2; and −2, 2.]
Figure 5
Like the Prisoner’s Dilemma or Matching Pennies there is a bit of a story to go with
this game. It’s some kind of parlour game. The game has two players. We first assign
a high card and a low card to players 1 and 2, each being equally likely to get the high
card, and each seeing the card he gets. The player receiving the low card then has the
option of either continuing the game (by playing “In”) or finishing the game (by playing
“Out”), in which case he pays $1 to the other player. If the player who received the low
card continues the game then Player 1, moves again and decides whether to keep the card
he has or to swap it with Player 2’s. However Player 1 does not observe which card he has,
or what he did in his previous move, or even if he has moved previously. This might seem
a bit strange. One somewhat natural interpretation is that Player 1 is not a single person,
but rather a team consisting of two people. In any case it is normal in defining extensive
form games to allow such circumstances.
Games such as this are not, however, as well behaved as games in which such things do
not happen. (We’ll discuss this in a little more detail below.) If a player always remembers
everything he knew and everything he did in the past we say that the player has perfect
recall. If each player has perfect recall then we say that the extensive form game has
perfect recall. The game of Figure 4 has perfect recall while the game of Figure 5 does not
have perfect recall. In particular, Player 1 does not have perfect recall.
The extensive form given provides one way of modelling or viewing the strategic
interaction. A somewhat more abstract and less detailed vision is provided by thinking of
the players as formulating plans or strategies. One might argue (correctly, in my view) that
since the player can when formulating his plan anticipate any contingencies that he might
face nothing of essence is lost in doing this. We shall call such a plan a strategy. In the
game of Figure 5 Player 2 has only one information set at which she moves so her plan
will simply say what she should do at that information set. Thus Player 2’s strategy set is
S2 = {IN, OUT}. Player 1 on the other hand has two information sets at which he might
move. Thus his plan must say what to do at each of his information sets. Let us list first
what he will do at his first (singleton) information set and second what he will do at his
second information set. His strategy set is S1 = {(In, S), (In, D), (Out, S), (Out, D)}.
Now, a strategy profile such as ((In, S), IN) defines for us a lottery over the terminal
nodes, and hence over profiles of payoffs. In this case it is (−2, 2) with probability a half
and (2, −2) with probability a half. For the strategy profile ((In, S), OUT) it would be
(1, −1) with probability a half and (2, −2) with probability a half.
We can then calculate the expected payoff profile for each of the lotteries associated
with strategy profiles (for the two given above these would be (0, 0) and (1½, −1½)) and thus
we have specified a normal form game. For this example the associated normal form game
is given in Figure 5a.
1\2        IN          OUT
(In, S)    0, 0        1½, −1½
(In, D)    0, 0        −½, ½
(Out, S)   −1½, 1½     0, 0
(Out, D)   ½, −½       0, 0

Figure 5a
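The arithmetic behind Figure 5a can be checked in a few lines. The Python sketch below hard-codes the terminal payoffs as I read them off the tree of Figure 5, so it is tied to that reading rather than to a general extensive form data structure.

    # Expected payoffs in the game of Figure 5, averaging over Nature's deal.
    # If Player 2 has the low card: OUT ends the game at (1, -1); after IN,
    # Player 1's swap choice gives S -> (-2, 2) and D -> (2, -2).
    # If Player 1 has the low card: Out ends the game at (-1, 1); after In,
    # S -> (2, -2) and D -> (-2, 2).
    def expected(s1, s2):
        first, swap = s1            # s1 is e.g. ("In", "S"); s2 is "IN" or "OUT"
        a = ((-2, 2) if swap == "S" else (2, -2)) if s2 == "IN" else (1, -1)
        b = ((2, -2) if swap == "S" else (-2, 2)) if first == "In" else (-1, 1)
        return tuple(0.5 * x + 0.5 * y for x, y in zip(a, b))

    for s1 in [("In", "S"), ("In", "D"), ("Out", "S"), ("Out", "D")]:
        for s2 in ["IN", "OUT"]:
            print(s1, s2, expected(s1, s2))   # reproduces Figure 5a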
2.1. Definition. We now define formally the notions we discussed informally above.
DEFINITION 24. An extensive form game consists of
(1) N = {1, 2, . . . , N} a set of players,
(2) X a finite set of nodes,
(3) p : X → X ∪ {∅} a function giving the immediate predecessor of each node. There
is a single node x0 for which p(x0 ) = ∅. This is the initial node. We let s(x) =
p−1 (x) = {y ∈ X | p(y) = x} be the immediate successors of node x. We can now
define the set of all predecessors of x to be those y’s for which y = p(p(p . . . (x)))
for some number of iterations of p and similarly the set of all successors of x.
We require that for any x the set of all predecessors of x be disjoint from the set
of all successors of x. (This is what we mean by the nodes forming a tree.) The
set of terminal nodes T is the set of nodes that have no successors, that is those x
for which s(x) = ∅. We call the nonterminal nodes the decision nodes.
(4) A a set of actions and α : X\{x0 } → A a function that for any noninitial node
gives the action taken at the preceding node that leads to that node. We require
that if x and x′ have the same predecessor and x ≠ x′ then α(x) ≠ α(x′). The set
of choices available at the node x is c(x) = {a ∈ A | a = α(x′) for some x′ ∈ s(x)}.
(5) H a collection of information sets, and H : X\T → H a function assigning each
decision node x to an information set H(x). We require that any two decision
nodes in the same information set have the same available choices. That is if
H(x) = H(x′) then c(x) = c(x′). We also require that any nodes in the same
information set be neither predecessors nor successors of each other. Sometimes
this requirement is not made part of the definition of a game but rather separated
and used to distinguish linear games from nonlinear ones. (The linear ones are
the ones that satisfy the requirement.)
(6) n : H → {0, 1, . . . , N} a function assigning each information set to the player
who moves at that information set or to Nature (Player 0). The collection of
Player n’s information sets is denoted Hn = {H ∈ H | n(H) = n}. We assume
that each information set in H0 is a singleton, that is that it contains only a single
node.
(7) ρ : H0 × A → [0, 1] a function that gives the probability of each of Nature’s
choices at the nodes at which Nature moves.
(8) u = (u1 , u2 , . . . , uN ) a collection of payoff functions un : T → R assigning an
expected utility to each terminal node for each player n.
DEFINITION 25. Given an extensive form game we say that Player n in that game
has perfect recall if whenever H(x) = H(x′) ∈ Hn with x″ a predecessor of x and with
H(x″) ∈ Hn and a″ the action at x″ on the path to x, then there is x‴ ∈ H(x″) a predecessor
of x′ with a″ the action at x‴ on the path to x′. If each player in N has perfect recall
we say the game has perfect recall.
You should go back to the games of Figures 4 and 5 and see how this definition leads
to the conclusion that the first is a game with perfect recall and the second is not. To
understand the definition a little better let’s look a little more closely at what it is saying.
Since H(x) = H(x′) it means that the player observes the same situation at x and x′. Thus if
he has perfect recall he should have the same experience at x and x′. Part of his experience
at x was that he had been at the information set H(x″) and made the choice a″. The
definition is requiring that he also have had this experience at x′.
2.2. The Associated Normal Form. Just as in the previous section we formally de-
fined the details of a game that we had earlier discussed informally, here we shall formally
define the process of associating a normal form game to a given extensive form game.
Recall that a normal form game has three components: a set of players, a strategy set
for each player, and a utility function for each player giving that player’s utility for each
profile of strategies. The easiest part is defining the player set. It is the same as the player
set of the given extensive form game. As I said above, a strategy for a player is a rule that
tells the player what to do at each of his information sets.
DEFINITION 26. Given an extensive form game a strategy of Player n is a function
sn : Hn → A with sn (H) ∈ c(H) for all H in Hn .
So, we have now defined the second component of a normal form game. Now, a
strategy profile specifies an action at each move by one of the “real” players and so defines
for us a lottery over the terminal nodes. (It defines a lottery rather than simply a terminal
node because we allow random moves by nature in our description of an extensive form
game. In a game without moves by Nature a strategy profile would define a single terminal
node.)
We then associate an expected payoff profile with the profile of strategies by taking
the expected payoff to the terminal node under this lottery. It might be a good idea to go
back and look again at what we did in defining the normal form game associated with the
extensive form game given in Figure 5.
2.3. Solution Concepts. Still to come.
3. Existence of Equilibrium
We shall examine a little informally in this section the question of the existence of
equilibrium. Let's look first in some detail at an example. I shall include afterwards a
discussion of the general result and a sketch of the proof. Remember, however, that we
didn’t do this in class and you are not required to know it for this course.
Let us consider an example (Figure 6). Player 1 chooses the row and player 2 (simul-
taneously) chooses the column. The resulting payoffs are indicated in the appropriate box
of the matrix, with player 1’s payoff appearing first.
L R
T 2, 0 0, 1
B 0, 1 1, 0

Figure 6
What probabilities could characterise a self-enforcing assessment? A (mixed) strategy
for player 1 (that is, an assessment by 2 of how 1 would play) is a vector (x, 1 − x), where x
lies between 0 and 1 and denotes the probability of playing T. Similarly, a strategy for 2 is
a vector (y, 1 − y). Now, given x, the payoff-maximising value of y is indicated in Figure 6a,
and given y the payoff-maximising value of x is indicated in Figure 6b. When the figures
are combined as in Figure 6c, it is evident that the game possesses a single equilibrium,
namely x = ½, y = ⅓. Thus in a self-enforcing assessment Player 1 must assign a probability
of ⅓ to 2's playing L, and player 2 must assign a probability of ½ to Player 1's playing T.
[Figures 6a, 6b, 6c: best response diagrams in the (x, y) unit square. Figure 6a graphs player 2's payoff-maximising y against x, Figure 6b player 1's payoff-maximising x against y, and Figure 6c combines the two; the graphs cross only at x = ½, y = ⅓.]
Figure 6a Figure 6b Figure 6c

The game of Figure 6 is an instance in which our notion of equilibrium completely
pins down the solution. In general, we cannot expect such a sharp conclusion. Consider,
for example, the game of Figure 7. There are three equilibrium outcomes: (8,5), (7,6) and
(6,3) (for the latter, the probability of T must lie between .5 and .6).
L C R
T 8, 5 0, 0 6, 3
B 0, 0 7, 6 6, 3

Figure 7
In Figure 6 we see that while there is no pure strategy equilibrium there is however
a mixed strategy equilibrium. The main result of non-cooperative game theory states that
this is true quite generally.
THEOREM 6 (Nash 1950, 1951). The mixed extension of every finite game has at least
one strategic equilibrium.
(A game is finite if the player set as well as the set of strategies available to each player
is finite. Remember too that this proof is not required for this course.)
SKETCH OF PROOF. The proof may be sketched as follows. (It is a multi-dimensional
version of Figure 6c.) Consider the set-valued mapping (or correspondence) that maps each
strategy profile, x, to all strategy profiles in which each player’s component strategy is a
best response to x (that is, maximises the player’s payoff given that the others are adopting
their components of x). If a strategy profile is contained in the set to which it is mapped
(is a fixed point) then it is an equilibrium. This is so because a strategic equilibrium is, in
effect, defined as a profile that is a best response to itself.
Thus the proof of existence of equilibrium amounts to a demonstration that the “best
response correspondence” has a fixed point. The fixed-point theorem of ? asserts the
existence of a fixed point for every correspondence from a convex and compact subset of
Euclidean space into itself, provided two conditions hold. One, the image of every point
must be convex. And two, the graph of the correspondence (the set of pairs (x, y) where y
is in the image of x) must be closed.
Now, in the mixed extension of a finite game, the strategy set of each player consists of
all vectors (with as many components as there are pure strategies) of non-negative numbers
that sum to 1; that is, it is a simplex. Thus the set of all strategy profiles is a product of
simplices. In particular, it is a convex and compact subset of Euclidean space.
Given a particular choice of strategies by the other players, a player’s best responses
consist of all (mixed) strategies that put positive weight only on those pure strategies that
yield the highest expected payoff among all the pure strategies. Thus the set of best re-
sponses is a subsimplex. In particular, it is convex.
Finally, note that the conditions that must be met for a given strategy to be a best
response to a given profile are all weak polynomial inequalities, so the graph of the best
response correspondence is closed.
Thus all the conditions of Kakutani’s theorem hold, and this completes the proof of
Nash’s theorem. 
CHAPTER 7

Multiple Agent Models II: Introduction to General Equilibrium Theory
1. The basic model of a competitive economy
We summarise the basic ingredients of the model. (In a later version of these notes
there will likely be a more leisurely development.)
• L goods
• N consumers — a typical consumer is indexed by n
• the consumption set for each consumer is RL+
• %n the rational preference relation of consumer n on RL+
• ωn in RL+ the endowment of consumer n
• p in RL++ a strictly positive price vector; p = (p1 , . . . , p` , . . . , pL ) where p` is the
price of the `th good.
DEFINITION 27. An allocation x = ((x11 , x21 , . . . , xL1 ), . . . , (x1N , x2N , . . . , xLN )) in (RL+ )N
specifies a consumption bundle for each consumer. A feasible allocation is an allocation
such that
∑_{n∈N} xn ≤ ∑_{n∈N} ωn .
(Note that we are implicitly assuming that the goods are freely disposable.)
DEFINITION 28. Consumer n's budget set is
B(p, ωn ) = {x ∈ RL+ | p · x ≤ p · ωn }.
DEFINITION 29. Consumer n's demand correspondence is
xn (p, ωn ) = {x ∈ B(p, ωn ) | there is no y ∈ B(p, ωn ) with y ≻n x}.
Let us now make some fairly strong assumptions about the %n ’s. For the most part
the full strength of these assumptions is unnecessary. Most of the results that we give are
true with weaker assumptions. However these assumptions will imply that the demand
correspondences are, in fact, functions, which will somewhat simplify the presentation.
We assume that for each n the preference relation %n is (a) continuous (this is technical
and we won't say anything further about it), (b) strictly increasing (if x ≥ y and x ≠ y then
x ≻n y), and (c) strictly convex (if x %n y, x ≠ y, and α ∈ (0, 1) then αx + (1 − α)y ≻n y).
PROPOSITION 12. If %n is continuous, strictly increasing, and strictly convex then
(1) xn (p, ωn ) ≠ ∅ for any ωn in RL+ and any p in RL++ ,
(2) xn (p, ωn ) is a singleton so xn (·, ωn ) is a function, and
(3) xn (·, ωn ) is a continuous function.
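As a concrete illustration, Cobb–Douglas preferences, which can be represented by u(x) = ∑` a` log x` and which satisfy versions of these assumptions on the strictly positive bundles, give a demand function in closed form: the consumer spends the share a`/∑k ak of her wealth p · ωn on good `. A Python sketch (the parameters are invented):

    # Demand function for Cobb-Douglas preferences u(x) = sum_l a_l*log(x_l):
    # spend the budget share a_l / sum(a) of wealth on good l.
    def demand(p, omega, a):
        wealth = sum(pl * wl for pl, wl in zip(p, omega))
        return [(al / sum(a)) * wealth / pl for al, pl in zip(a, p)]

    print(demand([1.0, 2.0], [3.0, 1.0], [0.5, 0.5]))   # [2.5, 1.25]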
DEFINITION 30. The price vector p is a Walrasian (or competitive) equilibrium price
if
∑_{n∈N} xn (p, ωn ) ≤ ∑_{n∈N} ωn .
If we do not assume that the demand functions are single valued then we need a slightly
more general form of the definition.
DEFINITION 30′. The pair (p, x) in RL++ × (RL+ )N is a Walrasian equilibrium if x is a
feasible allocation (that is, ∑n∈N xn ≤ ∑n∈N ωn ) and, for each n in N,
xn %n y for all y in B(p, ωn ).
Since we assume that %n is strictly increasing (in fact local nonsatiation is enough) it
is fairly easy to see that the only feasible allocations that will be involved in any equilibria
are those for which
(34) ∑_{n∈N} xn = ∑_{n∈N} ωn .
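With single-valued, continuous demand one can look for an equilibrium price numerically. In the two-good Cobb–Douglas economy sketched below (all parameters invented) we normalise p1 = 1 and bisect on p2; because each consumer spends her whole wealth, clearing the market for good 2 also clears the market for good 1, and for this example the excess demand for good 2 is decreasing in p2, so bisection works.

    # A Walrasian equilibrium price for a 2-good, 2-consumer Cobb-Douglas
    # exchange economy, found by bisection on p2 with p1 normalised to 1.
    def demand(p, omega, a):
        wealth = sum(pl * wl for pl, wl in zip(p, omega))
        return [(al / sum(a)) * wealth / pl for al, pl in zip(a, p)]

    omegas = [[1.0, 0.0], [0.0, 1.0]]     # invented endowments
    shares = [[0.6, 0.4], [0.2, 0.8]]     # invented preference parameters

    def excess_demand_2(p2):              # excess demand for good 2 at (1, p2)
        p = [1.0, p2]
        return sum(demand(p, w, a)[1] for w, a in zip(omegas, shares)) - 1.0

    lo, hi = 1e-6, 1e6
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if excess_demand_2(mid) > 0 else (lo, mid)
    print((lo + hi) / 2)                  # about 2.0 for these parameters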
We shall now examine graphically the case L = N = 2. An allocation in this case is
a vector in R4+ . However since we have the two equations in (34) we can elimi-
nate two of the variables and illustrate the allocations in two dimensions. A particularly
meaningful way of doing this is by what is known as the Edgeworth box.
Let us first draw the consumption set and the budget set for each consumer, as we
usually do for the two good case in consumer theory. We show this in Figures 1 and 2.
The only new feature of this graph is that rather than having a fixed amount of wealth each
consumer starts off with an initial endowment bundle ωn . The boundary of their budget set
(that is, the budget line) is then given by a line through ωn perpendicular to the price vector
p.
[Figure 1: Consumer 1's consumption set, with x11 on the horizontal axis and x21 on the vertical axis, the endowment point ω1 = (ω11 , ω21 ), and the budget line through ω1 perpendicular to the price vector p.]
Figure 1
What we want to do is to draw Figures 1 and 2 in the same diagram. We do this by
rotating Figure 2 through 180° and then lining the figures up so that ω1 and ω2 coincide.
[Figure 2: Consumer 2's consumption set, with x12 on the horizontal axis and x22 on the vertical axis, the endowment point ω2 = (ω12 , ω22 ), and the budget line through ω2 perpendicular to p.]
Figure 2
We do this in Figure 3. Any point x in the diagram now represents (x11 , x21 ) if viewed
from 01 looking up with the normal perspective and simultaneously represents (x12 , x22 )
if viewed from 02 looking down. Notice that while all the feasible allocations are within
the “box,” part of each consumer's budget set goes outside the “box.” One of the central
ideas of general equilibrium theory is that the decision making can be decentralised by the
price mechanism. Thus neither consumer is required to take into account when making
their choices what is globally feasible for the economy. Thus we really do want to draw
the diagrams as I have and not leave out the parts “outside the box.”
We can represent preferences in the usual manner by indifference curves. I shall not
again draw separate pictures for consumers 1 and 2, but rather go straight to drawing them
in the Edgeworth box, as in Figure 4.
Let us look at the definition of a Walrasian equilibrium. If some feasible allocation
x (≠ ω) is to be an equilibrium allocation then it must be in the budget sets of both con-
sumers. (Such an allocation is shown in Figure 5.) Thus the boundary of the budget sets
must be the line through x and ω (and the equilibrium price vector will be perpendicular to
this line). Also x must be, for each consumer, at least as good as any other bundle in their
budget set. Now any feasible allocation y that makes Consumer 1 better off than he is at
allocation x must not be in Consumer 1’s budget set. (Otherwise he would have chosen it.)
Thus the allocation y must lie strictly above the budget line through ω and x. But then there
are points in Consumer 2’s budget set which give her strictly more of both goods than she
gets in the allocation y. So, since her preferences are strictly increasing there is a point in
her budget set that she strictly prefers to what she gets in the allocation y. But since the
allocation x is a competitive equilibrium with the given budget sets then what she gets in
the allocation x must be at least as good any other point in her budget set, and thus strictly
better than what she gets at y.
What have we shown? We have shown that if x is a competitive allocation from the
endowments ω then any feasible allocation that makes Consumer 1 better off makes Con-
sumer 2 worse off. We can similarly show that any feasible allocation that makes Consumer
2 better off makes Consumer 1 worse off. In other words x is Pareto optimal.
We shall now generalise this intuition about the relationship between equilibrium and
efficiency to the more general model. We first define more formally our idea of efficiency.
[Figure 3: the Edgeworth box, formed by superimposing Figures 1 and 2 with origins 01 at the bottom left and 02 at the top right, the endowment point ω, and the common budget line through ω perpendicular to p.]
Figure 3
[Figure 4: the Edgeworth box with the two consumers' preferences represented by indifference curves.]
Figure 4
[Figure 5: the Edgeworth box with an equilibrium allocation x on the budget line through ω perpendicular to p, and a feasible allocation y that Consumer 1 prefers to x lying strictly above that budget line.]
Figure 5
DEFINITION 31. A feasible allocation x is Pareto optimal (or Pareto efficient) if there
is no other feasible allocation y such that yn %n xn for all n in N and yn′ ≻n′ xn′ for at least
one n′ in N.
In words we say that a feasible allocation is Pareto optimal if there is no other feasible
allocation that makes at least one consumer strictly better off without making any consumer
worse off. The following result generalises our observation about the Edgeworth box.
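For a small discretised example the definition can be tested directly: search a grid of feasible allocations for one that Pareto improves on a given candidate. A crude Python sketch for a two-consumer, two-good economy with total endowment (1, 1); the utility functions and the grid are invented.

    from itertools import product

    # Grid search for a Pareto improvement on the allocation x.
    u = [lambda b: b[0] * b[1],                   # invented utility, consumer 1
         lambda b: b[0] ** 0.5 * b[1] ** 0.5]     # invented utility, consumer 2

    def improves(x, y):
        ux = [u[n](x[n]) for n in range(2)]
        uy = [u[n](y[n]) for n in range(2)]
        return all(a >= b for a, b in zip(uy, ux)) and uy != ux

    x = [(0.5, 0.5), (0.5, 0.5)]                  # the candidate allocation
    grid = [i / 20 for i in range(21)]
    better = []
    for g1, g2 in product(grid, grid):
        y = [(g1, g2), (1 - g1, 1 - g2)]          # feasible: bundles sum to (1, 1)
        if improves(x, y):
            better.append(y)
    print(len(better))   # 0: no grid allocation improves on the equal split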
THEOREM 7 (The First Fundamental Theorem of Welfare Economics). Suppose
that for each n the preferences %n are strictly increasing and that (p, x) is a Walrasian
equilibrium. Then x is Pareto optimal.
In fact, we can say something in the other direction as well. It clearly is not the case
that any Pareto optimal allocation is a Walrasian equilibrium. A Pareto optimal alloca-
tion may well redistribute the goods, giving more to some consumers and less to others.
However, if we are permitted to make such transfers then any Pareto optimal allocation is
a Walrasian equilibrium from some redistributed initial endowment. Suppose that in the
Edgeworth box there is some point such as x in Figure 6 that is Pareto optimal. Since x is
Pareto optimal Consumer 2’s indifference curve through x must lie everywhere below Con-
sumer 1’s indifference curve through x. Thus the indifference curves must be tangent to
each other. Let’s draw the common tangent. Now, if we redistribute the initial endowments
to some point ω 0 on this tangent line then with the new endowments the allocation x is a
competitive equilibrium. This result is true with some generality, as the following result
states. However we do require stronger assumptions than were required for the first welfare
theorem. We shall look below at a couple of examples to illustrate why these stronger
assumptions are needed.
THEOREM 8 (The Second Fundamental Theorem of Welfare Economics). Suppose
that for each n the preferences %n are strictly increasing, convex, and continuous and that
[Figure 6: the Edgeworth box with a Pareto optimal allocation x at which the two consumers' indifference curves are tangent, the common tangent line perpendicular to p, and a redistributed endowment ω′ on that line.]
Figure 6
x is Pareto optimal with x > 0 (that is, x`n > 0 for each ` and each n). Then there is some
feasible reallocation ω′ of the endowments (that is, ∑n∈N ω′n = ∑n∈N ωn ) and a price vector
p such that (p, x) is a Walrasian equilibrium of the economy with preferences %n and initial
endowments ω′.