2003
John Hillas
University of Auckland
Contents
3. Existence of Equilibrium
Chapter 7. Multiple Agent Models II: Introduction to General Equilibrium Theory
1. The basic model of a competitive economy
CHAPTER 1

Logic, Sets, Functions, and Spaces

1. Introduction
This chapter is very incomplete. It includes at the moment only some material that we
shall use in the next chapter, and not even all of that. Never mind, perhaps there will be
more soon.
2. Logic
Exercises.

3. Sets
A ∪ B = {x | x ∈ A or x ∈ B}
A ∩ B = {x | x ∈ A and x ∈ B}.
Just as the number zero is extremely useful, so the concept of a set that has no elements
is extremely useful also. This set we call the empty set or the null set and denote by ∅. To
see one use of the empty set, notice that having such a concept allows the intersection of
two sets to be well defined whether or not the sets have any elements in common.
We also introduce the concept of a Cartesian product. If we have two sets, say A and B,
the Cartesian product, A × B, is the set of all ordered pairs, (a, b) such that a is an element
of A and b is an element of B. Symbolically we write
A × B = {(a, b) | a ∈ A and b ∈ B}.
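The definition of the Cartesian product translates directly into a short sketch. The two finite sets here are hypothetical choices, used only for illustration:

```python
# Two hypothetical finite sets, used only for illustration.
A = {1, 2}
B = {"x", "y"}

# A x B: the set of all ordered pairs (a, b) with a in A and b in B.
product = {(a, b) for a in A for b in B}
print(sorted(product))  # [(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y')]
```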
Exercises.
4. Functions
Exercises.
5. Spaces
Exercises.
CHAPTER 2

Linear Algebra
1. The Space Rn
In the previous chapter we introduced the concept of a linear space or a vector space.1
We shall now examine in some detail one example of such a space. This is the space
of all ordered n-tuples (x1 , x2 , . . . , xn ) where each xi is a real number. We call this space
n-dimensional real space and denote it Rn .
Remember from the previous chapter that to define a vector space we not only need
to define the points in that space but also to define how we add such points and how we
multiply such points by scalars. In the case of Rn we do this element by element in the
n-tuple or vector. That is,
(x1 , x2 , . . . , xn ) + (y1 , y2 , . . . , yn ) = (x1 + y1 , x2 + y2 , . . . , xn + yn )
and
α(x1 , x2 , . . . , xn ) = (αx1 , αx2 , . . . , αxn ).
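The element-by-element definitions above can be sketched in a few lines; the particular vectors and scalar are illustrative choices only:

```python
def vec_add(x, y):
    # (x1, ..., xn) + (y1, ..., yn) = (x1 + y1, ..., xn + yn)
    return tuple(xi + yi for xi, yi in zip(x, y))

def scalar_mul(alpha, x):
    # alpha * (x1, ..., xn) = (alpha * x1, ..., alpha * xn)
    return tuple(alpha * xi for xi in x)

print(vec_add((1, 2, 3), (4, 5, 6)))  # (5, 7, 9)
print(scalar_mul(2, (1, 2, 3)))       # (2, 4, 6)
```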
Let us consider the case that n = 2, that is, the case of R2 . In this case we can visualise
the space as in the following diagram. The vector (x1 , x2 ) is represented by the point that
is x1 units along from the point (0, 0) in the horizontal direction and x2 units up from (0, 0)
in the vertical direction.
[Figure 1: the vector (1, 2) in R2 , plotted one unit along the horizontal x1 axis and two units up the vertical x2 axis.]
1Of course, we haven’t actually done any previous chapter from this “book” and, if you’ve checked the
previous chapter on the web site, you will know that the previous chapter is rather incomplete. Never mind.
Perhaps it will be there for next year’s students. Sigh. And even more sighing, since the previous was actually
written for last year’s notes.
Let us for the moment continue our discussion in R2 . Notice that we are implicitly
writing a vector (x1 , x2 ) as a sum x1 × v1 + x2 × v2 where v1 is the unit vector in the first
direction and v2 is the unit vector in the second direction. Suppose that instead we considered
the vectors u1 = (2, 1) = 2 × v1 + 1 × v2 and u2 = (1, 2) = 1 × v1 + 2 × v2 . We could
have written any vector (x1 , x2 ) instead as z1 × u1 + z2 × u2 where z1 = (2x1 − x2 )/3 and
z2 = (2x2 − x1 )/3. That is, for any vector in R2 we can uniquely write that vector in terms
of u1 and u2 . Is there anything that is special about u1 and u2 that allows us to make this
claim? There must be since we can easily find other vectors for which this would not have
been true. (For example, (1, 2) and (2, 4).)
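The coordinate formulas z1 = (2x1 − x2 )/3 and z2 = (2x2 − x1 )/3 can be checked numerically; the vector (4, 5) below is an illustrative choice:

```python
def coords_in_u(x1, x2):
    # Coordinates (z1, z2) of (x1, x2) with respect to u1 = (2, 1), u2 = (1, 2).
    return (2 * x1 - x2) / 3, (2 * x2 - x1) / 3

z1, z2 = coords_in_u(4, 5)
print(z1, z2)                    # 1.0 2.0
# Reconstruct the vector: z1*u1 + z2*u2 should give (4, 5) back.
print(2 * z1 + z2, z1 + 2 * z2)  # 4.0 5.0
```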
The property of the pair of vectors u1 and u2 is that they are independent. That is, we
cannot write either as a multiple of the other. More generally in n dimensions we would say
that we cannot write any of the vectors as a linear combination of the others, or equivalently
as in the following definition.

DEFINITION. The vectors x1 , . . . , xk are linearly independent if the only scalars α1 , . . . , αk for which
α1 x1 + · · · + αk xk = 0
are α1 = · · · = αk = 0.
COMMENT 1. If you examine the definition above you will notice that there is nowhere
that we actually need to assume that our vectors are in Rn . We can in fact apply the same
definition of linear independence to any vector space. This allows us to define the concept
of the dimension of an arbitrary vector space as the maximal number of linearly indepen-
dent vectors in that space. In the case of Rn we obtain that the dimension is in fact n.
EXERCISE 1. Suppose that x1 , . . . , xk all in Rn are linearly independent and that the
vector y in Rn is equal to β1 x1 + · · · + βk xk . Show that this is the only way that y can be
expressed as a linear combination of the xi ’s. (That is show that if y = γ1 x1 + · · · + γk xk
then β1 = γ1 , . . . , βk = γk .)
The set of all vectors that can be written as a linear combination of the vectors x1 , . . . , xk
is called the span of those vectors. If x1 , . . . , xk are linearly independent and if the span of
x1 , . . . , xk is all of Rn then the collection { x1 , . . . , xk } is called a basis for Rn . (Of course,
in this case we must have k = n.) Any vector in Rn can be uniquely represented as a linear
combination of the vectors x1 , . . . , xk . We shall later see that it can sometimes be useful to
choose a particular basis in which to represent the vectors with which we deal.
It may be that we have a collection of vectors { x1 , . . . , xk } whose span is not all of
Rn . In this case we call the span of { x1 , . . . , xk } a linear subspace of Rn . Alternatively we
say that X ⊂ Rn is a linear subspace of Rn if X is closed under vector addition and scalar
multiplication. That is, if for all x, y ∈ X the vector x + y is also in X and for all x ∈ X
and α ∈ R the vector αx is in X. If the span of x1 , . . . , xk is X and if x1 , . . . , xk are linearly
independent then we say that these vectors are a basis for the linear subspace X. In this
case the dimension of the linear subspace X is k. In general the dimension of the span of
x1 , . . . , xk is equal to the maximum number of linearly independent vectors in x1 , . . . , xk .
Finally, we comment that Rn is a metric space with metric d : Rn × Rn → R+ defined by

d((x1 , . . . , xn ), (y1 , . . . , yn )) = √( (x1 − y1 )2 + · · · + (xn − yn )2 ).
There are many other metrics we could define on this space but this is the standard one.
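The standard (Euclidean) metric is a one-liner; the points (0, 0) and (3, 4) below are an illustrative choice:

```python
import math

def d(x, y):
    # The standard Euclidean metric on R^n.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(d((0, 0), (3, 4)))  # 5.0
```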
2. Linear Functions from Rn to Rm
To find the term to go in the ith row and the jth column of the product matrix AB we take
the ith row of the matrix A, which will be a row vector with n elements, and the jth column
of the matrix B, which will be a column vector with n elements. We then multiply each
element of the first vector by the corresponding element of the second and add all these
products. Thus the i jth element of AB is ∑nl=1 ail bl j . For example,

            [ p q ]
[ a b c ]   [ r s ]   [ ap + br + ct   aq + bs + cv ]
[ d e f ]   [ t v ] = [ dp + er + ft   dq + es + fv ] .
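The row-by-column rule can be checked with a small sketch; the numeric matrices here are hypothetical stand-ins for the symbolic 2 × 3 and 3 × 2 example above:

```python
def mat_mul(A, B):
    # (AB)_ij = sum over l of A_il * B_lj; matrices are lists of rows.
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
print(mat_mul(A, B))  # [[58, 64], [139, 154]]
```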
We define the identity matrix of order n to be the n × n matrix that has 1’s on its main
diagonal and zeros elsewhere, that is, whose i jth element is 1 if i = j and zero if i ≠ j. We
denote this matrix by In or, if the order is clear from the context, simply I. That is,

I = [ 1 0 . . . 0 ]
    [ 0 1 . . . 0 ]
    [ ⋮ ⋮  ⋱  ⋮ ]
    [ 0 0 . . . 1 ] .
Now, notice that we can write the vector x as a sum ∑ni=1 xi ei , where ei is the ith unit vector,
that is, the vector with 1 in the ith place and zeros elsewhere. That is,
[ x1 ]        [ 1 ]        [ 0 ]              [ 0 ]
[ x2 ]        [ 0 ]        [ 1 ]              [ 0 ]
[ ⋮ ]   = x1 [ ⋮ ]  + x2 [ ⋮ ]  + · · · + xn [ ⋮ ] .
[ xn ]        [ 0 ]        [ 0 ]              [ 1 ]
Now from the linearity of the function f we can write
f (x) = f ( ∑ni=1 xi ei ) = ∑ni=1 f (xi ei ) = ∑ni=1 xi f (ei ).
But, what is f (ei )? Remember that ei is a unit vector in Rn and that f maps vectors in Rn
to vectors in Rm . Thus f (ei ) is the image in Rm of the vector ei . Let us write f (ei ) as
[ a1i ]
[ a2i ]
[ ⋮  ]
[ ami ] .
Thus
f (x) = ∑ni=1 xi f (ei )

       [ a11 ]      [ a12 ]              [ a1n ]
  = x1 [ a21 ] + x2 [ a22 ] + · · · + xn [ a2n ]
       [ ⋮  ]      [ ⋮  ]              [ ⋮  ]
       [ am1 ]      [ am2 ]              [ amn ]

    [ ∑ni=1 a1i xi ]
  = [ ∑ni=1 a2i xi ]
    [ ⋮           ]
    [ ∑ni=1 ami xi ]
and this is exactly what we would have obtained had we multiplied the matrices
[ a11 a12 . . . a1n ] [ x1 ]
[ a21 a22 . . . a2n ] [ x2 ]
[ ⋮   ⋮        ⋮  ] [ ⋮ ]
[ am1 am2 . . . amn ] [ xn ] .
Thus we have not only shown that a linear function is necessarily represented by multiplication
by a matrix, we have also shown how to find the appropriate matrix. It is precisely
the matrix whose n columns are the images under the function of the n unit vectors in Rn .
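This recipe is short enough to sketch directly. The linear map below is a hypothetical example, used only to show the mechanics:

```python
def matrix_of(f, n):
    # The columns of the representing matrix are the images f(e_i) of the
    # unit vectors e_1, ..., e_n; build the columns, then transpose into rows.
    cols = [f(tuple(1 if j == i else 0 for j in range(n))) for i in range(n)]
    return [list(row) for row in zip(*cols)]

def f(v):
    # A hypothetical linear map on R^2, used only for illustration.
    return (v[0] + 2 * v[1], 3 * v[0])

print(matrix_of(f, 2))  # [[1, 2], [3, 0]]
```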
EXERCISE 3. Find the matrices that represent the following linear functions from R2
to R2 .
EXERCISE 4. Prove that the matrices (A + B) and αA defined in the previous paragraph
coincide with the matrices defined in Section 3.
We can also see that the definition we gave of matrix multiplication is precisely the
right definition if we take multiplication of matrices to mean the composition of the linear
functions that the matrices represent. To be more precise let f : Rn → Rm and g : Rm → Rk
be linear functions and let A and B be the m × n and k × m matrices that represent them.
Let (g ◦ f ) : Rn → Rk be the composite function defined in Section 2. Now let us define
the product BA to be that matrix that represents the linear function (g ◦ f ).
Now since the matrix A represents the function f and B represents g we have
(g ◦ f )(x) = g( f (x))
        [ a11 a12 . . . a1n ] [ x1 ]
  = g ( [ a21 a22 . . . a2n ] [ x2 ] )
        [ ⋮   ⋮        ⋮  ] [ ⋮ ]
        [ am1 am2 . . . amn ] [ xn ]

        [ ∑ni=1 a1i xi ]
  = g ( [ ∑ni=1 a2i xi ] )
        [ ⋮           ]
        [ ∑ni=1 ami xi ]

    [ b11 b12 . . . b1m ] [ ∑ni=1 a1i xi ]
  = [ b21 b22 . . . b2m ] [ ∑ni=1 a2i xi ]
    [ ⋮   ⋮        ⋮  ] [ ⋮           ]
    [ bk1 bk2 . . . bkm ] [ ∑ni=1 ami xi ]

    [ ∑mj=1 b1 j ∑ni=1 a ji xi ]
  = [ ∑mj=1 b2 j ∑ni=1 a ji xi ]
    [ ⋮                       ]
    [ ∑mj=1 bk j ∑ni=1 a ji xi ]

    [ ∑ni=1 ∑mj=1 b1 j a ji xi ]
  = [ ∑ni=1 ∑mj=1 b2 j a ji xi ]
    [ ⋮                       ]
    [ ∑ni=1 ∑mj=1 bk j a ji xi ]
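That composition of linear maps corresponds to the matrix product can be checked on a small hypothetical example (the matrices and vector below are illustrative choices only):

```python
def mat_mul(A, B):
    # Matrix product; matrices are lists of rows.
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_vec(A, x):
    # A matrix acting on a column vector.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Hypothetical matrices: A represents f and B represents g, both on R^2.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
x = [5, 7]

# g(f(x)) computed in two steps agrees with the single matrix BA applied to x.
assert mat_vec(B, mat_vec(A, x)) == mat_vec(mat_mul(B, A), x)
print(mat_vec(mat_mul(B, A), x))  # [43, 62]
```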
Recall that the image of any point in Rn is in the span of the images of the unit vectors and
similarly that any point in the span of the images is the image of some point in Rn . Thus
Im( f ) is equal to the span of the columns of A. Now, the dimension of the span of the
columns of A is equal to the maximum number of linearly independent columns in A, that
is, to the rank of A.
If the linear function f : Rn → Rn is one-to-one and onto then the function f has
an inverse f −1 . In Exercise 2 you showed that this function too was linear. A matrix
that represents a linear function that is one-to-one and onto is called a nonsingular matrix.
Alternatively we can say that an n × n matrix is nonsingular if the rank of the matrix is n. To
see that these two statements are equivalent note first that if f is one-to-one then Ker( f ) = {0}.
(This is the trivial direction of Exercise 5.) But this means that dim(Ker( f )) = 0 and so
dim(Im( f )) = n. And, as we argued at the end of the previous section, this is the same as
the rank of the matrix that represents f .
EXERCISE 5. Show that the linear function f : Rn → Rm is one-to-one if and only if
Ker( f ) = {0}.
EXERCISE 6. Show that the linear function f : Rn → Rn is one-to-one if and only if
it is onto.
7. Changes of Basis
We have until now implicitly assumed that there is no ambiguity when we speak of
the vector (x1 , x2 , . . . , xn ). Sometimes there may indeed be an obvious meaning to such
a vector. However when we define a linear space all that is really specified is “what
straight lines are” and “where zero is.” In particular, we do not necessarily have defined in
an unambiguous way “where the axes are” or “what a unit length along each axis is.” In
other words we may not have a set of basis vectors specified.
Even when we do have, or have decided on, a set of basis vectors we may wish to redefine
our description of the linear space with which we are dealing so as to use a different
set of basis vectors. Let us suppose that we have an n-dimensional space, even Rn say, with
a given set of basis vectors v1 , v2 , . . . , vn and that we wish instead to describe the space in
terms of the linearly independent vectors b1 , b2 , . . . , bn where
bi = b1i v1 + b2i v2 + · · · + bni vn .
Now, if we had the description of a point in terms of the new coordinate vectors, e.g.,
as
z1 b1 + z2 b2 + · · · + zn bn
then we can easily convert this to a description in terms of the original basis vectors. We
would simply substitute the formula for bi in terms of the v j ’s into the previous formula
giving
( ∑ni=1 b1i zi ) v1 + ( ∑ni=1 b2i zi ) v2 + · · · + ( ∑ni=1 bni zi ) vn
or, in our previous notation

[ ∑ni=1 b1i zi ]
[ ∑ni=1 b2i zi ]
[ ⋮           ]
[ ∑ni=1 bni zi ] .
But this is simply the product
[ b11 b12 . . . b1n ] [ z1 ]
[ b21 b22 . . . b2n ] [ z2 ]
[ ⋮   ⋮        ⋮  ] [ ⋮ ]
[ bn1 bn2 . . . bnn ] [ zn ] .
That is, if we are given an n-tuple of real numbers that describes a vector in terms of the
new basis vectors b1 , b2 , . . . , bn and we wish to find the n-tuple that describes the vector
in terms of the original basis vectors, we simply multiply the n-tuple we are given, written
as a column vector, by the matrix whose columns are the new basis vectors b1 , b2 , . . . , bn .
We shall call this matrix B. We see among other things that changing the basis is a linear
operation.
Now, if we were given the information in terms of the original basis vectors and wanted
to write it in terms of the new basis vectors what should we do? Since we don’t have the
original basis vectors written in terms of the new basis vectors this is not immediately
obvious. However we do know that if we were to do it and then were to carry out the
operation described in the previous paragraph we would be back with what we started.
Further we know that the operation is a linear operation that maps n-tuples to n-tuples
and so is represented by multiplication by an n × n matrix. That is we multiply the n-
tuple written as a column vector by the matrix that when multiplied by B gives the identity
matrix, that is, the matrix B−1 . If we are given a vector of the form
x1 v1 + x2 v2 + · · · + xn vn
and we wish to express it in terms of the vectors b1 , b2 , . . . , bn we calculate
[ b11 b12 . . . b1n ]−1 [ x1 ]
[ b21 b22 . . . b2n ]   [ x2 ]
[ ⋮   ⋮        ⋮  ]   [ ⋮ ]
[ bn1 bn2 . . . bnn ]   [ xn ] .
Suppose now that we consider a linear function f : Rn → Rn and that we have originally
described Rn in terms of the standard basis vectors e1 , e2 , . . . , en where ei is the vector with 1
in the ith place and zeros elsewhere. Suppose that with these basis vectors f is represented
by the matrix

A = [ a11 a12 . . . a1n ]
    [ a21 a22 . . . a2n ]
    [ ⋮   ⋮        ⋮  ]
    [ an1 an2 . . . ann ] .
If we now describe Rn in terms of the vectors b1 , b2 , . . . , bn how will the linear function
f be represented? Let us think about what we want. We shall be given a vector described
in terms of the basis vectors b1 , b2 , . . . , bn and we shall want to know what the image of
this vector under the linear function f is, where we shall again want our answer in terms
of the basis vectors b1 , b2 , . . . , bn . We shall know how to do this when we are given the
description in terms of the vectors e1 , e2 , . . . , en . Thus the first thing we shall do with our
vector is to convert it from a description in terms of b1 , b2 , . . . , bn to a description in terms
of e1 , e2 , . . . , en . We do this by multiplying the n-tuple by the matrix B. Thus if we call our
original n-tuple z we shall now have a description of the vector in terms of e1 , e2 , . . . , en ,
viz Bz. Given this description we can find the image of the vector in question under f by
multiplying by the matrix A. Thus we shall have A(Bz) = (AB)z. Remember however this
will have given us the image vector in terms of the basis vectors e1 , e2 , . . . , en . In order to
convert this to a description in terms of the vectors b1 , b2 , . . . , bn we must multiply by the
matrix B−1 . Thus our final n-tuple will be (B−1 AB)z.
Recapitulating, suppose that we know that the linear function f : Rn → Rn is rep-
resented by the matrix A when we describe Rn in terms of the standard basis vectors
e1 , e2 , . . . , en and that we have a new set of basis vectors b1 , b2 , . . . , bn . Then when Rn
is described in terms of these new basis vectors the linear function f will be represented
by the matrix B−1 AB.
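The recipe just recapitulated (convert with B, apply A, convert back with B−1 ) can be checked numerically. The map and basis below are hypothetical choices, not ones from the text:

```python
def mat_mul(A, B):
    # Matrix product; matrices are lists of rows.
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inv2(M):
    # Inverse of a 2 x 2 matrix via the ad - bc formula.
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical example: A represents f in the standard basis, and the
# columns of B are the new basis vectors b1 = (1, 0), b2 = (1, 1).
A = [[2, 0], [0, 3]]
B = [[1, 1], [0, 1]]
rep = mat_mul(inv2(B), mat_mul(A, B))
print(rep)  # [[2.0, -1.0], [0.0, 3.0]] -- the representation of f in the new basis
```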
EXERCISE 9. Let f : Rn → Rm be a linear function. Suppose that with the standard
bases for Rn and Rm the function f is represented by the matrix A. Let b1 , b2 , . . . , bn be
a new set of basis vectors for Rn and c1 , c2 , . . . , cm be a new set of basis vectors for Rm .
What is the matrix that represents f when the linear spaces are described in terms of the
new basis vectors?
EXERCISE 10. Let f : R2 → R2 be a linear function. Suppose that with the standard
basis for R2 the function f is represented by the matrix

[ 3 1 ]
[ 1 2 ] .

Let

[ 3 ]     [ 1 ]
[ 2 ] and [ 1 ]

be a new set of basis vectors for R2 . What is the matrix that represents f when R2 is
described in terms of the new basis vectors?
Properties of a square matrix that depend only on the linear function that the matrix
represents and not on the particular choice of basis vectors for the linear space are called
invariant properties. We have already seen one example of an invariant property, the rank
of a matrix. The rank of a matrix is equal to the dimension of the image space of the
function that the matrix represents which clearly depends only on the function and not on
the choice of basis vectors for the linear space.
The idea of a property being invariant can be expressed also in terms only of matrices
without reference to the idea of linear functions. A property is invariant if whenever an
n × n matrix A has the property then for any nonsingular n × n matrix B the matrix B−1 AB
also has the property. We might think of rank as a function that associates to any square
matrix a nonnegative integer. We shall say that such a function is an invariant if the property
of having the function take a particular value is invariant for all particular values we may
choose.
Two particularly important invariants are the trace of a square matrix and the determi-
nant of a square matrix. We examine these in more detail in the following section.
The trace of the n × n matrix A = [ai j ] is the sum of the elements on its main diagonal, that is,

tr(A) = a11 + a22 + · · · + ann = ∑ni=1 aii .
EXERCISE 11. For the matrices given in Exercise 10 confirm that tr(A) = tr(B−1 AB).
It is easy to see that the trace is a linear function on the space of all n × n matrices, that
is, that for all A and B n × n matrices and for all α ∈ R
(1) tr(A + B) = tr(A) + tr(B),
and
(2) tr(αA) = αtr(A).
We can also see that if A and B are both n × n matrices then tr(AB) = tr(BA). In fact, if
A is an m × n matrix and B is an n × m matrix this is still true. This will often be extremely
useful in calculating the trace of a product.
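The identity tr(AB) = tr(BA) can be checked even when A and B are not square, as the text notes. The 2 × 3 and 3 × 2 matrices below are hypothetical:

```python
def mat_mul(A, B):
    # Matrix product; matrices are lists of rows.
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(M):
    # Sum of the diagonal elements of a square matrix.
    return sum(M[i][i] for i in range(len(M)))

# AB is 2 x 2 and BA is 3 x 3, yet the two traces agree.
A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
assert trace(mat_mul(A, B)) == trace(mat_mul(B, A))
print(trace(mat_mul(A, B)))  # 212
```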
EXERCISE 12. From the definition of matrix multiplication show that if A is an m × n
matrix and B is an n × m matrix then tr(AB) = tr(BA). [Hint: Look at the definition of
matrix multiplication in Section 2. Then write the trace of the product matrix using
summation notation. Finally change the order of summation.]
The determinant, unlike the trace, is not a linear function of the matrix. It does however
have some linear structure. If we fix all columns of the matrix except one and look at
the determinant as a function of only this column then the determinant is linear in this
single column. Moreover this is true whichever column we choose. Let us write the
determinant of the n × n matrix A as det(A). Let us also write the matrix A as [a1 , a2 , . . . , an ]
where ai is the ith column of the matrix A. Thus our claim is that for all n × n matrices A,
for all i = 1, 2, . . . , n, for all n-vectors b, and for all α ∈ R
(3) det([a1 , . . . , ai−1 , ai + b, ai+1 , . . . , an ]) = det([a1 , . . . , ai−1 , ai , ai+1 , . . . , an ])
    + det([a1 , . . . , ai−1 , b, ai+1 , . . . , an ])
and
(4) det([a1 , . . . , ai−1 , αai , ai+1 , . . . , an ]) = α det([a1 , . . . , ai−1 , ai , ai+1 , . . . , an ]).
We express this by saying that the determinant is a multilinear function.
Also the determinant is such that any n × n matrix that is not of full rank, that is, not of
rank n, has a zero determinant. In fact, given that the determinant is a multilinear function,
if we simply say that any matrix in which one column is the same as one of its neighbours
has a zero determinant, this implies the stronger statement that we made. We already see
one use of calculating determinants: a matrix is nonsingular if and only if its determinant
is nonzero.
The two properties of being multilinear and zero whenever two neighbouring columns
are the same already almost uniquely identify the determinant. Notice however that if the
determinant satisfies these two properties then so does any constant times the determinant.
To uniquely define the determinant we “tie down” this constant by assuming that det(I) = 1.
Though we haven’t proved that it is so, these three properties uniquely define the deter-
minant. That is, there is one and only one function with these three properties. We call this
function the determinant. In Section 9 we shall discuss a number of other useful properties
of the determinant. Remember that these additional properties are not really additional facts
about the determinant. They can all be derived from the three properties we have given
here.
Let us now look to the geometric interpretation of the determinant. Let us first think
about what linear transformations can do to the space Rn . Since we have already said that
a linear transformation that is not onto is represented by a matrix with a zero determinant
let us think about linear transformations that are onto, that is, that do not map Rn into a
linear space of lower dimension. Such transformations can rotate the space around zero.
They can “stretch” the space in different directions. And they can “flip” the space over.
In the latter case all objects will become “mirror images” of themselves. We call linear
transformations that make such a mirror image orientation reversing and those that don’t
orientation preserving. A matrix that represents an orientation preserving linear function
has a positive determinant while a matrix that represents an orientation reversing linear
function has a negative determinant. Thus we have a geometric interpretation of the sign
of the determinant.
The absolute size of the determinant represents how much bigger or smaller the linear
function makes objects. More precisely it gives the “volume” of the image of the unit
hypercube under the transformation. The word volume is in quotes because it is the volume
with which we are familiar only when n = 3. If n = 2 then it is area, while if n > 3 then it
is the full dimensional analog in Rn of volume in R3 .
In a diagram show the image under the linear function that this matrix represents of the
unit square, that is, the square whose corners are the points (0,0), (1,0), (0,1), and (1,1).
Calculate the area of that image. Do the same for the matrix

[ 4  1 ]
[ −1 1 ] .
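For the 2 × 2 matrix [4 1; −1 1] just displayed, the determinant computation is one line, and its interpretation is as above: the image of the unit square is a parallelogram of area |det|, and the positive sign says orientation is preserved:

```python
def det2(M):
    # Determinant of a 2 x 2 matrix: a11 a22 - a21 a12.
    (a, b), (c, d) = M
    return a * d - c * b

M = [[4, 1], [-1, 1]]
print(det2(M))  # 5
```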
Note that the determinant of a matrix is not a matrix but rather a real number. For the case n = 2 we define

det(A) = | a11 a12 |
         | a21 a22 |

to be a11 a22 − a21 a12 . It is possible to also give a convenient formula for the determinant of
a 3 × 3 matrix. However, rather than doing this, we shall immediately consider the case of
an n × n matrix.
By the minor of an element of the matrix A we mean the determinant (remember a
real number) of the matrix obtained from the matrix A by deleting the row and column
containing the element in question. We denote the minor of the element ai j by the symbol
|Mi j |. Thus, for example,
         | a22 . . . a2n |
|M11 | = | ⋮         ⋮  | .
         | an2 . . . ann |
We now define the cofactor of an element to be either plus or minus the minor of the
element, being plus if the sum of indices of the element is even and minus if it is odd. We
denote the cofactor of the element ai j by the symbol |Ci j |. Thus |Ci j | = |Mi j | if i + j is even
and |Ci j | = −|Mi j | if i + j is odd. Or, |Ci j | = (−1)i+ j |Mi j |. We then define the determinant of the n × n matrix A
to be ∑nj=1 a1 j |C1 j |. This is the sum of n terms, each one of which is the product of an
element of the first row of the matrix and the cofactor of that element.
EXERCISE 15. Define the determinant of the 1 × 1 matrix [a] to be a. (What else could
we define it to be?) Show that the definition given above corresponds with the definition
we gave earlier for 2 × 2 matrices.
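The first-row cofactor expansion just defined translates into a short recursive sketch; the 3 × 3 matrix at the end is an illustrative numerical check:

```python
def det(A):
    # Determinant by cofactor expansion along the first row:
    # det(A) = sum_j a_1j |C_1j|, with |C_1j| = (-1)^(1+j) |M_1j|.
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]  # delete row 1, column j
        total += (-1) ** j * A[0][j] * det(minor)
    return total

print(det([[4, 1, 2], [5, 2, 1], [1, 0, 3]]))  # 6
```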
0 −5 0 8 1 −3 1 4
[Hint: Think carefully about which column or row to use in the expansion.]
We shall now list a number of properties of determinants. These properties imply
that, as we stated above, it does not matter which row or column we use to expand the
determinant. Further these properties will give us a series of transformations we may
perform on a matrix without altering its determinant. This will allow us to calculate a
determinant by first transforming the matrix to one whose determinant is easier to calculate
and then calculating the determinant of the easier matrix.
PROPERTY 1. The determinant of a matrix equals the determinant of its transpose: |A| = |A′ |.

PROPERTY 2. Interchanging two rows (or two columns) of a matrix changes the sign
of its determinant but not its absolute value. For example,

| c d |
| a b | = cb − ad = −(ad − cb) = − | a b |
                                  | c d | .
PROPERTY 6. If one expands a matrix in terms of one row (or column) and the cofactors
of a different row (or column) then the answer is always zero. That is,

∑nj=1 ai j |Ck j | = 0

whenever i ≠ k. Also

∑ni=1 ai j |Cik | = 0

whenever j ≠ k.
EXERCISE 24. Verify Property 6 for the matrix

[ 4 1 2 ]
[ 5 2 1 ]
[ 1 0 3 ] .
Let us define the matrix of cofactors C to be the matrix [|Ci j |] whose i jth element is
the cofactor of the i jth element of A. Now we define the adjoint matrix of A to be the
transpose of the matrix of cofactors of A. That is
adj(A) = C0 .
It is straightforward to see (using Property 6) that A adj(A) = |A|In = adj(A)A. That is,
A−1 = (1/|A|) adj(A). Notice that this is well defined if and only if |A| ≠ 0. We now have a
method of finding the inverse of any nonsingular square matrix.
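The adjugate construction can be sketched directly. The 3 × 3 matrix here is a hypothetical example (not one of the exercise matrices), and the final check verifies A A−1 = I:

```python
def det(A):
    # Determinant by cofactor expansion along the first row.
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def inverse(A):
    # A^{-1} = adj(A) / |A|, where adj(A) is the transposed matrix of cofactors:
    # the (i, j) entry of adj(A) is the cofactor |C_ji|.
    n, d = len(A), det(A)
    def minor(r, c):
        return [row[:c] + row[c + 1:] for k, row in enumerate(A) if k != r]
    return [[(-1) ** (i + j) * det(minor(j, i)) / d for j in range(n)]
            for i in range(n)]

A = [[2, 1, 0], [1, 1, 0], [0, 0, 1]]
Ainv = inverse(A)
print(Ainv)  # [[1.0, -1.0, 0.0], [-1.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
# Check that A * A^{-1} is the identity.
n = len(A)
prod = [[sum(A[i][l] * Ainv[l][j] for l in range(n)) for j in range(n)] for i in range(n)]
assert all(abs(prod[i][j] - (1 if i == j else 0)) < 1e-12 for i in range(n) for j in range(n))
```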
EXERCISE 25. Use this method to find the inverses of the following matrices

    [ 3 −1 2 ]       [ 4 −2 1 ]       [ 1 5 2 ]
(a) [ 1 0 3 ]    (b) [ 7 3 3 ]    (c) [ 1 4 3 ] .
    [ 4 0 2 ]        [ 2 0 1 ]        [ 0 1 2 ]
Knowing how to invert matrices we thus know how to solve a system of n linear
equations in n unknowns. For we can express the n equations in matrix notation as Ax = b
where A is an n × n matrix of coefficients, x is an n × 1 vector of unknowns, and b is an
n × 1 vector of constants. Thus we can solve the system of equations as x = A−1 Ax = A−1 b.
Sometimes, particularly if we are not interested in all of the x’s, it is convenient to use
another method of solving the equations. This method is known as Cramer’s Rule. Let us
suppose that we wish to solve the above system of equations, that is, Ax = b. Let us define
the matrix Ai to be the matrix obtained from A by replacing the ith column of A by the
vector b. Then the solution is given by
xi = |Ai | / |A| .
EXERCISE 26. Derive Cramer’s Rule. [Hint: We know that the system
of equations is solved by x = (1/|A|) adj(A)b. This gives a formula for xi . Show that this
formula is the same as that given by xi = |Ai |/|A|.]
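Cramer’s Rule is easy to sketch once a determinant routine is available. The 2 × 2 system below is a hypothetical illustration, not one of the exercise systems:

```python
def det(A):
    # Determinant by cofactor expansion along the first row.
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def cramer(A, b):
    # x_i = |A_i| / |A|, where A_i is A with column i replaced by b.
    d = det(A)
    return [det([row[:i] + [b[k]] + row[i + 1:] for k, row in enumerate(A)]) / d
            for i in range(len(A))]

# A hypothetical system: 2x1 + x2 = 5, x1 + 3x2 = 10.
print(cramer([[2, 1], [1, 3]], [5, 10]))  # [1.0, 3.0]
```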
EXERCISE 27. Solve the following system of equations (i) by matrix inversion and
(ii) by Cramer’s Rule

    2x1 − x2 = 2            −x1 + x2 + x3 = 1
(a) 3x2 + 2x3 = 16     (b)   x1 − x2 + x3 = 1 .
    5x1 + 3x3 = 21           x1 + x2 + x3 = 1
EXERCISE 28. Recall that we claimed that the determinant was an invariant. Confirm
this by calculating (directly) det(A) and det(B−1 AB) where

    [ 1 0 1 ]            [ 1 0 0 ]
B = [ 1 −1 2 ]   and A = [ 0 2 0 ] .
    [ 2 1 −1 ]           [ 0 0 3 ]
EXERCISE 29. An nth order determinant of the form

| a11 0   0   . . . 0   |
| a21 a22 0   . . . 0   |
| a31 a32 a33 . . . 0   |
| ⋮   ⋮   ⋮         ⋮  |
| an1 an2 an3 . . . ann |
is called triangular. Evaluate this determinant. [Hint: Expand the determinant in terms of
its first row. Expand the resulting (n − 1) × (n − 1) determinant in terms of its first row,
and so on.]
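The repeated first-row expansion in the hint can be checked numerically; the lower-triangular matrix below is a hypothetical example, and its determinant is the product of the diagonal entries:

```python
def det(A):
    # Cofactor expansion along the first row, as in the text.
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

# Each first-row expansion of a triangular matrix picks up only the leading
# diagonal entry, so the determinant is a11 * a22 * ... * ann.
T = [[2, 0, 0], [5, 3, 0], [1, 4, 7]]
print(det(T))  # 42
assert det(T) == 2 * 3 * 7
```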
An implication of this theorem is that an n × n matrix cannot have more than n eigenvectors
with distinct eigenvalues. Further this theorem allows us to see that if an n × n
matrix has n distinct eigenvalues then it is possible to find a basis for Rn in which the linear
function that the matrix represents is represented by a diagonal matrix. Equivalently
we can find a matrix B such that B−1 AB is a diagonal matrix.
To see this let b1 , b2 , . . . , bn be n linearly independent eigenvectors with associated
eigenvalues λ1 , λ2 , . . . , λn . Let B be the matrix whose columns are the vectors b1 , b2 , . . . , bn .
Since these vectors are linearly independent the matrix B has an inverse. Now
B−1 AB = B−1 [Ab1 Ab2 . . . Abn ]
= B−1 [λ1 b1 λ2 b2 . . . λn bn ]
= [λ1 B−1 b1 λ2 B−1 b2 . . . λn B−1 bn ]
    [ λ1 0  . . . 0  ]
  = [ 0  λ2 . . . 0  ]
    [ ⋮  ⋮   ⋱   ⋮ ]
    [ 0  0  . . . λn ] .
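The diagonalization B−1 AB can be checked on a small hypothetical example: the matrix below has eigenvalues 3 and 1 with eigenvectors (1, 1) and (1, −1), and the columns of B are these eigenvectors (B−1 is hard-coded, computed by hand):

```python
def mat_mul(A, B):
    # Matrix product; matrices are lists of rows.
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[2, 1], [1, 2]]
B = [[1, 1], [1, -1]]             # columns are the eigenvectors (1, 1) and (1, -1)
Binv = [[0.5, 0.5], [0.5, -0.5]]  # inverse of B, computed by hand

D = mat_mul(Binv, mat_mul(A, B))
print(D)  # [[3.0, 0.0], [0.0, 1.0]] -- diagonal, with the eigenvalues on the diagonal
```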
CHAPTER 3

Choice, Preference, and Utility
We shall start this course with an introduction to the formal modelling of motivated
choice. There is an abundance of experimental evidence to suggest that, at least in some
circumstances, the decision making of actual people is influenced by a myriad of appar-
ently irrelevant considerations. It is nevertheless traditional in economics—and I think for
very good reason—to start, and often to go no further than, what is usually called rational
choice, which I am calling here motivated choice.
The treatment here follows largely the approach and uses the notation of ?. The style
shall be rather formal. For better or worse, modern economic theory is formal, sometimes,
perhaps, more formal than it needs to be. Whatever style you choose for your own
work, you will need to be able to read and work with formal models to access the literature
that exists.
Before getting into the formal modelling and assumptions of the next section I shall
discuss here an assumption that is buried in the setup. In the next section we shall define
a choice rule as a function that for any choice situation specifies which of the available
elements might be chosen. A choice situation will be defined by specifying which elements
are available to be selected, that is, by specifying the available set of choices. Notice that
there is already an assumption buried here. By specifying only the set, and not saying
how it is presented to the decision maker, we are already making assumptions. And those
assumptions are not necessarily realistic. There is a good deal of evidence to suggest that
the manner in which a choice is presented to a decision maker might have a good deal of influence
on the actual decision she comes to.
4. The Relationship Between the Choice Based Approach and the Preference Based Approach
We return to the choice based approach. Given a choice structure (B,C(·)) we can
define a preference relation based on it, the so-called revealed preference relation, %∗ .
DEFINITION 5. Given a choice structure (B,C(·)) the revealed preference relation %∗
is defined by x %∗ y if and only if there is some B in B such that x, y are in B and x is in
C(B).
We read x %∗ y as “x is revealed at least as good as y” or “x is revealed weakly preferred
to y.” Note that %∗ may not be either transitive or complete. We also say that “x is revealed
(strictly) preferred to y” if there is some B in B such that x, y are in B and x is in C(B) and
y is not. We can then restate Definition 3 as follows.
DEFINITION 3′. The choice structure (B,C(·)) satisfies the weak axiom of revealed
preference if whenever x is revealed at least as good as y then y is not revealed preferred to
x.
There are two questions we might ask regarding the relationship between the two
approaches.
(1) If the decision-maker has a rational preference ordering do the preference maximising
choices necessarily generate a choice structure that satisfies the weak
axiom?
(2) If the decision-maker’s choices facing a family of budget sets B are represented
by a choice structure (B,C(·)) that satisfies the weak axiom is there necessarily
a rational preference relation that is consistent with these choices?
The answer to the first question is unambiguously “yes.” To answer the second question
in the affirmative requires some assumptions on B.
To analyse the problem formally we need a bit more notation.
Suppose that the decision-maker has a rational preference relation ≿ on X. For any
nonempty subset of alternatives B ⊂ X we let C∗(B, ≿) be her preference maximising
choices, i.e.,
C∗(B, ≿) = {x ∈ B | x ≿ y for every y ∈ B}.
PROPOSITION 2. Suppose that ≿ is a rational preference relation and that B is such
that C∗(B, ≿) is nonempty for all B in B. Then the choice structure (B, C∗(·, ≿)) satisfies
the weak axiom of revealed preference.
PROOF. Suppose that for some B in B we have x, y in B and x in C∗(B, ≿). By the
definition of C∗(B, ≿) this means that x ≿ y. Now consider some other B′ in B with x, y in
B′ and y in C∗(B′, ≿). Thus y ≿ z for all z in B′. But x ≿ y, and since ≿ is rational it is
transitive, so x ≿ z for all z in B′. And so x is in C∗(B′, ≿) as well, as we require. □
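Proposition 2 is easy to check by brute force on a small example. The sketch below (the three-element set of alternatives and the rank encoding of the preference are hypothetical choices for illustration, not from the text) builds the preference-maximising choice C∗(B, ≿) for every nonempty budget set and verifies the weak axiom directly.

```python
from itertools import combinations

# Hypothetical example: X = {a, b, c} with a strictly preferred to b and
# b strictly preferred to c, encoded by a numerical rank.
rank = {'a': 3, 'b': 2, 'c': 1}
X = list(rank)

def C_star(B):
    # Preference-maximising choices: the rank-maximal elements of B.
    best = max(rank[x] for x in B)
    return {x for x in B if rank[x] == best}

# The family of budget sets: every nonempty subset of X.
family = [set(s) for r in range(1, len(X) + 1) for s in combinations(X, r)]
choice = {frozenset(B): C_star(B) for B in family}

def satisfies_warp(family, choice):
    # If x is revealed at least as good as y (x chosen from a set containing
    # both), then y must never be revealed strictly preferred to x.
    for B1 in family:
        for x in choice[frozenset(B1)]:
            for y in B1:
                for B2 in family:
                    if (x in B2 and y in choice[frozenset(B2)]
                            and x not in choice[frozenset(B2)]):
                        return False
    return True

print(satisfies_warp(family, choice))   # True, as Proposition 2 predicts
```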
The relation between C∗(·, ≿) and the preferences ≿ is quite clear and is not
problematic. The relation between a choice structure (B, C(·)) and the revealed preference
relation ≿∗ is a little less clear, even if ≿∗ is rational. Can you think of an example in
which ≿∗ is rational but doesn't seem to quite reflect the choice structure? [Hint: Try to
have ≿∗ contain all ordered pairs, and yet not have the choice structure be that of complete
indifference.] To say something systematic it is convenient to say precisely what we mean
by a preference relation representing or rationalising a choice structure.
DEFINITION 6. Given a choice structure (B, C(·)), we say that the rational preference
relation ≿ rationalises C(·) relative to B if C(B) = C∗(B, ≿) for all B in B, that is, if ≿
(and B) generates the choice structure (B, C(·)).
We are now in a position to answer the second question raised earlier.
PROPOSITION 3. If (B, C(·)) is a choice structure satisfying the weak axiom and if
B includes all nonempty subsets of X containing three or fewer elements then there is a
rational preference relation ≿ that rationalises C(·) relative to B, that is, C(B) = C∗(B, ≿)
for all B in B. Furthermore this rational preference relation is the only preference relation
that does so.
The proof of this proposition is not too difficult. You can read it in ? if you are up to
it. (It’s Proposition 1.D.2 there.) We shall not prove it here.
We can now ask about the relationship between the preference relation being rational
and it being represented by a utility function. Again, in one direction we get a very stark
answer.
PROPOSITION 4. If a preference relation ≿ can be represented by a utility function
then it is rational.
Again, we shall not prove this here. It’s not very difficult. You can find the proof in ?
and one of the homework exercises guides you through the proof.
There are a number of circumstances in which the converse is true. In particular, if the
set of alternatives X is finite then any rational preference relation can be represented by a
utility function. In cases in which X is not finite one needs additional conditions in order
to guarantee that the preferences can be so represented. We shall return to this question
when we discuss the consumer's decision problem in a little more detail in the next
section.
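For the finite case the construction is simple enough to sketch in a few lines: take u(x) to be the number of alternatives that x is weakly preferred to. (The four-element X and the rank encoding of the preference below are hypothetical, chosen only for illustration.)

```python
# Hypothetical four-element X with a rational preference encoded by ranks;
# b and c are indifferent.
X = ['a', 'b', 'c', 'd']
rank = {'a': 3, 'b': 2, 'c': 2, 'd': 1}

def weakly_preferred(x, y):
    return rank[x] >= rank[y]

# Candidate utility: the number of alternatives x is weakly preferred to.
def u(x):
    return sum(1 for y in X if weakly_preferred(x, y))

# u represents the preference: u(x) >= u(y) exactly when x is weakly
# preferred to y.
print(all((u(x) >= u(y)) == weakly_preferred(x, y) for x in X for y in X))  # True
```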
We think of the commodity space X as including all possible uses of the consumer’s
wealth. Thus if the consumer were not to spend all of her wealth she would be forgoing the
possibility of increasing her consumption with no alternate use of her wealth. We could
allow such situations, but it would limit what we could say about the consumer’s behaviour.
Recognising that it is a substantive assumption, though rather a mild one, we assume that
the consumer spends all of her wealth. We call this assumption Walras’ law.
DEFINITION 10. The Walrasian demand correspondence satisfies Walras' law if p ·
x = w for any p ∈ R^L_{++}, w ≥ 0 and x ∈ x(p, w).
We now assume that the Walrasian demand correspondence is single valued, that is,
that it is a function. There are a number of, essentially adding-up, restrictions implied by
the homogeneity of degree zero and Walras' law. Consider the requirement of homogeneity:
x(αp, αw) − x(p, w) = 0 for all α > 0. Suppose we differentiate this with respect to α and
evaluate the result at α = 1. We obtain the following result.
PROPOSITION 5. If the Walrasian demand function x(p, w) is homogeneous of degree
zero then for all p and w
∑_{k=1}^{L} (∂x_ℓ(p, w)/∂p_k) p_k + (∂x_ℓ(p, w)/∂w) w = 0   for ℓ = 1, . . . , L.
If you know matrix notation you can say this a bit more simply.
D_p x(p, w) p + D_w x(p, w) w = 0.
Now consider Walras’ law p · x(p, w) = w for all p and w. Suppose we differentiate
this with respect to price. We obtain the following result.
PROPOSITION 6. If the Walrasian demand function x(p, w) satisfies Walras' law then
for all p and w
∑_{ℓ=1}^{L} p_ℓ (∂x_ℓ(p, w)/∂p_k) + x_k(p, w) = 0   for k = 1, . . . , L.
Or in matrix notation
p · D_p x(p, w) + x(p, w)^T = 0^T.
Consider again Walras’ law p · x(p, w) = w for all p and w and differentiate this time
with respect to w. We obtain the following result.
P ROPOSITION 7. If the Walrasian demand function x(p, w) satisfies Walras’ law then
for all p and w
∑_{ℓ=1}^{L} p_ℓ (∂x_ℓ(p, w)/∂w) = 1.
Or in matrix notation
p · D_w x(p, w) = 1.
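These three restrictions can be verified numerically for a particular demand system. The sketch below uses a hypothetical Cobb-Douglas demand x_ℓ(p, w) = a_ℓ w / p_ℓ (not from the text), which is homogeneous of degree zero and satisfies Walras' law, and checks Propositions 5, 6, and 7 by finite differences.

```python
# Hypothetical Cobb-Douglas demand with expenditure shares a_l summing to one.
a = [0.2, 0.3, 0.5]
p = [1.0, 2.0, 4.0]
w = 10.0
L = len(a)
h = 1e-6

def x(price, wealth):
    return [a[l] * wealth / price[l] for l in range(L)]

def dx_dp(l, k):    # numerical partial of x_l with respect to p_k
    pk = p[:]
    pk[k] += h
    return (x(pk, w)[l] - x(p, w)[l]) / h

def dx_dw(l):       # numerical partial of x_l with respect to w
    return (x(p, w + h)[l] - x(p, w)[l]) / h

# Proposition 5 (homogeneity): sum_k (dx_l/dp_k) p_k + (dx_l/dw) w = 0
print(all(abs(sum(dx_dp(l, k) * p[k] for k in range(L)) + dx_dw(l) * w) < 1e-3
          for l in range(L)))
# Proposition 6 (Walras' law, prices): sum_l p_l (dx_l/dp_k) + x_k = 0
print(all(abs(sum(p[l] * dx_dp(l, k) for l in range(L)) + x(p, w)[k]) < 1e-3
          for k in range(L)))
# Proposition 7 (Walras' law, wealth): sum_l p_l (dx_l/dw) = 1
print(abs(sum(p[l] * dx_dw(l) for l in range(L)) - 1) < 1e-6)
```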
DEFINITION 11. The Walrasian demand function x(p, w) satisfies the weak axiom of
revealed preference if the following property holds for any two price-wealth situations
(p, w) and (p′, w′): if p · x(p′, w′) ≤ w and x(p′, w′) ≠ x(p, w), then p′ · x(p, w) > w′.
1. Constrained Maximisation
1.1. Lagrange Multipliers. Consider the problem of a consumer who seeks to dis-
tribute his income across the purchase of the two goods that he consumes, subject to the
constraint that he spends no more than his total income. Let us denote the amount of the
first good that he buys x1 and the amount of the second good x2 , the prices of the two
goods p1 and p2 , and the consumer’s income y. The utility that the consumer obtains from
consuming x1 units of good 1 and x2 units of good 2 is denoted u(x1, x2). Thus the
consumer's problem is to maximise u(x1, x2) subject to the constraint that p1 x1 + p2 x2 ≤ y.
(We shall soon write p1 x1 + p2 x2 = y, i.e., we shall assume that the consumer must spend
all of his income.) Before discussing the solution of this problem let's write it in a more
"mathematical" way.
(5)   max_{x1, x2} u(x1, x2)   subject to   p1 x1 + p2 x2 = y
We read this “Choose x1 and x2 to maximise u(x1 , x2 ) subject to the constraint that p1 x1 +
p2 x2 = y.”
Let us assume, as usual, that the indifference curves (i.e., the sets of points (x1 , x2 ) for
which u(x1 , x2 ) is a constant) are convex to the origin. Let us also assume that the indif-
ference curves are nice and smooth. Then the point (x1∗ , x2∗ ) that solves the maximisation
problem (5) is the point at which the indifference curve is tangent to the budget line as
given in Figure 1.
One thing we can say about the solution is that at the point (x1∗ , x2∗ ) it must be true that
the marginal utility with respect to good 1 divided by the price of good 1 must equal the
marginal utility with respect to good 2 divided by the price of good 2. For if this were
not true then the consumer could, by decreasing the consumption of the good for which
this ratio was lower and increasing the consumption of the other good, increase his utility.
Marginal utilities are, of course, just the partial derivatives of the utility function. Thus we
have
(6)   (∂u/∂x1)(x1∗, x2∗)/p1 = (∂u/∂x2)(x1∗, x2∗)/p2.
The argument we have just made seems very “economic.” It is easy to give an alternate
argument that does not explicitly refer to the economic intuition. Let x2u be the function
that defines the indifference curve through the point (x1∗ , x2∗ ), i.e.,
u(x1 , x2u (x1 )) ≡ ū ≡ u(x1∗ , x2∗ ).
Now, totally differentiating this identity gives
(∂u/∂x1)(x1, x2u(x1)) + (∂u/∂x2)(x1, x2u(x1)) · (dx2u/dx1)(x1) = 0.
Figure 1. [The figure shows the budget line p1 x1 + p2 x2 = y and the indifference curve
u(x1, x2) = ū tangent to it at the optimal bundle (x1∗, x2∗).]
That is,
(dx2u/dx1)(x1) = − (∂u/∂x1)(x1, x2u(x1)) / (∂u/∂x2)(x1, x2u(x1)).
Now x2u(x1∗) = x2∗. Thus the slope of the indifference curve at the point (x1∗, x2∗) is
(dx2u/dx1)(x1∗) = − (∂u/∂x1)(x1∗, x2∗) / (∂u/∂x2)(x1∗, x2∗).
Also, the slope of the budget line is −p1/p2. Combining these two results again gives
equation (6).
Since we also have another equation that (x1∗ , x2∗ ) must satisfy, viz
(7) p1 x1∗ + p2 x2∗ = y
we have two equations in two unknowns and we can (if we know what the utility function
is and what p1, p2, and y are) go happily away and solve the problem. (This isn't quite true
but we shall not go into that at this point.) What we shall develop is a systematic and useful
way to obtain the conditions (6) and (7). Let us first denote the common value of the ratios
in (6) by λ. That is,
(∂u/∂x1)(x1∗, x2∗)/p1 = λ = (∂u/∂x2)(x1∗, x2∗)/p2
and we can rewrite this and (7) as
(8)   (∂u/∂x1)(x1∗, x2∗) − λ p1 = 0
      (∂u/∂x2)(x1∗, x2∗) − λ p2 = 0
      y − p1 x1∗ − p2 x2∗ = 0.
Now we have three equations in x1∗ , x2∗ , and the new artificial or auxiliary variable λ . Again
we can, perhaps, solve these equations for x1∗, x2∗, and λ. Consider now the general
problem
(10)   max_{x1, ..., xn} f(x1, . . . , xn)   subject to   g(x1, . . . , xn) = c
and we let
(11)   L(x1, . . . , xn, λ) = f(x1, . . . , xn) + λ(c − g(x1, . . . , xn));
then if (x1∗, . . . , xn∗) solves (10) there is a value of λ, say λ∗, such that
(12)   (∂L/∂xi)(x1∗, . . . , xn∗, λ∗) = 0,   i = 1, . . . , n
(13)   (∂L/∂λ)(x1∗, . . . , xn∗, λ∗) = 0.
Notice that the conditions (12) are precisely the first order conditions for choosing
x1, . . . , xn to maximise L, once λ∗ has been chosen. This provides an intuition into this
method of solving the constrained maximisation problem. In the constrained problem we
have told the decision maker that he must satisfy g(x1 , . . . , xn ) = c and that he should choose
among all points that satisfy this constraint the point at which f (x1 , . . . , xn ) is greatest. We
arrive at the same answer if we tell the decision maker to choose any point he wishes but
that for each unit by which he violates the constraint g(x1 , . . . , xn ) = c we shall take away λ
units from his payoff. Of course we must be careful to choose λ to be the correct value. If
we choose λ too small the decision maker may choose to violate his constraint—e.g., if we
made the penalty for spending more than the consumer’s income very small the consumer
would choose to consume more goods than he could afford and to pay the penalty in utility
terms. On the other hand if we choose λ too large the decision maker may violate his
constraint in the other direction, e.g., the consumer would choose not to spend any of his
income and just receive λ units of utility for each unit of his income.
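The two-good system (8) can be checked numerically for a concrete specification. The utility function, prices, and income below are hypothetical choices for illustration (u(x1, x2) = √(x1 x2), p1 = 2, p2 = 3, y = 12); for this symmetric Cobb-Douglas utility the known solution spends half of income on each good, so we can recover λ from the first equation of (8) and confirm the other two hold.

```python
from math import sqrt

p1, p2, y = 2.0, 3.0, 12.0

def u(x1, x2):
    return sqrt(x1 * x2)

# Known solution for this utility: half of income on each good.
x1s, x2s = y / (2 * p1), y / (2 * p2)       # (3.0, 2.0)

# Recover lambda from the first equation of (8), then check the other two.
h = 1e-6
du_dx1 = (u(x1s + h, x2s) - u(x1s, x2s)) / h
du_dx2 = (u(x1s, x2s + h) - u(x1s, x2s)) / h
lam = du_dx1 / p1

print(abs(du_dx2 - lam * p2) < 1e-4)        # second equation of (8) holds
print(abs(y - p1 * x1s - p2 * x2s) < 1e-9)  # budget constraint holds
```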
It is possible to give a more general statement of this technique, allowing for multiple
constraints. (Of course, we should always have fewer constraints than we have variables.)
Suppose we have more than one constraint. Consider the problem
(14)   max_{x1, ..., xn} f(x1, . . . , xn)
       subject to g1(x1, . . . , xn) = c1
                  ⋮
                  gm(x1, . . . , xn) = cm.
We now let
L(x1, . . . , xn, λ1, . . . , λm) = f(x1, . . . , xn)
       + λ1(c1 − g1(x1, . . . , xn)) + · · · + λm(cm − gm(x1, . . . , xn))
and again if (x1∗ , . . . , xn∗ ) solves (14) there are values of λ , say λ1∗ , . . . , λm∗ such that
(15)   (∂L/∂xi)(x1∗, . . . , xn∗, λ1∗, . . . , λm∗) = 0,   i = 1, . . . , n
       (∂L/∂λj)(x1∗, . . . , xn∗, λ1∗, . . . , λm∗) = 0,   j = 1, . . . , m.
1.2. Caveats and Extensions. Notice that we have been referring to the set of condi-
tions which a solution to the maximisation problem must satisfy. (We call such conditions
necessary conditions.) So far we have not even claimed that there necessarily is a solution
to the maximisation problem. There are many examples of maximisation problems which
have no solution. One example of an unconstrained problem with no solution is
(16)   max_x 2x
that is, maximise over the choice of x the function 2x. Clearly the greater we make x the
greater is 2x, and so, since there is no upper bound on x, there is no maximum. Thus we
might want to restrict maximisation problems to those in which we choose x from some
bounded set.
Again, this is not enough. Consider the problem
(17)   max_{0 ≤ x ≤ 1} 1/x.
The smaller we make x the greater is 1/x and yet at zero 1/x is not even defined. We could
define the function to take on some value at zero, say 7. But then the function would not
be continuous. Or we could leave zero out of the feasible set for x, say 0 < x ≤ 1. Then
the set of feasible x is not closed. Since there would obviously still be no solution to the
maximisation problem in these cases we shall want to restrict maximisation problems to
those in which we choose x to maximise some continuous function over some closed and
(because of the previous example) bounded set. (We call a set of numbers, or more generally
a set of vectors, that is both closed and bounded a compact set.) Is there anything else that
could go wrong? No! The following result says that if the function to be maximised is
continuous and the set over which we are choosing is both closed and bounded, i.e., is
compact, then there is a solution to the maximisation problem.
THEOREM 2 (The Weierstrass Theorem). Let S be a compact set. Let f be a continu-
ous function that takes each point in S to a real number. (We usually write: let f : S → R be
continuous.) Then there is some x∗ in S at which the function is maximised. More precisely,
there is some x∗ in S such that f (x∗ ) ≥ f (x) for any x in S.
Notice that in defining such compact sets we typically use inequalities, such as x ≥ 0.
In Section 1, however, we did not consider such constraints, but rather considered only
equality constraints. Even in the example of utility maximisation at the beginning
of Section 1.1, there were implicitly constraints on x1 and x2 of the form
x1 ≥ 0, x2 ≥ 0.
A truly satisfactory treatment would make such constraints explicit. It is possible to ex-
plicitly treat the maximisation problem with inequality constraints, at the price of a little
additional complexity. We shall return to this question later in the book.
Also, notice that had we wished to solve a minimisation problem we could have trans-
formed the problem into a maximisation problem by simply multiplying the objective func-
tion by −1. That is, if we wish to minimise f (x) we could do so by maximising − f (x).
As an exercise write out the conditions analogous to the conditions (8) for the case that we
wanted to minimise u(x). Notice that if x1∗ , x2∗ , and λ satisfy the original equations then x1∗ ,
x2∗ , and −λ satisfy the new equations. Thus we cannot tell whether there is a maximum
at (x1∗ , x2∗ ) or a minimum. This corresponds to the fact that in the case of a function of a
single variable over an unconstrained domain at a maximum we require the first derivative
to be zero, but that to know for sure that we have a maximum we must look at the second
derivative. We shall not develop the analogous conditions for the constrained problem with
many variables here. However, again, we shall return to it later in the book.
Consider now a version of the problem in which both the objective and the constraints
depend on parameters a1, . . . , ak:
(18)   max_{x1, ..., xn} f(x1, . . . , xn, a1, . . . , ak)
       subject to g1(x1, . . . , xn, a1, . . . , ak) = c1
                  ⋮
                  gm(x1, . . . , xn, a1, . . . , ak) = cm
In order to be able to say whether or not the problem has a unique solution it is useful
to know something about the shape or curvature of the functions f and g. We say a function
is concave if for any two points in the domain of the function the value of the function at a
weighted average of the two points is greater than the weighted average of the values of the
function at the two points. We say the function is convex if the value of the function at the
average is less than the average of the values. The following definition makes this a little
more explicit. (In both definitions x = (x1 , . . . , xn ) is a vector.)
DEFINITION 12. A function f is concave if for any x and x′ with x ≠ x′ and for any t
such that 0 < t < 1 we have f(tx + (1 − t)x′) ≥ t f(x) + (1 − t) f(x′). The function is strictly
concave if f(tx + (1 − t)x′) > t f(x) + (1 − t) f(x′).
A function f is convex if for any x and x′ with x ≠ x′ and for any t such that 0 <
t < 1 we have f(tx + (1 − t)x′) ≤ t f(x) + (1 − t) f(x′). The function is strictly convex if
f(tx + (1 − t)x′) < t f(x) + (1 − t) f(x′).
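The inequality in Definition 12 can be tested directly on a grid of points. The sketch below treats the single-variable case, with example functions chosen purely for illustration: −x² is concave and x² is convex, hence not concave.

```python
def is_concave_on(f, points, ts=(0.25, 0.5, 0.75)):
    for x in points:
        for xp in points:
            if x == xp:
                continue
            for t in ts:
                # Concavity: f at the average must be at least the average of f
                # (small tolerance for floating-point arithmetic).
                if f(t * x + (1 - t) * xp) < t * f(x) + (1 - t) * f(xp) - 1e-12:
                    return False
    return True

pts = [i / 10 for i in range(-20, 21)]
print(is_concave_on(lambda x: -x * x, pts))   # True: -x^2 is concave
print(is_concave_on(lambda x: x * x, pts))    # False: x^2 is convex, not concave
```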
The result we are about to give is most conveniently stated when our statement of the
problem is in terms of inequality constraints rather than equality constraints. As mentioned
earlier we shall examine this kind of problem later in this course. However for the moment
in order to proceed with our discussion of the problem involving equality constraints we
shall assume that all of the functions with which we are dealing are increasing in the x
variables. (See Exercise 1 for a formal definition of what it means for a function to be
increasing.) In this case if f is strictly concave and g j is convex for each j then the prob-
lem has a unique solution. In fact the concepts of concavity and convexity are somewhat
stronger than is required. We shall see later in the course that they can be replaced by the
concepts of quasi-concavity and quasi-convexity. In some sense these latter concepts are
the “right” concepts for this result.
THEOREM 3. Suppose that f and gj are increasing in (x1, . . . , xn). If f is strictly
concave in (x1, . . . , xn) and gj is convex in (x1, . . . , xn) for j = 1, . . . , m then for each value
of the parameters (a1, . . . , ak), if problem (18) has a solution (x1∗, . . . , xn∗) that solution is
unique.
Now let v(a1 , . . . , ak ) be the maximised value of f when the parameters are (a1 , . . . , ak ).
Let us suppose that the problem is such that the solution is unique and that (x1∗ (a1 , . . . , ak ), . . . , xn∗ (a1 , . . . , ak ))
are the values that maximise the function f when the parameters are (a1 , . . . , ak ) then
(19) v(a1 , . . . , ak ) = f (x1∗ (a1 , . . . , ak ), . . . , xn∗ (a1 , . . . , ak ), a1 , . . . , ak ).
(Notice however that the function v is uniquely defined even if there is not a unique max-
imiser.)
The Theorem of the Maximum gives conditions on the problem under which the func-
tion v and the functions x1∗ , . . . , xn∗ are continuous. The constraints in the problem (18)
define a set of feasible vectors x over which the function f is to be maximised. Let us call
this set G(a1 , . . . , ak ), i.e.,
(20)   G(a1, . . . , ak) = {(x1, . . . , xn) | gj(x1, . . . , xn, a1, . . . , ak) = cj ∀ j}
Now we can restate the problem as
(21)   max_{x1, ..., xn} f(x1, . . . , xn, a1, . . . , ak)   subject to   (x1, . . . , xn) ∈ G(a1, . . . , ak).
Notice that both the function f and the feasible set G depend on the parameters a,
i.e., both may change as a changes. The Theorem of the Maximum requires both that the
function f be continuous as a function of x and a and that the feasible set G(a1 , . . . , ak )
change continuously as a changes. We already know—or should know—what it means for
f to be continuous but the notion of what it means for a set to change continuously is less
elementary. We call G a set valued function or a correspondence. G associates with any
vector (a1 , . . . , ak ) a subset of the vectors (x1 , . . . , xn ). The following two definitions define
what we mean by a correspondence being continuous. First we define what it means for
two sets to be close.
DEFINITION 13. Two sets of vectors A and B are within ε of each other if for any
vector x in one set there is a vector x′ in the other set such that x′ is within ε of x.
We can now define the continuity of the correspondence G in essentially the same way
that we define the continuity of a single valued function.
DEFINITION 14. The correspondence G is continuous at (a1, . . . , ak) if for any ε > 0
there is δ > 0 such that if (a′1, . . . , a′k) is within δ of (a1, . . . , ak) then G(a′1, . . . , a′k) is
within ε of G(a1, . . . , ak).
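Definition 13, which is essentially the Hausdorff distance, is easy to check for finite sets. The sketch below is a hypothetical one-dimensional illustration: a correspondence G(a) = {0, a} shifts slightly as a changes, and the two images are within a large ε but not a small one.

```python
def within_eps(A, B, eps):
    # Every point of each set must be within eps of some point of the other.
    return (all(min(abs(x - y) for y in B) <= eps for x in A)
            and all(min(abs(x - y) for x in A) <= eps for y in B))

# G(a) = {0, a}: moving a from 1.0 to 1.1 shifts one point by 0.1.
print(within_eps({0.0, 1.0}, {0.0, 1.1}, 0.15))   # True
print(within_eps({0.0, 1.0}, {0.0, 1.1}, 0.05))   # False
```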
It is, unfortunately, not the case that the continuity of the functions g j necessarily im-
plies the continuity of the feasible set. (Exercise 2 asks you to construct a counterexample.)
REMARK 1. It is possible to define two weaker notions of continuity, which we call
upper hemicontinuity and lower hemicontinuity. A correspondence is in fact continuous in
the way we have defined it if it is both upper hemicontinuous and lower hemicontinuous.
We are now in a position to state the Theorem of the Maximum. We assume that f is
a continuous function, that G is a continuous correspondence, and that for any (a1 , . . . , ak )
the set G(a1, . . . , ak) is compact. The Weierstrass Theorem thus guarantees that there is a
solution to the maximisation problem (21) for any (a1 , . . . , ak ).
THEOREM 4 (Theorem of the Maximum). Suppose that f(x1, . . . , xn, a1, . . . , ak) is
continuous (in (x1 , . . . , xn , a1 , . . . , ak )), that G(a1 , . . . , ak ) is a continuous correspondence,
and that for any (a1 , . . . , ak ) the set G(a1 , . . . , ak ) is compact. Then
(1) v(a1 , . . . , ak ) is continuous, and
(2) if (x1∗ (a1 , . . . , ak ), . . . , xn∗ (a1 , . . . , ak )) are (single valued) functions then they are
also continuous.
Later in the course we shall see how the Implicit Function Theorem allows us to iden-
tify conditions under which the functions v and x∗ are differentiable.
Exercises.
EXERCISE 31. We say that the function f(x1, . . . , xn) is nondecreasing if x′i ≥ xi for
each i implies that f(x′1, . . . , x′n) ≥ f(x1, . . . , xn), is increasing if x′i > xi for each i implies
that f(x′1, . . . , x′n) > f(x1, . . . , xn), and is strictly increasing if x′i ≥ xi for each i and
x′j > xj for at least one j implies that f(x′1, . . . , x′n) > f(x1, . . . , xn). Show that if f is
nondecreasing and strictly concave then it must be strictly increasing. [Hint: This is very easy.]
EXERCISE 32. Show by example that even if the functions gj are continuous the
correspondence G may not be continuous. [Hint: Use the case n = m = k = 1.]
Figure 2. [The figure plots f(·, a) and f(·, a′) against x, marking the values f(x∗(a), a),
f(x∗(a), a′), and f(x∗(a′), a′) at the maximisers x∗(a) and x∗(a′).]
To motivate our discussion of the Envelope Theorem we will first consider a particular
case, viz, the relation between short and long run average cost curves. Recall that, in
general we assume that the average cost of producing some good is a function of the amount
of the good to be produced. The short run average cost function is defined to be the function
which for any quantity, Q, gives the average cost of producing that quantity, taking as given
the scale of operation, i.e., the size and number of plants and other fixed capital which we
assume cannot be changed in the short run (whatever that is). The long run average cost
function on the other hand gives, as a function of Q, the average cost of producing Q units
of the good, with the scale of operation selected to be the optimal scale for that level of
production.
That is, if we let the scale of operation be measured by a single variable k, say, and
we let the short run average cost of producing Q units when the scale is k be given by
SRAC(Q, k) and the long run average cost of producing Q units by LRAC(Q) then we have
LRAC(Q) = min_k SRAC(Q, k).
Let us denote, for a given value Q, the optimal level of k by k(Q). That is, k(Q) is the value
of k that minimises the right hand side of the above equation.
Graphically, for any fixed level of k the short run average cost function can be rep-
resented by a curve (normally assumed to be U-shaped) drawn in two dimensions with
quantity on the horizontal axis and cost on the vertical axis. Now think about drawing one
short run average cost curve for each of the (infinite) possible values of k. One way of
thinking about the long run average cost curve is as the “bottom” or envelope of these short
run average cost curves. Suppose that we consider a point on this long run or envelope
curve. What can be said about the slope of the long run average cost curve at this point? A
little thought should convince you that it should be the same as the slope of the short run
curve through the same point. (If it were not then that short run curve would come below
the long run curve, a contradiction.) That is,
dLRAC(Q)/dQ = (∂SRAC/∂Q)(Q, k(Q)).
See Figure 3.
Figure 3. [The figure shows cost against quantity: a U-shaped SRAC curve tangent to the
LRAC curve at Q̄, where LRAC(Q̄) = SRAC(Q̄, k(Q̄)).]
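The tangency claim can be checked numerically. The short-run average cost function below is hypothetical, chosen only so that the optimal scale k(Q) has a simple closed form; it is not taken from the text.

```python
# Assumed short-run average cost: SRAC(Q, k) = (Q - k)^2 + 1 + 0.1 * k.
def srac(Q, k):
    return (Q - k) ** 2 + 1 + 0.1 * k

def k_opt(Q):
    # Minimise over k: d(SRAC)/dk = -2 (Q - k) + 0.1 = 0, so k = Q - 0.05.
    return Q - 0.05

def lrac(Q):
    return srac(Q, k_opt(Q))

Q, h = 3.0, 1e-6
d_lrac = (lrac(Q + h) - lrac(Q)) / h                      # slope of the envelope
d_srac = (srac(Q + h, k_opt(Q)) - srac(Q, k_opt(Q))) / h  # slope with k held fixed
print(abs(d_lrac - d_srac) < 1e-4)   # True: the two slopes agree, as claimed
```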
The envelope theorem is a general statement of the result of which this is a special
case. We will consider not only cases in which Q and k are vectors, but also cases in which
the maximisation or minimisation problem includes some constraints.
Let us consider again the maximisation problem (18). Recall:
max f (x1 , . . . , xn , a1 , . . . , ak )
x1 ,...,xn
subject to g1 (x1 , . . . , xn , a1 , . . . , ak ) = c1
⋮
gm (x1 , . . . , xn , a1 , . . . , ak ) = cm
Again let L (x1 , . . . , xn , λ1 , . . . , λm ; a1 , . . . , ak ) be the Lagrangian function.
(22)   L(x1, . . . , xn, λ1, . . . , λm; a1, . . . , ak) = f(x1, . . . , xn, a1, . . . , ak)
              + ∑_{j=1}^{m} λj (cj − gj(x1, . . . , xn, a1, . . . , ak)).
Let (x1∗ (a1 , . . . , ak ), . . . , xn∗ (a1 , . . . , ak )) and (λ1 (a1 , . . . , ak ), . . . , λm (a1 , . . . , ak )) be the values
of x and λ that solve this problem. Now let
(23) v(a1 , . . . , ak ) = f (x1∗ (a1 , . . . , ak ), . . . , xn∗ (a1 , . . . , ak ), a1 , . . . , ak )
That is, v(a1 , . . . , ak ) is the maximised value of the function f when the parameters are
(a1 , . . . , ak ). The envelope theorem says that the derivative of v is equal to the derivative of
L at the maximising values of x and λ . Or, more precisely
THEOREM 5 (The Envelope Theorem). If all functions are defined as above and the
problem is such that the functions x∗ and λ are well defined then
(∂v/∂ah)(a1, . . . , ak) = (∂L/∂ah)(x1∗(a1, . . . , ak), . . . , xn∗(a1, . . . , ak),
                                   λ1(a1, . . . , ak), . . . , λm(a1, . . . , ak), a1, . . . , ak)
= (∂f/∂ah)(x1∗(a1, . . . , ak), . . . , xn∗(a1, . . . , ak), a1, . . . , ak)
   − ∑_{j=1}^{m} λj(a1, . . . , ak) (∂gj/∂ah)(x1∗(a1, . . . , ak), . . . , xn∗(a1, . . . , ak), a1, . . . , ak)
for all h.
In order to show the advantages of using matrix and vector notation we shall restate
the theorem in that notation before returning to give a proof of the theorem. (In proving
the theorem we shall return to using mainly scalar notation.)
THEOREM 5 (The Envelope Theorem). Under the same conditions as above
(∂v/∂a)(a) = (∂L/∂a)(x∗(a), λ(a), a)
           = (∂f/∂a)(x∗(a), a) − λ(a) (∂g/∂a)(x∗(a), a).
PROOF. From the definition of the function v we have
(24)   v(a1, . . . , ak) = f(x1∗(a1, . . . , ak), . . . , xn∗(a1, . . . , ak), a1, . . . , ak).
Thus
(25)   (∂v/∂ah)(a) = (∂f/∂ah)(x∗(a), a) + ∑_{i=1}^{n} (∂f/∂xi)(x∗(a), a) (∂xi∗/∂ah)(a).
Differentiating the constraints gj(x∗(a), a) = cj with respect to ah gives
(26)   ∑_{i=1}^{n} (∂gj/∂xi)(x∗(a), a) (∂xi∗/∂ah)(a) + (∂gj/∂ah)(x∗(a), a) = 0.
Or
(27)   ∑_{i=1}^{n} (∂gj/∂xi)(x∗(a), a) (∂xi∗/∂ah)(a) = − (∂gj/∂ah)(x∗(a), a).
Exercises.
EXERCISE 33. Rewrite this proof using matrix notation. Go through your proof and
identify the dimension of each of the vectors or matrices you use. For example fx is a 1 × n
vector, gx is an m × n matrix.
when evaluated at the point which solves the minimisation problem which we write as
hi (p1 , . . . , pn , u0 ) to distinguish this (compensated) value of the demand for good i as a
function of prices and utility from the (uncompensated) value of the demand for good i as
a function of prices and income. This result is known as Hotelling’s Theorem.
4.3. The Hicks-Slutsky Equations. It can be shown that the compensated demand at
utility u0 , i.e., hi (p1 , . . . , pn , u0 ) is equal to the uncompensated demand at income e(p1 , . . . , pn , u0 ),
i.e., xi (p1 , . . . , pn , e(p1 , . . . , pn , u0 )). (This result is known as the duality theorem.) Thus
totally differentiating the identity hi(p1, . . . , pn, u0) ≡ xi(p1, . . . , pn, e(p1, . . . , pn, u0))
with respect to pk (and using ∂e/∂pk = hk) gives
∂xi/∂pk + (∂xi/∂y) hk = ∂hi/∂pk.
So
∂xi/∂pk = ∂hi/∂pk − hk (∂xi/∂y)
for all i, k = 1, . . . , n. These are the Hicks-Slutsky equations.
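The equations can be verified numerically for a concrete demand system. The sketch below uses a hypothetical two-good Cobb-Douglas economy (u = x1^a · x2^(1−a) with a = 0.4, p1 = 2, p2 = 3, y = 10; none of these are from the text), with the standard closed forms for this example, and checks one Slutsky equation by finite differences.

```python
a = 0.4
p1, p2, y = 2.0, 3.0, 10.0

def x1(p1, p2, y):            # uncompensated demand for good 1
    return a * y / p1

def v(p1, p2, y):             # indirect utility
    return (a * y / p1) ** a * ((1 - a) * y / p2) ** (1 - a)

def e(p1, p2, u0):            # expenditure function
    return u0 * (p1 / a) ** a * (p2 / (1 - a)) ** (1 - a)

def h1(p1, p2, u0):           # compensated demand via the duality result
    return x1(p1, p2, e(p1, p2, u0))

u0 = v(p1, p2, y)
d = 1e-6
dx1_dp2 = (x1(p1, p2 + d, y) - x1(p1, p2, y)) / d
dx1_dy  = (x1(p1, p2, y + d) - x1(p1, p2, y)) / d
dh1_dp2 = (h1(p1, p2 + d, u0) - h1(p1, p2, u0)) / d
x2_val  = (1 - a) * y / p2    # demand for good 2, playing the role of h_k

# Hicks-Slutsky: dx1/dp2 + x2 * dx1/dy = dh1/dp2
print(abs(dx1_dp2 + x2_val * dx1_dy - dh1_dp2) < 1e-5)   # True
```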
4.4. The Indirect Utility Function. Again let v(p1 , . . . , pn , y) be the indirect utility
function, that is, the maximised value of utility as described in Application (1). Then by
the Envelope Theorem
∂v/∂pi = ∂u/∂pi − λ xi(p1, . . . , pn, y) = −λ xi(p1, . . . , pn, y)
since ∂u/∂pi = 0. Now, since we have already shown that λ = ∂v/∂y (in Section 4.1) we have
xi(p1, . . . , pn, y) = − (∂v/∂pi)/(∂v/∂y).
This result is known as Roy's Theorem.
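The formula can be confirmed numerically. The sketch below uses a hypothetical Cobb-Douglas indirect utility function (a = 0.4, p1 = 2, p2 = 3, y = 10, chosen only for illustration) and recovers the demand for good 1 from derivatives of v.

```python
a = 0.4
p1, p2, y = 2.0, 3.0, 10.0

def v(p1, p2, y):             # indirect utility for u = x1^a * x2^(1-a)
    return (a * y / p1) ** a * ((1 - a) * y / p2) ** (1 - a)

d = 1e-6
dv_dp1 = (v(p1 + d, p2, y) - v(p1, p2, y)) / d
dv_dy  = (v(p1, p2, y + d) - v(p1, p2, y)) / d

x1_roy = -dv_dp1 / dv_dy      # should recover the demand x1 = a * y / p1 = 2.0
print(round(x1_roy, 4))       # → 2.0
```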
4.5. Profit functions. Now consider the problem of a firm that maximises profits
subject to technology constraints. Let x = (x1 , . . . , xn ) be a vector of netputs, i.e., xi is
positive if the firm is a net supplier of good i, negative if the firm is a net user of that good.
Let us assume that we can write the technology constraints as F(x) = 0. Thus the firm's
problem is
max_{x1, ..., xn} ∑_{i=1}^{n} pi xi   subject to   F(x1, . . . , xn) = 0.
Let ϕi (p) be the value of xi that solves this problem, i.e., the net supply of commodity
i when prices are p. (Here p is a vector.) We call the maximised value the profit function
which is given by
Π(p) = ∑_{i=1}^{n} pi ϕi(p).
And so by the Envelope Theorem
∂Π/∂pi = ϕi(p).
This result is known as Hotelling’s lemma.
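A hypothetical one-output, one-input technology makes the lemma easy to check: let output be √z for input z, written in netput form x = (x1, x2) with x2 = −z, so F(x1, x2) = x1² + x2 = 0. (This example is not from the text.) Solving the firm's problem by hand gives the closed-form profit function Π(p) = p1²/(4 p2), and the derivative property can then be confirmed by finite differences.

```python
p1, p2 = 4.0, 2.0

def profit(p1, p2):
    # Closed-form profit function for the assumed technology.
    return p1 ** 2 / (4 * p2)

phi1 = p1 / (2 * p2)              # optimal net supply of good 1 (output)
phi2 = -(p1 / (2 * p2)) ** 2      # optimal net supply of good 2 (input, negative)

d = 1e-6
print(abs((profit(p1 + d, p2) - profit(p1, p2)) / d - phi1) < 1e-4)   # True
print(abs((profit(p1, p2 + d) - profit(p1, p2)) / d - phi2) < 1e-4)   # True
```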
Exercises.
EXERCISE 34. Consider the direct utility function
u(x) = ∑_{i=1}^{n} βi log(xi − γi),
where βi and γi, i = 1, . . . , n are, respectively, positive and nonpositive parameters.
(1) Derive the indirect utility function and show that it is decreasing in its arguments.
(2) Verify Roy’s Theorem.
(3) Derive the expenditure function and show that it is homogeneous of degree one
and nondecreasing in prices.
(4) Verify Hotelling’s Theorem.
costs vary as b changes. (That is, what is the derivative of the minimised cost with respect
to b?)
CHAPTER 5
In this chapter we examine decision making under uncertainty. This is another special
case (like consumer theory) of the general theory of motivated decision making that we
developed in the previous chapter. As was the case with consumer theory the extra structure
that our model of decision making under uncertainty puts on the problem allows us to say
a bit more than we could in the very general case. It is well to remember however that
there are some facts about this model that follow simply from the fact that this model is a
special case of the general model of motivated decision making.
It is possible to divide the models of decision making under uncertainty into those
in which the uncertainty is assumed to have an objective character with uncertain events
having a particular objectively given probability and those in which the subjective prob-
ability that a decision maker assigns to a given event is derived from her preferences or
choice behaviour along with her utility function. We shall develop here only the first kind
of model.
that the event E has occurred (and that this is all we know). Thus we have essentially a
new probability distribution in which the states outside E are given probability zero and
those in E are given a proportionately greater probability so that the sum of the probability
of the states in E is 1. To calculate the probability of F in this new situation we would add
the new probabilities for all those states in F, or, since the states outside E have probability
zero, the probabilities of those states that are in both E and F. That is,
P(F | E) = (∑_{ω∈E∩F} p(ω)) / (∑_{ω∈E} p(ω)) = P(E ∩ F) / P(E).
From these definitions it is possible to derive a number of results about the probability
of events.
First we see that
P(E ∪ F) = P(E) + P(F) − P(E ∩ F).
This is the easiest implication of the definitions and it is left as an exercise for you to
convince yourself of its truth. (Constructing a proof of the claim is clearly a good way of doing this.)
Rewriting the definition of conditional probability gives
(29) P(E ∩ F) = P(E) · P(F | E).
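The definition of conditional probability and equation 29 are easy to check on a small example. A minimal sketch in Python, with the states and probabilities made up purely for illustration:

```python
# Hypothetical three-state space with a probability function p on the states.
p = {"w1": 0.2, "w2": 0.3, "w3": 0.5}

def P(A):
    """Probability of an event, i.e. of a set of states."""
    return sum(p[w] for w in A)

E = {"w1", "w2"}
F = {"w2", "w3"}

# P(F | E) = P(E ∩ F) / P(E): renormalise the probabilities within E.
conditional = P(E & F) / P(E)
```

Here P(E ∩ F) = 0.3 and P(E) = 0.5, so P(F | E) = 0.6; the same sets also verify the union formula P(E ∪ F) = P(E) + P(F) − P(E ∩ F).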
1.1. Bayes’ Theorem. We turn now to something just a little less trivial. Consider a
situation in which we have some prior assessment of the uncertainty of a particular event.
For example we might believe that the chance that it will rain is 30% or 0.3. (If this seems
a bit unrealistic pretend that we are not in Auckland.)
Now we are sitting watching TV and the weather person tells us that it’s going to rain
tomorrow. How should we adjust our assessment of the probability that it will rain tomor-
row? Suppose that we know that the weather person is pretty good and makes mistakes
with probability only 10% or 0.1. That is, if it’s actually going to rain she says it’s going to
rain with probability 0.9 and says it’s not going to rain with probability 0.1. On the other
hand, if it’s not going to rain she says it’s going to rain with probability 0.1 and says it’s not
going to rain with probability 0.9. Think about this problem and make some calculation
of what you think your assessment of the probability of rain would be after hearing the
weather person’s prediction that it will rain.
As another example suppose that there was screening for some rare disease. Suppose
that one person in ten thousand actually has the disease. Suppose that the test used to show
if a person has the disease is quite good. If the person has the disease the test will be positive
with certainty. If the person does not have the disease the test will give a negative result 99
times out of 100, but will one time out of 100, that is, with probability 0.01, give a positive
result. (Such a result is called a false positive.) If a randomly selected person is tested and
gives a positive result what would be your assessment of the probability that the person
actually had the disease? Again, think about this problem and try to formulate an answer
before reading further.
These problems are special cases of a more general situation. Let us suppose that we
have a list of mutually exclusive and completely exhaustive events E1 , . . . , ET . We know
the (unconditional) probability of each of these events. Now we observe the result of an
experiment, say that event F has occurred. Our analysis tells us, for each t the probability
of F conditional on Et . What can we say about the probability of Et conditional on the
result F?
Well, the definition of conditional probability tells us that
(30) P(Et | F) = P(Et ∩ F) / P(F).
1. REVISION OF PROBABILITY THEORY 45
And it also tells us (in the form given in equation 29) that
(31) P(Et ∩ F) = P(Et )P(F | Et ).
Now we don’t know P(F). But since the events E1 , . . . , ET are mutually exclusive so are
E1 ∩ F, E2 ∩ F, . . . , ET ∩ F. And since E1 , . . . , ET are completely exhaustive E1 ∪ E2 ∪ · · · ∪
ET = Ω. Thus
F = F ∩Ω
= F ∩ (E1 ∪ E2 ∪ · · · ∪ ET )
= (F ∩ E1 ) ∪ (F ∩ E2 ) ∪ · · · ∪ (F ∩ ET )
= (E1 ∩ F) ∪ (E2 ∩ F) ∪ · · · ∪ (ET ∩ F).
And so
P(F) = P((E1 ∩ F) ∪ (E2 ∩ F) ∪ · · · ∪ (ET ∩ F))
(32) = P(E1 ∩ F) + P(E2 ∩ F) + · · · + P(ET ∩ F)
since the events E1 ∩ F, E2 ∩ F, . . . , ET ∩ F are mutually exclusive.
Collecting these results we substitute 31 and 32 into 30 to get
P(Et | F) = P(Et )P(F | Et ) / [P(E1 ∩ F) + P(E2 ∩ F) + · · · + P(ET ∩ F)]
(33)       = P(Et )P(F | Et ) / [P(E1 )P(F | E1 ) + P(E2 )P(F | E2 ) + · · · + P(ET )P(F | ET )].
This result is called Bayes’ Theorem.
Let us go back to our examples. In the first example let E1 be the event that it rains
tomorrow and E2 the event that it doesn’t. Let F be the event that the weather person says
that it rains tomorrow. Thus we want to know P(E1 | F). Bayes’ Theorem tells us that
P(E1 | F) = P(E1 )P(F | E1 ) / [P(E1 )P(F | E1 ) + P(E2 )P(F | E2 )]
          = 0.3 × 0.9 / (0.3 × 0.9 + 0.7 × 0.1)
          = 0.27 / (0.27 + 0.07)
          = 27/34 ≈ 0.79.
That is, once we hear the weather person’s prediction we think there is about an 80%
chance of rain tomorrow.
In the second example let E1 be the event that the person has the disease and E2 the
event that they don’t. Let F be the event that the test result is positive. Thus, again, we
want to know P(E1 | F) and Bayes’ Theorem tells us that
P(E1 | F) = P(E1 )P(F | E1 ) / [P(E1 )P(F | E1 ) + P(E2 )P(F | E2 )]
          = 0.0001 × 1 / (0.0001 × 1 + 0.9999 × 0.01)
          = 0.0001 / (0.0001 + 0.009999)
          = 100/10099 ≈ 0.01.
In this case even people who have tested positive have only about a one in a hundred
chance of having the disease. Intuitively, the reason for this perhaps initially surprising
result is that the disease is so rare that it is much more likely that the person does not have
the disease and the test result is a false positive than that the person actually has the disease.
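Both examples are instances of equation 33 and can be checked numerically. A minimal sketch in Python (the function name `bayes` is my own):

```python
def bayes(priors, likelihoods):
    """Posterior P(E_t | F) for mutually exclusive, exhaustive events E_1..E_T.

    priors[t] = P(E_t); likelihoods[t] = P(F | E_t).  Equation (33).
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # P(F), computed as in equation (32)
    return [j / total for j in joint]

# Rain example: prior 0.3, forecaster correct with probability 0.9.
rain = bayes([0.3, 0.7], [0.9, 0.1])
# Disease example: prevalence 1/10000, false positive rate 0.01.
disease = bayes([0.0001, 0.9999], [1.0, 0.01])
```

The first posterior is 27/34 ≈ 0.79 and the second is 100/10099 ≈ 0.01, matching the calculations above.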
46 5. DECISION MAKING UNDER UNCERTAINTY
Figure 1. The set of probability vectors (p1 , p2 , p3 ) drawn in R3 : the triangle with
vertices (1, 0, 0), (0, 1, 0), and (0, 0, 1).

Figure 2. The same simplex drawn as a triangle in two dimensions, with vertices
labelled (1, 0, 0), (0, 1, 0), and (0, 0, 1).
D EFINITION 20. The utility function U : L → R has an expected utility form if there
is an assignment of numbers u1 , u2 , . . . , uN to the N outcomes such that for every simple
lottery L = (p1 , p2 , . . . , pN ) in L
U(L) = u1 p1 + u2 p2 + · · · + uN pN .
A utility function U : L → R with the expected utility form is called a von Neumann-
Morgenstern expected utility function.
P ROPOSITION 8. A utility function U : L → R has an expected utility form if and
only if it is linear, that is, if and only if it satisfies the property that
U(∑_{k=1}^{K} αk Lk ) = ∑_{k=1}^{K} αk U(Lk )
for any K lotteries Lk in L , k = 1, 2, . . . , K and probabilities α1 , α2 , . . . , αK such that αk ≥ 0
for all k and ∑k αk = 1.
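The "only if" direction of Proposition 8 is easy to verify numerically for a particular mixture. A small sketch, with the utilities and lotteries made up for illustration:

```python
def expected_utility(u, lottery):
    """U(L) = u1*p1 + ... + uN*pN, the expected utility form."""
    return sum(ui * pi for ui, pi in zip(u, lottery))

# Hypothetical utilities for three outcomes and two simple lotteries.
u = [0.0, 1.0, 4.0]
L1 = [0.5, 0.5, 0.0]
L2 = [0.1, 0.2, 0.7]

# The compound lottery alpha*L1 + (1-alpha)*L2, reduced to a simple lottery.
alpha = 0.3
mixture = [alpha * p + (1 - alpha) * q for p, q in zip(L1, L2)]

lhs = expected_utility(u, mixture)                                   # U(αL1 + (1-α)L2)
rhs = alpha * expected_utility(u, L1) + (1 - alpha) * expected_utility(u, L2)
```

Linearity says the two sides agree; here both equal 2.25.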
PROPOSITION 9. Suppose that U : L → R is a von Neumann-Morgenstern utility
function for the preference relation ≿ on L . Then Ũ : L → R is another von Neumann-
Morgenstern utility function for ≿ if and only if there are scalars β > 0 and γ such that
Ũ(L) = βU(L) + γ for every L in L .
3. EXPECTED UTILITY FUNCTIONS 49
In keeping with the style of this course we shall not prove this result. You can read the
proof in ? if you wish.
We shall however examine a little of the intuition as to why the result is true. The
roles of the two axioms are somewhat separate. The continuity axiom guarantees that the
preferences can be represented by a utility function U : L → R. The independence axiom
guarantees that the utility function may be chosen to have the expected utility form, that
is, to be linear in the probabilities. (Recall that if U : L → R represents the preferences
then so does any increasing transformation of U. Even if U is linear, if Ũ is a nonlinear
transformation of U, for example Ũ = e^U , then Ũ will not be linear. Thus we cannot say
that U is linear but only that it may be chosen to be linear.)
The argument that the continuity axiom implies the existence of a utility function rep-
resenting the preferences is a bit technical and is essentially the same as the argument that
if a consumer’s preferences are continuous then they can be represented by a continuous
utility function. (Remember that we didn’t prove that either.)
We can get some intuition into the reason that the independence axiom implies the
existence of a linear utility function by examining the three outcome case. Recall that in
this case we can draw the three dimensional simplex as a triangle on the two dimensional
page. If the utility function is linear then
U(L) = U(p1 , p2 , p3 ) = u1 p1 + u2 p2 + u3 p3
and the indifference curves are parallel straight lines as shown in Figure 3a. (You should
prove this. It should either be very easy or very useful.)
The independence axiom implies both that the indifference curves are straight lines
and that they are parallel. To see that they are straight lines suppose that they are not. In
Figure 3b we show an indifference curve that is not straight. Notice that in the case shown
we can find L and L′ such that ½L + ½L′ ≻ L ∼ L′. But this contradicts the independence
axiom, which implies that since L ≿ L′ we have L = ½L + ½L ≿ ½L + ½L′.
In Figure 3c we see that the indifference curves must be parallel. An example is shown
in which the indifference curves are not parallel. We see that L ∼ L′ but that
⅓L′ + ⅔L″ ≻ ⅓L + ⅔L″, contradicting the independence axiom.
Figure 3. (a) The parallel straight-line indifference curves of a linear utility function
on the simplex, with the direction of increasing desirability shown. (b) An indifference
curve that is not a straight line: the mixture ½L + ½L′ is strictly preferred to L and L′.
(c) Straight but non-parallel indifference curves: L ∼ L′ but ⅓L′ + ⅔L″ is strictly
preferred to ⅓L + ⅔L″.
CHAPTER 6

Game Theory
We have until now been dealing with problems concerning only a single decision
maker. Let us turn now to examining situations in which a number of decision makers
interact. There are a number of basic ways of modelling such situations. This chapter
examines one such method. We shall think of the decision makers as acting based on their
assessment of what each other individual decision maker will do. This approach is known
as game theory or occasionally, and more informatively, as interactive decision theory.
In some situations it is reasonable to assume that each decision maker reacts not to his
assessment of what each individual will do but rather to the value of some aggregate statistic
which varies little with the choices of one individual. In such a case it is a reasonable mod-
elling strategy to model the decision makers as taking as given the value of the aggregate
variable. The main approach of this kind is called general equilibrium theory and is the
subject of Chapter 7.
There are two central models of interactive decision problems or games: the model
of normal form games, and the model of extensive form games. While there exist minor
variants we shall define in this section two rather standard versions of the models. We
shall assume that the number of decision makers or players is finite and that the number of
choices facing each player is also finite.
D EFINITION 21. A (finite) normal form game is a triple (N, S, u) where N = {1, 2, . . . , n, . . . , N}
is the set of players, S = S1 × S2 × · · · × SN is the set of profiles of pure strategies with Sn
the finite set of pure strategies of player n, and u = (u1 , u2 , . . . , uN ) with un : S → R the
utility or payoff function of player n. We call the pair (N, S) the game form. Thus a game
is a game form together with a payoff function.
Thus we have specified a set of players and numbered them 1 through N. Somewhat
abusively we have also denoted this set by N. (This is a fairly common practice in math-
ematics and usually creates no confusion.) For each player we have specified a finite set
of actions or strategies that the player could take, which we denote Sn . We have denoted
the cartesian product of these sets by S. Thus a typical element of S is s = (s1 , s2 , . . . , sN )
where each sn is a pure strategy of player n, that is, an element of Sn . We call such an s a
pure strategy profile.
For each player n we have also specified a utility function un : S → R. We shall shortly
define also randomised or mixed strategies, so that each player will form a probabilistic
assessment over what the other players will do. Thus when a player chooses one of his
own strategies he is choosing a lottery over pure strategy profiles. So we are interpreting
the utility function as a representation of the player’s preferences over lotteries, that is, as
a von Neumann-Morgenstern utility function.
52 6. GAME THEORY
D EFINITION 22. A mixed strategy of player n is a lottery over the pure strategies of
player n. One of player n’s mixed strategies is denoted σn and the set of all player n’s mixed
strategies is denoted Σn . Thus σn = (σn (s_n^1 ), σn (s_n^2 ), . . . , σn (s_n^{Kn} )) where Kn is the number of
pure strategies of player n, σn (s_n^i ) ≥ 0 for i = 1, 2, . . . , Kn , and ∑_{i=1}^{Kn} σn (s_n^i ) = 1. The
cartesian product Σ = Σ1 × Σ2 × · · · × ΣN is the set of all mixed strategy profiles.
The definition of a mixed strategy should by now be a fairly familiar kind of thing.
We have met such things a number of times already. The interpretation of the concept is
also an interesting question, though perhaps the details are better left for other places. The
original works on game theory treated the mixed strategies as literally randomisation by
the player in question, and in a number of places one finds discussions of whether and why
players would actually randomise.
Some more recent works interpret the randomised strategies as uncertainty in the
minds of the other players as to what the player in question will actually do. This in-
terpretation seems to me a bit more satisfactory. In any case one can quite profitably use
the techniques without worrying too much about the interpretation.
Perhaps more important for us at the moment as we start to learn game theory is the
idea of extending the utility function of a player from that defined on the pure strategy
profiles to that defined on mixed strategies. We shall continue to use the same symbol
un to represent the expected utility of player n as a function of the mixed strategy profile
σ = (σ1 , σ2 , . . . , σN ). Intuitively un (σ ) is just the expected value of un (s) when s is a
random variable with distribution given by σ . Thus

un (σ ) = ∑_{s∈S} σ1 (s1 ) σ2 (s2 ) · · · σN (sN ) un (s).
We can in a similar way define un on a more general profile where for some n we have
mixed strategies and for others pure strategies.
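That extension can be sketched in code: enumerate the pure strategy profiles, weight each by the product of the probabilities that the profile σ assigns, and sum. The encoding (indices for pure strategies, a dict for un , and the example numbers) is mine:

```python
from itertools import product

def mixed_payoff(u, sigma):
    """Expected value of u(s) when each s_m is drawn independently from sigma_m.

    u: dict mapping pure strategy profiles (tuples of indices) to payoffs.
    sigma: list of mixed strategies, one list of probabilities per player.
    """
    total = 0.0
    for profile in product(*(range(len(s)) for s in sigma)):
        prob = 1.0
        for player, choice in enumerate(profile):
            prob *= sigma[player][choice]   # product of the players' probabilities
        total += prob * u[profile]
    return total

# A hypothetical 2x2 game for one player, with mixed strategies (1/4, 3/4) and (1/2, 1/2).
u = {(0, 0): 4, (0, 1): 0, (1, 0): 1, (1, 1): 3}
value = mixed_payoff(u, [[0.25, 0.75], [0.5, 0.5]])
```

For these numbers the expected payoff is 0.125·4 + 0.375·1 + 0.375·3 = 2.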
E XAMPLE 1 (The Prisoner’s Dilemma). Consider the situation in which two criminals
are apprehended as they are making off with their loot. The police clearly have enough
evidence to convict them of possession of stolen property but don’t actually have evidence
that they actually committed the crime. So they question the criminals in separate cells
and offer each the following deal: Confess to the crime. If the other doesn’t confess then
we shall prosecute him on the basis of your evidence and he’ll go to jail for 10 years. In
gratitude for your cooperation we shall let you go free. If the other also confesses then
we don’t actually need your evidence so we shall prosecute both of you, but since you
cooperated with us we shall arrange with the judge that you only get 9 years. We are
offering the other criminal the same deal. If neither of you confess then we shall be able to
convict you for possession of stolen goods and you will each get one year.
How should we model such a situation? In actuality, of course, even in such a situation
a criminal would have many options. He could make counter proposals; he could
confess to many other crimes as well; he could claim that his partner had committed many
other crimes. We might however learn something about what to expect in this situation
by analysing a model in which each criminal’s choices were limited to confessing or not
confessing. We shall also take the actions of the police as given and part of the environ-
ment and consider only the criminals as players. We shall assume that the criminals care
only about how much time they themselves spend in jail and not about how much time
the other spends, and moreover that their preferences are represented by a von Neumann-
Morgenstern utility function that assigns utility 0 to getting 10 years, utility 1 to getting 9
years, utility 9 to getting 1 year and utility 10 to getting off free.
1. NORMAL FORM GAMES 53
Thus we can model the situation as a game in which N = {1, 2}, S1 = S2 = {C, D} (for
(Confess, Don’t Confess)) and u1 (C,C) = 1, u1 (C, D) = 10, u1 (D,C) = 0, u1 (D, D) = 9,
and u2 (C,C) = 1, u2 (C, D) = 0, u2 (D,C) = 10, u2 (D, D) = 9.
Such a game is often represented as a labelled matrix as shown in Figure 1. Here
player 1’s strategies are listed vertically and player 2’s horizontally (and hence they are
sometimes referred to as the row player and the column player). Each cell in the matrix
contains a pair x, y listing first player 1’s payoff in that cell and then player 2’s.
1\2 C D
C 1, 1 10, 0
D 0, 10 9, 9
Figure 1
1\2 H T
H 1, −1 −1, 1
T −1, 1 1, −1
Figure 2
Consider the problem of a player in some game. Except in the most trivial cases the
set of strategies that he will be prepared to play will depend on his assessment of what the
other players will do. However it is possible to say a little. If some strategy was strictly
preferred by him to another strategy s whatever he thought the other players would do, then
he surely would not play s. And this remains true if it was some lottery over his strategies
that was strictly preferred to s. We call a strategy such as s a strictly dominated strategy.
Thus we have identified a set of strategies that we argue a rational player would not
play. But since everything about the game, including the rationality of the players, is
assumed to be common knowledge no player should put positive weight, in his assessment
of what the other players might do, on such a strategy. And we can again ask: Are there
any strategies that are strictly dominated when we restrict attention to the assessments that
put weight only on those strategies of the others that are not strictly dominated? If so, a
rational player who knew the rationality of the others would surely not play such a strategy.
And we can continue for an arbitrary number of rounds. If there is ever a round in
which we don’t find any new strategies that will not be played by rational players com-
monly knowing the rationality of the others, we would never again “eliminate” a strategy.
Thus, since we start with a finite number of strategies, the process must eventually termi-
nate. We call the strategies that remain iteratively undominated or correlatedly rationalis-
able.
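The rounds described above can be sketched for a two-player game. Note one simplification: the text allows a strategy to be dominated by a lottery over strategies, while this sketch checks domination by another pure strategy only (the encoding is mine):

```python
def undominated(u, rows, cols):
    """Rows (u is keyed (row, col)) not strictly dominated by another row."""
    return [r for r in rows
            if not any(all(u[(t, c)] > u[(r, c)] for c in cols)
                       for t in rows if t != r)]

def iterated_elimination(u1, u2, S1, S2):
    """Iteratively remove strategies strictly dominated by pure strategies."""
    while True:
        new1 = undominated(u1, S1, S2)
        # Player 2 chooses the column; swap the key order so she is the "row".
        u2_swapped = {(c, r): v for (r, c), v in u2.items()}
        new2 = undominated(u2_swapped, S2, new1)
        if new1 == S1 and new2 == S2:   # a round with no new eliminations
            return S1, S2
        S1, S2 = new1, new2

# The Prisoner's Dilemma of Figure 1: only (Confess, Confess) survives.
u1 = {("C", "C"): 1, ("C", "D"): 10, ("D", "C"): 0, ("D", "D"): 9}
u2 = {("C", "C"): 1, ("C", "D"): 0, ("D", "C"): 10, ("D", "D"): 9}
survivors = iterated_elimination(u1, u2, ["C", "D"], ["C", "D"])
```

As the text argues, the process must terminate because the strategy sets are finite and shrink (weakly) each round.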
There is another related concept called rationalisable strategies that was introduced
by ? and ?. That concept is both a little more complicated to define and, in my view,
somewhat less well motivated so we won’t go into it here.
1.4. Solution Concepts II: Equilibrium. In some games the iterative deletion of
dominated strategies is reasonably powerful and may indeed let us say what will happen in
the game. In other games it says little or nothing.
A more widely used concept is that of Nash equilibrium or strategic equilibrium, first
defined by John Nash in the early 1950s. (See ??.) An equilibrium is a profile of mixed
strategies, one for each player, with the property that if each player’s uncertainty about
what the others will do is represented by the profile of mixed strategies then his mixed
strategy puts positive weight only on those pure strategies that give him his maximum
expected utility. We can state this in a little more detail using the notation developed
above.
D EFINITION 23. A strategic equilibrium (or Nash equilibrium) of a game (N, S, u) is
a profile of mixed strategies σ = (σ1 , σ2 , . . . , σN ) such that for each n = 1, 2, . . . , N for each
sn and tn in Sn if σn (sn ) > 0 then
un (σ1 , . . . , σn−1 , sn , σn+1 , . . . , σN ) ≥ un (σ1 , . . . , σn−1 , tn , σn+1 , . . . , σN ).
Remember how we extended the definition of un from the pure strategies to the mixed
strategies at the beginning of this chapter.
We can relate this concept to that discussed in the previous section.
P ROPOSITION 11. Any strategic equilibrium profile consists of iteratively undomi-
nated strategies.
Let’s look back now to our two examples and calculate the equilibria. In both of these
examples there is a unique equilibrium. This is not generally the case.
In the prisoner’s dilemma the strategy of Don’t Confess is dominated. That is, Confess
is better whatever the other player is doing. Thus for each player Confess is the only
undominated strategy and hence the only iteratively undominated strategy. Thus by the
previous proposition (Confess,Confess) is the only equilibrium.
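A direct check of Definition 23 restricted to pure profiles confirms this for the Prisoner's Dilemma; the encoding and function name are mine:

```python
# Payoffs from Figure 1 of this chapter.
u1 = {("C", "C"): 1, ("C", "D"): 10, ("D", "C"): 0, ("D", "D"): 9}
u2 = {("C", "C"): 1, ("C", "D"): 0, ("D", "C"): 10, ("D", "D"): 9}

def is_pure_equilibrium(s1, s2):
    """Neither player gains by deviating to another pure strategy."""
    return (all(u1[(s1, s2)] >= u1[(t, s2)] for t in ("C", "D"))
            and all(u2[(s1, s2)] >= u2[(s1, t)] for t in ("C", "D")))
```

(Confess, Confess) passes the check; (Don't Confess, Don't Confess) fails it, even though both players would prefer that outcome.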
In matching pennies there are no dominated strategies. Examining the game we see
there can be no equilibrium in which either player’s assessment of the other’s choice is
a pure strategy. Let us suppose that player 1’s assessment of player 2 was
2. EXTENSIVE FORM GAMES 55
that player 2 would play H. Then player 1 would strictly prefer to play H. But then
the definition of equilibrium would say that player 2 should put positive weight in his
assessment of what player 1 would play only on H. And in this case player 2 would strictly
prefer to play T , contradicting our supposition that player 1 assessed him as playing H. In
fact we could have started this chain of argument by supposing that player 1’s assessment
of player 2 put weight strictly greater than a half on the fact that player 2 would play H.
We could make a similar argument starting with the supposition that player 1’s assessment
of player 2 put weight strictly less than a half on the fact that player 2 would play H.
Thus player 1’s assessment of player 2’s choices must be (½, ½). And similarly player 2’s
assessment of player 1’s choices must also be (½, ½).
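The indifference argument is easy to verify numerically; a sketch (the encoding is mine):

```python
# Matching pennies (Figure 2): player 1's payoffs.
u1 = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}

def payoff_to(row, q):
    """Player 1's expected payoff from `row` when player 2 plays H with probability q."""
    return q * u1[(row, "H")] + (1 - q) * u1[(row, "T")]

# At q = 1/2 player 1 is exactly indifferent between H and T, so any mixture of
# his own is a best response; at any other q he strictly prefers one pure strategy.
diff = payoff_to("H", 0.5) - payoff_to("T", 0.5)
```

At q = 0.6, for instance, H yields 0.2 while T yields −0.2, which is the chain of argument in the text.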
E XERCISE 39. Consider the variant of matching pennies given in Figure 3. Calculate
the equilibrium for this game.
1\2 H T
H 1, −1 −2, 3
T −1, 1 1, −1
Figure 3
Figure 4. An extensive form game. At the root Player 1 chooses T or B; after B he
moves again and chooses U or D, with D ending the game at payoffs (2, 2). After T, or
after B followed by U, Player 2 chooses L or R at an information set (indicated by a
dotted line) containing both of her nodes, leading to the terminal payoffs (4, 1), (1, 0),
(0, 0), and (0, 1).
The game starts at the bottom at Player 1’s first decision node. The first node of the
game is called the initial node or the root. Player 1 chooses whether to play T or B. If he
chooses B he moves again and chooses between U and D. If he chooses B and then D the
game ends. If he chooses either T or B and then U then player 2 gets a move and chooses
either L or R. Player 2 might be at either of two nodes when she chooses. The dotted line
between those nodes indicates that they are in the same information set and that Player 2
does not observe which of the nodes she is at when she moves. Traditionally information
sets were indicated by enclosing the nodes of the information set in a dashed oval. The
manner I have indicated is a newer notation and might have been introduced because it’s a
bit easier to generate on the computer, and looks a bit neater. (Anyway that’s why I do it
that way.)
The payoffs given at the terminal nodes give the expected payoffs (first for Player 1
then for Player 2) that generate the von Neumann-Morgenstern utility function that repre-
sents the players’ preferences over lotteries over the various outcomes or terminal nodes.
There are two further features that are not illustrated in this example. We often want
to include in our model some extrinsic uncertainty, that is some random event not under
the control of the players. We indicate this by allowing nodes to be owned by an artifi-
cial player that we call “Nature” and sometimes index as Player 0. Nature’s moves are
not labelled in the same way as the moves of the strategic players. Rather we associate
probabilities to each of nature’s moves. This is shown in the game of Figure 5 in which the
initial node is a move of nature. I shall indicate nodes where Nature moves by open circles
and nodes where real players move by filled circles.
Figure 5. An extensive form game with a move by Nature. Nature (the open circle at
the root) deals the high card to Player 1 or to Player 2, each with probability ½. The
player who receives the low card then chooses In or Out; Out ends the game at (1, −1)
or (−1, 1), the low-card player paying $1 to the other. After In, Player 1 chooses S or D
(swap or don’t swap) at an information set joining his two nodes, leading to the terminal
payoffs (−2, 2), (2, −2), (2, −2), and (−2, 2).
Like the Prisoner’s Dilemma or Matching Pennies there is a bit of a story to go with
this game. It’s some kind of parlour game. The game has two players. We first assign
a high card and a low card to players 1 and 2, each being equally likely to get the high
card, and each seeing the card he gets. The player receiving the low card then has the
option of either continuing the game (by playing “In”) or finishing the game (by playing
“Out”), in which case he pays $1 to the other player. If the player who received the low
card continues the game then Player 1, moves again and decides whether to keep the card
he has or to swap it with Player 2’s. However Player 1 does not observe which card he has,
or what he did in his previous move, or even if he has moved previously. This might seem
a bit strange. One somewhat natural interpretation is that Player 1 is not a single person,
but rather a team consisting of two people. In any case it is normal in defining extensive
form games to allow such circumstances.
Games such as this are not, however, as well behaved as games in which such things do
not happen. (We’ll discuss this in a little more detail below.) If a player always remembers
everything he knew and everything he did in the past we say that the player has perfect
recall. If each player has perfect recall then we say that the extensive form game has
perfect recall. The game of Figure 4 has perfect recall while the game of Figure 5 does not
have perfect recall. In particular, Player 1 does not have perfect recall.
The extensive form given provides one way of modelling or viewing the strategic
interaction. A somewhat more abstract and less detailed vision is provided by thinking of
the players as formulating plans or strategies. One might argue (correctly, in my view) that
since the player can, when formulating his plan, anticipate any contingencies that he might
face, nothing of essence is lost in doing this. We shall call such a plan a strategy. In the
game of Figure 5 Player 2 has only one information set at which she moves so her plan
will simply say what she should do at that information set. Thus Player 2’s strategy set is
S2 = {IN, OUT}. Player 1 on the other hand has two information sets at which he might
move. Thus his plan must say what to do at each of his information sets. Let us list first
what he will do at his first (singleton) information set and second what he will do at his
second information set. His strategy set is S1 = {(In, S), (In, D), (Out, S), (Out, D)}.
Now, a strategy profile such as ((In, S), IN) defines for us a lottery over the terminal
nodes, and hence over profiles of payoffs. In this case it is (−2, 2) with probability a half
and (2, −2) with probability a half. For the strategy profile ((In, S), OUT) it would be
(1, −1) with probability a half and (2, −2) with probability a half.
We can then calculate the expected payoff profile to each of the lotteries associated
with strategy profiles (for the two given above these would be (0, 0) and (1½, −1½)) and thus
we have specified a normal form game. For this example the associated normal form game
is given in Figure 5a.
1\2        IN         OUT
(In, S)    0, 0       1½, −1½
(In, D)    0, 0       −½, ½
(Out, S)   −1½, 1½    0, 0
(Out, D)   ½, −½      0, 0
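The entries of Figure 5a come from averaging the terminal payoffs over Nature's move. For the two profiles worked out in the text:

```python
def expected_payoffs(lottery):
    """Expected payoff profile of a lottery [(prob, (payoff1, payoff2)), ...]."""
    return tuple(sum(p * pay[i] for p, pay in lottery) for i in range(2))

# Profile ((In, S), IN): terminal payoffs (-2, 2) and (2, -2), each with probability 1/2.
e1 = expected_payoffs([(0.5, (-2, 2)), (0.5, (2, -2))])
# Profile ((In, S), OUT): terminal payoffs (1, -1) and (2, -2), each with probability 1/2.
e2 = expected_payoffs([(0.5, (1, -1)), (0.5, (2, -2))])
```

These give (0, 0) and (1½, −1½), the first two entries in the (In, S) row of the table.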
Figure 5a
2.1. Definition. We now define formally the notions we discussed informally above.
D EFINITION 24. An extensive form game consists of
(1) N = {1, 2, . . . , N} a set of players,
(2) X a finite set of nodes,
(3) p : X → X ∪ {∅} a function giving the immediate predecessor of each node. There
is a single node x0 for which p(x0 ) = ∅. This is the initial node. We let s(x) =
p⁻¹(x) = {y ∈ X | p(y) = x} be the immediate successors of node x. We can now
define the set of all predecessors of x to be those y’s for which y = p(p(p . . . (x)))
for some number of iterations of p and similarly the set of all successors of x.
We require that for any x the set of all predecessors of x be disjoint from the set
of all successors of x. (This is what we mean by the nodes forming a tree.) The
set of terminal nodes T is the set of nodes that have no successors, that is those x
for which s(x) = ∅. We call the nonterminal nodes the decision nodes.
(4) A a set of actions and α : X \ {x0 } → A a function that for any noninitial node
gives the action taken at the preceding node that leads to that node. We require
that if x and x′ have the same predecessor and x ≠ x′ then α(x) ≠ α(x′). The set
of choices available at the node x is c(x) = {a ∈ A | a = α(x′) for some x′ ∈ s(x)}.
(5) H a collection of information sets, and H : X\T → H a function assigning each
decision node x to an information set H(x). We require that any two decision
decision node x to an information set H(x). We require that any two decision
nodes in the same information set have the same available choices. That is if
H(x) = H(x′) then c(x) = c(x′). We also require that any nodes in the same
information set be neither predecessors nor successors of each other. Sometimes
this requirement is not made part of the definition of a game but rather separated
and used to distinguish linear games from nonlinear ones. (The linear ones are
the ones that satisfy the requirement.)
(6) n : H → {0, 1, . . . , N} a function assigning each information set to the player
who moves at that information set or to Nature (Player 0). The collection of
Player n’s information sets is denoted Hn = {H ∈ H | n(H) = n}. We assume
that each information set in H0 is a singleton, that is that it contains only a single
node.
(7) ρ : H0 × A → [0, 1] a function that gives the probability of each of Nature’s
choices at the nodes at which Nature moves.
(8) u = (u1 , u2 , . . . , uN ) a collection of payoff functions un : T → R assigning an
expected utility to each terminal node for each player n.
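Items (2)–(3) of the definition can be sketched with a small, hypothetical tree: a predecessor function determines the successor sets, the predecessor chains, and the terminal nodes. The node names are made up:

```python
# Hypothetical tree: each node mapped to its immediate predecessor (None for x0).
pred = {"x0": None, "x1": "x0", "x2": "x0", "x3": "x1", "x4": "x1"}

def predecessors(x):
    """All predecessors of x: follow p repeatedly back to the initial node."""
    chain = []
    while pred[x] is not None:
        x = pred[x]
        chain.append(x)
    return chain

# s(x): immediate successors; terminal nodes are those with s(x) empty.
succ = {x: [y for y, q in pred.items() if q == x] for x in pred}
terminals = [x for x, s in succ.items() if not s]
```

The tree requirement, that predecessors and successors of a node never overlap, is what makes the `while` loop above terminate.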
DEFINITION 25. Given an extensive form game we say that Player n in that game
has perfect recall if whenever H(x) = H(x′) ∈ Hn with x″ a predecessor of x and with
H(x″) ∈ Hn and a″ the action at x″ on the path to x, then there is x‴ ∈ H(x″) a predecessor
of x′ with a″ the action at x‴ on the path to x′. If each player in N has perfect recall
we say the game has perfect recall.
You should go back to the games of Figures 4 and 5 and see how this definition leads
to the conclusion that the first is a game with perfect recall and the second is not. To
understand the definition a little better let’s look a little more closely at what it is saying.
Since H(x) = H(x′) it means that the player observes the same situation at x and x′. Thus if
he has perfect recall he should have the same experience at x and x′. Part of his experience
at x was that he had been at the information set H(x″) and made the choice a″. The
definition is requiring that he also have had this experience at x′.
2.2. The Associated Normal Form. Just as in the previous section we formally de-
fined the details of a game that we had earlier discussed informally here we shall formally
define the process of associating a normal form game to a given extensive form game.
Recall that a normal form game has three components: a set of players, a strategy set
for each player, and a utility function for each player giving that player’s utility for each
profile of strategies. The easiest part is defining the player set. It is the same as the player
set of the given extensive form game. As I said above, a strategy for a player is a rule that
tells the player what to do at each of his information sets.
D EFINITION 26. Given an extensive form game a strategy of Player n is a function
sn : Hn → A with sn (H) ∈ c(H) for all H in Hn .
So, we have now defined the second component of a normal form game. Now, a
strategy profile specifies an action at each move by one of the “real” players and so defines
for us a lottery over the terminal nodes. (It defines a lottery rather than simply a terminal
node because we allow random moves by nature in our description of an extensive form
game. In a game without moves by Nature a strategy profile would define a single terminal
node.)
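Since Definition 26 makes a strategy a function from information sets to available actions, Player 1's strategy set in the game of Figure 5 can be enumerated as a cartesian product. A sketch (the information-set labels are mine; the action labels are from the text):

```python
from itertools import product

# Player 1's information sets in Figure 5 and the choices available at each.
choices = {"first": ("In", "Out"), "second": ("S", "D")}

# A strategy picks one available action at every information set.
strategies = [dict(zip(choices, combo)) for combo in product(*choices.values())]
```

This yields the four strategies listed in the text, (In, S), (In, D), (Out, S), and (Out, D), including plans that specify an action at an information set the player's own earlier choice prevents him from reaching.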
We then associate an expected payoff profile with the profile of strategies by taking
the expected payoff to the terminal node under this lottery. It might be a good idea to go
back and look again at what we did in defining the normal form game associated with the
extensive form game given in Figure 5.
2.3. Solution Concepts. Still to come.
3. Existence of Equilibrium
We shall examine a little informally in this section the question of the existence of
equilibrium. Let's look first in some detail at an example. I shall include afterwards a
discussion of the general result and a sketch of the proof. Remember, however, that we
didn’t do this in class and you are not required to know it for this course.
Let us consider an example (Figure 6). Player 1 chooses the row and player 2 (simul-
taneously) chooses the column. The resulting payoffs are indicated in the appropriate box
of the matrix, with player 1’s payoff appearing first.
        L      R
T     2, 0   0, 1
B     0, 1   1, 0
Figure 6
[Figures 6a–6c: three diagrams in the unit square plotting each player's best-response probability against the other's; the best responses switch at the values 1/2 and 1/3, and in the third panel the two correspondences intersect at the mixed equilibrium.]
Consider, for example, the game of Figure 7. There are three equilibrium outcomes: (8,5), (7,6) and
(6,3) (for the latter, the probability of T must lie between .5 and .6).
        L      C      R
T     8, 5   0, 0   6, 3
B     0, 0   7, 6   6, 3
Figure 7
In Figure 6 we see that while there is no pure strategy equilibrium there is a mixed
strategy equilibrium. The main result of non-cooperative game theory states that
this is true quite generally.
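As a concrete check, the mixed equilibrium of the game in Figure 6 can be computed from the indifference conditions: in a fully mixed equilibrium each player's mixture makes the other player indifferent between his two pure strategies. A short Python sketch (the closed-form expressions assume the equilibrium is fully mixed, which is my simplification for 2×2 games):

```python
from fractions import Fraction

# Payoff matrices for the game of Figure 6: rows T, B; columns L, R.
A = [[2, 0], [0, 1]]  # player 1's payoffs
B = [[0, 1], [1, 0]]  # player 2's payoffs

# With q the probability of L, player 1 is indifferent when
#   A[0][0]q + A[0][1](1-q) = A[1][0]q + A[1][1](1-q),
# and symmetrically with p the probability of T for player 2.
q = Fraction(A[1][1] - A[0][1], A[0][0] - A[0][1] - A[1][0] + A[1][1])
p = Fraction(B[1][1] - B[1][0], B[0][0] - B[0][1] - B[1][0] + B[1][1])

# Verify the indifference condition for player 1 at q.
assert A[0][0]*q + A[0][1]*(1-q) == A[1][0]*q + A[1][1]*(1-q)
print(p, q)  # 1/2 1/3
```

So Player 1 plays T with probability 1/2 and Player 2 plays L with probability 1/3, matching the values marked on the axes of Figures 6a–6c.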
T HEOREM 6 (Nash 1950, 1951). The mixed extension of every finite game has at least
one strategic equilibrium.
(A game is finite if the player set as well as the set of strategies available to each player
is finite. Remember too that this proof is not required for this course.)
S KETCH OF P ROOF. The proof may be sketched as follows. (It is a multi-dimensional
version of Figure 6c.) Consider the set-valued mapping (or correspondence) that maps each
strategy profile, x, to all strategy profiles in which each player’s component strategy is a
best response to x (that is, maximises the player’s payoff given that the others are adopting
their components of x). If a strategy profile is contained in the set to which it is mapped
(is a fixed point) then it is an equilibrium. This is so because a strategic equilibrium is, in
effect, defined as a profile that is a best response to itself.
Thus the proof of existence of equilibrium amounts to a demonstration that the “best
response correspondence” has a fixed point. Kakutani's fixed-point theorem asserts the
existence of a fixed point for every correspondence from a convex and compact subset of
Euclidean space into itself, provided two conditions hold. One, the image of every point
must be convex. And two, the graph of the correspondence (the set of pairs (x, y) where y
is in the image of x) must be closed.
Now, in the mixed extension of a finite game, the strategy set of each player consists of
all vectors (with as many components as there are pure strategies) of non-negative numbers
that sum to 1; that is, it is a simplex. Thus the set of all strategy profiles is a product of
simplices. In particular, it is a convex and compact subset of Euclidean space.
Given a particular choice of strategies by the other players, a player’s best responses
consist of all (mixed) strategies that put positive weight only on those pure strategies that
yield the highest expected payoff among all the pure strategies. Thus the set of best re-
sponses is a subsimplex. In particular, it is convex.
Finally, note that the conditions that must be met for a given strategy to be a best
response to a given profile are all weak polynomial inequalities, so the graph of the best
response correspondence is closed.
Thus all the conditions of Kakutani’s theorem hold, and this completes the proof of
Nash’s theorem.
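The best-response structure used in the sketch above is easy to illustrate. The following is my own illustration, not part of Nash's argument: given a mixed strategy of the opponent, it computes the set of pure strategies attaining the maximal expected payoff; every best response is a mixture over exactly this set, which is why the set of best responses is a subsimplex.

```python
from fractions import Fraction

def pure_payoffs(payoff_matrix, opponent_mix):
    """Expected payoff of each pure strategy against a mixed opponent."""
    return [sum(q * u for q, u in zip(opponent_mix, row))
            for row in payoff_matrix]

def best_response_support(payoff_matrix, opponent_mix):
    """Pure strategies attaining the maximal expected payoff.
    Every best response is a mixture over exactly these strategies,
    so the best-response set is a subsimplex (hence convex)."""
    payoffs = pure_payoffs(payoff_matrix, opponent_mix)
    best = max(payoffs)
    return [i for i, u in enumerate(payoffs) if u == best]

# Player 1's payoffs in the game of Figure 6.
A = [[2, 0], [0, 1]]
print(best_response_support(A, [Fraction(1), Fraction(0)]))       # [0]
print(best_response_support(A, [Fraction(1, 3), Fraction(2, 3)])) # [0, 1]
```

Against L the unique best response is T; against the equilibrium mixture (1/3, 2/3) both pure strategies are best responses, so the best-response set is the whole simplex, exactly as in Figure 6a.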
CHAPTER 7
If we do not assume that the demand functions are single valued then we need a slightly
more general form of the definition.
62 7. GENERAL EQUILIBRIUM THEORY
[Figure 1: Consumer 1's budget set in the (x11, x21) plane. The origin is 01, the endowment is ω1 = (ω11, ω21), and the budget line through ω1 is perpendicular to the price vector p.]
Figure 1
[Figure 2: Consumer 2's budget set in the (x12, x22) plane, with origin 02, endowment ω2 = (ω12, ω22), and budget line perpendicular to the price vector p.]
Figure 2
if viewed from 02 looking down. Notice that while all the feasible allocations are within
the “box” part of each consumer’s budget set goes outside the “box.” One of the central
ideas of general equilibrium theory is that the decision making can be decentralised by the
price mechanism. Neither consumer is required, when making their choices, to take into
account what is globally feasible for the economy. This is why we really do want to draw
the diagrams as I have and not leave out the parts “outside the box.”
We can represent preferences in the usual manner by indifference curves. I shall not
again draw separate pictures for consumers 1 and 2, but rather go straight to drawing them
in the Edgeworth box, as in Figure 4.
Let us look at the definition of a Walrasian equilibrium. If some feasible allocation
x (≠ ω) is to be an equilibrium allocation then it must be in the budget sets of both con-
sumers. (Such an allocation is shown in Figure 5.) Thus the boundary of the budget sets
must be the line through x and ω (and the equilibrium price vector will be perpendicular to
this line). Also x must be, for each consumer, at least as good as any other bundle in their
budget set. Now any feasible allocation y that makes Consumer 1 better off than he is at
allocation x must not be in Consumer 1’s budget set. (Otherwise he would have chosen it.)
Thus the allocation must be strictly above the budget line through ω and x. But then there
are points in Consumer 2’s budget set which give her strictly more of both goods than she
gets in the allocation y. So, since her preferences are strictly increasing there is a point in
her budget set that she strictly prefers to what she gets in the allocation y. But since the
allocation x is a competitive equilibrium with the given budget sets, what she gets in
the allocation x must be at least as good as any other point in her budget set, and thus strictly
better than what she gets at y.
What have we shown? We have shown that if x is a competitive equilibrium allocation from the
endowments ω then any feasible allocation that makes Consumer 1 better off makes Con-
sumer 2 worse off. We can similarly show that any feasible allocation that makes Consumer
2 better off makes Consumer 1 worse off. In other words x is Pareto optimal.
We shall now generalise this intuition about the relationship between equilibrium and
efficiency to the more general model. We first define more formally our idea of efficiency.
[Figure 3: The Edgeworth box. Consumer 1's quantities are measured from the origin 01 (axes x11 and x21) and Consumer 2's from 02 (axes x12 and x22, inverted). The endowment point ω is marked, with the budget line through it perpendicular to the price vector p.]
Figure 3
[Figure 4: The Edgeworth box with the two consumers' indifference curves drawn in.]
Figure 4
1. THE BASIC MODEL OF A COMPETITIVE ECONOMY 65
[Figure 5: The Edgeworth box showing an equilibrium allocation x, the endowment ω, a feasible allocation y, and the budget line through x and ω perpendicular to the price vector p.]
Figure 5
D EFINITION 31. A feasible allocation x is Pareto optimal (or Pareto efficient) if there
is no other feasible allocation y such that yn %n xn for all n in N and yn′ ≻n′ xn′ for at least
one n′ in N.
In words we say that a feasible allocation is Pareto optimal if there is no other feasible
allocation that makes at least one consumer strictly better off without making any consumer
worse off. The following result generalises our observation about the Edgeworth box.
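Definition 31 can be checked mechanically in a small example. The sketch below enumerates the feasible integer allocations of a two-good, two-consumer exchange economy and tests each against the definition; the utility functions are my own assumption, chosen purely for illustration.

```python
# Two consumers, two goods; the total endowment is split in integer
# amounts. The utility functions below are hypothetical, chosen only
# to illustrate Definition 31.

def u1(x):  # consumer 1's utility over the bundle (x1, x2)
    return x[0] * x[1]

def u2(x):  # consumer 2's utility
    return x[0] + x[1]

TOTAL = (3, 3)  # total endowment of each good

def feasible_allocations():
    """Every way of splitting the total endowment between the consumers."""
    for a in range(TOTAL[0] + 1):
        for b in range(TOTAL[1] + 1):
            yield ((a, b), (TOTAL[0] - a, TOTAL[1] - b))

def pareto_optimal(x):
    """True if no feasible y makes one consumer strictly better off
    without making the other worse off (Definition 31)."""
    for y in feasible_allocations():
        if (u1(y[0]) >= u1(x[0]) and u2(y[1]) >= u2(x[1])
                and (u1(y[0]) > u1(x[0]) or u2(y[1]) > u2(x[1]))):
            return False
    return True

optima = [x for x in feasible_allocations() if pareto_optimal(x)]
```

For instance, the allocation giving Consumer 1 the bundle (2, 2) is Pareto optimal here, while the one giving her (1, 3) is not: moving her to (2, 2) raises u1 without lowering u2.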
T HEOREM 7 (The First Fundamental Theorem of Welfare Economics). Suppose
that for each n the preferences %n are strictly increasing and that (p, x) is a Wal-
rasian equilibrium. Then x is Pareto optimal.
In fact, we can say something in the other direction as well. It clearly is not the case
that any Pareto optimal allocation is a Walrasian equilibrium. A Pareto optimal alloca-
tion may well redistribute the goods, giving more to some consumers and less to others.
However, if we are permitted to make such transfers then any Pareto optimal allocation is
a Walrasian equilibrium from some redistributed initial endowment. Suppose that in the
Edgeworth box there is some point such as x in Figure 6 that is Pareto optimal. Since x is
Pareto optimal Consumer 2’s indifference curve through x must lie everywhere below Con-
sumer 1’s indifference curve through x. Thus the indifference curves must be tangent to
each other. Let’s draw the common tangent. Now, if we redistribute the initial endowments
to some point ω 0 on this tangent line then with the new endowments the allocation x is a
competitive equilibrium. This result is true with some generality, as the following result
states. However we do require stronger assumptions than were required for the first welfare
theorem. We shall look below at a couple of examples to illustrate why these stronger
assumptions are needed.
T HEOREM 8 (The Second Fundamental Theorem of Welfare Economics). Suppose
that for each n the preferences %n are strictly increasing, convex, and continuous and that
[Figure 6: The Edgeworth box showing a Pareto optimal allocation x, the original endowment ω, a redistributed endowment ω′ on the common tangent line through x, and the supporting price vector p.]
Figure 6
x is Pareto optimal with x > 0 (that is, xℓn > 0 for each ℓ and each n). Then there is some
feasible reallocation ω′ of the endowments (that is, ∑n∈N ωn′ = ∑n∈N ωn ) and a price vector
p such that (p, x) is a Walrasian equilibrium of the economy with preferences %n and initial
endowments ω′.
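A small numerical illustration of the second welfare theorem, under assumptions of my own choosing (both consumers have the symmetric Cobb-Douglas utility u(x) = √(x1 x2)): at an interior Pareto optimum the consumers' marginal rates of substitution coincide, the supporting price vector is proportional to the common MRS, and redistributing the endowment to ω′ = x itself makes (p, x) a Walrasian equilibrium.

```python
from fractions import Fraction

def mrs(x):
    """MRS under u(x) = sqrt(x1 * x2): (du/dx1)/(du/dx2) = x2/x1."""
    return Fraction(x[1], x[0])

def demand(p, w):
    """A (1/2, 1/2) Cobb-Douglas consumer spends half of wealth w on
    each good."""
    return (w / (2 * p[0]), w / (2 * p[1]))

total = (4, 4)             # aggregate endowment
x1 = (1, 1)                # consumer 1's bundle at the Pareto optimum x
x2 = (total[0] - x1[0], total[1] - x1[1])
assert mrs(x1) == mrs(x2)  # equal MRS: x is an interior Pareto optimum

p = (mrs(x1), Fraction(1))  # supporting prices, p1/p2 = common MRS

# Redistribute the endowment to ω' = x: each consumer's demand at p is
# then exactly his bundle in x, so (p, x) is a Walrasian equilibrium.
for xn in (x1, x2):
    wealth = p[0] * xn[0] + p[1] * xn[1]
    assert demand(p, wealth) == xn
```

The interiority assumption x > 0 matters here: mrs is undefined on the boundary, which is one reason the second welfare theorem needs stronger hypotheses than the first.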