J.M. Ward
MT1174, 2790174
2011
Undergraduate study in
Economics, Management,
Finance and the Social Sciences
This subject guide is for a '100' course offered as part of the University of London
International Programmes in Economics, Management, Finance and the Social Sciences.
This is equivalent to Level 4 within the Framework for Higher Education Qualifications in
England, Wales and Northern Ireland (FHEQ).
For more information about the University of London International Programmes
undergraduate study in Economics, Management, Finance and the Social Sciences, see:
www.londoninternational.ac.uk
This guide was prepared for the University of London International Programmes by:
J.M. Ward, Department of Mathematics, London School of Economics and Political Science.
This is one of a series of subject guides published by the University. We regret that due to
pressure of work the author is unable to enter into any correspondence relating to, or arising
from, the guide. If you have any comments on this subject guide, favourable or unfavourable,
please use the form at the back of this guide.
Contents
Preface
1 Introduction
  1.1 This subject
  1.2 Reading
  1.3
    1.3.1 The VLE
    1.3.2
  1.4
  1.5 Examination advice
  1.6
2 Functions
  2.1
    2.1.1
    2.1.2 Combinations of functions
    2.1.3 Inverse functions
    2.1.4 Identities
    2.1.5 Applications of functions
  2.2 Conic sections
    2.2.1 Parabolae
    2.2.2 Circles
    2.2.3 Ellipses
    2.2.4 Hyperbolae
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises
3 Differentiation
  3.1
  3.2
    3.2.1 Standard derivatives
    3.2.2
    3.2.3 Higher-order derivatives
  3.3 Using derivatives
    3.3.1
    3.3.2
    3.3.3 Applications of derivatives
    3.3.4 Existence of derivatives
  3.4
    3.4.1 Maclaurin series
    3.4.2 Taylor series
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises
4 One-variable optimisation
  4.1
  4.2
    4.2.1
    4.2.2 Stationary points
    4.2.3
  4.3
    4.3.1
    4.3.2
    4.3.3 Points of inflection
  4.4 Curve sketching
    4.4.1
    4.4.2
    4.4.3
  4.5 Optimisation
    4.5.1 Constrained optimisation
    4.5.2
    4.5.3 Applications of optimisation
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises
5 Integration
  5.1
  5.2
    5.2.1 Standard integrals
    5.2.2
    5.2.3 Integration by substitution
    5.2.4 Integration by parts
    5.2.5
    5.2.6
  5.3
    5.3.1
    5.3.2
  5.4 Applications of integrals
    5.4.1
    5.4.2
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises
6
  6.1 Introduction
  6.2 Surfaces
    6.2.1 Planes
    6.2.2
  6.3 Partial differentiation
    6.3.1
    6.3.2
    6.3.3
    6.3.4
    6.3.5
  6.4
    6.4.1 Tangent planes
    6.4.2 Gradient vectors
    6.4.3 Directional derivatives
    6.4.4
    6.4.5 Taylor series
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises
7 Two-variable optimisation
  7.1 Introduction
  7.2 Unconstrained optimisation
    7.2.1 Stationary points
    7.2.2
    7.2.3
    7.2.4 Applications
  7.3 Constrained optimisation
    7.3.1
    7.3.2
    7.3.3
    7.3.4 Applications
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises
8 Differential equations
  8.1
  8.2 First-order ODEs
    8.2.1
    8.2.2
    8.2.3
  8.3 Second-order ODEs
    8.3.1
    8.3.2
  8.4
    8.4.1
    8.4.2
  8.5 Applications of ODEs
    8.5.1
    8.5.2
    8.5.3
    8.5.4 Market trends
  Learning outcomes
  Solutions to activities
  Exercises
  Solutions to exercises
Preface
This subject guide is not a course text. It sets out a logical sequence in which to study
the topics in this subject. Where coverage in the main texts is weak, it provides some
additional background material.
I am grateful to Mark Baltovic for his careful reading of a draft of this guide and for his
many helpful comments.
Chapter 1
Introduction
In this very brief introduction, we aim to give you an idea of the nature of this subject
and to advise you on how best to approach it. We give general information about the
contents and use of this subject guide, and on recommended reading and how to use the
textbooks.
1.1
This subject
Calculus, as studied in this Level 1 course, is primarily the study of derivatives and
integrals of functions of one variable and partial derivatives of functions of several
variables.
Our approach here is not just to help you acquire proficiency in techniques and
methods, but also to help you understand some of the theoretical ideas behind these.
For example, after completing this course, you will hopefully understand why the
derivatives of a function allow you to determine where a function of one variable is
optimised. In addition to this, we try to indicate the uses of some of the methods in
applications to economics, finance and related disciplines.
Aims of the course
The broad aims of this course are as follows:
• to enable students to acquire skills in the methods of calculus (including
  multivariate calculus), as required for their use in further mathematics subjects and
  economics-based subjects;
• to prepare students for further courses in mathematics and/or related disciplines.
As emphasised above, however, we do also want you to understand why certain
methods work: this is one of the skills that you should acquire. Indeed, the
examination will not simply test your ability to perform routine calculations, it will also
probe your knowledge and understanding of the principles that underlie the material.
Learning outcomes
We now state the broad learning outcomes of this course, as a whole. At the end of this
course and having completed the essential reading and activities, you should be able to:
• use the concepts, terminology, methods and conventions covered in the course to
  solve mathematical problems in this subject;
• solve unseen mathematical problems involving understanding of these concepts and
  application of these methods;
• see how calculus can be used to solve problems in economics and related subjects;
• demonstrate knowledge and understanding of the underlying principles of calculus.
There are a couple of things that we should stress at this point. Firstly, note the
intention that you will be able to solve unseen problems. This means simply that you
will be expected to be able to use your knowledge and understanding of the material to
solve problems that are not completely standard. This is not something you should
worry unduly about: all topics in mathematics expect this, and you will never be
expected to do anything that cannot be done using the material of this course.
Secondly, we expect you to be able to demonstrate knowledge and understanding and
you might well wonder how you would demonstrate this in the examination. Well, it is
precisely by being able to grapple successfully with unseen, non-routine, questions that
you will indicate that you have a proper understanding of the topic.
Topics covered
Descriptions of the topics to be covered appear in the relevant chapters. However, it is
useful to give a brief overview at this stage.
We start by revising some of the basic ideas that are needed for the study of this course
and, in particular, the idea of a function of one variable. We then introduce derivatives
of such functions and how to find them using the techniques of differentiation. This
enables us to see how such functions are behaving and, in particular, enables us to see
where such functions are optimised. We then introduce integrals of such functions and
how to find them using the techniques of integration. In particular, this will enable us
to see how to relate functions to areas. We then introduce functions of several variables
and develop techniques for finding their partial derivatives. In particular, we will see
how we can use these ideas to see where these slightly more complicated functions are
optimised. Lastly, we introduce the idea of a differential equation and examine methods
for solving them.
Throughout this subject guide, the emphasis will be on the theory as much as on the
methods. That is to say, our aim in this subject is not only to provide you with some
useful techniques and methods from calculus, but to also enable you to understand why
these techniques work.
1.2
Reading
There are many books that would be useful for this subject. We recommend two in
particular, and a couple of others for additional, further reading. (You should note,
however, that there are very many books suitable for this course. Indeed, almost any
text on first-year university calculus will cover the majority of the material.)
Textbook reading is essential as textbooks will provide you with more in-depth
explanations than you will find in this subject guide, and they will also provide many
more examples to study and exercises to work through. The books listed are the ones
we have referred to in this guide.
Essential reading
Detailed reading references in this subject guide refer to the editions of the set
textbooks listed below. New editions of one or more of these textbooks may have been
published by the time you study this course. You can use a more recent edition of any
of the books; use the detailed chapter and section headings and the index to identify
relevant readings. Also check the virtual learning environment (VLE) regularly for
updated guidance on readings.
Binmore, K. and J. Davies Calculus: Concepts and Methods. (Cambridge:
Cambridge University Press, 2002, second revised edition) [ISBN 9780521775410].
Anthony, M. and N. Biggs Mathematics for economics and finance: methods and
modelling. (Cambridge: Cambridge University Press, 1996) [ISBN 9780521559133].
By and large we will be following Binmore and Davies but, sometimes, we will follow
the simpler treatment found in Anthony and Biggs. Both texts, when used wisely, will
provide you with a large number of examples for you to study and exercises for you to
attempt. It is recommended that you purchase both of these. Another thing you might
like to bear in mind is that some of the material from Binmore and Davies that we omit
here will be useful if you go on to study 176 Further Calculus.
Further reading
Once you have covered the essential reading you are then free to read around the
subject area in any text, paper or online resource. You will need to support your
learning by reading as widely as possible and by thinking about how these principles
apply in the real world. To help you read extensively, you have free access to the VLE
and University of London Online Library (see Section 1.3.2). However, two useful
textbooks that we have referred to in this subject guide are the following.
Simon, C.P. and L. Blume Mathematics for economists. (New York and London:
W.W. Norton and Company, 1994) [ISBN 9780393957334].
Adams, R.A. and C. Essex Calculus: a complete course. (Toronto: Pearson, 2010,
seventh edition) [ISBN 9780321549280].
Simon and Blume is a useful supplementary text with an emphasis on applications of
the material to economics; whereas Adams and Essex (which is merely an example from
a large range of very similar calculus textbooks) is a detailed calculus textbook which
contains much material which is beyond the scope of this course. Both of these texts are
suitable as sources of additional explanation, examples and exercises, but they are
probably not worth purchasing.
1.3
In addition to the subject guide and the essential reading, it is crucial that you take
advantage of the study resources that are available online for this course, including the
virtual learning environment (VLE) and the Online Library.
You can access the VLE, the Online Library and your University of London email
account via the Student Portal at
http://my.londoninternational.ac.uk
You should receive your login details in your study pack. If you have not, or you have
forgotten your login details, please email uolia.support@london.ac.uk quoting your
student number.
1.3.1
The VLE
The VLE, which complements this subject guide, has been designed to enhance your
learning experience, providing additional support and a sense of community. It forms an
important part of your study experience with the University of London and you should
access it regularly.
The VLE provides a range of resources for EMFSS courses:
• Self-testing activities: Doing these allows you to test your own understanding of
  subject material.
• Electronic study materials: The printed materials that you receive from the
  University of London are available to download, including updated reading lists
  and references.
• Past examination papers and Examiners' commentaries: These provide advice on
  how each examination question might best be answered.
• A student discussion forum: This is an open space for you to discuss interests and
  experiences, seek support from your peers, work collaboratively to solve problems
  and discuss subject material.
• Videos: There are recorded academic introductions to the subject, interviews and
  debates and, for some courses, audio-visual tutorials and conclusions.
• Recorded lectures: For some courses, where appropriate, the sessions from previous
  years' Study Weekends have been recorded and made available.
• Study skills: Expert advice on preparing for examinations and developing your
  digital literacy skills.
• Feedback forms.
Some of these resources are available for certain courses only, but we are expanding our
provision all the time and you should check the VLE regularly for updates.
1.3.2
The Online Library contains a huge array of journal articles and other resources to help
you read widely and extensively.
To access the majority of resources via the Online Library at
http://tinyurl.com/ollathens
you will either need to use your University of London Student Portal login details, or
you will be required to register and use an Athens login.
The easiest way to locate relevant content and journal articles in the Online Library is
to use the Summon search engine.
If you are having trouble finding an article listed in a reading list, try removing any
punctuation from the title, such as single quotation marks, question marks and colons.
For further advice, please see the online help pages at
www.external.shl.lon.ac.uk/summon/about.php
1.4
We have already mentioned that this guide is not a textbook. It is important that you
read textbooks in conjunction with the guide and that you try problems from the
textbooks. The exercises at the end of the main chapters of this subject guide are a very
useful resource and you should try them once you think you have mastered the material
from the chapter. You should really try these exercises before consulting the solutions,
as simply reading the solutions provided will not help you at all. Sometimes, the
solutions we provide will just be an overview of what is required, i.e. an indication of
how you should answer the questions, but in the examination, you must always show all
of your calculations. It is vital that you develop and enhance your problem-solving skills
and the only way to do this is to try lots of exercises.
1.5
Examination advice
Important: the information and advice given here are based on the examination
structure used at the time this guide was written. Please note that subject guides may
be used for several years. Because of this we strongly advise you to always check both
the current Regulations for relevant information about the examination, and the virtual
learning environment (VLE) where you should be advised of any forthcoming changes.
You should also carefully check the rubric/instructions on the paper you actually sit
and follow those instructions.
Remember, it is important to check the VLE for:
• Up-to-date information on examination and assessment arrangements for this
  course.
• Where available, past examination papers and Examiners' commentaries for the
  course which give advice on how each question might best be answered.
This course is assessed by a three-hour unseen written examination. There are no
optional topics in this subject: you should study them all and this is reflected in the
structure of the examination paper. There are five questions (each worth 20 marks) and
all questions are compulsory. A sample examination paper may be found in an appendix
to this subject guide.
Please do not think that the questions in your real examination will necessarily be very
similar to the exercises in this subject guide or those in the sample examination paper.
The examination is designed to test you. You will get examination questions unlike the
questions in this subject guide. The whole point of examining is to see whether you can
apply your knowledge in familiar and unfamiliar settings. The Examiners (nice people
though they are) have an obligation to surprise you! For this reason, it is important
that you try as many examples as possible, from the subject guide and from the
textbooks. This is not so that you can cover any possible type of question the
Examiners can think of! It is so that you get used to confronting unfamiliar questions,
grappling with them, and finally coming up with the solution.
Do not panic if you cannot completely solve an examination question. There are many
marks to be awarded for using the correct approach or method.
1.6
You will not be permitted to use calculators of any type in the examination. This is not
something that you should worry about: the Examiners are interested in assessing that
you understand the key concepts, ideas, methods and techniques, and will set questions
which do not require the use of a calculator.
Chapter 2
Functions
Essential reading
(For full publication details, see Chapter 1.)
Binmore and Davies (2002) Sections 2.1–2.6, 2.14 and part of 7.1.2.
Anthony and Biggs (1996) Chapters 1, 2 and parts of 7.
Further reading
Simon and Blume (1994) Sections 2.1, part of 2.2, 5.1, 5.3, and 5.4, Appendices
A1.1, parts of A1.2 and A2.16.
Adams and Essex (2010) Preliminaries parts of P.1–P.7, parts of Sections 3.1–3.3
and 3.5.
Aims and objectives
The objectives of this chapter are as follows.
To introduce functions in general and the elementary functions and their graphs in
particular.
To see how to find combinations of functions and the inverse of a function (if it
exists).
To see how functions can be used in economics-based subjects.
To introduce conic sections and see how to draw them.
Specific learning outcomes can be found near the end of this chapter.
2.1
NOTE: Before you start this chapter, you should make sure that you have
covered the background material in Chapter 1 of 173 Algebra.
Given two sets A and B, a function, f , from A to B is a rule which takes each element
of A and gives us a unique (or exactly one) element of B. We often express the fact that
the function f takes elements from A and gives us elements of B by writing
$f : A \to B$. In such cases, we call the sets $A$ and $B$ the domain and co-domain of the
function respectively.
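Two kinds of set that we will often meet are the finite intervals
\[ (a, b) = \{x \in \mathbb{R} \mid a < x < b\} \]
and
\[ [a, b] = \{x \in \mathbb{R} \mid a \le x \le b\}, \]
which only differ according to whether the end-points, i.e. the elements $a$ and $b$, are in
the set. Of course, we can also have finite intervals where one end-point, but not the
other, is in the set and we denote these by
\[ (a, b] = \{x \in \mathbb{R} \mid a < x \le b\} \]
and
\[ [a, b) = \{x \in \mathbb{R} \mid a \le x < b\}. \]
There are also infinite intervals which will have one finite end-point, say $a \in \mathbb{R}$, and we
denote these by
\[ (-\infty, a] = \{x \in \mathbb{R} \mid x \le a\} \]
and
\[ [a, \infty) = \{x \in \mathbb{R} \mid a \le x\}, \]
if the end-point is in the set, and by
\[ (-\infty, a) = \{x \in \mathbb{R} \mid x < a\} \]
and
\[ (a, \infty) = \{x \in \mathbb{R} \mid a < x\}, \]
if it isn't. Of course, as we can see by looking at the sets involved when writing these
infinite intervals, the symbols $\infty$ and $-\infty$ are not end-points as they are not real
numbers; they are just a notational convenience.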
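As a rough illustration of the interval notation, the following Python sketch tests whether a number lies in an interval. The function name `in_interval` and its flags are hypothetical choices made for this example only; an infinite end-point is represented by `math.inf` with its flag set to `False`, since infinity is never in the set.

```python
import math

def in_interval(x, lo, hi, lo_closed, hi_closed):
    """Return True if x lies in the interval with end-points lo and hi.

    lo_closed / hi_closed say whether each end-point belongs to the set.
    For an infinite interval, pass -math.inf or math.inf with the flag False.
    """
    above_lo = x >= lo if lo_closed else x > lo
    below_hi = x <= hi if hi_closed else x < hi
    return above_lo and below_hi

# The end-point 1 is in [0, 1] but not in (0, 1).
print(in_interval(1.0, 0.0, 1.0, True, True))
print(in_interval(1.0, 0.0, 1.0, False, False))
# (-inf, 2] contains every real number x with x <= 2.
print(in_interval(-1e9, -math.inf, 2.0, False, True))
```

Here the two Boolean flags play the role of the square and round brackets in the notation.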
Putting these ideas together, we find that another way of visualising a function
$f : A \to B$ is its graph, which is the set of all points $(x, y) \in \mathbb{R}^2$ such that $y = f(x)$.
Indeed, as a function $f : A \to B$ must give a unique output $y \in B$ for each $x \in A$, its
graph could look like the one illustrated in Figure 2.1(a) but not like the one in
Figure 2.1(b).
Figure 2.1: In (a) we have the graph of a function $f : [0, a] \to [b, c]$ as each input, $x \in [0, a]$,
gives a unique output $y \in [b, c]$. In (b), we do not have the graph of a function from $[0, a]$
to $[b, c]$ as each input, $x \in [0, a]$, gives two outputs $y \in [b, c]$.
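The uniqueness requirement can be checked mechanically for a finite list of points. The sketch below is only an illustration: the name `is_function_graph` and the two sample point sets are made up for this example and do not come from the figure.

```python
def is_function_graph(points):
    """Return True if no input x is paired with two different outputs y."""
    seen = {}
    for x, y in points:
        if x in seen and seen[x] != y:
            return False  # the same input gives two outputs
        seen[x] = y
    return True

# Points on y = x^2: each input has exactly one output.
graph_a = [(x, x * x) for x in (0, 1, 2, 3)]
# Points on x = y^2: the inputs 1 and 4 each have two outputs.
graph_b = [(1, 1), (1, -1), (4, 2), (4, -2)]

print(is_function_graph(graph_a))
print(is_function_graph(graph_b))
```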
2.1.1
We now revise some elementary functions that will be useful in this course and look at
their graphs.
Power functions
A power function is a function $f : \mathbb{R} \to \mathbb{R}$ given by
\[ f(x) = x^n, \]
where $n \in \mathbb{N}$. Depending on the value of $n$, the graphs of these functions look very much
like the ones illustrated in Figure 2.2. In addition to this, we also include the power
function $f(x) = x^0 = 1$ as the function whose graph is a horizontal straight line that
goes through the point $(0, 1)$.
Figure 2.2: (a) When $n = 1$, the graph of the function $f(x) = x^n$ is just the straight line
$y = x$. (b) The graph of the function $f(x) = x^n$ when $n$ is even. (c) The graph of the
function $f(x) = x^n$ when $n \ge 3$ is odd. Of course, in (b) and (c) we are only looking at
the shape of the graph for different values of $n$ without any regard to the scales on the
axes.
In particular, if we let $x \to \infty$ mean that $x$ is positive and getting arbitrarily large (i.e.
we are considering what happens as $x$ takes values far to the right on the $x$-axis) and
$x \to -\infty$ mean that $x$ is negative but getting arbitrarily large in magnitude (i.e. we are
considering what happens as $x$ takes values far to the left on the $x$-axis), we see that:
• If $n$ is even, $x^n \to \infty$ as $x \to \infty$ and as $x \to -\infty$.
• If $n$ is odd, $x^n \to \infty$ as $x \to \infty$ whereas $x^n \to -\infty$ as $x \to -\infty$.
This insight will be important in Section 4.4 when we consider how to sketch the graphs
of more complicated functions.
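The behaviour of $x^n$ far to the left and right can be explored numerically. In this sketch, the function name `end_behaviour` and the value `big = 10**6` (a stand-in for "arbitrarily large") are assumptions made purely for illustration.

```python
def end_behaviour(n, big=10**6):
    """Return the direction x**n heads in far to the left and far to the right.

    n is a positive integer; `big` is an illustrative large value of |x|.
    """
    left = "+inf" if (-big) ** n > 0 else "-inf"
    right = "+inf" if big ** n > 0 else "-inf"
    return left, right

for n in (1, 2, 3, 4):
    print(n, end_behaviour(n))
```

Even powers give the same direction on both sides, whereas odd powers give opposite directions, in line with the two cases listed above.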
Exponential functions
An exponential function with base $a$ is a function $f : \mathbb{R} \to (0, \infty)$ given by
\[ f(x) = a^x, \]
where $a \ne 1$ is a positive real number. Depending on the value of $a$, the graphs of these
functions look very much like the ones illustrated in Figure 2.3.
Figure 2.3: (a) The graph of the function $f(x) = a^x$ when $0 < a < 1$. (b) The graph of the
function $f(x) = a^x$ when $a > 1$. Of course, in both of these graphs we are only looking
at the shape of the graph for different values of $a$ without any regard to the scales on the
axes.
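Indeed, looking at these graphs we see that
• If $0 < a < 1$, $a^x \to 0$ as $x \to \infty$ and $a^x \to \infty$ as $x \to -\infty$.
• If $a > 1$, $a^x \to \infty$ as $x \to \infty$ and $a^x \to 0$ as $x \to -\infty$.
And, as $a^0 = 1$ for any positive $a \ne 1$, the graphs of these functions always go through
the point $(0, 1)$.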
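A quick numerical check of these statements is sketched below. The bases $1/2$ and $2$ and the sample points $x = -50, 0, 50$ are arbitrary illustrative choices, as is the function name `exp_values`.

```python
def exp_values(a, xs=(-50, 0, 50)):
    """Evaluate a**x far to the left, at zero, and far to the right."""
    return [a ** x for x in xs]

# Base between 0 and 1: huge on the left, tiny on the right.
small_base = exp_values(0.5)
# Base greater than 1: tiny on the left, huge on the right.
large_base = exp_values(2)

print(small_base)
print(large_base)
```

In both cases the middle value is 1, which reflects the fact that every such graph passes through $(0, 1)$.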
Trigonometric functions
The two elementary trigonometric functions that we will be using are the sine and
cosine functions but, unlike what you may have seen before, we will always be using
them for angles that are given in radians instead of degrees. As you may know, we can
easily convert between these two units by using the formula
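\[ \text{angle in radians} = \frac{2\pi}{360} \times \text{angle in degrees}, \]
and, for an angle $\theta$ with $0 \le \theta \le \pi/2$, we can define these functions using the
right-angled triangle in Figure 2.4, i.e.
\[ \sin\theta = \frac{\text{opposite}}{\text{hypotenuse}} \quad\text{and}\quad \cos\theta = \frac{\text{adjacent}}{\text{hypotenuse}}. \]
Figure 2.4: Defining the sine and cosine functions, $\sin\theta$ and $\cos\theta$, for $0 \le \theta \le \pi/2$.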
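In particular, by considering the two special triangles in Figure 2.5, we can see that the
values of these functions for some common angles (in radians) are
\[
\begin{array}{c|cc}
\theta & \sin\theta & \cos\theta \\ \hline
\pi/6 & 1/2 & \sqrt{3}/2 \\
\pi/4 & 1/\sqrt{2} & 1/\sqrt{2} \\
\pi/3 & \sqrt{3}/2 & 1/2
\end{array}
\]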
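The conversion formula and the tabulated values can be checked numerically. This is only a sketch: the names `to_radians`, `table` and `table_matches` are hypothetical, and the comparison uses `math.isclose` because floating-point values of sine and cosine are approximate.

```python
import math

def to_radians(degrees):
    """Convert degrees to radians using the factor 2*pi/360."""
    return 2 * math.pi / 360 * degrees

# angle -> (sine, cosine), copied from the table of common angles
table = {
    math.pi / 6: (1 / 2, math.sqrt(3) / 2),
    math.pi / 4: (1 / math.sqrt(2), 1 / math.sqrt(2)),
    math.pi / 3: (math.sqrt(3) / 2, 1 / 2),
}

def table_matches():
    """Compare each tabulated value with the library sine and cosine."""
    return all(
        math.isclose(math.sin(t), s) and math.isclose(math.cos(t), c)
        for t, (s, c) in table.items()
    )

print(to_radians(180), math.pi)
print(table_matches())
```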
Activity 2.1 Recall that we also have the tangent function which, for $0 \le \theta \le \pi/2$,
can be defined by using the right-angled triangle in Figure 2.4 to get
\[ \tan\theta = \frac{\text{opposite}}{\text{adjacent}}. \]
Use the triangles in Figure 2.5 to find the values of $\tan\theta$ when $\theta$ is $\pi/6$, $\pi/4$ and $\pi/3$
radians. Incidentally, what are these three angles in degrees?
Figure 2.5: Finding $\sin\theta$ and $\cos\theta$ when (a) $\theta = \pi/4$ radians and (b) when $\theta = \pi/6$ or
$\theta = \pi/3$ radians.
At this point, we'll stop saying that an angle is in radians as, unless explicitly stated
otherwise, this will always be the case.
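We can extend these definitions to angles in the range $0 \le \theta \le 2\pi$ by considering the
point $(x, y)$ on a unit circle, centred on the origin, whose radius makes an angle $\theta$ with
the positive $x$-axis. If $0 \le \theta \le \pi/2$, we get the situation illustrated in Figure 2.6(a)
where, as the hypotenuse of the triangle has length one, we have
\[ x = \cos\theta \quad\text{and}\quad y = \sin\theta, \]
which can be found as before. But, if we now have $\pi/2 \le \theta \le 2\pi$, we get the situation
illustrated in Figure 2.6(b), where we can find the magnitude of $x$ and $y$ using our
original triangle and their sign by considering where the point lies in the $(x, y)$-plane.
For instance, in Figure 2.6(b), the angle could be $5\pi/4$ and so the angle in the triangle
(i.e. the angle made with the $x$-axis) is $\pi/4$. This gives $x$ and $y$ a magnitude of $1/\sqrt{2}$ and
their signs would be negative as $x, y < 0$ so we see that
\[ \sin\frac{5\pi}{4} = -\frac{1}{\sqrt{2}} \quad\text{and}\quad \cos\frac{5\pi}{4} = -\frac{1}{\sqrt{2}}. \]
Figure 2.6: Finding $\sin\theta$ and $\cos\theta$ when $0 \le \theta \le 2\pi$ by considering a unit circle.
Going once around the unit circle returns us to the same point, so these functions can be
extended to all $\theta \in \mathbb{R}$ by taking
\[ \sin(\theta + 2\pi) = \sin\theta \quad\text{and}\quad \cos(\theta + 2\pi) = \cos\theta, \]
and their graphs are illustrated in Figure 2.7. In particular, we observe that
$\cos\theta = \sin(\theta + \frac{\pi}{2})$, i.e. the graph of the cosine function is what we get when we shift the
sine function to the left by $\pi/2$.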
Figure 2.7: The graphs of the sine and cosine functions, $\sin\theta$ (solid line) and $\cos\theta$ (dashed
line).
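The unit-circle values, the repetition every $2\pi$ and the shift relating the two graphs can all be checked numerically. The sample angles and the helper name `shifted_sine` below are illustrative assumptions, and small tolerances are used because floating-point arithmetic is not exact.

```python
import math

# Point on the unit circle at angle 5*pi/4 (third quadrant, so x, y < 0).
theta = 5 * math.pi / 4
x, y = math.cos(theta), math.sin(theta)
print(x, y, -1 / math.sqrt(2))

def shifted_sine(t):
    """The sine function shifted to the left by pi/2."""
    return math.sin(t + math.pi / 2)

samples = [-4.0, -1.0, 0.0, 0.3, 1.0, 2.5]
# Shifting sine left by pi/2 reproduces cosine at each sample angle.
shift_ok = all(math.isclose(math.cos(t), shifted_sine(t), abs_tol=1e-12) for t in samples)
# Adding 2*pi to the angle leaves sine and cosine unchanged.
period_ok = all(
    math.isclose(math.sin(t + 2 * math.pi), math.sin(t), abs_tol=1e-12)
    and math.isclose(math.cos(t + 2 * math.pi), math.cos(t), abs_tol=1e-12)
    for t in samples
)
print(shift_ok, period_ok)
```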
2.1.2
Combinations of functions
The elementary functions we have seen can be combined in various ways to make more
complicated functions. Generally, this is straightforward and works in the way you
would expect, but sometimes there are slight complications and so we revise these
different types of combination here.
Linear combinations of functions
If we have two functions with the same domain and co-domain, say $f : A \to B$ and
$g : A \to B$, we can define a new function which is a linear combination of these two
functions. For instance, if $k$ and $l$ are constants, we would have the new function
$kf + lg : A \to B$ defined by
\[ (kf + lg)(x) = kf(x) + lg(x), \]
for all $x \in A$. In particular, this gives us polynomials, i.e. functions $p_n : \mathbb{R} \to \mathbb{R}$ which
are a linear combination of power functions of the form
\[ p_n(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0, \]
where the $a_i$ for $0 \le i \le n$ are real constants. Indeed, if $a_n \ne 0$, we say that this is a
polynomial of degree $n$.
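The definitions of a linear combination and of a polynomial translate directly into code. The sketch below is one possible rendering; the names `linear_combination`, `polynomial` and `degree`, and the convention of listing coefficients from $a_0$ upwards, are choices made for this example.

```python
def linear_combination(k, f, l, g):
    """Return the function kf + lg, where (kf + lg)(x) = k*f(x) + l*g(x)."""
    return lambda x: k * f(x) + l * g(x)

def polynomial(coeffs):
    """Build p(x) = a_0 + a_1 x + ... + a_n x^n from coeffs = [a_0, ..., a_n]."""
    def p(x):
        return sum(a * x ** i for i, a in enumerate(coeffs))
    return p

def degree(coeffs):
    """Largest i with a_i != 0, or None if every coefficient is zero."""
    nonzero = [i for i, a in enumerate(coeffs) if a != 0]
    return max(nonzero) if nonzero else None

# 2x^2 + 3x as a linear combination of the power functions x^2 and x.
h = linear_combination(2, lambda x: x ** 2, 3, lambda x: x)
# x^2 + x - 2, listed as [a_0, a_1, a_2].
p = polynomial([-2, 1, 1])

print(h(2), p(1), p(-2), degree([-2, 1, 1]))
```

Trailing zero coefficients do not raise the degree, which is why the degree is read off from the last non-zero coefficient.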
Of course, you have seen polynomials before as, in Chapter 1 of 173 Algebra, you saw
how to solve polynomial equations of the form $p_n(x) = 0$ where $n = 1$ (a linear
equation), $n = 2$ (a quadratic equation) and $n = 3$ (a cubic equation). The information
we get from solving these equations is useful when we come to draw the graphs of
polynomial functions as the next example shows.
When we draw graphs, we will often do this by sketching them. Indeed, for a sketch
of the simple functions given here, it suffices to indicate their shape (they are both
straight lines) and where they are relative to the $x$ and $y$-axes (by indicating where
they intersect these axes). So, as we saw in Section 2.1.1, we should expect the graph
of $g(x)$ to be a horizontal line that goes through the point $(0, 5)$ as $g(x) = 5$ for all
$x \in \mathbb{R}$ whereas, for $f(x)$, we would expect a straight line that has an
• $x$-intercept that occurs when $f(x) = 0$, i.e. when $x = -2$, and a
• $y$-intercept that occurs when $x = 0$, i.e. when $f(0) = 2$.
This information allows us to obtain the sketch illustrated in Figure 2.8.
To find the point(s) at which these two graphs intersect, we are looking for the
value(s) of x that make f (x) = g(x), i.e. where 5 = x + 2. This gives x = 3 and we
know that the values of the functions here must satisfy f (3) = g(3) = 5 which gives
(3, 5) as the required point of intersection.1
Figure 2.8: The graphs of the functions $f(x) = x + 2$ and $g(x) = 5$. Notice that these
graphs intersect at the point $(3, 5)$.
[1] Of course, thinking about the graphs of these functions as the points, $(x, y)$, satisfying the equations
$y = f(x)$ and $y = g(x)$, all we have done here is solve the equations $y = 5$ and $y = x + 2$ simultaneously.
$$f(x) = \frac{x + 1}{x - 1} \quad\text{and}\quad g(x) = \frac{x^2 + x - 2}{x - 1},$$
at the point $x = 1$.
For $f(x)$, the polynomials in the numerator and denominator of the quotient are
defined for all $x \in \mathbb{R}$, but $f$ itself is not defined at $x = 1$ because that would entail
division by zero. As such, $f$ must be a function from $\{x \in \mathbb{R} \mid x \neq 1\}$ to $\mathbb{R}$. Indeed, if
we are considering values of $x$ close to one, i.e. $x \simeq 1$, we could say that
$$f(x) \simeq \frac{1 + 1}{x - 1} = \frac{2}{x - 1},$$
As such, we see that f (x) has a vertical asymptote at the point x = 1 where it is
undefined. The graph of this function is illustrated in Figure 2.9(a) so that you can
see this asymptote and you will understand why its graph looks like this away from
the asymptote after you have covered the material in Section 2.2.4.
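For $g(x)$, the polynomials in the numerator and the denominator of the quotient are
defined for all $x \in \mathbb{R}$, but $g$ itself is not defined at $x = 1$ because, again, that would
entail division by zero. However, for $x \neq 1$, we have
$$g(x) = \frac{x^2 + x - 2}{x - 1} = \frac{(x + 2)(x - 1)}{x - 1},$$
which is equal to $x + 2$.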
Figure 2.9: The graphs of the functions $f(x)$ and $g(x)$ from Example 2.2. In (a), the
vertical asymptote at $x = 1$ is indicated by a dashed line. In (b), the point where the
function is undefined is indicated by an open circle.
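We can also form quotients using trigonometric functions and, in particular, we can use
the triangle in Figure 2.4 to see that
$$\tan\theta = \frac{\text{opposite}}{\text{adjacent}} = \frac{\text{opposite}/\text{hypotenuse}}{\text{adjacent}/\text{hypotenuse}} = \frac{\sin\theta}{\cos\theta},$$
and so we have the tangent function
$$\tan\theta = \frac{\sin\theta}{\cos\theta}, \qquad (2.1)$$
which will be defined for $\theta \in \mathbb{R}$ as long as $\cos\theta \neq 0$, i.e. as long as $\theta \neq (2n + 1)\frac{\pi}{2}$ for
$n \in \mathbb{Z}$. At the points where it is undefined this function has vertical asymptotes and its
graph is sketched in Figure 2.10.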
Figure 2.10: The graph of the tangent function, tan for 4. Note the vertical
We can also find the reciprocals of our three trigonometric functions and these are
defined as follows.
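The secant function, $\sec\theta = \dfrac{1}{\cos\theta}$, which is defined when $\theta \neq (2n + 1)\frac{\pi}{2}$ for $n \in \mathbb{Z}$.
The cosecant function, $\operatorname{cosec}\theta = \dfrac{1}{\sin\theta}$, which is defined when $\theta \neq n\pi$ for $n \in \mathbb{Z}$.
The cotangent function, $\cot\theta = \dfrac{1}{\tan\theta}$, which is defined when $\theta \neq n\pi$ for $n \in \mathbb{Z}$.
Activity 2.4 Show that $\cot\theta = \dfrac{\cos\theta}{\sin\theta}$ as long as $\theta \neq n\pi$ for $n \in \mathbb{Z}$.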
Compositions of functions
If we have two functions, say $f : A \to B$ and $g : B \to C$, then we can define the
composition $g \circ f : A \to C$ to be the function
$$(g \circ f)(x) = g(f(x)),$$
and here we say that we are applying $g$ after $f$. That is, thinking of this in terms of
black boxes we have
$$x \in A \longrightarrow \boxed{f} \longrightarrow f(x) \in B \longrightarrow \boxed{g} \longrightarrow g(f(x)) \in C,$$
i.e. we take an $x \in A$ and apply $f$ to get the output $f(x) \in B$ which we then use as the
input for $g$ yielding the final output $g(f(x)) \in C$ which is the value of $(g \circ f)(x)$.
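The black-box picture above translates directly into code. The sketch below is only an illustration: the helper `compose` and the two sample functions are assumptions chosen for the example and are not taken from this guide. It shows that applying $g$ after $f$ generally differs from applying $f$ after $g$.

```python
from typing import Callable

Func = Callable[[float], float]


def compose(g: Func, f: Func) -> Func:
    """Return g after f, i.e. the function x -> g(f(x))."""
    return lambda x: g(f(x))


# Sample functions chosen purely for illustration.
def f(x: float) -> float:
    return x + 2


def g(x: float) -> float:
    return x ** 3


g_after_f = compose(g, f)   # x -> (x + 2)^3
f_after_g = compose(f, g)   # x -> x^3 + 2

print(g_after_f(1.0), f_after_g(1.0))
```

Here $(g \circ f)(1) = 3^3 = 27$ whereas $(f \circ g)(1) = 1^3 + 2 = 3$, so the order of composition matters.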
Indeed, observe that as $(2x - 1)^2 = 4x^2 - 4x + 1$, these are certainly not the same
function.
Activity 2.5 Let $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$ be the functions $f(x) = x^2 + 1$ and
$g(x) = 2^x$. What are the functions $g \circ f$ and $f \circ g$?
In particular, we will also need to be able to identify compositions the other way when
we cover the chain rule in Section 3.2.2. For instance, it should be clear that the
function $(x^2 + 5)^3$ is the composition of the function $x^3$ after the function $x^2 + 5$.
Activity 2.6 Explain why the function $(x^2 + 5)^3$ is the composition of the function
$x^3$ after the function $x^2 + 5$.
2.1.3 Inverse functions
If $A$ and $B$ are sets and we have a function $f : A \to B$, we know that this means that
for every $x \in A$ there is a unique $y \in B$ such that $y = f(x)$. Now, if we can define
another function $g : B \to A$, i.e. for every $y \in B$ there is a unique $x \in A$ such that
$$y = f(x) \text{ if and only if } x = g(y),$$
then we call the function, $g$, the inverse of $f$ and denote it by $f^{-1}$. In terms of black
boxes, this means that we have
$$x \in A \longrightarrow \boxed{f} \longrightarrow f(x) \in B,$$
for $f$ and, if it exists, we have
$$y \in B \longrightarrow \boxed{f^{-1}} \longrightarrow f^{-1}(y) \in A,$$
for $f^{-1}$, or more usefully,
$$f(x) \in B \longrightarrow \boxed{f^{-1}} \longrightarrow x \in A.$$
In particular, this means that if the inverse, $f^{-1}$, of $f$ exists, we see that the
composition $f^{-1}$ after $f$ gives us
$$x \in A \longrightarrow \boxed{f} \longrightarrow f(x) \in B \longrightarrow \boxed{f^{-1}} \longrightarrow x \in A,$$
and so $(f^{-1} \circ f)(x) = f^{-1}(f(x)) = x$ whereas the composition $f$ after $f^{-1}$ gives us
$$y \in B \longrightarrow \boxed{f^{-1}} \longrightarrow f^{-1}(y) \in A \longrightarrow \boxed{f} \longrightarrow y \in B,$$
and so $(f \circ f^{-1})(y) = f(f^{-1}(y)) = y$. That is, the inverse of a function (if it exists)
undoes what the function does and vice versa.
The question, then, is how can we tell whether an inverse function exists? And, if it
does exist, how can we find it? Well, given the function $f : A \to B$, the inverse will exist
if we are able to take $y = f(x)$ and solve it to obtain a unique solution, $x$, in terms of $y$
for every $y \in B$. And, if we can do this, these solutions will tell us what the inverse
function is, i.e. they will allow us to identify the function, $f^{-1}(y)$, by comparison with
$x = f^{-1}(y)$. To make this clear, let's look at an example.
Example 2.4 Consider the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x + 2$. Explain why
this function has an inverse and find it.
Using the graph or common sense, we see that the function $f(x) = x + 2$ has an
inverse, since every $y \in \mathbb{R}$ where $y = f(x)$ gives rise to a unique $x \in \mathbb{R}$ given by
$x = y - 2$. As such, we can conclude that the inverse of this function exists and we
have $x = f^{-1}(y) = y - 2$. Of course, we can now write this inverse as $f^{-1}(x) = x - 2$
if we want it in terms of $x$.
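As a quick numerical check of Example 2.4, the following Python sketch confirms on a few sample inputs that $f^{-1}$ undoes $f$ and vice versa. The function names `f` and `f_inv` and the sample values are choices made for this illustration only.

```python
def f(x: float) -> float:
    """The function from Example 2.4."""
    return x + 2


def f_inv(y: float) -> float:
    """The inverse found in Example 2.4."""
    return y - 2


samples = [-3.0, 0.0, 1.5, 10.0]

# f_inv after f, and f after f_inv, should both return the original input.
round_trips = [(f_inv(f(x)), f(f_inv(x))) for x in samples]
print(round_trips)

# A point (x, y) on y = f(x) corresponds to the point (y, x) on y = f_inv(x).
points_on_f = [(x, f(x)) for x in samples]
swapped = [(y, x) for (x, y) in points_on_f]
print(all(f_inv(a) == b for (a, b) in swapped))
```

The last line reflects the fact, discussed next, that the graph of the inverse is obtained by swapping the roles of $x$ and $y$.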
Indeed, notice that, if we have the function $f(x)$ and its inverse function $f^{-1}(x)$, the
graph of $f^{-1}$ is the reflection of the graph of $f$ about the line $y = x$. This happens
because any point $(x, y)$ on the curve $y = f(x)$ becomes, under a reflection about the
line $y = x$, a point $(y, x)$ on the curve $x = f(y)$ which is the same as saying that
$y = f^{-1}(x)$!
Activity 2.7 Verify that the curve $y = f^{-1}(x)$ is the reflection about the line $y = x$
of the curve $y = f(x)$ using the function we saw in Example 2.4.
Of course, not every function has an inverse as the next example shows.
Example 2.5 Consider the function $f : \mathbb{R} \to \mathbb{R}$ given by $f(x) = x^2$. Explain why
this function does not have an inverse.
If we take any $y \in \mathbb{R}$ where $y = f(x)$ this gives us the equation $y = x^2$ and, if we are
considering $x \in \mathbb{R}$, this gives rise to a problem as far as the inverse of $f$ is concerned
because:
If $y > 0$, we get two solutions for $x$, namely $x = \sqrt{y}$ and $x = -\sqrt{y}$, rather than a unique one.
If $y < 0$, we get no solution for $x$ as we know that $x \in \mathbb{R}$ means that $y = x^2 \ge 0$.
That is, we can find no inverse in this case since we cannot guarantee a unique
solution for $x \in \mathbb{R}$ from the equation $y = x^2$ for all $y \in \mathbb{R}$.
Of course, we can usually get around such problems if we are prepared to restrict the
domain and the co-domain of the function. But, in that case, we would be finding the
appropriate local inverses as opposed to its inverse (which, remember, doesn't exist!).
Activity 2.8 By considering the domains $(-\infty, 0]$ and $[0, \infty)$ and suitably
restricting the co-domain of the function in Example 2.5, find its local inverses.
Let's now look at the inverses of the elementary functions we considered in Section 2.1.1.
Power functions: root functions
If we have the power function $f(x) = x^n$ where $n \in \mathbb{N}$ and $f : [0, \infty) \to [0, \infty)$ we can
see that the inverse is given by
$$x = f^{-1}(y) = y^{1/n},$$
and this is called a root function. Thus, we have
$$x = y^{1/n} \text{ if and only if } y = x^n,$$
and, for instance, when $n = 2$ we have $y^{1/2} = \sqrt{y}$.
Activity 2.9 Draw the graph of the power function $f : [0, \infty) \to [0, \infty)$ where
$f(x) = x^2$ and its inverse.
This also works for $f(x) = x^n$ where $f : \mathbb{R} \to \mathbb{R}$ if $n$ is odd. But, if $n$ is even, the
function $f(x) = x^n$ where $f : \mathbb{R} \to \mathbb{R}$ does not have an inverse as we saw, for $n = 2$, in
Example 2.5.
Activity 2.10 Explain why we can find an inverse of the function $f : \mathbb{R} \to \mathbb{R}$ where
$f(x) = x^n$ if $n$ is odd. Why doesn't this work if $n$ is even?
Exponential functions: logarithmic functions
If we have the exponential function $f(x) = a^x$ where $f : \mathbb{R} \to (0, \infty)$ and $a \neq 1$ is a
positive real number, the inverse is the function $f^{-1} : (0, \infty) \to \mathbb{R}$ given by
$$x = f^{-1}(y) = \log_a y,$$
which is the logarithm to base $a$. Thus, we have
$$x = \log_a y \text{ if and only if } y = a^x,$$
provided that $y > 0$.
Activity 2.11 Draw the graph of the exponential function $f : \mathbb{R} \to (0, \infty)$ where
$f(x) = 2^x$ and its inverse, $f^{-1}(x) = \log_2 x$ where $f^{-1} : (0, \infty) \to \mathbb{R}$.
In particular, we see from this that as
$$(f \circ f^{-1})(x) = f(f^{-1}(x)) = x \quad\text{we have}\quad a^{\log_a x} = x,$$
and as
$$(f^{-1} \circ f)(x) = f^{-1}(f(x)) = x \quad\text{we have}\quad \log_a a^x = x.$$
These results will be useful in Section 2.1.4 when we consider the laws of logarithms.
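The two identities above can be checked numerically. This is only a sketch: the base $a = 2$ and the sample values are arbitrary choices, and `math.isclose` is used because floating-point arithmetic is not exact.

```python
import math

a = 2.0  # an arbitrary base with a > 0 and a != 1

# a^(log_a x) = x needs x > 0, since log_a is only defined on (0, infinity).
positive_samples = [0.5, 1.0, 8.0]
exp_after_log = [a ** math.log(x, a) for x in positive_samples]

# log_a(a^x) = x holds for every real x.
real_samples = [-2.0, 0.0, 3.5]
log_after_exp = [math.log(a ** x, a) for x in real_samples]

print(exp_after_log)
print(log_after_exp)
```

Up to rounding error, each list reproduces the inputs it was built from.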
Trigonometric functions: inverse trigonometric functions
If we want to discuss the inverses of the trigonometric functions sine and cosine, it is
first necessary to restrict their domain due to their oscillatory nature. To do this, we
consider a certain interval of values of $\theta$, called the principal range, so that each value of
the function corresponds to a unique value of $\theta$. Indeed, for the:
sine function, we take the principal range to be the interval $[-\frac{\pi}{2}, \frac{\pi}{2}]$ so that the
function $\sin : [-\frac{\pi}{2}, \frac{\pi}{2}] \to [-1, 1]$ where $y = \sin\theta$ has an inverse. This inverse is
denoted by $\sin^{-1}$ (or arcsin) where $\sin^{-1} : [-1, 1] \to [-\frac{\pi}{2}, \frac{\pi}{2}]$. Thus, we have
$$\theta = \sin^{-1} y \text{ if and only if } y = \sin\theta,$$
provided that $-\frac{\pi}{2} \le \theta \le \frac{\pi}{2}$ and $-1 \le y \le 1$.
cosine function, we take the principal range to be the interval $[0, \pi]$ so that the
function $\cos : [0, \pi] \to [-1, 1]$ where $y = \cos\theta$ has an inverse. This inverse is
denoted by $\cos^{-1}$ (or arccos) where $\cos^{-1} : [-1, 1] \to [0, \pi]$. Thus, we have
$$\theta = \cos^{-1} y \text{ if and only if } y = \cos\theta,$$
provided that $0 \le \theta \le \pi$ and $-1 \le y \le 1$.
It will also be convenient for us to consider the inverse of the tangent function where, as
well as the oscillations, we need to take care to avoid the asymptotes that occur when
this function is undefined. As such, for the
tangent function, we take the principal range to be the interval $(-\frac{\pi}{2}, \frac{\pi}{2})$ so that the
function $\tan : (-\frac{\pi}{2}, \frac{\pi}{2}) \to \mathbb{R}$ where $y = \tan\theta$ has an inverse. This inverse is denoted
by $\tan^{-1}$ (or arctan) where $\tan^{-1} : \mathbb{R} \to (-\frac{\pi}{2}, \frac{\pi}{2})$. Thus, we have
$$\theta = \tan^{-1} y \text{ if and only if } y = \tan\theta,$$
provided that $-\frac{\pi}{2} < \theta < \frac{\pi}{2}$ and $y \in \mathbb{R}$.
In particular, observe that $\sin^{-1}$, $\cos^{-1}$ and $\tan^{-1}$ are the inverses of the functions sin,
cos and tan respectively and not their reciprocals which we denoted by cosec, sec and
cot respectively in Section 2.1.2!
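Python's standard `math` module provides `asin`, `acos` and `atan`, which return values in the principal ranges described above. The short sketch below, with an arbitrarily chosen angle, illustrates the effect of the principal range and the difference between an inverse and a reciprocal.

```python
import math

theta = 2 * math.pi / 3          # an angle outside [-pi/2, pi/2]

# asin returns the angle in [-pi/2, pi/2] that has the same sine as theta.
principal = math.asin(math.sin(theta))
print(principal, math.pi / 3)

# acos uses the principal range [0, pi], which already contains theta.
recovered = math.acos(math.cos(theta))
print(recovered, theta)

# atan maps every real number into (-pi/2, pi/2).
print(math.atan(1.0), math.pi / 4)

# The inverse is not the reciprocal: sin^{-1}(0.5) is not 1/sin(0.5).
inverse_value = math.asin(0.5)
reciprocal_value = 1 / math.sin(0.5)
print(inverse_value, reciprocal_value)
```

Note that `asin(sin(theta))` does not return `theta` itself here, because $2\pi/3$ lies outside the principal range of the sine function.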
Activity 2.12 Find the acute angles $\theta_1$, $\theta_2$ and $\theta_3$ where $\theta_1 = \sin^{-1}\frac{1}{2}$, $\theta_2 = \cos^{-1}\frac{1}{2}$
and $\theta_3 = \tan^{-1} 1$.
2.1.4 Identities
An expression such as
$$(x + 1)^2 = x^2 + 2x + 1,$$
which is true for all $x$ is called an identity and, as you know, these are useful when we
need to simplify expressions. In particular, in Chapter 1 of 173 Algebra, you saw that
the power laws dictate that
$$a^m a^n = a^{m+n}, \qquad \frac{a^m}{a^n} = a^{m-n} \qquad\text{and}\qquad (a^m)^n = a^{mn},$$
and these are identities that work for any values of $a$, $m$ and $n$ for which both sides are
defined. Indeed, these laws allow us to simplify expressions that may result from
appropriate products, quotients and compositions of power functions or exponential
functions.
Activity 2.13 If $f(x) = x^3$, $g(x) = x^4$ and $h(x) = 2^x$, find the functions $(fg)(x)$,
$(f/g)(x)$ and $(g \circ h)(x)$ simplifying your answers as far as possible.
We now look at some other identities that will be useful in this course.
The laws of logarithms
For any positive real number $a \neq 1$, the laws of logarithms state that
$$\log_a x + \log_a y = \log_a(xy), \qquad \log_a x - \log_a y = \log_a\!\left(\frac{x}{y}\right) \qquad\text{and}\qquad y \log_a x = \log_a(x^y),$$
provided that all of the terms involved are defined. As you may know, these laws are
easily derived from the power laws we saw above and the fact that
$$a^{\log_a x} = x,$$
which we saw earlier in Section 2.1.3.
Activity 2.14
It is also useful to note that if $a, b \neq 1$ are positive real numbers, then we have the
change of base formula which states that
$$\log_a x = \frac{\log_b x}{\log_b a}.$$
Activity 2.15
Trigonometric identities
There are also identities that allow us to simplify various expressions involving the
trigonometric functions. For instance, using the triangle in Figure 2.4, Pythagoras'
theorem allows us to see that
$$\sin^2\theta + \cos^2\theta = \left(\frac{\text{opposite}}{\text{hypotenuse}}\right)^2 + \left(\frac{\text{adjacent}}{\text{hypotenuse}}\right)^2 = \frac{\text{opposite}^2 + \text{adjacent}^2}{\text{hypotenuse}^2} = \frac{\text{hypotenuse}^2}{\text{hypotenuse}^2} = 1,$$
that is,
$$\sin^2\theta + \cos^2\theta = 1. \qquad (2.2)$$
In particular, for natural numbers $n \ge 2$, note that we commonly abbreviate things like
$(\sin\theta)^n$ by writing them as $\sin^n\theta$.[2] Further, dividing both sides of this expression by
$\sin^2\theta$ we get
$$1 + \cot^2\theta = \operatorname{cosec}^2\theta, \qquad (2.3)$$
and this works as long as $\theta \neq n\pi$ for $n \in \mathbb{Z}$ whereas dividing both sides of this
expression by $\cos^2\theta$ we get
$$\tan^2\theta + 1 = \sec^2\theta, \qquad (2.4)$$
and this works as long as $\theta \neq (2n + 1)\frac{\pi}{2}$ for $n \in \mathbb{Z}$. We call these three identities the
Pythagorean identities as they are simple consequences of Pythagoras' theorem.
Activity 2.16
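[2] Of course, if we consider how we extend the definitions of the sine and cosine functions to all $\theta \in \mathbb{R}$,
it should be clear that this identity is actually true for all $\theta \in \mathbb{R}$.
Another useful pair of trigonometric identities are the compound-angle formulae given
by
$$\sin(\theta + \phi) = \sin\theta\cos\phi + \cos\theta\sin\phi \quad\text{and}\quad \cos(\theta + \phi) = \cos\theta\cos\phi - \sin\theta\sin\phi, \qquad (2.5)$$
for $\theta, \phi \in \mathbb{R}$. Indeed, they are especially useful since, setting $\phi = \theta$, we can use them to
obtain the double-angle formulae
$$\sin(2\theta) = 2\sin\theta\cos\theta \quad\text{and}\quad \cos(2\theta) = \cos^2\theta - \sin^2\theta, \qquad (2.6)$$
where the second of these can also be written as
$$\cos(2\theta) = 1 - 2\sin^2\theta \quad\text{and}\quad \cos(2\theta) = 2\cos^2\theta - 1,$$
for all $\theta \in \mathbb{R}$.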
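A numerical spot check is a useful way to catch sign errors when working with these identities. The sketch below evaluates both sides of the Pythagorean, compound-angle and double-angle identities at two arbitrarily chosen angles; the angle names `theta` and `phi` and the helper `check_identities` are assumptions made for this illustration, and a check at sample points is of course not a proof.

```python
import math


def check_identities(theta: float, phi: float) -> dict:
    """Return, for each identity, whether both sides agree numerically."""
    s, c, t = math.sin(theta), math.cos(theta), math.tan(theta)
    pairs = {
        "sin^2 + cos^2 = 1": (s ** 2 + c ** 2, 1.0),
        "1 + cot^2 = cosec^2": (1 + (c / s) ** 2, (1 / s) ** 2),
        "tan^2 + 1 = sec^2": (t ** 2 + 1, (1 / c) ** 2),
        "sin(theta + phi)": (
            math.sin(theta + phi),
            s * math.cos(phi) + c * math.sin(phi),
        ),
        "cos(theta + phi)": (
            math.cos(theta + phi),
            c * math.cos(phi) - s * math.sin(phi),
        ),
        "sin(2 theta)": (math.sin(2 * theta), 2 * s * c),
        "cos(2 theta)": (math.cos(2 * theta), 2 * c ** 2 - 1),
    }
    return {
        name: math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-12)
        for name, (lhs, rhs) in pairs.items()
    }


# Angles chosen so that sin, cos and tan are all defined and non-zero.
results = check_identities(0.7, 1.9)
print(results)
```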
2.1.5 Applications of functions
In economics and related subjects, functions can be used to represent how one quantity
depends on another. For instance, as the profit that a company makes, $\pi$, would depend
on the quantity of goods sold, $q$, it makes sense to suppose that there is some function
of $q$, say $f$, that tells us the corresponding profit, $\pi$. In this case, we would use an
equation of the form $\pi = f(q)$ to express this dependency and we would have found a
profit function. Moreover, if $f$ is invertible, we could find its inverse function, $f^{-1}$, and
we would use this to find the value of $q$ that corresponds to a given value of $\pi$. In which
case, the dependency would now be given by an equation of the form $q = f^{-1}(\pi)$. We
will look at profit functions properly in Section 4.5.3, but for now, we consider another
application of functions in economics, namely how they can be used to represent
information about supply and demand in a market.
Supply and demand functions
In any given market, there is a good which is supplied by the producers (and demanded
by the consumers) and the general idea is that, for both supply (and demand), if
producers are charging (or consumers are buying) at a price of p per-unit, then the level
of supply (or demand) for that good, q, will depend on p. Indeed, since each value of p
will lead the producers to supply (and the consumers to demand) exactly one quantity
q, it makes sense to think of the quantity, q, supplied (or demanded) as a function of
the price, p. This leads us to a description of the market in terms of two kinds of
function, namely:
If the quantity supplied, $q$, can be written in terms of $p$ then we can identify the
supply function, $q^S$, from the fact that we have $q = q^S(p)$. This tells us the
quantity, $q$, that the producers will supply if the prevailing market price is $p$.
If the quantity demanded, $q$, can be written in terms of $p$ then we can identify the
demand function, $q^D$, from the fact that we have $q = q^D(p)$. This tells us the
quantity, $q$, that the consumers will demand if the prevailing market price is $p$.
In particular, note that, although we have $q$ as a function of $p$ in both of these cases we
follow the practice common in economics and use the vertical axis for $p$ and the
horizontal axis for $q$ when drawing the graphs of these functions. As such, any point on
the graph of these functions is of the form $(q, p)$ where $q = q^S(p)$ for supply and
$q = q^D(p)$ for demand. Also, these functions and their graphs only make economic sense
when $p \ge 0$ and the quantities they yield, $q$, are also non-negative.[3]
Once we have these functions, we are often interested in the equilibrium point for
the market as this is the point where the supply and demand functions are equal. In
theory, this is the point, $(q^*, p^*)$, where the market stabilises since, at this point, the
per-unit price, $p^*$, is such that the levels of supply and demand are equal, i.e. we have
$$q^S(p^*) = q^D(p^*).$$
As such, we can find the equilibrium price, $p^*$, by solving the resulting equation and the
corresponding equilibrium quantity, $q^*$, can then be found by, say, using the demand
function as $q^* = q^D(p^*)$. Let's look at a simple example.
Example 2.6 Suppose that the supply and demand functions for a market are given by
$$q^S(p) = p + 1 \quad\text{and}\quad q^D(p) = 3 - p,$$
respectively. Sketch the graphs of these functions and find the equilibrium point.
Here the supply and demand functions are straight lines which can easily be
sketched using the method outlined in Example 2.1 and the results of doing this are
illustrated in Figure 2.11. To find the equilibrium price, $p^*$, we have
$$q^S(p^*) = q^D(p^*) \implies p^* + 1 = 3 - p^* \implies 2p^* = 2,$$
so that $p^* = 1$ and, using the demand function, $q^* = q^D(1) = 2$. Thus, the equilibrium
point is $(2, 1)$.
[3] Although, when drawing their graphs, it is often useful to consider all possible values of $p$ and $q$
before restricting your attention to the economically meaningful ones where $p, q \ge 0$!
Figure 2.11: A sketch of the graphs of the supply and demand functions in Example 2.6
indicating the equilibrium point for this market. (Note that this sketch only makes
economic sense when $p \ge 0$.)
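For straight-line supply and demand functions the equilibrium can be written in closed form. The sketch below assumes the parametrisation $q^S(p) = a + bp$ and $q^D(p) = c - dp$ with $b + d \neq 0$; this parametrisation and the function name `linear_equilibrium` are introduced here purely for illustration. With $a = 1$, $b = 1$, $c = 3$ and $d = 1$ it corresponds to the supply function $q^S(p) = p + 1$ and the demand function $q^D(p) = 3 - p$.

```python
def linear_equilibrium(a: float, b: float, c: float, d: float) -> tuple:
    """Equilibrium (q*, p*) for q^S(p) = a + b*p and q^D(p) = c - d*p.

    Setting a + b*p = c - d*p gives p* = (c - a) / (b + d).
    """
    if b + d == 0:
        raise ValueError("supply and demand lines are parallel")
    p_star = (c - a) / (b + d)
    q_star = c - d * p_star       # use the demand function to find q*
    return q_star, p_star


q_star, p_star = linear_equilibrium(a=1, b=1, c=3, d=1)
print(q_star, p_star)
```

Following the convention used in the text, the result is reported as the point $(q^*, p^*)$, with the quantity first.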
If the price, $p$, can be written in terms of $q$ then we can identify the inverse
demand function, $p^D$, from the fact that we have $p = p^D(q)$. This tells us the price,
$p$, that the consumers will pay if the quantity being demanded is $q$.
Activity 2.19 Decide whether the supply and demand functions in the example
above are invertible. If they are, find the inverse supply and demand functions.
The effects of taxation
Sometimes, in order to control a market, a government will impose an excise tax of $T$
per unit sold. We model such situations by assuming that the tax is paid to the
government by the supplier and so, if the price paid by the consumers in the presence of
this tax is $p$ per unit, the suppliers effectively receive $p - T$ for each unit sold as they
must pay $T$ of each $p$ received to the government. As such, the supply and demand
functions in the presence of the tax, let's call them $q_T^S(p)$ and $q_T^D(p)$ respectively, will be
given by
$$q_T^S(p) = q^S(p - T) \quad\text{and}\quad q_T^D(p) = q^D(p).$$
That is, the consumers still pay a price of $p$ per unit and so the demand function is
unchanged, but the suppliers now only receive an amount $p - T$ per unit and so the
supply function is modified by the introduction of an excise tax. Of course, the
introduction of an excise tax will affect the equilibrium price and quantity for a market,
i.e. in the presence of such a tax, the new equilibrium point, let's call it $(q_T, p_T)$, will be
the point where
$$q_T^S(p_T) = q_T^D(p_T)$$
or, equivalently,
$$q^S(p_T - T) = q^D(p_T),$$
and, using the unchanged demand function, $q_T = q_T^D(p_T)$ or, equivalently, $q_T = q^D(p_T)$.
Let's look at how such a tax would affect the market we considered in Example 2.6.
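Using $q^S(p) = p + 1$ and $q^D(p) = 3 - p$, the new equilibrium price, $p_T$, is given by
$$q^S(p_T - T) = q^D(p_T) \implies p_T - T + 1 = 3 - p_T \implies 2p_T = 2 + T \implies p_T = 1 + \frac{T}{2},$$
so that the new equilibrium quantity is
$$q_T = 3 - \left(1 + \frac{T}{2}\right) = 2 - \frac{T}{2},$$
if we use the demand function, $q_T^D(p)$.[4] Thus, the new equilibrium point is
$(2 - T/2, 1 + T/2)$. Sketching the graph of the new supply function, as in
Figure 2.12, we see that it is parallel to the old one and the p-intercept has increased
by $T$. Indeed, as the equilibrium price has increased from 1 to $1 + T/2$ due to the
presence of the tax, half the tax has been passed on to the consumer. Of course, the
equilibrium quantity in the presence of the tax must be positive and so, for the
market to function, we require that
$$q_T > 0 \implies 2 - \frac{T}{2} > 0 \implies T < 4.$$
[4] Of course, we could also use the new supply function, $q_T^S(p)$, to find $q_T$. However, we cannot use
$q^S(p) = p + 1$ as this no longer holds in the presence of the tax!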
Figure 2.12: Following on from the sketch in Figure 2.11, if an excise tax of $T$ per unit is
imposed, the supply set changes as shown and the demand set stays the same. Observe
how the introduction of this tax affects the equilibrium point for this market. (Note that
this sketch only makes economic sense when $p \ge 0$.)
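The effect of the excise tax on this market can also be explored numerically. The sketch below assumes the supply and demand functions $q^S(p) = p + 1$ and $q^D(p) = 3 - p$ and uses the closed-form price $p_T = 1 + T/2$; the function names are illustrative only.

```python
def supply(p: float) -> float:
    """q^S(p) = p + 1, the supply function without tax."""
    return p + 1


def demand(p: float) -> float:
    """q^D(p) = 3 - p, which is unchanged by the tax."""
    return 3 - p


def supply_with_tax(p: float, T: float) -> float:
    """q_T^S(p) = q^S(p - T): suppliers only receive p - T per unit."""
    return supply(p - T)


def equilibrium_with_tax(T: float) -> tuple:
    """Return (q_T, p_T) using p_T = 1 + T/2 and the demand function."""
    p_T = 1 + T / 2
    return demand(p_T), p_T


for T in [0.0, 1.0, 3.0]:
    q_T, p_T = equilibrium_with_tax(T)
    # At equilibrium the taxed supply matches demand, and q_T > 0 while T < 4.
    print(T, (q_T, p_T), supply_with_tax(p_T, T) == demand(p_T))
```

Setting $T = 0$ recovers the untaxed equilibrium $(2, 1)$, and the quantity falls to zero as $T$ approaches 4.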
$$q_r^S(p) = q^S(p - rp) \quad\text{and}\quad q_r^D(p) = q^D(p).$$
That is, once again, the consumers still pay a price of $p$ per unit and so the demand
function is unchanged, but the suppliers now only receive an amount $p - rp$ per unit
and so the supply function is modified by the introduction of a percentage of the price
tax. Of course, the introduction of this tax will also affect the equilibrium price and
quantity for the market, i.e. in the presence of such a tax, the new equilibrium point,
let's call it $(q_r, p_r)$, will be the point where
$$q_r^S(p_r) = q_r^D(p_r)$$
or, equivalently,
$$q^S(p_r - rp_r) = q^D(p_r),$$
and, using the unchanged demand function, $q_r = q_r^D(p_r)$ or, equivalently, $q_r = q^D(p_r)$.
See, for example, Exercise 2.3 at the end of this chapter.
2.2 Conic sections
So far, we have been dealing with functions that are explicitly defined in terms of an
independent variable but, sometimes, we may have an equation relating two variables,
say $x$ and $y$, which implicitly defines $y$ as one or more functions of $x$. As it will be useful
in various places, we now investigate some important instances of functions defined in
this way and their graphs, the so-called conic sections.[5]
[5] See, for example, Binmore and Davies (2002) Section 2.14 for a full discussion of the geometric
aspects of conic sections and where they come from. Although this is interesting, we will not be delving
into these overly geometric aspects of conic sections in this course.
2.2.1 Parabolae
$$y = ax^2 + bx + c,$$
where $a \neq 0$, $b$ and $c$ are constants. Indeed, if we complete the square, we can write this
in the form
$$y = a(x - p)^2 + q,$$
for some constants $p$ and $q$. This curve will have a y-intercept which we can find by
setting $x = 0$ and it may have x-intercepts which, if they exist, we can find by setting
$y = 0$. It will also have a turning point with coordinates $(p, q)$ which will be a minimum
if $a > 0$ and a maximum if $a < 0$. Once we have this information, the parabola should
be easy to draw as the next example shows.
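Example 2.8 Sketch the parabolae with equations
(a) $y = x^2 - 4x + 3$, and
(b) $y = -x^2 + 2x + 3$.
For (a), we are told that $y = x^2 - 4x + 3$ and so we find that:
For the y-intercept: Setting $x = 0$ we get $y = 3$.
For the x-intercepts: Setting $y = 0$ we get
$$x^2 - 4x + 3 = 0 \implies (x - 1)(x - 3) = 0,$$
so that $x = 1$ or $x = 3$.
For the turning point: Completing the square we get $y = (x - 2)^2 - 1$, which gives a
minimum at $(2, -1)$.
Putting this information together, we then get the sketch in Figure 2.13(a).
For (b), we are told that $y = -x^2 + 2x + 3$ and so we find that:
For the y-intercept: Setting $x = 0$ we get $y = 3$.
For the x-intercepts: Setting $y = 0$ we get
$$-x^2 + 2x + 3 = 0 \implies x^2 - 2x - 3 = 0 \implies (x + 1)(x - 3) = 0,$$
so that $x = -1$ or $x = 3$.
For the turning point: Completing the square we get $y = -(x - 1)^2 + 4$, which gives a
maximum at $(1, 4)$.
Putting this information together, we then get the sketch in Figure 2.13(b).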
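The information needed for such a sketch can be computed mechanically from the coefficients. The following Python sketch uses the standard completing-the-square results $p = -b/(2a)$ and $q = c - b^2/(4a)$; the function name `parabola_features` and the dictionary layout are assumptions made for this illustration.

```python
import math


def parabola_features(a: float, b: float, c: float) -> dict:
    """Key features of y = a x^2 + b x + c with a != 0."""
    if a == 0:
        raise ValueError("a must be non-zero for a parabola")
    # Completing the square: y = a (x - p)^2 + q.
    p = -b / (2 * a)
    q = c - b ** 2 / (4 * a)
    # Real x-intercepts exist only when the discriminant is non-negative.
    disc = b ** 2 - 4 * a * c
    if disc < 0:
        x_intercepts = []
    else:
        root = math.sqrt(disc)
        x_intercepts = sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})
    return {
        "y_intercept": c,
        "x_intercepts": x_intercepts,
        "turning_point": (p, q),
        "kind": "minimum" if a > 0 else "maximum",
    }


print(parabola_features(1, -4, 3))    # y = x^2 - 4x + 3
print(parabola_features(-1, 2, 3))    # y = -x^2 + 2x + 3
```

For the two parabolae above this gives turning points at $(2, -1)$ and $(1, 4)$ respectively.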
Figure 2.13: In (a) we have a sketch of the parabola from Example 2.8(a). In (b) we have
a sketch of the parabola from Example 2.8(b).
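2.2.2 Circles
$$x^2 - 6x + y^2 - 8y = 0,$$
is the equation of a circle and so, completing the square in $x$ and $y$, we find that
$$(x - 3)^2 - 9 + (y - 4)^2 - 16 = 0 \implies (x - 3)^2 + (y - 4)^2 = 25,$$
and so, comparing this with $(x - a)^2 + (y - b)^2 = r^2$ we see that we have a circle of
radius 5 centred on the point $(3, 4)$. We also find that:
For the x-intercepts: Setting $y = 0$ we get $(x - 3)^2 = 9 \implies x - 3 = \pm 3$, so that $x = 0$ or $x = 6$.
For the y-intercepts: Setting $x = 0$ we get $(y - 4)^2 = 16 \implies y - 4 = \pm 4$, so that $y = 0$ or $y = 8$.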
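Completing the square in both variables can be packaged as a small routine. The sketch below assumes the circle is written in the general form $x^2 + y^2 + Dx + Ey + F = 0$; the coefficient names $D$, $E$, $F$ and the function name are introduced here for illustration and are not notation from this guide.

```python
import math


def circle_from_general(D: float, E: float, F: float) -> tuple:
    """Centre (a, b) and radius r of x^2 + y^2 + D x + E y + F = 0.

    Completing the square gives (x + D/2)^2 + (y + E/2)^2 = D^2/4 + E^2/4 - F.
    """
    a, b = -D / 2, -E / 2
    r_squared = a ** 2 + b ** 2 - F
    if r_squared <= 0:
        raise ValueError("the equation does not describe a circle")
    return (a, b), math.sqrt(r_squared)


# x^2 - 6x + y^2 - 8y = 0 corresponds to D = -6, E = -8, F = 0.
centre, radius = circle_from_general(-6, -8, 0)
print(centre, radius)
```

This reproduces the centre $(3, 4)$ and radius 5 found above.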
Figure 2.14: In (a) we have a sketch of the circle from Example 2.9. In (b) we have a
sketch of the ellipse discussed in Section 2.2.3.
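2.2.3 Ellipses
$$\frac{x^2}{4} + \frac{y^2}{9} = 1$$
Setting $y = 0$ gives the x-intercepts,
$$x^2 = 4 \implies x = \pm 2,$$
and setting $x = 0$ gives the y-intercepts,
$$y^2 = 9 \implies y = \pm 3.$$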
Putting this information together, and bearing in mind that this should look like a
circle centred on the origin that has been squashed, we then get the sketch in
Figure 2.14(b).
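2.2.4 Hyperbolae
$$\frac{x^2}{4} - \frac{y^2}{9} = 1.$$
From the equation
$$\frac{x^2}{4} - \frac{y^2}{9} = 1,$$
we see that the x-intercepts, which occur when $y = 0$, are given by
$$\frac{x^2}{4} = 1 \implies x^2 = 4 \implies x = \pm 2,$$
whereas there are no y-intercepts since, setting $x = 0$, we get
$$-\frac{y^2}{9} = 1 \implies y^2 = -9,$$
which has no real solutions. To find the asymptotes, we write the equation as
$$\frac{y^2}{x^2} = 9\left(\frac{1}{4} - \frac{1}{x^2}\right),$$
and so, when $|x|$ is large, we have
$$\frac{y^2}{x^2} \simeq \frac{9}{4} \implies y \simeq \pm\frac{3}{2}x,$$
as the equations of the asymptotes. Putting this information together, we then get
the sketch in Figure 2.15(a).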
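The asymptotic behaviour can be seen numerically by following the upper right-hand branch of the hyperbola $x^2/4 - y^2/9 = 1$, on which $y = 3\sqrt{x^2/4 - 1}$ for $x \ge 2$. This is a rough illustration only; the function name `upper_branch` and the sample values of $x$ are arbitrary choices.

```python
import math


def upper_branch(x: float) -> float:
    """y >= 0 on the hyperbola x^2/4 - y^2/9 = 1, defined for |x| >= 2."""
    return 3 * math.sqrt(x ** 2 / 4 - 1)


# The ratio y/x should approach 3/2, the slope of the asymptote y = (3/2) x.
ratios = [upper_branch(x) / x for x in [10.0, 100.0, 1000.0]]
print(ratios)
```

The ratios increase towards $3/2$ without reaching it, which is what we expect of a curve approaching its asymptote from below.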
Figure 2.15: In (a) we have a sketch of the hyperbola from Example 2.11. In (b) we have
a sketch of the hyperbola $y = 1 + \frac{2}{x - 1}$.
$$\frac{y^2}{9} - \frac{x^2}{4} = 1.$$
$$y = 1 + \frac{2}{x - 1},$$
Putting this information together, we then get the sketch in Figure 2.15(b). In
particular, observe that here we have
$$y = 1 + \frac{2}{x - 1} = \frac{(x - 1) + 2}{x - 1} = \frac{x + 1}{x - 1},$$
and so this gives us $y = f(x)$ where $f(x)$ is the first function in Example 2.2 which
was illustrated in Figure 2.9(a).
Learning outcomes
At the end of this chapter and having completed the relevant reading and activities, you
should be able to:
identify elementary functions and sketch their graphs;
find combinations of elementary functions and inverses (if they exist);
use identities to rewrite expressions involving powers, logarithms and trigonometric
functions;
solve problems from economics-based subjects that involve functions;
identify and sketch conic sections.
Solutions to activities
Solution to activity 2.1
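Using the triangles in Figure 2.5 and the definition of the tangent function, it should be
clear that
$$\tan\frac{\pi}{6} = \frac{1}{\sqrt{3}}, \qquad \tan\frac{\pi}{4} = 1 \qquad\text{and}\qquad \tan\frac{\pi}{3} = \sqrt{3}.$$
Indeed, using the fact that
$$\text{angle in radians} = \frac{2\pi}{360} \times \text{angle in degrees},$$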
straight line (in this case the x-axis) is $\pi$) giving $x$ a magnitude of $1/2$ and $y$ a
magnitude of $\sqrt{3}/2$ whereas their signs would be negative for $x$ (as $x < 0$) and positive
for $y$ (as $y > 0$). Thus we see that
$$\sin\frac{2\pi}{3} = \frac{\sqrt{3}}{2} \quad\text{and}\quad \cos\frac{2\pi}{3} = -\frac{1}{2},$$
using the unit circle method.
Figure 2.16: For Activity 2.2, we find $\sin\theta$ and $\cos\theta$ when $\theta = 2\pi/3$ by considering a unit
circle.
Solution to activity 2.3
Using the unit circle in Figure 2.17(a), it should be clear that
$$\sin 0 = 0 \quad\text{and}\quad \cos 0 = 1,$$
whereas using the unit circle in Figure 2.17(b), it should be clear that
$$\sin\frac{\pi}{2} = 1 \quad\text{and}\quad \cos\frac{\pi}{2} = 0.$$
Then, using similar reasoning, we should be able to deduce that

    θ        sin θ    cos θ
    3π/2     −1       0
    2π       0        1

are the other values of the functions sin and cos that we seek.
Solution to activity 2.4
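From the definition of $\cot\theta$, we have
$$\cot\theta = \frac{1}{\tan\theta} = \frac{1}{\sin\theta/\cos\theta} = \frac{\cos\theta}{\sin\theta},$$
as we know that $\tan\theta = \sin\theta/\cos\theta$. This function is defined as long as $\theta \neq n\pi$ for $n \in \mathbb{Z}$
since, at these values of $\theta$, we have $\tan\theta = 0$ or, equivalently, $\sin\theta = 0$.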
Figure 2.17: For Activity 2.3, we find $\sin\theta$ and $\cos\theta$ by considering a unit circle when (a)
$\theta = 0$ and (b) $\theta = \pi/2$.
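$g \circ f$ is the function where
$$(g \circ f)(x) = g(f(x)) = g(x^2 + 1) = 2^{x^2 + 1},$$
where $(g \circ f) : \mathbb{R} \to \mathbb{R}$.
$f \circ g$ is the function where
$$(f \circ g)(x) = f(g(x)) = f(2^x) = (2^x)^2 + 1 = 2^{2x} + 1,$$
and $(f \circ g) : \mathbb{R} \to \mathbb{R}$.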
Figure 2.18: For Activity 2.7, we see that the graph of $f^{-1}(x) = x - 2$ is the reflection of
the graph of $f(x) = x + 2$ about the line $y = x$.
If we take the two domains given by the intervals $(-\infty, 0]$ and $[0, \infty)$ so that we
have $x \le 0$ and $x \ge 0$ respectively, then we remove the problem that occurs
because $y = x^2$ has two solutions for $x \in \mathbb{R}$.
For $x \in [0, \infty)$, we have
$$y = x^2 \implies x = \sqrt{y},$$
as $x \ge 0$ because $x \in [0, \infty)$. Thus, using $x = f^{-1}(y)$, the inverse of this function is
$f^{-1}(y) = \sqrt{y}$ or $f^{-1}(x) = \sqrt{x}$ if we want it in terms of $x$.
For $x \in (-\infty, 0]$, we have
$$y = x^2 \implies x = -\sqrt{y},$$
as $x \le 0$ because $x \in (-\infty, 0]$. Thus, using $x = f^{-1}(y)$, the inverse of this function
is $f^{-1}(y) = -\sqrt{y}$ or $f^{-1}(x) = -\sqrt{x}$ if we want it in terms of $x$.
In particular, this means that the local inverses of $f : \mathbb{R} \to [0, \infty)$ where $f(x) = x^2$ are
$\sqrt{x}$ and $-\sqrt{x}$.
$$y = x^n \implies x = y^{1/n} = \sqrt[n]{y},$$
gives us this unique solution for any $y \in \mathbb{R}$ provided that $n$ is odd and so we have
$f^{-1}(y) = \sqrt[n]{y}$ as the inverse function or, indeed, $f^{-1}(x) = \sqrt[n]{x}$ if we want it in terms of $x$.
Figure 2.19: For Activity 2.9, we see that the graph of $f^{-1}(x) = \sqrt{x}$ is the reflection of
the graph of $f(x) = x^2$ about the line $y = x$.
$x = \sqrt[n]{y} \notin \mathbb{R}$ when $n$ is even.
As such, we cannot find a unique solution, $x$, for all $y \in \mathbb{R}$ and so the inverse of this
function cannot exist.
Solution to activity 2.11
We saw the graph of a function like $f : \mathbb{R} \to (0, \infty)$ where $f(x) = 2^x$ in Figure 2.3(b)
since we have $a = 2 > 1$ here. As such, we find that the graphs of the function
$f : \mathbb{R} \to (0, \infty)$ where $f(x) = 2^x$ and its inverse, $f^{-1}(x) = \log_2 x$ where
$f^{-1} : (0, \infty) \to \mathbb{R}$, are as illustrated in Figure 2.20. In particular, observe that the curve
$y = \log_2 x$ is the reflection about the line $y = x$ of the curve $y = 2^x$.
Figure 2.20: For Activity 2.11, we see that the graph of $f^{-1}(x) = \log_2 x$ is the reflection
of the graph of $f(x) = 2^x$ about the line $y = x$.
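$$\sin\theta_1 = \frac{1}{2} = \sin\frac{\pi}{6} \quad\text{gives us}\quad \theta_1 = \sin^{-1}\frac{1}{2} = \frac{\pi}{6},$$
and
$$\cos\theta_2 = \frac{1}{2} = \cos\frac{\pi}{3} \quad\text{gives us}\quad \theta_2 = \cos^{-1}\frac{1}{2} = \frac{\pi}{3},$$
whereas to find the acute angle $\theta_3$ where $\theta_3 = \tan^{-1} 1$, we use the table we found in
Activity 2.1 to see that
$$\tan\theta_3 = 1 = \tan\frac{\pi}{4} \quad\text{gives us}\quad \theta_3 = \tan^{-1} 1 = \frac{\pi}{4}.$$
We also have
$$\operatorname{cosec}\theta_1 = \frac{1}{\sin\theta_1} = \frac{1}{1/2} = 2, \quad \sec\theta_2 = \frac{1}{\cos\theta_2} = \frac{1}{1/2} = 2 \quad\text{and}\quad \cot\theta_3 = \frac{1}{\tan\theta_3} = \frac{1}{1} = 1,$$
using the definitions of the reciprocals of our three trigonometric functions, which we
saw in Section 2.1.2.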
Solution to activity 2.13
Given that $f(x) = x^3$, $g(x) = x^4$ and $h(x) = 2^x$, we use the definitions of the
combinations of functions we need from Section 2.1.2, to get
$$(fg)(x) = f(x)g(x) = (x^3)(x^4) = x^7,$$
$$(f/g)(x) = \frac{f(x)}{g(x)} = \frac{x^3}{x^4} = \frac{1}{x}, \text{ and}$$
$$(g \circ h)(x) = g(h(x)) = g(2^x) = (2^x)^4 = 2^{4x},$$
where we have used the power laws to simplify our answers. Indeed, observe that for the
last function, we can also write $2^{4x} = (2^4)^x = 16^x$.
Solution to activity 2.14
To derive the laws of logarithms, we note that for the first one, we use the power laws
and the given fact to get
$$a^{\log_a x + \log_a y} = a^{\log_a x}\, a^{\log_a y} = xy = a^{\log_a(xy)},$$
which means that $\log_a x + \log_a y = \log_a(xy)$, for the second one, we similarly get
$$a^{\log_a x - \log_a y} = \frac{a^{\log_a x}}{a^{\log_a y}} = \frac{x}{y} = a^{\log_a(x/y)},$$
which means that $\log_a x - \log_a y = \log_a(x/y)$ and for the third one, we get
$$a^{y\log_a x} = \left(a^{\log_a x}\right)^y = x^y = a^{\log_a(x^y)},$$
which means that $y\log_a x = \log_a(x^y)$.
We take logarithms to the base $b$ on both sides of the given fact to see that
$$a^{\log_a x} = x \implies \log_b\!\left(a^{\log_a x}\right) = \log_b x \implies (\log_a x)(\log_b a) = \log_b x,$$
where we have used the third law of logarithms in the last step. Then, dividing through
on both sides by $\log_b a$ (which is non-zero as $a \neq 1$), we get
$$\log_a x = \frac{\log_b x}{\log_b a},$$
as required.
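The change of base formula is exactly what is needed to compute a logarithm to an arbitrary base from logarithms to a base that is already available. The sketch below is an illustration only: the helper name `log_base` and its default intermediate base $b = e$ are choices made here, and Python's built-in two-argument `math.log(x, a)` is used solely as a cross-check.

```python
import math


def log_base(a: float, x: float, b: float = math.e) -> float:
    """log_a(x) computed as log_b(x) / log_b(a) for an intermediate base b."""
    if a <= 0 or a == 1 or b <= 0 or b == 1 or x <= 0:
        raise ValueError("need positive a, b != 1 and x > 0")
    return math.log(x, b) / math.log(a, b)


via_e = log_base(2, 8)            # intermediate base e
via_10 = log_base(2, 8, b=10)     # intermediate base 10
print(via_e, via_10, math.log(8, 2))
```

Whichever intermediate base is used, the result agrees with $\log_2 8 = 3$ up to rounding error.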
Solution to activity 2.16
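Starting with $\sin^2\theta + \cos^2\theta = 1$, we divide both sides by $\sin^2\theta$ to get
$$\frac{\sin^2\theta + \cos^2\theta}{\sin^2\theta} = \frac{1}{\sin^2\theta} \implies \frac{\sin^2\theta}{\sin^2\theta} + \frac{\cos^2\theta}{\sin^2\theta} = \frac{1}{\sin^2\theta} \implies 1 + \left(\frac{\cos\theta}{\sin\theta}\right)^2 = \left(\frac{1}{\sin\theta}\right)^2,$$
so that $1 + \cot^2\theta = \operatorname{cosec}^2\theta$ if we use the definition of cosec from Section 2.1.2 and the
result from Activity 2.4. Then, again starting with $\sin^2\theta + \cos^2\theta = 1$, we divide both
sides by $\cos^2\theta$ to get
$$\frac{\sin^2\theta + \cos^2\theta}{\cos^2\theta} = \frac{1}{\cos^2\theta} \implies \frac{\sin^2\theta}{\cos^2\theta} + \frac{\cos^2\theta}{\cos^2\theta} = \frac{1}{\cos^2\theta} \implies \left(\frac{\sin\theta}{\cos\theta}\right)^2 + 1 = \left(\frac{1}{\cos\theta}\right)^2,$$
so that $\tan^2\theta + 1 = \sec^2\theta$ if we use the definition of sec and (2.1) from Section 2.1.2.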
Solution to activity 2.17
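With the given facts, we can use the compound-angle formula for $\sin(\theta + \phi)$ to see that
$$\sin(\theta - \phi) = \sin(\theta + (-\phi)) = \sin\theta\cos(-\phi) + \cos\theta\sin(-\phi) = \sin\theta\cos\phi - \cos\theta\sin\phi,$$
and the compound-angle formula for $\cos(\theta + \phi)$ to see that
$$\cos(\theta - \phi) = \cos(\theta + (-\phi)) = \cos\theta\cos(-\phi) - \sin\theta\sin(-\phi) = \cos\theta\cos\phi + \sin\theta\sin\phi,$$
as required.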
Solution to activity 2.18
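Using the compound-angle formula
$$\sin(\theta + \phi) = \sin\theta\cos\phi + \cos\theta\sin\phi,$$
with $\phi = \theta$ we get
$$\sin(\theta + \theta) = \sin\theta\cos\theta + \cos\theta\sin\theta \implies \sin(2\theta) = 2\sin\theta\cos\theta,$$
as required. Similarly, using the compound-angle formula
$$\cos(\theta + \phi) = \cos\theta\cos\phi - \sin\theta\sin\phi,$$
with $\phi = \theta$ we get
$$\cos(\theta + \theta) = \cos\theta\cos\theta - \sin\theta\sin\theta \implies \cos(2\theta) = \cos^2\theta - \sin^2\theta,$$
as required. Indeed, since we also have the Pythagorean identity $\sin^2\theta + \cos^2\theta = 1$, we
can write this last double-angle formula as
$$\cos(2\theta) = (1 - \sin^2\theta) - \sin^2\theta = 1 - 2\sin^2\theta,$$
in terms of $\sin^2\theta$, or as
$$\cos(2\theta) = \cos^2\theta - (1 - \cos^2\theta) = 2\cos^2\theta - 1,$$
in terms of $\cos^2\theta$, as required.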
Solution to activity 2.19
From the graph in Figure 2.11, we can see that the economically meaningful part of the
supply function is $q^S : [0, \infty) \to [1, \infty)$ where $q^S(p) = p + 1$ and the economically
meaningful part of the demand function is $q^D : [0, 3] \to [0, 3]$ where $q^D(p) = 3 - p$.
Clearly, both of these functions are invertible as each $q$ in the co-domain gives rise to a
unique $p$ in the domain and we find that
$$q = p + 1 \implies p = p^S(q) = q - 1,$$
and
$$q = 3 - p \implies p = p^D(q) = 3 - q.$$
If $a > 0$, then for any $x \in \mathbb{R}$,
$$(x - p)^2 \ge 0 \implies a(x - p)^2 \ge 0 \implies a(x - p)^2 + q \ge q,$$
i.e. for all $x \in \mathbb{R}$, $y \ge q$ and so the smallest value of $y$ occurs when $y = q$ which, in
turn, means that we must have $x = p$. Thus, the turning point of the parabola is a
minimum and this occurs at the point $(p, q)$.
If $a < 0$, then for any $x \in \mathbb{R}$,
$$(x - p)^2 \ge 0 \implies a(x - p)^2 \le 0 \implies a(x - p)^2 + q \le q,$$
i.e. for all $x \in \mathbb{R}$, $y \le q$ and so the largest value of $y$ occurs when $y = q$ which, in
turn, means that we must have $x = p$. Thus, the turning point of the parabola is a
maximum and this occurs at the point $(p, q)$.
$$\frac{y^2}{9} - \frac{x^2}{4} = 1,$$
we see that there are no x-intercepts since, setting $y = 0$, we get
$$-\frac{x^2}{4} = 1 \implies x^2 = -4,$$
which has no real solutions, whereas we see that the y-intercepts, which occur when
$x = 0$, are given by
$$\frac{y^2}{9} = 1 \implies y^2 = 9 \implies y = \pm 3.$$
To find the asymptotes, we write the equation as
$$\frac{y^2}{x^2} = 9\left(\frac{1}{4} + \frac{1}{x^2}\right),$$
and so, when $|x|$ is large, we have
$$y \simeq \pm\frac{3}{2}x,$$
as the equations of the asymptotes. Putting this information together, we then get the
sketch in Figure 2.21.
Figure 2.21: A sketch of the hyperbola $\frac{y^2}{9} - \frac{x^2}{4} = 1$.
Exercises
Exercise 2.1
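Sketch the graph of the function $f : \{x \in \mathbb{R} \mid x \neq -1, 1\} \to \mathbb{R}$ given by
$$f(x) = \frac{x^4 - 1}{x^2 - 1}.$$
Exercise 2.2
Use the compound-angle formulae to show that
$$\tan(\theta \pm \phi) = \frac{\tan\theta \pm \tan\phi}{1 \mp \tan\theta\tan\phi},$$
Exercise 2.3
Suppose that the supply and demand functions for a market are given by
$$q^S(p) = p - 4 \quad\text{and}\quad q^D(p) = 8 - p,$$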
respectively. Sketch the graphs of these functions and find the equilibrium point.
A percentage [of the price] tax of $100r\%$ is imposed. Find the new equilibrium point
and, by sketching the graph of the new supply function on your earlier sketch, comment
on how the equilibrium point for the market has changed. How much of the tax has
been passed onto the consumers? What is the maximum tax, $r_m$, that can be imposed if
this market is to continue functioning?
Exercise 2.4
When selling a quantity, q, a firm makes a profit given by
$$\pi(q) = q^2 + 2q + 2,$$
and the largest quantity it can produce is 10. Sketch the graph of this profit function
and deduce the value of q that will yield the greatest profit for this firm.
Explain why the inverse profit function exists and find it.
Exercise 2.5
Sketch the circle and the rectangular hyperbola with equations
$$x^2 + y^2 = 1 \quad\text{and}\quad 2xy = 1,$$
Solutions to exercises
$$\frac{(x^2 - 1)(x^2 + 1)}{x^2 - 1},$$
Figure 2.22: For Exercise 2.1, a sketch of the graph of f (x). (Note that the points at
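$$\frac{\sin(\theta \pm \phi)}{\cos(\theta \pm \phi)},$$
if we divide the numerator and denominator by $\cos\theta\cos\phi$ and cancel where appropriate. Thus, using (2.1) again, we have
$$\tan(\theta \pm \phi) = \frac{\tan\theta \pm \tan\phi}{1 \mp \tan\theta\tan\phi},$$
as required. Indeed, observe that this only makes sense if $\theta, \phi \neq (2n + 1)\frac{\pi}{2}$ for $n \in \mathbb{Z}$ as, if this isn't true, we can't divide through by $\cos\theta\cos\phi$ or, equivalently, one of $\tan\theta$ or $\tan\phi$ won't exist.
To deduce a formula for $\tan(2\theta)$, we set $\phi = \theta$ in the formula for $\tan(\theta + \phi)$ to get
$$\tan(2\theta) = \tan(\theta + \theta) = \frac{\tan\theta + \tan\theta}{1 - \tan\theta\tan\theta} = \frac{2\tan\theta}{1 - \tan^2\theta}.$$
Again, we observe that this only makes sense if $\theta \neq (2n + 1)\frac{\pi}{2}$ for $n \in \mathbb{Z}$ as, if this isn't true, $\tan\theta$ won't exist.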
Solution to exercise 2.3
Here the supply and demand functions are straight lines which can be easily sketched
using the method outlined in Example 2.1 and the results of doing this are illustrated in
Figure 2.23(a). To find the equilibrium price, $p^*$, we have
$$q^S(p^*) = q^D(p^*) \implies p^* - 4 = 8 - p^* \implies 2p^* = 12,$$
Here we will start by restricting our attention to the case where $0 \leq r \leq 1$ as, prima facie, these are
the values that would appear to be economically sensible. Although, as we will soon see, the economically
meaningful values of $r$ will turn out to be $0 \leq r < 1/2$!
as the suppliers now see an effective price of $p - rp$. This means that the equilibrium price in the presence of tax, $p_r$, is given by
$$(1 - r)p_r - 4 = 8 - p_r \implies (2 - r)p_r = 12 \implies p_r = \frac{12}{2 - r},$$
and the corresponding equilibrium quantity is
$$q_r = 8 - \frac{12}{2 - r} = \frac{16 - 8r - 12}{2 - r} = \frac{4 - 8r}{2 - r},$$
if we use the demand function, $q_r^D(p)$.⁷ Sketching the graph of the new supply function, as in Figure 2.23(b), we see that by writing its equation as
$$p = \frac{q}{1 - r} + \frac{4}{1 - r},$$
and noting that
$$\frac{1}{1 - r} \geq 1 \quad\text{and}\quad \frac{4}{1 - r} \geq 4,$$
when considering $0 \leq r \leq 1$, this means that it is steeper than the old one and that the $p$-intercept, which is now
$$\frac{4}{1 - r} = \frac{4(1 - r) + 4r}{1 - r} = 4 + \frac{4r}{1 - r},$$
has increased by $4r/(1 - r)$. In this case, as the equilibrium price has increased from 6 to
$$\frac{12}{2 - r} = \frac{6(2 - r) + 6r}{2 - r} = 6 + \frac{6r}{2 - r},$$
we see that the consumer pays $6r/(2 - r)$ more. But, as the total tax to be paid by the supplier is given by
$$rp_r = r\,\frac{12}{2 - r} = \frac{12r}{2 - r},$$
this means that only half of the tax has been passed on to the consumer in this case. Of course, the equilibrium quantity in the presence of the tax must be positive and so, for the market to function, we require that
$$q_r > 0 \implies \frac{4 - 8r}{2 - r} > 0 \implies 4 > 8r \implies r < \frac{1}{2},$$
(bearing in mind that $2 - r > 0$ if $0 \leq r \leq 1$), i.e. the maximum tax, $r_m$, that can be imposed is given by $r_m = 1/2$.
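The algebra in this solution can be checked with exact arithmetic. The following Python sketch is not part of the original solution: the function name `equilibrium_with_tax` is hypothetical, and it assumes the supply function $q^S(p) = p - 4$ and the demand function $q^D(p) = 8 - p$ used here, with the suppliers seeing an effective price of $(1 - r)p$.

```python
from fractions import Fraction


def equilibrium_with_tax(r):
    """Return (price, quantity) at equilibrium for a tax rate r with 0 <= r < 1.

    Assumes supply q = (1 - r) p - 4 (suppliers see the price p - rp)
    and demand q = 8 - p, as in the solution to Exercise 2.3.
    """
    r = Fraction(r)
    price = Fraction(12) / (2 - r)   # solves (1 - r) p - 4 = 8 - p
    quantity = 8 - price             # read off the demand function
    return price, quantity


for r in (Fraction(0), Fraction(1, 4), Fraction(1, 2)):
    p_r, q_r = equilibrium_with_tax(r)
    extra_paid = p_r - 6             # rise in the price faced by consumers
    tax_paid = r * p_r               # tax per unit paid by the supplier
    print(r, p_r, q_r, extra_paid, tax_paid)
```

For each rate, the extra amount paid by the consumer should be half of the tax paid, and the quantity should fall to zero at $r = 1/2$, in line with the discussion above.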
7 to find $q_r$. However, we cannot use $q^S(p) = p - 4$ as this no longer holds in the presence of the tax!
Figure 2.23: For Exercise 2.3, a sketch of the graphs of the supply and demand functions indicating the equilibrium point of the market when (a) there is no tax and (b) a percentage of the price tax of 100r% is imposed. (Note that these sketches only make economic sense when $q \geq 0$.)
Solution to exercise 2.4
The firm's profit function is
$$\pi(q) = q^2 + 2q + 2,$$
and its domain is the interval $[0, 10]$ as $q \geq 0$ since it is a quantity and $q \leq 10$ since the largest quantity it can produce is 10. So, to sketch the graph of this profit function, we start by sketching the parabola
$$y = q^2 + 2q + 2 = (q + 1)^2 + 1,$$
in completed square form. This has
a $y$-intercept when $q = 0$, i.e. $y = 2$,
no $q$-intercepts as $y = 0$ gives $(q + 1)^2 + 1 = 0$ which has no real solutions, and
a turning point which is a minimum at the point $(-1, 1)$.
We then restrict our attention to the relevant values of $q$, i.e. those that satisfy $0 \leq q \leq 10$, to get a sketch of the graph of the profit function itself as illustrated in Figure 2.24.
Looking at the graph of the profit function, we see that as it is a function $\pi : [0, 10] \to [2, 122]$, its inverse exists since there is a unique $q \in [0, 10]$ such that $y = \pi(q)$ for all $y \in [2, 122]$. Indeed, solving this equation we find that, using the completed square form above, we have
$$y = (q + 1)^2 + 1 \implies (q + 1)^2 = y - 1 \implies q + 1 = \pm\sqrt{y - 1} \implies q = -1 \pm \sqrt{y - 1},$$
which gives us two values of $q$ for each value of $y > 1$. However, as we must be getting values of $q \in [0, 10]$ from our inverse function, we take the '+' sign here (i.e. we discard the '−' sign) so that we can get the solutions where $q \geq -1$ (instead of getting the solutions where $q \leq -1$ which we don't want). That is, we have found that
$$q = \pi^{-1}(y) = -1 + \sqrt{y - 1},$$
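As a quick check on this inverse, we can confirm that it undoes the profit function on the interval $[0, 10]$. The Python names `profit` and `inverse_profit` below are illustrative only; the formulae are the ones found in this solution.

```python
import math


def profit(q):
    """Profit function from Exercise 2.4, defined for 0 <= q <= 10."""
    return q**2 + 2*q + 2


def inverse_profit(y):
    """Inverse of profit on [2, 122]: takes the '+' square root so that q >= 0."""
    return -1 + math.sqrt(y - 1)


# Applying the inverse to profit(q) should return the original q.
for q in (0, 2.5, 10):
    y = profit(q)
    print(q, y, inverse_profit(y))
```

Taking the other sign of the square root would instead return values with $q \leq -1$, which lie outside the domain of the profit function.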
Figure 2.24: For Exercise 2.4, a sketch of the graph of the profit function, $\pi(q)$. (Note
that dashed parts of the curve are on the parabola but are not part of the graph of the
profit function.)
Solution to exercise 2.5
To sketch the circle and the rectangular hyperbola, we note that:
The circle with equation
$$x^2 + y^2 = 1,$$
is centred on the origin and has a radius of 1. Indeed, setting $x = 0$, we find that its $y$-intercepts are $y = \pm 1$ and, setting $y = 0$, we find that its $x$-intercepts are $x = \pm 1$.
The rectangular hyperbola with equation
$$2xy = 1 \implies y = \frac{1}{2x},$$
has the $x$ and $y$-axes, i.e. the lines $y = 0$ and $x = 0$, respectively, as its asymptotes since
For the vertical asymptote: As $x \to 0^+$ we have $y \to \infty$ and as $x \to 0^-$ we have $y \to -\infty$.
For the horizontal asymptote: As $x \to \infty$ we have $y \to 0$ from above and as $x \to -\infty$ we have $y \to 0$ from below.
To find the points of intersection of these two curves we have to solve the equations
$$x^2 + y^2 = 1 \quad\text{and}\quad 2xy = 1,$$
simultaneously. This can easily be done in two different ways which we give here for
completeness.
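Method I: The equation $2xy = 1$ tells us that, say, $y = 1/(2x)$ and substituting this into the other equation we get
$$x^2 + y^2 = 1 \implies x^2 + \frac{1}{4x^2} = 1 \implies 4x^4 - 4x^2 + 1 = 0,$$
i.e. $(2x^2 - 1)^2 = 0$, which gives us
$$x^2 = \frac{1}{2} \implies x = \pm\frac{1}{\sqrt{2}}.$$
Then, using $y = 1/(2x)$, we get
$$y = \frac{1}{2x} = \pm\frac{\sqrt{2}}{2} = \pm\frac{1}{\sqrt{2}},$$
as the corresponding values of $y$.
Method II: We note that, using our equations, we have
$$(x - y)^2 = x^2 - 2xy + y^2 = (x^2 + y^2) - (2xy) = 1 - 1 = 0,$$
and so any solutions we seek must satisfy $(x - y)^2 = 0$ or, equivalently, $x = y$. If we substitute this into one of our equations, say $2xy = 1$, we get
$$2xy = 1 \implies 2y^2 = 1 \implies y^2 = \frac{1}{2} \implies y = \pm\frac{1}{\sqrt{2}}.$$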
[Figure: a sketch of the circle $x^2 + y^2 = 1$ and the rectangular hyperbola $2xy = 1$, i.e. $y = \frac{1}{2x}$.]
Chapter 3
Differentiation
Essential reading
(For full publication details, see Chapter 1.)
Binmore and Davies (2002) Sections 2.7–2.13.
Anthony and Biggs (1996) Chapter 6 and parts of Chapter 7.
Further reading
Simon and Blume (1994) Sections 2.3–2.7 and 3.6, Chapter 4 and Section 5.5.
Adams and Essex (2010) Sections 2.1–2.7, parts of Sections 3.1 and 3.3, parts of
Sections 4.9 and 4.10.
Aims and objectives
The objectives of this chapter are as follows.
To introduce the idea of a derivative and see how it can be found using various
techniques.
To use derivatives to find tangent lines and approximate functions using various
techniques.
To see how derivatives can be used in economics-based subjects.
Specific learning outcomes can be found near the end of this chapter.
3.1
Having revised the idea of a function in the previous chapter, we now turn to
differentiation, the process by which we find the derivative of a function. Given a
function, $f$, its derivative at the point $a$, which we denote by $f'(a)$, is given by the formula
$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h},$$
provided that the limit exists. Indeed, when the limit exists, i.e. when we can find a value for $f'(a)$, we say that the function is differentiable at $a$. Observe that here, we have introduced the notation
$$\lim_{h \to 0} g(h),$$
to denote the value of the function $g(h)$ as $h \to 0$ (provided, of course, that there is such a value) and we call this value the limit of $g(h)$ as $h \to 0$ whereas if there is no such value, we say that this limit does not exist. To see how this works in practice, we can consider a simple example.
Example 3.1 Use the definition to find the derivative of the function $f(x) = x^2$ at the point $x = 3$.
We need to find $f'(3)$ and, using the formula above with $a = 3$, we start by looking at
$$\frac{f(3 + h) - f(3)}{h} = \frac{(3 + h)^2 - 3^2}{h},$$
which, looking at the numerator, is easily simplified to give
$$\frac{f(3 + h) - f(3)}{h} = \frac{(9 + 6h + h^2) - 9}{h} = \frac{6h + h^2}{h} = 6 + h.$$
This in turn means that
$$f'(3) = \lim_{h \to 0} \frac{f(3 + h) - f(3)}{h} = \lim_{h \to 0} (6 + h) = 6,$$
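To see this limit at work numerically, we can evaluate the difference quotient for smaller and smaller values of $h$. The short Python sketch below is only an illustration (the helper name `difference_quotient` is hypothetical); it does not replace the algebraic argument, but it shows the values behaving like $6 + h$.

```python
def difference_quotient(f, a, h):
    """Gradient (f(a + h) - f(a)) / h of the chord from a to a + h."""
    return (f(a + h) - f(a)) / h


def square(x):
    return x**2


# As h shrinks, the quotient for f(x) = x^2 at a = 3 behaves like 6 + h.
for h in (1.0, 0.1, 0.01, 0.001, -0.001):
    print(h, difference_quotient(square, 3.0, h))
```

Notice that negative values of $h$ are allowed too: the quotient then approaches 6 from below rather than from above.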
Example 3.2 Use the definition to find the derivative of the function $f(x) = x^2$ at the general point $x$ and use this to verify that $f'(3) = 6$ as we found in Example 3.1.
We need to find $f'(x)$ and, using the formula above, we start by looking at
$$\frac{f(x + h) - f(x)}{h} = \frac{(x + h)^2 - x^2}{h},$$
which simplifies to $(2xh + h^2)/h = 2x + h$. This in turn means that
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = 2x,$$
Use the result in Example 3.2 to verify your answer to Activity 3.1.
or f (a),
x=a
3.2
The previous section told us how to find derivatives from first principles, but now we want to explore a more convenient way of finding them. The key idea is that we introduce standard derivatives which tell us how to differentiate the basic functions that we saw in the previous chapter. Once we know how to differentiate these, the rules of differentiation will allow us to differentiate combinations of these functions.
3 Indeed, as this limit exists for all $x \in \mathbb{R}$, we can say that the function $f(x) = x^2$ is differentiable for all $x \in \mathbb{R}$.
3.2.1
Standard derivatives
In Example 3.2, we used the definition of the derivative to show that the function $f(x) = x^2$ has a derivative given by $f'(x) = 2x$. We now state some results that will allow us to differentiate other elementary functions.
Power and root functions
If $n \in \mathbb{Z}$, we can use the definition of the derivative to show that
$$f(x) = x^n \implies f'(x) = nx^{n-1}.$$
$$f'(x) = 0x^{-1} = 0,$$
$$f'(x) = 1x^{0} = 1,$$
$$f'(x) = \frac{1}{2}x^{-\frac{1}{2}} = \frac{1}{2\sqrt{x}},$$
x.
$$f'(x) = e^x,$$
$$f'(x) = \frac{1}{x},$$
which, as we will see in Activity 3.12, follows from the fact that the function ln x is the
inverse of ex .
If we have another base, $a$, the derivatives are not so simple. We shall see in Activity 3.9 that
$$f(x) = a^x \implies f'(x) = a^x \ln a,$$
and, using the change of base formula for logarithms, we will see that
$$f(x) = \log_a x \implies f'(x) = \frac{1}{x \ln a},$$
in Section 3.2.2.
Sine and cosine functions
For the sine function we find that
$$f(x) = \sin x \implies f'(x) = \cos x,$$
and for the cosine function we have
$$f(x) = \cos x \implies f'(x) = -\sin x.$$
That said, we could have used the fact that the sine and cosine functions are interdefinable, i.e.
$$\cos x = \sin\left(x + \frac{\pi}{2}\right) \quad\text{and}\quad \sin x = -\cos\left(x + \frac{\pi}{2}\right),$$
to derive the latter from the former once we have the chain rule (see Exercise 3.2).
Indeed, using these standard derivatives, we can then derive the derivatives of the other trigonometric functions using their definitions in terms of sine and cosine together with the rules of differentiation in Section 3.2.2 (see, for example, Activity 3.6(c)).
3.2.2
In Section 2.1.2, we saw that there are several standard ways of making new functions
from old ones. Here, we see how we can use the standard derivatives, i.e. the derivatives
of our basic functions, and rules of differentiation to differentiate new functions that are
created from these basic ones in these standard ways. We start with the most
straightforward of these which allows us to differentiate linear combinations of functions.
The linear combination rule
If $k$ and $l$ are constants, this allows us to differentiate the linear combination, $kf(x) + lg(x)$, of two functions $f(x)$ and $g(x)$. It states that
$$\frac{d}{dx}\big[kf(x) + lg(x)\big] = k\frac{df}{dx} + l\frac{dg}{dx},$$
or, using our shorthand, $(kf + lg)'(x) = kf'(x) + lg'(x)$. Indeed, this gives us three more basic rules straightaway, i.e. the constant multiple, sum and difference rules, the last of which states that
$$\frac{d}{dx}\big[f(x) - g(x)\big] = \frac{df}{dx} - \frac{dg}{dx},$$
or, using our shorthand, $(f - g)'(x) = f'(x) - g'(x)$.
Activity 3.3 Derive the constant multiple, sum and difference rules from the linear
combination rule.
Example 3.3
$\dfrac{3}{x} - 4e^x$ by the linear combination rule.
So, in the case of simple combinations of functions such as these, we see that the
derivative of the linear combination is given by the linear combination of the derivatives.
Activity 3.4 Use the rules above to differentiate the following functions with
respect to x.
(a) $3\cos x$,
(b) $e^x + \cos x$,
(c) $3\sin x - 3\ln x$.
Indeed, we can see that using the change of base formula for logarithms from
Section 2.1.4, we have
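$$\log_a x = \frac{\ln x}{\ln a},$$
and so
$$\frac{d}{dx}\log_a x = \frac{d}{dx}\left(\frac{\ln x}{\ln a}\right) = \frac{1}{\ln a}\,\frac{d}{dx}\ln x = \frac{1}{\ln a}\cdot\frac{1}{x} = \frac{1}{x\ln a},$$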
as mentioned in Section 3.2.1. We now look at the other rules of differentiation, i.e. the
ones that will allow us to differentiate products, quotients and compositions of functions.
The product rule
This allows us to differentiate the product of two functions $f(x)$ and $g(x)$. It states that
$$\frac{d}{dx}\big[f(x)g(x)\big] = \frac{df}{dx}\,g(x) + f(x)\,\frac{dg}{dx},$$
or, using our shorthand, $[f(x)g(x)]' = f'(x)g(x) + f(x)g'(x)$. Let's have a look at some examples of how it works.
Example 3.4
and $g(x) = e^x$,
$f'(x) = 1$ and $g'(x) = e^x$.
and $g(x) = \ln x$,
and $g'(x) = \dfrac{1}{x}$.
$$(1)(\ln x) + (x)\left(\frac{1}{x}\right) = \ln x + 1.$$
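A result like this can be sanity-checked numerically. The sketch below assumes that the product being differentiated is $x \ln x$ (which is consistent with the answer $\ln x + 1$) and compares the product rule answer with a symmetric difference quotient; the helper names are hypothetical.

```python
import math


def central_difference(f, x, h=1e-6):
    """Symmetric estimate of f'(x); a numerical stand-in for the limit."""
    return (f(x + h) - f(x - h)) / (2 * h)


def h_func(x):
    return x * math.log(x)        # the product f(x) g(x) with f = x, g = ln x


def h_prime(x):
    return math.log(x) + 1        # (1)(ln x) + (x)(1/x) from the product rule


# The two columns should agree to several decimal places.
for x in (0.5, 1.0, 2.0, 10.0):
    print(x, central_difference(h_func, x), h_prime(x))
```

The agreement is only approximate, of course, as the difference quotient uses a small but non-zero value of $h$.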
Example 3.6
and $g(x) = \ln x$,
and $g'(x) = \dfrac{1}{x}$.
$$(e^x)(\ln x) + (e^x)\left(\frac{1}{x}\right) = e^x\left(\ln x + \frac{1}{x}\right)$$
Of course, as we saw in Section 2.1.2, this all assumes that the quotient of the two functions is defined for the values of $x$ that we are working with, i.e. it only works for values of $x$ in the domain where $g(x) \neq 0$. Let's have a look at some examples of how it works.
Example 3.7 Differentiate $\dfrac{e^x}{x}$ with respect to $x$.
and $g(x) = x$,
$f'(x) = e^x$ and $g'(x) = 1$.
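Example 3.8 Differentiate $\dfrac{x^3}{\ln x}$ with respect to $x$.
and $g(x) = \ln x$,
and $g'(x) = \dfrac{1}{x}$.
$$\frac{(3x^2)(\ln x) - (x^3)\left(\frac{1}{x}\right)}{[\ln x]^2} = \frac{x^2(3\ln x - 1)}{[\ln x]^2},$$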
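Example 3.9 Differentiate $\dfrac{\ln x}{e^x}$ with respect to $x$.
and $g(x) = e^x$,
$f'(x) = \dfrac{1}{x}$ and $g'(x) = e^x$.
As such, the quotient rule tells us that
$$h'(x) = \frac{\left(\frac{1}{x}\right)(e^x) - (\ln x)(e^x)}{[e^x]^2} = \frac{1 - x\ln x}{xe^x}.$$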
(a) $\dfrac{\sin x}{x}$,
(b) $\dfrac{e^x}{\cos x}$,
(c) $\dfrac{\sin x}{\cos x}$.
What can you deduce about the derivative of $\tan x$ from your answer to (c)?
or, using our shorthand, $[f(g(x))]' = f'(g)g'(x)$. Let's have a look at some examples of how it works.
Example 3.10
and $g(x) = 2x + 1$.
As such we have
$f'(g) = 3g^2$ and $g'(x) = 2$,
$f(g) = \sqrt{g} = g^{\frac{1}{2}}$ and $g(x) = 2x + 1$.
As such we have
$f'(g) = \dfrac{1}{2}g^{-\frac{1}{2}}$ and $g'(x) = 2$,
and so the chain rule tells us that
$$h'(x) = \frac{1}{2}g^{-\frac{1}{2}}(2) = g^{-\frac{1}{2}} = \frac{1}{\sqrt{2x + 1}},$$
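The way the chain rule combines the 'outer' and 'inner' derivatives can be mirrored directly in code. The following Python sketch, whose function names are purely illustrative, builds $h'(x) = f'(g(x))\,g'(x)$ for $h(x) = \sqrt{2x + 1}$ and compares it with the simplified answer found above.

```python
import math


def outer_prime(g):
    return 0.5 * g ** -0.5        # derivative of f(g) = g^(1/2) with respect to g


def inner(x):
    return 2 * x + 1              # g(x) = 2x + 1


def inner_prime(x):
    return 2.0                    # g'(x) = 2


def chain_rule_derivative(x):
    """h'(x) = f'(g(x)) g'(x) for h(x) = sqrt(2x + 1)."""
    return outer_prime(inner(x)) * inner_prime(x)


def simplified_derivative(x):
    return 1 / math.sqrt(2 * x + 1)


# Both columns should agree wherever 2x + 1 > 0.
for x in (0.0, 1.5, 4.0):
    print(x, chain_rule_derivative(x), simplified_derivative(x))
```

Here the derivative of the outer function is evaluated at $g(x)$, not at $x$, which is the step that is most easily forgotten when using the rule by hand.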
6 In particular, observe that here the original function is only defined if $x \geq -1/2$ whereas the derivative is only defined if $x > -1/2$ (as, in the derivative, $x = -1/2$ would entail division by zero).
Example 3.12 Differentiate $e^{x^3 + 2}$ with respect to $x$.
$f(g) = e^g$ and $g(x) = x^3 + 2$.
As such we have
$f'(g) = e^g$ and $g'(x) = 3x^2$,
Use the chain rule to differentiate the following functions with respect to $x$.
(a) $\sin(2x)$,
(c) $\ln(e^x)$.
and using the product and chain rules to differentiate it with respect to x.
Activity 3.11 (Derivatives of inverse functions)
If the function, $f$, has an inverse, $f^{-1}$, then we can let $y = f(x)$ so that $x = f^{-1}(y)$. Use the chain rule to show that
$$\frac{d}{dy}f^{-1}(y) = 1 \bigg/ \frac{d}{dx}f(x).$$
$f(x) = x^3 + 1$ and $g(x) = \ln(x^2 + 4)$,
and clearly, $f'(x) = 3x^2$. But to differentiate $g(x)$ we need to use the chain rule because it is a composition. In this case, we have
$g(h) = \ln h$ and $h(x) = x^2 + 4$,
which gives us
$g'(h) = \dfrac{1}{h}$ and $h'(x) = 2x$,
so that
$$g'(x) = \frac{1}{h}(2x) = \frac{2x}{h} = \frac{2x}{x^2 + 4},$$
by the chain rule. Now, putting all of this into the product rule gives us
$$l'(x) = (3x^2)\ln(x^2 + 4) + (x^3 + 1)\frac{2x}{x^2 + 4} = 3x^2\ln(x^2 + 4) + \frac{2x(x^3 + 1)}{x^2 + 4},$$
$f(x) = e^{x^2 + x}$ and $g(x) = \ln(x^3 + 1)$,
and to differentiate $f(x)$ we need to use the chain rule because it is a composition. In this case, we have
$f(h) = e^h$ and $h(x) = x^2 + x$,
which gives us
$f'(h) = e^h$ and $h'(x) = 2x + 1$,
so that
$$f'(x) = (e^h)(2x + 1) = (2x + 1)e^h = (2x + 1)e^{x^2 + x},$$
by the chain rule. Then, to differentiate $g(x)$, we need to use the chain rule again because it is also a composition. In this case, we have
$g(h) = \ln h$ and $h(x) = x^3 + 1$,
which gives us
$g'(h) = \dfrac{1}{h}$ and $h'(x) = 3x^2$,
so that
$$g'(x) = \frac{1}{h}(3x^2) = \frac{3x^2}{h} = \frac{3x^2}{x^3 + 1},$$
by the chain rule. Now, putting all of this into the product rule gives us
$$l'(x) = (2x + 1)e^{x^2 + x}\ln(x^3 + 1) + e^{x^2 + x}\frac{3x^2}{x^3 + 1} = \left[(2x + 1)\ln(x^3 + 1) + \frac{3x^2}{x^3 + 1}\right]e^{x^2 + x},$$
(b) $\dfrac{\sin(\cos x)}{e^{\sin x}}$,
3.2.3
Higher-order derivatives
As we have seen above, when we differentiate a function, $f(x)$, we find that its derivative, $f'(x)$, is also a function of $x$. In this context, we call $f'(x)$ the first-order derivative of $f(x)$ and we can differentiate it again to get the second-order derivative, i.e. we find
$$\frac{d}{dx}\left(\frac{df}{dx}\right) \quad\text{and we denote this by}\quad \frac{d^2f}{dx^2} \text{ or } f''(x).$$
Of course, the second-order derivative will also be a function of $x$ and so we can differentiate it again to get the third-order derivative, i.e. we find
$$\frac{d}{dx}\left(\frac{d^2f}{dx^2}\right) \quad\text{and we denote this by}\quad \frac{d^3f}{dx^3} \text{ or } f'''(x).$$
We can, of course, do this again and again but the shorthand notation we use can become a bit unwieldy once we pass the third-order derivative. As such, for $n \geq 4$, we write the $n$th-order derivative
as $f^{(n)}(x)$,
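Example 3.15 Find the first four derivatives of $f(x) = \sin x$. What is the relationship between these derivatives of $\sin x$?
We have $f(x) = \sin x$, and so the first-order derivative of $f$ is given by
$$f'(x) = \frac{d}{dx}\sin x = \cos x.$$
Similarly, the second, third and fourth-order derivatives are given by
$$f''(x) = \frac{d}{dx}\left(\frac{df}{dx}\right) = \frac{d}{dx}\cos x = -\sin x,$$
$$f'''(x) = \frac{d}{dx}\left(\frac{d^2f}{dx^2}\right) = \frac{d}{dx}(-\sin x) = -\cos x,$$
$$f^{(4)}(x) = \frac{d}{dx}\left(\frac{d^3f}{dx^3}\right) = \frac{d}{dx}(-\cos x) = \sin x.$$
So, in particular, we see that $f''(x) = -f(x)$, $f'''(x) = -f'(x)$ and $f^{(4)}(x) = f(x)$.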
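Each step in this chain of derivatives can be checked numerically by differentiating one function in the list and comparing the result with the next. The Python sketch below is only a sanity check under the assumption that a symmetric difference quotient with a small step is accurate enough; the helper name `central_difference` is hypothetical.

```python
import math


def central_difference(f, x, h=1e-5):
    """Symmetric numerical estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# f, f', f'', f''' and f^(4) for f(x) = sin x, as listed in Example 3.15.
derivatives = [
    math.sin,
    math.cos,
    lambda x: -math.sin(x),
    lambda x: -math.cos(x),
    math.sin,
]

x = 0.7
for order in range(1, 5):
    # Differentiate the previous entry and compare with the claimed derivative.
    estimate = central_difference(derivatives[order - 1], x)
    print(order, estimate, derivatives[order](x))
```

The last entry in the list is the function we started with, which is the numerical counterpart of the observation that $f^{(4)}(x) = f(x)$.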
Activity 3.14 Using the pattern inherent in Example 3.15, what is $f^{(n)}(x)$ for $n \geq 1$?
Activity 3.15 Find the first four derivatives of $f(x) = xe^x$. Hence deduce an expression for $f^{(n)}(x)$ for $n \geq 1$.
3.3
Using derivatives
Derivatives can be very useful in mathematics and economics, but before we see how,
we need to understand what derivatives represent.
3.3.1
If we draw the graph of a function, $f$, we get the curve with equation $y = f(x)$. At any point on this curve, say the point $(a, f(a))$, we can draw a chord (or secant line) that connects the given point to another point on the curve. For instance, in Figure 3.1, the line segment $C$ is the chord joining the points $(a, f(a))$ and $(b, f(b))$ on the curve $y = f(x)$. In particular, we see that the gradient of this chord, let's call it $m_C$, can be found using the formula
$$m_C = \frac{f(b) - f(a)}{b - a},$$
which you should know.
Figure 3.1: The line segment $C$ is the chord joining the points $(a, f(a))$ and $(b, f(b))$ on the curve $y = f(x)$. This is extended using the dotted lines at both ends so that we can see what line the chord is a line segment of.
To relate this to the derivative, we take some number, $h \neq 0$, and let $b = a + h$ so that we now have a chord, $C$, which is joining the points $(a, f(a))$ and $(a + h, f(a + h))$. The gradient of this chord is then given by
$$m_C(h) = \frac{f(a + h) - f(a)}{(a + h) - a} = \frac{f(a + h) - f(a)}{h},$$
and, for $h \neq 0$, this is a function of $h$ since the value of $m_C$ will depend on the value of $h$ that we choose. In particular, recalling what we saw in Section 3.1, we can see that
$$f'(a) = \lim_{h \to 0} m_C(h),$$
Figure 3.2: $C_1$, $C_2$ and $C_3$ are three chords of the curve $y = f(x)$ originating from the point $(a, f(a))$. Observe that as the other end of a chord approaches this point, the chords pivot about it and their gradients get closer to the gradient of the line, $T$.
gradients of these chords tend to some finite limit as $h \to 0$? That is, does the limit in our expression for $f'(a)$ above exist?
Hopefully, in Figure 3.2, you can see that as $h$ gets smaller (i.e. as we consider $C_3$, then $C_2$ and then $C_1$), the lines are pivoting through the point $(a, f(a))$ and their gradients are getting closer to the gradient of the line $T$. Indeed, in the limit as $h \to 0$, the lines we get from extending an arbitrary chord joining the points $(a, f(a))$ and $(a + h, f(a + h))$ should become the line $T$. In particular, this means that the limit of $m_C(h)$ as $h \to 0$ exists because it should be equal to the gradient of $T$. This means that the line $T$, called the tangent to $f$ at the point $(a, f(a))$
goes through the point $(a, f(a))$, and
its gradient is the limit, as $h \to 0$, of $m_C(h)$, i.e. $f'(a)$.
For this reason, we define the gradient of a curve $y = f(x)$ at the point $(a, f(a))$ to be the gradient of its tangent line at that point and this, as we have seen, is simply the value of $f'(a)$.
3.3.2
Now that we know how the tangent lines to a curve are related to derivatives, we can
use derivatives to find the equation of the tangent line to a curve at a given point. This,
in turn, will introduce us to a useful way of performing approximations.
$$f'(a) = \frac{y - f(a)}{x - a}, \qquad (3.1)$$
gives us the equation of the tangent line as it goes through the point $(a, f(a))$ and its gradient is given by $f'(a)$. Let's look at a quick example.
Example 3.16 Find the equation of the tangent line to the function $f(x) = x^2$ when $x = 3$.
When $x = 3$, the point on the curve $y = x^2$ is $(3, 9)$ and we know that $f'(3) = 6$ as $f'(x) = 2x$. Consequently, using (3.1), the equation of the tangent line is given by
$$6 = \frac{y - 9}{x - 3} \implies y - 9 = 6x - 18 \implies y = 6x - 9.$$
In particular, when written in this form, we see that the gradient of the line is indeed 6 and the point $(3, 9)$ does indeed lie on it as $6(3) - 9 = 9$.
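The same rearrangement of (3.1) works at any point where the derivative exists, and it is easily turned into a small routine. In the Python sketch below, the function name `tangent_line` is hypothetical and the derivative is supplied by hand rather than computed.

```python
def tangent_line(f, f_prime, a):
    """Return (gradient, intercept) of the tangent to y = f(x) at x = a.

    Rearranging f'(a) = (y - f(a)) / (x - a) gives
    y = f'(a) x + (f(a) - a f'(a)).
    """
    gradient = f_prime(a)
    intercept = f(a) - a * gradient
    return gradient, intercept


gradient, intercept = tangent_line(lambda x: x**2, lambda x: 2*x, 3)
print(gradient, intercept)   # 6 -9, i.e. y = 6x - 9
```

This only applies when $f'(a)$ is defined; a vertical tangent line cannot be written in the form $y = mx + c$.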
Activity 3.16 Find the equation of the tangent line to the function $f(x) = e^x$ when $x = 1$.
Linear approximations
One use of tangent lines is that they provide us with a simple way of approximating the
value of a function. For instance, if we have the tangent line to the function $f(x)$ at the point $x = a$, the equation of its tangent line, i.e.
$$f'(a) = \frac{y - f(a)}{x - a},$$
Figure 3.3: When x is close to a we can use the tangent line at a to find y which gives
f (0) = 3 e0 = 3,
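$$f(a) + hf'(a)$$
$$f'(a) \simeq \frac{f(a + h) - f(a)}{h},$$
and so, if we denote the change in $f$ by $\Delta f$ and the change in $x$ by $\Delta x = h$, we see that
$$\frac{\Delta f}{\Delta x} \simeq f'(a) \quad\text{or}\quad \Delta f \simeq f'(a)\Delta x.$$
That is, we can find the approximate value of the change in $f$ if we change $x$ from $a$ to $a + h$. Of course, the smaller $\Delta x = h$ is, the better our approximation. This is illustrated in Figure 3.4.
[Figure 3.4: the exact change $\Delta f$, the approximate change in $f$ and the error between them when $x$ changes from $a$ to $a + h$, i.e. when $\Delta x = h$.]
Obviously, the smaller the value of the change $\Delta x = h$, the better the approximation for $\Delta f$ will be.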
Example 3.18 Without using a calculator, find the approximate change in $3e^{-x}$ if $x$ is increased from zero to 0.1. Hence deduce the approximate value of $3e^{-0.1}$.
Given that $f(x) = 3e^{-x}$, we have
$$f'(x) = -3e^{-x} \implies f'(0) = -3e^{0} = -3,$$
and so, as $\Delta x = 0.1$, we get
$$\Delta f \simeq f'(0)\Delta x = (-3)(0.1) = -0.3,$$
i.e. the change in $f$ is approximately $-0.3$. Observe that the minus sign is telling us that when $x$ increases from 0 to 0.1, $f(x)$ is decreasing by approximately 0.3.
This means that using
$$f(0.1) \simeq f(0) + \Delta f = 3 - 0.3 = 2.7,$$
we see that the approximate value of $3e^{-0.1}$ is 2.7 as we would expect from the linear approximation in Example 3.17.
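Although the example asks us to work without a calculator, it is instructive to compare the approximate change with the exact one afterwards. The following Python sketch assumes, as in the example, that $f(x) = 3e^{-x}$, $a = 0$ and $h = 0.1$; the variable names are illustrative.

```python
import math


def f(x):
    return 3 * math.exp(-x)


def f_prime(x):
    return -3 * math.exp(-x)


a, h = 0.0, 0.1
approx_change = f_prime(a) * h          # approximate change f'(a) * h
exact_change = f(a + h) - f(a)          # exact change in f
approx_value = f(a) + approx_change     # linear approximation to f(a + h)

print(approx_change, exact_change)
print(approx_value, f(a + h))
```

Trying smaller values of `h` should show the gap between the approximate and exact changes shrinking, as the discussion above suggests.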
Further, as the derivative of a function gives us information about how $f(x)$ is changing due to changes in $x$, we often refer to $f'(a)$ as the rate of change of $f(x)$ with respect to $x$ when $x = a$.
3.3.3
Applications of derivatives
Derivatives are useful in economics and we now introduce two ways in which they can
arise in that subject. The first is their use when discussing marginal functions and the
second is when they are used in the context of elasticities. At this point, we will just
introduce these ideas and see how they might be useful, but they will also be used when
we consider some applications of the material contained in other chapters of this subject
guide.
Marginal functions
In economics, the term marginal denotes the rate of change of a quantity with respect to a variable on which it depends. For instance, if a firm has a cost function, $C(q)$, this tells us the cost of producing $q$ units of their product. The marginal cost of the firm, which we denote by $MC(q)$, would then be given by
$$MC(q) = \frac{dC}{dq}.$$
This is useful since, using what we saw above, we can see that the marginal cost is telling us (approximately) about how changes in the level of production, $q$, will incur changes in the costs, $C$. That is, if the level of production is increased by $\Delta q$, i.e. our production increases from $q$ to $q + \Delta q$, we find that
$$MC(q) = \frac{dC}{dq} \implies MC(q) \simeq \frac{\Delta C}{\Delta q} \implies \Delta C \simeq MC(q)\Delta q,$$
and so, if $\Delta q = 1$, we have
$$\Delta C \simeq MC(q).$$
That is, in these circumstances, the marginal cost tells us (approximately) the extra cost incurred if the firm wishes to produce one more unit of their good given that they are already producing $q$ units.
Example 3.19
in dollars. Find the marginal cost function for this firm and use it to determine the approximate cost of producing one more unit if the original level of production is 100 units.
The marginal cost function, $MC(q)$, is given by
$$MC(q) = C'(q) = 5 + 2q,$$
and so using the fact that the change in cost, $\Delta C$, is related to the change in production, $\Delta q$, by
$$\Delta C \simeq C'(q)\Delta q,$$
we see that an increase in production of one unit, i.e. $\Delta q = 1$, gives rise to an increase in costs given by
$$\Delta C \simeq C'(100)(1) = 5 + 2(100) = 205.$$
That is, if the firm is producing 100 units and they increase their production by one unit, they will incur additional costs of approximately 205 dollars.
Activity 3.17 By using C(q + 1) C(q) directly when q = 100, determine how
good the approximation found in Example 3.19 is.
Generally then, if $f$ is some economically meaningful function, its derivative is referred to as the marginal of $f$ and we denote this by $Mf$. For instance, if $R(q)$ is the revenue function for a firm, the marginal revenue, $MR(q)$, is just $R'(q)$.
Elasticities
Suppose that, as in Section 2.1.5, we have a market where consumers purchase a good according to the demand function, $q^D(p)$. If the price of this good was to increase from $p$ to $p + \Delta p$, then there will be a change in the quantity demanded by the consumers from $q$ to $q + \Delta q$. Indeed, since a rise in price will usually lead to a fall in demand, we would expect $\Delta q$ to be negative here. In these circumstances, we can see how these changes are related by noting that
$$\Delta q = q^D(p + \Delta p) - q^D(p) \simeq q'(p)\Delta p \implies \frac{\Delta q}{\Delta p} \simeq q'(p),$$
where we have used $q$ to denote the quantity demanded, i.e. $q(p) = q^D(p)$.
Now, if we are interested in the relative change in quantity, $\Delta q/q$, and the relative change in price, $\Delta p/p$, we can see that the ratio of these two terms is then given by
$$\frac{\Delta q/q}{\Delta p/p} = \frac{p}{q}\,\frac{\Delta q}{\Delta p} \simeq \frac{p}{q}\,q'(p).$$
Indeed, as $\Delta q$ is usually negative (whereas the other terms on the left-hand-side, i.e. $\Delta p$, $q$ and $p$, are all positive) we would usually expect the right-hand-side to be negative as well. With this in mind, we define the [price] elasticity of demand, $\varepsilon(p)$, to be
$$\varepsilon(p) = -\frac{p}{q}\,q'(p),$$
where $q = q^D(p)$ and the minus sign is introduced so that, in the usual case where $q'$ is negative, we can be sure that $\varepsilon(p)$ itself will be positive.⁷ Then, we can see that using
$$\frac{\Delta q}{q} \simeq -\varepsilon(p)\frac{\Delta p}{p},$$
we can see how the relative change in quantity is simply related to the relative change in price via the elasticity of demand.
7 Some books omit the minus sign in their definition of the elasticity of demand, but it will be useful for us to include it as it is easier to deal with positive quantities.
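Example 3.20 Suppose that the demand function for some good is given by $q^D(p) = 10p^{-r}$ where $r$ is a constant. Find the elasticity of demand. What does this tell us about the effect of relative changes in price on relative changes in quantity?
As $q'(p) = -10rp^{-r-1}$, the elasticity of demand is given by
$$\varepsilon(p) = -\frac{p}{q}\,q'(p) = -\frac{p}{10p^{-r}}\left(-10rp^{-r-1}\right) = r,$$
and so, using
$$\frac{\Delta q}{q} \simeq -\varepsilon(p)\frac{\Delta p}{p},$$
we see that a relative increase in price of, say, x% will lead to a relative decrease in quantity purchased of (approximately) rx%.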
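The fact that this elasticity is the same at every price can be illustrated numerically. The sketch below assumes the demand function $q^D(p) = 10p^{-r}$ from the example, picks the arbitrary value $r = 1.5$ for illustration, and replaces $q'(p)$ by a symmetric difference quotient; the function names are hypothetical.

```python
def demand(p, r):
    return 10 * p ** (-r)          # q^D(p) = 10 p^(-r)


def elasticity(p, r, h=1e-6):
    """Estimate -(p / q) q'(p), replacing q'(p) by a central difference."""
    q_prime = (demand(p + h, r) - demand(p - h, r)) / (2 * h)
    return -(p / demand(p, r)) * q_prime


r = 1.5
for p in (0.5, 2.0, 8.0):
    print(p, elasticity(p, r))      # each value should be close to r

# A 1% price rise from p = 2 changes the quantity by roughly -r%.
p = 2.0
relative_change = (demand(1.01 * p, r) - demand(p, r)) / demand(p, r)
print(100 * relative_change)
```

The final figure is close to, but not exactly, $-1.5$, which reflects the fact that the relationship between the relative changes is only approximate.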
Indeed, we will see, in Section 4.2.3, that elasticities can also give us useful information
about how the revenue, R = pq, generated from selling a quantity, q, at a price of p per
unit will be affected by increases in the price.
3.3.4
Existence of derivatives
Although we will usually be dealing with situations where a function has a derivative at
every point where it is defined, we will occasionally encounter situations where there is
at least one point at which the derivative of a function does not exist. Just so that we
are aware of what this means and the kinds of situation in which it can arise, we
consider some of the most common ways in which a derivative can fail to exist at a
certain point.⁸
Discontinuous functions
If a function is discontinuous at a point, i.e. there is a point at which the function is not
continuous, then the derivative will not exist at that point as the next example
illustrates.
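8 See, for example, Section 2.8 of Binmore and Davies (2002) for a discussion of some similar cases.
Example 3.21 Show that the derivative of the function
$$f(x) = \begin{cases} 1 & x \geq 0 \\ -1 & x < 0 \end{cases}$$
does not exist when $x = 0$.
This function is illustrated in Figure 3.5(a) and its derivative at $x = 0$, if it exists, would be given by
$$f'(0) = \lim_{h \to 0} \frac{f(h) - f(0)}{h}.$$
However, here we cannot just find
$$\frac{f(h) - f(0)}{h},$$
and let $h \to 0$ as we did in Section 3.1 since the value of $f(h)$ is different depending on whether $h$ is positive or negative. In such cases, we say that the limit we seek, i.e.
$$\lim_{h \to 0} \frac{f(h) - f(0)}{h},$$
exists if, firstly, both of the one-sided limits
$$\lim_{h \to 0^-} \frac{f(h) - f(0)}{h} \quad\text{and}\quad \lim_{h \to 0^+} \frac{f(h) - f(0)}{h},$$
exist⁹ and, secondly, if they exist, they must be equal. But, using the given function,
we see that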
(bearing in mind that $f(0) = 1$)
$$\lim_{h \to 0^-} \frac{f(h) - f(0)}{h} = \lim_{h \to 0^-} \frac{(-1) - 1}{h} = \lim_{h \to 0^-} \frac{-2}{h} = \infty,$$
and
$$\lim_{h \to 0^+} \frac{f(h) - f(0)}{h} = \lim_{h \to 0^+} \frac{(1) - 1}{h} = \lim_{h \to 0^+} 0 = 0,$$
i.e. the first of these limits does not exist as $\infty$ is not a value¹⁰ but more of a notational convenience which tells us that a function is getting arbitrarily large in the limit. Consequently, we see that
$$f'(0) = \lim_{h \to 0} \frac{f(h) - f(0)}{h},$$
fails to exist too and so the derivative of this function does not exist at $x = 0$.
Of course, the graph of a function can also have a discontinuity due to the presence of a vertical asymptote. In such cases, the function is not actually defined at the value of $x$ where the asymptote occurs and so, because of this, the derivative cannot exist at this point either.¹¹ In both of these cases, as we can't ascribe a gradient to the function at these points, the function can't have a tangent line at these points.
9 Notice that the former limit allows us to deal with negative $h$ and the latter allows us to deal with positive $h$. Also recall that the notation $h \to 0^-$ and $h \to 0^+$ was explained in Example 2.2.
10 That is, it is not a real number.
11 We'll come across this again in Section 4.4.3.
Figure 3.5: The graphs of three functions that have no derivative at $x = 0$ as explained in (a) Example 3.21, (b) Example 3.22 and (c) Example 3.23. The three panels show (a) $y = 1$ for $x \geq 0$ and $y = -1$ for $x < 0$, (b) $y = |x|$ and (c) $y = x^{1/3}$. We note however that, unlike the functions in (a) and (b), the function in (c) does have a tangent line at $x = 0$ given by the vertical line with equation $x = 0$.
Continuous functions with corners
But, even if a function is continuous at every point, the derivative will not exist at
points where the curve changes too sharply, i.e. when the curve has a corner, as the
next example illustrates.
Example 3.22 Show that the derivative of the function $f(x) = |x|$ does not exist when $x = 0$.
This function is illustrated in Figure 3.5(b) and, clearly, as the function is the continuous straight line $f(x) = -x$ when $x < 0$ and $f(x) = x$ when $x > 0$, its derivative is defined and equal to $-1$ when $x < 0$ and $1$ when $x > 0$. However, when $x = 0$, the function has a corner and its derivative, if it exists, would be given by
$$f'(0) = \lim_{h \to 0} \frac{f(h) - f(0)}{h}.$$
As in Example 3.21, this limit exists if, firstly, both of the one-sided limits
$$\lim_{h \to 0^-} \frac{f(h) - f(0)}{h} \quad\text{and}\quad \lim_{h \to 0^+} \frac{f(h) - f(0)}{h},$$
exist¹² and, secondly, if they exist, they must be equal. But, using the given function, we see that
$$\lim_{h \to 0^-} \frac{f(h) - f(0)}{h} = \lim_{h \to 0^-} \frac{(-h) - 0}{h} = \lim_{h \to 0^-} (-1) = -1,$$
and
$$\lim_{h \to 0^+} \frac{f(h) - f(0)}{h} = \lim_{h \to 0^+} \frac{h - 0}{h} = \lim_{h \to 0^+} 1 = 1,$$
i.e. both of these limits exist, but they are clearly not equal. Consequently, we see that
$$f'(0) = \lim_{h \to 0} \frac{f(h) - f(0)}{h},$$
fails to exist and so the derivative of this function does not exist at $x = 0$.
Observe that, in this case, the limits as $h \to 0^+$ and as $h \to 0^-$ both exist, but the problem occurs because they are not equal and so we cannot ascribe a value to the derivative (i.e. the limit as $h \to 0$) in such situations. In particular, as this means that we can't ascribe a gradient to $f$ at this point, the function can't have a tangent line here either.
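The disagreement between the two one-sided limits is easy to see numerically. In the Python sketch below, which uses the hypothetical helper `one_sided_quotients`, we evaluate the difference quotient of $f(x) = |x|$ at $x = 0$ for some small positive and negative values of $h$.

```python
def one_sided_quotients(f, a, steps):
    """Difference quotients (f(a + h) - f(a)) / h for each h in steps."""
    return [(f(a + h) - f(a)) / h for h in steps]


right = one_sided_quotients(abs, 0.0, [0.1, 0.01, 0.001])
left = one_sided_quotients(abs, 0.0, [-0.1, -0.01, -0.001])
print(right)   # quotients for h > 0
print(left)    # quotients for h < 0
```

However small we make $h$, the quotients for positive $h$ and for negative $h$ stay at two different values, so there is no single value that they both approach.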
Continuous functions with vertical tangent lines
Also, if a function is continuous at every point, the derivative will not exist at points
where the gradient of the curve becomes infinite, i.e. when the curve has a vertical
tangent line, as the next example illustrates.
Example 3.23
when x = 0.
Show that the derivative of the function f (x) = x1/3 does not exist
This function is illustrated in Figure 3.5(c) and, clearly, we can see that its
derivative is given by
$$f'(x) = \tfrac{1}{3} x^{-2/3} = \frac{1}{3x^{2/3}},$$
which exists as long as $x \neq 0$. Of course, when $x = 0$, the derivative cannot exist
since, if we were to use this formula, we would have to divide by zero and this is
never allowed. However, we can see from Figure 3.5(c) that the graph of the function
has a vertical tangent line at $x = 0$ which is given by the vertical line with equation
$x = 0$ (see footnote 13). Thus, we have a situation where the derivative of the function does not
exist at $x = 0$, but it does have a tangent line at that point.
Observe that, in cases where the tangent line to $f$ at a point is a vertical line we cannot
use (3.1) to find its equation as its derivative is not defined (see footnote 14).
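To see what an 'infinite gradient' looks like numerically, the sketch below evaluates the difference quotient of $f(x) = x^{1/3}$ at $a = 0$ for smaller and smaller $h$. The helper names `cube_root` and `difference_quotient` are our own illustrative choices, and `cube_root` is written so that it also works for negative inputs.

```python
import math

def cube_root(x):
    """Real cube root, valid for negative x as well."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def difference_quotient(f, a, h):
    """Gradient of the chord of f between a and a + h."""
    return (f(a + h) - f(a)) / h

# Step sizes h = 0.1, 0.01, ..., 0.000001
steps = [10.0 ** -k for k in range(1, 7)]
right = [difference_quotient(cube_root, 0.0, h) for h in steps]
left = [difference_quotient(cube_root, 0.0, -h) for h in steps]

for h, q_left, q_right in zip(steps, left, right):
    print(h, q_left, q_right)
```

Unlike the corner in Example 3.22, the quotients from both sides agree in sign here, but they grow without bound as $h$ shrinks, so no finite limit exists.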
12 Again, as in Example 3.21, the former limit allows us to deal with negative h and the latter allows us to deal with positive h.
13 Notice that the tangent lines of the function are getting steeper as we move towards x = 0 on the left and shallower as we move away from x = 0 on the right.
14 We'll come across this again in Section 4.4.3.
3. Differentiation
3.4
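We have seen that the first derivative of a function, $f$, can allow us to find a linear
approximation to $f$ around $a$ by using the formula
$$f(x) \approx f(a) + (x - a) f'(a).$$
More generally, if we also use the higher-order derivatives of $f$ at $a$, we have
$$f(x) = f(a) + (x - a) f'(a) + \frac{(x - a)^2}{2!} f''(a) + \cdots + \frac{(x - a)^n}{n!} f^{(n)}(a) + \cdots, \qquad (3.2)$$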
which is called the Taylor series for $f(x)$ about $x = a$ (see footnote 15). You will notice that the
right-hand-side of this formula is an infinite series and, for reasons beyond the scope of
this course, there will generally be conditions that depend on $f$ and $a$ that determine
whether this infinite series does indeed give us the value of $f(x)$ that we expect to get
on the left-hand-side. For now, we just note that these conditions can be used to find a
set of values of $x$, that includes the point $x = a$, for which the formula works. Of course,
if the value of $x$ in question does not lie in this set, the formula does not work!
In this course, we will often just use the first few terms from the Taylor series to get an
approximate value of $f(x)$ (see footnote 16). And, as long as we are considering what this formula tells
us about $f(x)$ when $x$ is close to $a$, these approximations will generally be more than
adequate. For instance, if we take $n = 1$ in this formula, i.e. if we take the first two
terms of the Taylor series, we recover our linear approximation to $f$ around $a$ and, if we
take $n = 2$, we get
$$f(x) \approx f(a) + (x - a) f'(a) + \frac{(x - a)^2}{2!} f''(a),$$
which is now a quadratic approximation to $f$ around $a$. Indeed, we have seen how useful
the linear approximation is in Section 3.3.2 and the quadratic approximation will be
useful in the next chapter.
3.4.1 Maclaurin series
Let's start with the relatively simple case of a Maclaurin series which is what we call a
Taylor series about $x = 0$. That is, the Maclaurin series of the function $f(x)$ is found by
setting $a = 0$ in (3.2) to get
$$f(x) = f(0) + x f'(0) + \frac{x^2}{2!} f''(0) + \cdots + \frac{x^n}{n!} f^{(n)}(0) + \cdots. \qquad (3.3)$$
To see how this works, let's start by finding a simple Maclaurin series.
15 See, for instance, Section 2.13 of Binmore and Davies (2002) for an explanation of where this formula comes from.
16 It will be an approximation since, if we only keep the first few terms from the beginning of the series, we lose all the information about the value of f(x) that is contained in the terms we are neglecting.
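Example 3.24 Find the Maclaurin series for $e^x$.
Here we have $f(x) = e^x$ so that $f(0) = 1$. We also note that the first three
derivatives of this function are
$$f'(x) = e^x, \quad f''(x) = e^x \quad\text{and}\quad f'''(x) = e^x.$$
Indeed, it should be clear that $f^{(n)}(x) = e^x$ for all $n \geq 1$. Then, to use these in (3.3),
we need to evaluate these derivatives at $x = 0$, i.e. we find that
$$f'(0) = e^0 = 1, \quad f''(0) = e^0 = 1 \quad\text{and}\quad f'''(0) = e^0 = 1.$$
Indeed, it should be clear that $f^{(n)}(0) = e^0 = 1$ for all $n \geq 1$. Consequently, putting
this into (3.3), we get
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!} + \cdots,$$
which is the Maclaurin series for $e^x$.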
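Example 3.25 Find the Maclaurin series for $(1 + x)^r$.
Here we have $f(x) = (1 + x)^r$ so that $f(0) = 1$. We also note that the first three
derivatives of this function are
$$f'(x) = r(1 + x)^{r-1}, \quad f''(x) = r(r - 1)(1 + x)^{r-2} \quad\text{and}\quad f'''(x) = r(r - 1)(r - 2)(1 + x)^{r-3}.$$
Indeed, it should be clear that
$$f^{(n)}(x) = r(r - 1) \cdots (r - [n - 1])(1 + x)^{r-n},$$
for all $n \geq 1$. Then, to use these in (3.3), we need to evaluate these derivatives at
$x = 0$, i.e. we find that
$$f'(0) = r, \quad f''(0) = r(r - 1), \quad f'''(0) = r(r - 1)(r - 2),$$
and, in general, $f^{(n)}(0) = r(r - 1) \cdots (r - [n - 1])$ for all $n \geq 1$. Consequently, putting
these into (3.3), we get
$$(1 + x)^r = 1 + rx + \frac{r(r - 1)}{2!} x^2 + \cdots + \frac{r(r - 1) \cdots (r - [n - 1])}{n!} x^n + \cdots.$$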
In particular, notice that if $r \in \mathbb{Q}$ but $r \notin \mathbb{N}$, this is always an infinite series as, for any
$n \in \mathbb{N}$, we will find that $r - [n - 1] \neq 0$. However, if $r \in \mathbb{N}$, we will find a value of $n$,
namely $n = r + 1$ that makes $r - [n - 1] = 0$ and this will mean that all of the terms
with $n \geq r + 1$ will be zero, i.e. the Maclaurin series will be finite and will terminate at
the term where $n = r$. This is a very special Maclaurin series that you may have
encountered before as the binomial theorem and we look at some examples of this
special case in Activity 3.18.
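The difference between the terminating and non-terminating cases can be seen by computing the coefficients $r(r - 1) \cdots (r - [n - 1])/n!$ exactly. The following is a minimal sketch using Python's `fractions` module; the function name `binomial_series_coefficients` is our own illustrative choice.

```python
from fractions import Fraction

def binomial_series_coefficients(r, order):
    """Coefficients c_0, ..., c_order of the Maclaurin series for (1 + x)^r,
    where c_n = r(r - 1)...(r - [n - 1]) / n!."""
    coefficients = [Fraction(1)]
    for n in range(1, order + 1):
        # Each coefficient is the previous one times (r - [n - 1]) / n
        coefficients.append(coefficients[-1] * (Fraction(r) - (n - 1)) / n)
    return coefficients

natural_case = binomial_series_coefficients(3, 6)                # r = 3
rational_case = binomial_series_coefficients(Fraction(1, 2), 4)  # r = 1/2

print([str(c) for c in natural_case])
print([str(c) for c in rational_case])
```

With $r = 3$ every coefficient from $n = 4$ onwards is zero, whereas with $r = \frac{1}{2}$ none of the computed coefficients vanish.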
Activity 3.18 Use the Maclaurin series for $(1 + x)^r$ which we found in
Example 3.25 to find $(1 + x)^2$ and $(1 + x)^3$.
As well as the two Maclaurin series derived in Examples 3.24 and 3.25, you should also
remember the following:
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots + (-1)^n \frac{x^{2n+1}}{(2n + 1)!} + \cdots \quad\text{for } x \in \mathbb{R},$$
$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots + (-1)^n \frac{x^{2n}}{(2n)!} + \cdots \quad\text{for } x \in \mathbb{R},$$
$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^{n+1} \frac{x^n}{n} + \cdots \quad\text{for } |x| < 1.$$
In particular, observe how these series differ in their first term, the presence of terms of
odd and even degree and the absence of factorials in the series for $\ln(1 + x)$.
Example 3.26 Write down the second and fourth-order Maclaurin series for $\cos x$.
As we saw above, the Maclaurin series for $\cos x$ is given by the infinite series
$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots + (-1)^n \frac{x^{2n}}{(2n)!} + \cdots.$$
As such, the second-order Maclaurin series for $\cos x$ is
$$1 - \frac{x^2}{2!},$$
which, since there is no $x^3$ term in the Maclaurin series for $\cos x$, is also the
third-order Maclaurin series for $\cos x$. Similarly, the fourth-order Maclaurin series for
$\cos x$ is
$$1 - \frac{x^2}{2!} + \frac{x^4}{4!},$$
which, since there is no $x^5$ term in the Maclaurin series for $\cos x$, is also the
fifth-order Maclaurin series for $\cos x$.
These nth-order Maclaurin series can be used to approximate a function, f (x), for
values of x close to x = 0. In general, there are two factors that determine how accurate
this approximation will be, namely
the value of x we are considering: the closer this value of x is to x = 0, the better
the approximation will be, and
the order of the Maclaurin series we use: the more terms we keep, the better the
approximation will be.
The precise way of determining the accuracy of such approximations in terms of these
two factors will be dealt with in 176 Further Calculus where you will encounter Taylor's
theorem. But, we can see how it works and begin to see how these factors affect the
accuracy of our approximations by considering some examples.
Example 3.27 Use the fourth-order Maclaurin series for $\cos x$ to find an
approximate value for $\cos 1$ and $\cos 2$.
The fourth-order Maclaurin series for $\cos x$ is
$$1 - \frac{x^2}{2!} + \frac{x^4}{4!}.$$
So, taking $x = 1$, we see that
$$\cos 1 \approx 1 - \frac{1^2}{2} + \frac{1^4}{24} = \frac{13}{24},$$
which is 0.5417 to 4dp. Using a calculator we see that the true value of $\cos 1$ is 0.5403
to 4dp and so this is a good approximation as, to 2dp, it gives us 0.54 either way.
Similarly, taking $x = 2$, we see that
$$\cos 2 \approx 1 - \frac{2^2}{2} + \frac{2^4}{24} = 1 - 2 + \frac{2}{3} = -\frac{1}{3},$$
which is $-0.3333$ to 4dp. Using a calculator we see that the true value of $\cos 2$ is
$-0.4161$ to 4dp and so this is a poor approximation as it isn't even accurate to 1dp.
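The arithmetic in this example is easy to reproduce. The sketch below (with `cos_maclaurin` being our own illustrative helper, not a library function) sums the Maclaurin series for $\cos x$ up to a chosen order and compares the result with the value from Python's `math.cos`.

```python
import math

def cos_maclaurin(x, order):
    """Sum the Maclaurin series for cos x up to and including the x**order term."""
    total = 0.0
    for n in range(order // 2 + 1):
        total += (-1) ** n * x ** (2 * n) / math.factorial(2 * n)
    return total

for x in (1.0, 2.0):
    approximation = cos_maclaurin(x, 4)
    exact = math.cos(x)
    print(f"x = {x}: fourth-order {approximation:.4f}, "
          f"true value {exact:.4f}, error {abs(approximation - exact):.4f}")
```

Calling `cos_maclaurin(x, 2)` instead gives the second-order approximation, so the same function can be used to compare the effect of keeping fewer terms.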
But, of course, we should expect our approximations to be poor if we move too far away
from x = 0 as, by definition, the Maclaurin series represents how the function is
behaving around x = 0. To see this, consider the curves in Figure 3.6 which illustrate
how the fourth-order Maclaurin series for cos x becomes less accurate at approximating
the function as we move away from x = 0.
The other way in which the accuracy of our approximation to a function can be affected
is the number of terms we take in the Maclaurin series. For instance, the second-order
Maclaurin series for cos x contains less information about the function than the
fourth-order one and so we would expect this to give us a worse approximation. This
can be seen in Figure 3.7, which illustrates how the second-order Maclaurin series is
even less accurate than the fourth-order one as we move away from x = 0.
Figure 3.6: The solid curve is the graph of the function cos x and the dashed curve is the
graph of the fourth-order Maclaurin series for this function. Observe how the Maclaurin
series moves away from the function as we take values of x further away from x = 0.
Using Maclaurin series to approximate other functions
We now look at some ways of finding Maclaurin series for more complicated functions
and see how we can use these to find approximations.
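Example 3.28 Find the Maclaurin series for $x e^x$.
There are two ways to do this. We could use (3.3) to see that as $f(x) = x e^x$ we have
$f(0) = 0$ and then, using what we found in Activity 3.15 above, i.e.
$$f'(x) = (1 + x) e^x, \quad f''(x) = (2 + x) e^x, \quad f'''(x) = (3 + x) e^x, \quad f^{(4)}(x) = (4 + x) e^x, \quad \ldots$$
we see that
$$f'(0) = 1, \quad f''(0) = 2, \quad f'''(0) = 3, \quad f^{(4)}(0) = 4, \quad \ldots$$
and so
$$x e^x = x(1) + \frac{x^2}{2!}(2) + \frac{x^3}{3!}(3) + \frac{x^4}{4!}(4) + \cdots = x + x^2 + \frac{x^3}{2} + \frac{x^4}{6} + \cdots.$$
Alternatively, using the Maclaurin series for $e^x$ from Example 3.24, i.e.
$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots,$$
we can see that
$$x e^x = x \left( 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \cdots \right) = x + x^2 + \frac{x^3}{2} + \frac{x^4}{6} + \cdots,$$
as before.
Figure 3.7: The solid curve is the graph of the function cos x, the dotted curve is the graph
of its second-order Maclaurin series and the dashed curve is the graph of its fourth-order
Maclaurin series. Observe how the former less accurately tracks the function than the
latter as we take values of x further away from x = 0.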
Example 3.29 Find the fourth-order Maclaurin series for $\cos(\ln(1 + x))$.
Here we have $f(x) = \cos(\ln(1 + x))$ which is a composition where $f(x) = \cos y$ with
$y = \ln(1 + x)$. So we need to look at the Maclaurin series for $\cos y$ which is given by
$$\cos y = 1 - \frac{y^2}{2!} + \frac{y^4}{4!} - \cdots,$$
and $y$, in turn, will be given by the Maclaurin series for $\ln(1 + x)$, i.e.
$$y = \ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots.$$
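So, substituting our series for $y$ into our series for $\cos y$, we can see that
$$f(x) = 1 - \frac{1}{2!} \underbrace{\left( x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots \right)^2}_{A} + \frac{1}{4!} \underbrace{\left( x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots \right)^4}_{B} - \cdots,$$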
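and we start by looking at how the terms $A$ and $B$ contribute to the series if we are
only interested in terms up to $x^4$. For $A$, we have
$$A = \left( x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots \right)^2 = \left( x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots \right) \left( x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots \right),$$
so we can multiply each term in the second bracket by the appropriate terms in the
first bracket (taking care to include cross-terms) to get
$$A = (x)(x) - 2 \left( \frac{x^2}{2} \right) (x) + 2 \left( \frac{x^3}{3} \right) (x) + \left( \frac{x^2}{2} \right) \left( \frac{x^2}{2} \right) + \cdots = x^2 - x^3 + \frac{11}{12} x^4 + \cdots,$$
where '$\cdots$' indicates terms we can ignore because their degree is greater than four.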
Similarly, for $B$, we have
$$B = \left( x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots \right)^4,$$
which is the bracket $\left( x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots \right)$
multiplied by itself four times. The terms which arise from this product are obtained
by multiplying together four objects, one from each occurrence of the bracketed
expression. Since the term with lowest power of $x$ in each bracket is $x$, it is only by
taking the $x$ from each bracket that we obtain a term which is at most $x^4$ and so we
get
$$B = x^4 + \cdots,$$
where '$\cdots$' indicates terms we can ignore because their degree is greater than four.
Of course, using similar reasoning, we can see that there will be no further terms for
our series as the next term in the $\cos y$ series (i.e. the first one we omitted above) is
$-y^6/6!$ and the smallest term this can yield looks like $x^6$ whose degree is greater
than four.
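Therefore, putting this all together, we have
$$f(x) = 1 - \frac{A}{2!} + \frac{B}{4!} - \cdots = 1 - \frac{1}{2} \left( x^2 - x^3 + \frac{11}{12} x^4 + \cdots \right) + \frac{1}{24} \left( x^4 + \cdots \right) - \cdots = 1 - \frac{x^2}{2} + \frac{x^3}{2} - \frac{5}{12} x^4 + \cdots,$$
and this gives us the fourth-order Maclaurin series for $\cos(\ln(1 + x))$ as we have kept
all of the terms up to $x^4$.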
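The bookkeeping in Example 3.29 (multiplying brackets and discarding terms of degree greater than four) can be mirrored with exact fractions. The following is only a sketch of that process; the names `ORDER`, `multiply`, `A`, `B` and `series` are our own, and series are stored as lists of coefficients of $1, x, x^2, x^3, x^4$.

```python
from fractions import Fraction
from math import factorial

ORDER = 4  # keep terms up to x**4

def multiply(p, q):
    """Multiply two coefficient lists, discarding terms of degree above ORDER."""
    product = [Fraction(0)] * (ORDER + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= ORDER:
                product[i + j] += a * b
    return product

# y = ln(1 + x) = x - x^2/2 + x^3/3 - x^4/4 + ...
y = [Fraction(0)] + [Fraction((-1) ** (n + 1), n) for n in range(1, ORDER + 1)]

A = multiply(y, y)  # the squared bracket
B = multiply(A, A)  # the bracket raised to the fourth power

# cos y = 1 - y^2/2! + y^4/4! - ... (later terms only contribute beyond x^4)
series = [Fraction(0)] * (ORDER + 1)
series[0] = Fraction(1)
for k in range(ORDER + 1):
    series[k] += -A[k] / factorial(2) + B[k] / factorial(4)

print("A:", [str(c) for c in A])
print("B:", [str(c) for c in B])
print("cos(ln(1 + x)):", [str(c) for c in series])
```

Reading the final list from left to right gives the coefficients of $1, x, x^2, x^3$ and $x^4$ in the fourth-order Maclaurin series.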
Activity 3.19 Find the fourth-order Maclaurin series for cos(ln(1 + x)) by using
the definition and differentiation to verify the answer we found in Example 3.29.
(Notice that it is harder to work it out using this method!)
Once we have the Maclaurin series of a function, f (x), we can use it to estimate the
value of the function at some value of x close to zero as we did above.
Example 3.30 Use the Maclaurin series we found in Example 3.29 to find an
approximate value for $\cos(\ln 1.1)$ and $\cos(\ln 1.9)$.
To find an approximate value for $\cos(\ln 1.1)$, we use the Maclaurin series above to
get the approximation
$$\cos(\ln(1 + x)) \approx 1 - \frac{x^2}{2} + \frac{x^3}{2} - \frac{5}{12} x^4,$$
and then take $x = 0.1$ to get
$$\cos(\ln 1.1) = \cos(\ln(1 + 0.1)) \approx 1 - \frac{0.1^2}{2} + \frac{0.1^3}{2} - \frac{5}{12} 0.1^4 = 1 - 0.005 + 0.0005 - 0.000042,$$
which is 0.995458 to 6dp. In passing we note that, using a calculator, the true value
is 0.995461 to 6dp and so this is a good approximation as, to 5dp, it gives us 0.99546
either way.
To find an approximate value for $\cos(\ln 1.9)$, we use the approximation above with
$x = 0.9$ to get
$$\cos(\ln 1.9) = \cos(\ln(1 + 0.9)) \approx 1 - \frac{0.9^2}{2} + \frac{0.9^3}{2} - \frac{5}{12} 0.9^4 = 1 - 0.405 + 0.3645 - 0.273375,$$
which is 0.686125 to 6dp. In passing we note that, using a calculator, the true value
is 0.800987 to 6dp and so this is a poor approximation as it isn't even accurate to
1dp.
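These two calculations can be checked with a few lines of code. In the sketch below, `fourth_order` is our own name for the fourth-order Maclaurin polynomial found in Example 3.29, and the 'true' values come from Python's `math` module.

```python
import math

def fourth_order(x):
    """Fourth-order Maclaurin polynomial for cos(ln(1 + x)) from Example 3.29."""
    return 1 - x**2 / 2 + x**3 / 2 - 5 * x**4 / 12

for x in (0.1, 0.9):
    approximation = fourth_order(x)
    exact = math.cos(math.log(1 + x))
    print(f"x = {x}: approximation {approximation:.6f}, "
          f"true value {exact:.6f}, error {abs(approximation - exact):.6f}")
```

The printed errors make the deterioration between $x = 0.1$ and $x = 0.9$ easy to see.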
Observe that this approximation has deteriorated much more quickly than the one we
used when considering approximate values of $\cos x$ in Example 3.27. We won't pursue
the nature of this sensitivity here, but we do reiterate that we should expect our
approximations to be poor if we move too far away from $x = 0$ for, as we have seen, the
Maclaurin series is there to represent how the function is behaving around $x = 0$.
3.4.2 Taylor series
We now briefly consider what happens when we are looking for the Taylor series for
$f(x)$ around $x = a$ when $a \neq 0$. In this case, we follow the general method outlined
above, but now we have to use (3.2), i.e.
$$f(x) = f(a) + (x - a) f'(a) + \frac{(x - a)^2}{2!} f''(a) + \cdots + \frac{(x - a)^n}{n!} f^{(n)}(a) + \cdots.$$
Example 3.31 Find the Taylor series for $e^x$ around $x = 1$.
Here we have $f(x) = e^x$ so that $f(1) = e$. We also note, as in Example 3.24, that
$f^{(n)}(x) = e^x$ for $n \geq 1$. Then, to use these derivatives in (3.2), we need to evaluate
them at $x = 1$, i.e. we find that $f^{(n)}(1) = e$ for $n \geq 1$. Consequently, putting this into
(3.2), we get
$$e^x = e + (x - 1) e + \frac{(x - 1)^2}{2!} e + \frac{(x - 1)^3}{3!} e + \cdots + \frac{(x - 1)^n}{n!} e + \cdots.$$
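For (a), using the second-order Maclaurin series for $e^x$, i.e. $1 + x + \frac{x^2}{2!}$, we find that
$$e^{1.1} \approx 1 + 1.1 + \frac{1.1^2}{2!} = 2.705.$$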
For (b), we know from Example 3.31 that the second-order Taylor series for $e^x$
around $x = 1$ is given by
$$e + (x - 1) e + \frac{(x - 1)^2}{2!} e,$$
and, using this, we find that
$$e^{1.1} \approx e + (1.1 - 1) e + \frac{(1.1 - 1)^2}{2!} e = 1.105 e,$$
which, if we know the value of $e$, gives us 3.0037 (to 4dp). This agrees with the
exact value of $e^{1.1}$ to 3dp.
As we should expect, the answer to (b) gives us a better approximation to $e^{1.1}$ than
the one we found in (a) since $x = 1.1$ is closer to $x = 1$ than it is to $x = 0$. But, on
the other hand, the answer to (a) didn't require us to have any accurate knowledge
of the value of $e$ itself!
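The comparison between expanding about $x = 0$ and about $x = 1$ can be reproduced numerically. This is a sketch only: `exp_taylor` is our own helper for the Taylor polynomial of $e^x$ about $x = a$, and we assume second-order polynomials in both cases, as in Figure 3.8.

```python
import math

def exp_taylor(x, a, order):
    """Taylor polynomial of e^x about x = a, up to the (x - a)**order term.
    Every derivative of e^x at a is e^a, so each term is e^a (x - a)^n / n!."""
    return sum(math.exp(a) * (x - a) ** n / math.factorial(n)
               for n in range(order + 1))

exact = math.exp(1.1)
maclaurin_estimate = exp_taylor(1.1, 0.0, 2)  # expansion about x = 0
taylor_estimate = exp_taylor(1.1, 1.0, 2)     # expansion about x = 1

print(f"about x = 0: {maclaurin_estimate:.4f} (error {abs(maclaurin_estimate - exact):.4f})")
print(f"about x = 1: {taylor_estimate:.4f} (error {abs(taylor_estimate - exact):.4f})")
```

Note that the second estimate relies on `math.exp(1.0)`, i.e. on already knowing the value of $e$, which is exactly the trade-off described above.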
Following on from this example, as we can see in Figure 3.8, we observe that the
Maclaurin series for $e^x$ is most accurate when $x$ is close to $x = 0$ whereas the Taylor
series for $e^x$ about $x = 1$ is most accurate when $x$ is close to $x = 1$. This is, of course,
exactly what we should expect!
Figure 3.8: The solid curve is the graph of the function $e^x$, the dashed curve is the graph
of its second-order Maclaurin series and the dotted curve is the graph of its second-order
Taylor series about x = 1. Observe how, as we might expect, the Maclaurin series is more
accurate around x = 0 and this Taylor series is more accurate around x = 1.
Learning outcomes
At the end of this chapter and having completed the relevant reading and activities, you
should be able to:
find simple derivatives using the definition of the derivative;
find derivatives using standard derivatives and the rules of differentiation;
use the derivative to find tangent lines and use these to approximate functions;
solve problems from economics-based subjects that involve derivatives;
find Maclaurin and Taylor series and use these to approximate functions.
Solutions to activities
Solution to activity 3.1
We need to find the derivative of the function $f(x) = x^2$ at the point $x = -1$, i.e.
$f'(-1)$. So, using the definition of the derivative with $a = -1$, we start by looking at
$$\frac{f(-1 + h) - f(-1)}{h} = \frac{(-1 + h)^2 - (-1)^2}{h},$$
which, looking at the numerator, is easily simplified to give
$$\frac{f(-1 + h) - f(-1)}{h} = \frac{(1 - 2h + h^2) - 1}{h} = \frac{-2h + h^2}{h} = -2 + h.$$
This in turn means that
$$f'(-1) = \lim_{h \to 0} \frac{f(-1 + h) - f(-1)}{h} = \lim_{h \to 0} (-2 + h) = -2.$$
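$$\frac{d}{dx} (3 \cos x) = 3 \frac{d}{dx} (\cos x) = 3(-\sin x) = -3 \sin x.$$
$$\frac{d}{dx} (e^x + \cos x) = \frac{d}{dx} (e^x) + \frac{d}{dx} (\cos x) = e^x + (-\sin x) = e^x - \sin x.$$
$$\frac{d}{dx} (3 \sin x - 3 \ln x) = 3 \frac{d}{dx} (\sin x) - 3 \frac{d}{dx} (\ln x) = 3 \cos x - 3 \left( \frac{1}{x} \right) = 3 \cos x - \frac{3}{x}.$$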
and
g(x) = sin x,
f (x) = 1
and
g (x) = cos x.
and
g(x) = cos x,
and
g (x) = sin x.
and
g(x) = cos x,
and
g (x) = sin x.
$$\frac{d}{dx} \left( \frac{1}{2} \sin(2x) \right) = \cos(2x), \quad\text{i.e.}\quad \frac{d}{dx} \big( \sin(2x) \big) = 2 \cos(2x).$$
This result will make sense once we have seen the chain rule and, in particular,
Activity 3.8(a).
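Solution to activity 3.6
For (a), $h(x) = \dfrac{\sin x}{x}$ is the quotient of the two functions
$$f(x) = \sin x \quad\text{and}\quad g(x) = x,$$
which give us
$$f'(x) = \cos x \quad\text{and}\quad g'(x) = 1,$$
and so the quotient rule tells us that
$$h'(x) = \frac{(\cos x)(x) - (\sin x)(1)}{x^2} = \frac{x \cos x - \sin x}{x^2}.$$
In this case, the original function and the derivative are only defined if $x \neq 0$.
For (b), $h(x) = \dfrac{e^x}{\cos x}$ is the quotient of the two functions
$$f(x) = e^x \quad\text{and}\quad g(x) = \cos x,$$
which give us
$$f'(x) = e^x \quad\text{and}\quad g'(x) = -\sin x,$$
and so the quotient rule tells us that
$$h'(x) = \frac{(e^x)(\cos x) - (e^x)(-\sin x)}{\cos^2 x} = \frac{e^x (\cos x + \sin x)}{\cos^2 x}.$$
In this case, the original function and the derivative are only defined if $\cos x \neq 0$, i.e. if
$x \neq (2n + 1) \frac{\pi}{2}$ for $n \in \mathbb{Z}$.
For (c), $h(x) = \dfrac{\sin x}{\cos x}$ is the quotient of the two functions
$$f(x) = \sin x \quad\text{and}\quad g(x) = \cos x,$$
which give us
$$f'(x) = \cos x \quad\text{and}\quad g'(x) = -\sin x,$$
and so the quotient rule tells us that
$$h'(x) = \frac{(\cos x)(\cos x) - (\sin x)(-\sin x)}{\cos^2 x} = \frac{\cos^2 x + \sin^2 x}{\cos^2 x}.$$
In this case, the original function and the derivative are only defined if $\cos x \neq 0$, i.e. if
$x \neq (2n + 1) \frac{\pi}{2}$ for $n \in \mathbb{Z}$.
Indeed, using the Pythagorean identity
$\sin^2 x + \cos^2 x = 1$ and the definitions
$$\tan x = \frac{\sin x}{\cos x} \quad\text{and}\quad \sec x = \frac{1}{\cos x},$$
we see that
$$\frac{d}{dx} (\tan x) = \frac{1}{\cos^2 x} = \sec^2 x.$$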
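Here $h(x) = \sin(2x)$ is the composition of the two functions
$$f(g) = \sin g \quad\text{and}\quad g(x) = 2x.$$
As such we have
$$f'(g) = \cos g \quad\text{and}\quad g'(x) = 2,$$
and so the chain rule tells us that $h'(x) = (\cos g)(2) = 2 \cos(2x)$.
Similarly, $h(x) = \ln(\cos x)$ is the composition of the two functions
$$f(g) = \ln g \quad\text{and}\quad g(x) = \cos x.$$
As such we have
$$f'(g) = \frac{1}{g} \quad\text{and}\quad g'(x) = -\sin x,$$
and so the chain rule tells us that
$$h'(x) = \frac{1}{g} (-\sin x) = -\frac{\sin x}{\cos x} = -\tan x.$$
Lastly, $h(x) = \ln(e^x)$ is the composition of the two functions
$$f(g) = \ln g \quad\text{and}\quad g(x) = e^x.$$
As such we have
$$f'(g) = \frac{1}{g} \quad\text{and}\quad g'(x) = e^x,$$
and so the chain rule tells us that
$$h'(x) = \frac{1}{g} (e^x) = \frac{e^x}{e^x} = 1.$$
Writing $a^x = e^{x \ln a}$, the chain rule gives us
$$\frac{d}{dx} (a^x) = \frac{d}{dx} \left( e^{x \ln a} \right) = e^{x \ln a} \ln a = a^x \ln a,$$
as required.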
Solution to activity 3.10
Writing the quotient $f(x)/g(x)$ as the product $f(x)[g(x)]^{-1}$, the product rule gives us
$$\frac{d}{dx} \left\{ f(x)[g(x)]^{-1} \right\} = \frac{df}{dx} [g(x)]^{-1} + f(x) \left( -[g(x)]^{-2} \frac{dg}{dx} \right),$$
where we have used the chain rule to differentiate $[g(x)]^{-1}$ with respect to $x$. Rewriting
this, we then have
$$\frac{d}{dx} \left\{ \frac{f(x)}{g(x)} \right\} = \frac{g(x) \dfrac{df}{dx} - f(x) \dfrac{dg}{dx}}{[g(x)]^2},$$
which is the quotient rule, as required.
Solution to activity 3.11
We have $y = f(x)$ so that $x = f^{-1}(y)$. Thus, differentiating both sides of the latter with
respect to $x$, we get
$$\frac{dx}{dx} = \frac{df^{-1}}{dy} \frac{dy}{dx},$$
where we have used the chain rule on the right-hand-side as $y$ itself is a function of $x$
since $y = f(x)$. This gives us
$$1 = \frac{df^{-1}}{dy} \frac{dy}{dx} \quad\Longrightarrow\quad \frac{df^{-1}}{dy} = 1 \Big/ \frac{df}{dx},$$
as required (see footnote 17). In particular, observe that this formula makes no sense at points where
$f'(x) = 0$.
17 See Section 2.9 of Binmore and Davies (2002) for a geometric view of this result.
$$\frac{d}{dy} (\ln y) = 1 \Big/ \frac{d}{dx} (e^x) = \frac{1}{e^x} = \frac{1}{y},$$
as $(e^x)' = e^x = y$.
Solution to activity 3.13
There is, generally, no need to apply the rules of differentiation in as much detail as we
have been using. So, let's do the three examples in this activity quickly.
For (a), we have $h(x) = e^{x^2} \ln(\sin x)$ which is the product of two compositions and so
using the product and chain rules we get
$$h'(x) = 2x e^{x^2} \ln(\sin x) + e^{x^2} \frac{\cos x}{\sin x} = \frac{e^{x^2}}{\sin x} \big[ 2x \sin x \ln(\sin x) + \cos x \big].$$
For (b), we have $h(x) = \dfrac{\sin(\cos x)}{e^{\sin x}}$ which is the quotient of two compositions and so
using the quotient and chain rules we get
$$h'(x) = \frac{\cos(\cos x)(-\sin x) e^{\sin x} - \sin(\cos x) e^{\sin x} \cos x}{[e^{\sin x}]^2} = -\frac{\sin x \cos(\cos x) + \cos x \sin(\cos x)}{e^{\sin x}}.$$
For (c), we have $h(x) = \sin^2(3x) + \cos^2(3x)$ which is the sum of two compositions and so
we can easily use the chain rule to see that
$$h'(x) = 2 \sin(3x) \cos(3x)(3) + 2 \cos(3x)[-\sin(3x)](3) = 0.$$
Of course, this is obvious as $\sin^2(3x) + \cos^2(3x) = 1$ using (2.2) and so its derivative
with respect to $x$ is zero.
Solution to activity 3.14
We have seen that the first four derivatives are given by
$$f'(x), \quad f''(x) = -f(x), \quad f'''(x) = -f'(x), \quad f^{(4)}(x) = f(x),$$
which returns us to our original function. Indeed, we can then see that the next four
derivatives will be given by
$$f^{(5)}(x) = f'(x), \quad f^{(6)}(x) = -f(x), \quad f^{(7)}(x) = -f'(x), \quad f^{(8)}(x) = f(x),$$
which, again, returns us to our original function. This means that, spotting the pattern,
we can see that
$$f^{(n)}(x) = \begin{cases} f(x) & n = 4, 8, \ldots \\ f'(x) & n = 1, 5, 9, \ldots \\ -f(x) & n = 2, 6, 10, \ldots \\ -f'(x) & n = 3, 7, 11, \ldots \end{cases}$$
for $n \geq 1$.
Solution to activity 3.18
Using the Maclaurin series from Example 3.25 with $r = 2$, we have
$$(1 + x)^2 = 1 + 2x + \frac{(2)(1)}{2!} x^2 = 1 + 2x + x^2,$$
as all terms involving $x^n$ with $n \geq 3$ will have a coefficient of zero. Similarly, the
Maclaurin series for $(1 + x)^3$ is given by
$$(1 + x)^3 = 1 + 3x + \frac{(3)(2)}{2!} x^2 + \frac{(3)(2)(1)}{3!} x^3 = 1 + 3x + 3x^2 + x^3,$$
as all terms involving $x^n$ with $n \geq 4$ will have a coefficient of zero. Of course, this is
exactly what we would get if we just multiplied out the brackets in the usual way!
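Solution to activity 3.19
To use (3.3), we see that $f(x) = \cos(\ln(1 + x))$ gives
$$f'(x) = -\frac{\sin(\ln(1 + x))}{1 + x}, \quad f''(x) = \frac{\sin(\ln(1 + x)) - \cos(\ln(1 + x))}{(1 + x)^2},$$
$$f'''(x) = \frac{3 \cos(\ln(1 + x)) - \sin(\ln(1 + x))}{(1 + x)^3}, \quad f^{(4)}(x) = -10 \frac{\cos(\ln(1 + x))}{(1 + x)^4},$$
so that $f(0) = 1$ and
$$f'(0) = 0, \quad f''(0) = -1, \quad f'''(0) = 3, \quad f^{(4)}(0) = -10.$$
Putting these into (3.3), we get
$$\cos(\ln(1 + x)) = 1 - \frac{x^2}{2!} + \frac{3x^3}{3!} - \frac{10x^4}{4!} + \cdots = 1 - \frac{x^2}{2} + \frac{x^3}{2} - \frac{5}{12} x^4 + \cdots,$$
and this gives us the fourth-order Maclaurin series for $\cos(\ln(1 + x))$ in agreement with
what we saw before in Example 3.29. Notice, however, that this method involved some
fairly complicated differentiation whereas the method in Example 3.29 only involved
some simple algebra!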
Solution to activity 3.20
For values of $y$ around $y = 0$ we have the Maclaurin series
$$e^y = 1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \cdots + \frac{y^n}{n!} + \cdots,$$
and so, setting $y = x - 1$, we get
$$e^{x-1} = 1 + (x - 1) + \frac{(x - 1)^2}{2!} + \frac{(x - 1)^3}{3!} + \cdots + \frac{(x - 1)^n}{n!} + \cdots,$$
which gives us the Taylor series for $e^{x-1}$ for values of $x$ around $x = 1$. So, as
$e^x = e^1 e^{x-1}$, this means that
$$e^x = e + (x - 1) e + \frac{(x - 1)^2}{2!} e + \frac{(x - 1)^3}{3!} e + \cdots + \frac{(x - 1)^n}{n!} e + \cdots,$$
is the Taylor series for $e^x$ for values of $x$ around $x = 1$ in agreement with what we found
in Example 3.31.
Solution to activity 3.21
To find the Taylor series for $e^x$ around $x = 2$, we can either use (3.2) or the method we
saw in Activity 3.20.
Method I: Using (3.2), we have $f(x) = e^x$ so that $f(2) = e^2$. We also note, as in
Example 3.24, that $f^{(n)}(x) = e^x$ for $n \geq 1$. Then, to use these derivatives in (3.2), we
need to evaluate them at $x = 2$, i.e. we find that $f^{(n)}(2) = e^2$ for $n \geq 1$. Consequently,
putting these into (3.2), we get
$$e^x = e^2 + (x - 2) e^2 + \frac{(x - 2)^2}{2!} e^2 + \frac{(x - 2)^3}{3!} e^2 + \cdots + \frac{(x - 2)^n}{n!} e^2 + \cdots.$$
Method II: For values of $y$ around $y = 0$ we have the Maclaurin series
$$e^y = 1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \cdots + \frac{y^n}{n!} + \cdots,$$
and so, setting $y = x - 2$, we get
$$e^{x-2} = 1 + (x - 2) + \frac{(x - 2)^2}{2!} + \frac{(x - 2)^3}{3!} + \cdots + \frac{(x - 2)^n}{n!} + \cdots,$$
which gives us the Taylor series for $e^{x-2}$ for values of $x$ around $x = 2$. So, as
$e^x = e^2 e^{x-2}$, this means that
$$e^x = e^2 + (x - 2) e^2 + \frac{(x - 2)^2}{2!} e^2 + \frac{(x - 2)^3}{3!} e^2 + \cdots + \frac{(x - 2)^n}{n!} e^2 + \cdots,$$
is the Taylor series for $e^x$ for values of $x$ around $x = 2$ in agreement with what we have
just found using the other method.
Exercises
Exercise 3.1
Find the derivatives of the following functions.
(a) $e^{\sin x} \cos x$, (b) $\dfrac{\tan x}{e^{x^2}}$, (c) $\sin(x e^x)$.
Exercise 3.2
Use the compound-angle formulae to show that
$$\cos x = \sin \left( x + \frac{\pi}{2} \right) \quad\text{and}\quad -\sin x = \cos \left( x + \frac{\pi}{2} \right).$$
Hence use the chain rule to derive the derivative of $\cos x$ from the derivative of $\sin x$.
Exercise 3.3
Verify that the point $(e, e)$ is on the curve with equation
$$y = x \ln x,$$
and find the equation of the tangent line to the curve at this point.
Consider, for some constants $a$ and $b$, the curve with equation
$$y = ax^2 + b.$$
For what values of $a$ and $b$ does this curve pass through the point $(e, e)$ with the same
tangent line as the one you found above?
Exercise 3.4
Suppose the demand function for a good is
$$q^D(p) = \frac{1}{\sqrt{1 + p^4}}.$$
Find the elasticity of demand in terms of $p$ and verify that it is positive if $p > 0$.
Exercise 3.5
Find the fourth-order Maclaurin series for $\ln \left( \dfrac{1 + \sin x}{1 + x} \right)$.
Solutions to exercises
Solution to exercise 3.1
We apply the rules of differentiation quickly as we did in Activity 3.13.
(a) The function $h(x) = e^{\sin x} \cos x$ is a product that has the composition $e^{\sin x}$ as one
of its terms. As such, applying the product rule we get
$$h'(x) = \left( e^{\sin x} \cos x \right) \cos x + e^{\sin x} (-\sin x) = e^{\sin x} \left( \cos^2 x - \sin x \right),$$
where we have used the chain rule to differentiate the composition.
(b) The function $h(x) = (\tan x)/e^{x^2}$ is a quotient whose denominator is the
composition $e^{x^2}$. As such, applying the quotient rule we get
$$h'(x) = \frac{(\sec^2 x) e^{x^2} - (\tan x)(2x e^{x^2})}{[e^{x^2}]^2} = \frac{\sec^2 x - 2x \tan x}{e^{x^2}},$$
where we have used the fact, from Activity 3.6(c), that the derivative of $\tan x$ is
$\sec^2 x$ and the chain rule to differentiate the composition.
Also note that this derivative can be found by writing the function as
$h(x) = (\tan x) e^{-x^2}$ and, if we do this, we would use the product rule instead of the
quotient rule.
(c) The function $h(x) = \sin(x e^x)$ is the composition $\sin x$ after $x e^x$ where the latter
function is a product. As such, applying the chain rule we get
$$h'(x) = \cos(x e^x) \big[ (1) e^x + x(e^x) \big] = (1 + x) e^x \cos(x e^x),$$
where we have used the product rule to differentiate $x e^x$.
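Solution to exercise 3.2
Using the compound-angle formulae, we have
$$\sin \left( x + \frac{\pi}{2} \right) = \sin x \cos \frac{\pi}{2} + \cos x \sin \frac{\pi}{2} = \cos x,$$
and
$$\cos \left( x + \frac{\pi}{2} \right) = \cos x \cos \frac{\pi}{2} - \sin x \sin \frac{\pi}{2} = -\sin x.$$
Now, using the chain rule and the derivative of $\sin x$, we see that
$$\frac{d}{dx} \sin \left( x + \frac{\pi}{2} \right) = \cos \left( x + \frac{\pi}{2} \right) (1) = \cos \left( x + \frac{\pi}{2} \right),$$
and so, using the two results above, the derivative of $\cos x$ is $-\sin x$.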
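Solution to exercise 3.3
The point $(e, e)$ is on the curve $y = x \ln x$ as $e \ln e = e$. Using the product rule, we have
$$\frac{dy}{dx} = (1) \ln x + x \left( \frac{1}{x} \right) = \ln(x) + 1,$$
so the gradient of the curve at $x = e$ is $\ln(e) + 1 = 2$ and this gives us
$$\frac{y - e}{x - e} = 2 \quad\Longrightarrow\quad y - e = 2(x - e) \quad\Longrightarrow\quad y = 2x - e,$$
as the equation of the tangent line to the curve $y = x \ln x$ at the point $(e, e)$.
The curve $y = ax^2 + b$ will have a tangent line at $(e, e)$ which is the same as the one we
have just found if, firstly, the curve goes through the point $(e, e)$, i.e. $a$ and $b$ must satisfy
$$e = a e^2 + b,$$
and, secondly, it has the same gradient at $e$, i.e. if the derivative of $g(x) = ax^2 + b$ at
$x = e$ is two. That is, as
$$g'(x) = 2ax$$
we need
$$g'(e) = 2ae = 2 \quad\Longrightarrow\quad a = \frac{1}{e},$$
and so
$$e = \frac{1}{e} e^2 + b \quad\Longrightarrow\quad b = e - e = 0.$$
Consequently, we see that when $a = 1/e$ and $b = 0$ the curve $y = ax^2 + b$ passes through
the point $(e, e)$ with the same tangent line as the one we found above.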
Solution to exercise 3.4
We have the demand function
$$q^D(p) = \frac{1}{\sqrt{1 + p^4}} = (1 + p^4)^{-\frac{1}{2}},$$
and so, setting $q = q^D(p)$, we can use the chain rule to get the derivative
$$q'(p) = -\frac{1}{2} (1 + p^4)^{-\frac{3}{2}} (4p^3) = -\frac{2p^3}{(1 + p^4)^{\frac{3}{2}}}.$$
Then, using the definition of the elasticity of demand from Section 3.3.3, we have
$$\varepsilon(p) = -\frac{p}{q} q'(p) = -\frac{p}{(1 + p^4)^{-\frac{1}{2}}} \left( -\frac{2p^3}{(1 + p^4)^{\frac{3}{2}}} \right) = \frac{2p^4}{1 + p^4},$$
in terms of $p$. Indeed, when $p > 0$, we have $p^4 > 0$ and $1 + p^4 > 0$, which means that
$\varepsilon(p) > 0$ too.
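Solution to exercise 3.5
We start by noticing that it really is much easier to make use of the standard Maclaurin
series rather than trying to use (3.3) directly on the given function. Especially as, in
order to apply (3.3), we would need to find the first four derivatives of the function to
answer this question and this would get very messy very quickly! Indeed, if we decide to
use the standard Maclaurin series, two methods present themselves.
Method I: We start by simplifying the function by using the laws of logarithms from
Section 2.1.4. This gives us
$$\ln \left( \frac{1 + \sin x}{1 + x} \right) = \ln(1 + \sin x) - \ln(1 + x),$$
and so, we can easily use the Maclaurin series for $\ln(1 + x)$ from Section 3.4.1, i.e.
$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots,$$
to get the second term in this difference. Then, using the Maclaurin series for $\sin x$, also
from Section 3.4.1, we have
$$\sin x = x - \frac{x^3}{3!} + \cdots,$$
which means that the first term in this difference is
$$\begin{aligned} \ln(1 + \sin x) &= \ln \left( 1 + \left[ x - \frac{x^3}{3!} + \cdots \right] \right) \\ &= \left( x - \frac{x^3}{3!} + \cdots \right) - \frac{1}{2} \left( x - \frac{x^3}{3!} + \cdots \right)^2 + \frac{1}{3} \left( x - \cdots \right)^3 - \frac{1}{4} \left( x - \cdots \right)^4 + \cdots, \end{aligned}$$
where we have used the Maclaurin series for $\ln(1 + x)$ again in the second line. Now, as
we want to keep terms up to $x^4$, we can see that the brackets in the second term give us
$$\left( x - \frac{x^3}{3!} + \cdots \right) \left( x - \frac{x^3}{3!} + \cdots \right) = x^2 - 2 (x) \left( \frac{x^3}{3!} \right) + \cdots = x^2 - \frac{x^4}{3} + \cdots,$$
where, here, we're trying to make it clear that each term that arises from this product is
obtained by multiplying out the relevant brackets. Further, we see that the brackets in
the last two terms will give us $x^3$ and $x^4$ respectively. Overall, then, we have
$$\ln(1 + \sin x) = \left( x - \frac{x^3}{3!} \right) - \frac{1}{2} \left( x^2 - \frac{x^4}{3} \right) + \frac{1}{3} x^3 - \frac{1}{4} x^4 + \cdots = x - \frac{x^2}{2} + \frac{x^3}{6} - \frac{x^4}{12} + \cdots,$$
for the first part of our difference. Putting these together in our expression for the
function, we then have
$$\ln \left( \frac{1 + \sin x}{1 + x} \right) = \left( x - \frac{x^2}{2} + \frac{x^3}{6} - \frac{x^4}{12} + \cdots \right) - \left( x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots \right) = -\frac{x^3}{6} + \frac{x^4}{6} + \cdots.$$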
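Method II: We use the Maclaurin series for $\sin x$, i.e.
$$\sin x = x - \frac{x^3}{3!} + \cdots,$$
together with
$$\frac{1}{1 + x} = (1 + x)^{-1} = 1 - x + x^2 - x^3 + x^4 - \cdots,$$
which, with $r = -1$, follows from a simple application of the Maclaurin series for
$(1 + x)^r$ that we saw in Example 3.25. This means that we have
$$\begin{aligned} \frac{1 + \sin x}{1 + x} &= \left( 1 + x - \frac{x^3}{3!} + \cdots \right) \left( 1 - x + x^2 - x^3 + x^4 - \cdots \right) \\ &= (1) \left( 1 - x + x^2 - x^3 + x^4 - \cdots \right) + (x) \left( 1 - x + x^2 - x^3 + \cdots \right) - \frac{x^3}{3!} \left( 1 - x + \cdots \right) + \cdots \\ &= 1 - \frac{x^3}{3!} + \frac{x^4}{3!} + \cdots, \end{aligned}$$
if we want to keep terms up to $x^4$. Then, using the Maclaurin series for $\ln(1 + x)$ which
we saw above, we get
$$\ln \left( \frac{1 + \sin x}{1 + x} \right) = \ln \left( 1 + \left[ -\frac{x^3}{3!} + \frac{x^4}{3!} + \cdots \right] \right) = -\frac{x^3}{3!} + \frac{x^4}{3!} + \cdots,$$
and this gives us the same fourth-order Maclaurin series as the one we found using the
other method.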
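As a cross-check on the algebra, the first method (take the series for $\ln(1 + \sin x)$ and subtract the series for $\ln(1 + x)$) can be carried out with exact fractions. This is only an illustrative sketch: the names `ORDER`, `multiply` and `log1p_of` are our own, and each series is stored as a list of coefficients of $1, x, x^2, x^3, x^4$.

```python
from fractions import Fraction

ORDER = 4  # keep terms up to x**4

def multiply(p, q):
    """Multiply two coefficient lists, discarding terms of degree above ORDER."""
    product = [Fraction(0)] * (ORDER + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= ORDER:
                product[i + j] += a * b
    return product

def log1p_of(u):
    """Series of ln(1 + u) = u - u^2/2 + u^3/3 - u^4/4 + ... for a series u
    with zero constant term, truncated at degree ORDER."""
    result = [Fraction(0)] * (ORDER + 1)
    power = [Fraction(1)] + [Fraction(0)] * ORDER
    for n in range(1, ORDER + 1):
        power = multiply(power, u)  # power now holds u**n
        for k in range(ORDER + 1):
            result[k] += Fraction((-1) ** (n + 1), n) * power[k]
    return result

x = [Fraction(0), Fraction(1), Fraction(0), Fraction(0), Fraction(0)]
sin_x = [Fraction(0), Fraction(1), Fraction(0), Fraction(-1, 6), Fraction(0)]

log_sin = log1p_of(sin_x)  # ln(1 + sin x)
log_x = log1p_of(x)        # ln(1 + x)
difference = [a - b for a, b in zip(log_sin, log_x)]

print("ln(1 + sin x):", [str(c) for c in log_sin])
print("difference:   ", [str(c) for c in difference])
```

The last list gives the coefficients of the fourth-order Maclaurin series for the function in Exercise 3.5.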
Chapter 4
One-variable optimisation
Essential reading
4.1
Having seen how to find derivatives in the previous chapter, we now consider what they
tell us about a function. In particular, we will see that the first-order derivatives of a
function tell us where the function is increasing, stationary or decreasing; and its
second-order derivatives tell us where the function is convex or concave. Indeed, once we
have access to this information about a function we will be able to do two things.
Firstly, we will be able to sketch the curve that represents the graph of a function; and
secondly, we will be able to see where a function is optimised, i.e. we will be able to find
the points where the function takes its largest and smallest values.
4. One-variable optimisation
4.2
4.2.1
Figure 4.1: As x increases, (a) f is increasing as its values get larger and (b) f is decreasing
as its values get smaller. This can also be seen by taking two values of x, say a and b,
such that a < b. In (a), the function is increasing because we have f (a) < f (b) and in (b)
the function is decreasing because we have f (a) > f (b).
However, of more interest here is the fact that we can use derivatives to determine
whether a function is increasing or decreasing over some interval, $I$. To see how this
works, consider that the first-order Taylor approximation to $f(x)$ around $x = a$ is given
by
$$f(x) \approx f(a) + (x - a) f'(a),$$
and to make this a good approximation, we want $x - a$ to be small. So, if we now
consider another value of $x$, say $x = b$, where $b > a$ and $b - a$ is small, we see that this
approximation gives us
$$f(b) \approx f(a) + (b - a) f'(a).$$
Now, $b - a > 0$, so we just need to know the sign of $f'(a)$ to determine whether $f(b)$ is
greater or less than $f(a)$, i.e. whether $f$ is increasing or decreasing as we move from $a$ to
$b$. Indeed, we see that
if $f'(a) > 0$, then $f$ is increasing at $a$ because $f(b) > f(a)$, and
if $f'(a) < 0$, then $f$ is decreasing at $a$ because $f(b) < f(a)$.
Indeed, by letting $a$ be any value of $x$, we can generalise this to obtain the following
useful result. Let $I$ be an interval,
if $f'(x) > 0$ for $x \in I$, then $f$ is increasing on $I$, and
if $f'(x) < 0$ for $x \in I$, then $f$ is decreasing on $I$.
Example 4.1 Determine the intervals on which the function $f(x) = x^3 - 2x^2 - 15x$
is (a) increasing and (b) decreasing.
Differentiating the function with respect to $x$, we find that
$$f'(x) = 3x^2 - 4x - 15.$$
This factorises to give us
$$f'(x) = (3x + 5)(x - 3),$$
and so, by looking at what is happening away from the points $x = -5/3$ and $x = 3$
where $f'(x) = 0$, we see that the sign of this derivative can be found by considering
the signs of its two factors, i.e.
$$\begin{array}{c|ccc} & x < -\frac{5}{3} & -\frac{5}{3} < x < 3 & 3 < x \\ \hline 3x + 5 & - & + & + \\ x - 3 & - & - & + \\ f'(x) & + & - & + \end{array}$$
This means that the function is (a) increasing on the intervals $x < -5/3$ and $x > 3$
where $f'(x) > 0$ and (b) decreasing on the interval $-5/3 < x < 3$ where $f'(x) < 0$ as
illustrated in Figure 4.2(a).
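The sign table in Example 4.1 can be checked by evaluating $f'(x)$ at one test point in each interval. The sketch below does this; the test points $-2$, $0$ and $4$ are our own arbitrary choices, one from each interval.

```python
def f_prime(x):
    """Derivative of f(x) = x^3 - 2x^2 - 15x, i.e. 3x^2 - 4x - 15."""
    return 3 * x**2 - 4 * x - 15

# One test point from each interval determined by x = -5/3 and x = 3
test_points = {"x < -5/3": -2.0, "-5/3 < x < 3": 0.0, "x > 3": 4.0}

for label, x in test_points.items():
    behaviour = "increasing" if f_prime(x) > 0 else "decreasing"
    print(f"{label}: f'({x}) = {f_prime(x)}, so f is {behaviour}")
```

Since $f'$ is continuous and only vanishes at $x = -5/3$ and $x = 3$, its sign cannot change within any one of these intervals, which is why a single test point per interval is enough.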
A useful consequence of this is that it tells us something about the tangent lines to the
function $f(x)$ at points where it is increasing or decreasing. Recall, from Section 3.3.2,
that the tangent line to $f(x)$ at the point $x = a$ has an equation given by
$$y = f(a) + (x - a) f'(a),$$
and, in particular, the gradient of the tangent line is given by $f'(a)$. This means that, if
$f(x)$ is increasing (or decreasing) at $x = a$, then $f'(a)$ will be positive (or negative) and
this, in turn, means that the tangent line at this point will also be an increasing (or
decreasing) function of $x$. This will be useful in a moment, but for now, we can see how
this works by looking at Figure 4.3.
Figure 4.2: The graph of $f(x) = x^3 - 2x^2 - 15x$ indicating the points relevant to (a)
Examples 4.1, 4.2 and 4.4; (b) Examples 4.5 and 4.6.
Figure 4.3: (a) When f (x) is increasing at x = a, its tangent line at the point (a, f (a))
will also be increasing as the gradient of the curve (and hence the gradient of the tangent
line) at this point is positive. (b) When f (x) is decreasing, its tangent line at the point
(a, f (a)) will also be decreasing as the gradient of the curve (and hence the gradient of
the tangent line) at this point is negative.
At this point, we know what a positive or negative derivative tells us about a function
but you may be wondering what happens when the derivative is neither positive nor
negative. That is, what happens when the derivative is zero? This is very important and
we now turn our attention to that.
4.2.2 Stationary points
When we find a point, say $x = a$, that makes $f'(x) = 0$, the tangent line at that point is
horizontal and its Cartesian equation is given by
$$y = f(a).$$
This means that we will have a function which may look like the one illustrated in
Figure 4.4. We call such points, i.e. points where $f'(x) = 0$, stationary points.
Figure 4.4: The point $x = a$ is a stationary point of the function $f(x)$ as $f'(a) = 0$. Observe
that this means that the tangent line to $f(x)$ at the point $(a, f(a))$ is a horizontal line.
There are, essentially, four different kinds of stationary point that we will encounter and
these depend on how the function is changing as we move through the stationary point
in the direction of increasing x. In particular, as x is increasing through a stationary
point at x = a, we have a
local maximum if $f$ changes from being increasing to being decreasing at the
stationary point, and a
local minimum if $f$ changes from being decreasing to being increasing at the
stationary point.
Of course, f could also be increasing (or decreasing) on both sides of the stationary
point and in these cases we have a point of inflection. These four possibilities are
illustrated in Figure 4.5 and, in particular, we see that the stationary point we saw
earlier in Figure 4.4 is a local minimum.
This provides us with a way of classifying any stationary points we find by looking at
the sign of the first-order derivative of the function as we move through a stationary
point. This is called the first-order derivative test and it runs as follows. As we move
through the stationary point in the direction of increasing x, if we find that:
$f'(x)$ changes from positive to negative, i.e. the function goes from being increasing
to being decreasing as we pass through the stationary point, then the stationary
point is a local maximum.
$f'(x)$ changes from negative to positive, i.e. the function goes from being decreasing
to being increasing as we pass through the stationary point, then the stationary
point is a local minimum.
And, if the sign of $f'(x)$ does not change, i.e. if the function is increasing (or decreasing)
on both sides of the stationary point, then the stationary point is a point of inflection.
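The first-order derivative test can be sketched numerically as follows. This is only an illustration, not part of the guide: the function names and the step size `h` are assumptions, and the check is only meaningful when `h` is small enough that no other stationary point lies within `h` of `a`.

```python
def classify_by_first_derivative(df, a, h=1e-3):
    """Classify the stationary point x = a from the sign of f' just either side of it.

    df is the first-order derivative f'; h is an assumed small step.
    """
    left, right = df(a - h), df(a + h)
    if left > 0 and right < 0:
        return "local maximum"        # increasing, then decreasing
    if left < 0 and right > 0:
        return "local minimum"        # decreasing, then increasing
    if left * right > 0:
        return "point of inflection"  # same sign on both sides
    return "inconclusive"


def df_example(x):
    # f'(x) = (3x + 5)(x - 3) for f(x) = x^3 - 2x^2 - 15x
    return (3 * x + 5) * (x - 3)


for a in (-5 / 3, 3):
    print(a, classify_by_first_derivative(df_example, a))
```

For the derivative $f'(x) = (3x + 5)(x - 3)$, this reports a local maximum at $x = -5/3$ and a local minimum at $x = 3$.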
[Figure 4.5: four panels, (a) to (d), each showing a curve $y = f(x)$ near a stationary point; graphic and caption not recovered.]
Example 4.2 Find the stationary points of the function given in Example 4.1 and
classify them by using the first-order derivative test.
We saw in Example 4.1 that the derivative of the function can be written as
$$f'(x) = (3x + 5)(x - 3),$$
and so the stationary points of this function, i.e. the points that make $f'(x) = 0$,
occur when $x = -5/3$ and $x = 3$ as you can see in Figure 4.2(a).
We can also use what we saw in Example 4.1 to see that, according to the
first-order derivative test, the stationary point that occurs when:
$x = -5/3$ is a local maximum as $f$ changes from being increasing to being
decreasing (i.e. $f'$ changes from positive to negative) at the stationary point.
$x = 3$ is a local minimum as $f$ changes from being decreasing to being
increasing (i.e. $f'$ changes from negative to positive) at the stationary point.
This, of course, can be clearly seen in Figure 4.2(a).
4.2.3
$= q\,[1 - \varepsilon(p)]$,
using the definition of $\varepsilon(p)$. So, as $q > 0$ for this to be economically meaningful, we have:
If $\varepsilon(p) > 1$, we see that $R'(p) < 0$ and so a small increase in price leads to a
decrease in revenue. In such cases we say that demand for the product is elastic.
If $\varepsilon(p) < 1$, we see that $R'(p) > 0$ and so a small increase in price leads to an
increase in revenue. In such cases we say that demand for the product is inelastic.
Thus, even though an increase in price will usually lead to a decrease in the quantity
that the consumers will demand, the value of the elasticity (i.e. whether it is greater
than or less than one) determines how such changes will affect the revenue (i.e. whether
it will decrease or increase).
Example 4.3 Suppose that the demand function for a good is given by
$q^D(p) = 20 - 2p$. Determine the values of $p$ that make the demand (a) elastic and (b)
inelastic.
In this case, we have $q = q^D(p) = 20 - 2p$ and so the elasticity of demand is given by
$$\varepsilon(p) = -\frac{p}{q}\,q'(p) = -\frac{p}{20 - 2p}\,(-2) = \frac{p}{10 - p},$$
as long as $p \neq 10$. And, of course, we need values of $p$ where $0 \leq p \leq 10$ in order for
the demand function to be economically meaningful.
So, for (a), where we want the values of $p$ that make demand elastic, we see that
$$\varepsilon(p) > 1 \implies \frac{p}{10 - p} > 1 \implies p > 10 - p,$$
as $10 - p > 0$ since $0 \leq p < 10$. This means that demand is elastic if $p > 5$ and, in
particular, if we have $5 < p < 10$ a small increase in price will lead to a decrease in
revenue.
For (b), similar reasoning shows us that demand is inelastic if $p < 5$ and, in
particular, if we have $0 \leq p < 5$ a small increase in price will lead to an increase in
revenue.
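The link between elasticity and revenue in Example 4.3 can be checked with a short sketch. It assumes that revenue is $R(p) = p\,q(p)$, so that $R'(p) = q\,[1 - \varepsilon(p)]$; the function names are illustrative and not taken from the guide.

```python
def demand(p):
    # q^D(p) = 20 - 2p from Example 4.3
    return 20 - 2 * p


def demand_derivative(p):
    # q'(p) = -2 for this linear demand function
    return -2.0


def elasticity(p):
    # epsilon(p) = -(p / q) q'(p), defined for p != 10
    return -p * demand_derivative(p) / demand(p)


def revenue_derivative(p):
    # Assumes R(p) = p q(p), which gives R'(p) = q (1 - epsilon(p))
    return demand(p) * (1 - elasticity(p))


for p in (2, 5, 8):
    print(p, elasticity(p), revenue_derivative(p))
```

At $p = 2$ the elasticity is below one and $R'(p) > 0$, at $p = 5$ the elasticity is exactly one and $R'(p) = 0$, and at $p = 8$ the elasticity is above one and $R'(p) < 0$.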
4.3
The second-order derivative of a function can allow us to infer useful information about
the shape of a function. For instance, it can allow us to infer whether a stationary
point is a local maximum or a local minimum and, more generally, whether the function
is convex or concave. Indeed, once we understand convexity and concavity, we will be in
a position to extend our understanding of what we mean by a point of inflection.
4.3.1
The key to understanding the link between the shape of a function and its
second-order derivative is the second-order Taylor approximation to $f(x)$ around $x = a$, i.e.
$$f(x) \simeq f(a) + (x - a)f'(a) + \frac{(x - a)^2}{2} f''(a),$$
and we know that this is a good approximation as long as $x - a$ is small. Now, to start
with, let's suppose that $f(x)$ has a stationary point at $x = a$, i.e. $f'(a) = 0$, so that our
second-order Taylor approximation becomes
$$f(x) \simeq f(a) + \frac{(x - a)^2}{2} f''(a) \implies f(x) - f(a) \simeq \frac{(x - a)^2}{2} f''(a).$$
Here, for all $x$ near the stationary point, the sign of $f(x) - f(a)$ on the left-hand side,
i.e. the relative magnitude of $f(x)$ and $f(a)$, is determined by the sign of $f''(a)$ on the
right-hand side. That is, the sign of $f(x) - f(a)$ for $x$ near the stationary point is
determined by the value of the second-order derivative at the stationary point. Indeed,
we see that:
If $f''(a) > 0$, then $f(x) > f(a)$ for all $x$ near to $a$ and so the function always lies
above the horizontal tangent line at $x = a$. This means that the stationary point is
a local minimum as in Figure 4.5(c).
If $f''(a) < 0$, then $f(x) < f(a)$ for all $x$ near to $a$ and so the function always lies
below the horizontal tangent line at $x = a$. This means that the stationary point is
a local maximum as in Figure 4.5(b).
Thus, the sign of the second-order derivative at a stationary point allows us to infer
whether the stationary point is a local maximum or a local minimum. When we classify
stationary points in this way, we call it the second-order derivative test. However,
observe that if $f''(a) = 0$, then the second-order Taylor approximation tells us nothing
useful about the shape of the function as it reduces to $f(x) \simeq f(a)$.
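As a minimal sketch of the second-order derivative test, the function below looks only at the sign of $f''(a)$ and reports "inconclusive" when it is zero. The names are illustrative assumptions; the example derivative $f''(x) = 6x - 4$ is that of the function from Example 4.1.

```python
def classify_by_second_derivative(d2f, a):
    """Classify a stationary point x = a from the sign of f''(a).

    Assumes that a really is a stationary point, i.e. f'(a) = 0.
    """
    value = d2f(a)
    if value > 0:
        return "local minimum"   # curve lies above its horizontal tangent
    if value < 0:
        return "local maximum"   # curve lies below its horizontal tangent
    return "inconclusive"        # f''(a) = 0: the test tells us nothing


def d2f_example(x):
    # f''(x) = 6x - 4 for f(x) = x^3 - 2x^2 - 15x
    return 6 * x - 4


for a in (-5 / 3, 3):
    print(a, classify_by_second_derivative(d2f_example, a))
```

Here $f''(-5/3) = -14 < 0$ gives a local maximum and $f''(3) = 14 > 0$ gives a local minimum, in agreement with the first-order derivative test.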
Example 4.4 Use the second-order derivative test to classify the stationary points
of the function in Example 4.1.
We saw in Example 4.1 that the first-order derivative of $f$ is
$$f'(x) = 3x^2 - 4x - 15,$$
and, in Example 4.2, we saw that its stationary points occur when $x = -5/3$ and
$x = 3$. To use the second-order derivative test, we note that
$$f''(x) = 6x - 4,$$
4.3.2
More generally, the sign of the second-order derivative of a function tells us whether a
function is convex or concave. Indeed, we find that:
If $f''(x) > 0$ on some interval, we say that $f$ is convex on that interval.
If $f''(x) < 0$ on some interval, we say that $f$ is concave on that interval.
To get an idea of what this means, consider that a convex function on an interval, $I$,
has $f''(x) > 0$ for all $x \in I$. So, if we take any particular point, say $a \in I$, the tangent
line to $f$ at $x = a$ has an equation given by
$$y = f(a) + (x - a)f'(a),$$
and so, our second-order Taylor approximation can be written as
$$f(x) \simeq y + \frac{(x - a)^2}{2} f''(a).$$
Now, as $f''(a) > 0$ (recall that $a \in I$ too), we see that $f(x) > y$ for all $x \in I$ where
$x \neq a$, i.e. these values of $f$ always lie above the values from the tangent line to $f$ at
$x = a$, as illustrated in Figure 4.6(a). But, of course, we can use any $a \in I$ when we run
this argument and so a convex function is one which lies above all of its tangent lines,
as illustrated in Figure 4.6(b). In particular, a function must be convex in the
neighbourhood of a local minimum.
A similar argument can be given to show that a concave function always lies below all
of its tangent lines so that, in particular, a function must be concave in the
neighbourhood of a local maximum.
Figure 4.6: The relationship between a convex function and its tangent lines. (a) When
changing the value of x, we can see that the values of f (x) are greater than the
corresponding values of y from the tangent line to f at a, i.e. f lies above this tangent
line. (b) By changing the value of a, we can see that f lies above all of its tangent lines.
Activity 4.1 Using an argument similar to the one above, explain why a concave
function always lies below all of its tangent lines.
This gives us another, more visual, way of deciding whether a function is convex or
concave, namely:
A function is convex on some interval if it lies above all of its tangent lines in that
interval.
A function is concave on some interval if it lies below all of its tangent lines in that
interval.
And, we can see how this all works by continuing with our example.
Example 4.5 Determine the intervals on which the function in Example 4.1 is (a)
convex and (b) concave.
In Example 4.4 we saw that the second-order derivative of the function from
Example 4.1 is given by
$$f''(x) = 6x - 4,$$
so we find that
$f''(x) > 0$ when $6x - 4 > 0$ which means that $x > 2/3$, and
$f''(x) < 0$ when $6x - 4 < 0$ which means that $x < 2/3$.
This means that the function is convex on the interval $x > 2/3$ where $f''(x) > 0$ and
concave on the interval $x < 2/3$ where $f''(x) < 0$ as illustrated in Figure 4.2(b).
Indeed, when looking at this figure, observe that when x > 2/3 the function lies
above all of its tangent lines in that interval and that when x < 2/3 the function lies
below all of its tangent lines in that interval.
4.3.3 Points of inflection
Not all points of inflection are stationary points like the ones we saw in Section 4.2.2.
More generally, a point of inflection is a point where a function changes from being
convex to concave (or vice versa) in a certain well-defined way. Technically, we say that:
If $f''(a) = 0$ and $f''(x)$ changes sign at $x = a$, then $f$ has a point of inflection at $a$.
As such, we can see that the points indicated in Figure 4.7 as well as the ones we saw
earlier in Figure 4.5(a) and (d) are points of inflection although, of course, only the ones
in Figure 4.5(a) and (d) are stationary points as well.
Figure 4.7: A point of inflection where f changes from (a) convex to concave at a and (b)
Example 4.6
We saw in Example 4.5 that the second-order derivative changes sign when $x = 2/3$
and, furthermore, we can see that $f''(2/3) = 0$. This means that the function in
Example 4.1 has a point of inflection when $x = 2/3$.
Indeed, looking at Figure 4.2(b), we can see that when $x = 2/3$, the function changes
from being concave to convex as we should expect from a point of inflection.
However, this point of inflection is not a stationary point because $f'(x) \neq 0$ when
$x = 2/3$.
It is, perhaps, worth stressing that the condition $f''(a) = 0$ on its own is not enough to
guarantee that we have a point of inflection. For instance, the two functions illustrated
in Figure 4.8 both have $f''(0) = 0$, but in neither case does the second derivative change
sign and so we do not have a point of inflection.
Activity 4.2 Show that $f''(0) = 0$ for both of the functions illustrated in
Figure 4.8. How can we infer that they have those shapes by looking at (a) the
first-order derivative and (b) the second-order derivative of the function?
Figure 4.8: (a) $f(x) = x^4 - 1$ and (b) $f(x) = 1 - x^4$. Both of these functions have $f''(0) = 0$ but neither of them has a point
of inflection. (a) This is convex on both sides of $x = 0$ and the function has a local
minimum at that point. (b) This is concave on both sides of $x = 0$ and the function has
a local maximum at that point. (The dashed curves in these figures represent the curves
$y = x^2 - 1$ in (a) and $y = 1 - x^2$ in (b) for comparison.)
It is also worth noting that the condition that $f''(x)$ changes sign at $x = a$ on its own is
not enough to guarantee that we have a point of inflection either. Of course, if $f''(x)$ is
changing sign at $x = a$ and $f''(a)$ exists, we must have $f''(a) = 0$. But, although we do
not dwell on it here, sometimes we may encounter functions where $f''(a)$ does not exist
even though $f''(x)$ changes sign at $x = a$. We will briefly consider what happens in
these cases when we look at cusps and asymptotes in Section 4.4.3.
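The two-part condition for a point of inflection, namely $f''(a) = 0$ together with a change of sign of $f''$ at $a$, can be sketched as below. The helper name, the step `h` and the tolerance `tol` are assumptions made for illustration, and the check presumes that $f''$ exists at and around $a$.

```python
def has_inflection_at(d2f, a, h=1e-3, tol=1e-12):
    """True if f''(a) is zero (within tol) and f'' has opposite signs either side of a."""
    is_zero = abs(d2f(a)) <= tol
    changes_sign = d2f(a - h) * d2f(a + h) < 0
    return is_zero and changes_sign


# f(x) = x^3 - 2x^2 - 15x has f''(x) = 6x - 4, which changes sign at x = 2/3.
print(has_inflection_at(lambda x: 6 * x - 4, 2 / 3))

# f(x) = x^4 - 1 has f''(x) = 12x^2, which is zero at x = 0 but never negative.
print(has_inflection_at(lambda x: 12 * x**2, 0))
```

The first call reports a point of inflection; the second does not, which matches the behaviour of the functions in Figure 4.8.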
4.4 Curve sketching
One useful application of this material on derivatives and what they tell us about the
shape of a function is curve sketching. The aim here is to illustrate the behaviour of
the curve described by the equation y = f (x) by picking out its main features and
where these features occur by means of a sketch. For most functions we will deal with,
these features include any points where the curve may cross the axes and the location
and nature of any stationary points. But it may also be necessary to assess how the
curve behaves as $x \to \pm\infty$ and, in particular, to assess whether the function has any
asymptotes. A general method for sketching the curve $y = f(x)$ would therefore involve
us thinking about the following:
x-intercepts: The x-axis is given by the equation y = 0 and so the curve y = f (x)
crosses the x-axis at any point (x, 0) for which f (x) = 0. Solving this equation will
therefore give us the x-intercepts of the curve if there are any.
y-intercept: The y-axis is given by the equation x = 0 and so the curve crosses the
y-axis at the point (0, y) for which y = f (0). As f is a function, there can be only
one such point and this is the y-intercept.
Finding stationary points: We can find the stationary points, as we saw above, by
solving the equation $f'(x) = 0$.
Classifying stationary points: We can also determine whether each of the stationary
points is a local maximum, local minimum or point of inflection by using the
methods outlined above.
Limiting behaviour in the x-direction: We can determine how $f(x)$ is behaving as
$x \to \infty$ and as $x \to -\infty$.
Of course, in certain cases, it may also be advantageous to think carefully about the
intervals in which the function is increasing (or decreasing) or whether the function is
convex (or concave). But, generally, the method above should suffice when we sketch
most functions.
In particular, observe that a sketch is very different from a plot. A plot involves plotting
certain points and joining them up with little regard to any interesting behaviour the
curve may be exhibiting elsewhere. A sketch, on the other hand, isolates any interesting
behaviour the curve may be exhibiting (such as the ones listed above) and concentrates
on these. Please be aware that there is a difference and in this course, we will always
want to see sketches and not plots!
To see how we can implement the method above, we will start by sketching the
relatively simple curves that arise when f is a polynomial. We will then consider how
we would proceed when the functions are differentiable, but involve other elementary
functions. Then, just so that we are aware of some possible complications, we look at
what happens when our function fails to be differentiable at some points.
4.4.1
Given what we have seen so far, the only real obstacle to sketching a polynomial is an
understanding of the limiting behaviour of this kind of function. The key result here is
that, if $f(x)$ is a polynomial, its behaviour as $x$ gets arbitrarily large in magnitude (that
is to say, as $x \to \infty$ or $x \to -\infty$) is determined solely by its leading term, i.e. the one
with the highest power of $x$. Then, with this in mind, we can look at the term with the
highest power of $x$, let's say that this is $x^n$, and note that:
if $n$ is even, then $x^n \to \infty$ as $x \to \infty$ and as $x \to -\infty$; whereas
if $n$ is odd, then $x^n \to \infty$ as $x \to \infty$ and $x^n \to -\infty$ as $x \to -\infty$.
Using these facts and noting how the sign of the coefficient of the term with the highest
power of x can influence the sign of the limit, we can determine the limiting behaviour
of any polynomial.
Activity 4.3 Suppose that $f(x)$ is a polynomial and that, for some constants $a \neq 0$
and $n \in \mathbb{N}$, the term in this polynomial with the highest power of $x$ is $ax^n$.
Determine the behaviour of $f(x)$ as $x \to \infty$ and as $x \to -\infty$ in the cases which arise
according to whether $a$ is positive or negative and whether $n$ is even or odd.
We can now see how to sketch some polynomials and we start by seeing how to sketch
the function that we have been considering throughout this chapter.
Example 4.7 Sketch the curve y = f (x) where f (x) is the function in Example 4.1.
From the earlier examples in this section, we know quite a lot about this function
and, in particular, we have found and classified its stationary points. But, to sketch
this curve, we need to find a bit more information, namely its
x-intercepts: These occur when $y = 0$ and so we solve the equation given by
$f(x) = 0$, i.e.
$$x^3 - 2x^2 - 15x = 0,$$
which, on taking out the common factor of $x$ and factorising the remaining
quadratic, gives us
$$x(x^2 - 2x - 15) = 0 \implies x(x - 5)(x + 3) = 0.$$
limiting behaviour: The term with the highest power of $x$ in $f(x)$ is $x^3$ and so
$f(x) \to \infty$ as $x \to \infty$ and $f(x) \to -\infty$ as $x \to -\infty$.
So, using this information, we begin to sketch this curve by roughly indicating these
key features on some axes as in Figure 4.9(a) and then, joining them up with a nice
smooth curve, we get the sketch itself as in Figure 4.9(b).
In particular, it is worth noting that in this sketch:
all of the key features are labelled;
the curve has the right kind of limiting behaviour, i.e. $f(x) \to \infty$ as $x \to \infty$ and
$f(x) \to -\infty$ as $x \to -\infty$; and
points of inflection which are not stationary points (recall that, in Example 4.6,
we saw that this curve has one when $x = 2/3$) are not usually indicated.
Of course, what we see here is similar to what we saw in Figure 4.2, but a sketch
must include information about all of the relevant key features.
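The key features used in Example 4.7 can be recomputed with a short sketch. The helper `quadratic_roots` is a hypothetical name introduced here; it applies the quadratic formula to the factor $x^2 - 2x - 15$ (for the intercepts) and to $f'(x) = 3x^2 - 4x - 15$ (for the stationary points).

```python
import math


def f(x):
    # The function from Example 4.1
    return x**3 - 2 * x**2 - 15 * x


def quadratic_roots(a, b, c):
    """Real roots of a x^2 + b x + c = 0, in increasing order."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})


# x-intercepts: x = 0 together with the roots of x^2 - 2x - 15
x_intercepts = sorted([0.0] + quadratic_roots(1, -2, -15))

# Stationary points: roots of f'(x) = 3x^2 - 4x - 15, with their y-values
stationary_xs = quadratic_roots(3, -4, -15)
stationary_points = [(x, f(x)) for x in stationary_xs]

print("x-intercepts:", x_intercepts)
print("y-intercept:", f(0))
print("stationary points:", stationary_points)
```

The stationary values agree with the labels $400/27$ and $-36$ that appear on the sketch in Figure 4.9.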
Figure 4.9: Sketching the curve $y = x^3 - 2x^2 - 15x$ in Example 4.7. (a) Using what we
have discovered about the key features of the curve, we can begin to see what it must
look like. (b) By joining up these key features with a nice smooth curve, we get the sketch
itself. [Sketch not reproduced; the labelled values include $400/27$ and $-36$ on the $y$-axis and $-5/3$, $3$ and $5$ on the $x$-axis.]
Indeed, it can be seen that, unlike plotting a function, sketching it is a bit of an art and
it can only be done well by learning to appreciate what your calculations are telling you
about its appearance. With this in mind, let's sketch a function that we haven't
encountered before.
Example 4.8
We find the key features of this curve according to the list given above, namely
x-intercepts: These occur when y = 0 and so we solve the equation given by
f (x) = 0, i.e.
$$2x^4 - 4x^3 + 2x^2 = 0,$$
which, on taking out the common factor of $2x^2$ and factorising the remaining
quadratic, gives us
$$2x^2(x^2 - 2x + 1) = 0 \implies 2x^2(x - 1)^2 = 0.$$
$$4x(2x - 1)(x - 1) = 0,$$
$x = 1$ gives $y = f(1) = 0$.
So, the stationary points have coordinates given by $(0, 0)$, $(1/2, 1/8)$ and $(1, 0)$.
classifying the stationary points: Let's use the second-order derivative test here.
We can see that
$$f''(x) = 24x^2 - 24x + 4,$$
and so, looking at the stationary points, we have
$f''(0) = 4 > 0$ and so $(0, 0)$ is a local minimum;
limiting behaviour: The term with the highest power of $x$ in $f(x)$ is $2x^4$ and so
$f(x) \to \infty$ as $x \to \infty$ and as $x \to -\infty$.
So, using this information, we begin to sketch this curve by roughly indicating these
key features on some axes as in Figure 4.10(a) and then, joining them up with a nice
smooth curve, we get the sketch itself as in Figure 4.10(b).
Figure 4.10: Sketching the curve $y = 2x^4 - 4x^3 + 2x^2$ in Example 4.8. (a) Using what we
have discovered about the key features of the curve, we can begin to see what it must
look like. (b) By joining up these key features with a nice smooth curve, we get the sketch
itself. [Sketch not reproduced; the labelled values include $1/8$ on the $y$-axis and $1/2$ on the $x$-axis.]
Activity 4.4 Find the points of inflection of the function in Example 4.8.
4.4.2
When sketching curves defined using other elementary functions the only real obstacle
is, again, an understanding of the limiting behaviour of such functions. For instance, as
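we saw in Section 2.1.1, exponential functions like $e^x$ and $e^{-x}$ have very simple limiting
behaviours, i.e.
$e^x \to \infty$ as $x \to \infty$ and $e^x \to 0$ as $x \to -\infty$; whereas
$e^{-x} \to 0$ as $x \to \infty$ and $e^{-x} \to \infty$ as $x \to -\infty$.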
But, when functions such as these are multiplied by polynomials (say), it is not clear
how this will affect their limiting behaviour. For now, we just state the following fact (see footnote 1):
When an exponential is multiplied by a polynomial, the exponential dominates.
Thus, for example, the function $x^3 e^{-x} \to 0$ as $x \to \infty$ because the exponential $e^{-x} \to 0$
as $x \to \infty$ and this dominates the behaviour of the polynomial, $x^3$, even though
$x^3 \to \infty$ as $x \to \infty$. Let's sketch this curve to see why this is reasonable.
Example 4.9
We find the key features of this curve according to the list given above, namely
x-intercepts: These occur when y = 0 and so we solve the equation given by
f (x) = 0, i.e.
$$x^3 e^{-x} = 0.$$
But, as $e^{-x} \neq 0$ for all $x \in \mathbb{R}$, we find that the only x-intercept occurs when
$x = 0$.
y-intercept: This occurs when x = 0 and so using y = f (0) we see that the
y-intercept occurs when y = 0. Note, in particular, that this means that the
curve goes through the origin (as we should have expected since the x-intercept
we found occurs when x = 0).
finding the stationary points: These occur when $f'(x) = 0$ and so, using the
product rule, we get
$$f'(x) = (3x^2)(e^{-x}) + (x^3)(-e^{-x}) = x^2(3 - x)e^{-x},$$
and so we solve the equation
$$x^2(3 - x)e^{-x} = 0.$$
But, as $e^{-x} \neq 0$ for all $x \in \mathbb{R}$, we find that the stationary points occur when
$x = 0$ and $x = 3$. Then, we use $y = f(x)$ to find the values of $y$ at these points
so that we can locate them on the sketch. Doing this, we find that
Footnote 1: In 176 Further Calculus we will encounter techniques for finding limits which are much more
sophisticated than the ones that we have seen so far. Once we have these, we will be able to see exactly
why this fact is true and be in a better position to assess the limiting behaviour of curves which are
defined using other elementary functions.
So, the stationary points have coordinates given by $(0, 0)$ and $(3, 27e^{-3})$.
classifying the stationary points: Let's use the second-order derivative test here.
We can use the product rule again to see that
$$f''(x) = (6x - 3x^2)(e^{-x}) + (3x^2 - x^3)(-e^{-x}) = (6x - 6x^2 + x^3)e^{-x},$$
and so, looking at the stationary points, we have
$f''(0) = (0)e^0 = 0$ and so the second-order derivative test fails! However, as
$$f'(x) = x^2(3 - x)e^{-x}$$
is positive when $x < 0$ and positive when $0 < x < 3$, we can see that this
function is increasing on both sides of the stationary point at $x = 0$. Thus,
the first-order derivative test tells us that $(0, 0)$ is a point of inflection.
$f''(3) = (-9)e^{-3} < 0$ and so $(3, 27e^{-3})$ is a local maximum.
limiting behaviour: Using the fact above we would expect the $e^{-x}$ to dominate
and this would mean that $f(x) \to 0$ as $x \to \infty$ whereas, as $x \to -\infty$, we would
expect $f(x) \to -\infty$ as $x^3 \to -\infty$ and $e^{-x} \to \infty$.
Then, using this information, we begin to sketch this curve by roughly indicating
these key features on some axes as in Figure 4.11(a) and then, joining them up with
a nice smooth curve, we get the sketch itself as in Figure 4.11(b).
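The calculations in Example 4.9 can be sanity-checked numerically. The sketch below is only an illustration under the assumption that $f(x) = x^3 e^{-x}$ (as in the caption of Figure 4.11); the sample points are arbitrary choices, and evaluating at a few large values of $|x|$ suggests the limiting behaviour but does not prove it.

```python
import math


def f(x):
    return x**3 * math.exp(-x)


def df(x):
    # f'(x) = x^2 (3 - x) e^{-x}
    return x**2 * (3 - x) * math.exp(-x)


def d2f(x):
    # f''(x) = (6x - 6x^2 + x^3) e^{-x}
    return (6 * x - 6 * x**2 + x**3) * math.exp(-x)


# Stationary points at x = 0 and x = 3
print(df(0), df(3))

# f''(0) = 0, so fall back on the sign of f' either side of x = 0
print(d2f(0), df(-1) > 0, df(1) > 0)

# f''(3) < 0, so (3, 27 e^{-3}) is a local maximum
print(d2f(3), f(3))

# The exponential dominates: f(x) is tiny for large x, large and negative for x << 0
print(f(50), f(-10))
```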
Figure 4.11: Sketching the curve $y = x^3 e^{-x}$ in Example 4.9. (a) Using what we have
discovered about the key features of the curve, we can begin to see what it must look
like. (b) By joining up these key features with a nice smooth curve, we get the sketch
itself. [Sketch not reproduced; the labelled value is $27e^{-3}$.]
Activity 4.5 Does the function in Example 4.9 have any other points of inflection?
If so, find them.
Activity 4.6 Sketch the curve y = f (x) where f (x) = x2 ex and find all of its points
of inflection.
4.4.3
The method above for sketching $y = f(x)$ assumes, as we generally have throughout
this chapter, that the function, $f(x)$, and its derivatives are well-defined for all $x \in \mathbb{R}$.
But, more generally, there may be points at which the function or some of its
derivatives are not defined. When this happens we start to encounter asymptotes and
cusps. We will not dwell on this a great deal here, but we can use the following
examples to see how this may affect our sketches.
Example 4.10
$$f(x) = \frac{1}{x - 1},$$
$$f'(x) = -\frac{1}{(x - 1)^2} \quad\text{and}\quad f''(x) = \frac{2}{(x - 1)^3},$$
and so these derivatives aren't defined at $x = 1$ either (see footnote 2). Using these, we can see that
when
$x < 1$ we have $f(x) < 0$, $f'(x) < 0$ and $f''(x) < 0$, meaning that for these values
of $x$ the function is negative, decreasing and concave; whereas when
$x > 1$ we have $f(x) > 0$, $f'(x) < 0$ and $f''(x) > 0$, meaning that for these values
of $x$ the function is positive, decreasing and convex.
We can also see that the y-intercept of this curve occurs when $y = -1$ and that
$f(x) \to 0$ as $x \to \pm\infty$ which means that this function has a horizontal asymptote
given by $y = 0$. However, the main feature that concerns us here is the vertical
asymptote at $x = 1$ which comes about because
$$\lim_{x \to 1^-} f(x) = -\infty \quad\text{and}\quad \lim_{x \to 1^+} f(x) = \infty,$$
as we should expect to see from our discussion of hyperbolae in Section 2.2.4. The
sketch of this curve is illustrated in Figure 4.12(a).
In particular, observe that in Example 4.10, we have a case like the one mentioned at
the end of Section 4.3.3. That is, the function changes from being concave to convex at
a point, but there is no point of inflection. This happens because the second derivative
of this function does not exist at the point.
Footnote 2: That is, the function and its derivatives are undefined when $x = 1$ as that would require us to divide
by zero and that is never allowed.
Example 4.11
$$f(x) = \frac{1}{(x - 1)^2},$$
$$f'(x) = -\frac{2}{(x - 1)^3} \quad\text{and}\quad f''(x) = \frac{6}{(x - 1)^4},$$
and so these derivatives aren't defined at $x = 1$ either (see footnote 3). Using these, we can see that
when
$x < 1$ we have $f(x) > 0$, $f'(x) > 0$ and $f''(x) > 0$, meaning that for these values
of $x$ the function is positive, increasing and convex; whereas when
$x > 1$ we have $f(x) > 0$, $f'(x) < 0$ and $f''(x) > 0$, meaning that for these values
of $x$ the function is positive, decreasing and convex.
We can also see that the y-intercept of this curve occurs when $y = 1$ and that
$f(x) \to 0$ as $x \to \pm\infty$ which means that this function has a horizontal asymptote
given by $y = 0$. However, the main feature that concerns us here is the vertical
asymptote at $x = 1$ which comes about because
$$\lim_{x \to 1^-} f(x) = \infty \quad\text{and}\quad \lim_{x \to 1^+} f(x) = \infty,$$
Example 4.12
$$f'(x) = \frac{2}{3(x - 1)^{1/3}} \quad\text{and}\quad f''(x) = -\frac{2}{9(x - 1)^{4/3}},$$
and so these derivatives aren't defined at $x = 1$ (see footnote 4). Using these, we can see that when
$x < 1$ we have $f(x) > 0$, $f'(x) < 0$ and $f''(x) < 0$, meaning that for these values
of $x$ the function is positive, decreasing and concave; whereas when
Footnote 3: Again, the function and its derivatives are undefined when $x = 1$ as that would require us to divide
by zero and that is never allowed.
4.5. Optimisation
$x > 1$ we have $f(x) > 0$, $f'(x) > 0$ and $f''(x) < 0$, meaning that for these values
of $x$ the function is positive, increasing and concave.
We can also see that the y-intercept of this curve occurs when y = 1. The sketch of
this curve is illustrated in Figure 4.12(c) and we say that this curve has a cusp at
x = 1.
Figure 4.12: Sketches of the curves in (a) Example 4.10, $y = 1/(x - 1)$, (b) Example 4.11, $y = 1/(x - 1)^2$, and (c)
Example 4.12, $y = (x - 1)^{2/3}$. Observe the behaviour of all three of these curves at $x = 1$: in (a) and (b)
we have a vertical asymptote at $x = 1$ and in (c) we have a cusp at $x = 1$.
4.5 Optimisation
We have seen how to use derivatives to find and classify the stationary points of a
function and we have seen that a local maximum (or local minimum) is a point where
the function is larger (or smaller) than it is at other nearby points. However, we now
want to find the points, called a global maximum (or global minimum), where the
function is larger (or smaller) than it is at all other points. In such cases, we often say
that we are looking for the points where the function is optimised. We will see that
some functions do not have a global maximum (or a global minimum) even though they
may have a local maximum (or a local minimum).
In order to determine whether a function, f (x), has a global maximum or a global
minimum, it is always useful to ask the following questions.
Which local maximum gives the largest value of f (x) and which local minimum
gives the smallest value of f (x)?
What is the behaviour of $f(x)$ as $x \to \infty$ and as $x \to -\infty$?
Then, having answered these questions one should be in a position to identify the global
maximum with the largest value of f and the global minimum with the smallest value
of f assuming, of course, that these exist. Indeed, one way of making sense of these
questions and their answers is to sketch the relevant features of the curve y = f (x) and
then, using this sketch, one can then easily identify any global maximum or global
minimum that the function may have.
Footnote 4: We can see that these derivatives are undefined when $x = 1$ as that would require us to divide by
zero and that is never allowed. Moreover, observe that this function does not have a vertical tangent
line at $x = 1$ because to the left of $x = 1$ the gradient is tending to $-\infty$ and to the right of $x = 1$ the
gradient is tending to $\infty$.
For instance, consider the function whose graph is sketched in Figure 4.13(a) which has
two local maxima and two local minima. If we ask our questions about this function, we
see that:
Comparing the relevant values, we see that the largest local maximum occurs when
x = a and the smallest local minimum occurs when x = b.
The function tends to zero as $x \to \pm\infty$.
So, in this case, it should be clear that the global maximum occurs when x = a and the
global minimum occurs when $x = b$ as illustrated in Figure 4.13(b).
Figure 4.13: (a) A sketch of a function with two local maxima and two local minima
which tends to zero as $x \to \pm\infty$. (b) This function has a global maximum and a global
minimum as indicated.
However, if we have the function sketched in Figure 4.14(a) and ask our questions about that we see that:
Comparing the relevant values, we see that the largest local maximum occurs when
x = a and the smallest local minimum occurs when x = b.
The function tends to zero as x but tends to as x .
In this case, as illustrated in Figure 4.14(b), it should be clear that the global maximum
still occurs when x = a but now there is no global minimum since we can get far smaller
values of the function as x than we do from the smallest local minimum.
Activity 4.7 Use the sketches in Figures 4.9(b), 4.10(b) and 4.11(b) to determine
whether the functions in Examples 4.7, 4.8 and 4.9 have any global maxima or
global minima.
So, in general, we can see that if $f : \mathbb{R} \to \mathbb{R}$ is a function that is differentiable for all
$x \in \mathbb{R}$, then
its global maximum (or global minimum) can exist if the function is suitably
well-behaved as $x \to \infty$ and $x \to -\infty$; and
if they exist, its global maximum (or global minimum) must occur at a local
maximum (or a local minimum).
But, having said this, a sketch is still the easiest way to see what is happening. We now
turn to some cases of optimisation where things work slightly differently.
Figure 4.14: (a) A sketch of a function with two local maxima and two local minima
4.5.1 Constrained optimisation
Sometimes, it may be necessary to find the maximum (or minimum) value of $f(x)$ when
the values of $x$ are constrained (or restricted). In such cases, there will be some interval,
such as $x \geq a$ or $a \leq x \leq b$, and we need to find the maximum (or minimum) value of
$f(x)$ when $x$ can only take these values.
For instance, consider the function whose graph is sketched in Figure 4.15(a) which has
a local minimum and a local maximum in the interval a x b. In this case, we can
see that the maximum and minimum values of f (x) for x in this interval must occur at
one of the points indicated by a . And, by comparing the values of f (x) at these
points we can see that the maximum occurs at the local maximum and the minimum
occurs at the local minimum as illustrated in Figure 4.15(b).
Figure 4.15: (a) A sketch of a function in the interval $a \le x \le b$ with a local maximum and a local minimum. (b) This function has a maximum and a minimum as indicated.
However, suppose we have the function whose graph is sketched in Figure 4.16(a) which, again, has a local minimum and a local maximum in the interval $a \le x \le b$. In this case, we can again see that the maximum and minimum values of $f(x)$ for $x$ in this interval must occur at one of the points indicated by a '•'. And, by comparing the values of $f(x)$ at these points we can now see that the maximum occurs at the end-point $x = a$ and the minimum occurs at the end-point $x = b$ as illustrated in Figure 4.16(b).
Figure 4.16: (a) A sketch of a function in the interval $a \le x \le b$ with a local maximum and a local minimum. (b) This function has a maximum and a minimum as indicated.
Activity 4.8 Use the sketches in Figures 4.9(b), 4.10(b) and 4.11(b) to find the
maximum and minimum values of the functions in Example 4.7 when 3 x 5,
Example 4.8 when $0 \le x \le 1$ and Example 4.9 when $0 \le x \le 3$.
So, in general, suppose that we have the interval $a \le x \le b$ and $f$ is a differentiable function on this interval. In this case, the maximum (or minimum) value of $f(x)$ will occur
• either at the local maximum (or local minimum) inside the interval that gives the largest (or smallest) value of $f(x)$
• or at one of the end-points of the interval, i.e. at $x = a$ or $x = b$, if these give the largest (or smallest) value of $f(x)$.
This means that we should find the value of f (x) at any local maximum (or local
minimum) inside the interval and its value at the end-points of the interval, i.e. f (a)
and f (b). Having done this, the maximum (or minimum) will be the largest (or
smallest) of these values of f (x). But, of course, a sketch is still the easiest way to see
what is happening.
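The procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `extrema_on_interval` and the cubic $x^3 - 3x$ are hypothetical choices made here, not taken from the guide, and the stationary points are assumed to have been found by hand beforehand.

```python
def extrema_on_interval(f, stationary_points, a, b):
    """Return ((x_max, f_max), (x_min, f_min)) for f on a <= x <= b.

    stationary_points: x-values where f'(x) = 0, found beforehand by hand.
    """
    # Candidates: both end-points plus any stationary point strictly inside.
    candidates = [a, b] + [x for x in stationary_points if a < x < b]
    values = [(x, f(x)) for x in candidates]
    largest = max(values, key=lambda pair: pair[1])
    smallest = min(values, key=lambda pair: pair[1])
    return largest, smallest


def cubic(x):
    # Hypothetical example: x^3 - 3x has a local max at x = -1 and a local min at x = 1.
    return x**3 - 3 * x


# Narrow interval: the extrema sit at the interior stationary points.
narrow = extrema_on_interval(cubic, [-1, 1], -1.5, 1.5)
# Wide interval: the extrema move to the end-points.
wide = extrema_on_interval(cubic, [-1, 1], -3, 3)
print(narrow)
print(wide)
```

The two calls mirror the two situations sketched in Figures 4.15 and 4.16: on the narrow interval the local maximum and local minimum win, whereas on the wide interval the end-points do.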
4.5.2
some interval. However, it is important to note that even if the function is not
differentiable at some relevant value(s) of x, we may still find that the maximum (or
minimum) value of the function occurs at such a point.
For instance, in Sections 3.3.4 and 4.4.3, we considered some ways in which a function
could fail to be differentiable at a point. Using these as a guide, we can consider the
three functions illustrated in Figure 4.17 which all fail to be differentiable at x = 1.
However, despite this, we see that in all three cases the global maximum of the function
occurs at $x = 1$ even though none of these points is a local maximum.[5]
Figure 4.17: Three functions which are not differentiable at x = 1 because (a) the function
is discontinuous at x = 1, (b) the function has a corner at x = 1 and (c) the function has
a cusp at x = 1.
Also, thinking about what we saw in Section 4.4.3, the presence of a vertical asymptote
may also mean that a global maximum or global minimum does not exist. Of course, as
we saw above, a sketch should enable us to see what is happening in any of these cases.
Activity 4.9 Consider the curves sketched in Figures 4.12(a) and (b). Does either of these curves have a global minimum or a global maximum?
Now suppose that we are only interested in these curves for values of $x$ in the interval $0 \le x \le 1$. Does either of these curves have a maximum or a minimum?
4.5.3 Applications of optimisation
Optimisation problems are very common in economics and we now introduce two ways
in which they can arise in that subject. The first is their use when a firm wants to find
the level of production which maximises its profit; and the second is when a government
wants to find the level of taxation which maximises the revenue generated by a tax that
has been imposed on a market.
[5] That is, in all three cases, as $f'(1)$ does not exist it certainly can't be equal to zero!
Profit maximisation
When a firm sells an amount, $q$, it makes a profit given by
$$\pi(q) = R(q) - C(q),$$
where $R(q)$ is the revenue generated by selling this amount and $C(q)$ is the cost of producing this amount. Obviously, when doing this, the firm will want to sell an amount $q$ that will maximise its profit. Indeed, whereas the costs involved are determined by factors intrinsic to the firm, the revenue generated is given by
$$R(q) = pq,$$
where $p$, the price per unit, is determined by the market the firm is selling in.
As an example, consider the case where the firm is a monopoly, i.e. it is the only supplier of this product to the market. Indeed, as it is the only supplier and the amount it is supplying is $q$, the price that the consumers will be willing to pay for this is given by $p = p^D(q)$ where $p^D(q)$ is, as in Section 2.1.5, the inverse demand function of the market. As such, in this case, the revenue generated by the sale of an amount $q$ is given by
$$R(q) = q\,p^D(q),$$
and this will yield a profit of
$$\pi(q) = q\,p^D(q) - C(q).$$
Thus, in the case of a monopoly, given the firm's cost function and the inverse demand function for the market, we should be able to determine the amount, $q$, that the firm should be selling by finding the value of $q$ that maximises the firm's profit. Let's look at an example.
Example 4.13
$$\pi(q) = q\,p^D(q) - C(q) = q(10 - q) - (q^3 - 10q^2 + 25q + 10) = -q^3 + 9q^2 - 15q - 10,$$
given that $q$ is in the interval given by $0 \le q \le 10$.
To do this, we note that $\pi'(q)$ is given by
$$\pi'(q) = -3q^2 + 18q - 15,$$
and so, as the stationary points occur when $\pi'(q) = 0$, we solve the equation
$$-3q^2 + 18q - 15 = 0 \implies q^2 - 6q + 5 = 0 \implies (q - 1)(q - 5) = 0,$$
to see that the stationary points occur when $q = 1$ and $q = 5$. We can then see that
$$\pi''(q) = -6q + 18,$$
which, using the second-derivative test, tells us that when:
• $q = 1$, we have $\pi''(1) = 12 > 0$ and so this is a local minimum.
• $q = 5$, we have $\pi''(5) = -12 < 0$ and so this is a local maximum.
This means that the point we seek, i.e. the maximum of the profit function, must occur at $q = 5$ or at one of the two end-points of our interval. But, using the profit function, we see that
$$\pi(0) = -10, \quad \pi(5) = 15 \quad \text{and} \quad \pi(10) = -260,$$
which means that the maximum occurs at $q = 5$ because it yields the largest profit. Thus, $q = 5$ will maximise the firm's profit.
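As a quick check on the arithmetic in Example 4.13, the following sketch evaluates the profit function $\pi(q) = -q^3 + 9q^2 - 15q - 10$ and its second derivative at the stationary points and end-points. The function and variable names are illustrative choices made here.

```python
def profit(q):
    # pi(q) = -q^3 + 9q^2 - 15q - 10, the profit function of Example 4.13.
    return -q**3 + 9 * q**2 - 15 * q - 10


def profit_second_derivative(q):
    # pi''(q) = -6q + 18
    return -6 * q + 18


stationary = [1, 5]   # roots of pi'(q) = -3q^2 + 18q - 15
end_points = [0, 10]  # the interval 0 <= q <= 10

# Second-derivative test at each stationary point.
for q in stationary:
    kind = "local minimum" if profit_second_derivative(q) > 0 else "local maximum"
    print(q, kind, profit(q))

# Only the local maximum and the end-points can give the largest profit.
candidates = end_points + [q for q in stationary if profit_second_derivative(q) < 0]
best_q = max(candidates, key=profit)
print(best_q, profit(best_q))
```

Comparing the candidate values in this way is exactly the constrained optimisation procedure of Section 4.5.1 applied to the interval $0 \le q \le 10$.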
Activity 4.10 Sketch the profit function from Example 4.13 to verify that q = 5
does indeed give a maximum. (Do not try to find the q-intercepts here.)
Maximising tax revenue
In Section 2.1.5, we saw how the supply and demand functions for a market are
modified if a tax is imposed. We are now in a position to see what level of tax should be
imposed if the government wants to maximise its tax revenue. For instance, if an excise
tax of $T$ per unit is imposed, then the government's tax revenue, $R(T)$, is given by the tax per unit multiplied by the number of units sold at equilibrium, i.e.
$$R(T) = q_T T,$$
where $q_T$ is the equilibrium quantity in the presence of the tax. Of course, we can then use this to find the value of $T$, say $T^*$, that maximises this tax revenue. Let's look at an example.
Example 4.14 In Example 2.7, we saw how the introduction of an excise tax
affected the market in Example 2.6 and that the maximum tax that can be imposed
is given by $T_m = 4$. What excise tax, $T^*$, should be imposed if the government wants to maximise its tax revenue, $R(T)$, from this market? Sketch a graph of the tax revenue, $R(T)$, against $T$ and comment on the relationship between the values of $T_m$ and $T^*$.
This is a constrained optimisation problem as we must have
• $T \ge 0$ as $T$ is the tax per unit, and
• $T \le T_m$ as, otherwise, the market will cease to function.
So, we need to maximise the tax revenue generated by the tax, $R(T)$, i.e.
$$R(T) = q_T T = \left(2 - \frac{T}{2}\right) T = -\frac{T^2}{2} + 2T,$$
which has the derivative $R'(T) = -T + 2$ and so, as the stationary point occurs when $R'(T) = 0$, we see that we have a stationary point when $T = 2$. We can then see that $R''(T) = -1 < 0$ which, using the second-derivative test, tells us that this stationary point is a maximum. This means that the point we seek, i.e. the maximum of the tax revenue function, must occur at $T = 2$ or at one of the two end-points of our interval. But, using the tax revenue function, we see that
$$R(0) = 0, \quad R(2) = 2 \quad \text{and} \quad R(4) = 0,$$
which means that the maximum occurs at $T = 2$ because it yields the largest tax revenue. Thus, we take $T^* = 2$ and, as in the sketch in Figure 4.18, we find that $T^*$ is half-way between no tax (i.e. $T = 0$) and the maximum tax, $T_m = 4$.
Figure 4.18: A sketch of the tax revenue generated by an excise tax of $T$ for Example 4.14.
Notice how, in the presence of an excise tax, the tax revenue is maximised at a value of
T half-way between no tax (i.e. T = 0) and the maximum tax that can be imposed (i.e.
Tm = 4).
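The half-way observation can also be seen numerically. The sketch below assumes the revenue function $R(T) = (2 - T/2)T$ that is consistent with the values $R(0) = 0$, $R(2) = 2$ and $R(4) = 0$ in Example 4.14, and simply evaluates it on a grid over $0 \le T \le T_m$. The grid size and the names are arbitrary choices made for illustration.

```python
def tax_revenue(T):
    # R(T) = q_T * T with q_T = 2 - T/2, the form consistent with Example 4.14.
    return (2 - T / 2) * T


T_MAX = 4  # the maximum tax T_m

# Evaluate R(T) on an evenly spaced grid over 0 <= T <= T_m.
steps = 400
grid = [T_MAX * k / steps for k in range(steps + 1)]
best_T = max(grid, key=tax_revenue)

print(best_T, tax_revenue(best_T))
print(tax_revenue(0), tax_revenue(T_MAX))
```

A grid search like this is no substitute for the calculus, but it is a cheap way to confirm that the stationary point found by differentiation really does give the largest revenue on the interval.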
Of course, if a percentage of the price tax of $100r\%$ is imposed, then the government's tax revenue, $R(r)$, would be given by the tax per unit, $rp_r$, multiplied by the number of units sold at equilibrium, i.e.
$$R(r) = r p_r q_r,$$
where $p_r$ and $q_r$ are the equilibrium price and quantity in the presence of the tax. Of course, we can also use this to find the value of $r$, say $r^*$, that maximises this tax revenue. See, for example, Exercise 4.5.
Learning outcomes
At the end of this chapter and having completed the relevant reading and activities, you should be able to:
• use first and second-order derivatives to identify the relevant features of a function;
• sketch curves by identifying their key features;
• optimise functions of one variable;
• solve problems from economics-based subjects that involve optimisation.
Solutions to activities
$$\frac{(x - a)^2}{2}\, f''(a).$$
Now, as $f''(a) < 0$ (recall that $a \in I$ too), we see that $f(x) < y$ for all $x \in I$ where $x \ne a$, i.e. these values of $f$ always lie below the values from the tangent line to $f$ at $x = a$, as illustrated in Figure 4.19(a). But, of course, we can use any $a \in I$ when we run this argument and so a concave function is one which lies below all its tangent lines, as illustrated in Figure 4.19(b). In particular, a function must be concave in the neighbourhood of a local maximum.
Figure 4.19: The relationship between a concave function and its tangent lines. (a) When
changing the value of x, we can see that the values of f (x) are less than the corresponding
values of y from the tangent line to f at a, i.e. f lies below this tangent line. (b) By
changing the value of a, we can see that f lies below all of its tangent lines.
Solution to activity 4.2
For $f(x) = x^4 - 1$, we see that
$$f'(x) = 4x^3 \quad \text{and} \quad f''(x) = 12x^2,$$
which means, in particular, that $f''(0) = 0$. Then, looking at the first-order derivative, we see that $f'(x) < 0$ for $x < 0$ and $f'(x) > 0$ for $x > 0$ which means that the function is decreasing for $x < 0$ and then increasing for $x > 0$ as shown in Figure 4.8(a). Or, looking at the second-order derivative, we see that $f''(x) > 0$ for all $x \ne 0$ and so the function is convex as shown in Figure 4.8(a).
For $f(x) = 1 - x^4$, we see that
$$f'(x) = -4x^3 \quad \text{and} \quad f''(x) = -12x^2,$$
which means, in particular, that $f''(0) = 0$. Then, looking at the first-order derivative, we see that $f'(x) > 0$ for $x < 0$ and $f'(x) < 0$ for $x > 0$ which means that the function is increasing for $x < 0$ and then decreasing for $x > 0$ as shown in Figure 4.8(b). Or, looking at the second-order derivative, we see that $f''(x) < 0$ for all $x \ne 0$ and so the function is concave as shown in Figure 4.8(b).
Solution to activity 4.3
Suppose that $f(x)$ is a polynomial and that, for some constants $a \ne 0$ and $n \in \mathbb{N}$, the term in this polynomial with the highest power of $x$ is $ax^n$. We see that, as $x \to \infty$, we have
• $f(x) \to \infty$ if $a > 0$ as $ax^n \to \infty$, and
• $f(x) \to -\infty$ if $a < 0$ as $ax^n \to -\infty$,
$$24x^2 - 24x + 4 = 0 \implies 6x^2 - 6x + 1 = 0 \implies x = \frac{6 \pm \sqrt{12}}{12} = \frac{1}{2} \pm \frac{1}{\sqrt{12}},$$
if we use the quadratic formula. Now, if $f''(x)$ also changes sign at these values of $x$ we have a point of inflection. To see whether this is the case, consider that we now have
$$f''(x) = 24x^2 - 24x + 4 = 24\left[x - \left(\frac{1}{2} - \frac{1}{\sqrt{12}}\right)\right]\left[x - \left(\frac{1}{2} + \frac{1}{\sqrt{12}}\right)\right] = 24(x - a)(x - b),$$
if we let $x = a$ and $x = b$ denote the smaller and the larger values of $x$ we are interested in respectively. This means that, considering the signs of the two factors, we have
            x < a    x = a    a < x < b    x = b    x > b
x - a         -        0          +          +        +
x - b         -        -          -          0        +
f''(x)        +        0          -          0        +
so we see that $f''(x)$ does indeed change sign at $x = a$ and $x = b$. Consequently, both $x = a$ and $x = b$ where
$$a = \frac{1}{2} - \frac{1}{\sqrt{12}} \quad \text{and} \quad b = \frac{1}{2} + \frac{1}{\sqrt{12}},$$
$$x(6 - 6x + x^2)\, e^{-x} = 0,$$
whose non-zero solutions are given by
$$x = \frac{6 \pm \sqrt{12}}{2} = 3 \pm \sqrt{3},$$
if we use the quadratic formula. Now, if $f''(x)$ also changes sign at these values of $x$ we have a point of inflection. To see whether this is the case, consider that we now have
$$f''(x) = (6x - 6x^2 + x^3)\, e^{-x} = x\left[x - (3 - \sqrt{3})\right]\left[x - (3 + \sqrt{3})\right] e^{-x} = x(x - a)(x - b)\, e^{-x},$$
if we let $x = a$ and $x = b$ denote the smaller and the larger values of $x$ we are interested in respectively. This means that, considering the signs of the four factors, we have
            0 < x < a    x = a    a < x < b    x = b    x > b
x               +          +          +          +        +
x - a           -          0          +          +        +
x - b           -          -          -          0        +
e^{-x}          +          +          +          +        +
f''(x)          +          0          -          0        +
so we see that $f''(x)$ does indeed change sign at $x = a$ and $x = b$. Consequently, both $x = a$ and $x = b$ where
$$a = 3 - \sqrt{3} \quad \text{and} \quad b = 3 + \sqrt{3},$$
are also points of inflection of the function in Example 4.9.
Solution to activity 4.6
Here $f(x) = x^2 e^x$ and we find the key features of the curve $y = f(x)$, namely
• finding the stationary points: These occur when $f'(x) = 0$ and so, using the product rule, we get
$$f'(x) = (2x)(e^x) + (x^2)(e^x) = x(2 + x)\, e^x,$$
and so we solve the equation
$$x(2 + x)\, e^x = 0.$$
But, as $e^x \ne 0$ for all $x \in \mathbb{R}$, we find that the stationary points occur when $x = 0$ and $x = -2$. Then, we use $y = f(x)$ to find the values of $y$ at these points so that we can locate them on the sketch. Doing this, we find that
  - $x = 0$ gives $y = f(0) = (0)^2 e^0 = (0)(1) = 0$, and
  - $x = -2$ gives $y = f(-2) = (-2)^2 e^{-2} = 4e^{-2}$.
So, the stationary points have coordinates given by $(0, 0)$ and $(-2, 4e^{-2})$.
• classifying the stationary points: Let's use the second-order derivative test here. We can use the product rule again to see that
$$f''(x) = (2 + 2x)(e^x) + (2x + x^2)(e^x) = (2 + 4x + x^2)\, e^x,$$
and so, looking at the stationary points, we have
  - $f''(0) = (2)\, e^0 > 0$ and so $(0, 0)$ is a local minimum, and
  - $f''(-2) = (-2)\, e^{-2} < 0$ and so $(-2, 4e^{-2})$ is a local maximum.
• limiting behaviour: Using the fact in Section 4.4.2, we would expect the $e^x$ to dominate and this would mean that $f(x) \to \infty$ as $x \to \infty$ whereas, as $x \to -\infty$, we would expect $f(x) \to 0$ as $e^x \to 0$.
Then, using this information, we begin to sketch this curve by roughly indicating these key features on some axes as in Figure 4.20(a) and then, joining them up with a nice smooth curve, we get the sketch itself as in Figure 4.20(b).
Figure 4.20: Sketching the curve $y = x^2 e^x$. (a) Using what we have discovered about the key features of the curve, we can begin to see what it must look like. (b) By joining up these key features with a nice smooth curve, we get the sketch itself.
To find the points of inflection of this function, we start by seeing where $f''(x) = 0$. That is, we solve the equation
$$(2 + 4x + x^2)\, e^x = 0,$$
which, as $e^x \ne 0$ for all $x \in \mathbb{R}$, gives us
$$x = \frac{-4 \pm \sqrt{8}}{2} = -2 \pm \sqrt{2},$$
if we use the quadratic formula. Now, if $f''(x)$ also changes sign at these values of $x$ we have a point of inflection. To see whether this is the case, consider that we now have
$$f''(x) = (2 + 4x + x^2)\, e^x = \left[x - (-2 - \sqrt{2})\right]\left[x - (-2 + \sqrt{2})\right] e^x = (x - a)(x - b)\, e^x,$$
if we let $x = a$ and $x = b$ denote the smaller and the larger values of $x$ we are interested in respectively. This means that, considering the signs of the three factors, we have
            x < a    x = a    a < x < b    x = b    x > b
x - a         -        0          +          +        +
x - b         -        -          -          0        +
e^x           +        +          +          +        +
f''(x)        +        0          -          0        +
so we see that $f''(x)$ does indeed change sign at $x = a$ and $x = b$. Consequently, both $x = a$ and $x = b$ where
$$a = -2 - \sqrt{2} \quad \text{and} \quad b = -2 + \sqrt{2},$$
are points of inflection of the function $f(x) = x^2 e^x$.
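The sign-change argument for $f(x) = x^2 e^x$ can be checked numerically. The sketch below evaluates $f''(x) = (2 + 4x + x^2)e^x$ just either side of $-2 - \sqrt{2}$ and $-2 + \sqrt{2}$. The step `eps` is an arbitrary small number chosen for illustration.

```python
import math


def second_derivative(x):
    # f''(x) = (2 + 4x + x^2) e^x for f(x) = x^2 e^x.
    return (2 + 4 * x + x**2) * math.exp(x)


a = -2 - math.sqrt(2)
b = -2 + math.sqrt(2)
eps = 1e-3  # small step either side of each candidate point

# True where f'' is positive, sampled just left and right of a and of b.
signs = [second_derivative(x) > 0 for x in (a - eps, a + eps, b - eps, b + eps)]
print(signs)
```

A change from `True` to `False` across $a$, and from `False` to `True` across $b$, corresponds to the pattern $+, 0, -, 0, +$ in the last row of the sign table.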
Solution to activity 4.7
Looking at the figures in question, we have:
Using Figure 4.9(b) we see that the function in Example 4.7 has neither a global
maximum (as f (x) as x ) nor a global minimum (as f (x) as
x ).
Using Figure 4.10(b) we see that the function in Example 4.8 has a global minimum of zero when $x = 0$ and $x = 1$ but no global maximum as $f(x) \to \infty$ as $x \to \infty$ or $x \to -\infty$.
Using Figure 4.11(b) we see that the function in Example 4.9 has a global maximum of $27e^{-3}$ when $x = 3$ but no global minimum as $f(x) \to -\infty$ as $x \to -\infty$.
Using Figure 4.10(b) when $0 \le x \le 1$ we see that the function in Example 4.8 has a maximum value of $1/8$ when $x = 1/2$ and a minimum value of zero when $x = 0$ and $x = 1$.
Using Figure 4.11(b) when $0 \le x \le 3$ we see that the function in Example 4.9 has a maximum value of $27e^{-3}$ when $x = 3$ and a minimum value of zero when $x = 0$.
Solution to activity 4.9
Looking at the figures in question, we have:
• Using Figure 4.12(a) we see that the function has neither a global maximum (as $f(x) \to \infty$ as $x \to 1^+$) nor a global minimum (as $f(x) \to -\infty$ as $x \to 1^-$).
• Using Figure 4.12(b) we see that the function has neither a global maximum (as $f(x) \to \infty$ as $x \to 1$) nor a global minimum (as, even though $f(x) \to 0$ as $x \to \infty$ or $x \to -\infty$, it never gets there).
Now, restricting our attention to $0 \le x \le 1$, we can see from the figures that:
• Using Figure 4.12(a) we see that the function has a maximum value of 1 when $x = 0$ but no minimum value as $f(x) \to -\infty$ as $x \to 1^-$.
• Using Figure 4.12(b) we see that the function has a minimum value of 1 when $x = 0$ but no maximum value as $f(x) \to \infty$ as $x \to 1^-$.
Solution to activity 4.10
Using the information in Example 4.13 and noting that $\pi(1) = -17$, we get the sketch in Figure 4.21 if we are allowed to omit the $q$-intercepts. From this, we can clearly see that $q = 5$ gives us the maximum value of $\pi(q)$ for $0 \le q \le 10$.
Exercises
Exercise 4.1
For what values of x is the function
$$f(x) = x e^x,$$
increasing or decreasing? Use this information to find and classify any stationary points
of this function.
Figure 4.21: A sketch of the profit function from Example 4.13 for Activity 4.10. (Note that, as instructed, we have not found the $q$-intercepts of this profit function.)
For what values of x is this function convex or concave? Use this information to
determine whether this function has any points of inflection.
Exercise 4.2
Consider the function
$$f(x) = -12 \ln x - x^2 + 10x,$$
where x > 0. Find the x-coordinates of the stationary points of f (x) and classify them.
Exercise 4.3
Sketch the curve y = f (x) where
$$f(x) = x^3 + \frac{1}{x^3},$$
Exercise 4.5
In Exercise 2.3 we saw how the introduction of a percentage [of the price] tax of $100r\%$ affected a market and we found that the maximum tax that can be imposed is given by $r_m = 1/2$.
What tax, $r^*$, should be imposed if the government wants to maximise its tax revenue, $R(r)$, from this market? Sketch the graph of the tax revenue function, $R(r)$, for values of $r$ that make economic sense.
Solutions to exercises
Solution to exercise 4.1
Using the product rule, we see that the derivative of the function $f(x) = x e^x$ is given by
$$f'(x) = (1)\, e^x + x(e^x) = (1 + x)\, e^x,$$
and so, as $e^x > 0$ for all $x \in \mathbb{R}$, we see that the function is
• decreasing when $x < -1$ as $f'(x) < 0$, and
• increasing when $x > -1$ as $f'(x) > 0$.
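$$f'(x) = -\frac{12}{x} - 2x + 10.$$
The stationary points of $f(x)$ occur when $f'(x) = 0$ and so we have to solve the equation
$$-\frac{12}{x} - 2x + 10 = 0 \implies -\frac{12 + 2x^2 - 10x}{x} = 0 \implies x^2 - 5x + 6 = 0 \implies (x - 2)(x - 3) = 0.$$
That is, the stationary points occur when $x = 2$ and $x = 3$. To classify them, we note that
$$f''(x) = 12x^{-2} - 2 = \frac{12}{x^2} - 2,$$
so that $f''(2) = 1 > 0$ and $f''(3) = \frac{4}{3} - 2 < 0$.
Thus, the function, $f(x)$, has a local minimum when $x = 2$ and a local maximum when $x = 3$.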
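The classification can be double-checked with a short script. It assumes $f(x) = -12\ln x - x^2 + 10x$, which is the form consistent with the derivatives used in this solution. The `classify` helper and its tolerance are hypothetical conveniences introduced here.

```python
import math


def f(x):
    # f(x) = -12 ln x - x^2 + 10x for x > 0 (assumed form of the function in Exercise 4.2).
    return -12 * math.log(x) - x**2 + 10 * x


def f_prime(x):
    return -12 / x - 2 * x + 10


def f_double_prime(x):
    return 12 / x**2 - 2


def classify(x):
    """Apply the second-derivative test at a candidate stationary point x."""
    if abs(f_prime(x)) > 1e-12:
        return "not stationary"
    second = f_double_prime(x)
    if second > 0:
        return "local minimum"
    if second < 0:
        return "local maximum"
    return "test inconclusive"


for x in (2, 3):
    print(x, classify(x), round(f(x), 4))
```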
Solution to exercise 4.3
To sketch the curve y = f (x) where
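$$f(x) = x^3 + \frac{1}{x^3} = x^3 + x^{-3},$$
we find its key features, namely:
• x-intercepts: These occur when $y = 0$ and so we have to solve the equation
$$x^3 + \frac{1}{x^3} = 0 \implies \frac{x^6 + 1}{x^3} = 0 \implies x^6 + 1 = 0.$$
But, as $x^6 + 1 > 0$ for all $x \in \mathbb{R}$, we find that this equation has no solutions and so the curve has no $x$-intercepts.
• y-intercept: This occurs when $x = 0$ and so, as the function is not defined when $x = 0$, we find that the curve has no $y$-intercepts.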
• finding the stationary points: These occur when $f'(x) = 0$ and so, as
$$f'(x) = 3x^2 - 3x^{-4},$$
we have to solve the equation
$$3x^2 - 3x^{-4} = 0 \implies x^2 = \frac{1}{x^4} \implies x^6 = 1,$$
i.e. the stationary points of $f(x)$ occur when $x = \pm 1$. Then, we use $y = f(x)$ to find the values of $y$ at these points so that we can locate them on the sketch. Doing this, we find that
  - $x = 1$ gives $y = f(1) = 1 + 1 = 2$, and
  - $x = -1$ gives $y = f(-1) = -1 - 1 = -2$.
So, the stationary points have coordinates given by $(1, 2)$ and $(-1, -2)$.
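• classifying the stationary points: The second-order derivative of the function is
$$f''(x) = 6x + 12x^{-5} = 6x + \frac{12}{x^5},$$
so that $f''(1) = 18 > 0$, which makes $(1, 2)$ a local minimum, whereas $f''(-1) = -18 < 0$, which makes $(-1, -2)$ a local maximum.
• limiting behaviour: Near $x = 0$ we have
  - $f(x) \to \infty$ as $x \to 0^+$, and
  - $f(x) \to -\infty$ as $x \to 0^-$,
i.e. the curve $y = f(x)$ has a vertical asymptote when $x = 0$. Consequently, using this information, we can get the sketch in Figure 4.22.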
Figure 4.22: A sketch of the curve $y = x^3 + \frac{1}{x^3}$.
Indeed, using this sketch, we can clearly see that this function has neither a global
minimum nor a global maximum. In particular, notice that the local minimum is not
global because our local maximum gives us a smaller value of f (x) and the local
maximum is not global since the local minimum gives us a larger value of f (x)!
Solution to exercise 4.4
(a) To find the $x$-intercepts of the curve $y = f(x)$ we set $y = 0$ and solve the equation
$$3x^5 - 25x^3 + 60x = 0 \implies x(3x^4 - 25x^2 + 60) = 0,$$
so that either $x = 0$ or $3x^4 - 25x^2 + 60 = 0$.
To deal with this second possibility, we notice that we have a quadratic equation in $x^2$ and so, if we were to use the quadratic formula (say), we get
$$x^2 = \frac{25 \pm \sqrt{25^2 - 4(3)(60)}}{2(3)}.$$
But $25^2 - 4(3)(60) = -95 < 0$ and so this equation gives us no solutions for $x^2$ and, hence, no solutions for $x$. Thus, the only solution to $y = 0$ is $x = 0$ and this is, therefore, the only $x$-intercept of the curve $y = f(x)$.
(b) The stationary points occur when $f'(x) = 0$ and so, as
$$f'(x) = 15x^4 - 75x^2 + 60,$$
we have to solve the equation
$$15x^4 - 75x^2 + 60 = 0 \implies x^4 - 5x^2 + 4 = 0.$$
As this is a quadratic equation in $x^2$, we find that
$$x^2 = 4, 1,$$
If $x = 2$, we have
$$f''(2) = 30(2)(2(2)^2 - 5) = 60(8 - 5) = 180 > 0,$$
and so this is a local minimum. At this point we also have
$$y = f(2) = 3(2)^5 - 25(2)^3 + 60(2) = 96 - 200 + 120 = 16,$$
and so the coordinates of this point are $(2, 16)$.
(c) We can use the information that we have found so far together with the observation
that the y-intercept occurs when x = 0, i.e. when y = f (0) = 0, to get the sketch in
Figure 4.23(a).
Figure 4.23: (a) A sketch of the curve $y = f(x)$ from Exercise 4.4(c). (b) For Exercise 4.4(d), picking out the interval $-2 \le x \le 2$ using vertical dotted lines and the interval $-3 \le x \le 3$ using vertical dashed lines.
(d) Given that $-2 \le x \le 2$ and looking at the sketch in Figure 4.23(a), it should be clear that the global maximum and the global minimum of $f(x)$ are at the points $(1, 38)$ and $(-1, -38)$ respectively. If you're unclear about this, this interval is picked out by the vertical dotted lines in Figure 4.23(b).
If we now have $-3 \le x \le 3$, looking at the sketch in Figure 4.23(a), it should be clear that the global maximum and the global minimum of $f(x)$ are at the points $(3, 234)$ as $f(3) = 234$ and $(-3, -234)$ as $f(-3) = -234$ respectively. If you're unclear about this, this interval is picked out by the vertical dashed lines in Figure 4.23(b).
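If the sketch leaves any doubt, the two answers in part (d) can be confirmed by comparing the values of $f(x) = 3x^5 - 25x^3 + 60x$ at the stationary points and end-points of each interval. The helper below is an illustrative sketch; its name and structure are choices made here.

```python
def f(x):
    # f(x) = 3x^5 - 25x^3 + 60x from Exercise 4.4.
    return 3 * x**5 - 25 * x**3 + 60 * x


STATIONARY = [-2, -1, 1, 2]  # roots of f'(x) = 15x^4 - 75x^2 + 60


def global_extrema(a, b):
    """Return (maximum point, minimum point) of f on a <= x <= b."""
    xs = sorted(set([a, b] + [x for x in STATIONARY if a <= x <= b]))
    points = [(x, f(x)) for x in xs]
    highest = max(points, key=lambda p: p[1])
    lowest = min(points, key=lambda p: p[1])
    return highest, lowest


print(global_extrema(-2, 2))
print(global_extrema(-3, 3))
```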
$$R(r) = r \cdot \frac{12}{2 - r} \cdot \frac{4 - 8r}{2 - r} = 48\, \frac{r - 2r^2}{(2 - r)^2}.$$
Now, to find the value of $r$, i.e. $r^*$, that maximises $R(r)$, we differentiate it with respect to $r$ using the quotient and chain rules to get
$$R'(r) = 48\, \frac{2 - 7r}{(2 - r)^3}.$$
This has a stationary point when $R'(r) = 0$, i.e. when $r = 2/7$, and as $R'(r)$ changes from positive to negative as $r$ goes through this value, we can see that this stationary point is a local maximum.[6] Now, in this case, we must have $0 \le r \le r_m$ for the market to function and so this is a constrained optimisation problem. That is, the maximum we seek is either the value of $R(r)$ at our local maximum, i.e.
$$R\left(\tfrac{2}{7}\right) = \tfrac{2}{7} \cdot \frac{12}{2 - \frac{2}{7}} \cdot \frac{4 - 8\left(\frac{2}{7}\right)}{2 - \frac{2}{7}} = \tfrac{2}{7} \cdot \frac{12}{\frac{12}{7}} \cdot \frac{\frac{12}{7}}{\frac{12}{7}} = 2,$$
or its value at one of the end-points, i.e. $R(0) = 0 < 2$ and
$$R\left(\tfrac{1}{2}\right) = \tfrac{1}{2} \cdot \frac{12}{2 - \frac{1}{2}} \cdot \frac{4 - 8\left(\frac{1}{2}\right)}{2 - \frac{1}{2}} = 0 < 2,$$
and so the maximum value of $R(r)$ is 2 and this occurs when $r = 2/7$, i.e. at the local maximum. Thus, $r^* = 2/7$ and using the information we have so far, we can get the sketch in Figure 4.24(a) for values of $r$ that make economic sense, i.e. those where $0 \le r \le 1/2$.[7]
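Because the fractions here are easy to mishandle, it can help to redo the arithmetic exactly. The sketch below uses Python's `fractions.Fraction` with the revenue function $R(r) = 48(r - 2r^2)/(2 - r)^2$ and the derivative $R'(r) = 48(2 - 7r)/(2 - r)^3$ from this solution. The test values $1/4$ and $1/3$ either side of $2/7$ are arbitrary choices.

```python
from fractions import Fraction


def tax_revenue(r):
    # R(r) = 48 (r - 2r^2) / (2 - r)^2
    return 48 * (r - 2 * r**2) / (2 - r) ** 2


def marginal_revenue(r):
    # R'(r) = 48 (2 - 7r) / (2 - r)^3
    return 48 * (2 - 7 * r) / (2 - r) ** 3


r_star = Fraction(2, 7)
r_max = Fraction(1, 2)

# The derivative vanishes at r* and changes sign from positive to negative there.
print(marginal_revenue(r_star))
print(marginal_revenue(Fraction(1, 4)) > 0, marginal_revenue(Fraction(1, 3)) < 0)

# Values at the end-points and at the local maximum.
print(tax_revenue(Fraction(0)), tax_revenue(r_star), tax_revenue(r_max))
```

Working with exact fractions avoids any rounding doubt about whether $R(2/7)$ really equals 2.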
Aside: As shown in Figure 4.24(b), observe that once we move away from the economically meaningful values of $r$ (i.e. where $0 \le r \le 1/2$) the graph of $R(r)$ gets quite complicated. Indeed, note that as
$$R(r) = 48\, \frac{r - 2r^2}{(2 - r)^2},$$
we can see that it has a vertical asymptote when $r = 2$ and, because we can write
$$R(r) = 48\, \frac{r - 2r^2}{(2 - r)^2} = 48\, \frac{-2(r^2 - 4r + 4) - 7r + 8}{4 - 4r + r^2} = -96 - 48\, \frac{7r - 8}{(2 - r)^2},$$
we can see that $R(r) \to -96$ as $r \to \pm\infty$, i.e. we also have a horizontal asymptote here.
[6] Alternatively, you can show that this stationary point is a local maximum by showing that $R''(r) < 0$ when $r = 2/7$, but this isn't quite so easy.
[7] Note, in particular, that $r^*$ is clearly not half-way between no tax (i.e. $r = 0$) and the maximum tax (i.e. $r_m = 1/2$) as it was in Example 4.14 when we looked at an excise tax.
Figure 4.24: For Exercise 4.5: (a) A sketch of the graph of $R(r)$ for the economically meaningful values of $r$, i.e. those between zero (i.e. no tax) and $1/2$ (i.e. the maximum tax). (b) As an aside, we could have sketched the graph of $R(r)$ for some economically meaningless values of $r$ (specifically $r < 0$ and $r > 1/2$). Observe, in particular, the vertical asymptote when $r = 2$ and the horizontal asymptote where $R(r) \to -96$ as $r \to \pm\infty$. (Note that the details of what is happening in the positive quadrant, which we saw in (a), have been omitted from (b) for clarity.)
Chapter 5
Integration
Essential reading
(For full publication details, see Chapter 1.)
Binmore and Davies (2002) Sections 10.2, parts of 10.3–10.4, 10.5–10.9.
5.1
or $f'(x)$.
And, in particular, we saw how to find such derivatives by using the rules of
differentiation and some standard derivatives. Now, given a function, f (x), we want to
make sense of what it means to find the indefinite integral of this function with respect
to x, which is denoted by
$$\int f(x)\, dx.$$
In such cases, as we are integrating the function f (x) with respect to x, we call it the
integrand. And, similarly to what we saw before, we will see how to find such integrals
by using the rules of integration and some standard integrals. In particular, the standard
integrals will be closely related to our standard derivatives since the key idea behind our
method for finding integrals will be the idea that integration is the process that
undoes (or reverses) the process of differentiation, i.e. the process of indefinite
integration can be thought of as antidifferentiation and the resulting indefinite integral
can be thought of as an antiderivative.
Consider the functions $F(x)$ and $f(x)$ where we know that $f(x)$ is the derivative[1] of $F(x)$, i.e.
$$\frac{dF}{dx} = f(x).$$
Now, using the idea that integration undoes differentiation, i.e. if we integrate f (x)
with respect to x we are looking for a function, F (x), whose derivative is f (x), we can
see that
$\int f(x)\, dx$ must be, more or less, given by $F(x)$.
In such cases, we say that F (x) is an antiderivative of f (x) as opposed to, say, the
indefinite integral.
However, you may wonder why we say that the function, F (x), that we found above is
an, as opposed to the, antiderivative of f (x). The reason for this is that if, instead of
the function F (x) we had the function F (x) + c where c is a constant, then its
derivative would still be f (x), i.e.
$$\frac{d}{dx}\left[F(x) + c\right] = f(x),$$
[1] We say that it is the derivative because differentiation always yields exactly one answer.
Example 5.2
What is $\int 8x\, dx$?
We saw in Example 5.1 that $4x^2$ is an antiderivative of $8x$. This means that
$$\int 8x\, dx = 4x^2 + c,$$
where $c$ is an arbitrary (i.e. any) constant. Notice that this works because differentiating $4x^2 + c$ we get $8x$.
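The claim that every choice of the constant $c$ works can be illustrated numerically. In the sketch below, a central difference is used as a stand-in for differentiation. The helper names, the sample constants and the sample points are arbitrary choices made for illustration.

```python
import math


def numerical_derivative(F, x, h=1e-5):
    # Central difference approximation to F'(x).
    return (F(x + h) - F(x - h)) / (2 * h)


def make_antiderivative(c):
    # F(x) = 4x^2 + c, the family of antiderivatives of 8x from Example 5.2.
    return lambda x: 4 * x**2 + c


def check(c, x):
    # Does differentiating 4x^2 + c at x give (approximately) 8x?
    F = make_antiderivative(c)
    return math.isclose(numerical_derivative(F, x), 8 * x, rel_tol=1e-6, abs_tol=1e-6)


results = [check(c, x) for c in (-3.0, 0.0, 7.5) for x in (-2.0, 0.0, 1.5)]
print(all(results))
```

Whatever constant is added, it disappears on differentiation, which is why the indefinite integral is only determined up to an arbitrary constant.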
Generally speaking then, we have the following.
If $F(x)$ is a function whose derivative is the function $f(x)$, then we have
$$\int f(x)\, dx = F(x) + c,$$
where $c$ is an arbitrary constant. In particular, we call the
• function, $f(x)$, the integrand as it is what we are integrating,
• function, $F(x)$, an antiderivative as its derivative is $f(x)$,
• constant, $c$, a constant of integration which is completely arbitrary,[2] and
• integral,
Now that we have the idea, let's see how we're going to actually find the indefinite integrals of the functions that commonly occur in this course.
5.2
The previous section told us how to find indefinite integrals using the antiderivatives,
but now we want to explore a more convenient way of finding them. The key idea is
that we introduce standard integrals which tell us how to integrate the basic functions
that we saw in Chapter 2. Once we know how to integrate these, the rules of integration
will allow us to integrate combinations of these functions.
5.2.1 Standard integrals
[2] As we can add any constant to $F(x)$ to account for the fact that $F(x) + c$, for any constant $c \in \mathbb{R}$, is also an antiderivative.
Power functions
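If $n \ne -1$, we have
$$\int x^n\, dx = \frac{x^{n+1}}{n+1} + c,$$
where $c$ is an arbitrary constant and this works because
$$\frac{d}{dx}\left(\frac{x^{n+1}}{n+1} + c\right) = \frac{(n+1)x^n}{n+1} + 0 = x^n.$$
In particular, if $n = 0$, we have
$$\int 1\, dx = \int x^0\, dx = x + c,$$
whereas, if $n = -1$, we have
$$\int \frac{1}{x}\, dx = \int x^{-1}\, dx = \ln|x| + c,$$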
where we need the modulus sign in $\ln|x|$ as $x$ may be negative but the logarithm function is only defined for $x > 0$. This works because, if $x > 0$, we have $|x| = x$ and so
$$\frac{d \ln|x|}{dx} = \frac{d \ln(x)}{dx} = \frac{1}{x},$$
whereas if $x < 0$, we have $|x| = -x$ and so
$$\frac{d \ln|x|}{dx} = \frac{d \ln(-x)}{dx} = \frac{-1}{-x} = \frac{1}{x},$$
$$\frac{d}{dx}\left(e^x + c\right) = e^x.$$
However, there is no nice standard integral for $\ln x$ and so we'll see how to find
$$\int \ln x\, dx,$$
when we encounter integration by parts in Example 5.20.
If we have another base, $a$, the standard integrals are not so simple. But, we can see that
$$\int a^x\, dx = \frac{a^x}{\ln a} + c,$$
where $c$ is an arbitrary constant since, using the result from Activity 3.9, we have
$$\frac{d}{dx}\left(\frac{a^x}{\ln a} + c\right) = \frac{a^x \ln a}{\ln a} + 0 = a^x.$$
However, there is also no nice standard integral for $\log_a x$ and so we'll see how to find
$$\int \log_a x\, dx,$$
in Activity 5.12 where we will use the change of base formula once we can integrate $\ln x$.
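Each standard integral so far has been justified by differentiating the right-hand side, and the same check can be run numerically. The sketch below does this for the power, reciprocal and general exponential cases. The values $n = 3$ and $a = 2$, the sample points and the helper names are hypothetical choices made for illustration.

```python
import math


def numerical_derivative(F, x, h=1e-5):
    # Central difference approximation to F'(x).
    return (F(x + h) - F(x - h)) / (2 * h)


# Hypothetical choices: a power n != -1 and a base a > 0 with a != 1.
n, a = 3, 2.0

# Each entry pairs a standard antiderivative F with the integrand f it should recover.
standard_integrals = {
    "power": (lambda x: x ** (n + 1) / (n + 1), lambda x: x**n),
    "reciprocal": (lambda x: math.log(abs(x)), lambda x: 1 / x),
    "general exponential": (lambda x: a**x / math.log(a), lambda x: a**x),
}


def verify(name, x):
    F, f = standard_integrals[name]
    return math.isclose(numerical_derivative(F, x), f(x), rel_tol=1e-6)


checks = {
    "power": verify("power", 1.5),
    "reciprocal": verify("reciprocal", -2.0),  # a negative x, so the modulus matters
    "general exponential": verify("general exponential", 0.7),
}
print(checks)
```

The reciprocal case is deliberately tested at a negative value of $x$, where $\ln x$ alone would not even be defined.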
Sine and cosine functions
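For the sine and cosine functions we find that
$$\int \sin x\, dx = -\cos x + c \quad \text{and} \quad \int \cos x\, dx = \sin x + c,$$
where $c$ is an arbitrary constant since, for instance,
$$\frac{d}{dx}\left(-\cos x + c\right) = -(-\sin x) + 0 = \sin x,$$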
5.2.2
In Section 2.1.2, we saw that there are several standard ways of making new functions
from old ones and, in Section 3.2.2, we saw how the rules of differentiation could be
used to differentiate these new functions. Here we will see how we can use standard
integrals, i.e. the integrals of our basic functions, and rules of integration to integrate
the new functions that are created from these basic ones in these standard ways. We
start with the most straightforward of these which allows us to integrate linear
combinations of functions.
The linear combination rule
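If $k$ and $l$ are constants, this allows us to integrate the linear combination, $kf(x) + lg(x)$, of two functions $f(x)$ and $g(x)$. It states that
$$\int [kf(x) + lg(x)]\, dx = k \int f(x)\, dx + l \int g(x)\, dx.$$
Indeed, this gives us three more basic rules straightaway, i.e. the
• constant multiple rule: If $k$ is a constant and $f(x)$ is a function, then
$$\int kf(x)\, dx = k \int f(x)\, dx.$$
• sum rule: If $f(x)$ and $g(x)$ are functions, then
$$\int [f(x) + g(x)]\, dx = \int f(x)\, dx + \int g(x)\, dx.$$
• difference rule: If $f(x)$ and $g(x)$ are functions, then
$$\int [f(x) - g(x)]\, dx = \int f(x)\, dx - \int g(x)\, dx.$$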
Activity 5.1 Derive the constant multiple, sum and difference rules from the linear
combination rule.
Activity 5.2
Example 5.3
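$$\int 3x^{1/2}\, dx = 3 \int x^{1/2}\, dx$$
$$\int \left(x^2 + x^{1/2}\right) dx = \frac{x^3}{3} + \frac{x^{3/2}}{3/2} + c = \frac{1}{3}x^3 + \frac{2}{3}x^{3/2} + c \quad \text{by the sum rule,}$$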
Use the rules above to integrate the following functions with respect to $x$.
(a) $3\cos x$,
(b) $e^x + \cos x$,
(c) $3\sin x - \dfrac{3}{x}$.
We now look at the other rules of integration, i.e. the ones that will allow us to
integrate other combinations of functions. But, unlike what we saw with the rules of
differentiation in Section 3.2.2, we shall see that these are harder to apply.
5.2.3 Integration by substitution
$$\frac{dh}{dx} = \frac{df}{dg}\, \frac{dg}{dx},$$
$$\int \frac{df}{dg}\, \frac{dg}{dx}\, dx = f(g(x)) + c,$$
which is the basis of integration by substitution. However, this is quite hard to apply and so, as a useful way of applying this rule, we think of
$$dg \quad \text{as} \quad \frac{dg}{dx}\, dx,$$
so that we have
$$\int \frac{df}{dg}\, dg = f(g) + c,$$
and this is the key to the method that we shall be using here.
How to integrate by substitution
We can now see how to apply integration by substitution. The basic idea is that, if you
are given an integrand that involves a composition of two functions, this rule of
integration sometimes allows you to turn it into an easier integral by making a
substitution. That is:
The integral involves the derivative of a composition and has the form
\[ \int f(g(x))\,g'(x) \, dx. \]
Write $f(g(x))$ as $f(g)$ and $g'(x)\,dx$ as $dg$. This should give you the easier integral
\[ \int f(g) \, dg. \]
Find this integral and replace all occurrences of $g$ with $g(x)$ to get your final answer.
Now, to make this clearer, let's look at some examples.
Some simple applications of integration by substitution
Easy integrations by substitution involve an integrand which is nothing more than a simple composition of two functions and so there can be no doubt about which function should be $g$. To see this, let's consider what happens when we want to integrate a simple composition which involves the function $3x + 1$.
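Example 5.4 Find $\int (3x + 1)^2 \, dx$.
Taking $g = 3x + 1$ we have $\frac{dg}{dx} = 3$ and so $dg = 3\,dx$, i.e. $dx = \frac{1}{3}\,dg$. Hence, substitution gives
\[ \int (3x + 1)^2 \, dx = \int g^2 \, \frac{1}{3} \, dg = \frac{1}{3} \int g^2 \, dg = \frac{1}{3}\,\frac{g^3}{3} + c = \frac{(3x + 1)^3}{9} + c. \]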
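Example 5.5 Find $\int \frac{1}{3x + 1} \, dx$.
Taking $g = 3x + 1$ we have $\frac{dg}{dx} = 3$ and so $dg = 3\,dx$, i.e. $dx = \frac{1}{3}\,dg$. Hence, substitution gives
\[ \int \frac{1}{3x + 1} \, dx = \int \frac{1}{g} \, \frac{1}{3} \, dg = \frac{1}{3} \int g^{-1} \, dg = \frac{1}{3} \ln |g| + c = \frac{1}{3} \ln |3x + 1| + c. \]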
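Example 5.6 Find $\int e^{3x+1} \, dx$.
Taking $g = 3x + 1$ we have $\frac{dg}{dx} = 3$ and so $dg = 3\,dx$, i.e. $dx = \frac{1}{3}\,dg$. Hence, substitution gives
\[ \int e^{3x+1} \, dx = \int e^{g} \, \frac{1}{3} \, dg = \frac{1}{3} \int e^{g} \, dg = \frac{1}{3}\,e^{g} + c = \frac{1}{3}\,e^{3x+1} + c. \]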
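One way to sanity-check an answer obtained by substitution is to differentiate it and compare the result with the integrand. The sketch below does this numerically for Examples 5.4 to 5.6. The helper names (`derivative`, `max_error`), the step size and the sample points are illustrative assumptions, and a small numerical error is only evidence that the antiderivative is right, not a proof.

```python
import math

def derivative(F, x, h=1e-5):
    """Central-difference estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# (integrand, claimed antiderivative without the constant c) for each example
examples = {
    "5.4": (lambda x: (3 * x + 1) ** 2, lambda x: (3 * x + 1) ** 3 / 9),
    "5.5": (lambda x: 1 / (3 * x + 1), lambda x: math.log(abs(3 * x + 1)) / 3),
    "5.6": (lambda x: math.exp(3 * x + 1), lambda x: math.exp(3 * x + 1) / 3),
}

def max_error(integrand, antiderivative, points=(0.0, 0.5, 1.0, 2.0)):
    """Largest gap between F'(x) and the integrand over the sample points."""
    return max(abs(derivative(antiderivative, x) - integrand(x)) for x in points)

for label, (f, F) in examples.items():
    # Each reported error should be tiny compared with the size of the integrand.
    print(label, max_error(f, F))
```

The constant $c$ is left out because it disappears on differentiation.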
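Find $\int (4x + 7)^2 \, dx$.
Taking $g = 4x + 7$ we have $\frac{dg}{dx} = 4$ and so $dg = 4\,dx$, i.e. $dx = \frac{1}{4}\,dg$. Hence, substitution gives
\[ \int (4x + 7)^2 \, dx = \int g^2 \, \frac{1}{4} \, dg = \frac{1}{4} \int g^2 \, dg = \frac{1}{4}\,\frac{g^3}{3} + c = \frac{(4x + 7)^3}{12} + c. \]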
Activity 5.4 Find $\int \frac{1}{4x + 7} \, dx$ and $\int e^{4x+7} \, dx$.
Note that in all of these examples, the substitution works because we have $g(x) = ax + b$ and hence
\[ \frac{dg}{dx} = a \implies dg = a\,dx \implies \frac{1}{a}\,dg = dx, \]
$\int \sin(ax + b) \, dx$ and $\int \cos(ax + b) \, dx$.
What happens if a = 0?
Activity 5.6 Using the expressions you found in Activity 5.5, verify your answers
to Activity 5.4.
Some less simple applications of integration by substitution
We will also see slightly harder integrations by substitution where the integrand involves a composition of two functions multiplied by another function. Even in these cases, though, there can be little doubt about which function should be $g$. To see this, let's consider what happens when we want to integrate a simple composition which involves the function $x^2 + 1$.
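Example 5.8 Find $\int x (x^2 + 1)^2 \, dx$.
Taking $g = x^2 + 1$ we have $dg = 2x\,dx$, i.e. $x\,dx = \frac{1}{2}\,dg$. Hence, substitution gives
\[ \int (x^2 + 1)^2 \, x \, dx = \int g^2 \, \frac{1}{2} \, dg = \frac{1}{2} \int g^2 \, dg = \frac{1}{2}\,\frac{g^3}{3} + c = \frac{(x^2 + 1)^3}{6} + c, \]
i.e. the extra $x$ in the integrand was actually needed for the substitution $g = x^2 + 1$ to work.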
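Example 5.9 Find $\int \frac{x}{x^2 + 1} \, dx$.
Taking $g = x^2 + 1$ again, so that $x\,dx = \frac{1}{2}\,dg$, substitution gives
\[ \int \frac{x}{x^2 + 1} \, dx = \int \frac{1}{g} \, \frac{1}{2} \, dg = \frac{1}{2} \int g^{-1} \, dg = \frac{1}{2} \ln |g| + c = \frac{1}{2} \ln |x^2 + 1| + c, \]
i.e. the extra $x$ in the integrand was, again, needed for the substitution $g = x^2 + 1$ to work.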
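Example 5.10 Find $\int x\,e^{x^2+1} \, dx$.
Taking $g = x^2 + 1$ again, so that $x\,dx = \frac{1}{2}\,dg$, substitution gives
\[ \int x\,e^{x^2+1} \, dx = \int e^{x^2+1} \, x \, dx = \int e^{g} \, \frac{1}{2} \, dg = \frac{1}{2} \int e^{g} \, dg = \frac{1}{2}\,e^{g} + c = \frac{1}{2}\,e^{x^2+1} + c, \]
i.e. the extra $x$ in the integrand was, again, needed for the substitution $g = x^2 + 1$ to work.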
In particular, observe what changes in these examples and what stays the same. Indeed, just for comparison, we can see what would happen if we had a composition which is like the one in Example 5.8 but it now involves the function $3x^2 + 7$ instead of $x^2 + 1$.
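Example 5.11 Find $\int x (3x^2 + 7)^2 \, dx$.
Taking $g = 3x^2 + 7$ we have $dg = 6x\,dx$, i.e. $x\,dx = \frac{1}{6}\,dg$. Hence, substitution gives
\[ \int (3x^2 + 7)^2 \, x \, dx = \int g^2 \, \frac{1}{6} \, dg = \frac{1}{6} \int g^2 \, dg = \frac{1}{6}\,\frac{g^3}{3} + c = \frac{(3x^2 + 7)^3}{18} + c, \]
i.e. the extra $x$ in the integrand was actually needed for the substitution $g = 3x^2 + 7$ to work.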
Activity 5.7 Find $\int \frac{x}{3x^2 + 7} \, dx$ and $\int x\,e^{3x^2+7} \, dx$.
To summarise, it is worth noting that in all of these examples, the substitution works because we have $g(x) = ax^2 + b$ and hence
\[ \frac{dg}{dx} = 2ax \implies dg = 2ax\,dx, \]
where $a \neq 0$ and $b$ are constants. But $2ax$ is not a constant and so we cannot deal with this by taking it out of the integral as we did in the last set of examples. However, in these cases, the substitution still works because we have
\[ \frac{dg}{dx} = 2ax \implies dg = 2ax\,dx \implies \frac{1}{2a}\,dg = x\,dx, \]
and there is also an $x$ in the integrand to facilitate the transition from $dx$ to $dg$. Indeed, in the absence of this extra $x$, the substitution would produce a more complicated integral and we would not be able to proceed!
Integration by substitution more generally
The general lesson that we should be drawing from the last two sets of examples is that integration by substitution works when we have an integrand which is the product of
the composition of two functions $f(g(x))$, and
a constant multiple of $g'(x)$.
The first of these enables us to replace $f(g(x))$ with $f(g)$ and the second enables us to replace $dx$ with some constant multiple of $dg$. Having done this, the substitution has turned a hard integral into an easier one and we can proceed. Let's now consider some more complicated examples.
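Example 5.12 Find $\int (x^3 + x^2)^7 (3x^2 + 2x) \, dx$.
Taking $g = x^3 + x^2$ we have $dg = (3x^2 + 2x)\,dx$ and so the substitution gives
\[ \int (x^3 + x^2)^7 (3x^2 + 2x) \, dx = \int g^7 \, dg = \frac{g^8}{8} + c = \frac{(x^3 + x^2)^8}{8} + c. \]
Here, the extra $3x^2 + 2x$ in the integrand was needed for the substitution $g = x^3 + x^2$ to work.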
Example 5.13 Find $\int \frac{2x + 2}{x^2 + 2x + 2} \, dx$.
Here the composition is $(x^2 + 2x + 2)^{-1}$ and so we take $g = x^2 + 2x + 2$. As such, we have
\[ \frac{dg}{dx} = 2x + 2, \]
which is the other part of the product in the integrand, i.e. this substitution will work. Thus, we see that
\[ dg = (2x + 2)\,dx, \]
and so the substitution gives
\[ \int \frac{2x + 2}{x^2 + 2x + 2} \, dx = \int \frac{1}{g} \, dg = \ln |g| + c = \ln |x^2 + 2x + 2| + c. \]
Here, the extra $2x + 2$ in the integrand was needed for the substitution $g = x^2 + 2x + 2$ to work.
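Example 5.14 Find $\int (x^2 + 1)\,e^{x^3+3x+7} \, dx$.
Here the composition is $e^{x^3+3x+7}$ and so we take $g = x^3 + 3x + 7$. As such, we have
\[ \frac{dg}{dx} = 3x^2 + 3 = 3(x^2 + 1), \]
which is a constant multiple of the other part of the product in the integrand, i.e. this substitution will work. Thus, we see that
\[ dg = 3(x^2 + 1)\,dx \implies \frac{1}{3}\,dg = (x^2 + 1)\,dx, \]
and so the substitution gives
\[ \int (x^2 + 1)\,e^{x^3+3x+7} \, dx = \int e^{g} \, \frac{1}{3} \, dg = \frac{1}{3} \int e^{g} \, dg = \frac{1}{3}\,e^{g} + c = \frac{1}{3}\,e^{x^3+3x+7} + c. \]
Here, the extra $x^2 + 1$ in the integrand was needed for the substitution $g = x^3 + 3x + 7$ to work.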
Activity 5.8 Find $\int x \sin(x^2) \, dx$.
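Find $\int \sin^2 x \cos x \, dx$.
Taking $g = \sin x$ we have $dg = \cos x\,dx$ and so the substitution gives
\[ \int \sin^2 x \cos x \, dx = \int g^2 \, dg = \frac{g^3}{3} + c = \frac{1}{3} \sin^3 x + c. \]
Here, of course, the extra $\cos x$ in the integrand was needed for the substitution $g = \sin x$ to work.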
Activity 5.9
Find
Indeed, as the next example shows, this kind of substitution allows us to find another
useful result.
Example 5.16 Find $\int \tan x \, dx$.
Here we note that
\[ \tan x = \frac{\sin x}{\cos x}, \]
which means that the composition is $(\cos x)^{-1}$ and so we take $g = \cos x$. As such, we have
\[ \frac{dg}{dx} = -\sin x, \]
which, up to a minus, is the other part of the product in the integrand, i.e. this substitution will work. Thus, we see that
\[ dg = -\sin x\,dx, \]
and so the substitution gives
\[ \int \tan x \, dx = \int \frac{\sin x}{\cos x} \, dx = -\int \frac{dg}{g} = -\ln |g| + c = -\ln |\cos x| + c. \]
Here, of course, the extra $\sin x$ in the integrand was needed for the substitution $g = \cos x$ to work.
Activity 5.10
Find
cot x dx.
However, not every trigonometric substitution is so easy to spot as the next example
shows.
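Example 5.17 Find $\int \frac{dx}{(x + a)^2 + b^2}$.
Here, for reasons that will soon become apparent, we make the substitution $x + a = b \tan\theta$. As such, differentiating both sides of this expression with respect to $\theta$, we have
\[ \frac{dx}{d\theta} = b \sec^2\theta \implies dx = b \sec^2\theta \, d\theta. \]
This means that our integral becomes
\[ \int \frac{dx}{(x + a)^2 + b^2} = \int \frac{b \sec^2\theta}{b^2 \tan^2\theta + b^2} \, d\theta = \int \frac{\sec^2\theta}{b \sec^2\theta} \, d\theta = \int \frac{d\theta}{b}, \]
if we use the trigonometric identity $\tan^2\theta + 1 = \sec^2\theta$ from (2.4). This then gives us
\[ \int \frac{d\theta}{b} = \frac{\theta}{b} + c = \frac{1}{b} \tan^{-1}\left(\frac{x + a}{b}\right) + c, \]
since $x + a = b \tan\theta$ and where $c$ is an arbitrary constant. Thus, we have found that
\[ \int \frac{dx}{(x + a)^2 + b^2} = \frac{1}{b} \tan^{-1}\left(\frac{x + a}{b}\right) + c. \]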
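The result of Example 5.17 can be checked numerically. The sketch below anticipates the definite integrals of Section 5.3: it compares a Simpson's rule estimate of the integral over an interval with the change in the claimed antiderivative over the same interval. The values of `a`, `b` and the interval, and the helper names `simpson` and `antiderivative`, are illustrative assumptions only.

```python
import math

def simpson(f, lo, hi, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def antiderivative(x, a, b):
    """The result of Example 5.17 with the constant c omitted."""
    return math.atan((x + a) / b) / b

a, b = 1.0, 2.0      # illustrative constants, b must be non-zero
lo, hi = 0.0, 3.0    # illustrative interval

numeric = simpson(lambda x: 1 / ((x + a) ** 2 + b ** 2), lo, hi)
exact = antiderivative(hi, a, b) - antiderivative(lo, a, b)
print(numeric, exact)  # the two values should agree to many decimal places
```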
Find $\int \frac{dx}{x^2 + 2x + 2}$. (Hint: Complete the square in the denominator.)
We will see other examples of how trigonometric identities can be used when finding
integrals in Section 5.2.6.
5.2.4
Integration by parts
Integration by parts is a way of dealing with integrands which involve the product of
two functions and, as such, it is closely related to the product rule of differentiation. To
see how it works, we will start by seeing how integration by parts is related to the
product rule and then we will describe how to apply this rule. We will then see some
examples of how it can be applied.
Why integration by parts works
We start by noting that the product rule for differentiation tells us that
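\[ \frac{d}{dx}[f(x)g(x)] = f'(x)g(x) + f(x)g'(x). \]
So, integrating both sides with respect to $x$, we get
\[ \int \frac{d}{dx}[f(x)g(x)] \, dx = \int f'(x)g(x) \, dx + \int f(x)g'(x) \, dx, \]
which, on noting that integration undoes differentiation, yields
\[ f(x)g(x) = \int f'(x)g(x) \, dx + \int f(x)g'(x) \, dx, \]
and so, rearranging, we get the formula for integration by parts,
\[ \int f(x)g'(x) \, dx = f(x)g(x) - \int f'(x)g(x) \, dx. \]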
Choose $f(x)$ and $g'(x)$ so that we can differentiate $f(x)$ to get $f'(x)$ and straightforwardly integrate $g'(x)$ to get $g(x)$.
Apply the formula and make sure that the new integral, $\int f'(x)g(x) \, dx$, is easier to integrate.
If it is, proceed. If it is not, then you have been unwise in your choice of $f(x)$ and $g'(x)$.
Let's look at some simple examples of how it works.
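Example 5.18 Find $\int x\,e^x \, dx$.
To apply integration by parts, we choose
\[ f(x) = x \quad \text{and} \quad g'(x) = e^x, \]
so that differentiating $f(x)$ and integrating $g'(x)$ we get
\[ f'(x) = 1 \quad \text{and} \quad g(x) = e^x, \]
where we have suppressed the arbitrary constant from the integration. Applying the rule then gives,
\[ \int x\,e^x \, dx = (x)(e^x) - \int (1)(e^x) \, dx = x\,e^x - \int e^x \, dx, \]
and, clearly, the new integral is easier to find. Thus, finding this integral, we get
\[ \int x\,e^x \, dx = x\,e^x - \int e^x \, dx = x\,e^x - e^x + c = (x - 1)\,e^x + c, \]
as the answer.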
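Warning! Observe that if we had chosen $f(x)$ and $g'(x)$ differently, we would have got
\[ f(x) = e^x \quad \text{and} \quad g'(x) = x, \]
so that differentiating $f(x)$ and integrating $g'(x)$ we would have got
\[ f'(x) = e^x \quad \text{and} \quad g(x) = \frac{x^2}{2}, \]
where we have suppressed the arbitrary constant from the integration. Applying the rule then gives,
\[ \int x\,e^x \, dx = (e^x)\left(\frac{x^2}{2}\right) - \int (e^x)\left(\frac{x^2}{2}\right) dx = \frac{x^2 e^x}{2} - \frac{1}{2} \int x^2 e^x \, dx, \]
and the new integral is harder to find than the one we started with.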
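Example 5.19 Find $\int x \ln x \, dx$.
To apply integration by parts, we choose
\[ f(x) = \ln x \quad \text{and} \quad g'(x) = x, \]
so that differentiating $f(x)$ and integrating $g'(x)$ we get
\[ f'(x) = \frac{1}{x} \quad \text{and} \quad g(x) = \frac{x^2}{2}, \]
where we have suppressed the arbitrary constant from the integration. Applying the rule then gives,
\[ \int x \ln x \, dx = (\ln x)\left(\frac{x^2}{2}\right) - \int \left(\frac{1}{x}\right)\left(\frac{x^2}{2}\right) dx = \frac{x^2}{2} \ln x - \int \frac{x}{2} \, dx, \]
and, clearly, the new integral is easier to find. Thus, finding this integral, we get
\[ \int x \ln x \, dx = \frac{x^2}{2} \ln x - \int \frac{x}{2} \, dx = \frac{x^2}{2} \ln x - \frac{x^2}{4} + c, \]
as the answer.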
Warning! Observe that if we had chosen $f(x)$ and $g'(x)$ differently, we would have got
\[ f(x) = x \quad \text{and} \quad g'(x) = \ln x. \]
This would have been bad because we can't integrate $g'(x) = \ln x$ to get $g(x)$ at the moment.
However, having said that, now that we can integrate by parts, we can finally see how
to integrate ln x.
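Example 5.20 Find $\int \ln(x) \, dx$.
Here we write
\[ \int \ln(x) \, dx = \int 1 \cdot \ln(x) \, dx. \]
To apply integration by parts, we choose
\[ f(x) = \ln(x) \quad \text{and} \quad g'(x) = 1, \]
so that differentiating $f(x)$ and integrating $g'(x)$ we get
\[ f'(x) = \frac{1}{x} \quad \text{and} \quad g(x) = x, \]
where we have suppressed the arbitrary constant from the integration. Applying the rule then gives,
\[ \int 1 \cdot \ln(x) \, dx = (x)(\ln(x)) - \int (x)\left(\frac{1}{x}\right) dx = x \ln(x) - \int 1 \, dx, \]
and, clearly, the new integral is easier to find. Thus, finding this integral, we get
\[ \int \ln(x) \, dx = x \ln(x) - \int 1 \, dx = x \ln(x) - x + c. \]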
and
g (x) = ln x,
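Find $\int (x^2 + 1)^2 x^2 \, dx$.
To apply integration by parts, we choose
\[ f(x) = (x^2 + 1)^2 \quad \text{and} \quad g'(x) = x^2, \]
so that differentiating $f(x)$ and integrating $g'(x)$ we get
\[ f'(x) = 2(x^2 + 1)(2x) \quad \text{and} \quad g(x) = \frac{x^3}{3}, \]
where we have used the chain rule to perform the differentiation and suppressed the arbitrary constant from the integration. Applying the rule then gives,
\[ \int (x^2 + 1)^2 x^2 \, dx = (x^2 + 1)^2 \left(\frac{x^3}{3}\right) - \int 2(x^2 + 1)(2x)\left(\frac{x^3}{3}\right) dx, \]
and, clearly, the new integral is easier to find because we can easily multiply out the brackets and integrate term-by-term. Thus, finding this integral, we get
\[ \int (x^2 + 1)^2 x^2 \, dx = \frac{x^3}{3}(x^2 + 1)^2 - \frac{4}{3} \int (x^6 + x^4) \, dx = \frac{x^3}{3}(x^2 + 1)^2 - \frac{4}{3}\left(\frac{x^7}{7} + \frac{x^5}{5}\right) + c, \]
as the answer.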
Activity 5.14 Verify that this answer is correct by multiplying out the brackets in
the integrand and integrating term-by-term.
The last two ways of making progress with an integral that we will consider are not rules
of integration, but handy techniques that allow us to rewrite integrands so that we can
see how to integrate them. The first of these uses a particular kind of algebraic identity
known as partial fractions and the second involves the use of trigonometric identities.
5.2.5
Suppose that we have an integrand which is a rational function of two polynomials, say
\[ R(x) = \frac{P(x)}{Q(x)}. \]
3 This is unlike the situation with differentiation where it is always pretty obvious which rule we should be applying!
In order to apply the method of partial fractions, it must be the case that the degree of
the numerator, i.e. P (x), is less than the degree of the denominator, i.e. Q(x). If this is
the case, we start by looking at how the denominator factorises and then proceed
according to which of the following cases we are in.
Case 1: The denominator has distinct [real] linear factors
If the denominator, $Q(x)$, is of degree $n$ and has $n$ real and distinct roots $a_1, a_2, \ldots, a_n$ then we can write
\[ Q(x) = (x - a_1)(x - a_2) \cdots (x - a_n), \]
i.e. $Q(x)$ has distinct [real] linear factors. In this case, the method of partial fractions dictates that we can write
\[ R(x) = \frac{P(x)}{(x - a_1)(x - a_2) \cdots (x - a_n)} = \frac{A_1}{x - a_1} + \frac{A_2}{x - a_2} + \cdots + \frac{A_n}{x - a_n}, \]
Find $\int \frac{x}{x^2 - x - 2} \, dx$.
Here the integrand is a rational function of two polynomials and the degree of the numerator is less than the degree of the denominator. As such, we can use the method of partial fractions and, looking at the denominator, we see that
\[ x^2 - x - 2 = (x - 2)(x + 1), \]
so we are in the case where we have distinct linear factors. This means that we can write
\[ \frac{x}{x^2 - x - 2} = \frac{x}{(x - 2)(x + 1)} = \frac{A_1}{x - 2} + \frac{A_2}{x + 1}, \]
for some constants $A_1$ and $A_2$. To find these constants, we cross-multiply on the right-hand-side to see that
\[ \frac{x}{(x - 2)(x + 1)} = \frac{A_1(x + 1) + A_2(x - 2)}{(x - 2)(x + 1)}, \]
and so, comparing the numerators, we need
\[ x = A_1(x + 1) + A_2(x - 2). \]
Indeed, setting $x = 2$ on both sides, we see that $2 = 3A_1$ whereas setting $x = -1$ on both sides, we see that $-1 = -3A_2$. Thus, we have
\[ \frac{x}{x^2 - x - 2} = \frac{x}{(x - 2)(x + 1)} = \frac{2/3}{x - 2} + \frac{1/3}{x + 1}, \]
using the values of $A_1$ and $A_2$ that we have found. Consequently, we find that
\[ \int \frac{x}{x^2 - x - 2} \, dx = \int \left[\frac{2/3}{x - 2} + \frac{1/3}{x + 1}\right] dx = \frac{2}{3} \ln |x - 2| + \frac{1}{3} \ln |x + 1| + c. \]
We observe, in particular, that the degree of the denominator determines how many
constants we have to find.
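The step of setting $x$ equal to each root in turn can be mechanised when the denominator has distinct linear factors. The sketch below uses exact rational arithmetic to recover the constants for the worked example above, $x/((x - 2)(x + 1))$, and then spot-checks the resulting identity. The function name `distinct_linear_coefficients` and the check point $x = 5$ are illustrative assumptions, and the approach does not cover repeated or irreducible factors.

```python
from fractions import Fraction

def distinct_linear_coefficients(P, roots):
    """Return A_k with P(x) / prod(x - a_k) = sum of A_k / (x - a_k).

    P is a function giving the numerator and roots are the distinct a_k.
    Setting x = a_k in P(x) = sum_k A_k * prod_{j != k} (x - a_j) removes
    every term except the k-th, which gives A_k directly.
    """
    coefficients = []
    for k, a_k in enumerate(roots):
        product = Fraction(1)
        for j, a_j in enumerate(roots):
            if j != k:
                product *= Fraction(a_k) - Fraction(a_j)
        coefficients.append(Fraction(P(a_k)) / product)
    return coefficients

# Numerator x, denominator (x - 2)(x + 1), i.e. roots 2 and -1.
roots = [2, -1]
A = distinct_linear_coefficients(lambda x: x, roots)
print(A)  # [Fraction(2, 3), Fraction(1, 3)]

# Spot-check the identity at a point that is not a root.
x = Fraction(5)
lhs = x / ((x - 2) * (x + 1))
rhs = sum(A_k / (x - a_k) for A_k, a_k in zip(A, roots))
print(lhs == rhs)  # True
```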
Case 2: The denominator has a repeated [real] linear factor
If we find that one of the roots, say $a_k$, of the denominator, $Q(x)$, is real and repeated $m$ times then we replace the term
\[ \frac{A_k}{x - a_k}, \]
in the expansion from Case 1 with the terms
\[ \frac{B_1}{x - a_k} + \frac{B_2}{(x - a_k)^2} + \cdots + \frac{B_m}{(x - a_k)^m}. \]
We then have to find the numbers $B_1, B_2, \ldots, B_m$ as well as any other numbers that remain from Case 1. Let's look at a simple example.
Example 5.23 Find $\int \frac{x + 3}{(x + 2)(x - 1)^2} \, dx$.
Here the integrand is a rational function of two polynomials and the degree of the numerator is less than the degree of the denominator. As such, we can use the method of partial fractions and, looking at the denominator, we have
\[ (x + 2)(x - 1)^2, \]
and so we are in the case where we have a repeated linear factor. This means that we can write
\[ \frac{x + 3}{(x + 2)(x - 1)^2} = \frac{A_1}{x + 2} + \frac{B_1}{x - 1} + \frac{B_2}{(x - 1)^2}, \]
for some constants $A_1$, $B_1$ and $B_2$. To find these constants, we cross-multiply on the right-hand-side to see that
\[ \frac{x + 3}{(x + 2)(x - 1)^2} = \frac{A_1(x - 1)^2 + B_1(x - 1)(x + 2) + B_2(x + 2)}{(x + 2)(x - 1)^2}, \]
and so, comparing the numerators, we need
\[ x + 3 = A_1(x - 1)^2 + B_1(x - 1)(x + 2) + B_2(x + 2). \]
Indeed, setting $x = -2$ on both sides, we see that $1 = 9A_1$ and setting $x = 1$ on both sides, we see that $4 = 3B_2$. However, to find $B_1$, we now note that comparing (say) the coefficient of the $x^2$ term on both sides of this expression we get $0 = A_1 + B_1$ and so $B_1 = -A_1 = -1/9$. Thus, we have
\[ \frac{x + 3}{(x + 2)(x - 1)^2} = \frac{1/9}{x + 2} + \frac{-1/9}{x - 1} + \frac{4/3}{(x - 1)^2}, \]
using the values of $A_1$, $B_1$ and $B_2$ that we have found. Consequently, we find that
\[ \int \frac{x + 3}{(x + 2)(x - 1)^2} \, dx = \int \left[\frac{1/9}{x + 2} + \frac{-1/9}{x - 1} + \frac{4/3}{(x - 1)^2}\right] dx = \frac{1}{9} \ln |x + 2| - \frac{1}{9} \ln |x - 1| - \frac{4}{3(x - 1)} + c, \]
where $c$ is an arbitrary constant.
We observe, again, that the degree of the denominator determines how many constants
we have to find.
Case 3: The denominator has an irreducible [real] factor
If we find that the denominator, $Q(x)$, has an irreducible [real] factor like $ax^2 + bx + c$,⁴ then we replace the corresponding term in the expansion from Case 1 with the term
\[ \frac{C_1 x + C_2}{ax^2 + bx + c}. \]
We then have to find the numbers $C_1$ and $C_2$ as well as any other numbers that remain from Case 1. Let's look at a simple example.
Example 5.24 Find $\int \frac{x}{(x - 1)(x^2 + 2x + 2)} \, dx$.
Here the integrand is a rational function of two polynomials and the degree of the numerator is less than the degree of the denominator. As such, we can use the method of partial fractions and, looking at the denominator, we have
\[ (x - 1)(x^2 + 2x + 2), \]
and so we are in the case where we have an irreducible factor as $x^2 + 2x + 2$ has no real roots as, for instance, $b^2 - 4ac$ gives us $2^2 - 4(1)(2) = 4 - 8 = -4 < 0$. This means that we can write
\[ \frac{x}{(x - 1)(x^2 + 2x + 2)} = \frac{A_1}{x - 1} + \frac{C_1 x + C_2}{x^2 + 2x + 2}, \]
for some constants $A_1$, $C_1$ and $C_2$. To find these constants, we cross-multiply on the right-hand-side to see that
\[ \frac{x}{(x - 1)(x^2 + 2x + 2)} = \frac{A_1(x^2 + 2x + 2) + (C_1 x + C_2)(x - 1)}{(x - 1)(x^2 + 2x + 2)}, \]
and so, comparing the numerators, we need
\[ x = A_1(x^2 + 2x + 2) + (C_1 x + C_2)(x - 1). \]
Indeed, setting $x = 1$ on both sides, we see that $1 = 5A_1$ and, to find $C_1$, we now note that comparing the coefficient of the $x^2$ term on both sides of this expression we get $0 = A_1 + C_1$ and so $C_1 = -A_1 = -1/5$ and comparing the coefficient of the constant term on both sides we get $0 = 2A_1 - C_2$ and so $C_2 = 2A_1 = 2/5$. Thus, we have
\[ \frac{x}{(x - 1)(x^2 + 2x + 2)} = \frac{1/5}{x - 1} + \frac{(1/5)(-x + 2)}{x^2 + 2x + 2}, \]
4 That is, we have a quadratic like $ax^2 + bx + c$ with $b^2 - 4ac < 0$ so we cannot find real roots. This means that we cannot factorise it using real factors and so we cannot use Case 1 or Case 2 on it.
using the values of $A_1$, $C_1$ and $C_2$ that we have found. Consequently, we find that
\[ \int \frac{x}{(x - 1)(x^2 + 2x + 2)} \, dx = \int \left[\frac{1/5}{x - 1} - \frac{(1/5)(x - 2)}{x^2 + 2x + 2}\right] dx = \frac{1}{5} \int \left[\frac{1}{x - 1} - \frac{x - 2}{x^2 + 2x + 2}\right] dx. \]
Now, the integral of the first term is easy but, to deal with the integral of the second term, we note that the derivative of $x^2 + 2x + 2$ is $2x + 2$ (i.e. we are thinking about the substitution $g = x^2 + 2x + 2$ which we saw in Example 5.13). This means that, writing
\[ \frac{x - 2}{x^2 + 2x + 2} = \frac{1}{2}\,\frac{2x - 4}{x^2 + 2x + 2} = \frac{1}{2}\,\frac{2x + 2 - 6}{x^2 + 2x + 2} = \frac{1}{2}\,\frac{2x + 2}{x^2 + 2x + 2} - \frac{3}{x^2 + 2x + 2}, \]
we can see that, completing the square in the denominator of the last term, we have
\[ \int \frac{x}{(x - 1)(x^2 + 2x + 2)} \, dx = \frac{1}{5} \int \left[\frac{1}{x - 1} - \frac{1}{2}\,\frac{2x + 2}{x^2 + 2x + 2} + \frac{3}{(x + 1)^2 + 1}\right] dx = \frac{1}{5}\left[\ln |x - 1| - \frac{1}{2} \ln |x^2 + 2x + 2| + 3 \tan^{-1}(x + 1)\right] + c. \]
Find $\int \frac{x^4 + x^3 + 2x^2}{(x - 1)(1 + x^2)^2} \, dx$.
Here the integrand is a rational function of two polynomials and the degree of the numerator is less than the degree of the denominator. As such, we can use the method of partial fractions and, looking at the denominator, we have
\[ (x - 1)(1 + x^2)^2, \]
and so we are in the case where we have a repeated irreducible factor as $x^2 + 1$ has no real roots as, for instance, $b^2 - 4ac$ gives us $0^2 - 4(1)(1) = -4 < 0$. This means that we can write
\[ \frac{x^4 + x^3 + 2x^2}{(x - 1)(1 + x^2)^2} = \frac{A_1}{x - 1} + \frac{C_1 x + C_2}{1 + x^2} + \frac{D_1 x + D_2}{(1 + x^2)^2}, \]
5 That is, the number of constants we have to find is equal to the degree of the denominator in the term we are dealing with.
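\[ \int \frac{x^4 + x^3 + 2x^2}{(x - 1)(1 + x^2)^2} \, dx = \int \left[\frac{1}{x - 1} + \frac{1}{1 + x^2} + \frac{x}{(1 + x^2)^2}\right] dx = \ln |x - 1| + \tan^{-1} x - \frac{1}{2(1 + x^2)} + c. \]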
5.2.6
In this case, the substitution would not work since we do not have the extra factor of
cos x in the integrand. However, as we shall see in the next example, we can easily find
this new integral if we use one of the trigonometric identities that we saw in
Section 2.1.4.
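Example 5.26 Find $\int \sin^2 x \, dx$.
Here we can use the double-angle formula
\[ \cos(2x) = 1 - 2\sin^2 x, \]
which allows us to write the problematic integrand $\sin^2 x$ in terms of the function $\cos(2x)$ which is far easier to integrate. That is, rearranging this trigonometric identity, we have
\[ \sin^2 x = \frac{1}{2}\left[1 - \cos(2x)\right], \]
and so we find that
\[ \int \sin^2 x \, dx = \frac{1}{2} \int \left[1 - \cos(2x)\right] dx = \frac{1}{2}\left[x - \frac{1}{2} \sin(2x)\right] + c. \]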
Activity 5.15 Find $\int \cos^2 x \, dx$.
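Example 5.27 Find $\int \frac{dx}{\sqrt{b^2 - (x + a)^2}}$ using the substitution $x + a = b \sin\theta$.
Here, for reasons that will soon become apparent, we make the suggested substitution. As such, differentiating both sides of $x + a = b \sin\theta$ with respect to $\theta$, we have
\[ \frac{dx}{d\theta} = b \cos\theta \implies dx = b \cos\theta \, d\theta. \]
This means that our integral becomes
\[ \int \frac{dx}{\sqrt{b^2 - (x + a)^2}} = \int \frac{b \cos\theta}{\sqrt{b^2 - b^2 \sin^2\theta}} \, d\theta = \int \frac{\cos\theta}{\cos\theta} \, d\theta = \int d\theta, \]
if we use the trigonometric identity $1 - \sin^2\theta = \cos^2\theta$ from (2.2). This then gives us
\[ \int d\theta = \theta + c = \sin^{-1}\left(\frac{x + a}{b}\right) + c, \]
since $x + a = b \sin\theta$ and where $c$ is an arbitrary constant. Thus, we have found that
\[ \int \frac{dx}{\sqrt{b^2 - (x + a)^2}} = \sin^{-1}\left(\frac{x + a}{b}\right) + c. \]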
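Example 5.28 Find $\int \frac{d\theta}{1 + \cos(2\theta)}$.
The substitution $t = \tan\theta$ is very useful and so we start by seeing how it can be applied. Firstly, we note that, differentiating both sides with respect to $\theta$, we get
\[ \frac{dt}{d\theta} = \sec^2\theta, \]
and so, using the trigonometric identity $\sec^2\theta = 1 + \tan^2\theta$ from (2.4), this gives us
\[ d\theta = \frac{dt}{1 + t^2}. \]
Secondly, using the right-angled triangle in Figure 5.1, we have
\[ \sin\theta = \frac{t}{\sqrt{1 + t^2}} \quad \text{and} \quad \cos\theta = \frac{1}{\sqrt{1 + t^2}}, \]
so that
\[ 1 + \cos(2\theta) = 1 + \cos^2\theta - \sin^2\theta = 1 + \frac{1}{1 + t^2} - \frac{t^2}{1 + t^2} = \frac{2}{1 + t^2}. \]
Hence, the substitution gives
\[ \int \frac{d\theta}{1 + \cos(2\theta)} = \int \frac{1 + t^2}{2}\,\frac{dt}{1 + t^2} = \frac{1}{2} \int dt = \frac{t}{2} + c = \frac{1}{2} \tan\theta + c. \]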
Figure 5.1: A right-angled triangle with $t = \tan\theta$ can have $t$ on the opposite side and 1 on the adjacent side which means that, using Pythagoras' theorem, the hypotenuse must be $\sqrt{1 + t^2}$. With this triangle, we can then quickly deduce the expressions for $\sin\theta$ and $\cos\theta$ in terms of $t$ which are needed for Example 5.28.
5.3
So far, we have been looking at indefinite integrals and we have been finding them by
using the idea of an antiderivative to deduce standard integrals and rules of integration.
We now turn to the geometric interpretation of an integral and this involves introducing
the idea of a definite integral and seeing what it represents.
5.3.1
In Section 3.3.1 we saw that the derivative of a function, f (x), gave us the gradient of
the curve y = f (x). We now consider what the integral of a function, f (x), tells us about
the curve y = f (x) and see how this comes about through the idea of a definite integral.
What is a definite integral?
Recall that an indefinite integral is so-called since, given a function, $f(x)$, and one of its antiderivatives, $F(x)$, i.e. two functions related by the fact that
\[ \frac{dF}{dx} = f(x), \]
we have
\[ \int f(x) \, dx = F(x) + c, \]
where $c$ is an arbitrary constant. And, indeed, it is this arbitrary constant that makes this integral indefinite as we do not know what $c$ is. In a similar vein, instead of writing,
\[ \int_a^b f(x) \, dx, \]
\[ \int_a^b f(x) \, dx = \Big[F(x)\Big]_a^b. \]
\[ \Big[F(x)\Big]_a^b = F(b) - F(a), \]
i.e. the value of the integral depends only on the value of the antiderivative at the points $x = a$ and $x = b$. Thus, this is now a definite integral as it no longer involves an arbitrary constant, $c$.
Activity 5.16 Show that
\[ \int_a^b f(x) \, dx = \Big[F(x) + c\Big]_a^b = F(b) - F(a), \]
if $c$ is a constant. Hence explain why we can omit the constant of integration when evaluating definite integrals.
Another consequence of this discussion is that it allows us to see how to use our basic
rules of integration to evaluate definite integrals. For instance, if k and l are constants
and f (x) and g(x) are functions, then we can see that the linear combination rule gives
us
\[ \int_a^b [k f(x) + l g(x)] \, dx = k \int_a^b f(x) \, dx + l \int_a^b g(x) \, dx, \]
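Example 5.29 Evaluate $\int_1^3 (x + 4) \, dx$.
If we follow the two step procedure above, i.e. integrating to find an antiderivative and then dealing with the limits, we get
\[ \int_1^3 (x + 4) \, dx = \left[\frac{x^2}{2} + 4x\right]_1^3 = \left(\frac{3^2}{2} + 4(3)\right) - \left(\frac{1^2}{2} + 4(1)\right) = \left(\frac{9}{2} + 12\right) - \left(\frac{1}{2} + 4\right) = 12. \]
Alternatively, applying the linear combination rule first, we get
\[ \int_1^3 (x + 4) \, dx = \int_1^3 x \, dx + \int_1^3 4 \, dx = \left[\frac{x^2}{2}\right]_1^3 + \Big[4x\Big]_1^3 = \left(\frac{3^2}{2} - \frac{1^2}{2}\right) + \big(4(3) - 4(1)\big) = \left(\frac{9}{2} - \frac{1}{2}\right) + (12 - 4) = 12. \]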
Definite integrals are useful because they tell us about the area under a curve.
Specifically, if we have the definite integral
\[ \int_a^b f(x) \, dx, \tag{5.1} \]
where $f(x) \ge 0$ for all $x$ such that $a \le x \le b$,⁶ we say that we have a non-negative integrand and find that the value of the integral is the area of the region between the curve $y = f(x)$, the $x$-axis and the vertical lines $x = a$ and $x = b$ as illustrated in Figure 5.2.
Figure 5.2: The hatched region is between the curve $y = f(x)$, the $x$-axis and the vertical lines $x = a$ and $x = b$. In cases like this we have a non-negative integrand, i.e. $f(x) \ge 0$ for $a \le x \le b$, and so the definite integral in (5.1) gives us the area of this hatched region.
Example 5.30 Find the area of the region between the line $y = 4 - 2x$, the $x$-axis and the vertical lines $x = 0$ and $x = 2$ which is illustrated in Figure 5.3(a).
There are two ways to find this area:
As this is just a right-angled triangle, the area is just half times base times height, i.e.
\[ \text{area of triangle} = \frac{1}{2} \times 2 \times 4 = 4. \]
Thus, the area of the region is four.
6 At the moment we will just accept this caveat. The reason why we need $f(x)$ to be non-negative for values of $x$ between the limits of integration will become clear very soon.
As we have $y = f(x)$ with $f(x) = 4 - 2x$, we can see from Figure 5.3(a) that $f(x) \ge 0$ between $x = 0$ and $x = 2$. So, as noted above, the area should be given by
\[ \int_0^2 (4 - 2x) \, dx = \Big[4x - x^2\Big]_0^2 = (4 \cdot 2 - 2^2) - (4 \cdot 0 - 0^2) = (8 - 4) - 0 = 4. \]
Figure 5.3: Non-negative integrands. (a) For Example 5.30, the region between the line $y = 4 - 2x$, the $x$-axis and the vertical lines $x = 0$ and $x = 2$. (b) For Example 5.31, the region between the parabola $y = 4 - x^2$, the $x$-axis and the vertical lines $x = -1$ and $x = 1$.
However, generally, we won't have a simple geometric way of finding the area under a curve and so we will have to use integration.
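To see why a definite integral with a non-negative integrand measures area, it can help to approximate the region by thin rectangles and add up their areas. The sketch below does this with a midpoint rule for the triangle of Example 5.30 and for the parabola $y = 4 - x^2$ between $x = -1$ and $x = 1$. The function name `midpoint_sum` and the number of rectangles are illustrative assumptions, and the sums are approximations rather than exact values.

```python
def midpoint_sum(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

# Region under the line y = 4 - 2x between x = 0 and x = 2 (a triangle of area 4).
triangle = midpoint_sum(lambda x: 4 - 2 * x, 0, 2)

# Region under the parabola y = 4 - x^2 between x = -1 and x = 1.
parabola = midpoint_sum(lambda x: 4 - x * x, -1, 1)

print(triangle)  # close to 4
print(parabola)  # close to 22/3
```

Increasing `n` makes the rectangles thinner and the approximation to the parabola's area closer to the exact value.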
Example 5.31 Find the area of the region between the parabola $y = 4 - x^2$, the $x$-axis and the vertical lines $x = -1$ and $x = 1$ which is illustrated in Figure 5.3(b).
As we have $y = f(x)$ with $f(x) = 4 - x^2$, we can see from Figure 5.3(b) that $f(x) \ge 0$ between $x = -1$ and $x = 1$. So, as noted above, the area should be given by
\[ \int_{-1}^{1} (4 - x^2) \, dx = \left[4x - \frac{x^3}{3}\right]_{-1}^{1} = \left(4(1) - \frac{(1)^3}{3}\right) - \left(4(-1) - \frac{(-1)^3}{3}\right) = \frac{11}{3} - \left(-\frac{11}{3}\right) = \frac{22}{3}. \]
Activity 5.19 Observe that the region in the previous example is symmetric about the $y$-axis. Use this observation to explain why the area of this region is two times the area represented by the definite integral,
\[ \int_0^1 (4 - x^2) \, dx, \]
and verify that this does indeed give the correct area.
What definite integrals with non-positive integrands represent
We now start to consider what happens to the definite integral in (5.1) when we can't guarantee that the integrand is non-negative, i.e. what happens if we do not have $f(x) \ge 0$ for all $x$ such that $a \le x \le b$? To simplify matters, we will start by asking: What happens when this condition always fails? That is, what happens when the integrand is non-positive as $f(x) \le 0$ for all $x$ such that $a \le x \le b$?
So what does the definite integral in (5.1) tell us about the area of the region bounded by the curve $y = f(x)$, the $x$-axis and the vertical lines $x = a$ and $x = b$ when we have a non-positive integrand, i.e. when $f(x) \le 0$ for $a \le x \le b$, as illustrated in Figure 5.4? One way of looking at this is to note that,
If $f(x) \le 0$ for all $a \le x \le b$, then $-f(x) \ge 0$ for all $a \le x \le b$.
But, this means that $-f(x)$ gives us a non-negative integrand and the area, $A$, of the region in question is given by
\[ A = \int_a^b -f(x) \, dx = -\int_a^b f(x) \, dx \implies \int_a^b f(x) \, dx = -A, \]
i.e. for non-positive integrands, the definite integral gives us minus the area. Thus, in the case of non-positive integrands, the area is given by the magnitude of the definite integral. Let's have a look at an example.
Figure 5.4: The hatched region is between the curve $y = f(x)$, the $x$-axis and the vertical lines $x = a$ and $x = b$. In cases like this we have a non-positive integrand, i.e. $f(x) \le 0$ for $a \le x \le b$, and so the definite integral in (5.1) gives us minus the area of this hatched region.
Example 5.32 Find the area of the region between the line $y = 4 - 2x$, the $x$-axis and the vertical lines $x = 2$ and $x = 4$ which is illustrated in Figure 5.5(a).
There are two ways to find this area:
As this is just a right-angled triangle, the area is just half times base times height, i.e.
\[ \text{area of triangle} = \frac{1}{2} \times 2 \times 4 = 4. \]
Thus, the area of the region is four.
As we have $y = f(x)$ with $f(x) = 4 - 2x$, we can see from Figure 5.5(a) that $f(x) \le 0$ between $x = 2$ and $x = 4$. So, looking at the definite integral we get,
\[ \int_2^4 (4 - 2x) \, dx = \Big[4x - x^2\Big]_2^4 = (4 \cdot 4 - 4^2) - (4 \cdot 2 - 2^2) = 0 - 4 = -4, \]
which is minus the answer we would expect. As such, we take the magnitude of this answer and so the area is, again, four.
Consequently, if $f(x) \le 0$ between the vertical lines, the definite integral gives us minus the area and so we take the magnitude of the definite integral to find the area.
Figure 5.5: Negative integrands and their relation to area. The region between the line $y = 4 - 2x$, the $x$-axis and the vertical lines (a) $x = 2$ and $x = 4$ for Example 5.32, and (b) $x = 0$ and $x = 4$ for Example 5.33.
as illustrated in Figure 5.6. One way of looking at this is to note that the definite integral
$\int_a^c f(x) \, dx$ gives us the hatched area, $A_1$, between the vertical lines $x = a$ and $x = c$,
$\int_c^b f(x) \, dx$ gives us minus the hatched area, $A_2$, between the vertical lines $x = c$ and $x = b$.
As such, the hatched area, $A$, between the lines $x = a$ and $x = b$ is given by
\[ A = A_1 + A_2 = \int_a^c f(x) \, dx + \left|\int_c^b f(x) \, dx\right|, \]
Figure 5.6: The hatched region is between the curve $y = f(x)$, the $x$-axis and the vertical
Thirdly, use this information to determine the areas by finding the appropriate
definite integrals (bearing in mind that the integrands will now be either
non-negative or non-positive).
Fourthly, add up all the areas to find the total area.
To see how this works let's consider a couple of examples.
Example 5.33 Find the area of the region between the line $y = 4 - 2x$, the $x$-axis and the vertical lines $x = 0$ and $x = 4$ which is illustrated in Figure 5.5(b).
As indicated in Figure 5.5(b), the line $y = 4 - 2x$ crosses the $x$-axis when $x = 2$ and this lies between $x = 0$ and $x = 4$. We can also see that the function is non-negative for $0 \le x \le 2$ and non-positive for $2 \le x \le 4$. As such, using our earlier workings in Examples 5.30 and 5.32, we split the total region into two sub-regions to see that:
For $0 \le x \le 2$, we have $\int_0^2 (4 - 2x) \, dx = 4$, as we saw in Example 5.30. Thus, the area is four here as we have a non-negative integrand.
For $2 \le x \le 4$, we have $\int_2^4 (4 - 2x) \, dx = -4$, as we saw in Example 5.32. Thus, the area is four here as we have a non-positive integrand.
Consequently, the total area is eight.
We also note, in passing, that the definite integral
\[ \int_0^4 (4 - 2x) \, dx = \Big[4x - x^2\Big]_0^4 = (4 \cdot 4 - 4^2) - (4 \cdot 0 - 0^2) = (16 - 16) - 0 = 0, \]
and, as this is zero, it most definitely is not giving us the area we seek!
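The difference between the signed value of a definite integral and the area can also be seen numerically. The sketch below approximates both for the line $y = 4 - 2x$ between $x = 0$ and $x = 4$, splitting the interval at the crossing point $x = 2$ and adding the magnitudes of the two pieces. The helper name `midpoint_sum` and the number of rectangles are illustrative assumptions.

```python
def midpoint_sum(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with n midpoint rectangles."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width

def f(x):
    return 4 - 2 * x

# Integrating straight across the crossing point lets the two pieces cancel.
signed = midpoint_sum(f, 0, 4)

# Splitting at the root x = 2 and taking magnitudes gives the total area.
area = abs(midpoint_sum(f, 0, 2)) + abs(midpoint_sum(f, 2, 4))

print(signed)  # close to 0
print(area)    # close to 8
```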
Activity 5.20 Verify that the answer to the previous example is correct by finding
the areas of the triangles involved.
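Example 5.34 Find the area of the region between the parabola $y = 1 - x^2$, the $x$-axis and the vertical lines $x = -2$ and $x = 2$ which is illustrated in Figure 5.7.
As indicated in Figure 5.7, the parabola $y = 1 - x^2$ crosses the $x$-axis when $x = \pm 1$ and these points lie between $x = -2$ and $x = 2$. We can also see that the function is non-negative for $-1 \le x \le 1$ and non-positive for $-2 \le x \le -1$ and $1 \le x \le 2$. As such, we split the total region into three sub-regions to see that:
For $-2 \le x \le -1$, we have
\[ \int_{-2}^{-1} (1 - x^2) \, dx = \left[x - \frac{x^3}{3}\right]_{-2}^{-1} = \left(-1 - \frac{(-1)^3}{3}\right) - \left(-2 - \frac{(-2)^3}{3}\right) = \left(-1 + \frac{1}{3}\right) - \left(-2 + \frac{8}{3}\right) = -\frac{4}{3}. \]
Thus, the area is $\frac{4}{3}$ here as we have a non-positive integrand.
For $-1 \le x \le 1$, we have
\[ \int_{-1}^{1} (1 - x^2) \, dx = \left[x - \frac{x^3}{3}\right]_{-1}^{1} = \left(1 - \frac{1^3}{3}\right) - \left(-1 - \frac{(-1)^3}{3}\right) = \left(1 - \frac{1}{3}\right) - \left(-1 + \frac{1}{3}\right) = \frac{4}{3}. \]
Thus, the area is $\frac{4}{3}$ here as we have a non-negative integrand.
For $1 \le x \le 2$, we have
\[ \int_{1}^{2} (1 - x^2) \, dx = \left[x - \frac{x^3}{3}\right]_{1}^{2} = \left(2 - \frac{2^3}{3}\right) - \left(1 - \frac{1^3}{3}\right) = \left(2 - \frac{8}{3}\right) - \left(1 - \frac{1}{3}\right) = -\frac{4}{3}. \]
Thus, the area is $\frac{4}{3}$ here as we have a non-positive integrand.
Consequently, the total area is $\frac{4}{3} + \frac{4}{3} + \frac{4}{3}$ which is four.
We also note, in passing, that the definite integral
\[ \int_{-2}^{2} (1 - x^2) \, dx = \left[x - \frac{x^3}{3}\right]_{-2}^{2} = \left(2 - \frac{2^3}{3}\right) - \left(-2 - \frac{(-2)^3}{3}\right) = \left(2 - \frac{8}{3}\right) - \left(-2 + \frac{8}{3}\right) = -\frac{4}{3}, \]
which, again, is not the area we seek.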
5.3.2
We have seen how to use the basic rules of integration when dealing with definite
integrals and so we now look at how we can use the other two rules of integration,
namely integration by substitution and integration by parts, in this context.
Integration by substitution
When evaluating a definite integral using integration by substitution we follow the same
procedure as before but now, we also change the limits of integration so that they are
values of g rather than values of x. That is, if we are making the substitution g = g(x)
and we have a definite integral with limits x = a and x = b, after the substitution, the
limits will be g = g(a) and g = g(b) respectively. This is best illustrated by an example.
Figure 5.7: Negative integrands and their relation to area (continued). For Example 5.34, the region between the parabola $y = 1 - x^2$, the $x$-axis and the vertical lines $x = -2$ and $x = 2$.
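Example 5.35 Find $\int_0^1 x\,e^{x^2+1} \, dx$.
As in Example 5.10, we take $g = x^2 + 1$ so that
\[ dg = 2x\,dx \implies x\,dx = \frac{1}{2}\,dg. \]
In this case, as we have a definite integral, we also change the limits of integration, i.e.
lower limit: $x = 0$ gives $g = g(0) = 0^2 + 1 = 1$, and
upper limit: $x = 1$ gives $g = g(1) = 1^2 + 1 = 2$.
Hence, the substitution gives
\[ \int_0^1 x\,e^{x^2+1} \, dx = \int_1^2 e^{g} \, \frac{1}{2} \, dg = \frac{1}{2} \int_1^2 e^{g} \, dg = \frac{1}{2}\Big[e^{g}\Big]_1^2 = \frac{1}{2}\left(e^2 - e^1\right) = \frac{e}{2}(e - 1), \]
as the answer.
Alternatively, using our indefinite integral from Example 5.10, we saw that integration by substitution gave us
\[ \int x\,e^{x^2+1} \, dx = \frac{1}{2}\,e^{x^2+1} + c, \]
and so this means that, if we suppress the constant of integration, we get
\[ \int_0^1 x\,e^{x^2+1} \, dx = \left[\frac{1}{2}\,e^{x^2+1}\right]_0^1 = \frac{1}{2}\left(e^{1^2+1} - e^{0^2+1}\right) = \frac{1}{2}\left(e^2 - e^1\right) = \frac{e}{2}(e - 1), \]
as before.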
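The change of limits in Example 5.35 can be checked numerically by evaluating the integral once in terms of $x$ and once in terms of $g$. The sketch below uses Simpson's rule for both; the helper name `simpson` and the number of subintervals are illustrative assumptions, and the two estimates should agree with each other and with $\frac{e}{2}(e - 1)$ only up to a small numerical error.

```python
import math

def simpson(f, lo, hi, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

# Original integral: the limits 0 and 1 are values of x.
in_x = simpson(lambda x: x * math.exp(x * x + 1), 0, 1)

# After substituting g = x^2 + 1: the limits 1 and 2 are values of g.
in_g = simpson(lambda g: 0.5 * math.exp(g), 1, 2)

exact = (math.e / 2) * (math.e - 1)
print(in_x, in_g, exact)
```

If the limits were left as 0 and 1 after the substitution, the second estimate would no longer match the first, which is a quick way to spot that the limits have not been changed.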
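For a harder example, let's see what happens when we have to make a substitution that works because of our double-angle formulae from Section 2.1.4.
Example 5.36 Find $\int_0^1 \sqrt{x}\,\sqrt{1 - x} \, dx$.
Making the substitution $x = \sin^2\theta$, we have
\[ dx = 2 \sin\theta \cos\theta \, d\theta, \]
and the limits $x = 0$ and $x = 1$ become $\theta = 0$ and $\theta = \pi/2$ respectively. Hence, the substitution gives
\[ \int_0^1 \sqrt{x}\,\sqrt{1 - x} \, dx = \int_0^{\pi/2} (\sin\theta)(\cos\theta)\, 2 \sin\theta \cos\theta \, d\theta = 2 \int_0^{\pi/2} \sin^2\theta \cos^2\theta \, d\theta, \]
where we have used the trigonometric identity $\cos^2\theta = 1 - \sin^2\theta$ from (2.2) to get $\cos\theta$ from the $\sqrt{1 - x}$ in the integrand. Then, using the double-angle formula $\sin(2\theta) = 2 \sin\theta \cos\theta$ from (2.6), we see that this gives us
\[ \int_0^1 \sqrt{x}\,\sqrt{1 - x} \, dx = \frac{1}{2} \int_0^{\pi/2} \sin^2(2\theta) \, d\theta, \]
which we solve using a variation on the method given in Example 5.26, i.e. we note that $\cos(4\theta) = 1 - 2 \sin^2(2\theta)$ from Activity 2.18, so that
\[ \int_0^1 \sqrt{x}\,\sqrt{1 - x} \, dx = \frac{1}{4} \int_0^{\pi/2} \left[1 - \cos(4\theta)\right] d\theta = \frac{1}{4}\left[\theta - \frac{1}{4} \sin(4\theta)\right]_0^{\pi/2} = \frac{\pi}{8}. \]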
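Example 5.37 Find $\int_0^{\pi/2} \frac{d\theta}{4 - 2\cos^2\theta}$.
Using the substitution $t = \tan\theta$, we have
\[ d\theta = \frac{dt}{1 + t^2} \quad\text{and}\quad \cos^2\theta = \frac{1}{1 + t^2}, \]
so that the denominator of the integrand becomes
\[ 4 - 2\cos^2\theta = 4 - \frac{2}{1 + t^2} = \frac{2 + 4t^2}{1 + t^2}. \]
As this is a definite integral, we also change the limits of integration: $\theta = 0$ gives t = 0 and, as $\theta \to \pi/2$, we have $t \to \infty$. Hence, the substitution gives
\[ \int_0^{\pi/2} \frac{d\theta}{4 - 2\cos^2\theta} = \int_0^{\infty} \frac{1 + t^2}{2 + 4t^2}\,\frac{dt}{1 + t^2} = \frac{1}{4}\int_0^{\infty} \frac{dt}{\frac{1}{2} + t^2} = \frac{1}{4}\Big[ \sqrt{2}\,\tan^{-1}(\sqrt{2}\,t) \Big]_0^{\infty} = \frac{\sqrt{2}\,\pi}{8}. \]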
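Integration by parts
For a definite integral, the integration by parts rule becomes
\[ \int_a^b f(x)\,g'(x)\,dx = \Big[ f(x)\,g(x) \Big]_a^b - \int_a^b f'(x)\,g(x)\,dx, \]
i.e. we have to evaluate the f(x)g(x) term using the limits of integration as well as
evaluating the new [easier] definite integral.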
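Example 5.38 Find $\int_0^1 x\,e^x\,dx$.
Taking f(x) = x and $g'(x) = e^x$, we differentiate f(x) and integrate $g'(x)$ to get
\[ f'(x) = 1 \quad\text{and}\quad g(x) = e^x, \]
where we have suppressed the arbitrary constant from the integration. Applying the
rule in the case of a definite integral then gives,
\[ \int_0^1 x\,e^x\,dx = \Big[ (x)(e^x) \Big]_0^1 - \int_0^1 (1)(e^x)\,dx = \Big[ x\,e^x \Big]_0^1 - \int_0^1 e^x\,dx, \]
which leads to
\[ \int_0^1 x\,e^x\,dx = \Big[ (1)(e^1) - (0)(e^0) \Big] - \Big[ e^x \Big]_0^1 = \left( e^1 - 0 \right) - \left( e^1 - e^0 \right) = 1, \]
as the answer.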
Alternatively, using our indefinite integral from Example 5.18, we saw that
integration by parts gave us
\[ \int x\,e^x\,dx = (x - 1)\,e^x + c, \]
and so this means that, if we suppress the constant of integration, we get
\[ \int_0^1 x\,e^x\,dx = \Big[ (x - 1)\,e^x \Big]_0^1 = (1 - 1)\,e^1 - (0 - 1)\,e^0 = 0 - (-e^0) = 1, \]
as before.
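The two routes to this answer can also be compared numerically. The sketch below is an illustration only: it uses a simple midpoint rule (the helper name `integrate` is hypothetical) to evaluate the integral of x e^x over [0, 1] directly, and then again as the boundary term minus the new integral that integration by parts produces.

```python
import math


def integrate(f, a, b, n=2000):
    """Composite midpoint rule; a rough numerical check, not an exact method."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def f(x):            # the part we differentiate
    return x


def f_prime(x):      # its derivative
    return 1.0


def g(x):            # an antiderivative of g'(x) = e^x
    return math.exp(x)


a, b = 0.0, 1.0

# Boundary term [f(x) g(x)] evaluated between the limits a and b.
boundary = f(b) * g(b) - f(a) * g(a)

# The new, easier integral of f'(x) g(x) over [a, b].
remaining = integrate(lambda x: f_prime(x) * g(x), a, b)

by_parts = boundary - remaining
direct = integrate(lambda x: x * math.exp(x), a, b)

print(by_parts, direct)
```

Both values should be close to 1, the answer found above.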
5.4
Applications of integrals
Integrals can be used in economics and we now introduce two ways in which they can
arise in that subject. The first is what happens when we want to find a cost function
but we only know the marginal cost; and the second introduces the idea of consumer
and producer surpluses.
5.4.1
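Suppose that the cost of producing a quantity, q, of goods is given by the cost function,
C(q). In Section 3.3.3, we met the idea of the marginal cost, MC(q), of producing q
units which was given by
\[ MC(q) = \frac{dC}{dq}, \]
and this was useful since the approximation
\[ \Delta C \simeq MC(q)\,\Delta q, \]
lets us estimate the change in cost that results from a small change in the quantity
produced. Conversely, as integration reverses differentiation, the cost function is an
antiderivative of the marginal cost, i.e.
\[ C(q) = \int MC(q)\,dq. \qquad (5.2) \]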
However, this presents us with a problem as finding the indefinite integral on the
right-hand-side of (5.2) will yield all the antiderivatives of MC(q), i.e. a function C(q)
that contains an arbitrary constant, whereas we want to find the particular
antiderivative that is actually the cost function, i.e. we want to find a particular value
of this constant. So, the question is: Which value of the arbitrary constant will give us
the cost function? In order to answer this question, we need to be given more
information, say the fixed costs associated with this production, so that we can find the
right value for this constant. Let's consider an example.
Example 5.39
and its fixed costs are 10,000. What is the cost function, C(q), for this company?
Using (5.2) above, we see that the cost function is given by the integral of the
marginal cost, i.e.
\[ C(q) = \int MC(q)\,dq = q^2 + 100\,e^q + c, \]
where c is an arbitrary constant. This tells us, depending on the value of c, all of the
possible cost functions for this company. But, which one should we take? Obviously,
perhaps, we want the one which also gives us fixed costs of 10,000, i.e. we want
\[ C(0) = 10{,}000 \implies 10{,}000 = 0^2 + 100\,e^0 + c \implies 10{,}000 = 100 + c \implies c = 9{,}900, \]
as the fixed costs are the cost of producing nothing. Thus, the cost function for this
company is given by
\[ C(q) = q^2 + 100\,e^q + 9{,}900, \]
as this function agrees with the question on both the marginal and the fixed costs of
production.
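The same two-step recipe (integrate the marginal cost, then use the fixed costs to pin down the constant) can be sketched in Python. This is an illustration under stated assumptions: the marginal cost is taken to be the derivative of the cost function quoted in the example, read here as q^2 + 100 e^q + c, and the function names are hypothetical.

```python
import math


def marginal_cost(q):
    # Assumption for illustration: MC(q) is the derivative of q^2 + 100 e^q.
    return 2 * q + 100 * math.exp(q)


def antiderivative(q):
    # One antiderivative of the assumed MC(q), with the constant suppressed.
    return q**2 + 100 * math.exp(q)


def cost_function(fixed_costs):
    """Choose the constant c so that C(0) equals the fixed costs."""
    c = fixed_costs - antiderivative(0.0)
    return (lambda q: antiderivative(q) + c), c


C, c = cost_function(10_000)
print(c, C(0.0))

# Sanity check: a central difference of C at q = 1 should be close to MC(1).
h = 1e-5
slope = (C(1 + h) - C(1 - h)) / (2 * h)
print(slope, marginal_cost(1.0))
```

The constant comes out as 9,900 because producing nothing still costs the fixed costs, so C(0) must equal 10,000.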
5.4.2
Suppose that a market has linear supply and demand functions as illustrated in
Figure 5.8. As we know from Section 2.1.5, the equilibrium price, $p^*$, and the
equilibrium quantity, $q^*$, occur at the point where the graphs of these functions
intersect. Indeed, at equilibrium, as the consumers buy $q^*$ units of the good at a price of
$p^*$ per unit, they pay an amount $p^* q^*$ to the suppliers and we can think of this as the
area of the hatched region in Figure 5.9(b).
However, if the consumers are willing to buy $q^*$ units of the good, it can be argued
that the consumers would be willing to pay an amount given by
\[ \int_0^{q^*} p_D(q)\,dq, \]
which is the area of the hatched region in Figure 5.9(a). The difference between the area
that represents what they would pay and the area that represents what they actually
pay, i.e. the area of the hatched region in Figure 5.9(d), is called the consumer surplus.
Indeed, this consumer surplus, CS, can be found using the formula
\[ CS = \int_0^{q^*} p_D(q)\,dq - p^* q^*, \]
and this is the amount that the consumers save by paying what they actually paid
instead of what they would have paid.
Figure 5.8: Linear supply and demand functions for a market. Note that the equilibrium
price, $p^*$, and the equilibrium quantity, $q^*$, occur at the point where the graphs of these
functions intersect.
Similarly, if the suppliers are willing to supply $q^*$ units of the good, it can be argued
that they need to be paid an amount given by
\[ \int_0^{q^*} p_S(q)\,dq, \]
which is the area of the hatched region in Figure 5.9(c). The difference between the area
that represents what they are actually paid and the area that represents what they need
to be paid, i.e. the area of the hatched region in Figure 5.9(e), is called the producer
surplus. Indeed, this producer surplus, PS, can be found using the formula
\[ PS = p^* q^* - \int_0^{q^*} p_S(q)\,dq, \]
and this is the amount that the suppliers gain by being paid what they actually receive
instead of what they need to receive. Let's look at a simple example.
Example 5.40 Suppose that a market has an inverse demand function given by
\[ p_D(q) = 70 - \frac{1}{3}\,q, \]
and an inverse supply function given by
\[ p_S(q) = 20 + \frac{1}{2}\,q. \]
Find the equilibrium price and quantity. What are the consumer and producer
surpluses for this market?
The equilibrium quantity, $q^*$, makes the prices obtained from the inverse demand
and supply functions equal, i.e.
\[ 70 - \frac{1}{3}\,q^* = 20 + \frac{1}{2}\,q^* \implies 50 = \frac{5}{6}\,q^* \implies q^* = 60, \]
Figure 5.9: What people pay or need to be paid. (a) What the consumers would pay for a
quantity $q^*$. (b) What the consumers pay for a quantity $q^*$ if the market is at equilibrium.
(c) What the suppliers need to be paid for a quantity $q^*$. (d) What the consumers save if
they pay for a quantity $q^*$ in a market that is at equilibrium; this is the consumer surplus,
i.e. area (a) minus area (b). (e) What the producers gain if they sell a quantity $q^*$ in a
market that is at equilibrium; this is the producer surplus, i.e. area (b) minus area (c).
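and this means that the equilibrium price, $p^*$, is given by
\[ p^* = 70 - \frac{1}{3}(60) = 70 - 20 = 50, \]
if we use the inverse demand function.
Hence, to find the consumer surplus, CS, we have
\[ CS = \int_0^{q^*} p_D(q)\,dq - p^* q^*, \]
and so, as
\[ \int_0^{60} \left( 70 - \frac{1}{3}\,q \right) dq = \left[ 70q - \frac{q^2}{6} \right]_0^{60} = 70(60) - \frac{1}{6}(60)^2 - 0 = 4{,}200 - 600 = 3{,}600, \]
we see that $CS = 3{,}600 - (50)(60) = 600$
is the consumer surplus. And, to find the producer surplus, PS, we have
\[ PS = p^* q^* - \int_0^{q^*} p_S(q)\,dq, \]
and so, as
\[ \int_0^{60} \left( 20 + \frac{1}{2}\,q \right) dq = \left[ 20q + \frac{q^2}{4} \right]_0^{60} = 20(60) + \frac{1}{4}(60)^2 - 0 = 1{,}200 + 900 = 2{,}100, \]
we see that $PS = (50)(60) - 2{,}100 = 900$ is the producer surplus.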
However, as both the demand and supply functions are linear in this example, there is
an easier way to find the consumer and producer surpluses, as the next Activity shows.
Activity 5.21 Sketch the inverse demand and supply functions in the previous
example and shade in the regions which represent the consumer and producer
surplus. What are the areas of these regions?
Of course, the demand and supply functions that we are given may not be linear and, in
such cases, we would have to use integration to find the consumer and producer
surpluses.
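For such cases, a numerical approach along the following lines may be useful. The sketch is illustrative only: the function names are hypothetical, Simpson's rule stands in for the exact integration, and it is checked here against Example 5.40 with the inverse demand and supply functions read as p_D(q) = 70 - q/3 and p_S(q) = 20 + q/2.

```python
def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b] using an even number n of strips."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def surpluses(p_demand, p_supply, q_star):
    """Return (p*, CS, PS) for a market in equilibrium at quantity q_star."""
    p_star = p_demand(q_star)
    # CS: what consumers would pay minus what they actually pay.
    cs = simpson(p_demand, 0.0, q_star) - p_star * q_star
    # PS: what suppliers are paid minus what they need to be paid.
    ps = p_star * q_star - simpson(p_supply, 0.0, q_star)
    return p_star, cs, ps


p_star, cs, ps = surpluses(lambda q: 70 - q / 3, lambda q: 20 + q / 2, 60.0)
print(p_star, round(cs, 6), round(ps, 6))
```

Passing a non-linear inverse demand or supply function to `surpluses` works in the same way, provided the equilibrium quantity is already known.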
Activity 5.22
If the equilibrium quantity is 10, find the equilibrium price and hence determine the
consumer surplus.
Learning outcomes
At the end of this chapter and having completed the relevant reading and activities, you
should be able to:
find integrals using standard integrals and the rules of integration;
find integrals by simplifying the integrand using partial fractions and trigonometric
identities;
use integrals to find areas;
solve problems from economics-based subjects that involve integrals.
Solutions to activities
Solution to activity 5.1
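Given the linear combination rule, i.e.
\[ \int [kf(x) + lg(x)]\,dx = k\int f(x)\,dx + l\int g(x)\,dx, \]
taking l = 0 gives the constant multiple rule,
\[ \int kf(x)\,dx = k\int f(x)\,dx + 0\int g(x)\,dx = k\int f(x)\,dx, \]
taking k = l = 1 gives the sum rule,
\[ \int [f(x) + g(x)]\,dx = 1\int f(x)\,dx + 1\int g(x)\,dx = \int f(x)\,dx + \int g(x)\,dx, \]
and taking k = 1 and l = -1 gives the difference rule,
\[ \int [f(x) - g(x)]\,dx = 1\int f(x)\,dx + (-1)\int g(x)\,dx = \int f(x)\,dx - \int g(x)\,dx. \]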
and
dG
= g(x).
dx
f (x) dx + l
where c is an arbitrary constant. But, by the linear combination rule for differentiation,
we also have
\[ \frac{d}{dx}\Big[ kF(x) + lG(x) + c \Big] = k\,\frac{dF}{dx} + l\,\frac{dG}{dx} + 0 = kf(x) + lg(x), \]
which means that kF(x) + lG(x) + c is also an antiderivative of kf(x) + lg(x), i.e.
\[ \int [kf(x) + lg(x)]\,dx = kF(x) + lG(x) + c. \]
Consequently, we have
\[ \int [kf(x) + lg(x)]\,dx = k\int f(x)\,dx + l\int g(x)\,dx, \]
\[ 3\int \cos x\,dx = 3\sin x + c, \]
where c is an arbitrary constant. For (b), we use the sum rule to see that
\[ \int (e^x + \cos x)\,dx = \int e^x\,dx + \int \cos x\,dx = e^x + \sin x + c, \]
where c is an arbitrary constant. For (c), we use the linear combination rule to see that
\[ \int \left( 3\sin x - \frac{3}{x} \right) dx = 3\int \sin x\,dx - 3\int \frac{1}{x}\,dx = 3(-\cos x) - 3\ln|x| + c = -3\cos x - 3\ln|x| + c, \]
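\[ dg = 4\,dx \implies dx = \frac{1}{4}\,dg. \]
Thus, for the first integral, this substitution gives
\[ \int \frac{1}{4x + 7}\,dx = \int \frac{1}{g}\,\frac{1}{4}\,dg = \frac{1}{4}\int \frac{1}{g}\,dg = \frac{1}{4}\ln|g| + c = \frac{1}{4}\ln|4x + 7| + c, \]
and, for the second integral, it gives
\[ \int e^{4x+7}\,dx = \int e^g\,\frac{1}{4}\,dg = \frac{1}{4}\int e^g\,dg = \frac{1}{4}\,e^g + c = \frac{1}{4}\,e^{4x+7} + c, \]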
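\[ \int (ax + b)^n\,dx = \frac{1}{a}\,\frac{(ax + b)^{n+1}}{n + 1} + c, \]
whereas, if $n = -1$, we have
\[ \int (ax + b)^{-1}\,dx = \int \frac{1}{ax + b}\,dx = \frac{1}{a}\ln|ax + b| + c, \]
\[ \int e^{ax+b}\,dx = \frac{1}{a}\,e^{ax+b} + c, \]
\[ \int \sin(ax + b)\,dx = -\frac{1}{a}\cos(ax + b) + c, \quad\text{and} \]
\[ \int \cos(ax + b)\,dx = \frac{1}{a}\sin(ax + b) + c, \]
where c is an arbitrary constant.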
cos b dx = x cos b + c,
and
e4x+7 dx =
1 4x+7
e
+c,
4
\[ \int \frac{x}{3x^2 + 7}\,dx = \int \frac{1}{g}\,\frac{1}{6}\,dg = \frac{1}{6}\int \frac{1}{g}\,dg = \frac{1}{6}\ln|g| + c = \frac{1}{6}\ln|3x^2 + 7| + c, \]
where c is an arbitrary constant whereas, in the second integral, this substitution gives
\[ \int x\,e^{3x^2+7}\,dx = \int e^g\,\frac{1}{6}\,dg = \frac{1}{6}\int e^g\,dg = \frac{1}{6}\,e^g + c = \frac{1}{6}\,e^{3x^2+7} + c, \]
where c is an arbitrary constant. In both cases, note that the extra x in the integrand
was actually needed for the substitution $g = 3x^2 + 7$ to work.
Solution to activity 5.8
Here the composition is $\sin(x^2)$ and so we take $g = x^2$. As such, we have
\[ \frac{dg}{dx} = 2x \implies x\,dx = \frac{1}{2}\,dg, \]
which is a constant multiple of the other part of the product in the integrand, i.e. this
substitution will work. Thus, the substitution gives
\[ \int x\sin(x^2)\,dx = \int \sin(g)\,\frac{1}{2}\,dg = \frac{1}{2}\int \sin(g)\,dg = -\frac{1}{2}\cos(g) + c = -\frac{1}{2}\cos(x^2) + c, \]
where c is an arbitrary constant. Here, of course, the extra x in the integrand was
needed for the substitution $g = x^2$ to work.
\[ \int \cos^2 x\,\sin x\,dx = \int g^2\,(-dg) = -\int g^2\,dg = -\frac{g^3}{3} + c = -\frac{1}{3}\cos^3 x + c. \]
Here, of course, the extra sin x in the integrand was needed for the substitution
g = cos x to work.
Solution to activity 5.10
In Activity 2.4, we saw that
\[ \cot x = \frac{\cos x}{\sin x}, \]
which means that the composition is $(\sin x)^{-1}$ and so we take g = sin x. As such, we
have
\[ \frac{dg}{dx} = \cos x \implies dg = \cos x\,dx, \]
which is the other part of the product in the integrand, i.e. this substitution will work.
Thus, we see that
\[ \int \cot x\,dx = \int \frac{\cos x}{\sin x}\,dx = \int \frac{dg}{g} = \ln|g| + c = \ln|\sin x| + c. \]
Here, of course, the extra cos x in the integrand was needed for the substitution
g = sin x to work.
Solution to activity 5.11
We note that the quadratic expression in the denominator can be written as
\[ x^2 + 2x + 2 = (x + 1)^2 + 1, \]
if we complete the square. As such, we have
\[ \int \frac{dx}{x^2 + 2x + 2} = \int \frac{dx}{(x + 1)^2 + 1} = \tan^{-1}(x + 1) + c, \]
using the result we derived in Example 5.17. (A useful exercise at this point is to try
and get this answer by actually making the substitution $x + 1 = \tan\theta$ as we did in that
example.)
Since
\[ \log_a(x) = \frac{\ln x}{\ln a}, \]
we have
\[ \int \log_a x\,dx = \frac{1}{\ln a}\int \ln x\,dx = \frac{1}{\ln a}\Big[ x\ln(x) - x \Big] + c = x\log_a(x) - \frac{x}{\ln a} + c, \]
\[ f(x) = x \quad\text{and}\quad g'(x) = \ln x, \]
we differentiate f(x) and integrate $g'(x)$ using the result in Example 5.20 to get
\[ f'(x) = 1 \quad\text{and}\quad g(x) = x\ln x - x, \]
where we have suppressed the arbitrary constant from the integration. Applying the
rule then gives
\[ \begin{aligned} \int x\ln x\,dx &= x(x\ln x - x) - \int (1)(x\ln x - x)\,dx \\ &= x^2\ln x - x^2 - \int x\ln x\,dx + \frac{x^2}{2} + c \\ &= x^2\ln x - \frac{x^2}{2} - \int x\ln x\,dx + c, \end{aligned} \]
so that, taking the integral on the right-hand-side over to the left-hand-side, we have
\[ 2\int x\ln x\,dx = x^2\ln x - \frac{x^2}{2} + c \implies \int x\ln x\,dx = \frac{x^2}{2}\ln x - \frac{x^2}{4} + c, \]
where c is an arbitrary constant. Notice that this is the same as the answer we found in
Example 5.19 but it is slightly trickier to get and we need to know the answer to
Example 5.20.
Solution to activity 5.14
Unlike what we saw in Example 5.21, it would actually make more sense to find
\[ \int (x^2 + 1)^2 x^2\,dx, \]
by multiplying out the brackets and integrating term-by-term rather than integrating it
by parts. Doing this, we get
\[ \int (x^2 + 1)^2 x^2\,dx = \int (x^6 + 2x^4 + x^2)\,dx = \frac{x^7}{7} + \frac{2}{5}\,x^5 + \frac{x^3}{3} + c, \]
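where c is an arbitrary constant. Indeed, to verify that this is the same answer as the
one we saw in the example, it is easiest to take the earlier answer and note that
\[ \begin{aligned} \frac{x^3}{3}(x^2 + 1)^2 - \frac{4}{3}\left( \frac{x^7}{7} + \frac{x^5}{5} \right) + c &= \frac{x^3}{3}(x^4 + 2x^2 + 1) - \frac{4}{3}\left( \frac{x^7}{7} + \frac{x^5}{5} \right) + c \\ &= \frac{x^7}{3} + \frac{2}{3}\,x^5 + \frac{x^3}{3} - \frac{4}{21}\,x^7 - \frac{4}{15}\,x^5 + c \\ &= \frac{x^7}{7} + \frac{2}{5}\,x^5 + \frac{x^3}{3} + c, \end{aligned} \]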
To find this integral we also use the other double-angle formula from Activity 2.18,
namely
\[ \cos(2x) = 2\cos^2 x - 1 \implies \cos^2 x = \frac{1}{2}\Big[ 1 + \cos(2x) \Big], \]
as this allows us to write the problematic integrand $\cos^2 x$ in terms of the function
cos(2x) which is far easier to integrate. This means that we have
\[ \int \cos^2 x\,dx = \frac{1}{2}\int \Big[ 1 + \cos(2x) \Big]\,dx = \frac{1}{2}\left[ x + \frac{1}{2}\sin(2x) \right] + c, \]
\[ \int f(x)\,dx = F(x) + c, \]
then
\[ \int_a^b f(x)\,dx = \Big[ F(x) + c \Big]_a^b = \Big( F(b) + c \Big) - \Big( F(a) + c \Big) = F(b) - F(a), \]
which is exactly what we wanted. That is, including a constant of integration does not
affect the value of a definite integral and so we can omit it.
Solution to activity 5.17
For definite integrals, it should be easy to see that we have the
constant multiple rule: If k is a constant and f(x) is a function, then
\[ \int_a^b kf(x)\,dx = k\int_a^b f(x)\,dx. \]
sum rule: If f(x) and g(x) are functions, then
\[ \int_a^b [f(x) + g(x)]\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx. \]
difference rule: If f(x) and g(x) are functions, then
\[ \int_a^b [f(x) - g(x)]\,dx = \int_a^b f(x)\,dx - \int_a^b g(x)\,dx. \]
8 In what follows, bear in mind that a constant such as c, when evaluated at either x = a or x = b, is
just c.
dG
= g(x).
dx
and
f (x) dx+l
a
g(x) dx = k F (x)
a
+l G(x)
b
a
Consequently, we have
b
f (x) dx + l
g(x) dx,
\[ \int_{-1}^{1} (4 - x^2)\,dx = \int_{-1}^{0} (4 - x^2)\,dx + \int_0^1 (4 - x^2)\,dx, \]
where the values of the two integrals on the right-hand-side, i.e. the areas they
represent, are equal. As such, we can write
\[ \int_{-1}^{1} (4 - x^2)\,dx = 2\int_0^1 (4 - x^2)\,dx, \]
if we decide to find the second of these integrals. Then, looking at the integral on the
right-hand-side, we get
\[ \int_0^1 (4 - x^2)\,dx = \left[ 4x - \frac{x^3}{3} \right]_0^1 = \left( 4(1) - \frac{(1)^3}{3} \right) - \left( 4(0) - \frac{(0)^3}{3} \right) = \frac{11}{3}, \]
\[ \frac{1}{2} \times 60 \times 20 = 600, \]
\[ \frac{1}{2} \times 60 \times 30 = 900, \]
\[ \frac{231}{q + 1}, \]
\[ CS = \int_0^{q^*} p_D(q)\,dq - p^* q^*, \]
Figure 5.10: A sketch of the consumer and producer surpluses for Activity 5.21.
we need to find
\[ \int_0^{10} \frac{231}{q + 1}\,dq = \Big[ 231\ln|q + 1| \Big]_0^{10} \]
Exercises
Exercise 5.1
Find the following indefinite integrals.
(a) $\int \sin^3 x\,\cos x\,dx$,
(b) $\int \sin^3 x\,dx$,
(c) $\int (x + 2)\ln x\,dx$.
Exercise 5.2
Find
1 + ex
dx.
ex
Exercise 5.3
Find $\int \frac{x^2}{x^2 - 1}\,dx$.
Exercise 5.4
Use the substitution $t = \tan\frac{x}{2}$ to evaluate
\[ \int_0^{\pi/2} \frac{dx}{2 + \cos x}. \]
Exercise 5.5
Find the area of the region between the curve $y = x^3$, the x-axis and the vertical lines
$x = -1$ and $x = 2$.
Solutions to exercises
Solution to exercise 5.1
For (a), we have to find
\[ \int \sin^3 x\,\cos x\,dx, \]
and we notice that the integrand involves the composition $\sin^3 x$. This suggests that we
should make the substitution g = sin x and, as this gives us
\[ \frac{dg}{dx} = \cos x \implies dg = \cos x\,dx, \]
which is the other part of the product in the integrand, we can be sure that this will
work. So, using this substitution we get
\[ \int \sin^3 x\,\cos x\,dx = \int g^3\,dg = \frac{g^4}{4} + c = \frac{1}{4}\sin^4 x + c, \]
where c is an arbitrary constant. For (b), we write
\[ \int \sin^3 x\,dx = \int (1 - \cos^2 x)\sin x\,dx = \int \sin x\,dx - \int \cos^2 x\,\sin x\,dx. \]
Of course, the first of these integrals on the right-hand-side is trivial and the other was
found in Activity 5.9. So, using this, we find that
\[ \int \sin^3 x\,dx = \int \sin x\,dx - \int \cos^2 x\,\sin x\,dx = -\cos x - \left( -\frac{1}{3}\cos^3 x \right) + c = -\cos x + \frac{1}{3}\cos^3 x + c, \]
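\[ f(x) = \ln x \quad\text{and}\quad g'(x) = x + 2, \]
like we did in Example 5.19. So, differentiating f(x) and integrating $g'(x)$ we get
\[ f'(x) = \frac{1}{x} \quad\text{and}\quad g(x) = \frac{x^2}{2} + 2x, \]
where we have suppressed the arbitrary constant from the integration. Applying the
rule then gives,
\[ \int (x + 2)\ln x\,dx = (\ln x)\left( \frac{x^2}{2} + 2x \right) - \int \left( \frac{x^2}{2} + 2x \right)\frac{1}{x}\,dx = \frac{x}{2}(x + 4)\ln x - \int \left( \frac{x}{2} + 2 \right) dx, \]
and, clearly, the new integral is easier to find. Thus, finding this integral, we get
\[ \int (x + 2)\ln x\,dx = \frac{x}{2}(x + 4)\ln x - \int \left( \frac{x}{2} + 2 \right) dx = \frac{x}{2}(x + 4)\ln x - \left( \frac{x^2}{4} + 2x \right) + c, \]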
1 + ex
since, in this form, we can see that we have the composition 1 + ex/2 in the integrand.
This suggests that we should make the substitution g = 1 + ex/2 and, as this gives us
1
dg
= ex/2
dx
2
2 dg = ex/2 dx,
which is the other part of the product in the integrand, we can be sure that this will
work. So, using this substitution we get
g 3/2
4
1 + ex
3/2
1/2
g
(2
dg)
=
2
g
dg
=
2
+c = 1 + ex/2
dx
=
+c,
x
3/2
3
e
where c is an arbitrary constant.
Solution to exercise 5.3
The integral
\[ \int \frac{x^2}{x^2 - 1}\,dx, \]
has an integrand which is the quotient of two polynomials. But, as these have the same
degree, we cannot use the method of partial fractions on it as it stands. Instead, we
start by rewriting the integrand as
\[ \frac{x^2}{x^2 - 1} = \frac{x^2 - 1 + 1}{x^2 - 1} = 1 + \frac{1}{x^2 - 1}, \]
and then use the method of partial fractions on
\[ \frac{1}{x^2 - 1}, \]
as the degree of its numerator is less than the degree of its denominator. That is, since
$x^2 - 1 = (x - 1)(x + 1)$, we have distinct linear factors and so we can write
\[ \frac{1}{x^2 - 1} = \frac{1}{(x - 1)(x + 1)} = \frac{A_1}{x - 1} + \frac{A_2}{x + 1}, \]
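\[ \frac{1}{x^2 - 1} = \frac{1/2}{x - 1} + \frac{-1/2}{x + 1}, \]
using the values of $A_1$ and $A_2$ that we have found. Consequently, putting this all
together, we find that
\[ \int \frac{x^2}{x^2 - 1}\,dx = \int \left( 1 + \frac{1/2}{x - 1} + \frac{-1/2}{x + 1} \right) dx = x + \frac{1}{2}\ln|x - 1| - \frac{1}{2}\ln|x + 1| + c, \]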
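\[ \int_0^{\pi/2} \frac{dx}{2 + \cos x}, \]
using the substitution t = tan(x/2). This substitution, like the substitution $t = \tan\theta$
that we saw in Example 5.28, is very useful and so we start by seeing how it can be
applied. Firstly, we note that we can easily write sin(x/2) and cos(x/2) in terms of t by
using a right-angled triangle like the one in Figure 5.1 as this immediately tells us that
\[ \sin\frac{x}{2} = \frac{t}{\sqrt{1 + t^2}} \quad\text{and}\quad \cos\frac{x}{2} = \frac{1}{\sqrt{1 + t^2}}. \]
So, using the double-angle formula $\cos(2x) = \cos^2 x - \sin^2 x$ from (2.6), with x/2 in place of x, we see that the
denominator of our integrand can be written as
\[ 2 + \cos x = 2 + \cos^2\frac{x}{2} - \sin^2\frac{x}{2} = 2 + \frac{1}{1 + t^2} - \frac{t^2}{1 + t^2} = \frac{3 + t^2}{1 + t^2}. \]
Secondly, as $\frac{dt}{dx} = \frac{1}{2}\sec^2\frac{x}{2} = \frac{1 + t^2}{2}$, we can write
\[ dx = \frac{2\,dt}{1 + t^2}, \]
in terms of t. Thirdly, as this is a definite integral, we also have to change the limits of
integration, i.e. x = 0 gives t = tan 0 = 0 and $x = \pi/2$ gives $t = \tan(\pi/4) = 1$. Hence, the substitution gives
\[ \int_0^{\pi/2} \frac{dx}{2 + \cos x} = \int_0^1 \frac{1 + t^2}{3 + t^2}\,\frac{2\,dt}{1 + t^2} = 2\int_0^1 \frac{dt}{3 + t^2}, \]
and so we have
\[ \int_0^{\pi/2} \frac{dx}{2 + \cos x} = \left[ \frac{2}{\sqrt{3}}\tan^{-1}\frac{t}{\sqrt{3}} \right]_0^1 = \frac{2}{\sqrt{3}}\left[ \tan^{-1}\frac{1}{\sqrt{3}} - \tan^{-1} 0 \right] = \frac{2}{\sqrt{3}}\left( \frac{\pi}{6} - 0 \right), \]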
Figure 5.11: The hatching indicates the region of interest in Exercise 5.5. (The figure shows the curve $y = x^3$.)
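In particular, we see that the curve crosses the x-axis when x = 0 and that the function
is non-positive when $-1 \le x \le 0$ and non-negative when $0 \le x \le 2$. As such, we split
the total region into two sub-regions to see that:
Between $x = -1$ and x = 0 we evaluate the definite integral,
\[ \int_{-1}^{0} x^3\,dx = \left[ \frac{x^4}{4} \right]_{-1}^{0} = \frac{0^4}{4} - \frac{(-1)^4}{4} = -\frac{1}{4}. \]
Between x = 0 and x = 2 we evaluate the definite integral,
\[ \int_0^2 x^3\,dx = \left[ \frac{x^4}{4} \right]_0^2 = \frac{2^4}{4} - \frac{0^4}{4} = \frac{16}{4} = 4. \]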
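The difference between the signed integral and the area in Exercise 5.5 can be seen in a short Python sketch. It is an illustration only, with hypothetical names: it uses the antiderivative x^4/4 on the two sub-regions either side of x = 0, reading the vertical lines as x = -1 and x = 2, and adds the pieces once with their signs and once as positive areas.

```python
def F(x):
    """An antiderivative of x^3."""
    return x**4 / 4


def definite(a, b):
    """Value of the definite integral of x^3 from a to b."""
    return F(b) - F(a)


# Split where the curve crosses the x-axis, i.e. at x = 0.
pieces = [(-1.0, 0.0), (0.0, 2.0)]

signed = sum(definite(a, b) for a, b in pieces)     # integral over [-1, 2]
area = sum(abs(definite(a, b)) for a, b in pieces)  # each piece counted as positive

print(signed, area)
```

The signed total undercounts the area because the piece below the x-axis enters with a negative sign.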
Chapter 6
Functions of several variables
Essential reading
(For full publication details, see Chapter 1.)
Binmore and Davies (2002) Sections 3.1-3.9.
Anthony and Biggs (1996) Chapters 11 and 12.
Further reading
Simon and Blume (1994) parts of 13.1-13.2, parts of 14.1-14.6 and 14.8, parts of
15.1-15.2.
Adams and Essex (2010) parts of Chapter 12.
Aims and objectives
The objectives of this chapter are as follows.
To understand that functions of two variables represent surfaces and see how to
visualise these surfaces using sections and contours.
To introduce partial derivatives and use them in various contexts.
To introduce tangent planes, gradient vectors, directional derivatives and Taylor
series for functions of two variables.
Specific learning outcomes can be found near the end of this chapter.
6.1
Introduction
In Section 2.1, we saw that a function $f : \mathbb{R} \to \mathbb{R}$ was a rule which takes an input,
$x \in \mathbb{R}$, and gives us a unique output, $f(x) \in \mathbb{R}$. We now turn our attention to functions
of two variables, i.e. functions where the input consists of a pair of numbers, $(x, y) \in \mathbb{R}^2$,
and whose output is a unique number $f(x, y) \in \mathbb{R}$.1 In particular, we will mainly be
concerned with functions of two variables where the variables are independent, i.e. the
value of x can be chosen independently of the value of y and vice versa.
1 The theory we consider extends to the general case where the input consists of n numbers
$(x_1, x_2, \ldots, x_n)$. This extension to functions of n variables (with $n \ge 3$) should be obvious and so we
do not spend much time on it here. However, although we will mainly be dealing with the two-variable
case, we will occasionally consider functions of more than two variables.
As we shall see,
functions of two variables often occur in economics and other fields where we might
wish to apply mathematical techniques. Two important examples of such functions from
economics are:
The production function of a firm, q(k, l), gives the amount it produces when using
k units of capital and l units of labour.
The utility function of a consumer, u(x1 , x2 ), describes how much utility a
consumer derives from a bundle (x1 , x2 ) of two goods. As such it enables us to
compare the preferences of the consumer when he is confronted with different
combinations of these two goods.
These applications will be discussed later because, before we consider what we may
want to use them for, we want to know how we can visualise what is going on when we
have a function of two variables.
6.2
Surfaces
g(x, y) = x2 y 2
and
h(x, y) = x2 y 2 ,
Figure 6.1: Representing the point (a, b, c) using the x, y and z-axes in $\mathbb{R}^3$.
6.2.1
Planes
The simplest kind of two-variable function is one which is linear in x and y, i.e. where
z = f (x, y) = ax + by,
for some constants a and b. Such functions represent planes and, generally speaking,
any surface which has an equation of the form
ax + by + cz = d,
where at least one of the constants a, b and c is non-zero will represent a plane. For
what follows, the important kinds of plane are, basically, those that fall into the
following categories:
The (x, y), (y, z) and (x, z)-planes which have equations z = 0, x = 0 and y = 0
respectively. (These are the planes in the middle of the three planes illustrated in
Figures 6.3(a), (b) and (c) respectively.)
Planes parallel to the (x, y), (y, z) and (x, z)-planes which, for some constant c, will
have equations z = c, x = c and y = c respectively. (These are the other planes
illustrated in Figures 6.3(a), (b) and (c) respectively.)
Planes which don't fall into either of the above categories, i.e. those with equations of
the form
ax + by + cz = d,
for some constants a, b, c and d (where at least two of the constants a, b and c are
non-zero) will not overly concern us here even though you will come across them in
Section 2.11 of 173 Algebra.
Figure 6.3: Planes parallel to the (x, y), (y, z) and (x, z)-planes: (a) From bottom, $z = -10, 0, 10$;
(b) From left, $x = -10, 0, 10$ and (c) From right, $y = -10, 0, 10$. (Note, in
particular, how the axes are labelled in these pictures.)
6.2.2
Although curve sketching (which is sketching the graph of a function of one variable) is
important in this course, you will not be asked to sketch surfaces (such as the ones
illustrated above in Figure 6.2) for functions of two variables. However, there are useful
ways of visualising such surfaces which do not involve sketching them in three dimensions.
One of these is to use planes, such as the ones we saw in Figure 6.3, to carve up a
three-dimensional illustration of a surface into two-dimensional representations in terms
of contours and sections. In particular, these ideas may be familiar to you from your
experiences with maps (for contours) and other technical diagrams (for sections).
Horizontal planes and the contours of a surface
One way of visualising a surface is to look at its contours, which are the curves of
intersection that arise when we look at the points of intersection of a surface with
planes that are parallel to the (x, y)-plane. To find the contours, we take a plane
parallel to the (x, y)-plane, say the plane z = c, and find the curve of intersection
between it and the surface z = f (x, y), i.e. the curve with equation c = f (x, y). This
curve is the z = c contour, i.e. the set of points (x, y) which give z = c when we put
them into the equation z = f (x, y).
Example 6.1 Find the z = 2 contour of the surface $z = x - y + 4$. Repeat for z = 4
and z = 6.
To find the z = 2 contour of the surface $z = x - y + 4$ we need to find the curve of
intersection which, in this case, is given by
\[ 2 = x - y + 4. \]
Rearranging this gives the equation y = x + 2 which is the equation of a straight line.
Similarly, we find that:
For z = 4, the curve of intersection is given by $4 = x - y + 4$ which gives us
y = x.
For z = 6, the curve of intersection is given by $6 = x - y + 4$ which gives us
$y = x - 2$.
Thus, we see from these equations that these two contours are straight lines as well.
The surface and its contours are illustrated in Figure 6.4.
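The rearrangement used to find these contours can be written as a small Python sketch. It is illustrative only: the surface is read here as z = x - y + 4, and the function names are hypothetical. Each contour is returned as a function y(x), and the loop confirms that points on it map back to the intended value of z.

```python
def surface(x, y):
    """The plane from Example 6.1, read here as z = x - y + 4."""
    return x - y + 4


def contour(c):
    """The z = c contour as a function y(x), from solving c = x - y + 4 for y."""
    return lambda x: x + 4 - c


for c in (2, 4, 6):
    y = contour(c)
    # Every point (x, y(x)) on the contour should give z = c on the surface.
    assert all(surface(x, y(x)) == c for x in range(-5, 6))
    print(f"z = {c}: y = x + ({4 - c})")
```

All three contours have gradient one and differ only in their y-intercepts, which is why they appear as parallel straight lines.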
Figure 6.4: For Example 6.1. (a) The surface $z = x - y + 4$ and, from the bottom, the
planes z = 2, 4, 6. (b) The curves of intersection of the surface and the planes in (a)
with their corresponding values of z. (c) The contours: Each line represents a contour
(i.e. the points with coordinates (x, y) that map to a particular value of z). In this
case, the further to the right the line is, the larger the corresponding value of z is, as we
have z = 2, 4, 6 as we move from left to right. Notice that, here, the contours are parallel
straight lines (i.e. they have the same gradient but different y-intercepts).
Activity 6.1 Find the equations of the z = 10, z = 0 and z = 10 contours of the
surface z = 4x + 2y 2 and sketch these in the (x, y)-plane clearly labelling the
value of z which is associated with each contour.
Example 6.2 Find the z = 16 contour of the surface $z = x^2 + y^2$. What are the
z = c contours of the surface $z = x^2 + y^2$ when (i) c > 0, (ii) c = 0 and (iii) c < 0?
To find the z = 16 contour of the surface $z = x^2 + y^2$ we need to find the curve of
intersection which, in this case, is simply
\[ x^2 + y^2 = 16. \]
This is the equation of a circle, centred on the origin, with a radius of four.
To find the z = c contours in the three cases indicated we just need to find out what
the curve
\[ x^2 + y^2 = c, \]
looks like in the three cases. So, we have:
If c > 0, the contour is a circle, centred on the origin, with a radius of $\sqrt{c}$.
If c = 0, the contour is the point (0, 0) as this is the only solution to the
equation $x^2 + y^2 = 0$.
If c < 0, there is no contour as the equation $x^2 + y^2 = c$ has no solutions.
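The case analysis for the contours of z = x^2 + y^2 can be summarised in a short Python sketch. This is an illustration only and the function name is hypothetical; it relies on the fact that x^2 + y^2 can never be negative, so a negative value of c gives no points at all.

```python
import math


def contour_of_x2_plus_y2(c):
    """Describe the z = c contour of the surface z = x^2 + y^2."""
    if c > 0:
        # A circle centred on the origin with radius sqrt(c).
        return ("circle", math.sqrt(c))
    if c == 0:
        # Only the origin satisfies x^2 + y^2 = 0.
        return ("point", 0.0)
    # x^2 + y^2 is never negative, so there are no points on this contour.
    return ("empty", None)


for c in (16, 0, -1):
    print(c, contour_of_x2_plus_y2(c))
```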
Figure 6.5: For Example 6.2. (a) The surface $z = x^2 + y^2$, which we saw in Figure 6.2(a),
and the planes z = 4, 16, 25. (b) The curves of intersection of the surface and the planes
in (a) with their corresponding values of z. (c) The contours: Each circle represents a
contour (i.e. the points with coordinates (x, y) that map to a particular value of z). In
this case, the larger the radius of the contour, the larger the corresponding value of z as
we have z = 4, 16, 25. Notice that, here, the contours are concentric circles (i.e. they have
the same centre but different radii).
Activity 6.2 Find the z = 25 contour of the surface z = x2 y 2 . What are the
z = c contours of this surface when (i) c > 0, (ii) c = 0 and (iii) c < 0?
Vertical planes and the sections of a surface
Another way of visualising a surface is to look at its sections, which are the curves of
intersection that arise when we look at the points of intersection of a surface with
planes that are perpendicular to the (x, y)-plane. To find the sections, we take a plane
perpendicular to the (x, y)-plane and find the curve of intersection between it and the
surface z = f (x, y). In particular, in this course, we shall only need to consider sections
that arise from planes that are parallel to the (x, z)-plane (i.e. y = c for some constant
c) or parallel to the (y, z)-plane (i.e. x = c for some constant c).
As such, the easiest sections to sketch are the ones we get when we consider the (x, z)
and (y, z)-planes which are both perpendicular to the (x, y)-plane. In particular, we find
that the section which we get from the:
(x, z)-plane, which has the equation y = 0, is the curve of intersection between it
and the surface z = f (x, y), i.e. the curve with equation z = f (x, 0).
(y, z)-plane, which has the equation x = 0, is the curve of intersection between it
and the surface z = f (x, y), i.e. the curve with equation z = f (0, y).
Let's look at what these sections look like in the case of the two surfaces we considered
above when we were looking for contours.
Example 6.3
Activity 6.3 Find the (x, z) and (y, z)-sections of the surface z = 4x + 2y 2 and
sketch these in the appropriate planes.
Example 6.4
For the (y, z)-section, we have x = 0 and so the curve of intersection is given by
$z = y^2$ and this is a parabola in the (y, z)-plane.
The surface and these sections are illustrated in Figure 6.7.
Figure 6.6: For Example 6.3. (a) The surface $z = x - y + 4$ and the planes x = 0 (which
goes diagonally from bottom left to top right) and y = 0 (which goes diagonally from top
left to bottom right). (b) The (x, z)-section is the line $z = x + 4$. (c) The (y, z)-section is
the line $z = -y + 4$.
Figure 6.7: For Example 6.4. (a) The surface $z = x^2 + y^2$ and the planes x = 0 (which
goes diagonally from bottom left to top right) and y = 0 (which goes diagonally from top
left to bottom right). (b) The (x, z)-section is the parabola $z = x^2$. (c) The (y, z)-section
is the parabola $z = y^2$.
Activity 6.4 Find the (x, z) and (y, z)-sections of the surface z = x2 y 2 and
sketch these in the appropriate planes.
More generally, we may want to look at the sections we get when we consider planes
that are parallel to the (x, z) and (y, z)-planes which we considered above. In
particular, we find that the sections we get from the planes that are parallel to the:
(x, z)-plane, which have equations of the form y = c where c is a constant, are the
curves of intersection between them and the surface z = f(x, y), i.e. the curves with
equations z = f(x, c).
(y, z)-plane, which have equations of the form x = c where c is a constant, are the
curves of intersection between them and the surface z = f(x, y), i.e. the curves with
equations z = f(c, y).
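Finding a section amounts to fixing one of the two inputs and treating the result as a function of the other. The Python sketch below illustrates this for the surface z = x^2 + y^2; it is an illustration only and the helper names are hypothetical.

```python
def section_fixed_y(f, c):
    """Section of z = f(x, y) by the plane y = c, i.e. the curve z = f(x, c)."""
    return lambda x: f(x, c)


def section_fixed_x(f, c):
    """Section of z = f(x, y) by the plane x = c, i.e. the curve z = f(c, y)."""
    return lambda y: f(c, y)


def paraboloid(x, y):
    return x**2 + y**2


xz_section = section_fixed_y(paraboloid, 0)   # the (x, z)-section, z = x^2
x1_section = section_fixed_x(paraboloid, 1)   # the x = 1 section, z = 1 + y^2

print([xz_section(x) for x in (-2, -1, 0, 1, 2)])
print([x1_section(y) for y in (-2, -1, 0, 1, 2)])
```

Choosing c = 0 in either helper recovers the (x, z) and (y, z)-sections discussed earlier.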
Let's see what these sections look like in the case of the two surfaces we considered
above.
Example 6.5
Observe that only the first of these sections lives in the (x, z)-plane, but we can
sketch the other two in this plane to get a feel for how the surface is changing when
we look at the sections y = c for different values of c. The surface and these sections,
when drawn in the (x, z)-plane, are illustrated in Figure 6.8.
6
8
4
z
4
2
4
2
2
0
0
x
0
2
5
4
1
(a)
(b)
Figure 6.8: For Example 6.5. (a) The surface z = x − y + 4 and the planes y = 0, y = 2
and y = 4 as we move from right to left. (b) The y = 0, y = 2 and y = 4 sections (as
we move from top to bottom) all drawn in the (x, z)-plane. Note that the y = 0 section
is the (x, z)-section and, of the three sections illustrated, this is the only one that really
lives in the (x, z)-plane. Also notice that, as the value of c increases when we look at
the plane y = c, the value of the z-intercept decreases when we look at the section.
Find the y = −2, 0, 2 sections of this surface and sketch them in the (x, z)-plane.
Find the x = −2, 0, 2 sections of this surface and sketch them in the (y, z)-plane.
Example 6.6
Find the y = 0, 1, 2 sections of this surface and sketch them in the (x, z)-plane.
Find the x = 0, 1, 2 sections of this surface and sketch them in the (y, z)-plane.
6.3
Partial differentiation
Figure 6.9: For Example 6.6. (a) The surface z = x^2 + y^2 and the planes x = 0, x = 1 and
x = 2 as we move from left to right. (b) The x = 0, x = 1 and x = 2 sections all drawn
in the (y, z)-plane. Note that the x = 0 section is the (y, z)-section and, of the three
sections illustrated, this is the only one that really lives in the (y, z)-plane. Notice that,
as the value of c increases when we look at the plane x = c, the value of the z-intercept
increases when we look at the section.
to yield partial derivatives.2 In some ways, this will be similar to what we saw when we
differentiated functions of one variable to get their derivatives, but as we now have two
variables to deal with, things get a little trickier.
6.3.1
Consider f (x, y), a function of two independent variables. For a fixed value of y, say
y = y0 , we can look at the function g(x) = f (x, y0 ) which is now a function of x only.
Clearly, the rate of change of g(x) with respect to x is just the derivative of this
function with respect to x. But, what happens when we want to calculate the rate of
change of f (x, y) with respect to x for any fixed value of y? To do this we avoid
specifying a particular value of y by just assuming that y is a constant and
differentiating with respect to x. So, given a function f (x, y) we denote the operation of
differentiating f with respect to x whilst holding y constant by
\[ \frac{\partial f}{\partial x} \quad \text{or, more compactly,} \quad f_x(x, y), \tag{6.1} \]
and call this the partial derivative of f (x, y) with respect to x.3 In a similar manner, we
can define the partial derivative of f (x, y) with respect to y, denoted by
\[ \frac{\partial f}{\partial y} \quad \text{or, more compactly,} \quad f_y(x, y), \tag{6.2} \]
Most of the material in these notes can be generalised to functions with more than two variables.
But, in this course, almost without exception, we will be considering functions of two variables.
3
Note that we use the curly-d, i.e. ∂, for partial derivatives rather than the normal straight-d, i.e.
d, which one encounters in the notation dg/dx for the derivative of a function g(x) of one variable. We
shall see why it is important to keep these two notions of differentiation separate later.
Similarly, we use fx (x, y) as shorthand for the partial derivative of f (x, y) with respect to x rather
than the g′(x) which one encounters as the shorthand for the derivative of a function g(x) of one variable.
which is what we obtain from differentiating f (x, y) with respect to y whilst holding x
constant.
Clearly, the partial derivative of f (x, y) with respect to x, i.e. the result of
differentiating f (x, y) with respect to x whilst holding y constant, is going to be another
function of x and y. This function of x and y is what is denoted by the symbols in (6.1).
But, what does this partial derivative mean? In effect, what we have done when we
consider the function f (x, y) for some fixed value of y, say y0 , is to look at the section of
the surface z = f (x, y) we get when y = y0 , i.e. the section given by the equation
z = f (x, y0 ) which lies in a plane that has y = y0 and is parallel to the (x, z)-plane.
Then, when we differentiate f (x, y0 ) with respect to x, we are finding the gradient of
this section, i.e. it tells us how z = f (x, y0 ) is varying with x. Consequently, this partial
derivative is telling us something about the gradient of the surface when we are at the
point (x, y0 ) and we are looking in the x-direction. This will become clear when we
look at tangent planes in Section 6.4.1.
Activity 6.9 Describe what the partial derivative of f (x, y) with respect to y
evaluated at the point (x0 , y) tells us about the gradient of the surface at the point
(x0 , y).
6.3.2
Calculating the partial derivatives of f (x, y) is only slightly more difficult than finding
the derivative of a function of one variable. Recalling that the partial derivative of a
function f (x, y) with respect to x, i.e. fx (x, y), is just the derivative of f (x, y) with
respect to x whilst holding y constant, to calculate fx (x, y) we just treat any occurrence
of y in f (x, y) as if it were a constant and differentiate f (x, y) with respect to x. And, in
a similar way, we can find the partial derivative of a function f (x, y) with respect to y,
i.e. fy (x, y). Let's look at an example.
Example 6.7
Let's do this slowly so that we get the idea. To find fx (x, y), we treat y as if it were
a constant and let's say that this constant is c. So, we have a function of one
variable given by
\[ g(x) = f(x, c) = cx^2 + 5c^3 x + c^2, \]
and differentiating this with respect to x gives
\[ \frac{dg}{dx} = 2cx + 5c^3. \]
But, c is the constant we're using to represent y and so replacing all the c's with y's
we have
\[ \frac{\partial f}{\partial x} = 2xy + 5y^3, \]
which is the partial derivative of f (x, y) with respect to x.
Similarly, to find fy (x, y), we treat x as if it were a constant and (again) let's say
that this constant is c. So, we have a function of one variable given by
\[ g(y) = f(c, y) = c^2 y + 5cy^3 + y^2, \]
Given that f (x, y) = 3x^3 + 7xy^(-1) + 2y^9, find fx (x, y) and fy (x, y).
Let's do this quickly. To find fx (x, y), we treat y as a constant and differentiate
with respect to x to get
\[ \frac{\partial f}{\partial x} = 9x^2 + 7y^{-1}. \]
Similarly, to find fy (x, y), we treat x as a constant and differentiate with respect to y
to get
\[ \frac{\partial f}{\partial y} = -7xy^{-2} + 18y^8. \]
And, we're done!
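As a quick sanity check on this kind of calculation, the following minimal Python sketch compares the two partial derivatives found above, for f(x, y) = 3x^3 + 7x/y + 2y^9, with central-difference estimates. The helper names, the step size h and the test point are assumptions made purely for illustration; they are not part of the course.

```python
def f(x, y):
    # f(x, y) = 3x^3 + 7x/y + 2y^9 (the function from the example above)
    return 3 * x**3 + 7 * x / y + 2 * y**9


def partial_x(func, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative in x, holding y fixed."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)


def partial_y(func, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative in y, holding x fixed."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)


def fx_exact(x, y):
    # Treat y as a constant and differentiate with respect to x.
    return 9 * x**2 + 7 / y


def fy_exact(x, y):
    # Treat x as a constant and differentiate with respect to y.
    return -7 * x / y**2 + 18 * y**8


x0, y0 = 1.5, 2.0  # an arbitrary test point with y0 != 0
print(partial_x(f, x0, y0), fx_exact(x0, y0))
print(partial_y(f, x0, y0), fy_exact(x0, y0))
```

Each printed pair should agree to several decimal places, which is what 'holding the other variable constant' means in practice.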
Activity 6.10
Given that
\[ f(x, y) = 2x + x^3 y - \frac{x}{y} + \frac{y^3}{2}, \]
Given that
\[ f(x, y) = x e^{x+y^2}, \]
find fx (x, y) and fy (x, y).
We first note that we can write this function as
\[ f(x, y) = (x e^x)\, e^{y^2}, \]
and so, to find fx (x, y), we treat e^(y^2) as a constant and we differentiate the function
x e^x using the product rule to get x e^x + 1 · e^x. This gives us
\[ \frac{\partial f}{\partial x} = e^{y^2}(x e^x + e^x) = (x + 1)\, e^{x+y^2}. \]
\[ \frac{\partial f}{\partial y} = x e^x (2y\, e^{y^2}) = 2xy\, e^{x+y^2}. \]
Activity 6.11
6.3.3
and
l(t) = 10 − 3t,
(6.3)
Notice that, since k and l both depend on t, we can only pick certain pairs of values, (k, l). That is,
in this case, the variables k and l are not independent.
Sometimes, in this context, we call F′(t) the total derivative of F (t) with respect to t
(in order to distinguish it from the partial derivatives of f with respect to x and y).
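To see why the chain rule works, consider that if we change t by a small amount, Δt,
the corresponding change in F (t) is given by
\[ \Delta F \approx \frac{dF}{dt}\,\Delta t, \]
but here, there are two ways in which F (t) = f (x(t), y(t)) can change with t.
Firstly, F can change with t because f changes with x and x changes with t; let's
denote this change in F by Δ_x F. In this case, we have
\[ \Delta_x F \approx \frac{\partial f}{\partial x}\,\Delta x, \]
as we are holding y constant to see how F changes with x and this means that
\[ \Delta_x F \approx \frac{\partial f}{\partial x}\frac{dx}{dt}\,\Delta t, \]
as the change in x, Δx, is related to a change in t by Δx ≈ x′(t)Δt.
Secondly, F can change with t because f changes with y and y changes with t; let's
denote this change in F by Δ_y F. In this case, we have
\[ \Delta_y F \approx \frac{\partial f}{\partial y}\,\Delta y, \]
as we are holding x constant to see how F changes with y and this means that
\[ \Delta_y F \approx \frac{\partial f}{\partial y}\frac{dy}{dt}\,\Delta t, \]
as the change in y, Δy, is related to a change in t by Δy ≈ y′(t)Δt.
So, as the total change in F is given by
\[ \Delta F = \Delta_x F + \Delta_y F \approx \frac{\partial f}{\partial x}\frac{dx}{dt}\,\Delta t + \frac{\partial f}{\partial y}\frac{dy}{dt}\,\Delta t, \]
we can now equate our two expressions for ΔF and divide through by Δt to get the
chain rule which we saw above in (6.3). Let's see how we could have used it to answer
the question we saw in Example 6.10.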
Example 6.11 Consider the functions in Example 6.10. Use the chain rule to find
the rate of change of production with time.
Here q(k, l) = kl, k(t) = 3 + 2t and l(t) = 10 − 3t. In this case, if we again let
Q(t) = q(k(t), l(t)), the chain rule states that
\[ \frac{dQ}{dt} = \frac{\partial q}{\partial k}\frac{dk}{dt} + \frac{\partial q}{\partial l}\frac{dl}{dt}. \]
As such, using this, we can see that
\[ \frac{dQ}{dt} = (l)(2) + (k)(-3) = 2(10 - 3t) - 3(3 + 2t) = 11 - 12t, \]
which agrees with our earlier answer.
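The chain-rule calculation for q(k, l) = kl with k(t) = 3 + 2t and l(t) = 10 − 3t can also be checked numerically. The sketch below is only illustrative: the function names and the step size are our own choices, and the comparison is against a central-difference estimate of the derivative of Q(t) = q(k(t), l(t)).

```python
def k(t):
    return 3 + 2 * t       # capital as a function of time


def l(t):
    return 10 - 3 * t      # labour as a function of time


def q(k_val, l_val):
    return k_val * l_val   # production function q(k, l) = kl


def Q(t):
    return q(k(t), l(t))   # composite function Q(t) = q(k(t), l(t))


def dQ_dt_chain(t):
    # Chain rule: dQ/dt = q_k * k'(t) + q_l * l'(t), with q_k = l and q_l = k.
    q_k = l(t)
    q_l = k(t)
    return q_k * 2 + q_l * (-3)


def dQ_dt_numeric(t, h=1e-6):
    # Central-difference estimate of the total derivative of Q.
    return (Q(t + h) - Q(t - h)) / (2 * h)


for t in (0.0, 1.0, 2.5):
    print(t, dQ_dt_chain(t), 11 - 12 * t, dQ_dt_numeric(t))
```

For each value of t, the three printed derivative values should agree up to rounding error in the numerical estimate.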
Activity 6.12 Suppose that f (x, y) = x2 y and that x(t) = 2 + 3t and y(t) = t2 + 1.
If F (t) = f (x(t), y(t)), use the chain rule to find the total derivative of F with
respect to t and check your answer by explicitly finding F (t) and differentiating it
with respect to t.
We now consider one of the many useful applications of the chain rule.
The derivative of an implicit function
An equation g(x, y) = c where c is a constant can, in some cases, be rearranged (or
solved) to give y as an explicit function of x. Once we have done this, we can then
differentiate our expression for y with respect to x to find its derivative, y′(x).
Example 6.12 Suppose that y is a function of x defined by the equation
x^2 − y = 7. Find y as an explicit function of x and hence find y′(x).
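as well as
\[ \frac{dx}{dx} = 1, \]
so that
\[ 0 = \frac{\partial g}{\partial x} + \frac{\partial g}{\partial y}\frac{dy}{dx}. \]
Rearranging, we get
\[ \frac{dy}{dx} = -\frac{\partial g/\partial x}{\partial g/\partial y}, \]
as long as gy (x, y) ≠ 0. That is, y′(x) can easily be found by using the partial
derivatives of g. (But, don't forget the minus sign!)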
Example 6.13 In Example 6.12, y was a function of x defined implicitly by the
equation x^2 − y = 7. Find y′(x) using the result above.
As we have the equation x^2 − y = 7 we can write this as g(x, y) = c with
g(x, y) = x^2 − y and c = 7. Using the above result we can then see that
\[ \frac{\partial g}{\partial x} = 2x \quad \text{and} \quad \frac{\partial g}{\partial y} = -1, \]
which means that
\[ \frac{dy}{dx} = -\frac{\partial g/\partial x}{\partial g/\partial y} = -\frac{2x}{-1} = 2x, \]
as before.
Example 6.14
\[ x^2 y^3 - 6x^3 y^2 + 2xy = 1. \]
Verify that the point (x, y) = (1/2, 2) satisfies this equation and find the value of the
derivative, y′(x), at this point.
The point (x, y) = (1/2, 2) satisfies the equation since, putting x = 1/2 and y = 2
into the left-hand side, we get
\[ \left(\tfrac{1}{2}\right)^2 (2)^3 - 6\left(\tfrac{1}{2}\right)^3 (2)^2 + 2\left(\tfrac{1}{2}\right)(2) = 2 - 3 + 2 = 1, \]
which is what we have on the right-hand side of the equation. We then see that the
equation defining y implicitly as a function of x is of the form g(x, y) = 1 where
g(x, y) = x^2 y^3 − 6x^3 y^2 + 2xy. So, according to the formula given above, we have
\[ \frac{dy}{dx} = -\frac{\partial g/\partial x}{\partial g/\partial y}, \]
and so, since
\[ \frac{\partial g}{\partial x} = 2xy^3 - 18x^2 y^2 + 2y \quad \text{and} \quad \frac{\partial g}{\partial y} = 3x^2 y^2 - 12x^3 y + 2x, \]
we have
\[ \frac{dy}{dx} = -\frac{2xy^3 - 18x^2 y^2 + 2y}{3x^2 y^2 - 12x^3 y + 2x}, \]
as long as 3x^2 y^2 − 12x^3 y + 2x ≠ 0. Thus, given the point (1/2, 2), we can substitute
these values into our expression for y′(x) to see that the value of the derivative at
this point is 6.
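The arithmetic in this example is easy to get wrong by hand, so here is a small Python sketch that evaluates g(x, y) = x^2 y^3 − 6x^3 y^2 + 2xy, its two partial derivatives and the implicit derivative −g_x/g_y at the point (1/2, 2). The function names are hypothetical and chosen only for this illustration.

```python
def g(x, y):
    # Left-hand side of the implicit equation g(x, y) = 1.
    return x**2 * y**3 - 6 * x**3 * y**2 + 2 * x * y


def g_x(x, y):
    # Partial derivative of g with respect to x (y held constant).
    return 2 * x * y**3 - 18 * x**2 * y**2 + 2 * y


def g_y(x, y):
    # Partial derivative of g with respect to y (x held constant).
    return 3 * x**2 * y**2 - 12 * x**3 * y + 2 * x


def implicit_slope(x, y):
    """dy/dx = -g_x / g_y, valid only where g_y is non-zero."""
    denominator = g_y(x, y)
    if denominator == 0:
        raise ValueError("g_y vanishes here, so the formula does not apply")
    return -g_x(x, y) / denominator


print(g(0.5, 2.0))               # the point lies on the curve if this is 1
print(g_x(0.5, 2.0), g_y(0.5, 2.0))
print(implicit_slope(0.5, 2.0))  # the value of y'(x) at (1/2, 2)
```

Note that the minus sign lives in `implicit_slope`; leaving it out is the most common mistake with this formula.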
Activity 6.13
Verify that the point (x, y) = (1, 1) satisfies this equation and find the value of the
derivative, y′(x), at this point.
Extensions of the chain rule
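What we have seen above can be extended. Suppose, for instance, that g is a function of
two variables x and y, both of which are themselves functions of two variables k and l.
We can think of this as defining a composite function G(k, l) = g(x(k, l), y(k, l)) and an
extension of the chain rule then assures us that
\[ \frac{\partial G}{\partial k} = \frac{\partial g}{\partial x}\frac{\partial x}{\partial k} + \frac{\partial g}{\partial y}\frac{\partial y}{\partial k} \quad \text{and} \quad \frac{\partial G}{\partial l} = \frac{\partial g}{\partial x}\frac{\partial x}{\partial l} + \frac{\partial g}{\partial y}\frac{\partial y}{\partial l}. \]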
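To see why the first of these formulae works, consider that if we change k by a small
amount, Δk, whilst holding l constant, the corresponding change in G(k, l) is given by
\[ \Delta G \approx \frac{\partial G}{\partial k}\,\Delta k, \]
but here, there are two ways in which G(k, l) = g(x(k, l), y(k, l)) can change with k.
Firstly, G can change with k because g changes with x and x changes with k; let's
denote this change in G by Δ_x G. In this case, we have
\[ \Delta_x G \approx \frac{\partial g}{\partial x}\,\Delta x, \]
as we are holding y constant to see how G changes with x and this means that
\[ \Delta_x G \approx \frac{\partial g}{\partial x}\frac{\partial x}{\partial k}\,\Delta k, \]
as the change in x, Δx, is related to a change in k by Δx ≈ (∂x/∂k)Δk.
Secondly, G can change with k because g changes with y and y changes with k; let's
denote this change in G by Δ_y G. In this case, we have
\[ \Delta_y G \approx \frac{\partial g}{\partial y}\,\Delta y, \]
as we are holding x constant to see how G changes with y and this means that
\[ \Delta_y G \approx \frac{\partial g}{\partial y}\frac{\partial y}{\partial k}\,\Delta k, \]
as the change in y, Δy, is related to a change in k by Δy ≈ (∂y/∂k)Δk.
So, as the total change in G is given by
\[ \Delta G = \Delta_x G + \Delta_y G \approx \frac{\partial g}{\partial x}\frac{\partial x}{\partial k}\,\Delta k + \frac{\partial g}{\partial y}\frac{\partial y}{\partial k}\,\Delta k, \]
we can now equate our two expressions for ΔG and divide through by Δk to get the
chain rule for Gk (k, l) which we saw above.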
Activity 6.14 Use a similar argument to the one above to explain why the chain
rule formula for Gl (k, l) works.
And, in a similar manner, if we suppose that g(x, y, z) = c defines z implicitly as a
function of x and y, we can use this form of the chain rule to derive the formulae
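\[ \frac{\partial z}{\partial x} = -\frac{\partial g/\partial x}{\partial g/\partial z} \quad \text{and} \quad \frac{\partial z}{\partial y} = -\frac{\partial g/\partial y}{\partial g/\partial z}, \]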
which will allow us to calculate the partial derivatives of z with respect to x and y.
Indeed, to see why the first of these formulae works, we consider that if we knew the
function, z(x, y), that satisfied the equation g(x, y, z) = c, we could find a new function,
G(x, y), of x and y only which is given by G(x, y) = g(x, y, z(x, y)). Then using the
chain rule, we have
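\[ \frac{\partial G}{\partial x} = \frac{\partial g}{\partial x}\frac{dx}{dx} + \frac{\partial g}{\partial z}\frac{\partial z}{\partial x}. \]
But, G(x, y) = c where c is a constant and so we also have
\[ \frac{\partial G}{\partial x} = 0 \]
as well as
\[ \frac{dx}{dx} = 1, \]
and so
\[ 0 = \frac{\partial g}{\partial x} + \frac{\partial g}{\partial z}\frac{\partial z}{\partial x}. \]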
Find the partial derivatives qk (k, l) and ql (k, l). What are the values of these partial
derivatives at the point where k = 1 and l = 1?
[Hint: The identity q^3 + q − 2 = (q − 1)(q^2 + q + 2) will be useful.]
6.3.4
Homogeneous functions are important in economics since they allow us to capture the
idea of returns to scale. In this section we will see what it means for a function to be
homogeneous and consider an important theorem about homogeneous functions. The
former will enable us to give an economic interpretation of homogeneous production
functions in terms of returns to scale and the latter will enable us to consider the
economic significance of the marginal products that can be derived from such
production functions.
Homogeneity and returns to scale
We say that a function, f (x, y), is homogeneous of degree r if
\[ f(\lambda x, \lambda y) = \lambda^r f(x, y), \]
for any λ ∈ R. Let's start by looking at some examples of homogeneous functions.
Example 6.15
one.
√x + √y is homogeneous. What
That is, a proportional increase in inputs leads to a larger proportional increase in output.
That is, a proportional increase in inputs leads to a smaller proportional increase in output.
\[ x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} = r f(x, y). \]
x
y
This follows from a simple application of the chain rule since, using the definition of a
function that is homogeneous of degree r, we have
In this course, a question may involve verifying that Euler's theorem holds for some
given homogeneous function. As an example, let's verify that it is true for the two
homogeneous functions we considered in Examples 6.15 and 6.16.
Example 6.19 In Example 6.15, we saw that the function f (x, y) = x^(1/2) y^(1/2) is
homogeneous of degree one. Verify that Euler's theorem holds for this function.
In this case we can see that
\[ \frac{\partial f}{\partial x} = \frac{1}{2} x^{-1/2} y^{1/2} \quad \text{and} \quad \frac{\partial f}{\partial y} = \frac{1}{2} x^{1/2} y^{-1/2}, \]
so that
\[ x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} = x\left(\frac{1}{2} x^{-1/2} y^{1/2}\right) + y\left(\frac{1}{2} x^{1/2} y^{-1/2}\right) = \frac{1}{2} x^{1/2} y^{1/2} + \frac{1}{2} x^{1/2} y^{1/2} \]
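and
\[ \frac{\partial f}{\partial y} = \frac{1}{2} y^{-1/2}. \]
This means that
\[ x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} = x\left(\frac{1}{2} x^{-1/2}\right) + y\left(\frac{1}{2} y^{-1/2}\right) = \frac{1}{2} x^{1/2} + \frac{1}{2} y^{1/2} = \frac{1}{2}\left(x^{1/2} + y^{1/2}\right) = \frac{1}{2} f(x, y), \]
and since the degree of homogeneity of this function is a half, we have (1/2) f (x, y) on
the right-hand side of Euler's theorem. Thus, as these two expressions are the same,
Euler's theorem holds.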
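Both the definition of homogeneity and Euler's theorem can be checked numerically for the function f(x, y) = x^(1/2) + y^(1/2), which is homogeneous of degree r = 1/2. The following Python sketch does this at one arbitrarily chosen point; the helper `euler_lhs`, the scaling factor and the step size are assumptions made for illustration, and the partial derivatives are estimated by central differences rather than found exactly.

```python
import math


def f(x, y):
    # f(x, y) = sqrt(x) + sqrt(y), homogeneous of degree 1/2 for x, y > 0
    return math.sqrt(x) + math.sqrt(y)


def euler_lhs(func, x, y, h=1e-6):
    """Estimate x*f_x + y*f_y using central differences for f_x and f_y."""
    f_x = (func(x + h, y) - func(x - h, y)) / (2 * h)
    f_y = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return x * f_x + y * f_y


r = 0.5             # degree of homogeneity
x0, y0 = 4.0, 9.0   # arbitrary positive test point
lam = 3.0           # arbitrary positive scaling factor

# Homogeneity: f(lam*x, lam*y) should equal lam**r * f(x, y).
print(f(lam * x0, lam * y0), lam**r * f(x0, y0))

# Euler's theorem: x*f_x + y*f_y should equal r*f(x, y).
print(euler_lhs(f, x0, y0), r * f(x0, y0))
```

At this point the exact values are easy to confirm by hand: f(4, 9) = 5, so the right-hand side of Euler's theorem is 2.5, and the left-hand side is 4(1/4) + 9(1/6) = 2.5.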
We now turn to the economic significance of Euler's theorem. Consider a firm that
invests an amount of capital, k, and labour, l, in its production process and this yields a
production level of q(k, l). Further, assume that this production function is
homogeneous of degree one, i.e. that we have constant returns to scale. Euler's theorem
then asserts that
\[ k\frac{\partial q}{\partial k} + l\frac{\partial q}{\partial l} = q. \]
Now, ql gives us the marginal product of labour, i.e. it measures the change in
production if we change the amount of labour. In particular, if we invest one more unit
of labour, say by employing one more worker, ql tells us the resulting change in
production.7 As such, it makes sense to say that this extra worker is responsible for this
change in production and so, if we assume that we reward workers by giving them
goods equal to the quantity they produce, it makes sense to reward this worker with a
quantity of goods given by ql . Thus, if all workers produce the same amount, i.e. ql , and
there are l (i.e. the amount of labour invested) workers, it makes sense that they should
all be rewarded with a quantity of goods equal to ql . As such, the quantity lql represents
the total quantity of goods that should be given as rewards to the workers (i.e. the
labour). A similar argument applies to the quantity kqk , i.e. this should be the total
quantity of goods that should be given as rewards to the providers of the capital.
Consequently, Euler's theorem tells us that these rewards should add up to the total
quantity of goods produced, i.e. all the goods being produced should be distributed
amongst the suppliers of capital and the providers of labour. In summary, this says:
But, strictly, this is only approximate since if Δq is the change in production and Δl is the change
in labour, the relationship
\[ \frac{\partial q}{\partial l} \approx \frac{\Delta q}{\Delta l} \quad \text{or} \quad \Delta q \approx \frac{\partial q}{\partial l}\,\Delta l, \]
is only an approximation. As such, taking on one more worker (i.e. changing the amount of labour by
one) gives Δl = 1 and hence the change in production, Δq, is given [approximately] by Δq = ql . However,
the argument given in these notes can be made precise if we consider the change in production due to
an arbitrarily small change in the amount of labour instead of, say, the intuitively more obvious change
of one worker.
6.3.5
If we have a function f (x, y), we can use partial differentiation to find the new functions
fx (x, y) and fy (x, y). These new functions are called the first-order partial derivatives of
f . However, it is also possible to partially differentiate these new functions with respect
to x and y to get the second-order partial derivatives of f . Obviously, for a function of
two variables, there are four second-order partial derivatives, i.e. those that are
unmixed:
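\[ \frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) \quad \text{and} \quad \frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right), \]
and those that are mixed:
\[ \frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) \quad \text{and} \quad \frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right). \]
More compactly, these are written as
\[ f_{xx} = (f_x)_x, \quad f_{yy} = (f_y)_y, \quad f_{xy} = (f_x)_y \quad \text{and} \quad f_{yx} = (f_y)_x \]
respectively. In this course, we will find that the order of partial differentiation in the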
mixed second-order partial derivatives is unimportant since we will always have
fxy = fyx . In particular, this fact can serve as a useful check when we are working out
second-order partial derivatives.
Example 6.21
were given by
fx (x, y) = 2xy + 5y^3
and
and
224
Example 6.22
were given by
fx (x, y) = 9x^2 + 7y^(-1)
and
fxy (x, y) = −7y^(-2),
and
and
Activity 6.17
Activity 6.10.
f (x, y) = x e^(x+y^2),
were given by
fx (x, y) = (x + 1) e^(x+y^2)
and
So, to find fxx (x, y), we treat e^(y^2) as a constant and we differentiate the function
(x + 1) e^x using the product rule to get (x + 1) e^x + 1 · e^x. This gives us
\[ \frac{\partial^2 f}{\partial x^2} = e^{y^2}\left[(x + 1)\, e^x + e^x\right] = (x + 2)\, e^{x+y^2}. \]
To find fxy (x, y), we treat (x + 1) e^x as a constant and we differentiate the function
e^(y^2) using the chain rule to get 2y e^(y^2). This gives us
\[ \frac{\partial^2 f}{\partial y\,\partial x} = (x + 1)\, e^x (2y\, e^{y^2}) = 2(x + 1)y\, e^{x+y^2}. \]
To find the second-order derivatives that arise from fy (x, y), we first note that we
can write it as
\[ \frac{\partial f}{\partial y} = 2(x e^x)(y\, e^{y^2}). \]
So, to find fyx (x, y), we treat 2y e^(y^2) as a constant and we differentiate the function
x e^x using the product rule to get x e^x + 1 · e^x. This gives us
\[ \frac{\partial^2 f}{\partial x\,\partial y} = 2y\, e^{y^2}(x e^x + e^x) = 2(x + 1)y\, e^{x+y^2}. \]
To find fyy (x, y), we treat 2x e^x as a constant and we differentiate the function y e^(y^2)
using the chain and product rules to get y(2y e^(y^2)) + e^(y^2). This gives us
\[ \frac{\partial^2 f}{\partial y^2} = 2x e^x (2y^2 e^{y^2} + e^{y^2}) = 2x(2y^2 + 1)\, e^{x+y^2}. \]
Notice that fxy = fyx as we should expect in this course.
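The equality of the mixed second-order partial derivatives can be illustrated numerically for f(x, y) = x e^(x + y^2). In the Python sketch below we start from the first-order partial derivatives found above, differentiate each of them numerically with respect to the other variable, and compare both results with 2(x + 1)y e^(x + y^2). The helper names, the test point and the step size are assumptions made for this illustration only.

```python
import math


def f_x(x, y):
    # First-order partial derivative with respect to x: (x + 1) e^(x + y^2)
    return (x + 1) * math.exp(x + y**2)


def f_y(x, y):
    # First-order partial derivative with respect to y: 2xy e^(x + y^2)
    return 2 * x * y * math.exp(x + y**2)


def d_dx(func, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative of func in x."""
    return (func(x + h, y) - func(x - h, y)) / (2 * h)


def d_dy(func, x, y, h=1e-6):
    """Central-difference estimate of the partial derivative of func in y."""
    return (func(x, y + h) - func(x, y - h)) / (2 * h)


x0, y0 = 0.5, 0.5                  # arbitrary test point
f_xy = d_dy(f_x, x0, y0)           # differentiate f_x with respect to y
f_yx = d_dx(f_y, x0, y0)           # differentiate f_y with respect to x
exact = 2 * (x0 + 1) * y0 * math.exp(x0 + y0**2)

print(f_xy, f_yx, exact)
```

All three printed numbers should agree to several decimal places, which is the check suggested in the text: if your hand-computed fxy and fyx differ, one of them contains a slip.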
Activity 6.19
Activity 6.11.
6.4
We now look at some of the useful things that partial derivatives tell us about functions
of two variables. Before you start this section, you should note that this material makes
use of some ideas from Chapter 2 of 173 Algebra, namely
the dot product of two vectors (see Section 2.8),
displacement and direction vectors (see Section 2.9),
the equation of a plane (see Section 2.11), and
the equation of a hyperplane (see Section 2.12).
Make sure that you understand these before you proceed.
6.4.1
Tangent planes
Suppose that we have a surface whose equation is given by z = f (x, y). If c = f (a, b),
then the point (a, b, c) is on this surface and, if we look at the sections given by x = a
and y = b, which are parallel to the (y, z)-plane and (x, z)-plane respectively, we can
find tangent lines in these planes by using the partial derivatives as these tell us how z
is changing with y and x respectively at this point. In particular, if x = a, the section is
given by z = f (a, y) and the tangent line is given by
z = c + fy (a, b)(y − b),
and this lives in the plane x = a which is parallel to the (y, z)-plane whereas if y = b,
the section is given by z = f (x, b) and the tangent line is given by
z = c + fx (a, b)(x − a),
and this lives in the plane y = b which is parallel to the (x, z)-plane.
Example 6.24 Show that the point (1, 1, 2) lies on the surface whose equation is
z = x^2 + y^2. What are the equations of the tangent lines to the x = 1 and y = 1
sections at this point?
The point (1, 1, 2) lies on the surface z = x^2 + y^2 as 2 = 1^2 + 1^2. Here we have
z = f (x, y) with f (x, y) = x^2 + y^2 and so, looking at the:
x = 1 section, we have
\[ f_y(x, y) = 2y \implies f_y(1, 1) = 2, \]
and so the tangent line, which lives in the plane x = 1, has an equation given by
\[ z = 2 + 2(y - 1) = 2y, \]
as we should expect since this section has an equation given by z = 1 + y^2. This
section and the tangent line are illustrated in Figure 6.10(a).
y = 1 section, we have
\[ f_x(x, y) = 2x \implies f_x(1, 1) = 2, \]
and so the tangent line, which lives in the plane y = 1, has an equation given by
\[ z = 2 + 2(x - 1) = 2x, \]
as we should expect since this section has an equation given by z = 1 + x^2. This
section and the tangent line are illustrated in Figure 6.10(b).
In particular, note that these tangent lines live in the planes that define the
relevant sections.
Indeed, as we can find two tangent lines that tell us about how the surface z = f (x, y)
is changing in the x and y-directions at the point (a, b, c) by considering the y = b and
x = a sections respectively, we can use these two lines to define the tangent plane to the
surface at this point. The question is: How do we find the equation of this tangent
plane?
Figure 6.10: Tangent lines to the (a) x = 1 and (b) y = 1 sections of the surface z = x^2 + y^2. In (a) the tangent line is z = 2y and in (b) it is z = 2x.
Let's assume that both of the partial derivatives, fx (x, y) and fy (x, y), are defined at
the point (a, b, c). We know, from Section 2.11 of 173 Algebra, that the vector equation
of a plane through this point is given by
\[ \begin{pmatrix} u \\ v \\ w \end{pmatrix} \cdot \begin{pmatrix} x - a \\ y - b \\ z - c \end{pmatrix} = 0, \]
where the vector (u, v, w) is the normal vector to the plane. Indeed, working out this
dot product, we find that
\[ u(x - a) + v(y - b) + w(z - c) = 0, \]
is the Cartesian equation of the plane. But, what are u, v and w? Well, if we assume
that we have w ≠ 0, i.e. the plane we are considering is not vertical, then we can write
this as
\[ z = c - \frac{u}{w}(x - a) - \frac{v}{w}(y - b), \]
and, to be a tangent plane, we require that the two tangent lines we found above lie in
the plane. In particular, we find that when x = a, we must have
\[ z = c - \frac{v}{w}(y - b) \quad \text{giving us} \quad z = c + f_y(a, b)(y - b), \]
whereas when y = b, we must have
\[ z = c - \frac{u}{w}(x - a) \quad \text{giving us} \quad z = c + f_x(a, b)(x - a), \]
so that
\[ -\frac{v}{w} = f_y(a, b) \quad \text{and} \quad -\frac{u}{w} = f_x(a, b). \]
This means that the Cartesian equation of the tangent plane is given by
\[ z - c = f_x(a, b)(x - a) + f_y(a, b)(y - b), \tag{6.4} \]
and writing this as
\[ f_x(a, b)(x - a) + f_y(a, b)(y - b) - (z - c) = 0, \]
we can see that the vector equation of the tangent plane is
\[ \begin{pmatrix} f_x(a, b) \\ f_y(a, b) \\ -1 \end{pmatrix} \cdot \begin{pmatrix} x - a \\ y - b \\ z - c \end{pmatrix} = 0. \tag{6.5} \]
In particular, the vector
\[ \begin{pmatrix} u \\ v \\ w \end{pmatrix} = \begin{pmatrix} f_x(a, b) \\ f_y(a, b) \\ -1 \end{pmatrix}, \]
is a normal vector to this tangent plane.
Example 6.25 Following on from Example 6.24, find the Cartesian and vector
equations of the tangent plane to the surface z = x^2 + y^2 at the point (1, 1, 2). Verify
that the tangent lines to the x = 1 and y = 1 sections at this point (found in
Example 6.24) lie in this tangent plane.
Using what we found in Example 6.24 and (6.4), it should be clear that the
Cartesian equation of the tangent plane to the surface z = x^2 + y^2 at the point
(1, 1, 2) is given by
\[ z - 2 = 2(x - 1) + 2(y - 1) \implies z = 2x + 2y - 2, \]
whereas, using (6.5), its vector equation is given by
\[ \begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix} \cdot \begin{pmatrix} x - 1 \\ y - 1 \\ z - 2 \end{pmatrix} = 0. \]
Of course, if you work out the dot product in the latter, you should get the former!
If we now find the x = 1 section of this tangent plane we get
\[ z = 2(1) + 2y - 2 = 2y, \]
which is the tangent line to the x = 1 section of the surface and so this must lie in
the tangent plane and, similarly, if we find the y = 1 section of this tangent plane we
get
\[ z = 2x + 2(1) - 2 = 2x, \]
which is the tangent line to the y = 1 section of the surface and so this must lie in
the tangent plane too. This is illustrated in Figure 6.11.
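To see the tangent plane of this example at work, the following Python sketch builds the plane z = c + f_x(a, b)(x − a) + f_y(a, b)(y − b) for the surface z = x^2 + y^2 at the point (1, 1, 2) and compares it with the surface at a few nearby points. The function `tangent_plane` and the sample points are hypothetical choices made for this illustration.

```python
def f(x, y):
    return x**2 + y**2   # the surface z = x^2 + y^2


def f_x(x, y):
    return 2 * x         # partial derivative with respect to x


def f_y(x, y):
    return 2 * y         # partial derivative with respect to y


def tangent_plane(a, b):
    """Return the function (x, y) -> z for the tangent plane at (a, b, f(a, b))."""
    c = f(a, b)

    def plane(x, y):
        # Cartesian equation: z - c = f_x(a, b)(x - a) + f_y(a, b)(y - b)
        return c + f_x(a, b) * (x - a) + f_y(a, b) * (y - b)

    return plane


plane = tangent_plane(1.0, 1.0)
for x, y in [(1.0, 1.0), (1.1, 0.9), (1.5, 1.5)]:
    # Columns: point, height of surface, height of tangent plane, 2x + 2y - 2
    print((x, y), f(x, y), plane(x, y), 2 * x + 2 * y - 2)
```

The last two columns should coincide, since the tangent plane here is z = 2x + 2y − 2, and the gap between the surface and the plane, which for this surface equals (x − 1)^2 + (y − 1)^2, shrinks as we approach (1, 1).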
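We note in passing that, if f is differentiable,8 then the tangent plane to f (x, y) at the
point (a, b) gives us a linear approximation to f (x, y) at nearby points, i.e.
\[ f(x, y) \approx f(a, b) + \begin{pmatrix} f_x(a, b) & f_y(a, b) \end{pmatrix} \begin{pmatrix} x - a \\ y - b \end{pmatrix}. \]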
Figure 6.11: The tangent plane to the surface z = x2 +y 2 at the point (1, 1, 2) as discussed
in Example 6.25. The lines in this tangent plane, which lie in the x = 1 and y = 1 planes,
are the tangent lines to the x = 1 and y = 1 sections of the surface respectively.
This prompts us to define the derivative of f (x, y) with respect to the vector x = (x, y)
to be the vector
\[ \frac{df}{d\mathbf{x}} = \begin{pmatrix} f_x(x, y) & f_y(x, y) \end{pmatrix}, \]
so that we can write
\[ f(x, y) \approx f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix}. \]
This then gives us something which looks like a Taylor series and we will see more of
this in Section 6.4.5. But, before we do this, let's consider another important use of
what we have just seen.
6.4.2
Gradient vectors
The tangent plane to the surface z = f (x, y) at the point (a, b, c), where c = f (a, b), has a
Cartesian equation given by
\[ z - c = f_x(a, b)(x - a) + f_y(a, b)(y - b). \]
Now, if we look at the intersection of the surface and its tangent plane with the
horizontal plane z = c, we find that the surface gives us the contour c = f (x, y) and the
tangent plane gives us the line
\[ f_x(a, b)(x - a) + f_y(a, b)(y - b) = 0. \]
Now, this line passes through the point (a, b) and, given that this line is in the tangent
plane of the surface at the point (a, b, c), it should be clear that it is the tangent line of
this contour at (a, b). In particular, as we can write the equation of this line as
\[ \begin{pmatrix} f_x(a, b) \\ f_y(a, b) \end{pmatrix} \cdot \begin{pmatrix} x - a \\ y - b \end{pmatrix} = 0, \]
we can see that the vector
\[ \begin{pmatrix} f_x(a, b) \\ f_y(a, b) \end{pmatrix}, \tag{6.6} \]
is perpendicular to this tangent line.
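\[ \begin{pmatrix} f_x(x, y) \\ f_y(x, y) \end{pmatrix} = \begin{pmatrix} 2x \\ 2y \end{pmatrix}, \quad \text{so that} \quad \begin{pmatrix} f_x(1, 1) \\ f_y(1, 1) \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \end{pmatrix}. \]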
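Then, using (6.6), we see that the Cartesian equation of the tangent line to the
z = 2 contour at this point9 is given by
\[ \begin{pmatrix} 2 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} x - 1 \\ y - 1 \end{pmatrix} = 0 \implies 2(x - 1) + 2(y - 1) = 0 \implies y = 2 - x. \]
The points on this line can be written as
\[ \begin{pmatrix} x \\ 2 - x \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \end{pmatrix} + x \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \]
so the line has the direction (1, −1)^T. But, of course,
\[ \begin{pmatrix} 2 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ -1 \end{pmatrix} = 2 + (-2) = 0, \]
which means that ∇f (1, 1) is indeed perpendicular to this tangent line and, in
particular, it will be perpendicular to the contour at this point too. This is
illustrated in Figure 6.12.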
In general, given a function f (x, y), we call the vector
\[ \nabla f(x, y) = \begin{pmatrix} f_x(x, y) \\ f_y(x, y) \end{pmatrix}, \tag{6.7} \]
the gradient of f . Indeed, we have seen that fx (a, b) and fy (a, b) allow us to see how
rapidly f is changing if we move away from the point (a, b) in the x or y-direction
respectively. Now, we will look at how ∇f (a, b) allows us to see how rapidly f is
changing if we move away from the point (a, b) in any direction.
9
Note that (x, y) = (1, 1) gives z = f (1, 1) = 2 and so this point is on the z = 2 contour of this
surface.
Figure 6.12: The z = 2 contour of the surface z = x^2 + y^2 and its tangent line, y = 2 − x, at the
point (1, 1) as discussed in Example 6.26. Observe how the tangent line to the contour at
this point is perpendicular to the vector ∇f (1, 1). (The x and y-intercepts of the contour
have been omitted for clarity.)
6.4.3
Directional derivatives
Given the function f (x, y), we want to find its derivative, fu (a, b), in the direction of the
unit vector û = (u1 , u2 )^T.10 Of course, if û is a unit vector in the x-direction, i.e.
û = (1, 0)^T, this is just fx (a, b) and, if û is a unit vector in the y-direction, i.e.
û = (0, 1)^T, this is just fy (a, b), but the question is: What if we are not using either of
these two directions?
10 That is, we have a direction u and we work with a unit vector in that direction, i.e. we use
û = (u1 , u2 )^T where |û| = √(u1^2 + u2^2) = 1.
Consider the point on the surface z = f (x, y) at the point (a, b, c) where c = f (a, b). At
this point, we can find the section in the direction û, i.e. the curve of intersection of the
surface and a plane that contains the point (a, b, c) and the vector û. Then,
geometrically, we would want to interpret fu (a, b) as the gradient of the tangent line to
this section. Now, as û is a unit vector, this means that we have a vector v given by
\[ \mathbf{v} = \begin{pmatrix} u_1 \\ u_2 \\ f_u(a, b) \end{pmatrix}, \]
which lies in the plane and points in the direction of the tangent line. As such, this
vector is perpendicular to the normal vector to the surface at this point and so we have
\[ \begin{pmatrix} u_1 \\ u_2 \\ f_u(a, b) \end{pmatrix} \cdot \begin{pmatrix} f_x(a, b) \\ f_y(a, b) \\ -1 \end{pmatrix} = 0. \]
That is, working out this dot product, we have
\[ u_1 f_x(a, b) + u_2 f_y(a, b) - f_u(a, b) = 0, \]
or, rearranging,
\[ f_u(a, b) = u_1 f_x(a, b) + u_2 f_y(a, b) = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \cdot \begin{pmatrix} f_x(a, b) \\ f_y(a, b) \end{pmatrix}, \]
if we rewrite this in terms of inner products. Thus, we can see that the derivative of f
at the point (a, b) in the direction of the unit vector û is given by
\[ f_u(a, b) = \hat{\mathbf{u}} \cdot \nabla f(a, b), \]
in terms of the gradient of f .
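Example 6.27 Given that z = f (x, y) with f (x, y) = x^2 + y^2, find the derivative of
f (x, y) in the direction (1, 2)^T at the point (1, 1). What is the derivative of f in the
direction ∇f (1, 1)?
We saw in Example 6.26 that the gradient of f at the point (1, 1) is given by
\[ \nabla f(1, 1) = \begin{pmatrix} 2 \\ 2 \end{pmatrix}. \]
Taking the direction u = (1, 2)^T, we get the unit vector
\[ \hat{\mathbf{u}} = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \]
as |u|^2 = 1^2 + 2^2 = 5 and this means that the gradient of f in the direction of this
unit vector is given by
\[ f_u(1, 1) = \hat{\mathbf{u}} \cdot \nabla f(1, 1) = \frac{1}{\sqrt{5}} \begin{pmatrix} 1 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 2 \end{pmatrix} = \frac{1}{\sqrt{5}}(2 + 4) = \frac{6}{\sqrt{5}}. \]
Similarly, taking the direction v = ∇f (1, 1) = (2, 2)^T, we have |v|^2 = 2^2 + 2^2 = 8 and
so we get the unit vector
\[ \hat{\mathbf{v}} = \frac{1}{\sqrt{8}} \begin{pmatrix} 2 \\ 2 \end{pmatrix}, \]
which gives
\[ f_v(1, 1) = \hat{\mathbf{v}} \cdot \nabla f(1, 1) = \frac{1}{\sqrt{8}} \begin{pmatrix} 2 \\ 2 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 2 \end{pmatrix} = \frac{1}{\sqrt{8}}(4 + 4) = \frac{8}{\sqrt{8}}. \]
In particular, observe that the latter is approximately 2.83 (to 2dp) which is larger
than the former which is approximately 2.68 (to 2dp).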
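The two directional derivatives in this example can be reproduced with a few lines of Python. The sketch below normalises a given direction to a unit vector and takes its dot product with the gradient of f(x, y) = x^2 + y^2; the function names are hypothetical and used only for this illustration.

```python
import math


def grad_f(x, y):
    # Gradient of f(x, y) = x^2 + y^2, i.e. (f_x, f_y) = (2x, 2y)
    return (2 * x, 2 * y)


def unit(v):
    """Return the unit vector in the direction of the non-zero 2-vector v."""
    length = math.hypot(v[0], v[1])
    if length == 0:
        raise ValueError("the zero vector has no direction")
    return (v[0] / length, v[1] / length)


def directional_derivative(direction, point):
    """Derivative of f at `point` in `direction`: (unit vector) . (gradient)."""
    u = unit(direction)
    g = grad_f(*point)
    return u[0] * g[0] + u[1] * g[1]


d_u = directional_derivative((1, 2), (1, 1))           # direction (1, 2)^T
d_v = directional_derivative(grad_f(1, 1), (1, 1))     # direction of the gradient

print(round(d_u, 2), round(d_v, 2))
```

Trying other directions in place of (1, 2) gives a feel for the fact that the value obtained in the gradient direction, which equals the length of the gradient vector, is never exceeded.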
Indeed, this leads on to a useful observation about the rate at which f is changing in
different directions. We know, from Section 2.9 of 173 Algebra, that if θ is the angle
between the vectors û and ∇f (a, b), we have
\[ \hat{\mathbf{u}} \cdot \nabla f(a, b) = |\hat{\mathbf{u}}|\,|\nabla f(a, b)| \cos\theta = |\nabla f(a, b)| \cos\theta, \]
as |û| = 1 since û is a unit vector. In particular, we can use the fact that
−1 ≤ cos θ ≤ 1 to see that
\[ -|\nabla f(a, b)| \le f_u(a, b) \le |\nabla f(a, b)|. \]
That is, if |∇f (a, b)| ≠ 0, we can deduce that:
The maximum rate of change of f at the point (a, b, c) is |∇f (a, b)| and this occurs
when θ = 0, i.e. when the direction is u = ∇f (a, b). This is the direction and rate
at which f increases most rapidly.
The minimum rate of change of f at the point (a, b, c) is −|∇f (a, b)| and this
occurs when θ = π, i.e. when the direction is u = −∇f (a, b). This is the direction
and rate at which f decreases most rapidly.
Indeed, this allows us to see that, at the point (a, b), f is steepest in the direction
∇f (a, b).11
Example 6.28 Illustrate that the maximum rate of change of f occurs in the
direction ∇f using what we found in Example 6.27.
In Example 6.27, we saw that the rate of change in the direction v = ∇f (1, 1) was
greater than the rate of change in the direction u = (1, 2)^T as
\[ f_v(1, 1) > f_u(1, 1), \]
and we can illustrate this using Figure 6.13. In particular, observe that if we want to
move to the z = 4 contour from the point (1, 1) on the z = 2 contour, it is quickest
to go in the direction given by ∇f (1, 1) as, if we were to go in the direction
u = (1, 2)^T, we would have to travel further. Consequently, the rate of change of
z = f (x, y) is maximised when we go in the direction given by ∇f (1, 1) and if we go
in another direction, say u = (1, 2)^T, it will be smaller.
6.4.4
Suppose that we have a surface whose equation is given by z = f (x, y). We could, of
course, write this equation as f (x, y) − z = 0 and, in this form, the equation is now
g(x, y, z) = 0 if we take g to be the function of three variables given by
g(x, y, z) = f (x, y) − z.
Indeed, more generally, we can see that a surface can be given by an equation of the
form g(x, y, z) = c where g, a function of three variables, is constrained to take the
Figure 6.13: The z = 2 and z = 4 contours of the surface z = x^2 + y^2 and the directions
∇f (1, 1) and (1, 2)^T at the point (1, 1) as discussed in Example 6.27. Observe how the
quickest way to get to the z = 4 contour from the point (1, 1) on the z = 2 contour is to go in
the direction ∇f (1, 1). (The x and y-intercepts of the z = 2 contour have been omitted
for clarity.)
constant value, c. Sometimes, in such cases, we will be able to rearrange what we are
given to explicitly find the equation of the surface in the form z = f (x, y). But, what if
we can't? That is, what if we can only implicitly define the function f (x, y) through the
equation g(x, y, z) = c? As we shall see, with minor modifications, we will be able to
discuss certain aspects of such a surface using g even if we can't find f .
Tangent planes
Technically, a function $g : \mathbb{R}^3 \to \mathbb{R}$ defines a hypersurface in $\mathbb{R}^4$ whose equation is given
by $u = g(x, y, z)$. And, although we can't visualise such hypersurfaces because they
live in a four-dimensional space, we can easily extend the theory of this chapter to say
things about them. For instance, if we have the point (a, b, c, d) where d = g(a, b, c), it
should be clear that the Cartesian equation of the tangent hyperplane to the surface at
this point is given by
$$u - d = g_x(a, b, c)(x - a) + g_y(a, b, c)(y - b) + g_z(a, b, c)(z - c),$$
which is the analogue of what we saw in (6.4).$^{12}$ Indeed, rewriting this as
$$g_x(a, b, c)(x - a) + g_y(a, b, c)(y - b) + g_z(a, b, c)(z - c) - (u - d) = 0,$$
we can see that the vector equation of this tangent hyperplane is
$$\begin{pmatrix} g_x(a, b, c) \\ g_y(a, b, c) \\ g_z(a, b, c) \\ -1 \end{pmatrix} \cdot \begin{pmatrix} x - a \\ y - b \\ z - c \\ u - d \end{pmatrix} = 0,$$
which is the analogue of (6.5) and the vector
$$\begin{pmatrix} g_x(a, b, c) \\ g_y(a, b, c) \\ g_z(a, b, c) \\ -1 \end{pmatrix},$$
is therefore one of its normal vectors as we might expect given what we saw before.

12 We could, of course, re-run the argument given in Section 6.4.1 in this new context but we refrain
from doing that here.
Here, however, we are interested in a surface in $\mathbb{R}^3$ whose equation, for some constant d,
is given by $g(x, y, z) = d$ and this is the $u = d$ contour of the corresponding
hypersurface in $\mathbb{R}^4$. In particular, we want to be able to find the tangent plane to this
surface at a point (a, b, c) where g(a, b, c) = d. So, setting u = d in the Cartesian
equation of the tangent hyperplane above, we get
$$g_x(a, b, c)(x - a) + g_y(a, b, c)(y - b) + g_z(a, b, c)(z - c) = 0, \qquad (6.8)$$
and this is the Cartesian equation of the tangent plane we seek. Let's see how this
works in practice.
Example 6.29 Following on from Example 6.25, find the Cartesian equation of the
tangent plane to the surface $z = x^2 + y^2$ at the point (1, 1, 2) by using the function
$g(x, y, z) = x^2 + y^2 - z$.
The surface whose equation is $z = x^2 + y^2$ can be represented by the equation
$g(x, y, z) = 0$ with $g(x, y, z) = x^2 + y^2 - z$ and, as such, we have
$$g_x(x, y, z) = 2x, \qquad g_y(x, y, z) = 2y \qquad\text{and}\qquad g_z(x, y, z) = -1.$$
Thus, using the Cartesian equation for the tangent plane at the point (a, b, c) on the
surface $g(x, y, z) = d$ in (6.8), i.e.
$$g_x(a, b, c)(x - a) + g_y(a, b, c)(y - b) + g_z(a, b, c)(z - c) = 0,$$
we verify that the point (1, 1, 2) is on the surface as $g(1, 1, 2) = 1^2 + 1^2 - 2 = 0$ and
see that
$$2(1)(x - 1) + 2(1)(y - 1) + (-1)(z - 2) = 0 \implies 2x + 2y - z = 2,$$
is the Cartesian equation of the tangent plane to the surface at this point in
agreement with what we saw in Example 6.25.
But, of course, our real objective here is to see how to find a tangent plane when the
function of two variables which gives the surface is only implicitly defined through an
equation that involves a function of three variables as in the next example.
Example 6.30 Verify that the point $(1, 0, \pi)$ is on the surface whose equation is
$x^3 + zy^3 + \sin z = 1$ and find the tangent plane to the surface at that point.
The point $(1, 0, \pi)$ is on the surface as $1^3 + (\pi)(0^3) + \sin \pi = 1 + 0 + 0 = 1$ and we
can write the equation of the surface as $g(x, y, z) = 1$ with
$$g(x, y, z) = x^3 + zy^3 + \sin z.$$
As such, we have
$$g_x(x, y, z) = 3x^2, \qquad g_y(x, y, z) = 3zy^2 \qquad\text{and}\qquad g_z(x, y, z) = y^3 + \cos z,$$
so that, at the point $(1, 0, \pi)$, we have $g_x = 3$, $g_y = 0$ and $g_z = \cos \pi = -1$. Using (6.8), this gives us
$$3(x - 1) + 0(y - 0) + (-1)(z - \pi) = 0 \implies 3x - z = 3 - \pi,$$
as the Cartesian equation of the tangent plane to the surface at this point.
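A quick numerical check of this kind of calculation can be sketched as follows. It estimates the partial derivatives of $g(x, y, z) = x^3 + zy^3 + \sin z$ at $(1, 0, \pi)$ by central differences; the helper `gradient` and the step size `h` are assumptions made for this illustration and are not part of the guide.

```python
import math

def g(x, y, z):
    """The function whose g = 1 contour is the surface in Example 6.30."""
    return x**3 + z * y**3 + math.sin(z)

def gradient(func, point, h=1e-6):
    """Central-difference estimate of the partial derivatives of func at point."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += h
        down[i] -= h
        grad.append((func(*up) - func(*down)) / (2 * h))
    return grad

point = (1.0, 0.0, math.pi)
normal = gradient(g, point)                               # approximately (3, 0, -1)
constant = sum(n * p for n, p in zip(normal, point))      # right-hand side of n . x = n . p

print(normal, constant)
```

The estimated normal is close to $(3, 0, -1)^T$ and the constant is close to $3 - \pi$, which matches the plane $3x - z = 3 - \pi$.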
Gradient vectors
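If we now write (6.8) in vector form, we get
$$\begin{pmatrix} g_x(a, b, c) \\ g_y(a, b, c) \\ g_z(a, b, c) \end{pmatrix} \cdot \begin{pmatrix} x - a \\ y - b \\ z - c \end{pmatrix} = 0, \qquad (6.9)$$
and so we can see that the vector
$$\nabla g(a, b, c) = \begin{pmatrix} g_x(a, b, c) \\ g_y(a, b, c) \\ g_z(a, b, c) \end{pmatrix},$$
is a normal vector to this tangent plane. For instance, for the function $g(x, y, z) = x^2 + y^2 - z$ from Example 6.29, we have
$$\nabla g(x, y, z) = \begin{pmatrix} g_x(x, y, z) \\ g_y(x, y, z) \\ g_z(x, y, z) \end{pmatrix} = \begin{pmatrix} 2x \\ 2y \\ -1 \end{pmatrix},$$
and, evaluating this at the point (1, 1, 2), we get
$$\nabla g(1, 1, 2) = \begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix}.$$
Then, using (6.9), we see that the Cartesian equation of the tangent plane to the
surface $g(x, y, z) = 0$ at this point$^{14}$ is given by
$$\begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix} \cdot \begin{pmatrix} x - 1 \\ y - 1 \\ z - 2 \end{pmatrix} = 0 \implies 2(x - 1) + 2(y - 1) - (z - 2) = 0 \implies 2x + 2y - z = 2.$$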
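Now, for $x, y \in \mathbb{R}$, we have points (x, y, z) on this tangent plane given by
$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x \\ y \\ -2 + 2x + 2y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ -2 \end{pmatrix} + x \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} + y \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix},$$
and so this plane lies in the directions given by the vectors $(1, 0, 2)^T$ and $(0, 1, 2)^T$.
But, of course,
$$\nabla g(1, 1, 2) \cdot \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix} \cdot \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} = 2 + 0 + (-2) = 0,$$
and
$$\nabla g(1, 1, 2) \cdot \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ -1 \end{pmatrix} \cdot \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} = 0 + 2 + (-2) = 0,$$
which means that $\nabla g(1, 1, 2)$ is indeed perpendicular to this tangent plane and, in
particular, it will be perpendicular to the surface at this point too.
We call the vector
$$\nabla g(x, y, z) = \begin{pmatrix} g_x(x, y, z) \\ g_y(x, y, z) \\ g_z(x, y, z) \end{pmatrix},$$
the gradient of g and, for a function of three variables, this is the analogue of what we
saw in (6.7). Of course, we could then extend what we saw in Section 6.4.3, and use this
to find the directional derivatives of a function of three variables. This, in turn, would
allow us to see how rapidly this function is changing if we move away from a point in a
certain direction and, in particular, it would allow us to find the maximum (or
minimum) rate of change of such a function and the direction in which it occurs.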
6.4.5
Taylor series
We saw in Section 3.4 that a function, F(t), of one variable has a second-order Taylor
series given by
$$F(t) = F(a) + (t - a)F'(a) + \frac{(t - a)^2}{2!}F''(a) + \cdots,$$
around t = a. Now, we want to derive the corresponding result for a function, f(x, y), of
two variables around the point (a, b) and, from what we saw when we considered
tangent planes in Section 6.4.1, we should anticipate that the first two terms of this
Taylor series will be given by
$$f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix},$$
where
$$\frac{df}{d\mathbf{x}} = \big( f_x(x, y), \; f_y(x, y) \big),$$
is the derivative of f(x, y) with respect to $\mathbf{x} = (x, y)$. So, our main concern here is what
the next term will look like.
14 Note that (x, y, z) = (1, 1, 2) gives $g(1, 1, 2) = 1^2 + 1^2 - 2 = 0$ and so this point is on this surface.
If we want to find the Taylor series for a function, f(x, y), around the point (a, b) we
need to see what is happening at some nearby point (x, y). Let's say that, in terms of a
new variable t, these points are related by the equations
$$x = a + ht \qquad\text{and}\qquad y = b + kt,$$
for some appropriately small values of the numbers ht and kt since these points are
supposed to be close to one another. Indeed, this means that we can define a new
function, F(t), of the single variable, t, given by
$$F(t) = f(x(t), y(t)) \quad\text{where}\quad x(t) = a + ht \text{ and } y(t) = b + kt,$$
where the idea is that F(t) and its derivatives will allow us to use the Maclaurin series
for F(t), i.e.
$$F(t) = F(0) + tF'(0) + \frac{t^2}{2!}F''(0) + \cdots,$$
to deduce the corresponding Taylor series for f(x, y). In particular, we can see
straightaway that
$$F(0) = f(x(0), y(0)) = f(a, b),$$
which is the first of our anticipated terms. Now we need to find the derivatives $F'(t)$
and $F''(t)$ to see what the other two terms are.
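To find $F'(t)$, we use the chain rule from Section 6.3.3 to see that
$$F'(t) = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = h f_x(x(t), y(t)) + k f_y(x(t), y(t)).$$
In particular, as $x - a = ht$ and $y - b = kt$, this gives us
$$tF'(0) = ht f_x(a, b) + kt f_y(a, b) = \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix},$$
which is the second of our anticipated terms. Then, to find $F''(t)$, we use the chain rule again to see that
$$F''(t) = h\left( \frac{\partial f_x}{\partial x}\frac{dx}{dt} + \frac{\partial f_x}{\partial y}\frac{dy}{dt} \right) + k\left( \frac{\partial f_y}{\partial x}\frac{dx}{dt} + \frac{\partial f_y}{\partial y}\frac{dy}{dt} \right)$$
$$= h\big[ h f_{xx}(x(t), y(t)) + k f_{xy}(x(t), y(t)) \big] + k\big[ h f_{yx}(x(t), y(t)) + k f_{yy}(x(t), y(t)) \big]$$
$$\implies F''(t) = h^2 f_{xx}(x(t), y(t)) + hk f_{xy}(x(t), y(t)) + kh f_{yx}(x(t), y(t)) + k^2 f_{yy}(x(t), y(t))$$
and, in particular, this means that
$$F''(0) = h^2 f_{xx}(a, b) + hk f_{xy}(a, b) + kh f_{yx}(a, b) + k^2 f_{yy}(a, b),$$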
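so we can see that the next term in our Taylor series will be
$$\frac{t^2}{2!}F''(0) = \frac{1}{2!}\Big[ (x - a)^2 f_{xx}(a, b) + (x - a)(y - b) f_{xy}(a, b) + (y - b)(x - a) f_{yx}(a, b) + (y - b)^2 f_{yy}(a, b) \Big].$$
Indeed, if we now define the second derivative of f(x, y) with respect to $\mathbf{x} = (x, y)$ to be
the matrix
$$\frac{d^2 f}{d\mathbf{x}^2} = \begin{pmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{yx}(x, y) & f_{yy}(x, y) \end{pmatrix},$$
it is easily verified that we have
$$\frac{t^2}{2!}F''(0) = \frac{1}{2!} \big( x - a, \; y - b \big) \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix},$$
and so the second-order Taylor series of f(x, y) around the point (a, b) is
$$f(x, y) = f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} + \frac{1}{2!} \big( x - a, \; y - b \big) \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} + \cdots,$$
and these terms will be sufficient for our purposes in this course. We will see how this
can be used in the next chapter, but for now, we will just use it to find an
approximation to a function of two variables around a certain point.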
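Example 6.32 Find the second-order Taylor series of the function
$f(x, y) = e^x \cos y$ around the point (1, 0).
The first term of our second-order Taylor series is simply $f(1, 0) = e^1 \cos 0 = e$. We
also see that
$$\frac{df}{d\mathbf{x}} = \big( f_x(x, y), \; f_y(x, y) \big) = \big( e^x \cos y, \; -e^x \sin y \big),$$
which means that
$$\left.\frac{df}{d\mathbf{x}}\right|_{(1,0)} = \big( e^1 \cos 0, \; -e^1 \sin 0 \big) = \big( e, \; 0 \big),$$
and so the second term of our second-order Taylor series is
$$\left.\frac{df}{d\mathbf{x}}\right|_{(1,0)} \begin{pmatrix} x - 1 \\ y - 0 \end{pmatrix} = \big( e, \; 0 \big) \begin{pmatrix} x - 1 \\ y \end{pmatrix} = e(x - 1).$$
Then, as the second derivative is
$$\frac{d^2 f}{d\mathbf{x}^2} = \begin{pmatrix} e^x \cos y & -e^x \sin y \\ -e^x \sin y & -e^x \cos y \end{pmatrix},$$
we have
$$\left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(1,0)} = \begin{pmatrix} e^1 \cos 0 & -e^1 \sin 0 \\ -e^1 \sin 0 & -e^1 \cos 0 \end{pmatrix} = \begin{pmatrix} e & 0 \\ 0 & -e \end{pmatrix},$$
and so the third term of our second-order Taylor series is
$$\frac{1}{2!} \big( x - 1, \; y - 0 \big) \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(1,0)} \begin{pmatrix} x - 1 \\ y - 0 \end{pmatrix} = \frac{1}{2!} \big( x - 1, \; y \big) \begin{pmatrix} e & 0 \\ 0 & -e \end{pmatrix} \begin{pmatrix} x - 1 \\ y \end{pmatrix} = \frac{1}{2!} \big( x - 1, \; y \big) \begin{pmatrix} e(x - 1) \\ -ey \end{pmatrix} = \frac{1}{2!}\Big[ e(x - 1)^2 - e y^2 \Big].$$
Consequently, we see that
$$e + e(x - 1) + \frac{1}{2!}\Big[ e(x - 1)^2 - e y^2 \Big],$$
is the second-order Taylor series of $f(x, y) = e^x \cos y$ around the point (1, 0).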
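To get a feel for how good such an approximation is near the point (1, 0), the series can be compared with the function itself. The following is a minimal sketch; the function name `taylor2` is hypothetical, and the sample point (1.1, 0.2) is the one used in Activity 6.20.

```python
import math

def taylor2(x, y):
    """Second-order Taylor series of e^x cos y around the point (1, 0)."""
    e = math.e
    dx, dy = x - 1, y
    return e + e * dx + 0.5 * (e * dx**2 - e * dy**2)

approx = taylor2(1.1, 0.2)
exact = math.exp(1.1) * math.cos(0.2)

print(round(approx, 3), round(exact, 3))   # 2.949 2.944
```

The two values agree to one decimal place, as we would hope for a point this close to (1, 0).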
Activity 6.20 Find an approximation to $e^{1.1} \cos 0.2$ by using the second-order
Taylor series that we found in Example 6.32.
Activity 6.21 Find the second-order Taylor series in the previous example by using
the Taylor series for $e^x$ about x = 1 (see Example 3.31) and the Maclaurin series for
$\cos y$ (see Section 3.4.1).
Learning outcomes
At the end of this chapter and having completed the relevant reading and activities, you
should be able to:
visualise a surface by using sections and contours;
find partial derivatives;
use the chain rule to find derivatives of various kinds;
show that a function is homogeneous and verify Euler's theorem;
solve problems from economics-based subjects that involve partial derivatives;
find tangent planes and gradient vectors;
find directional derivatives and interpret what you have found;
find Taylor series and use these to approximate functions of two variables.
Solutions to activities
Solution to activity 6.1
To find the contours of the surface $z = 4x + 2y - 2$ when we have the given values of z,
we note that:
For z = −10, the curve of intersection is given by $-10 = 4x + 2y - 2$ which gives us
$y = -2x - 4$.
For z = 0, the curve of intersection is given by $0 = 4x + 2y - 2$ which gives us
$y = -2x + 1$.
For z = 10, the curve of intersection is given by $10 = 4x + 2y - 2$ which gives us
$y = -2x + 6$.
Thus, we see from these equations that all three of the contours are straight lines. The
sketch of these contours in the (x, y)-plane is illustrated in Figure 6.14.

Figure 6.14: A sketch of the z = −10, z = 0 and z = 10 contours of the surface
$z = 4x + 2y - 2$ in the (x, y)-plane for Activity 6.1.
$$x^2 + y^2 = 25.$$
This is the equation of a circle, centred on the origin, with a radius of five.
To find the z = c contours in the three cases indicated we just need to find out what the
curve
$$-x^2 - y^2 = c \implies x^2 + y^2 = -c,$$
looks like in the three cases. So, we have:
If c > 0, there are no contours as we have $-c < 0$ and we know that $x^2 + y^2 \geq 0$ for
all values of x and y.
If c = 0, the contour is the point (0, 0) as this is the only solution to the equation
$x^2 + y^2 = 0$.
In particular, notice that z = 0 is the largest value of z that arises from a point on this
surface.
Solution to activity 6.3
To find these sections of the surface $z = 4x + 2y - 2$ we need to find the curves of
intersection, which in this case, are given by:
For the (x, z)-section, we have y = 0 and so the curve of intersection is given by
$z = 4x - 2$ and this is a straight line in the (x, z)-plane.
For the (y, z)-section, we have x = 0 and so the curve of intersection is given by
$z = 2y - 2$ and this is a straight line in the (y, z)-plane.

Figure 6.15: A sketch of (a) the (x, z)-section, $z = 4x - 2$, and (b) the (y, z)-section, $z = 2y - 2$.
Figure 6.16: A sketch of (a) the (x, z)-section and (b) the (y, z)-section of the surface
Observe that only the first of these sections lives in the (y, z)-plane but, as illustrated
in Figure 6.17, we can also sketch the other two in this plane to get a feel for how the
surface is changing when we look at the sections x = c for different values of c.
Figure 6.17: A sketch of the x = c sections in the (y, z)-plane for Activity 6.5.
Figure 6.18: A sketch of (a) the y = −2, 0, 2 sections and (b) the x = −2, 0, 2 sections.
Figure 6.19: The y = 0, y = 1 and y = 2 sections of the surface $z = x^2 + y^2$ for Activity 6.7.
Figure 6.20: A sketch of (a) the y = 0, 1, 2 sections and (b) the x = 0, 1, 2 sections.
y3
x y3
+
= 2x + x3 y xy 1 + ,
y
2
2
$$f(x, y) = \sqrt{x^2 + y^2} = (x^2 + y^2)^{1/2},$$
we hold y constant and differentiate with respect to x using the chain rule to get
$$\frac{\partial f}{\partial x} = \frac{1}{2}(x^2 + y^2)^{-1/2}(2x) = \frac{x}{\sqrt{x^2 + y^2}},$$
and we hold x constant and differentiate with respect to y using the chain rule to get
$$\frac{\partial f}{\partial y} = \frac{1}{2}(x^2 + y^2)^{-1/2}(2y) = \frac{y}{\sqrt{x^2 + y^2}}.$$
These are the sought after partial derivatives $f_x(x, y)$ and $f_y(x, y)$ respectively.
Solution to activity 6.12
Here $f(x, y) = x^2 y$, $x(t) = 2 + 3t$ and $y(t) = t^2 + 1$. In this case, if we again let
F(t) = f(x(t), y(t)), the chain rule states that
$$\frac{dF}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}.$$
As such, using this, we can see that
$$\frac{dF}{dt} = (2xy)(3) + (x^2)(2t) = 2x(3y + xt),$$
and so, substituting our expressions for x(t) and y(t), we get
$$\frac{dF}{dt} = 2(2 + 3t)\big[ 3(t^2 + 1) + (2 + 3t)t \big] = 2(2 + 3t)(6t^2 + 2t + 3).$$
To check this, we note that
$$F(t) = f(x(t), y(t)) = (2 + 3t)^2 (t^2 + 1),$$
which, using the product and chain rules, gives us
$$\frac{dF}{dt} = \big[ 2(2 + 3t)(3) \big](t^2 + 1) + (2 + 3t)^2 (2t) = 2(2 + 3t)\big[ 3(t^2 + 1) + t(2 + 3t) \big],$$
and this agrees with our earlier answer.
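The same check can also be made numerically. The sketch below compares the chain-rule expression with a central-difference estimate of dF/dt; the function names and the step size `h` are assumptions made for this illustration.

```python
def f(x, y):
    return x**2 * y

def x_of(t):
    return 2 + 3 * t

def y_of(t):
    return t**2 + 1

def F(t):
    """The composite function F(t) = f(x(t), y(t))."""
    return f(x_of(t), y_of(t))

def dF_chain_rule(t):
    """dF/dt = f_x * dx/dt + f_y * dy/dt with f_x = 2xy and f_y = x^2."""
    x, y = x_of(t), y_of(t)
    return (2 * x * y) * 3 + (x**2) * (2 * t)

def dF_numeric(t, h=1e-6):
    """Central-difference estimate of dF/dt."""
    return (F(t + h) - F(t - h)) / (2 * h)

for t in (-1.0, 0.0, 0.5, 2.0):
    print(t, dF_chain_rule(t), dF_numeric(t))
```

At each sample value of t the two columns should agree up to the small error of the finite-difference estimate.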
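and $c = 6$, so that we can use
$$\frac{dy}{dx} = -\frac{\partial g / \partial x}{\partial g / \partial y},$$
to get
$$\frac{dy}{dx} = -\frac{2x + 2y}{2x + 9y^2} = -\frac{2(x + y)}{2x + 9y^2},$$
and so, at the point (1, 1), we have
$$\left.\frac{dy}{dx}\right|_{(1,1)} = -\frac{2(1 + 1)}{2 + 9} = -\frac{4}{11},$$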
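$$\Delta G \simeq \frac{\partial G}{\partial l}\,\Delta l,$$
but here, there are two ways in which G(k, l) = g(x(k, l), y(k, l)) can change with l.
Firstly, G can change with l because g changes with x and x changes with l; let's
denote this change in G by $\Delta_x G$. In this case, we have
$$\Delta_x G \simeq \frac{\partial g}{\partial x}\,\Delta x,$$
as we are holding y constant to see how g changes with x and this means that
$$\Delta_x G \simeq \frac{\partial g}{\partial x}\frac{\partial x}{\partial l}\,\Delta l.$$
Secondly, G can change with l because g changes with y and y changes with l; let's
denote this change in G by $\Delta_y G$. In this case, we have
$$\Delta_y G \simeq \frac{\partial g}{\partial y}\,\Delta y,$$
as we are holding x constant to see how g changes with y and this means that
$$\Delta_y G \simeq \frac{\partial g}{\partial y}\frac{\partial y}{\partial l}\,\Delta l.$$
Then, as the total change in G is
$$\Delta G = \Delta_x G + \Delta_y G \simeq \frac{\partial g}{\partial x}\frac{\partial x}{\partial l}\,\Delta l + \frac{\partial g}{\partial y}\frac{\partial y}{\partial l}\,\Delta l,$$
we can now equate our two expressions for $\Delta G$ and divide through by $\Delta l$ to get the
chain rule for $G_l(k, l)$ which we wanted.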
as well as
$$\frac{dy}{dy} = 1,$$
$$\frac{\partial g}{\partial y} + \frac{\partial g}{\partial z}\frac{\partial z}{\partial y}.$$
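and $c = 3$,
$$\frac{\partial q}{\partial k} = -\frac{\partial g / \partial k}{\partial g / \partial q} \qquad\text{and}\qquad \frac{\partial q}{\partial l} = -\frac{\partial g / \partial l}{\partial g / \partial q},$$
$$\frac{\partial q}{\partial l} = -\frac{k^3 + qk^2}{3q^2 k + k^2 l},$$
and
$$q^2 + q + 2 = \left( q + \frac{1}{2} \right)^2 + \frac{7}{4} > 0,$$
for all $q \in \mathbb{R}$, we see that q = 1 is the only solution to this equation. Thus, the point we
are interested in has coordinates (k, l, q) = (1, 1, 1) and, at this point, we have
$$\frac{\partial q}{\partial k} = -\frac{1 + 3 + 2}{3 + 1} = -\frac{6}{4} = -\frac{3}{2}$$
and
$$\frac{\partial q}{\partial l} = -\frac{1 + 1}{3 + 1} = -\frac{2}{4} = -\frac{1}{2},$$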
y3
x y3
+
= 2x + x3 y xy 1 + ,
y
2
2
and
f
3
= x3 + xy 2 + y 2 .
y
2
1
,
y2
1
y2
x
+ 3y.
y3
Notice that, in particular, we can never have k = 0 here as this does not satisfy the equation
$q^3 k + k^3 l + qk^2 l = 3$.
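and $f_y(x, y) = \frac{1}{4} x^{3/4} y^{-3/4}$,
as the first-order partial derivatives. Then, for the second-order partial derivatives, we
note that partially differentiating $f_x(x, y)$ with respect to x and y respectively, we get
$$f_{xx}(x, y) = -\frac{3}{16} x^{-5/4} y^{1/4} \qquad\text{and}\qquad f_{xy}(x, y) = \frac{3}{16} x^{-1/4} y^{-3/4},$$
and partially differentiating $f_y(x, y)$ with respect to x and y respectively, we get
$$f_{yx}(x, y) = \frac{3}{16} x^{-1/4} y^{-3/4} \qquad\text{and}\qquad f_{yy}(x, y) = -\frac{3}{16} x^{3/4} y^{-7/4}.$$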
$$f(x, y) = \sqrt{x^2 + y^2} = (x^2 + y^2)^{1/2},$$
are
$$\frac{\partial f}{\partial x} = x(x^2 + y^2)^{-1/2} \qquad\text{and}\qquad \frac{\partial f}{\partial y} = y(x^2 + y^2)^{-1/2}.$$
So, partially differentiating $f_x(x, y)$ with respect to x using the product and chain rules
we get
$$f_{xx}(x, y) = (1)(x^2 + y^2)^{-1/2} + (x)\left( -\frac{1}{2} \right)(x^2 + y^2)^{-3/2}(2x) = \frac{(x^2 + y^2) - x^2}{(x^2 + y^2)^{3/2}} = \frac{y^2}{(x^2 + y^2)^{3/2}},$$
and partially differentiating $f_x(x, y)$ with respect to y using the chain rule we get
$$f_{xy}(x, y) = x\left( -\frac{1}{2} \right)(x^2 + y^2)^{-3/2}(2y) = -\frac{xy}{(x^2 + y^2)^{3/2}}.$$
Similarly, partially differentiating $f_y(x, y)$ with respect to x using the chain rule we get
$$f_{yx}(x, y) = y\left( -\frac{1}{2} \right)(x^2 + y^2)^{-3/2}(2x) = -\frac{xy}{(x^2 + y^2)^{3/2}},$$
and partially differentiating $f_y(x, y)$ with respect to y using the product and chain rules
we get
$$f_{yy}(x, y) = (1)(x^2 + y^2)^{-1/2} + (y)\left( -\frac{1}{2} \right)(x^2 + y^2)^{-3/2}(2y) = \frac{(x^2 + y^2) - y^2}{(x^2 + y^2)^{3/2}} = \frac{x^2}{(x^2 + y^2)^{3/2}}.$$
Notice that $f_{xy} = f_{yx}$ as we should expect in this course.
$$e + e(1.1 - 1) + \frac{1}{2!}\Big[ e(1.1 - 1)^2 - e(0.2)^2 \Big] = 1.085\,e,$$
and, using the value of e, we find that $e^{1.1} \cos 0.2 \simeq 2.949$ to 3dp. Indeed, as the point
(1.1, 0.2) is close to the point (1, 0) we expect this to be a good approximation. Of
course, the exact value of $e^{1.1} \cos 0.2$ is 2.944 to 3dp and so we can see that our
approximation agrees with this to 1dp.
Solution to activity 6.21
As we saw in Example 3.31, the second-order Taylor series for $e^x$ around x = 1 is
$$e^x \simeq e + (x - 1)\,e + \frac{(x - 1)^2}{2!}\,e,$$
and as we saw in Section 3.4.1, the second-order Maclaurin series (i.e. the Taylor series
around y = 0) of $\cos y$ is
$$\cos y \simeq 1 - \frac{y^2}{2!}.$$
This means that, around the point (1, 0), we would have
$$e^x \cos y \simeq \left[ e + (x - 1)\,e + \frac{(x - 1)^2}{2!}\,e \right]\left[ 1 - \frac{y^2}{2!} \right],$$
and, multiplying out the brackets and discarding terms which are more than
second-order in (x − 1) and y since these are small around the point (1, 0), we get
$$e^x \cos y \simeq e + (x - 1)\,e + \frac{(x - 1)^2}{2!}\,e - e\,\frac{y^2}{2!},$$
in agreement with what we found in Example 6.32.
Exercises
Exercise 6.1
Find the first and second-order partial derivatives of the function
$$f(x, y) = 2xy + x^{2a} y^{a},$$
where a is a constant.
If this function satisfies the equation
$$x^2 \frac{\partial^2 f}{\partial x^2} - 2y^2 \frac{\partial^2 f}{\partial y^2} - 18 f(x, y) + 36xy = 0,$$
Exercise 6.2
For some numbers , and , a function, f , takes the form
x2 + y
f (x, y) = 2
.
x + y
If f is homogeneous of degree four, find the values of , and . Having found these
values, verify that the function satisfies Eulers theorem.
Exercise 6.3
Suppose that $R(p, q) = e^{q+p}$ and that p is a positive function of q defined implicitly by
the equation
$$q^2 p + p^2 q + qp = 3.$$
Given that r(q) = R(q, p(q)), use the chain rule to find its derivative, $r'(q)$, when q = 1.
Exercise 6.4
A function $f : \mathbb{R}^2 \to \mathbb{R}$ is defined by
$$f(x, y) = x^2 - 2y^2,$$
and the point P has coordinates (1, −1).
(a) Find the direction and rate at which f increases most rapidly at P.
(b) Find the rate of change of f at P in the direction $(1, 1)^T$.
(c) Verify that the point P is on the curve
$$x^2 - 2y^2 = -1,$$
and find the Cartesian equation of the tangent line to this curve at this point.
Exercise 6.5
A function $f : \mathbb{R}^3 \to \mathbb{R}$ is defined by
$$f(x, y, z) = \ln(xy + z).$$
(a) Find the gradient of f at the point (a, b, c).
(b) Verify that the point (1, 1, 0) is on the surface
$$\ln(xy + z) = 0,$$
and find the normal vector and the tangent plane to the surface at this point.
(c) Consider the points, (x, y, z), at which the rate of increase of f in the direction
$(x/2, y/2, z)^T$ is equal to two. Show that all of these points lie on the surface with
equation
$$x^2 + y^2 + 4z^2 = 1.$$
Solutions to exercises
Solution to exercise 6.1
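Given that $f(x, y) = 2xy + x^{2a} y^{a}$ where a is a constant, its first and second-order partial
derivatives are given by
$$\frac{\partial f}{\partial x} = 2y + 2a x^{2a-1} y^{a}, \qquad \frac{\partial^2 f}{\partial x^2} = 2a(2a - 1) x^{2a-2} y^{a} \qquad\text{and}\qquad \frac{\partial^2 f}{\partial y \partial x} = 2 + 2a^2 x^{2a-1} y^{a-1},$$
and
$$\frac{\partial f}{\partial y} = 2x + a x^{2a} y^{a-1}, \qquad \frac{\partial^2 f}{\partial y^2} = a(a - 1) x^{2a} y^{a-2} \qquad\text{and}\qquad \frac{\partial^2 f}{\partial x \partial y} = 2 + 2a^2 x^{2a-1} y^{a-1}.$$
Observe, in particular, that $f_{xy}(x, y) = f_{yx}(x, y)$ as we should expect in this course.
If this function satisfies the equation
$$x^2 \frac{\partial^2 f}{\partial x^2} - 2y^2 \frac{\partial^2 f}{\partial y^2} - 18 f(x, y) + 36xy = 0,$$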
x2 + y
,
x2 + y
to be homogeneous of degree four for some numbers , and , we require that
f (x, y) =
f (x, y) =
(x)2 + (y)
,
(x)2 + (y)
is equal to 4 f (x, y). But, in order for this to happen, we must find that the
numerator is homogeneous, i.e. we have 2 = so that
(x)2 + (y) = (x) + (y) = (x + y ),
giving us a numerator whose degree of homogeneity is = 2.
255
(x)2 + (y)
(x + y )
2 x + y
=
= 2 2
= 2 f (x, y),
2
2
2
2
(x) + (y)
(x + y )
x +y
f
f
+y
= 4f (x, y).
x
y
and
f
(6y 5 )(x2 + y 2 ) (x6 + y 6 )(2y)
=
,
y
(x2 + y 2 )2
f
f
+y
=x
x
y
+y
4(x6 + y 6 )
x2 + y 2
= 4f (x, y),
=
as required.
Solution to exercise 6.3
Given that r(q) = R(q, p(q)), the chain rule tells us that
$$\frac{dr}{dq} = \frac{\partial R}{\partial q}\frac{dq}{dq} + \frac{\partial R}{\partial p}\frac{dp}{dq} = \frac{\partial R}{\partial q} + \frac{\partial R}{\partial p}\frac{dp}{dq},$$
and so, as $R(q, p) = e^{q+p}$, we have
$$\frac{dr}{dq} = e^{q+p} + e^{q+p}\frac{dp}{dq} = e^{q+p}\left( 1 + \frac{dp}{dq} \right).$$
Now we need to calculate $p'(q)$ given that p = p(q) is defined through the equation
$$q^2 p + p^2 q + qp = 3.$$
To do this, we let G(q, p) be the function defined by
$$G(q, p) = q^2 p + p^2 q + qp,$$
so that the given equation is now G(q, p) = 3. With this, we then have
$$\frac{dp}{dq} = -\frac{\partial G / \partial q}{\partial G / \partial p},$$
where
$$\frac{\partial G}{\partial q} = 2qp + p^2 + p \qquad\text{and}\qquad \frac{\partial G}{\partial p} = q^2 + 2pq + q,$$
which gives us
$$\frac{dp}{dq} = -\frac{2qp + p^2 + p}{q^2 + 2pq + q},$$
and we need to evaluate this at the point where q = 1. In particular, we now need to
find the value of p that corresponds to q = 1 if p = p(q) is the positive function of q
defined implicitly by the equation
$$q^2 p + p^2 q + qp = 3.$$
That is, if we set q = 1 in this equation we get
$$p + p^2 + p = 3 \implies p^2 + 2p - 3 = 0 \implies (p + 3)(p - 1) = 0,$$
i.e. the possible values of p are −3 and 1. But, we are told that p is a positive function
of q and so we reject p = −3 and take the point where q = 1 and p = 1 to be the one we
are interested in. Then, at this point, we find that
$$\frac{dp}{dq} = -\frac{2 + 1 + 1}{1 + 2 + 1} = -1,$$
and so
$$\frac{dr}{dq} = e^{1+1}\big( 1 + [-1] \big) = 0.$$
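$$\nabla f(x, y) = \begin{pmatrix} 2x \\ -4y \end{pmatrix},$$
so that
$$\nabla f(1, -1) = \begin{pmatrix} 2 \\ 4 \end{pmatrix},$$
and this is the direction in which f is increasing most rapidly at P. We then find that the rate at which f is increasing in this direction is $\|\nabla f(1, -1)\| = \sqrt{2^2 + 4^2} = 2\sqrt{5}$.
For (b), the rate of change of f at P in the direction $(1, 1)^T$ is
$$\frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 4 \end{pmatrix} = \frac{1}{\sqrt{2}}(2 + 4) = \frac{6}{\sqrt{2}} = 3\sqrt{2}.$$
For (c), the point P is on the curve as $1^2 - 2(-1)^2 = 1 - 2 = -1$. To find the equation
of the tangent line to the curve at this point, we use (6.6), to see that
$$\nabla f(1, -1) \cdot \begin{pmatrix} x - 1 \\ y + 1 \end{pmatrix} = 0 \implies \begin{pmatrix} 2 \\ 4 \end{pmatrix} \cdot \begin{pmatrix} x - 1 \\ y + 1 \end{pmatrix} = 0 \implies 2(x - 1) + 4(y + 1) = 0,$$
i.e. $x + 2y = -1$.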
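$$\nabla f(x, y, z) = \begin{pmatrix} f_x \\ f_y \\ f_z \end{pmatrix} = \begin{pmatrix} y/(xy + z) \\ x/(xy + z) \\ 1/(xy + z) \end{pmatrix} = \frac{1}{xy + z} \begin{pmatrix} y \\ x \\ 1 \end{pmatrix},$$
and so the gradient vector is
$$\nabla f(a, b, c) = \frac{1}{ab + c} \begin{pmatrix} b \\ a \\ 1 \end{pmatrix},$$
at the point (a, b, c).
For (b), we see that the point (1, 1, 0) is on the surface as $\ln([1][1] + 0) = \ln 1 = 0$ and
the normal vector to the surface at this point is
$$\nabla f(1, 1, 0) = \frac{1}{(1)(1) + 0} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.$$
The tangent plane is then given by
$$\nabla f(1, 1, 0) \cdot \begin{pmatrix} x - 1 \\ y - 1 \\ z - 0 \end{pmatrix} = 0 \implies \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \cdot \begin{pmatrix} x - 1 \\ y - 1 \\ z - 0 \end{pmatrix} = 0 \implies 1(x - 1) + 1(y - 1) + 1(z - 0) = 0,$$
i.e. x + y + z = 2 is the Cartesian equation of the tangent plane to the surface at the point
(1, 1, 0).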
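For (c), we note that at all points, (x, y, z), we have
$$\nabla f(x, y, z) = \frac{1}{xy + z} \begin{pmatrix} y \\ x \\ 1 \end{pmatrix},$$
and the unit vector in the direction $(x/2, y/2, z)^T$ is
$$\hat{v} = \frac{1}{\sqrt{x^2 + y^2 + 4z^2}} \begin{pmatrix} x \\ y \\ 2z \end{pmatrix}.$$
The rate of increase of f in the direction of the unit vector $\hat{v}$ at a point (x, y, z) is then
given by $f_{\hat{v}}(x, y, z)$, i.e. we have
$$\hat{v} \cdot \nabla f(x, y, z) = \frac{xy + xy + 2z}{(xy + z)\sqrt{x^2 + y^2 + 4z^2}} = \frac{2(xy + z)}{(xy + z)\sqrt{x^2 + y^2 + 4z^2}} = \frac{2}{\sqrt{x^2 + y^2 + 4z^2}},$$
where we have just found the dot product of the two vectors $\hat{v}$ and $\nabla f(x, y, z)$.
Consequently, when $f_{\hat{v}}(x, y, z) = 2$, we have points (x, y, z) that satisfy the equation,
$$2 = \frac{2}{\sqrt{x^2 + y^2 + 4z^2}} \implies x^2 + y^2 + 4z^2 = 1,$$
as required.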
Chapter 7
Two-variable optimisation
Essential reading
(For full publication details, see Chapter 1.)
Binmore and Davies (2002) Sections 4.6, 4.7, 6.3–6.8.
Anthony and Biggs (1996) Chapter 13, parts of Chapters 14 and 21.
Further reading
Simon and Blume (1994) parts of Chapter 17, 18 and 19.
Adams and Essex (2010) parts of Sections 13.1–13.3.
7.1
Introduction
Having seen how to find partial derivatives and gained some insight into what they tell
us about a function of two variables in the last chapter, we now see how they can be
used to optimise such a function. In particular, we will see how the first-order partial
derivatives allow us to find the stationary points of a function and its second-order
partial derivatives allow us to see whether such a point is a maximum or a minimum. We
will also see how to optimise a function of two variables in cases where the variables are
constrained, i.e. they are required to satisfy some extra condition known as a constraint.
7.2
Unconstrained optimisation
We start by considering unconstrained optimisation, i.e. we are looking for the places
where a function of two variables, f (x, y), attains its maximum or minimum values
when x and y are independent and free to take any values in R2 .
7.2.1
Stationary points
Suppose we have a surface z = f(x, y) whose tangent plane at the point (a, b, c) where
c = f(a, b) is given by (6.4), i.e.
$$z - c = f_x(a, b)(x - a) + f_y(a, b)(y - b).$$
We define a stationary point of this function to be any point where the tangent plane to
the function is horizontal and so, in this case, the tangent plane would have to be z = c.
But, if this is the case, it means that we must have
$$f_x(a, b)(x - a) + f_y(a, b)(y - b) = 0,$$
for all $x, y \in \mathbb{R}$ which, in turn, means that we must have
$$f_x(a, b) = 0 \qquad\text{and}\qquad f_y(a, b) = 0.$$
Thus, we find that the point (x, y) = (a, b) is a stationary point of the function f(x, y) if
both first-order partial derivatives of the function are zero at that point. Consequently,
in order to find the stationary points of a function, f(x, y), we must find all points (x, y)
that satisfy the equations
$$f_x(x, y) = 0 \qquad\text{and}\qquad f_y(x, y) = 0,$$
simultaneously.
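Example 7.1 Find the stationary points of the function
$$f(x, y) = x^4 + 2x^2 y + 2y^2 + y.$$
The first-order partial derivatives of this function are
$$f_x(x, y) = 4x^3 + 4xy \qquad\text{and}\qquad f_y(x, y) = 2x^2 + 4y + 1.$$
At a stationary point, both of the first-order partial derivatives are zero, i.e. we must
have $f_x(x, y) = 0$ and $f_y(x, y) = 0$. Thus, to find the stationary points we have to
solve the simultaneous equations
$$4x^3 + 4xy = 0 \qquad\text{and}\qquad 2x^2 + 4y + 1 = 0.$$
The first equation gives us
$$4x(x^2 + y) = 0 \implies x = 0 \text{ or } y = -x^2.$$
Using the second equation, we then see that if x = 0 we must have
$$4y + 1 = 0 \implies y = -\frac{1}{4},$$
whereas if $y = -x^2$ we must have
$$2x^2 + 4(-x^2) + 1 = 0 \implies 2x^2 = 1 \implies x^2 = \frac{1}{2} \implies x = \pm\frac{1}{\sqrt{2}},$$
which gives us
$$y = -\left( \pm\frac{1}{\sqrt{2}} \right)^2 = -\frac{1}{2}.$$
Thus, this function has three stationary points, namely the points
$$\left( 0, -\frac{1}{4} \right), \qquad \left( \frac{1}{\sqrt{2}}, -\frac{1}{2} \right) \qquad\text{and}\qquad \left( -\frac{1}{\sqrt{2}}, -\frac{1}{2} \right).$$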
At a stationary point, both of the first-order partial derivatives are zero, i.e. we must
have $f_x(x, y) = 0$ and $f_y(x, y) = 0$. Thus, to find the stationary points we have to
solve the simultaneous equations
$$12x^2 - 60y = 0 \qquad\text{and}\qquad -6x + y + 40 = 0,$$
and then notice that the first equation gives us $y = x^2/5$. Substituting this into the
second equation then allows us to see that
$$-6x + \frac{x^2}{5} + 40 = 0 \implies x^2 - 30x + 200 = 0 \implies (x - 20)(x - 10) = 0 \implies x = 10 \text{ or } x = 20,$$
which gives us
$$y = \frac{10^2}{5} = 20 \qquad\text{or}\qquad y = \frac{20^2}{5} = 80,$$
respectively. Thus, this function has two stationary points, namely the points
(10, 20) and (20, 80).
Activity 7.1 Find the stationary points of the function
$$f(x, y) = x^2 - 4x + y^2 + 4y + 8.$$
Activity 7.2
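In particular, notice that at a stationary point, i.e. at a point, (a, b), where
$$f_x(a, b) = 0 \qquad\text{and}\qquad f_y(a, b) = 0,$$
the gradient vector is
$$\nabla f(a, b) = \begin{pmatrix} f_x(a, b) \\ f_y(a, b) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} = \mathbf{0}.$$
That is, if we are at a stationary point, we can see that the rate of change of f in any
direction given by the unit vector $\hat{u}$ is zero as
$$f_{\hat{u}}(a, b) = \hat{u} \cdot \nabla f(a, b) = \hat{u} \cdot \mathbf{0} = 0,$$
which means that at a stationary point, the rate of change of f is zero in all directions.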
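As a small sanity check of this observation, the sketch below evaluates the first-order partial derivatives of the function from Example 7.1, $f(x, y) = x^4 + 2x^2 y + 2y^2 + y$, at the three points where they both vanish, and then samples the rate of change in a few directions. The names used here are hypothetical and the tolerance reflects floating-point rounding only.

```python
import math

def grad_f(x, y):
    """First-order partial derivatives of f(x, y) = x^4 + 2x^2 y + 2y^2 + y."""
    return (4 * x**3 + 4 * x * y, 2 * x**2 + 4 * y + 1)

s = 1 / math.sqrt(2)
stationary_points = [(0.0, -0.25), (s, -0.5), (-s, -0.5)]

def directional_derivative(point, angle):
    """Rate of change of f at `point` along the unit vector (cos angle, sin angle)."""
    ux, uy = math.cos(angle), math.sin(angle)
    fx, fy = grad_f(*point)
    return ux * fx + uy * fy

# Largest rate of change (in absolute value) over eight sampled directions per point.
worst = max(
    abs(directional_derivative(p, k * math.pi / 4))
    for p in stationary_points
    for k in range(8)
)
print(worst)
```

The printed value is zero up to rounding error, so the rate of change vanishes in every sampled direction at each of these points.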
We have now seen how to find the stationary points of a function, f (x, y), but what do
they look like? Generally speaking, we will find that there are three kinds of stationary
point namely local minima, saddle points and local maxima and these are
illustrated in Figure 7.1(a), (b) and (c) respectively. We now consider what criteria we
can use to determine exactly what kind of stationary point we have found.
Figure 7.1: Each of these surfaces has the indicated kind of stationary point at (0, 0, 0).
7.2.2
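Let's say that we have found that (a, b) is a stationary point of the function, f(x, y).
This means that
$$f_x(a, b) = 0 \qquad\text{and}\qquad f_y(a, b) = 0,$$
and so, in particular, the derivative of f at this point is given by
$$\left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} = \big( f_x(a, b), \; f_y(a, b) \big) = \big( 0, \; 0 \big).$$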
However, we saw in Section 6.4.5, that the second-order Taylor series of the function
f(x, y) around the point (a, b) is given by
$$f(x, y) = f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} + \frac{1}{2!} \big( x - a, \; y - b \big) \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} + \cdots,$$
and so, at a stationary point where this derivative is zero, we have
$$f(x, y) - f(a, b) = \frac{1}{2!} \big( x - a, \; y - b \big) \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} + \cdots,$$
provided that the point (x, y) is sufficiently close to the point (a, b). Consequently, if we
let K(x, y) be the quantity
$$\big( x - a, \; y - b \big) \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix},$$
we can see that: If, for all (x, y) close to [but not equal to] (a, b), we have:
K(x, y) > 0, then f(x, y) > f(a, b) for such points and so the function always lies
above the horizontal tangent plane at (a, b). This means that the stationary point
is a local minimum as in Figure 7.1(a).
K(x, y) < 0, then f(x, y) < f(a, b) for such points and so the function always lies
below the horizontal tangent plane at (a, b). This means that the stationary point
is a local maximum as in Figure 7.1(c).
However, if we find that there are some points (x, y) close to [but not equal to] (a, b)
that make K(x, y) > 0 and others that make K(x, y) < 0, we see that at some points we
have f(x, y) > f(a, b) and so the function lies above the horizontal tangent plane and at
other points we have f(x, y) < f(a, b) and so the function lies below the horizontal
tangent plane. Indeed, as we saw in Figure 7.1(b), this is exactly what happens when
we have a saddle point.
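Now, it turns out that,$^1$ if we use the definition of the second derivative matrix, we have
$$K(x, y) = \big( x - a, \; y - b \big) \begin{pmatrix} f_{xx}(a, b) & f_{xy}(a, b) \\ f_{yx}(a, b) & f_{yy}(a, b) \end{pmatrix} \begin{pmatrix} x - a \\ y - b \end{pmatrix} = f_{xx}(a, b)(x - a)^2 + 2 f_{xy}(a, b)(x - a)(y - b) + f_{yy}(a, b)(y - b)^2,$$
if we assume, as usual, that $f_{xy}(a, b) = f_{yx}(a, b)$. Then, taking out a factor of $f_{xx}(a, b)$
and completing the square, we get$^2$
$$K(x, y) = f_{xx}(a, b) \left\{ \left[ (x - a) + \frac{f_{xy}(a, b)}{f_{xx}(a, b)}(y - b) \right]^2 + \frac{H(a, b)}{[f_{xx}(a, b)]^2}(y - b)^2 \right\},$$
where $H(a, b) = f_{xx}(a, b) f_{yy}(a, b) - [f_{xy}(a, b)]^2$. In particular, if H(a, b) > 0, the quantity in the braces is positive for all (x, y) not equal to (a, b) and so K(x, y) has the same sign as $f_{xx}(a, b)$.

1 This is most easily done if we show that the second derivative of f(x, y) at the point (a, b), i.e. the
matrix
$$\left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} = \begin{pmatrix} f_{xx}(a, b) & f_{xy}(a, b) \\ f_{yx}(a, b) & f_{yy}(a, b) \end{pmatrix},$$
is positive definite or negative definite as in Binmore and Davies (2002) Section 6.3. But you won't
encounter these concepts until you study 175 Further Linear Algebra and so we merely motivate the
result that follows here.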
Indeed, if we find that H(a, b) < 0, we can see that there will be some points (x, y) close
to [but not equal to] (a, b) that make K(x, y) > 0 and others that make K(x, y) < 0. In
this case, as we saw above, this means that the stationary point (a, b) is a saddle point.
In summary, we have now motivated the following method for classifying our stationary
points:
If (a, b) is a stationary point of the function, f(x, y), and the Hessian is defined
to be the function
$$H(x, y) = f_{xx}(x, y) f_{yy}(x, y) - [f_{xy}(x, y)]^2,$$
then
If H(a, b) > 0 and $f_{xx}(a, b) > 0$, then this stationary point is a local
minimum.
If H(a, b) > 0 and $f_{xx}(a, b) < 0$, then this stationary point is a local
maximum.
If H(a, b) < 0, then this stationary point is a saddle point.
In particular, if H(a, b) = 0, we can draw no conclusions about the nature of
the stationary point by using this method.
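The method in the box translates directly into a short routine. The following is a minimal sketch in which the function names are hypothetical; it is applied to the function $f(x, y) = x^4 + 2x^2 y + 2y^2 + y$ from Example 7.1, whose second-order partial derivatives $f_{xx} = 12x^2 + 4y$, $f_{yy} = 4$ and $f_{xy} = 4x$ have been worked out by hand.

```python
import math

def classify(fxx, fyy, fxy):
    """Classify a stationary point from the second-order partial derivatives there."""
    H = fxx * fyy - fxy**2          # the Hessian evaluated at the stationary point
    if H > 0 and fxx > 0:
        return "local minimum"
    if H > 0 and fxx < 0:
        return "local maximum"
    if H < 0:
        return "saddle point"
    return "inconclusive"           # H = 0: this method tells us nothing

def second_partials(x, y):
    """f_xx, f_yy and f_xy for f(x, y) = x^4 + 2x^2 y + 2y^2 + y."""
    return 12 * x**2 + 4 * y, 4.0, 4 * x

s = 1 / math.sqrt(2)
for point in [(0.0, -0.25), (s, -0.5), (-s, -0.5)]:
    print(point, classify(*second_partials(*point)))
```

For this function the sketch reports a saddle point at (0, −1/4) and local minima at the other two stationary points.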
Let's look at some examples of how this works in practice.

2 Technically, we have assumed that $f_{xx}(a, b) \neq 0$ here, but if this was not the case we could present
a slightly different argument to deal with this problem. However, as we are just trying to motivate what
follows instead of providing a rigorous argument for it, we will skip these technicalities here.
Example 7.3
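Using the first-order partial derivatives we found in Example 7.1, we find that the
second-order partial derivatives are
$$f_{xx}(x, y) = 12x^2 + 4y, \qquad f_{xy}(x, y) = 4x \qquad\text{and}\qquad f_{yy}(x, y) = 4,$$
and we use these to classify the stationary points
$$\left( 0, -\frac{1}{4} \right), \qquad \left( \frac{1}{\sqrt{2}}, -\frac{1}{2} \right) \qquad\text{and}\qquad \left( -\frac{1}{\sqrt{2}}, -\frac{1}{2} \right).$$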
Using the first-order partial derivatives we found in Example 7.2, we find that the
second-order partial derivatives are
fxx (x, y) = 24x,
and
and
Activity 7.4
Lastly, we have remarked above that in cases where the Hessian is zero at a stationary
point, the method that we have used so far fails. Indeed, in such cases, the stationary
point could be a local minimum, a local maximum or a saddle point and, to determine
which, we would have to think more carefully about what is happening. Let's consider
an example of a function where this kind of problem occurs.
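To see that all three outcomes really can occur when the Hessian is zero, the following sketch looks at three functions that each have a stationary point at (0, 0) with H(0, 0) = 0. The choice of functions, the hand-computed Hessians and the sampling on a small circle are assumptions made for this illustration, and the sampling is only a heuristic, not a proof.

```python
import math

# name: (function, its Hessian H(x, y) = f_xx * f_yy - f_xy**2 worked out by hand)
examples = {
    "x^4 + y^4": (lambda x, y: x**4 + y**4, lambda x, y: 144 * x**2 * y**2),
    "-x^4 - y^4": (lambda x, y: -x**4 - y**4, lambda x, y: 144 * x**2 * y**2),
    "x^3 - y^3": (lambda x, y: x**3 - y**3, lambda x, y: -36 * x * y),
}

def behaviour_near_origin(f, radius=0.1, samples=360):
    """Compare f on a small circle around (0, 0) with its value at (0, 0)."""
    values = [
        f(radius * math.cos(2 * math.pi * i / samples),
          radius * math.sin(2 * math.pi * i / samples)) - f(0, 0)
        for i in range(samples)
    ]
    if all(v > 0 for v in values):
        return "looks like a local minimum"
    if all(v < 0 for v in values):
        return "looks like a local maximum"
    return "takes both signs"

results = {name: (H(0, 0), behaviour_near_origin(f)) for name, (f, H) in examples.items()}
for name, outcome in results.items():
    print(name, outcome)
```

In each case the Hessian at the origin is zero, yet the three functions behave quite differently nearby, which is why the method gives no conclusion here.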
Example 7.5 Find the stationary point of the function $f(x, y) = x^3 - y^3$ and show
that we can't determine its nature using the method above. What kind of stationary
point do we have here?
The first-order partial derivatives of this function are
$$f_x(x, y) = 3x^2 \qquad\text{and}\qquad f_y(x, y) = -3y^2.$$
So, clearly, the only stationary point is at (0, 0). The second-order partial derivatives
of this function are given by
$$f_{xx}(x, y) = 6x,$$
and
Figure 7.2: Some useful pictures for Example 7.5. (a) The y = 0 section, $z = f(x, 0) = x^3$.
7.2.3
(a) where we have a local minimum, the function is convex because it lies above all
of its tangent planes
(b) where we have a saddle point, the function is neither convex nor concave as,
considering the horizontal tangent plane at (0, 0, 0), some of the function lies
above this tangent plane and the rest of it lies below this tangent plane.
(c) where we have a local maximum, the function is concave because it lies below all
of its tangent planes.
We now want to develop a way of determining whether a function is convex or concave
on R2 .
Suppose that we have a function f (x, y) that is convex. As we saw in Section 6.4.1, at
any point (a, b), the tangent plane to this function has a Cartesian equation given by
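$$z = f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix},$$
and, as this function is convex, it must be the case that for all $(x, y) \in \mathbb{R}^2$, the function
lies above this tangent plane, i.e. we must have
$$f(x, y) \geq f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix}.$$
However, using the second-order Taylor series for $f(x, y)$ around the point $(a, b)$, this
means that we have
$$f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} + \frac{1}{2!} \begin{pmatrix} x - a & y - b \end{pmatrix} \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} \geq f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix},$$
i.e. cancelling the terms that are common to both sides, we have
$$\begin{pmatrix} x - a & y - b \end{pmatrix} \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} \geq 0,$$
and this just asserts that $K(x, y) \geq 0$ using our notation from Section 7.2.2. However,
using what we saw before, this means that we require
$$H(x, y) \geq 0 \quad\text{and}\quad f_{xx}(x, y) \geq 0,$$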
Again, we have glossed over any complications in our derivation that would occur if fxx (x, y) = 0
for some point, (x, y).
Note, in particular, that when testing for convexity or concavity, we can have
$H(x, y) = 0$ even though we must have $H(x, y) \neq 0$ when we are classifying stationary
points using the method of the previous section. But, it should be clear that if a
function, f (x, y), has a stationary point and it is
convex, then that stationary point is a global minimum.
concave, then that stationary point is a global maximum.
That is, we now have a way of determining whether a local minimum (or a local
maximum) is a global minimum (or a global maximum).
Example 7.6 Show that the function f (x, y) = x2 + y 2 has a global minimum at
the point (0, 0, 0).
The first-order partial derivatives of this function are
fx (x, y) = 2x
and
fy (x, y) = 2y.
At a stationary point, we must have fx (x, y) = 0 and fy (x, y) = 0, i.e. we must have
x = 0 and y = 0. Indeed, as z = f (0, 0) = 0, this means that we have a stationary
point at (0, 0, 0).
The second-order partial derivatives of this function are
fxx (x, y) = 2,
and
fyy (x, y) = 2,
Example 7.7 Determine the regions in the (x, y)-plane where the function,
$f(x, y) = x^2 - y^3$ is convex, concave or neither.
The first-order partial derivatives of this function are
$f_x(x, y) = 2x$
and
$f_y(x, y) = -3y^2,$
and
Figure 7.3: The surface $z = f(x, y)$ where $f(x, y) = x^2 - y^3$ from Example 7.7. Observe
that this function is convex when $y \leq 0$ but that it is neither convex nor concave when
$y > 0$.
Let's now look at some applications of this material.
7.2.4
Applications
Optimisation problems are very common in economics and we now introduce two ways
in which they can arise in that subject. The first is their use in cost minimisation and
the second will be another instance of profit maximisation.
Cost minimisation
Suppose a firm is using quantities x and y of two commodities and this incurs a cost
given by the cost function, C(x, y). One might reasonably ask: What quantities should
they be using if they want to minimise their costs?
Example 7.8 A data processing company employs both senior and junior
programmers. A particularly large project will cost
$$C(x, y) = 2000 + 2x^3 - 12xy + y^2,$$
pounds, where x and y represent the number of junior and senior programmers used
respectively. How many employees of each kind should be assigned to the project in
order to minimise its cost? What is this minimum cost?
To minimise the cost, we need to find the stationary points of C(x, y) and determine
which of them gives us a minimum. So, as before, we start by finding the first-order
partial derivatives of C(x, y), i.e.
$$C_x(x, y) = 6x^2 - 12y \quad\text{and}\quad C_y(x, y) = -12x + 2y.$$
At a stationary point, both of these first-order partial derivatives are zero, i.e. we
must have $C_x(x, y) = 0$ and $C_y(x, y) = 0$. Thus, to find the stationary points, we
have to solve the simultaneous equations
$$6x^2 - 12y = 0 \quad\text{and}\quad -12x + 2y = 0.$$
Dividing the first equation by 6 and the second by 2, we simplify these to
$$x^2 - 2y = 0 \quad\text{and}\quad -6x + y = 0,$$
and then notice that the second equation gives us $y = 6x$. Substituting this into the
first equation then allows us to see that
$$x^2 - 2(6x) = 0 \implies x^2 - 12x = 0 \implies x(x - 12) = 0 \implies x = 0 \text{ or } x = 12,$$
and, using $y = 6x$, these give us
$$y = 6(0) = 0 \quad\text{or}\quad y = 6(12) = 72,$$
respectively. Thus, the cost function, C(x, y), has two stationary points, namely the
points (0, 0) and (12, 72).
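As a quick numerical check of the working above, the following Python sketch evaluates the first-order partial derivatives, the Hessian and the cost at the two stationary points. The function names are illustrative only, and the mixed derivative $C_{xy} = -12$ used in the Hessian is worked out here from the cost function rather than quoted from the text.

```python
# Cost function from Example 7.8.
def cost(x, y):
    return 2000 + 2 * x**3 - 12 * x * y + y**2

# First-order partial derivatives (C_x, C_y).
def gradient(x, y):
    return (6 * x**2 - 12 * y, -12 * x + 2 * y)

# Hessian H = C_xx * C_yy - C_xy**2, with C_xx = 12x, C_yy = 2 and
# C_xy = -12 (the last of these is derived here, as an assumption check).
def hessian(x, y):
    c_xx, c_yy, c_xy = 12 * x, 2, -12
    return c_xx * c_yy - c_xy**2

for point in [(0, 0), (12, 72)]:
    print(point, gradient(*point), hessian(*point), cost(*point))
```

A negative Hessian at (0, 0) indicates a saddle point, whereas a positive Hessian together with $C_{xx} > 0$ at (12, 72) indicates a local minimum.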
To classify these stationary points, we look at the second-order partial derivatives of
C(x, y), which are
Cxx (x, y) = 12x,
and
Cyy (x, y) = 2,
and
We now describe the problem of maximising the profit of a firm which makes two
products, X and Y. Generally, if pX and pY are the selling prices of one unit of X and
one unit of Y respectively, then the total revenue, TR(x, y), obtained from producing
amounts x of product X and y of product Y is
TR(x, y) = xpX + ypY .
Of course, there are a number of ways in which the prices pX and pY may be related to
the quantities x and y. For instance:
If the goods were related, pX and pY could both depend on x and y (e.g. if we were
considering a music company producing an album on both CD and cassette).
If the goods were unrelated, pX and pY could depend only on x and y respectively
(e.g. a pharmaceuticals company producing paracetamol and insulin).
The firm will also have a joint total cost function, TC(x, y), which tells us how much it
costs to produce x units of X and y units of Y. Clearly, given TR(x, y) and TC(x, y), we
can consider the profit function of the firm, $\pi(x, y)$, which is given by
$$\pi(x, y) = TR(x, y) - TC(x, y) = x p_X + y p_Y - TC(x, y),$$
and we can maximise this function of x and y using the techniques described above.
Let's look at an example.
Example 7.9 Suppose that a firm is the sole supplier of X and Y (in other words,
it has a monopoly on these goods) and that the demands for X and Y, in tonnes, are
given by
$x = 2 - 2p_X + p_Y$
and
$y = 13 + p_X - 2p_Y,$
4
Which, thinking about it, is far less than the value of C(x, y) at the other stationary point since
C(0, 0) = 2000.
$$p_X = \frac{17 - 2x - y}{3},$$
and so substituting this into $p_Y = x - 2 + 2p_X$, we find that
$$p_Y = x - 2 + 2\left(\frac{17 - 2x - y}{3}\right) \implies p_Y = \frac{3x - 6 + 34 - 4x - 2y}{3} \implies p_Y = \frac{28 - x - 2y}{3}.$$
and we can now maximise this profit function using the method above.
Activity 7.7 Finish the problem started in Example 7.9. That is, find the values of
x and y that maximise the profit function $\pi(x, y)$ found in the example, the
corresponding prices $p_X$ and $p_Y$, and the maximum profit.
7.3
Constrained optimisation
We now turn our attention to the problem of constrained optimisation, i.e. the problem
of optimising a function, f (x, y), in the case where the values of x and y we are
considering are constrained by the requirement that they must lie in some region, R, of
R2 . In particular, we will see that the optimal point we seek will
Note that if the price of X was fixed and the price of Y was increased, then the demand for X would
rise and the demand for Y would fall. This is the behaviour one might expect if X and Y were two related
commodities, e.g. if they were two different types of chocolate bar.
either be a point inside the region, in which case it will be a stationary point of
f (x, y) that happens to be in the region,
or it will be a point on the boundary of the region, in which case it need not be a
stationary point of f (x, y) even though it optimises this function over points in the
region.
Of course, in the former case, we can find and classify the stationary point in the region
using the method in the previous section and then, checking that this point is more
optimal than any point on the boundary of the region, we will have our answer. Let's
look at a quick example.
Example 7.10 Minimise the function $f(x, y) = (x - 1)^2 + (y - 1)^2$ given that (x, y)
must lie in the region defined by the inequalities $x \geq 0$, $y \geq 0$ and $x + y \leq 3$.
The first-order partial derivatives of this function are
$$f_x(x, y) = 2(x - 1) \quad\text{and}\quad f_y(x, y) = 2(y - 1),$$
and so, setting these equal to zero, we see that (1, 1) is the only stationary point of
this function. The second-order partial derivatives of this function are
fxx (x, y) = 2,
and
fyy (x, y) = 2,
1
2
3
2
1
+ ,
2
> 0.
Thus, we can't find values of f (x, y) as small as f (1, 1) = 0 on any of the boundaries
of the region and so the minimum value of f (x, y) for points in this region is zero
and this occurs at the point (1, 1).
Activity 7.8
7.3.1
Generally speaking, when the optimal point occurs on the boundary of a region, we will
be able to find it by considering the contours of the function we are optimising in
relation to the region we are optimising the function over. Indeed, when doing this, we
will find that we are in one of the two cases below.
The optimal point is at a corner of the boundary
The following example should clarify what we should do in this case.
Example 7.11 Maximise the function $f(x, y) = x^2 + y^2$ given that (x, y) must lie
in the region defined by the inequalities $x \geq 0$, $y \geq 0$ and $x + 2y \leq 4$.
We start by sketching the region which is the shaded triangle in Figure 7.4(a) and
some typical contours of the surface z = f (x, y). Indeed, notice that here, the
contour z = c has equation
x2 + y 2 = c,
and so it will be a circle of radius $\sqrt{c}$ centred on the origin. In the figure, we have
sketched the z = 4 and z = 16 contours and, in particular, we notice that as the
contours move away from the origin, the value of z increases as indicated by the
arrow.
Now, to find the maximum value of f (x, y) in this region we need a point which both
lies in the region, and
gives us the largest value of z.
That is, in this case, we want the point (4, 0) which is a corner of the boundary. In
particular, notice that with this point on the z = 16 contour:
we get a higher value of z than we do from any point on a contour with z < 16
(like, say, the z = 2 contour), and
we can't have any point on a contour with z > 16 as none of these contours will
give us a point in the region.
That is, the point (4, 0) which gives us z = 16 must indeed maximise the function
f (x, y) given that (x, y) must lie in the specified region.
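The contour argument above can be sanity-checked numerically. The Python sketch below is an illustration only (the grid spacing of 0.05 and the variable names are arbitrary choices, not part of the guide): it compares the value of f at the three corners of the triangle and then confirms, over a grid of feasible points, that no other point does better than the corner (4, 0).

```python
def f(x, y):
    return x**2 + y**2

# Corners of the triangle x >= 0, y >= 0, x + 2y <= 4.
corners = [(0, 0), (4, 0), (0, 2)]
best_corner = max(corners, key=lambda p: f(*p))

# Crude check over a grid of feasible points with spacing 0.05.
# Feasibility is tested on the integer indices to avoid rounding issues.
grid = [(i / 20, j / 20)
        for i in range(81) for j in range(41)
        if i + 2 * j <= 80]
best_grid = max(grid, key=lambda p: f(*p))

print(best_corner, f(*best_corner))
print(best_grid, f(*best_grid))
```

A grid search like this is not a proof, but it agrees with what the contours tell us.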
6
That is, the point (1, 1) clearly satisfies the inequalities $x \geq 0$ and $y \geq 0$ as well as the inequality
$x + y \leq 3$ since $1 + 1 = 2 < 3$.
Figure 7.4: (a) The region for Example 7.11 is the shaded triangle and the z = 4 and
z = 16 contours are indicated. (b) The region for Example 7.12 is the same shaded triangle
and the z = 1 and z = 2 contours are indicated. Note, in both cases, the direction in
which z increases.
The optimal point is on the boundary but it isn't a corner
This is the case that is going to concern us the most and so, for the moment, we just
look at an example to see what is happening before we come to the recommended
method for solving such problems.
Example 7.12 Maximise the function $f(x, y) = xy$ given that (x, y) must lie in the
region defined by the inequalities $x \geq 0$, $y \geq 0$ and $x + 2y \leq 4$.
We start by sketching the region which is the shaded triangle in Figure 7.4(b) and
some typical contours of the surface z = f (x, y). Indeed, notice that here, the
contour z = c has equation
xy = c,
and so it will be a rectangular hyperbola with the x and y-axes as its asymptotes. In
the figure, we have sketched the z = 1 and z = 2 contours and, in particular, we
notice that as the contours move away from the origin, the value of z increases as
indicated by the arrow.
Now, to find the maximum value of f (x, y) in this region we need a point which both
lies in the region, and
gives us the largest value of z.
That is, in this case, we want the point $(x^*, y^*)$ which is not a corner of the
boundary. In particular, notice that with this point on the z = 2 contour:
we get a higher value of z than we do from any point on a contour with z < 2
(like, say, the z = 1 contour), and
we can't have any point on a contour with z > 2 as none of these contours will
give us a point in the region.
That is, the point $(x^*, y^*)$ which gives us z = 2 must indeed maximise the function
f (x, y) given that (x, y) must lie in the specified region. But, how do we find this
point?
One way to find this point is to see that it is a point where, for some constant c, we
have a contour f (x, y) = c which is both
tangential to the line x + 2y = 4, and
touching the line x + 2y = 4.
Indeed, as the gradient of $f(x, y) = c$ is given by
$$\frac{dy}{dx} = -\frac{\partial f/\partial x}{\partial f/\partial y} = -\frac{y}{x},$$
as we saw in Section 6.3.3 and the gradient of the line $x + 2y = 4$ is given by
$$y = 2 - \frac{x}{2} \implies \frac{dy}{dx} = -\frac{1}{2},$$
the first condition means that we must have a point which satisfies the equation
$$-\frac{y}{x} = -\frac{1}{2} \implies y = \frac{x}{2},$$
whereas the second condition means that we must have a point which satisfies the
equation $x + 2y = 4$. Solving these equations simultaneously, we find that this gives
us the point $(x^*, y^*) = (2, 1)$.7
Now, in such cases, we could always proceed in this way but, as we shall see in a
moment, there is a way of turning this idea into a much more general method. And, it is
this new method that we will generally use in such cases.
7.3.2
Suppose that we have been asked to optimise the function, f (x, y), given that (x, y)
must lie in some region and, by looking at the contours as above, we have determined
that the optimal point occurs on the boundary given by some equation g(x, y) = 0. In
particular, we are concerned with the case where the optimal point is not a corner of
the boundary, i.e. we want a point where, for some constant c, the contour f (x, y) = c is
both
tangential to the boundary given by g(x, y) = 0, and
touching the boundary given by g(x, y) = 0.
Now, for tangency, we require that the gradient of the contour $f(x, y) = c$, i.e.
$$\frac{dy}{dx} = -\frac{f_x(x, y)}{f_y(x, y)},$$
is equal to the gradient of the boundary given by $g(x, y) = 0$, i.e.
$$\frac{dy}{dx} = -\frac{g_x(x, y)}{g_y(x, y)},$$
7
And, at this point, z = f (2, 1) = 2 as expected from above. But, in general, we would not know the
optimal value of z = f (x, y) beforehand. We have just used it here to help illustrate what is going on.
where we have used what we saw in Section 6.3.3 twice. But, if these are equal, we have
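$$-\frac{f_x(x, y)}{f_y(x, y)} = -\frac{g_x(x, y)}{g_y(x, y)} \implies \frac{f_x(x, y)}{g_x(x, y)} = \frac{f_y(x, y)}{g_y(x, y)},$$
and so, writing $\lambda$ for the common value of these two ratios, i.e.
$$\lambda = \frac{f_x(x, y)}{g_x(x, y)} = \frac{f_y(x, y)}{g_y(x, y)},$$
we get the two equations
$$\frac{\partial}{\partial x}\Big[f(x, y) - \lambda g(x, y)\Big] = 0 \quad\text{and}\quad \frac{\partial}{\partial y}\Big[f(x, y) - \lambda g(x, y)\Big] = 0.$$
So, any point which satisfies these two equations is a point where the contour
$f(x, y) = c$ is tangential to the boundary $g(x, y) = 0$. We also note that the equation
$$\frac{\partial}{\partial \lambda}\Big[f(x, y) - \lambda g(x, y)\Big] = 0 \implies g(x, y) = 0,$$
and so, any point which satisfies this equation lies on the boundary. Consequently, we
define the Lagrangean to be the function
$$L(x, y, \lambda) = f(x, y) - \lambda g(x, y),$$
and we call $\lambda$ the Lagrange multiplier. In particular, the point we seek will be amongst
the stationary points of the Lagrangean since it must satisfy the equations
$$\frac{\partial L}{\partial x} = 0, \quad \frac{\partial L}{\partial y} = 0 \quad\text{and}\quad \frac{\partial L}{\partial \lambda} = 0,$$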
which we have derived above. In such cases, we call the function we are optimising,
f (x, y), the objective function and we call the equation of the boundary, which must be
written in the form g(x, y) = 0, the constraint. Let's see how we can use this method to
solve the constrained optimisation problem we saw in Example 7.12.
Example 7.13 Solve the constrained optimisation problem in Example 7.12 using
the method of Lagrange multipliers.
We have already seen that the optimal point we seek occurs when the function
$f(x, y) = xy$ is tangential to the boundary given by the line $x + 2y = 4$. Writing the
equation of the line in the form $g(x, y) = x + 2y - 4 = 0$ we see that the Lagrangean
is
$$L(x, y, \lambda) = xy - \lambda(x + 2y - 4),$$
where $\lambda$ is the Lagrange multiplier. We now find the stationary points of the
Lagrangean by finding its first-order partial derivatives, i.e.
$$L_x(x, y, \lambda) = y - \lambda, \quad L_y(x, y, \lambda) = x - 2\lambda \quad\text{and}\quad L_\lambda(x, y, \lambda) = -(x + 2y - 4),$$
and setting these equal to zero to yield the equations
$$y - \lambda = 0, \quad x - 2\lambda = 0 \quad\text{and}\quad x + 2y - 4 = 0.$$
Eliminating $\lambda$ from the first two equations, we get
$$\lambda = y = \frac{x}{2} \implies y = \frac{x}{2},$$
and this, as you should expect, is our tangency condition from Example 7.12. On the
other hand, the third equation is just
$$x + 2y = 4,$$
which, as you should expect, is our constraint. Solving these two equations
simultaneously, we then get the point (2, 1) as the only solution and so this must be
the optimal point we seek in agreement with what we found in Example 7.12.
Obviously, at this point, we find that $f(2, 1) = 2$ is the maximum value of f subject
to the constraint.
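The calculation in this example can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the stationarity conditions are solved by the same substitution used above (not by a general-purpose solver), exact fractions are used to avoid rounding, and the comparison against 201 sampled boundary points is an informal check, not part of the method.

```python
from fractions import Fraction

def f(x, y):
    return x * y

# Stationarity conditions of L(x, y, lam) = x*y - lam*(x + 2*y - 4):
#   y - lam = 0,   x - 2*lam = 0,   x + 2*y - 4 = 0.
# Substituting x = 2*lam and y = lam into the constraint gives 4*lam = 4.
lam = Fraction(4, 4)
x, y = 2 * lam, lam

# Informal check: sample points (4 - 2t, t) on the boundary line x + 2y = 4
# for t = 0, 0.01, ..., 2 and find where f is largest.
boundary = [(4 - 2 * Fraction(k, 100), Fraction(k, 100)) for k in range(201)]
best = max(boundary, key=lambda p: f(*p))

print((x, y), f(x, y), lam)
print(best, f(*best))
```

Both approaches pick out the point (2, 1), where f takes the value 2.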
Sometimes we will see questions where we are just asked to use this method to solve a
constrained optimisation problem. In such cases, we will be given the objective function,
f (x, y), and the constraint, g(x, y) = 0, which we should be using. In particular, unless
we are explicitly asked to look at contours, we will just apply the method and assume
that the answer we find is the appropriate kind of optimal point.8 Let's look at an
example of such a problem.
Example 7.14
$-2x - 4y + 120 - \lambda = 0$
and
$x + y - 34 = 0.$
Although, sometimes, the Lagrangean may have several stationary points and, if that happens, it
should be fairly straightforward to see which of these is the one we want.
and
$\lambda = -2x - 4y + 120,$
$2y = 4x - 40 \implies y = 2x - 20,$
whereas the third equation gives us $x + y = 34$ which is, of course, just our
constraint. So, as this gives $y = 34 - x$, we can use it and the $y = 2x - 20$ that we
have just found to eliminate y and get
$$34 - x = 2x - 20 \implies 3x = 54 \implies x = 18.$$
7.3.3
optimises f (x, y) subject to the constraint. Of course, since we have used the constraint
to find the point $(x^*, y^*)$, the values of x and y we found will depend on c, i.e. we have
the functions $x^* = x(c)$ and $y^* = y(c)$ of c. In particular, this means that the optimal
value of f (x, y) subject to the constraint that we have found also depends on c, let's call
this F (c), i.e. we have
$$F(c) = f(x^*, y^*) = f(x(c), y(c)).$$
Now, if we differentiate this with respect to c using the chain rule (see Section 6.3.3), we
have
$$\frac{dF}{dc} = \frac{\partial f}{\partial x}\frac{dx}{dc} + \frac{\partial f}{\partial y}\frac{dy}{dc},$$
so that, using our expressions for $f_x(x, y)$ and $f_y(x, y)$ above (i.e. $f_x = \lambda g_x$ and $f_y = \lambda g_y$), we get
$$\frac{dF}{dc} = \lambda\frac{\partial g}{\partial x}\frac{dx}{dc} + \lambda\frac{\partial g}{\partial y}\frac{dy}{dc} = \lambda\left(\frac{\partial g}{\partial x}\frac{dx}{dc} + \frac{\partial g}{\partial y}\frac{dy}{dc}\right).$$
However, given the constraint $g(x, y) = c$, we see that differentiating both sides with
respect to c we get
$$\frac{\partial g}{\partial x}\frac{dx}{dc} + \frac{\partial g}{\partial y}\frac{dy}{dc} = 1,$$
where we have used the chain rule again on the left-hand side. Putting these last two
equations together, we find that
$$\frac{dF}{dc} = \lambda,$$
i.e. the Lagrange multiplier is the rate of change of the optimal value of f (x, y) subject
to the constraint $g(x, y) = c$ with respect to c. In particular, if we allowed our constraint
to change from $g(x, y) = c$ to $g(x, y) = c + \Delta c$ we would find that the change in the
optimal value of f (x, y) subject to this constraint, i.e. $\Delta F(c)$, is given by
$$\Delta F \simeq \lambda\,\Delta c,$$
provided that $\Delta c$ is suitably small. Let's see how this works in the context of
Example 7.14.
Example 7.15 Using what we found in Example 7.14, find $\lambda$ and hence find the
approximate change in the maximum value of f (x, y) subject to the constraint
$x + y = 34$ if the constraint is changed to $x + y = 35$.
We have found that the maximum value of f (x, y) subject to the constraint
$x + y = 34$ is $f(18, 16) = 2,722$. As this occurs at the point (18, 16) we can use either
of the first two equations we found in Example 7.14 to find $\lambda$ so, using the first, we
have
$$160 - 6x - 2y - \lambda = 0 \implies \lambda = 160 - 6(18) - 2(16) = 20.$$
Consequently, using the theory above, we have a change in the constraint from
$x + y = 34$ to $x + y = 35$ which gives $\Delta c = 1$ and so the change in the maximum
value of f (x, y) subject to this constraint is approximately $\lambda\,\Delta c = 20$.
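To get a feel for how good this approximation is, the Python sketch below recomputes the multiplier for a general constraint x + y = c. It assumes only the relations quoted in Examples 7.14 and 7.15, namely $y = 2x - 20$ and $\lambda = 160 - 6x - 2y$, together with the result $dF/dc = \lambda$ from above; the function name `optimum` is illustrative. Because $\lambda$ turns out to be linear in c under these assumptions, the exact change in F between c = 34 and c = 35 is the average of the two end-point values of $\lambda$.

```python
from fractions import Fraction

def optimum(c):
    """Solve y = 2x - 20 together with x + y = c; return (x, y, lam)."""
    x = Fraction(c + 20, 3)
    y = 2 * x - 20
    lam = 160 - 6 * x - 2 * y
    return x, y, lam

x, y, lam = optimum(34)
lam_35 = optimum(35)[2]

# dF/dc = lam(c) and lam is linear in c here, so integrating lam over
# [34, 35] is the same as averaging its values at the two end points.
exact_change = (lam + lam_35) / 2

print(x, y, lam)            # optimal point and multiplier when c = 34
print(float(exact_change))  # compare with the estimate lam * 1 = 20
```

Under these assumptions the exact change is a little below 20, which is what we should expect as $\lambda$ falls when c increases.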
We now turn to some applications of constrained optimisation in economics.
7.3.4
Applications
where $x_1, x_2 \geq 0$ as they represent quantities. This gives us a budget set, i.e. the set of
all bundles that the consumer can afford given the prices of the goods and his budget.
Indeed, geometrically, the bundles he can afford are contained in the triangular region
illustrated in Figure 7.5(a).
Figure 7.5: (a) The budget set for our consumer. (b) Adding three contours, u(x1 , x2 ) =
These contours are called indifference curves as each point on such a contour gives our consumer the
same utility, i.e. he will be indifferent between the bundles represented by points on the same contour.
increasing utility is as indicated. Indeed, we observe in this case that the maximum
value of u(x1 , x2 ) subject to the constraint imposed by the budget set occurs at the
point indicated, i.e. a point where we have a contour of u(x1 , x2 ) which is both
tangential to the line p1 x1 + p2 x2 = M , and
touching the line p1 x1 + p2 x2 = M .
As such, we could use the method of Lagrange multipliers to solve this problem, i.e. we
would write the constraint as $p_1 x_1 + p_2 x_2 - M = 0$ and use the Lagrangean
$$L(x_1, x_2, \lambda) = u(x_1, x_2) - \lambda(p_1 x_1 + p_2 x_2 - M),$$
to find the point $(x_1^*, x_2^*)$ which maximises the consumer's utility subject to the
constraint. Indeed, having done this, we can define the function
$$U(M) = u(x_1^*, x_2^*),$$
which tells us the maximum utility of the consumer given his budget, M. In particular,
using the theory in Section 7.3.3, we see that the value of the Lagrange multiplier we
get from solving the equations will satisfy
$$\frac{dU}{dM} = \lambda,$$
i.e. it gives us the consumer's marginal utility of [budgetary] money if he is purchasing
in a way that maximises his utility subject to his budget set. Let's look at an example.
Example 7.16 Suppose cats cost £2 each and dogs cost £1 each. If a consumer
has a utility function given by
$$u(x_1, x_2) = x_1^2 x_2^2,$$
when he buys $x_1$ cats and $x_2$ dogs, how many cats and dogs should he buy if he
wants to maximise his utility given that he has £M to spend? Find, U(M), the
maximum utility he can attain if he has a budget of £M and verify that $U'(M) = \lambda$
where $\lambda$ is the Lagrange multiplier.
In this case, the budget set will be the region defined by the inequalities
$$2x_1 + x_2 \leq M,$$
and $x_1, x_2 \geq 0$ which looks like the one in Figure 7.5(a) whereas the contours
$u(x_1, x_2) = c$ where $u(x_1, x_2) = x_1^2 x_2^2$ look like the ones sketched in Figure 7.5(b). As
such, we are in the situation described above and so we need to maximise $u(x_1, x_2)$
subject to the constraint that
$$2x_1 + x_2 = M \implies 2x_1 + x_2 - M = 0,$$
if we want the constraint in the right form. Thus, we have the Lagrangean
$$L(x_1, x_2, \lambda) = x_1^2 x_2^2 - \lambda(2x_1 + x_2 - M),$$
and we seek the points which simultaneously satisfy the equations $L_{x_1}(x_1, x_2, \lambda) = 0$,
$L_{x_2}(x_1, x_2, \lambda) = 0$ and $L_\lambda(x_1, x_2, \lambda) = 0$. The first-order partial derivatives of
$L(x_1, x_2, \lambda)$ are
$$L_{x_1}(x_1, x_2, \lambda) = 2x_1 x_2^2 - 2\lambda, \quad L_{x_2}(x_1, x_2, \lambda) = 2x_1^2 x_2 - \lambda \quad\text{and}\quad L_\lambda(x_1, x_2, \lambda) = -(2x_1 + x_2 - M),$$
and we set these equal to zero to yield the equations
$$2x_1 x_2^2 - 2\lambda = 0, \quad 2x_1^2 x_2 - \lambda = 0 \quad\text{and}\quad 2x_1 + x_2 - M = 0.$$
We now solve these by eliminating $\lambda$ from the first two equations, i.e. we get
$$\lambda = x_1 x_2^2 = 2x_1^2 x_2 \implies x_1 x_2 (x_2 - 2x_1) = 0 \implies x_2 = 2x_1,$$
where we reject the solutions where $x_1 = 0$ or $x_2 = 0$ as these give a utility of zero
which, clearly, won't give us the maximum we seek. We then use this new
relationship between $x_1$ and $x_2$ in the third equation, which is just the constraint
$2x_1 + x_2 = M$, to get
$$2x_1 + 2x_1 = M \implies 4x_1 = M \implies x_1 = \frac{M}{4},$$
and then, using this in the equation $x_2 = 2x_1$, we get $x_2 = M/2$. Thus, these values
of $x_1$ and $x_2$ maximise our consumer's utility if he has a budget of £M and his
maximum utility is then given by
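$$U(M) = u\left(\frac{M}{4}, \frac{M}{2}\right) = \left(\frac{M}{4}\right)^2 \left(\frac{M}{2}\right)^2 = \frac{M^4}{64},$$
and so
$$U'(M) = \frac{4M^3}{64} = \frac{M^3}{16}.$$
Of course, we can also find the value of $\lambda$ using, say, the equation
$$\lambda = x_1 x_2^2 = \left(\frac{M}{4}\right)\left(\frac{M}{2}\right)^2 = \frac{M^3}{16},$$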
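The relationship between the maximum utility and the Lagrange multiplier in Example 7.16 can also be checked numerically. The Python sketch below is an illustration only: the function names, the budget M = 8 and the step size h are arbitrary choices. It estimates $U'(M)$ with a central difference and compares it with the multiplier $\lambda = x_1 x_2^2$ evaluated at the optimal bundle.

```python
from fractions import Fraction

def demand(M):
    # Optimal bundle found in Example 7.16: x1 = M/4 and x2 = M/2.
    M = Fraction(M)
    return M / 4, M / 2

def max_utility(M):
    x1, x2 = demand(M)
    return x1**2 * x2**2

def multiplier(M):
    x1, x2 = demand(M)
    return x1 * x2**2

M = 8                      # hypothetical budget chosen for illustration
h = Fraction(1, 1000)      # small step for the central difference
slope = (max_utility(M + h) - max_utility(M - h)) / (2 * h)

print(max_utility(M), multiplier(M), float(slope))
```

The central-difference estimate of $U'(M)$ and the value of the multiplier agree to several decimal places, as the theory in Section 7.3.3 says they should.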
Activity 7.9 Another consumer has a budget of £4 to buy cats and dogs at the
prices in Example 7.16 and her utility function is u(x1 , x2 ) = 3x1 + x2 when she buys
x1 cats and x2 dogs. Sketch the budget set and some contours u(x1 , x2 ) = c where c
is a constant for this consumer. How many cats and dogs should she buy if she wants
to maximise her utility given her budget?
Figure 7.6: (a) The constraint q(k, l) = Q. (b) Adding three contours, C(k, l) = c, where
$$C(Q) = C(k^*, l^*),$$
10
These contours are called isocosts as each point on such a contour costs the firm the same amount
of money.
which tells us the minimum cost of producing an amount, Q. In particular, using the
theory in Section 7.3.3, we see that the value of the Lagrange multiplier we get from
solving the equations will satisfy
$$\frac{dC}{dQ} = \lambda,$$
i.e. it gives us the marginal cost of the firm if it is producing in a way that minimises its
costs subject to the constraint that it is producing an amount, Q. Let's look at an
example.
Example 7.17 Suppose capital, k, costs £16 per unit and labour, l, costs £1 per
unit. If a firm can produce an amount given by the production function
$$q(k, l) = 10k^{1/4} l^{1/4},$$
what values of k and l will minimise the cost of producing Q units? Find, C(Q),
the minimum cost of producing Q and verify that $C'(Q) = \lambda$ where $\lambda$ is the Lagrange
multiplier.
In this case, the constraint $q(k, l) = Q$ will look like the curve in Figure 7.6(a) for
$k, l \geq 0$ and so we are in the situation described above. Indeed, here the cost
function is
$$C(k, l) = 16k + l,$$
and, writing the constraint in the form $q(k, l) - Q = 0$, we get the Lagrangean
$$L(k, l, \lambda) = 16k + l - \lambda(q(k, l) - Q).$$
We seek the points which simultaneously satisfy the equations $L_k(k, l, \lambda) = 0$,
$L_l(k, l, \lambda) = 0$ and $L_\lambda(k, l, \lambda) = 0$ so we find the first-order partial derivatives of
$L(k, l, \lambda)$, i.e.
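$$L_k(k, l, \lambda) = 16 - \frac{10}{4}\lambda k^{-3/4} l^{1/4}, \quad L_l(k, l, \lambda) = 1 - \frac{10}{4}\lambda k^{1/4} l^{-3/4} \quad\text{and}\quad L_\lambda(k, l, \lambda) = -\left(10k^{1/4} l^{1/4} - Q\right),$$
and set these equal to zero to yield the equations
$$16 - \frac{5}{2}\lambda k^{-3/4} l^{1/4} = 0, \quad 1 - \frac{5}{2}\lambda k^{1/4} l^{-3/4} = 0 \quad\text{and}\quad 10k^{1/4} l^{1/4} - Q = 0.$$
We now solve these by eliminating $\lambda$ from the first two equations, i.e. we get
$$16 - \frac{5}{2}\lambda\frac{l^{1/4}}{k^{3/4}} = 0 \implies \lambda = 16\left(\frac{2}{5}\right)\frac{k^{3/4}}{l^{1/4}},$$
from the first equation and
$$1 - \frac{5}{2}\lambda\frac{k^{1/4}}{l^{3/4}} = 0 \implies \lambda = \frac{2}{5}\,\frac{l^{3/4}}{k^{1/4}},$$
from the second equation. As such, we can equate these expressions for $\lambda$ to get
$$16\left(\frac{2}{5}\right)\frac{k^{3/4}}{l^{1/4}} = \frac{2}{5}\,\frac{l^{3/4}}{k^{1/4}} \implies 16k = l.$$
We then use this new relationship between k and l in the third equation, which is
just the constraint $10k^{1/4} l^{1/4} = Q$, to get
$$Q = 10k^{1/4}(16k)^{1/4} \implies Q = 20k^{1/2} \implies k^{1/2} = \frac{Q}{20} \implies k = \frac{Q^2}{400},$$
and then, using this in the equation $l = 16k$, we get
$$l = 16\left(\frac{Q^2}{400}\right) = \frac{Q^2}{25}.$$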
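Thus, these values of k and l minimise the cost of producing Q units. The minimum
cost is then given by
$$C(Q) = C\left(\frac{Q^2}{400}, \frac{Q^2}{25}\right) = 16\left(\frac{Q^2}{400}\right) + \frac{Q^2}{25} = \frac{2Q^2}{25},$$
and so
$$C'(Q) = \frac{4Q}{25}.$$
Of course, we can also find the value of $\lambda$ using, say, the equation
$$\lambda = \frac{2}{5}\,\frac{l^{3/4}}{k^{1/4}} = \frac{2}{5}\,\frac{(Q^2/25)^{3/4}}{(Q^2/400)^{1/4}} = \frac{4Q}{25},$$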
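The results of Example 7.17 can be checked numerically with a short Python sketch. It is an illustration only: the function names and the output level Q = 10 are arbitrary choices, and floating-point arithmetic is used, so the comparisons hold up to rounding error.

```python
def inputs(Q):
    # Cost-minimising inputs found in Example 7.17.
    return Q**2 / 400, Q**2 / 25

def output(k, l):
    return 10 * k**0.25 * l**0.25

def min_cost(Q):
    k, l = inputs(Q)
    return 16 * k + l

def multiplier(Q):
    # lam = (2/5) * l**(3/4) / k**(1/4), from the second stationarity equation.
    k, l = inputs(Q)
    return 0.4 * l**0.75 / k**0.25

Q = 10                                  # hypothetical output level
k, l = inputs(Q)
print(k, l, output(k, l))               # the inputs should produce Q units
print(min_cost(Q), 2 * Q**2 / 25)       # minimum cost against 2Q^2/25
print(multiplier(Q), 4 * Q / 25)        # multiplier against C'(Q) = 4Q/25
```

For Q = 10 the inputs are k = 0.25 and l = 4, and the multiplier matches the marginal cost 4Q/25 up to rounding.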
Learning outcomes
At the end of this chapter and having completed the relevant reading and activities, you
should be able to:
find and classify the stationary points of a function of two variables;
solve problems from economics-based subjects that involve unconstrained
optimisation;
optimise a function in the presence of constraints;
solve problems from economics-based subjects that involve constrained
optimisation.
Solutions to activities
Solution to activity 7.1
The first-order partial derivatives of the function are
$f_x(x, y) = 2x - 4$
and
$f_y(x, y) = 2y + 4.$
At a stationary point, both of the first-order partial derivatives are zero, i.e. we must
have $f_x(x, y) = 0$ and $f_y(x, y) = 0$. Thus, to find the stationary points we have to solve
the simultaneous equations
$2x - 4 = 0$
and
$2y + 4 = 0.$
But, clearly, the first of these equations gives $x = 2$ and the second gives $y = -2$. Thus,
$(2, -2)$ is the only stationary point of f (x, y).
Solution to activity 7.2
The first-order partial derivatives of the function are
$f_x(x, y) = 9x^2 + 18x - 72$
and
$f_y(x, y) = 6y^2 - 24y - 126.$
At a stationary point, both of the first-order partial derivatives are zero, i.e. we must
have $f_x(x, y) = 0$ and $f_y(x, y) = 0$. Thus, to find the stationary points we have to solve
the simultaneous equations
$9x^2 + 18x - 72 = 0$
and
$6y^2 - 24y - 126 = 0.$
Now, notice that the first equation contains no y's and the second equation contains no
x's. As such, the first equation tells us everything there is to know about x, i.e.
$$9x^2 + 18x - 72 = 0 \implies x^2 + 2x - 8 = 0 \implies (x + 4)(x - 2) = 0 \implies x = -4 \text{ or } x = 2,$$
whereas the second equation tells us everything we need to know about y, i.e.
$$6y^2 - 24y - 126 = 0 \implies y^2 - 4y - 21 = 0 \implies (y + 3)(y - 7) = 0 \implies y = -3 \text{ or } y = 7.$$
As such, since we can take any of the x values with any of the y values we can see that
this function has four stationary points, namely $(-4, -3)$, $(-4, 7)$, $(2, -3)$ and $(2, 7)$.
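Because the two equations here are independent quadratics, the stationary points can be generated mechanically. The Python sketch below is an illustration only (`real_roots` is a hypothetical helper, not something defined in this guide): it solves each quadratic with the quadratic formula and pairs every x root with every y root.

```python
import math

def real_roots(a, b, c):
    """Real roots of a*t**2 + b*t + c = 0 via the quadratic formula."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    r = math.sqrt(disc)
    return sorted({(-b - r) / (2 * a), (-b + r) / (2 * a)})

xs = real_roots(9, 18, -72)     # from f_x = 9x^2 + 18x - 72 = 0
ys = real_roots(6, -24, -126)   # from f_y = 6y^2 - 24y - 126 = 0

# Each x root can be combined with each y root.
points = [(x, y) for x in xs for y in ys]
print(points)
```

This kind of pairing only works because neither equation involves the other variable.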
Solution to activity 7.3
Using the first-order partial derivatives we found in Activity 7.1, we find that the
second-order partial derivatives are
$$f_{xx}(x, y) = 2, \quad f_{xy}(x, y) = 0 = f_{yx}(x, y) \quad\text{and}\quad f_{yy}(x, y) = 2.$$
As these are constants, they take these values at the stationary point (and, indeed, at
all other points). Thus, we can see that the Hessian at the stationary point is given by
$$H(2, -2) = (2)(2) - (0)^2 = 4 > 0,$$
and, as $f_{xx}(2, -2) = 2 > 0$ as well, this is a local minimum.
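The second-order test used in this solution can be summarised in a small Python function. This is a sketch of the standard test, assuming the Hessian is $H = f_{xx} f_{yy} - f_{xy}^2$ as in the calculation above; the function name `classify` and the returned labels are illustrative.

```python
def classify(f_xx, f_xy, f_yy):
    """Classify a stationary point from its second-order partial derivatives."""
    hessian = f_xx * f_yy - f_xy**2
    if hessian > 0:
        return "local minimum" if f_xx > 0 else "local maximum"
    if hessian < 0:
        return "saddle point"
    return "inconclusive"   # H = 0: the test fails and more thought is needed

# Activity 7.3: f_xx = 2, f_xy = 0 and f_yy = 2 at the stationary point.
print(classify(2, 0, 2))

# Example 7.5: every second-order partial derivative vanishes at (0, 0).
print(classify(0, 0, 0))
```

The "inconclusive" branch corresponds to the situation discussed around Example 7.5, where the Hessian is zero and the method cannot decide.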
and
and
and
and
and
So, clearly, the only stationary point is at (1, 1) as this is the only point that makes
fx (x, y) = 0 and fy (x, y) = 0. The second-order partial derivatives of this function are
given by
$f_{xx}(x, y) = 12(x - 1)^2,$
and
Indeed, evaluating this at the stationary point gives $H(1, 1) = 0$ and so the method we
used above fails.
However, if we consider the surface $z = f(x, y)$, notice that we have $z = f(1, 1) = 0$ at
the stationary point and for all other $x, y \in \mathbb{R}$, we have
$$z = f(x, y) = (x - 1)^4 + (y - 1)^4 > 0,$$
i.e. $f(x, y) \geq f(1, 1)$ for all $x, y \in \mathbb{R}$. Consequently, it should be clear that this function
has a local minimum at (1, 1) and this minimum value is zero.11
Solution to activity 7.6
Suppose that we have a function f (x, y) that is concave. As we saw in Section 6.4.1, at
any point (a, b), the tangent plane to this function has a Cartesian equation given by
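$$z = f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix},$$
and, as this function is concave, it must be the case that for all $(x, y) \in \mathbb{R}^2$, the function
lies below this tangent plane, i.e. we must have
$$f(x, y) \leq f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix}.$$
However, using the second-order Taylor series for $f(x, y)$ around the point $(a, b)$, this
means that we have
$$f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} + \frac{1}{2!} \begin{pmatrix} x - a & y - b \end{pmatrix} \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} \leq f(a, b) + \left.\frac{df}{d\mathbf{x}}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix},$$
i.e. cancelling the terms that are common to both sides, we have
$$\begin{pmatrix} x - a & y - b \end{pmatrix} \left.\frac{d^2 f}{d\mathbf{x}^2}\right|_{(a,b)} \begin{pmatrix} x - a \\ y - b \end{pmatrix} \leq 0,$$
and this just asserts that $K(x, y) \leq 0$ using our notation from Section 7.2.2. However,
using what we saw before, this means that we require
$$H(x, y) \geq 0 \quad\text{and}\quad f_{xx}(x, y) \leq 0,$$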
$$\pi(x, y) = \frac{1}{3}\left(-15 + 17x + 28y - 5x^2 - 5y^2 + xy\right),$$
Actually, this is not only a local minimum, it is a global minimum as this is truly the smallest value
the function can take for x, y R.
12
Again, we have glossed over any complications in our derivation that would occur if fxx (x, y) = 0
for some point, (x, y).
and, to maximise this, we need to find its stationary points and determine which of
them gives us a maximum. So, we start by finding the first-order partial derivatives of
$\pi(x, y)$, i.e.
$$\pi_x(x, y) = \frac{1}{3}\left(17 - 10x + y\right) \quad\text{and}\quad \pi_y(x, y) = \frac{1}{3}\left(28 - 10y + x\right).$$
At a stationary point, both of these first-order partial derivatives are zero, i.e. we must
have $\pi_x(x, y) = 0$ and $\pi_y(x, y) = 0$. Thus, to find the stationary points, we have to solve
the simultaneous equations
$$10x - y = 17 \quad\text{and}\quad x - 10y = -28.$$
We start by noticing that the first equation gives us $y = 10x - 17$ and so, substituting
this into the second equation, we get
$$x - 10(10x - 17) = -28 \implies 99x = 198 \implies x = 2,$$
and then, using $y = 10x - 17$ again, we get $y = 3$. Thus, the profit function, $\pi(x, y)$, has
(2, 3) as its only stationary point.
To classify this stationary point, we look at the second-order partial derivatives of
$\pi(x, y)$, which are
$$\pi_{xx}(x, y) = -\frac{10}{3}, \quad \pi_{xy}(x, y) = \frac{1}{3} = \pi_{yx}(x, y) \quad\text{and}\quad \pi_{yy}(x, y) = -\frac{10}{3},$$
so that the Hessian is
$$H(x, y) = \left(-\frac{10}{3}\right)\left(-\frac{10}{3}\right) - \left(\frac{1}{3}\right)^2 = \frac{100}{9} - \frac{1}{9} = 11.$$
Clearly, at (2, 3), we have $H(2, 3) > 0$ and $\pi_{xx}(2, 3) < 0$, which means that the
stationary point we have found is indeed a local maximum. Consequently, to maximise
its profit, the firm should produce 2 tonnes of X and 3 tonnes of Y so that it can sell
them at prices, in pounds, of
$$p_X = \frac{17 - 2(2) - 3}{3} = \frac{10}{3} \simeq 3.33 \quad\text{and}\quad p_Y = \frac{28 - 2 - 2(3)}{3} = \frac{20}{3} \simeq 6.67,$$
respectively and, in doing so, the firm will make a maximum profit of
$$\pi(2, 3) = \frac{1}{3}\left(-15 + 17(2) + 28(3) - 5(2)^2 - 5(3)^2 + (2)(3)\right) = \frac{44}{3} \simeq 14.67,$$
pounds.
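The figures in this solution can be double-checked with exact arithmetic. The Python sketch below is an illustration only: it uses the price expressions from Example 7.9 and the profit function maximised in this solution, the function names are arbitrary, and the search over whole-number outputs from 0 to 10 tonnes is just an informal confirmation that nothing nearby beats (2, 3).

```python
from fractions import Fraction

def prices(x, y):
    # Inverse demand functions from Example 7.9.
    return Fraction(17 - 2 * x - y, 3), Fraction(28 - x - 2 * y, 3)

def profit(x, y):
    # Profit function maximised in this solution.
    return Fraction(-15 + 17 * x + 28 * y - 5 * x**2 - 5 * y**2 + x * y, 3)

p_x, p_y = prices(2, 3)
best = max(((x, y) for x in range(11) for y in range(11)),
           key=lambda p: profit(*p))

print(p_x, p_y, profit(2, 3))
print(best)
```

Working with fractions keeps the values 10/3, 20/3 and 44/3 exact; they can be converted with `float` if decimal approximations are wanted.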
Solution to activity 7.8
Of course, this should have been obvious
either by noting that
$$f(x, y) = (x - 1)^2 + (y - 1)^2 \geq 0,$$
for all points $(x, y) \in \mathbb{R}^2$ with a minimum of zero at (1, 1);
293
7. Two-variable optimisation
or by observing that as H(x, y) = 4 > 0 and fxx (x, y) = 2 > 0 for all points
(x, y) R2 , we see that this function is convex and so the stationary point (1, 1) we
found above is a global minimum.
Then, using either of these facts, we see that we have found the minimum of f (x, y) for
all (x, y) R2 and so it must be the minimum in the given region too since it is in that
region.
Solution to activity 7.9
Given the prices in Example 7.16 and the consumer's budget of 4, we see that the
budget set is given by
\[ 2x_1 + x_2 \le 4, \]
where $x_1, x_2 \ge 0$ as they are quantities. This is sketched in Figure 7.7(a).
We are now asked to sketch some contours $u(x_1, x_2) = c$ where $c$ is a constant and
\[ u(x_1, x_2) = 3x_1 + x_2, \]
for this consumer. Indeed, looking at the budget set, it makes sense to choose the
contours where $c = 4$ and $c = 6$ and these are illustrated in Figure 7.7(b). This allows us
to see the direction of increasing utility, which is indicated in the figure, and allows us
to see that the point $(2, 0)$ is the one where we get the highest utility if we are
constrained to stay within the budget set. Consequently, this consumer should buy two
cats and no dogs if she wants to maximise her utility subject to her budget constraint.
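The graphical argument can be cross-checked numerically. The sketch below relies on the standard fact that a linear function over a polygonal region attains its maximum at a corner, so it only compares the three corners of the budget set. The names `utility` and `vertices` are illustrative choices, not notation from the guide.

```python
def utility(x1, x2):
    """Utility function u(x1, x2) = 3*x1 + x2 from the activity."""
    return 3 * x1 + x2


# Corners of the budget set 2*x1 + x2 <= 4 with x1, x2 >= 0.
vertices = [(0, 0), (2, 0), (0, 4)]

best = max(vertices, key=lambda point: utility(*point))
print(best, utility(*best))
```

The corner with the largest utility is the bundle the consumer should buy.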
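[Figure 7.7 appears here. Both panels plot $x_2$ against $x_1$. Panel (a) shows the budget line $2x_1 + x_2 = 4$; panel (b) adds the contours $c = 4$ and $c = 6$ and an arrow marking the direction of increasing utility.]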
Figure 7.7: The sketches for Activity 7.9. (a) The budget set for our consumer. (b) Adding
Exercises
Exercise 7.1
The function
\[ f(x, y) = x^2 \ln y - y \ln y, \]
is defined for $y > 0$ and all $x \in \mathbb{R}$. Find its stationary points and classify them.
Exercise 7.2
Consider the function
\[ f(x, y) = x^{\alpha+1} y^{\beta-1}, \]
for $x, y > 0$ and some constants $\alpha$ and $\beta$. For what values of $\alpha$ and $\beta$ is this function
convex? Sketch the region(s) in the $(\alpha, \beta)$-plane that correspond to these values of $\alpha$
and $\beta$.
Exercise 7.3
Suppose that a firm can sell its product in a domestic and a foreign market and that
the inverse demand functions for these two markets are
\[ p_1 = 30 - 4q_1 \quad\text{and}\quad p_2 = 50 - 5q_2, \]
where $p_1$ and $p_2$ are the prices (in pounds) if it sells quantities $q_1$ and $q_2$ (in tonnes) in
the domestic and foreign markets respectively. Given that the total cost function of the
firm (in pounds) is
\[ TC(q) = 10 + 10q, \]
where $q$ is the quantity produced (in tonnes) and that the firm has a monopoly in both
markets, find the quantities it should sell in these markets if it wants to maximise
its profit. What are the corresponding prices? What is the maximum profit?
Exercise 7.4
Use the method of Lagrange multipliers to optimise the function
\[ f(x, y) = x^{3/8} y^{2/3}, \]
subject to the constraint $x^2 + y^2 = 25$ where $x, y > 0$.
By sketching the constraint and some contours of $f$, justify your use of the method of
Lagrange multipliers and determine whether the point you have found maximises or
minimises $f$ subject to the constraint.
Exercise 7.5
Given an amount of capital, $k$, and labour, $l$, a firm produces a quantity of goods,
$q(k, l)$, where
\[ q(k, l) = \ln k + \ln l, \]
for $k, l > 0$. Suppose that each unit of capital costs 2 and each unit of labour costs 3.
Use the method of Lagrange multipliers to find the values of $k$ and $l$ that maximise the
firm's production given that its total budget for capital and labour is $M$.
Hence show that the maximum production the firm can achieve given a budget of $M$
is given by
\[ Q(M) = 2 \ln\left(\frac{M}{2\sqrt{6}}\right), \]
and verify that $Q'(M) = \lambda$ where $\lambda$ is the Lagrange multiplier.
Solutions to exercises
Solution to exercise 7.1
Given that
\[ f(x, y) = x^2 \ln y - y \ln y, \]
for $y > 0$ and all $x \in \mathbb{R}$, we see that the first-order partial derivatives of this function are
\[ f_x(x, y) = 2x \ln y \quad\text{and}\quad f_y(x, y) = \frac{x^2}{y} - (\ln y + 1), \]
where we have used the product rule when finding $f_y(x, y)$. At a stationary point, both
of the first-order partial derivatives are zero, i.e. we must have $f_x(x, y) = 0$ and
$f_y(x, y) = 0$. Thus, to find the stationary points we have to solve the simultaneous
equations
\[ 2x \ln y = 0 \quad\text{and}\quad \frac{x^2}{y} - \ln y - 1 = 0. \]
If we start by looking at the first equation, this gives us
\[ x \ln y = 0 \implies x = 0 \text{ or } \ln y = 0 \implies x = 0 \text{ or } y = 1. \]
If $x = 0$, the second equation gives
\[ \ln y = -1 \implies y = e^{-1}, \]
whereas if $y = 1$, the second equation gives
\[ x^2 = 1 \implies x = \pm 1. \]
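To classify these stationary points, we need the second-order partial derivatives, which are
\[ f_{xx}(x, y) = 2 \ln y, \quad f_{xy}(x, y) = \frac{2x}{y} = f_{yx}(x, y) \quad\text{and}\quad f_{yy}(x, y) = -\frac{x^2}{y^2} - \frac{1}{y}, \]
so that the Hessian is
\[ H(x, y) = (2 \ln y)\left(-\frac{x^2}{y^2} - \frac{1}{y}\right) - \left(\frac{2x}{y}\right)^2 = -\frac{2(x^2 + y) \ln y + 4x^2}{y^2}. \]
At $(0, e^{-1})$, we have
\[ H(0, e^{-1}) = -\frac{2 e^{-1} \ln(e^{-1})}{e^{-2}} = 2e > 0 \quad\text{and}\quad f_{xx}(0, e^{-1}) = 2 \ln(e^{-1}) = -2 < 0, \]
and so this is a local maximum.
At $(1, 1)$, we have
\[ H(1, 1) = -\frac{4}{1} < 0, \]
as $\ln 1 = 0$ and so this is a saddle point.
At $(-1, 1)$, we have
\[ H(-1, 1) = -\frac{4}{1} < 0, \]
as $\ln 1 = 0$ and so this is a saddle point.
Thus, the stationary points $(0, e^{-1})$, $(1, 1)$ and $(-1, 1)$ are a local maximum and two
saddle points respectively.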
Solution to exercise 7.2
We have, for $x, y > 0$, the function $f(x, y) = x^{\alpha+1} y^{\beta-1}$ whose first-order partial
derivatives are
\[ f_x(x, y) = (\alpha + 1) x^{\alpha} y^{\beta-1} \quad\text{and}\quad f_y(x, y) = (\beta - 1) x^{\alpha+1} y^{\beta-2}, \]
and
\[ \alpha(\alpha + 1) \ge 0, \]
( )
as x, y > 0, and
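\[ \left[\alpha(\alpha + 1)(\beta - 1)(\beta - 2) - (\alpha + 1)^2 (\beta - 1)^2\right] x^{2\alpha} y^{2(\beta-2)} \ge 0, \]
which gives
\[ (\alpha + 1)(\beta - 1)\left[\alpha(\beta - 2) - (\alpha + 1)(\beta - 1)\right] \ge 0 \implies (\alpha + 1)(\beta - 1)(1 - \alpha - \beta) \ge 0, \]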
()
=
1
2
1
and
\[ p_2 = 50 - 5q_2, \]
\[ TC(q_1, q_2) = 10 + 10(q_1 + q_2), \]
Note that the situation described here, where a producer charges different prices in different markets,
is sometimes known as price discrimination.
and
\[ \pi_{q_2}(q_1, q_2) = 40 - 10q_2, \]
and so, as a stationary point occurs when $\pi_{q_1}(q_1, q_2) = 0$ and $\pi_{q_2}(q_1, q_2) = 0$, we need to
solve the simultaneous equations
\[ 20 - 8q_1 = 0 \quad\text{and}\quad 40 - 10q_2 = 0. \]
But, of course, the first equation gives $q_1 = 5/2$ and the second equation gives $q_2 = 4$
which means that $(5/2, 4)$ is the only stationary point of $\pi(q_1, q_2)$.
To check that this is a maximum, we look at the second-order partial derivatives of
$\pi(q_1, q_2)$, which are
\[ \pi_{q_1 q_1}(q_1, q_2) = -8, \quad \pi_{q_1 q_2}(q_1, q_2) = 0 = \pi_{q_2 q_1}(q_1, q_2) \quad\text{and}\quad \pi_{q_2 q_2}(q_1, q_2) = -10, \]
\[ \pi\left(\tfrac{5}{2}, 4\right) = 20\left(\tfrac{5}{2}\right) + 40(4) - 4\left(\tfrac{5}{2}\right)^2 - 5(4)^2 - 10 = 95, \]
pounds.
Solution to exercise 7.4
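Writing the constraint in the form $x^2 + y^2 - 25 = 0$, we get the Lagrangean
\[ L(x, y, \lambda) = x^{3/8} y^{2/3} - \lambda(x^2 + y^2 - 25), \]
and we seek the points which simultaneously satisfy the equations $L_x(x, y, \lambda) = 0$,
$L_y(x, y, \lambda) = 0$ and $L_\lambda(x, y, \lambda) = 0$. So we find the first-order partial derivatives of
$L(x, y, \lambda)$, i.e.
\[ L_x(x, y, \lambda) = \frac{3}{8} x^{-5/8} y^{2/3} - 2\lambda x, \quad L_y(x, y, \lambda) = \frac{2}{3} x^{3/8} y^{-1/3} - 2\lambda y \quad\text{and}\quad L_\lambda(x, y, \lambda) = -(x^2 + y^2 - 25), \]
and set them equal to zero to get the equations
\[ \frac{3}{8} x^{-5/8} y^{2/3} - 2\lambda x = 0, \quad \frac{2}{3} x^{3/8} y^{-1/3} - 2\lambda y = 0 \quad\text{and}\quad x^2 + y^2 - 25 = 0. \]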
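We now solve these by eliminating $\lambda$ from the first two equations, i.e. we get
\[ \frac{3}{8} x^{-5/8} y^{2/3} - 2\lambda x = 0 \implies \lambda = \frac{3}{16} \frac{y^{2/3}}{x^{13/8}} \]
from the first equation and
\[ \lambda = \frac{1}{3} \frac{x^{3/8}}{y^{4/3}} \]
from the second equation. As such, we can equate these expressions for $\lambda$ to get
\[ \frac{3}{16} \frac{y^{2/3}}{x^{13/8}} = \frac{1}{3} \frac{x^{3/8}}{y^{4/3}} \implies y^2 = \frac{16}{9} x^2. \]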
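We then use this new relationship between $x$ and $y$ in the third equation, which is just
the constraint $x^2 + y^2 = 25$, to get
\[ x^2 + \frac{16}{9} x^2 = 25 \implies \frac{25}{9} x^2 = 25 \implies x^2 = 9 \implies x = 3, \]
as $x > 0$, and then
\[ y^2 = \frac{16}{9}(3^2) = 16 \implies y = 4, \]
as $y > 0$.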
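As an informal check on the point found by the method of Lagrange multipliers, the following sketch scans the part of the circle $x^2 + y^2 = 25$ with $x, y > 0$ and records where $f(x, y) = x^{3/8} y^{2/3}$ is largest. The parametrisation by an angle and the function names are assumptions made for illustration; a brute-force scan like this is not the method used in the guide.

```python
import math


def f(x, y):
    """Objective function from Exercise 7.4."""
    return x ** (3 / 8) * y ** (2 / 3)


def best_point_on_constraint(steps=100_000):
    """Scan x = 5cos(t), y = 5sin(t) for 0 < t < pi/2 and keep the largest f."""
    best = None
    for i in range(1, steps):
        theta = (math.pi / 2) * i / steps
        x, y = 5 * math.cos(theta), 5 * math.sin(theta)
        value = f(x, y)
        if best is None or value > best[0]:
            best = (value, x, y)
    return best


value, x_best, y_best = best_point_on_constraint()
print(round(x_best, 3), round(y_best, 3), round(value, 4))
```

Since $f$ tends to zero at both ends of the arc and is positive in between, the scan also suggests that the stationary point is a constrained maximum.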
Strictly, the constraint is $2k + 3l \le M$ where $k, l > 0$, but we can see that if we chose a point where
$2k + 3l < M$, we could not maximise the quantity produced since, spending more on capital and labour
to get a point where $2k + 3l = M$, we would get a larger quantity. This should make sense if you consider
the discussion of budget constraints in Section 7.3.4.
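[Figure 7.9 appears here. Panel (a) shows the constraint curve; panel (b) adds contours of $f(x, y)$ and an arrow marking the direction of increasing $f(x, y)$.]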
Figure 7.9: The sketches for Exercise 7.4. (a) The constraint x2 + y 2 = 25 for x, y > 0. (b)
Adding three contours, f (x, y) = c, where the direction in which f (x, y) is increasing is
as indicated. Clearly, we are interested in the point (3, 4) which is indicated in the figure.
that the firm can produce subject to the constraint 2k + 3l = M where k, l > 0 we use
the Lagrangean
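\[ L(k, l, \lambda) = \ln(k) + \ln(l) - \lambda(2k + 3l - M). \]
We seek the points which simultaneously satisfy the equations $L_k(k, l, \lambda) = 0$,
$L_l(k, l, \lambda) = 0$ and $L_\lambda(k, l, \lambda) = 0$. The first-order derivatives of $L(k, l, \lambda)$ are
\[ L_k(k, l, \lambda) = \frac{1}{k} - 2\lambda, \quad L_l(k, l, \lambda) = \frac{1}{l} - 3\lambda \quad\text{and}\quad L_\lambda(k, l, \lambda) = -(2k + 3l - M), \]
and setting these equal to zero gives the equations
\[ \frac{1}{k} - 2\lambda = 0, \quad \frac{1}{l} - 3\lambda = 0 \quad\text{and}\quad 2k + 3l - M = 0. \]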
We now solve these by eliminating $\lambda$ from the first two equations, i.e. we get
\[ \lambda = \frac{1}{2k} = \frac{1}{3l} \implies 3l = 2k \implies k = \frac{3}{2} l. \]
We then use this new relationship between $k$ and $l$ in the third equation, which is just
the constraint $2k + 3l = M$, to get
\[ 2\left(\frac{3}{2} l\right) + 3l = M \implies 6l = M \implies l = \frac{M}{6}, \]
and so
\[ k = \frac{3}{2} \cdot \frac{M}{6} = \frac{M}{4}. \]
Thus the values of $k$ and $l$ that maximise $q(k, l)$ subject to the constraint are $k = M/4$
and $l = M/6$.
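In this case, the maximum production achievable, given a budget of $M$, is
\[ Q(M) = q\left(\frac{M}{4}, \frac{M}{6}\right) = \ln\left(\frac{M}{4}\right) + \ln\left(\frac{M}{6}\right) = \ln\left(\frac{M^2}{24}\right) = 2 \ln\left(\frac{M}{2\sqrt{6}}\right), \]
as required. Further, we can find the value of $\lambda$ using, say, the equation
\[ \lambda = \frac{1}{2k} = \frac{1}{2} \cdot \frac{4}{M} = \frac{2}{M}, \]
and, as
\[ Q(M) = 2 \ln\left(\frac{M}{2\sqrt{6}}\right) \]
can be written as
\[ Q(M) = 2 \ln M - 2 \ln(2\sqrt{6}) \implies Q'(M) = \frac{2}{M}, \]
we see that $Q'(M) = \lambda$.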
Chapter 8
Differential equations
Essential reading
(For full publication details, see Chapter 1.)
Binmore and Davies (2002) Sections 12.1-12.4 and 12.7-12.8.
Anthony and Biggs (1996) Chapters 27 and 28.
Further reading
Simon and Blume (1994) Sections 24.1-24.3 and Section 25.3.
Adams and Essex (2010) Sections 3.7 and 7.9, parts of Sections 17.1-17.2,
17.4-17.6.
Aims and objectives
8.1
If the differential equation involves a function with more than one independent variable, then it
would contain at least one partial derivative of the function and we would have a partial differential
equation.
(a)
dy
dx
(b)
dy
=x
dx
(c)
d3 y
=x
dx3
=x
d2 y
.
dx2
d2 y
dx2
d2 y
dx2
.
2
Given an ODE, we usually want to solve it. That is, we want to find the unknown
function in a form which does not involve any derivatives, and when we have found the
function in this form we call it a solution to the ODE. In general, we will find that any
given ODE has many solutions and so we get a general solution, i.e. we find the
unknown function up to some arbitrary constants that are not determined by the ODE
itself. Let's look at a very simple example of an ODE (i.e. one that can be solved by
direct integration) to see how things work.
8
Example 8.1
\[ \frac{dy}{dx} = 2x + 1. \]
This is a first-order ODE of degree one and it is very easy to solve because we can
just integrate both sides to see that
\[ \int \frac{dy}{dx}\,dx = \int (2x + 1)\,dx \implies y = x^2 + x + c, \]
Example 8.2 Find the solution to the ODE in Example 8.1 that also satisfies the
condition $y(0) = 1$.
We know that all solutions to the ODE in the previous example have the form
\[ y(x) = x^2 + x + c. \]
If, in addition, we want a solution that satisfies the condition $y(0) = 1$, we can set
$x = 0$ in both sides of this expression and use the condition to get
\[ y(0) = 0^2 + 0 + c \implies 1 = c. \]
That is, if we want to satisfy the condition $y(0) = 1$ as well, we must take $c = 1$ in
the general solution. Consequently,
\[ y(x) = x^2 + x + 1, \]
is the particular solution to the ODE given that $y(0) = 1$.
Of course, it should be clear from this example that, when we apply different conditions
to the general solution, we can get different values of c and hence different particular
solutions.
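The step from a general solution to a particular solution can be sketched in Python. The helper names below are hypothetical; the code picks the constant $c$ so that the curve passes through a given point and then checks, using a central-difference estimate of the derivative, that the resulting function still satisfies $dy/dx = 2x + 1$.

```python
def general_solution(x, c):
    """General solution y = x^2 + x + c of dy/dx = 2x + 1."""
    return x**2 + x + c


def constant_from_condition(x0, y0):
    """Choose c so that the general solution passes through (x0, y0)."""
    return y0 - (x0**2 + x0)


def derivative(func, x, h=1e-6):
    """Central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)


c = constant_from_condition(0, 1)  # the condition y(0) = 1 from Example 8.2


def particular(x):
    return general_solution(x, c)


residuals = [derivative(particular, x) - (2 * x + 1) for x in (-2.0, 0.0, 1.5)]
print(c, residuals)
```

Changing the arguments of `constant_from_condition` gives the constant for any other condition of the form $y(x_0) = y_0$.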
Activity 8.2 Find the particular solutions to the ODE in Example 8.1 that also
satisfy the conditions (a) y(0) = 0, (b) y(0) = 1 and (c) y(2) = 7.
Indeed, we solved simple ODEs that looked like this when we considered marginal
functions in Section 5.4.1. Further, as the following example shows, we can also solve
simple higher-order ODEs by direct integration.
Example 8.3
\[ \frac{d^2 y}{dx^2} = 6x + 2. \]
This is a second-order ODE of degree one and, once again, we can begin to solve it
by integrating both sides to see that
\[ \int \frac{d^2 y}{dx^2}\,dx = \int (6x + 2)\,dx \implies \frac{dy}{dx} = 3x^2 + 2x + c, \]
but this does not give us a solution as we still have a derivative in our expression.
However, if we integrate both sides again, we get
\[ \int \frac{dy}{dx}\,dx = \int (3x^2 + 2x + c)\,dx \implies y = x^3 + x^2 + cx + d, \]
Of course, if we find that there are several arbitrary constants in the general solution of
an ODE, such as c and d in the general solution to the second-order ODE in
Example 8.3, we will need more conditions in order to determine these constants and
hence find a particular solution.
Example 8.4 Find the solution to the ODE in Example 8.3 that also satisfies the
conditions $y(0) = 1$ and $y'(0) = 2$.
We know that all solutions to the ODE in the previous example have the form
\[ y(x) = x^3 + x^2 + cx + d. \]
If, in addition, we want a solution that satisfies the condition $y(0) = 1$, we can set
$x = 0$ in both sides of this expression and use the condition to get
\[ y(0) = 0^3 + 0^2 + c(0) + d \implies 1 = d. \]
Similarly, for the condition $y'(0) = 2$, we can set $x = 0$ in $y'(x) = 3x^2 + 2x + c$ to get
\[ y'(0) = 3(0)^2 + 2(0) + c \implies 2 = c. \]
Consequently,
\[ y(x) = x^3 + x^2 + 2x + 1, \]
is the particular solution to the ODE given that $y(0) = 1$ and $y'(0) = 2$.
More generally, we won't be able to solve ODEs by direct integration and so the
procedure for solving an ODE will usually involve identifying its type and applying the
relevant method. In what follows, we shall see how the form of an ODE allows us to
choose the method that will enable us to solve it in cases where direct integration can't
be used.
8.2
First-order ODEs
In this section we will consider some methods that will allow us to solve certain
first-order ODEs of degree one. That is, certain ODEs that have the form
dy
= f (x, y),
dx
where f (x, y) is some given function of the independent variable, x, and the dependent
variable, y.
8.2.1
\[ M(x) = N(y) \frac{dy}{dx}, \]
is called a separable ODE. This is because, in such cases, we have been able to
separate the variables so that all occurrences of $x$ occur on the left-hand-side and all
occurrences of $y$ occur on the right-hand-side. ODEs of this type can be solved by
integrating both sides to get
\[ \int M(x)\,dx = \int N(y) \frac{dy}{dx}\,dx \implies \int M(x)\,dx = \int N(y)\,dy, \]
using the integration by substitution formula from Section 5.2.3. If we now determine
these integrals, we will find the general solution to the ODE.
Example 8.5
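\[ \frac{dy}{dx} = 2x(y - 1). \]
This is a separable ODE as it can be written in the form
\[ 2x = \frac{1}{y - 1} \frac{dy}{dx}, \]
with $M(x) = 2x$ and $N(y) = (y - 1)^{-1}$. Using the method described above, we write
this as
\[ \int 2x\,dx = \int \frac{dy}{y - 1} \implies x^2 + c = \ln|y - 1|, \]
where $c$ is an arbitrary constant, so that taking exponentials of both sides gives
\[ |y - 1| = e^{x^2 + c} = e^c e^{x^2}. \]
Now, both sides of this expression are non-negative because of the modulus on the
left-hand-side and the exponentials on the right-hand-side. This means that, if we
want to remove the modulus, we must allow the possibility that the right-hand-side
can give us a negative quantity, i.e. we have
\[ y - 1 = \pm e^c e^{x^2} \implies y = 1 \pm e^c e^{x^2}. \]
Then, as the independent variable is $x$ and the dependent variable is $y$, the unknown
function here is $y(x)$, so this gives us the general solution
\[ y(x) = 1 + A e^{x^2}, \]
where $A \in \mathbb{R}$ is an arbitrary constant.2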
Of course, having found the general solution to the ODE in this example, we can also
find particular solutions if we are given some conditions.
2
Here we have replaced $\pm e^c$ with a new constant $A \in \mathbb{R}$ which can take any value.
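A general solution of this kind can be sanity-checked by substituting it back into the ODE. The sketch below does this numerically for $y(x) = 1 + A e^{x^2}$ and $dy/dx = 2x(y - 1)$, using a central-difference estimate of the derivative. The chosen values of $A$ and $x$, the step size and the function names are arbitrary assumptions.

```python
import math


def solution(x, A):
    """Candidate general solution y = 1 + A*exp(x^2)."""
    return 1 + A * math.exp(x**2)


def residual(x, A, h=1e-6):
    """dy/dx - 2x(y - 1), which should vanish if the candidate solves the ODE."""
    dydx = (solution(x + h, A) - solution(x - h, A)) / (2 * h)
    return dydx - 2 * x * (solution(x, A) - 1)


worst = max(
    abs(residual(x, A))
    for A in (-3.0, 0.0, 0.5, 2.0)
    for x in (-1.0, 0.0, 0.7, 1.2)
)
print(worst)
```

The residual is close to zero for positive, negative and zero values of `A`, which is consistent with $A$ being allowed to take any real value.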
Activity 8.3 Find the particular solutions to the ODE in Example 8.5 given the
conditions (a) y(0) = 2 and (b) y(0) = 0.
What value of y(1) will give the same particular solution as the one you found in
(a)?
8.2.2
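\[ \mu(x) = e^{\int P(x)\,dx}, \]
where, here, $\int P(x)\,dx$ is just any antiderivative of $P(x)$. Once we have this, we
multiply both sides of the ODE by the integrating factor to get
\[ \mu(x) \frac{dy}{dx} + \mu(x) P(x) y = \mu(x) Q(x). \qquad (8.1) \]
Now, observe that
\[ \frac{d\mu}{dx} = \frac{d}{dx} e^{\int P(x)\,dx} = P(x)\, e^{\int P(x)\,dx} = \mu(x) P(x), \]
if we use the chain rule3 and so, using the product rule, we have
\[ \frac{d}{dx}\left[\mu(x) y(x)\right] = \mu(x) \frac{dy}{dx} + \frac{d\mu}{dx} y(x) = \mu(x) \frac{dy}{dx} + \mu(x) P(x) y(x), \]
which is the left-hand-side of (8.1). This means that (8.1) can be written as
\[ \frac{d}{dx}\left[\mu(x) y(x)\right] = \mu(x) Q(x) \implies \mu(x) y(x) = \int \mu(x) Q(x)\,dx, \]
and if we determine the integral on the right-hand-side, we can then find the general
solution to the ODE.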
3
Here,
\[ \frac{d}{dx} \int P(x)\,dx = P(x). \]
To see why, note that if $c$ is an arbitrary constant and $F(x)$ is an antiderivative of $P(x)$, i.e. $F'(x) = P(x)$,
we have
\[ \int P(x)\,dx = F(x) + c \implies \frac{d}{dx} \int P(x)\,dx = \frac{d}{dx}\left[F(x) + c\right] = F'(x) = P(x), \]
as expected.
Example 8.6
dy
2y = 6.
dx
P (x) dx =
(x)Q(x) dx = x2 y(x) =
Verify that the answer we found in that example is correct by solving this separable
ODE using the method in Section 8.2.1.
Let's now consider another example where the ODE is linear, but not separable.
Example 8.7
dy
= y + ex .
dx
1 dx = x + c,
(x)Q(x) dx
ex y(x) =
dx
ex y(x) = x + c,
8.2.3
dy
= 0,
dx
dy
= 0.
dx
and, clearly, they are both homogeneous of degree 2. As such, we introduce a new
function, $v(x)$, such that
\[ y(x) = x v(x) \implies \frac{dy}{dx} = v(x) + x \frac{dv}{dx}, \]
dv
dx
= 0.
Cancelling common factors and simplifying this then becomes the separable ODE
dv
1
= ,
dx
x
which we solve using the method in Section 8.2.1, i.e.
dx
x
dv =
v(x) = ln |x| + c,
Verify that the answer we found in that example is correct by solving this linear
first-order ODE using the method in Section 8.2.2.
Let's now consider another example where the ODE is homogeneous, but not linear.
Example 8.9
dy
= 0.
dx
and, clearly, they are both homogeneous of degree 4. As such, we introduce a new
function, $v(x)$, such that
\[ y(x) = x v(x) \implies \frac{dy}{dx} = v(x) + x \frac{dv}{dx}, \]
dv
dx
= 0.
On cancelling common factors and simplifying, this becomes the separable ODE
\[ \frac{dv}{dx} = \frac{1 + v^4}{4xv^3}, \]
which we solve using the method in Section 8.2.1, i.e.
\[ \int \frac{4v^3}{1 + v^4}\,dv = \int \frac{dx}{x} \implies \ln|1 + v^4| = \ln|x| + c, \]
where $c$ is an arbitrary constant. So, taking exponentials of both sides, this gives us
\[ |1 + v^4| = e^{\ln|x| + c} = e^c e^{\ln|x|} = e^c |x|, \]
so that removing the modulus signs and replacing the arbitrary constant $e^c > 0$ with
$A \in \mathbb{R}$, we get
\[ v^4 + 1 = Ax \implies v = (Ax - 1)^{1/4}, \]
for some arbitrary constant, $A$. Consequently, using $y(x) = x v(x)$, we find that
\[ y(x) = x (Ax - 1)^{1/4}, \]
is the general solution to our homogeneous ODE.
Activity 8.7
Homogeneous ODEs are not the only examples of ODEs that can be solved using the
methods above after some judicious substitution. In this course, if a novel substitution
is needed to make a given ODE solvable, it will usually be given. See, for example,
Exercise 8.2.
8.3
Second-order ODEs
In this section we will consider some methods that will allow us to solve certain
second-order ODEs where all occurrences of y and its derivatives are of degree one. In
particular, we will be concerned with such ODEs that have the form
\[ a \frac{d^2 y}{dx^2} + b \frac{dy}{dx} + cy = f(x), \]
where a, b and c are constants and f (x) is some given function of the independent
variable, x. ODEs of this form are often said to have constant coefficients referring to
the constants multiplying y and its derivatives on the left-hand-side. The method for
solving such second-order ODEs is as follows.
8.3.1
If the function, f (x), on the right-hand-side of our second-order ODE with constant
coefficients is zero, i.e. if our ODE has the form
\[ a \frac{d^2 y}{dx^2} + b \frac{dy}{dx} + cy = 0, \]
we say that it is homogeneous.4 To solve such an ODE, let's suppose that any solution
must have the form
\[ y(x) = A e^{kx}, \qquad (8.2) \]
4
Note that this is a different use of the word homogeneous to the one in Sections 6.3.4 and 8.2.3.
That is, this is an homogeneous equation whereas in Section 6.3.4 we had homogeneous functions and
in Section 8.2.3 we had an ODE which was made up from two such functions in a certain way.
and
d2 y
= Ak 2 ekx ,
dx2
and
y(x) = Bx ex ,
If you are interested, this case involves complex numbers which are discussed in Chapter 13 of
Binmore and Davies (2002). If you read this, you will then be able to understand the discussion of this
type of solution in Section 14.5 of Binmore and Davies (2002). However, as we are not dealing with such
things here, you are advised to wait until you tackle complex numbers properly in 175 Further Linear
Algebra.
Example 8.10
\[ (k - 2)(k + 1) = 0, \]
and so we have two real solutions given by $k = 2$ and $k = -1$. As such, the theory
above dictates that
\[ y(x) = A e^{2x} + B e^{-x}, \]
where A and B are arbitrary constants, is the general solution to this homogeneous
second-order ODE.
Example 8.11
\[ (k + 2)^2 = 0, \]
and so we have one real solution given by $k = -2$. As such, the theory above
dictates that
\[ y(x) = (A + Bx) e^{-2x}, \]
where A and B are arbitrary constants, is the general solution to this homogeneous
second-order ODE.
Example 8.12
\[ k^2 - 2k + 2 = 0 \implies (k - 1)^2 + 1 = 0 \implies k - 1 = \pm\sqrt{-1} \implies k = 1 \pm \sqrt{-1}, \]
and so we get no real solutions for k. As such, the theory above dictates that we take
= 1 and d = 1, so that
\[ y(x) = e^{x}\left(A \cos(x) + B \sin(x)\right), \]
where A and B are arbitrary constants, is the general solution to this homogeneous
second-order ODE.
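The three cases illustrated by Examples 8.10 to 8.12 can be summarised in a small Python sketch that inspects the discriminant of the auxiliary equation $ak^2 + bk + c = 0$. The function name and the tuple it returns are hypothetical choices, and the coefficients passed in below are read off from the auxiliary equations in the examples, i.e. $k^2 - k - 2 = 0$, $k^2 + 4k + 4 = 0$ and $k^2 - 2k + 2 = 0$.

```python
import math


def classify_auxiliary_roots(a, b, c):
    """Classify the roots of a*k^2 + b*k + c = 0.

    Returns one of:
      ("distinct", k1, k2)   -> y = A*exp(k1*x) + B*exp(k2*x)
      ("repeated", k)        -> y = (A + B*x)*exp(k*x)
      ("complex", re, im)    -> y = exp(re*x)*(A*cos(im*x) + B*sin(im*x))
    """
    disc = b * b - 4 * a * c
    if disc > 0:
        root = math.sqrt(disc)
        return ("distinct", (-b + root) / (2 * a), (-b - root) / (2 * a))
    if disc == 0:
        return ("repeated", -b / (2 * a))
    return ("complex", -b / (2 * a), math.sqrt(-disc) / abs(2 * a))


example_8_10 = classify_auxiliary_roots(1, -1, -2)
example_8_11 = classify_auxiliary_roots(1, 4, 4)
example_8_12 = classify_auxiliary_roots(1, -2, 2)
print(example_8_10, example_8_11, example_8_12)
```

The comparison `disc == 0` is exact here because the coefficients are small integers; with floating-point coefficients a tolerance would be a safer choice.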
8.3.2
If the function, $f(x)$, on the right-hand-side of our second-order ODE with constant
coefficients is non-zero, i.e. it has the form
\[ a \frac{d^2 y}{dx^2} + b \frac{dy}{dx} + cy = f(x), \]
with $f(x) \ne 0$, then we say that it is non-homogeneous. To solve such an ODE, we use
the following method.
We solve the corresponding homogeneous ODE, to find the function, $y_c(x)$, which is
often called the complementary function. That is, we solve
\[ a \frac{d^2 y_c}{dx^2} + b \frac{dy_c}{dx} + c y_c = 0, \]
\[ y_c(x) = A e^{2x} + B e^{-x}, \]
where $A$ and $B$ are arbitrary constants. Our first task is to find the particular
integral, $y_p(x)$, for each choice of $f(x)$. Once we have this, we can then find the
general solution, $y(x)$, of the relevant non-homogeneous second-order ODE by
simply taking $y(x) = y_c(x) + y_p(x)$.
For (i), we have $f(x) = 8$ and so we take $y_p(x) = \alpha$ where $\alpha$ is a constant that has to
be determined. To find $\alpha$, we note that $y_p'(x)$ and $y_p''(x)$ are both zero which means
that substituting them into the non-homogeneous second-order ODE, we get
\[ 0 - 0 - 2\alpha = 8 \implies \alpha = -4. \]
Thus, $y_p(x) = -4$ is the sought after particular integral and the general solution to
our non-homogeneous second-order ODE is
\[ y(x) = A e^{2x} + B e^{-x} - 4, \]
using $y(x) = y_c(x) + y_p(x)$.
For (ii), we have $f(x) = 6x$ and so we take $y_p(x) = \alpha + \beta x$ where $\alpha$ and $\beta$ are
constants that have to be determined. To find $\alpha$ and $\beta$, we note that $y_p'(x) = \beta$ and
$y_p''(x) = 0$ which means that substituting them into the non-homogeneous
second-order ODE yields
\[ 0 - \beta - 2(\alpha + \beta x) = 6x \implies -2\beta x - (2\alpha + \beta) = 6x. \]
Now these two expressions must be the same and so, looking at the coefficient of $x$
on both sides, we see that $\beta$ must be $-3$. Similarly, looking at the constant term on
both sides we see that $2\alpha + \beta$ must be zero, so as $\beta = -3$, this means that $\alpha$ must
be $3/2$. Thus, $y_p(x) = \frac{3}{2} - 3x$ is the sought after particular integral and the general
solution to our non-homogeneous second-order ODE is
\[ y(x) = A e^{2x} + B e^{-x} + \frac{3}{2} - 3x, \]
using $y(x) = y_c(x) + y_p(x)$.
For (iii), we have $f(x) = 20 e^{3x}$ and so we take $y_p(x) = \alpha e^{3x}$ where $\alpha$ is a constant
that has to be determined. To find $\alpha$, we note that $y_p'(x) = 3\alpha e^{3x}$ and $y_p''(x) = 9\alpha e^{3x}$
which means that substituting them into the non-homogeneous second-order ODE
yields
\[ 9\alpha e^{3x} - 3\alpha e^{3x} - 2(\alpha e^{3x}) = 20 e^{3x} \implies 4\alpha e^{3x} = 20 e^{3x} \implies \alpha = 5. \]
Thus, $y_p(x) = 5 e^{3x}$ is the sought after particular integral and the general solution to
our non-homogeneous second-order ODE is
\[ y(x) = A e^{2x} + B e^{-x} + 5 e^{3x}, \]
using $y(x) = y_c(x) + y_p(x)$.
A complication
Although we won't spend much time on such things, observe that if the function, $f(x)$,
in our non-homogeneous second-order ODE prompts us to try a particular integral,
$y_p(x)$, that is part of the complementary function (i.e. we can find values of the
arbitrary constants in $y_c(x)$ that make $y_c(x) = y_p(x)$), we have to be more subtle when
we choose our particular integral. However, this subtlety usually involves doing nothing
more than multiplying what we'd normally choose to be our particular integral by $x$.
Let's return to our previous example to see how this works.
Example 8.14 Following on from Example 8.13, find the general solution to the
non-homogeneous second-order ODE
\[ y'' - y' - 2y = f(x), \]
when $f(x) = 18 e^{2x}$.
We know that the complementary function, $y_c(x)$, for this non-homogeneous
second-order ODE is given by
\[ y_c(x) = A e^{2x} + B e^{-x}, \]
where $A$ and $B$ are arbitrary constants. Our task is to find the particular integral,
$y_p(x)$, in the case where $f(x) = 18 e^{2x}$ so that we can deduce the relevant general
solution.
Note: Here we would normally try $y_p(x) = \alpha e^{2x}$ but this is part of the
complementary function since, taking $A = \alpha$ and $B = 0$, we have $y_p(x) = y_c(x)$!
Our first reaction in this case would be to take $y_p(x) = \alpha e^{2x}$ where $\alpha$ is a constant
that has to be determined. To find $\alpha$, we note that $y_p'(x) = 2\alpha e^{2x}$ and $y_p''(x) = 4\alpha e^{2x}$
which means that substituting them into the non-homogeneous second-order ODE,
we get
\[ 4\alpha e^{2x} - 2\alpha e^{2x} - 2(\alpha e^{2x}) = 18 e^{2x}. \]
But now, the left-hand-side turns out to be zero,6 meaning that this equation for $\alpha$
has no solutions! That is, we can't determine $\alpha$ if we use this general form for $y_p(x)$!
Thus, the particular integral in this case can't have the general form $y_p(x) = \alpha e^{2x}$ as
we can't find an $\alpha$ that will make it work.
So, following the advice above, we try the next best thing which is our original
choice multiplied by $x$. That is, we try $y_p(x) = \alpha x e^{2x}$ where $\alpha$ is a constant that has
to be determined. To find $\alpha$, we note that writing $y_p(x)$ as $(\alpha x)(e^{2x})$ we can use the
product rule to get
\[ y_p'(x) = (\alpha)(e^{2x}) + (\alpha x)(2 e^{2x}) = (\alpha + 2\alpha x)(e^{2x}), \]
and
\[ y_p''(x) = (2\alpha)(e^{2x}) + (\alpha + 2\alpha x)(2 e^{2x}) = (4\alpha + 4\alpha x)(e^{2x}). \]
So, substituting these into the non-homogeneous second-order ODE, we get
\[ (4\alpha + 4\alpha x)(e^{2x}) - (\alpha + 2\alpha x)(e^{2x}) - 2(\alpha x)(e^{2x}) = 18 e^{2x} \implies 3\alpha e^{2x} = 18 e^{2x}, \]
which means that $\alpha$ can now be determined and is actually equal to 6. Thus,
$y_p(x) = 6x e^{2x}$ is the sought after particular integral and so the general solution to
our non-homogeneous second-order ODE is
\[ y(x) = A e^{2x} + B e^{-x} + 6x e^{2x}, \]
using $y(x) = y_c(x) + y_p(x)$.
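Particular integrals are easy to get wrong by a sign, so it can be worth substituting them back into the ODE. The sketch below does this numerically for the ODE $y'' - y' - 2y = f(x)$ of Examples 8.13 and 8.14, estimating the derivatives with central differences. The step size, the test points and all names are assumptions made for the illustration.

```python
import math


def residual(y, f, x, h=1e-4):
    """Estimate y'' - y' - 2y - f at x; close to zero if y is a particular integral."""
    first = (y(x + h) - y(x - h)) / (2 * h)
    second = (y(x + h) - 2 * y(x) + y(x - h)) / (h * h)
    return second - first - 2 * y(x) - f(x)


# Each entry pairs a candidate particular integral with its right-hand side f(x).
candidates = {
    "f(x) = 8": (lambda x: -4.0, lambda x: 8.0),
    "f(x) = 6x": (lambda x: 1.5 - 3 * x, lambda x: 6 * x),
    "f(x) = 20 e^{3x}": (lambda x: 5 * math.exp(3 * x), lambda x: 20 * math.exp(3 * x)),
    "f(x) = 18 e^{2x}": (lambda x: 6 * x * math.exp(2 * x), lambda x: 18 * math.exp(2 * x)),
}

worst = {
    name: max(abs(residual(y, f, x)) for x in (-1.0, 0.0, 0.5, 1.0))
    for name, (y, f) in candidates.items()
}
print(worst)
```

Every residual should be small compared with the size of $f(x)$; the remaining error comes from the finite-difference approximation rather than from the particular integrals themselves.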
Another example of this complication arises in Question 3(b) of the sample examination
paper in Appendix A.
8.4
We now turn our attention to systems of first-order ODEs. For instance, we may be
asked to find the functions $y_1(x)$ and $y_2(x)$ that simultaneously satisfy the ODEs
\[ \frac{dy_1}{dx} = f_1(y_1, y_2, x) \quad\text{and}\quad \frac{dy_2}{dx} = f_2(y_1, y_2, x), \]
where we are given the functions $f_1$ and $f_2$. Generally, $y_1$ and $y_2$ will appear on the
right-hand-sides of both these first-order ODEs and, in such cases, we say that they are
coupled as we can't solve one of them without using information contained in the other.
The procedure that we shall use to solve these involves rewriting the system of
first-order ODEs as a second-order ODE which can then be solved using the method
outlined in the previous section.
6
Actually, this shouldn't be a surprise since, taking $A = \alpha$ and $B = 0$ in our complementary function,
we still have a solution to the homogeneous second-order ODE and so putting this into the left-hand-side
must yield zero!
8.4.1
A simple system of coupled first-order ODEs will only involve linear combinations of
$y_1(x)$ and $y_2(x)$ on the right-hand-side, i.e. it will have the form
\[ \frac{dy_1}{dx} = a y_1(x) + b y_2(x) \quad\text{and}\quad \frac{dy_2}{dx} = c y_1(x) + d y_2(x), \]
for some constants $a$, $b$, $c$ and $d$. The procedure for solving this involves differentiating
the first equation (say) with respect to $x$ so that we get
\[ \frac{d^2 y_1}{dx^2} = a \frac{dy_1}{dx} + b \frac{dy_2}{dx}, \]
and then, using the second equation, we find that
\[ \frac{d^2 y_1}{dx^2} = a \frac{dy_1}{dx} + b \left(c y_1(x) + d y_2(x)\right), \]
which means that we have
\[ \frac{d^2 y_1}{dx^2} - a \frac{dy_1}{dx} - bc\, y_1(x) - bd\, y_2(x) = 0. \]
dy1
ay1 (x),
dx
and
\[ \frac{dy_2}{dx} = 3y_1 + 3y_2, \]
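\[ y_2 = \frac{1}{4}\left(\frac{dy_1}{dx} - 2y_1\right), \qquad (8.3) \]
and differentiating this with respect to $x$ gives
\[ \frac{dy_2}{dx} = \frac{1}{4}\left(\frac{d^2 y_1}{dx^2} - 2 \frac{dy_1}{dx}\right). \]
Consequently, if we substitute these two expressions into the second ODE, we get
\[ \frac{1}{4}\left(\frac{d^2 y_1}{dx^2} - 2 \frac{dy_1}{dx}\right) = 3y_1 + \frac{3}{4}\left(\frac{dy_1}{dx} - 2y_1\right) \implies \frac{d^2 y_1}{dx^2} - 5 \frac{dy_1}{dx} - 6y_1 = 0, \]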
which is our sought after second-order ODE in $y_1(x)$. As it is an homogeneous
second-order ODE with constant coefficients, this can be solved using the method in
Section 8.3.1. The auxiliary equation is
\[ k^2 - 5k - 6 = 0 \implies (k + 1)(k - 6) = 0, \]
which has two real solutions given by $k = -1$ and $k = 6$ which means that the
general solution for $y_1(x)$ is
\[ y_1(x) = A e^{-x} + B e^{6x}, \]
for arbitrary constants $A$ and $B$. To find the general solution for $y_2(x)$, we note that
using (8.3) and the fact that
\[ \frac{dy_1}{dx} = -A e^{-x} + 6B e^{6x}, \]
we get
\[ y_2(x) = \frac{1}{4}\left(-3A e^{-x} + 4B e^{6x}\right), \]
in terms of the same arbitrary constants $A$ and $B$ as before. Thus, the general
solution to this system of first-order ODEs is
\[ y_1(x) = A e^{-x} + B e^{6x} \quad\text{and}\quad y_2(x) = -\frac{3}{4} A e^{-x} + B e^{6x}, \]
for arbitrary constants $A$ and $B$.
However, we are also given the conditions $y_1(0) = 5$ and $y_2(0) = -2$ which imply that
\[ 5 = A + B \quad\text{and}\quad -2 = -\frac{3}{4} A + B. \]
Solving these two equations simultaneously, say by subtracting one from the other,
we see that $7 = 7A/4$ which gives $A = 4$ and then, the first equation gives $B = 1$.
Consequently, we find that
\[ y_1(x) = 4 e^{-x} + e^{6x} \quad\text{and}\quad y_2(x) = -3 e^{-x} + e^{6x} \]
is the particular solution of this system of first-order ODEs given the conditions
$y_1(0) = 5$ and $y_2(0) = -2$.
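A particular solution of a coupled system can be checked by substituting both functions into both ODEs. The sketch below does this for $y_1(x) = 4e^{-x} + e^{6x}$ and $y_2(x) = -3e^{-x} + e^{6x}$. It assumes that the first ODE of the system is $dy_1/dx = 2y_1 + 4y_2$, which is what rearranging (8.3) gives, and that the second is $dy_2/dx = 3y_1 + 3y_2$; the derivatives are written out by hand and the function names are illustrative.

```python
import math


def y1(x):
    return 4 * math.exp(-x) + math.exp(6 * x)


def y2(x):
    return -3 * math.exp(-x) + math.exp(6 * x)


def dy1(x):
    """Derivative of y1, differentiated by hand."""
    return -4 * math.exp(-x) + 6 * math.exp(6 * x)


def dy2(x):
    """Derivative of y2, differentiated by hand."""
    return 3 * math.exp(-x) + 6 * math.exp(6 * x)


def residuals(x):
    """Left-hand side minus right-hand side for each ODE (assumed forms, see above)."""
    r1 = dy1(x) - (2 * y1(x) + 4 * y2(x))
    r2 = dy2(x) - (3 * y1(x) + 3 * y2(x))
    return r1, r2


checks = [residuals(x) for x in (0.0, 0.5, 1.0)]
print(y1(0.0), y2(0.0), checks)
```

Both residuals vanish up to rounding error, and the values at $x = 0$ reproduce the two given conditions.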
It is worth noting that systems of equations of the form encountered here can also be
solved using diagonalisation in much the same way as systems of difference equations
are solved in Section 11.2 of 173 Algebra.
8.4.2
Systems of first-order ODEs become more complicated when they involve more
complicated functions on the right-hand-side. The method for solving them remains the
same, but a little more care must be taken as the following example illustrates.
Example 8.16 Find the functions $y_1(x)$ and $y_2(x)$ that satisfy the system of
first-order ODEs given by
\[ \frac{dy_1}{dx} = -4y_1 + 2y_2 \quad\text{and}\quad \frac{dy_2}{dx} = -2y_1 + 4x^2 + 4, \]
with the conditions $y_1(0) = 1$ and $y_2(0) = 7/2$.
We will solve this by rewriting this system as a second-order ODE in y1 (x). To do
this we note that, rearranging the first ODE gives us
\[ y_2 = \frac{1}{2}\left(\frac{dy_1}{dx} + 4y_1\right), \qquad (8.4) \]
d2 y1
dy1
+4
2
dx
dx
dy1
d2 y 1
+4
2
dx
dx
= 2y1 + 4x + 4,
(k + 2)2 = 0,
and
y1 (x) = 2,
= 4,
= 5.
\[ \frac{1}{2}\left\{\left[(B - 2A - 2Bx) e^{-2x} + 4x - 4\right] + 4\left[(A + Bx) e^{-2x} + 2x^2 - 4x + 5\right]\right\}. \]
Once we have these general solutions, we can use the initial conditions $y_1(0) = 1$ and
$y_2(0) = 7/2$ to get the equations
\[ 1 = A + 5 \quad\text{and}\quad \frac{7}{2} = \frac{1}{2} B + A + 8, \]
8.5
Applications of ODEs
Differential equations are used widely in economics-based subjects and, in Section 5.4.1,
we saw a very simple application when we considered marginal functions. Here, we will
consider a few more examples that are a bit more sophisticated.
8.5.1
\[ \varepsilon(p) = -\frac{p}{q} \frac{dq}{dp}, \]
where q = q D (p) is the demand function. If we know the elasticity of demand, we can
use this and our knowledge of ODEs to determine the demand function.
Example 8.17 Suppose that the elasticity of demand is a constant, i.e. $\varepsilon(p) = r$ for
all $p$ where $r$ is a positive constant. Find the demand function if $q^D(1) = 2$.
Using the definition of the elasticity of demand, this gives us
\[ -\frac{p}{q} \frac{dq}{dp} = r \implies \frac{1}{q} \frac{dq}{dp} = -\frac{r}{p}, \]
and so this is a separable first-order ODE. Solving this using the method in
Section 8.2.1, we write this as
\[ \int \frac{1}{q}\,dq = -\int \frac{r}{p}\,dp \implies \ln|q| = -r \ln|p| + c, \]
where $c$ is an arbitrary constant, so that taking exponentials of both sides gives
\[ q = e^{-r \ln p + c} = e^c p^{-r}, \]
where we can remove the modulus signs since, economically, $q$ and $p$ are both
positive. Then, using the fact that $q^D(1) = 2$, we see that $e^c = 2$ and so
\[ q = q^D(p) = \frac{2}{p^r}, \]
8.5.2
Suppose that the price of some commodity varies continuously with time and that its
initial price is not equal to its equilibrium price. We might expect that, as time
progresses, the price of the commodity will tend to its equilibrium price but to be sure,
we need to have a model of how the price of the commodity is varying with time. One
such model involves looking at how the rate of change of the price of the commodity is
related to the excess of demand over supply.
Suppose that the price of the commodity as a function of time is p(t) and that the
market for this commodity is governed by the demand function, q D (p), and the supply
function, q S (p). This means that, at any time, t, as the price is p(t), the quantity being
demanded is given by q D (p(t)) and the quantity being supplied is given by q S (p(t)). As
such, we can define the excess of demand over supply to be the function of p(t) given by
(p(t)) = q D (p(t)) q S (p(t)),
i.e. the difference between these two quantities. Clearly, this means that if p(t) is such
that:
(p(t)) > 0, demand outstrips supply and so the price should rise, i.e. p (t) > 0.
(p(t)) = 0, demand equals supply and we should have equilibrium, i.e. p (t) = 0.
(p(t)) < 0, supply outstrips demand and so the price should fall, i.e. p (t) < 0.
This suggests that the rate of change of the market price with time, i.e. p (t), should be
given by some function f of the excess of demand over supply, (p(t)), i.e. we have a
model where
dp
= f ((p(t)))
dt
with
Then, by solving this first-order ODE, we can find out how the market price varies with
time and hence assess the stability of the market by considering what it does as $t \to \infty$.
To see how this works, let's consider an example.
Example 8.18
and
\[ q^S(p) = 3p - 1, \]
respectively. If the rate of change of the market price is given by three times the
excess of demand over supply, find the ODE that describes how p(t) changes with
time.
We start by calculating the excess of demand over supply which is given by
(p(t)) = q D (p(t)) q S (p(t)) = [5 2p(t)] [3p(t) 1] = 6 5p(t).
We then know that the rate of change of demand over supply is given by three times
the excess, i.e.
6
dp
= 3(p(t)) = 3[6 5p(t)] = 15 p(t)
dt
5
This is a separable first-order ODE and we can easily solve it using the method in
Section 8.2.1.
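Before solving this ODE exactly, it can help to see roughly how the price behaves. The following Python sketch is only a numerical illustration: it steps the ODE of Example 8.18 forward with Euler's method, which is an assumed scheme and not the method of Section 8.2.1, and the function names are hypothetical.

```python
def excess_demand(p):
    """Excess of demand over supply in Example 8.18: (5 - 2p) - (3p - 1)."""
    return (5 - 2 * p) - (3 * p - 1)


def price_rate(p):
    """dp/dt, taken to be three times the excess of demand over supply."""
    return 3 * excess_demand(p)


def simulate_price(p0, t_end=2.0, steps=20000):
    """Approximate p(t_end) from p(0) = p0 using Euler's method (an assumed scheme)."""
    dt = t_end / steps
    p = p0
    for _ in range(steps):
        p += dt * price_rate(p)
    return p


if __name__ == "__main__":
    for p0 in (0.5, 1.2, 3.0):
        print(f"p(0) = {p0}: p(2) is approximately {simulate_price(p0):.6f}")
```

Whichever of these starting prices is used, the simulated price should settle close to 6/5, the price at which the excess of demand over supply is zero.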
Activity 8.9 Solve the separable first-order ODE found in Example 8.18 and use it
to determine how the market price changes over time if the initial price is p(0). How
does the market price behave in the long-term?
8.5.3
In Section 6.1.5 of 173 Algebra, you saw how to find the balance, B(t), of a bank
account that utilises continuously compounded interest at an annual equivalent rate of
100r%. Another way of thinking about this is to say that, at any time, $t$, the rate of
increase of the balance, $B'(t)$, is given by $rB(t)$. This means that we have
$$\frac{dB}{dt} = rB(t),$$
and this is a simple separable first-order ODE that can be solved, using the method in
Section 8.2.1, to get
$$B(t) = P e^{rt},$$
where B(0) = P is the initial balance. As such, we can see that this way of thinking
about continuous compounding gives us an alternative way of deriving the formula you
saw in Section 6.1.5 of 173 Algebra.
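To see why the exponential formula is a sensible description of continuous compounding, we can compare it with interest that is compounded a large number of times per year. The sketch below assumes the standard formula $P(1 + r/n)^{nt}$ for compounding $n$ times per year; the function names and the sample figures are illustrative only.

```python
import math


def balance_continuous(P, r, t):
    """Balance B(t) = P e^{rt} under continuous compounding."""
    return P * math.exp(r * t)


def balance_compounded(P, r, t, n):
    """Balance after t years when interest at rate r is compounded n times a year
    (standard formula, assumed here rather than quoted from the guide)."""
    return P * (1 + r / n) ** (n * t)


if __name__ == "__main__":
    P, r, t = 100.0, 0.05, 10
    for n in (1, 12, 365, 10**6):
        print(n, balance_compounded(P, r, t, n))
    print("continuous:", balance_continuous(P, r, t))
```

As $n$ grows, the compounded balance should approach the continuous one from below.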
Activity 8.10 Verify that solving this separable first-order ODE will give the
solution above.
However, we can actually use ODEs to find the balance of a bank account which uses
continuously compounded interest in the presence of more complicated investment
schemes. For instance, if we take the bank account above and suppose that money is
added to the account at a rate given by $f(t)$, we see that the balance, $B(t)$, is now
given by
$$\frac{dB}{dt} = rB(t) + f(t) \implies \frac{dB}{dt} - rB(t) = f(t),$$
which is a linear first-order ODE. And, of course, we could also have the situation where
money is deducted from the account at a rate given by $f(t)$, and then we see that the
balance, $B(t)$, would be given by
$$\frac{dB}{dt} = rB(t) - f(t) \implies \frac{dB}{dt} - rB(t) = -f(t),$$
which is again a linear first-order ODE.
The rate of change of $B_Y(t)$ is then given by the sum of $r_Y B_Y(t)$, which is the
continuously compounded interest accrued on the balance in account Y, and $r_X P_X$
which, as we have just seen, is the continuously compounded interest accrued in
account X. That is, for $t \geq 0$, we have
$$\frac{dB_Y}{dt} = r_Y B_Y(t) + r_X P_X \implies \frac{dB_Y}{dt} - r_Y B_Y(t) = r_X P_X,$$
which is a linear first-order ODE and we can easily solve this, subject to the
condition that $B_Y(0) = P_Y$, using the method in Section 8.2.2.
Activity 8.11 Solve the linear first-order ODE found in Example 8.19 and use it to
determine the balance in account Y at any time $t \geq 0$.
8.5.4 Market trends
In some markets, the equilibrium price will change with time and so it is useful for
consumers to try and anticipate trends. That is, the consumer will keep an eye on the
current equilibrium price, but they will also look at the rate at which the price is rising
or falling and whether this rate of change is speeding up or slowing down. We can
represent these three considerations mathematically by using $p(t)$, $p'(t)$ and $p''(t)$
respectively and, by considering how these affect the quantity being supplied or
demanded, we can model how the price itself is varying with time by using an ODE.
Let's look at an example.
Example 8.20
Suppose that the supply and demand functions for a market are given by
$$q^S = -3 + 4p(t) - \frac{dp}{dt} - \frac{d^2p}{dt^2},$$
and
$$q^D = 9 - 2p(t) + 6\frac{dp}{dt} - 2\frac{d^2p}{dt^2}.$$
Find the ODE that determines the equilibrium price at any time $t \geq 0$.
Here we have linear supply and demand functions which have been modified to take
a trend into account. To find the equilibrium price at any time $t \geq 0$, we need to
determine the function, $p(t)$, that makes the amount supplied equal to the amount
demanded, i.e.
$$-3 + 4p(t) - \frac{dp}{dt} - \frac{d^2p}{dt^2} = 9 - 2p(t) + 6\frac{dp}{dt} - 2\frac{d^2p}{dt^2}.$$
But, rearranging this, we get the non-homogeneous second-order ODE with constant
coefficients given by
$$\frac{d^2p}{dt^2} - 7\frac{dp}{dt} + 6p(t) = 12,$$
which we can solve using the method in Section 8.3.2.
Activity 8.12 Solve the second-order ODE found in Example 8.20 and use it to
determine how the equilibrium price changes if $p(0) = 7$ and $p'(0) = 15$. How does
this equilibrium price behave in the long-term?
Learning outcomes
At the end of this chapter and having completed the relevant reading and activities, you
should be able to:
identify and solve separable, linear and homogeneous first-order ODEs and other
first-order ODEs that can be solved by a given substitution;
Solutions to activities
Solution to activity 8.1
Looking at the given ODEs, we see that:
(a) is second-order of first degree,
(b) is second-order of second degree, and
(c) is third-order of first degree.
Here we find the highest order derivative to determine the order and then the algebraic
degree (or power) of this derivative determines the degree.
Solution to activity 8.2
We have the general solution
$$y(x) = x^2 + x + c,$$
and, setting $x = 2$ and using the condition $y(2) = 7$, we get $7 = 2^2 + 2 + c$,
which means that we must take $c = 1$ in the general solution to see that
$$y(x) = x^2 + x + 1,$$
is the particular solution to the ODE given that y(2) = 7. Observe that this is the
same particular solution as the one we found with y(0) = 1 in Example 8.2 but that
it arises from a condition that specifies information about y(x) at a different value
of x.
Solution to activity 8.3
We have the general solution
$$y(x) = 1 + A e^{x^2},$$
and we want to find the particular solutions corresponding to:
(a) $y(0) = 2$. So, setting $x = 0$ in both sides of this expression and using the condition,
we get
$$y(0) = 1 + A e^0 \implies 2 = 1 + A,$$
which means that we must take $A = 1$ in the general solution to see that
$$y(x) = 1 + e^{x^2},$$
is the particular solution to the ODE given that $y(0) = 2$.
(b) $y(0) = 0$. So, setting $x = 0$ in both sides of this expression and using the condition,
we get
$$y(0) = 1 + A e^0 \implies 0 = 1 + A,$$
which means that we must take $A = -1$ in the general solution to see that
$$y(x) = 1 - e^{x^2},$$
is the particular solution to the ODE given that $y(0) = 0$.
If we want a value of $y(1)$ that will give us the same particular solution as the one found
in (a), i.e.
$$y(x) = 1 + e^{x^2},$$
we put $x = 1$ into both sides of this expression to get
$$y(1) = 1 + e^1 = 1 + e.$$
That is, the condition $y(1) = 1 + e$ gives us the same particular solution as the one we
found in (a).
Solution to activity 8.4
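Here we have to solve the separable first-order ODE
$$\frac{2}{x} = \frac{1}{y + 3}\frac{dy}{dx},$$
with $M(x) = 2/x$ and $N(y) = (y + 3)^{-1}$. Using the method in Section 8.2.1, we write
this as
$$\int \frac{2}{x}\,dx = \int \frac{dy}{y + 3},$$
and, determining the integrals and rearranging, we get
$$y = -3 \pm e^{c} x^2,$$
where $c$ is an arbitrary constant.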
$$\int -\frac{1}{x}\,dx = -\ln|x| + c,$$
and so we see that $-\ln x$ is an antiderivative of $-1/x$. This means that the integrating
factor is
$$\mu(x) = e^{-\ln x} = e^{\ln x^{-1}} = x^{-1},$$
and so we have
$$\mu(x)y(x) = \int \mu(x)Q(x)\,dx \implies x^{-1}y(x) = \int x^{-1}\,dx \implies x^{-1}y(x) = \ln|x| + c,$$
dy
+ P (x)y = Q(x),
dx
dy
= 0 in the form
dx
5
x3
dy
y = 3,
dx 4x
4y
and this is not linear due to the presence of the 1/y 3 on the right-hand-side.
Solution to activity 8.8
In Example 8.17, we found that $q^D(p) = 2/p^r$ where $r$ is a positive constant. As such,
we can see that $q^D(p) \to \infty$ as $p \to 0^+$ and $q^D(p) \to 0$ as $p \to \infty$.
Solution to activity 8.9
Using the method in Section 8.2.1, we write the ODE as
$$\int \frac{dp}{p - \frac{6}{5}} = -15\int dt \implies \ln\left|p - \frac{6}{5}\right| = -15t + c \implies \left|p - \frac{6}{5}\right| = e^{-15t + c} = e^{c} e^{-15t}.$$
Now, we remove the modulus bars and compensate for this loss by replacing $e^c$ (which
must be positive) with the constant $A$ (which can be negative), to get
$$p(t) = \frac{6}{5} + A e^{-15t},$$
which is the general solution. Then, given that the initial price is $p(0)$, we see that
$$p(0) = \frac{6}{5} + A e^{0} \implies A = p(0) - \frac{6}{5},$$
and so we have
$$p(t) = \frac{6}{5} + \left[p(0) - \frac{6}{5}\right] e^{-15t},$$
which tells us how the market price changes over time if the initial price is $p(0)$. In
particular, if we have a $p(0)$ such that:
$p(0) > 6/5$, since $e^{-15t} \to 0$ as $t \to \infty$, $p(t)$ will decrease to $6/5$.
$p(0) = 6/5$, we find that $p(t) = 6/5$ for all $t \geq 0$.
$p(0) < 6/5$, since $e^{-15t} \to 0$ as $t \to \infty$, $p(t)$ will increase to $6/5$.
Indeed, as you should be able to verify, 6/5 is the equilibrium price for this market and
so, in this case, regardless of the choice of p(0), the market is either in equilibrium or
tends to equilibrium in the long-term.
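Solution to activity 8.10
To solve the separable first-order ODE
$$\frac{dB}{dt} = rB(t),$$
we write it as
$$\int \frac{dB}{B} = \int r\,dt \implies \ln|B| = rt + c \implies B = e^{rt + c} = e^{c} e^{rt},$$
where we can remove the modulus sign since, economically, $B$ is positive. Then, using
the fact that $B(0) = P$, we see that $e^c = P$ and so
$$B(t) = P e^{rt},$$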
as we would expect.
Solution to activity 8.11
We have to solve the linear first-order ODE
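$$\frac{dB_Y}{dt} - r_Y B_Y(t) = r_X P_X,$$
subject to the condition that $B_Y(0) = P_Y$. The integrating factor is given by
$$e^{\int (-r_Y)\,dt} = e^{-r_Y t},$$
and so we have
$$e^{-r_Y t} B_Y = \int r_X P_X e^{-r_Y t}\,dt \implies e^{-r_Y t} B_Y = -P_X \frac{r_X}{r_Y} e^{-r_Y t} + c \implies B_Y(t) = -P_X \frac{r_X}{r_Y} + c\,e^{r_Y t}.$$
Then, using the condition $B_Y(0) = P_Y$, we get
$$P_Y = -P_X \frac{r_X}{r_Y} + c \implies c = P_Y + P_X \frac{r_X}{r_Y},$$
and so the balance in account Y at any time $t \geq 0$ is
$$B_Y(t) = -P_X \frac{r_X}{r_Y} + \left[P_Y + P_X \frac{r_X}{r_Y}\right] e^{r_Y t} = P_Y e^{r_Y t} + P_X \frac{r_X}{r_Y}\left[e^{r_Y t} - 1\right],$$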
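As a quick sanity check on this kind of solution, we can confirm numerically that the closed form $B_Y(t) = P_Y e^{r_Y t} + P_X (r_X/r_Y)[e^{r_Y t} - 1]$ satisfies both the initial condition and the ODE. The Python sketch below uses a central difference to estimate the derivative; the function names and the sample rates and balances are illustrative assumptions, not figures from the guide.

```python
import math


def balance_y(t, r_x, r_y, p_x, p_y):
    """Closed-form balance: P_Y e^{r_Y t} + P_X (r_X / r_Y) (e^{r_Y t} - 1)."""
    growth = math.exp(r_y * t)
    return p_y * growth + p_x * (r_x / r_y) * (growth - 1)


def ode_residual(t, r_x, r_y, p_x, p_y, h=1e-5):
    """dB_Y/dt - r_Y B_Y(t) - r_X P_X, with dB_Y/dt from a central difference."""
    derivative = (balance_y(t + h, r_x, r_y, p_x, p_y)
                  - balance_y(t - h, r_x, r_y, p_x, p_y)) / (2 * h)
    return derivative - r_y * balance_y(t, r_x, r_y, p_x, p_y) - r_x * p_x


if __name__ == "__main__":
    # Hypothetical figures: 3% on account X, 5% on account Y.
    args = (0.03, 0.05, 1000.0, 500.0)
    print("B_Y(0) =", balance_y(0.0, *args))
    print("residual at t = 2:", ode_residual(2.0, *args))
```

A residual close to zero indicates that the formula is consistent with the ODE, up to the error of the finite-difference estimate.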
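Solution to activity 8.12
We have to solve the non-homogeneous second-order ODE
$$\frac{d^2p}{dt^2} - 7\frac{dp}{dt} + 6p(t) = 12,$$
using the method in Section 8.3.2. The corresponding homogeneous ODE is
$$\frac{d^2p}{dt^2} - 7\frac{dp}{dt} + 6p(t) = 0,$$
and so the auxiliary equation is
$$k^2 - 7k + 6 = 0 \implies (k - 1)(k - 6) = 0,$$
which has the two real solutions $k = 1$ and $k = 6$, so the complementary function is
$A e^{t} + B e^{6t}$. As the right-hand-side is the constant 12, we try a constant particular
integral and find that $6p = 12$, i.e. $p = 2$. The general solution is therefore
$$p(t) = A e^{t} + B e^{6t} + 2,$$
where $A$ and $B$ are arbitrary constants. The initial condition $p(0) = 7$ then gives us
$$A + B = 5,$$
and since
$$p'(t) = A e^{t} + 6B e^{6t},$$
the other initial condition, $p'(0) = 15$, gives us
$$15 = A + 6B.$$
Solving these equations, say by subtracting one from the other, we get $5B = 10$ which
gives us $B = 2$ and so, from the first equation, $A = 3$. Consequently, the particular
solution we seek is
$$p(t) = 3 e^{t} + 2 e^{6t} + 2,$$
and this describes how the equilibrium price changes with time. Indeed, in the
long-term, as both $3 e^{t}$ and $2 e^{6t}$ tend to infinity as $t \to \infty$, we see that $p(t) \to \infty$ too.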
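A particular solution of this kind is easy to check by substituting it back into the ODE. The Python sketch below does this for $p(t) = 3e^{t} + 2e^{6t} + 2$, using derivatives worked out by hand; the function names are illustrative.

```python
import math


def p(t):
    """Candidate equilibrium price p(t) = 3 e^t + 2 e^{6t} + 2."""
    return 3 * math.exp(t) + 2 * math.exp(6 * t) + 2


def dp(t):
    """First derivative, differentiated by hand: 3 e^t + 12 e^{6t}."""
    return 3 * math.exp(t) + 12 * math.exp(6 * t)


def d2p(t):
    """Second derivative, differentiated by hand: 3 e^t + 72 e^{6t}."""
    return 3 * math.exp(t) + 72 * math.exp(6 * t)


def residual(t):
    """Left-hand side minus right-hand side of p'' - 7p' + 6p = 12."""
    return d2p(t) - 7 * dp(t) + 6 * p(t) - 12


if __name__ == "__main__":
    print("p(0) =", p(0.0), " p'(0) =", dp(0.0))
    print("residual at t = 0.5:", residual(0.5))
```

The initial values should come out as 7 and 15, and the residual should vanish up to rounding error.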
Exercises
Exercise 8.1
Find the general solution of the ODE
$$\frac{dy}{dx} + \frac{xy}{1 + x^2} = \sqrt{1 + x^2}.$$
What is the particular solution if y(0) = 1?
Exercise 8.2
Use the substitution $w(t) = y'(t)$ to show that the ODE
$$\frac{d^2y}{dt^2} - \frac{3}{t}\frac{dy}{dt} = 3,$$
can be written as a linear ODE in terms of w(t). Solve this linear ODE for w(t) and
hence find the general solution of the original ODE.
Exercise 8.3
Find the particular solution of the ODE
$$y''(t) - 5y'(t) + 6y(t) = 10\sin t,$$
given that $y(0) = 0$ and $y'(0) = 1$.
Exercise 8.4
The functions f (t) and g(t) are related by the first-order ODEs
$$f'(t) = 3f(t) - g(t)$$
and
2p2
,
p2 + 1
Solutions to exercises
Solution to exercise 8.1
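We solve this linear first-order ODE using the method in Section 8.2.2. Here
$P(x) = x/(1 + x^2)$ and we start by seeing that the integral
$$\int P(x)\,dx = \int \frac{x}{1 + x^2}\,dx = \tfrac{1}{2}\ln|1 + x^2| + c,$$
and so $\tfrac{1}{2}\ln(1 + x^2)$ is an antiderivative of $P(x)$, which means that the integrating
factor is
$$\mu(x) = e^{\frac{1}{2}\ln(1 + x^2)} = \sqrt{1 + x^2}.$$
Then, as $Q(x) = \sqrt{1 + x^2}$, we have
$$\mu(x)y(x) = \int \mu(x)Q(x)\,dx \implies y(x)\sqrt{1 + x^2} = \int (1 + x^2)\,dx \implies y(x)\sqrt{1 + x^2} = x + \frac{x^3}{3} + c,$$
where $c$ is an arbitrary constant. As such, we find that
$$y(x) = \frac{x}{\sqrt{1 + x^2}} + \frac{x^3}{3\sqrt{1 + x^2}} + \frac{c}{\sqrt{1 + x^2}},$$
is the general solution. Then, using the condition $y(0) = 1$, we see that $c = 1$ and so
$$y(x) = \frac{3x + x^3 + 3}{3\sqrt{1 + x^2}},$$
is the particular solution.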
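Solution to exercise 8.2
Using the substitution $w(t) = y'(t)$, so that $w'(t) = y''(t)$, the ODE
$$\frac{d^2y}{dt^2} - \frac{3}{t}\frac{dy}{dt} = 3 \quad\text{becomes}\quad \frac{dw}{dt} - \frac{3}{t}w(t) = 3,$$
which is a linear first-order ODE for $w(t)$. Using the method in Section 8.2.2 with
$P(t) = -3/t$, we see that
$$\int -\frac{3}{t}\,dt = -3\ln|t| + c,$$
and so $-3\ln t$ is an antiderivative of $-3/t$ which means that the integrating factor,
$\mu(t)$, is given by
$$\mu(t) = e^{-3\ln t} = e^{\ln(t^{-3})} = t^{-3}.$$
Then, as $Q(t) = 3$, we have
$$\mu(t)w(t) = \int \mu(t)Q(t)\,dt \implies t^{-3}w(t) = \int 3t^{-3}\,dt \implies t^{-3}w(t) = -\frac{3}{2}t^{-2} + c,$$
where $c$ is an arbitrary constant, and so
$$w(t) = -\frac{3t}{2} + ct^3.$$
Then, as $w(t) = y'(t)$, we have
$$y(t) = \int w(t)\,dt = \int \left[-\frac{3t}{2} + ct^3\right] dt = -\frac{3}{4}t^2 + \frac{c}{4}t^4 + d,$$
where $d$ is another arbitrary constant. This is the general solution of the original ODE.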
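We can check a general solution like this by substituting it back into the original ODE for a few values of the arbitrary constant. The Python sketch below does so for $y(t) = -\tfrac{3}{4}t^2 + \tfrac{c}{4}t^4 + d$, using derivatives found by hand; the function names are illustrative, and $d$ does not appear because it vanishes on differentiation.

```python
def y_prime(t, c):
    """y'(t) = -3t/2 + c t^3, differentiated by hand."""
    return -1.5 * t + c * t ** 3


def y_double_prime(t, c):
    """y''(t) = -3/2 + 3c t^2, differentiated by hand."""
    return -1.5 + 3 * c * t ** 2


def residual(t, c):
    """y'' - (3/t) y' - 3, which should be zero for every t != 0 and every c."""
    return y_double_prime(t, c) - (3 / t) * y_prime(t, c) - 3


if __name__ == "__main__":
    for c in (-2.0, 0.0, 1.5):
        print(c, residual(0.7, c), residual(3.0, c))
```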
Solution to exercise 8.3
The given ODE is a non-homogeneous second-order ODE with constant coefficients and
we solve it using the method of Section 8.3.2. In particular:
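The corresponding homogeneous second-order ODE is
$$y''(t) - 5y'(t) + 6y(t) = 0,$$
and so the auxiliary equation is
$$k^2 - 5k + 6 = 0 \implies (k - 2)(k - 3) = 0,$$
which has the two real solutions $k = 2$ and $k = 3$, so the complementary function is
$A e^{2t} + B e^{3t}$.
The right-hand-side of the given ODE is $10\sin t$ and this suggests that we try a
particular integral of the form
$$y_p(t) = \alpha\sin t + \beta\cos t.$$
We differentiate this twice to get
$$y_p'(t) = \alpha\cos t - \beta\sin t \quad\text{and}\quad y_p''(t) = -\alpha\sin t - \beta\cos t,$$
and, substituting these into the ODE and equating the coefficients of $\sin t$ and $\cos t$, we get
$$\alpha + \beta = 2 \quad\text{and}\quad \alpha = \beta,$$
and so, solving these two equations simultaneously, we find that $\alpha = 1$ and $\beta = 1$.
Consequently, we see that
$$y_p(t) = \sin t + \cos t,$$
is the particular integral.
The general solution is then given by the sum of its complementary function and
its particular integral, i.e. we have
$$y(t) = A e^{2t} + B e^{3t} + \sin t + \cos t,$$
where $A$ and $B$ are arbitrary constants.
We can now use the initial condition $y(0) = 0$ to see that
$$0 = A + B + 0 + 1 \implies A + B = -1,$$
and, as
$$y'(t) = 2A e^{2t} + 3B e^{3t} + \cos t - \sin t,$$
the initial condition $y'(0) = 1$ gives us
$$1 = 2A + 3B + 1 - 0 \implies 2A + 3B = 0.$$
Solving these two equations simultaneously gives $A = -3$ and $B = 2$, so the particular
solution is $y(t) = -3e^{2t} + 2e^{3t} + \sin t + \cos t$.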
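$$g(t) = 3f(t) - \frac{df}{dt}. \qquad (8.6)$$
Differentiating the first ODE with respect to $t$ gives
$$\frac{d^2f}{dt^2} = 3\frac{df}{dt} - \frac{dg}{dt},$$
and, using the second ODE together with (8.6) to eliminate $g(t)$, we get a homogeneous
second-order ODE for $f(t)$ whose auxiliary equation is
$$k^2 - 6k + 8 = 0 \implies (k - 2)(k - 4) = 0,$$
which has two real solutions given by $k = 2$ and $k = 4$ which means that the general
solution for $f(t)$ is
$$f(t) = A e^{2t} + B e^{4t},$$
for arbitrary constants $A$ and $B$. To find the general solution for $g(t)$, we note that
using (8.6) and the fact that
$$f'(t) = 2A e^{2t} + 4B e^{4t},$$
we get
$$g(t) = 3[A e^{2t} + B e^{4t}] - [2A e^{2t} + 4B e^{4t}] = A e^{2t} - B e^{4t},$$
in terms of the same arbitrary constants $A$ and $B$ as before. Thus, the general solution
to this system of first-order ODEs is
$$f(t) = A e^{2t} + B e^{4t} \quad\text{and}\quad g(t) = A e^{2t} - B e^{4t}.$$
Then, using the given initial conditions, we get
$$2 = A + B \quad\text{and}\quad 0 = A - B.$$
Solving these two equations simultaneously then gives us $A = 1$ and $B = 1$ which means
that
$$f(t) = e^{2t} + e^{4t} \quad\text{and}\quad g(t) = e^{2t} - e^{4t},$$
are the sought after functions.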
Solution to exercise 8.5
Using the definition of elasticity with $q = q^D(p)$ and the given expression we have
$$-\frac{p}{q}\frac{dq}{dp} = \frac{2p^2}{p^2 + 1} \implies \frac{1}{q}\frac{dq}{dp} = -\frac{2p}{p^2 + 1},$$
which is a separable ODE. So, using the method of Section 8.2.1, we write this as
$$\int \frac{dq}{q} = -\int \frac{2p}{p^2 + 1}\,dp,$$
and determine the integrals to get
$$\ln|q| = -\ln|p^2 + 1| + c \implies q = e^{c - \ln(p^2 + 1)} = e^{c} e^{\ln(p^2 + 1)^{-1}} = e^{c}(p^2 + 1)^{-1} = \frac{e^{c}}{p^2 + 1},$$
where we can remove the modulus signs since, economically, $q$ is positive and $p^2 + 1$ is
always positive too. Then, using the fact that $q = 4$ when $p = 1$, we see that $e^c = 8$ and
so
$$q = q^D(p) = \frac{8}{p^2 + 1},$$
is the sought after demand function.
9 Here we have implicitly used the substitution $u = 1 + p^2$ to determine the integral on the
right-hand-side.
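To confirm that a demand function recovered in this way has the required elasticity, we can compute $-(p/q)(dq/dp)$ directly. The Python sketch below does this for $q^D(p) = 8/(p^2 + 1)$ with the derivative found by hand; the function names are illustrative, and the sign convention for elasticity is the one used in the working above.

```python
def demand(p):
    """Demand function q = 8 / (p^2 + 1)."""
    return 8 / (p ** 2 + 1)


def demand_slope(p):
    """dq/dp = -16p / (p^2 + 1)^2, differentiated by hand."""
    return -16 * p / (p ** 2 + 1) ** 2


def elasticity(p):
    """Elasticity of demand, taken here to be -(p/q)(dq/dp)."""
    return -(p / demand(p)) * demand_slope(p)


if __name__ == "__main__":
    print("q(1) =", demand(1.0))
    for p in (0.5, 1.0, 3.0):
        print(p, elasticity(p), 2 * p ** 2 / (p ** 2 + 1))
```

The two printed columns should agree, and $q = 4$ when $p = 1$ as required.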
Appendix A
Sample examination paper
Important note: This sample examination paper reflects the intended examination
and assessment arrangements for this course in the academic year 2011-2012. The
format and structure of the examination may have changed since the publication of this
subject guide. You can find the most recent examination papers on the VLE where all
changes to the format of the examination are posted.
Calculus
Time allowed: THREE hours.
Candidates should answer all FIVE questions. All questions carry equal marks (20
marks each).
Calculators may not be used for this paper.
1. (a) (i) Find $\int t\cos t\,dt$.
$$f(x, y) = x^2 - 2x - y^3 + y^2 + y.$$
Find the regions, if any, in the (x, y)-plane where f is convex, concave or
neither.
Does f have a global minimum or a global maximum? Justify your answer.
(b) Find the general solution of the differential equation
$$y''(t) - 2y'(t) + y(t) = e^{t}.$$
$$2\sqrt{vw}\,Q^{\frac{1}{2\alpha}},$$
where each unit of capital costs v and each unit of labour costs w. By sketching the
constraint and some appropriate contours, you should justify your use of the
method of Lagrange multipliers and explain why your answer is a minimum.
The product manufactured by the firm sells at a fixed price, p, and the raw
materials required to produce each unit cost an amount, c, where c < p. If the firm
acts in a way which minimises its capital and labour costs, use the result just
obtained to determine the production level, Q, that will maximise its profit.
5. (a) Find the fifth-order Maclaurin series for $e^{\sin x}$.
(b) Determine the integral
$$\int \frac{\cos x}{(1 - \sin x)(2 + \sin x)}\,dx.$$
(c) Find and classify the stationary points of the function
$$f(x) = \frac{x}{12} - \sqrt[3]{x}.$$
Appendix B
Solutions to the sample examination
paper
Question 1.
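(a) For (i), we use integration by parts to see that, differentiating the $t$ and integrating
the $\cos t$, we get
$$\int t\cos t\,dt = t\sin t - \int \sin t\,dt = t\sin t + \cos t + c,$$
where $c$ is an arbitrary constant.
For (ii), the differential equation is
$$\frac{x^3}{\cos(y/x)} + xy^2 - x^2 y\frac{dy}{dx} = 0,$$
which is of the form $M(x, y) + N(x, y)\frac{dy}{dx} = 0$ with
$$M(x, y) = \frac{x^3}{\cos(y/x)} + xy^2 \quad\text{and}\quad N(x, y) = -x^2 y.$$
We then see that, for any $\lambda$,
$$M(\lambda x, \lambda y) = \frac{(\lambda x)^3}{\cos(y/x)} + (\lambda x)(\lambda y)^2 = \lambda^3 M(x, y),$$
and
$$N(\lambda x, \lambda y) = -(\lambda x)^2(\lambda y) = \lambda^3 N(x, y),$$
i.e. both $M(x, y)$ and $N(x, y)$ are homogeneous of degree 3. Consequently, this
differential equation is homogeneous of degree 3.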
For (iii), as the differential equation in (ii) is homogeneous, we make the substitution
$y(x) = xv(x)$ so that, using the product rule, we have
$$\frac{dy}{dx} = v(x) + x\frac{dv}{dx},$$
and the differential equation becomes
$$\frac{x^3}{\cos v} + x^3 v^2 - x^3 v\left[v + x\frac{dv}{dx}\right] = 0,$$
which, when simplified, yields
$$v\cos v\,\frac{dv}{dx} = \frac{1}{x},$$
which is a separable differential equation. Rewriting this in the usual way then gives
$$\int \frac{dx}{x} = \int v\cos v\,dv \implies \ln|x| + c = v\sin v + \cos v,$$
where $c$ is an arbitrary constant and we have used (i) to find the integral on the
right-hand-side. So, using $y(x) = xv(x)$ again, we see that
$$\frac{y}{x}\sin\frac{y}{x} + \cos\frac{y}{x} = \ln|x| + c,$$
is the general solution in terms of $y/x$. (Obviously, this expression cannot be usefully
simplified any further.)
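(b) As the plane, $P$, contains the point $(3, 4, -1)$ and has normal $(-4, 8, -4)^T$, we have
$$\begin{pmatrix} -4 \\ 8 \\ -4 \end{pmatrix} \cdot \begin{pmatrix} x - 3 \\ y - 4 \\ z + 1 \end{pmatrix} = 0 \implies -4(x - 3) + 8(y - 4) - 4(z + 1) = 0 \implies x - 2y + z = -6,$$
as its Cartesian equation.
Writing the surface $S$ as $f(x, y, z) = c$ with $f(x, y, z) = x^2 + y^2 + z^2$, the normal to $S$
at a point $(x, y, z)$ is $\nabla f = (2x, 2y, 2z)^T$
and in order for this to be in the same direction as the normal to $P$, there must be some
$\lambda$ that makes
$$\nabla f = \lambda\begin{pmatrix} -4 \\ 8 \\ -4 \end{pmatrix} \implies \begin{pmatrix} 2x \\ 2y \\ 2z \end{pmatrix} = \lambda\begin{pmatrix} -4 \\ 8 \\ -4 \end{pmatrix} \implies \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \lambda\begin{pmatrix} -2 \\ 4 \\ -2 \end{pmatrix}.$$
Of course, we also need the point, $(x, y, z)$, to lie on $P$ and so we also have
$$x - 2y + z = -6 \implies -2\lambda - 8\lambda - 2\lambda = -6 \implies \lambda = \frac{1}{2},$$
i.e. this is the value of $\lambda$ that we need. Thus, the point on $S$ that we seek is $(-1, 2, -1)$
and, using the equation for $S$, we get
$$c = (-1)^2 + (2)^2 + (-1)^2 = 6,$$
as the required value of $c$.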
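The new surface can be written as $g(x, y, z) = \beta$ with
$$g(x, y, z) = x^2 + y^2 + \alpha z^2,$$
for constants $\alpha$ and $\beta$. At any point $(x, y, z)$ on the surface, its normal vector is given by
$$\nabla g = \begin{pmatrix} g_x \\ g_y \\ g_z \end{pmatrix} = \begin{pmatrix} 2x \\ 2y \\ 2\alpha z \end{pmatrix} \implies \nabla g = \begin{pmatrix} 8 \\ 6 \\ 10\alpha \end{pmatrix},$$
at the point $(4, 3, 5)$. We also see that the normal to $S$ at the point $(4, 3, 5)$ is
$$\nabla f = \begin{pmatrix} 8 \\ 6 \\ 10 \end{pmatrix},$$
and, in order for these two surfaces to intersect orthogonally at this point, we must have
$$\nabla g \cdot \nabla f = 0 \implies \begin{pmatrix} 8 \\ 6 \\ 10\alpha \end{pmatrix} \cdot \begin{pmatrix} 8 \\ 6 \\ 10 \end{pmatrix} = 0 \implies 64 + 36 + 100\alpha = 0 \implies \alpha = -1,$$
as the value of $\alpha$ that we seek. Then, as the point $(4, 3, 5)$ must lie on the new surface,
we also have
$$x^2 + y^2 - z^2 = \beta \implies 4^2 + 3^2 - (5^2) = \beta \implies \beta = 0,$$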
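Question 2.
Using the definition of elasticity with $q = q^D(p)$ and the given expression, we have
$$-\frac{p}{q}\frac{dq}{dp} = \frac{p}{26 - p} \implies \frac{1}{q}\frac{dq}{dp} = \frac{1}{p - 26},$$
which is a separable ODE. Writing this in the usual way then gives
$$\int \frac{dq}{q} = \int \frac{1}{p - 26}\,dp \implies \ln|q| = \ln|p - 26| + c \implies q = A(p - 26),$$
for some arbitrary constant, $A$. Then, using the fact that the equilibrium price is 14 and
the equilibrium quantity is 6, we can see that $A$ must satisfy the equation
$$6 = A(14 - 26) \implies A = -\frac{6}{12} = -\frac{1}{2}.$$
Putting this all together, we then see that we have $q = q^D(p)$ where
$$q^D(p) = 13 - \frac{p}{2},$$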
The producer surplus is given by
$$PS = p^* q^* - \int_0^{q^*} p^S(q)\,dq,$$
where $p^*$ and $q^*$ are the equilibrium price and quantity, and $p^S(q)$ is the inverse supply
function. So, using the information given in the question, we have
$$36 = (14)(6) - \int_0^6 (aq + b)\,dq \implies 48 = \left[a\frac{q^2}{2} + bq\right]_0^6 \implies 48 = 18a + 6b,$$
or, indeed, $8 = 3a + b$ as our first equation for $a$ and $b$. Another equation that needs to
be satisfied is
$$14 = 6a + b,$$
as the equilibrium quantity must give the equilibrium price when we use the inverse
supply function. We can easily solve these equations for the constants $a$ and $b$ by
subtracting one from the other to get $a = 2$ and then, using the first equation again, we
get $b = 2$. Consequently, we have
$$p^S(q) = 2q + 2 \quad\text{so that}\quad q^S(p) = \frac{p}{2} - 1,$$
Of course, an alternative method here would be to observe that the supply function is a straight line
and so the producer surplus is the area of a triangular region whose height is $p^* - b$ and whose width is
$q^*$. This means that, if we find the area of this triangle, we have
$$36 = \tfrac{1}{2}(14 - b)(6) \implies 14 - b = 12 \implies b = 2.$$
Then, again using the fact that equilibrium quantity must give the equilibrium price when we use the
inverse supply function, we use $b = 2$ to see that
$$14 = 6a + b \implies a = \frac{14 - 2}{6} = 2,$$
so that $q^S(p) = \frac{p}{2} - 1$,
This means that, in the presence of the excise tax of $T$, the new equilibrium price is
given by
$$q^S_T(p) = q^D_T(p) \implies \frac{1}{2}(p - T) - 1 = 13 - \frac{p}{2} \implies p = 14 + \frac{T}{2},$$
and, using $q^D_T(p)$ say, we see that the new equilibrium quantity is
$$q = 13 - \frac{1}{2}\left[14 + \frac{T}{2}\right] = 6 - \frac{T}{4}.$$
We can now find the tax revenue, $R(T)$, which is the tax per unit, $T$, multiplied by $q$,
the amount being sold in the presence of the tax, i.e. we have
$$R(T) = Tq = T\left[6 - \frac{T}{4}\right] = 6T - \frac{T^2}{4}.$$
To see where this is maximised, we start by noting that $R(T)$ has a stationary point
when $R'(T) = 0$, i.e. when
$$6 - \frac{T}{2} = 0 \implies T = 12,$$
and since $R''(T) = -1/2 < 0$ this turning point is indeed a maximum. Thus, the tax
revenue is maximised when $T = 12$.
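The calculus argument can be cross-checked by simply evaluating the tax revenue on a grid of tax levels. The Python sketch below assumes the demand function $13 - p/2$ and the taxed supply function $\tfrac{1}{2}(p - T) - 1$ from the working above; the function names and the grid are illustrative choices.

```python
def equilibrium_with_tax(T):
    """Return (price, quantity) solving (p - T)/2 - 1 = 13 - p/2."""
    price = 14 + T / 2
    quantity = 13 - price / 2
    return price, quantity


def tax_revenue(T):
    """Revenue R(T): tax per unit times the quantity sold under the tax."""
    _, quantity = equilibrium_with_tax(T)
    return T * quantity


def best_tax_on_grid(step=0.01, upper=24.0):
    """Brute-force search for the revenue-maximising T on [0, upper]."""
    candidates = [k * step for k in range(int(round(upper / step)) + 1)]
    return max(candidates, key=tax_revenue)


if __name__ == "__main__":
    T = best_tax_on_grid()
    print("best T on grid:", T, " revenue:", tax_revenue(T))
```

The grid search should land at, or within rounding of, $T = 12$, in agreement with the stationary point found above.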
Question 3.
(a) The first-order partial derivatives of $f(x, y)$ are
$$f_x(x, y) = 2x - 2 \quad\text{and}\quad f_y(x, y) = -3y^2 + 2y + 1.$$
At a stationary point, both of these first-order partial derivatives are zero, i.e. we must
have $f_x(x, y) = 0$ and $f_y(x, y) = 0$. Thus, to find the stationary points we have to solve
the simultaneous equations
$$2x - 2 = 0 \quad\text{and}\quad -3y^2 + 2y + 1 = 0.$$
But, the first equation gives us $x = 1$ and the second equation gives us
$$3y^2 - 2y - 1 = 0 \implies (3y + 1)(y - 1) = 0 \implies y = -\frac{1}{3} \text{ or } 1.$$
Consequently, the points $(1, -1/3)$ and $(1, 1)$ are the stationary points of this function.
The second-order partial derivatives of this function are
$$f_{xx}(x, y) = 2, \quad f_{xy}(x, y) = 0 \quad\text{and}\quad f_{yy}(x, y) = -6y + 2,$$
and
$$(k - 1)^2 = 0,$$
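The right-hand-side of the given ODE is $e^t$ and our first reaction in this case would
be to take $y_p(t) = \alpha e^t$ where $\alpha$ is a constant that has to be determined. But, this
won't work as, taking $A = 0$ and $B = \alpha$, we see that this is part of the
complementary function. As such, we multiply by $t$ and try $y_p(t) = \alpha t e^t$ which
won't work either because, taking $A = \alpha$ and $B = 0$, we see that this is part of the
complementary function as well. Consequently, we multiply by $t$ again and try
$y_p(t) = \alpha t^2 e^t$ which, thankfully, will work because it is not part of the
complementary function. So, differentiating this using the product rule, we have
$$y_p'(t) = \alpha(2t)e^t + \alpha(t^2)e^t = \alpha(2t + t^2)e^t,$$
and
$$y_p''(t) = \alpha(2 + 2t)e^t + \alpha(2t + t^2)e^t = \alpha(2 + 4t + t^2)e^t,$$
which means that, substituting these into our ODE, we get
$$\alpha(2 + 4t + t^2)e^t - 2\alpha(2t + t^2)e^t + \alpha t^2 e^t = e^t \implies 2\alpha e^t = e^t \implies \alpha = \frac{1}{2}.$$
Consequently, we see that
$$y_p(t) = \frac{t^2}{2}e^t,$$
is the particular integral and so the general solution is
$$y(t) = (At + B)e^t + \frac{t^2}{2}e^t,$$
where $A$ and $B$ are arbitrary constants. Using the conditions $y(0) = 1$ and $y(1) = 0$, we get
$$1 = B \quad\text{and}\quad 0 = (A + B)e^1 + \frac{e^1}{2},$$
respectively. The first of these gives $B = 1$ and then the second gives
$$0 = A + B + \frac{1}{2} \implies 0 = A + 1 + \frac{1}{2} \implies A = -\frac{3}{2}.$$
Thus, the particular solution we seek is
$$y(t) = \left[-\frac{3}{2}t + 1\right]e^t + \frac{t^2}{2}e^t = \frac{t^2 - 3t + 2}{2}e^t.$$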
Question 4.
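$$v - \lambda\alpha k^{\alpha - 1} l^{\alpha} = 0, \quad w - \lambda\alpha k^{\alpha} l^{\alpha - 1} = 0 \quad\text{and}\quad k^{\alpha} l^{\alpha} - Q = 0.$$
We now solve these by eliminating $\lambda$ from the first two equations, i.e. we get
$$v - \lambda\alpha k^{\alpha - 1} l^{\alpha} = 0 \implies \lambda = \frac{v}{\alpha k^{\alpha - 1} l^{\alpha}} = \frac{vk}{\alpha k^{\alpha} l^{\alpha}},$$
from the first equation and
$$w - \lambda\alpha k^{\alpha} l^{\alpha - 1} = 0 \implies \lambda = \frac{w}{\alpha k^{\alpha} l^{\alpha - 1}} = \frac{wl}{\alpha k^{\alpha} l^{\alpha}},$$
from the second equation. As such, we can equate these expressions for $\lambda$ to get
$$\frac{vk}{\alpha k^{\alpha} l^{\alpha}} = \frac{wl}{\alpha k^{\alpha} l^{\alpha}} \implies l = \frac{v}{w}k.$$
We then use this new relationship between $k$ and $l$ in the third equation, which is just
the constraint $k^{\alpha} l^{\alpha} = Q$, to get
$$Q = k^{\alpha}\left[\frac{v}{w}k\right]^{\alpha} \implies Q = \left[\frac{v}{w}\right]^{\alpha} k^{2\alpha} \implies k^{2\alpha} = \left[\frac{w}{v}\right]^{\alpha} Q \implies k = \sqrt{\frac{w}{v}}\,Q^{\frac{1}{2\alpha}},$$
and so
$$l = \frac{v}{w}\sqrt{\frac{w}{v}}\,Q^{\frac{1}{2\alpha}} = \sqrt{\frac{v}{w}}\,Q^{\frac{1}{2\alpha}}.$$
Thus, these values of $k$ and $l$ minimise the cost of producing $Q$ units. The minimum
cost is then given by
$$C^*(Q) = C\left(\sqrt{\frac{w}{v}}\,Q^{\frac{1}{2\alpha}}, \sqrt{\frac{v}{w}}\,Q^{\frac{1}{2\alpha}}\right) = v\sqrt{\frac{w}{v}}\,Q^{\frac{1}{2\alpha}} + w\sqrt{\frac{v}{w}}\,Q^{\frac{1}{2\alpha}} = 2\sqrt{vw}\,Q^{\frac{1}{2\alpha}},$$
as required.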
To justify this, we note that the constraint $k^{\alpha} l^{\alpha} = Q$ looks a bit like a rectangular
hyperbola and, for $k, l > 0$, this is illustrated in Figure B.1(a). The objective function,
$C(k, l) = vk + wl$ has contours $C(k, l) = c$, where $c$ is a constant, that are straight lines
as illustrated in Figure B.1(b). The direction in which $C(k, l)$ is decreasing is indicated
in this figure along with the point we found above using the Lagrange multiplier
method, i.e. a point where we have a contour of $C(k, l)$ which is both tangential to
the constraint and touching the constraint. Having seen this, it should be clear that this
point will minimise $C$ subject to the constraint.
(a)
(b)
Figure B.1: (a) The constraint q(k, l) = Q. (b) Adding three contours, C(k, l) = c, where
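The total cost of producing $Q$ units is
$$C(Q) = cQ + C^*(Q) + FC = cQ + 2\sqrt{vw}\,Q^{\frac{1}{2\alpha}} + FC,$$
which is the cost of the raw materials plus the costs of capital and labour plus any fixed
costs the firm may have. As such, the profit function for the firm is
$$\pi(Q) = R(Q) - C(Q) = pQ - cQ - 2\sqrt{vw}\,Q^{\frac{1}{2\alpha}} - FC,$$
and we want to find the value of $Q$ that maximises this. As such, we find that
$$\pi'(Q) = p - c - 2\sqrt{vw}\,\frac{1}{2\alpha}\,Q^{\frac{1}{2\alpha} - 1} = p - c - \frac{\sqrt{vw}}{\alpha}\,Q^{\frac{1 - 2\alpha}{2\alpha}},$$
as the fixed costs, FC, are a constant and, setting this equal to zero, we find that
$$\pi'(Q) = 0 \implies Q^{\frac{1 - 2\alpha}{2\alpha}} = \frac{\alpha(p - c)}{\sqrt{vw}} \implies Q = \left[\frac{\alpha(p - c)}{\sqrt{vw}}\right]^{\frac{2\alpha}{1 - 2\alpha}},$$
is the only stationary point. Indeed, notice that this value of $Q$ is positive as $p > c$ and
$\alpha > 0$. Furthermore, we have
$$\pi''(Q) = -\frac{\sqrt{vw}}{\alpha}\,\frac{1 - 2\alpha}{2\alpha}\,Q^{\frac{1 - 2\alpha}{2\alpha} - 1},$$
and as this is negative at the stationary point (since $0 < \alpha < 1/2$ implies that $\alpha > 0$
and $1 - 2\alpha > 0$) we see that our stationary point is a local maximum. Thus,
$$Q = \left[\frac{\alpha(p - c)}{\sqrt{vw}}\right]^{\frac{2\alpha}{1 - 2\alpha}}$$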
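Question 5.
(a) We use the standard Maclaurin series
$$e^{y} = 1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \frac{y^4}{4!} + \cdots,$$
and
$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots,$$
so that
$$e^{\sin x} = 1 + \left[x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots\right] + \frac{1}{2!}\left[x - \frac{x^3}{3!} + \cdots\right]^2 + \frac{1}{3!}\left[x - \frac{x^3}{3!} + \cdots\right]^3 + \frac{1}{4!}(x - \cdots)^4 + \frac{1}{5!}(x - \cdots)^5 + \cdots,$$
if we keep the relevant terms of the $\sin x$ series when we put them into the series for $e^y$.
Then, multiplying out the brackets and, again, keeping the relevant terms we get
$$e^{\sin x} = 1 + \left[x - \frac{x^3}{3!} + \frac{x^5}{5!} - \cdots\right] + \frac{1}{2!}\left[x^2 - 2(x)\frac{x^3}{3!} + \cdots\right] + \frac{1}{3!}\left[x^3 - 3(x)(x)\frac{x^3}{3!} + \cdots\right] + \frac{x^4}{4!} + \frac{x^5}{5!} + \cdots$$
$$= 1 + \left[x - \frac{x^3}{6} + \frac{x^5}{120} - \cdots\right] + \frac{1}{2}\left[x^2 - \frac{x^4}{3} + \cdots\right] + \frac{1}{6}\left[x^3 - \frac{x^5}{2} + \cdots\right] + \frac{x^4}{24} + \frac{x^5}{120} + \cdots$$
$$= 1 + x + \frac{x^2}{2} + 0x^3 - \frac{x^4}{8} - \frac{x^5}{15} + \cdots,$$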
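A series like this can be checked numerically by comparing it with the function itself near $x = 0$. The Python sketch below compares the fifth-order polynomial $1 + x + x^2/2 - x^4/8 - x^5/15$ with $e^{\sin x}$; the function name is illustrative, and the expectation that the error shrinks like $x^6$ is the usual behaviour of a truncated Maclaurin series.

```python
import math


def exp_sin_series(x):
    """Fifth-order Maclaurin polynomial for e^{sin x} found above."""
    return 1 + x + x ** 2 / 2 - x ** 4 / 8 - x ** 5 / 15


if __name__ == "__main__":
    for x in (0.05, 0.1, 0.2):
        exact = math.exp(math.sin(x))
        print(x, exact, exp_sin_series(x), abs(exact - exp_sin_series(x)))
```

Halving $x$ should reduce the error by a factor of roughly $2^6 = 64$, which is a useful sign that no term up to $x^5$ has been missed.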
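(b) Using the substitution $g = \sin x$, we have $dg = \cos x\,dx$
and so we have
$$\int \frac{\cos x}{(1 - \sin x)(2 + \sin x)}\,dx = \int \frac{1}{(1 - g)(2 + g)}\,dg.$$
Then, using partial fractions, we get
$$\int \frac{1}{(1 - g)(2 + g)}\,dg = \int \left[\frac{1/3}{1 - g} + \frac{1/3}{2 + g}\right] dg = \frac{1}{3}\Big[-\ln|1 - g| + \ln|2 + g|\Big] + c = \frac{1}{3}\ln\left|\frac{2 + \sin x}{1 - \sin x}\right| + c,$$
as the answer.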
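(c) To find the stationary points of the function $f(x)$ we write it as
$$f(x) = \frac{x}{12} - x^{1/3},$$
and so we have
$$f'(x) = \frac{1}{12} - \frac{1}{3}x^{-2/3}.$$
The stationary points occur when $f'(x) = 0$ and so we need to solve the equation
$$\frac{1}{12} - \frac{1}{3x^{2/3}} = 0 \implies \frac{x^{2/3} - 4}{x^{2/3}} = 0 \implies x^{2/3} = 4 \implies x^2 = 64 \implies x = \pm 8.$$
The second derivative is
$$f''(x) = -\frac{1}{3}\left[-\frac{2}{3}\right]x^{-5/3} = \frac{2}{9x^{5/3}},$$
which is positive when $x = 8$ and negative when $x = -8$.
Thus, the stationary points when $x = 8$ and $x = -8$ are a local minimum and a local
maximum respectively.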
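The classification can be confirmed numerically by evaluating the first and second derivatives at $x = \pm 8$. In the Python sketch below, the helper `cube_root` is a hypothetical function introduced because raising a negative float to the power $1/3$ does not return the real cube root in Python; the other function names are also illustrative.

```python
import math


def cube_root(x):
    """Real cube root, valid for negative x as well (hypothetical helper)."""
    return math.copysign(abs(x) ** (1 / 3), x)


def f(x):
    """f(x) = x/12 - x^{1/3}."""
    return x / 12 - cube_root(x)


def f_prime(x):
    """f'(x) = 1/12 - 1/(3 x^{2/3}), for x != 0."""
    return 1 / 12 - 1 / (3 * cube_root(x) ** 2)


def f_second(x):
    """f''(x) = 2 / (9 x^{5/3}), for x != 0."""
    return 2 / (9 * cube_root(x) ** 5)


if __name__ == "__main__":
    for x in (8.0, -8.0):
        print(x, f(x), f_prime(x), f_second(x))
```

A zero first derivative with a positive second derivative at $x = 8$ indicates a local minimum, and the negative second derivative at $x = -8$ indicates a local maximum.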