
Chapter 1

A Short Introduction to
Mathematical Logic and
Proof
In this course, we are going to take a rigorous approach to the main concepts
in single variable Differential Calculus. This means that rather than simply
asserting mathematical statements as facts, we will attempt, whenever possible,
to provide proofs of the validity of these statements. To do so, we will begin with
a set of notions and statements that we will take as being given. For example,
we will assume the basic notions of set theory and the algebraic and arithmetic
properties of the natural numbers, the integers, the rational numbers and the
real numbers. We will introduce as axioms some of the perhaps less well-known
properties of these objects such as the Principle of Mathematical Induction for
the natural numbers and the Least Upper Bound Property for the real numbers.
It is important to note that while there is some value in rigour for its own
sake and that it is even possible for proofs to be fun, our motivation in this
course for including rigour is the hope that we will gain a deeper understanding
of the fundamental concepts of Calculus as well as an appreciation for their
limitations. In this respect, we will begin with a very brief, and admittedly
incomplete, introduction to the formalities of mathematical logic and to the
rules of inference that we will use in constructing our proofs.
1.1 Basic Notions of Mathematical Logic and
Truth Tables
A statement is a (mathematical) sentence that can be determined to be either
true or false. For example, "all differentiable functions are continuous" and "all
prime numbers are odd" are two examples of mathematical statements. The
first statement will later be shown to be true and, since 2 is a natural number
that is both prime and even, the latter statement is false. Sometimes we are not
able to determine whether a statement is true or false, but we can see that
it must be one or the other. For example, the Twin Prime Conjecture says that
there are infinitely many primes p such that p + 2 is also prime. Despite a great deal
of effort that has been exerted to try to prove this statement, we still do not
know that it is true. (A conjecture is a statement for which there is evidence or
strong speculation that it is true but no known proof.) However, it should be
obvious that this statement is either true or it is false. A mathematical sentence
such as "x > 0" is not a statement since it can be either true or false depending
on the value assigned to the variable x.
Throughout the rest of this chapter we will use italicized lower case letters
to denote statements.
Given a statement p, we can also talk about the negation of p, which we
denote by ¬p and which we read as "not p". The negation of a statement is exactly
what one would expect from the name. For example, if the statement p is "the
sky is blue", then the negation ¬p is simply the statement that the sky is not
blue. When a statement p is true, its negation ¬p is false, and vice versa.
We illustrate this fact by the use of a truth table.

 p | ¬p
---+----
 T |  F
 F |  T

Notice that it is never the case that p and ¬p are simultaneously true,
nor is it ever the case that p and ¬p are simultaneously false.
This is known as the Law of the Excluded Middle.
In this course, we will be asked to prove statements that rely on hypotheses.
For example, "If f(x) is differentiable, then f(x) is continuous." If we let the
statement p be "f(x) is differentiable" and the statement q be "f(x) is
continuous", then we are asserting that the truth of p implies the truth of q.
That is, p implies q. We will denote this by

p → q.

In "p implies q", the statement p is called the antecedent and q is called the
consequent.

We can construct a truth table for p → q. To do so, we ask ourselves how
such a statement could be false. We would conclude that the only way for this
to happen would be if p is true but q is false. This leads to the following truth
table.

 p | q | p → q
---+---+-------
 T | T |   T
 T | F |   F
 F | T |   T
 F | F |   T
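As an informal illustration (not part of the formal development), the rows of this truth table can be generated mechanically. The sketch below models implication as a Python function; the name `implies` is our own choice, not standard notation.

```python
from itertools import product

# Truth-functional implication: p -> q is false only when p is true
# and q is false, which is exactly (not p) or q.
def implies(p, q):
    return (not p) or q

# Enumerate all truth assignments, mirroring the rows of the table above.
for p, q in product([True, False], repeat=2):
    print(p, q, implies(p, q))
```

Running this prints one line per row of the table, in the same order.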
A close look at the truth table above yields a rather strange consequence,
namely that something false will imply anything. For example, if p is the
statement "all animals are dogs" and q is the statement "the sky is blue",
then the fact that p is false means that we can conclude that p → q, or that
"all animals are dogs" implies that "the sky is blue".

The words "if ... then" are referred to as a logical connective, as they join
two statements together to form a compound statement. Two other common
connectives are "and" and "or". (Curiously, "not" is also a connective, though it is
only applied to a single statement.) Given two statements p and q, we can
form two new statements "p and q" and "p or q", which we denote respectively
by p ∧ q and p ∨ q.
It should be quite clear that for "p and q" to be true, it must be the case
that both statements are true. This means that the truth table looks like:

 p | q | p ∧ q
---+---+-------
 T | T |   T
 T | F |   F
 F | T |   F
 F | F |   F
The situation for the word "or" is a little more ambiguous. In fact, in common
speech "or" has two possible interpretations. We could say that "p or q" is satisfied
if at least one of p and q is true. We could also say that "p or q" is satisfied if
exactly one of p and q is true, but not both. The first case is referred to as the inclusive
or. This is the interpretation of the connective "or" that we use in mathematical
logic. As we indicated above, it is denoted by p ∨ q and its truth table is:

 p | q | p ∨ q
---+---+-------
 T | T |   T
 T | F |   T
 F | T |   T
 F | F |   F

The second case is called the exclusive or. It is logically equivalent to

(p ∨ q) ∧ ¬(p ∧ q).

We can use truth tables to see how this works:

 p | q | p ∨ q | p ∧ q | ¬(p ∧ q) | (p ∨ q) ∧ ¬(p ∧ q)
---+---+-------+-------+----------+--------------------
 T | T |   T   |   T   |    F     |         F
 T | F |   T   |   F   |    T     |         T
 F | T |   T   |   F   |    T     |         T
 F | F |   F   |   F   |    T     |         F
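The equivalence claimed by this table can also be checked mechanically. The sketch below (an informal illustration; the function name `xor` is ours) compares the exclusive or against the compound form on every truth assignment.

```python
from itertools import product

# Exclusive or: true when exactly one of p, q is true.
def xor(p, q):
    return p != q

# Compare against (p or q) and not (p and q) on every assignment.
equivalent = all(
    xor(p, q) == ((p or q) and not (p and q))
    for p, q in product([True, False], repeat=2)
)
print(equivalent)
```

The check succeeds because the two expressions agree row by row, which is precisely what logical equivalence means for truth tables.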
We have suggested that the exclusive or is logically equivalent to the compound
statement (p ∨ q) ∧ ¬(p ∧ q). Generally, we will say that statements
p and q are equivalent if the truth of one implies the truth of the other. In
other words, we say that p holds if and only if q holds. In terms of our connectives,
we can interpret equivalence as the statement (p → q) ∧ (q → p). The
truth table is

 p | q | p → q | q → p | (p → q) ∧ (q → p)
---+---+-------+-------+-------------------
 T | T |   T   |   T   |         T
 T | F |   F   |   T   |         F
 F | T |   T   |   F   |         F
 F | F |   T   |   T   |         T

The truth table confirms that equivalence happens exactly when p and q have the
same truth values. We will denote equivalence by p ⟺ q.
Example 1.1 [Contrapositive]. In this example, we will use truth tables to
show that the statements p → q and ¬q → ¬p are logically equivalent. The
statement ¬q → ¬p is called the contrapositive of the statement p → q. We will
see later that the equivalence of an implication with its contrapositive leads to
a method of proof called proof by contradiction.

We can consider the following truth table:

 p | q | ¬p | ¬q | p → q | ¬q → ¬p | (p → q) ⟺ (¬q → ¬p)
---+---+----+----+-------+---------+----------------------
 T | T | F  | F  |   T   |    T    |          T
 T | F | F  | T  |   F   |    F    |          T
 F | T | T  | F  |   T   |    T    |          T
 F | F | T  | T  |   T   |    T    |          T

A quick look at the table shows that p → q and ¬q → ¬p have the
same truth values. This is what we wanted in order for the statements to be equivalent.
Alternatively, we see that (p → q) ⟺ (¬q → ¬p) is always true regardless of what
truth values p and q are assigned. A compound statement is called a tautology
if it is always true regardless of the truth values assigned to the basic statements.

For example, given a function f(x) defined on the real numbers, if p is the
statement "f(x) is differentiable" and q is the statement "f(x) is
continuous", then p → q is the statement that differentiability implies continuity.
The contrapositive, ¬q → ¬p, represents the statement that if f(x) is not
continuous, then it cannot be differentiable. We will see later in this course that the
first statement is true, and hence the second statement must also be true.
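As an informal aside, the notion of a tautology lends itself to a small mechanical check: a two-variable compound statement is a tautology exactly when it is true in every row of its truth table. The helper names below are our own.

```python
from itertools import product

def implies(p, q):
    # p -> q is (not p) or q
    return (not p) or q

# A statement in two variables is a tautology when it holds in every row.
def is_tautology(statement):
    return all(statement(p, q) for p, q in product([True, False], repeat=2))

# The contrapositive equivalence (p -> q) <-> (not q -> not p):
contrapositive = is_tautology(lambda p, q: implies(p, q) == implies(not q, not p))
print(contrapositive)
```

By contrast, p → q on its own is not a tautology, since it fails in the row where p is true and q is false.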
1.2 Variables and Quantifiers
We saw before that there are mathematical sentences which could be either
true or false depending upon additional parameters. For example, the statement
"x > 0" may be true or it may be false as the value of x is allowed to vary.
For this reason, we call x a variable. The truth value of "x > 0" will be
determined once we assign a value to x. This reminds us of a function, and as
such we will use the functional notation

p(x) : x > 0

to represent the sentence "x > 0". The potential values for the variable x will
either be specified or they will be determined by the context of the sentence. For
example, in this course, the sentence "x > 0" makes sense whenever x is assigned
a value that is a real number. In this case, p(4) is true but p(−3) is false.

In this course, we will often want to show either that a sentence p(x) is true
for all possible values of x, that it is true for some values of x, or that it is false
for every value of x. In the first case, we would say that "for every x, p(x) is
true". The phrase "for every" is called the universal quantifier. It is denoted by
the symbol ∀ and we can write the above sentence symbolically as follows:

∀x : p(x).

Here it is important to note that the scope of the variable x must be known
for this to make sense. If the scope is known, then this sentence itself becomes
a statement, as it is either true or false.
Other English phrases that express the universal quantifier are "for each ..."
and "for all ...".

We say that "there exists an x such that p(x) is true" if we can find at least
one value x₀ such that when it is substituted into the sentence, p(x₀) becomes
true. The phrase "there exists" is called the existential quantifier. Symbolically,
it is denoted by ∃. We also use the symbol ∋ to represent the phrase "such that".
Therefore, we can express the sentence "there exists an x such that p(x) is true"
symbolically by

∃x ∋ p(x).
We will often require the use of more than one quantifier in a sentence.
In this case, the order of the quantifiers is very important. For example, the
sentence "for every x there exists a z such that x ≥ z" and the sentence
"there exists an x such that for every z, x ≤ z" are very different. The first
statement is clearly true for the set of all real numbers, since once we have
determined a value of x we may simply choose z to be x as well. The second
statement is false for the real numbers, since its truth would imply that the real
numbers have a least element, which is clearly not true.
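Over a finite domain, the difference between the two quantifier orders can be checked directly with nested `any`/`all`. In the sketch below (an informal illustration only), a small set of integers stands in for the reals; note that the second statement happens to hold here precisely because a finite set, unlike R, does have a least element.

```python
# A finite stand-in domain for the reals.
domain = [-2, -1, 0, 1, 2]

# "for every x there exists a z such that x >= z": choose z = x.
forall_exists = all(any(x >= z for z in domain) for x in domain)

# "there exists an x such that for every z, x <= z": this asks for a
# least element. It holds here only because the finite domain has one
# (namely -2); over all of R it is false.
exists_forall = any(all(x <= z for z in domain) for x in domain)

print(forall_exists, exists_forall)
```

Swapping the quantifier order changes which variable may depend on which, and that is what the nesting of `any` inside `all` (or vice versa) captures.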
It is important to know how to negate sentences with quantifiers. When, for
example, would it not be true that for every x the sentence p(x) is true? This
happens precisely when there exists at least one x₀ such that "not p(x₀)"
is true. Symbolically, this means that

¬(∀x : p(x)) ⟺ ∃x ∋ ¬p(x).

A value x₀ that negates ∀x : p(x) is called a counterexample for the statement
∀x : p(x). We emphasize that to prove that the statement ∀x : p(x)
is true, we need some procedure or argument to exhaust all choices of x, whereas
to show that ∀x : p(x) is false, we need only find one counterexample.

Similarly, the statement "there exists an x such that p(x) is true" is itself
false precisely when for every x, the statement p(x) is false. That is,

¬(∃x ∋ p(x)) ⟺ ∀x : ¬p(x).
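These two negation rules can also be verified over a finite domain, again as an informal illustration: Python's `all` and `any` play the roles of ∀ and ∃.

```python
# Quantifier negation rules over a finite domain:
#   not (forall x: p(x))  holds exactly when  exists x: not p(x)
#   not (exists x: p(x))  holds exactly when  forall x: not p(x)
domain = [-3, -1, 0, 2, 5]

def p(x):
    return x > 0

not_forall = not all(p(x) for x in domain)
exists_not = any(not p(x) for x in domain)

not_exists = not any(p(x) for x in domain)
forall_not = all(not p(x) for x in domain)

print(not_forall == exists_not, not_exists == forall_not)
```

Here x = −3 is a counterexample to ∀x : p(x), so the first negation is true, while the second is false because p(2) holds.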
1.3 Rules of Inference and the Foundations of
Proof
In some ways, constructing proofs is like a complex game. Just as in games,
in constructing proofs there are certain basic rules that can be applied. These
rules are called the Rules of Inference.
Our first rule of inference is called Modus Ponens. Simply stated, this rule
tells us that if we assume that p is true and we know that p implies q, then
we can conclude that q is also true. Symbolically, this is expressed as

[p ∧ (p → q)] → q.

We can validate Modus Ponens by looking at the truth table below:

 p | q | p → q
---+---+-------
 T | T |   T
 T | F |   F
 F | T |   T
 F | F |   T

We see that the only situation in which both p and p → q are true occurs
when q is also true.
For example, we know that Fido is a dog and that if Fido is a dog, then Fido
is an animal. From this we can conclude that Fido is an animal.
A variant on Modus Ponens is the rule Modus Tollens. This rule tells us
that if we know that p → q is true and we also know that q is false, then we
can conclude that p must also be false. This is represented symbolically by

[(p → q) ∧ ¬q] → ¬p.

Again, this can be verified from the truth table for p → q. An example of
Modus Tollens would be: If Fido is a dog, then Fido is an animal. However,
Fido is not an animal, and hence we conclude that Fido is not a dog.
The next rule of inference that we will introduce is Hypothetical Syllogism.
This is like a transitivity law for implication. It says that if p implies q and
if q implies s, then we can conclude that p implies s. Symbolically, this is

[(p → q) ∧ (q → s)] → (p → s).

We can see in the truth table below that whenever both p → q and q → s
are true, then so is p → s.

 p | q | s | p → q | q → s | p → s
---+---+---+-------+-------+-------
 T | T | T |   T   |   T   |   T
 T | T | F |   T   |   F   |   F
 T | F | T |   F   |   T   |   T
 T | F | F |   F   |   T   |   F
 F | T | T |   T   |   T   |   T
 F | T | F |   T   |   F   |   T
 F | F | T |   T   |   T   |   T
 F | F | F |   T   |   T   |   T
An example of Hypothetical Syllogism is: If Fido is a dog, then Fido is an
animal, and if Fido is an animal, then Fido must eat; therefore, if Fido is a dog,
then Fido must eat.
The next rule is Disjunctive Syllogism. Simply stated, this rule says that if
we know that "p or q" is true, and we can show that p is false, then we can
conclude that q is true. Symbolically, this becomes

[(p ∨ q) ∧ ¬p] → q.
An example of Disjunctive Syllogism is: We are told that Fido is either a
cat or a dog. We know Fido is not a cat. Therefore, Fido is a dog.
Additional Rules are:
Constructive Dilemma: This rule states that if p → q and r → s, and
if we know that either p or r is true, then we can conclude that either q or
s must be true. That is,

[[(p → q) ∧ (r → s)] ∧ (p ∨ r)] → (q ∨ s).

Destructive Dilemma: This rule states that if p → q and r → s, and if
we know that either q or s is false, then we can conclude that either p or r
must also be false. That is,

[[(p → q) ∧ (r → s)] ∧ (¬q ∨ ¬s)] → (¬p ∨ ¬r).
The last three rules are very straightforward:

Simplification: This rule says that if we know that both p and q are true,
we can conclude that p is true. That is,

(p ∧ q) → p.

Addition: This rule states that if p is true, then "p or q" must be
true for any q. That is,

p → (p ∨ q).

Conjunction: This states that if we can establish the truth of p and the
truth of q separately, then we have established the truth of the statement
"p and q".
We have seen that if we know that all dogs are animals, and we know that
Fido is a dog, then Modus Ponens allows us to conclude that Fido is an animal.
In this case, we are able to apply our general knowledge about dogs to conclude
something about a specific dog, Fido. This is an example of what is known
as deductive reasoning. Most of the proofs in mathematics employ deductive
reasoning. Generally, we will start with a hypothesis or something we know to
be true, and then apply rules such as those above to reach our desired conclusion.
There is another type of reasoning called inductive reasoning. In inductive
reasoning, we begin with some specific observations and then try to draw a
more general conclusion. For example, if we knew that the first few terms of an
infinite sequence were 2, 4, 6, 8, 10, . . ., then we might guess from the pattern
of these five terms that this was the sequence of all even natural numbers. From
this we could speculate that the next term in the sequence would be 12. Unlike
most instances of deductive reasoning that we will see in this course, inductive
reasoning most often does not result in a proof. Indeed, it is possible that if
we were to be told a few more terms in the sequence above we might find that
we have 2, 4, 6, 8, 10, 0, 2, 4, 6, 8, 10, where the general formula for the n-th
term is aₙ = 2n mod 12. We see that our inductive conclusion was wrong.
This does not make inductive reasoning useless. In fact, inductive reasoning is
the foundation for much of science, particularly experimental science. Even in
mathematics, inductive reasoning often leads us to an understanding of what is
actually going on. It helps us to formulate conjectures, mathematical statements
that we believe to be true, and for which we might later find proofs. It is also a
key element in problem solving. Moreover, in the next chapter we will see how
to employ an important formal technique of proof that is based on inductive
reasoning, called Proof by Induction.
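The misleading sequence above can be generated directly, as a quick informal illustration of how a pattern in the first few terms can break later on.

```python
# The sequence 2, 4, 6, 8, 10, 0, 2, 4, ... with general term a_n = 2n mod 12.
def a(n):
    return (2 * n) % 12

terms = [a(n) for n in range(1, 12)]
print(terms)   # [2, 4, 6, 8, 10, 0, 2, 4, 6, 8, 10]
```

The first five terms match the even numbers, but the sixth term already refutes the guessed pattern.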
In closing out this chapter, we will briefly remark on a somewhat indirect
technique of proof called proof by contradiction. In this technique, we will use the
fact that if we know a statement q to be true, and if we can show that ¬p → ¬q,
then we can conclude that p must also be true. Formally, this follows
from our rules of inference and from our understanding of the contrapositive.
To see why this is the case, we observe that knowing ¬p → ¬q to be true
also tells us that the contrapositive statement q → p is true. However, since
we have as a hypothesis that q is true, Modus Ponens tells us that p must also
be true.
Example 1.2. Prove that there are infinitely many prime numbers.

We will not provide all of the details of the proof of this statement at this
time. In particular, to prove this statement, we will need to know that every
natural number greater than or equal to 2 has a prime factor, something that we
will leave as an exercise following the next chapter. We then begin by assuming
that there are not infinitely many prime numbers, or equivalently that the list
p₁, p₂, . . . , pₙ of prime numbers is finite. Next we let

p = p₁p₂ ⋯ pₙ + 1.

Since p is larger than 1, it must have a prime factor. However, dividing p by
any of the listed primes p₁, p₂, . . . , pₙ leaves a remainder of 1, so none of them
could be a factor of p. This contradicts the assumption that the list p₁, p₂, . . . , pₙ
includes all primes and shows that our original assumption that there are not
infinitely many prime numbers must be false. Hence we conclude that there are
infinitely many prime numbers.
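Euclid's construction in this proof can be carried out numerically. The sketch below (an informal illustration; both function names are ours) multiplies a given list of primes, adds 1, and extracts a prime factor of the result, which is then guaranteed to lie outside the original list.

```python
# Smallest prime factor of n >= 2, by trial division.
def smallest_prime_factor(n):
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

# Euclid's construction: a prime not on the given list.
def prime_outside(primes):
    product = 1
    for q in primes:
        product *= q
    # product + 1 leaves remainder 1 on division by each listed prime,
    # so any prime factor of it is a new prime.
    return smallest_prime_factor(product + 1)

print(prime_outside([2, 3, 5, 7, 11, 13]))   # 59, since 30030 + 1 = 59 * 509
```

Note that the construction does not claim p₁p₂ ⋯ pₙ + 1 is itself prime, only that its prime factors are new; the example above shows this, since 30031 is composite.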
Finally, we note that the material of this chapter has been included to
encourage the reader to think about what it means to formulate a proof of a
mathematical statement. We will use the ideas introduced here in this course,
but in general we will not make full use of the formal symbolism.
Chapter 2
Principle of Mathematical
Induction and Properties of
Numbers
We will introduce some basic material that will be used throughout the rest of
the course.
2.0 Notation
We will use the following notation:

N will denote the set of natural numbers {1, 2, 3, . . .}.

Z will denote the set of integers {. . . , −2, −1, 0, 1, 2, . . .}.

Q will denote the set of rational numbers {a/b : a ∈ Z, b ∈ N}.

R will denote the set of real numbers.

Intervals. We will use the notation (a, b) to denote the set {x : a < x < b}.
This is called an open interval. We will use [a, b] to denote the set {x : a ≤
x ≤ b}. This is called a closed interval. Additionally, we will use the notation
(−∞, b), (a, ∞), (−∞, b], [a, ∞) to mean the open intervals {x : x < b},
{x : x > a}, and the closed intervals {x : x ≤ b}, {x : x ≥ a}, respectively.
Finally, we will use [a, b) and (a, b] to denote the half-open intervals
{x : a ≤ x < b} and {x : a < x ≤ b}, respectively.
Formally, we make the following definition.

Definition 2.0.1. A set S ⊆ R is an interval if for every x, y ∈ S and every
z ∈ R, if x ≤ z ≤ y then we must have z ∈ S.
It is easy to see that the singleton set {a} is an interval for any a ∈ R. It
is somewhat less obvious that the empty set, denoted by ∅, is also an interval.
To see why this is so, we first ask what it would mean if the empty set were not
an interval. In this case, we would have to be able to find a pair x, y ∈ ∅ and
an element z ∈ R such that x ≤ z ≤ y but z ∉ ∅. This is clearly impossible
because no such x, y exist in ∅. As such, we have shown that the statement
"∅ is not an interval" is false, and so we have proved that ∅ is an interval.
Definition 2.0.2. An interval I is said to be degenerate if I = {c} for some
c ∈ R or if I = ∅. Otherwise, we say that it is nondegenerate.
We will use the notation A ⊂ B and A ⊆ B interchangeably to mean that A
is a subset of B with the possibility that A = B, though when we explicitly wish
to emphasize that A = B is a possibility, we will generally use A ⊆ B. When
we wish to express that A is a proper subset of B, we can either specify
further that A ≠ B, or we can use the notation A ⊊ B.

We will let

B ∖ A = {x ∈ B | x ∉ A}.

In the special case when B = R, we call the set R ∖ A the complement of A in
R and denote this set by Aᶜ.
2.1 Mathematical Induction
Mathematics is built on axioms. Axioms are mathematical statements that we
accept as being true without need for proof. The following axiom introduces
one of the fundamental properties of the set N of natural numbers. It will lead
to an important method of proof called proof by induction.

Axiom 2.1.1 [Principle of Mathematical Induction]. If a set S ⊆ N is
such that the following two conditions hold,

1. 1 ∈ S;

2. for each k ∈ N, if k ∈ S, then k + 1 ∈ S;

then S = N.
We can give an informal argument that illustrates why the Principle of Mathematical
Induction is a reasonable axiom. Suppose S ⊆ N satisfies the two
conditions given in the axiom. Then we know that 1 ∈ S. Because 1 ∈ S,
and because S satisfies the second condition, 2 must also be in S. Because
1 + 1 = 2 ∈ S, and because S satisfies the second condition, 2 + 1 = 3 must
also be in S. Because 3 ∈ S, 4 must be in S. Because 4 ∈ S, 5 must be in S,
and so on. If we are given any natural number n, we can use this argument to
show that n ∈ S; therefore, every natural number n can be shown to be in S
by simply repeating this process enough times, and so S = N. It is important to
note, however, that this is not a proof of the validity of the Principle of Mathematical
Induction. Indeed, since we have stated the principle as an axiom, no
proof is needed.
Mathematical Induction. As mentioned before, this axiom leads to a method
of proof called proof by induction. We begin by asserting a statement P(n)
for each natural number n. The goal is then to show that P(n) is true for
each n ∈ N. For example, we could let P(n) be the statement that

∑_{j=1}^{n} j = 1 + 2 + ⋯ + n = n(n + 1)/2.

If we let S = {n ∈ N : P(n) is true}, then S ⊆ N. Moreover, to prove P(n)
for each n, it suffices to show that S = N. To do this, we first show that 1 ∈ S
(i.e., that P(1) is true). We must then show that for any k ∈ N, k ∈ S implies
that k + 1 ∈ S, that is, that the truth of P(k) forces P(k + 1) to also be true,
for any natural number k. This would show that S satisfies the two hypotheses
of the Principle of Mathematical Induction and, as such, we can conclude that
S = N and hence that P(n) is true for all n ∈ N.
In practice, we divide inductive proofs into three distinct steps as outlined
above.

The first step is to clearly identify the statements P(n) that we are trying
to prove.

The next step is usually (but not always) the easiest part of the argument:
show that P(1) is true. This is called the initial case. This step is very important.
There are many instances where induction has led to false proofs because
this initial case was not properly established.

In the third step, we are allowed to assume the truth of P(k) for some k.
This assumption is referred to as the induction hypothesis.
We must then use the truth of P(k) as a tool to show that, given the induction
hypothesis, it must also be the case that P(k + 1) is true, that is, that P(k)
implies P(k + 1), for any k ∈ N. In most inductive proofs we will see in this
course, this step is the most involved.

Once we have successfully concluded each of the steps above, we may appeal
to the Principle of Mathematical Induction to conclude that P(n) is in fact true
for each n ∈ N.
We will illustrate proof by induction with the following example:

Example 2.1. Prove by induction that

∑_{j=1}^{n} j = 1 + 2 + ⋯ + n = n(n + 1)/2.

Step 1: Let P(n) be the statement that

∑_{j=1}^{n} j = n(n + 1)/2.

Step 2: We must show that P(1) is true. However, P(1) is the statement that

∑_{j=1}^{1} j = 1(1 + 1)/2,

which is obviously true because ∑_{j=1}^{1} j = 1 = 2/2 = 1(1 + 1)/2.
Step 3: Next, we will verify that the truth of P(k) implies the truth of P(k + 1), for
any k ∈ N. Assume that P(k) is true. This means that for some fixed k,

∑_{j=1}^{k} j = k(k + 1)/2.

With this assumption in hand, we need to show that P(k + 1) is true, or
that

∑_{j=1}^{k+1} j = (k + 1)((k + 1) + 1)/2.

We know that

∑_{j=1}^{k+1} j = (∑_{j=1}^{k} j) + (k + 1).

Separating out the last term k + 1 allows us to make use of the induction
hypothesis. In fact, by the assumed truth of P(k), the following is true:

∑_{j=1}^{k+1} j = (∑_{j=1}^{k} j) + (k + 1)
              = k(k + 1)/2 + (k + 1)
              = (k/2 + 2/2)(k + 1)
              = (k + 2)(k + 1)/2
              = (k + 1)((k + 1) + 1)/2.
We have succeeded in showing that ∑_{j=1}^{k+1} j = (k + 1)((k + 1) + 1)/2,
which is precisely the assertion P(k + 1).

Finally, by the Principle of Mathematical Induction, we have that P(n) is
true for all n ∈ N.
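A numerical spot check of the closed form is easy to run; of course this is only inductive evidence in the sense of the previous chapter, and it is the induction argument above that actually proves the formula for every n.

```python
# Closed form for 1 + 2 + ... + n; integer division is exact here
# since n(n+1) is always even.
def sum_formula(n):
    return n * (n + 1) // 2

# Compare against the explicit sum for the first hundred values of n.
agrees = all(sum(range(1, n + 1)) == sum_formula(n) for n in range(1, 101))
print(agrees)   # True
```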
Strong Induction and the Well-Ordering Principle. There are a number
of equivalent formulations of the Principle of Mathematical Induction. Below
we present the first of these, which we call the Principle of Strong Mathematical
Induction. It is so named because it appears to be formally stronger than
the Principle of Mathematical Induction, in the sense that you may assume the
presence of all of 1, 2, 3, . . . , k in S to show that k + 1 is in S. We can prove the
Principle of Strong Mathematical Induction as a consequence of the Principle
of Mathematical Induction. In fact, it is an interesting exercise to show that
the two statements are logically equivalent in the sense that they each
imply the other.

After presenting the Principle of Strong Induction, we will show that it can
be used to establish another important property of the natural numbers known
as the Well-Ordering Principle.
Theorem 2.1.2 [Principle of Strong Induction]. If a set S ⊆ N is such
that the following two conditions hold,

1. 1 ∈ S;

2. for each k ∈ N, if 1, 2, . . . , k ∈ S, then k + 1 ∈ S;

then S = N.

Proof. This proof is left as a homework exercise. (Hint: Use the Principle of
Mathematical Induction.)
In a similar manner as for the Principle of Mathematical Induction, the
Principle of Strong Induction leads to a method of proof called proof by strong
induction.
Example 2.2. Let f : N → R be a function defined recursively by f(1) = 3,
f(2) = 3/2, and f(n) = (f(n − 1) + f(n − 2))/2 for all n ≥ 3. We will prove by strong
induction that f(n) = 2 + (−1/2)^(n−1).

Let P(n) be the statement that

f(n) = 2 + (−1/2)^(n−1).

Then for n = 1, 2 + (−1/2)^(n−1) = 2 + (−1/2)^0 = 3 = f(1). This shows that P(1) is
true. For n = 2, 2 + (−1/2)^(n−1) = 2 + (−1/2)^1 = 3/2 = f(2). This shows that P(2)
is true.
Now assume that P(n) is true for all n satisfying 1 ≤ n ≤ k, for some k ∈ N.
If k = 1, then P(k + 1) = P(2) is true. If k > 1, then k + 1 ≥ 3, and hence we
have that

f(k + 1) = (f((k + 1) − 1) + f((k + 1) − 2))/2
         = (f(k) + f(k − 1))/2
         = (2 + (−1/2)^(k−1) + 2 + (−1/2)^(k−2))/2
         = (4 + (−1/2)^(k−2)(−1/2 + 1))/2
         = (4 + (−1/2)^(k−2)(1/2))/2
         = 2 + (1/4)(−1/2)^(k−2)
         = 2 + (−1/2)^2 (−1/2)^(k−2)
         = 2 + (−1/2)^k
         = 2 + (−1/2)^((k+1)−1).

This shows that the statement P(k + 1) is true. By the Principle of Strong
Induction, P(n) is true for all n ∈ N.
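The recursion and the closed form can be compared exactly, as an informal check on the computation above; using `fractions.Fraction` avoids any floating-point error.

```python
from fractions import Fraction

# The recursively defined f from Example 2.2, computed exactly.
def f(n):
    if n == 1:
        return Fraction(3)
    if n == 2:
        return Fraction(3, 2)
    return (f(n - 1) + f(n - 2)) / 2

# The closed form 2 + (-1/2)**(n - 1).
def closed(n):
    return 2 + Fraction(-1, 2) ** (n - 1)

agrees = all(f(n) == closed(n) for n in range(1, 15))
print(agrees)   # True
```

For instance, f(3) = (3/2 + 3)/2 = 9/4, matching 2 + (−1/2)² = 9/4.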
Well-Ordering Principle. One way in which the natural numbers differ from
the real numbers is that N has a least element, namely 1. In fact, since N is only
infinite in one direction, we might speculate that every nonempty subset of
the natural numbers has a least element. Indeed, we can use Strong Induction
to show that this is true. This property is known as the Well-Ordering Principle
and is stated below.

Theorem 2.1.3 [Well-Ordering Principle]. Every nonempty subset S of
N has a least element. That is, there exists a number l ∈ S such that l ≤ n for
all n ∈ S.

Proof. This proof is left as a homework exercise. (Hint: Suppose the theorem
is false; then there exists a nonempty subset S of N such that S has no least element.
Let T = N ∖ S. Use the Principle of Strong Induction on T to show that T = N.)
Remark 2.1.4. Recall that a natural number n is prime if n ≥ 2 and if n has
no factors other than 1 and itself. One of the many standard applications of
the Well-Ordering Principle is the Fundamental Theorem of Arithmetic, which
we state below. The proof will be left as an exercise.

Theorem 2.1.5 [The Fundamental Theorem of Arithmetic]. Let n ∈ N
with n ≥ 2. Then there exist primes p₁ < p₂ < ⋯ < pₖ and natural numbers
m₁, m₂, . . . , mₖ such that

n = p₁^m₁ p₂^m₂ ⋯ pₖ^mₖ.

Moreover, if there exist primes q₁ < q₂ < ⋯ < qₗ and natural numbers
r₁, r₂, . . . , rₗ such that

n = q₁^r₁ q₂^r₂ ⋯ qₗ^rₗ,

then k = l, pᵢ = qᵢ for each 1 ≤ i ≤ k, and mᵢ = rᵢ for each 1 ≤ i ≤ k.

Proof. The proof that there is a decomposition into primes is left as a homework
exercise. We will not address the uniqueness at this time.
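The decomposition guaranteed by the theorem can be computed directly by trial division; the sketch below (an informal illustration, with our own function name) returns the primes pᵢ together with their multiplicities mᵢ.

```python
# Prime factorization of n >= 2 by trial division: returns a dict
# mapping each prime factor to its multiplicity.
def factorize(n):
    assert n >= 2
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        # whatever remains is itself prime
        factors[n] = factors.get(n, 0) + 1
    return factors

print(factorize(360))   # {2: 3, 3: 2, 5: 1}, i.e. 360 = 2^3 * 3^2 * 5
```

Since Python dictionaries preserve insertion order and trial division proceeds upward, the factors come out in increasing order p₁ < p₂ < ⋯ < pₖ, matching the statement of the theorem.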
Equivalence of these three principles. It turns out that these three principles
are logically equivalent, i.e., they imply each other.

Theorem 2.1.6. The following are equivalent:

i) Principle of Mathematical Induction;

ii) Principle of Strong Induction;

iii) Well-Ordering Principle.

Proof. The proofs that i) ⇒ ii) and ii) ⇒ iii) are left as a homework exercise.

iii) ⇒ i): Assume that N satisfies the Well-Ordering Principle. Assume also
that S is a subset of N satisfying

1. 1 ∈ S;

2. for each k ∈ N, if k ∈ S, then k + 1 ∈ S.

We let T = N ∖ S. To prove the theorem we must show that T = ∅. Assume
to the contrary that T is nonempty. Then by the Well-Ordering Principle, T
has a least element, which we denote by k₀. Since we have assumed already
that 1 ∈ S, we know immediately that 1 ≠ k₀. It follows that k₀ − 1 ∈ N.
However, since k₀ − 1 < k₀ and k₀ is the least element of T, it must be that
k₀ − 1 ∈ S. But if k₀ − 1 ∈ S, then by property 2 above we must also have
that k₀ = (k₀ − 1) + 1 ∈ S. This is a contradiction, since we know that k₀ ∈ T.
Therefore, since the assumption that T ≠ ∅ leads to a contradiction, it must be
the case that T = ∅. Finally, if T = ∅ and T = N ∖ S, then it must be the case
that S = N.

The proof above uses the technique known as proof by contradiction that
we mentioned briefly in the previous chapter. As previously indicated, we first
assume that the conclusion of the theorem is false and then proceed to derive a
contradiction, thereby showing that the conclusion cannot be false.
2.2 Properties of the Real Numbers
In this section, we will explore the fundamental properties of R.
We will start with the most important defining axiom of the real numbers.
Before stating the axiom, we will first require the following definition:

Definition 2.2.1. Given a set S ⊆ R, a real number α is called an upper bound
for S if x ≤ α for all x ∈ S. A real number β is called a lower bound for S if
β ≤ x for all x ∈ S. S is said to be bounded above if S has an upper bound. S
is said to be bounded below if S has a lower bound. S is said to be bounded if
S has both an upper bound and a lower bound.
Example 2.3. Let S₁ = [0, 1); then α = 1 is an upper bound for S₁ and β = 0
is a lower bound for S₁. Incidentally, 17 is also an upper bound for S₁ and −24
is a lower bound for S₁. Let S₂ = {1, 2}; then 2 is an upper bound for S₂ and 1
is a lower bound for S₂. Let S₃ = Q ∩ (−∞, 1) (the set of all rational numbers
that are less than 1); then S₃ is bounded above but not below.
Problem. Let S = N; then S is clearly bounded below. Is S bounded above?

Our intuition tells us that the answer to the problem above is that N is not
bounded above. However, this turns out to be a rather deep statement about
the nature of the real line. We will soon see why it is true.

Notice in the previous example that when we identified an upper bound
for the set [0, 1), our first thought was to choose 1. In fact, the set [0, 1) has
infinitely many upper bounds; any number greater than or equal to 1 will do.
However, amongst all such upper bounds, 1 is distinguished in that it is the
smallest. Similarly, 0 is distinguished amongst all lower bounds of [0, 1) in that
it is the largest lower bound. This important observation leads us to consider
the following definitions.
Definition 2.2.2. We say that α ∈ R is the least upper bound (greatest lower
bound) for a set S ⊆ R if

1. α is an upper bound (lower bound, respectively) for S, and

2. if γ is an upper bound (lower bound, respectively) for S, then α ≤ γ
(γ ≤ α, respectively).

The least upper bound of S is often called the supremum of S, and the greatest
lower bound the infimum of S. If a set S has a least upper bound (greatest
lower bound), then we denote it by sup S or lub S (inf S or glb S, respectively).

Example 2.4. Let S = [0, 1); then lub S = 1 and glb S = 0.
Example 2.5. Let S = {x ∈ Q : x² < 2}, the set of rational numbers whose
squares are less than 2. If we consider S as a subset of R, then lub S = √2
and glb S = −√2. However, what happens if we only want to consider
rational numbers as our universal set? In fact, if we consider S only
as a subset of Q, then we can show that both the supremum and
infimum of S do not exist in Q.
Least upper bound property. We have just seen that a set may have a
least upper bound or a greatest lower bound when viewed as a subset of ℝ but
fail to have such when viewed entirely within ℚ. In essence, the reason why this
can happen is that if we look at a number line that represents the set of rational
numbers ℚ, then this line has infinitely many infinitely small holes that cannot
be filled without adding additional elements. It turns out that at the edge of the
set S of rationals whose squares are less than 2 are two of these holes where the
real numbers ±√2 would like to be. If we approach the set S from the left from
within the rationals, then no matter how close we are to S we will always be able
to move a little bit nearer this missing edge while staying within ℚ. For this
reason, we have an example of a bounded set where the upper bounds do not
have a minimum in ℚ and similarly the lower bounds do not have a maximum
in ℚ.
When we move to the set ℝ we can ask if it also has holes. Alternatively, we
want to know if every set in ℝ that is bounded above has as its upper edge
another real number. The following axiom is actually a confirmation that ℝ has
no holes.
Axiom 2.2.3 [Least Upper Bound Property]. A nonempty subset S ⊆ ℝ
that is bounded above always has a least upper bound.
Observe that in our statement of the Least Upper Bound Property we specif-
ically excluded the empty set ∅. This leads to the following problem:
Problem. Is ∅ bounded above or below?
We begin by asking whether there is an upper bound for ∅. This might seem
like a bit of an absurd question: after all, how can some number x be larger than
or equal to every element in a set that itself contains no element? However, the
key is to ask: when would x not be an upper bound? With this in mind, we have
the following proposition:
Proposition 2.2.4. Let α ∈ ℝ. Then α is both an upper bound and a lower
bound for ∅. In particular, ∅ is bounded.
Proof. If α were not an upper bound for ∅, then by definition there would exist
some element x ∈ ∅ satisfying x > α, a contradiction (nothing is in ∅). This
shows that α must be an upper bound for ∅. Similarly, α must be a lower bound
for ∅, as claimed.
The previous proposition yields a very unusual fact about the empty set.
For this set and this set alone it is possible to have an upper bound α and a
lower bound β such that α < β. In fact, α = 1 and β = 5 are such a pair.
This proof again uses the general principle that a statement about elements in
a set will be automatically true for ∅, because if it were not true, then there
would exist elements in ∅ to contradict the statement, which is impossible. The
truth of a statement about ∅ derived in this way is called a vacuous truth and
the statement is said to be true vacuously.
The above proposition also shows that we must take care to apply the least
upper bound property only to nonempty sets, because clearly, ∅ fails to have a
least upper bound even though it is bounded above!
The dual to the Least Upper Bound Property is the Greatest Lower Bound
Property, stated below. Note that, as we show below, the Greatest Lower Bound
Property can be derived as a theorem, using the Least Upper Bound Property
as the key tool in the proof. However, had we assumed the Greatest Lower
Bound Property as an axiom, we would have been able to derive the Least Upper
Bound Property as a theorem. That is, the two properties are equivalent.
Theorem 2.2.5 [Greatest Lower Bound Property]. A nonempty subset
S ⊆ ℝ that is bounded below always has a greatest lower bound.
Proof. Let β be a lower bound for S. Therefore, if x ∈ S, we have that β ≤ x.
It follows that −x ≤ −β and hence that −β is an upper bound for the set
T = −S = {−s : s ∈ S}. Since S is nonempty, so is T. By the Least Upper
Bound Property, T has a least upper bound γ. Now if x ∈ S, then −x ≤ γ, so
that −γ ≤ x for every x ∈ S. This proves that −γ is a lower bound for S. We
also know that since −β is an upper bound for T, we have γ ≤ −β and hence
that β ≤ −γ. However, since β was an arbitrary lower bound of S, −γ is in
fact the greatest lower bound for S.
Archimedean property. We will now answer the question posed after Example
2.3. The fact that ℕ is not bounded above may seem trivial and obvious, but
one of the goals of the course is to learn to proceed in mathematics rigorously and
carefully, verifying every new statement with definitions, axioms, and previously
proven results. This is important because there are many glaring examples
of statements that appear as though they must be true but that are either
extremely difficult to prove or, worse yet, for which there are even simple
counterexamples showing them to be false.
Theorem 2.2.6 [Archimedean Property]. The set ℕ ⊆ ℝ has no upper
bound in ℝ. That is, ℕ is not bounded above.
Proof. Suppose on the contrary that ℕ is bounded above by some M ∈ ℝ. Then
since 1 ∈ ℕ, ℕ ≠ ∅. By the least upper bound property, ℕ has a least upper
bound α ∈ ℝ. Then since α is the least upper bound, α − 1/2 is not an upper
bound for ℕ. This shows that there exists an element n ∈ ℕ with
α − 1/2 < n ≤ α.
Since n ∈ ℕ, it follows that n + 1 ∈ ℕ, making
α − 1/2 + 1 < n + 1, that is, α < α + 1/2 < n + 1,
which is a contradiction since no element of ℕ can be greater than α, an upper
bound for ℕ. Consequently, the statement that ℕ is bounded above cannot be
true and the theorem follows.
The previous proof uses an observation that will be used repeatedly throughout
this course, namely that if α is the least upper bound of a set S and if ε > 0
is any positive number, no matter how small, then α − ε < α. Therefore, α − ε is
not an upper bound for S. It follows that there must be some s ∈ S that makes
α − ε fail to be an upper bound; that is, this s satisfies s > α − ε. But since
s ∈ S, it must be less than or equal to the upper bound α. Combining these
two inequalities yields
α − ε < s ≤ α.
The following easy but important corollary is also known as the Archimedean
Property.
Corollary 2.2.7 [Archimedean Property]. For every ε > 0, there exists an
n ∈ ℕ satisfying 0 < 1/n < ε.
Proof. For any ε > 0, we can find an n ∈ ℕ with n > 1/ε > 0 by Theorem
2.2.6, for if such an n cannot be found, then 1/ε would be an upper bound for
ℕ, contradicting the theorem. Taking reciprocals yields 0 < 1/n < ε.
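The corollary is easy to see in action numerically. The sketch below is our own illustration, not part of the notes; the helper name archimedean_n is an assumption. It computes, for a given ε > 0, the smallest n ∈ ℕ with 1/n < ε.

```python
import math

def archimedean_n(eps):
    """Smallest natural number n with 1/n < eps, for eps > 0."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    # 1/n < eps  <=>  n > 1/eps, so take the first integer past 1/eps.
    return math.floor(1 / eps) + 1

for eps in (0.5, 0.1, 0.003):
    n = archimedean_n(eps)
    assert 0 < 1 / n < eps
    print(eps, "->", n)
```

Up to floating-point rounding of 1/ε, this mirrors the proof: the theorem guarantees some n > 1/ε exists, and floor(1/ε) + 1 is one such n.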
It might have been tempting to try and avoid using the Least Upper Bound
Property to prove the Archimedean Property by instead arguing as follows:
Assume that α ∈ ℝ is an upper bound for ℕ. Then we certainly know that
α > 0 and we also know that α has a decimal expansion
α = m.a_1 a_2 a_3 a_4 …
where m ∈ ℕ and 0 ≤ a_i ≤ 9. (The number m, which is the greatest integer
less than or equal to α, is called the floor of α and it is denoted by ⌊α⌋.) From
here, we get that m ≤ α < m + 1. However m + 1 ∈ ℕ, contradicting the fact
that α was an upper bound for ℕ.
It might well appear that we have succeeded in avoiding the use of the Least
Upper Bound Property in the above proof until we ask ourselves: how do
we know that every real number has a decimal expansion? It turns out
that this familiar fact can actually be shown to be equivalent to the Least
Upper Bound Property. Indeed, in an exercise later in the course, we will show
one direction of this equivalence by showing that, under the assumption that
the real numbers satisfy the Least Upper Bound Property, every real number
does indeed have a decimal expansion.
An application of the above corollary is the following corollary, which states
something about the density of rational numbers and irrational numbers as
subsets of ℝ.
Definition 2.2.8. We say that a set S ⊆ ℝ is dense in ℝ if for every α ∈ ℝ
and every ε > 0, there exists an s ∈ S such that s ∈ (α − ε, α + ε).
Corollary 2.2.9 [Density of Rationals and Irrationals]. For every a,
b ∈ ℝ with a < b, there exist r ∈ ℚ and s ∉ ℚ satisfying a < r < b and
a < s < b.
Proof. This proof is left as a homework exercise. In this proof, you may assume
that √2 is irrational.
2.3 Absolute value.
We will now introduce the simple but important concept of absolute value.
Definition 2.3.1. The absolute value of a real number x is the quantity
|x| := x if x ≥ 0, and −x if x < 0.
The essential properties of the absolute value are summarized below.
Remark 2.3.2. The absolute value satisfies
1. [Positive Definiteness:] |x| ≥ 0, with |x| = 0 if and only if x = 0,
2. [Positive Homogeneity:] |xy| = |x||y|, and
3. [Triangle Inequality:] |x + y| ≤ |x| + |y|.
Proof. The first two properties can be verified directly from the definition of
absolute value; the proof is left as an exercise. We will prove the Triangle
Inequality below.
Remark 2.3.3 [Geometric Interpretation of Absolute Value]. Note
that |x| = √(x^2); this suggests that the absolute value is a one-dimensional
analogue of √(x^2 + y^2), which measures the length of a vector (x, y) in the
plane. In particular, |x| can be thought of as the distance from x to 0.
Observe that |x| = |x − 0| and that |x − y| = |(−1)(y − x)| = |−1||y − x| =
|y − x| by item (2) in the remark above. As such, we can interpret the quantity
|x − y| = |y − x| as the distance from x to y. This geometric interpretation
will be useful in solving some simple but useful inequalities involving absolute
values.
Example 2.6. We will find all x ∈ ℝ such that |x − 3| < 2. This problem
should be interpreted as finding those numbers x such that the distance from x
to 3 is less than 2. We can easily see that the solution set is (1, 5). This can be
verified rigorously by directly applying the definition of the absolute value.
Note. In general, we have that
1. |x − a| < ε if and only if x ∈ (a − ε, a + ε),
2. 0 < |x − a| < ε if and only if x ∈ (a − ε, a + ε) \ {a},
3. |x − a| ≤ ε if and only if x ∈ [a − ε, a + ε].
Problem. Let x, y ∈ ℝ. Find all z such that |x − y| ≤ |x − z| + |z − y|.
Solution. Since |x − y| = |y − x|, the problem is symmetric in x and y and so
we can assume without loss of generality that x ≤ y. We have three cases to
consider.
1. If z < x, then |x − y| < |z − y| ≤ |x − z| + |z − y|.
2. If z > y, then |x − y| < |x − z| ≤ |x − z| + |z − y|.
3. If x ≤ z ≤ y, then |x − y| = |x − z| + |z − y|.
We have actually verified the following version of the triangle inequality.
Theorem 2.3.4 [Triangle Inequality]. For all x, y, z ∈ ℝ, |x − y| ≤
|x − z| + |z − y|.
This version of the triangle inequality will be used extensively throughout
the course. It should be interpreted as follows: The distance traveled going from
x directly to y is no greater than the distance traveled going from x to a third
point z and then from z back to y. If this third point z is between x and y,
then the two routes have the same length.
Note that if we replace y by −y in the triangle inequality, and then let z = 0,
we get:
Corollary 2.3.5 [Triangle Inequality II]. For all x, y ∈ ℝ, |x + y| ≤
|x| + |y|.
Here is our final version of the triangle inequality; it is used less frequently
than the other two.
Corollary 2.3.6 [Triangle Inequality III]. For all x, y ∈ ℝ, ||x| − |y|| ≤
|x − y|.
Proof. First observe that
|x| = |(x − y) + y| ≤ |x − y| + |y|.
This shows that
|x| − |y| ≤ |x − y|.
Interchanging x and y establishes that
|y| − |x| ≤ |y − x| = |x − y|.
Finally we conclude that
||x| − |y|| ≤ |x − y|.
Remark 2.3.7. Suppose x ∈ ℝ. If |x| < ε for all ε > 0, then x = 0.
Proof. Suppose on the contrary that x ≠ 0. Then let ε_0 = |x|/2 > 0. We have
that
ε_0 > |x| = 2ε_0 > 0,
which is a contradiction since 2ε_0 > ε_0. The result follows.
The above remark simply says that if the size of a number is less than
every positive number, then it must be 0.
Chapter 3
Sequences and Their Limits
3.0 Definition of Sequence
Informally, a sequence is an infinite list of numbers where order is important.
For example, 0, 1, 0, 1, 0, 1, . . . is a sequence. Formally, we make the following
definition.
Definition 3.0.1. A sequence is a function f : ℕ → ℝ. For n ∈ ℕ, we call f(n)
the nth term of the sequence; for the sake of intuitive notation, we will usually
denote the nth term of the sequence by x_n, y_n, a_n, b_n, or some other letter with
n as subscript, instead of f(n). We will denote the whole sequence by {x_n}_{n∈ℕ}
or simply by {x_n}, where x could be any letter.
Note that a sequence does not need to have a pattern at all; however, most
of the sequences we will be working with will be defined by formulae such as 1/n.
When we want to write down a sequence, we can specify how to compute its nth
term, as in {1/n}, or we can list the terms until the pattern becomes apparent,
as in {1, 1/2, 1/3, 1/4, . . .}.
We will frequently encounter sequences that are defined by recursion. An
example of such a sequence is:
a_1 = 1, a_{n+1} = √(3 + 2a_n).
You will notice that in the above example we can use what we know of the first
term to calculate the second term. We can then determine the third term, and
so on. For recursively defined sequences, calculating a particular term requires
us to already know the values of previous terms.
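A recursively defined sequence translates directly into a loop. The sketch below is our own illustration (the helper name recursive_terms is an assumption); it generates the first few terms of the example above.

```python
import math

def recursive_terms(k):
    """First k terms of the recursion a_1 = 1, a_{n+1} = sqrt(3 + 2*a_n)."""
    terms = [1.0]
    for _ in range(k - 1):
        terms.append(math.sqrt(3 + 2 * terms[-1]))
    return terms

print(recursive_terms(6))
```

Each new term needs only the previous one, exactly as described; printing a few terms suggests they increase toward 3, the positive solution of x = √(3 + 2x).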
Diversion. Is there a difference between induction and recursion?
Given a sequence a_1, a_2, a_3, . . ., a tail of the sequence is a sequence of the
form a_k, a_{k+1}, a_{k+2}, . . .; that is, a collection of terms in the original sequence
from some fixed point onwards, thus the word tail. A tail is obtained by deleting
the first few terms off the sequence.
3.1 Limits of Sequences
Defining the limit. Consider the sequence
{1, 1/2, 1/3, . . . , 1/n, . . .} or {1/n}_{n∈ℕ}.
What can we say about the a_n's as n gets very large? As n gets larger, these
terms are getting closer and closer to the value 0. We say that 0 is the limit
of the sequence {a_n} as n → ∞, or {a_n} converges to 0. We give below an
informal, heuristic definition of convergence.
Informal definition. Given a sequence {a_n}, we say that L is the limit of
{a_n} as n goes to infinity if as n gets very large, the a_n's get closer and closer
to L.
There is a problem with this definition. Note that as n gets larger, a_n = 1/n
gets closer and closer to −17; however, we do not want −17 to be a limit! What
we are missing is a way of quantifying the fact that as n gets large, all of the
terms in the tail of the sequence are very close to 0. The following is the formal
definition of limit.
Definition 3.1.1 [Limit]. We say that L is the limit of a sequence {a_n}
(notation: lim_{n→∞} a_n = L) if for every ε > 0, there exists an N ∈ ℕ
(which in general depends on ε) such that
n ≥ N implies |a_n − L| < ε.
That is, L − ε < a_n < L + ε.
In this case, we may also say that a_n converges to L as n goes to infinity
(notation: a_n → L as n → ∞) or simply a_n converges to L.
We can modify our more informal definition as follows:
Informal definition. L is the limit of {a_n} if for any positive tolerance ε > 0,
there exists N ∈ ℕ such that a_n approximates L with an error that is less than
ε, provided that n ≥ N. That is, the entire tail of the sequence starting from
the Nth term onwards is captured in the interval (L − ε, L + ε).
Definition 3.1.2. Given a sequence {a_n}, if no real number is the limit of
{a_n}, we say that {a_n} diverges.
Example 3.1. We will formally show that {1/n} converges to 0. Let ε > 0. By
the Archimedean property, we can find an N ∈ ℕ such that 0 < 1/N < ε. If
n ≥ N, then 0 < 1/n ≤ 1/N < ε. This shows that |1/n − 0| < ε and hence 0 is
the limit of {1/n}.
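The ε–N bookkeeping in this example can be explored numerically. The sketch below is our own helper; the name find_N and the brute-force search over finitely many indices are assumptions, not part of the notes, and the search is a heuristic rather than a proof.

```python
def find_N(a, L, eps, search_limit=10**5):
    """Smallest N found with |a(n) - L| < eps for all n in [N, search_limit].

    A brute-force check over finitely many indices, not a proof."""
    N = search_limit
    for n in range(search_limit, 0, -1):
        if abs(a(n) - L) >= eps:   # the definition fails at n, so N must exceed n
            return n + 1
        N = n
    return N

print(find_N(lambda n: 1 / n, 0, 0.01))  # 101, since 1/101 < 0.01 while 1/100 is not
```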
Example 3.2. We will formally show that the sequence
{a_n} := 1, −1, 1, −1, . . . = {(−1)^{n+1}}_{n∈ℕ}
diverges. That is, no real number is the limit of {a_n}.
Suppose to the contrary that {a_n} has a limit L. Then by definition of limit,
for ε = 1 there exists an N ∈ ℕ so that n ≥ N implies |a_n − L| < ε = 1. Let
n_1 ∈ ℕ be odd with n_1 ≥ N. Then
|a_{n_1} − L| = |(−1)^{n_1 + 1} − L| = |1 − L| < 1.
This shows that
L ∈ (0, 2).
Similarly, let n_2 ∈ ℕ be even with n_2 ≥ N. Then
|a_{n_2} − L| = |(−1)^{n_2 + 1} − L| = |−1 − L| < 1.
This in turn shows that
L ∈ (−2, 0).
Hence L satisfies both L ∈ (0, 2) and L ∈ (−2, 0), a contradiction. We conclude
that {a_n} has no limit and hence diverges.
Alternatively, we could have proceeded as follows. Observe that for all n ∈ ℕ,
|a_{n+1} − a_n| = 2. Suppose to the contrary that {a_n} has a limit L. By definition
of limit, for ε = 1 there exists an N ∈ ℕ so that n ≥ N implies |a_n − L| < 1.
By the triangle inequality,
2 = |a_{N+1} − a_N| ≤ |a_{N+1} − L| + |L − a_N| < 1 + 1 = 2,
a contradiction. We again conclude that {a_n} has no limit.
Remark 3.1.3. Suppose {a_n} is a sequence and L ∈ ℝ. Then
lim_{n→∞} |a_n − L| = 0 if and only if lim_{n→∞} a_n = L.
Proof. Let ε > 0. Since lim_{n→∞} |a_n − L| = 0, there exists an N ∈ ℕ such that
for all n ≥ N, |a_n − L| = ||a_n − L| − 0| < ε. This shows that lim_{n→∞} a_n = L.
The argument for the converse is almost identical and will be omitted.
The following theorem answers the question: How many limits can a sequence
have?
Theorem 3.1.4. Suppose that lim_{n→∞} a_n = L and that lim_{n→∞} a_n = M.
Then L = M.
Proof. Let ε > 0. We can find N_1, N_2 ∈ ℕ so that
n ≥ N_1 implies |a_n − L| < ε/2 and
n ≥ N_2 implies |a_n − M| < ε/2.
Let N = max{N_1, N_2}. Then
|L − M| ≤ |L − a_N| + |a_N − M| < ε/2 + ε/2 = ε.
This shows that |L − M| < ε for any arbitrary ε > 0. Remark 2.3.7 shows that
L − M = 0, and hence L = M, as required.
Diversion. Define the sequence {a_n} recursively as follows. Let a_1 = 1, and
a_{n+1} = cos(a_n) for all n ≥ 1. Does {a_n} converge?
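A quick numerical experiment (ours, and no substitute for a proof) suggests what happens: the iterates of cosine settle down to the unique solution of cos(x) = x.

```python
import math

a = 1.0
for _ in range(100):
    a = math.cos(a)  # a_{n+1} = cos(a_n)
print(a)  # the iterates settle near 0.739085..., the fixed point of cosine
```

The experiment only hints at the answer; since this sequence is not monotonic, the Monotone Convergence Theorem below does not apply to it directly, and a rigorous treatment needs other tools.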
Convergence and boundedness. The sequence 1, 2, 3, . . . does not converge.
How do we show this and why does this occur? Recall that a subset S
of ℝ is bounded if it is bounded above by some α and below by some β. That
is, β ≤ x ≤ α for all x ∈ S.
Observation. A subset S of ℝ is bounded if and only if there exists an M > 0
such that S ⊆ [−M, M].
Proof. If S ⊆ [−M, M], then M is an upper bound for S and −M is a lower
bound for S. If S has an upper bound α and a lower bound β, then let M =
max{|α|, |β|}. We have that x ≤ α ≤ |α| ≤ M and x ≥ β ≥ −|β| ≥ −M for all
x ∈ S, whence S ⊆ [−M, M].
Question: Are convergent sequences bounded?
Theorem 3.1.5. Every convergent sequence is bounded.
Proof. Assume that L = lim_{n→∞} a_n and let ε = 1. We can find an N_0 ∈ ℕ such
that if n ≥ N_0, then |a_n − L| < 1. Hence |a_n| = |a_n − L + L| ≤ |a_n − L| + |L| ≤
1 + |L| for all n ≥ N_0. Let M = max{|a_1|, |a_2|, . . . , |a_{N_0 − 1}|, 1 + |L|}. Then we
have that |a_n| ≤ M for all n ∈ ℕ and so {a_n} as a set of values is a subset of
[−M, M], showing that it is bounded.
Question: Are all bounded sequences convergent?
Example 3.3. The sequence 1, −1, 1, −1, . . . is bounded but not convergent.
3.2 Monotone Convergence Theorem
In this section, we will prove an important result about sequences that progress
in only one direction.
Definition 3.2.1. We say that a sequence {a_n} is
increasing if a_n < a_{n+1} for all n ∈ ℕ,
decreasing if a_n > a_{n+1} for all n ∈ ℕ,
non-decreasing if a_n ≤ a_{n+1} for all n ∈ ℕ,
non-increasing if a_n ≥ a_{n+1} for all n ∈ ℕ,
monotonic if {a_n} is either non-increasing or non-decreasing.
Example 3.4. {n/(n+1)} = {1 − 1/(n+1)} is increasing. {cos(n)} is not monotonic.
Theorem 3.2.2 [Monotone Convergence Theorem]. If a sequence {a_n}
is monotonic and bounded, then it converges.
Proof. First assume that {a_n} is non-decreasing. Then since it is bounded
above, the set of values {a_n} has a least upper bound l = sup{a_n}. Let ε > 0.
Since l − ε < l, there exists N ∈ ℕ such that l − ε < a_N ≤ l. If n ≥ N, then
l − ε < a_N ≤ a_n ≤ l since {a_n} is non-decreasing. We conclude that if n ≥ N,
|a_n − l| < ε; i.e., l = lim_{n→∞} a_n. The case where {a_n} is non-increasing is
handled similarly; the details are left as an exercise.
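The theorem can be watched in action on the recursive sequence a_1 = 1, a_{n+1} = √(3 + 2a_n) from Section 3.0. The check below is our own numerical illustration, not a proof: it confirms that the first twenty terms are increasing and bounded above by 3, which is exactly the situation in which the theorem guarantees a limit.

```python
import math

a, terms = 1.0, []
for _ in range(20):
    terms.append(a)
    a = math.sqrt(3 + 2 * a)   # a_{n+1} = sqrt(3 + 2*a_n)

assert all(x < y for x, y in zip(terms, terms[1:]))  # increasing
assert all(x < 3 for x in terms)                     # bounded above by 3
print(terms[-1])  # the terms creep up toward 3, the solution of x = sqrt(3 + 2x)
```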
Diversion. The monotone convergence theorem is equivalent to the least upper
bound property.
Corollary 3.2.3. A monotonic sequence converges if and only if it is bounded.
Proof. Bounded monotonic sequences converge by the monotone convergence
theorem. To prove the converse, note that theorem 3.1.5 implies that non-increasing
and non-decreasing sequences that are convergent are bounded below and above,
respectively. Non-increasing and non-decreasing sequences are bounded above
and below, respectively, by their first term. The result follows.
3.3 Series
Given a sequence {a_n}, we can form the formal sum a_1 + a_2 + a_3 + ⋯ := Σ_{n=1}^∞ a_n.
This is called a series.
Question: What does this formal sum represent? Does it have a value?
Example 3.5. Suppose a_n = (−1)^{n+1}. Then our formal sum looks like
a_1 + a_2 + a_3 + ⋯ = 1 + (−1) + 1 + (−1) + 1 + (−1) + ⋯.
We could parenthesize our formal sum in this way:
[1 + (−1)] + [1 + (−1)] + [1 + (−1)] + ⋯ = 0 + 0 + 0 + ⋯ = 0.
On the other hand, we could parenthesize our formal sum in this way:
1 + [(−1) + 1] + [(−1) + 1] + [(−1) + 1] + ⋯ = 1 + 0 + 0 + 0 + ⋯ = 1.
Our result is ambiguous; the value of the series changes if we change the way
we parenthesize the terms. We need an alternate method for finding the sum
of a series.
Definition 3.3.1. Given a sequence {a_n} = a_1, a_2, a_3, . . ., we define the kth
partial sum, S_k, as
S_k = a_1 + a_2 + ⋯ + a_k = Σ_{n=1}^k a_n.
We say that the series Σ_{n=1}^∞ a_n converges if the sequence of partial sums
{S_k} converges. In this case, we write Σ_{n=1}^∞ a_n = lim_{k→∞} S_k. Otherwise,
we say that the series diverges.
Example 3.6. Let r ∈ ℝ. Consider the series
Σ_{n=0}^∞ r^n = 1 + r + r^2 + r^3 + ⋯.
For which r does this series converge? To answer this question, we must look
at its sequence of partial sums; hence, let S_k = Σ_{n=0}^k r^n = 1 + r + r^2 + ⋯ + r^k.
If r = 1, then
S_k = 1 + 1 + 1 + ⋯ + 1 = k + 1.
Since {S_k} = {k + 1} diverges, the series Σ_{n=0}^∞ 1^n diverges.
If r = −1, then
S_k = 1 + (−1) + 1 + ⋯ + (−1)^k = 1 if k ≡ 0 (mod 2), and 0 if k ≡ 1 (mod 2).
Since {S_k} diverges (its terms alternate between 1 and 0), Σ_{n=0}^∞ (−1)^n diverges.
Finally, in the interesting case where |r| ≠ 1, we have that
S_k = Σ_{n=0}^k r^n = 1 + r + r^2 + ⋯ + r^k,
whence
(1 − r)S_k = (1 − r)(1 + r + r^2 + ⋯ + r^k) = 1 − r^{k+1}
and so
S_k = (1 − r^{k+1})/(1 − r).
We can show that lim_{k→∞} r^{k+1} = 0 if |r| < 1. This means that
lim_{k→∞} S_k = lim_{k→∞} (1 − r^{k+1})/(1 − r) = (1 − lim_{k→∞} r^{k+1})/(1 − r) = (1 − 0)/(1 − r) = 1/(1 − r),
where the equality that carries the limit inside the quotient is one we shall
formally prove later (in proposition 3.5.1). If |r| > 1, then S_k = (1 − r^{k+1})/(1 − r)
is unbounded and hence divergent by 3.1.5.
We conclude that Σ_{n=0}^∞ r^n converges if and only if |r| < 1, and in this case,
Σ_{n=0}^∞ r^n = 1/(1 − r).
A series of the form Σ_{n=0}^∞ r^n is called a geometric series. We have actually
shown the following theorem.
Theorem 3.3.2 [Geometric Series Test]. A geometric series Σ_{n=0}^∞ r^n
converges if and only if |r| < 1.
Example 3.7. Consider the geometric series Σ_{n=0}^∞ (1/2)^n = 1 + 1/2 + 1/4 + ⋯.
By the geometric series test, it converges and has sum 1/(1 − 1/2) = 2.
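We can watch the partial sums converge. The helper below is our own sketch (the name geometric_partial_sum is an assumption); it sums r^n directly and compares the result against the closed form 1/(1 − r).

```python
def geometric_partial_sum(r, k):
    """S_k = 1 + r + r**2 + ... + r**k, summed term by term."""
    return sum(r**n for n in range(k + 1))

# Partial sums of the series with r = 1/2 approach 1/(1 - 1/2) = 2.
for k in (5, 10, 20):
    print(k, geometric_partial_sum(0.5, k))
assert abs(geometric_partial_sum(0.5, 50) - 2) < 1e-12
```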
Question: Does the series Σ_{n=0}^∞ 1/n! converge?
Example 3.8. The series Σ_{n=0}^∞ 1/n! converges. To see this, let S_k = Σ_{n=0}^k 1/n!.
Then we have that
S_0 = 1,
S_1 = 1 + 1,
S_2 = 1 + 1 + 1/2,
S_3 < 1 + 1 + 1/2 + 1/2^2,
S_4 < 1 + 1 + 1/2 + 1/2^2 + 1/2^3,
⋮
S_k ≤ 1 + Σ_{n=0}^{k−1} 1/2^n < 1 + 2 = 3.
Since {S_k} is increasing and bounded above by 3, the monotone convergence
theorem shows that Σ_{n=0}^∞ 1/n! converges and that
Σ_{n=0}^∞ 1/n! = sup_{k∈ℕ} S_k < 3.
Diversion. Σ_{n=0}^∞ 1/n! := e.
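The bound above can be seen numerically. The loop below, our own illustration, accumulates 1/n! without recomputing factorials; the running sum stays below 3 while approaching e ≈ 2.71828.

```python
import math

s, term = 0.0, 1.0          # term starts at 1/0! = 1
for n in range(20):
    s += term               # add 1/n!
    term /= n + 1           # 1/(n+1)! = (1/n!) / (n+1)
print(s)                    # close to e, and always below 3
```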
The above example suggests a kind of comparison process that will determine
whether a sequence will converge based on whether another sequence converges.
Theorem 3.3.3. Assume that a_1 ≤ a_2 ≤ a_3 ≤ ⋯ and 0 ≤ b_1 ≤ b_2 ≤ b_3 ≤ ⋯;
that is, assume that {a_n} and {b_n} are non-decreasing sequences. Assume also
that a_n ≤ b_n for each n ∈ ℕ. Then {a_n} converges if {b_n} converges.
Proof. Note that if {b_n} converges, to L, say, then the proof of the monotone
convergence theorem shows that b_n ≤ L = sup_{n∈ℕ} b_n. We now have that
a_n ≤ b_n ≤ L for all n ∈ ℕ. Since {a_n} is non-decreasing and bounded above by
L, it converges by the monotone convergence theorem.
Remark 3.3.4. If {a_n} and {b_n} are non-decreasing and a_n ≤ b_n for all n ∈ ℕ,
then {b_n} diverges if {a_n} diverges.
Proof. This remark is just the contrapositive of theorem 3.3.3 and follows
immediately.
Example 3.9. Consider Σ_{n=1}^∞ 1/n. This series is known as the harmonic series.
Let S_k = Σ_{n=1}^k 1/n. It is clear that {S_k} is an increasing sequence. We have that
S_1 = 1,
S_2 = 1 + 1/2,
S_4 = 1 + 1/2 + 1/3 + 1/4 > 1 + 1/2 + 1/2,
S_8 = 1 + 1/2 + 1/3 + ⋯ + 1/8 > 1 + 1/2 + 1/2 + 1/2,
⋮
S_{2^j} ≥ 1 + j/2.
Since {1 + j/2} is not bounded, {S_{2^j}} is not bounded. Hence {S_k} is not
bounded. The series diverges by theorem 3.1.5.
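The doubling argument is easy to check by machine for small j. The snippet below, our own check, verifies S_{2^j} ≥ 1 + j/2 for j up to 14; the partial sums grow without bound, just very slowly.

```python
def harmonic(k):
    """k-th partial sum S_k of the harmonic series."""
    return sum(1 / n for n in range(1, k + 1))

for j in range(1, 15):
    assert harmonic(2**j) >= 1 + j / 2   # the doubling lower bound
print(harmonic(2**14))  # still only about 10.3 after 16384 terms
```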
Example 3.10. Consider Σ_{n=1}^∞ 1/n^2. Suppose we can show that Σ_{n=2}^∞ 1/(n^2 − n)
converges (to T, say). Let S_k = Σ_{n=1}^k 1/n^2 and T_k = Σ_{n=2}^k 1/(n^2 − n). Note
that 0 < 1/n^2 < 1/(n^2 − n) for n ≥ 2, and so S_k − 1 < T_k for all k ≥ 2. Now
{T_k} is clearly increasing, so it is bounded above by its limit T. This shows that
S_k − 1 < T_k ≤ T, so S_k < T_k + 1 ≤ T + 1 for all k ≥ 2, and {S_k} is increasing;
therefore, {S_k} is bounded above by T + 1 and hence convergent by the monotone
convergence theorem. This shows that Σ_{n=1}^∞ 1/n^2 converges.
Now, by elementary algebra, we have that 1/(n^2 − n) = 1/(n−1) − 1/n. Then
T_k = Σ_{n=2}^k 1/(n^2 − n) = (1 − 1/2) + (1/2 − 1/3) + (1/3 − 1/4) + ⋯ + (1/(k−1) − 1/k) = 1 − 1/k.
Hence lim_{k→∞} T_k = lim_{k→∞} (1 − 1/k), which we can easily show to be 1. This
series therefore converges (and it converges to T = 1).
We conclude that Σ_{n=1}^∞ 1/n^2 converges (but we do not yet know what value
it converges to). We can approximate this series by comparing it to the upper
bound T + 1: Σ_{n=1}^∞ 1/n^2 ≤ T + 1 = 2. Since this series is increasing with the
first term being 1, we know that 1 ≤ Σ_{n=1}^∞ 1/n^2 ≤ 2.
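Both halves of the argument are checkable numerically. The sketch below is ours; it confirms the telescoping identity T_k = 1 − 1/k and the resulting bound on the partial sums of Σ 1/n^2.

```python
def T(k):
    """Telescoping partial sum: sum of 1/(n^2 - n) for n = 2..k."""
    return sum(1 / (n * n - n) for n in range(2, k + 1))

def S(k):
    """Partial sum of 1/n^2 for n = 1..k."""
    return sum(1 / n**2 for n in range(1, k + 1))

assert abs(T(1000) - (1 - 1 / 1000)) < 1e-12  # T_k = 1 - 1/k
assert 1 <= S(1000) <= 2                      # the comparison bound
print(S(1000))  # about 1.6439, creeping toward pi^2/6
```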
Throughout the last example, we used some arithmetic rules on limits as
though limits could be carried into and out of an addition or a multiplication.
We will later formally verify that these operations are valid in proposition 3.5.1.
Diversion. Σ_{n=1}^∞ 1/n^2 = π^2/6.
3.4 Subsequences
Definition 3.4.1. Let {a_n} be a sequence. Let n_1, n_2, n_3, . . . be a sequence
of natural numbers such that n_1 < n_2 < n_3 < ⋯. A subsequence is a sequence
of the form {b_k} := {a_{n_k}}; i.e.,
b_1, b_2, b_3, . . . = a_{n_1}, a_{n_2}, a_{n_3}, . . ..
Example 3.11. If a_n = (−1)^{n+1} and our indices are n_k = 2k, then {a_{n_k}} =
{a_{2k}} = −1, −1, −1, . . . is a subsequence of {a_n} (it is in fact the sequence
composed of the even terms of {a_n}).
Question: Suppose that {a_n} converges to L and assume that {a_{n_k}} is a
subsequence of {a_n}. Does lim_{k→∞} a_{n_k} = L?
Theorem 3.4.2. Suppose that {a_n} converges to L. Let {a_{n_k}} be a subsequence
of {a_n}. Then {a_{n_k}} converges to L.
Proof. Let ε > 0. Since lim_{n→∞} a_n = L, we can find an N_0 ∈ ℕ such that
if n ≥ N_0, then |a_n − L| < ε. Since {n_k} is a strictly increasing sequence of
natural numbers, it is not bounded above; therefore, we can choose a K_0 ∈ ℕ
so that n_{K_0} ≥ N_0. If k ≥ K_0, then n_k ≥ n_{K_0} ≥ N_0, and hence
|a_{n_k} − L| < ε. This shows that lim_{k→∞} a_{n_k} = L, as required.
Definition 3.4.3. We say that a sequence diverges to ∞ (notation:
lim_{n→∞} a_n = ∞) if for every M > 0, there exists some N_0 such that if n ≥ N_0,
then a_n ≥ M. Similarly, we say that a sequence diverges to −∞ (notation:
lim_{n→∞} a_n = −∞) if for every M < 0, there exists some N_0 such that if
n ≥ N_0, then a_n ≤ M.
Note that if {a_n} diverges to ∞, then it does not converge; however, we still
use the intuitive notation lim_{n→∞} a_n = ∞. Similarly, if {a_n} diverges to −∞,
it diverges.
Example 3.12. Let a_n = n. Let {n_k} be any sequence of natural numbers such
that n_1 < n_2 < n_3 < ⋯. Then we know that the subsequence {a_{n_k}} = {n_k}
is unbounded above and hence diverges (to ∞); therefore, the sequence {a_n} =
1, 2, 3, . . . does not have any convergent subsequences.
Example 3.13. Let
a_n = 1 if n ≡ 0 (mod 2), and a_n = (n+1)/2 if n ≡ 1 (mod 2).
Then {a_n} is the sequence 1, 1, 2, 1, 3, 1, 4, 1, 5, 1, . . .. Since {a_{2k}} = {1} is a
constant sequence, it converges to 1; however, lim_{k→∞} a_{2k−1} = lim_{k→∞} k = ∞.
That is, the subsequence of odd-indexed terms of {a_n} diverges because it is
unbounded above.
Problem. What condition on a sequence would force it to have a convergent
subsequence? Is boundedness sufficient?
Definition 3.4.4. Given a sequence {a_n}, we call the index k a peak point of
{a_n} if a_k > a_n for all n > k.
Lemma 3.4.5 [Peak Point Lemma]. Every sequence has a monotone subsequence.
Proof. Given {a_n}, we must consider two cases: (1) {a_n} has infinitely many
peak points, or (2) {a_n} has only finitely many peak points. We shall prove the
lemma for each case.
1. Suppose {a_n} has infinitely many peak points. In this case, we can find a
strictly increasing sequence of peak points {n_k} = n_1, n_2, n_3, . . .. (Each
term in this sequence is an index.) Now, for each k ∈ ℕ, since n_k is a peak
point and n_{k+1} > n_k, we have that a_{n_{k+1}} < a_{n_k} by the definition of peak
point. Hence {a_{n_k}} is a decreasing subsequence of {a_n} and we are done.
2. Suppose {a_n} has finitely many peak points. Since the set of all peak
points of {a_n} has finitely many elements, it is bounded; therefore, we
can choose an index n_1 ∈ ℕ such that n_1 > k for all peak points k of
{a_n}. Note that n_1 is not a peak point, and so we can choose an n_2 with
n_2 > n_1 such that a_{n_1} ≤ a_{n_2}. Since n_2 is not a peak point (because
n_2 > n_1 > k for all peak points k), we can find an n_3 with n_3 > n_2 such
that a_{n_1} ≤ a_{n_2} ≤ a_{n_3}. Proceeding in this fashion, so that we have chosen
n_1, n_2, . . . , n_k with n_1 < n_2 < ⋯ < n_k such that a_{n_1} ≤ a_{n_2} ≤ ⋯ ≤ a_{n_k},
we can find n_{k+1} > n_k such that a_{n_k} ≤ a_{n_{k+1}} because n_k is not a peak
point. This shows that we can inductively generate a strictly increasing sequence
{n_k} with {a_{n_k}} non-decreasing. This completes the proof.
With the above lemma in hand, we now come to the main theorem.
Theorem 3.4.6 [Bolzano-Weierstrass Theorem]. Every bounded sequence
has a convergent subsequence.
Proof. Let {a_n} be any bounded sequence. By the peak point lemma, {a_n} has
a monotonic subsequence {a_{n_k}}. Since {a_{n_k}} is also bounded, it converges by
the monotone convergence theorem.
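For a finite list the peak-point idea becomes a concrete algorithm: scanning from the right and keeping each new running maximum yields exactly the indices k with a[k] > a[n] for all later n, and the values at those indices form a decreasing subsequence, as in case (1) of the lemma. The function below is our own finite-list sketch, not part of the notes.

```python
def peak_points(a):
    """Indices k of the finite list a with a[k] > a[n] for all n > k."""
    peaks, best = [], float("-inf")
    for k in range(len(a) - 1, -1, -1):   # scan right to left
        if a[k] > best:                   # a new running maximum from the right
            peaks.append(k)
            best = a[k]
    return peaks[::-1]                    # restore left-to-right order

xs = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
print([xs[k] for k in peak_points(xs)])   # a decreasing subsequence: [9, 6, 5, 3]
```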
Definition 3.4.7. We say that L is a limit point of a sequence {a_n} if there
exists a subsequence {a_{n_k}} of {a_n} such that lim_{k→∞} a_{n_k} = L. We denote
the set of all limit points by lim{a_n}.
Example 3.14. Let a_n = 1/n. Then lim{a_n} = lim{1, 1/2, 1/3, 1/4, . . .} = {0}.
Note that if lim_{n→∞} a_n = L, then lim{a_n} = {L}. That is, since all
subsequences of a convergent sequence converge to the same limit L, the set of
limit points of {a_n} is the singleton set {L}.
Example 3.15. Let a_n = (−1)^{n+1}. Then lim{a_n} = lim{1, −1, 1, −1, . . .} =
{−1, 1}.
Example 3.16. Let a_n = n. Then lim{a_n} = lim{1, 2, 3, . . .} = ∅.
Diversion. Does there exist a sequence {a_n} such that lim{a_n} = ℝ?
3.5 Arithmetic of Limits of Sequences
Several results from the previous sections depended on the ability of limits to
interact well with arithmetic operations such as addition and multiplication. In
this section, we will show that this is indeed the case.
Proposition 3.5.1. Assume that lim_{n→∞} a_n = L and lim_{n→∞} b_n = M.
1. If c ∈ ℝ, then lim_{n→∞} c·a_n = c·lim_{n→∞} a_n = cL.
2. lim_{n→∞} (a_n + b_n) = lim_{n→∞} a_n + lim_{n→∞} b_n = L + M.
3. lim_{n→∞} (a_n·b_n) = (lim_{n→∞} a_n)·(lim_{n→∞} b_n) = LM.
4. If b_n ≠ 0 for all n ∈ ℕ and M ≠ 0, then lim_{n→∞} a_n/b_n =
(lim_{n→∞} a_n)/(lim_{n→∞} b_n) = L/M.
Proof. We will prove each item separately.
1. First suppose c = 0. Then c·a_n = 0 for all n ∈ ℕ and so lim_{n→∞} c·a_n =
lim_{n→∞} 0 = 0 = cL. Now suppose c ≠ 0. Let ε > 0. Since ε/|c| > 0, we can
find an N ∈ ℕ such that if n ≥ N, then |a_n − L| < ε/|c|. Hence if n ≥ N,
then
|c·a_n − cL| = |c||a_n − L| < |c| · (ε/|c|) = ε.
This shows that lim_{n→∞} c·a_n = cL, as required.
2. Let ε > 0. We can find N_1 ∈ ℕ so that if n ≥ N_1, then |a_n − L| < ε/2. We
can find N_2 ∈ ℕ so that if n ≥ N_2, then |b_n − M| < ε/2. Let N = max{N_1, N_2}.
If n ≥ N, then we have that
|(a_n + b_n) − (L + M)| = |(a_n − L) + (b_n − M)|
≤ |a_n − L| + |b_n − M|
< ε/2 + ε/2 = ε.
This shows that lim_{n→∞} (a_n + b_n) = L + M, as required.
3. By theorem 3.1.5, {a_n} is a bounded sequence, and so there exists a C > 0
so that |a_n| ≤ C for all n ∈ ℕ. Let ε > 0. We can choose an N_1 ∈ ℕ so
that if n ≥ N_1, |b_n − M| < ε/(2C). We can also choose an N_2 ∈ ℕ so that if
n ≥ N_2, |a_n − L| < ε/(2|M|). Let N = max{N_1, N_2}. If n ≥ N, we have that
|a_n b_n − LM| = |a_n b_n − a_n M + a_n M − LM|
≤ |a_n b_n − a_n M| + |a_n M − LM|
= |a_n||b_n − M| + |M||a_n − L|
< C · ε/(2C) + |M| · ε/(2|M|) = ε/2 + ε/2 = ε.
This shows that lim_{n→∞} a_n b_n = LM, as required.
4. We will first consider the case where {a_n} = 1, 1, 1, 1, . . .. In this case,
L = 1 and we must show that lim_{n→∞} 1/b_n = 1/M. Consider
|1/b_n − 1/M| = |b_n − M| / (|b_n||M|).
With ε_0 = |M|/2, we can find an N_1 such that if n ≥ N_1, then
|b_n − M| < ε_0 = |M|/2. Hence if n ≥ N_1, we have by a triangle inequality
that
||b_n| − |M|| ≤ |b_n − M| < |M|/2,
so −|M|/2 < |b_n| − |M| < |M|/2,
and hence |M|/2 < |b_n| < 3|M|/2.
We focus on the leftmost inequality |b_n| > |M|/2. If n ≥ N_1, then
1/(|b_n||M|) < 1/((|M|/2)·|M|) = 2/|M|^2.
Let ε > 0. We can choose an N_2 ∈ ℕ such that if n ≥ N_2, then |b_n − M| <
|M|^2 ε/2. Let N = max{N_1, N_2}. If n ≥ N, we have that
|1/b_n − 1/M| = |b_n − M| / (|b_n||M|) = |b_n − M| · (1/(|b_n||M|)) < (|M|^2 ε/2) · (2/|M|^2) = ε.
This shows that lim_{n→∞} 1/b_n = 1/M.
Now consider a general sequence a
n
. Then by item (3) of this proof,
lim
n
a
n
b
n
= lim
n
a
n

1
b
n
= lim
n
a
n
lim
n
1
b
n
= L
1
M
=
L
M
,
completing the proof.
On the surface, the four statements of the proposition above state what the limits of sequences created by combining two convergent sequences via arithmetic operations should be; more importantly, they state that these limits exist. The fact that the limit of a combination is the combination of the limits is valid in general only when the components are convergent. If either $\{a_n\}$ or $\{b_n\}$ is divergent, we generally cannot say anything about the convergence of the sequence created by combining $\{a_n\}$ and $\{b_n\}$, let alone what its limit should be.

In the proposition above, only the quotient rule had problems when $M = 0$, including the special case where $L = 0$ as well. Intuitively, in the case where $L = M = 0$, we can see that if the numerator approaches 0 at a faster rate than the denominator, then the quotient limit should be 0; if the denominator tends to 0 at a faster rate than the numerator, then the quotient sequence blows up and becomes unbounded and divergent. Can we show this formally? Indeed, if the denominator tends to 0 and the quotient limit exists, can we conclude that the numerator approaches 0 at all?
Theorem 3.5.2. Assume that $\{a_n\}$ and $\{b_n\}$ are sequences with $b_n \neq 0$ for all $n \in \mathbb{N}$. Suppose $\lim_{n\to\infty} \frac{a_n}{b_n}$ exists and equals $L$. Also suppose that $\lim_{n\to\infty} b_n$ exists and equals 0. Then $\lim_{n\to\infty} a_n$ exists and equals 0.
Proof. Since $\lim_{n\to\infty} \frac{a_n}{b_n} = L$, we have that
$$\lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n\cdot\frac{a_n}{b_n} = \left(\lim_{n\to\infty} b_n\right)\left(\lim_{n\to\infty} \frac{a_n}{b_n}\right) = 0\cdot L = 0,$$
where the second equality is due to proposition 3.5.1.
In general, if $\lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n = 0$ and $b_n \neq 0$ for all $n \in \mathbb{N}$, the limit of the quotient sequence $\left\{\frac{a_n}{b_n}\right\}$ may or may not exist.
Example 3.17. Let $a_n = b_n = \frac{1}{n}$. Then $\frac{a_n}{b_n} = 1$ for all $n \in \mathbb{N}$, and so $\lim_{n\to\infty} \frac{a_n}{b_n} = \lim_{n\to\infty} 1$ exists and equals 1.
Example 3.18. Let $a_n = \frac{1}{n}$ and $b_n = \frac{1}{n^2}$. This time, the $b_n$'s go to 0 at a quadratic rate, which is faster than the numerator's linear rate. We expect the quotient sequence to be unbounded and therefore divergent. Formally, $\frac{a_n}{b_n} = \frac{1/n}{1/n^2} = n$, and so $\left\{\frac{a_n}{b_n}\right\} = \{n\}$ is unbounded and therefore divergent by theorem 3.1.5.
Example 3.19. We will evaluate $\lim_{n\to\infty} \frac{3n^2+2n-1}{4n^2+2}$. Note that
$$\lim_{n\to\infty} \frac{3n^2+2n-1}{4n^2+2} = \lim_{n\to\infty} \frac{n^2}{n^2}\cdot\frac{3+\frac{2}{n}-\frac{1}{n^2}}{4+\frac{2}{n^2}} = \frac{\lim_{n\to\infty} 3 + \lim_{n\to\infty} \frac{2}{n} - \lim_{n\to\infty} \frac{1}{n^2}}{\lim_{n\to\infty} 4 + \lim_{n\to\infty} \frac{2}{n^2}} = \frac{3+2(0)-0}{4+2(0)} = \frac{3}{4}.$$
The second equality is due to proposition 3.5.1.
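The prediction of the arithmetic of limits can be spot-checked numerically. The sketch below (an illustration, not a proof) evaluates the terms of the sequence from example 3.19 and watches the distance to the claimed limit $\frac{3}{4}$ shrink:

```python
# Numerical illustration: terms of (3n^2 + 2n - 1)/(4n^2 + 2) approach 3/4.

def a(n):
    """n-th term of the sequence from example 3.19."""
    return (3 * n**2 + 2 * n - 1) / (4 * n**2 + 2)

# Distance to the claimed limit 3/4 at n = 10, 100, ..., 100000.
errors = [abs(a(10**k) - 3 / 4) for k in range(1, 6)]
print(errors)  # each entry is smaller than the one before
```

Of course, no finite computation establishes the limit; only the $\epsilon$-$N$ argument does.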
Example 3.20. Consider the sequence $\left\{\frac{3n^2+5}{n^{3/2}+2}\right\}$. We know that
$$\frac{3n^2+5}{n^{3/2}+2} = \frac{n^{3/2}}{n^{3/2}}\cdot\frac{3n^{1/2}+\frac{5}{n^{3/2}}}{1+\frac{2}{n^{3/2}}} \geq \frac{3n^{1/2}}{1+2} = \sqrt{n},$$
which is unbounded. This shows that $\lim_{n\to\infty} \frac{3n^2+5}{n^{3/2}+2} = \infty$.
Example 3.21. Let
$$a_n = \frac{b_0 + b_1 n + b_2 n^2 + b_3 n^3 + \cdots + b_j n^j}{c_0 + c_1 n + c_2 n^2 + \cdots + c_k n^k}.$$
Consider the sequence $\{a_n\}$. We can easily show that $\lim_{n\to\infty} a_n = 0$ if $j < k$, that $\lim_{n\to\infty} a_n = \frac{b_j}{c_k}$ if $j = k$, and that $\{a_n\}$ diverges if $j > k$. We leave the details as an exercise.
Example 3.22. Let $a_n = \frac{\cos(n)}{n}$. We want to find the limit of the sequence $\{a_n\}$. It can be shown that the sequence $\{\cos(n)\}$ does not converge, and so we cannot use proposition 3.5.1 to conclude that
$$\lim_{n\to\infty} a_n = \left(\lim_{n\to\infty} \cos(n)\right)\left(\lim_{n\to\infty} \frac{1}{n}\right) = \left(\lim_{n\to\infty} \cos(n)\right)\cdot 0 = 0.$$
Intuitively, however, we can see that the limit is indeed 0, because the cosine function is never larger than 1 in absolute value, and so dividing by $n$ will make it converge to 0 at a rate no slower than $\frac{1}{n}$ would.

Since $|\cos(n)| \leq 1$ for all $n \in \mathbb{N}$, $|a_n| = \left|\frac{\cos(n)}{n}\right| \leq \frac{1}{n}$ for all $n$. We now have
$$-\frac{1}{n} \leq \frac{\cos(n)}{n} \leq \frac{1}{n}.$$
That is, $\frac{\cos(n)}{n}$ is squeezed between two sequences that converge to the same limit: 0. Must $\left\{\frac{\cos(n)}{n}\right\}$ converge? Does it have a choice? If $\left\{\frac{\cos(n)}{n}\right\}$ converges, is its limit also 0?
3.6 Squeeze Theorem

Theorem 3.6.1 [Squeeze Theorem]. Assume that $a_n \leq b_n \leq c_n$ for all $n \in \mathbb{N}$, and
$$\lim_{n\to\infty} a_n = L = \lim_{n\to\infty} c_n.$$
Then $\{b_n\}$ converges and $\lim_{n\to\infty} b_n = L$.
Proof. Let $\epsilon > 0$. We can find an $N_1 \in \mathbb{N}$ such that if $n \geq N_1$, then $|a_n - L| < \epsilon$. We can also find an $N_2 \in \mathbb{N}$ such that if $n \geq N_2$, then $|c_n - L| < \epsilon$. Let $N = \max\{N_1, N_2\}$. We now have, for all $n \geq N$, $L - \epsilon < a_n < L + \epsilon$ and $L - \epsilon < c_n < L + \epsilon$. Hence if $n \geq N$,
$$L - \epsilon < a_n \leq b_n \leq c_n < L + \epsilon,$$
implying that $|b_n - L| < \epsilon$. This shows that $\{b_n\}$ converges and $\lim_{n\to\infty} b_n = L$, as required.
Example 3.23. We can now solve the problem that we encountered in example 3.22. We will show that $\left\{\frac{\cos(n)}{n}\right\}$ converges and find its limit. Since $|\cos(n)| \leq 1$ for all $n \in \mathbb{N}$, $\left|\frac{\cos(n)}{n}\right| \leq \frac{1}{n}$ for all $n$. We have
$$-\frac{1}{n} \leq \frac{\cos(n)}{n} \leq \frac{1}{n}$$
for all $n \in \mathbb{N}$. Since $\lim_{n\to\infty} -\frac{1}{n} = 0 = \lim_{n\to\infty} \frac{1}{n}$, the squeeze theorem shows that $\left\{\frac{\cos(n)}{n}\right\}$ converges and has limit 0.
3.7 Cauchy Sequences

In the definition of limit, the definition of convergence is tied to it: a sequence converges if it has a limit. We would like to know whether there is an intrinsic test for convergence of sequences. Our current technique for showing that a sequence converges is this: we first guess what its limit is, and then proceed with an $\epsilon$-$N$ argument. Can we tell that a sequence converges just by looking at its terms, without guessing its limit?

Definition 3.7.1. We say that a sequence $\{a_n\}$ is Cauchy if for every $\epsilon > 0$, there exists some $N \in \mathbb{N}$ such that if $m, n \geq N$, then $|a_n - a_m| < \epsilon$.

This definition says that the terms of a Cauchy sequence eventually become close to each other. More precisely, given any number $\epsilon > 0$, we can find a tail of the sequence such that any pair of terms in this tail is less than $\epsilon$ units apart.
Theorem 3.7.2. Any convergent sequence $\{a_n\}$ is Cauchy.

Proof. Assume that $\lim_{n\to\infty} a_n = L$. Let $\epsilon > 0$. We can find an $N \in \mathbb{N}$ such that if $n \geq N$, then $|a_n - L| < \frac{\epsilon}{2}$. If $m, n \geq N$, then we have that
$$|a_n - a_m| = |a_n - L + L - a_m| \leq |a_n - L| + |L - a_m| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon.$$
This shows that $\{a_n\}$ is Cauchy.
Problem. Is every Cauchy sequence convergent?

Lemma 3.7.3. If $\{a_n\}$ is Cauchy, then $\{a_n\}$ is bounded.

Proof. Since $\{a_n\}$ is Cauchy, we can find an $N \in \mathbb{N}$ such that if $m, n \geq N$, then $|a_n - a_m| < 1$. Hence if $n \geq N$, we have that $|a_n| = |a_n - a_N + a_N| \leq |a_n - a_N| + |a_N| < 1 + |a_N|$. Let $M = \max\{|a_1|, |a_2|, |a_3|, \ldots, |a_{N-1}|, 1 + |a_N|\}$; then $|a_n| \leq M$ for all $n \in \mathbb{N}$, as required.
Lemma 3.7.4. If $\{a_n\}$ is Cauchy, and if $\{a_{n_k}\}$ is a subsequence of $\{a_n\}$ with $\lim_{k\to\infty} a_{n_k} = L$, then $\{a_n\}$ converges and $\lim_{n\to\infty} a_n = L$.

Proof. Let $\epsilon > 0$. There exists an $N \in \mathbb{N}$ such that if $m, n \geq N$, then
$$|a_n - a_m| < \frac{\epsilon}{2}.$$
Since $\lim_{k\to\infty} a_{n_k} = L$, we can find a $K \in \mathbb{N}$ so that $n_K \geq N$ and
$$|a_{n_K} - L| < \frac{\epsilon}{2}.$$
But now we have that for all $n \geq N$,
$$|a_n - L| = |a_n - a_{n_K} + a_{n_K} - L| \leq |a_n - a_{n_K}| + |a_{n_K} - L| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon,$$
showing that $\{a_n\}$ converges and $\lim_{n\to\infty} a_n = L$.
Theorem 3.7.5 [Completeness Theorem]. If a sequence $\{a_n\}$ is Cauchy, then $\{a_n\}$ converges.

Proof. Suppose $\{a_n\}$ is Cauchy. Then $\{a_n\}$ is bounded by lemma 3.7.3. By the Bolzano-Weierstrass theorem, $\{a_n\}$ has a convergent subsequence $\{a_{n_k}\}$. Lemma 3.7.4 shows that $\{a_n\}$ converges, as required.
Example 3.24. Consider the harmonic series $\sum_{n=1}^{\infty} \frac{1}{n}$. Let $S_k = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{k}$. The difference between successive partial sums $S_{k+1} - S_k$ is
$$S_{k+1} - S_k = \left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{k} + \frac{1}{k+1}\right) - \left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{k}\right) = \frac{1}{k+1} \to 0$$
as $k \to \infty$. Is $\{S_k\}$ Cauchy? No, since $\{S_k\}$ diverges. The definition of Cauchy requires us to check that every possible pair of terms in the tail of the sequence is close together, not just successive terms.
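The harmonic example can be made concrete numerically. The sketch below (an illustration, not the text's own computation) shows that successive partial sums are close while the pair $(S_k, S_{2k})$ stays at least $\frac{1}{2}$ apart, since $S_{2k} - S_k$ is a sum of $k$ terms each at least $\frac{1}{2k}$:

```python
# Harmonic partial sums: successive differences shrink to 0, yet the sequence
# is not Cauchy because the gap S_{2k} - S_k never drops below 1/2.

def S(k):
    """k-th partial sum of the harmonic series."""
    return sum(1 / n for n in range(1, k + 1))

k = 1000
print(S(k + 1) - S(k))   # tiny: successive partial sums are close together
print(S(2 * k) - S(k))   # about ln(2), always at least 1/2: not Cauchy
```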
Chapter 4

Limits of Functions and Continuity

4.0 Some Definitions

Below is our working definition of function.

Informal definition. A function is a rule that assigns to each element in a given set $X$ a unique value in a set $Y$. The set $X$ is called the domain of the function $f$ (notation: $X = \mathrm{dom}(f)$) and $Y$ is called the codomain of $f$. We can express all of this at once using the compact notation $f : X \to Y$. If $x \in X$ is assigned to $y \in Y$, we can write $y = f(x)$, where $f$ represents the rule. The range of $f$ is the set $\{y \in Y : f(x) = y \text{ for some } x \in X\}$. We often denote the range by $f(X)$ or $\mathrm{ran}(f)$.

The domain is simply the set from which the function takes its input values. The codomain is a set that contains the potential outputs of the function. The range contains all and only the output values of $f$. When we specify a function, we must declare its domain; however, if a function has an obvious natural domain, then we can simply define the function without specifying the domain. In this case, the domain will be the natural domain on which the function (rule) makes sense. Similarly, if the codomain is not specified, we will simply assume that it is the range.
Example 4.1. Let $f(x) = \sqrt{x}$. Using the above convention, the domain of $f$ is $\mathrm{dom}(f) = \{x \in \mathbb{R} : x \geq 0\}$ and the range of $f$ is $\mathrm{ran}(f) = \{y \in \mathbb{R} : y \geq 0\}$. (We will later show this last statement rigorously using the intermediate value theorem.) Since the codomain is not specified, we will assume that it is equal to the range.

Example 4.2. Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = x^2$. This time, we have explicitly and fully defined a function $f$ whose domain is $\mathbb{R}$ and whose codomain is $\mathbb{R}$. The range of $f$ is $\{y \in \mathbb{R} : y \geq 0\}$. (We will later show this last statement rigorously using the intermediate value theorem.)
Here is the formal definition of a function.

Definition 4.0.1. Given nonempty sets $X$ and $Y$, a function from $X$ into $Y$ is a subset $f$ of $X \times Y$ with the additional property that if $(x, y_1) \in f$ and $(x, y_2) \in f$, then $y_1 = y_2$, for all $x \in X$, $y_1, y_2 \in Y$. Here, $X$ is called the domain of $f$ (notation: $\mathrm{dom}(f) = X$) and $Y$ is called the codomain of $f$. We can display all of this information together using the notation $f : X \to Y$. If $(x, y) \in f$, we will write $y = f(x)$. The set $\{y \in Y : (x, y) \in f \text{ for some } x \in X\}$ is called the range of $f$, denoted as $f(X)$ or $\mathrm{ran}(f)$.

Definition 4.0.2. The graph of a function $f : X \to Y$ is the set of ordered pairs $\{(x, f(x)) : x \in X\}$. Under the formal definition of a function, the graph of $f$ is just the set $f$ itself. We say that two functions $f$ and $g$ are equal if they have the same graph; that is, $f = g$ as sets.

Note. Given two functions $f$ and $g$, $f = g$ if and only if $\mathrm{dom}(f) = \mathrm{dom}(g)$ and $f(x) = g(x)$ for all $x \in \mathrm{dom}(f)$.
Example 4.3. Let $f(x) = x + 1$ and $g(x) = \frac{x^2-1}{x-1}$. Are these the same functions? Since $\mathrm{dom}(f) = \mathbb{R}$ and $\mathrm{dom}(g) = \{x \in \mathbb{R} : x \neq 1\}$, $\mathrm{dom}(f) \neq \mathrm{dom}(g)$. This means that $f \neq g$.
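The formal set-of-pairs view in definition 4.0.1 can be modeled directly. The sketch below is illustrative only; the helper name `is_function` is ours, not the text's:

```python
# A sketch of definition 4.0.1: model a function as a set of ordered pairs and
# check the defining property that no input is paired with two different outputs.

def is_function(pairs):
    """Return True if the set of (x, y) pairs assigns at most one y to each x."""
    outputs = {}
    for x, y in pairs:
        if x in outputs and outputs[x] != y:
            return False  # (x, y1) and (x, y2) with y1 != y2: not a function
        outputs[x] = y
    return True

f = {(1, 2), (2, 3), (3, 4)}   # a function: each input has one output
not_f = {(1, 2), (1, 3)}       # assigns two values to x = 1, so not a function

print(is_function(f), is_function(not_f))  # True False
```

Equality of functions as sets of pairs is then just set equality of their graphs, matching definition 4.0.2.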
4.1 Limits

4.1.1 Definition

Observe that while $g(x) = \frac{x^2-1}{x-1}$ is not defined at $x = 1$, for $x$ near 1, we have $g(x) = x + 1$. Suppose that we allow $x$ to approach (but not reach) the value 1. Then $x + 1$ approaches 2. We want to say that 2 is the limit of $g(x)$ as $x$ approaches 1. Here is our informal, heuristic definition of limit for functions.

Informal definition. We say that $L$ is the limit of $f(x)$ as $x$ approaches $a$ if the values of $f(x)$ get closer and closer to $L$ as $x$ gets closer and closer to $a$.

From this definition, we have the sense that a limit gives us information about the behaviour of our function $f$ near $x = a$, but this definition, just like our heuristic definition of limit for sequences, is not precise enough to capture the notion of a limit. Here is our formal definition of limit.
Definition 4.1.1 [Limit of Functions]. Suppose that we have a function $f : S \to \mathbb{R}$ where $S \subseteq \mathbb{R}$. Assume that $a \in \mathbb{R}$ and that there exists an open interval $I$ containing $a$ such that $I \setminus \{a\} \subseteq S$. We say that $L$ is the limit of $f(x)$ as $x$ approaches $a$ if for every $\epsilon > 0$, there exists a $\delta > 0$ such that
$$0 < |x - a| < \delta \text{ and } x \in S \text{ implies } |f(x) - L| < \epsilon.$$
In this case, we write $\lim_{x\to a} f(x) = L$. If no such $L$ exists, we say that the limit of $f(x)$ as $x$ approaches $a$ does not exist, and we write $\lim_{x\to a} f(x)$ does not exist.

Note. For $\lim_{x\to a} f(x)$ to exist, $f(x)$ must be defined on some open interval $(a - \delta, a + \delta)$ except possibly at $x = a$.
Problem. What can we say about $\lim_{x\to 0} \sqrt{x}$? This definition of limit excludes the possibility that $\lim_{x\to 0} \sqrt{x}$ exists, since $\sqrt{x}$ is not defined for $x < 0$.

Note. In considering $\lim_{x\to a} f(x)$, it does not matter whether or not $f(a)$ is defined. Moreover, the actual value of $f(a)$ does not affect the limit.
Example 4.4. We will try to prove that $\lim_{x\to 2}(3x+1) = 7$. Note that $3x+1$ is defined on all of $\mathbb{R}$. Let $\epsilon > 0$. Let $\delta = \frac{\epsilon}{3}$. Then for all $x$ satisfying $0 < |x - 2| < \delta$, we have that $|(3x+1) - 7| = |3x - 6| = 3|x - 2| < 3\delta = 3\cdot\frac{\epsilon}{3} = \epsilon$. This shows that $\lim_{x\to 2}(3x+1) = 7$, as claimed.
Example 4.5. We will try to prove that $\lim_{x\to 3} x^2 = 9$. Note that $x^2$ is defined on all of $\mathbb{R}$. Given an $\epsilon > 0$, we let $\delta = \min\left\{1, \frac{\epsilon}{7}\right\}$. We have that $\delta^2 \leq \delta \leq \frac{\epsilon}{7}$ since $0 < \delta \leq 1$. If $0 < |x - 3| < \delta$, then we have that
$$|x^2 - 9| = |x+3||x-3| = |x-3+6||x-3| \leq (|x-3| + 6)|x-3| = |x-3|^2 + 6|x-3| < \delta^2 + 6\delta \leq 7\delta \leq 7\cdot\frac{\epsilon}{7} = \epsilon.$$
This shows that $\lim_{x\to 3} x^2 = 9$.
Problem. Find $\lim_{x\to 1}(x^5 + 2x^3 + 6x + 1)$.

Should we play the $\epsilon$-$\delta$ game? Example 4.5 is discouraging enough; if we use the definition of limit directly, this question will be very ugly to solve.
Example 4.6. Let $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ be defined by $f(x) = \frac{|x|}{x}$. We have that
$$f(x) = \begin{cases} \frac{x}{x} = 1 & \text{if } x > 0, \\ \frac{-x}{x} = -1 & \text{if } x < 0. \end{cases}$$
Does $\lim_{x\to 0} f(x)$ exist? We will prove that this limit does not exist. To show that a limit does not exist, we need to find one $\epsilon > 0$ for which no $\delta > 0$ will satisfy our definition of limit. We will in fact use $\epsilon_0 = 1$. We proceed by contradiction. Suppose on the contrary that $\lim_{x\to 0} f(x)$ exists and equals $L$. Then for $\epsilon = \epsilon_0 = 1$, we can find a $\delta > 0$ such that if $0 < |x - 0| = |x| < \delta$, then $|f(x) - L| < \epsilon = 1$. Let $x_1 = \frac{\delta}{2}$. Then $0 < |x_1| = \frac{\delta}{2} < \delta$, and so
$$|f(x_1) - L| = \left|\frac{|x_1|}{x_1} - L\right| = |1 - L| < 1.$$
Let $x_2 = -\frac{\delta}{2}$. Then $0 < |x_2| = \frac{\delta}{2} < \delta$, and so
$$|f(x_2) - L| = \left|\frac{|x_2|}{x_2} - L\right| = |-1 - L| < 1.$$
Now we have that
$$2 = |2| = |(1 - L) - (-1 - L)| \leq |1 - L| + |-1 - L| < 1 + 1 = 2,$$
a contradiction. This shows that $\lim_{x\to 0} f(x)$ does not exist.
As with sequences, we can show that limits are unique.

Theorem 4.1.2. Assume that $\lim_{x\to a} f(x) = L$ and $\lim_{x\to a} f(x) = M$. Then $L = M$.

Proof. This proof is left as an exercise.
4.1.2 Sequential Characterization of Limits

There is a close connection between sequential limits and limits of functions.

Proposition 4.1.3. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$. Assume that $\lim_{x\to a} f(x) = L$. Let $\{x_n\}$ be a sequence such that $x_n \in S$, $x_n \neq a$ for all $n \in \mathbb{N}$, and $x_n \to a$ as $n \to \infty$. Then $f(x_n) \to L$ as $n \to \infty$.

Proof. Let $\epsilon > 0$. We can find $\delta > 0$ such that if $0 < |x - a| < \delta$, then $|f(x) - L| < \epsilon$. Since $x_n \to a$ as $n \to \infty$, we can find an $N \in \mathbb{N}$ such that if $n \geq N$, then $|x_n - a| < \delta$. Since $x_n \neq a$ for all $n \in \mathbb{N}$, we have that $0 < |x_n - a| < \delta$ for all $n \geq N$, whence $|f(x_n) - L| < \epsilon$ for all $n \geq N$. This shows that $\{f(x_n)\}$ is a sequence that converges to $L$, as required.
Problem. Can we characterize limits of functions in terms of limits of sequences?

Remark 4.1.4. Let $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$. Suppose that there exists a sequence $\{x_n\}$ with $\lim_{n\to\infty} x_n = a$, $x_n \neq a$ for all $n \in \mathbb{N}$, and $f(x)$ is defined for $x = x_n$ for all $n \in \mathbb{N}$. If $\lim_{n\to\infty} f(x_n)$ does not exist, then $\lim_{x\to a} f(x)$ does not exist.

Proof. If $\lim_{x\to a} f(x)$ existed, $\{f(x_n)\}$ would be convergent by proposition 4.1.3, a contradiction.
Remark 4.1.5. Let $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$. Suppose that there exist two sequences $\{x_n\}$ and $\{y_n\}$ such that $\lim_{n\to\infty} x_n = a$; $\lim_{n\to\infty} y_n = a$; $x_n \neq a$ and $y_n \neq a$ for all $n \in \mathbb{N}$; and $f(x)$ is defined for $x = x_n$ and $x = y_n$ for all $n \in \mathbb{N}$. If $\lim_{n\to\infty} f(x_n) = L$, $\lim_{n\to\infty} f(y_n) = M$, and $L \neq M$, then $\lim_{x\to a} f(x)$ does not exist.

Proof. Suppose $\lim_{x\to a} f(x)$ exists and equals $T$. Then by proposition 4.1.3, $L = T = M$, a contradiction.
Example 4.7. Let $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ be given by $f(x) = \frac{|x|}{x}$. We can prove that $\lim_{x\to 0} f(x)$ does not exist in another way. For each $n \in \mathbb{N}$, we let $x_n = \frac{1}{n}$ and $y_n = -\frac{1}{n}$. Then $f(x_n) = 1$ and $f(y_n) = -1$ for all $n \in \mathbb{N}$, and therefore $\lim_{n\to\infty} f(x_n) = 1$ and $\lim_{n\to\infty} f(y_n) = -1$. Since $\lim_{n\to\infty} x_n = 0 = \lim_{n\to\infty} y_n$, remark 4.1.5 shows that $\lim_{x\to 0} f(x)$ does not exist.
Example 4.8. Let $f : \mathbb{R} \to \mathbb{R}$ be given by
$$f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q}, \\ -1 & \text{if } x \in \mathbb{R} \setminus \mathbb{Q}. \end{cases}$$
What can we say about $\lim_{x\to a} f(x)$ for any $a \in \mathbb{R}$? We note that for any $n \in \mathbb{N}$, there exist $r_n \in \mathbb{Q}$ and $s_n \in \mathbb{R} \setminus \mathbb{Q}$ such that $r_n, s_n \neq a$ and $r_n, s_n \in \left(a - \frac{1}{n}, a + \frac{1}{n}\right)$. This means that $\lim_{n\to\infty} r_n = a = \lim_{n\to\infty} s_n$. Now $f(r_n) = 1$ for all $n \in \mathbb{N}$, so that $\lim_{n\to\infty} f(r_n) = 1$; however, $f(s_n) = -1$ for all $n \in \mathbb{N}$, so that $\lim_{n\to\infty} f(s_n) = -1$. Remark 4.1.5 shows that $\lim_{x\to a} f(x)$ does not exist, for any $a \in \mathbb{R}$.
Example 4.9. Let $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ be defined by $f(x) = \sin\left(\frac{1}{x}\right)$. We note that $\sin\left(\frac{\pi}{2} + 2k\pi\right) = 1$ and $\sin\left(-\frac{\pi}{2} + 2k\pi\right) = -1$. We let $x_k = \frac{1}{\frac{\pi}{2} + 2k\pi}$ and $y_k = \frac{1}{-\frac{\pi}{2} + 2k\pi}$, for each $k \in \mathbb{N}$. We have that $x_k, y_k \neq 0$ for all $k \in \mathbb{N}$ and $\lim_{k\to\infty} x_k = 0 = \lim_{k\to\infty} y_k$; however,
$$\lim_{k\to\infty} f(x_k) = \lim_{k\to\infty} \sin\left(\frac{1}{x_k}\right) = \lim_{k\to\infty} \sin\left(\frac{\pi}{2} + 2k\pi\right) = 1$$
and
$$\lim_{k\to\infty} f(y_k) = \lim_{k\to\infty} \sin\left(\frac{1}{y_k}\right) = \lim_{k\to\infty} \sin\left(-\frac{\pi}{2} + 2k\pi\right) = -1.$$
Remark 4.1.5 shows that $\lim_{x\to 0} f(x)$ does not exist.
Theorem 4.1.6 [Sequential Characterization of Limits]. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$. Also suppose that $a \in \mathbb{R}$ is such that there exists an open interval $I$ containing $a$ with $I \setminus \{a\} \subseteq S$. Then the following are equivalent.

1. $\lim_{x\to a} f(x)$ exists and is equal to $L$.

2. Every sequence $\{x_n\} \subseteq S$ such that $x_n \neq a$ for all $n \in \mathbb{N}$ and $\lim_{n\to\infty} x_n = a$ has the property that $\{f(x_n)\}$ is a sequence that converges to $L$.
Proof. Proposition 4.1.3 shows that hypothesis (1) implies conclusion (2). To verify the converse, that is, to show that hypothesis (2) implies conclusion (1), we will proceed by contradiction. Assume that hypothesis (2) holds but (1) fails. That is, assume that (2) holds and there exists an $\epsilon_0 > 0$ such that for every $\delta > 0$, there exists an $x_\delta \in S$ that satisfies
$$0 < |x_\delta - a| < \delta \quad\text{and}\quad |f(x_\delta) - L| \geq \epsilon_0.$$
In particular, for $\delta_n = \frac{1}{n}$, there exists an $x_n \in S$ that satisfies
$$0 < |x_n - a| < \delta_n = \frac{1}{n} \quad\text{and}\quad |f(x_n) - L| \geq \epsilon_0,$$
for every $n \in \mathbb{N}$. Since $0 < |x_n - a| < \frac{1}{n}$ for every $n \in \mathbb{N}$, the squeeze theorem shows that $\lim_{n\to\infty} |x_n - a| = 0$, whence $\lim_{n\to\infty} x_n = a$ (by remark 3.1.3). Since $|x_n - a| > 0$, $x_n \neq a$ for all $n \in \mathbb{N}$. Finally, note that $|f(x_n) - L| \geq \epsilon_0$ for all $n \in \mathbb{N}$, and so either $\{f(x_n)\}$ diverges, or it converges to a number other than $L$. (Simply set $\epsilon = \epsilon_0$; then no $N \in \mathbb{N}$ can be found to satisfy the implication $n \geq N \Rightarrow |f(x_n) - L| < \epsilon_0$.) Now we are done, for we have arrived at a contradiction: $\{x_n\}$ here is a sequence in $S$ that converges to $a$ with no terms equal to $a$, and yet $\{f(x_n)\}$ does not converge to $L$, directly contradicting hypothesis (2).
4.1.3 Arithmetic of Limits of Functions

With the sequential characterization of limits in hand, we can transform results about limits of sequences into results about limits of functions. In particular, we have the following arithmetic rules for limits of functions.
Proposition 4.1.7. Suppose $f, g : S \to \mathbb{R}$ where $S \subseteq \mathbb{R}$. Suppose an open interval $I$ containing $a \in \mathbb{R}$ satisfies $I \setminus \{a\} \subseteq S$. Also assume that $\lim_{x\to a} f(x) = L$ and $\lim_{x\to a} g(x) = M$. Then we have the following.

1. $\lim_{x\to a}(cf)(x) := \lim_{x\to a} c\,f(x) = c\lim_{x\to a} f(x) = cL$, for all $c \in \mathbb{R}$.

2. $\lim_{x\to a}(f+g)(x) := \lim_{x\to a}(f(x) + g(x)) = \left(\lim_{x\to a} f(x)\right) + \left(\lim_{x\to a} g(x)\right) = L + M$.

3. $\lim_{x\to a}(fg)(x) := \lim_{x\to a}(f(x)g(x)) = \left(\lim_{x\to a} f(x)\right)\left(\lim_{x\to a} g(x)\right) = LM$.

4. If $g(x) \neq 0$ for all $x \in S$ and $M \neq 0$, then $\lim_{x\to a}\left(\frac{f}{g}\right)(x) := \lim_{x\to a}\frac{f(x)}{g(x)} = \frac{\lim_{x\to a} f(x)}{\lim_{x\to a} g(x)} = \frac{L}{M}$.

In particular, the above limits of arithmetic combinations all exist.
In particular, the above limits of arithmetic combinations all exist.
Proof. We will only verify the second item and leave the rest as an exercise. Sup-
pose x
n
S is any sequence such that x
n
,= a for all n N and lim
n
x
n
=
a, then we have f(x
n
) = L and g(x
n
) = M by the sequential characterization
of limits. By proposition 3.5.1, the sequence (f +g)(x) := f(x) +g(x) con-
verges to L +M. By the sequential characterization of limits again, (f +g)(x)
is a function such that lim
xa
(f + g)(x) = L + M, as required. (Hint: Follow
the outline of the proof just given. Transform these items into statements about
sequences using the sequential characterization of limits and apply proposition
3.5.1.)
Theorem 4.1.8. Suppose $f, g : S \to \mathbb{R}$ where $S \subseteq \mathbb{R}$. Suppose an open interval $I$ containing $a \in \mathbb{R}$ satisfies $I \setminus \{a\} \subseteq S$. Assume that $g(x) \neq 0$ for all $x \in S$. If $\lim_{x\to a} g(x) = 0$ and $\lim_{x\to a}\frac{f}{g}(x) := \lim_{x\to a}\frac{f(x)}{g(x)}$ exists, then $\lim_{x\to a} f(x)$ exists and equals 0.

Proof. This proof is left as an exercise. (Hint: Transform this theorem into a statement about sequences using the sequential characterization of limits and apply theorem 3.5.2.)
Example 4.10. $\lim_{x\to a} c = c\lim_{x\to a} 1 = c\cdot 1 = c$, for any $c \in \mathbb{R}$.
Example 4.11. Let $P(x)$ be any polynomial. That is, $P(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n$. Then by proposition 4.1.7, we have that
$$\lim_{x\to a} P(x) = \lim_{x\to a}(c_0 + c_1 x + c_2 x^2 + \cdots + c_n x^n) = \lim_{x\to a} c_0 + \lim_{x\to a} c_1 x + \lim_{x\to a} c_2 x^2 + \cdots + \lim_{x\to a} c_n x^n = c_0 + c_1 a + c_2 a^2 + \cdots + c_n a^n.$$
That is, limits of polynomials can be evaluated by direct substitution:
$$\lim_{x\to a} P(x) = P(a).$$
Example 4.12. A rational function is a function of the form $f(x) := \frac{P(x)}{Q(x)}$, where $P(x)$ and $Q(x)$ are polynomials. How do we find
$$\lim_{x\to a} f(x) = \lim_{x\to a} \frac{P(x)}{Q(x)}?$$
The following (informal) algorithm can be used to compute this limit.

1. If $Q(a) \neq 0$, then
$$\lim_{x\to a} f(x) = \lim_{x\to a}\frac{P(x)}{Q(x)} = \frac{\lim_{x\to a} P(x)}{\lim_{x\to a} Q(x)} = \frac{P(a)}{Q(a)}.$$
The last equality is due to the previous example, and the second-to-last equality is due to proposition 4.1.7.

2. If $Q(a) = 0$ and $P(a) \neq 0$, then the limit does not exist (by theorem 4.1.8).

3. If $Q(a) = 0 = P(a)$, then both $P(x)$ and $Q(x)$ can be factored as $Q(x) = (x-a)Q_1(x)$ and $P(x) = (x-a)P_1(x)$. This means that $f(x) = \frac{(x-a)P_1(x)}{(x-a)Q_1(x)} = \frac{P_1(x)}{Q_1(x)} := f_1(x)$ for any $x$ where $Q_1(x) \neq 0$ and $x \neq a$; i.e., $f(x)$ and $f_1(x)$ have the same limiting behaviour at $x = a$: either both limits do not exist, or they exist and are equal. Go back to step (1) with the function $f_1(x) = \frac{P_1(x)}{Q_1(x)}$ and repeat this process until the limit is found.

This process must eventually terminate because each polynomial has a finite degree and each cycle through this algorithm reduces the degrees of the numerator and the denominator by one.
4.1.4 Squeeze Theorem for Functions

Example 4.13. Consider $f(x) = x\sin\left(\frac{1}{x}\right)$. Does $\lim_{x\to 0} f(x)$ exist? Observe that $\left|\sin\left(\frac{1}{x}\right)\right| \leq 1$ for all $x \in \mathbb{R} \setminus \{0\}$, and so
$$\left|x\sin\left(\frac{1}{x}\right)\right| \leq |x| \text{ for all } x \in \mathbb{R} \setminus \{0\}.$$
This means that $-|x| \leq x\sin\left(\frac{1}{x}\right) \leq |x|$, for all $x \in \mathbb{R} \setminus \{0\}$. Let $\epsilon > 0$. Set $\delta = \epsilon$; if $0 < |x| < \delta = \epsilon$, then $-\epsilon < -|x| \leq x\sin\left(\frac{1}{x}\right) \leq |x| < \epsilon$, which implies $\left|x\sin\left(\frac{1}{x}\right) - 0\right| < \epsilon$. This shows that $\lim_{x\to 0} x\sin\left(\frac{1}{x}\right) = 0$.

The above example suggests that there is a squeeze theorem for functions. In fact, there is.
Theorem 4.1.9 [Squeeze Theorem for Functions]. Suppose that $f, g, h : S \to \mathbb{R}$ with $S \subseteq \mathbb{R}$. Suppose $a \in \mathbb{R}$ is such that there is a $\delta > 0$ with $(a - \delta, a + \delta) \setminus \{a\} \subseteq S$. If
$$g(x) \leq f(x) \leq h(x)$$
for all $x \in (a - \delta, a + \delta) \setminus \{a\}$ and if
$$\lim_{x\to a} g(x) = L = \lim_{x\to a} h(x),$$
then $\lim_{x\to a} f(x)$ exists and
$$\lim_{x\to a} f(x) = L.$$

Proof. This proof is left as an exercise. (Hint: Use the sequential characterization of limits to transform this theorem into a statement about sequences and apply the squeeze theorem for sequences.)
Example 4.14. Consider $f(x) = x\sin\left(\frac{1}{x}\right)$ again. We have already shown that
$$-|x| \leq x\sin\left(\frac{1}{x}\right) \leq |x|$$
for all $x \in \mathbb{R} \setminus \{0\}$. Since $\lim_{x\to 0} -|x| = 0 = \lim_{x\to 0} |x|$, the squeeze theorem for functions shows that $\lim_{x\to 0} f(x) = \lim_{x\to 0} x\sin\left(\frac{1}{x}\right)$ exists and equals 0.

Problem. What is $\lim_{x\to 0}\frac{\sin(x)}{x}$?
4.1.5 One-Sided Limits

Note that $\lim_{x\to 0}\frac{|x|}{x}$ does not exist; but if we allow $x$ to approach 0 with the assumption that $x > 0$, then $f(x) = \frac{|x|}{x}$ approaches 1. Similarly, if we only allow $x < 0$, then as $x$ approaches 0, we have that $f(x)$ approaches $-1$. Here is the formal definition.

Definition 4.1.10 [One-Sided Limits]. Suppose that $f : S \to \mathbb{R}$ with $S \subseteq \mathbb{R}$.

1. Suppose that some $a \in \mathbb{R}$ and $c > 0$ satisfy $(a, a+c) \subseteq S$. We say that $L$ is the limit as $x$ approaches $a$ from above (or from the right) if for every $\epsilon > 0$, there exists a $\delta > 0$ such that if $0 < x - a < \delta$ and $x \in S$, then $|f(x) - L| < \epsilon$. In this case, we will write $\lim_{x\to a^+} f(x) = L$; otherwise, we will write $\lim_{x\to a^+} f(x)$ does not exist.

2. Suppose that some $a \in \mathbb{R}$ and $c > 0$ satisfy $(a-c, a) \subseteq S$. We say that $L$ is the limit as $x$ approaches $a$ from below (or from the left) if for every $\epsilon > 0$, there exists a $\delta > 0$ such that if $0 < a - x < \delta$ and $x \in S$, then $|f(x) - L| < \epsilon$. In this case, we will write $\lim_{x\to a^-} f(x) = L$; otherwise, we will write $\lim_{x\to a^-} f(x)$ does not exist.
Example 4.15. We can easily show that $\lim_{x\to 0^+} \sqrt{x} = 0$, and that $\lim_{x\to 0^-} \sqrt{x}$ does not exist.
Theorem 4.1.11. Suppose $f : S \to \mathbb{R}$ with $S \subseteq \mathbb{R}$. Suppose an open interval $I$ containing $a \in \mathbb{R}$ satisfies $I \setminus \{a\} \subseteq S$. Then $\lim_{x\to a} f(x)$ exists and equals $L$ if and only if both $\lim_{x\to a^+} f(x)$ and $\lim_{x\to a^-} f(x)$ exist and $\lim_{x\to a^+} f(x) = L = \lim_{x\to a^-} f(x)$.

Proof. Assume that $\lim_{x\to a} f(x) = L$. Let $\epsilon > 0$. Then there exists a $\delta > 0$ such that if $0 < |x - a| < \delta$, then $|f(x) - L| < \epsilon$. In particular, if $0 < x - a < \delta$, then $|f(x) - L| < \epsilon$; hence, $\lim_{x\to a^+} f(x) = L$. Similarly, if $0 < a - x < \delta$, then $|f(x) - L| < \epsilon$; hence $\lim_{x\to a^-} f(x) = L$.

Conversely, assume that $\lim_{x\to a^+} f(x) = L = \lim_{x\to a^-} f(x)$. Given $\epsilon > 0$, we can find $\delta_1$ and $\delta_2$ such that

1. if $0 < x - a < \delta_1$, then $|f(x) - L| < \epsilon$, and

2. if $0 < a - x < \delta_2$, then $|f(x) - L| < \epsilon$.

Let $\delta = \min\{\delta_1, \delta_2\}$. Assume that $0 < |x - a| < \delta$. If $x > a$, then $0 < |x - a| = x - a < \delta_1$ implies that $|f(x) - L| < \epsilon$. If $x < a$, then $0 < |x - a| = a - x < \delta_2$ implies that $|f(x) - L| < \epsilon$. This shows that $0 < |x - a| < \delta$ implies $|f(x) - L| < \epsilon$, whence $\lim_{x\to a} f(x) = L$. This completes the proof.
As expected, there is a sequential characterization of one-sided limits.

Theorem 4.1.12 [Sequential Characterization of One-Sided Limits]. Suppose that $f : S \to \mathbb{R}$ with $S \subseteq \mathbb{R}$.

1. Suppose that $a \in \mathbb{R}$ and $c > 0$ satisfy $(a, a+c) \subseteq S$. Then $\lim_{x\to a^+} f(x) = L$ if and only if whenever $\{x_n\} \subseteq S$ is a sequence such that $\lim_{n\to\infty} x_n = a$ and $x_n > a$ for all $n \in \mathbb{N}$, we have $\lim_{n\to\infty} f(x_n) = L$.

2. Suppose that $a \in \mathbb{R}$ and $c > 0$ satisfy $(a-c, a) \subseteq S$. Then $\lim_{x\to a^-} f(x) = L$ if and only if whenever $\{x_n\} \subseteq S$ is a sequence such that $\lim_{n\to\infty} x_n = a$ and $x_n < a$ for all $n \in \mathbb{N}$, we have $\lim_{n\to\infty} f(x_n) = L$.

Proof. This proof is left as an exercise.

Once again, we can use this sequential characterization to transform results about sequences into results about one-sided limits.

Proposition 4.1.13. All of the arithmetic rules for limits of functions hold for each one-sided limit.

Proof. This proof is left as an exercise.
Theorem 4.1.14 [One-Sided Squeeze Theorem]. Suppose that $f, g, h : S \to \mathbb{R}$ with $S \subseteq \mathbb{R}$.

1. Suppose that $a \in \mathbb{R}$ and $c > 0$ satisfy $(a, a+c) \subseteq S$. Also suppose that $\lim_{x\to a^+} g(x) = L = \lim_{x\to a^+} h(x)$ and that $g(x) \leq f(x) \leq h(x)$ for all $x \in (a, a+c)$. Then $\lim_{x\to a^+} f(x)$ exists and equals $L$.

2. Suppose that $a \in \mathbb{R}$ and $c > 0$ satisfy $(a-c, a) \subseteq S$. Also suppose that $\lim_{x\to a^-} g(x) = L = \lim_{x\to a^-} h(x)$ and that $g(x) \leq f(x) \leq h(x)$ for all $x \in (a-c, a)$. Then $\lim_{x\to a^-} f(x)$ exists and equals $L$.

Proof. This theorem is left as an exercise.
Definition 4.1.15. Suppose $S \subseteq \mathbb{R}$ is nonempty and is such that $x \in S$ if and only if $-x \in S$. A function $f : S \to \mathbb{R}$ is said to be even if $f(-x) = f(x)$ for all $x \in S$.

The graph of an even function is symmetric about the $y$-axis.

Example 4.16. $f(x) = x^2$, $g(x) = \cos(x)$, and $h(x) = \frac{\sin(x)}{x}$ are all even functions.

Note. If $f(x)$ is even, then $\lim_{x\to 0^+} f(x)$ and $\lim_{x\to 0^-} f(x)$ either both do not exist, or both exist and are equal.
Definition 4.1.16. Suppose $S \subseteq \mathbb{R}$ is nonempty and is such that $x \in S$ if and only if $-x \in S$. A function $f : S \to \mathbb{R}$ is said to be odd if $f(-x) = -f(x)$ for all $x \in S$.

The graph of an odd function is rotationally symmetric about the origin.

Example 4.17. $f(x) = x^3$, $g(x) = \sin(x)$, and $h(x) = \frac{\cos(x)}{x}$ are all odd functions.

Note. If $f(x)$ is odd, then $\lim_{x\to 0} f(x) = L$ if and only if $\lim_{x\to 0^+} f(x) = \lim_{x\to 0^-} f(x) = L$ and $L = 0$.
4.1.6 Fundamental Trigonometric Limit

In this section, we will prove the fundamental trigonometric limit, or simply the fundamental trig limit. Before we do so, note that
$$\lim_{\theta\to 0^-} \cos(\theta) = \lim_{\theta\to 0^+} \cos(\theta) = 1$$
and
$$\lim_{\theta\to 0^-} \sin(\theta) = \lim_{\theta\to 0^+} \sin(\theta) = 0.$$
Here, $\theta$ should be interpreted as the angle measured counterclockwise from the positive $x$-axis. With each $\theta$, identify a point $P = (x_\theta, y_\theta)$ on the unit circle as in figure 4.1. We should interpret $\sin(\theta)$ and $\cos(\theta)$ as functions that identify each $\theta$ with the quantities $y_\theta$ and $x_\theta$, respectively.

[Figure 4.1: Interpretation of $\sin(\theta)$ and $\cos(\theta)$: the point on the unit circle at angle $\theta$ is $(x_\theta, y_\theta) = (\cos(\theta), \sin(\theta))$.]
Theorem 4.1.17 [Fundamental Trigonometric Limit]. $\lim_{\theta\to 0}\frac{\sin(\theta)}{\theta}$ exists and equals 1.

Proof. Let $f(\theta) = \frac{\sin(\theta)}{\theta}$. Then $f(-\theta) = \frac{\sin(-\theta)}{-\theta} = \frac{-\sin(\theta)}{-\theta} = \frac{\sin(\theta)}{\theta} = f(\theta)$. This shows that $f(\theta)$ is even. Thus, $\lim_{\theta\to 0} f(\theta)$ exists and equals 1 if and only if $\lim_{\theta\to 0^+} f(\theta)$ exists and equals 1; therefore, we need only show that $\lim_{\theta\to 0^+} f(\theta) = \lim_{\theta\to 0^+}\frac{\sin(\theta)}{\theta}$ exists and equals 1. So suppose $0 < \theta < \frac{\pi}{2}$.

From figure 4.2, we have that the area of region $R_1$ is
$$A_1 = \frac{1}{2}\cos(\theta)\sin(\theta),$$
the area of region $R_2$ is
$$A_2 = \left(\frac{\theta}{2\pi}\right)\left(\pi(1)^2\right) = \frac{\theta}{2},$$
and the area of region $R_3$ is
$$A_3 = \frac{1}{2}(1)(\tan(\theta)) = \frac{\sin(\theta)}{2\cos(\theta)}.$$
[Figure 4.2: Inequality of three different areas: the regions $R_1$, $R_2$, and $R_3$ determined by the angle $\theta$ in the unit circle.]
We also have, from figure 4.2, that $A_1 \leq A_2 \leq A_3$. That is, if $0 < \theta < \frac{\pi}{2}$, we have that
$$\frac{\cos(\theta)\sin(\theta)}{2} \leq \frac{\theta}{2} \leq \frac{\sin(\theta)}{2\cos(\theta)}. \tag{4.1}$$
Multiplying (4.1) by $\frac{2}{\sin(\theta)}$ (which is positive) yields
$$\cos(\theta) \leq \frac{\theta}{\sin(\theta)} \leq \frac{1}{\cos(\theta)} \tag{4.2}$$
for $0 < \theta < \frac{\pi}{2}$. Note that all three quantities in (4.2) are positive if $0 < \theta < \frac{\pi}{2}$. Finally, taking reciprocals, we have that
$$\cos(\theta) \leq \frac{\sin(\theta)}{\theta} \leq \frac{1}{\cos(\theta)} \tag{4.3}$$
if $0 < \theta < \frac{\pi}{2}$. But
$$\lim_{\theta\to 0^+}\frac{1}{\cos(\theta)} = \frac{1}{\lim_{\theta\to 0^+}\cos(\theta)} = \frac{1}{1} = 1 = \lim_{\theta\to 0^+}\cos(\theta),$$
and hence by the one-sided squeeze theorem, $\lim_{\theta\to 0^+}\frac{\sin(\theta)}{\theta}$ exists and equals 1. This completes the proof.
Example 4.18. We will evaluate $\lim_{\theta\to 0}\frac{\tan(\theta)}{\theta}$. Note that
$$\lim_{\theta\to 0}\frac{\tan(\theta)}{\theta} = \lim_{\theta\to 0}\frac{\sin(\theta)}{\theta}\cdot\frac{1}{\cos(\theta)} = \left(\lim_{\theta\to 0}\frac{\sin(\theta)}{\theta}\right)\left(\lim_{\theta\to 0}\frac{1}{\cos(\theta)}\right) = 1\cdot 1 = 1.$$

Note. If $\theta$ is small, then $\sin(\theta) \approx \theta \approx \tan(\theta)$.
Example 4.19. We will find $\lim_{\theta\to 0}\frac{\sin(3\theta)}{\tan(\theta)}$. Note that
$$\lim_{\theta\to 0}\frac{\sin(3\theta)}{\tan(\theta)} = \lim_{\theta\to 0} 3\left(\frac{\sin(3\theta)}{3\theta}\right)\left(\frac{\theta}{\tan(\theta)}\right) = 3\left(\lim_{\theta\to 0}\frac{\sin(3\theta)}{3\theta}\right)\left(\lim_{\theta\to 0}\frac{\theta}{\tan(\theta)}\right),$$
since both limits on the right exist and equal 1. To see this, note that
$$\lim_{\theta\to 0}\frac{\sin(3\theta)}{3\theta} = \lim_{x\to 0}\frac{\sin(x)}{x} = 1$$
since $x := 3\theta \to 0$ as $\theta \to 0$, and that
$$\lim_{\theta\to 0}\frac{\theta}{\tan(\theta)} = \frac{1}{\lim_{y\to 0}\frac{\tan(y)}{y}} = \frac{1}{1} = 1$$
since $y := \theta \to 0$ as $\theta \to 0$. Hence $\lim_{\theta\to 0}\frac{\sin(3\theta)}{\tan(\theta)} = 3\cdot 1\cdot 1 = 3$.
Example 4.20. We will find $\lim_{h\to 0}\frac{\cos(h)-1}{h}$. By multiplying with the conjugate $\cos(h)+1$, we have that
$$\left(\frac{\cos(h)-1}{h}\right)\left(\frac{\cos(h)+1}{\cos(h)+1}\right) = \frac{\cos^2(h)-1}{h(\cos(h)+1)} = \frac{-\sin^2(h)}{h(\cos(h)+1)}.$$
Hence,
$$\lim_{h\to 0}\frac{\cos(h)-1}{h} = -\left(\lim_{h\to 0}\frac{\sin(h)}{h}\right)\left(\lim_{h\to 0}\frac{\sin(h)}{\cos(h)+1}\right) = -(1)\left(\frac{0}{1+1}\right) = 0,$$
since both limits on the right exist.
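The trigonometric limits of this section can be spot-checked numerically. The sketch below (an illustration, not a substitute for the proofs) evaluates each ratio at a small angle:

```python
import math

# Numerical spot checks of the limits in this section:
# sin(t)/t -> 1, tan(t)/t -> 1, sin(3t)/tan(t) -> 3, (cos(h) - 1)/h -> 0.

t = 1e-6
print(math.sin(t) / t)                 # close to 1
print(math.tan(t) / t)                 # close to 1
print(math.sin(3 * t) / math.tan(t))   # close to 3
print((math.cos(t) - 1) / t)           # close to 0
```

Note that evaluating at a single small angle only illustrates the limits; floating-point cancellation also makes expressions like $\cos(h) - 1$ increasingly inaccurate as $h$ shrinks further.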
4.2 Continuity

In this section, we will explore one of the most important properties of functions: continuity. First, here is the formal definition of continuity.

4.2.1 Some Definitions

Definition 4.2.1 [Continuity]. Suppose $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$. Suppose also that an open interval $I \subseteq S$ contains some $a \in S$. We say that $f(x)$ is continuous at $x = a$ if

1. $\lim_{x\to a} f(x)$ exists, and

2. $\lim_{x\to a} f(x) = f(a)$.

Otherwise, we say that $f(x)$ is discontinuous at $x = a$.

Here is an alternate definition of continuity. We can easily show that these two definitions are equivalent.

Definition 4.2.2 [Continuity]. Suppose $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$. Suppose also that an open interval $I \subseteq S$ contains some $a \in S$. We say that $f(x)$ is continuous at $x = a$ if for every $\epsilon > 0$, there exists a $\delta > 0$ such that if $|x - a| < \delta$ and $x \in S$, then $|f(x) - f(a)| < \epsilon$. Otherwise, we say that $f(x)$ is discontinuous at $x = a$.
Here is a definition that is useful when talking about points of discontinuity.

Definition 4.2.3. Suppose $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$. Also suppose that some $a \in \mathbb{R}$ and $c > 0$ satisfy $(a, a+c) \subseteq S$. We say that $f(x)$ approaches $\infty$ as $x$ approaches $a$ from above (from the right) if for every $M > 0$, there exists a $\delta > 0$ such that if $0 < x - a < \delta$ and $x \in S$, then $f(x) > M$. In this case, we will still write $\lim_{x\to a^+} f(x) = \infty$, even though the limit does not exist. In a similar way, we can define $\lim_{x\to a^-} f(x) = \infty$, $\lim_{x\to a^+} f(x) = -\infty$, $\lim_{x\to a^-} f(x) = -\infty$, $\lim_{x\to a} f(x) = \infty$, and $\lim_{x\to a} f(x) = -\infty$.
4.2.2 Types of Discontinuities

Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $I \subseteq S$ is an open interval containing some $a \in S$. Before we look at points of continuity, we will look at how the function $f$ can fail to be continuous at the point $a$. One way is for $f$ to have no limit as $x$ approaches $a$. Another way is for $f$ to have a limit as $x$ approaches $a$ while the value $f(a)$ is not equal to that limit. Below is a summary of these various types of discontinuities.

1. Removable discontinuity. Suppose that $\lim_{x\to a} f(x)$ exists but
$$\lim_{x\to a} f(x) \neq f(a)$$
or $f(x)$ is not defined at $x = a$; then $f(x)$ is discontinuous at $x = a$. For example, $f(x) = \frac{x^2-1}{x-1}$ is discontinuous at $x = 1$ because $f(x)$ is not defined at $x = 1$, even though $\lim_{x\to 1} f(x)$ exists and equals $\lim_{x\to 1} x + 1 = 2$. In the general case, we could simply define
$$g(x) = \begin{cases} f(x) & \text{if } x \neq a, \\ \lim_{x\to a} f(x) & \text{if } x = a. \end{cases}$$
The function $g$ "removes" the discontinuity of $f$ at $x = a$ by redefining the value of $f$ at $x = a$ to be $\lim_{x\to a} f(x)$; also, $g(x) = f(x)$ everywhere else. In our particular example, we will define
$$g(x) = \begin{cases} \frac{x^2-1}{x-1} & \text{if } x \neq 1, \\ 2 & \text{if } x = 1. \end{cases}$$
Then g(x) is continuous at x = a for all a R.
2. Essential discontinuity. We say that f(x) has an essential discontinuity
at x = a if lim_{x→a} f(x) does not exist.
• Jump discontinuity. If lim_{x→a+} f(x) = L and lim_{x→a-} f(x) = M
both exist but are not equal, then we say that f(x) has a (finite) jump
discontinuity at x = a. We call L - M the jump of f(x) at x = a.
For example, if

f(x) = |x|/x = { 1   if x > 0,
               { -1  if x < 0,

then lim_{x→0+} f(x) = 1 and lim_{x→0-} f(x) = -1, and so f(x) has a
jump discontinuity at x = 0 with jump 1 - (-1) = 2.
• Vertical asymptote. We say that x = a is a vertical asymptote
for f(x) if lim_{x→a+} f(x) = ±∞ or lim_{x→a-} f(x) = ±∞ (or both).
For example, if f(x) = 1/x, then x = 0 is a vertical asymptote for f(x)
since lim_{x→0+} f(x) = ∞ (and lim_{x→0-} f(x) = -∞).
• Oscillatory discontinuity. If f(x) is discontinuous at x = a for
any other reason, then x = a is called a point of oscillatory discontinuity
for f(x). For example, if f(x) = sin(1/x), then x = 0 is not a
removable or jump discontinuity and it is not a vertical asymptote;
therefore, it is a point of oscillatory discontinuity for f(x).
Problem. Given a function f(x), what can we say about the possible set of
points of discontinuity?
Example 4.21. Let

f(x) = { 1   if x ∈ Q,
       { -1  if x ∈ R \ Q.

Then f(x) is never continuous (since lim_{x→a} f(x) does not exist for any a ∈ R).
4.2.3 Sequential Characterization of Continuity
Not surprisingly, there is a sequential characterization of continuity.
Theorem 4.2.4 [Sequential Characterization of Continuity]. Suppose
f : S → R with S ⊆ R. Also suppose that an open interval I ⊆ S contains some
point a ∈ S. Then the following are equivalent.
1. f(x) is continuous at x = a.
2. Whenever {x_n} ⊆ S is a sequence that converges to a, we have that
{f(x_n)} is a sequence converging to f(a).
Proof. We shall first show that hypothesis (1) implies conclusion (2). So assume
that f(x) is continuous at x = a. Let {x_n} ⊆ S be any sequence that converges
to a. Let ε > 0. Then there exists a δ > 0 such that if |x - a| < δ and x ∈ S,
then |f(x) - f(a)| < ε. But since x_n → a as n → ∞, there exists an N ∈ N
such that if n ≥ N, then |x_n - a| < δ. Hence for all n ≥ N, x_n ∈ S and
|x_n - a| < δ, whence |f(x_n) - f(a)| < ε. This shows that {f(x_n)} is a sequence
that converges to f(a).
We want to show that the converse holds as well. That is, we want to show
that hypothesis (2) implies conclusion (1); we will proceed by contradiction.
Suppose (2) holds but condition (1) fails. That is, there exists an ε_0 > 0
such that for every δ > 0, there exists x_δ ∈ S with |x_δ - a| < δ and
|f(x_δ) - f(a)| ≥ ε_0. In particular, for δ_n = 1/n, there exists x_n ∈ S with
|x_n - a| < δ_n = 1/n and |f(x_n) - f(a)| ≥ ε_0, for each n ∈ N. Since
0 ≤ |x_n - a| < 1/n and lim_{n→∞} 1/n = 0, the squeeze theorem shows that
lim_{n→∞} |x_n - a| = 0, whence lim_{n→∞} x_n = a. But on the other hand,
{f(x_n)} is a sequence that does not converge to f(a), since |f(x_n) - f(a)| ≥ ε_0
for all n ∈ N (simply set ε = ε_0 and no N could be found to satisfy convergence).
Now we are done, for we have arrived at a contradiction: {x_n} ⊆ S is a sequence
that converges to a and yet {f(x_n)} is a sequence that does not converge to f(a),
directly contradicting hypothesis (2). This shows that (2) implies (1) and the
proof is complete.
Example 4.22. Let

f(x) = { 0    if x ∈ R \ Q,
       { 1/n  if x = m/n, m ∈ Z, n ∈ N, gcd(m, n) = 1, m ≠ 0,
       { 1    if x = 0.

For any α ∈ R, there exists a sequence {x_n} ⊆ R \ Q with lim_{n→∞} x_n = α,
due to the density of the irrationals. Since f(x_n) = 0 for every n ∈ N, the
sequential characterization of continuity shows that f(x) is continuous at x = α
only if {f(x_n)} = {0} converges to f(α), that is, only if f(α) = 0. This shows
that f(x) must vanish at all points of continuity, and hence f(x) is discontinuous
at x = r for all r ∈ Q.
Remark 4.2.5. The function defined in example 4.22 is continuous at x = α
for every α ∈ R \ Q.
Proof. This proof is left as a homework exercise. (Hint: We have already shown
that f(x) is discontinuous on the rationals. To show that f(x) is continuous on
the irrationals, show that the set

E_ε = { n ∈ N : |α - n/K| < ε }

is a finite set for any α ∈ R \ Q, ε > 0, and K ∈ N. This shows that any sequence
of rationals that converges to α must eventually have larger and larger denominators,
whence f applied to those rational numbers becomes close to 0.)
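The hint can be explored numerically. The sketch below (the helper name `thomae` is ours, not the text's) evaluates the function of example 4.22 at exact rationals via `fractions.Fraction`, which keeps m/n in lowest terms; along the continued-fraction convergents of √2, the denominators grow, so the values shrink toward 0 = f(√2).

```python
from fractions import Fraction

def thomae(q: Fraction) -> Fraction:
    """Value of the function in example 4.22 at a rational point q.

    Fraction stores q = m/n in lowest terms with n >= 1, so f(q) = 1/n
    (and f(0) = 1, matching the definition)."""
    if q == 0:
        return Fraction(1)
    return Fraction(1, q.denominator)

# Continued-fraction convergents of sqrt(2): 1/1, 3/2, 7/5, 17/12, 41/29, ...
convergents = [Fraction(1, 1)]
for _ in range(6):
    convergents.append(1 + 1 / (1 + convergents[-1]))

values = [thomae(q) for q in convergents]
print(values)  # denominators grow, so the values decrease toward 0
```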
Diversion. Given E ⊆ R, does there exist a function f : R → R such that
D(f) = E, where D(f) := {x ∈ R : f(x) is discontinuous at x}? In particular,
can D(f) = R \ Q?
Hint: If we let

E_n = { x ∈ R : for every δ > 0, there exist y_δ, z_δ ∈ (x - δ, x + δ)
        such that |f(y_δ) - f(z_δ)| ≥ 1/n },

then D(f) = ⋃_{n=1}^∞ E_n.
Once again, as a consequence of the sequential characterization, we can
transform results about sequences into results about continuity of functions.
Theorem 4.2.6. Suppose that f, g : S → R, where S ⊆ R. Also suppose that
an open interval I ⊆ S contains some a ∈ S. If f, g are both continuous at
x = a, then we have the following.
1. (cf)(x) := c · f(x) is continuous at x = a for all c ∈ R.
2. (f + g)(x) := f(x) + g(x) is continuous at x = a.
3. (fg)(x) := f(x)g(x) is continuous at x = a.
4. If g(x) ≠ 0 for all x ∈ S (in particular, for x = a), then (f/g)(x) := f(x)/g(x)
is continuous at x = a.
Proof. This proof is left as an exercise.
4.2.4 Composition of Functions
Definition 4.2.7. Assume that ran(f) ⊆ dom(g); that is, suppose f, g are
functions such that f : S → R and g : T → R, where S ⊆ R and ran(f) :=
f(S) ⊆ T ⊆ R. We can now define a new function h : S → R by h(x) :=
(g ∘ f)(x) := g(f(x)). This new function h = g ∘ f is called the composition of
g with f. Note that we apply f first, then the result is substituted into g.
The requirement that ran(f) := f(S) ⊆ T simply says that g must be defined
on all the possible outputs of f, so that g(f(x)) is always defined for any x ∈ S.
Example 4.23. Let f(x) = x^2 and g(x) = sin(x). Then dom(f) = R, ran(f) =
{y ∈ R : y ≥ 0}, and dom(g) = R. Since ran(f) ⊆ dom(g), we can define
(g ∘ f)(x) := g(f(x)) = sin(f(x)) = sin(x^2).
Note. It does not follow that (f ∘ g)(x) = (g ∘ f)(x).
Example 4.24. Let f(x) = -|x| - 1 and g(x) = √x. Then dom(f) = R, ran(f) =
{y ∈ R : y ≤ -1}, and dom(g) = {x ∈ R : x ≥ 0}. Since ran(f) ⊄ dom(g),
we cannot define the composition (g ∘ f)(x). (It would look like g(f(x)) =
√(-|x| - 1), which is never defined.) But since ran(g) ⊆ dom(f), we can define
(f ∘ g)(x) := f(g(x)) = -|√x| - 1 = -√x - 1, defined on dom(g) = {x ∈ R : x ≥ 0}.
Example 4.25. Let f(x) = √x and g(x) = x^2. Then dom(f) = {x ∈ R : x ≥ 0},
ran(f) = {y ∈ R : y ≥ 0}, dom(g) = R, and ran(g) = {y ∈ R : y ≥ 0}. Since
ran(g) ⊆ dom(f), we can define (f ∘ g)(x) := √(x^2) = |x|, with dom(f ∘ g) = R.
Since ran(f) ⊆ dom(g), we can also define (g ∘ f)(x) := (√x)^2 = x on its domain
dom(g ∘ f) = {x ∈ R : x ≥ 0}.
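The asymmetry in example 4.25 is easy to see in code. A minimal Python sketch (function names are ours), where f plays the role of √x and g the role of x^2:

```python
import math

def f(x):
    return math.sqrt(x)  # defined only for x >= 0

def g(x):
    return x * x         # defined on all of R

# (f ∘ g)(x) = sqrt(x^2) = |x| makes sense for every real x ...
print(f(g(-3.0)))  # 3.0

# ... but (g ∘ f)(x) = (sqrt(x))^2 = x only makes sense for x >= 0.
try:
    g(f(-3.0))
except ValueError as err:
    print("g(f(-3)) is undefined:", err)
```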
Theorem 4.2.8. Suppose that f : S → R and g : T → R, where S ⊆ R and
f(S) ⊆ T ⊆ R. Let a ∈ S (and so f(a) ∈ T). Suppose that open intervals I ⊆ S
and J ⊆ T contain a and f(a), respectively. Assume that f(x) is continuous at
x = a and g(y) is continuous at y = f(a). Then h(x) := (g ∘ f)(x) is continuous
at x = a.
Proof. Assume that {x_n} ⊆ S is any sequence such that x_n → a as n → ∞.
Then since f(x) is continuous at x = a, the sequential characterization of
continuity shows that {f(x_n)} is a sequence such that f(x_n) → f(a) as n → ∞.
Moreover, {f(x_n)} ⊆ f(S) ⊆ T. Now since g(y) is continuous at y = f(a), the
sequential characterization of continuity shows that {g(f(x_n))} is a sequence
such that g(f(x_n)) → g(f(a)) as n → ∞. The sequential characterization of
continuity applied once more shows that h(x) := (g ∘ f)(x) is continuous at
x = a.
Corollary 4.2.9. Suppose that f : S → R and g : T → R, where S ⊆ R and
f(S) ⊆ T ⊆ R. Also suppose that I ⊆ S is an open interval containing some
a ∈ S and that J ⊆ T is an open interval containing f(a). If f(x) is continuous
at x = a and g(y) is continuous at y = f(a), then

lim_{x→a} (g ∘ f)(x) := lim_{x→a} g(f(x)) = lim_{y→f(a)} g(y).

Proof. By theorem 4.2.8, (g ∘ f)(x) is continuous at x = a, and so

lim_{x→a} (g ∘ f)(x) := lim_{x→a} g(f(x)) = g(f(a)).

Because g(y) is continuous at y = f(a), we also have that

lim_{y→f(a)} g(y) = g(f(a)).

The result follows.
Remark 4.2.10. For certain special functions, we have the following.
1. A polynomial P(x) = a_0 + a_1 x + a_2 x^2 + ⋯ + a_n x^n is continuous at x = a
for each a ∈ R. (lim_{x→a} P(x) = P(a) by example 4.11.)
2. A rational function f(x) = P(x)/Q(x), where P(x) and Q(x) are polynomials, is
continuous at x = a for every a ∈ R where Q(a) ≠ 0.
3. sin(x) and cos(x) are continuous at each point in R. It is known that
|sin(x) - sin(y)| ≤ |x - y| and |cos(x) - cos(y)| ≤ |x - y|.
4. f(x) = e^x is continuous at x = a for all a ∈ R.
5. f(x) = ln(x) is continuous at x = a for all a > 0.
Proof. This proof is omitted.
Example 4.26. Let f : R → R be defined by f(x) = e^{sin(x)} / (cos^2(x) + 1). It is
continuous by theorem 4.2.8 along with remark 4.2.10 (1), (2), (3), and (4).
4.2.5 Continuity on an Interval
Question: Is f(x) = √x continuous on its domain?
Formally, f(x) = √x cannot be continuous at x = 0 in the sense of our
definition of continuity at a point; however, it behaves continuously at x = 0.
Note that lim_{x→0} √x is not defined, since there is no open interval around 0 on
which the square root function is defined, and so it cannot have a limit at x = 0.
It does, however, have the one-sided limit lim_{x→0+} f(x) = lim_{x→0+} √x = 0 =
f(0). Therefore, f(0) reflects the behaviour of f(x) near x = 0 but with x ≥ 0;
this is the essence of continuity. One could say that f(x) is continuous from
above at x = 0. With this example as a motivation, we are now ready to
expand our definition of continuity to intervals.
Definition 4.2.11. Suppose that f : S → R with S ⊆ R. Suppose (a, b) ⊆ S,
a < b. We say that f(x) is continuous on (a, b) if f(x) is continuous (in the
usual sense) at x = c for every c ∈ (a, b). Now suppose [a, b] ⊆ S, a < b. We say
that f(x) is continuous on [a, b] if
1. f(x) is continuous on (a, b),
2. lim_{x→a+} f(x) exists and equals f(a), and
3. lim_{x→b-} f(x) exists and equals f(b).
We can similarly define continuity on intervals of the form [a, b) and (a, b].
The following sequential characterization is true regardless of what kind of
interval I is.
Theorem 4.2.12 [Sequential Characterization of Continuity on Intervals].
Suppose f : S → R with S ⊆ R. Suppose also that I ⊆ S is a
nondegenerate interval. Then the following are equivalent.
1. f(x) is continuous on I.
2. If {x_n} ⊆ S is any sequence that converges to x_0 with x_0 ∈ I, then {f(x_n)}
is a sequence that converges to f(x_0).
Proof. This proof is left as an exercise.
An obvious application is the following.
Theorem 4.2.13. Suppose that f, g : S → R, where S ⊆ R, and that I ⊆ S is a
nondegenerate interval. If f, g are both continuous on I, then we have the
following.
1. (cf)(x) := c · f(x) is continuous on I, for all c ∈ R.
2. (f + g)(x) := f(x) + g(x) is continuous on I.
3. (fg)(x) := f(x)g(x) is continuous on I.
4. If g(x) ≠ 0 for all x ∈ S (in particular, for all x ∈ I), then (f/g)(x) := f(x)/g(x)
is continuous on I.
Continuity at a point and continuity on an interval (global continuity) are
slightly different concepts. On a global scale, the concept of continuity
should not be restricted to such a level that functions, even those that behave
as continuously as they possibly could, are classified as discontinuous at points
where they should have been labeled continuous due to a simple, unimportant
technical reason. Our definition of this global continuity allows well-behaved
functions, such as √x, to be continuous even at endpoints of their domain.
4.3 Intermediate Value Theorem
4.3.1 The Theorem
Problem. Suppose we want to solve f(x) = 0. (For example, let f(x) =
x^5 + x + π.) The first step might be to show that a solution exists!
Theorem 4.3.1 [Intermediate Value Theorem]. Suppose that f : S → R
with S ⊆ R. Suppose also that [a, b] ⊆ S with a < b. If f(x) is continuous on
[a, b] and f(a) < 0 < f(b), then there exists c ∈ [a, b] such that f(c) = 0.
Proof. Let E = {x ∈ [a, b] : f(x) ≤ 0}. Then E is nonempty since a ∈ E; E
is bounded above (by b) by definition. Therefore, E has a least upper bound c
by the least upper bound principle. Since c = sup E, there exists a sequence
{x_n} in E with x_n → c as n → ∞. Since c ∈ [a, b] and f(x) is continuous on
[a, b], the sequential characterization of continuity on intervals shows that
{f(x_n)} is a sequence converging to f(c). But since f(x_n) ≤ 0 for all n ∈ N,
we have that

f(c) = lim_{n→∞} f(x_n) ≤ 0.

In particular, c ≠ b because f(b) > 0. Now consider the nonempty interval
(c, b]. For each n ∈ N, let y_n = min{c + 1/n, b}. Then y_n ∈ (c, b] and so
y_n ∉ E, for all n ∈ N. Since c < y_n ≤ c + 1/n for n ∈ N, the squeeze theorem
(along with remark 3.1.3) shows that y_n → c as n → ∞. But since f(x) is
continuous on [a, b], {f(y_n)} is a sequence that converges to f(c). Finally,
since for each n ∈ N, f(y_n) > 0, we have that

f(c) = lim_{n→∞} f(y_n) ≥ 0.

This shows that f(c) = 0 and the proof is complete.
This is an extremely important result about real-valued functions of real
numbers. In fact, it is known that the least upper bound property, the monotone
convergence theorem, the Bolzano-Weierstrass theorem, the completeness
theorem, and the intermediate value theorem are all logically equivalent to each
other. They all describe essentially the same fundamental property of the real
numbers: completeness.
Example 4.27. Let f(x) = x^5 + x + π. Then f(x) is continuous on [-2, 0]
since it is a polynomial. Since f(-2) < 0 and f(0) > 0, the intermediate value
theorem shows that there exists a c ∈ [-2, 0] (in fact, in (-2, 0)) with f(c) = 0.
Example 4.28. We will show that there exists an x_0 ∈ R with cos(x_0) = x_0.
Let h(x) = x - cos(x). We have that h(0) = 0 - cos(0) = -1 and h(π/2) =
π/2 - cos(π/2) = π/2. Since h(0) < 0 and h(π/2) > 0, and h(x) is continuous
on [0, π/2] by remark 4.2.10 and theorem 4.2.6, the intermediate value theorem
shows that there exists an x_0 ∈ [0, π/2] such that h(x_0) = x_0 - cos(x_0) = 0,
or cos(x_0) = x_0, as required.
We can slightly generalize the intermediate value theorem into the following
corollary, which we will also refer to as the intermediate value theorem.
Corollary 4.3.2 [Intermediate Value Theorem]. Suppose that f : S → R
with S ⊆ R. Suppose also that [a, b] ⊆ S with a < b. If f(x) is continuous on
[a, b] and α lies between f(a) and f(b), that is,

f(a) < α < f(b)   or   f(a) > α > f(b),

then there exists c ∈ [a, b] such that f(c) = α.
Proof. Suppose first that f(a) < α < f(b). Let h(x) = f(x) - α. Then h(a) =
f(a) - α < 0 and h(b) = f(b) - α > 0. The intermediate value theorem shows
that there is a c with h(c) = f(c) - α = 0, or f(c) = α, as required. The proof
for the case where f(a) > α > f(b) is similar and is omitted.
Geometrically, the intermediate value theorem shows that f([a, b]) is an
interval (with no gaps). The range of a continuous function on an interval is
still an interval, but is f([a, b]) a closed interval? We will answer this question
in the next section.
Diversion. Let {a_n} be a sequence recursively defined as follows: a_1 = 1,
a_{n+1} = cos(a_n). Then it can be shown that {a_n} converges to some real
number L. But since a_{n+1} = cos(a_n), cos(a_n) → L also as n → ∞. But
cos(x) is a continuous function on all of R, so {cos(a_n)} converges to cos(L)
by the sequential characterization of continuity, and hence cos(L) = L (by
theorem 3.1.4).
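The diversion above is easy to replicate numerically. A short Python sketch iterating a_{n+1} = cos(a_n) from a_1 = 1: after enough steps the iterates settle near the unique solution of cos(L) = L (approximately 0.739).

```python
import math

a = 1.0  # a_1 = 1
for _ in range(100):
    a = math.cos(a)  # a_{n+1} = cos(a_n)

print(a)                     # ≈ 0.739085...
print(abs(math.cos(a) - a))  # the residual cos(a) - a is essentially 0
```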
Problem. Suppose that a monk leaves a monastery at 7:00 a.m. and walks
along a path to a building at the top of a mountain. The monk arrives at 7:00
p.m. The next day, the monk leaves at 7:00 a.m. and takes the same path back
to the monastery. If the monk arrives at the monastery again at 7:00 p.m., show
that there exists at least one time t between 7:00 a.m. and 7:00 p.m. so that at
time t, the monk was at the same location on both days.
Solution. Let s(t) be the monk's distance from the monastery t hours after
7:00 a.m. on day one, and let r(t) be the monk's distance from the monastery
t hours after 7:00 a.m. on day two. Also suppose that the path from the
monastery to the building at the top of the mountain has length d > 0. We
know that s(0) = 0, s(12) = d, r(0) = d, and r(12) = 0. Let H(t) = s(t) - r(t).
Then H(0) = s(0) - r(0) = -d < 0 and H(12) = s(12) - r(12) = d > 0. Now
s(t) and r(t) are both continuous on [0, 12] (a reasonable physical assumption,
since the monk cannot teleport), and so H(t) is also continuous on [0, 12]. The
intermediate value theorem shows that there exists a time c ∈ [0, 12] at which
H(c) = s(c) - r(c) = 0, or s(c) = r(c). Moreover, since H(0) ≠ 0 and H(12) ≠ 0,
c ∈ (0, 12), as required.
4.3.2 Solving Equations Numerically
Example 4.29. We will try to find an x_0 ∈ (0, π/2) such that cos(x_0) = x_0,
guaranteed to exist by the intermediate value theorem. So consider h(x) =
x - cos(x). Let a_0 = 0 and b_0 = π/2. Let d_0 = (a_0 + b_0)/2 = π/4. Now

h(d_0) = h(π/4) = π/4 - √2/2 > 0.

Since h(a_0) < 0 and h(d_0) > 0, the intermediate value theorem shows that we
have a solution x_0 ∈ (0, π/4); since h(π/8) = π/8 - cos(π/8) < 0, there is in
fact a solution x_0 ∈ (π/8, π/4). We can repeat this process by setting a_1 = π/8,
b_1 = π/4, and d_1 = (a_1 + b_1)/2 = 3π/16. This leads to the binary search
algorithm for finding zeroes.
Here is the binary search algorithm that we can use to approximate zeros of
continuous functions on an interval to any desired accuracy. So suppose that
f : S → R with S ⊆ R, that f(x) is continuous on [a, b] ⊆ S (a < b), and that
f(a)f(b) < 0. Set a_0 = a, b_0 = b. Let ε > 0 be the maximum error or tolerance.
In the kth step of the algorithm, starting with k = 0, we do the following.
1. Calculate d_k = (a_k + b_k)/2.
2. If f(a_k)f(d_k) < 0, set b_{k+1} = d_k and a_{k+1} = a_k.
3. Else, set a_{k+1} = d_k and b_{k+1} = b_k.
4. Repeat so long as the error b_{k+1} - a_{k+1} ≥ ε.
Note. After n applications, the maximum potential error in using d_n = (a_n + b_n)/2
to approximate the solution for f(x) = 0 is (b_0 - a_0)/2^n.
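The steps above translate into a few lines of Python (the function name `bisect` and its signature are ours, not the text's); here the sketch is applied to h(x) = x - cos(x) on [0, π/2] from example 4.29.

```python
import math

def bisect(f, a, b, tol=1e-10):
    """Approximate a zero of a continuous f on [a, b], assuming f(a)f(b) < 0."""
    if f(a) * f(b) >= 0:
        raise ValueError("f must change sign on [a, b]")
    while b - a >= tol:
        d = (a + b) / 2          # step 1: midpoint
        if f(a) * f(d) < 0:      # step 2: a zero lies in [a, d]
            b = d
        else:                    # step 3: otherwise keep [d, b]
            a = d
    return (a + b) / 2

root = bisect(lambda x: x - math.cos(x), 0.0, math.pi / 2)
print(root)  # ≈ 0.73908513, the solution of cos(x) = x
```

Each pass halves the bracketing interval, matching the error bound (b_0 - a_0)/2^n noted above.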
4.4 Extreme Value Theorem
Problem. Given a function f(x) defined on a nondegenerate interval I, do
there exist points c_1, c_2 ∈ I such that f(c_1) ≤ f(x) ≤ f(c_2) for all x ∈ I?
Definition 4.4.1. Suppose that f : S → R, where S ⊆ R, and that I ⊆ S is
a nondegenerate interval. We say that c is a global maximum for f(x) on I if
c ∈ I and f(x) ≤ f(c) for all x ∈ I. We say that c is a global minimum for f(x)
on I if c ∈ I and f(x) ≥ f(c) for all x ∈ I. We say that c is a global extremum
for f(x) on I if it is either a global maximum or a global minimum for f(x) on
I.
Example 4.30. Let f(x) = 1/x. Then f(x) has no global maximum or global
minimum on (0, 1).
Example 4.31. Let f(x) = 1/x. Then f(x) has no global maximum on (0, 1],
but f(x) has a global minimum on (0, 1] at x = 1.
Example 4.32. Let f(x) = 1/x. Then f(x) has a global maximum on [1/2, 1]
at x = 1/2 and a global minimum on [1/2, 1] at x = 1.
We now come to the main theorem.
Theorem 4.4.2 [Extreme Value Theorem]. Suppose f : S → R, where
S ⊆ R. Suppose [a, b] ⊆ S and f(x) is continuous on [a, b]. Then there exist
two numbers c_1, c_2 ∈ [a, b] such that f(c_1) ≤ f(x) ≤ f(c_2) for all x ∈ [a, b].
This theorem states that a function f(x), continuous on a closed interval
[a, b], attains both its global maximum and global minimum on [a, b].
Proof. We will first show that f(x) is bounded on [a, b]. That is, there exists
an M > 0 such that |f(x)| ≤ M for all x ∈ [a, b]. Suppose to the contrary that
f(x) is not bounded on [a, b]. That is, for each n ∈ N, we can find x_n ∈ [a, b]
such that |f(x_n)| > n. This defines a (bounded) sequence {x_n} ⊆ [a, b]. By the
Bolzano-Weierstrass theorem, {x_n} has a subsequence {x_{n_k}} that converges
to some x_0 ∈ [a, b]. Since f(x) is continuous on [a, b], theorem 4.2.12 shows
that {f(x_{n_k})} is a sequence that converges (to f(x_0)). But this is impossible,
since |f(x_{n_k})| > n_k ≥ k for all k ∈ N, making {f(x_{n_k})} an unbounded and
hence a divergent sequence. We conclude that f(x) is bounded on [a, b].
Let L = sup f([a, b]) and l = inf f([a, b]) (both exist and are finite since we have
shown in the previous paragraph that f([a, b]) is a bounded set). For any n ∈ N,
we can find y_n, z_n ∈ [a, b] with

(4.4)   L - 1/n < f(y_n) ≤ L

and

(4.5)   l ≤ f(z_n) < l + 1/n.

This defines two (bounded) sequences {y_n} and {z_n} ⊆ [a, b]. By the Bolzano-
Weierstrass theorem, {y_n} has a subsequence {y_{n_j}} that converges to some
c_1 ∈ [a, b]. Since f(x) is continuous on [a, b], theorem 4.2.12 shows that {f(y_{n_j})}
is a sequence that converges to f(c_1). But we also have that {f(y_{n_j})} converges
to L by inequality (4.4), and so f(c_1) = L (by theorem 3.1.4), the global maximum
of f(x) on [a, b]. Similarly, {z_n} has a subsequence {z_{n_k}} that converges to some
c_2 ∈ [a, b], whence by continuity, {f(z_{n_k})} is a sequence that converges to f(c_2).
But by inequality (4.5), {f(z_{n_k})} converges to l, and so f(c_2) = l, the global
minimum of f(x) on [a, b]. This completes the proof.
So in particular, if f(x) is continuous on [a, b], f([a, b]) is a closed interval.
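The theorem guarantees that the extrema exist but does not locate them. As a crude numerical illustration (the helper `grid_extrema` is our own, and sampling only approximates, it proves nothing), we can sample a continuous function on a fine grid; for f(x) = x^3 - x on [0, 1], the global minimum -2/(3√3) ≈ -0.385 occurs at x = 1/√3 and the global maximum 0 occurs at both endpoints.

```python
def grid_extrema(f, a, b, n=100_000):
    """Approximate the global min and max of a continuous f on [a, b]
    by evaluating f at n + 1 equally spaced sample points."""
    ys = [f(a + (b - a) * k / n) for k in range(n + 1)]
    return min(ys), max(ys)

lo, hi = grid_extrema(lambda x: x ** 3 - x, 0.0, 1.0)
print(lo, hi)  # ≈ -0.3849 and 0.0
```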
4.5 Uniform Continuity
Assume that f : S → R, where S ⊆ R, is continuous on a nondegenerate interval
I ⊆ S. Then for every a ∈ I and ε > 0, we can find a δ > 0 such that

(4.6)   if |x - a| < δ and x ∈ I, then |f(x) - f(a)| < ε.

Note. Generally, given an ε > 0, the choice of δ depends on both ε and the
focal point x = a.
If this δ can be chosen independent of the focal point a, then for each ε > 0,
there is a δ > 0 that will make (4.6) hold for all a ∈ I. That is, there is a δ > 0
that will make the function change by only a small amount regardless of where x
is.
Definition 4.5.1 [Uniform Continuity]. Suppose that f : S → R, where
S ⊆ R, and that I ⊆ S is a nondegenerate interval. We say that f(x) is
uniformly continuous on I if for every ε > 0, there exists a δ > 0 such that if
|x - y| < δ and x, y ∈ I, then |f(x) - f(y)| < ε.
Example 4.33. Let f(x) = 3x + 7 and I = R. Given ε > 0, if |x - y| < ε/3, then
|f(x) - f(y)| = |(3x + 7) - (3y + 7)| = |3x - 3y| = 3|x - y| < 3 · (ε/3) = ε. This
shows that f(x) is uniformly continuous on R.
Remark 4.5.2. If f : R → R is defined by f(x) = mx + b, then f(x) is uniformly
continuous on R.
Proof. This proof is left as an exercise. Simply reproduce the proof in example
4.33.
The following theorem should not be a surprise.
Theorem 4.5.3 [Sequential Characterization of Uniform Continuity].
Suppose that f : S → R, where S ⊆ R, and that I ⊆ S is a nondegenerate
interval. Then the following are equivalent.
1. f(x) is uniformly continuous on I.
2. Whenever {x_n}, {y_n} are sequences in I that satisfy lim_{n→∞} |x_n - y_n| = 0,
we have that {f(x_n) - f(y_n)} is a sequence that satisfies
lim_{n→∞} |f(x_n) - f(y_n)| = 0.
Proof. We will first show that hypothesis (1) implies conclusion (2). Suppose (1)
holds, and suppose {x_n}, {y_n} ⊆ I are sequences such that lim_{n→∞} |x_n - y_n| = 0.
Let ε > 0. There exists δ > 0 such that if |x - y| < δ and x, y ∈ I, then
|f(x) - f(y)| < ε. Since lim_{n→∞} |x_n - y_n| = 0, we can find an N ∈ N such that
if n ≥ N, then |x_n - y_n| < δ. Hence for all n ≥ N, |f(x_n) - f(y_n)| < ε, showing
that {f(x_n) - f(y_n)} is a sequence that converges to 0 as n → ∞, as required.
Next, we will show that hypothesis (2) implies conclusion (1); we will proceed
by contradiction. Suppose to the contrary that hypothesis (2) holds but
condition (1) fails. Then there exists an ε_0 > 0 such that for every δ > 0, we can
find x_δ, y_δ ∈ I with |x_δ - y_δ| < δ and yet |f(x_δ) - f(y_δ)| ≥ ε_0. In particular,
for δ_n = 1/n, we can find x_n, y_n ∈ I with |x_n - y_n| < δ_n = 1/n and
|f(x_n) - f(y_n)| ≥ ε_0, for all n ∈ N. We now have a pair of sequences {x_n} and
{y_n} ⊆ I that satisfies lim_{n→∞} |x_n - y_n| = 0 by the squeeze theorem (and
remark 3.1.3). Now we are done, for we have arrived at a contradiction:
lim_{n→∞} |x_n - y_n| = 0 and yet {f(x_n) - f(y_n)} is a sequence that does not
converge to 0 (simply let ε = ε_0 and no N ∈ N can be found to satisfy convergence
to 0), directly contradicting hypothesis (2).
The next example shows that uniform continuity does not come for free all
the time.
Example 4.34. Let f : R → R be defined by f(x) = x^2; let I = R. Let
x_n = n + 1/n and y_n = n for each n ∈ N. Then we have that |x_n - y_n| =
|n + 1/n - n| = 1/n → 0 as n → ∞. But f(x_n) = (n + 1/n)^2 = n^2 + 2 + 1/n^2
and f(y_n) = n^2; therefore, |f(x_n) - f(y_n)| = |n^2 + 2 + 1/n^2 - n^2| = 2 + 1/n^2 > 2
for all n ∈ N, and hence {f(x_n) - f(y_n)} does not converge to 0. This shows that
f(x) is not uniformly continuous on I = R by the sequential characterization of
uniform continuity.
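The two sequences in example 4.34 can be tabulated directly; a quick Python check (variable names are ours) shows the inputs squeezing together while the outputs stay at least 2 apart:

```python
gaps_in, gaps_out = [], []
for n in (1, 10, 100, 1000):
    x, y = n + 1.0 / n, float(n)         # x_n = n + 1/n, y_n = n
    gaps_in.append(abs(x - y))           # |x_n - y_n| = 1/n -> 0
    gaps_out.append(abs(x * x - y * y))  # |f(x_n) - f(y_n)| = 2 + 1/n^2

print(gaps_in)   # shrinks toward 0
print(gaps_out)  # every entry exceeds 2
```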
It may seem surprising that a simple function like f(x) = x^2 is not uniformly
continuous on some interval; but the next theorem shows that every continuous
function on a closed interval is uniformly continuous.
Lemma 4.5.4. Suppose {x_n} ⊆ R is a sequence such that its limit does not
exist or lim_{n→∞} x_n ≠ 0. Then there exists an ε_0 > 0 and an increasing
sequence of indices {n_k} ⊆ N such that |x_{n_k}| ≥ ε_0 for all k ∈ N.
Proof. Since 0 is not the limit of {x_n}, by definition of limit, there exists an
ε_0 > 0 such that for every N ∈ N there exists an a_N ≥ N such that |x_{a_N} - 0| =
|x_{a_N}| ≥ ε_0. This gives us a sequence {a_N}_{N∈N} with the property that
|x_{a_N}| ≥ ε_0 and a_N ≥ N for all N ∈ N. Now define the sequence {n_k}
recursively: let n_1 = a_1 and n_{k+1} = a_{n_k + 1} for each k ∈ N. Then for every
k ∈ N, n_k satisfies both |x_{n_k}| ≥ ε_0 and n_{k+1} = a_{n_k + 1} ≥ n_k + 1 > n_k.
This gives us the required increasing sequence.
Lemma 4.5.5. Suppose {x_n}, {y_n} ⊆ R are sequences such that lim_{n→∞} x_n = c
and lim_{n→∞} |x_n - y_n| = 0. Then lim_{n→∞} y_n = c.
Proof. Let ε > 0. We can choose an N_1 ∈ N so that |x_n - c| < ε/2 for all
n ≥ N_1. We can choose an N_2 ∈ N so that ||x_n - y_n| - 0| = |x_n - y_n| < ε/2
for all n ≥ N_2. Hence if n ≥ N := max{N_1, N_2}, we have that

|y_n - c| ≤ |y_n - x_n| + |x_n - c| < ε/2 + ε/2 = ε,

which shows that lim_{n→∞} y_n = c.
Theorem 4.5.6. Suppose that f : S → R, where S ⊆ R, and that [a, b] ⊆ S
(a < b). If f(x) is continuous on [a, b], then f(x) is uniformly continuous on
[a, b].
Proof. Assume that f(x) is not uniformly continuous on [a, b]. Then we can
find sequences {x_n}, {y_n} ⊆ [a, b] such that lim_{n→∞} |x_n - y_n| = 0 but
{f(x_n) - f(y_n)} is a sequence that does not converge or converges to a limit
other than 0. Lemma 4.5.4 shows that there exists an ε_0 > 0 and an increasing
sequence of indices {n_k} ⊆ N such that

(4.7)   |f(x_{n_k}) - f(y_{n_k})| ≥ ε_0 for all k ∈ N.

Note that {x_{n_k}} is a bounded sequence, and so by the Bolzano-Weierstrass
theorem, {x_{n_k}} has a convergent subsequence {x_{n_{k_j}}} that converges to
some c ∈ [a, b]. Since lim_{j→∞} |x_{n_{k_j}} - y_{n_{k_j}}| = 0, lemma 4.5.5 shows
that lim_{j→∞} y_{n_{k_j}} = c also. Since f(x) is continuous on [a, b], the sequential
characterization of continuity on intervals shows that {f(x_{n_{k_j}})} and
{f(y_{n_{k_j}})} are sequences that converge to f(c). This means that there exists
a J ∈ N such that

|f(x_{n_{k_J}}) - f(c)| < ε_0/2   and   |f(y_{n_{k_J}}) - f(c)| < ε_0/2.

Hence

|f(x_{n_{k_J}}) - f(y_{n_{k_J}})| ≤ |f(x_{n_{k_J}}) - f(c)| + |f(c) - f(y_{n_{k_J}})| < ε_0/2 + ε_0/2 = ε_0,

directly contradicting (4.7) (with k = k_J). This shows that f(x) is uniformly
continuous on [a, b].
Chapter 5
Differentiation
In this chapter, we will study important concepts such as Newton quotients,
secant lines, tangent lines, and derivatives. These concepts will be necessary
for the next chapter, where we will prove our third value theorem, the mean
value theorem; it is the most important result in differential calculus and we
will see its use in various applications.
5.0 Some Definitions and Basic Results
Definition 5.0.1. Suppose that f : S → R, where S ⊆ R, and that some point
a ∈ S is such that a ∈ I for some open interval I ⊆ S. For some x_0 ∈ I, we call
the quantity (f(x_0) - f(a))/(x_0 - a) a Newton quotient for f(x) centered at
x = a.
Geometrically, the Newton quotient (f(x_0) - f(a))/(x_0 - a) represents the
slope of the secant line through (a, f(a)) and (x_0, f(x_0)).
On the other hand, for some x ∈ I, if we let y = f(x), Δy = f(x) - f(a)
(the change in y), and Δx = x - a (the change in x), then the Newton quotient
(f(x) - f(a))/(x - a) = Δy/Δx can be viewed as the average rate of change in
f(x) over the interval from a to x. For example, if f(x) is the distance travelled
since time x = 0 and x is the current time, then Δy/Δx = (f(x) - f(a))/(x - a)
is the average velocity. In this way, a natural question arises: What do we mean
by the instantaneous velocity or the instantaneous rate of change? We shall
mean the limit of the average velocity or the average rate of change as Δx → 0.
With this as our motivation, we formally define the derivative below.
Definition 5.0.2 [Derivative]. Suppose that f : S → R. Suppose a ∈ S is
such that an open interval I ⊆ S contains a. We say that f(x) is differentiable
at x = a if

lim_{x→a} (f(x) - f(a))/(x - a)

exists. If this limit exists, then we will denote this limit by f′(a) and call it the
derivative of f(x) at x = a. That is, f′(a) := lim_{x→a} (f(x) - f(a))/(x - a).
Otherwise, we say that f(x) is not differentiable at x = a (notation: f′(a) does
not exist).
Note that if h := x - a = Δx, then h → 0 if and only if x → a. We
have the following alternate form of the derivative.
Remark 5.0.3. Suppose that f : S → R. Suppose a ∈ S is such that an open
interval I ⊆ S contains a. If f(x) is differentiable at x = a, then

f′(a) := lim_{x→a} (f(x) - f(a))/(x - a) = lim_{h→0} (f(a + h) - f(a))/h.
Proof. This proof is left as an exercise.
Note. In order for a function to be differentiable at a point x = a, there must
be an open interval I containing a on which f(x) is defined.
Example 5.1. Let f : R → R be defined by f(x) = c. Let a ∈ R. Then

f′(a) := lim_{h→0} (f(a + h) - f(a))/h = lim_{h→0} (c - c)/h = lim_{h→0} 0/h = lim_{h→0} 0 = 0.

(In particular, f(x) is differentiable at every a ∈ R.)
Example 5.2. Let f : R → R be defined by f(x) = mx + b. Let a ∈ R. Then

f′(a) := lim_{h→0} (f(a + h) - f(a))/h
       = lim_{h→0} ([m(a + h) + b] - [ma + b])/h
       = lim_{h→0} (ma + mh + b - ma - b)/h
       = lim_{h→0} mh/h = m.

(In particular, f(x) is differentiable at every a ∈ R.)
Example 5.3. Let f : R → R be defined by f(x) = x^2. Let a ∈ R. Then

f′(a) := lim_{h→0} (f(a + h) - f(a))/h
       = lim_{h→0} ((a + h)^2 - a^2)/h
       = lim_{h→0} (a^2 + 2ah + h^2 - a^2)/h
       = lim_{h→0} h(2a + h)/h
       = lim_{h→0} (2a + h) = 2a.

(In particular, f(x) is differentiable at every a ∈ R.)
Example 5.4. Let f : R → R be defined by f(x) = |x|. Let a = 0. Then

lim_{h→0} (f(0 + h) - f(0))/h = lim_{h→0} (|h| - 0)/h = lim_{h→0} |h|/h.

We know that lim_{h→0+} |h|/h = 1 and lim_{h→0-} |h|/h = -1. Hence
lim_{h→0} |h|/h does not exist (by theorem 4.1.11). That is, f(x) is not
differentiable at x = 0.
This process of calculating f′(a) by directly using the definition is sometimes
referred to as calculating the derivative from first principles.
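First-principles calculations can be sanity-checked numerically by evaluating Newton quotients at shrinking h. A small Python sketch (the helper name is ours): for f(x) = x^2 at a = 3 the quotients equal 6 + h, tending to f′(3) = 6, while for f(x) = |x| at a = 0 they are +1 for h > 0 and -1 for h < 0, so no two-sided limit exists.

```python
def newton_quotient(f, a, h):
    # slope of the secant line through (a, f(a)) and (a + h, f(a + h))
    return (f(a + h) - f(a)) / h

for h in (0.1, 0.01, 0.001):
    print(newton_quotient(lambda x: x * x, 3.0, h))  # -> 6.1, 6.01, 6.001 (up to rounding)

print(newton_quotient(abs, 0.0, 1e-6))   # 1.0  (from the right)
print(newton_quotient(abs, 0.0, -1e-6))  # -1.0 (from the left)
```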
Note. Note that f(x) = |x| is continuous at x = 0 but not differentiable at
x = 0. Can a function be differentiable at x = a but not continuous at x = a?
Remark 5.0.4. Suppose that f : S → R, where S ⊆ R, and that an a ∈ R is
such that an open interval I ⊆ S contains a. If lim_{x→a} (f(x) - f(a)) = 0, then
lim_{x→a} f(x) = f(a).
Proof. Let ε > 0. Then we can find a δ > 0 such that if 0 < |x - a| < δ and
x ∈ S, then |(f(x) - f(a)) - 0| = |f(x) - f(a)| < ε. This directly shows that
lim_{x→a} f(x) = f(a).
Theorem 5.0.5. Suppose that f : S → R, where S ⊆ R, and that an a ∈ R is
such that an open interval I ⊆ S contains a. If f(x) is differentiable at x = a,
then f(x) is continuous at x = a.
Proof. Suppose that f′(a) := lim_{x→a} (f(x) - f(a))/(x - a) exists. Since
lim_{x→a} (x - a) = 0, we have that

lim_{x→a} (f(x) - f(a)) = lim_{x→a} [(f(x) - f(a))/(x - a)] · (x - a)

exists and equals f′(a) · 0 = 0 by theorem 4.1.8. This shows that lim_{x→a} f(x)
exists and equals f(a) by remark 5.0.4. This shows that f(x) is continuous at
x = a.
It may seem that a continuous function must be differentiable most of the
time, because there seems to be no way that one can construct a continuous
function that's not smooth at too many places. After all, no matter how many
corners we create, there are many, many other points where the function is
smooth (differentiable). Unfortunately, this intuition is misleading, for the
opposite is true.
Diversion. There are continuous functions defined on R that are not
differentiable at any point.
We now present two other interpretations of the derivative.
Instantaneous rate of change. We can view the derivative as the limit of
the average rate of change as Δx → 0, thus:

lim_{Δx→0} Δy/Δx = lim_{x→a} (f(x) - f(a))/(x - a) := f′(a).
Tangent line. The geometric significance of the derivative is seen by viewing
(f(a + h) - f(a))/h as the slope of the secant line through the points (a, f(a))
and (a + h, f(a + h)). If we let {h_n} be a sequence of numbers that approaches
0, then each h_n produces a secant line l_n through (a, f(a)) and (a + h_n, f(a + h_n)).
If f′(a) exists, then the slopes of these lines converge to f′(a). Moreover, the
lines {l_n} converge to a unique line l through the point (a, f(a)) with slope
f′(a); this is the tangent line of f(x) at x = a.
Definition 5.0.6. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that an $a \in \mathbb{R}$ is such that an open interval $I \subseteq S$ contains $a$. We will call the line passing through $(a, f(a))$ with slope $f'(a)$ the tangent line to the graph of $y = f(x)$ at $x = a$, or simply the tangent line of $f(x)$ at $x = a$. The equation of the tangent line is
$$L_a(x) := f(a) + f'(a)(x - a).$$
This equation is also called the linear approximation to $f(x)$ centered at $x = a$.
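As a quick numerical illustration (a sketch, not part of the text's formal development; the function $f(x) = x^2$ and the point $a = 3$ are arbitrary choices), the linear approximation $L_a(x)$ can be computed directly from its definition:

```python
def linear_approximation(f, fprime, a):
    """Return the function L_a(x) = f(a) + f'(a)(x - a)."""
    def L(x):
        return f(a) + fprime(a) * (x - a)
    return L

# Illustrative choice: f(x) = x^2 at a = 3, so L_3(x) = 9 + 6(x - 3).
L = linear_approximation(lambda x: x**2, lambda x: 2 * x, a=3.0)

# Near a, L_3(x) tracks f(x) closely: (3.01)^2 = 9.0601 while L(3.01) = 9.06.
error = abs(3.01**2 - L(3.01))
```

The discrepancy between $f(x)$ and $L_a(x)$ shrinks quadratically in $(x - a)$, which is what makes the tangent line a good local approximation.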
A simple application of the linear approximation to $f(x)$ centered at $x = a$ is as follows. If $f(x)$ is differentiable at $x = a$, then $f'(a) \approx \frac{f(x) - f(a)}{x - a}$ for $x$ near $a$, $x \neq a$. If $x = a$, then $L_a(a) = f(a) + f'(a)(a - a) = f(a)$. If $x \approx a$, then $f'(a)(x - a) \approx f(x) - f(a)$, so $L_a(x) := f(a) + f'(a)(x - a) \approx f(x)$. For $x$ near $a$, $L_a(x)$ is approximately $f(x)$.
Here is a summary of what we know about the linear approximation $L_a(x)$.
1. If $x \approx a$, then $f(x) \approx L_a(x)$.
2. $L_a(a) = f(a)$.
3. $L_a'(a) = f'(a)$.
4. As seen in the definition of the tangent line, the graph of $L_a(x)$ is the set of ordered pairs $S := \{(x, y) : y = f(a) + f'(a)(x - a)\}$, which is the tangent line to $f(x)$.

Items (2) and (3) tell us that $L_a(x)$ encodes both the value of $f(a)$ and that of $f'(a)$. Moreover, if $h(x) = b + m(x - a)$ is such that $h(a) = f(a)$ and $h'(a) = f'(a)$, then $h = L_a$ as functions; that is, $L_a(x)$ is the unique first-degree polynomial (a line) that satisfies $L_a(a) = f(a)$ and $L_a'(a) = f'(a)$.
It is now natural for us to consider the derivative function.
Definition 5.0.7. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$. Let $T$ be the set of all points $t \in S$ such that an open interval $I \subseteq S$ contains $t$ and $f(x)$ is differentiable at $x = t$. Then we can define a function $h : T \to \mathbb{R}$ by
$$h(x_0) = f'(x_0)$$
for each $x_0 \in T$. The function $h(x)$ is called the derivative (function) of $f(x)$ and is denoted by $f'(x)$ (instead of $h(x)$).
Example 5.5. Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = x^2$. Then by example 5.3, $f(x)$ is differentiable at $x = a$ for all $a \in \mathbb{R}$ with $f'(a) = 2a$. Hence the derivative function of $f(x)$ is $f'(x) = 2x$, defined on all of $\mathbb{R}$.
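The derivative function can be probed numerically with difference quotients. The following sketch (the sample points are illustrative choices) checks example 5.5's claim that $f'(x) = 2x$:

```python
def difference_quotient(f, x, h=1e-6):
    """Symmetric difference quotient (f(x+h) - f(x-h)) / (2h), approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x**2

# f'(x) = 2x, so the quotient should be close to 2x at each sample point.
samples = [-2.0, 0.0, 1.5]
results = [difference_quotient(f, x) for x in samples]
```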
5.1 Arithmetic Properties for Derivatives

Theorem 5.1.1 [Linearity]. Suppose that $f, g : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that some $a \in S$ is such that an open interval $I \subseteq S$ contains $a$. Suppose also that $f(x)$ and $g(x)$ are both differentiable at $x = a$. Then we have the following.
1. [Homogeneous Rule] For every $c \in \mathbb{R}$, $(cf)(x) := c \cdot f(x)$ is differentiable at $x = a$ and $(cf)'(a) = c \cdot f'(a)$.
2. [Sum Rule] $(f + g)(x) := f(x) + g(x)$ is differentiable at $x = a$ and $(f + g)'(a) = f'(a) + g'(a)$.
Proof. We will use proposition 4.1.7 throughout this proof.
1. We compute
$$(cf)'(a) := \lim_{x \to a} \frac{c \cdot f(x) - c \cdot f(a)}{x - a} = \lim_{x \to a} \frac{c[f(x) - f(a)]}{x - a} = c \left( \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \right) := c \cdot f'(a).$$
2. We compute
$$(f + g)'(a) := \lim_{x \to a} \frac{(f + g)(x) - (f + g)(a)}{x - a} = \lim_{x \to a} \frac{f(x) + g(x) - f(a) - g(a)}{x - a}$$
$$= \left( \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \right) + \left( \lim_{x \to a} \frac{g(x) - g(a)}{x - a} \right) := f'(a) + g'(a).$$
Theorem 5.1.2 [Product Rule]. Suppose that $f, g : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that some $a \in S$ is such that an open interval $I \subseteq S$ contains $a$. Suppose also that $f(x)$ and $g(x)$ are both differentiable at $x = a$. Then $h(x) := (fg)(x) := f(x)g(x)$ is differentiable at $x = a$ and $h'(a) := (fg)'(a) = f(a)g'(a) + f'(a)g(a)$.
Proof. First note that
$$\frac{(fg)(x) - (fg)(a)}{x - a} := \frac{f(x)g(x) - f(a)g(a)}{x - a} = \frac{f(x)g(x) - f(x)g(a) + f(x)g(a) - f(a)g(a)}{x - a}$$
$$= \frac{f(x)g(x) - f(x)g(a)}{x - a} + \frac{f(x)g(a) - f(a)g(a)}{x - a} = f(x)\frac{g(x) - g(a)}{x - a} + g(a)\frac{f(x) - f(a)}{x - a}.$$
Hence
$$\lim_{x \to a} \frac{h(x) - h(a)}{x - a} := \lim_{x \to a} \frac{(fg)(x) - (fg)(a)}{x - a} = \lim_{x \to a} \left[ f(x)\frac{g(x) - g(a)}{x - a} + g(a)\frac{f(x) - f(a)}{x - a} \right]$$
$$= \left( \lim_{x \to a} f(x)\frac{g(x) - g(a)}{x - a} \right) + \left( \lim_{x \to a} g(a)\frac{f(x) - f(a)}{x - a} \right) \tag{5.1}$$
$$= \left( \lim_{x \to a} f(x) \right) \left( \lim_{x \to a} \frac{g(x) - g(a)}{x - a} \right) + g(a) \left( \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \right) \tag{5.2}$$
$$= f(a) \left( \lim_{x \to a} \frac{g(x) - g(a)}{x - a} \right) + g(a) \left( \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \right) \tag{5.3}$$
$$:= f(a)g'(a) + g(a)f'(a),$$
where equalities (5.1) and (5.2) follow from proposition 4.1.7 and equality (5.3) holds since $f(x)$ is continuous at $x = a$ by theorem 5.0.5. This completes the proof.
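Theorem 5.1.2 can be sanity-checked numerically; in this sketch the functions $\sin$ and $\exp$ and the point $a = 0.7$ are arbitrary illustrative choices:

```python
import math

def difference_quotient(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f, g, a = math.sin, math.exp, 0.7

# Left side: derivative of the product. Right side: f(a)g'(a) + f'(a)g(a).
lhs = difference_quotient(lambda x: f(x) * g(x), a)
rhs = f(a) * difference_quotient(g, a) + difference_quotient(f, a) * g(a)
```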
Lemma 5.1.3. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that some $a \in S$ is such that an open interval $I \subseteq S$ contains $a$. Suppose also that $f(x)$ is differentiable at $x = a$ and $f(x) \neq 0$ for all $x \in S$. Define $h : S \to \mathbb{R}$ by $h(x) := \frac{1}{f(x)}$; then $h(x)$ is differentiable at $x = a$ with $h'(a) = \frac{-f'(a)}{[f(a)]^2}$.
Proof. We compute
$$\lim_{x \to a} \frac{h(x) - h(a)}{x - a} := \lim_{x \to a} \frac{\frac{1}{f(x)} - \frac{1}{f(a)}}{x - a} = \lim_{x \to a} \frac{\frac{f(a) - f(x)}{f(a)f(x)}}{x - a} = \lim_{x \to a} \left( -\frac{f(x) - f(a)}{x - a} \cdot \frac{1}{f(a)f(x)} \right)$$
$$= -\left( \lim_{x \to a} \frac{1}{f(a)f(x)} \right) \cdot \left( \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \right) \tag{5.4}$$
$$= -\left( \frac{1}{f(a)} \cdot \frac{1}{\lim_{x \to a} f(x)} \right) \cdot \left( \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \right) \tag{5.5}$$
$$= -\left( \frac{1}{f(a)} \cdot \frac{1}{f(a)} \right) \cdot \left( \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \right) \tag{5.6}$$
$$:= -\frac{1}{[f(a)]^2} f'(a) = \frac{-f'(a)}{[f(a)]^2},$$
where equalities (5.4) and (5.5) hold due to proposition 4.1.7 and equality (5.6) holds since $f(x)$ is continuous at $x = a$ by theorem 5.0.5. This completes the proof.
Theorem 5.1.4 [Quotient Rule]. Suppose that $f, g : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that some $a \in S$ is such that an open interval $I \subseteq S$ contains $a$. Suppose also that $f(x)$ and $g(x)$ are both differentiable at $x = a$ and that $g(x) \neq 0$ for all $x \in S$. Then $h(x) := \left( \frac{f}{g} \right)(x) := \frac{f(x)}{g(x)}$ is differentiable at $x = a$ with
$$h'(a) := \left( \frac{f}{g} \right)'(a) = \frac{f'(a)g(a) - f(a)g'(a)}{[g(a)]^2}.$$
Proof. Note that $h(x) := \frac{f(x)}{g(x)} = f(x) \cdot \frac{1}{g(x)}$. By the product rule and lemma 5.1.3, we have that
$$h'(a) = f(a)\left( \frac{-g'(a)}{[g(a)]^2} \right) + \frac{f'(a)}{g(a)} = \frac{f'(a)g(a) - f(a)g'(a)}{[g(a)]^2},$$
as claimed.
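The quotient rule admits the same kind of numerical sanity check (a sketch; the particular $f$, $g$, and $a$ below are arbitrary choices, with $g$ picked so that it never vanishes, as the theorem requires):

```python
import math

def difference_quotient(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x**2 + 1.0
g = lambda x: math.cos(x) + 2.0   # g(x) >= 1 > 0 everywhere
a = 1.3

lhs = difference_quotient(lambda x: f(x) / g(x), a)
rhs = (difference_quotient(f, a) * g(a) - f(a) * difference_quotient(g, a)) / g(a)**2
```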
5.2 Examples of Derivatives

We now introduce some alternate notation for the derivative. We used $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$ (by remark 5.0.3) to denote the derivative function of $f(x)$, and $f'(a) := \lim_{x \to a} \frac{f(x) - f(a)}{x - a} = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h}$ to denote the derivative of $f(x)$ at a point $x = a$. We can denote $f'(x)$ as $\frac{d}{dx} f(x)$ or, if $y = f(x)$, by $\frac{dy}{dx}$; we can denote $f'(a)$ as $\left. \frac{d}{dx} f(x) \right|_{x=a}$ or, if $y = f(x)$, by $\left. \frac{dy}{dx} \right|_{x=a}$. This fraction-style notation was invented by Gottfried Leibniz (1646-1716).
Here are the derivatives of important functions.
1. Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = x^n$, where $n \in \mathbb{N}$. We claim that $f'(x) = nx^{n-1}$. (For this purpose only, we define $0^0 = 1$.) To see this, we proceed by induction. Let $P(n)$ be the statement that if $f(x) = x^n$, then $f'(x) = nx^{n-1}$. We know that if $n = 1$, so that $f(x) = x^1 = x$, then $f'(x) = 1$ for all $x \in \mathbb{R}$. But $1 = 1 \cdot x^0 = nx^{n-1}$ for all $x \in \mathbb{R}$; therefore, $P(1)$ is true. Now suppose $P(k)$ is true for some $k \in \mathbb{N}$. Consider $f(x) = x^{k+1} = x \cdot x^k$. By the product rule,
$$\frac{d}{dx}(x \cdot x^k) = \left( \frac{d}{dx} x \right) x^k + x \left( \frac{d}{dx} x^k \right) = (1) \cdot x^k + x \cdot (kx^{k-1}) = x^k + kx^k = (k+1)x^k.$$
This shows that $P(k+1)$ is true, whence by the principle of mathematical induction, $P(n)$ is true for all $n \in \mathbb{N}$. This example and our arithmetic rules for derivatives show that if $f(x) = \frac{P(x)}{Q(x)}$ is rational (i.e., $P(x)$ and $Q(x)$ are polynomials), then $f(x)$ is differentiable at $x = a$ for any $a \in \mathbb{R}$ where $Q(a) \neq 0$. Therefore, rational functions are differentiable on their domain and polynomials are differentiable everywhere.
2. Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = \sin(x)$. We will compute
$$\lim_{h \to 0} \frac{\sin(x+h) - \sin(x)}{h}$$
for arbitrary $x \in \mathbb{R}$. Recall by the fundamental trigonometric limit that
$$\lim_{h \to 0} \frac{\sin(h)}{h} = 1 \quad \text{and} \quad \lim_{h \to 0} \frac{\cos(h) - 1}{h} = 0.$$
Now note that
$$\frac{\sin(x+h) - \sin(x)}{h} = \frac{\sin(x)\cos(h) + \sin(h)\cos(x) - \sin(x)}{h} = \sin(x)\frac{\cos(h) - 1}{h} + \cos(x)\frac{\sin(h)}{h}.$$
Therefore, we have
$$\lim_{h \to 0} \frac{\sin(x+h) - \sin(x)}{h} = \lim_{h \to 0} \left[ \sin(x)\frac{\cos(h) - 1}{h} + \cos(x)\frac{\sin(h)}{h} \right]$$
$$= \left( \lim_{h \to 0} \sin(x)\frac{\cos(h) - 1}{h} \right) + \left( \lim_{h \to 0} \cos(x)\frac{\sin(h)}{h} \right) \tag{5.7}$$
$$= \sin(x)\left( \lim_{h \to 0} \frac{\cos(h) - 1}{h} \right) + \cos(x)\left( \lim_{h \to 0} \frac{\sin(h)}{h} \right) \tag{5.8}$$
$$= \sin(x) \cdot 0 + \cos(x) \cdot 1 = \cos(x),$$
where equalities (5.7) and (5.8) are due to proposition 4.1.7. Thus, $\sin(x)$ is differentiable at $x = a$ for all $a \in \mathbb{R}$ and $\frac{d}{dx}\sin(x) = \cos(x)$ for every $x \in \mathbb{R}$.
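The identity $\frac{d}{dx}\sin(x) = \cos(x)$ can be observed numerically; this sketch (sample points chosen arbitrarily) compares difference quotients of $\sin$ against $\cos$:

```python
import math

h = 1e-6
points = [0.0, 0.5, 1.0, 2.0]

# Symmetric difference quotients of sin at several points.
quotients = [(math.sin(x + h) - math.sin(x - h)) / (2 * h) for x in points]
```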
3. We will now try to define the exponential function. How do we define $y = a^x$? What do we mean by $a^{\pi}$? We will start from the natural numbers and build upwards. Let $a \in \mathbb{R}$, $a > 0$.
- If $n \in \mathbb{N}$, then define
$$a^n = \underbrace{a \cdot a \cdots a}_{n \text{ times}}.$$
- Since $f(x) = x^m$, $m \in \mathbb{N}$, is a strictly increasing continuous function on $[0, \infty)$, by the intermediate value theorem there exists a unique value $x_a \in \mathbb{R}$, $x_a > 0$, such that $x_a^m = a$. Define $a^{1/m} := \sqrt[m]{a}$ to be this value $x_a$.
- If $\frac{n}{m} \in \mathbb{Q}$, $m, n \in \mathbb{N}$, $\gcd(m, n) = 1$, then define
$$a^{n/m} = \left( a^{1/m} \right)^n.$$
- Define $a^0 = 1$.
- For any $\alpha \in \mathbb{R} \setminus \mathbb{Q}$, $\alpha > 0$, define
$$a^{\alpha} = \sup\{ a^q : 0 < q < \alpha,\ q \in \mathbb{Q} \}.$$
- For any $x \in \mathbb{R}$, $x < 0$, define
$$a^x = \frac{1}{a^{-x}}.$$
The above definition will give us the exponential function $f : \mathbb{R} \to \mathbb{R}$, where $f(x) = a^x$. Some basic properties of $f(x)$ are
- $a^{x+y} = a^x a^y$, for all $x, y \in \mathbb{R}$,
- $a^{xy} = (a^x)^y$, for all $x, y \in \mathbb{R}$,
- $a^{-x} = \frac{1}{a^x}$, for all $x \in \mathbb{R}$, and
- $f(x)$ is continuous on $\mathbb{R}$.
It can be shown that $f(x)$ is differentiable at $x = 0$. So now we turn to the task of finding the derivative function $f'(x)$. We compute
$$\frac{d}{dx} a^x = \lim_{h \to 0} \frac{a^{x+h} - a^x}{h} = \lim_{h \to 0} \frac{a^x(a^h - 1)}{h} = a^x \left( \lim_{h \to 0} \frac{a^h - 1}{h} \right) = a^x \cdot \left. \frac{d}{dx} a^x \right|_{x=0}.$$
We can intuitively see that as $a$ varies from $0$ to $\infty$, the slopes of the tangent lines vary from $-\infty$ to $\infty$. We expect there to exist a (unique) base $e$ such that $\left. \frac{d}{dx} e^x \right|_{x=0} := \lim_{h \to 0} \frac{e^h - 1}{h} = 1$. For this base,
$$\frac{d}{dx} e^x = e^x \lim_{h \to 0} \frac{e^h - 1}{h} = e^x, \quad \text{for all } x \in \mathbb{R}.$$
Though the above discussion is very informal, it can be shown that such a base $e$ does indeed exist and that exponential functions do satisfy the properties described above. Throughout this course, we may assume that exponential functions satisfy all the properties described above and that $\frac{d}{dx} e^x = e^x$, for all $x \in \mathbb{R}$.
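The defining property of $e$, that $\lim_{h \to 0} \frac{e^h - 1}{h} = 1$, suggests a way to estimate $e$ numerically. This sketch (a rough illustration only; the small step $h$ and the bracketing interval $[2, 3]$ are arbitrary choices) bisects on the base $a$ until the slope of $a^x$ at $0$ equals $1$:

```python
def slope_at_zero(a, h=1e-8):
    """Approximate the slope of a^x at x = 0, i.e. (a^h - 1)/h for small h."""
    return (a ** h - 1.0) / h

# The slope at 0 grows with the base a, so bisect for the base with slope 1.
lo, hi = 2.0, 3.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if slope_at_zero(mid) < 1.0:
        lo = mid
    else:
        hi = mid
e_estimate = 0.5 * (lo + hi)   # close to 2.71828...
```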
5.3 Chain Rule

Suppose that a function $f(x)$ is differentiable at $x = a$ and that $g(y)$ is differentiable at $y = f(a)$. Let $L^f_a(x) = f(a) + f'(a)(x - a)$ be the linear approximation to $f(x)$ centered at $x = a$. Let $L^g_{f(a)}(y) = g(f(a)) + g'(f(a))(y - f(a))$ be the linear approximation of $g(y)$ centered at $y = f(a)$. Let $h(x) = (g \circ f)(x) := g(f(x))$. What happens to the derivative when we compose two functions?

Observation. If $x \approx a$, then $y := f(x) \approx L^f_a(x)$. We know that if $y \approx f(a)$, then $z := g(y) \approx L^g_{f(a)}(y)$. This means that if $x \approx a$, then $h(x) := (g \circ f)(x) \approx L^g_{f(a)}(L^f_a(x)) := (L^g_{f(a)} \circ L^f_a)(x)$. That is, the composition of $g$ with $f$ can be approximated by the composition of the two linear approximations. What is $L^g_{f(a)} \circ L^f_a$? We compute
$$(L^g_{f(a)} \circ L^f_a)(x) := L^g_{f(a)}(f(a) + f'(a)(x - a))$$
$$= g(f(a)) + g'(f(a))([f(a) + f'(a)(x - a)] - f(a))$$
$$= g(f(a)) + g'(f(a))f'(a)(x - a)$$
$$= h(a) + g'(f(a))f'(a)(x - a).$$
We would like this to be the linear approximation of $h(x)$ centered at $x = a$. We would therefore want the coefficient $g'(f(a))f'(a)$ to equal $h'(a)$.
Theorem 5.3.1 [Chain Rule]. Assume that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $g : T \to \mathbb{R}$, where $f(S) \subseteq T \subseteq \mathbb{R}$. Suppose there are open intervals $I \subseteq S$, $J \subseteq T$ such that $I$ contains some $a \in S$ and $J$ contains $f(a) \in T$. If $f(x)$ is differentiable at $x = a$ and $g(y)$ is differentiable at $y = f(a)$, then $h(x) := (g \circ f)(x)$ is differentiable at $x = a$ with $h'(a) = g'(f(a))f'(a)$.

Proof. Let $\varphi : T \to \mathbb{R}$ be defined by
$$\varphi(y) = \begin{cases} \frac{g(y) - g(f(a))}{y - f(a)} & \text{if } y \neq f(a), \\ g'(f(a)) & \text{if } y = f(a). \end{cases}$$
First note that $f(a) \in J \subseteq T$, and so
$$\lim_{y \to f(a)} \varphi(y) = \lim_{y \to f(a)} \frac{g(y) - g(f(a))}{y - f(a)} := g'(f(a)).$$
That is, $\varphi(y)$ is continuous at $y = f(a)$. Now we note that for all $y \in T$,
$$g(y) - g(f(a)) = \varphi(y)[y - f(a)],$$
even when $y = f(a)$. Hence
$$g(f(x)) - g(f(a)) = \varphi(f(x))[f(x) - f(a)]$$
for all $x \in S$, since $f(S) \subseteq T$. We now compute
$$\lim_{x \to a} \frac{g(f(x)) - g(f(a))}{x - a} = \lim_{x \to a} \frac{\varphi(f(x))[f(x) - f(a)]}{x - a} = \lim_{x \to a} \varphi(f(x)) \left( \frac{f(x) - f(a)}{x - a} \right)$$
$$= \left( \lim_{x \to a} \varphi(f(x)) \right) \cdot \left( \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \right) \tag{5.9}$$
$$= \varphi(f(a)) \left( \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \right) \tag{5.10}$$
$$:= g'(f(a))f'(a),$$
where (5.9) is due to proposition 4.1.7 and (5.10) is due to three facts: $f(x)$ is continuous at $x = a$ by theorem 5.0.5, $\varphi(y)$ is continuous at $y = f(a)$, and the composition $\varphi(f(x))$ is continuous at $x = a$ by theorem 4.2.8. This shows that $h(x) := (g \circ f)(x)$ is differentiable at $x = a$ with $h'(a) = g'(f(a))f'(a)$.
Example 5.6. Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = \cos(x)$. We will find $\frac{d}{dx} f(x)$. Note that $\cos(x) = \sin\left( x + \frac{\pi}{2} \right)$, and so we can use the chain rule. Let $g(y) = \sin(y)$ and $h(x) = x + \frac{\pi}{2}$. Then $f(x) := \cos(x) = \sin\left( x + \frac{\pi}{2} \right) = g(h(x))$. Hence the chain rule shows that
$$f'(x) = g'(h(x))h'(x) = \cos(h(x)) \cdot 1 = \cos\left( x + \frac{\pi}{2} \right),$$
which reduces to
$$\cos(x)\cos\left( \frac{\pi}{2} \right) - \sin(x)\sin\left( \frac{\pi}{2} \right) = -\sin(x),$$
for all $x \in \mathbb{R}$.
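Example 5.6's chain-rule computation can be sanity-checked with difference quotients (an illustrative sketch; the sample points are arbitrary):

```python
import math

def difference_quotient(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

g = math.sin                       # outer function
inner = lambda x: x + math.pi / 2  # inner function h(x) = x + pi/2

points = [0.0, 0.7, 2.0]
# Chain rule: d/dx sin(x + pi/2) = cos(x + pi/2) * 1 = -sin(x).
numeric = [difference_quotient(lambda x: g(inner(x)), a) for a in points]
```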
Example 5.7. Let $A = \left\{ n\pi + \frac{\pi}{2} : n \in \mathbb{Z} \right\}$ be the set of odd integer multiples of $\frac{\pi}{2}$, exactly the points where $\cos(x) = 0$. Let $f : \mathbb{R} \setminus A \to \mathbb{R}$ be defined by $f(x) = \tan(x) := \frac{\sin(x)}{\cos(x)}$. We will find $f'(x)$. By the quotient rule, we have that
$$f'(x) = \frac{(\cos(x))(\cos(x)) - (\sin(x))(-\sin(x))}{\cos^2(x)} = \frac{\cos^2(x) + \sin^2(x)}{\cos^2(x)} = \frac{1}{\cos^2(x)} = \sec^2(x),$$
for all $x \in \mathbb{R} \setminus A$.
Example 5.8. Let $A$ be the set defined in the previous example. Let $f : \mathbb{R} \setminus A \to \mathbb{R}$ be defined by $f(x) = \sec(x) := \frac{1}{\cos(x)}$. We will find $f'(x)$. By the quotient rule, we have that
$$f'(x) = \frac{-(-\sin(x))}{\cos^2(x)} = \frac{\sin(x)}{\cos(x)} \cdot \frac{1}{\cos(x)} := \tan(x)\sec(x),$$
for all $x \in \mathbb{R} \setminus A$.
Example 5.9. Similarly, we can find the derivative functions of $f(x) := \cot(x)$ and $g(x) := \csc(x)$:
$$f'(x) = -\csc^2(x) \quad \text{and} \quad g'(x) = -\cot(x)\csc(x).$$
We can rewrite the chain rule using the Leibniz notation. Suppose $y = f(x)$ and $z = g(y)$. Then $\frac{dy}{dx} = f'(x)$ and $\frac{dz}{dy} = g'(y)$. But we have that $z = g(f(x))$, and so the chain rule can be written as
$$\frac{dz}{dx} = g'(f(x))f'(x) = \frac{dz}{dy} \cdot \frac{dy}{dx}.$$
The $dy$'s appear to cancel as they would in a fraction.
5.4 Local Extrema

Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $I = [a, b] \subseteq S$ with $a < b$. Suppose also that $f(x)$ is continuous on $I$. Recall that a point $c \in I$ is called a global maximum for $f(x)$ if $f(c) \geq f(x)$ for all $x \in I$, and that $c \in I$ is called a global minimum for $f(x)$ if $f(c) \leq f(x)$ for all $x \in I$. How do we find this $c$? Either $c$ is an endpoint ($c = a$ or $c = b$) or $c$ is not an endpoint. In the latter case, we have that $c \in (a, b)$ and $f(x) \leq f(c)$ or $f(x) \geq f(c)$ for all $x \in (a, b)$.

Now suppose that $f(x)$ is differentiable at this $c \in (a, b)$. What can we say about $f'(c)$? We will later show that $f'(c) = 0$. To find this $c$, then, we must check (1) the endpoints $a$ and $b$, (2) points for which $f'(x) = 0$, and (3) values of $x$ for which $f'(x)$ does not exist.
Definition 5.4.1. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$. We say that $c \in S$ is a local maximum for $f(x)$ if there exists an open interval $(a, b) \subseteq S$ containing $c$ such that $f(c) \geq f(x)$ for all $x \in (a, b)$. We say that $c \in S$ is a local minimum for $f(x)$ if there exists an open interval $(a, b) \subseteq S$ containing $c$ such that $f(c) \leq f(x)$ for all $x \in (a, b)$. We say that $c \in S$ is a local extremum for $f(x)$ if it is either a local maximum or a local minimum for $f(x)$.

Example 5.10. Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = x(x - 1)(x + 1)$. Then it appears that some $c \in (-1, 0)$ is a local maximum and some $d \in (0, 1)$ is a local minimum for $f(x)$.

Problem. How do we find local maxima and minima?

The following theorem provides a necessary (but not sufficient) condition for $c \in S$ to be a local maximum or minimum for $f(x)$.
Theorem 5.4.2. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$. Suppose that $c \in S$ is a local extremum for $f(x)$. If $f(x)$ is differentiable at $x = c$, then $f'(c) = 0$.

Proof. Suppose first that $c \in S$ is a local maximum for $f(x)$. Notice once and for all that because $f(x)$ is differentiable at $x = c$, we have that
$$\lim_{x \to c^-} \frac{f(x) - f(c)}{x - c} = \lim_{x \to c} \frac{f(x) - f(c)}{x - c} = \lim_{x \to c^+} \frac{f(x) - f(c)}{x - c} \tag{5.11}$$
by theorem 4.1.11. There exists an $(a, b) \subseteq S$ containing $c$ so that $f(x) \leq f(c)$ for all $x \in (a, b)$. Let $x$ be such that $x \in (a, c)$. Then $f(x) \leq f(c)$ and so $f(x) - f(c) \leq 0$. Since $x < c$, $\frac{f(x) - f(c)}{x - c} \geq 0$. This shows that
$$\lim_{x \to c^-} \frac{f(x) - f(c)}{x - c} \geq 0.$$
Now suppose $x \in (c, b)$. Again, $f(x) \leq f(c)$ and so $f(x) - f(c) \leq 0$. Since $x > c$, $\frac{f(x) - f(c)}{x - c} \leq 0$. This shows that
$$\lim_{x \to c^+} \frac{f(x) - f(c)}{x - c} \leq 0.$$
Hence $0 \leq \lim_{x \to c} \frac{f(x) - f(c)}{x - c} \leq 0$ by (5.11), whence $f'(c) := \lim_{x \to c} \frac{f(x) - f(c)}{x - c} = 0$ and we are done. A similar argument, mutatis mutandis, shows that $f'(c) = 0$ if $c$ is a local minimum for $f(x)$.
Corollary 5.4.3. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$. Suppose that $c \in S$ is a local extremum for $f(x)$. Then either
1. $f(x)$ is differentiable at $x = c$ and $f'(c) = 0$, or
2. $f(x)$ is not differentiable at $x = c$.

Proof. Trivial.

Definition 5.4.4. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $c \in S$ is contained in some open interval $I \subseteq S$; $c$ is called a critical point for $f(x)$ if either
1. $f(x)$ is differentiable at $x = c$ with $f'(c) = 0$, or
2. $f(x)$ is not differentiable at $x = c$.

Corollary 5.4.3 says, in short, that local extrema are critical points. The obvious application is the following corollary.
Corollary 5.4.5. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$. If $f(x)$ is continuous on $[a, b] \subseteq S$ and some $c \in [a, b]$ is such that $f(c) \geq f(x)$ for all $x \in [a, b]$ or $f(c) \leq f(x)$ for all $x \in [a, b]$, then
1. $c = a$ or $c = b$ (i.e., $c$ is an endpoint); or
2. $c \in (a, b)$ and
- $f(x)$ is differentiable at $x = c$ with $f'(c) = 0$, or
- $f(x)$ is not differentiable at $x = c$.
In the case of (2), we shall call such $c$ an interior critical point for $f(x)$ on $[a, b]$.

Proof. If condition (1) does not hold, i.e., if $c$ is not an endpoint, then $c$ is a local extremum and corollary 5.4.3 shows that condition (2) holds.
Example 5.11. Let $f(x) = xe^x$, defined on $\mathbb{R}$. We will attempt to find all critical points, local extrema, global extrema on $\mathbb{R}$, and global extrema on $[-2, 3]$ for $f(x)$.

First note that the product rule shows that $f(x)$ is differentiable on all of $\mathbb{R}$, with $f'(x) = xe^x + e^x = (x + 1)e^x$. Now $f'(x) = 0$ if and only if $x = -1$, since $e^x > 0$ for all $x \in \mathbb{R}$. This shows that $-1$ is the only critical point for $f(x)$. Since $f(x)$ is continuous on $[-2, 3]$, the extreme value theorem shows that $f(x)$ has global extrema on $[-2, 3]$. The extrema will be at $x = -2$, $x = 3$, or $x = -1$ by corollary 5.4.5. Checking, $f(-2) = -2e^{-2}$, $f(3) = 3e^3$, and $f(-1) = -e^{-1}$; the global maximum on $[-2, 3]$ occurs at $x = 3$ and the global minimum on $[-2, 3]$ occurs at $x = -1$. Since $f(x)$ is unbounded above, it does not have a global maximum on $\mathbb{R}$. We do not yet know whether it is bounded below. At this point, we do not know how to test whether a critical point is a local extremum.
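Example 5.11's conclusions can be reproduced numerically. The sketch below (bisection is an arbitrary choice of root-finder, not part of the text) locates the zero of $f'(x) = (x+1)e^x$ on $[-2, 3]$ and then compares candidate values as in corollary 5.4.5:

```python
import math

f = lambda x: x * math.exp(x)
fprime = lambda x: (x + 1.0) * math.exp(x)

# f' is increasing on [-2, 3] with f'(-2) < 0 < f'(3); bisect for its zero.
lo, hi = -2.0, 3.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if fprime(mid) < 0.0:
        lo = mid
    else:
        hi = mid
critical = 0.5 * (lo + hi)   # the critical point, x = -1

# Corollary 5.4.5: global extrema on [-2, 3] occur at endpoints or critical points.
candidates = [-2.0, critical, 3.0]
global_min = min(f(x) for x in candidates)
global_max = max(f(x) for x in candidates)
```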
To solve many of these problems, we will need the result presented in the
next chapter.
5.5 Inverse Functions

5.5.0 Some Definitions

Suppose $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$. Assume that $f(x)$ is one-to-one, or injective, on a nondegenerate interval $I \subseteq S$. That is, if $f(x) = f(y)$ for some $x, y \in I$, then $x = y$. Let $T = f(I)$. We can now define a new function $g : T \to I$ by $g(y) = x$ if and only if $f(x) = y$. The graph of $f(x)$ is essentially the same as the graph of $g(y)$, save for a reflection. That is, if $(x_0, y_0)$ is in the graph of $f(x)$, $x_0 \in I$, then $(y_0, x_0)$ will be in the graph of $g(y)$. Moreover, this function $g(y)$ satisfies
$$(g \circ f)(x) := g(f(x)) = x$$
for all $x \in I$ and
$$(f \circ g)(y) := f(g(y)) = y$$
for all $y \in T$. We call such a function $g(y)$ the inverse of $f(x)$ on $I$. Also, $f(x)$ is said to be invertible on $I$ if some $g(y)$ is an inverse of $f(x)$ on $I$, i.e., when $f(x)$ is one-to-one. Geometrically, the graph of the inverse function $g(y)$ (of $f(x)$ on $I$) is obtained by reflecting the graph of $f(x)$ through the line $y = x$. Algebraically, the graph of the inverse function is obtained by exchanging the variable $x$ with $y$ in the function $f(x)$. We make the definition below.
Definition 5.5.1. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $I \subseteq S$ is a nondegenerate interval. Let $T = f(I)$. If $f(x)$ is one-to-one on $I$, then $f(x)$ is said to be invertible and the function $g : T \to I$ defined by $g(y) = x$ if and only if $f(x) = y$ is called the inverse (function) of $f(x)$ on $I$. Instead of $g(y)$, we will sometimes denote this function by $f^{-1}(y)$.
Example 5.12. Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = e^x$. Then $f(x)$ is increasing on all of $\mathbb{R}$, and hence it is invertible on $\mathbb{R}$. We can define $g(y) = x$ if and only if $e^x = y$. If we interchange the variables, we get $g(x) = y$ if and only if $e^y = x$. From here, we obtain the graph of our inverse function as described above.

Definition 5.5.2 [Natural Logarithm]. We say that $y = \ln(x)$ if and only if $x = e^y$, for each $x > 0$. This defines the natural logarithm function $\ln : \mathbb{R}^+ \to \mathbb{R}$, where $\mathbb{R}^+ := \{ x \in \mathbb{R} : x > 0 \}$.
As a result, we have that
$$e^{\ln(x)} = x \tag{5.12}$$
for all $x \in \mathbb{R}^+$, and
$$\ln(e^x) = x \tag{5.13}$$
for all $x \in \mathbb{R}$. Suppose that $\ln(x)$ is differentiable. Then equation (5.12) shows that
$$\frac{d}{dx} e^{\ln(x)} = \frac{d}{dx} x.$$
By the chain rule, we have that
$$\frac{d}{dx} e^{\ln(x)} = e^{\ln(x)} \cdot \frac{d}{dx} \ln(x) = 1,$$
whence we conclude
$$\frac{d}{dx} \ln(x) = \frac{1}{e^{\ln(x)}} = \frac{1}{x},$$
for all $x \in \mathbb{R}^+$, by (5.12).
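The conclusion $\frac{d}{dx}\ln(x) = \frac{1}{x}$ can be observed with difference quotients (a numerical sketch; the sample points are arbitrary positive values):

```python
import math

h = 1e-6
points = [0.5, 1.0, 2.0, 10.0]

# Symmetric difference quotients of ln at several points; each should be near 1/x.
quotients = [(math.log(x + h) - math.log(x - h)) / (2 * h) for x in points]
```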
5.5.1 Inverse Function Theorem

Definition 5.5.3. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $I \subseteq S$ is a nondegenerate interval. We say that $f(x)$ is (strictly) increasing on $I$ if all $x_0, y_0 \in I$ with $x_0 < y_0$ satisfy $f(x_0) < f(y_0)$. We say that $f(x)$ is (strictly) decreasing on $I$ if all $x_0, y_0 \in I$ with $x_0 < y_0$ satisfy $f(x_0) > f(y_0)$.

Problem. Suppose that on some nondegenerate interval $I$, $f(x)$ is invertible and that $f(x)$ is continuous on $I$. Must $f(x)$ be strictly monotonic (strictly increasing or strictly decreasing) on $I$?

Theorem 5.5.4. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $I \subseteq S$ is a nondegenerate interval. If $f(x)$ is continuous and one-to-one on $I$, then $f(x)$ is either strictly increasing or strictly decreasing on $I$.

Proof. This proof is left as an exercise. (Hint: Show that if $f(x)$ is neither increasing nor decreasing, then there exist points $a, b, c \in I$ with $a < b < c$ such that either (1) $f(a) < f(b)$ and $f(b) > f(c)$, or (2) $f(a) > f(b)$ and $f(b) < f(c)$. Use the intermediate value theorem to show that $f(x)$ is not one-to-one on $I$.)
For strictly monotonic functions, the following is a useful characterization of continuity on an interval $[a, b]$.

Theorem 5.5.5. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $[a, b] \subseteq S$ is a nondegenerate interval. Also suppose that $f(x)$ is strictly increasing or strictly decreasing on $[a, b]$. Then $f(x)$ is continuous on $[a, b]$ if and only if $f([a, b]) = [f(a), f(b)]$ (respectively, $f([a, b]) = [f(b), f(a)]$).

Proof. We treat the increasing case; the decreasing case is similar. Suppose that $f(x)$ is continuous on $[a, b]$. Let $x \in [a, b]$. We know that $f(a) \leq f(x) \leq f(b)$ since $f(x)$ is increasing on $[a, b]$, and so $f([a, b]) \subseteq [f(a), f(b)]$. So suppose $\alpha \in [f(a), f(b)]$; i.e., $f(a) \leq \alpha \leq f(b)$. If $\alpha = f(a)$ or $\alpha = f(b)$, then $x = a$ or $x = b$ (respectively) would satisfy $f(x) = \alpha$. So suppose $f(a) < \alpha < f(b)$. The intermediate value theorem shows that there exists some $c \in [a, b]$ such that $f(c) = \alpha$. Hence $[f(a), f(b)] \subseteq f([a, b])$, whence $f([a, b]) = [f(a), f(b)]$.

To show the converse, we will proceed by contraposition. That is, we assume the negation of the statement that
"$f(x)$ is continuous on $[a, b]$"
and proceed to show that the statement
"$f([a, b]) = [f(a), f(b)]$"
is false. So suppose $f(x)$ is not continuous on $[a, b]$. That is, suppose
1. $f(x)$ is discontinuous at $x = c$ for some $c \in (a, b)$,
2. $\lim_{x \to a^+} f(x) \neq f(a)$ or this limit does not exist, or
3. $\lim_{x \to b^-} f(x) \neq f(b)$ or this limit does not exist.
We will prove the first case only and will leave the other two cases as an exercise. Observe that $f(x) < f(c)$ for all $x \in [a, c)$, since $f(x)$ is increasing on $[a, c]$. Let
$$L = \sup\{ f(x) : x \in [a, c) \} := \sup f([a, c)).$$
Note that $L \leq f(c)$ since every $x \in [a, c)$ satisfies $f(x) \leq f(c)$. Let $\varepsilon > 0$. Then there must exist some $x_0 \in [a, c)$ such that $L - \varepsilon < f(x_0) \leq L$, for otherwise $L - \varepsilon$ would be an upper bound of $f([a, c))$ smaller than the supremum, a contradiction. Let $\delta = c - x_0 > 0$. For any $x$ that satisfies $0 < c - x < \delta$, that is, for any $x$ that satisfies $x_0 < x < c$, we have
$$L - \varepsilon < f(x_0) < f(x) \leq L$$
since $f(x)$ is increasing on $[a, c]$. This shows that $|f(x) - L| < \varepsilon$. We conclude, therefore, that
$$\lim_{x \to c^-} f(x) = L \leq f(c).$$
Via a similar argument,
$$\lim_{x \to c^+} f(x) = M \geq f(c),$$
where $M := \inf\{ f(x) : x \in (c, b] \} := \inf f((c, b]).$ Combining, we have
$$\lim_{x \to c^-} f(x) = L \leq f(c) \leq M = \lim_{x \to c^+} f(x),$$
and $L$ cannot equal $M$. To see this, suppose that $L = M$. Then $f(c) = \lim_{x \to c} f(x)$ by theorem 4.1.11 and so $f(x)$ is continuous at $x = c$, contrary to assumption. Therefore $L < M$. Since every $x_0 \in [a, c)$ satisfies $f(x_0) \leq L$ and every $x_1 \in (c, b]$ satisfies $f(x_1) \geq M$, the point $f(c)$ is the only point in $[L, M] \cap f([a, b])$. It follows that $f([a, b])$ is not an interval, so $f([a, b]) \neq [f(a), f(b)]$, and we are done.
Now it is easy to show that the inverse function of a continuous, one-to-one function is continuous.

Theorem 5.5.6. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $[a, b] \subseteq S$ is a nondegenerate interval on which $f(x)$ is one-to-one. Let $f^{-1} : f([a, b]) \to [a, b]$ be the inverse function of $f(x)$. If $f(x)$ is continuous on $[a, b]$, then $f^{-1}(y)$ is continuous on $f([a, b])$.

Proof. By theorem 5.5.4, $f(x)$ is either strictly increasing or strictly decreasing on $[a, b]$. Suppose first that $f(x)$ is strictly increasing on $[a, b]$. Let $y_1, y_2 \in f([a, b])$; by definition there are $x_1, x_2 \in [a, b]$ with $f(x_1) = y_1$ and $f(x_2) = y_2$. If $f(x_1) = y_1 < y_2 = f(x_2)$, then $x_1 < x_2$, for $f(x)$ is increasing on $[a, b]$. This shows that $f^{-1}(y_1) := x_1 < x_2 := f^{-1}(y_2)$; i.e., $f^{-1}(y)$ is strictly increasing on $f([a, b])$. For convenience, let $x = f(a)$ and $y = f(b)$. We have $f([a, b]) = [f(a), f(b)] = [x, y]$ by theorem 5.5.5. Since $f^{-1}([x, y]) = f^{-1}(f([a, b])) = [a, b] = [f^{-1}(x), f^{-1}(y)]$, theorem 5.5.5 shows that $f^{-1}$ is continuous on $f([a, b])$. The case where $f(x)$ is strictly decreasing is handled similarly.
We now come to the main theorem of this section.

Theorem 5.5.7 [Inverse Function Theorem]. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $[a, b] \subseteq S$ is a nondegenerate interval on which $f(x)$ is one-to-one. Let $g : f([a, b]) \to [a, b]$ be the inverse function of $f(x)$. If $f(x)$ is continuous on $[a, b]$ and differentiable on $(a, b)$ with $f'(x) \neq 0$ for all $x \in (a, b)$, then $g(y)$ is differentiable on $(f(a), f(b))$. Moreover, in this case, if $x_0 \in (a, b)$ and $y_0 = f(x_0)$, then
$$g'(y_0) = \frac{1}{f'(x_0)}.$$
Strategy: We would like to say
$$\lim_{y \to y_0} \frac{g(y) - g(y_0)}{y - y_0} = \lim_{y \to y_0} \frac{g(y) - x_0}{y - f(x_0)} = \lim_{x \to x_0} \frac{x - x_0}{f(x) - f(x_0)} = \lim_{x \to x_0} \frac{1}{\frac{f(x) - f(x_0)}{x - x_0}} = \frac{1}{f'(x_0)}.$$
The dubious step is the second equality: It is intuitively clear that as $y$ approaches $y_0$, then $x$ must approach $x_0$. Why is this true? It is because both the quotient in the second expression and the function $g(y)$ are continuous at $x = x_0$ and at $y = y_0 = f(x_0)$, respectively.
Proof. Let $x_0 \in (a, b)$. Let $\varphi : S \to \mathbb{R}$ be defined by
$$\varphi(x) = \begin{cases} \frac{x - x_0}{f(x) - f(x_0)} & \text{if } x \neq x_0, \\ \frac{1}{f'(x_0)} & \text{if } x = x_0. \end{cases}$$
Note that $\varphi(x)$ is well-defined since $f(x)$ is one-to-one on $[a, b]$; i.e., $f(x) = f(x_0)$ if and only if $x = x_0$. Observe that by the quotient rule,
$$\lim_{x \to x_0} \varphi(x) := \lim_{x \to x_0} \frac{x - x_0}{f(x) - f(x_0)} = \lim_{x \to x_0} \frac{1}{\frac{f(x) - f(x_0)}{x - x_0}} = \frac{1}{f'(x_0)},$$
showing that $\varphi(x)$ is continuous at $x = x_0$. By theorem 5.5.6, we have that $g(y)$ is continuous on $f([a, b]) = [f(a), f(b)]$ (theorem 5.5.5); in particular, it is continuous at $y = f(x_0) := y_0 \in (f(a), f(b))$. One more item of note:
$$\varphi(g(y)) = \begin{cases} \frac{g(y) - x_0}{f(g(y)) - f(x_0)} = \frac{g(y) - g(y_0)}{y - y_0} & \text{if } y \neq y_0, \\ \frac{1}{f'(x_0)} & \text{if } y = y_0. \end{cases}$$
The two cases are divided correctly because no other value of $y$ can make $g(y)$ equal to $x_0$: $g(y)$ is one-to-one. Putting these together, corollary 4.2.9 shows that
$$\lim_{y \to y_0} \frac{g(y) - g(y_0)}{y - y_0} = \lim_{y \to y_0} \varphi(g(y)) = \lim_{x \to g(y_0)} \varphi(x) = \lim_{x \to x_0} \varphi(x) = \frac{1}{f'(x_0)},$$
as required.
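Theorem 5.5.7 can be checked numerically on a concrete pair of inverse functions; here $f(x) = x^3$ on $[1, 2]$ (an arbitrary choice satisfying $f'(x) = 3x^2 \neq 0$ there) and its inverse $g(y) = y^{1/3}$:

```python
def difference_quotient(f, x, h=1e-6):
    """Symmetric difference quotient approximating f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x**3                 # one-to-one on [1, 2] with f'(x) = 3x^2 != 0
g = lambda y: y ** (1.0 / 3.0)     # its inverse on f([1, 2]) = [1, 8]

x0 = 1.5
y0 = f(x0)
lhs = difference_quotient(g, y0)           # g'(y0)
rhs = 1.0 / difference_quotient(f, x0)     # 1 / f'(x0)
```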
Remark 5.5.8. Let $f : \mathbb{R}^+ \to \mathbb{R}$ be defined by $f(x) = \ln(x)$. Then $f'(x) = \frac{1}{x}$ for all $x \in (1, \infty)$.

Proof. This proof is left as an exercise.
Interestingly, the inverse function theorem expressed in Leibniz's notation is again very memorable:
$$\frac{dy}{dx} = \frac{1}{\frac{dx}{dy}},$$
where $y = f(x)$ and so $x = f^{-1}(y)$. It was one of Leibniz's goals to create an intuitive and memorable system of notation for calculus.
5.5.2 Inverse Trigonometric Functions

Arc sine. Notice that on the interval $\left[ -\frac{\pi}{2}, \frac{\pi}{2} \right]$, $\sin(x)$ is increasing and hence one-to-one; therefore, the restricted sine function $\sin(x)|_{[-\pi/2, \pi/2]}$ has an inverse. The domain of the inverse is $\sin\left( \left[ -\frac{\pi}{2}, \frac{\pi}{2} \right] \right) = [-1, 1]$. Define $\arcsin : [-1, 1] \to \left[ -\frac{\pi}{2}, \frac{\pi}{2} \right]$ by $\arcsin(y) = x$ if and only if $\sin(x) = y$, where $x \in \left[ -\frac{\pi}{2}, \frac{\pi}{2} \right]$. This arcsine function is the inverse of $\sin(x)$ over the interval $\left[ -\frac{\pi}{2}, \frac{\pi}{2} \right]$. We know by definition that $\arcsin(-1) = -\frac{\pi}{2}$, $\arcsin(0) = 0$, and $\arcsin(1) = \frac{\pi}{2}$.

Problem. Calculate $\frac{d}{dx} \arcsin(x)$.
Solution. By the inverse function theorem,
$$\left. \frac{d}{dx} \arcsin(x) \right|_{x = x_0} = \frac{1}{\left. \frac{d}{dy} \sin(y) \right|_{y = y_0}} = \frac{1}{\cos(y_0)}$$
for all $x_0 \in (-1, 1)$, $y_0 := \arcsin(x_0)$. Now we try to express $\cos(y_0)$ in terms of $x_0$. Since $y_0 := \arcsin(x_0) \in \left[ -\frac{\pi}{2}, \frac{\pi}{2} \right]$, $\cos(y_0)$ will be nonnegative. This means that
$$\cos(y_0) = \sqrt{1 - (\sin(y_0))^2} = \sqrt{1 - (\sin(\arcsin(x_0)))^2}.$$
But $\sin(\arcsin(x_0)) = x_0$, and so $\cos(y_0) = \sqrt{1 - x_0^2}$. Thus we arrive at
$$\frac{d}{dx} \arcsin(x) = \frac{1}{\sqrt{1 - x^2}}$$
for all $x \in (-1, 1)$.
Arc cosine. Notice that on the interval $[0, \pi]$, $\cos(x)$ is decreasing and hence one-to-one; therefore, the restricted cosine function $\cos(x)|_{[0, \pi]}$ has an inverse. The domain of the inverse is $\cos([0, \pi]) = [-1, 1]$. Define $\arccos : [-1, 1] \to [0, \pi]$ by $\arccos(y) = x$ if and only if $\cos(x) = y$, where $x \in [0, \pi]$. This arccosine function is the inverse of $\cos(x)$ over the interval $[0, \pi]$. We know by definition that $\arccos(-1) = \pi$, $\arccos(0) = \frac{\pi}{2}$, and $\arccos(1) = 0$.

Problem. Calculate $\frac{d}{dx} \arccos(x)$.
Solution. We will present a different strategy for finding the derivative of inverse functions. We know that $\cos(\arccos(x_0)) = x_0$ for all $x_0 \in [-1, 1]$; therefore, $\frac{d}{dx} \cos(\arccos(x)) = 1$. The inverse function theorem shows that $\arccos(x)$ is differentiable on $(-1, 1)$, whence the chain rule shows that
$$1 = \frac{d}{dx} \cos(\arccos(x)) = -\sin(\arccos(x)) \cdot \frac{d}{dx} \arccos(x).$$
We will now try to find an alternate expression for $\sin(\arccos(x))$. Note that $\arccos(x) \in [0, \pi]$, on which $\sin(\arccos(x))$ is nonnegative. Now we have
$$\sin(\arccos(x)) = \sqrt{1 - (\cos(\arccos(x)))^2} = \sqrt{1 - x^2},$$
for all $x \in (-1, 1)$. This shows that
$$\frac{d}{dx} \arccos(x) = \frac{-1}{\sin(\arccos(x))} = \frac{-1}{\sqrt{1 - x^2}}$$
for all $x \in (-1, 1)$, and we are done.
Observation. Let $f : [-1, 1] \to \mathbb{R}$ be defined by $f(x) = \arcsin(x) + \arccos(x)$. Then $f(x)$ is continuous on $[-1, 1]$ and differentiable on $(-1, 1)$ with
$$f'(x) = \frac{1}{\sqrt{1 - x^2}} - \frac{1}{\sqrt{1 - x^2}} = 0$$
on $(-1, 1)$. We will see later (by corollary 6.0.3) that this implies $f(x) = C$ for some constant $C \in \mathbb{R}$ for all $x \in (-1, 1)$. What is this constant? Substituting $x = 0$, we find that $C = f(0) = \arcsin(0) + \arccos(0) = 0 + \frac{\pi}{2} = \frac{\pi}{2}$. This happens to be the value of $f(x)$ at $x = -1$ and $x = 1$ as well. Hence $f(x) = \frac{\pi}{2}$ for all $x \in [-1, 1]$.
Arc tangent. Notice that on the interval $\left( -\frac{\pi}{2}, \frac{\pi}{2} \right)$, $\tan(x)$ is increasing and hence one-to-one; therefore, the restricted tangent function $\tan(x)|_{(-\pi/2, \pi/2)}$ has an inverse. The domain of the inverse is $\tan\left( \left( -\frac{\pi}{2}, \frac{\pi}{2} \right) \right) = \mathbb{R}$. Define $\arctan : \mathbb{R} \to \left( -\frac{\pi}{2}, \frac{\pi}{2} \right)$ by $\arctan(y) = x$ if and only if $\tan(x) = y$, where $x \in \left( -\frac{\pi}{2}, \frac{\pi}{2} \right)$. This arctangent function is the inverse of $\tan(x)$ over the interval $\left( -\frac{\pi}{2}, \frac{\pi}{2} \right)$. We know by definition that $\arctan(-1) = -\frac{\pi}{4}$, $\arctan(0) = 0$, and $\arctan(1) = \frac{\pi}{4}$.

Problem. Calculate $\frac{d}{dx} \arctan(x)$.
Solution. For any $c$ with $0 < c < \frac{\pi}{2}$, $\tan(x)$ is continuous on $[-c, c]$ and differentiable on $(-c, c)$. Let $M := \tan(c)$, and suppose $x_0 \in (-M, M)$, so that $y_0 = \arctan(x_0) \in (-c, c)$. The inverse function theorem shows that $\arctan(x)$ is differentiable on $(-M, M)$ with
$$\left. \frac{d}{dx} \arctan(x) \right|_{x = x_0} = \frac{1}{\left. \frac{d}{dy} \tan(y) \right|_{y = y_0}} = \frac{1}{(\sec(y_0))^2} = \frac{1}{1 + (\tan(y_0))^2} = \frac{1}{1 + (\tan(\arctan(x_0)))^2} = \frac{1}{1 + x_0^2}.$$
Since $c$ is arbitrary and $\tan(x)$ is increasing and not bounded on $\left( -\frac{\pi}{2}, \frac{\pi}{2} \right)$, we conclude that
$$\frac{d}{dx} \arctan(x) = \frac{1}{1 + x^2}$$
for any $x \in \mathbb{R}$.
5.5.3 More Derivatives

Problem. Let $S = \{ x \in \mathbb{R} : x \geq 1 \} = [1, \infty)$. Let $F : S \to \mathbb{R}$, where $F(x)$ represents the area bounded by the graph of $y = f(t) := \frac{1}{t}$, the lines $t = 1$ and $t = x$, and $y = 0$. Is $F(x)$ differentiable on $(1, \infty)$? If it is, find $F'(x)$.

In the successor course, MATH 148, we shall formally define what we mean by the area bounded by the curves listed above. For now, we show below that $F'(x) = f(x) = \frac{1}{x}$, and that $F(x) = \ln(x)$.
Solution. Let $x > 1$. By remark 5.0.3 and theorem 4.1.11, it suffices to show that both $\lim_{h \to 0^+} \frac{F(x+h) - F(x)}{h}$ and $\lim_{h \to 0^-} \frac{F(x+h) - F(x)}{h}$ exist and are equal.

First let $h > 0$. Since $f(t) = \frac{1}{t}$ is decreasing on $[x, x+h]$, we have that
$$h \cdot f(x+h) \leq F(x+h) - F(x) \leq h \cdot f(x), \quad \text{i.e.,} \quad \frac{h}{x+h} \leq F(x+h) - F(x) \leq \frac{h}{x},$$
by comparing areas. One should interpret $h \cdot f(x+h)$ as the area of the rectangle with base width $h$ and height $f(x+h)$ (and similarly for $h \cdot f(x)$), and $F(x+h) - F(x)$ as the area under the curve $f(t) = \frac{1}{t}$ over $[x, x+h]$. We now have that, for all $h > 0$,
$$\frac{1}{x+h} \leq \frac{F(x+h) - F(x)}{h} \leq \frac{1}{x}.$$
Since $\lim_{h \to 0^+} \frac{1}{x+h} = \frac{1}{x} = \lim_{h \to 0^+} \frac{1}{x}$, the one-sided squeeze theorem shows that $\lim_{h \to 0^+} \frac{F(x+h) - F(x)}{h}$ exists and equals $\frac{1}{x}$. A similar calculation shows that $\lim_{h \to 0^-} \frac{F(x+h) - F(x)}{h}$ exists and equals $\frac{1}{x}$. We conclude that $F(x)$ is differentiable on $(1, \infty)$ and $F'(x) = \frac{1}{x}$.

Let $H(x) = F(x) - \ln(x)$. Then $H(x)$ is continuous on $[1, M]$ and differentiable on $(1, M)$, for any $M > 1$. By remark 5.5.8, we have that $H'(x) = F'(x) - \frac{1}{x} = \frac{1}{x} - \frac{1}{x} = 0$, whence by corollary 6.0.3, there exists a constant $C \in \mathbb{R}$ such that $H(x) = C$ on $[1, M]$. But $C = H(1) := F(1) - \ln(1) = 0 - 0 = 0$, and so $H(x) = 0$, i.e., $F(x) = \ln(x)$, for all $x \in [1, M]$. Since $M > 1$ is arbitrary, we have that $F(x) = \ln(x)$ for all $x \geq 1$, as desired.
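The area interpretation of $F(x)$ can be approximated directly. This sketch (midpoint Riemann sums with an arbitrarily chosen number of subintervals) estimates the area under $\frac{1}{t}$ over $[1, x]$ and compares it with $\ln(x)$:

```python
def area_under_reciprocal(x, n=100_000):
    """Midpoint Riemann-sum approximation of the area under 1/t over [1, x]."""
    width = (x - 1.0) / n
    total = 0.0
    for k in range(n):
        t = 1.0 + (k + 0.5) * width   # midpoint of the k-th subinterval
        total += width / t
    return total

approx = area_under_reciprocal(5.0)   # should be close to ln(5) = 1.60943...
```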
Problem. Let $S = \{ x \in \mathbb{R} : x \geq 0 \} = [0, \infty)$ and $\alpha \in \mathbb{R}$. Let $f : S \to \mathbb{R}$ be defined by $f(x) = x^{\alpha}$. Is $f(x)$ differentiable on $(0, \infty)$? If it is, find $f'(x)$. (If $\alpha = 0$, then we define, for convenience only, $0^0 = 1$.)
Solution. Observe that $f(x) = x^{\alpha} = \left( e^{\ln(x)} \right)^{\alpha} = e^{\alpha \ln(x)}$. Then the chain rule and remark 5.5.8 together show that
$$\frac{d}{dx} f(x) = \frac{d}{dx} e^{\alpha \ln(x)} = \left( e^{\alpha \ln(x)} \right) \frac{d}{dx} [\alpha \ln(x)] = x^{\alpha} \left( \alpha \cdot \frac{1}{x} \right) = \alpha x^{\alpha - 1},$$
and we are done.
The fact that $\frac{d}{dx} x^{\alpha} = \alpha x^{\alpha - 1}$ is a generalization of the rule $\frac{d}{dx} x^n = nx^{n-1}$, where $n \in \mathbb{N}$.
Problem. Let $\mathbb{R}^+ = \{ x \in \mathbb{R} : x > 0 \} = (0, \infty)$. Define $f : \mathbb{R}^+ \to \mathbb{R}$ by $f(x) = x^x$. Is $f(x)$ differentiable on $(0, \infty)$? If it is, find $f'(x)$.
91
Solution. Note that for all x > 0, f(x) = x
x
=
_
e
ln(x)
_
x
= e
x ln(x)
. The chain
rule shows that
d
dx
f(x) =
d
dx
e
x ln(x)
=
_
e
x ln(x)
_
d
dx
xln(x),
whence by the product rule and remark 5.5.8, we have that
_
e
x ln(x)
_
d
dx
xln(x) =
_
e
x ln(x)
_
(ln(x) + 1) = x
x
(ln(x) + 1),
and we are done.
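The formula $\frac{d}{dx} x^x = x^x(\ln(x)+1)$ can likewise be checked against a difference quotient; the sketch below (sample points ours) is illustrative only.

```python
import math

def f(x):
    return x ** x

def fprime(x):
    # Closed form derived above: x^x (ln(x) + 1).
    return x ** x * (math.log(x) + 1)

# Compare against a symmetric difference quotient at sample points.
for x in (0.5, 1.0, 2.0, 3.0):
    h = 1e-6
    approx = (f(x + h) - f(x - h)) / (2 * h)
    assert abs(approx - fprime(x)) < 1e-4
```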
Chapter 6
The Mean Value Theorem
6.0 The Theorem
Problem. If a car completes a 110 km trip in one hour along a road with a speed limit of 100 $\frac{\text{km}}{\text{h}}$, show that the car exceeded the speed limit at some point in that hour of travel. (The car is allowed to travel backwards.)

This problem may sound trivial; however, is it really the case that knowledge of the average velocity translates to knowledge of the instantaneous velocity? To solve the problem, we let $s(t)$, $0 \le t \le 1$, represent the signed distance travelled (in kilometres) $t$ hours after the start of the trip ($t = 0$), and we let $v(t) = s'(t)$ be the instantaneous velocity at time $t$. The speed of the vehicle is given by $|v|(t) := |v(t)| = |s'(t)|$. The average velocity of the one-hour trip is given by
$$\frac{s(1) - s(0)}{1 - 0} = s(1) = 110 \left(\frac{\text{km}}{\text{h}}\right).$$
We must find some $t_0 \in (0, 1)$ so that $v(t_0) = s'(t_0) = 110\ \frac{\text{km}}{\text{h}}$.
Theorem 6.0.1 [Rolle's Theorem]. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $[a, b] \subseteq S$ with $a < b$. Also suppose that $f(x)$ is continuous on $[a, b]$ and differentiable on $(a, b)$. Assume that $f(a) = 0 = f(b)$; then there exists $c \in (a, b)$ such that $f'(c) = 0$.

Proof. By the extreme value theorem, $f(x)$ attains both its global maximum and minimum on $[a, b]$. If they both occur at the endpoints, then $f(a) = 0 = f(b)$ satisfies $0 \le f(x) \le 0$ for all $x \in [a, b]$, i.e., $f(x) = 0$ is a constant function on all of $[a, b]$. Hence any $c \in (a, b)$ will satisfy $f'(c) = 0$. If a global extremum for $f(x)$ on $[a, b]$ occurs at $c \in (a, b)$, then $c$ is a local extremum for $f(x)$ and hence $f'(c) = 0$ by theorem 5.4.2, as required.
We now come to one of the most important theorems in calculus.
Theorem 6.0.2 [Mean Value Theorem]. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $[a, b] \subseteq S$ with $a < b$. Also suppose that $f(x)$ is continuous on $[a, b]$ and differentiable on $(a, b)$. Then there exists a $c \in (a, b)$ such that
$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$

Proof. Let $h : S \to \mathbb{R}$ be defined by $h(x) = f(x) - \left(f(a) + \frac{f(b)-f(a)}{b-a}(x - a)\right)$. Then $h(x)$ is continuous on $[a, b]$ by theorem 4.2.13 and differentiable on $(a, b)$ by theorem 5.1.1. Note that
$$h(a) := f(a) - \left(f(a) + \frac{f(b) - f(a)}{b - a}(a - a)\right) = f(a) - f(a) = 0$$
and
$$h(b) := f(b) - \left(f(a) + \frac{f(b) - f(a)}{b - a}(b - a)\right) = f(b) - [f(a) + f(b) - f(a)] = 0.$$
By Rolle's theorem, there exists some $c \in (a, b)$ with $h'(c) = 0$. But
$$0 = h'(c) = f'(c) - \frac{f(b) - f(a)}{b - a}$$
by theorem 5.1.1. Hence $f'(c) = \frac{f(b)-f(a)}{b-a}$, as required.
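For a concrete function, a point $c$ whose tangent slope matches the secant slope can be located numerically. The sketch below (our own illustration, using $f(x) = x^3$ on $[0, 2]$) finds such a $c$ by bisection, exploiting the fact that $f'(x) - \text{secant}$ changes sign on the interval.

```python
# Numerically locate a point c in (a, b) where f'(c) equals the secant
# slope (f(b) - f(a)) / (b - a), for the sample function f(x) = x**3.
def f(x):
    return x ** 3

def fprime(x):
    return 3 * x ** 2

a, b = 0.0, 2.0
secant = (f(b) - f(a)) / (b - a)          # = 4.0

# g(x) = f'(x) - secant changes sign on (a, b); bisect to find its root.
lo, hi = a, b
for _ in range(60):
    mid = (lo + hi) / 2
    if fprime(mid) - secant < 0:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2
assert a < c < b
assert abs(fprime(c) - secant) < 1e-9     # here c = 2/sqrt(3)
```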
Now we are able to show something that we assumed to be obvious.
Corollary 6.0.3. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $[a, b]$ is a nondegenerate interval such that $[a, b] \subseteq S$. Also suppose that $f(x)$ is continuous on $[a, b]$ and differentiable on $(a, b)$. If $f'(x) = 0$ for all $x \in (a, b)$, then there exists $M \in \mathbb{R}$ such that $f(x) = M$ for all $x \in [a, b]$.

Proof. Let $x_0 \in (a, b]$. Then $f(x)$ is continuous on $[a, x_0]$ and differentiable on $(a, x_0)$. By the mean value theorem, there exists some $c \in (a, x_0)$ with $0 = f'(c) = \frac{f(x_0)-f(a)}{x_0-a}$. Hence $f(x_0) = f(a)$ for all $x_0 \in [a, b]$. Letting $M = f(a)$ proves the corollary.
This corollary says that if f(x) is a function that is continuous on [a, b] and
has zero derivative on (a, b), then f(x) = M is a constant function.
Here is another "obvious" result. It is important when we study antiderivatives in the successor course (MATH 148).

Corollary 6.0.4. Suppose that $f, g : S \to \mathbb{R}$ with $S \subseteq \mathbb{R}$. Also suppose that $[a, b] \subseteq S$ is a nondegenerate interval, and that $f(x)$ and $g(x)$ are continuous on $[a, b]$ and differentiable on $(a, b)$. If $f'(x) = g'(x)$ for all $x \in (a, b)$, then there exists an $M \in \mathbb{R}$ such that $f(x) = g(x) + M$ for all $x \in [a, b]$.

Proof. Define $h : S \to \mathbb{R}$ by $h(x) = f(x) - g(x)$. Then $h(x)$ is continuous on $[a, b]$ (by theorem 4.2.6) and differentiable on $(a, b)$ (by theorem 5.1.1). We have that $h'(x) = f'(x) - g'(x) = 0$ for all $x \in (a, b)$ by theorem 5.1.1. By corollary 6.0.3, there exists $M \in \mathbb{R}$ such that $h(x) := f(x) - g(x) = M$ for all $x \in [a, b]$, and hence $f(x) = g(x) + M$ for all $x \in [a, b]$, as claimed.
Definition 6.0.5. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$. Suppose that a nondegenerate interval $I$ is such that $I \subseteq J \subseteq S$, where $J$ is an open interval. We say that $F(x)$ is an antiderivative of $f(x)$ on $I$ if $F'(x) = f(x)$ for all $x \in I$. (The requirement that $I$ is contained in some open interval $J \subseteq S$ is made so that the derivative exists on all of $I$.)

Corollary 6.0.4 shows that if $F(x), G(x)$ are antiderivatives of $f(x)$ on $I$, then $G(x) = F(x) + C$ for some $C \in \mathbb{R}$. Conversely, every such function $F(x) + C$ is an antiderivative of $f(x)$ on $I$. We will use the notation
$$\int f(x)\,dx$$
to denote the collection of all antiderivatives of $f(x)$ (on an implicit interval $I$; usually the entire $\mathrm{dom}(f)$).

Example 6.1. Instead of writing $\{F(x) + C : C \in \mathbb{R}\}$, we will write $F(x) + C$.
1. $\int \sin(x)\,dx = -\cos(x) + C$.
2. $\int x^n\,dx = \frac{x^{n+1}}{n+1} + C$ for $n \in \mathbb{N}$.
3. $\int e^x\,dx = e^x + C$.
4. $\int e^{x^2}\,dx = {?}$

It turns out that it is not possible to express an antiderivative of $e^{x^2}$ by (finite) arithmetic combinations and compositions of elementary functions.

Problem. Which functions have antiderivatives? How can we find an antiderivative?
6.1 Applications
6.1.1 Increasing Function Theorem
Recall the following definition.

Definition 6.1.1. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $I \subseteq S$ is a nondegenerate interval. We say that $f(x)$ is (strictly) increasing on $I$ if all $x_0, y_0 \in I$ with $x_0 < y_0$ satisfy $f(x_0) < f(y_0)$. We say that $f(x)$ is (strictly) decreasing on $I$ if all $x_0, y_0 \in I$ with $x_0 < y_0$ satisfy $f(x_0) > f(y_0)$.

We can make similar definitions for non-increasing and non-decreasing functions.
Theorem 6.1.2 [Increasing Function Theorem]. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $I$ is any nondegenerate interval such that $I \subseteq J \subseteq S$, where $J$ is an open interval. If $f(x)$ is differentiable on $I$ with $f'(x) > 0$ for all $x \in I$, then $f(x)$ is increasing on $I$.

Proof. Let $x_0, y_0 \in I$ with $x_0 < y_0$. Then $f(x)$ is continuous on $[x_0, y_0]$ by theorem 5.0.5 and differentiable on $(x_0, y_0)$. The mean value theorem shows that there exists $c \in (x_0, y_0)$ such that $f'(c) = \frac{f(y_0)-f(x_0)}{y_0-x_0}$. But $f'(c) > 0$ and $y_0 - x_0 > 0$, and hence $f(y_0) - f(x_0) = f'(c)(y_0 - x_0) > 0$, whence $f(y_0) > f(x_0)$.

The technical requirement that $I$ is an interval contained in some open interval $J \subseteq S$ is again made so that $f'(x)$ is defined on all of $I$.
Corollary 6.1.3 [Decreasing Function Theorem]. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $I$ is any nondegenerate interval such that $I \subseteq J \subseteq S$, where $J$ is an open interval. If $f(x)$ is differentiable on $I$ with $f'(x) < 0$ for all $x \in I$, then $f(x)$ is decreasing on $I$.

Proof. This proof is left as an exercise. (Hint: Note that $f(x)$ is decreasing on $I$ if and only if $-f(x)$ is increasing on $I$.)

Question: If $f(x)$ is increasing and differentiable on $I = (a, b)$, must $f'(x) > 0$ for all $x \in I$?

It is easy to see that $f'(x) \ge 0$. In fact, the following example shows that the strict inequality is not always the case.
Example 6.2. Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = x^3$; it is increasing and differentiable on $\mathbb{R}$ with $f'(x) = 3x^2$, and so $f'(0) = 0$.

The increasing function theorem easily lends itself to this generalization.

Corollary 6.1.4 [Increasing Function Theorem]. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $I$ is any nondegenerate interval such that $I \subseteq J \subseteq S$, where $J$ is an open interval. Suppose also that $f(x)$ is differentiable on $I$. If $f'(x) \ge 0$ for all $x \in I$ and $f'(x) = 0$ for only finitely many $x \in I$, then $f(x)$ is increasing on $I$.

Proof. This proof is left as an exercise.

The decreasing function theorem has the same generalization. Also, versions of the theorem for non-increasing and non-decreasing functions exist. Later, both the non-decreasing and the increasing version will be referred to as the increasing function theorem, and similarly for the decreasing function theorem.
Example 6.3. Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = xe^x$. We will find all intervals on which $f(x)$ is increasing and on which $f(x)$ is decreasing. Note that $f(x)$ is differentiable on all of $\mathbb{R}$ with $f'(x) = (x+1)e^x$; but $e^x > 0$ for all $x \in \mathbb{R}$, and so $f'(x) < 0$ when $x \in (-\infty, -1)$, $f'(x) > 0$ when $x \in (-1, \infty)$, and $f'(x) = 0$ only when $x = -1$. The increasing and decreasing function theorems show that $f(x)$ is decreasing on $(-\infty, -1]$ and increasing on $[-1, \infty)$.
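A quick numerical check (purely illustrative, with sample points of our own choosing) confirms the sign pattern of $f'(x) = (x+1)e^x$ claimed above:

```python
import math

fprime = lambda x: (x + 1) * math.exp(x)

# f'(x) < 0 to the left of -1 and f'(x) > 0 to the right, as claimed.
assert all(fprime(x) < 0 for x in (-10, -5, -2, -1.01))
assert all(fprime(x) > 0 for x in (-0.99, 0, 1, 10))
assert fprime(-1) == 0.0
```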
Problem. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $I \subseteq S$ is an open interval containing some $a \in S$. If $f(x)$ is differentiable on $I$ and $f'(a) > 0$, must there be an $\epsilon > 0$ such that $f(x)$ is increasing on $(a - \epsilon, a + \epsilon)$?

The answer to this problem turns out to be no. One way for the described situation to occur is for $f'(x)$ to be continuous at $x = a$; but not all differentiable functions have continuous derivatives at certain points. Consider, for example, the function $f : \mathbb{R} \to \mathbb{R}$ defined by
$$f(x) = \begin{cases} x^2\sin\left(\frac{1}{x^2}\right) + x & \text{if } x \ne 0, \\ 0 & \text{if } x = 0. \end{cases}$$
For an exercise, show that $f(x)$ is differentiable on $\mathbb{R}$ with $f'(0) > 0$ while there exists no open interval $I = (-\epsilon, \epsilon)$ such that $f(x)$ is increasing on $I$.
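The claimed behaviour can be observed numerically. The sketch below (an illustration with hand-picked sample ranges, not part of the exercise's solution) verifies that the difference quotient of $f$ at 0 is close to $f'(0) = 1$, and then finds nearby points where $f$ actually decreases:

```python
import math

def f(x):
    return (x * x * math.sin(1 / (x * x)) + x) if x != 0 else 0.0

# The difference quotient at 0 tends to f'(0) = 1 > 0 ...
h = 1e-6
assert abs((f(h) - f(0)) / h - 1) < 1e-3

# ... yet f is not increasing near 0: a fine scan finds adjacent
# sample points where f decreases, for points arbitrarily close to 0.
found_decrease = False
x = 0.005
while x < 0.006:
    if f(x + 1e-8) < f(x):
        found_decrease = True
        break
    x += 1e-8
assert found_decrease
```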
6.1.2 Functions with Bounded Derivatives
Problem. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $f(x)$ is continuous on $[a, b] \subseteq S$ and differentiable on $(a, b)$, $a < b$. If $m \le f'(x) \le M$ for all $x \in (a, b)$, what can we say about $f(a)$ and $f(b)$? In particular, what inequality can we establish?

Theorem 6.1.5. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $f(x)$ is continuous on $[a, b] \subseteq S$ and differentiable on $(a, b)$, $a < b$. If $m \le f'(x) \le M$ for all $x \in (a, b)$, then
$$f(a) + m(x - a) \le f(x) \le f(a) + M(x - a)$$
for all $x \in [a, b]$.
Proof. Let $x_0 \in (a, b]$. Then $f(x)$ is continuous on $[a, x_0]$ and differentiable on $(a, x_0)$. The mean value theorem shows that $\frac{f(x_0)-f(a)}{x_0-a} = f'(c)$ for some $c \in (a, x_0)$, and hence
$$m \le \frac{f(x_0) - f(a)}{x_0 - a} = f'(c) \le M.$$
This shows that
$$f(a) + m(x_0 - a) \le f(x_0) \le f(a) + M(x_0 - a),$$
for all $x_0 \in (a, b]$. When $x_0 = a$, the above inequality is trivially true. We conclude that
$$f(a) + m(x - a) \le f(x) \le f(a) + M(x - a),$$
for all $x \in [a, b]$, as required.
We now return to the speed limit problem posed at the beginning of this chapter.

Problem. If a car completes a 110 km trip in one hour along a road with a speed limit of 100 $\frac{\text{km}}{\text{h}}$, show that the car exceeded the speed limit at some point in that hour of travel. (The car is allowed to travel backwards.)

Solution. Let $s(t)$ be the signed distance (in kilometres) the car has travelled $t$ hours after the start of the trip (at which time $t = 0$), for $t \in [0, 1]$. It is a reasonable physical assumption that $s(t)$ is continuous on $[0, 1]$ and differentiable on $(0, 1)$. Suppose to the contrary that the car never exceeded the speed limit of 100 $\frac{\text{km}}{\text{h}}$; that is, $-100 \le s'(t) \le 100$. Then by theorem 6.1.5,
$$s(1) \le s(0) + 100(1 - 0) \iff s(1) \le 100.$$
This shows that the car cannot complete the 110 km trip. We conclude that the car must have exceeded the speed limit at some point during the trip.
Theorem 6.1.6. Suppose that $f : I \to \mathbb{R}$, where $I \subseteq \mathbb{R}$ is any nondegenerate interval. If $f(x)$ is differentiable on $I$ with $|f'(x)| \le M$ for all $x \in I$, then $f(x)$ is uniformly continuous on $I$.

Proof. Let $x, y \in I$ with $x < y$. Then $f(x)$ is continuous on $[x, y]$ and differentiable on $(x, y)$. The mean value theorem shows that there exists a $c \in (x, y)$ with
$$\frac{f(y) - f(x)}{y - x} = f'(c).$$
This shows that
$$\left|\frac{f(y) - f(x)}{y - x}\right| = |f'(c)| \le M \iff |f(y) - f(x)| \le M|y - x|, \tag{6.1}$$
for all $x < y$, $x, y \in I$. But (6.1) is trivially true when $x = y$ and is true when $x > y$ as well, and so (6.1) is true for all $x, y \in I$.

Let $\epsilon > 0$. Let $\delta = \frac{\epsilon}{M}$. Then whenever $|x - y| < \delta$, $x, y \in I$, we have, by (6.1), that
$$|f(x) - f(y)| \le M|x - y| < M\delta := M \cdot \frac{\epsilon}{M} = \epsilon.$$
This shows that $f(x)$ is uniformly continuous on $I$.
Problem. Assume that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $I$ is any nondegenerate interval such that $I \subseteq J \subseteq S$, where $J$ is an open interval. If $f(x)$ is uniformly continuous on $I$ and differentiable on $I$, must $f'(x)$ be bounded on $I$? That is, must there exist an $M > 0$ such that $|f'(x)| \le M$ for all $x \in I$?

The answer to this problem turns out to be no, too. For example, consider the function $f : \mathbb{R} \to \mathbb{R}$ defined by
$$f(x) = \begin{cases} x^2\sin\left(\frac{1}{x^2}\right) & \text{if } x \ne 0, \\ 0 & \text{if } x = 0. \end{cases}$$
For an exercise, show that $f(x)$ is continuous on $[-1, 1]$ (and hence uniformly continuous on $[-1, 1]$ by theorem 4.5.6) and differentiable on $[-1, 1]$, while the derivative function $f'(x)$ is not bounded on $[-1, 1]$.
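One can see the unboundedness numerically: for $x \ne 0$, $f'(x) = 2x\sin\left(\frac{1}{x^2}\right) - \frac{2}{x}\cos\left(\frac{1}{x^2}\right)$, and along the points $x_k = \frac{1}{\sqrt{2\pi k}}$ the cosine factor equals 1 while $\frac{2}{x_k}$ grows without bound. A small illustrative script (the sample values of $k$ are ours):

```python
import math

def fprime(x):
    # For x != 0: f'(x) = 2x sin(1/x^2) - (2/x) cos(1/x^2).
    return 2 * x * math.sin(1 / x ** 2) - (2 / x) * math.cos(1 / x ** 2)

# At x_k = 1/sqrt(2*pi*k) we have cos(1/x_k^2) = cos(2*pi*k) = 1, so
# |f'(x_k)| is roughly 2*sqrt(2*pi*k): the derivative blows up near 0.
values = [abs(fprime(1 / math.sqrt(2 * math.pi * k)))
          for k in (10, 1000, 100000)]
assert values[0] < values[1] < values[2]
assert values[2] > 1000
```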
6.1.3 Comparing Functions Using Their Derivatives
Theorem 6.1.7. Suppose that $f, g : S \to \mathbb{R}$ with $S \subseteq \mathbb{R}$, that $I$ is a nondegenerate interval such that $I \subseteq S$, and that $a \in S$ is the left endpoint of $I$. Assume that $f(x)$ and $g(x)$ are continuous on $I$ and differentiable on $I \setminus \{a\}$ with $f(a) \le g(a)$. If $f'(x) < g'(x)$ for all $x \in I \setminus \{a\}$, then $f(x) < g(x)$ on $I \setminus \{a\}$.

Proof. Let $h : S \to \mathbb{R}$ be defined by $h(x) = g(x) - f(x)$. Then $h'(x) = g'(x) - f'(x) > 0$ on $I \setminus \{a\}$. The increasing function theorem shows that $h(x)$ is increasing on $I \setminus \{a\}$. Since $h(x)$ is continuous on $I$ (by theorem 4.2.13), $h(a)$ must be strictly less than $h(x)$ for all $x \in I \setminus \{a\}$. To see this, suppose $h(a) \ge h(b)$ for some $b \in I$, $b > a$. The mean value theorem shows that there exists a $c \in (a, b)$ with $h'(c) = \frac{h(b)-h(a)}{b-a} \le 0$, a contradiction.

Now we are done. Since $h(a) = g(a) - f(a) \ge 0$, $h(x) := g(x) - f(x) > h(a) \ge 0$ for all $x > a$, $x \in I$. The result follows.
6.1.4 Classifying Critical Points
First derivatives. Recall that a critical point is a point at which either $f'(x) = 0$ or $f'(x)$ does not exist. We showed using corollary 5.4.3 that every local extremum for a function $f(x)$ is a critical point for $f(x)$; however, the converse is not true.

Example 6.4. Let $f(x) = x^3$. Then $f'(x) = 3x^2$ and so $f'(0) = 0$; but $x = 0$ is neither a local maximum nor a local minimum for $f(x)$.

Theorem 6.1.8 [First Derivative Test]. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, that there exists an open interval $(a, b)$ containing some $c \in S$, and that $[a, b] \subseteq S$. Assume also that $f(x)$ is continuous on $[a, b]$ and differentiable on $(a, b)$ except possibly at $x = c$. Finally, assume that $x = c$ is a critical point for $f(x)$.
1. If $f'(x) \le 0$ on $(a, c)$ and $f'(x) \ge 0$ on $(c, b)$, then $c$ is a local minimum for $f(x)$.
2. If $f'(x) \ge 0$ on $(a, c)$ and $f'(x) \le 0$ on $(c, b)$, then $c$ is a local maximum for $f(x)$.
Proof. Let $x_1 \in (a, c)$. Then $f(x)$ is continuous on $[x_1, c]$ and differentiable on $(x_1, c)$. By the mean value theorem, there exists a $d_1 \in (x_1, c)$ such that $\frac{f(x_1)-f(c)}{x_1-c} = f'(d_1)$; therefore, $f(x_1) = f(c) + f'(d_1)(x_1 - c)$. Since $f'(d_1) \le 0$ and $x_1 - c < 0$, we have that $f(x_1) \ge f(c) + 0 = f(c)$. Let $x_2 \in (c, b)$. Similarly, by the mean value theorem, there exists a $d_2 \in (c, x_2)$ such that $\frac{f(x_2)-f(c)}{x_2-c} = f'(d_2)$; therefore, $f(x_2) = f(c) + f'(d_2)(x_2 - c)$. Since $f'(d_2) \ge 0$ and $x_2 - c > 0$, we have that $f(x_2) \ge f(c) + 0 = f(c)$. Hence $f(x) \ge f(c)$ holds for every $x \in (a, b)$, showing that $c$ is a local minimum for $f(x)$. The proof of (2) is similar and is left as an exercise.
Example 6.5. Consider $f(x) = xe^x$ again. We know that $f'(x) = (x+1)e^x$. We know that $f'(x) < 0$ if $x < -1$ and $f'(x) > 0$ if $x > -1$. At $x = -1$, $f'(x) = 0$ and so $x = -1$ is a critical point for $f(x)$. By the first derivative test, $x = -1$ is a local minimum for $f(x)$.
Second derivatives. If the derivative function $f'(x)$ of some $f(x)$ is continuous at a point $x = c$ and defined on an open interval containing $c$, then it could potentially be differentiable again at the point $x = c$. We call this derivative the second derivative.

Definition 6.1.9. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $f(x)$ is differentiable on an open interval $I \subseteq S$ containing some point $c$. If $f'(x)$ is differentiable at $x = c$, then we say that $f(x)$ is twice differentiable at $x = c$ and call the quantity
$$f''(c) := \left.\frac{d}{dx} f'(x)\right|_{x=c}$$
the second derivative of $f(x)$ at $x = c$. In general, we can define a second derivative (function) of $f(x)$, $f'' : T \to \mathbb{R}$, by $f''(x_0) = \left.\frac{d}{dx} f'(x)\right|_{x=x_0}$, where $T$ is the set of all points $x_0$ at which $f'(x)$ is differentiable. Leibniz's notation for the second derivative is
$$\frac{d}{dx} f'(x) := \frac{d}{dx}\left(\frac{d}{dx} f(x)\right) := \frac{d^2}{dx^2} f(x).$$
Definition 6.1.10. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $I \subseteq S$ is a nondegenerate interval. We say that $f(x)$ is concave upwards on $I$ if for any $a, b \in I$ with $a < b$, we have that
$$f(a) + \frac{f(b) - f(a)}{b - a}(x - a) \ge f(x),$$
for all $x \in [a, b]$. Similarly, we say that $f(x)$ is concave downwards on $I$ if for any $a, b \in I$ with $a < b$, we have that
$$f(a) + \frac{f(b) - f(a)}{b - a}(x - a) \le f(x),$$
for all $x \in [a, b]$.

Concavity has a geometric interpretation.

Informal definition. We say that $f(x)$ is concave upwards on an interval $I$ if for any $a, b \in I$ with $a < b$, the line segment joining $(a, f(a))$ and $(b, f(b))$ lies above the graph of $f(x)$ on the interval $[a, b]$. Similarly, we say that $f(x)$ is concave downwards on an interval $I$ if for any $a, b \in I$ with $a < b$, this line segment lies below the graph of $f(x)$ on the interval $[a, b]$.

Concavity captures the direction in which the curve is bending. Intuitively, the second derivative is related to acceleration, just as the first derivative is related to velocity; we therefore expect there to be a relationship between second derivatives and concavity.
Theorem 6.1.11. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $I$ is an interval such that $I \subseteq J \subseteq S$, where $J$ is an open interval. If $f(x)$ is twice differentiable at every $x \in I$, then we have the following.
1. If $f''(x) \ge 0$ for all $x \in I$, then $f(x)$ is concave upwards on $I$.
2. If $f''(x) \le 0$ for all $x \in I$, then $f(x)$ is concave downwards on $I$.
Proof. We will prove the first item only. The proof for the second item is very similar and is left as an exercise.

Let $a, b \in I$ be such that $a < b$. Let
$$h(x) = f(a) + \frac{f(b) - f(a)}{b - a}(x - a),$$
and let $H(x) = h(x) - f(x)$. Note that $H(a) := h(a) - f(a) = f(a) - f(a) = 0$ and $H(b) := h(b) - f(b) = f(a) + f(b) - f(a) - f(b) = 0$. We know that $H(x)$ is continuous on $[a, b]$, and so it achieves its global maximum and minimum on $[a, b]$ by the extreme value theorem. Observe that $H(x)$ is differentiable on $[a, b]$ with $H'(x) = h'(x) - f'(x) = \frac{f(b)-f(a)}{b-a} - f'(x)$. Now $f(x)$ is continuous on $[a, b]$ (by theorem 5.0.5) and differentiable on $(a, b)$, and so by the mean value theorem there exists a $c \in (a, b)$ such that $f'(c) = \frac{f(b)-f(a)}{b-a}$. For this $c$, we have that
$$H'(c) = \frac{f(b) - f(a)}{b - a} - f'(c) = 0.$$
Note that $H(x)$ is twice differentiable at all $x \in [a, b]$, with $H''(x) = h''(x) - f''(x) = 0 - f''(x) \le 0$ for all $x \in [a, b]$. Now the decreasing function theorem (non-increasing version) shows that $H'(x)$ is non-increasing on $[a, b]$, and since $H'(c) = 0$, we know that $H'(x) \ge 0$ on $[a, c]$ and $H'(x) \le 0$ on $[c, b]$. The increasing and decreasing function theorems (their non-strict versions) show that $H(x)$ is non-decreasing on $[a, c]$ and non-increasing on $[c, b]$, and hence the global minima on $[a, b]$ for $H(x)$ occur at the endpoints $x = a$ and $x = b$, where $H(a) = 0$ and $H(b) = 0$. This shows that $H(x) := h(x) - f(x) \ge 0$ for all $x \in [a, b]$, and so
$$h(x) := f(a) + \frac{f(b) - f(a)}{b - a}(x - a) \ge f(x)$$
for all $x \in [a, b]$, as required.
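The conclusion can be checked numerically for a concrete function. For the sample choice $f(x) = x^2$ (so $f'' = 2 \ge 0$), the chord over $[a, b]$ never dips below the graph; the interval, grid, and tolerance below are our own illustrative choices.

```python
# Chord-above-graph check for the convex sample function f(x) = x**2.
def f(x):
    return x ** 2

a, b = -1.0, 2.0
slope = (f(b) - f(a)) / (b - a)

# Smallest gap between the chord and the graph over a fine grid:
gap = min(f(a) + slope * (x - a) - f(x)
          for x in [a + (b - a) * i / 100 for i in range(101)])
assert gap >= -1e-12   # the chord never dips below the graph
```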
Definition 6.1.12. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that an open interval $(a, b) \subseteq S$ contains some $c \in S$. We say that $c$ is a point of inflection for $f(x)$ if $f(x)$ is continuous at $x = c$ and either
1. $f(x)$ is concave upwards on $(a, c]$ and downwards on $[c, b)$, or
2. $f(x)$ is concave downwards on $(a, c]$ and upwards on $[c, b)$.

Points of inflection are to the second derivative what local extrema are to the first derivative. If a function changes from increasing to decreasing, or from decreasing to increasing, at a point $x = c$, then $c$ is a local extremum. If a function changes from concave upwards to concave downwards, or from downwards to upwards, at a point $x = c$, then $c$ is a point of inflection. Correspondingly, there is a theorem analogous to theorem 5.4.2.
Theorem 6.1.13. Suppose $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $f(x)$ is differentiable on some open interval $(a, b) \subseteq S$ and twice differentiable at a point $c \in (a, b)$. If $c$ is a point of inflection for $f(x)$, then $f''(c) = 0$.

Proof. This proof is omitted.

The converse of this theorem is false.

Example 6.6. Let $f(x) = x^4$. Then $f''(x) = 12x^2$. Clearly, $f''(0) = 0$, but $x = 0$ is not a point of inflection; it is in fact a global and local minimum for $f(x)$ on $\mathbb{R}$. The function $f(x)$ is concave upwards on all of $\mathbb{R}$.
6.2 L'Hôpital's Rule

Sometimes, we run into limits where both the numerator and the denominator tend to zero. In such cases, we have already seen in a previous section that we do not know whether the quotient limit exists. This section explores a technique for evaluating such limits; this technique works provided that the numerator and the denominator are differentiable at the point where the limit is evaluated. We will first begin with an important result.
6.2.1 Cauchy's Mean Value Theorem

Theorem 6.2.1 [Cauchy's Mean Value Theorem]. Suppose that $f, g : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $[a, b] \subseteq S$ is a nondegenerate interval. Suppose also that $f(x)$ and $g(x)$ are continuous on $[a, b]$ and differentiable on $(a, b)$ with $g'(x) \ne 0$ for all $x \in (a, b)$. Then $g(a) \ne g(b)$ and there exists $c \in (a, b)$ such that
$$\frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(c)}{g'(c)}. \tag{6.2}$$

Proof. We will first show that $g(a) \ne g(b)$. Suppose to the contrary that $g(a) = g(b)$. Then by the mean value theorem, there exists a point $d \in (a, b)$ such that $g'(d) = \frac{g(b)-g(a)}{b-a} = 0$, a contradiction ($g'(x) \ne 0$ for all $x \in (a, b)$).

We will now verify equation (6.2). Define $h : S \to \mathbb{R}$ by
$$h(x) = \frac{f(b) - f(a)}{g(b) - g(a)}[g(x) - g(a)] - (f(x) - f(a)).$$
We know by theorem 4.2.13 that $h(x)$ is continuous on $[a, b]$. By the product and sum rules, we know that $h(x)$ is differentiable on $(a, b)$. Also, we have that
$$h(a) := \frac{f(b) - f(a)}{g(b) - g(a)}[g(a) - g(a)] - (f(a) - f(a)) = 0 - 0 = 0,$$
and that
$$h(b) := \frac{f(b) - f(a)}{g(b) - g(a)}[g(b) - g(a)] - (f(b) - f(a)) = f(b) - f(a) - (f(b) - f(a)) = 0.$$
Rolle's theorem shows that there exists $c \in (a, b)$ with $h'(c) = 0$. But
$$h'(x) = \left(\frac{f(b) - f(a)}{g(b) - g(a)}\right)g'(x) - f'(x);$$
therefore,
$$0 = h'(c) = \left(\frac{f(b) - f(a)}{g(b) - g(a)}\right)g'(c) - f'(c),$$
implying equation (6.2) ($g'(c) \ne 0$).

Note. If we let $g(x) = x$, then we obtain the mean value theorem as a special case of Cauchy's mean value theorem. Furthermore, Cauchy's mean value theorem has a geometric interpretation. A curve or path in $\mathbb{R}^2$ is a function $F : I \to \mathbb{R}^2$, where $I \subseteq \mathbb{R}$ is an interval, of the form $F(t) = (f(t), g(t))$. Cauchy's mean value theorem then says this: the chord joining two points $P = (f(a), g(a))$ and $Q = (f(b), g(b))$ on the path of $F$ has the same slope as the slope of the tangent line at some point $R = (f(c), g(c))$ between $P$ and $Q$ on the path of $F$.
6.2.2 Indeterminate Forms

Definition 6.2.2. Suppose that $f, g : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $I \subseteq S$ is an open interval containing some $a \in S$. Suppose that $g(x) \ne 0$ for all $x \in I$ except possibly at $x = a$.
1. The limit $\lim_{x \to a} \frac{f(x)}{g(x)}$ is called an indeterminate form of type $\frac{0}{0}$ if $\lim_{x \to a} f(x) = 0 = \lim_{x \to a} g(x)$.
2. The limit $\lim_{x \to a} \frac{f(x)}{g(x)}$ is called an indeterminate form of type $\frac{\infty}{\infty}$ if $\lim_{x \to a} f(x) = \pm\infty = \lim_{x \to a} g(x)$.

Note that limits that are indeterminate forms may or may not exist. In many cases, however, we are able to use L'Hôpital's rule to evaluate indeterminate forms. The $\frac{0}{0}$ version of the rule is stated below.
Theorem 6.2.3 [L'Hôpital's Rule, Version $\frac{0}{0}$]. Suppose that $f, g : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $(a, b) \subseteq S$, where $a$ and $b$ are extended real numbers with $a < b$. That is, $a, b \in \mathbb{R} \cup \{-\infty, \infty\}$. Also suppose that $f$ and $g$ are differentiable on $(a, b)$ and that both $g(x) \ne 0$ and $g'(x) \ne 0$ for all $x \in (a, b)$.
1. Assume that $\lim_{x \to a^+} f(x) = 0 = \lim_{x \to a^+} g(x)$. Then
   i. if $\lim_{x \to a^+} \frac{f'(x)}{g'(x)} = L \in \mathbb{R}$, then $\lim_{x \to a^+} \frac{f(x)}{g(x)} = L$.
   ii. if $\lim_{x \to a^+} \frac{f'(x)}{g'(x)} = \pm\infty$, then $\lim_{x \to a^+} \frac{f(x)}{g(x)} = \pm\infty$.
2. Assume that $\lim_{x \to b^-} f(x) = 0 = \lim_{x \to b^-} g(x)$. Then
   i. if $\lim_{x \to b^-} \frac{f'(x)}{g'(x)} = L \in \mathbb{R}$, then $\lim_{x \to b^-} \frac{f(x)}{g(x)} = L$.
   ii. if $\lim_{x \to b^-} \frac{f'(x)}{g'(x)} = \pm\infty$, then $\lim_{x \to b^-} \frac{f(x)}{g(x)} = \pm\infty$.
Proof. We will only show (1)-(i), because the proofs for the other three statements are very similar. They are left as exercises.

Assume that $\lim_{x \to a^+} \frac{f'(x)}{g'(x)} = L \in \mathbb{R}$. Let $\epsilon > 0$. Then we can find a $b' \in \mathbb{R}$, where $b' < b$, such that if $a < x < b'$, then
$$\left|\frac{f'(x)}{g'(x)} - L\right| < \frac{\epsilon}{2}.$$
Let $\beta$ be any real number such that $a < \beta < b'$. For each $\alpha \in \mathbb{R}$ that satisfies $a < \alpha < \beta < b'$, we have that $f(x)$ and $g(x)$ are both continuous on $[\alpha, \beta]$ and differentiable on $(\alpha, \beta)$. We also have that $g'(x) \ne 0$ for all $x \in (\alpha, \beta)$. By Cauchy's mean value theorem, there exists a $c_\alpha \in (\alpha, \beta)$ such that
$$\frac{f(\beta) - f(\alpha)}{g(\beta) - g(\alpha)} = \frac{f'(c_\alpha)}{g'(c_\alpha)}.$$
Hence for all $\alpha$ that satisfy $a < \alpha < \beta < b'$, we have that
$$\left|\frac{f(\beta) - f(\alpha)}{g(\beta) - g(\alpha)} - L\right| = \left|\frac{f'(c_\alpha)}{g'(c_\alpha)} - L\right| < \frac{\epsilon}{2}. \tag{6.3}$$
By proposition 4.1.13, $\lim_{\alpha \to a^+} \frac{f(\beta)-f(\alpha)}{g(\beta)-g(\alpha)} = \frac{f(\beta)}{g(\beta)}$ (since $\lim_{\alpha \to a^+} f(\alpha) = 0 = \lim_{\alpha \to a^+} g(\alpha)$). Hence we can choose one $\alpha_0 \in \mathbb{R}$ with $a < \alpha_0 < \beta$ such that
$$\left|\frac{f(\beta) - f(\alpha_0)}{g(\beta) - g(\alpha_0)} - \frac{f(\beta)}{g(\beta)}\right| < \frac{\epsilon}{2}. \tag{6.4}$$
Combining inequalities (6.3) and (6.4), we have that
$$\left|\frac{f(\beta)}{g(\beta)} - L\right| \le \left|\frac{f(\beta)}{g(\beta)} - \frac{f(\beta) - f(\alpha_0)}{g(\beta) - g(\alpha_0)}\right| + \left|\frac{f(\beta) - f(\alpha_0)}{g(\beta) - g(\alpha_0)} - L\right| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon. \tag{6.5}$$
But inequality (6.5) is true for every $\beta \in \mathbb{R}$ that satisfies $a < \beta < b'$, and so by definition of one-sided limit,
$$\lim_{x \to a^+} \frac{f(x)}{g(x)} = L,$$
as desired.
Here is the second half of L'Hôpital's rule.

Theorem 6.2.4 [L'Hôpital's Rule, Version $\frac{\infty}{\infty}$]. Suppose that $f, g : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $(a, b) \subseteq S$, where $a$ and $b$ are extended real numbers with $a < b$. That is, $a, b \in \mathbb{R} \cup \{-\infty, \infty\}$. Also suppose that $f$ and $g$ are differentiable on $(a, b)$ and that both $g(x) \ne 0$ and $g'(x) \ne 0$ for all $x \in (a, b)$.
1. Assume that $\lim_{x \to a^+} f(x) = \pm\infty = \lim_{x \to a^+} g(x)$. Then
   i. if $\lim_{x \to a^+} \frac{f'(x)}{g'(x)} = L \in \mathbb{R}$, then $\lim_{x \to a^+} \frac{f(x)}{g(x)} = L$.
   ii. if $\lim_{x \to a^+} \frac{f'(x)}{g'(x)} = \pm\infty$, then $\lim_{x \to a^+} \frac{f(x)}{g(x)} = \pm\infty$.
2. Assume that $\lim_{x \to b^-} f(x) = \pm\infty = \lim_{x \to b^-} g(x)$. Then
   i. if $\lim_{x \to b^-} \frac{f'(x)}{g'(x)} = L \in \mathbb{R}$, then $\lim_{x \to b^-} \frac{f(x)}{g(x)} = L$.
   ii. if $\lim_{x \to b^-} \frac{f'(x)}{g'(x)} = \pm\infty$, then $\lim_{x \to b^-} \frac{f(x)}{g(x)} = \pm\infty$.

Proof. This proof is omitted.
Example 6.7. We will find $\lim_{x \to 0^+} \frac{e^x - \cos(x)}{x}$. Let $f(x) = e^x - \cos(x)$ and $g(x) = x$. Then $\lim_{x \to 0^+} f(x) = e^0 - \cos(0) = 1 - 1 = 0$ and $\lim_{x \to 0^+} g(x) = 0$ by continuity. Also, $g(x) = x \ne 0$ and $g'(x) = 1 \ne 0$ for all $x \in (0, \infty)$. We can now apply L'Hôpital's rule to $f(x)$ and $g(x)$. Now $f'(x) = e^x + \sin(x)$ and $g'(x) = 1$; therefore,
$$\lim_{x \to 0^+} \frac{f'(x)}{g'(x)} = \lim_{x \to 0^+} \frac{e^x + \sin(x)}{1} = \lim_{x \to 0^+} e^x + \sin(x) = 1 + 0 = 1$$
by continuity of $e^x$ and $\sin(x)$. L'Hôpital's rule shows that $\lim_{x \to 0^+} \frac{e^x - \cos(x)}{x} = 1$.

This is not surprising, because for $x$ near 0, $e^x \approx 1 + x$ and $\cos(x) \approx 1$, and so $\frac{e^x - \cos(x)}{x} \approx \frac{1 + x - 1}{x} = 1$.
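A numerical sample (illustrative only; the evaluation points are our own) agrees with the computed limit: the quotient approaches 1 as $x \to 0^+$.

```python
import math

quotient = lambda x: (math.exp(x) - math.cos(x)) / x

# Successive samples approach the limit value 1 found by l'Hopital's rule.
samples = [quotient(10.0 ** (-k)) for k in (1, 3, 5)]
assert abs(samples[-1] - 1) < 1e-4
assert abs(samples[-1] - 1) < abs(samples[0] - 1)
```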
Example 6.8. We will find $\lim_{x \to 0} \frac{e^{x^2} - 1}{x^2}$. As with the last example, we expect the limit to be 1, since $e^{x^2} - 1$ is approximately $x^2$ near zero. So let $f(x) = e^{x^2} - 1$ and $g(x) = x^2$. Then $f'(x) = 2xe^{x^2}$ and $g'(x) = 2x$. Note that $g(x), g'(x) \ne 0$ for all $x \in \mathbb{R} \setminus \{0\}$. Since $\lim_{x \to 0} f(x) = 0$ and $\lim_{x \to 0} g(x) = 0$, L'Hôpital's rule shows that
$$\lim_{x \to 0} \frac{f(x)}{g(x)} = \lim_{x \to 0} \frac{f'(x)}{g'(x)} = \lim_{x \to 0} \frac{2xe^{x^2}}{2x} = \lim_{x \to 0} e^{x^2} = 1.$$
Example 6.9. We will find $\lim_{x \to \infty} \frac{e^x}{x}$. Note that $\lim_{x \to \infty} e^x = \infty = \lim_{x \to \infty} x$, and so the limit in question is of the indeterminate form $\frac{\infty}{\infty}$. Letting $f(x) = e^x$ and $g(x) = x$, we have that $f'(x) = e^x$ and $g'(x) = 1$ for all $x > 0$. Also, $g(x)$ and $g'(x)$ are never 0 on $(0, \infty)$. L'Hôpital's rule shows that
$$\lim_{x \to \infty} \frac{e^x}{x} = \lim_{x \to \infty} \frac{e^x}{1} = \infty.$$
The following is an example where L'Hôpital's rule may need to be applied more than once: a first time to $\frac{f(x)}{g(x)}$ to produce $\frac{f'(x)}{g'(x)}$, a second time to $\frac{f'(x)}{g'(x)}$ to produce $\frac{f''(x)}{g''(x)}$, and so on, as long as the limits of these fractions keep being indeterminate forms.
Example 6.10. We will find $\lim_{x \to \infty} \frac{e^x}{x^3}$. Let $f(x) = e^x$ and $g(x) = x^3$; then $\lim_{x \to \infty} f(x) = \infty = \lim_{x \to \infty} g(x)$; this limit is of the indeterminate form $\frac{\infty}{\infty}$. Differentiating, we obtain $f'(x) = e^x$ and $g'(x) = 3x^2$. The new limit
$$\lim_{x \to \infty} \frac{f'(x)}{g'(x)} = \lim_{x \to \infty} \frac{e^x}{3x^2}$$
is still of the indeterminate form $\frac{\infty}{\infty}$, since $\lim_{x \to \infty} e^x = \infty = \lim_{x \to \infty} 3x^2$. Differentiating again, we obtain
$$\lim_{x \to \infty} \frac{f''(x)}{g''(x)} = \lim_{x \to \infty} \frac{e^x}{6x},$$
which is still indeterminate. One last time:
$$\lim_{x \to \infty} \frac{f'''(x)}{g'''(x)} = \lim_{x \to \infty} \frac{e^x}{6}$$
is now not indeterminate. Note that we must stop applying L'Hôpital's rule at this stage because the limit is no longer indeterminate. Since $g(x)$, $g'(x)$, $g''(x)$, and $g'''(x)$ never vanish on $(0, \infty)$, L'Hôpital's rule, applied three times, shows that
$$\lim_{x \to \infty} \frac{f(x)}{g(x)} = \lim_{x \to \infty} \frac{f'''(x)}{g'''(x)} = \lim_{x \to \infty} \frac{e^x}{6} = \infty.$$
L'Hôpital's rule can be used to find limits of the indeterminate form $0 \cdot \infty$. The idea is simple; if $\lim_{x \to a} f(x) = 0$ and $\lim_{x \to a} g(x) = \infty$, then $\lim_{x \to a} \frac{1}{g(x)} = 0$ and so $\lim_{x \to a} f(x)g(x) = \lim_{x \to a} \frac{f(x)}{1/g(x)}$ becomes an indeterminate form $\frac{0}{0}$; similarly, one could invert $f(x)$, if its approach towards 0 is monotonic, to obtain $\lim_{x \to a} \frac{g(x)}{1/f(x)}$, a limit of the indeterminate form $\frac{\infty}{\infty}$.
Example 6.11. We will find $\lim_{x \to 0^+} x\ln(x)$. First note that $x\ln(x) = \frac{\ln(x)}{1/x}$ for all $x > 0$, with $\lim_{x \to 0^+} \frac{1}{x} = \infty$ and $\lim_{x \to 0^+} \ln(x) = -\infty$. The limit $\lim_{x \to 0^+} \frac{\ln(x)}{1/x}$ is of the indeterminate form $\frac{\infty}{\infty}$. Note that $\frac{1}{x}$ and its derivative $-\frac{1}{x^2}$ are never zero on $(0, \infty)$. The derivative of $\ln(x)$ is $\frac{1}{x}$. L'Hôpital's rule shows that
$$\lim_{x \to 0^+} x\ln(x) = \lim_{x \to 0^+} \frac{\ln(x)}{\frac{1}{x}} = \lim_{x \to 0^+} \frac{\frac{1}{x}}{-\frac{1}{x^2}} = \lim_{x \to 0^+} (-x) = 0.$$
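Numerically (an illustration only; the sample points are ours), the factor $x$ visibly wins against $\ln(x) \to -\infty$:

```python
import math

# x*ln(x) -> 0 as x -> 0+, even though ln(x) -> -infinity.
samples = [x * math.log(x) for x in (1e-2, 1e-4, 1e-8)]
assert abs(samples[-1]) < 1e-6
assert abs(samples[0]) > abs(samples[1]) > abs(samples[2])
```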
From the example above, we can now evaluate some limits of the indeterminate form $0^0$.

Example 6.12. We will find $\lim_{x \to 0^+} x^x$. Note that $x^x = \left(e^{\ln(x)}\right)^x = e^{x\ln(x)}$ for all $x \in (0, \infty)$. Extend the function $f(x) := x\ln(x)$ to $[0, \infty)$ by $f(0) = 0$. The previous example shows that $f(x)$ is continuous on $[0, \infty)$; therefore, a one-sided version of theorem 4.2.8 shows that $e^{x\ln(x)}$ is continuous on $[0, \infty)$. Hence
$$\lim_{x \to 0^+} x^x = \lim_{x \to 0^+} e^{x\ln(x)} = e^0 = 1.$$
Example 6.13. We will find $\lim_{x \to 0^+} x^{\sin(x)}$. Note that $x^{\sin(x)} = \left(e^{\ln(x)}\right)^{\sin(x)} = e^{\sin(x)\ln(x)} = e^{\frac{\sin(x)}{x} \cdot x\ln(x)}$. Since $\lim_{x \to 0^+} \frac{\sin(x)}{x} = 1$ by the fundamental trig limit and $\lim_{x \to 0^+} x\ln(x) = 0$, we can extend $f(x) := \frac{\sin(x)}{x} \cdot x\ln(x)$ to $[0, \infty)$ by $f(0) = 1 \cdot 0 = 0$, making it continuous on $[0, \infty)$. Hence $e^{\frac{\sin(x)}{x} \cdot x\ln(x)}$ is continuous on $[0, \infty)$ (by a one-sided version of theorem 4.2.8). We now have
$$\lim_{x \to 0^+} x^{\sin(x)} = \lim_{x \to 0^+} e^{\frac{\sin(x)}{x} \cdot x\ln(x)} = e^{1 \cdot 0} = 1.$$
6.2.3 Fundamental Logarithmic Limit

In this section, we will show that $\ln(x)$ grows very, very slowly as $x \to \infty$. More precisely, we have the following result.

Remark 6.2.5 [Fundamental Logarithmic Limit]. $\lim_{x \to \infty} \frac{\ln(x)}{x} = 0$.

Proof. Note that this is an indeterminate form of type $\frac{\infty}{\infty}$, since $\lim_{x \to \infty} \ln(x) = \infty = \lim_{x \to \infty} x$. Also, $x \ne 0$ and $\frac{d}{dx} x = 1 \ne 0$ for all $x > 0$. Let $f(x) = \ln(x)$ and $g(x) = x$; then $f'(x) = \frac{1}{x}$ and $g'(x) = 1$ on $(0, \infty)$. Note that
$$\lim_{x \to \infty} \frac{f'(x)}{g'(x)} = \lim_{x \to \infty} \frac{\frac{1}{x}}{1} = \lim_{x \to \infty} \frac{1}{x} = 0,$$
whence L'Hôpital's rule shows that
$$\lim_{x \to \infty} \frac{f(x)}{g(x)} := \lim_{x \to \infty} \frac{\ln(x)}{x} = 0,$$
as claimed.
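The slow growth of $\ln(x)$ is already visible numerically; the sample points below are our own illustrative choices.

```python
import math

# ln(x)/x -> 0: even at moderately large x the ratio is already tiny.
ratios = [math.log(x) / x for x in (10.0, 1e4, 1e8)]
assert ratios[0] > ratios[1] > ratios[2] > 0
assert ratios[2] < 1e-6
```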
We will give an alternate proof of the fundamental logarithmic limit (or simply fundamental log limit).

Proof. Note that $\ln(1) = 0 < 1$ and that for any $y > 1$, $\frac{d}{dy}\ln(y) = \frac{1}{y} < 1 = \frac{d}{dy} y$. By theorem 6.1.7, $\ln(y) < y$ for all $y \in (1, \infty)$. Hence if $x \ge 1$, then
$$0 \le \ln\left(x^{\frac{1}{2}}\right) \le x^{\frac{1}{2}};$$
therefore, if $x \ge 1$, then
$$0 \le \frac{\ln\left(x^{\frac{1}{2}}\right)}{x} = \frac{\ln\left(x^{\frac{1}{2}}\right)}{x^{\frac{1}{2}}} \cdot \frac{1}{x^{\frac{1}{2}}} \le 1 \cdot \frac{1}{x^{\frac{1}{2}}} = \frac{1}{x^{\frac{1}{2}}}.$$
Since $\lim_{x \to \infty} 0 = 0 = \lim_{x \to \infty} \frac{1}{x^{1/2}}$, the squeeze theorem for functions can be extended to limits at $\infty$ to show that $\lim_{x \to \infty} \frac{\ln\left(x^{1/2}\right)}{x} = 0$. But
$$\frac{\ln(x)}{x} = \frac{2\ln\left(x^{\frac{1}{2}}\right)}{x};$$
theorem 4.1.7 can be extended to limits at $\infty$ to show that
$$\lim_{x \to \infty} \frac{\ln(x)}{x} = 2\lim_{x \to \infty} \frac{\ln\left(x^{\frac{1}{2}}\right)}{x} = 2 \cdot 0 = 0,$$
as required.
Example 6.14. We will find $\lim_{x \to \infty} \frac{\ln\left(x^{10^{10}}\right)}{x^{1/5}}$. Note that
$$\frac{\ln\left(x^{10^{10}}\right)}{x^{1/5}} = 10^{10}\left(\frac{\ln(x)}{x^{1/5}}\right) = 10^{10} \cdot 5\left(\frac{\ln\left(x^{1/5}\right)}{x^{1/5}}\right),$$
and so
$$\lim_{x \to \infty} \frac{\ln\left(x^{10^{10}}\right)}{x^{1/5}} = \lim_{x \to \infty} 10^{10} \cdot 5\left(\frac{\ln\left(x^{1/5}\right)}{x^{1/5}}\right) = 10^{10} \cdot 5\lim_{x \to \infty} \frac{\ln\left(x^{1/5}\right)}{x^{1/5}} = 10^{10} \cdot 5 \cdot 0 = 0.$$
This example illustrates that powers of $x$, regardless of how large, become multiplicative constants when the logarithm is taken, and thus $\ln(x^p)$ ($p > 0$), even for large $p$, still grows very slowly.

Observation. If $p, q > 0$, then $\lim_{x \to \infty} \frac{\ln(x^p)}{x^q} = 0$.
Problem. Find $\lim_{x \to 0} \frac{\left(e^{x^3} - 1 - x^3\right)\sin(x^2)}{\cos(x^4) - 1}$.

This is one limit that we do not wish to evaluate using L'Hôpital's rule. We are not stuck, either, as the next section presents techniques that can be used to evaluate limits without the aid of L'Hôpital's rule.
Chapter 7

Taylor Polynomials and Taylor's Theorem

We conclude the course with special objects called Taylor polynomials. Some functions can be approximated very well using these Taylor polynomials, and so limits involving quotients of these functions, such as the one that appears at the end of the last section, are very easy to evaluate.
7.1 Taylor Polynomials
Question: We know that if $f(x)$ is differentiable at $x = a$, then $f(x) \approx L_a(x) := f(a) + f'(a)(x - a)$ for $x$ near $a$. Given an $x$, how large is the error $|f(x) - L_a(x)|$? What factors affect the error?

We know that there are at least two factors affecting the error: the magnitude $|x - a|$ (i.e., the distance from $a$ to $x$) and the sharpness of the curvature of the graph near $x = a$ (i.e., the size of $|f''(x)|$).

$L_a(x)$ has these two important properties:
1. $f(a) = L_a(a)$, and
2. $f'(a) = L_a'(a)$.

It is easy to see that those two properties define the linear approximation; that is, $L_a(x) = f(a) + f'(a)(x - a)$ is the only polynomial of degree one or less with those two properties. It does seem strange, however, that we want to approximate generic curves with straight lines when we could possibly do better with parabolas or higher degree polynomials, which, being curves, should approximate other curves better. What properties do we want from such polynomials? We want them to satisfy a set of properties generalized from the two properties that $L_a(x)$ satisfies. Such a polynomial $P_{n,a}(x) := c_0 + c_1(x - a) + c_2(x - a)^2 + \cdots + c_n(x - a)^n$ should satisfy these properties:
$$\begin{aligned}
P_{n,a}(a) &= f(a), \\
P'_{n,a}(a) &= f'(a), \\
P''_{n,a}(a) &= f''(a), \\
P'''_{n,a}(a) &= f'''(a), \\
&\;\,\vdots \\
P^{(n)}_{n,a}(a) &= f^{(n)}(a).
\end{aligned} \tag{7.1}$$
Does such a polynomial exist?
For n = 1, the answer is yes: we have seen that $L_a(x)$ is the desired polynomial. For n = 2, we require $P_{2,a}(x) := c_0 + c_1(x-a) + c_2(x-a)^2$ to be such that $P_{2,a}(a) = c_0 = f(a)$, $P'_{2,a}(a) = c_1 + 2c_2(a-a) = c_1 = f'(a)$, and $P''_{2,a}(a) = 2c_2 = f''(a)$, so that $c_2 = \frac{f''(a)}{2}$. This means that
$$P_{2,a}(x) := f(a) + f'(a)(x-a) + \frac{f''(a)}{2}(x-a)^2$$
is the only candidate polynomial of degree two or less to satisfy the equations listed in (7.1). On the other hand, one can easily verify that $P_{2,a}(x)$ defined above does indeed satisfy (7.1); therefore, $P_{2,a}(x)$ is the unique polynomial of degree two or less with the properties in (7.1).

In general,
$$P_{n,a}(x) := \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x-a)^k$$
is the unique polynomial of degree at most n such that the equalities in (7.1) all hold. We make the following definition.
Definition 7.1.1. Suppose that $f : S \to \mathbb{R}$ where $S \subseteq \mathbb{R}$, and that $I \subseteq S$ is an open interval containing some point $a \in S$. Suppose also that f(x) is n-times differentiable at x = a. Then the nth degree Taylor polynomial of f(x) centered at x = a is the polynomial $P_{n,a} : \mathbb{R} \to \mathbb{R}$ defined by
$$P_{n,a}(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x-a)^k = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f'''(a)}{3!}(x-a)^3 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n.$$
Note that $P_{1,a}(x) = L_a(x)$ is the linear approximation for f(x) centered at x = a.
Example 7.1. Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = e^x$. Then f(x) is n-times differentiable at x = 0 for any $n \in \mathbb{N}$. Thus the nth degree Taylor polynomials of f(x) centered at x = 0 for n = 0, 1, … are
$$\begin{aligned}
P_{0,0}(x) &= f(0) = 1, \\
P_{1,0}(x) &= f(0) + f'(0)(x-0) = 1 + x, \\
P_{2,0}(x) &= f(0) + f'(0)(x-0) + \frac{f''(0)}{2}(x-0)^2 = 1 + x + \frac{x^2}{2}, \\
&\;\,\vdots \\
P_{n,0}(x) &= f(0) + f'(0)(x-0) + \frac{f''(0)}{2}(x-0)^2 + \cdots + \frac{f^{(n)}(0)}{n!}(x-0)^n = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!}.
\end{aligned}$$
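These Taylor polynomials are straightforward to evaluate numerically. The sketch below (the function name `taylor_exp` and the sample point x = 0.5 are our own choices, not from the text) sums $\sum_{k=0}^{n} x^k/k!$ and compares against $e^x$:

```python
import math

def taylor_exp(x, n):
    """nth degree Taylor polynomial of e^x centered at 0: sum of x^k/k!."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

# The approximation error at x = 0.5 shrinks rapidly as n grows:
for n in range(6):
    print(n, abs(math.exp(0.5) - taylor_exp(0.5, n)))
assert abs(math.exp(0.5) - taylor_exp(0.5, 5)) < 1e-4
```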
7.2 Taylor's Theorem
Problem. If we use Taylor polynomials (centered at x = a) of a function f(x) to approximate f(x) near x = a, how large is the error?

Definition 7.2.1. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $I \subseteq S$ is an open interval containing some point $a \in S$. Suppose also that f(x) is n-times differentiable at x = a. Let $P_{n,a}(x)$ be the nth degree Taylor polynomial of f(x) centered at x = a. Then the nth degree Taylor remainder of f(x) centered at x = a is the function $R_{n,a} : \mathbb{R} \to \mathbb{R}$ defined by
$$R_{n,a}(x) = f(x) - P_{n,a}(x).$$

The quantity $|R_{n,a}(x)| = |f(x) - P_{n,a}(x)|$ is the error in using $P_{n,a}(x)$ to approximate f(x). It turns out that there is something to say about the Taylor remainder.
Theorem 7.2.2 [Taylor's Theorem]. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $I \subseteq S$ is an open interval containing some $a \in S$. Suppose also that f(x) is (n+1)-times differentiable on I. Let $R_{n,a}(x)$ be the nth Taylor remainder of f(x) centered at x = a. For each $x \in I$, there exists some $c := c_x \in I$ (strictly) between x and a such that
$$R_{n,a}(x) := f(x) - P_{n,a}(x) = \frac{f^{(n+1)}(c)}{(n+1)!}(x-a)^{n+1}.$$
Proof. Let $x \in I$ be such that $x \ne a$. Then there exists an $M$ such that
$$R_{n,a}(x) = f(x) - P_{n,a}(x) = M(x-a)^{n+1}.$$
Let
$$F(t) = f(t) + f'(t)(x-t) + \frac{f''(t)}{2!}(x-t)^2 + \cdots + \frac{f^{(n)}(t)}{n!}(x-t)^n + M(x-t)^{n+1}.$$
Notice that $F(x) = f(x) = F(a)$. By the MVT, there exists some $c$ between $x$ and $a$ such that $F'(c) = 0$.

We have that
$$\frac{d}{dt}\left(\frac{f^{(k)}(t)}{k!}(x-t)^k\right) = -\frac{f^{(k)}(t)}{(k-1)!}(x-t)^{k-1} + \frac{f^{(k+1)}(t)}{k!}(x-t)^k.$$
It follows that
$$F'(t) = \frac{f^{(n+1)}(t)}{n!}(x-t)^n - M(n+1)(x-t)^n.$$
This means that
$$0 = F'(c) = \frac{f^{(n+1)}(c)}{n!}(x-c)^n - M(n+1)(x-c)^n.$$
Solving for $M$ yields
$$M = \frac{f^{(n+1)}(c)}{(n+1)!},$$
exactly as desired.
Example 7.2. Let $f(x) = e^x$, defined on $\mathbb{R}$. Then f(x) is n-times differentiable on any open interval containing 0, for all $n \in \mathbb{N}$. Recall that $P_{n,0}(x) = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!}$. Then
$$|R_{n,0}(x)| = \left|e^x - \left(1 + x + \frac{x^2}{2!} + \cdots + \frac{x^n}{n!}\right)\right| = \left|\frac{f^{(n+1)}(c_x)}{(n+1)!}x^{n+1}\right|,$$
for some $c_x$ strictly between x and 0.

Let n = 1 and x = 0.01. Now $e^{0.01} \approx P_{1,0}(0.01) = L_0(0.01) = 1 + 0.01 = 1.01$. But how large is the error? Taylor's theorem shows that
$$\left|e^{0.01} - P_{1,0}(0.01)\right| = \left|\frac{f''(c)}{2!}(0.01)^2\right|$$
for some c satisfying 0 < c < 0.01. We know that $0 \le f''(c) = e^c \le e < 3$, and therefore
$$\left|e^{0.01} - P_{1,0}(0.01)\right| \le \left|\frac{3}{2}(10^{-4})\right| = \frac{3}{2}(10^{-4}).$$
How can we improve our accuracy of estimation? We increase n. Doing so will make the denominator (n + 1)! rather large, thereby shrinking the error. Using $P_{4,0}(x)$ instead, we find that
$$e^{0.01} \approx 1 + 0.01 + \frac{(0.01)^2}{2!} + \frac{(0.01)^3}{3!} + \frac{(0.01)^4}{4!}$$
with an error of at most
$$\left|\frac{f^{(5)}(c)}{5!}(0.01)^5\right| = \left|\frac{e^c}{12}(10^{-11})\right| < \left|\frac{3}{12}(10^{-11})\right| = \frac{1}{4}(10^{-11}).$$
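A quick numerical check (our own, not part of the example) confirms that the actual errors sit comfortably inside the Taylor bounds computed above:

```python
import math

p1 = 1 + 0.01                                            # P_{1,0}(0.01)
p4 = sum(0.01**k / math.factorial(k) for k in range(5))  # P_{4,0}(0.01)

err1 = abs(math.exp(0.01) - p1)
err4 = abs(math.exp(0.01) - p4)
print(err1, err4)
assert err1 <= 1.5e-4      # Taylor bound: (3/2) * 10^-4
assert err4 <= 0.25e-11    # Taylor bound: (1/4) * 10^-11
```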
Observation. For any $x \in \mathbb{R}$, one can check that $\lim_{n\to\infty} \frac{|x|^n}{n!} = 0$. Note that for any $x \in \mathbb{R}$,
$$|e^x - P_{n,0}(x)| = \left|\frac{f^{(n+1)}(c_x)}{(n+1)!}x^{n+1}\right|$$
for some $c_x$ between x and 0. As such, $0 \le e^{c_x} \le e^{|x|}$ for any $x \in \mathbb{R}$. Now let $x_0$ be any real number. We have that
$$0 \le |e^{x_0} - P_{n,0}(x_0)| = \left|\frac{f^{(n+1)}(c_{x_0})}{(n+1)!}x_0^{n+1}\right| \le \frac{e^{c}}{(n+1)!}|x_0|^{n+1} \le e^{|x_0|} \cdot \frac{|x_0|^{n+1}}{(n+1)!},$$
where $c := c_{x_0}$ is between $x_0$ and 0. Since $\lim_{n\to\infty} 0 = 0 = \lim_{n\to\infty} e^{|x_0|} \cdot \frac{|x_0|^{n+1}}{(n+1)!}$, the squeeze theorem shows that
$$\lim_{n\to\infty} |e^{x_0} - P_{n,0}(x_0)| = 0,$$
whence by Remark 3.1.3,
$$\lim_{n\to\infty} P_{n,0}(x_0) = \lim_{n\to\infty} \sum_{k=0}^{n} \frac{x_0^k}{k!} := \sum_{k=0}^{\infty} \frac{x_0^k}{k!} = e^{x_0}.$$
This is true for any $x_0 \in \mathbb{R}$, and so for all $x \in \mathbb{R}$,
$$e^x = \sum_{k=0}^{\infty} \frac{x^k}{k!}.$$
Note. We have shown that the information encoded in the behaviour of the function $e^x$ at a single point x = 0 describes the entire function $e^x$ globally. This is a very special circumstance.
Example 7.3. Let $f : \mathbb{R} \to \mathbb{R}$ be defined by
$$f(x) = \begin{cases} e^{-1/x^2} & \text{if } x \ne 0, \\ 0 & \text{if } x = 0. \end{cases}$$
In this case, we can verify that f(x) is infinitely differentiable on any open interval containing x = 0 and that $f(0) = 0$, $f'(0) = 0$, $f''(0) = 0$, …; i.e., $f^{(n)}(0) = 0$ for all $n \in \mathbb{N} \cup \{0\}$. This shows that its nth degree Taylor polynomial centered at x = 0 is always the zero polynomial. Now, $f(x) \ne 0$ for every $x \ne 0$, and so our Taylor polynomials have absolutely failed to give us any information about f(x) whatsoever outside the point x = 0. This is the worst case scenario, where the Taylor polynomials of f(x) centered at x = 0, constructed from local, instantaneous information, give us no information about f(x) globally.
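A numerical sketch makes the flatness vivid: even after dividing by a high power of x, the values of f near 0 remain astronomically small (the sample point x = 0.1 and the exponent 10 are our own choices):

```python
import math

def f(x):
    # The flat function from Example 7.3.
    return math.exp(-1.0 / x**2) if x != 0 else 0.0

# f vanishes at 0 faster than any power of x: f(x)/x^n is still tiny.
x = 0.1
print(f(x), f(x) / x**10)
assert f(x) / x**10 < 1e-30   # consistent with every Taylor coefficient at 0 being 0
```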
Example 7.4. Let $f : \mathbb{R} \to \mathbb{R}$ be defined by $f(x) = \sin(x)$. We have that $P_{0,0}(x) = \sin(0) = 0$ and $P_{1,0}(x) = f(0) + f'(0)(x - 0) = x$. Since $\sin(x)$ is infinitely differentiable on any open interval containing 0, we have by Taylor's theorem that for all $x \in \mathbb{R}$,
$$|\sin(x) - P_{1,0}(x)| = |\sin(x) - x| = \left|\frac{f''(c_x)}{2!}x^2\right| = \left|\frac{-\sin(c_x)}{2}x^2\right| \le \frac{x^2}{2},$$
for some $c_x$ between x and 0. Consider the quantity $u^4$ for any $u \in \mathbb{R}$. Then for any $u \in \mathbb{R}$, we have that
$$|\sin(u^4) - u^4| \le \frac{(u^4)^2}{2} = \frac{u^8}{2}.$$
For values of u close to 0, we can see that this error term is very small.

Can we do any better than $\frac{u^8}{2}$?
Observation. Note that if $f(x) = \sin(x)$, then $f'(x) = \cos(x)$, $f''(x) = -\sin(x)$, $f'''(x) = -\cos(x)$, and $f^{(4)}(x) = \sin(x)$. In general, we can show by induction that
$$f^{(k)}(x) = \begin{cases} \sin(x) & \text{if } k \equiv 0 \pmod 4, \\ \cos(x) & \text{if } k \equiv 1 \pmod 4, \\ -\sin(x) & \text{if } k \equiv 2 \pmod 4, \\ -\cos(x) & \text{if } k \equiv 3 \pmod 4, \end{cases}$$
for all $k \in \mathbb{N}$. We now have, therefore, $P_{0,0}(x) = 0$, $P_{1,0}(x) = x$, $P_{2,0}(x) = x$, $P_{3,0}(x) = x - \frac{x^3}{3!}$, $P_{4,0}(x) = x - \frac{x^3}{3!}$, and $P_{5,0}(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!}$, and so on. Returning to our example above, we can now interpret the quantity $|\sin(x) - x|$ as $|\sin(x) - P_{2,0}(x)|$, and not $|\sin(x) - P_{1,0}(x)|$. Hence the Taylor remainder now becomes
$$|\sin(x) - P_{2,0}(x)| = |\sin(x) - x| = \left|\frac{f'''(c_x)}{3!}x^3\right| = \left|\frac{-\cos(c_x)}{6}x^3\right| \le \frac{|x|^3}{6},$$
for some $c_x$ between x and 0. Considering the quantity $u^4$ for $u \in \mathbb{R}$ again, we now have
$$|\sin(u^4) - u^4| \le \frac{u^{12}}{6}.$$
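A quick check at a few sample points of our own choosing confirms the bound $|\sin(u^4) - u^4| \le u^{12}/6$ numerically:

```python
import math

# The improved remainder bound from the observation above, tested
# at a few sample values of u.
for u in [0.3, 0.5, 0.9]:
    lhs = abs(math.sin(u**4) - u**4)
    rhs = u**12 / 6
    print(u, lhs, rhs)
    assert lhs <= rhs
```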
7.3 Big-O
Definition 7.3.1. Suppose that $f, g : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that I is an open interval containing some $a \in \mathbb{R}$ such that $I \setminus \{a\} \subseteq S$. We say that f(x) is Big-O of g(x) as $x \to a$ (notation: $f(x) = O(g(x))$ as $x \to a$, or simply $f(x) = O(g(x))$ if a is understood) if there exist an $\varepsilon > 0$ and an $M > 0$ such that
$$|f(x)| \le M|g(x)|$$
for all $x \in (a - \varepsilon, a + \varepsilon) \setminus \{a\}$.

If f(x) is Big-O of g(x) as $x \to a$, then f(x) has order of magnitude that is less than or equal to that of g(x) near x = a. In our applications below, we will always use a = 0 and $g(x) = x^n$ for some $n \in \mathbb{N}$.
Remark 7.3.2. Suppose $f(x) = O(x^n)$ for some $n \in \mathbb{N}$. This implies that
$$-M|x^n| \le f(x) \le M|x^n|$$
on $(-\varepsilon, \varepsilon)$, except possibly at x = 0. Since $\lim_{x\to 0} -M|x^n| = 0 = \lim_{x\to 0} M|x^n|$, the squeeze theorem for functions shows that $\lim_{x\to 0} f(x) = 0$. That is, every function that is Big-O of $x^n$ as $x \to 0$ (for some $n \in \mathbb{N}$) converges to 0 as $x \to 0$. We denote this fact as
$$\lim_{x\to 0} O(x^n) = 0,$$
for all $n \in \mathbb{N}$.
Definition 7.3.3. Suppose that $f, g : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that I is an open interval containing some $a \in \mathbb{R}$ such that $I \setminus \{a\} \subseteq S$. We write
$$f(x) = g(x) + O(h(x)) \quad \text{as } x \to a$$
if
$$f(x) - g(x) = O(h(x)) \quad \text{as } x \to a.$$
We may drop the "as $x \to a$" part if a is understood.
The notation $f(x) = g(x) + O(h(x))$ makes clearer the fact that $f(x) \approx g(x)$ near x = a with an error that is of order of magnitude at most h(x).
Example 7.5. Consider $f(x) = \sin(x)$ (defined on all of $\mathbb{R}$) again. From Taylor's theorem, we get that if $x \in [-1, 1]$, there exists some $c_x$ between x and 0 such that
$$|\sin(x) - P_{1,0}(x)| = |\sin(x) - x| = \left|\frac{f''(c_x)}{2!}x^2\right| = \left|\frac{-\sin(c_x)}{2}x^2\right| \le \frac{1}{2}|x^2|.$$
Hence $\sin(x) - x = O(x^2)$, so that $\sin(x) = x + O(x^2)$. In fact, as before, we can interpret x as $P_{2,0}(x)$, and not $P_{1,0}(x)$, to get that $\sin(x) = x + O(x^3)$. This is a stronger statement because $x^3$ is an order of magnitude less than $x^2$ near x = 0.
Theorem 7.3.4. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $[-1, 1] \subseteq J \subseteq S$, where J is an open interval. If f(x) is (n+1)-times differentiable on J and $f^{(n+1)}(x)$ is continuous on $[-1, 1]$, then $f(x) = P_{n,0}(x) + O(x^{n+1})$ as $x \to 0$.

Proof. By the extreme value theorem, $f^{(n+1)}(x)$ is bounded on $[-1, 1]$; let M be chosen so that $|f^{(n+1)}(x)| \le M$ for all $x \in [-1, 1]$. Taylor's theorem implies that for any $x \in [-1, 1]$, there exists a $c_x$ between x and 0 so that
$$|f(x) - P_{n,0}(x)| = \left|\frac{f^{(n+1)}(c_x)}{(n+1)!}x^{n+1}\right| \le \left|\frac{M}{(n+1)!}x^{n+1}\right| = \frac{M}{(n+1)!}|x^{n+1}|.$$
This shows that $f(x) - P_{n,0}(x) = O(x^{n+1})$ as $x \to 0$ and the result follows.
Theorem 7.3.5 [Arithmetic of Big-O]. Suppose that $f, g : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that an open interval $I \subseteq S$ contains 0. Assume that $f(x) = O(x^n)$ and $g(x) = O(x^m)$ as $x \to 0$, for some $m, n \in \mathbb{N}$. Then we have the following.
1. $c(O(x^n)) = O(x^n)$. That is, $(cf)(x) := c \cdot f(x) = O(x^n)$.
2. $O(x^n) \pm O(x^m) = O(x^k)$, where $k := \min\{n, m\}$. That is, $f(x) \pm g(x) = O(x^k)$.
3. $O(x^n)O(x^m) = O(x^{n+m})$. That is, $f(x)g(x) = O(x^{n+m})$.
4. If $k \le n$, then $f(x) = O(x^k)$.
5. If $k \le n$, then $\frac{1}{x^k}O(x^n) = O(x^{n-k})$. That is, $\frac{f(x)}{x^k} = O(x^{n-k})$.
6. $f(u^k) = O(u^{kn})$. That is, we can simply substitute $x = u^k$. (Note: $k \in \mathbb{N}$.)

Proof. This proof is left as an exercise.
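Rule 3 can be illustrated numerically: since $\sin(x) = O(x)$ and $e^x - 1 = O(x)$ as $x \to 0$, the product should be $O(x^2)$. The constant M = 2 below is a sample choice of our own that happens to work at these points:

```python
import math

# Rule 3 in action: |sin(x) * (e^x - 1)| <= M * x^2 near 0, with M = 2
# as a sample constant (any M slightly larger than 1 works near 0).
for x in [0.1, 0.01, 0.001]:
    assert abs(math.sin(x) * (math.exp(x) - 1)) <= 2 * x**2
    print(x, math.sin(x) * (math.exp(x) - 1) / x**2)
```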
It would be nice if some kind of converse to Theorem 7.3.4 were true. That is, if p(x) is a polynomial of degree n or less, and $f(x) = p(x) + O(x^{n+1})$, must $p(x) = P_{n,0}(x)$?
Lemma 7.3.6. If p(x) is a polynomial with degree n or less ($n \in \mathbb{N} \cup \{0\}$), and $p(x) = O(x^{n+1})$, then $p(x) = 0$ identically.

Proof. Let Q(n) denote the statement that

if p(x) is a polynomial with degree n or less, and $p(x) = O(x^{n+1})$, then $p(x) = 0$ identically.

We will proceed by induction to show that Q(n) is true for all $n \in \mathbb{N} \cup \{0\}$. For n = 0, $p(x) = c_0 = O(x)$ would imply that $c_0 = \lim_{x\to 0} p(x) = 0$ by continuity of polynomials and Remark 7.3.2. Hence $p(x) = 0$ identically.

For n = 1, $p(x) = c_0 + c_1 x = O(x^2)$ would imply that $c_0 + 0 = \lim_{x\to 0} p(x) = 0$, again by continuity and Remark 7.3.2. Dividing $p(x) = c_1 x$ by x, we obtain $q(x) := \frac{p(x)}{x} = c_1 = O(x)$ (for all $x \ne 0$) by arithmetic of Big-O. We extend q(x) continuously to x = 0 by defining $q(0) = c_1$. Now q(x) is a polynomial of degree zero and is Big-O of x, whence $q(x) = 0$ identically by the case n = 0. Hence $q(0) = c_1 = 0$ and so $p(x) = c_0 + c_1 x = 0$ identically.

Suppose Q(k) is true for some $k \ge 1$. Then let
$$p(x) := c_0 + c_1 x + c_2 x^2 + \cdots + c_k x^k + c_{k+1} x^{k+1}$$
be any polynomial with degree k + 1 or less such that $p(x) = O(x^{k+2})$. Then we have that
$$c_0 + 0 + 0 + \cdots + 0 + 0 = \lim_{x\to 0} p(x) = 0$$
by continuity and Remark 7.3.2. We divide p(x) by x to obtain, for all $x \ne 0$,
$$q(x) := \frac{p(x)}{x} = c_1 + c_2 x + c_3 x^2 + \cdots + c_k x^{k-1} + c_{k+1} x^k = O(x^{k+2-1}) = O(x^{k+1})$$
by arithmetic of Big-O. Since $\lim_{x\to 0} q(x) = c_1$, we can extend q(x) continuously to x = 0 by defining $q(0) = c_1$. Now q(x) is a polynomial of degree k or less and is Big-O of $x^{k+1}$, whence by the inductive hypothesis, $q(x) = 0$ identically. Hence $p(x) = x \cdot q(x) = 0$ for all $x \ne 0$ and $p(0) = 0$ together prove Q(k + 1).

By the principle of mathematical induction, Q(n) is true for all $n \in \mathbb{N} \cup \{0\}$.
Theorem 7.3.7. Suppose that $f : S \to \mathbb{R}$, where $S \subseteq \mathbb{R}$, and that $[-1, 1] \subseteq J \subseteq S$, where J is an open interval. Assume that f(x) is (n+1)-times differentiable on J and $f^{(n+1)}(x)$ is continuous on $[-1, 1]$. If p(x) is a polynomial of degree n or less with
$$f(x) = p(x) + O(x^{n+1}),$$
then $p(x) = P_{n,0}(x)$.

Proof. First note that by definition,
$$f(x) - p(x) = O(x^{n+1}).$$
Theorem 7.3.4 shows that $f(x) = P_{n,0}(x) + O(x^{n+1})$, and so
$$f(x) - P_{n,0}(x) = O(x^{n+1}).$$
By arithmetic of Big-O, we have that
$$[f(x) - P_{n,0}(x)] - [f(x) - p(x)] = O(x^{n+1})$$
(since $n + 1 = \min\{n + 1, n + 1\}$). But
$$[f(x) - P_{n,0}(x)] - [f(x) - p(x)] = p(x) - P_{n,0}(x),$$
and so
$$p(x) - P_{n,0}(x) = O(x^{n+1}),$$
whence Lemma 7.3.6 shows that $p(x) - P_{n,0}(x) = 0$ identically and the result follows.
Problem. Let $f : \mathbb{R} \to \mathbb{R}$ be defined by
$$f(x) = x^2(e^x - 1)\sin(x^2).$$
Find $f^{(4)}(0)$ and $f^{(5)}(0)$.

Whereas a section ago we would be horrified by the prospect of finding the fourth and fifth derivatives of f(x) at x = 0, Theorem 7.3.7 and a bit of Big-O arithmetic make this problem easy.
Solution. First, observe that $\sin(u) = u + O(u^3)$, and so $\sin(x^2) = x^2 + O(x^6)$ by arithmetic of Big-O. Next, observe that $e^x = 1 + x + O(x^2)$, and so $e^x - 1 = x + O(x^2)$ by arithmetic of Big-O. We now compute using Big-O arithmetic:
$$\begin{aligned}
f(x) = x^2(e^x - 1)\sin(x^2) &= x^2(x + O(x^2))(x^2 + O(x^6)) \\
&= (x^3 + O(x^4))(x^2 + O(x^6)) = x^5 + O(x^9) + O(x^6) + O(x^{10}) \\
&= x^5 + O(x^6).
\end{aligned}$$
Theorem 7.3.7 shows that $x^5 = P_{5,0}(x)$, where
$$P_{5,0}(x) := f(0) + f'(0)x + \frac{f''(0)}{2!}x^2 + \frac{f'''(0)}{3!}x^3 + \frac{f^{(4)}(0)}{4!}x^4 + \frac{f^{(5)}(0)}{5!}x^5.$$
Matching coefficients, we get that $0 = \frac{f^{(4)}(0)}{4!}$ and $1 = \frac{f^{(5)}(0)}{5!}$, whence $f^{(4)}(0) = 0$ and $f^{(5)}(0) = 5! = 120$.
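As a numerical sanity check on this conclusion (our own, not from the text): since $f(x) = x^5 + O(x^6)$, the ratio $f(x)/x^5$ should approach 1 as $x \to 0$, consistent with $f^{(5)}(0)/5! = 1$:

```python
import math

def f(x):
    return x**2 * (math.exp(x) - 1) * math.sin(x**2)

# f(x) = x^5 + O(x^6) predicts f(x)/x^5 -> 1 as x -> 0.
for x in [0.1, 0.01, 0.001]:
    print(x, f(x) / x**5)
assert abs(f(0.001) / 0.001**5 - 1) < 1e-2
```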
We can finally tackle problems of the following type.

Problem. Evaluate
$$\lim_{x\to 0} \frac{x^2 \sin(x^2)(e^x - 1)}{(\cos(x) - 1)(\sin^2(x))(\sin(2x))}.$$
Solution. Observe that $\cos(x) = 1 - \frac{x^2}{2} + O(x^4)$, and so $\cos(x) - 1 = -\frac{x^2}{2} + O(x^4)$; $\sin(u) = u + O(u^3)$, and so $\sin(2x) = 2x + O(x^3)$ and $\sin^2(x) = (x + O(x^3))(x + O(x^3)) = x^2 + O(x^4) + O(x^6) = x^2 + O(x^4)$. Putting this together, we have
$$\begin{aligned}
(\cos(x) - 1)(\sin^2(x))(\sin(2x)) &= \left(-\frac{x^2}{2} + O(x^4)\right)(x^2 + O(x^4))(2x + O(x^3)) \\
&= \left(-\frac{x^4}{2} + O(x^6) + O(x^8)\right)(2x + O(x^3)) \\
&= \left(-\frac{x^4}{2} + O(x^6)\right)(2x + O(x^3)) \\
&= -x^5 + O(x^7) + O(x^9) \\
&= -x^5 + O(x^7).
\end{aligned}$$
Combining this with what we had from the last problem, namely
$$x^2(e^x - 1)\sin(x^2) = x^5 + O(x^6),$$
we have by Remark 7.3.2 that
$$\lim_{x\to 0} \frac{x^2 \sin(x^2)(e^x - 1)}{(\cos(x) - 1)(\sin^2(x))(\sin(2x))} = \lim_{x\to 0} \frac{x^5 + O(x^6)}{-x^5 + O(x^7)} = \lim_{x\to 0} \frac{1 + O(x)}{-1 + O(x^2)} = \frac{1 + 0}{-1 + 0} = -1.$$
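As a sanity check on the sign, evaluating the quotient at small sample values of x (points of our own choosing) shows values near −1, since $\cos(x) - 1 < 0$ near 0:

```python
import math

def ratio(x):
    num = x**2 * math.sin(x**2) * (math.exp(x) - 1)
    den = (math.cos(x) - 1) * math.sin(x)**2 * math.sin(2 * x)
    return num / den

# The quotient approaches -1 as x -> 0.
for x in [0.1, 0.01]:
    print(x, ratio(x))
assert abs(ratio(0.01) + 1) < 0.02
```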