CH 4

Chapter 4: Differentiation
Marianito R. Rodrigo
1 Differentiability and the derivative
Let us recall how we defined the derivative in elementary calculus. Suppose that $f : E \to \mathbb{R}$, where $E \subset \mathbb{R}$ is an open interval containing $x_0$. We defined the derivative of $f$ at $x_0$ to be the real number
$$f'(x_0) = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h}, \tag{1}$$
provided the limit on the right-hand side exists. Now suppose that $f : E \to \mathbb{R}^m$, where $E \subset \mathbb{R}^n$ is now an open set containing $x_0$. Then we cannot extend (1) straightforwardly, since $h$ would necessarily be a vector in $\mathbb{R}^n$ for $x_0 + h$ to make sense. But then we would be dividing by the vector $h$, which is undefined when $n > 1$. One way around this problem is to reformulate the definition of the derivative in the single-variable case.
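Define a linear function $T : \mathbb{R} \to \mathbb{R}$ such that
$$T(h) = f'(x_0)h$$
for every $h \in \mathbb{R}$. Then (1) is equivalent to
$$\lim_{h \to 0} \frac{f(x_0 + h) - f(x_0) - T(h)}{h} = 0$$
or
$$\lim_{|h| \to 0} \frac{|f(x_0 + h) - f(x_0) - T(h)|}{|h|} = 0. \tag{2}$$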
Viewed in this way, we can alternatively say that a real-valued function of a real variable is differentiable at $x_0$ if we can find a linear function $T : \mathbb{R} \to \mathbb{R}$ such that (2) holds. The key point is that (2) makes sense even when $f$ is a vector-valued function of a vector variable. In that case, however, $T$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ rather than a linear function from $\mathbb{R}$ to $\mathbb{R}$.
Definition 1.1. Let $f : E \to \mathbb{R}^m$, where $E \subset \mathbb{R}^n$ is an open set containing $x_0$. Then $f$ is said to be differentiable at $x_0$ if there exists a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ such that
$$\lim_{|h| \to 0} \frac{|f(x_0 + h) - f(x_0) - T(h)|}{|h|} = 0. \tag{3}$$
We say that $f$ is differentiable on $E$ if $f$ is differentiable at $x_0$ for every $x_0 \in E$.
Some remarks are in order here. Firstly, in (3) the norm in the numerator is taken in $\mathbb{R}^m$, while the norm in the denominator is taken in $\mathbb{R}^n$. Secondly, $x_0$ is an interior point of $E$ since $E$ is open. This implies that there exists $r > 0$ such that $B(x_0; r) \subset E$. If $h \in \mathbb{R}^n$ is taken in such a way that $0 < |h| < r$, then
$$|(x_0 + h) - x_0| = |h| < r,$$
i.e., $x_0 + h \in B(x_0; r) \subset E$. Therefore $f$ is always defined at $x_0 + h$ provided that $|h|$ is small but positive.
The next proposition asserts the uniqueness of the linear transformation satisfying (3).

Proposition 1.2. Let $f : E \to \mathbb{R}^m$, where $E \subset \mathbb{R}^n$ is an open set containing $x_0$. Suppose that there exist linear transformations $S : \mathbb{R}^n \to \mathbb{R}^m$ and $T : \mathbb{R}^n \to \mathbb{R}^m$ such that
$$\lim_{|h| \to 0} \frac{|f(x_0 + h) - f(x_0) - S(h)|}{|h|} = 0$$
and
$$\lim_{|h| \to 0} \frac{|f(x_0 + h) - f(x_0) - T(h)|}{|h|} = 0.$$
Then $S = T$.
Proof. For $h \in \mathbb{R}^n \setminus \{0\}$, we have by the Triangle Inequality that
$$0 \le \frac{|S(h) - T(h)|}{|h|} = \frac{|S(h) - [f(x_0 + h) - f(x_0)] + [f(x_0 + h) - f(x_0)] - T(h)|}{|h|} \le \frac{|f(x_0 + h) - f(x_0) - S(h)|}{|h|} + \frac{|f(x_0 + h) - f(x_0) - T(h)|}{|h|}.$$
Taking the limit as $|h| \to 0$ and using the hypotheses on $S$ and $T$, we obtain
$$\lim_{|h| \to 0} \frac{|S(h) - T(h)|}{|h|} = 0.$$
Now fix $x \in \mathbb{R}^n \setminus \{0\}$ and let $h = tx$, where $t \in \mathbb{R} \setminus \{0\}$. Then $|h| = |tx| = |t||x|$, and therefore $|h| \to 0$ is equivalent to $t \to 0$. Thus, using the linearity of $S$ and $T$,
$$0 = \lim_{t \to 0} \frac{|S(tx) - T(tx)|}{|tx|} = \lim_{t \to 0} \frac{|t||S(x) - T(x)|}{|t||x|} = \lim_{t \to 0} \frac{|S(x) - T(x)|}{|x|} = \frac{|S(x) - T(x)|}{|x|},$$
where the last equality holds since the expression inside the limit is independent of $t$. Then $|S(x) - T(x)| = 0$, or $S(x) = T(x)$, for every $x \in \mathbb{R}^n \setminus \{0\}$. Moreover, $S(0) = 0 = T(0)$ since $S$ and $T$ are linear transformations. This proves that $S(x) = T(x)$ for every $x \in \mathbb{R}^n$, i.e., $S = T$.
Due to this uniqueness result, we may denote $T$ by $Df(x_0)$ and call it the derivative of $f$ at $x_0$. We denote the matrix of the linear transformation $Df(x_0)$ by $f'(x_0)$, and call it the Jacobian matrix of $f$ at $x_0$. This is also sometimes denoted by $Jf(x_0)$. We reiterate that $Df(x_0)$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$, while $f'(x_0)$ is an $m \times n$ matrix. When $m = n = 1$, $f'(x_0)$ is a $1 \times 1$ matrix whose single entry is the number denoted by $f'(x_0)$ in elementary calculus.
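Because the $j$th column of $f'(x_0)$ is the vector $Df(x_0)(e_j)$, the Jacobian matrix can be approximated numerically one column at a time. The Python sketch below is not part of the original notes: the helper name `jacobian_fd`, the central-difference scheme, and the step $h = 10^{-6}$ are illustrative assumptions, and the result only approximates $f'(x_0)$ when $f$ is differentiable at $x_0$. For a linear map $x \mapsto Ax$ (compare Example 1.4 below) it should return $A$ up to rounding error.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def jacobian_fd(f: Callable[[Sequence[float]], List[float]],
                x0: Sequence[float], h: float = 1e-6) -> Matrix:
    """Approximate the m x n Jacobian matrix f'(x0) by central differences.

    Column j approximates Df(x0)(e_j), the image of the j-th standard basis vector.
    """
    n = len(x0)
    m = len(f(x0))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x0)
        xm = list(x0)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


# A linear map f(x) = A x from R^3 to R^2; its Jacobian matrix is A itself.
A = [[1.0, 2.0, 3.0],
     [0.0, -1.0, 4.0]]


def linear_map(x: Sequence[float]) -> List[float]:
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


J = jacobian_fd(linear_map, [0.5, -1.0, 2.0])
print(J)  # approximately [[1, 2, 3], [0, -1, 4]]
```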
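Example 1.3. Fix $c \in \mathbb{R}^m$. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be defined by $f(x) = c$ for every $x \in \mathbb{R}^n$. Define the linear transformation $T(h) = 0 \in \mathbb{R}^m$ for every $h \in \mathbb{R}^n$. Then
$$\lim_{|h| \to 0} \frac{|f(x_0 + h) - f(x_0) - T(h)|}{|h|} = \lim_{|h| \to 0} \frac{|c - c - 0|}{|h|} = 0.$$
By uniqueness of the derivative, $Df(x_0)(h) = 0$ for every $h \in \mathbb{R}^n$.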
Example 1.4. Let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation. Define $T$ such that $T(h) = f(h)$ for every $h \in \mathbb{R}^n$. It follows that $T$ is also a linear transformation. Then
$$\lim_{|h| \to 0} \frac{|f(x_0 + h) - f(x_0) - T(h)|}{|h|} = \lim_{|h| \to 0} \frac{|f(x_0) + f(h) - f(x_0) - f(h)|}{|h|} = 0.$$
By uniqueness of the derivative, $Df(x_0)(h) = f(h)$ for every $h \in \mathbb{R}^n$.
Example 1.5. Let $f : \mathbb{R}^n \to \mathbb{R}$ be the scalar field defined by
$$f(x) = \langle c, x \rangle = c_1x_1 + \cdots + c_nx_n,$$
where $c = (c_1, \ldots, c_n)$ is a fixed $n$-vector and $x = (x_1, \ldots, x_n)$. Using the properties of the inner product, it is easily verified that $f$ is a linear transformation. The previous example then gives $Df(x_0)(h) = f(h) = \langle c, h \rangle$. For each $j = 1, \ldots, n$,
$$Df(x_0)(e_j) = \langle c, e_j \rangle = c_j.$$
Hence $f'(x_0)$ is the $1 \times n$ matrix
$$f'(x_0) = \begin{bmatrix} c_1 & \cdots & c_n \end{bmatrix}.$$
Example 1.6. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be the scalar field defined by
$$f(x) = f(x_1, x_2) = x_1^2x_2, \quad x = (x_1, x_2).$$
Suppose that $x_0 = (1, 1)$, and take the linear transformation (verify!)
$$T(h) = T(h_1, h_2) = 2h_1 + h_2, \quad h = (h_1, h_2).$$
Then
$$\begin{aligned} f(x_0 + h) - f(x_0) - T(h) &= f(1 + h_1, 1 + h_2) - f(1, 1) - T(h_1, h_2) \\ &= (1 + h_1)^2(1 + h_2) - 1 - (2h_1 + h_2) \\ &= h_1^2 + 2h_1h_2 + h_1^2h_2. \end{aligned}$$
Recall that
$$|h_1| \le |h|, \quad |h_2| \le |h|.$$
Hence
$$|f(x_0 + h) - f(x_0) - T(h)| = |h_1^2 + 2h_1h_2 + h_1^2h_2| \le |h_1|^2 + 2|h_1||h_2| + |h_1|^2|h_2| \le |h|^2 + 2|h|^2 + |h|^3,$$
so that
$$0 \le \frac{|f(x_0 + h) - f(x_0) - T(h)|}{|h|} \le 3|h| + |h|^2.$$
Taking the limit as $|h| \to 0$ gives
$$\lim_{|h| \to 0} \frac{|f(x_0 + h) - f(x_0) - T(h)|}{|h|} = 0.$$
By uniqueness of the derivative, $Df(x_0)(h) = Df(1, 1)(h_1, h_2) = 2h_1 + h_2$. Since
$$Df(x_0)(e_1) = 2, \quad Df(x_0)(e_2) = 1,$$
the Jacobian matrix of $f$ at $x_0$ is
$$f'(x_0) = f'(1, 1) = \begin{bmatrix} 2 & 1 \end{bmatrix}.$$

Finding the derivative of a function directly from the definition, as well as the matrix associated with it, is difficult in general. Fortunately, we will find an easier way of doing it once we have studied the properties of the derivative.
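To see the limit in Example 1.6 numerically, one can evaluate the ratio in (3) for smaller and smaller $h$. The following is a hypothetical Python sketch, not from the original notes; the direction $(0.6, -0.8)$ and the step sizes are arbitrary choices. It checks the bound $3|h| + |h|^2$ obtained above but, being a finite computation, does not replace the proof.

```python
import math


def f(x1: float, x2: float) -> float:
    """Scalar field of Example 1.6: f(x1, x2) = x1^2 * x2."""
    return x1 ** 2 * x2


def T(h1: float, h2: float) -> float:
    """Candidate derivative at x0 = (1, 1): T(h) = 2*h1 + h2."""
    return 2 * h1 + h2


def remainder_ratio(h1: float, h2: float) -> float:
    """Return |f(x0 + h) - f(x0) - T(h)| / |h| at x0 = (1, 1), for h != 0."""
    num = abs(f(1 + h1, 1 + h2) - f(1, 1) - T(h1, h2))
    return num / math.hypot(h1, h2)


# Shrink h along the unit direction (0.6, -0.8); the ratio should tend to 0
# and stay below the bound 3|h| + |h|^2 derived above.
sizes = [10.0 ** -k for k in range(1, 6)]
ratios = [remainder_ratio(0.6 * s, -0.8 * s) for s in sizes]
for s, r in zip(sizes, ratios):
    print(f"|h| = {s:.0e}   ratio = {r:.3e}   bound = {3 * s + s * s:.3e}")
```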
2 Algebra of derivatives
In this section we look at the properties of the derivative. As is to be expected, these properties are generalizations of those obtained in elementary calculus.
Proposition 2.1. If $f$ is differentiable at $x_0$, then it is continuous at $x_0$.

Proof. Suppose that $f$ is differentiable at $x_0$. Then
$$\lim_{|h| \to 0} \frac{|f(x_0 + h) - f(x_0) - Df(x_0)(h)|}{|h|} = 0.$$
Also, since $Df(x_0)$ is a linear transformation, there exists $K > 0$ such that
$$|Df(x_0)(h)| \le K|h|$$
for every $h \in \mathbb{R}^n$. If $h \ne 0$, then the Triangle Inequality yields
$$\begin{aligned} 0 \le |f(x_0 + h) - f(x_0)| &= |f(x_0 + h) - f(x_0) - Df(x_0)(h) + Df(x_0)(h)| \\ &\le |f(x_0 + h) - f(x_0) - Df(x_0)(h)| + |Df(x_0)(h)| \\ &= \frac{|f(x_0 + h) - f(x_0) - Df(x_0)(h)|}{|h|}\,|h| + |Df(x_0)(h)| \\ &\le \frac{|f(x_0 + h) - f(x_0) - Df(x_0)(h)|}{|h|}\,|h| + K|h|. \end{aligned}$$
Taking the limit as $|h| \to 0$ shows that
$$\lim_{|h| \to 0} |f(x_0 + h) - f(x_0)| = 0,$$
i.e., $f$ is continuous at $x_0$.
Note that the converse is not always true, as we have seen in elementary calculus. For example, the function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = |x|$ is continuous at $x_0 = 0$ but is not differentiable there.
Proposition 2.2. Let $f : E \to \mathbb{R}^m$ and $g : E \to \mathbb{R}^m$, where $E \subset \mathbb{R}^n$ is an open set containing $x_0$, and suppose that $t \in \mathbb{R}$. If $f$ and $g$ are differentiable at $x_0$, then $f + g$ and $tf$ are also differentiable at $x_0$. Furthermore,
$$D(f + g)(x_0) = Df(x_0) + Dg(x_0) \quad\text{and}\quad D(tf)(x_0) = t\,Df(x_0).$$
Proof. If $f$ and $g$ are differentiable at $x_0$, then
$$\lim_{|h| \to 0} \frac{|f(x_0 + h) - f(x_0) - S(h)|}{|h|} = 0 \quad\text{and}\quad \lim_{|h| \to 0} \frac{|g(x_0 + h) - g(x_0) - T(h)|}{|h|} = 0,$$
where
$$S = Df(x_0) \quad\text{and}\quad T = Dg(x_0).$$
We claim that $S + T$ is the derivative of $f + g$ and $tS$ is the derivative of $tf$. Recall that $S + T$ and $tS$ are linear transformations if $S$ and $T$ are. For $h \ne 0$, we see that
$$\begin{aligned} 0 &\le \frac{|(f + g)(x_0 + h) - (f + g)(x_0) - (S + T)(h)|}{|h|} \\ &= \frac{|f(x_0 + h) - f(x_0) - S(h) + g(x_0 + h) - g(x_0) - T(h)|}{|h|} \\ &\le \frac{|f(x_0 + h) - f(x_0) - S(h)|}{|h|} + \frac{|g(x_0 + h) - g(x_0) - T(h)|}{|h|}. \end{aligned}$$
Taking the limit as $|h| \to 0$ gives
$$\lim_{|h| \to 0} \frac{|(f + g)(x_0 + h) - (f + g)(x_0) - (S + T)(h)|}{|h|} = 0.$$
By uniqueness of the derivative, $D(f + g)(x_0) = S + T = Df(x_0) + Dg(x_0)$. Similarly, for $h \ne 0$,
$$0 \le \frac{|(tf)(x_0 + h) - (tf)(x_0) - (tS)(h)|}{|h|} = \frac{|tf(x_0 + h) - tf(x_0) - tS(h)|}{|h|} = |t|\,\frac{|f(x_0 + h) - f(x_0) - S(h)|}{|h|},$$
which tends to zero as $|h| \to 0$. Again, by uniqueness of the derivative, we conclude that $D(tf)(x_0) = tS = t\,Df(x_0)$.
Theorem 2.3 (Chain Rule). Let $E \subset \mathbb{R}^n$ and $F \subset \mathbb{R}^m$ be open sets with $x_0 \in E$. Let $f : E \to \mathbb{R}^m$ and $g : F \to \mathbb{R}^p$ be functions such that $f(E) \subset F$. If $f$ is differentiable at $x_0$ and $g$ is differentiable at $f(x_0)$, then the composition $g \circ f$ is differentiable at $x_0$ and
$$D(g \circ f)(x_0) = Dg(f(x_0)) \circ Df(x_0).$$
We remark that in terms of matrices of linear transformations, this result says that
$$(g \circ f)'(x_0) = g'(f(x_0))\,f'(x_0).$$
When $m = n = p = 1$, we recover the usual Chain Rule of elementary calculus.
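The matrix form of the Chain Rule can be sanity-checked numerically. In this Python sketch (an illustration, not part of the original notes) the functions $f(x, y) = (xy, x + y)$ and $g(u, v) = u^2 + \sin v$ and the point $x_0 = (0.7, -1.3)$ are arbitrary choices; the product $g'(f(x_0))\,f'(x_0)$ of hand-computed Jacobian matrices is compared with a central-difference approximation of $(g \circ f)'(x_0)$.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def f(x: float, y: float) -> Tuple[float, float]:
    return (x * y, x + y)


def f_jac(x: float, y: float) -> Matrix:
    # 2 x 2 Jacobian of f: row i is the gradient of the i-th component.
    return [[y, x], [1.0, 1.0]]


def g(u: float, v: float) -> float:
    return u ** 2 + math.sin(v)


def g_jac(u: float, v: float) -> Matrix:
    # 1 x 2 Jacobian of g.
    return [[2 * u, math.cos(v)]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def composite_jac_fd(x: float, y: float, h: float = 1e-6) -> Matrix:
    """Central-difference approximation of the 1 x 2 matrix (g o f)'(x, y)."""
    gf = lambda a, b: g(*f(a, b))
    return [[(gf(x + h, y) - gf(x - h, y)) / (2 * h),
             (gf(x, y + h) - gf(x, y - h)) / (2 * h)]]


x0 = (0.7, -1.3)
chain = matmul(g_jac(*f(*x0)), f_jac(*x0))  # g'(f(x0)) f'(x0)
direct = composite_jac_fd(*x0)
print(chain, direct)
```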
Proof. Let $y_0 = f(x_0)$. For notational simplicity we introduce
$$S = Df(x_0) \quad\text{and}\quad T = Dg(y_0).$$
For $h \in \mathbb{R}^n$ and $k \in \mathbb{R}^m$, define the remainder functions
$$R_f(h) = f(x_0 + h) - f(x_0) - S(h), \qquad R_g(k) = g(y_0 + k) - g(y_0) - T(k).$$
Then
$$\lim_{|h| \to 0} \frac{|R_f(h)|}{|h|} = 0 \quad\text{and}\quad \lim_{|k| \to 0} \frac{|R_g(k)|}{|k|} = 0,$$
since $f$ is differentiable at $x_0$ and $g$ is differentiable at $y_0$. We see that
$$(g \circ f)(x_0 + h) = g(f(x_0 + h)) = g(f(x_0) + S(h) + R_f(h)).$$
Taking
$$k = S(h) + R_f(h)$$
and recalling that $y_0 = f(x_0)$, the definition of $R_g$ gives
$$\begin{aligned} (g \circ f)(x_0 + h) &= g(f(x_0) + S(h) + R_f(h)) \\ &= g(y_0 + k) \\ &= g(y_0) + T(k) + R_g(k) \\ &= g(f(x_0)) + T(S(h) + R_f(h)) + R_g(S(h) + R_f(h)). \end{aligned}$$
Since $T$ is linear, $T(S(h) + R_f(h)) = T(S(h)) + T(R_f(h))$. Hence
$$\begin{aligned} (g \circ f)(x_0 + h) &= g(f(x_0)) + R_g(S(h) + R_f(h)) + T(S(h)) + T(R_f(h)) \\ &= (g \circ f)(x_0) + (T \circ S)(h) + [T(R_f(h)) + R_g(S(h) + R_f(h))]. \end{aligned}$$
If we define $R_{g \circ f}$ by
$$R_{g \circ f}(h) = T(R_f(h)) + R_g(S(h) + R_f(h)),$$
then the above equation is equivalent to
$$(g \circ f)(x_0 + h) = (g \circ f)(x_0) + (T \circ S)(h) + R_{g \circ f}(h).$$
Our task is to prove that
$$\lim_{|h| \to 0} \frac{|R_{g \circ f}(h)|}{|h|} = 0,$$
which is equivalent to showing that
$$\lim_{|h| \to 0} \frac{|(g \circ f)(x_0 + h) - (g \circ f)(x_0) - (T \circ S)(h)|}{|h|} = 0.$$
It would then follow from uniqueness of the derivative that
$$D(g \circ f)(x_0) = T \circ S = Dg(y_0) \circ Df(x_0) = Dg(f(x_0)) \circ Df(x_0).$$
We observe that since $S$ and $T$ are linear, there exist positive numbers $K$ and $L$ such that
$$|S(x)| \le K|x| \quad\text{and}\quad |T(y)| \le L|y|$$
for all $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$. Since $R_g(0) = g(y_0) - g(y_0) - T(0) = 0$, we adopt the convention that $|R_g(k)|/|k| = 0$ when $k = 0$; with this convention $|R_g(k)| = (|R_g(k)|/|k|)\,|k|$ for every $k$, and still $|R_g(k)|/|k| \to 0$ as $|k| \to 0$. If $h \ne 0$, then using the Triangle Inequality gives
$$\frac{|R_{g \circ f}(h)|}{|h|} \le \frac{|T(R_f(h))|}{|h|} + \frac{|R_g(S(h) + R_f(h))|}{|h|} = \frac{|T(R_f(h))|}{|h|} + \frac{|R_g(k)|}{|h|} \le \frac{L|R_f(h)|}{|h|} + \frac{|R_g(k)|}{|k|}\,\frac{|k|}{|h|}.$$
Since $f$ is differentiable at $x_0$, it is continuous at $x_0$ and
$$\lim_{|h| \to 0} |k| = \lim_{|h| \to 0} |S(h) + R_f(h)| = \lim_{|h| \to 0} |f(x_0 + h) - f(x_0)| = 0.$$
Moreover, the Triangle Inequality again gives
$$\frac{|k|}{|h|} = \frac{|S(h) + R_f(h)|}{|h|} = \frac{|f(x_0 + h) - f(x_0)|}{|h|} = \frac{|f(x_0 + h) - f(x_0) - S(h) + S(h)|}{|h|} \le \frac{|f(x_0 + h) - f(x_0) - S(h)|}{|h|} + \frac{|S(h)|}{|h|} \le \frac{|R_f(h)|}{|h|} + K.$$
Combining the upper bounds yields
$$0 \le \frac{|R_{g \circ f}(h)|}{|h|} \le \frac{L|R_f(h)|}{|h|} + \frac{|R_g(k)|}{|k|}\left(\frac{|R_f(h)|}{|h|} + K\right).$$
Taking the limit as $|h| \to 0$ (which implies that $|k| \to 0$), we obtain
$$\lim_{|h| \to 0} \frac{|R_{g \circ f}(h)|}{|h|} = 0,$$
as required.
Proposition 2.4. Let $f : E \to \mathbb{R}^m$ be a vector field, where $E \subset \mathbb{R}^n$ is an open set containing $x_0$. Suppose that $f$ has components
$$f(x) = (f_1(x), \ldots, f_m(x)),$$
where $f_i : E \to \mathbb{R}$ are scalar fields for all $i = 1, \ldots, m$. Then $f$ is differentiable at $x_0$ if and only if each $f_i$ for $i = 1, \ldots, m$ is differentiable at $x_0$, and
$$Df(x_0)(h) = (Df_1(x_0)(h), \ldots, Df_m(x_0)(h))$$
for every $h \in \mathbb{R}^n$.
Proof. First, let us suppose that each $f_i$ for $i = 1, \ldots, m$ is differentiable at $x_0$. Note that $Df_i(x_0)$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}$. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be the linear transformation whose value at each $h \in \mathbb{R}^n$ is
$$T(h) = (Df_1(x_0)(h), \ldots, Df_m(x_0)(h)).$$
Then
$$f(x_0 + h) - f(x_0) - T(h) = \big(f_1(x_0 + h) - f_1(x_0) - Df_1(x_0)(h), \ldots, f_m(x_0 + h) - f_m(x_0) - Df_m(x_0)(h)\big).$$
Before we proceed, note that
$$|x| \le \sum_{i=1}^m |x_i|$$
for any $x = (x_1, \ldots, x_m) \in \mathbb{R}^m$. To prove this, observe that
$$\sum_{i=1}^m x_i^2 = \sum_{i=1}^m |x_i|^2 \le \left(\sum_{i=1}^m |x_i|\right)^2,$$
since expanding the square on the right produces the left-hand side plus the nonnegative cross terms $2|x_i||x_j|$ with $i < j$. Taking the square root of both sides yields
$$|x| = \left(\sum_{i=1}^m x_i^2\right)^{1/2} \le \sum_{i=1}^m |x_i|.$$
Using the above bound,
$$0 \le \frac{|f(x_0 + h) - f(x_0) - T(h)|}{|h|} \le \sum_{i=1}^m \frac{|f_i(x_0 + h) - f_i(x_0) - Df_i(x_0)(h)|}{|h|}.$$
Each of the terms on the right-hand side tends to zero as $|h| \to 0$ since each $f_i$ is differentiable at $x_0$. Therefore
$$\lim_{|h| \to 0} \frac{|f(x_0 + h) - f(x_0) - T(h)|}{|h|} = 0,$$
i.e., $f$ is differentiable at $x_0$, and $Df(x_0) = T$ by uniqueness of the derivative.

On the other hand, suppose that $f$ is differentiable at $x_0$. Then $f_i = \pi_i \circ f$, where $\pi_i : \mathbb{R}^m \to \mathbb{R}$ is the $i$th projection function, since
$$(\pi_i \circ f)(x) = \pi_i(f(x)) = \pi_i(f_1(x), \ldots, f_m(x)) = f_i(x).$$
Since $\pi_i$ is a linear transformation, it is differentiable at $f(x_0)$ (Example 1.4). By the Chain Rule, since $f$ is differentiable at $x_0$ and $\pi_i$ is differentiable at $f(x_0)$, it follows that $f_i = \pi_i \circ f$ is differentiable at $x_0$ for all $i = 1, \ldots, m$.
Lemma 2.5. Let $(h, k), (x_0, y_0) \in \mathbb{R}^2$. We have the following results:
(i) If $s : \mathbb{R}^2 \to \mathbb{R}$ is defined for every $(x, y) \in \mathbb{R}^2$ by
$$s(x, y) = x + y,$$
then
$$Ds(x_0, y_0)(h, k) = h + k,$$
i.e., $Ds(x_0, y_0) = s$.
(ii) If $p : \mathbb{R}^2 \to \mathbb{R}$ is defined for every $(x, y) \in \mathbb{R}^2$ by
$$p(x, y) = xy,$$
then
$$Dp(x_0, y_0)(h, k) = y_0h + x_0k.$$

Proof. Part (i) is a special case of Example 1.5 since
$$s(x, y) = \langle (1, 1), (x, y) \rangle.$$
Then
$$Ds(x_0, y_0)(h, k) = \langle (1, 1), (h, k) \rangle = h + k = s(h, k).$$
For (ii), we see that
$$\frac{|p(x_0 + h, y_0 + k) - p(x_0, y_0) - (y_0h + x_0k)|}{|(h, k)|} = \frac{|(x_0 + h)(y_0 + k) - x_0y_0 - (y_0h + x_0k)|}{|(h, k)|} = \frac{|hk|}{|(h, k)|}.$$
But
$$|hk| \le \begin{cases} |h|^2 & \text{if } |k| \le |h|, \\ |k|^2 & \text{if } |h| \le |k|. \end{cases}$$
In either case, $|hk| \le |h|^2 + |k|^2$. Thus
$$\frac{|hk|}{|(h, k)|} \le \frac{h^2 + k^2}{\sqrt{h^2 + k^2}} = \sqrt{h^2 + k^2}$$
and
$$0 \le \frac{|p(x_0 + h, y_0 + k) - p(x_0, y_0) - (y_0h + x_0k)|}{|(h, k)|} \le \sqrt{h^2 + k^2}.$$
Taking the limit as $\sqrt{h^2 + k^2} \to 0$ implies that
$$\lim_{|(h, k)| \to 0} \frac{|p(x_0 + h, y_0 + k) - p(x_0, y_0) - (y_0h + x_0k)|}{|(h, k)|} = 0,$$
i.e., by uniqueness of the derivative,
$$Dp(x_0, y_0)(h, k) = y_0h + x_0k.$$
The last set of derivative properties is valid for scalar fields. We include the case of the derivative of a sum here for completeness, but it is also valid for vector fields (see Proposition 2.2).

Proposition 2.6. Let $f : E \to \mathbb{R}$ and $g : E \to \mathbb{R}$, where $E \subset \mathbb{R}^n$ is an open set containing $x_0$. If $f$ and $g$ are differentiable at $x_0$, then $f + g$, $fg$, and $f/g$ (if $g(x_0) \ne 0$) are also differentiable at $x_0$. Furthermore,
$$D(f + g)(x_0) = Df(x_0) + Dg(x_0),$$
$$D(fg)(x_0) = g(x_0)Df(x_0) + f(x_0)Dg(x_0),$$
and
$$D\left(\frac{f}{g}\right)(x_0) = \frac{g(x_0)Df(x_0) - f(x_0)Dg(x_0)}{g(x_0)^2}.$$
Proof. We use the previous lemma here, together with Proposition 2.4, which shows that $(f, g) : E \to \mathbb{R}^2$ is differentiable at $x_0$ with $D(f, g)(x_0) = (Df(x_0), Dg(x_0))$. To prove the first result, we note that
$$f + g = s \circ (f, g).$$
Then the Chain Rule gives
$$\begin{aligned} D(f + g)(x_0) &= Ds(f(x_0), g(x_0)) \circ D(f, g)(x_0) \\ &= s \circ (Df(x_0), Dg(x_0)) \\ &= Df(x_0) + Dg(x_0). \end{aligned}$$
For the second result, we see that
$$fg = p \circ (f, g).$$
Again, the Chain Rule implies that
$$\begin{aligned} D(fg)(x_0) &= Dp(f(x_0), g(x_0)) \circ D(f, g)(x_0) \\ &= Dp(f(x_0), g(x_0)) \circ (Df(x_0), Dg(x_0)) \\ &= Dp(f(x_0), g(x_0))(Df(x_0), Dg(x_0)) \\ &= g(x_0)Df(x_0) + f(x_0)Dg(x_0). \end{aligned}$$
Finally, the third result is partially proved as follows. Let
$$h = \frac{f}{g} \quad\text{or}\quad f = gh.$$
Applying the second result,
$$Df(x_0) = g(x_0)Dh(x_0) + h(x_0)Dg(x_0).$$
But $h(x_0) = f(x_0)/g(x_0)$, so
$$D\left(\frac{f}{g}\right)(x_0) = Dh(x_0) = \frac{1}{g(x_0)}\left[Df(x_0) - \frac{f(x_0)}{g(x_0)}Dg(x_0)\right] = \frac{g(x_0)Df(x_0) - f(x_0)Dg(x_0)}{g(x_0)^2}.$$
This is only a partial proof since we implicitly assumed that $h = f/g$ was already shown to be differentiable at $x_0$.
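The product and quotient rules of Proposition 2.6 can be checked numerically by representing each derivative $Df(x_0)$ by its $1 \times n$ matrix, approximated with central differences. The following Python sketch is an illustration only: the fields $f(x) = x_1^2 + 3x_2$ and $g(x) = e^{x_1x_2}$, the point $x_0 = (0.4, -0.2)$, and the helper `grad_fd` are assumptions introduced here, not taken from the notes.

```python
import math
from typing import Callable, List, Sequence

ScalarField = Callable[[Sequence[float]], float]


def grad_fd(F: ScalarField, x: Sequence[float], h: float = 1e-6) -> List[float]:
    """Central-difference approximation of the 1 x n matrix of DF(x)."""
    grad = []
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        grad.append((F(xp) - F(xm)) / (2 * h))
    return grad


def f(x: Sequence[float]) -> float:
    return x[0] ** 2 + 3 * x[1]


def g(x: Sequence[float]) -> float:
    return math.exp(x[0] * x[1])  # never zero, so f/g is defined everywhere


x0 = [0.4, -0.2]
Df, Dg = grad_fd(f, x0), grad_fd(g, x0)
fx, gx = f(x0), g(x0)

# Right-hand sides of the product and quotient rules.
product_rule = [gx * a + fx * b for a, b in zip(Df, Dg)]
quotient_rule = [(gx * a - fx * b) / gx ** 2 for a, b in zip(Df, Dg)]

# Direct numerical derivatives of fg and f/g.
product_fd = grad_fd(lambda x: f(x) * g(x), x0)
quotient_fd = grad_fd(lambda x: f(x) / g(x), x0)
print(product_rule, product_fd)
print(quotient_rule, quotient_fd)
```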
3 Directional derivatives and partial derivatives
Here we introduce the concepts of the directional derivative and the partial derivative, the latter being a special case of the former. We shall see that the calculation of the Jacobian matrix $f'(x_0)$ is facilitated by calculating the partial derivatives of $f$ at $x_0$.
Definition 3.1. Let $f : E \to \mathbb{R}$, where $E \subset \mathbb{R}^n$ is an open set containing $x_0$. Let $v \in \mathbb{R}^n$. If
$$\lim_{t \to 0} \frac{f(x_0 + tv) - f(x_0)}{t}$$
exists, then we call it the directional derivative of $f$ at $x_0$ in the direction of $v$ and denote it by $Df(x_0; v)$.

Note that since $f$ is a scalar field, the above limit is just the ordinary limit of a real-valued function of the real variable $t$. Also, $Df(x_0; v)$ is a real number, unlike $Df(x_0)$ (a linear transformation) or $f'(x_0)$ (a matrix). Some authors require that $v$ be a unit vector, i.e., $|v| = 1$, but we do not need to do so here.
Example 3.2. Suppose that $f : \mathbb{R}^n \to \mathbb{R}$ is a linear transformation. Then, for $t \ne 0$,
$$\frac{f(x_0 + tv) - f(x_0)}{t} = \frac{f(x_0) + tf(v) - f(x_0)}{t} = f(v).$$
Hence for every $v \in \mathbb{R}^n$,
$$Df(x_0; v) = \lim_{t \to 0} \frac{f(x_0 + tv) - f(x_0)}{t} = \lim_{t \to 0} f(v) = f(v).$$
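Example 3.3. Let $f : \mathbb{R}^n \to \mathbb{R}$ be defined by
$$f(x) = \frac{1}{2}|x|^2.$$
Find $Df(e_i; e_j)$.
Note that if $x = (x_1, \ldots, x_n)$, then this function is just
$$f(x) = f(x_1, \ldots, x_n) = \frac{1}{2}(x_1^2 + \cdots + x_n^2).$$
For every $x_0, v \in \mathbb{R}^n$ and $t \ne 0$,
$$\begin{aligned} \frac{f(x_0 + tv) - f(x_0)}{t} &= \frac{|x_0 + tv|^2 - |x_0|^2}{2t} \\ &= \frac{\langle x_0 + tv, x_0 + tv \rangle - \langle x_0, x_0 \rangle}{2t} \\ &= \frac{\langle x_0, x_0 \rangle + 2t\langle x_0, v \rangle + t^2\langle v, v \rangle - \langle x_0, x_0 \rangle}{2t} \\ &= \langle x_0, v \rangle + \frac{1}{2}t|v|^2. \end{aligned}$$
Hence
$$Df(x_0; v) = \lim_{t \to 0} \frac{f(x_0 + tv) - f(x_0)}{t} = \langle x_0, v \rangle$$
and
$$Df(e_i; e_j) = \langle e_i, e_j \rangle = \begin{cases} 0 & \text{if } i \ne j, \\ 1 & \text{if } i = j. \end{cases}$$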
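The difference quotient in Definition 3.1 can be evaluated directly for the field of Example 3.3. The Python sketch below is not from the original notes; $x_0$ and $v$ are arbitrary, and $v$ is deliberately not a unit vector. It shows the quotients approaching $\langle x_0, v \rangle$, with the error term $\frac{1}{2}t|v|^2$ found above.

```python
from typing import List, Sequence


def f(x: Sequence[float]) -> float:
    """f(x) = (1/2)|x|^2 from Example 3.3."""
    return 0.5 * sum(xi * xi for xi in x)


def directional_quotient(x0: Sequence[float], v: Sequence[float], t: float) -> float:
    """The difference quotient (f(x0 + t v) - f(x0)) / t of Definition 3.1."""
    shifted = [a + t * b for a, b in zip(x0, v)]
    return (f(shifted) - f(x0)) / t


x0 = [1.0, -2.0, 0.5]
v = [3.0, 1.0, -4.0]  # not a unit vector: |v|^2 = 26
exact = sum(a * b for a, b in zip(x0, v))  # <x0, v> = -1.0
quotients: List[float] = [directional_quotient(x0, v, 10.0 ** -k) for k in range(1, 6)]
for k, q in enumerate(quotients, start=1):
    # By the computation above, q = <x0, v> + (1/2) t |v|^2 = -1 + 13 t.
    print(f"t = 1e-{k}: quotient = {q:.8f}")
```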
Example 3.4. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined for all $(x, y) \in \mathbb{R}^2$ by
$$f(x, y) = e^{xy}.$$
Find the directional derivative of $f$ at $(1, 1)$ in the direction of $(1, -1)$.
For any $(x_0, y_0), (a, b) \in \mathbb{R}^2$ and $t \ne 0$,
$$\frac{f(x_0 + at, y_0 + bt) - f(x_0, y_0)}{t} = \frac{e^{(x_0 + at)(y_0 + bt)} - e^{x_0y_0}}{t}.$$
But
$$\lim_{t \to 0} \frac{e^{(x_0 + at)(y_0 + bt)} - e^{x_0y_0}}{t} = \lim_{t \to 0} e^{(x_0 + at)(y_0 + bt)}(ay_0 + bx_0 + 2abt) = (ay_0 + bx_0)e^{x_0y_0}$$
using L'Hôpital's Rule. Therefore
$$Df((x_0, y_0); (a, b)) = (ay_0 + bx_0)e^{x_0y_0}.$$
In particular,
$$Df((1, 1); (1, -1)) = [(1)(1) + (-1)(1)]e = 0.$$
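The closed-form directional derivative $(ay_0 + bx_0)e^{x_0y_0}$ can be compared with a numerical difference quotient. In this Python sketch, which is illustrative and not part of the original notes, the symmetric quotient $[f(x_0 + ta, y_0 + tb) - f(x_0 - ta, y_0 - tb)]/(2t)$ is used as an approximation of the limit in Definition 3.1, and the sample points and directions are arbitrary.

```python
import math


def f(x: float, y: float) -> float:
    return math.exp(x * y)


def directional_derivative_fd(x0: float, y0: float, a: float, b: float,
                              t: float = 1e-6) -> float:
    """Symmetric difference quotient approximating Df((x0, y0); (a, b))."""
    return (f(x0 + a * t, y0 + b * t) - f(x0 - a * t, y0 - b * t)) / (2 * t)


def directional_derivative_formula(x0: float, y0: float, a: float, b: float) -> float:
    """Closed form derived above: (a*y0 + b*x0) * exp(x0*y0)."""
    return (a * y0 + b * x0) * math.exp(x0 * y0)


for (px, py, a, b) in [(0.5, 2.0, 1.0, -1.0), (-1.0, 0.3, 2.0, 0.5)]:
    print(directional_derivative_fd(px, py, a, b),
          directional_derivative_formula(px, py, a, b))
```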
If $v = e_i$, then we call $Df(x_0; e_i)$ the $i$th partial derivative of $f$ at $x_0$ and we write
$$D_i f(x_0) = Df(x_0; e_i).$$
Other notations for the $i$th partial derivative of $f$ at $x_0$ are
$$\frac{\partial f}{\partial x_i}(x_0) \quad\text{and}\quad f_{x_i}(x_0),$$
where $x = (x_1, \ldots, x_n)$ and $f(x) = f(x_1, \ldots, x_n)$.

Suppose that $D_i f(x_0)$ exists for all $x_0$ belonging to an open set $E \subset \mathbb{R}^n$. Then we can define a function $D_i f : E \to \mathbb{R}$ whose value is $D_i f(x)$ at each $x \in E$. In other words, $D_i f(x_0)$ is a real number for a fixed $x_0$, but if $x_0$ is allowed to vary over $E$, then $D_i f$ becomes a function.
For each $x = (x_1, \ldots, x_n) \in E$,
$$D_i f(x) = \lim_{t \to 0} \frac{f(x + te_i) - f(x)}{t} = \lim_{t \to 0} \frac{f(x_1, \ldots, x_i + t, \ldots, x_n) - f(x_1, \ldots, x_i, \ldots, x_n)}{t},$$
which is the ordinary derivative, at $x_i$, of the real-valued function $g$ of one variable defined by
$$g(x_i) = f(x_1, \ldots, x_i, \ldots, x_n),$$
with the other variables held fixed. Thus the problem of finding the $i$th partial derivative of $f$ reduces to that of finding the ordinary derivative of $f$ with respect to $x_i$ while keeping the other variables $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n$ fixed.
Example 3.5. Let
$$f(x, y, z) = e^{x^2 - y^2}\cos z$$
for every $(x, y, z) \in \mathbb{R}^3$. Find $D_1 f$, $D_2 f$, and $D_3 f$ at $(1, 2, \pi/2)$.
We have
$$D_1 f(x, y, z) = 2xe^{x^2 - y^2}\cos z, \quad D_2 f(x, y, z) = -2ye^{x^2 - y^2}\cos z, \quad D_3 f(x, y, z) = -e^{x^2 - y^2}\sin z.$$
Then $D_1 f(1, 2, \pi/2) = 0$, $D_2 f(1, 2, \pi/2) = 0$, and $D_3 f(1, 2, \pi/2) = -e^{-3}$.
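The values in Example 3.5 can be confirmed with central differences. The Python sketch below is an illustration only (the helper `partials_fd` and the step size are assumptions, not from the notes); note that $\cos(\pi/2)$ evaluates to a tiny nonzero float, so the first two approximations are close to, but not exactly, $0$.

```python
import math
from typing import List, Sequence


def f(x: float, y: float, z: float) -> float:
    """Scalar field of Example 3.5: f(x, y, z) = exp(x^2 - y^2) * cos(z)."""
    return math.exp(x ** 2 - y ** 2) * math.cos(z)


def partials_fd(p: Sequence[float], h: float = 1e-6) -> List[float]:
    """Central-difference approximations of D_1 f, D_2 f, D_3 f at the point p."""
    out = []
    for i in range(3):
        plus = list(p)
        minus = list(p)
        plus[i] += h
        minus[i] -= h
        out.append((f(*plus) - f(*minus)) / (2 * h))
    return out


p = (1.0, 2.0, math.pi / 2)
approx = partials_fd(p)
exact = [0.0, 0.0, -math.exp(-3)]
print(approx, exact)
```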
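Example 3.6. Verify that the scalar field
$$u(x, y) = \log(e^{x/2} + e^{y/3})$$
satisfies
$$2\frac{\partial u}{\partial x} + 3\frac{\partial u}{\partial y} = 1$$
for all $(x, y) \in \mathbb{R}^2$. This is an example of a partial differential equation. We remark that it is a standard convention to suppress the arguments in the partial derivatives when writing a partial differential equation, e.g., writing $\frac{\partial u}{\partial x}$ instead of $\frac{\partial u}{\partial x}(x, y)$.
Keeping $y$ fixed and applying the ordinary Chain Rule in the variable $x$ gives
$$\frac{\partial u}{\partial x} = \frac{\frac{1}{2}e^{x/2}}{e^{x/2} + e^{y/3}}.$$
Similarly, keeping $x$ fixed and applying the ordinary Chain Rule in the variable $y$ gives
$$\frac{\partial u}{\partial y} = \frac{\frac{1}{3}e^{y/3}}{e^{x/2} + e^{y/3}}.$$
Hence
$$2\frac{\partial u}{\partial x} + 3\frac{\partial u}{\partial y} = \frac{e^{x/2}}{e^{x/2} + e^{y/3}} + \frac{e^{y/3}}{e^{x/2} + e^{y/3}} = \frac{e^{x/2} + e^{y/3}}{e^{x/2} + e^{y/3}} = 1$$
for all $(x, y) \in \mathbb{R}^2$.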
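A numerical spot check of the identity in Example 3.6: the Python sketch below (not part of the original notes; the sample points and the step $h = 10^{-5}$ are arbitrary) approximates $2u_x + 3u_y - 1$ by central differences and should return values near $0$.

```python
import math


def u(x: float, y: float) -> float:
    """u(x, y) = log(e^{x/2} + e^{y/3}) from Example 3.6 (natural logarithm)."""
    return math.log(math.exp(x / 2) + math.exp(y / 3))


def pde_residual(x: float, y: float, h: float = 1e-5) -> float:
    """Approximate 2 u_x + 3 u_y - 1 using central differences."""
    ux = (u(x + h, y) - u(x - h, y)) / (2 * h)
    uy = (u(x, y + h) - u(x, y - h)) / (2 * h)
    return 2 * ux + 3 * uy - 1


for point in [(0.0, 0.0), (1.5, -2.0), (-3.0, 4.0)]:
    print(point, pde_residual(*point))
```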
Example 3.7. Verify that the scalar field
$$w(x, y, z) = \sqrt{x^2 + y^2 + z^2}$$
satisfies the partial differential equation
$$w_x^2 + w_y^2 + w_z^2 = 1$$
for all $(x, y, z) \in \mathbb{R}^3 \setminus \{(0, 0, 0)\}$.
A direct computation gives
$$w_x = \frac{2x}{2\sqrt{x^2 + y^2 + z^2}} = \frac{x}{\sqrt{x^2 + y^2 + z^2}}, \quad w_y = \frac{2y}{2\sqrt{x^2 + y^2 + z^2}} = \frac{y}{\sqrt{x^2 + y^2 + z^2}}, \quad w_z = \frac{2z}{2\sqrt{x^2 + y^2 + z^2}} = \frac{z}{\sqrt{x^2 + y^2 + z^2}}.$$
Therefore
$$w_x^2 + w_y^2 + w_z^2 = \frac{x^2}{x^2 + y^2 + z^2} + \frac{y^2}{x^2 + y^2 + z^2} + \frac{z^2}{x^2 + y^2 + z^2} = 1$$
for all $(x, y, z) \in \mathbb{R}^3 \setminus \{(0, 0, 0)\}$.
We have seen that if $f$ is differentiable at $x_0$, then it is continuous at $x_0$. It is natural to ask whether the same result holds for partial derivatives, namely, if $D_1 f(x_0), \ldots, D_n f(x_0)$ exist, is it necessarily true that $f$ is continuous at $x_0$? Consider $f : \mathbb{R}^2 \to \mathbb{R}$ defined by
$$f(x, y) = \begin{cases} \dfrac{xy}{x^2 + y^2} & \text{if } (x, y) \ne (0, 0), \\ 0 & \text{if } (x, y) = (0, 0). \end{cases} \tag{4}$$
Let us take $(x_0, y_0) = (0, 0)$ and calculate $D_1 f(0, 0)$ and $D_2 f(0, 0)$. By definition,
$$D_1 f(0, 0) = Df((0, 0); e_1) = \lim_{t \to 0} \frac{f(0 + t, 0) - f(0, 0)}{t} = \lim_{t \to 0} \frac{f(t, 0)}{t} = 0$$
and
$$D_2 f(0, 0) = Df((0, 0); e_2) = \lim_{t \to 0} \frac{f(0, 0 + t) - f(0, 0)}{t} = \lim_{t \to 0} \frac{f(0, t)}{t} = 0.$$
Note that for $f(t, 0)$ we have used the first part of the definition of $f$ since $t \ne 0$, which gives $f(t, 0) = 0$; similarly $f(0, t) = 0$. Hence both partial derivatives exist at $(0, 0)$. However, we have seen in the previous chapter that this function is not continuous at $(0, 0)$; for instance, $f(t, t) = 1/2$ for every $t \ne 0$. Thus we should not use the partial derivative as our generalization of the usual derivative, since it does not satisfy the analogous condition that differentiability implies continuity. In this sense, partial differentiability is weaker than differentiability as defined in terms of linear transformations.
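The behaviour of (4) can also be seen numerically. In the Python sketch below (an illustration, not from the original notes; the sample values of $t$ are arbitrary), the difference quotients that define $D_1 f(0, 0)$ and $D_2 f(0, 0)$ are exactly $0$ for every $t \ne 0$, while the values $f(t, t)$ along the diagonal stay at $1/2$, so $f(t, t)$ does not tend to $f(0, 0) = 0$.

```python
def f(x: float, y: float) -> float:
    """The function (4): xy / (x^2 + y^2) away from the origin, 0 at the origin."""
    if x == 0 and y == 0:
        return 0.0
    return x * y / (x ** 2 + y ** 2)


ts = [10.0 ** -k for k in range(1, 8)]

# Difference quotients defining D_1 f(0, 0) and D_2 f(0, 0): identically zero.
d1_quotients = [(f(t, 0.0) - f(0.0, 0.0)) / t for t in ts]
d2_quotients = [(f(0.0, t) - f(0.0, 0.0)) / t for t in ts]

# Along the diagonal the values stay at 1/2, so f is not continuous at (0, 0).
diagonal_values = [f(t, t) for t in ts]
print(d1_quotients, d2_quotients, diagonal_values)
```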
4 Calculation of the derivative via partial derivatives
Although directional derivatives (and hence partial derivatives) lack some properties that we want for a derivative (e.g., that differentiability implies continuity), they are still useful, since we can express the Jacobian matrix $f'(x_0)$ in terms of the partial derivatives of the component functions that make up the function $f$. Showing this result is one of the main goals of this section.
Theorem 4.1 (Necessary condition for differentiability). The following results hold for scalar and vector fields:
(i) Let $f : E \to \mathbb{R}$ be a scalar field, where $E \subset \mathbb{R}^n$ is an open set containing $x_0$. Assume that $f$ is differentiable at $x_0$, with derivative $Df(x_0)$. Then the directional derivative $Df(x_0; v)$ exists for every $v \in \mathbb{R}^n$ and
$$Df(x_0)(v) = Df(x_0; v).$$
In fact,
$$Df(x_0)(v) = \langle \nabla f(x_0), v \rangle,$$
where $\nabla f(x_0)$ is the gradient of $f$ at $x_0$, defined as the $n$-vector
$$\nabla f(x_0) = (D_1 f(x_0), \ldots, D_n f(x_0)).$$
Moreover, $f'(x_0)$ is the $1 \times n$ matrix
$$f'(x_0) = \begin{bmatrix} D_1 f(x_0) & D_2 f(x_0) & \cdots & D_n f(x_0) \end{bmatrix}.$$
(ii) Let $f : E \to \mathbb{R}^m$ be a vector field, where $E \subset \mathbb{R}^n$ is an open set containing $x_0$. Suppose that $f$ is of the form
$$f(x) = (f_1(x), \ldots, f_m(x))$$
for every $x$ in $E$, where $f_i : E \to \mathbb{R}$ are scalar fields for all $i = 1, \ldots, m$. Assume that $f$ is differentiable at $x_0$, with derivative $Df(x_0)$. Then for all $i = 1, \ldots, m$ and for all $v \in \mathbb{R}^n$, the directional derivative $Df_i(x_0; v)$ exists and
$$Df(x_0)(v) = (Df_1(x_0; v), \ldots, Df_m(x_0; v)).$$
In fact,
$$Df(x_0)(v) = (\langle \nabla f_1(x_0), v \rangle, \ldots, \langle \nabla f_m(x_0), v \rangle)$$
and $f'(x_0)$ is the $m \times n$ matrix
$$f'(x_0) = \begin{bmatrix} D_1 f_1(x_0) & D_2 f_1(x_0) & \cdots & D_n f_1(x_0) \\ D_1 f_2(x_0) & D_2 f_2(x_0) & \cdots & D_n f_2(x_0) \\ \vdots & \vdots & \ddots & \vdots \\ D_1 f_m(x_0) & D_2 f_m(x_0) & \cdots & D_n f_m(x_0) \end{bmatrix}. \tag{5}$$
The gradient of $f$ is also denoted by $\operatorname{grad} f$. The expression $\nabla f(x_0)$ is read as "del $f$ at $x_0$" or "grad $f$ at $x_0$".
Proof. We first consider (i). Suppose that $v = 0$. Then
$$Df(x_0; 0) = \lim_{t \to 0} \frac{f(x_0) - f(x_0)}{t} = 0.$$
On the other hand, $Df(x_0)$ is a linear transformation; hence $Df(x_0)(0) = 0$. Therefore
$$Df(x_0)(0) = Df(x_0; 0).$$
Now suppose that $v \ne 0$. Since $f$ is differentiable at $x_0$,
$$\lim_{|h| \to 0} \frac{|R_f(h)|}{|h|} = 0,$$
where
$$R_f(h) = f(x_0 + h) - f(x_0) - Df(x_0)(h).$$
Let $h = tv$, where $t \ne 0$. Then $|h| = |t||v|$ and $|h| \to 0$ is equivalent to $t \to 0$. Moreover,
$$f(x_0 + tv) = f(x_0) + Df(x_0)(tv) + R_f(tv) = f(x_0) + t\,Df(x_0)(v) + R_f(tv)$$
since $Df(x_0)$ is a linear transformation. Then
$$\left|\frac{f(x_0 + tv) - f(x_0)}{t} - Df(x_0)(v)\right| = \frac{|R_f(tv)|}{|t|} = \frac{|R_f(tv)|}{|t||v|}\,|v| = \frac{|R_f(h)|}{|h|}\,|v|$$
and
$$\lim_{t \to 0} \left|\frac{f(x_0 + tv) - f(x_0)}{t} - Df(x_0)(v)\right| = \lim_{t \to 0} \frac{|R_f(h)|}{|h|}\,|v| = |v| \lim_{|h| \to 0} \frac{|R_f(h)|}{|h|} = 0.$$
This implies that
$$\lim_{t \to 0} \left[\frac{f(x_0 + tv) - f(x_0)}{t} - Df(x_0)(v)\right] = 0$$
or
$$Df(x_0; v) = \lim_{t \to 0} \frac{f(x_0 + tv) - f(x_0)}{t} = Df(x_0)(v)$$
for all $v \ne 0$.
If $v = v_1e_1 + \cdots + v_ne_n$, then the linearity of $Df(x_0)$, together with $Df(x_0)(e_j) = Df(x_0; e_j) = D_j f(x_0)$, implies that
$$\begin{aligned} Df(x_0)(v) &= Df(x_0)(v_1e_1 + \cdots + v_ne_n) \\ &= v_1Df(x_0)(e_1) + \cdots + v_nDf(x_0)(e_n) \\ &= D_1 f(x_0)v_1 + \cdots + D_n f(x_0)v_n \\ &= \langle \nabla f(x_0), v \rangle. \end{aligned}$$
For all $j = 1, \ldots, n$,
$$Df(x_0)(e_j) = \langle \nabla f(x_0), e_j \rangle = D_j f(x_0).$$
Hence
$$f'(x_0) = \begin{bmatrix} D_1 f(x_0) & D_2 f(x_0) & \cdots & D_n f(x_0) \end{bmatrix}.$$
To prove (ii), suppose that $f$ is differentiable at $x_0$. By Proposition 2.4, each $f_i$ is differentiable at $x_0$ and
$$Df(x_0)(v) = (Df_1(x_0)(v), \ldots, Df_m(x_0)(v))$$
for every $v \in \mathbb{R}^n$. Since each $f_i$ is a differentiable scalar field, we see from Part (i) that $Df_i(x_0; v)$ exists for all $v \in \mathbb{R}^n$. In particular, taking $v = e_j$ for $j = 1, \ldots, n$ implies that the partial derivatives $D_j f_i(x_0)$ exist for all $i = 1, \ldots, m$ and for all $j = 1, \ldots, n$. Also from Part (i) we see that
$$Df_i(x_0)(v) = Df_i(x_0; v) = \langle \nabla f_i(x_0), v \rangle$$
for all $i = 1, \ldots, m$ and for all $v \in \mathbb{R}^n$. Then
$$Df(x_0)(v) = (Df_1(x_0; v), \ldots, Df_m(x_0; v)) = (\langle \nabla f_1(x_0), v \rangle, \ldots, \langle \nabla f_m(x_0), v \rangle).$$
Finally, for each $j = 1, \ldots, n$,
$$Df(x_0)(e_j) = (\langle \nabla f_1(x_0), e_j \rangle, \ldots, \langle \nabla f_m(x_0), e_j \rangle) = (D_j f_1(x_0), \ldots, D_j f_m(x_0)),$$
and the components of this vector form the $j$th column of $f'(x_0)$; thus (5) follows.
Example 4.2. Let us revisit Example 1.6, where $f : \mathbb{R}^2 \to \mathbb{R}$ is the scalar field $f(x_1, x_2) = x_1^2x_2$ and $x_0 = (1, 1)$. We have
$$D_1 f(x_1, x_2) = 2x_1x_2, \quad D_2 f(x_1, x_2) = x_1^2;$$
hence
$$\nabla f(x_1, x_2) = (2x_1x_2, x_1^2) \quad\text{and}\quad \nabla f(1, 1) = (2, 1).$$
For any $v = (v_1, v_2) \in \mathbb{R}^2$,
$$Df(1, 1)(v) = Df(1, 1)(v_1, v_2) = \langle \nabla f(1, 1), (v_1, v_2) \rangle = 2v_1 + v_2$$
and
$$f'(1, 1) = \begin{bmatrix} D_1 f(1, 1) & D_2 f(1, 1) \end{bmatrix} = \begin{bmatrix} 2 & 1 \end{bmatrix}.$$
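Example 4.3. Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be given by
$$f(x, y) = (e^{3x + 2y}, \sin(2x + 3y))$$
for every $(x, y) \in \mathbb{R}^2$. Find $f'(0, \pi)$.
The component functions of $f$ are
$$f_1(x, y) = e^{3x + 2y}, \quad f_2(x, y) = \sin(2x + 3y).$$
Then
$$D_1 f_1(x, y) = 3e^{3x + 2y}, \quad D_2 f_1(x, y) = 2e^{3x + 2y}$$
and
$$D_1 f_2(x, y) = 2\cos(2x + 3y), \quad D_2 f_2(x, y) = 3\cos(2x + 3y).$$
Thus, since $\cos 3\pi = -1$,
$$f'(0, \pi) = \begin{bmatrix} D_1 f_1(0, \pi) & D_2 f_1(0, \pi) \\ D_1 f_2(0, \pi) & D_2 f_2(0, \pi) \end{bmatrix} = \begin{bmatrix} 3e^{2\pi} & 2e^{2\pi} \\ -2 & -3 \end{bmatrix}.$$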
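The Jacobian matrix in Example 4.3 can be cross-checked against a central-difference approximation. This Python sketch is illustrative and not part of the original notes; the step size is an arbitrary choice, and the finite-difference matrix only approximates $f'(0, \pi)$ (here $e^{2\pi} \approx 535.5$).

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def f(x: float, y: float) -> Tuple[float, float]:
    """Vector field of Example 4.3: f(x, y) = (e^{3x+2y}, sin(2x+3y))."""
    return (math.exp(3 * x + 2 * y), math.sin(2 * x + 3 * y))


def jacobian_exact(x: float, y: float) -> Matrix:
    """Matrix (5) assembled from the partial derivatives computed above."""
    e = math.exp(3 * x + 2 * y)
    c = math.cos(2 * x + 3 * y)
    return [[3 * e, 2 * e], [2 * c, 3 * c]]


def jacobian_fd(x: float, y: float, h: float = 1e-6) -> Matrix:
    """Central-difference approximation; column j approximates Df(x, y)(e_j)."""
    cols = []
    for dx, dy in ((h, 0.0), (0.0, h)):
        fp, fm = f(x + dx, y + dy), f(x - dx, y - dy)
        cols.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    # Transpose the list of columns into a list of rows.
    return [[cols[j][i] for j in range(2)] for i in range(2)]


J_exact = jacobian_exact(0.0, math.pi)
J_fd = jacobian_fd(0.0, math.pi)
print(J_exact)  # approximately [[3e^(2 pi), 2e^(2 pi)], [-2, -3]]
print(J_fd)
```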
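The final proposition of this section shows that the directional derivative is homogeneous in its second argument and, when $f$ is differentiable at $x_0$, additive as well, i.e., linear.

Proposition 4.4. Let $f : E \to \mathbb{R}$ be a scalar field, where $E \subset \mathbb{R}^n$ is an open set containing $x_0$. Let $v, w \in \mathbb{R}^n$ and $t \in \mathbb{R}$. If $Df(x_0; v)$ exists, then
$$Df(x_0; tv) = t\,Df(x_0; v).$$
Moreover, if $f$ is differentiable at $x_0$, then
$$Df(x_0; v + w) = Df(x_0; v) + Df(x_0; w).$$

Proof. Without loss of generality, we may assume that $t \ne 0$, since the equality clearly holds when $t = 0$ (both sides are $0$). We see that
$$\begin{aligned} Df(x_0; tv) &= \lim_{u \to 0} \frac{f(x_0 + u(tv)) - f(x_0)}{u} \\ &= t\lim_{u \to 0} \frac{f(x_0 + (tu)v) - f(x_0)}{tu} \\ &= t\lim_{s \to 0} \frac{f(x_0 + sv) - f(x_0)}{s} \qquad (s = tu) \\ &= t\,Df(x_0; v), \end{aligned}$$
where the substitution is valid because $s = tu \to 0$ if and only if $u \to 0$ when $t \ne 0$. If $f$ is differentiable at $x_0$, then by Theorem 4.1,
$$Df(x_0; v) = Df(x_0)(v)$$
for every $v \in \mathbb{R}^n$. Since $Df(x_0)$ is a linear transformation,
$$Df(x_0)(v + w) = Df(x_0)(v) + Df(x_0)(w),$$
or
$$Df(x_0; v + w) = Df(x_0; v) + Df(x_0; w).$$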
5 Sufficient condition for differentiability
The main problem with the calculations in the previous two examples is that we have implicitly assumed that $f$ was differentiable at $x_0$. But in general, given a function $f$, we do not know a priori whether the function is differentiable at $x_0$, so the conclusions of Theorem 4.1 cannot be inferred. Another way of looking at this problem is as follows. Theorem 4.1 says that differentiability implies the existence of all the partial derivatives. Does the existence of all the partial derivatives imply the differentiability of $f$? The answer is no, as we can see from (4): the partial derivatives of that function exist at $(0, 0)$, but it is not differentiable there, since it is not continuous at $(0, 0)$ (Proposition 2.1). However, we now prove that if all the partial derivatives of $f$ exist and are continuous, then $f$ is differentiable.
Definition 5.1. Let $f : E \to \mathbb{R}^m$, where $E \subset \mathbb{R}^n$ is open. Then $f$ is said to be continuously differentiable (or of class $C^1$) on $E$ if $f$ is continuous on $E$ and the partial derivatives $D_j f_i$ ($i = 1, \ldots, m$ and $j = 1, \ldots, n$) exist and are continuous on $E$. In such a case we write $f \in C^1(E)$.

Theorem 5.2 (Sufficient condition for differentiability). Let $f : E \to \mathbb{R}^m$, where $E \subset \mathbb{R}^n$ is open. If $f \in C^1(E)$, then $f$ is differentiable on $E$.
Proof. It suffices to show the case when $f$ is a scalar field, since the vector field case will follow from Proposition 2.4. We wish to prove that
$$\lim_{|h| \to 0} \frac{|f(x_0 + h) - f(x_0) - \langle \nabla f(x_0), h \rangle|}{|h|} = 0$$
for every $x_0 \in E$. Since $h \mapsto \langle \nabla f(x_0), h \rangle$ is linear, it would follow that $f$ is differentiable on $E$ and, by uniqueness of the derivative, $Df(x_0)(v) = \langle \nabla f(x_0), v \rangle$ for every $v \in \mathbb{R}^n$.

Since $E$ is open, there exists $r > 0$ such that $B(x_0; r) \subset E$. Let $h = (h_1, \ldots, h_n)$ with $0 < |h| < r/n$. Then
$$|(x_0 + h) - x_0| = |h| < \frac{r}{n} \le r$$
and $x_0 + h \in B(x_0; r) \subset E$. We can express $h$ in terms of the standard basis as
$$h = h_1e_1 + \cdots + h_ne_n.$$
Let us construct a finite sequence of vectors
$$w_0 = 0, \quad w_1 = h_1e_1, \quad w_2 = h_1e_1 + h_2e_2, \quad \ldots, \quad w_n = h_1e_1 + \cdots + h_ne_n = h.$$
Then
$$w_j = w_{j-1} + h_je_j \tag{6}$$
for all $j = 1, \ldots, n$. We saw above that $x_0 + w_n = x_0 + h \in B(x_0; r)$; now we want to verify that $x_0 + w_j \in B(x_0; r)$ for all $j = 1, \ldots, n$. Since
$$w_j = h_1e_1 + \cdots + h_je_j,$$
and recalling that
$$|h_j| \le |h|$$
for all $j = 1, \ldots, n$, we obtain
$$|(x_0 + w_j) - x_0| = |w_j| \le |h_1||e_1| + \cdots + |h_j||e_j| = |h_1| + \cdots + |h_j| \le j|h| \le n|h| < r.$$
This verifies that $x_0 + w_j \in B(x_0; r)$ for all $j = 1, \ldots, n$. This allows us to evaluate $f$ at each $x_0 + w_j$, since $f$ is defined on $E$ and $B(x_0; r) \subset E$.

We express $f(x_0 + h) - f(x_0)$ as a telescoping sum and use (6) to obtain
$$\begin{aligned} f(x_0 + h) - f(x_0) &= f(x_0 + w_n) - f(x_0) \\ &= \sum_{j=1}^n [f(x_0 + w_j) - f(x_0 + w_{j-1})] \\ &= \sum_{j=1}^n [f(x_0 + w_{j-1} + h_je_j) - f(x_0 + w_{j-1})]. \end{aligned}$$
For $j = 1, \ldots, n$, define $g_j : [0, 1] \to \mathbb{R}$ by
$$g_j(t) = f(x_0 + w_{j-1} + th_je_j), \quad t \in [0, 1].$$
We claim that each $g_j$ is differentiable (in the sense of elementary calculus) and
$$g_j'(t) = h_j\,D_j f(x_0 + w_{j-1} + th_je_j).$$
We see from the definition of the usual derivative that
$$\begin{aligned} g_j'(t) &= \lim_{s \to 0} \frac{g_j(t + s) - g_j(t)}{s} \\ &= \lim_{s \to 0} \frac{f(x_0 + w_{j-1} + (t + s)h_je_j) - f(x_0 + w_{j-1} + th_je_j)}{s} \\ &= \lim_{s \to 0} \frac{f((x_0 + w_{j-1} + th_je_j) + s\,h_je_j) - f(x_0 + w_{j-1} + th_je_j)}{s} \\ &= Df(x_0 + w_{j-1} + th_je_j;\, h_je_j) \\ &= h_j\,Df(x_0 + w_{j-1} + th_je_j;\, e_j) \\ &= h_j\,D_j f(x_0 + w_{j-1} + th_je_j), \end{aligned}$$
where we used the fact that the directional derivative scales with its second argument (Proposition 4.4), provided that $D_j f$ exists at the point in question, which we now check. It is not difficult to see that
$$|w_{j-1}| = |h_1e_1 + \cdots + h_{j-1}e_{j-1}| \le |h_1| + \cdots + |h_{j-1}|,$$
so that
$$|(x_0 + w_{j-1} + th_je_j) - x_0| = |w_{j-1} + th_je_j| \le |h_1| + \cdots + |h_{j-1}| + |h_j| \le j|h| < r$$
since $t \in [0, 1]$, $j \le n$, and $|h| < r/n$. Hence $x_0 + w_{j-1} + th_je_j \in B(x_0; r) \subset E$, and
$$D_j f(x_0 + w_{j-1} + th_je_j),$$
for all $j = 1, \ldots, n$, exists given that the partial derivatives of $f$ exist on $E$ (being continuous on $E$). This implies that $g_j'(t)$ exists for all $j = 1, \ldots, n$, i.e., each $g_j$ is differentiable on $[0, 1]$, thus proving the claim.
Hence, since $g_j$ is differentiable on $[0, 1]$ (implying that $g_j$ is continuous on $[0, 1]$), we conclude from the Mean Value Theorem that there exists $t^* \in (0, 1)$ (depending on $j$) such that
$$g_j'(t^*) = g_j(1) - g_j(0),$$
or
$$f(x_0 + w_{j-1} + h_je_j) - f(x_0 + w_{j-1}) = h_j\,D_j f(x_0 + w_{j-1} + t^*h_je_j).$$
To finish off the proof, we deduce that
$$\begin{aligned} f(x_0 + h) - f(x_0) - \langle \nabla f(x_0), h \rangle &= \sum_{j=1}^n [f(x_0 + w_{j-1} + h_je_j) - f(x_0 + w_{j-1})] - \sum_{j=1}^n h_j\,D_j f(x_0) \\ &= \sum_{j=1}^n h_j[D_j f(x_0 + w_{j-1} + t^*h_je_j) - D_j f(x_0)]. \end{aligned}$$
The Triangle Inequality gives
$$|f(x_0 + h) - f(x_0) - \langle \nabla f(x_0), h \rangle| \le \sum_{j=1}^n |h_j||D_j f(x_0 + w_{j-1} + t^*h_je_j) - D_j f(x_0)| \le |h|\sum_{j=1}^n |D_j f(x_0 + w_{j-1} + t^*h_je_j) - D_j f(x_0)|.$$
If $h \ne 0$, then we get
$$0 \le \frac{|f(x_0 + h) - f(x_0) - \langle \nabla f(x_0), h \rangle|}{|h|} \le \sum_{j=1}^n |D_j f(x_0 + w_{j-1} + t^*h_je_j) - D_j f(x_0)|.$$
Recalling that
$$|w_j| \le |h_1| + \cdots + |h_j| \le j|h| \le n|h|$$
and denoting
$$v = w_{j-1} + t^*h_je_j,$$
we have
$$|v| \le n|h| + t^*|h_j| \le (n + t^*)|h|.$$
Thus $|h| \to 0$ implies that $|v| \to 0$ and
$$\lim_{|h| \to 0} |D_j f(x_0 + w_{j-1} + t^*h_je_j) - D_j f(x_0)| = \lim_{|v| \to 0} |D_j f(x_0 + v) - D_j f(x_0)| = 0$$
since each $D_j f$ is continuous on $E$ by assumption. Finally, we arrive at
$$\lim_{|h| \to 0} \frac{|f(x_0 + h) - f(x_0) - \langle \nabla f(x_0), h \rangle|}{|h|} = 0,$$
proving that $f$ is indeed differentiable at $x_0$ for all $x_0 \in E$ provided that $f \in C^1(E)$.
6 Higher-order partial derivatives
Let $f : E \to \mathbb{R}$ be a scalar field, where $E \subseteq \mathbb{R}^n$ is open. Suppose that $D_i f(x_0)$ exists for all $x_0 \in E$. We have seen that $D_i f$ defines a function from $E$ to $\mathbb{R}$. Then the $j$th partial derivative of $D_i f$, provided it exists, is given by $D_j(D_i f)$. There are several notations for this second-order partial derivative (called a mixed partial derivative when $i \neq j$), namely,
$$D_j(D_i f) = D_{i,j} f = f_{x_i x_j} = \frac{\partial^2 f}{\partial x_j \, \partial x_i} = \frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial x_i}\right).$$
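Note that the indices in the notations $D_{i,j}$ and $f_{x_i x_j}$ appear in the reverse order from that in $\partial^2 f / \partial x_j \, \partial x_i$. When $j = i$, we write
$$\frac{\partial^2 f}{\partial x_i^2} \quad \text{instead of} \quad \frac{\partial^2 f}{\partial x_i \, \partial x_i}.$$
Higher-order partial derivatives are defined in the same way; for example, the third-order partial derivative $D_{i,j,k} f$ is $D_k(D_{i,j} f)$.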
Example 6.1. Let $f(x, y) = x^2 y^3$ for all $(x, y) \in \mathbb{R}^2$. Then
$$D_1 f(x, y) = 2xy^3, \qquad D_2 f(x, y) = 3x^2 y^2.$$
The second-order partial derivatives are therefore
$$D_{1,1} f(x, y) = 2y^3, \qquad D_{1,2} f(x, y) = 6xy^2$$
and
$$D_{2,1} f(x, y) = 6xy^2, \qquad D_{2,2} f(x, y) = 6x^2 y.$$
A few third-order partial derivatives are
$$D_{1,1,1} f(x, y) = 0, \quad D_{1,1,2} f(x, y) = 6y^2, \quad D_{1,2,1} f(x, y) = 6y^2, \quad D_{1,2,2} f(x, y) = 12xy.$$
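The second-order partial derivatives of Example 6.1 can be sanity-checked with nested central differences. The sketch below is only an approximation: the helper names `d1` and `d2`, the step `h = 1e-4` and the evaluation point $(1.5, 2)$ are illustrative choices. Note that `d2(d1(f))` approximates $D_2(D_1 f) = D_{1,2} f$, matching the index convention above.

```python
def d1(f, h=1e-4):
    """Central-difference approximation of D_1 f, returned as a new function."""
    return lambda x, y: (f(x + h, y) - f(x - h, y)) / (2 * h)


def d2(f, h=1e-4):
    """Central-difference approximation of D_2 f, returned as a new function."""
    return lambda x, y: (f(x, y + h) - f(x, y - h)) / (2 * h)


def f(x, y):
    # The scalar field of Example 6.1.
    return x**2 * y**3


x, y = 1.5, 2.0
approx = {
    "D_{1,1} f": d1(d1(f))(x, y),  # exact value 2y^3 = 16
    "D_{1,2} f": d2(d1(f))(x, y),  # exact value 6xy^2 = 36
    "D_{2,1} f": d1(d2(f))(x, y),  # exact value 6xy^2 = 36
    "D_{2,2} f": d2(d2(f))(x, y),  # exact value 6x^2 y = 27
}
for name, value in approx.items():
    print(f"{name} ~ {value:.6f}")
```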
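Example 6.2. Find nonzero real numbers $a$ and $b$ such that
$$u(x, t) = \cos at \sin bx$$
satisfies the partial differential equation
$$\frac{\partial^2 u}{\partial t^2} = \frac{\partial^2 u}{\partial x^2}$$
for all $(x, t) \in \mathbb{R}^2$.
Substituting $u$ gives
$$\frac{\partial^2 u}{\partial t^2} - \frac{\partial^2 u}{\partial x^2} = -a^2 \cos at \sin bx + b^2 \cos at \sin bx = (b^2 - a^2) \cos at \sin bx.$$
If we take $b = a$, where $a$ is any nonzero real number, then
$$\frac{\partial^2 u}{\partial t^2} - \frac{\partial^2 u}{\partial x^2} = 0$$
for all $(x, t) \in \mathbb{R}^2$.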
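A quick numerical check of Example 6.2: the sketch below approximates $\partial^2 u/\partial t^2 - \partial^2 u/\partial x^2$ with second-order central differences and compares it with the closed form $(b^2 - a^2) \cos at \sin bx$. The function name `residual`, the step `h = 1e-3` and the sample values of $a$, $b$, $x$, $t$ are illustrative assumptions.

```python
import math


def residual(a, b, x, t, h=1e-3):
    """Approximate u_tt - u_xx at (x, t) for u(x, t) = cos(a t) sin(b x)."""
    def u(x, t):
        return math.cos(a * t) * math.sin(b * x)

    u_tt = (u(x, t + h) - 2 * u(x, t) + u(x, t - h)) / h**2
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h**2
    return u_tt - u_xx


x, t = 0.7, 0.3
print(residual(2.0, 2.0, x, t))  # b = a: should be close to 0
predicted = (3.0**2 - 2.0**2) * math.cos(2.0 * t) * math.sin(3.0 * x)
print(residual(2.0, 3.0, x, t), predicted)  # b != a: close to (b^2 - a^2) u
```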
In Example 6.1 we saw that $D_j(D_i f) = D_i(D_j f)$, although this is not always the case. The next theorem gives sufficient conditions for the mixed partial derivatives to be equal. Before stating and proving it, let us first look at a useful lemma.
Lemma 6.3. Let $f : E \to \mathbb{R}$ be a scalar field, where $E \subseteq \mathbb{R}^n$ is an open set containing $v$. Let $w \in \mathbb{R}^n$ and $t \in \mathbb{R}$ be such that $v + tw \in E$. Furthermore, suppose that $Df(v + tw; w)$ exists and define
$$g(t) = f(v + tw).$$
Then
$$g'(t) = Df(v + tw; w).$$
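Proof. For $u \neq 0$ small enough that $v + (t + u)w \in E$ (possible since $E$ is open), we have
$$\frac{g(t + u) - g(t)}{u} = \frac{f(v + tw + uw) - f(v + tw)}{u}.$$
Taking the limit as $u \to 0$, which exists because $Df(v + tw; w)$ exists by assumption, gives
$$g'(t) = Df(v + tw; w).$$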
Theorem 6.4 (Equality of mixed partial derivatives). Let $f : E \to \mathbb{R}$ be a scalar field, where $E \subseteq \mathbb{R}^n$ is an open set and $n \ge 2$. For fixed $i$ and $j$, with $i \neq j$, suppose that $D_i f$, $D_j f$, $D_{i,j} f$, and $D_{j,i} f$ all exist on $E$. If $D_{i,j} f$ and $D_{j,i} f$ are continuous on $E$, then
$$D_{i,j} f(x_0) = D_{j,i} f(x_0)$$
for every $x_0 \in E$.
Proof. Let $x_0 \in E$. Then there exists $r > 0$ such that $B(x_0; r) \subseteq E$ since $E$ is open. Let $h = (h_1, \ldots, h_n)$ be such that $0 < |h| < r/n$. Then
$$|(x_0 + h) - x_0| = |h| < \frac{r}{n} \le r,$$
implying that $x_0 + h \in B(x_0; r) \subseteq E$. Note also that
$$2|h| < \frac{2r}{n} \le r, \quad \text{or} \quad 2|h| < r,$$
since $n \ge 2$.
Fix $i$ and $j$ with $i \neq j$. We can write
$$h = h_1 e_1 + \cdots + h_n e_n.$$
Then
$$|(x_0 + h_i e_i + h_j e_j) - x_0| = |h_i e_i + h_j e_j| \le |h_i| + |h_j| \le 2|h| < r.$$
This implies that $x_0 + h_i e_i + h_j e_j \in B(x_0; r) \subseteq E$. Similarly, we observe that $x_0 + h_i e_i$ and $x_0 + h_j e_j$ both belong to $B(x_0; r) \subseteq E$. Thus the expression
$$\Delta = f(x_0 + h_i e_i + h_j e_j) - f(x_0 + h_i e_i) - f(x_0 + h_j e_j) + f(x_0)$$
is well defined.
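The argument below shows that $\Delta$ equals $h_i h_j$ times a mixed partial derivative evaluated at an intermediate point. As a numerical illustration (a sketch, not part of the proof), the code takes $n = 2$, $i = 1$, $j = 2$, $h_1 = h_2 = s$, the field $f(x, y) = x^2 y^3$ of Example 6.1 and $x_0 = (1.5, 2)$, and shows $\Delta / (h_i h_j)$ approaching $D_{1,2} f(x_0) = 6 \cdot 1.5 \cdot 2^2 = 36$. The helper name `second_difference` and the step sizes are illustrative assumptions.

```python
def f(x, y):
    # The scalar field of Example 6.1; D_{1,2} f(x, y) = 6 x y^2.
    return x**2 * y**3


def second_difference(f, x0, y0, hi, hj):
    """Delta = f(x0 + hi e1 + hj e2) - f(x0 + hi e1) - f(x0 + hj e2) + f(x0)."""
    return f(x0 + hi, y0 + hj) - f(x0 + hi, y0) - f(x0, y0 + hj) + f(x0, y0)


x0, y0 = 1.5, 2.0
quotients = []
for s in (1e-1, 1e-2, 1e-3):
    q = second_difference(f, x0, y0, s, s) / (s * s)
    quotients.append(q)
    print(f"s = {s:.0e}   Delta / s^2 = {q:.6f}")
```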
For $s, t \in [0, 1]$, define the functions
$$F(s; t) = f(x_0 + s h_i e_i + t h_j e_j) - f(x_0 + s h_i e_i) \quad (t \text{ fixed})$$
and
$$G(t; s) = f(x_0 + s h_i e_i + t h_j e_j) - f(x_0 + t h_j e_j) \quad (s \text{ fixed}).$$
It is easy to see that
$$F(1; 1) - F(0; 1) = \Delta = G(1; 1) - G(0; 1).$$
For a fixed $t \in [0, 1]$,
$$\begin{aligned}
\frac{F(s + u; t) - F(s; t)}{u} &= \frac{f(x_0 + s h_i e_i + u h_i e_i + t h_j e_j) - f(x_0 + s h_i e_i + u h_i e_i)}{u} - \frac{f(x_0 + s h_i e_i + t h_j e_j) - f(x_0 + s h_i e_i)}{u} \\
&= \frac{f(x_0 + s h_i e_i + t h_j e_j + u h_i e_i) - f(x_0 + s h_i e_i + t h_j e_j)}{u} - \frac{f(x_0 + s h_i e_i + u h_i e_i) - f(x_0 + s h_i e_i)}{u}.
\end{aligned}$$
Taking the limit as $u \to 0$,
$$F'(s; t) = Df(x_0 + s h_i e_i + t h_j e_j; h_i e_i) - Df(x_0 + s h_i e_i; h_i e_i)$$
from the definition of the directional derivative. Using Proposition 4.4 and the definition of the partial derivative, we can rewrite this as
$$\begin{aligned}
F'(s; t) &= h_i Df(x_0 + s h_i e_i + t h_j e_j; e_i) - h_i Df(x_0 + s h_i e_i; e_i) \\
&= h_i [D_i f(x_0 + s h_i e_i + t h_j e_j) - D_i f(x_0 + s h_i e_i)].
\end{aligned}$$
Since
$$|(x_0 + s h_i e_i + t h_j e_j) - x_0| = |s h_i e_i + t h_j e_j| \le |s| \, |h_i| + |t| \, |h_j| \le |h_i| + |h_j| \le 2|h| < r$$
for any $s, t \in [0, 1]$, we see that $x_0 + s h_i e_i + t h_j e_j$ and $x_0 + s h_i e_i$ belong to $B(x_0; r) \subseteq E$, where $D_i f$ exists by assumption. Thus $F(\cdot; t)$ is differentiable on $[0, 1]$ for every fixed $t \in [0, 1]$. By the Mean Value Theorem, there exists $s_1 = s_1(t) \in (0, 1)$ such that
$$F'(s_1; t) = F(1; t) - F(0; t).$$
Similarly, for a fixed $s \in [0, 1]$,
$$\begin{aligned}
\frac{G(t + u; s) - G(t; s)}{u} &= \frac{f(x_0 + s h_i e_i + t h_j e_j + u h_j e_j) - f(x_0 + t h_j e_j + u h_j e_j)}{u} - \frac{f(x_0 + s h_i e_i + t h_j e_j) - f(x_0 + t h_j e_j)}{u} \\
&= \frac{f(x_0 + s h_i e_i + t h_j e_j + u h_j e_j) - f(x_0 + s h_i e_i + t h_j e_j)}{u} - \frac{f(x_0 + t h_j e_j + u h_j e_j) - f(x_0 + t h_j e_j)}{u}.
\end{aligned}$$
Taking the limit as $u \to 0$,
$$G'(t; s) = Df(x_0 + s h_i e_i + t h_j e_j; h_j e_j) - Df(x_0 + t h_j e_j; h_j e_j)$$
from the definition of the directional derivative. Using Proposition 4.4 and the definition of the partial derivative, we can rewrite this as
$$\begin{aligned}
G'(t; s) &= h_j Df(x_0 + s h_i e_i + t h_j e_j; e_j) - h_j Df(x_0 + t h_j e_j; e_j) \\
&= h_j [D_j f(x_0 + s h_i e_i + t h_j e_j) - D_j f(x_0 + t h_j e_j)].
\end{aligned}$$
As before, $x_0 + s h_i e_i + t h_j e_j$ and $x_0 + t h_j e_j$ belong to $B(x_0; r) \subseteq E$, where $D_j f$ exists by assumption. Thus $G(\cdot; s)$ is differentiable on $[0, 1]$ for every fixed $s \in [0, 1]$. By the Mean Value Theorem, there exists $t_1 = t_1(s) \in (0, 1)$ such that
$$G'(t_1; s) = G(1; s) - G(0; s).$$
Combining the above results (taking $t = 1$ and $s = 1$, respectively) yields
$$F'(s_1; 1) = \Delta = G'(t_1; 1),$$
where
$$F'(s_1; 1) = F(1; 1) - F(0; 1) = h_i [D_i f(x_0 + s_1 h_i e_i + h_j e_j) - D_i f(x_0 + s_1 h_i e_i)]$$
and
$$G'(t_1; 1) = G(1; 1) - G(0; 1) = h_j [D_j f(x_0 + h_i e_i + t_1 h_j e_j) - D_j f(x_0 + t_1 h_j e_j)].$$
Now, for each $u \in [0, 1]$, define
$$\tilde{F}(u) = D_i f(x_0 + s_1 h_i e_i + u h_j e_j) = D_i f((x_0 + s_1 h_i e_i) + u(h_j e_j))$$
and
$$\tilde{G}(u) = D_j f(x_0 + u h_i e_i + t_1 h_j e_j) = D_j f((x_0 + t_1 h_j e_j) + u(h_i e_i)).$$
Then
$$F'(s_1; 1) = h_i [\tilde{F}(1) - \tilde{F}(0)], \qquad G'(t_1; 1) = h_j [\tilde{G}(1) - \tilde{G}(0)]. \tag{7}$$
Lemma 6.3 and Proposition 4.4 give
$$\begin{aligned}
\tilde{F}'(u) &= D(D_i f)((x_0 + s_1 h_i e_i) + u(h_j e_j); h_j e_j) \\
&= h_j D(D_i f)((x_0 + s_1 h_i e_i) + u(h_j e_j); e_j) \\
&= h_j D_j(D_i f)(x_0 + s_1 h_i e_i + u h_j e_j)
\end{aligned}$$
and
$$\begin{aligned}
\tilde{G}'(u) &= D(D_j f)((x_0 + t_1 h_j e_j) + u(h_i e_i); h_i e_i) \\
&= h_i D(D_j f)((x_0 + t_1 h_j e_j) + u(h_i e_i); e_i) \\
&= h_i D_i(D_j f)((x_0 + t_1 h_j e_j) + u(h_i e_i)) \\
&= h_i D_i(D_j f)(x_0 + u h_i e_i + t_1 h_j e_j).
\end{aligned}$$
Moreover,
$$|(x_0 + s_1 h_i e_i + u h_j e_j) - x_0| \le s_1 |h_i| + u |h_j| \le 2|h| < r$$
and
$$|(x_0 + u h_i e_i + t_1 h_j e_j) - x_0| \le u |h_i| + t_1 |h_j| \le 2|h| < r,$$
i.e., $x_0 + s_1 h_i e_i + u h_j e_j$ and $x_0 + u h_i e_i + t_1 h_j e_j$ both belong to $B(x_0; r) \subseteq E$, where $D_{i,j} f$ and $D_{j,i} f$ both exist. This implies that $\tilde{F}$ and $\tilde{G}$ are both differentiable on $[0, 1]$. By the Mean Value Theorem there exist $s_2, t_2 \in (0, 1)$ such that
$$\tilde{F}'(s_2) = \tilde{F}(1) - \tilde{F}(0), \qquad \tilde{G}'(t_2) = \tilde{G}(1) - \tilde{G}(0).$$
Recalling that $F'(s_1; 1) = \Delta = G'(t_1; 1)$, we obtain from (7) that
$$h_i [\tilde{F}(1) - \tilde{F}(0)] = h_j [\tilde{G}(1) - \tilde{G}(0)]$$
or
$$h_i \tilde{F}'(s_2) = h_j \tilde{G}'(t_2).$$
But this is the same as
$$h_i h_j D_j(D_i f)(x_0 + s_1 h_i e_i + s_2 h_j e_j) = h_j h_i D_i(D_j f)(x_0 + t_2 h_i e_i + t_1 h_j e_j)$$
or, dividing by $h_i h_j$ (which we may do by choosing $h$ with $h_i \neq 0$ and $h_j \neq 0$),
$$D_j(D_i f)(x_0 + s_1 h_i e_i + s_2 h_j e_j) = D_i(D_j f)(x_0 + t_2 h_i e_i + t_1 h_j e_j).$$
Note that
$$0 \le |s_1 h_i e_i + s_2 h_j e_j| \le s_1 |h_i| + s_2 |h_j| \le |h_i| + |h_j| \le 2|h|$$
and
$$0 \le |t_2 h_i e_i + t_1 h_j e_j| \le t_2 |h_i| + t_1 |h_j| \le |h_i| + |h_j| \le 2|h|.$$
As $|h| \to 0$,
$$|s_1 h_i e_i + s_2 h_j e_j| \to 0 \quad \text{and} \quad |t_2 h_i e_i + t_1 h_j e_j| \to 0.$$
Then the continuity of $D_{i,j} f$ and $D_{j,i} f$ at $x_0$ implies that
$$D_j(D_i f)(x_0) = D_i(D_j f)(x_0).$$
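The continuity hypothesis in Theorem 6.4 cannot simply be dropped. A standard counterexample (not taken from these notes) is $f(x, y) = xy(x^2 - y^2)/(x^2 + y^2)$ for $(x, y) \neq (0, 0)$, $f(0, 0) = 0$, for which $D_1 f(0, y) = -y$ and $D_2 f(x, 0) = x$, so that $D_{1,2} f(0, 0) = D_2(D_1 f)(0, 0) = -1$ while $D_{2,1} f(0, 0) = 1$. The sketch below estimates both with nested central differences; the helper names and step sizes are illustrative assumptions, with the inner step much smaller than the outer one so that the inner difference approximates the first-order partial derivative well.

```python
def f(x, y):
    # Standard counterexample: both mixed partials exist at (0, 0) but differ.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x**2 - y**2) / (x**2 + y**2)


def d1(g, h):
    """Central-difference approximation of D_1 g with step h."""
    return lambda x, y: (g(x + h, y) - g(x - h, y)) / (2 * h)


def d2(g, h):
    """Central-difference approximation of D_2 g with step h."""
    return lambda x, y: (g(x, y + h) - g(x, y - h)) / (2 * h)


inner, outer = 1e-7, 1e-3  # inner step much smaller than outer step
d12 = d2(d1(f, inner), outer)(0.0, 0.0)  # approximates D_2(D_1 f)(0, 0) = -1
d21 = d1(d2(f, inner), outer)(0.0, 0.0)  # approximates D_1(D_2 f)(0, 0) = +1
print(d12, d21)
```

Here the mixed partial derivatives exist everywhere but are not continuous at the origin, which is exactly the hypothesis the theorem needs.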
7 Weak form of the Chain Rule
The next result, sometimes called the Chain Rule, is actually a weaker form of it, in the sense that the inner functions $g_i$ are assumed to be continuously differentiable. Recall that in the actual Chain Rule we only require the functions to be differentiable. Nevertheless, this weaker version is one of the most useful results in the calculus of several variables.
Theorem 7.1 (Weak form of the Chain Rule). Suppose that $g_i : \mathbb{R}^n \to \mathbb{R}$ for all $i = 1, \ldots, m$ are continuously differentiable at $x_0 \in \mathbb{R}^n$. Let $f : \mathbb{R}^m \to \mathbb{R}$ be differentiable at $(g_1(x_0), \ldots, g_m(x_0))$. Define the function $h : \mathbb{R}^n \to \mathbb{R}$ by
$$h(x) = f(g_1(x), \ldots, g_m(x)).$$
Then, for each $j = 1, \ldots, n$,
$$D_j h(x_0) = \sum_{i=1}^m D_i f(g(x_0)) D_j g_i(x_0) = \sum_{i=1}^m D_i f(g_1(x_0), \ldots, g_m(x_0)) D_j g_i(x_0).$$
Proof. We see that $h = f \circ g$, where
$$g(x) = (g_1(x), \ldots, g_m(x)).$$
Since $g_i : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable at $x_0$, $g_i$ is differentiable at $x_0$ for all $i = 1, \ldots, m$. Then $g : \mathbb{R}^n \to \mathbb{R}^m$ is also differentiable at $x_0$ by Proposition 2.4. Hence by the Chain Rule
$$h'(x_0) = (f \circ g)'(x_0) = f'(g(x_0)) \, g'(x_0),$$
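where
$$h'(x_0) = \begin{pmatrix} D_1 h(x_0) & D_2 h(x_0) & \cdots & D_n h(x_0) \end{pmatrix},$$
$$f'(g(x_0)) = \begin{pmatrix} D_1 f(g(x_0)) & D_2 f(g(x_0)) & \cdots & D_m f(g(x_0)) \end{pmatrix},$$
and
$$g'(x_0) = \begin{pmatrix}
D_1 g_1(x_0) & D_2 g_1(x_0) & \cdots & D_n g_1(x_0) \\
D_1 g_2(x_0) & D_2 g_2(x_0) & \cdots & D_n g_2(x_0) \\
\vdots & \vdots & \ddots & \vdots \\
D_1 g_m(x_0) & D_2 g_m(x_0) & \cdots & D_n g_m(x_0)
\end{pmatrix}.$$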
The left- and right-hand sides are each $1 \times n$ matrices. Equating corresponding entries gives
$$\begin{aligned}
D_1 h(x_0) &= D_1 f(g(x_0)) D_1 g_1(x_0) + D_2 f(g(x_0)) D_1 g_2(x_0) + \cdots + D_m f(g(x_0)) D_1 g_m(x_0), \\
D_2 h(x_0) &= D_1 f(g(x_0)) D_2 g_1(x_0) + D_2 f(g(x_0)) D_2 g_2(x_0) + \cdots + D_m f(g(x_0)) D_2 g_m(x_0), \\
&\;\;\vdots \\
D_n h(x_0) &= D_1 f(g(x_0)) D_n g_1(x_0) + D_2 f(g(x_0)) D_n g_2(x_0) + \cdots + D_m f(g(x_0)) D_n g_m(x_0).
\end{aligned}$$
Therefore the $j$th equation is
$$D_j h(x_0) = \sum_{i=1}^m D_i f(g(x_0)) D_j g_i(x_0) = \sum_{i=1}^m D_i f(g_1(x_0), \ldots, g_m(x_0)) D_j g_i(x_0).$$
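The formula for $D_j h(x_0)$ is simply the $j$th entry of the row-times-matrix product $f'(g(x_0)) \, g'(x_0)$. The sketch below checks this numerically for the hypothetical choices $m = 2$, $n = 3$, $g(x, y, z) = (xy + z, \; x - z^2)$ and $f(u, v) = u^2 v$, with all derivatives entered by hand and compared against central differences of $h = f \circ g$. The function names, the point $x_0 = (1, 2, 3)$ and the step size are illustrative assumptions.

```python
def g(x):
    # Hypothetical inner map g = (g_1, g_2): R^3 -> R^2.
    x1, x2, x3 = x
    return (x1 * x2 + x3, x1 - x3**2)


def jacobian_g(x):
    # Row i holds (D_1 g_i, D_2 g_i, D_3 g_i), computed by hand.
    x1, x2, x3 = x
    return [[x2, x1, 1.0], [1.0, 0.0, -2.0 * x3]]


def f(u, v):
    # Hypothetical outer scalar field f: R^2 -> R.
    return u**2 * v


def grad_f(u, v):
    # (D_1 f, D_2 f), computed by hand.
    return (2.0 * u * v, u**2)


def chain_rule(x0):
    """D_j h(x0) = sum_i D_i f(g(x0)) D_j g_i(x0), for each j."""
    df = grad_f(*g(x0))
    jac = jacobian_g(x0)
    return [sum(df[i] * jac[i][j] for i in range(len(df))) for j in range(len(x0))]


def finite_difference(x0, step=1e-6):
    """Central-difference estimate of each D_j h(x0), where h = f o g."""
    def h(x):
        return f(*g(x))

    estimates = []
    for j in range(len(x0)):
        plus, minus = list(x0), list(x0)
        plus[j] += step
        minus[j] -= step
        estimates.append((h(plus) - h(minus)) / (2 * step))
    return estimates


x0 = (1.0, 2.0, 3.0)
exact = chain_rule(x0)
approx = finite_difference(x0)
print(exact)  # [-135.0, -80.0, -230.0]
print(approx)
```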
Let us look at special cases of Theorem 7.1. Suppose that $m = n = 2$, so that
$$h(x, y) = f(u(x, y), v(x, y)).$$
Then
$$\begin{aligned}
D_1 h(x, y) &= D_1 f(u(x, y), v(x, y)) D_1 u(x, y) + D_2 f(u(x, y), v(x, y)) D_1 v(x, y), \\
D_2 h(x, y) &= D_1 f(u(x, y), v(x, y)) D_2 u(x, y) + D_2 f(u(x, y), v(x, y)) D_2 v(x, y).
\end{aligned}$$
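Alternatively, writing $\partial h / \partial u$ and $\partial h / \partial v$ for $D_1 f$ and $D_2 f$ evaluated at $(u(x, y), v(x, y))$, these are expressed as
$$\frac{\partial h}{\partial x} = \frac{\partial h}{\partial u} \frac{\partial u}{\partial x} + \frac{\partial h}{\partial v} \frac{\partial v}{\partial x}, \qquad \frac{\partial h}{\partial y} = \frac{\partial h}{\partial u} \frac{\partial u}{\partial y} + \frac{\partial h}{\partial v} \frac{\partial v}{\partial y}.$$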
Now suppose that $m = 2$ and $n = 3$, so that
$$h(x, y, z) = f(u(x, y, z), v(x, y, z)).$$
Then
$$\begin{aligned}
D_1 h(x, y, z) &= D_1 f(u(x, y, z), v(x, y, z)) D_1 u(x, y, z) + D_2 f(u(x, y, z), v(x, y, z)) D_1 v(x, y, z), \\
D_2 h(x, y, z) &= D_1 f(u(x, y, z), v(x, y, z)) D_2 u(x, y, z) + D_2 f(u(x, y, z), v(x, y, z)) D_2 v(x, y, z), \\
D_3 h(x, y, z) &= D_1 f(u(x, y, z), v(x, y, z)) D_3 u(x, y, z) + D_2 f(u(x, y, z), v(x, y, z)) D_3 v(x, y, z).
\end{aligned}$$
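Alternatively,
$$\frac{\partial h}{\partial x} = \frac{\partial h}{\partial u} \frac{\partial u}{\partial x} + \frac{\partial h}{\partial v} \frac{\partial v}{\partial x}, \qquad \frac{\partial h}{\partial y} = \frac{\partial h}{\partial u} \frac{\partial u}{\partial y} + \frac{\partial h}{\partial v} \frac{\partial v}{\partial y}, \qquad \frac{\partial h}{\partial z} = \frac{\partial h}{\partial u} \frac{\partial u}{\partial z} + \frac{\partial h}{\partial v} \frac{\partial v}{\partial z}.$$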
Finally, suppose that $m = 2$ and $n = 1$, so that
$$h(t) = f(u(t), v(t)).$$
Then
$$D_1 h(t) = D_1 f(u(t), v(t)) D_1 u(t) + D_2 f(u(t), v(t)) D_1 v(t)$$
or
$$\frac{dh}{dt} = \frac{\partial h}{\partial u} \frac{du}{dt} + \frac{\partial h}{\partial v} \frac{dv}{dt}.$$
Note that we replaced the appropriate partial derivatives by ordinary derivatives.
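For $n = 1$ the formula computes the rate of change of $f$ along the curve $t \mapsto (u(t), v(t))$. A minimal sketch, assuming the hypothetical choices $u(t) = \cos t$, $v(t) = \sin t$ and $f(u, v) = u v^2$ (so that $h(t) = \cos t \sin^2 t$), compares the chain-rule value with a central difference of $h$. The function names and the point $t = 0.8$ are illustrative.

```python
import math


def u(t):
    return math.cos(t)


def v(t):
    return math.sin(t)


def f(a, b):
    # Hypothetical scalar field f(u, v) = u v^2.
    return a * b**2


def dh_dt_chain_rule(t):
    """dh/dt = D_1 f(u, v) du/dt + D_2 f(u, v) dv/dt, with h(t) = f(u(t), v(t))."""
    d1f = v(t) ** 2          # D_1 f(u, v) = v^2
    d2f = 2.0 * u(t) * v(t)  # D_2 f(u, v) = 2 u v
    return d1f * (-math.sin(t)) + d2f * math.cos(t)  # du/dt = -sin t, dv/dt = cos t


def dh_dt_numeric(t, step=1e-6):
    """Central-difference estimate of dh/dt."""
    def h(s):
        return f(u(s), v(s))

    return (h(t + step) - h(t - step)) / (2 * step)


t = 0.8
print(dh_dt_chain_rule(t), dh_dt_numeric(t))
```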
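Let $f : \mathbb{R}^2 \to \mathbb{R}$ be a scalar field whose value is $f(x, y)$, and assume that $f$ has continuous second-order partial derivatives. Suppose that $x = r \cos \theta$ and $y = r \sin \theta$. Define
$$g(r, \theta) = f(r \cos \theta, r \sin \theta).$$
Express $\partial^2 g / \partial \theta^2$ in terms of the partial derivatives of $f$.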
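We have
$$\frac{\partial x}{\partial r} = \cos \theta, \quad \frac{\partial x}{\partial \theta} = -r \sin \theta, \quad \frac{\partial y}{\partial r} = \sin \theta, \quad \frac{\partial y}{\partial \theta} = r \cos \theta.$$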
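Therefore
$$\frac{\partial g}{\partial r} = \frac{\partial f}{\partial x} \frac{\partial x}{\partial r} + \frac{\partial f}{\partial y} \frac{\partial y}{\partial r} = \frac{\partial f}{\partial x} \cos \theta + \frac{\partial f}{\partial y} \sin \theta$$
and
$$\frac{\partial g}{\partial \theta} = \frac{\partial f}{\partial x} \frac{\partial x}{\partial \theta} + \frac{\partial f}{\partial y} \frac{\partial y}{\partial \theta} = -\frac{\partial f}{\partial x} r \sin \theta + \frac{\partial f}{\partial y} r \cos \theta.$$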
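Note that this last equation is the same as
$$D_2 g(r, \theta) = -D_1 f(r \cos \theta, r \sin \theta) \, r \sin \theta + D_2 f(r \cos \theta, r \sin \theta) \, r \cos \theta.$$
Hence
$$\begin{aligned}
D_{2,2} g(r, \theta) = {} & -D_1 f(r \cos \theta, r \sin \theta)(r \cos \theta) \\
& - [D_{1,1} f(r \cos \theta, r \sin \theta)(-r \sin \theta) + D_{1,2} f(r \cos \theta, r \sin \theta)(r \cos \theta)](r \sin \theta) \\
& + D_2 f(r \cos \theta, r \sin \theta)(-r \sin \theta) \\
& + [D_{2,1} f(r \cos \theta, r \sin \theta)(-r \sin \theta) + D_{2,2} f(r \cos \theta, r \sin \theta)(r \cos \theta)](r \cos \theta),
\end{aligned}$$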
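which can also be expressed as
$$\begin{aligned}
\frac{\partial^2 g}{\partial \theta^2} = {} & -\frac{\partial f}{\partial x} r \cos \theta - \left(-\frac{\partial^2 f}{\partial x^2} r \sin \theta + \frac{\partial^2 f}{\partial y \, \partial x} r \cos \theta\right) r \sin \theta \\
& - \frac{\partial f}{\partial y} r \sin \theta + \left(-\frac{\partial^2 f}{\partial x \, \partial y} r \sin \theta + \frac{\partial^2 f}{\partial y^2} r \cos \theta\right) r \cos \theta \\
= {} & -\frac{\partial f}{\partial x} r \cos \theta + \frac{\partial^2 f}{\partial x^2} r^2 \sin^2 \theta - \frac{\partial^2 f}{\partial y \, \partial x} r^2 \sin \theta \cos \theta \\
& - \frac{\partial f}{\partial y} r \sin \theta - \frac{\partial^2 f}{\partial x \, \partial y} r^2 \sin \theta \cos \theta + \frac{\partial^2 f}{\partial y^2} r^2 \cos^2 \theta.
\end{aligned}$$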
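A numerical sanity check of the final expression for $\partial^2 g / \partial \theta^2$, assuming the hypothetical field $f(x, y) = x^3 y$, for which $f_x = 3x^2 y$, $f_y = x^3$, $\partial^2 f/\partial x^2 = 6xy$, both mixed partials equal $3x^2$, and $\partial^2 f/\partial y^2 = 0$. The sketch evaluates the right-hand side term by term and compares it with a central second difference of $g(r, \theta) = f(r \cos \theta, r \sin \theta)$ in $\theta$. The function names, the point $(r, \theta) = (1.5, 0.7)$ and the step size are illustrative assumptions.

```python
import math


def f(x, y):
    # Hypothetical scalar field used only for this check.
    return x**3 * y


def g(r, theta):
    return f(r * math.cos(theta), r * math.sin(theta))


def g_thetatheta_formula(r, theta):
    """Right-hand side of the formula for d^2 g / d theta^2, for f(x, y) = x^3 y."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = r * c, r * s
    f_x, f_y = 3 * x**2 * y, x**3
    d2f_dx2, d2f_dy2 = 6 * x * y, 0.0
    d2f_dydx = d2f_dxdy = 3 * x**2  # both mixed partials equal 3x^2 here
    return (-f_x * r * c + d2f_dx2 * r**2 * s**2 - d2f_dydx * r**2 * s * c
            - f_y * r * s - d2f_dxdy * r**2 * s * c + d2f_dy2 * r**2 * c**2)


def g_thetatheta_numeric(r, theta, step=1e-4):
    """Central second difference of g in theta."""
    return (g(r, theta + step) - 2 * g(r, theta) + g(r, theta - step)) / step**2


r, theta = 1.5, 0.7
print(g_thetatheta_formula(r, theta), g_thetatheta_numeric(r, theta))
```

For this particular $f$ one can also differentiate $g(r, \theta) = r^4 \cos^3 \theta \sin \theta$ directly, which gives $r^4 (6 \cos \theta \sin^3 \theta - 10 \cos^3 \theta \sin \theta)$ and provides a second, independent check.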
Exercises
1. Let $f : \mathbb{R}^n \to \mathbb{R}$ be the scalar field given by
$$f(x) = |x|^4.$$
Compute $Df(x_0; v)$ for any $x_0, v \in \mathbb{R}^n$.
2. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a given linear transformation. Let $f : \mathbb{R}^n \to \mathbb{R}$ be the scalar field whose value is
$$f(x) = \langle x, T(x) \rangle.$$
Compute $Df(x_0; v)$ for any $x_0, v \in \mathbb{R}^n$.
3. A set $E \subseteq \mathbb{R}^n$ is said to be convex if for every $x, y \in E$, the line segment
$$\{tx + (1 - t)y : 0 \le t \le 1\}$$
is contained in $E$.
(a) Prove that every open $n$-ball is convex.
(b) Suppose that $f : E \to \mathbb{R}$ is a scalar field, where $E \subseteq \mathbb{R}^n$ is convex. Prove that if $Df(x_0; v) = 0$ for every $x_0 \in E$ and for every $v \in \mathbb{R}^n$, then $f$ is constant on $E$.
4. For the following scalar fields defined on an appropriate subset of $\mathbb{R}^2$, find all the first-order partial derivatives and verify that $D_2(D_1 f) = D_1(D_2 f)$:
(a) $f(x, y) = (x^2 - y^2)^2$;
(b) $f(x, y) = \dfrac{x}{x^2 + y^2}$;
(c) $f(x, y) = \tan \dfrac{x^2}{y}$;
(d) $f(x, y) = x^y$;
(e) $f(x, y) = \tan^{-1} \dfrac{x + y}{1 - xy}$.
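5. Let
$$v(r, t) = t^n \exp\left(-\frac{r^2}{4t}\right).$$
Find a value of the constant $n$ such that
$$\frac{\partial v}{\partial t} = \frac{1}{r^2} \frac{\partial}{\partial r}\left(r^2 \frac{\partial v}{\partial r}\right).$$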
6. Consider the vector field $f : \mathbb{R}^3 \to \mathbb{R}^3$ defined by
$$f(x, y, z) = (x, y, z)$$
for every $(x, y, z) \in \mathbb{R}^3$. Find the Jacobian matrix of $f$ at any point $(x, y, z)$.
7. Let $f : \mathbb{R}^2 \to \mathbb{R}^2$ and $g : \mathbb{R}^3 \to \mathbb{R}^2$ be vector fields defined by
$$f(x, y) = (e^{x + 2y}, \sin(2x + y)), \qquad g(u, v, w) = (u + 2v^2 + 3w^3, u^2 + 2v),$$
respectively.
(a) Compute $f'(x, y)$ and $g'(u, v, w)$;
(b) Find $h(u, v, w) = f(g(u, v, w))$;
(c) Compute $h'(1, 1, 1)$.
8. Define
$$f(x, y) = \int_0^{xy} e^{t^2} \, dt \qquad (x, y > 0).$$
Find $f_x$ and $f_y$ in terms of $x$ and $y$.
9. A function $u$ is defined by an equation of the form
$$u(x, y) = x f\left(\frac{x + y}{xy}\right).$$
Show that $u$ satisfies a partial differential equation of the form
$$x^2 \frac{\partial u}{\partial x} - y^2 \frac{\partial u}{\partial y} = F(x, y) u$$
and find $F(x, y)$.
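10. The substitution $x = e^s$, $y = e^t$ converts $f(x, y)$ into $g(s, t)$, where $g(s, t) = f(e^s, e^t)$. If $f$ is known to satisfy the partial differential equation
$$x^2 \frac{\partial^2 f}{\partial x^2} + y^2 \frac{\partial^2 f}{\partial y^2} + x \frac{\partial f}{\partial x} + y \frac{\partial f}{\partial y} = 0,$$
show that $g$ satisfies the partial differential equation
$$\frac{\partial^2 g}{\partial s^2} + \frac{\partial^2 g}{\partial t^2} = 0.$$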
11. Let $f : E \to \mathbb{R}$ be a scalar field, where $E \subseteq \mathbb{R}^n$ is open. Assume that $f$ is differentiable on $E$. We say that $f$ is homogeneous of degree $p$ over $E$ if
$$f(tx) = t^p f(x)$$
for every $t > 0$ and for every $x \in E$ for which $tx \in E$. For a homogeneous scalar field of degree $p$, show that
$$\langle x, \nabla f(x) \rangle = p f(x)$$
for each $x \in E$. If $x = (x_1, \ldots, x_n)$, then this equation can be expressed as
$$x_1 \frac{\partial f}{\partial x_1} + \cdots + x_n \frac{\partial f}{\partial x_n} = p f(x_1, \ldots, x_n).$$
(Hint: For a fixed $x$, define $g(t) = f(tx)$ and compute $g'(1)$.)
12. This is the converse of the previous problem. Prove that if $f$ satisfies
$$\langle x, \nabla f(x) \rangle = p f(x)$$
for all $x$ in an open set $E$, then $f$ must be homogeneous of degree $p$. (Hint: For a fixed $x$, define $g(t) = f(tx) - t^p f(x)$ and compute $g'(t)$.)