Lecture #1
Consider two finite-dimensional inner-product spaces V and W over ℝ. A function

    f : V → W

is said to be continuous at a ∈ V if

    lim_{h→0} f(a+h) = f(a)   in W.

Equivalently, f is continuous at a if

    lim_{h→0} f(a+h) − f(a) = 0.

Definition: f is differentiable at a ∈ V if there exists a linear map T : V → W such that

    lim_{h→0} ‖f(a+h) − f(a) − T(h)‖ / ‖h‖ = 0.

If such a T exists, it is unique; it is called the total derivative of f at a and is denoted:

    T = df_a : V → W.

Differentiability at a implies continuity at a. This follows because the linear map T is continuous and T(h) → 0 as h → 0.
To show that this linear map is unique, suppose both T and S satisfy the definition of differentiability. We need to show that for all v ∈ V, T(v) = S(v). Consider h = tv for t ∈ ℝ. Then:

    ‖T(h) − S(h)‖/‖h‖ ≤ ‖T(h) − (f(a+h) − f(a))‖/‖h‖ + ‖(f(a+h) − f(a)) − S(h)‖/‖h‖.

Given ε > 0, each term on the right is < ε/2 once ‖h‖ < δ for a suitable δ > 0, i.e. once |t| is small enough. But ‖T(tv) − S(tv)‖/‖tv‖ = ‖T(v) − S(v)‖/‖v‖ does not depend on t, so it must be 0:

    T(v) − S(v) = 0_W  ⟹  T(v) = S(v).
Now that we have proven uniqueness of the derivative, we then looked at some examples of total derivatives.

Special case: W = ℝ, so f : V → ℝ. Choosing a basis v1, ..., vn of V, we may regard f(v) as a function of n variables.

Definition: We define the directional derivative of f at a ∈ V in the direction v to be

    D_v f(a) = lim_{t→0} (f(a+tv) − f(a))/t.
Proposition: If df_a exists, then D_v f(a) exists and D_v f(a) = df_a(v).

Notice that the directional derivative is a real number in this special case. It is important to note that there are cases in which the directional derivative always exists, but the total derivative does not.

Proof:

    0 = lim_{t→0} ‖f(a+tv) − f(a) − df_a(tv)‖ / ‖tv‖
      = lim_{t→0} ‖f(a+tv) − f(a) − t·df_a(v)‖ / (|t|·‖v‖)
    ⟹ lim_{t→0} (f(a+tv) − f(a))/t = df_a(v),  i.e.  D_v f(a) = df_a(v).

The first line is simply a statement of the total derivative existing. In the second line, we pull the scalar t out of df_a by homogeneity. In the last line, we invoke the definition of the directional derivative.
Definition: Partial derivatives are a special case of directional derivatives, denoted by:

    ∂f/∂x_i = D_{v_i} f.

Example: f(x1, x2) = x1 sin x2 + e^{x1 x2}, so

    ∂f/∂x1 = sin x2 + x2 e^{x1 x2},   ∂f/∂x2 = x1 cos x2 + x1 e^{x1 x2}.

Gross Quote: "Once you take one derivative, you will want to take more! If you give a mouse a cookie it will ask..."

There are 4 total second partials:

    ∂²f/∂x1∂x1 = x2² e^{x1 x2}
    ∂²f/∂x2∂x1 = cos x2 + (1 + x1 x2) e^{x1 x2}
    ∂²f/∂x1∂x2 = cos x2 + (1 + x1 x2) e^{x1 x2}
    ∂²f/∂x2∂x2 = −x1 sin x2 + x1² e^{x1 x2}

Notice the miracle! The second and third equations are the same. This suggests the general result: the matrix

    A_ij = ∂²f/∂x_i∂x_j

is symmetric.

To recap, we have f : V → ℝ with df_a : V → ℝ. Since V carries an inner product, there is a unique vector ∇f(a), called the gradient, such that:

    df_a(v) = ⟨v, ∇f(a)⟩.

In an orthonormal basis, ∇f(a) = (∂f/∂x1, ..., ∂f/∂xn)|_a.
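The identity D_v f(a) = ⟨∇f(a), v⟩ is easy to check numerically. A minimal sketch in Python, using the example function from this lecture (the point a and direction v below are arbitrary choices, not from the notes), compares a central-difference quotient against the analytic gradient:

```python
import math

def f(x1, x2):
    # the example from this lecture
    return x1 * math.sin(x2) + math.exp(x1 * x2)

def grad_f(x1, x2):
    # the analytic partials computed above
    return (math.sin(x2) + x2 * math.exp(x1 * x2),
            x1 * math.cos(x2) + x1 * math.exp(x1 * x2))

def directional(a, v, t=1e-6):
    # central-difference approximation of D_v f(a)
    return (f(a[0] + t*v[0], a[1] + t*v[1])
            - f(a[0] - t*v[0], a[1] - t*v[1])) / (2 * t)

a, v = (0.5, 1.0), (3.0, -2.0)     # arbitrary point and direction
g = grad_f(*a)
numeric = directional(a, v)
exact = g[0]*v[0] + g[1]*v[1]      # <grad f(a), v>
print(numeric, exact)
```

The two numbers agree to roughly the accuracy of the finite difference.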
Lecture #2
Remember from last class that if we have a function f : V → W, we call f differentiable at a ∈ V if there is a linear map df_a : V → W such that

    f(a+h) = f(a) + df_a(h) + o(‖h‖)

in the limit that h approaches 0. We defined the directional derivative of f at a in the direction v ≠ 0 as

    D_v f(a) = lim_{t→0} (f(a+tv) − f(a))/t.

This definition clearly necessitates the existence of the limit. When the total derivative exists, D_v f(a) = df_a(v), so as a special case the directional derivative is linear in v; in particular

    D_{cv} f(a) = c·D_v f(a),  where c ∈ ℝ.

Because of this homogeneity, we can always rescale the directional derivative and take ‖v‖ = 1. If v1, v2, ..., vn is a basis of V, we can expand v = Σ_i c_i v_i and use linearity of D_v f(a) in v to write:

    D_v f(a) = Σ_i c_i D_{v_i} f(a).

Now, to write the matrix of the linear operator df_a, we need:
1. A basis {v1, ..., vn} of V.
2. A basis {w1, ..., wm} of W.

A basis of W gives us a decomposition of f into functions from V to ℝ: the coordinate functions f_i, with f = Σ_i f_i w_i. For each f_i and each v_j we have

    D_{v_j} f_i(a) = ∂f_i/∂x_j.

Now we can write the matrix of df_a with respect to the bases {v_i}, {w_i} as:

    A_ij = ∂f_i/∂x_j.
Example: f : ℝ³ → ℝ² defined by

    f1(x1, x2, x3) = x1 x2 + x3²,   f2(x1, x2, x3) = sin x1 + e^{x2} + x3,

whose matrix of partial derivatives is

    ( x2       x1       2x3 )
    ( cos x1   e^{x2}   1   ).

If we consider this function at the point a = (0, 1, 2) and evaluate the partial derivatives, we have:

    df_a = ( 1  0  4 )
           ( 1  e  1 ).

Explicitly, we have f(a+h) ≈ f(a) + df_a(h), where df_a is a 2×3 matrix (f is a function between spaces of dimension 3 and 2).

Existence of all the partial (indeed, directional) derivatives is not enough for differentiability. Consider

    f(x, y) = 2x²y/(x⁴ + y²),  with f(0, 0) = 0.

Calculating D_v f(0, 0) for v = (a, b) with b ≠ 0, we have

    lim_{t→0} (f(tv) − f(0,0))/t = lim_{t→0} (1/t)·(2a²b t³)/(t⁴a⁴ + t²b²) = lim_{t→0} 2a²b/(t²a⁴ + b²) = 2a²/b.

We note that since D_v f(0, 0) is not linear in v = (a, b), the total derivative cannot exist at (0, 0).
Now we will look at gradients. Remember from 25a that a gradient requires an inner product on V. Since df_a is linear in v, there exists a unique w ∈ V such that df_a(v) = ⟨v, w⟩.

Definition: We define this unique w as ∇f(a), called the gradient of f at a.

Let e1, ..., en be an orthonormal basis, and let ∂f/∂x_i = the directional derivative w.r.t. e_i. Then df_a(e_i) = ∂f/∂x_i, so

    ∇f(a) = Σ_i (∂f/∂x_i)|_a e_i.

What is the significance of the gradient? We can see that the gradient vector is the direction in which the function increases most rapidly. To see this, note that for ‖v‖ = 1:

    D_v f(a) = ⟨∇f(a), v⟩ = ‖∇f(a)‖ cos θ,

where θ is the angle between v and ∇f(a); here we have taken advantage of the assumption that v is normalized. This is maximized when v points along ∇f(a). We will also claim that if ∇f(a) ≠ 0, then the level curves of f are orthogonal to the gradient vector.
Gross Quote: "The closest nuclear reactor to Boston is in Seabrook, NH. Suppose that the Seabrook nuclear reactor has a meltdown, and that you have access to precise data on the level curves of the nuclear waste emitted from the nuclear reactor. How should we make our flee most effective? The normal idiot will draw a line from the nuclear reactor to his/her present position and run directly away from the nuclear reactor along this line. With your knowledge of the theory of the gradient, you know that the best way to run is in general not in this direction. You should flee along the gradient vector!"
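The warning from earlier in this lecture, that directional derivatives can all exist while the total derivative does not, can be seen concretely. A small sketch with f(x, y) = 2x²y/(x⁴ + y²): every one-sided difference quotient converges, but D_v f(0, 0) fails to be additive in v, so no linear map df_(0,0) exists.

```python
def f(x, y):
    # f(x, y) = 2x^2 y / (x^4 + y^2), with f(0, 0) = 0
    if x == 0.0 and y == 0.0:
        return 0.0
    return 2.0 * x * x * y / (x**4 + y**2)

def directional(a, b, t=1e-7):
    # difference quotient (f(tv) - f(0, 0)) / t for v = (a, b)
    return f(t * a, t * b) / t

d10 = directional(1.0, 0.0)   # along the x-axis
d01 = directional(0.0, 1.0)   # along the y-axis
d11 = directional(1.0, 1.0)   # along the diagonal: 2a^2/b with a = b = 1
print(d10, d01, d11)
```

The two axis derivatives vanish while the diagonal derivative is 2, so D_v f(0,0) is not additive in v.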
Lecture #3
Consider a function f : V → ℝ. As always, if f is differentiable at a, then

    df_a(v) = ⟨∇f(a), v⟩.

We also mentioned some of the properties of the gradient:
1. If f has a local max or min at a, then ∇f(a) = 0 in V.
2. If ∇f(a) ≠ 0, then the level set of f at a, namely

    {x ∈ V : f(x) = f(a)},

is orthogonal to ∇f(a).

If ∇f(a) ≠ 0, the set of directions tangent to the level set at a is

    {v = (v1, ..., vn) : Σ_{i=1}^n (∂f/∂x_i)|_a (v_i − a_i) = 0},

which has dimension n − 1.
Now consider the graph of f. We will define the graph of f as the subset S = graph(f) ⊆ V × ℝ,

    S = {(v, f(v)) : v ∈ V},

which is the zero set of g : V × ℝ → ℝ defined by g(v, y) = y − f(v). What is the tangent plane to S at the point (a, f(a))? First we calculate: it is

    {(v, y) : Σ_{i=1}^n (∂f/∂x_i)|_a (v_i − a_i) = y − f(a)}.

In one variable this reads

    f′(a)(x − a) = y − f(a),

which is what we intuitively think of as the tangent line to the point on a graph.

Now we will talk about the most useful rule in analysis: the Chain Rule.

Proposition: Consider functions V →(f) W →(g) U. If f is differentiable at a ∈ V and g is differentiable at f(a) ∈ W, then g∘f is differentiable at a ∈ V and

    d(g∘f)_a = dg_{f(a)} ∘ df_a.

In words: the derivative of the composition of two functions is the composition of the linear derivatives.
Proof: If f is differentiable at a, we can write

    f(a+h) = f(a) + df_a(h) + ‖h‖ φ(h),  where lim_{‖h‖→0} φ(h) = 0.

Similarly, if g is differentiable at f(a), then we have:

    g(f(a) + k) = g(f(a)) + dg_{f(a)}(k) + ‖k‖ ψ(k),  where lim_{‖k‖→0} ψ(k) = 0.

Let k = f(a+h) − f(a). We know that k approaches 0 in W as h → 0, since f is continuous at a. We can rewrite g(f(a+h)) − g(f(a)) as:

    = g(f(a) + k) − g(f(a))
    = dg_{f(a)}(k) + ‖k‖ ψ(k)
    = dg_{f(a)}(f(a+h) − f(a)) + ‖f(a+h) − f(a)‖ ψ(k)
    = dg_{f(a)}(df_a(h) + ‖h‖ φ(h)) + ‖df_a(h) + ‖h‖ φ(h)‖ ψ(k)
    = dg_{f(a)}(df_a(h)) + ‖h‖ dg_{f(a)}(φ(h)) + ‖h‖ · ‖df_a(h/‖h‖) + φ(h)‖ ψ(k).

Therefore

    [g∘f(a+h) − g∘f(a) − dg_{f(a)}(df_a(h))]/‖h‖ = dg_{f(a)}(φ(h)) + ‖df_a(h/‖h‖) + φ(h)‖ ψ(k).

Now we wave our hands. We note that dg_{f(a)} is a linear operator and is continuous; therefore it must take φ(h) → 0 to 0. Now we look at ψ(k) = ψ(f(a+h) − f(a)); we note that this will also tend to zero. But just because ψ(k) goes to zero doesn't mean that the product of the two terms goes to zero: we also need

    ‖df_a(h/‖h‖) + φ(h)‖

to remain bounded as h → 0. We note that the vector h/‖h‖ has the property that its norm is 1 (it is on the unit sphere); as h → 0 the vector wanders around the unit sphere. The unit sphere is compact! We use the result from last term that a continuous mapping of a compact set is compact, and the fact that compact sets are closed and bounded, to conclude that df_a(h/‖h‖) is bounded, as required. Then LHS → 0 + 0 = 0, which is the claim.
Lecture #4

Consider f : V → W. We will specify coordinates for our two spaces (this is equivalent to choosing a basis). We call f differentiable at a if

    f(a+h) − f(a) = T(h) + o(‖h‖),

where T(h) is a linear function. There is not always such a function, but when there is, it is unique. We denote this df_a = T : V → W.

The natural question to ask is: what information about the function does this linear transformation contain? We can think of the directional derivative of f at a in the direction v:

    D_v f(a) = lim_{t→0} (f(a+tv) − f(a))/t.

A special case of the directional derivative is when we choose an orthonormal basis {e1, ..., en} for V. We write the directional derivative of f in the direction of any of these basis vectors as

    D_{e_i} f = ∂f/∂x_i.

Now, if we additionally choose a basis {w1, ..., wm} for W, we have a natural isomorphism between W and ℝ^m, representing an arbitrary vector in W as w ↔ (c1, ..., cm). We can then think of f : V → ℝ^m as defined by coordinate functions

    f1 : V → ℝ
    f2 : V → ℝ
    ...
    fm : V → ℝ,

where if f(v) = w = c1w1 + c2w2 + ... + cmwm, then f_i(v) = c_i. The matrix of df_a with respect to these bases is

    A_ij = ∂f_i/∂x_j.
Example: f : ℝ² → ℝ³ defined by (x, y) ↦ (3 cos x sin y, 4 sin x sin y, 5 cos y), and the matrix is:

    ∂f_i/∂x_j = ( −3 sin x sin y   3 cos x cos y )
                (  4 cos x sin y   4 sin x cos y )
                (  0              −5 sin y       ).

At a = (π/2, π/2) we have f(a) = (0, 4, 0) and

    ∂f_i/∂x_j = ( −3   0 )
                (  0   0 )
                (  0  −5 ).

Now what does this tell us? We can think of the columns of this matrix as the tangent vectors, in the basis directions, on the graph of f.

Chain Rule: Suppose we have two functions:

    f : V → W,   g : W → U.

If f is differentiable at a and g is differentiable at f(a), then g∘f is differentiable at a, with derivative given by:

    d(g∘f)_a = dg_{f(a)} ∘ df_a.

This is the functoriality of the derivative.

Now we will look at the chain rule from the perspective of matrix multiplication. For specificity, suppose we have

    f : ℝ² → ℝ³,   g : ℝ³ → ℝ²,   h = g∘f.

In the matrix view, the statement of the chain rule is just a bunch of equalities of the form:

    ∂h_i/∂x_j = (∂g_i/∂y1)(∂f1/∂x_j) + (∂g_i/∂y2)(∂f2/∂x_j) + (∂g_i/∂y3)(∂f3/∂x_j) = Σ_{k=1}^{m=3} (∂g_i/∂y_k)(∂f_k/∂x_j).
A special case of the chain rule is when the composition of functions results in a map from ℝ to ℝ:

    ℝ →(g) ℝⁿ →(f) ℝ.

The chain rule in this case tells us that (f∘g)′(t) = ⟨∇f(g(t)), g′(t)⟩.

Theorem: Let U ⊆ ℝⁿ be open and connected, and let f : U → ℝⁿ be differentiable on U. Then f is constant if and only if df ≡ 0 (that is, df_a = 0 for all a ∈ U). Here connectedness means: for a, b ∈ U there exists a differentiable g : ℝ → U such that g(0) = a and g(1) = b.

Proof: If f is constant, the definition of df makes it clear that this operator is zero (this direction is straightforward, but omitted). Now we assume df ≡ 0 on U and we need to show f is constant. For now, it is okay to assume f : U → ℝ (apply this case to each coordinate function). We want to show that f(a) = f(b) for all a, b ∈ U. By our definition of connectedness, there exists a differentiable function g : ℝ → U with g(0) = a and g(1) = b. Then we can apply the chain rule to f∘g : ℝ → ℝ:

    (f∘g)′(t) = df_{g(t)}(g′(t)) = 0,

so f∘g is constant. Then f(g(0)) = f(g(1)), and f(a) = f(b) as required.
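The matrix form of the chain rule can be verified numerically. A sketch: take the parametrization f : ℝ² → ℝ³ from earlier in this lecture, an arbitrary smooth g : ℝ³ → ℝ (the particular g below is an assumption for illustration, not from the notes), and check D(g∘f) = Dg · Df with central-difference Jacobians.

```python
import math

def f(x, y):
    # the parametrization example from this lecture, f : R^2 -> R^3
    return (3*math.cos(x)*math.sin(y), 4*math.sin(x)*math.sin(y), 5*math.cos(y))

def g(u, v, w):
    # an arbitrary smooth g : R^3 -> R (assumed for illustration)
    return u*v + w**2

def jacobian(fun, point, m, t=1e-6):
    # numeric Jacobian of fun (with m outputs) at point, by central differences
    n = len(point)
    J = []
    for i in range(m):
        row = []
        for j in range(n):
            plus = list(point); plus[j] += t
            minus = list(point); minus[j] -= t
            fp, fm = fun(*plus), fun(*minus)
            if m > 1:
                fp, fm = fp[i], fm[i]
            row.append((fp - fm) / (2*t))
        J.append(row)
    return J

a = (0.3, 0.7)
Df = jacobian(f, a, 3)            # 3 x 2
Dg = jacobian(g, f(*a), 1)        # 1 x 3
h = lambda x, y: g(*f(x, y))
Dh = jacobian(h, a, 1)            # 1 x 2
product = [sum(Dg[0][k] * Df[k][j] for k in range(3)) for j in range(2)]
max_err = max(abs(Dh[0][j] - product[j]) for j in range(2))
print(max_err)
```

The 1×2 Jacobian of the composite agrees with the matrix product, entry by entry.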
Lecture #5
Last time we talked about the Chain Rule. Today we generalize results in 1-variable calculus to dimension n. Consider γ : ℝ → V and another function f : V → ℝ. Now we can think of g : ℝ → ℝ as the composition of f and γ (i.e. g = f∘γ). Now we have

    dg_a = df_{γ(a)} ∘ dγ_a,  i.e.  g′(a) = ⟨∇f(γ(a)), γ′(a)⟩.
Recall some properties of single variable calculus:

1. Theorem: If g has a local max or min at a ∈ ℝ, then g′(a) = 0.

Proof:

    g′(a) = lim_{t→0} (g(a+t) − g(a))/t.

We note that (at a local max, say) the difference quotient is ≤ 0 when t is positive and ≥ 0 when t is negative. The limit must be the same whether t approaches 0 from the left or right, and the only number that is both ≥ 0 and ≤ 0 is 0.

2. Theorem (Mean Value Theorem): Suppose f is continuous on [a, b] and differentiable on (a, b). Then there exists a c ∈ [a, b] such that

    f′(c) = (f(b) − f(a))/(b − a).

Proof: Consider

    g(x) = f(x) − f(a) − ((f(b) − f(a))/(b − a))(x − a),

which can be thought of as a modification of f by an affine linear function. By construction g(a) = g(b) = 0, so g has an interior extremum, and by 1) there exists c with g′(c) = 0. So we have

    0 = g′(c) = f′(c) − (f(b) − f(a))/(b − a).

3. Theorem: If f′ = 0 on an interval, then f = c on that interval.

Proof: Take a, b in the interval and note that by the mean value theorem

    f(b) − f(a) = (b − a) f′(c) = 0  for some c.

4. Theorem: If f′ = g′ on an interval D, then f = g + c on D.

Proof: Apply the result from 3) to f − g. So f is determined completely by f′ and one value f(t0).
Now consider the differential equation f′ = f. We can then claim, from the unicity theorem, that the only solution is

    f(t) = C e^t,  with C = f(0).

Proof: Let g(x) = f(x)/e^x; then g′(x) = 0 by the quotient rule (using f′ = f), which implies g = c.

Now we generalize 1.–3. to several dimensions.
1. Theorem: Consider f : V → ℝ with df_a : V → ℝ. If f has a local max or min at a, then df_a = 0 in L(V, ℝ).

Proof:

    df_a(v) = D_v f(a) = lim_{t→0} (f(a+tv) − f(a))/t.

Then we can use the same exact argument as in one variable, since t approaches zero from either side.

2. Theorem (connecting a and b): Let f : D → ℝ be differentiable, where D ⊆ V is convex (D is convex means that if a, b ∈ D, the segment between them lies in D). Parametrize the segment by γ(t) = a + t(b − a), where γ(0) = a, γ(1) = b and γ′(t) = b − a. Then there exists c = γ(s) for some s ∈ [0, 1] such that

    f(b) − f(a) = ⟨∇f(c), b − a⟩.

Proof: Consider the functions γ : [0, 1] → V and f : V → ℝ, and let g = f∘γ. By the mean value theorem in one variable there exists s ∈ [0, 1] such that

    g(1) − g(0) = g′(s).

But g(1) − g(0) = f(b) − f(a), and by the chain rule g′(s) = ⟨∇f(γ(s)), γ′(s)⟩ = ⟨∇f(c), b − a⟩, where c = γ(s).

3. Theorem: Suppose f : D → ℝ with D ⊆ V open and connected, and ∇f = 0 on D. Then f = c on D.

Proof: Let a, b ∈ D and parametrize a differentiable path γ : [0, 1] → D with γ(0) = a and γ(1) = b. Let g = f∘γ. Then

    g′(t) = ⟨∇f(γ(t)), γ′(t)⟩ = 0,

so g = c and f(a) = f(b).
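Theorem 2 can be checked numerically: for a concrete f (an arbitrary smooth choice, not from lecture), bisection locates an s ∈ [0, 1] at which the directional slope along the segment matches the total change f(b) − f(a).

```python
def f(x, y):
    # an arbitrary smooth example (assumed for illustration)
    return x*x + y**3

def grad_f(x, y):
    return (2*x, 3*y*y)

a, b = (0.0, 0.0), (1.0, 1.0)
diff = f(*b) - f(*a)
direction = (b[0] - a[0], b[1] - a[1])

def slope(s):
    # <grad f(gamma(s)), b - a> along gamma(s) = a + s(b - a)
    c = (a[0] + s*direction[0], a[1] + s*direction[1])
    g = grad_f(*c)
    return g[0]*direction[0] + g[1]*direction[1]

# slope(s) = 2s + 3s^2 is increasing here, so bisection finds
# the s whose existence the theorem promises
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if slope(mid) < diff:
        lo = mid
    else:
        hi = mid
s = 0.5 * (lo + hi)
print(s, slope(s) - diff)
```

The point c = γ(s) found this way lies strictly between a and b, as the theorem asserts.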
Next time we will prove that partial derivatives commute (in a flat geometry).
Lecture #6
Today we will learn about the commuting property of second partials:

    ∂²f/∂x_i∂x_j = ∂²f/∂x_j∂x_i.

Note that this is true only on flat, or Euclidean, space. These will typically be the spaces we are most interested in for this class, though. We first need to recall that in n dimensions the mean value theorem holds on a convex set of differentiable points — a convex set is one for which any two points within the set can be connected by a straight line of points all contained within that set. The statement of the mean value theorem is: for a and b = a + h, there exists c on the segment between them such that

    f(a+h) − f(a) = ⟨∇f(c), h⟩.

Let's parametrize the path from a to b with a function γ(t) = a + th; clearly then γ : [0, 1] → the line from a to b. Let's say that c = a + θh for some θ ∈ [0, 1]. Then

    f(a+h) − f(a) = ⟨∇f(a + θh), h⟩ = ⟨∇f(c), h⟩.
Theorem: If f is twice differentiable near a, then there exist θ, φ ∈ [0, 1] such that

    f(a+h+k) − f(a+h) − f(a+k) + f(a) = D_k(D_h f(a + θh + φk)).

Proof sketch: let u(x) = f(x+k) − f(x). The left-hand side is u(a+h) − u(a), and the mean value theorem in the h direction gives

    u(a+h) − u(a) = D_h u(a + θh) = D_h f(a + θh + k) − D_h f(a + θh).

Applying the mean value theorem again, now in the k direction, turns this into D_k(D_h f(a + θh + φk)), just like we originally set out to do. Now we note that we can swap h and k:

    f(a+k+h) − f(a+k) − f(a+h) + f(a) = D_h(D_k f(a + θ′h + φ′k)).
Now we can prove that partial derivatives commute.

Theorem: If v, w ∈ V and D_v f, D_w f, D_v D_w f, D_w D_v f all exist and are continuous on an open set U ⊆ V, then

    D_v D_w f = D_w D_v f  on U.

Proof: Let h = tv and k = sw. Take t and s to be small enough so that the parallelepiped with vertices a, a+h, a+k, a+h+k is contained in U. Remember that we can always do this because U is an open subset. Now we note that D_h = tD_v and D_k = sD_w, and applying the previous theorem in both orders (the left-hand sides agree) we see

    st·D_v(D_w f(a + θh + φk)) = st·D_w(D_v f(a + θ′h + φ′k)).

Now we can divide both sides by st and let s, t → 0; by continuity of the second partials,

    D_v D_w f(a) = D_w D_v f(a),

as required.
Now consider f : V → ℝ, a twice differentiable function with continuous second derivatives. Then we can think of the second derivative as a bilinear form

    d²f_a : V × V → ℝ

defined by

    d²f_a(v, w) = D_v D_w f(a) = D_w D_v f(a).

Its matrix in an orthonormal basis {e_i} is

    A_ij = d²f_a(e_i, e_j) = ∂²f/∂x_i∂x_j.

It is obvious from the commuting of partials that this matrix is symmetric. Now, what is the easiest way to see that partials must commute? Consider the function f : ℝ² → ℝ defined by f : (x, y) ↦ x^a y^b. Then

    ∂²f/∂x∂y = ab x^{a−1} y^{b−1} = ∂²f/∂y∂x.

Clearly, these monomials have commuting partials, so if you believe that you can build up general functions out of these monomials, then you should believe the general result. Now, what is the significance of second partials? We will need this result for Taylor series (for next lecture).
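The double difference from the proof gives a direct numerical check of commuting partials. A sketch using the example function from Lecture #1 (the evaluation point is an arbitrary choice):

```python
import math

def f(x, y):
    # the example from Lecture #1: f(x1, x2) = x1 sin x2 + e^(x1 x2)
    return x * math.sin(y) + math.exp(x * y)

def double_difference(f, x, y, t=1e-4):
    # (f(a+h+k) - f(a+h) - f(a+k) + f(a)) / t^2 with h = t*e1, k = t*e2,
    # the quantity from the proof; note the expression is manifestly
    # symmetric under swapping h and k
    return (f(x + t, y + t) - f(x + t, y) - f(x, y + t) + f(x, y)) / (t * t)

x, y = 0.4, 0.9
numeric = double_difference(f, x, y)
analytic = math.cos(y) + (1 + x * y) * math.exp(x * y)   # d^2 f / dx dy
print(abs(numeric - analytic))
```

The double difference approximates the mixed partial from either order, which is the content of the theorem.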
Lecture #7
We can write Taylor's approximation as

    f(a+h) ≈ f(a) + Σ_i (∂f/∂x_i)|_a h_i + (1/2) Σ_{i,j} (∂²f/∂x_i∂x_j)|_a h_i h_j + ...

Here we have chosen an orthonormal basis (e1, e2, ..., en) and written a = Σ_i a_i e_i and h = Σ_i h_i e_i. Note that the first-order term is just the total derivative applied to h:

    Σ_i (∂f/∂x_i)|_a h_i = Σ_i D_{e_i} f(a) h_i = df_a(Σ_i h_i e_i) = df_a(h).
We also note that all of the terms in the third term of the Taylor expansion are symmetric (by the commuting property of partial derivatives that we showed last lecture). Let's write out explicitly the form of this term for n = 2:

    Σ_{i,j} (∂²f/∂x_i∂x_j) h_i h_j = (∂²f/∂x1²) h1² + 2(∂²f/∂x1∂x2) h1h2 + (∂²f/∂x2²) h2².

Since the matrix ∂²f/∂x_i∂x_j is symmetric, there is an orthonormal basis in which it is diag(λ1, λ2, ..., λn), and in that basis we can write the expansion in n dimensions as

    f(a+h) = f(a) + Σ_i (∂f/∂x_i)|_a h_i + (1/2) Σ_i λ_i h_i² + O(‖h‖³).

Now, if all (∂f/∂x_i)|_a = 0 and all λ_i < 0, then we have a maximum at a.
Consider a function f : V → ℝ that has a critical point at a ∈ V. Remember that critical points are defined as points such that ∇f(a) = 0, or equivalently df_a = 0. Loosely speaking, we will define M ⊆ V as a k-dimensional manifold if it has a well-defined tangent plane T_aM ⊆ V, which is k-dimensional, at each of its points. A simple example of a manifold is a circle. The circle lives in ℝ², but at every point in the circle there is a well-defined tangent line. Since lines are 1-dimensional, the circle is a 1-dimensional manifold. A non-example of a manifold is a "V" shape: there is a well-defined tangent line at all points on the V except at the point at its base.

We can think of constructing the tangent plane at a point a on a manifold using the following procedure. First parametrize a bunch of curves γ(t) : ℝ → M such that γ(0) = a. Now take the derivative of each of these parametrizations at the point a. The set of lines which this set of vectors defines forms the tangent plane, T_aM.

One common way to get a manifold is as a level set of a function g : V → ℝ:

    M = {v : g(v) = constant},

which (under the conditions below) has dimension n − 1. For example, we can think of the circle (often denoted S¹) as

    S¹ = {(x, y) : x² + y² = 1}.
Now, to refine our definition of a manifold beyond sets that have tangent planes, we will also require that g is differentiable and that ∇g(a) ≠ 0 for all a ∈ M. If these two requirements are met, we will also see that T_aM is the orthogonal complement of the vector ∇g(a). Remember that the orthogonal complement of a subspace W is defined as

    W⊥ = {v ∈ V : ⟨v, w⟩ = 0 for all w ∈ W}.

Recall the standard facts:

    W ∩ W⊥ = 0,   W ⊕ W⊥ = V,   (W⊥)⊥ = W.

Theorem: If we take a differentiable parametrization γ : (−ε, ε) → M with γ(0) = a as usual, then γ′(0) ⊥ ∇g(a); hence T_aM ⊆ ∇g(a)⊥.

Proof: Consider the composite function g∘γ : (−ε, ε) → ℝ. We know that this function is constant (by how we defined M: g = constant on M), so its derivative is zero. Now we use the chain rule to find:

    0 = (g∘γ)′(0) = ⟨∇g(γ(0)), γ′(0)⟩ = ⟨∇g(a), γ′(0)⟩.

This is the statement that ∇g(a) and γ′(0) are orthogonal.
Lecture #8
Consider a function f : V → ℝ and a point a ∈ V. If f has a local max or min at a, then df_a = 0 in L(V, ℝ), because otherwise

    lim_{t→0} (f(a+tv) − f(a))/t

has both positive and negative values as v varies (replace v by −v).

Remember that M ⊆ V is a manifold if for any point a ∈ M there is a k-dimensional subspace T_aM ⊆ V, called the tangent space to M at a, spanned by the vectors γ′(0), where γ : (−ε, ε) → M with γ(0) = a.

A generalization of "df_a = 0 on V at a local max or min" is: if f has a local max or min on M at a, then df_a = 0 when restricted to T_aM ⊆ V.

Proof: Take γ : (−ε, ε) → M with γ(0) = a and consider g = f∘γ. This has a local max or min at t = 0. The chain rule then gives us

    0 = g′(0) = df_{γ(0)}(γ′(0)).

Now there are two ways to define a manifold M ⊆ V.

Method 1: take f : ℝ^k → V such that df_x is injective everywhere, and let M = f(ℝ^k) ⊆ V. Example: γ : ℝ → V a parametrized curve with γ(0) = a; provided γ′ ≠ 0, T_aM = ℝγ′(0).

Method 2: take g : V → ℝ^{n−k}, a component-wise function (g1, g2, ..., g_{n−k}). Then M = {v ∈ V : g(v) = 0} is a level set of g, and T_aM = null space of dg_a : V → ℝ^{n−k}, which we require to be onto for all a ∈ M.

We will now work out this method in some special cases. For M of dimension n − 1: then g : V → ℝ, and M = {v : g(v) = 0}, which is a hypersurface. We need dg_a : V → ℝ to be non-zero for all a ∈ M. Then T_aM = null space of dg_a = orthogonal complement of ∇g(a).

Now we know that if f is extremal at a ∈ M, then df_a|_{T_aM} = 0. This states that df_a(v) = ⟨v, ∇f(a)⟩ = 0 for all v ⊥ ∇g(a). So ∇f(a) ⊥ T_aM, which forces

    ∇f(a) = λ ∇g(a)  for some λ ∈ ℝ.

Now, what does this look like for several constraints? Let's try it for l constraints, so M has dimension k = n − l. By method 2, we need a map g : V → ℝ^l with v ↦ (g1(v), g2(v), ..., gl(v)). Then M = {v : g(v) = 0} has dimension k = n − l, and T_aM = null space of dg_a : V → ℝ^l. We need, for all a ∈ M, the l vectors ∇g1(a), ..., ∇gl(a) to be linearly independent.

Since the ∇g_i(a) are linearly independent, they span an l-dimensional subspace W ⊆ V, and the kernel of dg_a is W⊥ = T_aM, of dimension k = n − l (the matrix of dg_a has the ∇g_i(a) as rows). So if f is extremal at a, then df_a|_{T_aM} = 0, so ∇f(a) is orthogonal to T_aM, i.e. ∇f(a) is in the subspace W. So there exist λ1, ..., λl such that

    ∇f(a) = λ1∇g1(a) + λ2∇g2(a) + ... + λl∇gl(a).

Example: Maximize the function f : ℝ³ → ℝ defined by (x, y, z) ↦ x on the intersection of the plane z = 1 and the sphere x² + y² + z² = 4. We have the constraint map g : ℝ³ → ℝ² defined by (x, y, z) ↦ (z − 1, x² + y² + z² − 4) = (g1, g2). We note that ∇g1(x, y, z) = (0, 0, 1) and ∇g2(x, y, z) = 2(x, y, z); they are linearly independent on M. So there exist λ, μ such that ∇f = λ∇g1 + μ∇g2, i.e.

    2μx = 1
    2μy = 0
    λ + 2μz = 0,

and we can solve this system of equations, along with the constraints, for the unknowns.
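The example can be finished concretely. On the constraint set, z = 1 and x² + y² = 3, so the maximum of x is √3, attained at (√3, 0, 1). The sketch below recovers λ and μ from the Lagrange conditions and checks all three equations:

```python
import math

# Maximize f(x, y, z) = x subject to g1 = z - 1 = 0 and
# g2 = x^2 + y^2 + z^2 - 4 = 0; the maximizer is (sqrt(3), 0, 1).
x, y, z = math.sqrt(3), 0.0, 1.0

grad_f  = (1.0, 0.0, 0.0)
grad_g1 = (0.0, 0.0, 1.0)
grad_g2 = (2*x, 2*y, 2*z)

# Lagrange conditions: grad_f = lam*grad_g1 + mu*grad_g2
mu  = grad_f[0] / grad_g2[0]   # x-component: 1 = 2*mu*x
lam = -mu * grad_g2[2]         # z-component: 0 = lam + 2*mu*z

residual = [grad_f[i] - lam*grad_g1[i] - mu*grad_g2[i] for i in range(3)]
print(lam, mu, residual)
```

Both constraints hold at the point and the residual of ∇f − λ∇g1 − μ∇g2 vanishes.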
Lecture #9
There are 2 ways of getting a k-dimensional manifold M ⊆ V, where V is an n-dimensional real vector space. A manifold is an object where, for a ∈ M, a + T_aM is the tangent plane to M at a.

1. The first way of getting a manifold: take f : ℝ^k → V, one to one, such that df_x : ℝ^k → V has nullspace 0 for every x. Then M = f(ℝ^k), and T_aM is the image of df_x.

2. M ⊆ V is the zero set of (n − k) constraints g = (g1, g2, ..., g_{n−k}) : V → ℝ^{n−k}. Then M = {v : g(v) = 0 ∈ ℝ^{n−k}} = {v : g_i(v) = 0 ∀i}. We must have that dg_a : V → ℝ^{n−k} is onto, and then T_aM = ker(dg_a), where

    dg_a(v) = (⟨∇g1(a), v⟩, ⟨∇g2(a), v⟩, ..., ⟨∇g_{n−k}(a), v⟩).

Then ker(dg_a) = {all vectors v which are orthogonal to the n − k vectors ∇g_i(a)}. In other words, if W = span of the ∇g_i(a), then v ∈ ker(dg_a) ⟺ v ∈ W⊥. So M is a manifold ⟺ the ∇g_i(a) are linearly independent in V.

If f is extremal at a ∈ M, then df_a vanishes on T_aM, so ∇f(a) is orthogonal to T_aM = W⊥, so ∇f(a) ∈ W; as the ∇g_i(a) form a basis of W,

    ∇f(a) = Σ_i λ_i ∇g_i(a).

Now apply this to the following problem. We have two manifolds, M ⊆ V where g1 = g2 = ... = gl = 0, and N ⊆ V where h1 = ... = hm = 0. We are asked to find the points m ∈ M and n ∈ N that are closest in V. To do this, consider

    M × N ⊆ V × V,

cut out by the map G : V × V → ℝ^{l+m}, where G takes

    (x, y) ↦ (g1(x), ..., gl(x), h1(y), ..., hm(y)),

and minimize the squared distance

    f(x, y) = Σ_{i=1}^n (x_i − y_i)².

The constraint gradients in V × V are

    (g1(x), 0), ..., (gl(x), 0)  i.e.  ∇g_i(x, y) = (∇g_i(x), 0),

and similarly (0, ∇h_j(y)) for the h's. The Lagrange conditions become

    x − y = Σ_{i=1}^l λ_i ∇g_i(x),   y − x = Σ_{j=1}^m μ_j ∇h_j(y).

This procedure has a nice geometric interpretation. It means that at the minimum, the line connecting the two points is perpendicular to both tangent planes.
Example: Let M be the sphere g = x1² + x2² + x3² − 1 = 0 and N the plane h = y1 + y2 + y3 − 1 = 0, so ∇g = 2(x1, x2, x3) and ∇h = (1, 1, 1). The conditions say that x − y is parallel both to (x1, x2, x3) and to (1, 1, 1):

    x1 − y1 = 2λx1 = μ
    x2 − y2 = 2λx2 = μ
    x3 − y3 = 2λx3 = μ.

Now the punchline is that

    x1 = x2 = x3  and  y1 = y2 = y3.

This implies that 3x1² = 1, which implies that x_i = ±1/√3 and y1 = y2 = y3 = 1/3. Now we see that the closest critical distance is 1 − 1/√3 and the furthest is 1 + 1/√3.

Now let's talk about Taylor expansions. It starts like this:

    f(a+h) = f(a) + f′(a)h + f″(a)h²/2 + f‴(a)h³/3! + ... + f^{(k)}(a)h^k/k! + R_k(h),
where R_k(h) is the remainder and the terms to the left of it are called the k-th Taylor polynomial P_k(h). Now why is this particular polynomial important? It is the unique polynomial of degree ≤ k whose derivatives at 0, up to order k, match those of f(a + ·). Note that the notation f^{(i)} indicates that we have taken the i-th derivative.

Theorem: Assume f has k+1 derivatives which are continuous on the interval [a, a+h]. Then there exists c ∈ [a, a+h] such that

    R_k(h) = f^{(k+1)}(c) h^{k+1} / (k+1)!.

Note that if k = 0 then this is just the mean value theorem. We will see next time that this remainder term is proportional to h^{k+1}.
Lecture #10
Now we are going to leave Lagrange multipliers and manifolds for a while and go to the second derivative test. Let's consider a function f(x) with k+1 derivatives at a, and look at f(a + t):

    f(a) = value at t = 0
    f′(a) = first derivative at t = 0
    ...
    f^{(k)}(a) = k-th derivative at t = 0.

Each additional derivative is a new hypothesis. It's important to note that not all functions have all of their derivatives: there exist functions (though pathological) that are continuous everywhere but nowhere differentiable.

Now let's write down the k-th Taylor polynomial

    P_k(t) = Σ_{i=0}^k f^{(i)}(a) t^i / i!,

and set

    R_k(t) = f(a+t) − P_k(t).

Then we have that R_k^{(i)}(0) = 0 for i = 0, ..., k, and, since P_k^{(k+1)} ≡ 0,

    R_k^{(k+1)}(t) = f^{(k+1)}(a+t).

Now we are ready to state Taylor's theorem:

Theorem: Assume f^{(k+1)}(t) exists in the interval [a, a+h]. Then there exists c ∈ [a, a+h] such that

    R_k(h) = f^{(k+1)}(c) h^{k+1} / (k+1)!.

Corollary: For k = 0, we have f(a+h) − f(a) = f′(c)h, which is just the statement of the mean value theorem.

Rather than proving this theorem first, we note the following corollary: if f^{(k+1)} is also continuous, so bounded by M on [a, a+h], then

    |f(a+h) − P_k(h)| ≤ M h^{k+1} / (k+1)!,

so

    f(a+h) = f(a) + f′(a)h + ... + f^{(k)}(a)h^k/k! + O(|h|^{k+1}).

Proof (of the theorem): Let B = R_k(h)/h^{k+1} and φ(t) = R_k(t) − Bt^{k+1}, so that φ(0) = 0 and φ(h) = 0. We note that the mean value theorem gives us t1 ∈ [0, h] such that φ′(t1) = 0. But remember that φ′(0) = 0; therefore, by the MVT, there exists t2 ∈ [0, t1] such that φ″(t2) = 0. But φ″(0) = 0; then by the MVT there exists t3 ∈ [0, t2] such that φ‴(t3) = 0. We repeat this all the way to the (k+1)-st derivative, obtaining c = t_{k+1} with

    0 = φ^{(k+1)}(c) = f^{(k+1)}(c) − B(k+1)!,

so B = f^{(k+1)}(c)/(k+1)!, as claimed. We usually use this theorem to prove the second derivative test.
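The corollary's bound is easy to test numerically. A sketch with f = sin at a = 0 and k = 3, where P₃(h) = h − h³/6 and M = max|f⁗| = max|sin| ≤ 1:

```python
import math

def P3(h):
    # third Taylor polynomial of sin at 0
    return h - h**3 / 6

# check |sin(h) - P3(h)| <= M h^4 / 4! with M = 1, over a grid of h values
ok = True
for i in range(1, 100):
    h = i / 100.0
    remainder = abs(math.sin(h) - P3(h))
    bound = h**4 / math.factorial(4)
    ok = ok and remainder <= bound
print(ok)
```

The remainder stays under the bound at every sampled h, as the corollary guarantees.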
Theorem (second derivative test): Assume that f′(a) = 0 and f″(a) < 0. Then f has a local maximum at a.

Proof: Since f′(a) = 0, Taylor's theorem with k = 2 gives

    f(a+h) = f(a) + (1/2) f″(a) h² + (1/6) f‴(c) h³.

Let M = max |f‴|/6 on [a−b, a+b] for some b > 0, and let δ > 0 be chosen so that Mδ < |f″(a)|/2. Then for 0 < |h| < δ we note that

    |(1/6) f‴(c) h³| < (1/2) |f″(a)| h²,

so

    f(a+h) − f(a) = h² ((1/2) f″(a) + (1/6) f‴(c) h) < 0,

and f has a local maximum at a.
Now assume f is infinitely differentiable at a. Then we can write the formal Taylor series

    f(a+h) ∼ Σ_{k=0}^∞ f^{(k)}(a) h^k / k!.

Now we need to answer the following questions:
1. For which h does this series converge?
2. If it converges to a function g(h), is f(a+h) = g(h)?

Example: expand e^x around a = 0. We have

    e^x = Σ_{k=0}^∞ f^{(k)}(0) x^k / k! = Σ_{k=0}^∞ x^k / k!.
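For e^x the series converges for every x and converges to the function itself; the partial sums approach math.exp quickly. A quick sketch (x = 1.5 is an arbitrary test point):

```python
import math

def taylor_exp(x, K):
    # K-th partial sum of the Taylor series of e^x at a = 0
    return sum(x**k / math.factorial(k) for k in range(K + 1))

x = 1.5
errors = [abs(taylor_exp(x, K) - math.exp(x)) for K in (2, 5, 10, 20)]
print(errors)
```

The errors shrink rapidly as more terms are kept, reflecting the k! in the denominator of the remainder.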
Lecture #11
Review of Exam Material

Linear Algebra:
- Bases of ℝⁿ and of an abstract V.
- The dual space V* = L(V, ℝ).
- Inner products.
- Self-adjoint operators (T = T*).
- Orthogonal complements: for W ⊆ V, we have W⊥ ⊆ V and V = W ⊕ W⊥.

Multivariable Calculus:
- Various derivatives of f : V → W. The total derivative df_a ∈ L(V, W) is defined at a point a ∈ V as a limit as h → 0 (in V). This is a linear map.
- The directional derivative in the direction v ≠ 0 in V is denoted D_v f(a) ∈ W and is defined as the limit

    D_v f(a) = lim_{t→0} (f(a+tv) − f(a))/t.

- Choosing a basis {w1, ..., wm} of W decomposes f : V → W into coordinate functions, v ↦ f(v) = Σ_i f_i(v) w_i.
- For f : V → ℝ, the partials are ∂f/∂x_i = D_{e_i} f, where we have implicitly chosen an orthonormal basis {e1, ..., en} for V. We denote the gradient of the function at a point of its domain by

    ∇f(a) = (∂f/∂x1|_a, ∂f/∂x2|_a, ..., ∂f/∂xn|_a).

  This is the unique vector that satisfies df_a(v) = ⟨v, ∇f(a)⟩.
- The chain rule: for g : U → V and f : V → W, we have f∘g : U → W and

    d(f∘g)_a = df_{g(a)} ∘ dg_a ∈ L(U, W).

- The mean value theorem in several variables: composing γ : ℝ → V with f : V → ℝ to get f∘γ : ℝ → ℝ, the statement is that

    f(b) − f(a) = ⟨∇f(c), b − a⟩

  for some c on the segment between a and b.
- Equality of second partials: we only talked about this for the case f : V → ℝ. We remember that the second partials commute if all partials exist and are continuous. If this is the case, we have D_v D_w = D_w D_v.
- Critical points: if f : V → ℝ is extremal at a, then ∇f(a) = 0.
- Manifolds. When is M = g(ℝ^k) a k-dimensional manifold? When g : ℝ^k → V is one-to-one and dg_a is injective; then T_{g(a)}M = the image dg_a(ℝ^k). Another way to make a manifold is g : V → ℝ^{n−k} with M = {v : g(v) = c} and dg_a onto for all a ∈ M (n − k independent constraints); then T_aM = null space of dg_a.
- Level sets and graphs: the graph of a function f : V → ℝ is the level set M ⊆ V × ℝ of d : V × ℝ → ℝ defined by (v, y) ↦ y − f(v).
- Question: suppose we have a function f : V → V defined by a ↦ f(a). When is there a g with g∘f(v) = v? (Preview: when df_a is an invertible linear map — the inverse function theorem, coming up.)

Non-Exam Material

Now we return to Taylor polynomials. Remember that the k-th order term of the expansion is a sum of mixed-partial terms of the form

    D_i^{k_i} D_j^{k_j} f(0) h_i^{k_i} h_j^{k_j} / (k_i! k_j!),  with k_i + k_j = k.
Lecture #12
One problem on the exam that gave people trouble was proving that the graph is a manifold. For f : V → ℝ, let

    M = graph(f) = {(v, f(v)) : v ∈ V} ⊆ V × ℝ,

which is the zero set of g(v, z) = z − f(v). Then we have that

    ∇g(v, z) = (−∇f(v), 1) ≠ 0  in V × ℝ.

Therefore, we conclude that the graph of a (differentiable) function is always a manifold. Now let's return to Taylor series.
Suppose we have a function f : V → ℝ which is (k+1)-times continuously differentiable (C^{k+1}), and choose an orthonormal basis (v1, ..., vn) of V. Every mixed partial

    D1^{j1} D2^{j2} ... Dn^{jn} f(a)

exists provided that j1 + ... + jn ≤ k + 1. We define the k-th Taylor polynomial P_k of f at a, evaluated on a small vector h = (h1, ..., hn), by summing the corresponding terms over all indices with j1 + ... + jn ≤ k. Now we will have to assume that the domain contains the line between a and a + h.

The k = 0 case is the mean value theorem:

    f(a+h) − f(a) = ⟨∇f(c), h⟩,

or in other words

    f(a+h) = f(a) + Σ_{i=1}^n D_i f(c) h_i,

with P_0(h) = f(a). The general statement is

    f(a+h) = P_k(h) + O(‖h‖^{k+1})  for h near 0.

Proving the several-variable Taylor theorem will follow a similar logic to the proofs of the multivariable MVT and the 1-variable Taylor theorem (basically: restrict f to the segment from a to a + h and apply the one-variable result); we do not do this in class. To second order, the expansion reads:

    f(a+h) = f(a) + Σ_i D_i f(a) h_i + Σ_{i<j} D_i D_j f(a) h_i h_j + (1/2) Σ_i D_i² f(a) h_i² + O(‖h‖³).
Now suppose df_a = 0, or in other words D_i f(a) = 0 for all i. Then

    f(a+h) = f(a) + Σ_{i<j} D_i D_j f(a) h_i h_j + (1/2) Σ_i D_i² f(a) h_i² + O(‖h‖³).

Writing this down explicitly for n = 2, with a = D1²f(a), b = D1D2f(a), c = D2²f(a), the quadratic part is

    (1/2)(a h1² + 2b h1h2 + c h2²).

For a minimum we need a > 0, c > 0 and ac > b², and in this case we can be ensured that we are at a minimum (to see this, simply solve for the roots of the quadratic; for a maximum, reverse the signs of a and c). Now, what does this mean in terms of the matrix applied to (h1, ..., hn)? The quadratic form is given by

    A = ( a  b )
        ( b  c ),

and there is a min or max exactly when det(A) > 0, the sign of a deciding which. Since A is symmetric, there is an orthonormal basis in which the transformed matrix is diagonal. Then we can write the Taylor polynomial in terms of the eigenvalues λ1, ..., λn:

    f(a+h) = f(a) + (1/2) Σ_{i=1}^n λ_i (h_i′)² + O(‖h‖³),

where h_i′ are the coordinates of h in the eigenbasis: there is a local minimum if all λ_i > 0 and a local maximum if all λ_i < 0.
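The eigenvalue form of the second derivative test can be sketched for a 2×2 Hessian. The function f(x, y) = x² + xy + 2y² below is an assumed example (not from lecture); its critical point is the origin, with Hessian entries a = 2, b = 1, c = 4:

```python
import math

a, b, c = 2.0, 1.0, 4.0   # Hessian A = [[a, b], [b, c]] of x^2 + x*y + 2*y^2

# eigenvalues of a symmetric 2x2 matrix, in closed form
tr, det = a + c, a*c - b*b
disc = math.sqrt(tr*tr - 4*det)
lam1, lam2 = (tr + disc)/2, (tr - disc)/2

if lam1 > 0 and lam2 > 0:
    verdict = "local minimum"
elif lam1 < 0 and lam2 < 0:
    verdict = "local maximum"
else:
    verdict = "saddle or degenerate"

# the det/sign criterion from lecture (det > 0 with a > 0) agrees
print(verdict, det > 0 and a > 0)
```

Here λ = 3 ± √2, both positive, so the origin is a local minimum; note λ1λ2 = det(A) and λ1 + λ2 = tr(A), which is why the determinant criterion works.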
Lecture #13
Today we will start the Inverse Function Theorem. Suppose we have a linear map:

    T : V → V,

which takes 0 ↦ 0. Can we solve T(v) = w for v, given an arbitrary w ∈ V? We can exactly when T is onto and injective (i.e. bijective). We showed in 25a that there is then a unique linear map S that satisfies this:

    T(S(w)) = w,

and we typically call this S = T⁻¹. A choice of basis {v1, ..., vn} of V corresponds to an isomorphism with ℝⁿ, defined by v ↦ (c1, ..., cn), under which T corresponds to an n×n matrix A. Solving T(v) = w is then solving

    A (x1, ..., xn)ᵀ = (y1, ..., yn)ᵀ.

This is n linear equations in n unknowns. We remember that we can find such an inverse function if and only if the determinant of A is non-zero.
The whole point of the inverse function theorem is to generalize this to non-linear maps. Suppose we have a (generally non-linear) function f : V → V which takes 0 ↦ 0. Can we find g : V → V with f(g(w)) = w? Sometimes you can do this and sometimes you can't. Here is an example when you can't:

Example: f(x) = x². Here f(0) = 0 as required, but we see that for every y > 0 in the range of f there is a non-unique x: for example, both √2 ↦ 2 and −√2 ↦ 2. A necessary condition for the existence of g, assuming f and g are differentiable at 0, comes from applying the chain rule to f∘g = id:

    df_{g(0)} ∘ dg_0 = I ∈ L(V, V),

so df_0 must be invertible.

Theorem: If f : V → V with f(0) = 0, df is continuous at v = 0, and df_0 is invertible in L(V, V), then there is an inverse function g in a neighborhood of 0, with f(g(w)) = w and dg_0 = (df_0)⁻¹ ∈ L(V, V).

Now we can apply the Taylor expansion of g:

    g(h) = g(0) + dg_0(h) + ... = (df_0)⁻¹(h) + ...,

where we already know the first-order term, and we can then calculate the "..." using an iterative process. Now let's talk about the inverse function theorem in 1 variable. Let's assume the function is given as a power series:
    f(x) = a1x + a2x² + ...,

where the a_i are in ℝ. Notice that automatically f(0) = 0. The question now is whether or not there exists a power series

    g(x) = b1x + b2x² + ...

such that f(g(x)) = x. Now how do I evaluate f(g(x))? Explicitly, this is:

    a1(b1x + b2x² + ...) + a2(b1x + b2x² + ...)² + a3(b1x + b2x² + ...)³ + ...

Note that in the second term there are no x¹ terms and in the third term there are no x¹ or x² terms. So let's reorganize our function in orders of x:

    f(g(x)) = a1b1 x + (a1b2 + a2b1²) x² + ...

Now, if we want the inverse function to exist, we want the coefficient of x to be 1, and all of the coefficients of higher powers of x to be zero. For the coefficient of x:

    a1b1 = 1  ⟹  b1 = 1/a1

(so we need a1 ≠ 0). Now that we know b1, the equation (a1b2 + a2b1²) = 0 is just one equation for one unknown, so we can solve for b2; then b3, etc., etc. So why have we not just proved the inverse function theorem? Well, the series does not necessarily converge: we do not actually know anything about the coefficients b_i. To assure convergence, we need an iterative procedure to calculate the b_i. Let's do an example.
    f(x) = ∫ dx/(1+x) = log(1+x) = ∫ (1 − x + x² − x³ + ...) dx = x − x²/2 + x³/3 − ...

Now we write

    g(x) = x + b2x² + b3x³ + b4x⁴ + ... = Σ_{n≥1} b_n xⁿ  (with b1 = 1).

Matching the x² coefficient of f(g(x)) gives b2 − 1/2 = 0, so b2 = 1/2. Then we can solve for the remaining b_i through an iterative process.

Now we will look at how Newton solved for inverses. We have f′(x) = 1/(1+x), and we wish to solve for g(x) such that f(g(x)) = x. Differentiating this relation,

    g′(x) · 1/(1 + g(x)) = 1,  i.e.  g′(x) = 1 + g(x).

Then we have that

    g′(x) = Σ_{n≥1} n b_n x^{n−1},

and similarly 1 + g(x) = 1 + Σ_{n≥1} b_n xⁿ. Then we can get the coefficients b_n, since comparing powers gives b1 = 1 and n b_n = b_{n−1}. Then we have

    b_n = b_{n−1}/n = 1/n!,

so g(x) = Σ_{n≥1} xⁿ/n! = e^x − 1, as it must be, since exp undoes log.
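The recurrence n·b_n = b_{n−1} can be checked directly, along with the fact that the resulting (truncated) series really does invert log(1 + x):

```python
import math

# coefficients of the inverse g of f(x) = log(1 + x), from the
# recurrence b_1 = 1, n*b_n = b_{n-1} derived via g'(x) = 1 + g(x)
b = {1: 1.0}
for n in range(2, 12):
    b[n] = b[n - 1] / n

matches_factorial = all(abs(b[n] - 1.0 / math.factorial(n)) < 1e-15 for n in b)

# sanity check: g(x) = sum b_n x^n truncates e^x - 1, so f(g(x)) ~ x
x = 0.3
g = sum(b[n] * x**n for n in sorted(b))
err = abs(math.log(1.0 + g) - x)
print(matches_factorial, err)
```

The coefficients agree with 1/n! and the composition returns x up to the truncation error of the series.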
We also talked briefly about the Euclidean algorithm (which is not central to the course). Next: Newton's method for finding roots of f(x), i.e. points r such that f(r) = 0. Newton's algorithm for finding the roots of f(x) is the following: choose x0 arbitrarily (but in practice, it is generally near the root). Now we make the approximation

    r ≈ x1 = x0 − f(x0)/f′(x0).

Now we do the same process for x1, and so on.
Lecture #14
Today we will prove that Newton's method finds roots r, the solutions to the equation f(r) = 0. (A method of finding roots will actually give us a general method for finding solutions to f(x) = b, since this is f(x) − b = 0, i.e. finding a root of f − b.)

Suppose we have a function from a closed interval to the reals,

    f : [a, b] → ℝ,

with f(a) < 0 < f(b), and f′(x) > 0 on [a, b], i.e. f is increasing on [a, b]. Then there is a root r ∈ [a, b], and because f is increasing, it is unique.

Newton uses an iterative method to find what this root is.
1) Choose some x0.
2) Take x1 = x0 − f(x0)/f′(x0), to get r ≈ x1.
3) Iterate: calculate x_{n+1} = x_n − f(x_n)/f′(x_n).

Example: Let f(x) = x² − 2 on [a, b] = [1, 2], so f′(x) = 2x, which is greater than zero on [1, 2]. Choose x0 = 1. In the analysis below we will use the simplified iteration x_{n+1} = x_n − f(x_n)/M, where M = max f′ = 4 on [1, 2]; its first step is x1 = 1 + 1/4, and we can continue this process.

All of these ideas rely on one simple mathematical concept: the contraction fixed point theorem. We will need the theorem to prove the validity of Newton's Method and eventually to prove the Inverse Function Theorem.
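Both iterations from this lecture can be run on the example f(x) = x² − 2 on [1, 2]:

```python
def f(x):
    return x * x - 2.0

def fprime(x):
    return 2.0 * x

# classic Newton iteration, starting from x0 = 1
x = 1.0
for _ in range(8):
    x = x - f(x) / fprime(x)
newton_root = x

# simplified iteration x_{n+1} = x_n - f(x_n)/M with M = max f' = 4 on [1, 2];
# its first step is x1 = 1 + 1/4, and it converges linearly (ratio 1/2)
y = 1.0
first_step = y - f(y) / 4.0
for _ in range(200):
    y = y - f(y) / 4.0
print(newton_root, first_step, y)
```

Both iterations converge to √2; classic Newton gets there in a handful of steps, while the M-version needs many more but is the one the contraction argument below controls.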
Definition: A contraction mapping is a continuous function φ : [a, b] → [a, b] with

    |φ(x) − φ(y)| ≤ k|x − y|  for all x, y ∈ [a, b],

for some fixed 0 < k < 1.

Contraction Mapping Theorem: Let φ be a contraction mapping. Then φ has a unique fixed point, that is, a point x* such that φ(x*) = x*. More precisely, let x0 ∈ [a, b], x1 = φ(x0), and x_n = φ(x_{n−1}); then {x_n} → x* (the fixed point).

Proof: First, |x_{n+1} − x_n| = |φ(x_n) − φ(x_{n−1})| ≤ k|x_n − x_{n−1}|. Continue this bounding process to get

    |x_{n+1} − x_n| ≤ kⁿ |x1 − x0|.

Now choose m < n, and calculate

    |x_n − x_m| = |x_n − x_{n−1} + x_{n−1} − x_{n−2} + ... + x_{m+1} − x_m|
               ≤ |x_n − x_{n−1}| + ... + |x_{m+1} − x_m|
               ≤ (k^{n−1} + ... + k^m)|x1 − x0|.

Simplifying the right-hand side using the formula for geometric series, we get

    |x_n − x_m| ≤ ((k^m − k^n)/(1 − k)) |x1 − x0| ≤ (k^m/(1 − k)) |x1 − x0|.

The sequence is Cauchy, and we are working over ℝ (which is complete), so the sequence converges to a point; call it x*. Letting n → ∞ with m fixed, we also get the error bound

    |x_m − x*| ≤ (k^m/(1 − k)) |x1 − x0|.
Now we show $x^*$ is a fixed point of $\varphi$. By an inequality above, $|\varphi(x_n) - x_n| = |x_{n+1} - x_n| \le k^n|x_1 - x_0|$, so $|\varphi(x_n) - x_n| \to 0$ as $n \to \infty$. Meanwhile, by the continuity of the absolute value function and of $\varphi$, $|\varphi(x_n) - x_n| \to |\varphi(x^*) - x^*|$ as $n \to \infty$, so $|\varphi(x^*) - x^*| = 0$, and so $\varphi(x^*) = x^*$ as desired. Finally, to show the uniqueness of the fixed point, suppose there were two distinct fixed points $x^*$ and $x^{**}$. Then $|x^* - x^{**}| = |\varphi(x^*) - \varphi(x^{**})| \le k|x^* - x^{**}| < |x^* - x^{**}|$, a contradiction of the definition of the contraction mapping.

Back to Newton's method. Let $\varphi(x) = x - \frac{f(x)}{M}$, where $M$ is the maximum of $f'(x)$ on $[a, b]$. We will show that $\varphi$ is a contraction mapping, so by the contraction mapping theorem we have a unique fixed point. This fixed point will be the root, since $\varphi(x^*) = x^*$ is precisely $f(x^*) = 0$. Assume $0 < m < f'(x) < M$ on $[a, b]$, and note $a < \varphi(a) < \varphi(x) < \varphi(b) < b$, so $\varphi$ maps $[a, b]$ into itself. Then
$$0 \le \varphi'(x) = 1 - \frac{f'(x)}{M} \le 1 - \frac{m}{M} = k < 1,$$
so $\varphi$ is indeed a contraction mapping, and we can indeed apply the theorem.
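A minimal sketch of this fixed-point iteration, reusing the example f(x) = x^2 - 2 with M = 4:

```python
# Fixed-point iteration for phi(x) = x - f(x)/M with f(x) = x^2 - 2 and
# M = 4 (the maximum of f'(x) = 2x on [1, 2]); phi is a contraction.
def phi(x):
    return x - (x * x - 2) / 4.0

x = 1.0                 # x_0 = 1, so x_1 = 1 + 1/4 as in the example
for n in range(60):
    x = phi(x)
print(x)                # the fixed point: phi(x*) = x* forces f(x*) = 0
```

Unlike full Newton, the convergence here is linear (error shrinks by a factor of at most k per step), which is exactly what the contraction estimate predicts.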
Now we are going to use this to help us show the inverse function theorem. Choose $a$ with $f'(a) \ne 0$. We need this condition because, by the chain rule, an inverse can only exist where the derivative is nonzero. Thus our question is: for $y$ close to $b = f(a)$, is there a function $g$ such that $f(g(y)) = y$? (If we did have such a function $g$, then $g(f(x)) = x$, i.e. $g(y) = x$.) We want to find $x$ such that $g(y) = x$. Consider the linearized equation $y - b = f'(a)(x - a)$, which gives what $g$ is approximately:
$$x \approx a + \frac{y - b}{f'(a)}.$$
Then we are going to iterate this approximation.

Proposition: The sequence $x_0 = a$, $x_{n+1} = x_n - \frac{f(x_n) - y}{f'(a)}$ converges to a point $x = x^*$ which satisfies $f(x^*) = y$; how close $y$ must be to $b$ will depend on $f'(a)$. Thus $x^* = g(y)$ defines the inverse function in a neighborhood of $b$.

We are going to prove this proposition next time. But just imagine how this is going to work in several variables. Even worse, we will need a better mean value theorem than what we have right now, since we need a mean value theorem for functions from $V$ to $W$.
Lecture #15
We are continuing to prove the inverse function theorem using the method of contraction mappings.

Contraction Theorem: Suppose we have $\varphi : [c, d] \to [c, d]$ and, for some $k < 1$, $|\varphi(x) - \varphi(y)| \le k|x - y|$ for all $x, y$. Then there is a unique fixed point $x^* \in [c, d]$ with $\varphi(x^*) = x^*$, and this is obtained as the limit of the sequence $x_0,\ x_1 = \varphi(x_0),\ x_2 = \varphi(x_1), \ldots$ Generally, we will use a $\varphi$ that is continuous on $[c, d]$, and we assume that it is differentiable with $|\varphi'(x)| \le k$ on $[c, d]$. We ran through a whole proof of the contraction theorem last class.

Now suppose $f : D \to \mathbb{R}$ is continuously differentiable with $f'(a) \ne 0$ and $f(a) = b$. We want to construct a $g : (b - \epsilon, b + \epsilon) \to (a - \delta, a + \delta)$ such that $f(g(y)) = y$. Fix a $y$ near $b$. Then define
$$\varphi_y(x) = x - \frac{f(x) - y}{f'(a)}, \qquad \varphi_y'(x) = 1 - \frac{f'(x)}{f'(a)}.$$
Since $f'$ is continuous, there is a $\delta > 0$ such that $|\varphi_y'(x)| \le \frac{1}{2}$ for $x \in [a - \delta, a + \delta] = [c, d]$.

Theorem: Let $\epsilon = \frac{1}{2}\delta\,|f'(a)|$. Then for all $y \in [b - \epsilon, b + \epsilon]$, the map $\varphi_y(x)$ is a contraction map of $[a - \delta, a + \delta]$ into $[a - \delta, a + \delta]$ with $k = \frac{1}{2}$.

We will take a moment to prove this, but notice the immediate corollary:

Corollary: The sequence $x_0 = a,\ x_1 = \varphi_y(x_0), \ldots,\ x_{n+1} = \varphi_y(x_n)$ converges to the unique fixed point $x^*$ of $\varphi_y$ in this interval, where $f(x^*) = y$. Since this works for all $y \in [b - \epsilon, b + \epsilon]$, we get an inverse, with the unique fixed point of $\varphi_y$ equal to $g(y)$.

Proof that $\varphi_y$ maps the interval $[a - \delta, a + \delta]$ into itself, for $|y - b| < \epsilon$: We already have that $|x - a| \le \delta$, and
$$|\varphi_y(x) - \varphi_y(a)| \le \frac{1}{2}|x - a| \le \frac{\delta}{2}, \qquad |\varphi_y(a) - a| = \frac{|y - b|}{|f'(a)|} < \frac{\epsilon}{|f'(a)|} = \frac{\delta}{2}.$$
Then $|\varphi_y(x) - a| \le |\varphi_y(x) - \varphi_y(a)| + |\varphi_y(a) - a| < \frac{\delta}{2} + \frac{\delta}{2} = \delta$.

Now we have a sequence of functions $g_n \to g$ on $[b - \epsilon, b + \epsilon]$, with $g_0(y) = a$ and
$$g_1(y) = a - \frac{f(a) - y}{f'(a)} = a + \frac{y - b}{f'(a)}, \qquad g_{n+1}(y) = g_n(y) - \frac{f(g_n(y)) - y}{f'(a)}.$$
This gives us the inverse function $g$.
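A numerical sketch of this construction, using the hypothetical example f(x) = x^3 + x near a = 1 (so b = 2 and f'(a) = 4):

```python
# Invert f(x) = x^3 + x near a = 1 by iterating
# x_{n+1} = x_n - (f(x_n) - y)/f'(a), with f'(a) = 4 fixed throughout.
def f(x):
    return x ** 3 + x

fprime_a = 4.0           # f'(1)
y = 2.5                  # a point near b = f(1) = 2
x = 1.0                  # x_0 = a
for n in range(200):
    x = x - (f(x) - y) / fprime_a
print(x, f(x))           # f(x) is approximately y, so x approximates g(y)
```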
Now we need to prove the inverse function theorem in multiple variables. Let's start with the contraction theorem for a vector space $V$: suppose we have $\varphi : C \to C$, where $C$ is a compact subset of $V$ and $||\varphi(x) - \varphi(y)|| \le k||x - y||$ for $k < 1$. Then the same proof as before shows that the sequence $x_1 = \varphi(x_0), \ldots$ converges to the unique fixed point.

In the one-dimensional case, the mean value theorem was critical in proving the contraction theorem was applicable. Thus we will need to prove the mean value theorem on vector spaces in order to prove the multi-dimensional inverse function theorem. But before we do this, we will have to define the operator norm for linear maps $T : V \to W$. Note that $\langle v, v \rangle = ||v||^2$ is defined only if both spaces have an inner product. If we have this, we can define the operator norm:

Definition: $||T||$ is the smallest positive real number such that $||T(v)|| \le ||T||\,||v||$ for all $v \in V$.

For example, if we have that $T = 0$, then $||T|| = 0$. It will be useful to talk about different types of norms on vector spaces (the topic of next class).
Lecture #16
If we fix norms on $V$ and $W$, then we will also have a norm on $L(V, W)$. We define the operator norm $||T||$, where $T : V \to W$, to be the least positive real number $M$ such that $||T(v)||_W \le M||v||_V$ for all $v \in V$. In fact, $M$ is the maximum value of $||T(v)||$ on the sphere $||v|| = 1$. This means that we don't have to worry about extremizing over all of $V$: this $M$ works for $v \in B_1(0)$, and any $v \ne 0$ has a friend $\frac{v}{||v||} \in \partial B_1(0)$, where we have defined $\partial B_1(0)$ as the boundary of the unit ball around the origin. To see that we only need to look at this subset of $V$:
$$\Big\|T\Big(\frac{v}{||v||}\Big)\Big\| \le M \implies ||T(v)|| \le M||v||.$$
The proof that this number satisfies the properties of a norm is left for you to prove!

Non-Example: Let $V = P(\mathbb{R})$, the space of polynomials, with the norm $||P|| = \sum_i |a_i|$ for $P = \sum_i a_i x^i$, and let $T : V \to V$ be $P \mapsto \frac{dP}{dx}$. For $v_i = x^i$ we have $||v_i|| = 1$, but $||T(v_i)|| = ||i\,x^{i-1}|| = i$, which is unbounded, so there is no operator norm for this $T$.

Now let's compute the norm of an operator that really does admit a norm. Suppose we have a linear operator $T : V \to \mathbb{R}$ given by $T(v) = \langle v, w \rangle$, with the norm $||\cdot||_2$ on $V$ and $|\cdot|$ on $\mathbb{R}$. Then $||T|| = ||w||$: the Cauchy-Schwarz inequality gives $|T(v)| \le ||v||\,||w||$, and taking $v = w$ shows the bound is attained.

Consider the dual space $V^* = L(V, \mathbb{R})$. What is the operator norm on $L(V, \mathbb{R})$? Choose a basis $(v_1, \ldots, v_n)$ of $V$. We see that $T(\sum_i x_i v_i) = \sum_i a_i x_i$ where $a_i = T(v_i) \in \mathbb{R}$. Then $T = \sum_i a_i v_i^*$, where the $v_i^* \in V^*$ are the dual basis, defined by
$$v_i^*(v_j) = \begin{cases} 1 & i = j \\ 0 & \text{otherwise.} \end{cases}$$
Give $V$ the sup norm $||v||_\infty = \max_i |x_i|$. Then
$$|T(v)| = \Big|\sum_{i=1}^n a_i x_i\Big| \le \sum_{i=1}^n |a_i||x_i| \le \Big(\sum_{i=1}^n |a_i|\Big)||v||_\infty,$$
so $||T|| \le \sum_{i=1}^n |a_i|$. To show equality, find a $v$ with $||v||_\infty = 1$ such that $|T(v)| = \sum_{i=1}^n |a_i|$ (take $x_i = \operatorname{sign}(a_i)$). Then we have $||T|| = \sum_{i=1}^n |a_i|$, a 1-norm on $V^*$. We see a general pattern: a $p$-norm on $V$ gives a $q$-norm on $V^*$, where
$$\frac{1}{p} + \frac{1}{q} = 1, \qquad 1 \le p \le \infty.$$
We ended class with finding the norm of maps between general normed spaces $V$ and $\mathbb{R}$.
Lecture #17
Recall our discussion of norms from last class. We described how norms on the spaces $V$ and $W$ give a norm on $L(V, W)$: for $T : V \to W$, $||T||$ is the least positive real number $M$ such that $||Tv||_W \le M||v||_V$ for all $v$, equivalently the maximum value of $||Tv||_W$ with $||v||_V = 1$. A general schematic for computing $||T||$ is to show two statements:
(1) $||Tv|| \le M||v||$ for all $v \in V$, so $M$ is some upper bound for $||Tv||$ on the unit ball;
(2) there is a vector $v \in B_1$ with $||Tv|| = M$.
If you show the above two statements, you will have shown $||T|| \le M$ and $||T|| \ge M$, so $||T|| = M$.

For a symmetric operator $T : V \to V$ with eigenvalues $\lambda_i$, we get $||T|| = \max_i |\lambda_i| = |\lambda_{\text{largest}}|$ when the operator norm is taken with respect to the 2-norm. This can be proven using the schematic above.

Claim: For general operators $T : V \to W$, with $T$ given by a matrix $A$, let $V$ and $W$ both have the sup norm. Then $||T|| = M = \max_i ||A_i||_1$, the largest 1-norm of a row of $A$.

(1) Write $v = \sum_{i=1}^n x_i v_i$ and let $T(v)$ have components $y_i = \sum_j a_{ij}x_j$. Then
$$||Tv||_\infty = \max_i |y_i| = |y_k| \text{ (for some } k\text{)} = \Big|\sum_j a_{kj}x_j\Big| \le \sum_j |a_{kj}||x_j| \le ||v||_\infty \sum_j |a_{kj}| \le ||v||_\infty \max_i \sum_j |a_{ij}|,$$
where in the last steps we have used the fact that we are taking the $\infty$-norm on $V$.
(2) Suppose $M = ||A_k||_1$ for some particular $k$. Let $\epsilon_j = \operatorname{sign}(a_{kj})$, or $\epsilon_j = 0$ if $a_{kj} = 0$. Let $v = (\epsilon_1, \ldots, \epsilon_n)$. Note $||v||_\infty = 1$. Then we easily show that $||Tv||_\infty = M$. So now we have proved the claim $||T|| = M = \max_i(||A_i||_1)$.

Now we are going to talk about the mean value theorem for higher dimensions. You should recall the statement of the mean value theorem for $f : V \to \mathbb{R}$. The analogous (but weaker) statement for $f : V \to W$ replaces the equality with a bound,
$$||f(b) - f(a)|| \le ||df_c||\,||b - a||$$
for some $c$ on the segment between $a$ and $b$. We won't prove this in class, but you should look in the book to get the flavor of the proof.

We will use this multiple-dimensional mean value theorem when we prove the inverse function theorem. We will prove the inverse function theorem in the special case with $df_0$ the identity in $L(V, V)$; the general statement assumes only $df_0$ invertible. The general case can be reduced to the special case, so our argument for the special case is sufficient. We reduce the general case to the special case by replacing $f$ with $df_0^{-1} \circ f$, and then by translating so that $f$ has $0 \mapsto 0$.
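The matrix-norm claim above can be spot-checked numerically. A sketch with an arbitrary matrix; the brute force uses the fact that the convex function v -> ||Tv||_inf attains its maximum over the unit ball at a sign vector:

```python
# With the sup norm on domain and target, ||T|| should equal the maximum
# 1-norm of the rows of A.  Compare against brute force over sign vectors.
from itertools import product

A = [[1.0, -2.0, 3.0],
     [0.5,  0.0, -4.0]]

row_norm = max(sum(abs(a) for a in row) for row in A)   # max_i ||A_i||_1

def Tv_sup(v):
    return max(abs(sum(a * x for a, x in zip(row, v))) for row in A)

brute = max(Tv_sup(v) for v in product((-1.0, 1.0), repeat=3))
print(row_norm, brute)   # both equal 6.0 for this A
```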
Lecture #18
Today is a guest lecture with Sarah Koch. Today we prove (or almost finish proving) the inverse function theorem as stated in Edwards (Theorem 3.3).

The theorem reads: Suppose that the mapping $f : \mathbb{R}^n \to \mathbb{R}^n$ is $C^1$ in a neighborhood $W \subset \mathbb{R}^n$ of a point $a \in \mathbb{R}^n$. (Recall that $C^1$ means continuously differentiable.) Suppose that $df_a$ is invertible. Then $f$ is locally invertible at $a$; that is, there exist open sets $U \subset W$ with $a \in U$, and $V$ containing $b = f(a)$, and a one-to-one map $g : V \to U$ such that $g(f(x)) = x$ for all $x \in U$ and $f(g(y)) = y$ for all $y \in V$.

Some remarks on the inverse function theorem: the theorem basically tells you that if a function's derivative is invertible at a point, then the function itself is invertible in a neighborhood of that point. In analysis, you often find cases like this where a function is shown to mimic some behavior of its derivative. Another important note: the inverse function theorem is local! Even if you have local inverses at all points, you cannot necessarily stitch them together into a global inverse. A standard example is the map $(r, \theta) \mapsto (r\cos\theta, r\sin\theta)$, which is locally invertible wherever $r \ne 0$ but is not one-to-one globally.

We are going to prove some lemmas before we prove the inverse function theorem. You should already have the preliminaries in place: operator norms, the sup (or $\infty$, or $|\cdot|_0$) norm that takes the maximum component of the vector, etc.

Here is a useful corollary to the multivariable MVT: let $U \subset \mathbb{R}^n$ be a neighborhood of the line segment $L$ with endpoints $a$ and $a + h$, and let $f$ be a $C^1$ map on $U$. Then
$$|f(a + h) - f(a) - df_a(h)|_0 \le |h|_0 \max_{x \in L}||df_x - df_a||.$$
This is another corollary in Edwards, and the idea of the proof is to apply the previous corollary to $x \mapsto f(x) - df_a(x)$.

We need one last corollary from Edwards: let $f : \mathbb{R}^n \to \mathbb{R}^m$ be a $C^1$ map at $a \in \mathbb{R}^n$. If $df_a : \mathbb{R}^n \to \mathbb{R}^m$ is injective, then $f$ is injective on a neighborhood of $a$.

We will also use the contraction mapping theorem proven earlier: let $\varphi : C \to C$ be a contraction mapping on a compact set $C$, with constant $k < 1$. Then $\varphi$ has a unique fixed point $x^* \in C$.

We are ready to prove the fundamental lemma in our proof of the inverse function theorem:

Lemma: Let $f : \mathbb{R}^n \to \mathbb{R}^n$ be a $C^1$ map such that $f(0) = 0$ and $df_0 = I$. Suppose also that $||df_x - I|| \le \epsilon < 1$ for all $x \in C_r$ (the cube of radius $r$ around $0$). Then $C_{(1-\epsilon)r} \subset f(C_r) \subset C_{(1+\epsilon)r}$. Moreover, if we define $V$ to be the interior of $C_{(1-\epsilon)r}$ and define $U = \operatorname{int} C_r \cap f^{-1}(V)$, then $f : U \to V$ is bijective (and therefore has an inverse $g$). The map $g$ is differentiable at $0$, and it is the limit of the sequence defined by $g_0(y) = 0$, $g_{n+1}(y) = g_n(y) - f(g_n(y)) + y$.

Applying the MVT corollary above with $df_0 = I$ gives
$$|f(x) - f(y) - (x - y)|_0 \le \epsilon|x - y|_0 \qquad \text{for all } x, y \in C_r.$$
We can rearrange this inequality to get
$$(1 - \epsilon)|x - y|_0 \le |f(x) - f(y)|_0 \le (1 + \epsilon)|x - y|_0,$$
so $f$ restricted to $C_r$ is injective, and $f(C_r) \subset C_{(1+\epsilon)r}$.

Next we show that $C_{(1-\epsilon)r} \subset f(C_r)$. To show this, fix $y \in C_{(1-\epsilon)r}$ and consider the map
$$\varphi_y : x \mapsto x - f(x) + y.$$
To show that $\varphi_y$ maps $C_r$ to $C_r$, we can write
$$|\varphi_y(x)|_0 \le |f(x) - x|_0 + |y|_0 = |f(x) - f(0) - df_0(x)|_0 + |y|_0 \le |y|_0 + |x|_0\max_{z \in C_r}||df_z - df_0|| \le (1 - \epsilon)r + \epsilon r = r,$$
which shows that $\varphi_y$ maps $C_r$ to $C_r$. Moreover $\varphi_y$ is a contraction with constant $\epsilon$, so it has a unique fixed point $x^* \in C_r$, and $\varphi_y(x^*) = x^*$ means exactly $f(x^*) = y$. So for every $y \in C_{(1-\epsilon)r}$, $f^{-1}(y)$ exists and is unique!
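A sketch of the lemma's iteration g_{n+1}(y) = g_n(y) - f(g_n(y)) + y, for a hypothetical planar map with f(0) = 0 and df_0 = I (the quadratic terms are chosen so that ||df_x - I|| is small near the origin):

```python
# Inverting f near 0 via g_{n+1}(y) = g_n(y) - f(g_n(y)) + y, g_0(y) = 0.
def f(x):
    x1, x2 = x
    return (x1 + 0.1 * x2 * x2, x2 + 0.1 * x1 * x2)

y = (0.3, 0.2)                      # a target point in the image cube
g = (0.0, 0.0)                      # g_0(y) = 0
for n in range(100):
    fx = f(g)
    g = (g[0] - fx[0] + y[0], g[1] - fx[1] + y[1])
print(g, f(g))                      # f(g) is approximately y
```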
Lecture #19
Today we start integration. We are familiar with the interpretation of the integral in single-variable calculus: if we have a function $f : \mathbb{R} \to \mathbb{R}$, then $\int_a^b f(x)\,dx$ is the area under the graph of $f$ on the interval $[a, b]$. In order to generalize this notion of integration, we have to first generalize our idea of area. The basic sets in $\mathbb{R}^n$ with an obvious area are the rectangles: if
$$R = \prod_{i=1}^n [a_i, b_i], \qquad \text{then} \qquad a(R) = \prod_{i=1}^n (b_i - a_i).$$

Definition: A set $S \subset V$ has an area (content) $a(S) \ge 0$ if the least upper bound of $\sum_i a(R_i^0)$, over finite unions of open rectangles with $\bigcup_{i=1}^n R_i^0 \subset S$, equals the greatest lower bound of $\sum_i a(R_i)$, over finite covers by closed rectangles with $\bigcup_{i=1}^n R_i \supset S$; the common value is $a(S)$.

Example: If $S = R$ is a rectangle, then $a(S) = a(R)$.

Non-Example: Let $S \subset [0, 1] \subset \mathbb{R}$ be the set of rational numbers in $[0, 1]$. You can see that the greatest lower bound over outer covers is 1, while the least upper bound over inner rectangles is 0. Since these are not the same, there is no defined area.

Properties of content: $a(S) \ge 0$; $S \subset S'$ implies $a(S) \le a(S')$; and $a(S \cup S') = a(S) + a(S')$ if $S, S'$ are disjoint.

Proposition: If $f$ is continuous on $[a, b]$, then $a(S)$ exists, where $S$ is the region under the graph of $f$.
Proof: $f$ is uniformly continuous on the compact set $[a, b]$. So for every $\epsilon > 0$ there is a $\delta > 0$ such that $|x - x'| < \delta$ implies $|f(x) - f(x')| < \frac{\epsilon}{b - a}$ on the entire interval $[a, b]$. Now we choose an $N$ so large that $\frac{b - a}{N} < \delta$, and we divide the interval into $N$ equal parts. We can then construct an inner cover by choosing the minimum value of $f(x)$ on each interval $[x_n, x_{n+1}]$. To create an outer cover, we can do the same thing but just choose the maximum value of $f(x)$ on each of the intervals $[x_n, x_{n+1}]$. Comparing the two covers,
$$0 \le \sum_n \frac{b - a}{N}\big|f(x_{\max}) - f(x_{\min})\big| \le N \cdot \frac{b - a}{N} \cdot \frac{\epsilon}{b - a} = \epsilon,$$
so the inner and outer bounds agree and $a(S)$ exists. (Later we will have to generalize this to the case of a continuous function $f : \mathbb{R}^n \to \mathbb{R}_+$.)

This technique of determining the area by finding the upper limit of an inner cover and a lower limit for the outer cover is identical to the way that Archimedes found an approximation for the area of a circle, using inscribed and circumscribed $n$-gons. He went up to $n = 96$ to determine that
$$3\tfrac{10}{71} < \pi < 3\tfrac{1}{7},$$
which gives us the first two decimals of $\pi$.
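Archimedes' bounds are easy to reproduce; the sketch below starts from the inscribed hexagon and doubles the number of sides four times (6 -> 12 -> 24 -> 48 -> 96), using only the chord-doubling identity, with no value of pi assumed:

```python
# Inscribed n-gon side s = 2 sin(pi/n) in the unit circle satisfies
# s_{2n} = sqrt(2 - sqrt(4 - s_n^2)); the circumscribed side is 2s/sqrt(4-s^2).
import math

n, s = 6, 1.0
for _ in range(4):
    s = math.sqrt(2.0 - math.sqrt(4.0 - s * s))
    n *= 2

lower = n * s / 2.0                        # half-perimeter, inscribed 96-gon
upper = n * s / math.sqrt(4.0 - s * s)     # half-perimeter, circumscribed 96-gon
print(lower, upper)                        # 3.14103... < pi < 3.14271...
```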
We will define the integral $\int_a^b f(x)\,dx$ in terms of the common value of inner covers and outer covers. Properties of the integral:
1. $\int_a^b f(x)\,dx$ exists when $f$ is continuous on $[a, b]$, by the proposition above.
2. If $m$ is the min of $f$ on $[a, b]$ and $M$ is the max of $f$ on $[a, b]$, then
$$m(b - a) \le \int_a^b f \le M(b - a).$$
3. Consider the function $F(x) = \int_a^x f$ for $x \in [a, b]$. Then $F(x)$ is differentiable and $F' = f$.
Proof: $F(x + h) - F(x) = \int_x^{x+h} f$, so by property 2,
$$\min(f \text{ on } [x, x + h]) \le \frac{F(x + h) - F(x)}{h} \le \max(f \text{ on } [x, x + h]);$$
then as $h \to 0$ we have that $\min(f) \to f(x)$ and $\max(f) \to f(x)$, so $F'(x) = f(x)$.
4. If $G$ is a function with $G' = f$, then
$$\int_a^b f = G(b) - G(a).$$
Proof: $F(x) = \int_a^x f$ also satisfies $F' = f$, so $F(x) - G(x)$ has a zero derivative. The Mean Value Theorem then gives $F = G + C$; evaluating at $x = a$ gives $0 = G(a) + C$, so $C = -G(a)$ and $\int_a^b f = F(b) = G(b) - G(a)$.
Lecture #20
Review of the end of last lecture: Consider a continuous function $f : [a, b] \to \mathbb{R}_+$, where the greatest lower bound over outer covers and the least upper bound over inner covers approach each other. This means that
$$\int_a^b f = a(S),$$
where $S$ is the region under the graph. For $x \in [a, b]$, we define the function $F(x) = \int_a^x f$.

Theorem: $F(x)$ is differentiable on $[a, b]$ and $F' = f$, with $F(a) = 0$.
Proof: First we need to prove that the derivative exists at $x = c$. We have
$$\lim_{h \to 0} \frac{F(c + h) - F(c)}{h} = \lim_{h \to 0} \frac{\int_a^{c+h} f - \int_a^c f}{h} = \lim_{h \to 0} \frac{1}{h}\int_c^{c+h} f.$$
We will show that this exists by using our bounds on the integral:
$$\min(f \text{ on } [c, c + h]) \le \frac{1}{h}\int_c^{c+h} f \le \max(f \text{ on } [c, c + h]).$$
As $h \to 0$, we have $\min(f) \to f(c)$ and $\max(f) \to f(c)$, since $f$ is continuous at $c$. Then $F'(c) = f(c)$, as required.

A corollary to this theorem is known as the fundamental theorem of calculus: if $G$ is differentiable on $[a, b]$ with $G' = f$, then
$$\int_a^b f = G(b) - G(a).$$
Proof: We know that $G' = F' = f$ on $[a, b]$. Then $(G - F)' = 0 \implies G = F + C$ on $[a, b]$, where we have used the mean value theorem. Now we evaluate this at $x = a$: $G(a) = F(a) + C = C$. Then $G = F + G(a)$, so $\int_a^b f = F(b) = G(b) - G(a)$.

Linearity:
$$\int_a^b (f + g) = \int_a^b f + \int_a^b g.$$
Proof: Let $F' = f$ and $G' = g$. Then $(F + G)' = F' + G' = f + g$, where we have exploited the fact that the derivative is linear. Then we have that
$$\int_a^b (f + g) = (F + G)(b) - (F + G)(a) = \int_a^b f + \int_a^b g.$$

Substitution: If $f$ is differentiable on $[a, b]$ with $f'$ continuous, and $g$ is continuous on $f([a, b])$, then
$$\int_{f(a)}^{f(b)} g(u)\,du = \int_a^b g(f(x))\,f'(x)\,dx.$$
Proof: We will use the chain rule and the Fundamental Theorem of Calculus (FTC). Let $G' = g$ on an interval $[c, d] \supset f([a, b])$. Now
$$(G \circ f)' = G'(f(x))\,f'(x) = \big((g \circ f)\,f'\big)(x),$$
so by the FTC both sides equal $G(f(b)) - G(f(a))$. This is the one-variable case of the general change of variables formula
$$\int_{f(A)} g = \int_A (g \circ f)\,|\det(df)|,$$
where $\det(df)$ is called the Jacobian.

Integration by Parts: We will state this a little differently than is conventional so that the proof will be more transparent:
$$\int_a^b f'g + \int_a^b fg' = \int_a^b (fg)' = f(b)g(b) - f(a)g(a),$$
which follows from the FTC and $(fg)' = f'g + fg'$.

Example: Integration by parts (with the pair $\sin, \cos$) gives $\int_0^{\pi/2}\cos^2(x)\,dx = \int_0^{\pi/2}\sin^2(x)\,dx$, while $\cos^2 = 1 - \sin^2$. So we have
$$2\int_0^{\pi/2}\sin^2(x)\,dx = \int_0^{\pi/2} 1\,dx = \frac{\pi}{2}, \qquad \int_0^{\pi/2}\sin^2(x)\,dx = \frac{\pi}{4}.$$
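A quick midpoint-rule check of the value pi/4 (a numerical sketch, not part of the lecture):

```python
# Midpoint-rule approximation of int_0^{pi/2} sin^2(x) dx, expected pi/4.
import math

N = 100000
h = (math.pi / 2) / N
integral = sum(math.sin((i + 0.5) * h) ** 2 for i in range(N)) * h
print(integral, math.pi / 4)
```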
Theorem: A set $A$ in $\mathbb{R}^n$ has content zero (we call such a set negligible) if and only if for every $\epsilon > 0$ there are finitely many closed rectangles $R_i$ with $A \subset \bigcup_i R_i$ and $\sum_i a(R_i) < \epsilon$. For example, the graph $\{(x, f(x)) : x \in [0, 1]\}$ of a continuous function is negligible. (I will defer to the proof in Edwards for this one.)

Theorem: If $f : \mathbb{R}^n \to \mathbb{R}$ is bounded with bounded support and is continuous except on a negligible set in $\mathbb{R}^n$, then $\int_S f$ exists. Such functions are the most general types of functions that we will integrate (but they are not the only types of functions that can be integrated).
Lecture #21
There are three major topics for the upcoming midterm exam.
1. Local analysis of $f : V \to \mathbb{R}$ near $a \in V$: $df_a$, $D_v f(a)$, $D_v(D_w f)(a)$, etc. You will be expected to know the form of Taylor polynomials in both one dimension and multiple dimensions, as well as how to analyze critical points of a function using the second derivative.
2. The inverse function theorem. You should know the statement of this theorem, although you will not be required to prove it. It will be important to understand the tools used to prove the inverse function theorem, such as contraction mappings, operator norms, and the MVT in single and several variables.
3. Integrals of functions $f : \mathbb{R} \to \mathbb{R}$: the FTC, integration by parts, and substitutions.
The emphasis of this exam will be on the first two parts, though integration will be included.

We will skip over the next section in the book, which is called Fubini's Theorem. This theorem allows us to integrate functions with rectangular domains by integrating each variable separately. Instead, we will focus on the change of variables formula. The statement of this formula is as follows. Suppose we have functions $g : V \to V$ and $f : V \to \mathbb{R}_+$ such that $g$ takes a subset $A \subset V$ to $g(A)$, where $dg$ is invertible on $A$. Then
$$\int_{g(A)} f = \int_A (f \circ g)\,|\det dg|.$$
In one dimension, the change of variables formula is just the substitution formula.

Suppose $g = T : V \to V$ is an invertible linear transformation, so $dg_a = T$ is invertible for all $a \in A$, and take $f = 1$ on $V$. Then the LHS $= a(T(A))$ and the RHS $= |\det T|\cdot a(A)$. So when $g = T$ is linear and $f = 1$, the change of variables theorem is just the statement of the following theorem.

Theorem: If $A \subset V$ is contented and $T : V \to V$ is linear, then $T(A)$ is contented and
$$a(TA) = |\det T|\cdot a(A).$$

So when $g = T$ is linear, the change of variables theorem is a statement about linear operators on a real vector space with a specified basis $(v_1, \ldots, v_n)$. For example, if
$$T = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}, \qquad \lambda_1\lambda_2 \ne 0,$$
then $T(x, y) = (\lambda_1 x, \lambda_2 y)$, and then $a(TA) = |\lambda_1\lambda_2|\,a(A)$. Note also that the formula is compatible with composition: $a(TS(A)) = |\det T|\,a(S(A)) = |\det T||\det S|\,a(A) = |\det TS|\,a(A)$. In general, $T$ is the product of some simple matrices $T_1, \ldots, T_n$, for which we will check the formula by hand, and it suffices to check it when $A = R$ is a rectangle.
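The linear case a(TA) = |det T| a(A) can be checked on the unit square; a sketch using the shoelace formula for the area of the image parallelogram:

```python
# Apply a 2x2 matrix T to the unit square (area 1) and compare the image
# area, computed by the shoelace formula, against |det T|.
T = [[2.0, 1.0],
     [0.5, 3.0]]

def apply(T, p):
    return (T[0][0] * p[0] + T[0][1] * p[1], T[1][0] * p[0] + T[1][1] * p[1])

corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
img = [apply(T, p) for p in corners]
shoelace = 0.5 * abs(sum(img[i][0] * img[(i + 1) % 4][1]
                         - img[(i + 1) % 4][0] * img[i][1] for i in range(4)))
det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
print(shoelace, abs(det))          # both 5.5 for this T
```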
Lecture #22
Today we will prove the change of variables theorem: if we have a function $g : V \to V$ where $dg$ is invertible on $A$, then
$$\int_{g(A)} f = \int_A (f \circ g)\,|\det dg|.$$
We will start by proving this for the special case where we have
1. $f = 1$, and
2. $g$ an invertible linear map, so $dg = g$ is constant.

Given inner and outer covers $\bigcup_i R_i^0 \subset A \subset \bigcup_j R_j$, we consider the difference $\sum_j \operatorname{vol}(R_j) - \sum_i \operatorname{vol}(R_i^0) < \epsilon$; then we also need to compare $\operatorname{vol}(g(R_j))$ and $\operatorname{vol}(g(R_i^0))$ with $|\det g|$ times the original volumes. So it suffices to check the formula for rectangles $A = R$ and for the simple linear maps $T_1, \ldots, T_n$ whose product is $T = T_1 \circ T_2 \circ \cdots \circ T_n$. The simple matrices are
$$C_i = \begin{pmatrix} 1 & & & \\ & \ddots & & \\ & & c_i & \\ & & & 1 \end{pmatrix}, \quad \det C_i = c_i, \qquad E_{ij} = I + (\text{a single 1 in position } (i, j)),\ i \ne j, \quad \det E_{ij} = 1.$$
Multiplying by $C_i$ scales column (or row) $i$ by $c_i$; multiplying by $E_{ij}$ adds column $i$ to column $j$ (or row $j$ to row $i$). Now we note that we can use these special matrices in combination with each other to turn an arbitrary invertible matrix into the identity, so any invertible $T$ is a product of them. For $C_i$, the image of a rectangle is a rectangle with the $i$-th side scaled by $|c_i|$, so $a(C_i R) = |c_i|\,a(R) = |\det C_i|\,a(R)$. For $E_{ij}$, the image of a rectangle is a sheared parallelepiped with the same cross-sections, hence the same volume, matching $|\det E_{ij}| = 1$.
Lecture #23
Last time we proved the change of variables formula for the special case where we have a linear map $T : V \to V$ acting on a contented subset $A \subset V$, with $g = T$ and $f = 1$: we found that $a(TA) = |\det T|\,a(A)$.

Consider the standard cube of volume 1 and $n$ independent vectors $p_1, p_2, \ldots, p_n$ given by a basis of $V$. Then the map $T : V \to V$ defined by $e_i \mapsto p_i$ takes the standard cube to the parallelepiped defined by the $p_i$,
$$P = \Big\{\sum_{i=1}^n x_i p_i : 0 \le x_i \le 1\Big\},$$
so $\operatorname{vol}(P) = |\det T|$. The set
$$L = \Big\{\sum_{i=1}^n m_i p_i : m_i \in \mathbb{Z}\Big\}$$
is called the associated lattice. Writing $v = \sum_i y_i p_i$ with $y_i = x_i + m_i$, where $m_i$ is an integer and $0 \le x_i < 1$, every $v \in V$ is of the form $\ell + z$ with $\ell \in L$ and $z \in P$. If $p_i = e_i$, then $L = \mathbb{Z}^n$. If $B$ is a large box, a good estimate is
$$\#(L \cap B) \approx \frac{\operatorname{vol}(B)}{|\det T|}.$$

A sensible question would be: if we try to pack round balls of equal radii into $V$, centered at the points of $L$, how much of $V$ do we cover? It is not obvious, for example, how packing the spheres according to a square lattice compares with packing them with respect to a different parallelepiped lattice. First we note that two spheres of radius $r$ centered at distinct lattice points $\ell$ and $\ell'$ do not overlap exactly when $2r \le ||\ell - \ell'||$, so the largest usable radius is
$$r = \frac{1}{2}||\ell||_{\min},$$
where $||\ell||_{\min}$ is the length of the shortest nonzero vector of $L$. What proportion of $V$ are we covering with these balls?
$$\text{proportion of } B \text{ covered} \approx \frac{\#(L \cap B)\operatorname{vol}(\mathrm{Ball}_r)}{\operatorname{vol}(B)} = \frac{C_n r^n}{|\det T|} = \frac{C_n\big(\tfrac{1}{2}||\ell||_{\min}\big)^n}{|\det T|},$$
where $C_n$ is the volume of the unit ball in $\mathbb{R}^n$.

Let's compute this for a square lattice. For this case we have $p_i = e_i$ and $L = \mathbb{Z}^n \subset \mathbb{R}^n$, so $\det T = 1$, $||\ell||_{\min} = 1$, and $r = \frac{1}{2}$; the proportion covered is $C_n/2^n$, which in the plane is $\pi/4 \approx .7854$. For the hexagonal lattice in 2 dimensions, $\det T = \frac{\sqrt{3}}{2}$ and $||\ell||_{\min} = 1$, so
$$\frac{C_2\big(\tfrac{1}{2}||\ell||_{\min}\big)^2}{\det T} = \frac{\pi/4}{\sqrt{3}/2} = \frac{\pi}{2\sqrt{3}} \approx .9069.$$
Interestingly enough, the best sphere packing in 2 dimensions is found for the hexagonal lattice... this is what bees use! There is a sketchy proof for this in 3 dimensions. Special dimensions such as $n = 2, 8, 24$ admit exceptionally good lattices; linear programming bounds developed by Cohn and Elkies attack the best-packing problem in dimensions like 8 and 24.

Now let's move toward proving the change of variables formula for a nonlinear, continuously differentiable, invertible $g : V \to V$ with $a \mapsto b$, where $dg_a$ is invertible: we will work in a small rectangle around $a$ chosen such that, for all $x$ in this small rectangle, $dg_x$ is close to $dg_a$.
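The two lattice densities computed above are easy to reproduce; a sketch:

```python
# Packing density C_n (||l||_min / 2)^n / det T in the plane, for the square
# and hexagonal lattices, both normalized to shortest vector length 1.
import math

r = 0.5                                             # r = ||l||_min / 2
square = math.pi * r ** 2 / 1.0                     # det T = 1
hexagonal = math.pi * r ** 2 / (math.sqrt(3) / 2)   # det T = sqrt(3)/2
print(square, hexagonal)                            # 0.7853..., 0.9069...
```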
Lecture #24
Theorem (Change of Variables): Suppose $g : V \to V$ is $C^1$, with $dg_a$ invertible at all points $a$ of $A$, and $f$ is integrable on $g(A)$. Then
$$\int_{g(A)} f = \int_A (f \circ g)\,|\det(dg_a)|.$$
Heuristically: subdivide $A$ into small rectangles $R_i$ around points $a_i$; then
$$\int_{g(A)} f \approx \sum_i f(g(a_i))\operatorname{vol}(g(R_i)) \approx \sum_i f(g(a_i))\,|\det(dg_{a_i})|\operatorname{vol}(R_i) \to \int_A (f \circ g)\,|\det(dg)|.$$

Proof: We just went over the heuristic proof, but now we need to make it precise. We will not use epsilons (see the book if you want to) but we will at least make it more rigorous.

Lemma: Take $g : V \to V$, $a \mapsto b$, with $dg_a$ invertible and $dg$ continuous. Let $R$ be a small rectangle around $a$ such that $||dg_x - dg_a|| \le \epsilon$ for all $x \in R$, with respect to the sup norm on $V$. Let $C$ be the corresponding rectangle around zero; in other words, $C = \tau_a^{-1}(R)$, where $\tau_a$ is the translation map $\tau_a(v) = v + a$. Recall the older lemma: if $f : V \to V$ has $f(0) = 0$ and $df_0 = I$, with $||df_x - I|| \le \epsilon$ for all $x$ in a small rectangle $C$ around zero, then $C_{1-\epsilon} \subset f(C) \subset C_{1+\epsilon}$. This older lemma is a special case of the lemma we want to prove, so we reduce the lemma we want to prove to this special case by considering $h = dg_a^{-1} \circ \tau_b^{-1} \circ g \circ \tau_a$, which satisfies $h(0) = 0$ and $dh_0 = I$.

Improper integrals: In general, we define the improper integral as the limit of the integral, as the region of integration approaches the region we desire. For instance,
$$\int_0^\infty e^{-x}\,dx = \lim_{t \to \infty}\int_0^t e^{-x}\,dx = 1.$$
Another is
$$\int_1^\infty \frac{1}{x^2}\,dx = \lim_{t \to \infty}\int_1^t \frac{1}{x^2}\,dx = \lim_{t \to \infty}\Big(1 - \frac{1}{t}\Big) = 1.$$
Of course, these limits are not always defined. For instance,
$$\int_1^\infty x^{-a}\,dx = \lim_{t \to \infty}\int_1^t x^{-a}\,dx$$
has a well-defined finite value only if $a > 1$; at $a = 1$ we get a logarithm, and the limit diverges. Similarly,
$$\int_0^1 x^{-a}\,dx = \lim_{\epsilon \to 0}\int_\epsilon^1 x^{-a}\,dx$$
converges for $a < 1$. This is a different sort of improper integral: the region under the graph is non-contented because it is unbounded near $0$.

A famous example: for $I = \int_{-\infty}^\infty e^{-x^2}\,dx$, passing to polar coordinates we have
$$I^2 = \int_0^{2\pi}\int_0^\infty e^{-r^2}\,r\,dr\,d\theta = 2\pi \cdot \frac{1}{2} = \pi,$$
so $I = \sqrt{\pi}$.

Next time we'll learn about another famous integral, perhaps the second most important integral, from Euler. It's called the gamma function, and it's defined by an improper integral:
$$\Gamma(x) = \int_0^\infty t^{x-1}e^{-t}\,dt.$$
We'll show that $\Gamma(n + 1) = n!$.
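A midpoint-rule check of the Gaussian integral (a numerical sketch; the tails beyond |x| = 10 are below e^{-100} and can be ignored):

```python
# Approximate int_{-10}^{10} e^{-x^2} dx and compare with sqrt(pi).
import math

N = 200000
a, b = -10.0, 10.0
h = (b - a) / N
I = 0.0
for i in range(N):
    t = a + (i + 0.5) * h
    I += math.exp(-t * t)
I *= h
print(I, math.sqrt(math.pi))
```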
Lecture #25
Last time we discussed the notion of an improper integral. For example, consider the integral
$$\int_1^\infty x^a\,dx = \lim_{t \to \infty}\int_1^t x^a\,dx = \lim_{t \to \infty}\frac{x^{a+1}}{a + 1}\Big|_1^t = \frac{-1}{a + 1},$$
which converges exactly when $a < -1$.

Now let's compute the improper integral that defines Euler's gamma function:
$$\Gamma(x) = \int_0^\infty t^{x-1}e^{-t}\,dt = \lim_{\epsilon \to 0,\ N \to \infty}\int_\epsilon^N t^{x-1}e^{-t}\,dt.$$
Near $t \to 0$ the integrand behaves like $t^{x-1}$, which is integrable when $x > 0$ (the case $a > -1$ above). As $N \to \infty$, for $t \ge m$ (with $m$ large enough, depending on $x$) we have $t^{x-1}e^{-t} \le \frac{1}{t^2}$, so the tail converges as well. Therefore, for $x > 0$, the gamma function is finite (i.e. it exists).

Why is this integral important?
$$\Gamma(x + 1) = \int_0^\infty t^x e^{-t}\,dt = \big[t^x(-e^{-t})\big]_0^\infty + x\int_0^\infty t^{x-1}e^{-t}\,dt = x\,\Gamma(x),$$
where in the second step we have used integration by parts. Since $\Gamma(1) = 1$, it follows that $\Gamma(n + 1) = n!$: you can see that the gamma function is closely related to the $!$ (factorial) operator. Now for another interesting property of the gamma function. We note that the gamma function is defined for all positive $x$, not just the integers, and the recursion $\Gamma(x + 1) = x\Gamma(x)$ lets us iteratively extend it to $\Gamma : \mathbb{R} - \{0, -1, -2, \ldots\} \to \mathbb{R}$. Then there is the reflection formula
$$\Gamma(x)\Gamma(1 - x) = \frac{\pi}{\sin \pi x}, \qquad x \notin \mathbb{Z}.$$
So
$$\Gamma\big(\tfrac{1}{2}\big)^2 = \frac{\pi}{\sin\frac{\pi}{2}} = \pi \implies \Gamma\big(\tfrac{1}{2}\big) = \sqrt{\pi}.$$

De Moivre was the first to come up with a basic approximation for $\log(n!)$ (a useful thing, since $n!$ quickly outgrows numbers like $10^{80}$). Consider
$$\log(n!) = \sum_{k=1}^n \log k \approx \int_1^n \log x\,dx = n\log n - n + 1.$$
A more careful comparison of the sum and the integral gives the approximation
$$\log(n!) \approx n\log n - n + \frac{1}{2}\log n + c.$$
Stirling was the one to find the constant $c = \log\sqrt{2\pi}$, so that
$$n! \approx n^n\sqrt{2\pi n}\,e^{-n}.$$
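Both Gamma(5) = 4! and Stirling's approximation can be checked numerically; a sketch (the cutoff T = 40 and the step count are arbitrary choices):

```python
# Gamma(x) = int_0^inf t^{x-1} e^{-t} dt via a midpoint rule on [0, T],
# plus Stirling's n! ~ n^n sqrt(2 pi n) e^{-n}.
import math

def gamma(x, N=400000, T=40.0):
    h = T / N
    total = 0.0
    for i in range(N):
        t = (i + 0.5) * h
        total += t ** (x - 1) * math.exp(-t)
    return total * h

print(gamma(5.0))                          # Gamma(5) = 4! = 24
stirling = lambda n: n ** n * math.sqrt(2 * math.pi * n) * math.exp(-n)
print(math.factorial(10), stirling(10))    # 3628800 vs roughly 3598695.6
```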
Now consider a curve $M \subset V$, where $M$ is defined as the image of a differentiable map $\gamma : [a, b] \to \mathbb{R}^n$. We will integrate over the entire curve by using the approximation of dividing it up into many small line segments defined by the tangent vectors all along the path. To compute the length of the path, we compute the limit over partitions $T = \{a = t_0 < t_1 < \cdots < t_N = b\}$:
$$s(\gamma, T) = \sum_{i=1}^N ||\gamma(t_i) - \gamma(t_{i-1})||, \qquad s(\gamma) = \lim_{|T| \to 0} s(\gamma, T),$$
where $|T|$ is the largest spacing $t_i - t_{i-1}$.

Proposition: $s(\gamma)$, defined as the above limit, exists and is equal to
$$s(\gamma) = \int_a^b ||\gamma'(t)||\,dt.$$
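The proposition can be illustrated numerically on the hypothetical curve gamma(t) = (t, t^2): the polygonal sums s(gamma, T) and the integral of ||gamma'|| agree. A sketch:

```python
# Arc length of gamma(t) = (t, t^2) on [0, 1] computed two ways:
# chord sums from the definition, and a midpoint rule for int ||gamma'||.
import math

def gamma(t):
    return (t, t * t)

def chord_sum(N):
    pts = [gamma(i / N) for i in range(N + 1)]
    return sum(math.dist(pts[i], pts[i + 1]) for i in range(N))

N = 100000
h = 1.0 / N
integral = sum(math.hypot(1.0, 2.0 * (i + 0.5) * h) for i in range(N)) * h
print(chord_sum(100000), integral)     # both approach the same limit
```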
Lecture #27
A 1-form on $U \subset V$ is a map $\omega : U \to L(V, \mathbb{R}) = V^*$. The notation for a 1-form, using the basis $\{e_1, \ldots, e_n\}$: we use $f_i(a) = \omega(a)(e_i)$ for $i = 1, 2, \ldots, n$. This gives $n$ functions $f_i : U \to \mathbb{R}$ which determine the 1-form completely. Remember from linear algebra that $v = \sum_i x_i e_i$, so $\omega(a)(v) = \sum_i x_i f_i(a)$.

If $\omega = df$, then $f_i = \frac{\partial f}{\partial x_i}$; with basis $\{e_1, e_2\}$, $(f_1, f_2) = \big(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}\big)$. But remember that the mixed partial derivatives are symmetric:
$$\frac{\partial^2 f}{\partial x_i\partial x_j} = \frac{\partial^2 f}{\partial x_j\partial x_i}.$$
So for this specific case we have
$$\frac{\partial f_2}{\partial x_1} = \frac{\partial f_1}{\partial x_2},$$
but this is not necessarily true for arbitrary 1-forms. Our notation for 1-forms once we have a basis is
$$\omega = \sum_i f_i\,dx_i, \qquad df = \sum_i \frac{\partial f}{\partial x_i}\,dx_i.$$

Now how do we integrate a 1-form over a curve $C \subset U \subset V$? Choose a parameterization $\gamma : [a, b] \to U$ whose image is $\gamma([a, b]) = C$.

Definition:
$$\int_C \omega = \int_a^b \omega(\gamma(t))(\gamma'(t))\,dt.$$
Note that this does NOT depend on the parameterization of $C$. Choose a basis $\{e_1, \ldots, e_n\}$, so that $\omega$ corresponds to $(f_1, \ldots, f_n)$ on $U$. Then
$$\omega(\gamma(t))(\gamma'(t)) = \omega(\gamma(t))\Big(\sum_i \gamma_i'(t)e_i\Big) = \sum_i \gamma_i'(t)\,\omega(\gamma(t))(e_i),$$
so
$$\int_C \omega = \sum_i \int_a^b f_i(\gamma(t))\,\gamma_i'(t)\,dt,$$
a sum of $n$ integrals in 1 variable.

If $\omega = df$, then $\int_C \omega = f(\gamma(b)) - f(\gamma(a))$. The fact that the integration of a 1-form $df$ depends only on the endpoints of the path is a generalization of the fundamental theorem of calculus. It then follows that if $C$ is closed, then $\int_C df = 0$.

In the plane we can write a 1-form as $\omega = P\,dx + Q\,dy$, so that
$$\int_C \omega = \int_a^b \big[P(\gamma(t))\,\gamma_1'(t) + Q(\gamma(t))\,\gamma_2'(t)\big]\,dt.$$
If $\omega = df$ on the region, then $P_y = Q_x$. Consider $U = \mathbb{R}^2 - \{(0, 0)\}$ with
$$P = \frac{-y}{x^2 + y^2}, \qquad Q = \frac{x}{x^2 + y^2};$$
one checks that $P_y = Q_x$. Now we ask: is $\omega = P\,dx + Q\,dy = df$ for some $f$ on $U$? Let's compute the integral of this 1-form over the unit circle, parameterized in polar coordinates by $\gamma(t) = (\cos t, \sin t)$:
$$\int_C \omega = \int_0^{2\pi} dt = 2\pi \ne 0.$$
Therefore $\omega \ne df$ on $U$, even though $P_y = Q_x$: whether a closed 1-form is exact depends on the region.

Now we will consider 2-forms on $U \subset V$ where $\dim V = 2$: a 2-form is $F\,dx \wedge dy$ with $F : U \to \mathbb{R}$. For now, we will have to take it as a given that we differentiate 1-forms in the following way to produce 2-forms:
$$d\omega = d(P\,dx + Q\,dy) = \Big(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\Big)\,dx \wedge dy,$$
and that $d^2 = 0$. Now we can state an important theorem:

Green's Theorem in $\mathbb{R}^2$: Let $D$ be a compact, connected region in $U$ whose boundary $\partial D$ consists of oriented, closed curves. Let $\omega$ be a 1-form on $U \supset D$ and $d\omega$ the 2-form above. Then
$$\int_{\partial D}\omega = \int_D d\omega.$$
This generalizes $\int_{[a,b]} df = f(b) - f(a)$. As an example, if $\omega = x\,dy$, then $d\omega = dx \wedge dy$ and
$$\int_{\partial D} x\,dy = \operatorname{vol}(D).$$
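The computation for the closed-but-not-exact form can be repeated numerically; a sketch integrating it over the unit circle:

```python
# Line integral of P dx + Q dy with P = -y/(x^2+y^2), Q = x/(x^2+y^2)
# over gamma(t) = (cos t, sin t); the result should be 2*pi, not 0.
import math

N = 100000
h = 2 * math.pi / N
total = 0.0
for i in range(N):
    t = (i + 0.5) * h
    x, y = math.cos(t), math.sin(t)
    dx, dy = -math.sin(t), math.cos(t)      # gamma'(t)
    r2 = x * x + y * y
    total += (-y / r2) * dx + (x / r2) * dy
total *= h
print(total, 2 * math.pi)
```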
Lecture #29
Last time we discussed functions $f : U \to \mathbb{R}$ with $U \subset \mathbb{R}^n$; we called these 0-forms. We called 1-forms maps $\omega : U \to L(V, \mathbb{R}) = V^*$; an example is $\omega = df$.

A 2-form is a map $\omega : U \to \Lambda^2 V^*$, the space of alternating bilinear forms on $V$, i.e. maps $B : V \times V \to \mathbb{R}$ with $B(v, w) = -B(w, v)$. Then $\Lambda^2 V^*$ is a vector space of dimension $\frac{n(n-1)}{2}$, with basis $dx_i \wedge dx_j$ for $i < j$, and a general 2-form looks like
$$\omega = \sum_{i<j} f_{ij}\,dx_i \wedge dx_j$$
(compare with a general 1-form, $\omega = \sum_i f_i\,dx_i$). If $\dim V = 2$, then there is only one pair with $i < j$, and we have $\omega = f_{12}\,dx_1 \wedge dx_2$. For $\dim V = 3$, the coefficients of a 2-form assemble into an antisymmetric matrix
$$\begin{pmatrix} 0 & f_{12} & f_{13} \\ -f_{12} & 0 & f_{23} \\ -f_{13} & -f_{23} & 0 \end{pmatrix}.$$

There is an operator $d :$ 1-forms on $U \to$ 2-forms on $U$. For $\omega = \sum_i f_i\,dx_i$,
$$d\omega = \sum_i df_i \wedge dx_i = \sum_{i<j}\Big(\frac{\partial f_j}{\partial x_i} - \frac{\partial f_i}{\partial x_j}\Big)\,dx_i \wedge dx_j.$$

Recall for 1-forms that $\int_C df = f(\gamma(b)) - f(\gamma(a))$. There are equivalent relations for 2-forms. We will state, but not prove, them. We can integrate a 2-form $\omega$ over a surface $D \subset U$ parameterized by $\Phi : R \to D$, where $R \subset \mathbb{R}^2$ is a rectangle with coordinates $(s, t)$:
$$\int_D \omega = \int_R \sum_{i<j} f_{ij}(\Phi(s, t))\Big(\frac{\partial \Phi_i}{\partial s}\frac{\partial \Phi_j}{\partial t} - \frac{\partial \Phi_j}{\partial s}\frac{\partial \Phi_i}{\partial t}\Big)\,ds\,dt;$$
note that each factor in parentheses is the determinant
$$\det\begin{pmatrix} \frac{\partial \Phi_i}{\partial s} & \frac{\partial \Phi_i}{\partial t} \\[2pt] \frac{\partial \Phi_j}{\partial s} & \frac{\partial \Phi_j}{\partial t} \end{pmatrix}.$$

Theorem: If $\omega = d\eta$, then $\int_D d\eta = \int_{\partial D}\eta$, just as in Green's theorem in $\mathbb{R}^2$.

We define a $k$-form as a map $\omega^k : U \to \Lambda^k V^*$, the space of alternating, multilinear forms $B : V^k \to \mathbb{R}$. We have
$$\dim(\Lambda^k V^*) = \binom{n}{k}.$$
To gain a little intuition on these: $B(v_1, \ldots, v_k)$ is determined by the values $B(e_{i_1}, e_{i_2}, \ldots, e_{i_k})$ on basis elements with $i_1 < \cdots < i_k$. For example, the 3-forms on $\mathbb{R}^3$ form a 1-dimensional space, while the 3-forms on $\mathbb{R}^4$ form a 4-dimensional space.

The operator $d$ takes a $k$-form to a $(k+1)$-form. With $V = \mathbb{R}^3$, let's try to take a 0-form to a 1-form:
$$f(x, y, z) \mapsto df = f_x\,dx + f_y\,dy + f_z\,dz$$
(the gradient). Now let's take the 1-form $\omega = f_1\,dx + f_2\,dy + f_3\,dz$ to a 2-form:
$$d\omega = \Big(\frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z}\Big)\,dy \wedge dz + \Big(\frac{\partial f_1}{\partial z} - \frac{\partial f_3}{\partial x}\Big)\,dz \wedge dx + \Big(\frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y}\Big)\,dx \wedge dy$$
(the curl). Taking the 2-form $\eta = f_1\,dy \wedge dz + f_2\,dz \wedge dx + f_3\,dx \wedge dy$ to a 3-form gives
$$d\eta = \Big(\frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z}\Big)\,dx \wedge dy \wedge dz;$$
this is called the divergence. We see in this example some general properties of differential forms. The statement that $d^2 = 0$ (0-form $\to$ 1-form $\to$ 2-form, and 1-form $\to$ 2-form $\to$ 3-form) then becomes the equality of mixed partials. The other property of differential forms is the general Stokes theorem:
$$\int_{D^k} d\omega^{k-1} = \int_{\partial D^k}\omega^{k-1},$$
where $D^k$ is a $k$-dimensional region and $\omega^{k-1}$ is a $(k-1)$-form.
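That d^2 = 0 at the 0-form level (the curl of a gradient vanishes) can be spot-checked with finite differences on a hypothetical f; a sketch:

```python
# Numerically verify curl(grad f) = 0 for f(x,y,z) = x^2*y + x*sin(z),
# using nested central differences with step h.
import math

h = 1e-3

def f(p):
    x, y, z = p
    return x * x * y + x * math.sin(z)

def partial(func, p, i):
    q1, q2 = list(p), list(p)
    q1[i] += h
    q2[i] -= h
    return (func(q1) - func(q2)) / (2 * h)

grad = [lambda p, i=i: partial(f, p, i) for i in range(3)]
p = (0.7, -0.4, 1.3)
curl = (partial(grad[2], p, 1) - partial(grad[1], p, 2),
        partial(grad[0], p, 2) - partial(grad[2], p, 0),
        partial(grad[1], p, 0) - partial(grad[0], p, 1))
print(curl)     # all three components are ~0 (equality of mixed partials)
```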
Lecture #30
Today we will start the calculus of variations and derive the Euler-Lagrange equation. One basic problem in the calculus of variations is to minimize the value of some integral
$$F(\gamma) = \int_a^b f(\gamma(t), \gamma'(t), t)\,dt$$
over a space of differentiable functions $\gamma : [a, b] \to \mathbb{R}$ with prescribed endpoint values $\gamma(a) = \alpha$ and $\gamma(b) = \beta$. We could, for instance, minimize the length of a curve between $(a, \alpha)$ and $(b, \beta)$ (the minimizer is clearly the straight line).

Suppose we have the functional $F$ on $M \subset C^1[a, b]$, where $M$ is the subset of functions that terminate at $\alpha$ and $\beta$. If $\alpha$ and $\beta$ are both 0, then $M$ is a subspace; this will generally not be the case, so $M$ is generally not a subspace but a translate of the true subspace:
$$M = C^1[a, b]_{(0,0)} + \gamma_0.$$

Let's investigate $dF : C^1[a, b] \to \mathbb{R}$ at $\gamma$. At a critical point, $dF_\gamma = 0$ on $T_\gamma M = C^1[a, b]_{(0,0)}$: take any path $\sigma : (-\epsilon, \epsilon) \to M$ with the constraints that $\sigma(0) = \gamma$ and $\sigma'(0) = $ any vector in $T_\gamma M$. Then the function $g = F \circ \sigma : (-\epsilon, \epsilon) \to \mathbb{R}$ has a critical point at $0$, and $dF_\gamma(\text{any vector in } T_\gamma M) = 0$.

What is $dF$? Note that
$$F(\gamma + h) = F(\gamma) + dF_\gamma(h) + o(||h||)$$
for $h \in C^1[a, b]_{(0,0)}$. Then
$$F(\gamma + h) - F(\gamma) = \int_a^b \big[f(\gamma + h, \gamma' + h', t) - f(\gamma, \gamma', t)\big]\,dt.$$
Now our strategy will be to fix $t$ and expand:
$$f(\gamma + h, \gamma' + h', t) - f(\gamma, \gamma', t) = f_x(\gamma, \gamma', t)\,h + f_y(\gamma, \gamma', t)\,h' + (\text{higher order terms in } h, h').$$
One might guess that
$$dF_\gamma(h) = \int_a^b \big[f_x(\gamma, \gamma', t)\,h + f_y(\gamma, \gamma', t)\,h'\big]\,dt.$$
To show what this forces, we evaluate the second part of this integral using integration by parts and exploit the fact that $h(a) = h(b) = 0$:
$$\int_a^b f_y(\gamma, \gamma', t)\,h'\,dt = \big[f_y(\gamma, \gamma', t)\,h(t)\big]_a^b - \int_a^b \frac{d}{dt}f_y(\gamma, \gamma', t)\,h(t)\,dt,$$
where the boundary term vanishes because $h$ vanishes at $a$ and $b$. So at a critical point,
$$0 = \int_a^b \Big[f_x(\gamma, \gamma', t) - \frac{d}{dt}f_y(\gamma, \gamma', t)\Big]h(t)\,dt.$$
Now, since this holds for all $h \in T_\gamma M$, the bracket must vanish, and we get the Euler-Lagrange equation:
$$f_x(\gamma, \gamma', t) = \frac{d}{dt}f_y(\gamma, \gamma', t).$$

Let's use this to solve the simplest application of the calculus of variations: the path of shortest length. We are asked to minimize the functional
$$F(\gamma) = \int_a^b \sqrt{1 + \gamma'(t)^2}\,dt.$$
So we have $f(x, y, t) = \sqrt{1 + y^2}$, and
$$f_x = 0, \qquad f_y = \frac{y}{\sqrt{1 + y^2}}.$$
Let's plug this into the Euler-Lagrange equation: for a minimum or a maximum, E-L says that
$$0 = \frac{d}{dt}\,\frac{\gamma'(t)}{\sqrt{1 + \gamma'(t)^2}}.$$
This implies that $\frac{\gamma'(t)}{\sqrt{1 + \gamma'(t)^2}}$ is constant, so $\gamma'(t)$ is constant, and $\gamma(t) = mt + b$,
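The straight-line answer can also be recovered by minimizing a discretized length functional directly; a sketch (the step size and iteration count are arbitrary choices):

```python
# Minimize the polygonal length of a path from (0,0) to (1,1) over the
# interior heights y_i by gradient descent; the minimizer is the line y = x.
import math

N = 20
h = 1.0 / N
y = [0.0] + [0.5] * (N - 1) + [1.0]          # a non-straight initial path

def length(y):
    return sum(math.hypot(h, y[i + 1] - y[i]) for i in range(N))

for step in range(30000):
    for i in range(1, N):                    # d(length)/d(y_i)
        left = (y[i] - y[i - 1]) / math.hypot(h, y[i] - y[i - 1])
        right = (y[i] - y[i + 1]) / math.hypot(h, y[i + 1] - y[i])
        y[i] -= 0.02 * (left + right)

straight = [i * h for i in range(N + 1)]
dev = max(abs(a - b) for a, b in zip(y, straight))
print(length(y), dev)                        # length ~ sqrt(2), dev ~ 0
```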
as we would have expected.

Now we will start talking about what a reasonable norm is on our vector space $C^1[a, b]$; it must satisfy all of the normal properties of a norm (triangle inequality, etc.). Remember the norms on $\mathbb{R}^n$:
$$||v||_1 = \sum_i |v_i|, \qquad ||v||_2 = \sqrt{\sum_i v_i^2}, \qquad ||v||_\infty = \max_i(|v_i|).$$
This suggests that a norm on a function space will require an integral. We will start with this next class.
Lecture #31
We continue discussing the calculus of variations, using the same notation from last class. Recall that we have a functional
$$F : V \to \mathbb{R}, \qquad F(\gamma) = \int_a^b f(\gamma, \gamma', t)\,dt,$$
where $V = C^1[a, b]$ is the set of continuously differentiable functions $\gamma : [a, b] \to \mathbb{R}$. We defined $M \subset V$, where $M$ is the subset of $\gamma$ such that $\gamma(a) = \alpha$ and $\gamma(b) = \beta$ for some fixed $\alpha$ and $\beta$ that define $M$. The map $f : \mathbb{R}^3 \to \mathbb{R}$ defining $F$ is assumed differentiable. A critical point $\gamma$ of $F$ on $M$ satisfies the Euler-Lagrange equation
$$f_x(\gamma, \gamma', t) = \frac{d}{dt}f_y(\gamma, \gamma', t).$$

Let's imagine minimizing the surface area of a surface of revolution. We have the surface area
$$S = 2\pi\int_a^b \gamma(t)\sqrt{1 + \gamma'(t)^2}\,dt,$$
so we would want to use the function $f(x, y, z) = x\sqrt{1 + y^2}$.

We won't finish solving this problem directly with the Euler-Lagrange equation. We'll use another method, discussed now. Euler noticed that there is a simplification of the Euler-Lagrange equation when $f$ has no explicit dependence on its last variable $z$, i.e. on $t$. In that case E-L reads
$$f_x(\gamma, \gamma') - \frac{d}{dt}f_y(\gamma, \gamma') = 0.$$
Note that the LHS of the above appears in the derivative with respect to $t$ of the function $f(\gamma, \gamma') - \gamma'(t)f_y(\gamma, \gamma')$:
$$\frac{d}{dt}\big[f - \gamma'f_y\big] = f_x\gamma' + f_y\gamma'' - \gamma''f_y - \gamma'\frac{d}{dt}f_y = \gamma'\Big[f_x - \frac{d}{dt}f_y\Big] = 0.$$
So we have Euler's first integral equation, which says that along a solution
$$f(\gamma, \gamma') - \gamma'\,f_y(\gamma, \gamma') = c = \text{constant}.$$

Let's try applying this to the problem of minimizing the surface area of a surface of revolution. We had $f(x, y, z) = x\sqrt{1 + y^2}$, so
$$f_y = \frac{xy}{\sqrt{1 + y^2}}, \qquad f_x = \sqrt{1 + y^2}.$$
Then Euler's first integral equation becomes
$$\gamma\sqrt{1 + \gamma'^2} - \frac{\gamma\gamma'^2}{\sqrt{1 + \gamma'^2}} = \frac{\gamma}{\sqrt{1 + \gamma'^2}} = c,$$
which simplifies to
$$\gamma'^2 = \frac{1}{c^2}\big(\gamma^2 - c^2\big).$$
We aren't sure how to solve a differential equation of this form in general, but we imagine solving it for a fixed value of $c$, say $c = 1$. Then we have
$$\gamma^2 - \gamma'^2 = 1,$$
which is the familiar form of a hyperbola (in the variables $(\gamma, \gamma')$). A hyperbola can be parameterized by the hyperbolic sine and cosine (look them up on Wikipedia!), so solutions to the above are parameterized by $\cosh$; matching the boundary conditions $\gamma(a) = \alpha$, $\gamma(b) = \beta$, one gets solutions of the form
$$\gamma(t) = c\cosh\Big(\frac{t - t_0}{c}\Big).$$
These curves are called catenary curves, and they're the way that strings hang from posts under gravity.

We have been working with a functional $F : V \to \mathbb{R}$ and its differential, a linear map $dF_a : V \to \mathbb{R}$. We glossed over the definition of this differential in the infinite-dimensional vector space $V$, compared with our definition of the derivative for finite-dimensional vector spaces. For $F : V \to W$, we defined $dF_a$ as the linear map satisfying
$$\lim_{h \to 0}\frac{||F(a + h) - F(a) - dF_a(h)||_W}{||h||_V} = 0.$$
But this definition requires a norm on $V$ and on $W$ (here $W = \mathbb{R}$), and the choice of norm on $V$ matters for both the limit and the continuity of $dF_a$. In infinite dimensions, not all linear maps between normed vector spaces are continuous: for instance, consider the map on the space of polynomials that sends $x^n$ to $n$. To be continued next class!
Lecture #32
We have dealt a lot with linear operators between finite-dimensional vector spaces, $T : V \to W$. But what about infinite-dimensional vector spaces $V, W$ over $F = \mathbb{R}, \mathbb{C}$? We fix norms $||\cdot||_V : V \to \mathbb{R}$ and $||\cdot||_W : W \to \mathbb{R}$, and then we insist that $T : V \to W$ be continuous with respect to these norms. On $\mathbb{R}^n$ we had
$$||v||_1 = \sum_i |v_i|, \qquad ||v||_2 = \sqrt{\sum_i v_i^2}, \qquad ||v||_\infty = \max_i(|v_i|);$$
in all of these cases, the norms tend to zero simultaneously. In infinite dimensions, however, this may not be the case. For example, on $C[a, b]$ we have
$$||\varphi||_1 = \int_a^b |\varphi(t)|\,dt, \qquad ||\varphi||_2 = \Big(\int_a^b |\varphi(t)|^2\,dt\Big)^{1/2}, \qquad ||\varphi||_\infty = \max_{a \le t \le b}|\varphi(t)|.$$
Consider a sequence of narrower and narrower triangular spikes: shrinking the width of the triangle (fixing the height), the 1- and 2-norms go to zero, while the sup norm does not.
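The inequivalence of these norms is easy to see numerically on triangular spikes; a sketch:

```python
# Triangle spikes of height 1 and width w centered at 1/2 on [0, 1]:
# as w -> 0 the 1- and 2-norms vanish while the sup norm stays near 1.
import math

def norms(w, N=100000):
    h = 1.0 / N
    spike = lambda t: max(0.0, 1.0 - abs(t - 0.5) / (w / 2))
    vals = [spike((i + 0.5) * h) for i in range(N)]
    n1 = sum(vals) * h
    n2 = math.sqrt(sum(v * v for v in vals) * h)
    return n1, n2, max(vals)

for w in (0.5, 0.05, 0.005):
    print(w, norms(w))     # n1 = w/2 and n2 = sqrt(w/3) shrink; sup ~ 1
```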
Theorem: With respect to the sup norm, the vector space $V = C[a, b]$ is complete.
Proof: Suppose $(\varphi_n)$ is Cauchy. This means that
$$\forall\epsilon > 0\ \exists N : n, m > N \implies ||\varphi_n - \varphi_m||_\infty < \epsilon, \quad \text{i.e.} \quad \max_{a \le x \le b}|\varphi_n(x) - \varphi_m(x)| < \epsilon.$$
This means that for a fixed $x \in [a, b]$, the real sequence $(\varphi_n(x))$ is Cauchy, hence converges; define $\varphi(x) = \lim_n \varphi_n(x)$. Now fix $\epsilon > 0$ and take $N$ such that $||\varphi_n - \varphi_m||_\infty < \epsilon$ for $n, m > N$; letting $m \to \infty$ gives $||\varphi_n - \varphi||_\infty \le \epsilon$ for $n > N$, so $\varphi_n \to \varphi$ uniformly. But $\varphi$ is so far just defined as a function $[a, b] \to \mathbb{R}$; we need to show that $\varphi$ is continuous on $[a, b]$ (i.e. that $\varphi$ is indeed in $C[a, b]$). To do this, choose $N$ so that $||\varphi_N - \varphi||_\infty < \frac{\epsilon}{3}$; to show $\varphi$ is continuous at $x$, choose $\delta > 0$ so that $|\varphi_N(x) - \varphi_N(y)| < \frac{\epsilon}{3}$ for $|x - y| < \delta$. Then
$$|\varphi(x) - \varphi(y)| \le |\varphi(x) - \varphi_N(x)| + |\varphi_N(x) - \varphi_N(y)| + |\varphi_N(y) - \varphi(y)| < \epsilon.$$
(A uniformly convergent sequence of continuous functions is continuous.)

We say that a function $F : V \to W$ between two complete normed vector spaces is differentiable at $a \in V$ if there is a continuous linear map $T : V \to W$ such that
$$\lim_{h \to 0}\frac{||F(a + h) - F(a) - T(h)||_W}{||h||_V} = 0.$$
Then $T$ is unique.

Theorem: For a linear map $T : V \to W$, the following are equivalent:
1. There is a real number $M$ such that $||Tv|| \le M||v||$ for all $v \in V$.
2. $T$ is a continuous map.
3. $T$ is continuous at $v = 0$.
Proof: $1 \implies 2$: $||Tv - Tv'|| = ||T(v - v')|| \le M||v - v'||$, so $T$ is continuous at every $v$. $2 \implies 3$ because this is just a special case of $2$. $3 \implies 1$: take $\epsilon = 1$ in the continuity at $v = 0$ and choose $\delta > 0$ such that $||Th|| = ||Th - T0|| \le 1$ for $||h|| \le \delta$. For $v \ne 0$, write $v = \frac{||v||}{\delta}h$ with $||h|| = \delta$; then $||Tv|| = \frac{||v||}{\delta}||Th|| \le \frac{1}{\delta}||v||$. Therefore $M = \frac{1}{\delta}$ works.

Theorem: If $W$ is complete, so is $BL(V, W)$, the space of bounded linear maps with the operator norm. In particular $V^* = BL(V, \mathbb{R})$ is complete, since $\mathbb{R}$ is. We also have $V \subset (V^*)^*$. These are the topics of functional analysis, a subject developed by Hilbert and Banach.