You are on page 1of 49

Math 25b

Professor Benedict Gross Spring 2013


Irineo Cabreros, Daniel Ranard, Marina Lehner

These notes are in draft form and may be incomplete.

Lecture #1
Consider two nite-dimensional inner-product spaces continuous at

and

over

R.

A function

f : V W

is said to be

aV

if

h0
in

lim f (a + h) = f (a)

W. f

Equivalently:

h0
If is continuous at

lim f (a + h) f (a) = 0

and

T :V W

is any linear transformation then:

h0
This follows because

lim f (a + h) f (a) T (h) = 0 W T (0) = 0.


such that:

is continuous and

Denition: f

is

dierentiable at a if there is a linear map T


h0

lim a

f (a + h) f (a) T (h) ||h|| a.

=0 T
exists, it is unique is called the total

It is easy to see that dierentiability at derivative of

implies continuity at

If such a

at

and is denoted:

T dfa : V W
To show that this linear map is unique, suppose both tiability. We need to show that

and

are linear maps satisfying the denition of dieren-

v V , T (v ) = S (v ).

Consider

h = tv

for some small

t R.

Then:

||T (h) S (h)|| ||T (h) f (a + h) + f (a)|| + ||f (a + h) f (a) S (h)||


where we have used the triangle inequality (to see this, note that the LHS can be rewritten as

||T (h) f (a + h) +

f (a) S (h) + f (a + h) f (a) S (h)||).


above inequality satises:

We then see that

>0

then

. If

||h|| <

we have that the RHS of the

RHS < 2 |t|||v ||


Then we have that arbitrarily small so

> 0 ,|t|

suciently small such that

||T (v ) S (v )|| < 2 ||v ||.

But we can take

to be

= T (v ) S (v ) = 0W = T (v ) = S (v )
Now that we have proven uniqueness of the derivative, we then looked at some examples of total derivatives. Special Case: 1. 2.

W =R f :V R

3. We have a basis Then:

v1 , ..., vn

of

f (v )

= f (x1 v1 + ... + xn vn ) = f (x1 , x2 , ...xn )

The second line simply states that, given a basis of

Denition:

We dene the

directional derivative
Dv f (a) = lim

V,

we can then think of in the direction

at

f as a aV

function of to be

variables.

t0

f (a + tv ) f (a) t v
and

Proposition: Proof:

if

dfa

exists, then

Dv f (a)

exists for all

Dv f (a) = dfa (v ).

Notice that the directional derivative is a real number in this special case. It is impotsant to note that there are cases in which the directional derivative always exists, but the total derivative does not.

t0

lim

f (a + tv ) f (a) dfa (tv ) = 0 ||tv || tdfa (tv ) f (a + tv ) f (a) = lim = lim t0 |t|||v || t0 |t|||v || = lim Dv f (a) = dfa (v )
t0

The rst line is simply a statement of the total derivative existing. In the second line, we pull out derivative by homogeneity. In the last line, we invoke the denition of the total derivative.

from the total

Denition: Partial derivatives are a special case of directional deriviatives denoted by:
Dvi f f x1

Example:
f (x1 , x2 ) = x1 sin x2 + ex1 x2 f = sin x2 + x2 ex1 x2 x1

Gross Quote:

Once you take one derivative, you will want to take more! If you give a mouse a cookie it will ask

for a glass of milk. We can write subsequent partials like this:

2f x1 x1 2f x2 x1 2f x1 x2 2f x2 x2
general result:

x1 x2 = x2 2e

= =

cos x2 + ex1 x2 + x1 x2 ex1 x2 cos x2 + ex1 x2 + x2 x1 ex1 x2

x1 x2 = x1 cos x2 + x2 1e

There are 4 total functions. Notice the miracle! The second and third equations are the same. This suggests the

Aij =
To recap, we have:

2f =a xi xj

symmetric nxn matrix

f :V dfa : V

R R

is dierentiable at is a linear map

Since

is an inner product space there is a unique vector

such that:

dfa (v ) = v, fa
If

(v1 , ..., vn )is

an orthonormal bases then:

f a =
where

f (a) f (a) v1 + ... + vn x1 xn

is called the gradient.

Lecture #2
Remember from last class that if we have a function

f :V W
we call

f dierentiable

at

if there is a linear map

dfa : V W
such that

f (a + h)
in the limit that

f (a) + dfa (h)


where

approaches 0.

We dened the

directional derivative in the direction of v V


Dv f (a) = lim
t0

v=0

as

f (a + tv ) f (a) = dfa (v ) t dfa v


for all

This denition clearly necessitates the existence of case of the directional derivative, it is linear in

v.

Since the directional derivative is just a special

v: cv =
where c

= Dcv f (a)
is a basis of

cDv f (a) Dv
for

Because of this homogeneity, we can always scale the directional derivative and take

||v || = 1.

If

v1, v2 ..., vn

V,

we then we can write

v=

ci v i

and exploit the linearity of

Dv

to write:

Dv f (a) =
Now to write the matrix of the linear operator 1. Basis 2. Basis A basis of

ci Dvi f (a)

dfa we

need:

{v1 , ..., vn }

of

{w1 , ..., wm }

of W

gives us a decomposition of

into

functions from

to

R:

f (v ) = f1 (v )w1 + ... + fm (v )wm


For each

fi

and each

vj

we have

Dvj f (a) =
Now we can write the matrix

fi (a + tvi ) fi (a) fi = lim t 0 xj t dfa


with respect to the basis

of the linear operator

{vi }

as:

Aij =

fi xj

As a concrete example, we can consider the following map:

f : R3 R2
3

dened by:

f1 (x1 , x2 , x3 ) f2 (x1 , x2 , x3 )
consider this function at the point deriatives, we have:

= x1 x2 + x2 3 = sin(x1 ) + ex2 + x3 f (a) = (4, e + 2). 1 1 0 e 4 1 a:


Computing all of the partial

a = (0, 1, 2).

Plugging in, we get

dfa =

x2 x1 cos(x1 ) ex2

2x3 1

= f

Now we can use the total derivativeto approximate the function

at the point

f (a + h)
Eqplicitely, we have:

f (a) + dfa f
is a function between spaces of dimension 3 and 2).

The above notation is shorthand (we have to remember that

f (a + h)
Calculating

(4, e + 2) + (h1 + 4h3 , h1 + eh2 + h3 ) : R2 R f (x, y ) =


2x2 y x4 +y 2 .
First we note that

As another exmaple, consider the function:f

f (0, 0) = 0.

Dv f (0, 0),

we have

f (tv ) f (0, 0) 2a2 bt3 2a2 bt2 2a2 b 2a2 = lim 4 4 /t = = lim = t0 t0 t a + t2 b2 t0 b2 t t4 a4 + t2 b2 b lim
we note that the that since

Dv

is not linear in

(a, b). V.
Remember from 25a

Now we will look at gradients. Remember that a gradient requires an inner product on

w W such that dfa (v ) = v, w . Denition: We dene this unique w as f (a) caled the gradient of f (a) . f Let e1 , ..., en be an orthonormal basis. Let xi = directional derivative w.r.t ei . dfa
is linear in

v,

there exists a unique

Then

f (a) =
so

f (a) f (a) f (a) ei = ( , ..., ) xi x1 xn f (a) = ei , f (a) xi

dfa (ei ) =
direction where the function increases most rapidly. To see this, note that:

What is the the signicance of the gradient? We can see that the gradient vector is the vector that moves in the

f (a), v

= =

||v || ||f (a)|| cos() ||f (a)|| cos()


and that if

From the rst to second line, we have taken advantage of the assumption that if

has a local max or min at

a,

then

f (a) = 0 V

v is normalized. We will claim that f (a) = 0 then the level curves are to the

gradient vector.

Fross Quote:

The closest nuclear reactor to Boston is in Seabrook, NJ. The Seabrook nuclear reactor has a How should we make our ee most eective? The normal idiot will draw a line from the

meltdown. Suppose, that you have access to precise data on the level curves of the nuclear waste emmitted from the nuclear reactor. nuclear reactor to his/her present position and run directly away from the nuclear reactor along this line. With your knowledge of the theory of the gradient, you know that the best way to run is in general not in this direction. You should ee along the gradient vector!

Lecture #3
Consider the function that if the

is dierntiable at

f : C R. As a then

always,

is dened on an inner-product spaces. We discussed last lecture

f (a), v dfa (v ) = 0
We also mentioned some of the properties of the gradient:

1. If 2. If

has a local max or min at

then

f (a) = 0

in

V.

f (a) = 0,

it points in the direction of the maximum increase.

We can see both of these properties because:

dfa (v ) = f (a), v = ||f (a)|| cos


and this function clearly equals 0 when Level curves of

at

f (a)

are

to

= 2. f (a). The

level curves are dened as:

{x V : f (x) = f (a)}
Consider the example where

2 f (x1 , ..., xn ) = x2 1 + ... + x2 .

For the the level curve at

a = (1, 0, 0, ..., 0),

we have

f (a) = 1

and the level curves dene a hypersphere or radius 1 and dimension

n 1.

We see that

f (x) (2x1 , ..., 2xn )|a = (2, 0, 0, ..., 0)


Now we can nd the tangent hyper-plane:

{v V : (v a) f (a)} = {v V : (v1 a1 , ..., vn an ) (


This is just:

f f , ..., )} x1 xn

{v = (v1 , ..., vn ) :
i=1
Now consider

(vi ai )

f x1

= 0}
a

S = graph
We will dene the graph of

of

f
dened by:

to be the zero set of a function

g :V RR

g (x1 , ..., xn , y ) = f (x1 , ..., xn ) y


What would be the tangent hyperplane at the point on

s S = (a, f (a))?

First we calculate:

g (a, f (a)) = (f (a), 1) V R


The tangent hyper-plane consists of the vectors

(v, y ) :

n f i=1 xi

(vi ai ) = (y f (a)).
a

In one dimension, this is,

f (a)(x a) = y

= y f (a) = f (a) + f (a)(x a)

This is what we intuitively think of as the tangent line to the point on a graph. Now we will talk about the most useful rule in analysis: the functions

Chain Rule.

Consider the composition of two

and

g: V f W g U
is dierentiable at

If

f is dierentiable at a V and g d(g f ) = dgf (a) dfa . In words:

f (a) W ,

then

f g

is dierentiable at

a V

and

Proposition: Proof:
If

the derivative of the composition of two functions is the composition of the linear derivatives

(The Chain Rule).

is dierentiable at

a,

we have

||h||0
Similarly, if

lim (h) = f (a),

f (a + h) f (a) dfa (h) =0W ||h||

is dierentiable at

then we have:

||h||0

lim (h) =

g (a + h) g (a) dga (h) =0U ||h||


5

Now lets dene

k = f (a + h) f (a)
We know that

approaches 0 in

as

approaches 0 by the continuity of

at

a.

Now we will have to do a lot of

algebraic manipulation. We will have to estimate the dierence:

g (f (a + h)) g (f (a))
We can rewrite this as:

g (f (a) + f (a + h) f (a)) g (f (a))

= g (f (a) + k ) g (f (a)) = dgf (a) (k ) + ||h|| (k ) = dgf (a) (f (a + h) f (a)) + ||f (a + h) f (a)|| (k ) = dgf (a) (dfa (h) + (h)||h||) + ||dfa (h) + ||h||(h)|| (k ) = dgf (a) dfa (h) + ||h||dgf (a) ((h)) + ||h|| ||dfa h ||h|| + (h)|| (k )

Now we can write:

g f (a + h) g f (a) dgf (a) dgf (a) dfa (h) = dgf (a) ((h)) + ||dfa ||h||
Now we wave out hands. We note that 0 to 0. Now we look at the

h ||h||

+ (h)|| (k )
must take

dgf (a) ((h)) is a linear operator and is continuous. Therefore, it (f (a + h) f (a). We note that this will also tend to zero. Then we have LHS = 0 + 0

and we are done. Right? Wrong! We Just because

goes to zero, doesn't mean that the product of the two terms

goes to zero. Therefore, we must show that the term

||dfa

h ||h||

+ (h)||

is bounded. To nish the proof, we have to show that:

dfa
remains bounded as

h ||h||

h 0.

We will note that the vector in brackets has the property that its norm is 1 (it is on

the unit sphere). Therefore, as

h0

the vector wanders around the unit sphere. The unit sphere is

compact!

We use the result from last term that a continuous mapping of a compact set is compact and the fact that compact sets are closed and bounded to conclude that

dfa

h ||h||

is bounded as required.

Lecture #4 (Emily Riehl)


Consider a functions between nite dimensional inner-product spaces:

f :V W
We will specify coordinates for our two spaces (this is equivalent to choosing a basis). We call if

dierentiable at

f (a + h) f (a)
where

T (h)

is a linear function. There is not always such a function, but when there is, it is unique. We denote this

unique linear function:

dfa T : V W
The natural question to ask is: what information about the function does this linear transformation contain? We can think of v at

V as dening a direction in the domain. dfa (v ) = lim

We can compute the value of this linear transformation

v:
t0

f (a + tv ) f (a) Dv f (a) t
6

This is called the

directional derivative
f

and it describes how the function changes in the direction

v.

A special

case of the directional derivative is when we choose an orthonormal basis for directional derivative of in the direction of any of these basis vectors as

V , {e1 , ..., en }.

Then we can write the

Dei =
Now if we additionally choose a basis we can represent an an arbitrary

f xi
and

{w1 , ..., wm } for W , we have a natural isomorphism bewteen Rm vector in W as w (c1 , ..., cm )

since

We can now think of the function

f : V Rm

as dened by

coordinate functions

f1 : V f2 : V f3 : V


. . .

R R R

fm: : V
where if

R dfa
as

f (v ) = w = c1 w1 + c2 w2 + ... + cm wm

then

fi (v ) = ci . fi xj
The matrix of this function is

Now, we can nally write the matrix of the total derivative

Aij =

Example: f : R2 R3

dened by

(x, y ) (3 cos x sin y, 4 sin x sin y, 5 cos y ). 3 sin x sin y 3 cos x cos y fi = 4 cos x sin y 4 sin x cos y xj 0 5 sin y f (a) = (0, 4, 0) fi xj
and the matrix is:

Now for the point

a = ( 2 , 2 ),

we have

3 = 0 0

0 0 5

Now what does this tell us? We can think of the collumns of this matrix as the tangent vectors in the basis directions on the graph of

Chain Rule:

f.
Suppose we have two functions:

f :V W g:W U
Id

gf

is dierentiable at

if

is dierentiable at

and

is dierentiable at

f (a)

and the the total derivative is

given by:

d(g f )|a = dgf (a) dfa


The chain rule can be thought of as the

functoriality

of the derivative.

Now we will look at the chain rule from the perspective of matrix multiplication For spcicity, suppose we have

f : R2 R3 g : R3 R2
In the matrix view, the statement of the chain rule is just a bunch equalities of the form:

hi gi f1 gi f2 gi f3 = + + = xj y1 xj yj x2 y3 xj
7

m=3

i=1

gi fi yi xj

A special case of the chain rule is when the composition of functions results in a map from

R R:

R f Rn g R
The chain rule in this case tells us that

(g f ) (t) = g (f (t)) f (t)


As an example, consider the following specic functions:

f : t (cos t, sin t, t) g : (x, y, z ) x2 + y 2 + z 2


As an excercize, you should try to compute the total derivative of this function. Note that the major advantage of the chain rule is that it reduces multivariable calculus to single variable calculus.

Theorem: U Rn

is open and connected and

f : U Rn

is dierentiable on

then

derivative) is the zero matrix on a function

if and only if

is constant. We dene connected as

df 0 (that a, b U then

is, the total there exists

g : R U that is dierentiable such that g (0) = a and g (1) = b. Proof: If f is constant, denition of df makes it clear that this operator is zero (this proof is straightforward, but ommitted). Now we assume df 0 on U and we need to show f is constant. For now, it is okay to assume f : U R. We want to show that f (a) = f (b)a, b U . By our hypothesis, there exists a dierentible function g : R U and f : U R. We remember by our denition of connecteness that 0 a and 1 b. Then we can
apply the chain rule:

(f g ) (t) = f (g (t) g (t) = 0


Then

fg : R R
With

(f g ) = 0

is constant. Then

f (g (0)) = f (g (1))

and

f (a) = f (b)

as required.

Lecture #5
Last time we talked about the to results in

Chain Rule.

The Chain Rule allows us to transfer results from

1 variable g : R R

variables. Consider a vector space

with dimension

n.

Now consider a function

:RV
and another function

f :V R
Now we can think of

as the composition of

and

(i.e.

g (a) = (a), f ((a))

. Now we have

dga = dfa da
Recall some properties of single variable calculus: 1.

Theorem: g

has a local max or min at

aR

then

g (a) = 0

Proof:
g (a) = lim
We note that when

t0

g (a + t) g (a) t t is positive.
We know that

t is negative,

the derivative is positive and visa versa for when

the derivative must be the same whether we approach

from the left or right, and the only number such that

a = a
is

a=0

2.

Theorem: c [a, b] such that f (a) = Proof: Consider the special case:
Then there exists a

f (b)f (a) (Mean Value Theorem) b a

f (a) = f (b) = 0 c with f (c) = 0.


Note that

is continuous on

[a, b].

the closed interval is clearly compact

so it has a max or a min. Lets dene a function

g (x) = f (x) f (a)

f (a) f (b) (x a) ba

which can be thought of as a modication by an ane linear function. Also require that

g (a) = g (b) = 0
Now

c : g (c) = 0

so we have

0 = g (c) = f (c)
3.

f (b) f (a) ba

Theorem: If f = 0 on an interval, then f = c on that interval. Proof: Take a, b D and note that
0 = f (b) f (a) = (b a)f (a)

4.

Theorem: If f '=g' on D = an interval, then f = g + c on D Proof: Apply the result from 3) to f g ). So f is determined completely by f
Now consider the dierential equation solution is

and

f (t0 ).

f = f.

We can then claim from the unicity theorem that the only

f (x) = Cex g (x) =

with

C = f (o). g (x) = 0
by the quotient rule which implies

Proof:
f (a) V
1.

Let

f (x) ex then

g = c.
and

Now lets start trying to generalize these results to

dimensions.

Consider

f : V R

dfa : V R

with

Theorem: Proof:

If

has a local max or min at

a,

then

dfa = 0

in

L(V, R)

dfa = lim
Then we can use the same exact argument approaches zero from either side. 2.

f (a + tv ) f (a) t0 t for all v about the derivative D.


The statement that

taking the same value when

Theorem:
(connecting

Let

be dened on a convex region

is convex is that if

a line connecting them is in

D.

The Mean Value Theorem for multiple dimensions is:

a, b D, then c = (s) on this line

and

b)

where

f (b) f (a) = f (c), b a


where

(t) = a + t(b a)

(i.e. a map parametrized such that

(0) = a

and

(1) = b.

Proof:

Consider functions

: [0, 1] V f :V R
such that

g =f

Now note that

g (1) g (0)
and

= f (b) = f (a)

f (b) f (a) = g (1) g (0) = g (s)


for some

s [0, 1].

Thisfollows from the mean value theorem in one variable. Then we have:

g (s) = f ((s)), (s)


Now we identify

c = (s)

and

(s) = (b a)

as

is simple and we can actually compute its derivaive.

3.

Theorem:
f =c
on

Suppose

f :DR

with

DV

where

is as before path connected and

f = 0 D ,
and

then

D. a, b D
and parametrize a path

Proof:
Now let

Then Let

[0, 1]

in the usual way such that

(0) = a

(1) = b.

g : f.

Now we can use the chain rule to state that

g (t) = f ((t)), (t) = 0


We note that

f ((t)) = 0 V .

Then we note that

g =c

by the single variable equivalent which implies

f (a) = f (b).
Next time we will prove that partial derivatives commute (in a at geometry).

Lecture #6
Today we will learn about the commuting property of second partials:

2f 2f = xi xj xj xi
Note that this is true only on at, or Euclidean space. These will typically be the spaces we are most interested in though for this class. We rst need to recall the mean value theorem holds onlyat dierentiable points that set. The statemement of the mean value theorem is:

mean value theorem. Remember u U where U is a convex set.


a, b U
there exists

that in

dimensions, the

Remember that a convex

set is one for which any two points within the set can be connected by a straight line of points all contained within

on the line between

and

such that

f (b) f (a) = f (c), b a = f (c)(b a)


Now to make some progress on the problem of second partials we will set

b=a+h

so that

f (a + h) f (a) = f (c), h
Lets parametrize the path from

to

with a function:

(t) = a + th
Clearly then,

: [0, 1] line
Lets say that

from

ato b

c = a + h
Then

f (a + h) f (a)

= =

f (a + h), h f (c), h

= dfc (h) = Dh f (c)


The quantity f (a + h) f (a) is what we would call a rst dierence. Second dierences would be of the f (a + h + k ) f (+h) f (a + k ) + f (a). We will now prove the following equality of second dierences: form

Theorem:

f (a + h + k ) f (a + h) f (a + k ) + f (a) = Dk (Dh f (a + h + k ))
where, as before, 1.

, [0, 1].

We will prove this equality under the conditions that :

is twice dierentiable

2. the parallelpiped is contained in

u
10

Proof: Let g (x) = f (x + k) f (x) then we have the equality


g (a + h) g (a) = f (a + h + k ) f (a + h) f (a + k ) + f (a) = Dh g (a + h) = Dh f (x + k )|a+h Dh f (x)|a+h = Dh f (a + k + h) Dh f (a + h)
Between the rst and second lines, we have used the mean value theorem and between the second and third lines, we have used the chain rule. Since value theorem to nd that:

Dh f (a + h)

and

Dh f (a + k + h)

are both dierentiable, we can apply the mean

f (a + h + k ) f (a + h) f (a + k ) + f (a) = Dk (Dh f (a + h + k ))
just like we originally set out to do Now we note that we can swap

and

in the above argument and obtain:

f (a + k + h) f (a + k ) f (a + h) + f (a) = Dh (Dk f (a + h + k ))
Now we can prove that partial derivatives commute.

Theorem:

If

v, x V

and

Dv f, Dw f, Dv Dw f, Dw Dv

all exist and are continuous on an open

U V

then

Dv Dw f = Dw Dv f

Proof: Let h = tu and k = sw. Take t and s to be small enough so that the parallelpped with vertices (a, a + h, a + k, a + h + k ) is contained in U. Remember that we can always do this because U is an open subset. Now we note that Dh = tDv and Dk = sD w and then we see
stDv (Dw f (a + h + k )) = stDw (Dv f (a + h + k )
Now we can divide both sides by

st

and take the limit as both

and

go to zero. Now each side goes to

Dv Dw f (a) = Dw Dv f (a)
as required. Now consider

f :V R

a twice dierentiable function with continuous second derivatives. Then we can think

of the second derivative as a mapCon

d2 fa : V V R
dened by

d2 fa (v, w) = Dv Dw f (a) = Dw Dv fa

is a symmetric bilinear map. Now, what is the matrix of this linear

map with respect to some basis. This is just

Aij = T ei , ej =

2f xi xj

It is obvious from the commuting of partials that this matrix is symmetric. Now what is the easiest way to see that partials must commute? Consider the function

f : R2 R
dened by

f : (x, y ) xa y b .

Now we know how to take the second derivatives explicitely:

2f xy 2f yx

= =

abxa1 y b1 abxa1 y b1

Clearly, these monomials have commuting partials, so if you believe that you can build up general functions out of these monomials, then you should believe the general result. Now what is the signicance of second partials? We will need this result for Taylor Series (for next lecture).

11

Lecture #7
We can write Taylor's Approximation as

1 f (a + h) = f (a) + dfa (h) + d2 fa (h, h) + O(||h3 ||) 2


Now if we choose our basis for

to be

(e1 , en , ..., en ),

we can consider how a multi-dimensional function changes

with respect to perturbations along more than one direction:

f (a1 + h1 , ..., an + hn ) = f (a1 , ..., an ) +


i
Where we remember that is

f 1 hi + xi a 2

i,j

2f xi xj

hi hj + ...
a

a=

ai ei

and

h=

hi ei .

We also note that the second term in the above expression

f hi = dfa (h) xi a

You can see this by the linearity of the total derivative:

dfa (h) = dfa (


i

hi ei ) =
i

dfa (ei )hi =


i

Dei f (a)hi =
i

f hi xi a

We also note that all of the terms in the third term of the taylor expansion are symmetric (by the commuting property of partial derivatives that we showed last lecture). Lets write out explicitly the form of this term for

n = 2:
i,j

2f 2f 2f 2f + = + xi xj x2 x1 x2 x2 1 2

Now if we choose an orthonormal basis, then

2f xi xj
expansion for

= diag(1 , 2 , ..., n )
i,j

where the notation diag indicates a diagonal matrix with

across the diagonal. Now we can rewrite the taylor

dimensions as

f (a + h) = f (a) +
i
Now, if all maximum.

1 f hi + xi a 2

3 i h2 i + O (||h|| ) i

f xi

hi = 0
a

and all

i 0

then we have a minimum at

a.

If we have that

then we have a

We start with considering the critical points on manifolds in

V.

Consider a function

f :V R
that has a critical point at or

a. Remember that critical points are dened as points in M V such that f (a) = 0 dfa = 0. Loosely speaking, we will dene M as a k dimensional manifold if it has a well-dened tangent plane Ta M V which is a k dimensional. A simle example of a manifold is a circle. The circle lives in R2 , but at every
point in the circle, there is a well-dened tangent line. Since lines are 1 dimensional, the circle is a 1 dimensional manifold. A non-example of a manifold is a V. There is a well-dened tangent line to all points on the V except at the point on its base. We can think of constructing the tangent at a point procedure. First paramatrize a bunch of curves of these parametrizations at the point

(t) : R M

such that

a on a manifold using the following (0) = a. Now take the derivative of each g : V R.
This means that

a.

The set of lines which this set of vectors denes forms the tangent plane,

Ta M .

Now suppose that

has dimension

n1

and is the level set of a function

M = {v : g (v ) = constant}
For example, we can think of the circle (often denoted

S1)

as

S 1 = {x, y : x2 + y 2 = 1}
12

Now to rene our denition of a manifold beyond sets that have tangent planes, we will also require that dierentiable and that orthogonal

g is g (a) = 0 for all a M . If these two requirements are met, we will also see that Ta M is the complement of the vector g (a). Remember that the orthogonal complement of a set W is dened as W = v V : v, w = 0
for all

wW

Also remeber the properties of the orthogonal complement:

W W V (W )
Now we will show that

= = =

0 W W W

Theorem: if we take a dierentiable parametrization : ( , ) M with (0) = a as usual, then (0) Ta M . Proof: consider the composite function g (t) : ( , ) R. We know that this function is constant (by how
g)
so its derivative is zero. Now we use the chain rule to nd:

g (a) = Ta M .

First recall that

g = constant.

we dened

(g ) (t) = g ((t)), (t) = 0


Since

(0) = a

then we have

g (a), (0) = 0
This is the statement that

g (a)

and

(0)

are orthogonal.

Lecture #8
Consider a function

f :V R

that has a local max or min at

aV.

The

dfa = a

in

L(V, R)

because

dfa (v ) = Dv f (a) = lim

t0

f (a + tv ) f (a) =0 t

M V is a manifold if for any point a M, then Ta M V . Remember that this is a k dimensional subspace called the tangent space to M at a spanned by vectors (0)where ( , ) M where 0 a. A generalization of dfa = 0 on V for a local max or min: f has a local max or min on M at a, then dfa = 0 when restricted to Ta M V . Proof: Take : ( , ) M Consider g f .This has a local max or min at t = 0. The Chain Rule then gives us 0 = g (0) = df(0) (0) Now there are two ways to dene M V . One way to dene it is as a varying collection of k dimensional subspaces Ta M V . We could equivalently dene it as a k dimensional subspace W V where
since this has both positive and negative values. Remember that 1. 2.

W =

range of the linear map

V V which V V

is 1-1 which is onto.

W =]textnull

space of the linear map

Method 1: We take the function

df is injective everywhere, then let M = f (Rk ) V . Example: : R V a parametrized curve such that 0 a provided that = 0. Ta M = R (0) Method 2: Take g : V R. Where g is a component-wise function (g1 , g2 , ...gn ). Then M = {v V : g (v ) = 0 R} = a level set of g Now Ta M = null space of dga : V Rnk which is onto for all a M We will now work out this method in some special cases. For M of dimension n 1 Then g : V R. Then M = {v : g (b) = 0} which is a hypersurface. We need dga : V R to be non zero for all a M. Then Ta M = null space of dga = orthogonal compliment of g (a). Now we know that if f is extremal at a M , then dfa |Ta M = 0 This states that dfa (v ) = v, f (a) = 0. So if v g (a) then v f (a) = f (a) = g (a). Now what does this look like for several constraints? Lets try it for l constraints. So M has dimension k = n l l . By method 2, we need a map g : V R with v (g1 (v ), g2 (v ), ..., gl (v )). Then M = {v : g (v ) = 0} this l has dimension k = n l. Now Ta M = null space of dga : V R . We need that for all a M the lvectors (g1 (a), g2 (a), ..., gl (a)) to be linear independent (they form a basis).
such that if

f : Rk V

13

If the note Now

gi (a) are linearly independent, they span Rl and get W V of dimension k = n l as the kernal. We that W = Ta M which is the orthogonal complement of {g1 , ..., gl } = W as matrix of dga has these rows. we have that if f is extremal at a, then dfa |W = 0 f (a)
is orthogonal to

W f (a)is

in the subspace

1 , ..., l : f (a) = 1 g1 (a) + 2 g2 (a) + ... + l gl (a) 3 Maximize the function f : R R dened by (x, y, z ) x on the intersection of the plane z = 1 and 2 2 2 3 2 2 2 2 the sphere x + y + z = 4. Now we have the function g : R R dened by (x, y, z ) (z 1, x + y + z r ) = (g1 , g2 ) We note that g1 (x, y, z ) = (0, 0, 1) and g2 (x, y, z ) = z (x, y, z ). We see that they are linearly independent on M . So , such that
So there exist

Example:

f (x, y, z ) = (1, 0, 0) = g1 + g2 = (0, 0, ) + 2(x, y, z )


Then we see that

2x 2y 2z +

= = =

0 0 0

and we can solve this system of equations, along with the constraints, for the unknowns.

Lecture #9
There are 2 ways of getting a object where 1.

a M

then

k dimensional manifold M V n-dimensional real vector space. A a + Ta M tangent plane to M . The rst way of getting a manifold is:
to one.

monifold is an

f : Rk V where f is one Ta M = dfa (Rk ) of f (x) = a.

is the image of this map.

Then

dfx : Rk V

has nullspace 0,

2.

M V is the zero set of (n k ) constraints g = (g1 , g2 , ..., gnk ) : V Rnk . Then M = {v : g (v ) = 0 Rnk } = {v : gi (v ) = 0i}. We must have tha dga : V Rnk is onto. and then that Ta M = ker(dga ) where dga (v ) = ( g1 (a), v , g2 (a), v , ... gnk (a), v ). Then ker(dga ) = {all vectors v which are orthogonal to the n k vecto In other words, if W =span of gi (a), then v ker(dga ) vW . So M is a manifold gi(a) are linearly independent in V .
Then

Now lets apply this to Lagrange Multipliers. We want to minimize/maximize by is

g = (g1 , ..., gnk ) = 0. If f is extremal at a, then dfa vanishes on Ta M . orhtogonal to W = Ta M , so f (a) W as the gi (a) from a basis of

f : V R on a manifold M dened dfa (v ) = f (a), v . So nablaf(a)

this space. Remember that we have the

property

f (a) = 1 g1 (a) + ... + nk gnk


Sometimes it wont look like we can solve a problem using Lagrange Multipliers. Often we can use the following technique. If we have

M V
where

g1 = g2 = ... = gl = 0. h1 = ... = hm = 0

And

N V
where . We are asked to nd the point

mM

and

nN

which are the closest in

V.

To do this

we consider the cross product

M N V V Rl+m
Where

M N = {(x, y ) : G(x, y ) = 0}.

Where the map

G
n

takes

(x, y ) (g1 (x), ..., gl (x), h1 (y ), ...hm (y )).

We can

now convert this to a Lagrange problem by maximizing/minimizing the function

f (x, y ) =
i=1

(xi yi )2

14

Now we would have the following sustem of equations:

g1 (x, y )

=
. . .

(g1 (x), 0)

gl (x, y )
And similarly for

(gl (x), 0)

h.

Now we need to nd

and

using Lagrange multipliers by solving

xy

= =

i gi (x)
i=1 m

j hj (y )
y =1

This procedure has a nice geometric interpretation. It means that at the minimal two points is perpendicular to both tangent planes.

x and y , the line connecting these

Example:

2 2 x2 1 + x2 + x + x3 1

h =
Then

y1 + y1 + y3 1

g = (2x1 , 2x2 , 2x3 )

and

h = (1, 1, 1).

Now we have the system of equations

x1 y1 x2 y2 x3 y3
Now the punchline is that

= = =

x1 = x2 = x3 =

x1 = x2 = x3 =
This implies that

3 x2 1 =1
which implies that

xi =

1 .We then see that 3

y1 = y2 = y3 =
1 Now we see that the closest is 1 and the furthests is . 3 3 Now lets talk about Taylor Expansions. IT starts like this

2 3

1 1 1 f (a + h) f (a) + f (a)h + f (a)h + f (a)h + ... + f (k) (a)h + Rk (h) 2 6 k!


where

is the remainder and the terms to the left are called the

Taylor olynomial of degree k in h.


P (h)
of of degree

Now why

is this particular polynomial important? It is the unique polynomial Note that the notation

with

P (i) (0) = f (i) (a).


Then there

Theorem:

Assume

(i) indicates that we have taken the ith derivative. f has k + 1 derivatives which are continuous between Rk (h) = f (k+1) (c) k+1 h (k + 1)!

the interval

[a, a + h].

exists

c [a, a + h]

such that

Note that if

propoertional to

k = 0 then hk+1 .

this is just the mean value theorem. We will see next time that this remainder term is

15

Lecture #10
Now we are going to leave Lagrange Multipliers and manifolds for a while and go to the second derivative test. Lets consider a function

f (x)

and consider it near the point

We assume that this function has

k+1

derivatives at

a. Lets change t = 0 . Then t=0

variables and write this function as

f (a + t).

f (a) f (1) (a)

= =
. . .

value at

rst derivative at

t=0

f (k) (a)

kth derivative at

t=0
Each additional derivative is a new

Its important to note that not all functions have all of their derivatives. Now lets write down the

hypothesis. There exist functions (though pathalogical) that are continuous everywhere but owhere dierentiable.

k th

degree Taylor polynomial:

Pk (t) = f (a) + tf (a) +


Now lets make this equation exact by ensuring

tk t2 f (a) + ... + f (k) (a) 2! k!

Pk

(k+1)

(t)
t=0

=0

to make polynomial exact. Now lets introduce the dierence function

Rk (t) = f (a + t) Pk (t)
Then we have that

Rk (0) = Rk (0) = Rk (0) = ... = Rk (0)


and

(k)

Rk
Now we are ready to state Taylors theorem:

(k+1)

(t) = f (k+1) (a + t)
(k+1)

Theorem: Corallary:

Assume

Note that when If

(c) k+1 f (k+1) (t) exists in the interval [a, a + h]. Then c [a, a + h] such that Rk (h) = f (h+1)! h . k = 0, we have f (a + h) f (a) = f (c)h which is just the statement of the mean value theorem.
is also continuous, so bounded in

Rather than proving this theorem rst, we note the following corallary:

f (k+1) (t)

[a, b]

by

on

[a + a + h]

(this interval is combapct),

then

|f (a + h) Pk (h)|
(k)

M hk+1 (k + 1)!

f (a + h) = f (a) + f (a)h + ... + f k!(a) + O(|h|k ) k (h) k+1 where (t) = Rk (t) Bt with (0) = 0 and (h) = 0. We note that by the mean Proof: Let B = R hk+1 value theorem gives us that t1 [0, h] such that (t1 ) = 0. But remember that (0) = 0. Therefore by the MVT, there exists t2 [0, t1 ] such that (t2 ) = 0. But (0) = 0. Then by the MVT t3 [0, t2 ) such that (t3 ) = 0. We repeat this all the way to the k + 1 derivative...
so We usually use this theorem to prove the second derivative test.

Second Derivative Test:


f (a) > 0
then

Assume that

has three derivatives at

that are all continuous with

Then if

has a local minimum at

(and the similar statement is true for

f (a) < 0

that

f (a) = 0. f has a

local maximum at

Proof:

a). 1 1 f (a + h) = f (a) + f (a)h + f (a)h2 + f (c)h3 + ... 2 6

By the corallary, we have that

We now note that

f (a) = 0

by assumption. The essence of the proof is that we see that the

term will dominate

the value of the function locally around

a.

Therefore if it is positive, there will be a minimum and if it negative

there will be a maximum. Now to prove this fully, let

M = max

|f | 6

on

[a b, a + b]

16

for some

b > 0.

Now let

f (a) 2M . Assume that

is chosen so

|h| <

. Then

1 1 f (a)h2 + f (c)h3 2 6
We note that

1 6f

(c)h <

1 6f

(a) (c) f2M <

f (a) . Then 2

1 1 1 1 f (a)h2 + f (c)h3 = h2 ( f (a) + f (c)h) 2 6 2 6


and we are done. Now the idea is to study innite rather than nite power series. If

is innitely dierentiable at

a.

Then

f (a + h)
k=0
Now we need to answer the following questions: 1. For which

f (k) (a) k h k!

does this innite series converge?

2. If it converges to a function

g (h)

is

f (a + h) = g (h)? ex
around

Now just as an example, we remember the Taylor Series approximation for

a = 0.

We have

ex
k=0

f (k) (a) k x = k!

k=0

1 k x k!

Lecture #11
Review of Exam Material Linear Algebra

Bases

Rn . V V = L(V, R) T = T

Inner Products

Self-Adjoint Operator

W V , W V = W + W
Multivariable Calculus

Various derivative of

f : V W. dfa L(V, W )
is dened at a point

 

Total derivative map.

aV

as a limit when

h0

(in

V ).

This is a linear

Directional derivatives in a direction

v=0

in

is denoted

Dv f (a) W
and is dened as the limit

t0

lim

f (a + tv ) f (a) = Dv f (a) = dfa (v ) t


of

This results in a vector in the range.

When we choose a basis

{w1 , ...wm }

so that

f :V W
dened by

v f (v ) =
i

f i ( v ) vi

we can dene the total derivative of a function as a matrix.

17

We deal a lot with the special case by

f : V R.

In this case, we choose a Partial derivatives are denoted

f Dei f xi
where we have implicitly chosen an orthonormal basis for gradient of tha function at a point in its domain by

to be

{e1 , ..., em }.

Now we can write the

f (a) =
Now this is the unique vector that satises

f f f , , ..., x1 a x2 a xn

f (a), v = dfa (v ) = Dv f (a)


Operations of Calculus

Chain rule Suppose we have the functions

g:U f :V
so that

V W

f g :U V
Then the chain rule is the statement

d(f g )a = dfg(a) dga


in

L(V, W )

Mean Value Theorem Suppose we have the special case

:R f :V
then

V W

f :RR
Then the statement of the Mean Value theorem in several variables is that

f (b) f (a) = f (c), b a


for some

along the straight line in the domain between

and

b.

Equality of Second Partials We only talked about this for the case

f :V R
We remember that the second partials commute if all partials exist and are continuous. If this is the case, we have

Dv Dw = Dw Dv

 

Maxima and minima of

Maxima and minima exist when What is or isnt a

f :V R f (a) = 0. M = g (Rk )
then

You should remember the proof of this.

dimensional manifold?

g : Rk V

is one-to-one

dga

is one to one. Then of

Tg(a) = image
Another way to make a manifold is

dga (Rk )
for all

g : V Rnk

where

Ta M =

null space of

dga .

We normally dealt with

dga is onto n 1 manifolds

a M = {v : g (v ) = c}

then

(which is equivalent to having one

constraint).

18

Level sets of

with a function

f : V R.

Then we dene

graph(f ) as the zero set of

=M V R

d:V RR
dened by

(v, y ) y f (v )
Question: suppose we have a fuction

f :V V
dened by

a f (a).

Does there exist a

with

g f (v ) = v ?

The anwer is yes if

dfa

is an invertible

linear operator such that

dgf (a) dfa = I


This is the

inverse function theorem.

This is not a topic on the exam, but we will come to it soon!

Non-Exam Material
Now we return to Taylor Polynomials. Remember

f (k) (0) k 1 h + O(hk ) f (h) = f (0) + f (0)h + f (o)h2 + ... + 2 k!


Now what if we wanted to make a multivariable version of this Taylor Approximation. We hypothesize

f (h) f (0) + D1 f (0)h1 + D2 f (0)h2 + ... + Dn f (0)hn +


ij
where

ki Di Dj j f (0)hi hj ki !kj !

ki + kj = 0.

We can write out the summed term explicitly as

ij
Now the

ki Di Dj j 1 2 f (0)hi hj = D1 f (0)h2 1 + D1 D2 f (0)h1 h2 + ... ki !kj ! 2

k th

order term can be written as

j1 j2 jn jn 1 f (0)/(g1 !g2 !...gn !) hj D1 D2 ...Dn 1 ...hn ji +j2 +j3 +...+jn =k

Lecture #12
One problem on the exam that gave people trouble was proving that the graph is a manifold.

f :V
graph(f ) Then we have that

R M V RR g (v, z ) = (f (v ), 1) = 0
in

{(v, f (v ) : v V } = M .

Then we see that

v )R.

Therefore, we

conclude that the graph of a function is always a manifold. Now lets return to Taylor Series. Supposet we have a function

f :V R
which is

k+1

dierentiable (we will denote this as

C k+1 )

which has a basis

(v1 , ..., vn )

of

V.

The statememt that

is

is then the statement that

j1 j2 j3 D1 D2 ...D3 f (a)
exists provided that in

V.

We dene the

j1 + j2 + ... + jn k . Now k th Taylor polynomial Pk (h) =

we will try to approximate

f (a + h)

for a small

vector

(h1 , ..., hn )

j1 +...+jk k

j1 j2 j3 D1 D2 ...D3 f (a) j1 n h1 ...hj n j1 !...jn !

19

Now we will have to assume that the domain contains the line between

and

a + h.

The MVT tells us then that

on that line where

f (a + h) f (a) = f (a), h

or in other words

f (a + h) = f (a) +
i=1
The

Di f (c)...hi

multi-dimensional Taylor Theorem is then:


f (a + h) Pk (h) =
j1 +...+jk =k+1 j3 j1 j2 f (a) j1 D2 ...D3 D1 n h1 ...hj n j1 !...jn !

Note that

P0 (h) = f (a)

by how we dened the Taylor Polynomial.

Proving the several variable Taylor Theorem will follow a similar logic to the proof for the multivariable MVT and the 1-variable Taylor Theorem. We do not do this in class, but basically:

Proof:

Reduce to 1-variable Taylor theorem by using

g = f ((t)) where is the standard parametrization from


, then

the points

Corallary: If all the k + 1 derivatives of f Example (k = 2):


f (a + h) = f (a) =
i=1

to

b.

near

j1 jn ...Dn f a are continuous, D1

f (a + h) = Pk (h) + O(||h||k )

Di f (a)hi +
i=j

Di Dj f (a)hi hj +

1 2

2 2 Di f (a)h2 i + O (||h|| ) i
or in other words, that

Now to analyze the second derivatives, we will rst assume that

dfa = 0,

Di f (a) = 0

for all

i.

Now how do we know if a critical point is a max or min?

f (a + h) = f (a) +
i=j
Now writing this down explicitly for

Di Dj f (a)hi +
we have

1 2

2 2 Di f (a)h2 i + O (||h|| ) i

n = 2,

1 2 f (a + h) = f (a) + (a h2 1 + 2bh1 h2 + ch2 ) 2


Now we are asking, when is this quadratic polynomial,

a > 0, c > 0

and that

ac > b

2 (a h2 1 + 2bh1 h2 + ch2 ) > 0

for

(h1 , h2 ) = (0, 0)?

Need

and in this case, we can be ensured that we are on a maximum or a minimum (to see

this simply solve for the roots of the quadratic). Now what does this mean in terms of the matrix

1 1 2 d f (h, h) = (h1 , ..., hn ) (Di Dj f (a)) 2 2


whe will denote this matrix

h1
. . .

hn

A=
Then we see that there is a Min Max and that the det(A) transformed matrix

a b

b c

a a

> 0, c > 0 < 0, c < 0 (v1 , ..., vn ) (v1 , ...vn )


so that the

> 0.

In general, we can transform the original basis

is diagonal. Then we can write the Taylor Polynomial in terms of the eigenvalues

f (a + h) = f (a) +
where the if all

1 2

i (hi )2
i=1

i = Di2 f (a). i > 0.

We then see that

i < 0 means that the function is a minimum and, conversely a maximum

20

Lecture #13
Today we will start the Inverse Function Theorem. Suppose we have a linear map:

T :V V
which takes

0 0.

Can we solve

T (v ) = w

for

v,

given an arbitrary

w V.

Clearly, this can only be true if

is

onto and is injective (i.e. bijective). We showed in 25a that there is a unigue linear map that satises this:

T S (w ) = w
and we typically call this corresponds to an

T 1 .

Now if we choose a basis

{v1 , ..., vn } y1
. . .

of

Rn

dened by

v (c1 , ..., cn ),

the

nn

matrix

A.

In general, we have a system of linear equations

A
This is

x1
. . .

xn n
linear equations in

yn

unknowns. We remember that we can nd such an invesre function if and only if the

determinant of

is non-zero.

The whole point of the inverse function theorem is to generalize this to non linear maps. Suppose we have a (generally non-linear function)

f :V V
which takes

0 0.

The question is: can we invert

in a small neighborhood of 0. In other words, we are asking

if there exists a function

g:V V

dened in an open neighborhood of 0 such that

f (g (w)) = w.

It turns out that

sometimes you can do this and sometimes you can't. Here is an example when you can't:

Example:

f ( x) = x2

clearly, we have instance,

f (0) = 0 as required, but we see that for every y in the domain of f there is a non-unique x. For 2 2 and 2 2. A necessary condition for the existence of g is assuming f and g are dierentiable at f g = dfg(0) dg0 = = I I L(V, V )

0. Applying the the Chain Rule, we have

Theorem:
function

If

in a neighborhood of with

Expansion of

0 0 is continuous at v = 0 and df0 is invertible in L(V, V ), then an inverse f g (w) = w and dg0 = (df0 )1 L(V, V ). Now we can apply the Taylor g (h) = g (0) + dg0 (h) + ... where we already know that dg0 (h) = (df0 )1 (h) + ... We can then calculate
where

f :V V

the ... using an iterative process. Now lets talk about the inverse function theorem in 1 variable. Lets assume that function as a power series:

is a eld. Lets dene our

f (x) = a1 x + a2 x2 + ...
where we assume that the exists a power series is:

a are in F . Notice that automatically f (0) = 0. The question on is whether or not there g (x) = b1 x + b2 x2 + ... such that f (g (x)) = x. Now how do I evaluate f (g (x))? Explicitly, this a1 (b1 x + b2 x2 + ...) + a2 (b1 x + b2 x2 + ...)2 + a3 (b1 x + b2 x2 + ...)3 + ...

Note that in the second term there are no  x terms and in the third term there are no  x  terms. So lets reorganize out function in orders of

x.
2 = a1 b1 x + (a1 b2 + a2 b2 1 )x + ...

Now if we want the inverse function is to exist, we want all of the coecients of higher powers of for the coecient of

to be zero, and

to be equal to 1. This immediately puts two conditions on

a1

and

b1 :

b1 a1

= =

1 a 1

21

Now that we have solved for

b1

we can iteratively solve for all of the

bi

. Now lets look at the second coecient:

(a1 b2 + a2 b2 1) = 0
Now that we know by etc., etc. So why have we not just proved the inverse function theorem? Well, the series does not necessarily converge: we do not actually know anything about the coecients assure convergence, we need an iterative procedure to calculate Let's do an example.

b1 ,

this is just one equation for one unknown so we can solve for

b2 .

You can then solve for

b3

bi .

To

that gives a convergent series.

f (x) =
Now we write

dx = log(1 + x) = 1+x

(1 x + x2 x3 + = x xn n!

x2 x3 + 2 3

g (x) = x + b2 x2 + b3 x3 + b4 x4 + ... =
n1

1 so b2 = 2 . Then we can solve for bi through an iterative process. Now we will look at how Newton solved for inverses. Suppose we have a function

f (x) =
and we wish to solve for

1 1+x

such that

f (g (x)) = x
Then we have that

g (x) = 1 1 + g (x) = g (x) = 1 + g (x)


We then note that we can write

g (x)

as the power series

g (x) =
n0
and similarly

nbn xn1

g ( x) =
n1
Then we can get the coecients of

bn x n

n1

since we have

nbn = bn1
Then we have

bn

= =

bn1 n 1 n! r
such that

Now we talked briey about the Euclidean algorithm (which is not central to the course). Newton's method for nding roots of algorithm for nding the roots of

f (x).

We dene a root as a number

f (r) = 0.

Newton's

is the following:

f (x0 ) f (r) = f (c)(x0 r)


Where we have chosen

x0

arbitrarily (but in practice, it is generally near the root). Now we make the approximation

r x0
Now we do the same process for

f (x0 ) = x1 f ( x0 )

x1

until we nd the root.

22

Lecture #14
Today we will prove Newton's method for nding rootsr, the solutions to the equation nding roots will actually give us a general method for nding solutions to as

f (r) = 0.

(A method of

f (x) = b = 0,

because we can take

f b

f.)
Suppose we have a function from a closed interval to the reals

f : [a, b] R
and that

f (a) < f (b) >


Suppose

0 0
i.e.

is continuously dierentiable and

f ( x) > 0

on

[a, b],

is increasing on

[a, b].

(Why does the former

imply the latter?

Try the mean value theorem.)

Now, intermediate value theorem imples that there is a root

r [a, b],

and because

is increasing, the root must be unique.

Newton uses an iterative method to nd what this root is. 1) Choose some 2) Take

x0 [a, b]. f (x0 ) f (r) f (r)(x0 r) f (x0 )(x0 r) xn+1 = xn


f (xn ) f (x0 ) .

and solve for

to get

r x1 = x0
Let

f (x0 ) f (x0 )

3) Iterate: Calculate

Example :
Let

To calculate the square root of 2, take the function

f (x) = x2 2.

a=1

and

positive derivative, since

x0 = 1.

Then

f (x) = 2x and is greater than zero on x1 = 1 + 1 4 and we can continue this process.

our interval. (The maximum of

b = 2. f (x)

We have a is

M = 4.)
or

All of these ideas rely on one simple mathematical concept, which is the to prove the Inverse Function Theorem.

Contraction Mapping Theorem

the contraction xed point theorem. We will need the theorem to prove validity of Newton's Method and eventually

: [a, b] [a.b] with |(x) (y )| < k |x y | for x, y [a, b]. Contraction Mapping Theorem: Let be a contraction mapping. Then has a xed point, that is a point such that (x ) = x . More precisely, let x0 [a, b], and then x1 = (x0 ), and xn = (xn1 ). Then {xn } x (the
A contraction mapping is a continous function for all some xed

Denition :

0 < k < 1,

sequence converges) and

|xn x |

kn |xn x | 1k
Continue this bounding process to get

Proof: |xn+1 xn | = |(xn ) (xn1 )| k|xn xn1 |.

|xn+1 xn | k n |x1 x0 |
Now choose

m < n,

and calculate

|xn xm | = |xn xn1 + xn1 xn2 ...xm+1 xm | |xn xn1 | + + |xm+1 xm | (k n1 + + k m )|x1 x0 |
Simplifying the right-hand side using the expression for geometric series, we get

|xn xm |
The sequence is Cauchy and we are working over point, call it

km kn 1k

R,

so we at least know the sequence will converge to some

x .

Then for any

n kn kn + |xn x | 1k n,
we have proved the

|xn x | |xn xn | + |xn x |


Because the above is true for any

n,

and because

|xn x |

is arbitrarily small for large

inequality given in the statement of the theorem,

|xn x |

kn |xn x | 1k

23

x is a xed point of . By an inequality above, |(xn ) xn | = |xn+1 xn | k n , so |(xn ) xn | 0 as n . Meanwhile, by the continuity of the absolute value function and of , |(xn ) xn | |(x ) x | as n , so |(x ) x | = 0, and so (x ) = x as desired. Finally, to show the uniqueness of the xed point x , consider a case of two xed points x and x . Then |(x ) (x )| < |x x | = 0, a contradiction of the denition of the contraction mapping. f (x) Back to Newton's method. Let (x) = x M where M is the maximum of f (x) on [a, b]. We will show this
Now we can show that is a contraction mapping, so by the contraction mapping theorem we have a unique xed point. This xed point will be the root, since

(x ) = x

Why can we apply the theorem? Assume so

: [a, b] [a, b].

Then

f (x ) = 0. 0 < m < f (x) < M on [a, b]. We have a < (a) < (x) < (b) < b, (x) m 0 < (x) = 1 f M 1 M = k < 1 so is indeed a contraction mapping, so we
is precisely

can indeed apply the theorem. Now we are going to use this to help us show the inverse function theorem. Choose

f : R R with f (a) = b and

f (a) = 0.

We need this condition, because by the chain rule, we can only nd an inverse for a nonzero derivative.

Thus our question is: for did have such a function Assume

y close g , then

to

b,

is there a function

we would also have

exists. We are going to try

and solve for to get

x.

Then

x a

f (a)y f (a) to

g such that f (g (y )) = y ? As we will show later that if we g (f (x)) = x, or g (y ) = x. to nd x such that g (y ) = x. Consider the equation y b = f (a)(x a) get what g is approximately. Then we are going to iterate this sequence

g.
The sequence

Proposition:

x0 = a, xn+1 = xn

f (x) = y for y (b , b + ) where interval ((b , b + ). Moreover, this g (f (x)) = f 1 (x) .

f (xn )y f (xn ) will converge to a point x = x which satises will depend on f (a). Thus x = g (y ) denes the inverse function in the

function is dierentiable and it's derivative is given by the chain rule, so

We are going to prove this proposition next time. But just imagine how this is going to work in several variables. Even worse, we will need a better mean value thoerem than what we have right now, since we need a mean value theorem for functions from

to

V,

not just from

to

and that will either kill us, or we will kill it.

Lecture #15
We are continuing to prove the inverse function theorem using the method of contraction mapping.

Contraction Theorem:

Suppose we have

: [c, d] [c, d]
and for

k < 1,

then

|(x) (y )| k |x y | x, y . Then there is a unique xed point x [c, d] : (x ) = x and this is obtained as the limit of the x0 , x1 = (x0 ), x2 = (x1 ), ... Generally, we will use a that is continuous on [c, d] and we assume that it is dierentiable with | (x|) k on [c, d]. We ran through a whole proof of the contraction theorem last class.
for all sequence

Inverse Function Theorem:

Suppose we have a function

f :DR
which is continuously dierentiable with that

f (a) = 0.

We want to construct a

g : (b , b + ) (a , a + )

such

f (g (y )) = y . Fix a y near b.

Then dene

y (x) = y (x)

= =

= |y (x)| =
Since

(f (x) y ) f (a) f (x) 1 f (a) |f (a) f [(x)| 1 |f (a)| 2 x


Now we can precisely state the theorem:

is continuous,

> 0

such that

x [a , b + ] = [c, d]

24

Let =1 2 |f (a)|. Then for all y [b , b + ] the map y (x) is a contraction map of 1 with k = . 2 We will take a moment to prove this, but notice the immediate, corallary:

Theorem:

[a , a + ]

[a , a + ].

x0 = a, x1 = (x0 ), ..., xn+1 = (xn ) converges to the unique xed point xn of in x in this interval where f (x ) = y . Since this works for all y [b , b + ], we get an inverse with y the unique xed point of y ,equal to g (y ). Proof that y maps the interval [a , a + ] within itself, for |y b| < : 1 1 We already have that |x a| < . Then |y (x) a| |y (x) y (a)| + |y (a) a| < + = . Note that 2 2
The sequence This is the unique

Corallary:

the second term in this string of inequalities has the properties

|y (x) y (a)|
and

1 |x a| 2

|y (a) a| =
Now we have a sequence of functions we have

1 |f (a)| by f (a) y = < = 2 f (a) f (a) |f (a)| |f (a)|


on

gn g

[b = , b + ]

with

g0 (y ) = a, g1 (y ) = a

n+1

h f (a) . Generalizing this, we have Taylor polynomial of g (b + h).

g1 (b + h) = a +

gn+1 (y = b + h) = gn (y )

f (a)y by f (a) = a f (a) . Then f (gn (y ))y . This gives us the f (a)

Now we need to prove the inverse function theorem for multiple variables? Lets start with the contraction theorem for a vector space

Contraction theorem for V :

V.

Suppose we have

:CC
where

is a compact subset of

V.

This compact subset is a generalization of the interval

[c, d] in the one dimensional

theorem. Then

||(x) (y )|| < k ||x y ||


for

k < 1.

Note that we will rst have to ensure that

is an inner product space. Then we will dene the convergent

subsequence as before, as

x1 = (x0 ), ...
In the one dimensional case, the mean value theorem was critical in proving the contraction theorem was applicable. Thus we will need to prove the mean value theorem on vector spaces in order to prove the multi-dimensional inverse function theorem. But before we do this, we will have to dene the operators

operator norm.

To put a norm on the linear

T : V W,

you need to x norms on both

and

since

v, v = ||v ||2
is dened only if both spaces have an inner product. If we have this, we can dene the operator norm as

Denition: ||T || is the smallest positive real number such that ||T (v )|| ||T || ||v || for all v V .
For example, if we have that

T = 0,

then

||T || = 0.

Usually we considered the Euclidean norm on vector spaces.

It will be useful to talk about dierent types of norms on vector spaces (the topic of next class).

Lecture #16
V and W , then we will also have a norm on L(V, W ). Now we dene the operator norm ||T ||, T : V W, to be the least positive real number M such that ||T (v )||W M ||v ||v for all v V . In fact, M is the maximum value of ||T (v )|| on the sphere ||v || = 1. This means that we don't have to worry about extremizing v M over all of V . This M works for v B1 (0), but any v = 0 has a friend ||v || B1 (0) where we have dened B1 (0) as the boundary of the unit ball around the origin. To see that we only need to look at this subspace of V .
If we x norms on where

v ||v || 1 v ||T (v )|| = ||T ||v || ||v || ||T (v )|| T


25

1 T (v ) ||v ||

M M ||v ||

The proof that this number satises the properties of the norm is left for you to prove!

Non-Example:

Consider a linear operator

on the innite dimensional vector space

V = P (R).

T :V xn
so that Now

V nxn1 T . P (R) = {an xn + ... + ap : ai R} = a0 v0 + ... + an vn .


n

P x dP dx .

We can see that there is no

for this

||P || =
i=0

|ai | 1i

||vi || =
Therefore

||T (vi )||

= ||nvi || = n

so

T :V V

is not continuous for the topology assigned to

V. || ||2 and R with norm | |. Then this operator ||Tw || = ||w||. Then we will take v = w so

Now let's compute the norm of an operator that really does admit a norm. Suppose we have a linear operator

T :V R

so that

T (v ) = v, w

Consider

with an inner product

norm will also put a norm on the dual-space

V = L(V, R)

V.

In fact

||Tw (w)|| = | w, w | = ||w||2 = ||w|| ||w||V


Then

||TW (v )|| = |T (v )| = | v, w | ||v || ||w||


for all

v.

Now let's consider another norm on V, the sup norm defined by ||v||_∞ = max |xᵢ|, and again the absolute value norm |·| on R. Then what is the norm on the dual space L(V, R)? We see that

T(Σᵢ xᵢvᵢ) = Σᵢ aᵢxᵢ, where aᵢ = T(vᵢ) ∈ R.

Then T = Σᵢ aᵢvᵢ*, where (v₁*, ..., vₙ*) is the dual basis of V*, with

vᵢ*(vⱼ) = 1 if i = j, and 0 otherwise.

Then

|T(v)| = |Σᵢ aᵢxᵢ| ≤ Σᵢ |aᵢ||xᵢ| ≤ (max |xᵢ|) Σᵢ |aᵢ| = ||v||_∞ Σᵢ |aᵢ|,

so ||T|| ≤ Σᵢ |aᵢ|. To show equality, find a v such that |T(v)| = Σᵢ |aᵢ| (take xᵢ = sign(aᵢ), so that ||v||_∞ = 1). Then we have ||T|| = Σᵢ |aᵢ|, and we have a 1-norm on V*. We see a general pattern: if we have a p-norm on V, then we will have a q-norm on V*, where

1/p + 1/q = 1, 1 ≤ p ≤ ∞.

We ended class with finding the norm of T when we define a 1-norm on V and the absolute value on R.

Lecture #17
Recall our discussion of norms from last class. We described how a norm on the spaces V and W gives a norm on L(V, W). For T: V → W, we define a norm like this:

||T||_{L(V,W)} = max value of ||T(v)||_W over ||v||_V = 1,

i.e. the maximum value of ||T(v)||_W for v ∈ ∂B₁. We can equivalently define the norm as the least positive number M such that ||T(v)|| ≤ M ||v|| for all v ∈ V. A general strategy for finding ||T|| is to:

(1) find some upper bound for ||T(v)|| on the boundary of the unit ball, and call the bound M; this shows ||T|| ≤ M;
(2) find a vector v ∈ ∂B₁ with ||T(v)|| = M; this shows ||T|| ≥ M.

If you show the above two statements, you will have shown ||T|| = M.

Theorem: For T self-adjoint, ||T|| = maxᵢ |λᵢ| = |λ_largest| when the operator norm is taken with respect to the 2-norm on the domain and range of T. This can be proven using the schematic above.

Now consider ||T|| for general operators T: V → W, with T given by a matrix A. Let V and W both have the ∞-norm.

Claim: ||T|| = M = maxᵢ(||Aᵢ||₁), the largest 1-norm of a row of A.

Show: We follow the schematic outlined above.

(1) Write v = Σⱼ xⱼvⱼ and y = T(v), so that yᵢ = Σⱼ aᵢⱼxⱼ. Then

||T(v)||_∞ = maxᵢ |yᵢ| = |y_k| (for some k) = |Σⱼ a_{kj}xⱼ| ≤ Σⱼ |a_{kj}||xⱼ| ≤ ||v||_∞ Σⱼ |a_{kj}| ≤ M ||v||_∞,

where in the second-to-last step we have used the fact that we are taking the ∞-norm on V.

(2) Suppose M = ||A_k||₁ for some particular row k. Let εⱼ = sign(a_{kj}), or εⱼ = 0 if a_{kj} = 0. Let v = (ε₁, ..., εₙ). Note ||v||_∞ = 1. Then we easily show that ||T(v)|| ≥ M, since the k-th entry of T(v) is Σⱼ |a_{kj}| = M.

So now we have proved the claim ||T|| = M = maxᵢ(||Aᵢ||₁).
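Here is a quick numerical sanity check of the claim (my own sketch, not part of the notes): for a random matrix, compare the max row 1-norm against ||Av||_∞ over sign vectors, which are the extreme points of the ∞-norm unit ball.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5))           # an arbitrary matrix for T

M = np.abs(A).sum(axis=1).max()           # claimed norm: max row 1-norm

# (1) random v with ||v||_inf = 1 never exceed M ...
samples = np.sign(rng.standard_normal((10000, 5)))
upper = max(np.abs(A @ v).max() for v in samples)

# (2) ... and v = sign of the maximizing row attains M exactly.
k = np.abs(A).sum(axis=1).argmax()
attained = np.abs(A @ np.sign(A[k])).max()

print(M, upper, attained)                 # upper <= M and attained == M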
Now we are going to talk about the mean value theorem for higher dimensions. You should recall the statement of the mean value theorem for f: V → R. The statement and proof of the analogous (but weaker) statement for f: V → W relies on operator norms. In fact, the theorem says that

||f(a) - f(b)|| ≤ ||df_c|| ||b - a||

for some c on the line between a and b. We won't prove this in class, but you should look in the book to get the flavor of the proof.

We will use this multiple-dimensional mean value theorem when we prove the inverse function theorem. We will prove the inverse function theorem in the special case where f: V → V maps 0 ↦ 0, f is continuously differentiable, and df₀ is the identity in L(V, V). The general case only requires df₀ invertible. The general case can be reduced to the special case, so our argument for the special case is sufficient. We reduce the general case to the special case by replacing f first with a translated function, so that f maps 0 ↦ 0, and then by replacing f with df₀⁻¹ ∘ f, so that f has df₀ = the identity.

Lecture #18
Today is a guest lecture with Sarah Koch. Today we prove (or almost finish proving) the inverse function theorem as stated in Edwards (Theorem 3.3).

The theorem reads: Suppose that the mapping f: Rⁿ → Rⁿ is C¹ in a neighborhood W ⊂ Rⁿ of a point a ∈ Rⁿ. (Recall that C¹ means continuously differentiable.) Suppose that df_a is invertible. Then f is locally invertible at a; that is, there exist open sets U ⊂ W with a ∈ U and V containing b = f(a), and a one-to-one map g: V → U such that g(f(x)) = x for all x ∈ U and f(g(y)) = y for all y ∈ V.

Some remarks on the inverse function theorem: the theorem basically tells you that if a function's derivative is invertible at a point, then the function itself is invertible in a neighborhood of that point. In analysis, you often find cases like this where a function is shown to mimic some behavior of its derivative. Another important note: the inverse function theorem is local! Even if you have local inverses at all points, you cannot necessarily stitch them together into a global inverse. Consider, for example, the function

f: R² → R², (r, θ) ↦ (e^r cos θ, e^r sin θ).

You can visualize this as the collapse of a helix to a circle. At every point the function satisfies the conditions of the inverse function theorem, so the function has local inverses everywhere. But the function is not globally injective (θ and θ + 2π have the same image), so it is not globally invertible.

We are going to prove some lemmas before we prove the inverse function theorem. You should already have the preliminaries in place: operator norms, the sup norm (written ||·||_∞ or |·|₀) that takes the maximum component of the vector, etc. Here is a useful corollary to the multivariable MVT:

Edwards Corollary 2.6: Let f: U → Rᵐ be a C¹ map, with U ⊂ Rⁿ a neighborhood of the line segment L with endpoints a and a + h. If λ: Rⁿ → Rᵐ is a linear transformation, then

|f(a+h) - f(a) - λ(h)|₀ ≤ |h|₀ max_{x∈L} ||df_x - λ||.

The idea to prove the above is to apply the MVT to the function x ↦ f(x) - λ(x). The proof is in Edwards.

We will also use another corollary to the MVT:

Edwards Corollary 2.7: Let U ⊂ Rⁿ be an open set containing the cube C_r, and let f: U → Rⁿ be a C¹ map such that f(0) = 0 and df₀ = I. It follows that if ||df_x - I|| ≤ ε for all x ∈ C_r, then f(C_r) ⊂ C_{(1+ε)r}.

This is another corollary in Edwards, and the idea of the proof is to apply the previous corollary with λ = df₀ = I and h = x. We need one last corollary from Edwards.

Edwards Corollary 2.8: Let f: Rⁿ → Rᵐ be a C¹ map at a ∈ Rⁿ. If df_a: Rⁿ → Rᵐ is injective, then f is injective on a neighborhood of a ∈ Rⁿ.

We will also use the contraction mapping theorem proven earlier: Let φ: C → C be a contraction mapping on a compact set C, with contraction constant k < 1. Then φ has a unique fixed point x ∈ C.

We are ready to prove the fundamental lemma in our proof of the inverse function theorem:

Edwards Lemma 3.2 (Fundamental Lemma): Let f: Rⁿ → Rⁿ be a C¹ map such that f(0) = 0 and df₀ = I. Suppose also that ||df_x - I|| ≤ ε < 1 for all x ∈ C_r. Then C_{(1-ε)r} ⊂ f(C_r) ⊂ C_{(1+ε)r}. Moreover, if we define V to be the interior of C_{(1-ε)r} and define U = int C_r ∩ f⁻¹(V), then f: U → V is bijective (and therefore has an inverse g). The map g is differentiable at 0, and it is the limit of the sequence defined by g₀(y) = 0, gₙ₊₁(y) = gₙ(y) - f(gₙ(y)) + y.

Proof of the fundamental lemma (Edwards Lemma 3.2): We already have from Cor 2.7 that f(C_r) ⊂ C_{(1+ε)r}. We can also show that f is injective using the above corollaries. Why? We use Cor 2.6 with λ = df₀ = I to see that

|f(x) - f(y) - (x - y)|₀ ≤ ε|x - y|₀ for all x, y ∈ C_r.

We can rearrange this inequality (via the triangle inequality) to get

(1-ε)|x - y|₀ ≤ |f(x) - f(y)|₀ ≤ (1+ε)|x - y|₀ for all x, y ∈ C_r.

The above gives us that f restricted to C_r is injective. Now we show that C_{(1-ε)r} ⊂ f(C_r). To show this, we cleverly use the contraction mapping theorem. Fix y ∈ C_{(1-ε)r}. Define φ_y: Rⁿ → Rⁿ by

φ_y: x ↦ x - f(x) + y.

To show that φ_y is a contraction mapping, first we need to show that φ_y maps C_r to C_r (and not outside of C_r). We can write

|φ_y(x)|₀ ≤ |f(x) - x|₀ + |y|₀ = |f(x) - f(0) - df₀(x - 0)|₀ + |y|₀ ≤ |x|₀ max_{x∈C_r} ||df_x - df₀|| + |y|₀ ≤ εr + (1-ε)r = r,

which shows that φ_y maps C_r to C_r (and not outside of C_r). Now we need only to show that φ_y actually contracts. Fix z ∈ C_r. Then

|φ_y(x) - φ_y(z)|₀ = |f(x) - f(z) - (x - z)|₀ ≤ ε|x - z|₀

by the inequality we derived at the start of our proof of this fundamental lemma. Then we can use the contraction mapping theorem to say that φ_y has a unique fixed point x. But at this fixed point, x = x - f(x) + y, i.e. f(x) = y, so the unique fixed point is f⁻¹(y)!
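The sequence gₙ₊₁(y) = gₙ(y) - f(gₙ(y)) + y from the lemma is exactly this fixed-point iteration, and it can be run numerically. A minimal sketch of mine (the particular f below is made up, with f(0) = 0 and df₀ = I):

import numpy as np

def f(x):
    # f(0) = 0 and df_0 = I; a small quadratic perturbation of the identity
    return x + 0.1 * np.array([x[0] ** 2, x[0] * x[1]])

y = np.array([0.05, -0.03])      # a point near 0 that we want to invert
g = np.zeros(2)                  # g_0(y) = 0
for _ in range(50):
    g = g - f(g) + y             # g_{n+1}(y) = g_n(y) - f(g_n(y)) + y

print(g, f(g) - y)               # f(g) - y is ~0, so g approximates f^{-1}(y)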


Lecture #19
Today we start integration. We are familiar with the interpretation of the integral in single variable calculus. If we have a function f: R → R, then the integral ∫ₐᵇ f(x)dx is interpreted as the area under the graph of f on the interval [a, b]. In order to generalize this notion of integration, we have to first generalize our idea of area. The way that we are going to do this is to use simple regions in Rⁿ (namely rectangles) and then use these to approximate more general regions. We define the area of a rectangle R = ∏ᵢ₌₁ⁿ [aᵢ, bᵢ] as

a(R) = ∏ᵢ₌₁ⁿ (bᵢ - aᵢ).

Definition: We say that a subset S ⊂ V has an area (or content) a(S) ≥ 0 if the following is true:

1. For every ε > 0 there is a finite inner cover of S by disjoint open rectangles Rᵢ⁰, where ∪ᵢ Rᵢ⁰ ⊂ S and Σᵢ a(Rᵢ⁰) > a(S) - ε. Here Rᵢ⁰ has the same definition as Rᵢ, except now we are considering open rectangles ∏(aᵢ, bᵢ).

2. For every ε > 0 there is a finite outer cover of S by closed rectangles Rᵢ, where S ⊂ ∪ᵢ Rᵢ and Σᵢ a(Rᵢ) < a(S) + ε.

Not all sets have an area (or content).

Example: If S = R is a rectangle, then a(S) = a(R).

Non-Example: Let S ⊂ [0, 1] ⊂ R be the set of rational numbers in [0, 1]. You can see that the greatest lower bound for the outer covers is 1, while the least upper bound for the inner covers is 0. Since these are not the same, there is no defined area.

The function a(S) has the following properties:

1. S ⊂ S′ ⟹ a(S) ≤ a(S′);
2. S, S′ disjoint ⟹ a(S ∪ S′) = a(S) + a(S′).

Suppose that we have a function f: [a, b] → R₊, and let S be the region under its graph.

Proposition: If f is continuous on [a, b], then a(S) exists.

Proof: f is uniformly continuous on the compact set [a, b]. So for every ε > 0 there is a δ > 0 such that |x - x′| < δ implies |f(x) - f(x′)| < ε/(b-a) on the entire interval [a, b]. Now we choose an N so large that (b-a)/N < δ, and we divide the interval into N equal parts. We can then construct a lower (inner) cover by choosing the minimum value of f(x) on each interval [xₙ, xₙ₊₁]. To create an upper cover, we can do the same thing but just choose the maximum value of f(x) on each of the intervals [xₙ, xₙ₊₁]. We define the minimal rectangles Rₙ⁰ and the maximal rectangles Rₙ. Then on each subinterval

a(Rₙ) - a(Rₙ⁰) = ((b-a)/N) |f(x_max) - f(x_min)| ≤ ((b-a)/N)(ε/(b-a)) = ε/N,

so summing over the N subintervals, the outer and inner covers differ by at most ε.
Now we will have to generalize this to the case where we have a continuous function f: Rⁿ → R₊. This technique of determining the area, by finding the upper limit of an inner cover and a lower limit of an outer cover, is identical to the way that Archimedes found an approximation for π. He did this by calculating the areas of inscribed and circumscribed n-gons of a circle. He went up to n = 96 to determine that

3 + 10/71 < π < 3 + 1/7,

which gives us the first two decimals of π.
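Archimedes' computation itself makes a nice algorithm. The sketch below (mine, not from the lecture) doubles the number of sides of an inscribed polygon starting from the hexagon, then derives the circumscribed bound; the doubling formula is the classical one.

import math

# From an inscribed regular n-gon of side s in the unit circle, the inscribed
# 2n-gon has side sqrt(2 - sqrt(4 - s^2)). Start from the hexagon (s = 1).
n, s = 6, 1.0
for _ in range(4):                               # 6 -> 12 -> 24 -> 48 -> 96
    s = math.sqrt(2.0 - math.sqrt(4.0 - s * s))
    n *= 2

lower = n * s / 2                                # inscribed semiperimeter
upper = lower / math.sqrt(1.0 - (s / 2) ** 2)    # circumscribed semiperimeter
print(n, lower, upper)  # 96 3.14103... 3.14271..., bracketing pi like Archimedes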
We will define the integral ∫ₐᵇ f(x)dx in terms of the converging value of inner covers and outer covers. The properties of the integral function are the same as the properties of a(S):

1. ∫ₐᵇ f = ∫ₐᶜ f + ∫_c^b f for any c ∈ [a, b].

2. If m is the min of f on [a, b] and M is the max of f on [a, b], then

m(b-a) ≤ ∫ₐᵇ f ≤ M(b-a).

3. Consider the function F(x) = ∫ₐˣ f for x ∈ [a, b]. Then F(x) is differentiable and F′(x) = f(x).

Proof: F(x+h) - F(x) = ∫ₐ^{x+h} f - ∫ₐˣ f = ∫ₓ^{x+h} f, so

min(f on [x, x+h]) ≤ (F(x+h) - F(x))/h ≤ max(f on [x, x+h]);

then as h → 0 we have that min(f) → f(x) and max(f) → f(x), so F′(x) = f(x).

4. If G is a function with G′ = f, then ∫ₐᵇ f = G(b) - G(a).

Proof: F(x) = ∫ₐˣ f also satisfies F′ = f. So F(x) - G(x) has a zero derivative. The Mean Value Theorem then tells us that F - G is constant: F = G + C. Evaluating at x = a gives 0 = F(a) = G(a) + C, so C = -G(a), and therefore ∫ₐᵇ f = F(b) = G(b) - G(a).

Lecture #20
Review of the end of last lecture: Consider a continuous function f: [a, b] → R₊, for which the greatest lower bound (of outer covers) and the least upper bound (of inner covers) of the content approach each other. This means that ∫ₐᵇ f = a(S). For x ∈ [a, b], we define the function F(x) = ∫ₐˣ f.

Theorem: F(x) is differentiable on [a, b] and F′ = f, with F(a) = 0.

Proof: First we need to prove that the derivative exists at x = c. We have

lim_{h→0} (F(c+h) - F(c))/h = lim_{h→0} (∫ₐ^{c+h} f - ∫ₐᶜ f)/h = lim_{h→0} (1/h) ∫_c^{c+h} f.

We will show that this exists by using our bounds on the integral:

min(f) ≤ (1/h) ∫_c^{c+h} f ≤ max(f),

where the min and max are taken over the interval between c and c+h. As h → 0, min(f) → f(c) and max(f) → f(c), since f is continuous at c. Then F′(c) = f(c), as required. A corollary to this theorem is known as the fundamental theorem of calculus.

Fundamental Theorem of Calculus: If G is any differentiable function on [a, b] with G′ = f, then

∫ₐᵇ f = G(b) - G(a).

Proof: We know that G′ = F′ = f on [a, b]. Then (G - F)′ = 0, so G = F + C on [a, b], where we have used the mean value theorem. Now we evaluate this at x = a:

G(a) = F(a) + C = C.

Then F(b) = G(b) - C = G(b) - G(a). The consequence of this theorem is that we can guess the integral of a function and know that it is a unique solution (up to an additive constant).

Linearity of the Integral: We want to show that

∫ₐᵇ (f + g) = ∫ₐᵇ f + ∫ₐᵇ g.

Proof: Let F and G be antiderivatives of f and g. Then (F + G)′ = F′ + G′ = f + g, where we have exploited the fact that the derivative is linear. Then we have that

∫ₐᵇ (f + g) = (F + G)(b) - (F + G)(a) = F(b) + G(b) - F(a) - G(a) = ∫ₐᵇ f + ∫ₐᵇ g.

Substitution: If f is continuous on [a, b] and has a continuous, positive derivative f′(x) ≥ 0, and g is continuous on [c, d] = [f(a), f(b)], then

∫_c^d g(u)du = ∫ₐᵇ g(f(x)) f′(x) dx.

Proof: We will use the chain rule and the Fundamental Theorem of Calculus (FTC). Let G′ = g on [c, d]. Now consider

(G ∘ f)′(x) = G′(f(x)) f′(x) = ((g ∘ f) f′)(x).

Then we will have

∫ₐᵇ (g ∘ f) f′ = (G ∘ f)(b) - (G ∘ f)(a) = G(d) - G(c) = ∫_c^d g(u)du.

Eventually, we will generalize this theorem to several variables. It will turn out to be

∫_{f(A)} g = ∫_A (g ∘ f) |det(df)|.

The term det(df) is called the Jacobian.

Integration by Parts: We will state this a little differently than is conventional, so that the proof will be more apparent:

∫ₐᵇ f′g + ∫ₐᵇ fg′ = f(b)g(b) - f(a)g(a).

The proof of this relies on the product rule: (fg)′ = f′g + fg′. We can just plug this into the integral ∫ₐᵇ (fg)′ and see, by linearity of the integral, the desired result.

Cute Example of Integration by Parts:

∫₀^{π/2} sin²(x)dx = [-sin x cos x]₀^{π/2} + ∫₀^{π/2} cos²(x)dx = ∫₀^{π/2} (1 - sin²(x))dx.

So we have

2 ∫₀^{π/2} sin²(x)dx = ∫₀^{π/2} 1 dx = π/2, hence ∫₀^{π/2} sin²(x)dx = π/4.

Theorem: A set A in Rⁿ has a content iff its boundary ∂A is negligible. We define the boundary ∂A = Ā - A⁰ as the set of limit points from both A and its complement.

Example: In R, take A = {rational numbers in [0, 1]}. Then Ā = [0, 1] and A⁰ = ∅, so ∂A = [0, 1], which is not negligible.

Proof idea: Choose ε > 0. Take a finite inner cover by open rectangles Rᵢ⁰ and an outer cover by closed rectangles Rⱼ, with ∪ Rᵢ⁰ ⊂ A ⊂ ∪ Rⱼ. Since we know that A has a content, we know

Σⱼ a(Rⱼ) - Σᵢ a(Rᵢ⁰) < ε,

and ∂A is contained in the difference of the two unions of rectangles. (I will defer to the proof in Edwards for this one.)

Theorem: If f: Rⁿ → R is bounded with bounded support and is continuous except on a negligible set in Rⁿ, then ∫_S f exists. Such functions are the most general types of functions that we will integrate (but they are not the only integrable functions).

Lecture #21
There are three major topics for the upcoming midterm exam:

1. Local analysis of f: V → R near a ∈ V. This involved many different types of derivatives: df_a, D_v f(a), D_v(D_w f)(a), etc. You will be expected to know the form of Taylor polynomials in both one dimension and multiple dimensions, as well as how to analyze critical points of a function using the second derivative.

2. The inverse function theorem. You should know the statement of this theorem, although you will not be required to prove it. It will be important to understand the tools used to prove the inverse function theorem, such as contraction mappings, operator norms, and the MVT in single and several variables.

3. Integrals of functions f: R → R. You should know the fundamental theorem of calculus, integration by parts, and substitution.

The emphasis of this exam will be on the first two parts, though integration will be included. We will skip over the next section in the book, which is called Fubini's Theorem. This theorem allows us to integrate functions on rectangular domains by integrating in each variable separately. Instead, we will focus on the change of variables formula.

The statement of this formula is as follows. Suppose we have functions g: V → V and f: V → R₊, where f is continuous and g is differentiable. If g takes a subset A ⊂ V to g(A), where dg is invertible on A, that is, det(dg_a) ≠ 0 for all a ∈ A, then

∫_{g(A)} f = ∫_A (f ∘ g) |det dg|.

In one dimension, the change of variables formula is just the substitution formula.

Example (Polar Coordinates): The goal is to make the change of variables

g: (r, θ) ↦ (x, y) = (r cos θ, r sin θ).

Consider the matrix form of dg:

dg_(r,θ) = [ cos θ   -r sin θ ]
           [ sin θ    r cos θ ]

We can then compute the determinant:

det(dg_(r,θ)) = r cos²θ + r sin²θ = r.

This is why in general we have

∫∫ f(x, y) dx dy = ∫∫ f(r cos θ, r sin θ) r dr dθ.

Consider the case when g = T: V → V is an invertible linear transformation. Then dg_a = T is independent of a ∈ A, and dg_a is invertible for all a ∈ A since det(T) ≠ 0. Then

∫_{T(A)} f = ∫_A (f ∘ T) |det T| = |det T| ∫_A f ∘ T.

Now consider the super-special case where f = 1 on V. Then the LHS = a(T(A)) and the RHS = |det T| · a(A). So when g = T is linear and f = 1, the change of variables theorem is just the statement of the following theorem:

Theorem: If A ⊂ V is contented and T: V → V is linear, then T(A) is contented and a(TA) = |det T| · a(A).

So when g = T is linear, the change of variables theorem is a statement about linear operators on real vector spaces with a specified basis (v₁, ..., vₙ). For example, if we have

T = [ λ₁  0  ]
    [ 0   λ₂ ]

then T(x, y) = (λ₁x, λ₂y), and a(TA) = λ₁λ₂ · a(A). If λ₂ = 0, then T(A) lands in a proper subspace W ⊂ V (and both sides are 0).
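A Monte Carlo sanity check of a(T(A)) = |det T| · a(A) (my own sketch, with an arbitrary made-up T and A the unit disk): a point p lies in T(A) exactly when T⁻¹p ∈ A.

import numpy as np

rng = np.random.default_rng(1)
T = np.array([[2.0, 1.0],
              [0.5, 1.5]])                        # arbitrary invertible T
Tinv = np.linalg.inv(T)

pts = rng.uniform(-4, 4, size=(200000, 2))        # a box containing T(A)
inside = ((pts @ Tinv.T) ** 2).sum(axis=1) <= 1   # is T^{-1}(p) in the unit disk?

area_TA = inside.mean() * 64                      # box area 8*8 times hit rate
print(area_TA, abs(np.linalg.det(T)) * np.pi)     # both ~ 7.85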

Now we go about proving the change of variables formula.

Proof of Change of Variables (outline):

1. If the theorem is true for T and S, two linear operators, then it is true for T ∘ S; more generally, if it is true for T₁, ..., Tₙ, then it is true for T₁ ∘ T₂ ∘ ... ∘ Tₙ. Indeed,

a((T ∘ S)(A)) = a(T(SA)) = |det T| a(SA) = |det T| |det S| a(A),

and det(T ∘ S) = det T · det S.

2. Show that any T is the product of some simple matrices, for which we will check the formula by hand.

3. Verify the formula for our special matrices and any contented set A by checking it when A = R is a rectangle. In this case

a(T(R)) = |det T| · a(R).

Good luck preparing for the exam!

Lecture #22
Today we will prove the change of variables theorem: if we have a function f: V → R along with a map g: V → V, where dg is invertible on A, then

∫_{g(A)} f = ∫_A (f ∘ g) |det dg|.

We will start by proving this for the special case where we have:

1. f = 1;
2. g an invertible linear map, so that dg = g is constant.

Then we need to show the change of volumes formula

vol(g(A)) = |det g| · vol(A).

It is enough to prove this for rectangles: take Rᵢ ⊂ A ⊂ Rⱼ with the difference vol(Rⱼ) - vol(Rᵢ) < ε, and note that g(Rᵢ) ⊂ g(A) ⊂ g(Rⱼ) for linear g. It is also enough to prove the special case for linear maps T₁, ..., Tₙ with T = T₁ ∘ T₂ ∘ ... ∘ Tₙ. We will check the formula for special linear maps and rectangles A. We will consider only the special linear maps

Cᵢ = the identity matrix with the (i, i) entry replaced by cᵢ;
Eᵢⱼ = the identity matrix with an extra 1 in the (i, j) entry, i ≠ j.

We first look at the properties of these matrices:

1. det Cᵢ = cᵢ and det Eᵢⱼ = 1;
2. ACᵢ scales column i of A by cᵢ, and CᵢA scales row i of A by cᵢ;
3. AEᵢⱼ adds column i of A to column j, and EᵢⱼA adds row j of A to row i.

Now we note that we can use these special matrices in combination with each other to turn an arbitrary invertible matrix A into the identity. This will look like

... Cᵢ A Eᵢⱼ ... = I.

We will check the volume formula first for a rectangle A = R = [a₁, b₁] × ... × [aₙ, bₙ] and its image g(R).

Lecture #23
Last time we proved the change of variables formula for the special case of a linear map T: V → V acting on a contented subset A ⊂ V: we found that

vol(T(A)) = |det T| · vol(A).

This corresponds to the original statement of the change of variables formula in the even more special case where g = T and f = 1. Consider

A = [0, 1] × [0, 1] × ... × [0, 1].

Then we have vol(A) = 1, so we need to show that vol(T(A)) = |det T|.

Let's start with n independent vectors p₁, p₂, ..., pₙ given by a basis of V. Then T: V → V defined by eᵢ ↦ pᵢ takes the standard cube of volume 1 to the parallelepiped defined by the pᵢ:

P = {Σᵢ₌₁ⁿ xᵢpᵢ : 0 ≤ xᵢ ≤ 1}.

This has volume |det(p₁, p₂, ..., pₙ)| = |det T|, where the matrix is composed of the pᵢ as column vectors.

Now suppose we have any vector v = Σᵢ yᵢpᵢ ∈ V, where yᵢ = xᵢ + mᵢ with mᵢ an integer and 0 ≤ xᵢ ≤ 1. If I define the lattice

L = {Σᵢ₌₁ⁿ mᵢpᵢ : mᵢ ∈ Z},

then every v ∈ V is of the form λ + z, where λ ∈ L and z ∈ P. In the case where pᵢ = eᵢ, this is the square lattice.

Let {pᵢ} be a basis of Rⁿ. Then L = {Σ mᵢpᵢ : mᵢ ∈ Z} is called the associated lattice. If we also take B = a large ball around the origin, what is a good estimate for #(L ∩ B)? It is

#(L ∩ B) ≈ vol(B) / |det T|.

A sensible question would be: if we try to pack round balls into V by using equal radii and centering the balls at the points of L, how much of V do we cover? It is not obvious, for example, that packing the spheres according to a

square lattice is better than packing them with respect to a parallelepiped lattice. First we note that if we have two spheres centered at λ and μ, then the condition that they do not overlap is equivalent to

2r ≤ ||λ - μ||,

where r is the radius of the balls. We can take

r = ½ ||λ||_min,

where ||λ||_min is the minimal distance between two lattice points. What proportion of V are we covering with these spheres? Writing vol(Ball_r) = Cₙrⁿ, we have that

proportion of B covered by the balls = #(L ∩ B) · vol(Ball_r) / vol(B) ≈ Cₙ rⁿ / |det T|,

so the quantity we want is

Cₙ (½ ||λ||_min)ⁿ / |det T|.

Let's compute this for a square lattice. For this case, pᵢ = eᵢ and L = Zⁿ ⊂ Rⁿ, so we cover Rⁿ by balls of radius r = ½ centered at integral points; in the plane this gives C₂(½)²/1 = π/4 ≈ .7854. For the hexagonal lattice, with

T = [ 1   1/2  ]
    [ 0   √3/2 ],

we have det T = √3/2 and ||λ||_min = 1, so the proportion is

C₂ (½)² / det T = (π/4)/(√3/2) = π/(2√3) ≈ .9069.

Interestingly enough, the best spherical packing in 2 dimensions is attained by this hexagonal lattice... this is what bees use! There is a sketchy proof of this in 3 dimensions. What we are interested in is an optimal packing in n dimensions. We know what is going on precisely only in special dimensions such as n = 2, 4, 8, 24; work of Cohn and Elkies on linear programming bounds comes very close to settling the best packing problem in 24 dimensions. The 2-dimensional case can be proven rigorously.
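The density formula Cₙ(½||λ||_min)ⁿ / |det T| is easy to evaluate; here is a small sketch of mine for the two planar lattices above.

import math

def planar_density(basis, min_dist):
    # C_2 (min_dist/2)^2 / |det T|, with C_2 = pi (area of the unit disk)
    det = abs(basis[0][0] * basis[1][1] - basis[0][1] * basis[1][0])
    return math.pi * (min_dist / 2) ** 2 / det

square = [[1.0, 0.0], [0.0, 1.0]]                  # Z^2, min distance 1
hexagonal = [[1.0, 0.5], [0.0, math.sqrt(3) / 2]]  # hexagonal, min distance 1

print(planar_density(square, 1.0))      # pi/4           ~ 0.7854
print(planar_density(hexagonal, 1.0))   # pi/(2 sqrt(3)) ~ 0.9069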

Now let's move to proving the change of variables formula for a nonlinear g: V → V, continuous and invertible, with a ↦ b and dg_a invertible. Then there is a small rectangle R around a such that for all x in this small rectangle we have the estimate on the operator norm

||dg_a⁻¹ dg_x - I|| < ε.

Then

(1-ε)ⁿ |det dg_a| vol(R) ≤ vol(g(R)) ≤ (1+ε)ⁿ |det dg_a| vol(R).

So vol(g(R)) ≈ |det dg_a| vol(R), which would be an equality in the linear case.

Lecture #24
Theorem (Change of Variables): Suppose f: V → R is continuous, A ⊂ V, and g: V → V is continuously differentiable at all points of A, with dg_a invertible for all a ∈ A, that is, det(dg_a) ≠ 0. Then

∫_{g(A)} f = ∫_A (f ∘ g) |det(dg_a)|.

Note that we could cover A with small rectangles Rᵢ such that vol(g(Rᵢ)) ≈ |det(dg_{aᵢ})| vol(Rᵢ). Then we have

∫_{g(A)} f = Σᵢ ∫_{g(Rᵢ)} f ≈ Σᵢ f(g(aᵢ)) vol(g(Rᵢ)),

where the last (approximate) equality holds by the continuity of f. This will be approximately equal to

Σᵢ f(g(aᵢ)) |det(dg_{aᵢ})| vol(Rᵢ) ≈ ∫_A (f ∘ g) |det(dg_a)|.

Proof: We just went over the heuristic proof, but now we need to make it precise. We will not use epsilons (see the book if you want to), but we will at least make it more rigorous.

Lemma: Take g: V → V, a ↦ b, dg_a invertible, dg continuous. Let R be a small rectangle around a such that

||dg_a⁻¹ dg_x - I|| ≤ ε for all x ∈ R,

with respect to the sup norm on V. Let C₁ be the rectangle around the origin equal to R translated by -a; in other words, C₁ = τ_a⁻¹(R), where τ_a is the translation map τ_a(v) = v + a. Finally, our lemma is the statement that

dg_a(C_{1-ε}) ⊂ τ_b⁻¹(g(R)) ⊂ dg_a(C_{1+ε}).

A corollary of our lemma is that

|det dg_a| (1-ε)ⁿ vol(R) ≤ vol(g(R)) ≤ (1+ε)ⁿ |det dg_a| vol(R).

Now, how can we prove our lemma? Actually, we essentially proved the lemma when we proved the inverse function theorem. Recall that we proved: if f: V → V, f(0) = 0, df₀ = I, and ||df_x - I|| ≤ ε for all x in a small rectangle C around the origin, then C_{1-ε} ⊂ f(C) ⊂ C_{1+ε}. This older lemma is a special case of the lemma we want to prove, so we reduce the lemma we want to prove to this special case by using

f = dg_a⁻¹ ∘ τ_b⁻¹ ∘ g ∘ τ_a, so that df₀ = dg_a⁻¹ dg_a = I.

Improper integrals: Now we discuss improper integrals. An improper integral is an integral over a non-contented region. An example is ∫_{-∞}^{∞} e^{-x²} dx = √π. Another is ∫₁^∞ (1/x²) dx. We define these improper integrals as limits; for instance,

∫₁^∞ (1/x²) dx = lim_{t→∞} ∫₁^t (1/x²) dx = lim_{t→∞} (1 - 1/t) = 1.

In general, we define the improper integral as the limit of the integral as the region of integration approaches the region we desire. Of course, these limits are not always defined. For instance,

∫₁^∞ x^a dx = lim_{t→∞} ∫₁^t x^a dx

has a well-defined finite value if a < -1, but not otherwise. When a = -1, we get a logarithm, and the limit diverges. A similar calculation shows that

∫₀^1 x^a dx = lim_{ε→0} ∫_ε^1 x^a dx

converges for a > -1. This is a different sort of improper integral: the region under the graph is non-contented because the function itself diverges.

Now we do the famous Gaussian integral: we claim ∫_{-∞}^{∞} e^{-x²} dx = √π. Let I = ∫_{-∞}^{∞} e^{-x²} dx. Then by Fubini,

I² = ∫ e^{-x²} dx · ∫ e^{-y²} dy = ∫∫_{R²} e^{-(x²+y²)} dx dy.

That allows us to make the polar coordinate change of variables,

I² = ∫₀^{2π} ∫₀^∞ e^{-r²} r dr dθ = 2π · ½ = π.
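Both the convergence and the value √π are easy to check numerically; a midpoint-rule sketch of mine:

import math

# Midpoint rule on [-8, 8]; the tails beyond |x| = 8 contribute only ~e^{-64}.
N, a, b = 100000, -8.0, 8.0
h = (b - a) / N
I = h * sum(math.exp(-(a + (k + 0.5) * h) ** 2) for k in range(N))

print(I, math.sqrt(math.pi))   # both 1.77245385...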

Next time we'll learn about another famous integral, perhaps the second most important integral, from Euler. It's called the gamma function, and it's defined by an improper integral:

Γ(x) = ∫₀^∞ t^{x-1} e^{-t} dt.

We'll show that Γ(x+1) = xΓ(x), so that Γ(x) = (x-1)! for integers x.

Lecture #25
Last time we discussed the notion of an improper integral. For example, consider the integral

∫₁^∞ x^a dx = lim_{t→∞} ∫₁^t x^a dx = lim_{t→∞} [x^{a+1}/(a+1)]₁^t = lim_{t→∞} (t^{a+1}/(a+1) - 1/(a+1)) = -1/(a+1).

We note that, as expected, this converges if a < -1. As an exercise, you should also try evaluating the improper integral ∫₀^1 x^a dx.

Now let's compute the improper integral that defines Euler's gamma function:

Γ(x) = ∫₀^∞ t^{x-1} e^{-t} dt = lim_{ε→0, N→∞} ∫_ε^N t^{x-1} e^{-t} dt.

We note that this limit exists as ε → 0: if x > 0, then x - 1 > -1, and also e^{-t} ≤ 1, so near 0 the integrand is bounded by t^{x-1}, and ∫₀^1 t^a dt converges for a > -1. As N → ∞, we have

t^{x-1} e^{-t} = (t^{x+1} e^{-t}) / t² < 1/t² for t ≥ m (some fixed point m),

since e^t eventually grows larger than the fixed power t^{x+1}; and ∫_m^∞ (1/t²) dt converges. Therefore, the gamma function is finite (i.e. it exists).

Why is this integral important? When x > 0, we see that

Γ(x+1) = ∫₀^∞ t^x e^{-t} dt = [-t^x e^{-t}]₀^∞ + ∫₀^∞ x t^{x-1} e^{-t} dt = xΓ(x),

where in the second step we have used integration by parts. You can see that the gamma function is closely related to the ! (factorial) operator: Γ(n+1) = n! for integers n.

Now for another interesting property of the gamma function. We note that the gamma function is defined for all positive x, not just the integers (for which it is essentially the factorial operator). Then we can iteratively extend it to Γ(x): R - {0, -1, -2, ...} → R, and

Γ(x)Γ(1-x) = π / sin(πx).

This function has poles at all x = n ∈ Z. Notice how the gamma function acts on half-integers:

Γ(½)² = π / sin(π/2) = π, so Γ(½) = √π.

We can use the gamma function to approximate n!.

Fun Fact: 52! ≈ 10⁶⁷. The number of particles in the universe is on the order of 10⁸⁰.

De Moivre was the first to come up with a basic approximation for log(n!). Consider

log(n!) = log 1 + log 2 + log 3 + ... + log n.

Let F(x) = x log x - x; then F′(x) = log x + 1 - 1 = log x. Now De Moivre used the trapezoid method to make the approximation

log 1 + log 2 + ... + log(n-1) < ∫₁ⁿ log x dx < log 2 + log 3 + ... + log n.

The LHS is log(n!) - log n. The integral is n log n - n + 1. The RHS is log(n!). So we have the approximation

log(n!) ≈ n log n - n + ½ log n.

Stirling was the one to find the constant √(2π):

n! ≈ nⁿ √(2πn) e^{-n}.
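A short comparison of mine between n!, Γ(n+1), and Stirling's formula:

import math

for n in (5, 10, 20, 52):
    exact = math.factorial(n)
    gamma = math.gamma(n + 1)          # Gamma(n+1) = n!
    stirling = n ** n * math.sqrt(2 * math.pi * n) * math.exp(-n)
    print(n, gamma / exact, stirling / exact)   # Stirling ratio -> 1 as n grows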

Now we will talk about integration over manifolds. Consider a manifold M ⊂ V, where the manifold is given as the image of a differentiable curve

γ: [a, b] → Rⁿ, γ(t) = (x₁(t), ..., xₙ(t)).

We will integrate over the entire curve by using the approximation of dividing it up into many small line segments along the path. To compute the length of the path, we compute the limit obtained by dividing the path at points t₀, t₁, ..., tₙ. The length of the inscribed polygon is

s(γ, T) = Σᵢ ||γ(tᵢ) - γ(tᵢ₋₁)|| = Σᵢ √((x₁(tᵢ) - x₁(tᵢ₋₁))² + ...).

Now we can define the length of the curve as

s(γ) = lim_{|T|→0} s(γ, T),

where |T| is the length of your subdivisions in t.

Proposition: If γ(t) is continuously differentiable, then s(γ), defined as the above limit, exists and is equal to

s(γ) = ∫ₐᵇ ||γ′(t)|| dt.

We will prove this next time.
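We can already see the limit numerically (a sketch of mine): for one turn of a helix, ||γ′(t)|| = √2 for every t, so the inscribed polygon lengths should approach 2π√2.

import math

def gamma(t):
    return (math.cos(t), math.sin(t), t)       # helix; ||gamma'(t)|| = sqrt(2)

def inscribed_length(n):
    pts = [gamma(2 * math.pi * k / n) for k in range(n + 1)]
    return sum(math.dist(pts[k], pts[k + 1]) for k in range(n))

for n in (10, 100, 1000):
    print(n, inscribed_length(n))
print(2 * math.pi * math.sqrt(2))              # the integral: ~8.8858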


Lecture #27
The notation for a 1-form ω, using the basis {e₁, ..., eₙ}: we write fᵢ(a) = ω(a)(eᵢ) for i = 1, 2, ..., n. This gives n functions fᵢ: U → R which determine the 1-form completely. Remember from linear algebra that if v = Σ xᵢeᵢ, then

ω(a)(v) = Σᵢ xᵢ fᵢ(a).

If ω = df, then ω(a)(eᵢ) = df_a(eᵢ) = ∂f/∂xᵢ, so fᵢ = ∂f/∂xᵢ for ω = df. For the basis {e₁, e₂}, this reads (f₁, f₂) = (∂f/∂x₁, ∂f/∂x₂). But remember that the partial derivatives are symmetric:

∂²f/∂xᵢ∂xⱼ = ∂²f/∂xⱼ∂xᵢ.

So for this specific case ω = df, we have

∂f₁/∂x₂ = ∂f₂/∂x₁,

but this is not necessarily true for arbitrary 1-forms. Our notation for 1-forms once we have a basis is

ω = f₁dx₁ + ... + fₙdxₙ, and in particular df = (∂f/∂x₁)dx₁ + ... + (∂f/∂xₙ)dxₙ.

Now how do we integrate a 1-form ω over some curve C ⊂ U ⊂ V, where C is the image γ([a, b]) of a parameterization γ?

Definition: ∫_C ω = ∫ₐᵇ ω(γ(t))(γ′(t)) dt.

Note that this does NOT depend on the parameterization of C. Choose a basis {e₁, ..., eₙ}, so that ω corresponds to functions (f₁, ..., fₙ) on U. Then γ′(t) = γ₁′(t)e₁ + ... + γₙ′(t)eₙ, and we see that

ω(γ(t))(γ′(t)) = ω(γ(t))(Σᵢ γᵢ′(t)eᵢ) = Σᵢ γᵢ′(t) ω(γ(t))(eᵢ),

so

∫_C ω = Σᵢ ∫ₐᵇ fᵢ(γ(t)) γᵢ′(t) dt,

which is the sum of n integrals in 1 variable.

If ω = df, then ∫_C ω = f(γ(b)) - f(γ(a)). The fact that the integral of such a 1-form depends only on the endpoints of the path is a generalization of the fundamental theorem of calculus. It then follows that if C is closed, then

∫_C df = 0.

Note that even if ∂fᵢ/∂xⱼ = ∂fⱼ/∂xᵢ, we may not have that ω = df if the region U contains holes. To show this, let's consider the region U = R² - {(0, 0)}, where we define ω = P dx + Q dy with

P = -y/(x² + y²), Q = x/(x² + y²),

so that

∫_C ω = ∫ₐᵇ [P(γ(t))γ₁′(t) + Q(γ(t))γ₂′(t)] dt.

Then

∂P/∂y = (-(x² + y²) + 2y²)/(x² + y²)² = (y² - x²)/(x² + y²)²,
∂Q/∂x = ((x² + y²) - 2x²)/(x² + y²)² = ∂P/∂y.

Now we ask: is ω = P dx + Q dy = df? Let's compute the integral of this 1-form over the unit circle, parameterized by γ(t) = (cos t, sin t):

∫_C ω = ∫₀^{2π} [(-sin t)(-sin t) + (cos t)(cos t)]/(cos²t + sin²t) dt = ∫₀^{2π} 1 dt = 2π.

Therefore ω ≠ df, since the integral of df around a closed curve would be 0.
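The same computation can be done numerically, and it also shows the answer is 0 for a loop that misses the hole (a sketch of mine):

import math

def loop_integral(cx, cy, radius, n=100000):
    # Integrate omega = (-y dx + x dy)/(x^2 + y^2) over a circle of given center.
    total = 0.0
    dt = 2 * math.pi / n
    for k in range(n):
        t = k * dt
        x, y = cx + radius * math.cos(t), cy + radius * math.sin(t)
        dx, dy = -radius * math.sin(t), radius * math.cos(t)    # gamma'(t)
        total += (-y * dx + x * dy) / (x * x + y * y) * dt
    return total

print(loop_integral(0.0, 0.0, 1.0))   # ~2*pi: the loop winds around the hole
print(loop_integral(3.0, 0.0, 1.0))   # ~0: this loop misses the hole entirely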
Now we will consider 2-forms on U ⊂ V where dim V = 2. A 2-form is given by a function F: U → R via ω = F dxdy. If we start with df = f_x dx + f_y dy, then

d(df) = (f_yx - f_xy) dxdy = 0 = d²f.

This will be a fundamental, general property: d² = 0. For now, we will have to take it as a given that we differentiate 1-forms in the following way to produce 2-forms:

ω = P dx + Q dy ⟹ dω = (Q_x - P_y) dxdy.

Now we can state an important theorem.

Green's Theorem in R²: Let D be a compact, connected region in R² whose boundary ∂D consists of oriented, closed curves. Let ω be a 1-form on U ⊃ D and dω the corresponding 2-form. Then

∫_D dω = ∫_{∂D} ω.

This should remind you of the fundamental theorem of calculus:

∫_{[a,b]} df = f(b) - f(a).

As an example, if ω = x dy, then ∫_{∂D} ω = ∫_D dxdy = vol(D).
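Here is a numeric check of the example (mine): for the ellipse with semi-axes a and b, the boundary integral of x dy should equal the area πab.

import math

a, b, n = 3.0, 2.0, 100000        # ellipse (a cos t, b sin t)
dt = 2 * math.pi / n
total = sum(a * math.cos(k * dt) * b * math.cos(k * dt) * dt for k in range(n))
print(total, math.pi * a * b)     # both ~ 18.8496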

Lecture #29
Last time we discussed functions f: U → R with U ⊂ Rⁿ. We called these 0-forms. We called 1-forms maps

ω: U → L(V, R) = V*;

an example is ω = df. Today we will discuss 2-forms. These are maps

ω: U → {alternating bilinear forms on V} = Λ²(V*),

which is a vector space of dimension n(n-1)/2 = (n choose 2). An alternating bilinear form B: V × V → R is bilinear in each factor and satisfies

B(v, w) = -B(w, v), so that B(v, v) = 0.

A basis for Λ²(V*) is given by the bilinear forms Bᵢⱼ for i < j, with Bᵢⱼ(eᵢ, eⱼ) = 1 = -Bᵢⱼ(eⱼ, eᵢ). Here is some notation: we will define dxᵢdxⱼ = Bᵢⱼ for i < j. Now our general 2-form looks like

ω = Σ_{i<j} fᵢⱼ(a) dxᵢdxⱼ,

and our general 1-form looks like ω = Σᵢ fᵢ(a)dxᵢ.

As an easy example, let's write a 2-form in 2-space. Suppose that dim V = 2; then there is only one pair i < j, and we have f₁₂(a)dx₁dx₂. In three dimensions we would have the 2-form

f₁₂(a)dx₁dx₂ + f₁₃(a)dx₁dx₃ + f₂₃(a)dx₂dx₃.

As a matrix, such an alternating form looks like

[  0     f₁₂    f₁₃ ]
[ -f₁₂    0     f₂₃ ]
[ -f₁₃  -f₂₃     0  ]

There is an operator d: (1-forms on U) → (2-forms on U). If ω = Σᵢ fᵢdxᵢ, then

dω = Σᵢ dfᵢ dxᵢ = Σᵢ (Σⱼ (∂fᵢ/∂xⱼ) dxⱼ) dxᵢ = Σ_{i<j} (∂fⱼ/∂xᵢ - ∂fᵢ/∂xⱼ) dxᵢdxⱼ.

Remember that for 1-forms, we had the definition of a line integral,

∫_C ω = ∫ₐᵇ ω(γ(t))(γ′(t)) dt,

and the theorem ∫_C df = f(γ(b)) - f(γ(a)). There are equivalent relations for 2-forms. We will state, but not prove, them. We can integrate a 2-form over a surface D ⊂ U. If we have a map φ: R → U, where R is a rectangle, the image of φ is D, and

φ(s, t) = (φ₁(s, t), φ₂(s, t), ..., φₙ(s, t)),

then we have the definition

∫_D ω = ∫_R Σ_{i<j} fᵢⱼ(φ(s, t)) · det [ ∂φᵢ/∂s  ∂φᵢ/∂t ; ∂φⱼ/∂s  ∂φⱼ/∂t ] ds dt.

Theorem: If ω = dη for a 1-form η, then

∫_D dη = ∫_{∂D} η.

This is a generalization of Green's theorem in R².

But why stop there?! The next logical question to ask is: what is a k-form, for the cases 3 ≤ k ≤ dim V? We define a k-form as a map

ω: U → {alternating, multilinear k-forms on V} = Λᵏ(V*).

We have dim(Λᵏ V*) = (n choose k): a form B(v₁, ..., vₖ) is determined by the values B(e_{i₁}, e_{i₂}, ..., e_{iₖ}) on basis elements with i₁ < i₂ < ... < iₖ. To gain a little intuition on these forms, let's just write some out explicitly. A 3-form on R³ is determined by the single value

B(e₁, e₂, e₃);

a 3-form on R⁴ is determined by the four values

B(e₁, e₂, e₃), B(e₁, e₂, e₄), B(e₁, e₃, e₄), B(e₂, e₃, e₄).

In general, we can take any k-form to a (k+1)-form. Let's try this in 3 dimensions, V = R³. First let's take a 0-form to a 1-form:

f(x, y, z) ⟹ df = f_x dx + f_y dy + f_z dz.

Now let's take a 1-form ω₁ = f₁dx + f₂dy + f₃dz to a 2-form:

ω₂ = dω₁ = (∂f₃/∂y - ∂f₂/∂z) dydz + (∂f₁/∂z - ∂f₃/∂x) dzdx + (∂f₂/∂x - ∂f₁/∂y) dxdy;

this is known as the curl. Now let's differentiate a general 2-form ω₂ = g₁dydz + g₂dzdx + g₃dxdy:

ω₃ = dω₂ = (∂g₁/∂x + ∂g₂/∂y + ∂g₃/∂z) dxdydz;

this is called the divergence. We see in this example some general properties of differential forms:

k-form → (k+1)-form → (k+2)-form.

The statement that d² = 0 then becomes the equality of mixed partials (for instance, the divergence of a curl is zero). The other property of differential forms is the generalized Stokes Theorem:

∫_{Dₖ} dω_{k-1} = ∫_{∂Dₖ} ω_{k-1}.
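The two identities d²f = 0 (the curl of a gradient) and d²ω₁ = 0 (the divergence of a curl) can be checked symbolically. A sketch of mine using sympy (which may need to be installed):

import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.Function('f')(x, y, z)                    # an arbitrary 0-form

# d(df): the curl of the gradient, component by component.
grad = [sp.diff(f, v) for v in (x, y, z)]
curl_grad = [sp.diff(grad[2], y) - sp.diff(grad[1], z),
             sp.diff(grad[0], z) - sp.diff(grad[2], x),
             sp.diff(grad[1], x) - sp.diff(grad[0], y)]
print([sp.simplify(c) for c in curl_grad])       # [0, 0, 0]

# d(d(omega_1)): the divergence of a curl, for an arbitrary 1-form.
F = [sp.Function(n)(x, y, z) for n in ('f1', 'f2', 'f3')]
curl = [sp.diff(F[2], y) - sp.diff(F[1], z),
        sp.diff(F[0], z) - sp.diff(F[2], x),
        sp.diff(F[1], x) - sp.diff(F[0], y)]
div = sum(sp.diff(curl[i], v) for i, v in enumerate((x, y, z)))
print(sp.simplify(div))                          # 0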

Next time, we will talk about the calculus of variations!

Lecture #30
Today we will start the calculus of variations and derive the Euler-Lagrange equation. One basic problem in the calculus of variations is to minimize the value of some integral

F(γ) = ∫ₐᵇ f(γ(t), γ′(t), t) dt

over a space of differentiable functions γ: [a, b] → R with γ(a) = α and γ(b) = β. We could, for instance, minimize the length of a curve between α and β (the minimizer is clearly the straight line). So suppose we have a functional F: C¹[a, b] → R, where C¹[a, b] is the space of class-C¹ differentiable functions on the closed interval [a, b]. Now suppose we want to extremize F on the subset M ⊂ C¹[a, b], where M is the subset of functions that terminate on α and β. If α and β happen to be 0, then M is a subspace. This will generally not be the case, so M is generally not a subspace; M is an affine space. That is, it is the translate of a true subspace:

M = C¹[a, b]_(0,0) + γ₀.

Let's investigate dF_γ: C¹[a, b] → R, the linear approximation to F at γ. At a critical point, dF_γ = 0 on TM = C¹[a, b]_(0,0). Let's define another function c: (-ε, ε) → M with the constraints that

c(0) = γ and c′(0) = any vector in TM.

Then the function g = F ∘ c: (-ε, ε) → R has a critical point at 0, and

dF_γ(any vector in TM) = dF_{c(0)}(c′(0)) = g′(0) = 0.

Now the question is: how do we calculate dF_γ? Note that

F(γ + h) = F(γ) + dF_γ(h) + o(||h||),

but notice that we need a norm on C¹[a, b]. Then

F(γ + h) - F(γ) = ∫ₐᵇ [f(γ + h, γ′ + h′, t) - f(γ, γ′, t)] dt.

Now our strategy will be to fix t and estimate this.

For fixed t,

f(γ + h, γ′ + h′, t) - f(γ, γ′, t) = f_x(γ, γ′, t)h + f_y(γ, γ′, t)h′ + (quadratic terms in h, h′),

where f_x and f_y denote the partial derivatives of f(x, y, t) with respect to its first and second slots. One might guess that

dF_γ(h) = ∫ₐᵇ [f_x(γ, γ′, t)h + f_y(γ, γ′, t)h′] dt.

To extract an equation from dF_γ(h) = 0, we evaluate the second part of this integral using integration by parts, exploiting the fact that h(a) = h(b) = 0. Let's deal with the second integral:

∫ₐᵇ f_y(γ, γ′, t)h′ dt = [f_y(γ, γ′, t)h(t)]ₐᵇ - ∫ₐᵇ (d/dt) f_y(γ, γ′, t) · h(t) dt;

the boundary term is zero, since h ∈ TM vanishes at a and b. The second term is similar to the other term in the original integral. Now we have

0 = ∫ₐᵇ [f_x(γ, γ′, t) - (d/dt) f_y(γ, γ′, t)] h(t) dt.

Now, since this = 0 for all h, we have the following famous second-order differential equation, the Euler-Lagrange equation:

f_x(γ, γ′, t) = (d/dt) f_y(γ, γ′, t).

Let's use this to solve the simplest application of the calculus of variations: the path of shortest length. We are asked to minimize the function

F(γ) = ∫ₐᵇ √(1 + γ′(t)²) dt.

So we have f(x, y, z) = √(1 + y²), and

f_x = 0, f_y = y/√(1 + y²).

Let's plug this into the Euler-Lagrange equation:

f_y(γ, γ′, t) = γ′(t)/√(1 + γ′(t)²).

For a minimum or a maximum, E-L says that

(d/dt) [γ′(t)/√(1 + γ′(t)²)] = 0, i.e. γ″(t)/(1 + γ′(t)²)^{3/2} = 0.

This implies that γ″(t) = 0, so γ(t) = mt + b, as we would have expected.
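A numerical counterpart (my sketch): discretize F(γ) = ∫√(1 + γ′²) on a grid with fixed endpoints and minimize over the interior values; the minimizer should come out as the straight line. (This uses scipy, which may need to be installed.)

import numpy as np
from scipy.optimize import minimize

a, b, alpha, beta, n = 0.0, 1.0, 0.0, 2.0, 50
t = np.linspace(a, b, n + 1)
dt = t[1] - t[0]

def length(interior):
    g = np.concatenate(([alpha], interior, [beta]))   # endpoints held fixed
    return np.sum(np.sqrt(1 + (np.diff(g) / dt) ** 2)) * dt

g0 = np.random.default_rng(0).uniform(0, 2, n - 1)    # a wiggly initial guess
res = minimize(length, g0)

line = alpha + (beta - alpha) * (t - a) / (b - a)     # the Euler-Lagrange answer
print(np.max(np.abs(res.x - line[1:-1])))             # ~0: minimizer is the line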

Now we will start talking about what a reasonable norm on our vector space C¹[a, b] is. It has to satisfy all of the usual properties of a norm (triangle inequality, etc.). Remember that on Rⁿ we had

||v||₁ = Σ|vᵢ|, ||v||₂ = √(Σvᵢ²), ||v||_∞ = max(|vᵢ|).

This suggests that a norm on a function space will require an integral. We will start with this next class, but we will use the norm

||γ|| = max_t |γ(t)| + max_t |γ′(t)|.

Lecture #31
We continue discussing the calculus of variations, using the same notation from last class. Recall that we have a functional F: V → R,

F(γ) = ∫ₐᵇ f(γ, γ′, t) dt,

where V = C¹[a, b] is the set of continuously differentiable functions γ: [a, b] → R. We defined M ⊂ V, where M is the subset of V such that γ(a) = α and γ(b) = β for some fixed α and β that define M. The definition of F rests on the definition of some function f: R³ → R, with f differentiable. We found that when we minimize F on M, a necessary condition for a minimum is that γ satisfies the Euler-Lagrange equation,

f_x(γ, γ′, t) = (d/dt) f_y(γ, γ′, t).

Let's imagine minimizing the surface area of a surface of revolution. We have the surface area

S = ∫ₐᵇ γ(t) √(1 + γ′(t)²) dt

(dropping the constant factor 2π, which does not affect the minimizer), so we would want to use the function f(x, y, z) = x√(1 + y²).

We won't finish solving this problem with the Euler-Lagrange equation directly. We'll use another method, discussed now. Euler noticed that there is a simplification of the Euler-Lagrange equation when f does not depend on z, i.e. f does not vary explicitly with the third parameter, t. Re-writing the Euler-Lagrange equation with this condition gets us Euler's first integral. The Euler-Lagrange equation reads

f_x(γ, γ′) - (d/dt) f_y(γ, γ′) = 0.

Note that the LHS of the above appears in the first derivative with respect to t of the function

f(γ, γ′) - γ′(t) f_y(γ, γ′),

whose derivative has terms

f_x γ′ + f_y γ″ - γ″ f_y - γ′ (d/dt) f_y = γ′ [f_x - (d/dt) f_y] = 0.

So we have Euler's first integral equation, which says that

f(γ, γ′) - γ′ f_y(γ, γ′) = c = constant.

Let's try applying this to the problem of minimizing the surface area of a surface of revolution. We had f(x, y, z) = x√(1 + y²).

Compute

f_y = xy/√(1 + y²), f_x = √(1 + y²).

Then Euler's first integral equation becomes

γ√(1 + γ′²) - γγ′²/√(1 + γ′²) = c,

which simplifies to γ/√(1 + γ′²) = c, i.e.

γ′² = (1/c²)(γ² - c²).

We aren't sure how to solve a differential equation of this form directly, but we can imagine solving it for a fixed value of c, say c = 1. Then we have

γ² - γ′² = 1,

which is the familiar form of a hyperbola. A hyperbola can be parameterized by cosh(t) and sinh(t), the hyperbolic cosine and sine (look them up on Wikipedia!), so solutions to the above are parameterized by

t ↦ (γ, γ′) = (cosh(t), sinh(t)).

Sinh and cosh have the special property that they are each the derivative of the other, so we have the solution

γ(t) = cosh(t + d).

These curves are called catenary curves, and they're the way that strings hang from posts under the weight of gravity. If we had specified γ(-a) = γ(a) = β, then (for the appropriate constant c) we would have

γ = c cosh(t/c).
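A quick check of mine that Euler's first integral really is constant along the catenary γ(t) = c·cosh(t/c): the quantity γ/√(1 + γ′²) should print c every time.

import math

c = 0.8
for t in (-1.0, -0.3, 0.0, 0.5, 1.2):
    g = c * math.cosh(t / c)           # gamma(t)
    dg = math.sinh(t / c)              # gamma'(t)
    print(g / math.sqrt(1 + dg * dg))  # prints c = 0.8 every time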

We return to a point in our discussion from last lecture. We had a map F: V → R and its differential, a linear map dF_γ: V → R. We glossed over the definition of this differential in the infinite-dimensional vector space V. Recall our old definition of the derivative for finite-dimensional vector spaces: for F: V → W, we defined dF_a as the unique linear map satisfying

lim_{h→0} [F(a+h) - F(a) - dF_a(h)] / ||h|| = 0.

But this definition requires a norm on V and a norm on W. Of course, in our case, we already have a norm on W = R. And the norm that we're going to put on V is the norm

||γ|| = max_{t∈[a,b]} |γ(t)| + max_{t∈[a,b]} |γ′(t)|.

A norm on V and W gives us an operator norm on dF_a. But not all linear functions on infinite-dimensional vector spaces are continuous. For instance, consider the map on the space of polynomials that sends xⁿ to n. To be continued next class!

Lecture #32
We have dealt a lot with linear operators between finite-dimensional vector spaces, T: V → W. But what about infinite-dimensional vector spaces V, W, such as the vector space of polynomials

P(F) = {aₙxⁿ + ... + a₀ : aᵢ ∈ F},

where the field is, for example, F = R or C?

First we insist that V and W have norms ||·||: V → R. Then we insist that T: V → W be continuous with respect to these norms. On Rⁿ,

||v||₁ = Σ|vᵢ|, ||v||₂ = √⟨v, v⟩, ||v||_∞ = max|vᵢ|.

In all of these cases, the norms tend to zero simultaneously. In an infinite-dimensional vector space, however, this may not be the case. For example, on C[a, b] we have

||γ||₁ = ∫ₐᵇ |γ|, ||γ||₂ = (∫ₐᵇ γ²)^{1/2}, ||γ||_∞ = max_{a≤t≤b} |γ(t)|.

These are not equivalent norms. Consider a function γ whose graph is a thin triangular spike of height 1. As the base of this triangle goes to zero (fixing the height of the triangle), the 1- and 2-norms go to zero, while the ∞-norm remains 1. Therefore, the norms are not equivalent.
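A numeric sketch of mine of the spike example on [0, 1]: as the support width w shrinks, the 1- and 2-norms vanish while the sup norm stays 1.

import math

def spike_norms(w, n=20000):
    # gamma(t) = max(0, 1 - t/w): a height-1 triangular spike supported on [0, w]
    h = 1.0 / n
    vals = [max(0.0, 1.0 - (k * h) / w) for k in range(n + 1)]
    one = h * sum(vals)
    two = math.sqrt(h * sum(v * v for v in vals))
    return one, two, max(vals)

for w in (0.5, 0.05, 0.005):
    print(w, spike_norms(w))   # 1- and 2-norms -> 0, sup norm stays 1.0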

Theorem: With respect to the sup norm, the vector space V = C[a, b] is complete.

Proof: Suppose γₙ is Cauchy. This means that

∀ε, ∃N: n, m > N ⟹ ||γₙ - γₘ||₀ < ε, i.e. max_{a≤x≤b} |γₙ(x) - γₘ(x)| < ε.

This means that for a fixed x ∈ [a, b], the sequence of real numbers {γₙ(x)} is Cauchy, so there is a real number γ(x) to which γₙ(x) → γ(x). Let's show that γₙ → γ in the sup norm. First fix ε > 0. Then there exists N such that ||γₙ - γₘ|| < ε for n, m > N. Now for n > N,

|γₙ(x) - γ(x)| = |γₙ(x) - γₘ(x) + γₘ(x) - γ(x)| ≤ |γₙ(x) - γₘ(x)| + |γₘ(x) - γ(x)|,

and letting m → ∞ gives |γₙ(x) - γ(x)| ≤ ε for every x. Hence ||γₙ - γ||₀ ≤ ε, so γₙ → γ. But γ is so far just defined as a function [a, b] → R. We need to show that γ is continuous on [a, b] (i.e. that γ is indeed in C[a, b]). To do this, we need to show that a uniformly convergent sequence of continuous functions is continuous. To show that γ is continuous at x, choose ε > 0, then choose N so that ||γₙ - γ||₀ < ε/3, and use the triangle inequality through γ_N near x. (Proof erased before I could copy it.)
on two complete normed vector spaces is dierentiable at

aV

if there is a continuous linear map

T :V W

48

such that

F (a + h) F (a) T (h) =0W h0 ||h|| lim T = dFa

Then

is unique.

Theorem:

The following are equivalent for a linear map

T :V W

of normed vector spaces

1.

a real number

M : ||T v || M ||v ||

for all

vV

2. T is a continuous map 3.

is continuous at

v=0

Proof:
1 = 2
By the continuity at

v.

choose

2 = 3
because this is just a special case of

||v w|| < = ||T v T w|| = ||T (v w)|| M

1 = 2
But remember that

3 = 1
consider

= 1. > 0

such that

||h|| = ||h 0|| = ||T h T (o)|| 1. ||T v || = ||v || T v ||v || ||v ||

||T h T 0|| =

||T h||

by linearity. Now for

v=0

for

1 works as a bound. We also have the following theorems

with

||h|| = .
If

Therefore,

M=

We also have the following theorems:

Theorem: If W is complete, then so is BL(V, W), the space of bounded (continuous) linear maps. In particular, since R is complete, we should call BL(V, R) = V*, the continuous dual space. We also have a natural inclusion V ↪ (V*)*. These are the topics of functional analysis, a subject developed by Hilbert and Banach.