You are on page 1of 49

Math 25b

Professor Benedict Gross Spring 2013


Irineo Cabreros, Daniel Ranard, Marina Lehner

These notes are in draft form and may be incomplete.

Lecture #1
Consider two nite-dimensional inner-product spaces continuous at

and

over

R.

A function

f : V W

is said to be

aV

if

h0
in

lim f (a + h) = f (a)

W. f

Equivalently:

h0
If is continuous at

lim f (a + h) f (a) = 0

and

T :V W

is any linear transformation then:

h0
This follows because

lim f (a + h) f (a) T (h) = 0 W T (0) = 0.


such that:

is continuous and

Denition: f

is

dierentiable at a if there is a linear map T


h0

lim a

f (a + h) f (a) T (h) ||h|| a.

=0 T
exists, it is unique is called the total

It is easy to see that dierentiability at derivative of

implies continuity at

If such a

at

and is denoted:

T dfa : V W
To show that this linear map is unique, suppose both tiability. We need to show that

and

are linear maps satisfying the denition of dieren-

v V , T (v ) = S (v ).

Consider

h = tv

for some small

t R.

Then:

||T (h) S (h)|| ||T (h) f (a + h) + f (a)|| + ||f (a + h) f (a) S (h)||


where we have used the triangle inequality (to see this, note that the LHS can be rewritten as

||T (h) f (a + h) +

f (a) S (h) + f (a + h) f (a) S (h)||).


above inequality satises:

We then see that

>0

then

. If

||h|| <

we have that the RHS of the

RHS < 2 |t|||v ||


Then we have that arbitrarily small so

> 0 ,|t|

suciently small such that

||T (v ) S (v )|| < 2 ||v ||.

But we can take

to be

= T (v ) S (v ) = 0W = T (v ) = S (v )
Now that we have proven uniqueness of the derivative, we then looked at some examples of total derivatives. Special Case: 1. 2.

W =R f :V R

3. We have a basis Then:

v1 , ..., vn

of

f (v )

= f (x1 v1 + ... + xn vn ) = f (x1 , x2 , ...xn )

The second line simply states that, given a basis of

Denition:

We dene the

directional derivative
Dv f (a) = lim

V,

we can then think of in the direction

at

f as a aV

function of to be

variables.

t0

f (a + tv ) f (a) t v
and

Proposition: Proof:

if

dfa

exists, then

Dv f (a)

exists for all

Dv f (a) = dfa (v ).

Notice that the directional derivative is a real number in this special case. It is impotsant to note that there are cases in which the directional derivative always exists, but the total derivative does not.

t0

lim

f (a + tv ) f (a) dfa (tv ) = 0 ||tv || tdfa (tv ) f (a + tv ) f (a) = lim = lim t0 |t|||v || t0 |t|||v || = lim Dv f (a) = dfa (v )
t0

The rst line is simply a statement of the total derivative existing. In the second line, we pull out derivative by homogeneity. In the last line, we invoke the denition of the total derivative.

from the total

Denition: Partial derivatives are a special case of directional deriviatives denoted by:
Dvi f f x1

Example:
f (x1 , x2 ) = x1 sin x2 + ex1 x2 f = sin x2 + x2 ex1 x2 x1

Gross Quote:

Once you take one derivative, you will want to take more! If you give a mouse a cookie it will ask

for a glass of milk. We can write subsequent partials like this:

2f x1 x1 2f x2 x1 2f x1 x2 2f x2 x2
general result:

x1 x2 = x2 2e

= =

cos x2 + ex1 x2 + x1 x2 ex1 x2 cos x2 + ex1 x2 + x2 x1 ex1 x2

x1 x2 = x1 cos x2 + x2 1e

There are 4 total functions. Notice the miracle! The second and third equations are the same. This suggests the

Aij =
To recap, we have:

2f =a xi xj

symmetric nxn matrix

f :V dfa : V

R R

is dierentiable at is a linear map

Since

is an inner product space there is a unique vector

such that:

dfa (v ) = v, fa
If

(v1 , ..., vn )is

an orthonormal bases then:

f a =
where

f (a) f (a) v1 + ... + vn x1 xn

is called the gradient.

Lecture #2
Remember from last class that if we have a function

f :V W
we call

f dierentiable

at

if there is a linear map

dfa : V W
such that

f (a + h)
in the limit that

f (a) + dfa (h)


where

approaches 0.

We dened the

directional derivative in the direction of v V


Dv f (a) = lim
t0

v=0

as

f (a + tv ) f (a) = dfa (v ) t dfa v


for all

This denition clearly necessitates the existence of case of the directional derivative, it is linear in

v.

Since the directional derivative is just a special

v: cv =
where c

= Dcv f (a)
is a basis of

cDv f (a) Dv
for

Because of this homogeneity, we can always scale the directional derivative and take

||v || = 1.

If

v1, v2 ..., vn

V,

we then we can write

v=

ci v i

and exploit the linearity of

Dv

to write:

Dv f (a) =
Now to write the matrix of the linear operator 1. Basis 2. Basis A basis of

ci Dvi f (a)

dfa we

need:

{v1 , ..., vn }

of

{w1 , ..., wm }

of W

gives us a decomposition of

into

functions from

to

R:

f (v ) = f1 (v )w1 + ... + fm (v )wm


For each

fi

and each

vj

we have

Dvj f (a) =
Now we can write the matrix

fi (a + tvi ) fi (a) fi = lim t 0 xj t dfa


with respect to the basis

of the linear operator

{vi }

as:

Aij =

fi xj

As a concrete example, we can consider the following map:

f : R3 R2
3

dened by:

f1 (x1 , x2 , x3 ) f2 (x1 , x2 , x3 )
consider this function at the point deriatives, we have:

= x1 x2 + x2 3 = sin(x1 ) + ex2 + x3 f (a) = (4, e + 2). 1 1 0 e 4 1 a:


Computing all of the partial

a = (0, 1, 2).

Plugging in, we get

dfa =

x2 x1 cos(x1 ) ex2

2x3 1

= f

Now we can use the total derivativeto approximate the function

at the point

f (a + h)
Eqplicitely, we have:

f (a) + dfa f
is a function between spaces of dimension 3 and 2).

The above notation is shorthand (we have to remember that

f (a + h)
Calculating

(4, e + 2) + (h1 + 4h3 , h1 + eh2 + h3 ) : R2 R f (x, y ) =


2x2 y x4 +y 2 .
First we note that

As another exmaple, consider the function:f

f (0, 0) = 0.

Dv f (0, 0),

we have

f (tv ) f (0, 0) 2a2 bt3 2a2 bt2 2a2 b 2a2 = lim 4 4 /t = = lim = t0 t0 t a + t2 b2 t0 b2 t t4 a4 + t2 b2 b lim
we note that the that since

Dv

is not linear in

(a, b). V.
Remember from 25a

Now we will look at gradients. Remember that a gradient requires an inner product on

w W such that dfa (v ) = v, w . Denition: We dene this unique w as f (a) caled the gradient of f (a) . f Let e1 , ..., en be an orthonormal basis. Let xi = directional derivative w.r.t ei . dfa
is linear in

v,

there exists a unique

Then

f (a) =
so

f (a) f (a) f (a) ei = ( , ..., ) xi x1 xn f (a) = ei , f (a) xi

dfa (ei ) =
direction where the function increases most rapidly. To see this, note that:

What is the the signicance of the gradient? We can see that the gradient vector is the vector that moves in the

f (a), v

= =

||v || ||f (a)|| cos() ||f (a)|| cos()


and that if

From the rst to second line, we have taken advantage of the assumption that if

has a local max or min at

a,

then

f (a) = 0 V

v is normalized. We will claim that f (a) = 0 then the level curves are to the

gradient vector.

Fross Quote:

The closest nuclear reactor to Boston is in Seabrook, NJ. The Seabrook nuclear reactor has a How should we make our ee most eective? The normal idiot will draw a line from the

meltdown. Suppose, that you have access to precise data on the level curves of the nuclear waste emmitted from the nuclear reactor. nuclear reactor to his/her present position and run directly away from the nuclear reactor along this line. With your knowledge of the theory of the gradient, you know that the best way to run is in general not in this direction. You should ee along the gradient vector!

Lecture #3
Consider the function that if the

is dierntiable at

f : C R. As a then

always,

is dened on an inner-product spaces. We discussed last lecture

f (a), v dfa (v ) = 0
We also mentioned some of the properties of the gradient:

1. If 2. If

has a local max or min at

then

f (a) = 0

in

V.

f (a) = 0,

it points in the direction of the maximum increase.

We can see both of these properties because:

dfa (v ) = f (a), v = ||f (a)|| cos


and this function clearly equals 0 when Level curves of

at

f (a)

are

to

= 2. f (a). The

level curves are dened as:

{x V : f (x) = f (a)}
Consider the example where

2 f (x1 , ..., xn ) = x2 1 + ... + x2 .

For the the level curve at

a = (1, 0, 0, ..., 0),

we have

f (a) = 1

and the level curves dene a hypersphere or radius 1 and dimension

n 1.

We see that

f (x) (2x1 , ..., 2xn )|a = (2, 0, 0, ..., 0)


Now we can nd the tangent hyper-plane:

{v V : (v a) f (a)} = {v V : (v1 a1 , ..., vn an ) (


This is just:

f f , ..., )} x1 xn

{v = (v1 , ..., vn ) :
i=1
Now consider

(vi ai )

f x1

= 0}
a

S = graph
We will dene the graph of

of

f
dened by:

to be the zero set of a function

g :V RR

g (x1 , ..., xn , y ) = f (x1 , ..., xn ) y


What would be the tangent hyperplane at the point on

s S = (a, f (a))?

First we calculate:

g (a, f (a)) = (f (a), 1) V R


The tangent hyper-plane consists of the vectors

(v, y ) :

n f i=1 xi

(vi ai ) = (y f (a)).
a

In one dimension, this is,

f (a)(x a) = y

= y f (a) = f (a) + f (a)(x a)

This is what we intuitively think of as the tangent line to the point on a graph. Now we will talk about the most useful rule in analysis: the functions

Chain Rule.

Consider the composition of two

and

g: V f W g U
is dierentiable at

If

f is dierentiable at a V and g d(g f ) = dgf (a) dfa . In words:

f (a) W ,

then

f g

is dierentiable at

a V

and

Proposition: Proof:
If

the derivative of the composition of two functions is the composition of the linear derivatives

(The Chain Rule).

is dierentiable at

a,

we have

||h||0
Similarly, if

lim (h) = f (a),

f (a + h) f (a) dfa (h) =0W ||h||

is dierentiable at

then we have:

||h||0

lim (h) =

g (a + h) g (a) dga (h) =0U ||h||


5

Now lets dene

k = f (a + h) f (a)
We know that

approaches 0 in

as

approaches 0 by the continuity of

at

a.

Now we will have to do a lot of

algebraic manipulation. We will have to estimate the dierence:

g (f (a + h)) g (f (a))
We can rewrite this as:

g (f (a) + f (a + h) f (a)) g (f (a))

= g (f (a) + k ) g (f (a)) = dgf (a) (k ) + ||h|| (k ) = dgf (a) (f (a + h) f (a)) + ||f (a + h) f (a)|| (k ) = dgf (a) (dfa (h) + (h)||h||) + ||dfa (h) + ||h||(h)|| (k ) = dgf (a) dfa (h) + ||h||dgf (a) ((h)) + ||h|| ||dfa h ||h|| + (h)|| (k )

Now we can write:

g f (a + h) g f (a) dgf (a) dgf (a) dfa (h) = dgf (a) ((h)) + ||dfa ||h||
Now we wave out hands. We note that 0 to 0. Now we look at the

h ||h||

+ (h)|| (k )
must take

dgf (a) ((h)) is a linear operator and is continuous. Therefore, it (f (a + h) f (a). We note that this will also tend to zero. Then we have LHS = 0 + 0

and we are done. Right? Wrong! We Just because

goes to zero, doesn't mean that the product of the two terms

goes to zero. Therefore, we must show that the term

||dfa

h ||h||

+ (h)||

is bounded. To nish the proof, we have to show that:

dfa
remains bounded as

h ||h||

h 0.

We will note that the vector in brackets has the property that its norm is 1 (it is on

the unit sphere). Therefore, as

h0

the vector wanders around the unit sphere. The unit sphere is

compact!

We use the result from last term that a continuous mapping of a compact set is compact and the fact that compact sets are closed and bounded to conclude that

dfa

h ||h||

is bounded as required.

Lecture #4 (Emily Riehl)


Consider a functions between nite dimensional inner-product spaces:

f :V W
We will specify coordinates for our two spaces (this is equivalent to choosing a basis). We call if

dierentiable at

f (a + h) f (a)
where

T (h)

is a linear function. There is not always such a function, but when there is, it is unique. We denote this

unique linear function:

dfa T : V W
The natural question to ask is: what information about the function does this linear transformation contain? We can think of v at

V as dening a direction in the domain. dfa (v ) = lim

We can compute the value of this linear transformation

v:
t0

f (a + tv ) f (a) Dv f (a) t
6

This is called the

directional derivative
f

and it describes how the function changes in the direction

v.

A special

case of the directional derivative is when we choose an orthonormal basis for directional derivative of in the direction of any of these basis vectors as

V , {e1 , ..., en }.

Then we can write the

Dei =
Now if we additionally choose a basis we can represent an an arbitrary

f xi
and

{w1 , ..., wm } for W , we have a natural isomorphism bewteen Rm vector in W as w (c1 , ..., cm )

since

We can now think of the function

f : V Rm

as dened by

coordinate functions

f1 : V f2 : V f3 : V


. . .

R R R

fm: : V
where if

R dfa
as

f (v ) = w = c1 w1 + c2 w2 + ... + cm wm

then

fi (v ) = ci . fi xj
The matrix of this function is

Now, we can nally write the matrix of the total derivative

Aij =

Example: f : R2 R3

dened by

(x, y ) (3 cos x sin y, 4 sin x sin y, 5 cos y ). 3 sin x sin y 3 cos x cos y fi = 4 cos x sin y 4 sin x cos y xj 0 5 sin y f (a) = (0, 4, 0) fi xj
and the matrix is:

Now for the point

a = ( 2 , 2 ),

we have

3 = 0 0

0 0 5

Now what does this tell us? We can think of the collumns of this matrix as the tangent vectors in the basis directions on the graph of

Chain Rule:

f.
Suppose we have two functions:

f :V W g:W U
Id

gf

is dierentiable at

if

is dierentiable at

and

is dierentiable at

f (a)

and the the total derivative is

given by:

d(g f )|a = dgf (a) dfa


The chain rule can be thought of as the

functoriality

of the derivative.

Now we will look at the chain rule from the perspective of matrix multiplication For spcicity, suppose we have

f : R2 R3 g : R3 R2
In the matrix view, the statement of the chain rule is just a bunch equalities of the form:

hi gi f1 gi f2 gi f3 = + + = xj y1 xj yj x2 y3 xj
7

m=3

i=1

gi fi yi xj

A special case of the chain rule is when the composition of functions results in a map from

R R:

R f Rn g R
The chain rule in this case tells us that

(g f ) (t) = g (f (t)) f (t)


As an example, consider the following specic functions:

f : t (cos t, sin t, t) g : (x, y, z ) x2 + y 2 + z 2


As an excercize, you should try to compute the total derivative of this function. Note that the major advantage of the chain rule is that it reduces multivariable calculus to single variable calculus.

Theorem: U Rn

is open and connected and

f : U Rn

is dierentiable on

then

derivative) is the zero matrix on a function

if and only if

is constant. We dene connected as

df 0 (that a, b U then

is, the total there exists

g : R U that is dierentiable such that g (0) = a and g (1) = b. Proof: If f is constant, denition of df makes it clear that this operator is zero (this proof is straightforward, but ommitted). Now we assume df 0 on U and we need to show f is constant. For now, it is okay to assume f : U R. We want to show that f (a) = f (b)a, b U . By our hypothesis, there exists a dierentible function g : R U and f : U R. We remember by our denition of connecteness that 0 a and 1 b. Then we can
apply the chain rule:

(f g ) (t) = f (g (t) g (t) = 0


Then

fg : R R
With

(f g ) = 0

is constant. Then

f (g (0)) = f (g (1))

and

f (a) = f (b)

as required.

Lecture #5
Last time we talked about the to results in

Chain Rule.

The Chain Rule allows us to transfer results from

1 variable g : R R

variables. Consider a vector space

with dimension

n.

Now consider a function

:RV
and another function

f :V R
Now we can think of

as the composition of

and

(i.e.

g (a) = (a), f ((a))

. Now we have

dga = dfa da
Recall some properties of single variable calculus: 1.

Theorem: g

has a local max or min at

aR

then

g (a) = 0

Proof:
g (a) = lim
We note that when

t0

g (a + t) g (a) t t is positive.
We know that

t is negative,

the derivative is positive and visa versa for when

the derivative must be the same whether we approach

from the left or right, and the only number such that

a = a
is

a=0

2.

Theorem: c [a, b] such that f (a) = Proof: Consider the special case:
Then there exists a

f (b)f (a) (Mean Value Theorem) b a

f (a) = f (b) = 0 c with f (c) = 0.


Note that

is continuous on

[a, b].

the closed interval is clearly compact

so it has a max or a min. Lets dene a function

g (x) = f (x) f (a)

f (a) f (b) (x a) ba

which can be thought of as a modication by an ane linear function. Also require that

g (a) = g (b) = 0
Now

c : g (c) = 0

so we have

0 = g (c) = f (c)
3.

f (b) f (a) ba

Theorem: If f = 0 on an interval, then f = c on that interval. Proof: Take a, b D and note that
0 = f (b) f (a) = (b a)f (a)

4.

Theorem: If f '=g' on D = an interval, then f = g + c on D Proof: Apply the result from 3) to f g ). So f is determined completely by f
Now consider the dierential equation solution is

and

f (t0 ).

f = f.

We can then claim from the unicity theorem that the only

f (x) = Cex g (x) =

with

C = f (o). g (x) = 0
by the quotient rule which implies

Proof:
f (a) V
1.

Let

f (x) ex then

g = c.
and

Now lets start trying to generalize these results to

dimensions.

Consider

f : V R

dfa : V R

with

Theorem: Proof:

If

has a local max or min at

a,

then

dfa = 0

in

L(V, R)

dfa = lim
Then we can use the same exact argument approaches zero from either side. 2.

f (a + tv ) f (a) t0 t for all v about the derivative D.


The statement that

taking the same value when

Theorem:
(connecting

Let

be dened on a convex region

is convex is that if

a line connecting them is in

D.

The Mean Value Theorem for multiple dimensions is:

a, b D, then c = (s) on this line

and

b)

where

f (b) f (a) = f (c), b a


where

(t) = a + t(b a)

(i.e. a map parametrized such that

(0) = a

and

(1) = b.

Proof:

Consider functions

: [0, 1] V f :V R
such that

g =f

Now note that

g (1) g (0)
and

= f (b) = f (a)

f (b) f (a) = g (1) g (0) = g (s)


for some

s [0, 1].

Thisfollows from the mean value theorem in one variable. Then we have:

g (s) = f ((s)), (s)


Now we identify

c = (s)

and

(s) = (b a)

as

is simple and we can actually compute its derivaive.

3.

Theorem:
f =c
on

Suppose

f :DR

with

DV

where

is as before path connected and

f = 0 D ,
and

then

D. a, b D
and parametrize a path

Proof:
Now let

Then Let

[0, 1]

in the usual way such that

(0) = a

(1) = b.

g : f.

Now we can use the chain rule to state that

g (t) = f ((t)), (t) = 0


We note that

f ((t)) = 0 V .

Then we note that

g =c

by the single variable equivalent which implies

f (a) = f (b).
Next time we will prove that partial derivatives commute (in a at geometry).

Lecture #6
Today we will learn about the commuting property of second partials:

2f 2f = xi xj xj xi
Note that this is true only on at, or Euclidean space. These will typically be the spaces we are most interested in though for this class. We rst need to recall the mean value theorem holds onlyat dierentiable points that set. The statemement of the mean value theorem is:

mean value theorem. Remember u U where U is a convex set.


a, b U
there exists

that in

dimensions, the

Remember that a convex

set is one for which any two points within the set can be connected by a straight line of points all contained within

on the line between

and

such that

f (b) f (a) = f (c), b a = f (c)(b a)


Now to make some progress on the problem of second partials we will set

b=a+h

so that

f (a + h) f (a) = f (c), h
Lets parametrize the path from

to

with a function:

(t) = a + th
Clearly then,

: [0, 1] line
Lets say that

from

ato b

c = a + h
Then

f (a + h) f (a)

= =

f (a + h), h f (c), h

= dfc (h) = Dh f (c)


The quantity f (a + h) f (a) is what we would call a rst dierence. Second dierences would be of the f (a + h + k ) f (+h) f (a + k ) + f (a). We will now prove the following equality of second dierences: form

Theorem:

f (a + h + k ) f (a + h) f (a + k ) + f (a) = Dk (Dh f (a + h + k ))
where, as before, 1.

, [0, 1].

We will prove this equality under the conditions that :

is twice dierentiable

2. the parallelpiped is contained in

u
10

Proof: Let g (x) = f (x + k) f (x) then we have the equality


g (a + h) g (a) = f (a + h + k ) f (a + h) f (a + k ) + f (a) = Dh g (a + h) = Dh f (x + k )|a+h Dh f (x)|a+h = Dh f (a + k + h) Dh f (a + h)
Between the rst and second lines, we have used the mean value theorem and between the second and third lines, we have used the chain rule. Since value theorem to nd that:

Dh f (a + h)

and

Dh f (a + k + h)

are both dierentiable, we can apply the mean

f (a + h + k ) f (a + h) f (a + k ) + f (a) = Dk (Dh f (a + h + k ))
just like we originally set out to do Now we note that we can swap

and

in the above argument and obtain:

f (a + k + h) f (a + k ) f (a + h) + f (a) = Dh (Dk f (a + h + k ))
Now we can prove that partial derivatives commute.

Theorem:

If

v, x V

and

Dv f, Dw f, Dv Dw f, Dw Dv

all exist and are continuous on an open

U V

then

Dv Dw f = Dw Dv f

Proof: Let h = tu and k = sw. Take t and s to be small enough so that the parallelpped with vertices (a, a + h, a + k, a + h + k ) is contained in U. Remember that we can always do this because U is an open subset. Now we note that Dh = tDv and Dk = sD w and then we see
stDv (Dw f (a + h + k )) = stDw (Dv f (a + h + k )
Now we can divide both sides by

st

and take the limit as both

and

go to zero. Now each side goes to

Dv Dw f (a) = Dw Dv f (a)
as required. Now consider

f :V R

a twice dierentiable function with continuous second derivatives. Then we can think

of the second derivative as a mapCon

d2 fa : V V R
dened by

d2 fa (v, w) = Dv Dw f (a) = Dw Dv fa

is a symmetric bilinear map. Now, what is the matrix of this linear

map with respect to some basis. This is just

Aij = T ei , ej =

2f xi xj

It is obvious from the commuting of partials that this matrix is symmetric. Now what is the easiest way to see that partials must commute? Consider the function

f : R2 R
dened by

f : (x, y ) xa y b .

Now we know how to take the second derivatives explicitely:

2f xy 2f yx

= =

abxa1 y b1 abxa1 y b1

Clearly, these monomials have commuting partials, so if you believe that you can build up general functions out of these monomials, then you should believe the general result. Now what is the signicance of second partials? We will need this result for Taylor Series (for next lecture).

11

Lecture #7
We can write Taylor's Approximation as

1 f (a + h) = f (a) + dfa (h) + d2 fa (h, h) + O(||h3 ||) 2


Now if we choose our basis for

to be

(e1 , en , ..., en ),

we can consider how a multi-dimensional function changes

with respect to perturbations along more than one direction:

f (a1 + h1 , ..., an + hn ) = f (a1 , ..., an ) +


i
Where we remember that is

f 1 hi + xi a 2

i,j

2f xi xj

hi hj + ...
a

a=

ai ei

and

h=

hi ei .

We also note that the second term in the above expression

f hi = dfa (h) xi a

You can see this by the linearity of the total derivative:

dfa (h) = dfa (


i

hi ei ) =
i

dfa (ei )hi =


i

Dei f (a)hi =
i

f hi xi a

We also note that all of the terms in the third term of the taylor expansion are symmetric (by the commuting property of partial derivatives that we showed last lecture). Lets write out explicitly the form of this term for

n = 2:
i,j

2f 2f 2f 2f + = + xi xj x2 x1 x2 x2 1 2

Now if we choose an orthonormal basis, then

2f xi xj
expansion for

= diag(1 , 2 , ..., n )
i,j

where the notation diag indicates a diagonal matrix with

across the diagonal. Now we can rewrite the taylor

dimensions as

f (a + h) = f (a) +
i
Now, if all maximum.

1 f hi + xi a 2

3 i h2 i + O (||h|| ) i

f xi

hi = 0
a

and all

i 0

then we have a minimum at

a.

If we have that

then we have a

We start with considering the critical points on manifolds in

V.

Consider a function

f :V R
that has a critical point at or

a. Remember that critical points are dened as points in M V such that f (a) = 0 dfa = 0. Loosely speaking, we will dene M as a k dimensional manifold if it has a well-dened tangent plane Ta M V which is a k dimensional. A simle example of a manifold is a circle. The circle lives in R2 , but at every
point in the circle, there is a well-dened tangent line. Since lines are 1 dimensional, the circle is a 1 dimensional manifold. A non-example of a manifold is a V. There is a well-dened tangent line to all points on the V except at the point on its base. We can think of constructing the tangent at a point procedure. First paramatrize a bunch of curves of these parametrizations at the point

(t) : R M

such that

a on a manifold using the following (0) = a. Now take the derivative of each g : V R.
This means that

a.

The set of lines which this set of vectors denes forms the tangent plane,

Ta M .

Now suppose that

has dimension

n1

and is the level set of a function

M = {v : g (v ) = constant}
For example, we can think of the circle (often denoted

S1)

as

S 1 = {x, y : x2 + y 2 = 1}
12

Now to rene our denition of a manifold beyond sets that have tangent planes, we will also require that dierentiable and that orthogonal

g is g (a) = 0 for all a M . If these two requirements are met, we will also see that Ta M is the complement of the vector g (a). Remember that the orthogonal complement of a set W is dened as W = v V : v, w = 0
for all

wW

Also remeber the properties of the orthogonal complement:

W W V (W )
Now we will show that

= = =

0 W W W

Theorem: if we take a dierentiable parametrization : ( , ) M with (0) = a as usual, then (0) Ta M . Proof: consider the composite function g (t) : ( , ) R. We know that this function is constant (by how
g)
so its derivative is zero. Now we use the chain rule to nd:

g (a) = Ta M .

First recall that

g = constant.

we dened

(g ) (t) = g ((t)), (t) = 0


Since

(0) = a

then we have

g (a), (0) = 0
This is the statement that

g (a)

and

(0)

are orthogonal.

Lecture #8
Consider a function

f :V R

that has a local max or min at

aV.

The

dfa = a

in

L(V, R)

because

dfa (v ) = Dv f (a) = lim

t0

f (a + tv ) f (a) =0 t

M V is a manifold if for any point a M, then Ta M V . Remember that this is a k dimensional subspace called the tangent space to M at a spanned by vectors (0)where ( , ) M where 0 a. A generalization of dfa = 0 on V for a local max or min: f has a local max or min on M at a, then dfa = 0 when restricted to Ta M V . Proof: Take : ( , ) M Consider g f .This has a local max or min at t = 0. The Chain Rule then gives us 0 = g (0) = df(0) (0) Now there are two ways to dene M V . One way to dene it is as a varying collection of k dimensional subspaces Ta M V . We could equivalently dene it as a k dimensional subspace W V where
since this has both positive and negative values. Remember that 1. 2.

W =

range of the linear map

V V which V V

is 1-1 which is onto.

W =]textnull

space of the linear map

Method 1: We take the function

df is injective everywhere, then let M = f (Rk ) V . Example: : R V a parametrized curve such that 0 a provided that = 0. Ta M = R (0) Method 2: Take g : V R. Where g is a component-wise function (g1 , g2 , ...gn ). Then M = {v V : g (v ) = 0 R} = a level set of g Now Ta M = null space of dga : V Rnk which is onto for all a M We will now work out this method in some special cases. For M of dimension n 1 Then g : V R. Then M = {v : g (b) = 0} which is a hypersurface. We need dga : V R to be non zero for all a M. Then Ta M = null space of dga = orthogonal compliment of g (a). Now we know that if f is extremal at a M , then dfa |Ta M = 0 This states that dfa (v ) = v, f (a) = 0. So if v g (a) then v f (a) = f (a) = g (a). Now what does this look like for several constraints? Lets try it for l constraints. So M has dimension k = n l l . By method 2, we need a map g : V R with v (g1 (v ), g2 (v ), ..., gl (v )). Then M = {v : g (v ) = 0} this l has dimension k = n l. Now Ta M = null space of dga : V R . We need that for all a M the lvectors (g1 (a), g2 (a), ..., gl (a)) to be linear independent (they form a basis).
such that if

f : Rk V

13

If the note Now

gi (a) are linearly independent, they span Rl and get W V of dimension k = n l as the kernal. We that W = Ta M which is the orthogonal complement of {g1 , ..., gl } = W as matrix of dga has these rows. we have that if f is extremal at a, then dfa |W = 0 f (a)
is orthogonal to

W f (a)is

in the subspace

1 , ..., l : f (a) = 1 g1 (a) + 2 g2 (a) + ... + l gl (a) 3 Maximize the function f : R R dened by (x, y, z ) x on the intersection of the plane z = 1 and 2 2 2 3 2 2 2 2 the sphere x + y + z = 4. Now we have the function g : R R dened by (x, y, z ) (z 1, x + y + z r ) = (g1 , g2 ) We note that g1 (x, y, z ) = (0, 0, 1) and g2 (x, y, z ) = z (x, y, z ). We see that they are linearly independent on M . So , such that
So there exist

Example:

f (x, y, z ) = (1, 0, 0) = g1 + g2 = (0, 0, ) + 2(x, y, z )


Then we see that

2x 2y 2z +

= = =

0 0 0

and we can solve this system of equations, along with the constraints, for the unknowns.

Lecture #9
There are 2 ways of getting a object where 1.

a M

then

k dimensional manifold M V n-dimensional real vector space. A a + Ta M tangent plane to M . The rst way of getting a manifold is:
to one.

monifold is an

f : Rk V where f is one Ta M = dfa (Rk ) of f (x) = a.

is the image of this map.

Then

dfx : Rk V

has nullspace 0,

2.

M V is the zero set of (n k ) constraints g = (g1 , g2 , ..., gnk ) : V Rnk . Then M = {v : g (v ) = 0 Rnk } = {v : gi (v ) = 0i}. We must have tha dga : V Rnk is onto. and then that Ta M = ker(dga ) where dga (v ) = ( g1 (a), v , g2 (a), v , ... gnk (a), v ). Then ker(dga ) = {all vectors v which are orthogonal to the n k vecto In other words, if W =span of gi (a), then v ker(dga ) vW . So M is a manifold gi(a) are linearly independent in V .
Then

Now lets apply this to Lagrange Multipliers. We want to minimize/maximize by is

g = (g1 , ..., gnk ) = 0. If f is extremal at a, then dfa vanishes on Ta M . orhtogonal to W = Ta M , so f (a) W as the gi (a) from a basis of

f : V R on a manifold M dened dfa (v ) = f (a), v . So nablaf(a)

this space. Remember that we have the

property

f (a) = 1 g1 (a) + ... + nk gnk


Sometimes it wont look like we can solve a problem using Lagrange Multipliers. Often we can use the following technique. If we have

M V
where

g1 = g2 = ... = gl = 0. h1 = ... = hm = 0

And

N V
where . We are asked to nd the point

mM

and

nN

which are the closest in

V.

To do this

we consider the cross product

M N V V Rl+m
Where

M N = {(x, y ) : G(x, y ) = 0}.

Where the map

G
n

takes

(x, y ) (g1 (x), ..., gl (x), h1 (y ), ...hm (y )).

We can

now convert this to a Lagrange problem by maximizing/minimizing the function

f (x, y ) =
i=1

(xi yi )2

14

Now we would have the following sustem of equations:

g1 (x, y )

=
. . .

(g1 (x), 0)

gl (x, y )
And similarly for

(gl (x), 0)

h.

Now we need to nd

and

using Lagrange multipliers by solving

xy

= =

i gi (x)
i=1 m

j hj (y )
y =1

This procedure has a nice geometric interpretation. It means that at the minimal two points is perpendicular to both tangent planes.

x and y , the line connecting these

Example:

2 2 x2 1 + x2 + x + x3 1

h =
Then

y1 + y1 + y3 1

g = (2x1 , 2x2 , 2x3 )

and

h = (1, 1, 1).

Now we have the system of equations

x1 y1 x2 y2 x3 y3
Now the punchline is that

= = =

x1 = x2 = x3 =

x1 = x2 = x3 =
This implies that

3 x2 1 =1
which implies that

xi =

1 .We then see that 3

y1 = y2 = y3 =
1 Now we see that the closest is 1 and the furthests is . 3 3 Now lets talk about Taylor Expansions. IT starts like this

2 3

1 1 1 f (a + h) f (a) + f (a)h + f (a)h + f (a)h + ... + f (k) (a)h + Rk (h) 2 6 k!


where

is the remainder and the terms to the left are called the

Taylor olynomial of degree k in h.


P (h)
of of degree

Now why

is this particular polynomial important? It is the unique polynomial Note that the notation

with

P (i) (0) = f (i) (a).


Then there

Theorem:

Assume

(i) indicates that we have taken the ith derivative. f has k + 1 derivatives which are continuous between Rk (h) = f (k+1) (c) k+1 h (k + 1)!

the interval

[a, a + h].

exists

c [a, a + h]

such that

Note that if

propoertional to

k = 0 then hk+1 .

this is just the mean value theorem. We will see next time that this remainder term is

15

Lecture #10
Now we are going to leave Lagrange Multipliers and manifolds for a while and go to the second derivative test. Lets consider a function

f (x)

and consider it near the point

We assume that this function has

k+1

derivatives at

a. Lets change t = 0 . Then t=0

variables and write this function as

f (a + t).

f (a) f (1) (a)

= =
. . .

value at

rst derivative at

t=0

f (k) (a)

kth derivative at

t=0
Each additional derivative is a new

Its important to note that not all functions have all of their derivatives. Now lets write down the

hypothesis. There exist functions (though pathalogical) that are continuous everywhere but owhere dierentiable.

k th

degree Taylor polynomial:

Pk (t) = f (a) + tf (a) +


Now lets make this equation exact by ensuring

tk t2 f (a) + ... + f (k) (a) 2! k!

Pk

(k+1)

(t)
t=0

=0

to make polynomial exact. Now lets introduce the dierence function

Rk (t) = f (a + t) Pk (t)
Then we have that

Rk (0) = Rk (0) = Rk (0) = ... = Rk (0)


and

(k)

Rk
Now we are ready to state Taylors theorem:

(k+1)

(t) = f (k+1) (a + t)
(k+1)

Theorem: Corallary:

Assume

Note that when If

(c) k+1 f (k+1) (t) exists in the interval [a, a + h]. Then c [a, a + h] such that Rk (h) = f (h+1)! h . k = 0, we have f (a + h) f (a) = f (c)h which is just the statement of the mean value theorem.
is also continuous, so bounded in

Rather than proving this theorem rst, we note the following corallary:

f (k+1) (t)

[a, b]

by

on

[a + a + h]

(this interval is combapct),

then

|f (a + h) Pk (h)|
(k)

M hk+1 (k + 1)!

f (a + h) = f (a) + f (a)h + ... + f k!(a) + O(|h|k ) k (h) k+1 where (t) = Rk (t) Bt with (0) = 0 and (h) = 0. We note that by the mean Proof: Let B = R hk+1 value theorem gives us that t1 [0, h] such that (t1 ) = 0. But remember that (0) = 0. Therefore by the MVT, there exists t2 [0, t1 ] such that (t2 ) = 0. But (0) = 0. Then by the MVT t3 [0, t2 ) such that (t3 ) = 0. We repeat this all the way to the k + 1 derivative...
so We usually use this theorem to prove the second derivative test.

Second Derivative Test:


f (a) > 0
then

Assume that

has three derivatives at

that are all continuous with

Then if

has a local minimum at

(and the similar statement is true for

f (a) < 0

that

f (a) = 0. f has a

local maximum at

Proof:

a). 1 1 f (a + h) = f (a) + f (a)h + f (a)h2 + f (c)h3 + ... 2 6

By the corallary, we have that

We now note that

f (a) = 0

by assumption. The essence of the proof is that we see that the

term will dominate

the value of the function locally around

a.

Therefore if it is positive, there will be a minimum and if it negative

there will be a maximum. Now to prove this fully, let

M = max

|f | 6

on

[a b, a + b]

16

for some

b > 0.

Now let

f (a) 2M . Assume that

is chosen so

|h| <

. Then

1 1 f (a)h2 + f (c)h3 2 6
We note that

1 6f

(c)h <

1 6f

(a) (c) f2M <

f (a) . Then 2

1 1 1 1 f (a)h2 + f (c)h3 = h2 ( f (a) + f (c)h) 2 6 2 6


and we are done. Now the idea is to study innite rather than nite power series. If

is innitely dierentiable at

a.

Then

f (a + h)
k=0
Now we need to answer the following questions: 1. For which

f (k) (a) k h k!

does this innite series converge?

2. If it converges to a function

g (h)

is

f (a + h) = g (h)? ex
around

Now just as an example, we remember the Taylor Series approximation for

a = 0.

We have

ex
k=0

f (k) (a) k x = k!

k=0

1 k x k!

Lecture #11
Review of Exam Material Linear Algebra

Bases

Rn . V V = L(V, R) T = T

Inner Products

Self-Adjoint Operator

W V , W V = W + W
Multivariable Calculus

Various derivative of

f : V W. dfa L(V, W )
is dened at a point

 

Total derivative map.

aV

as a limit when

h0

(in

V ).

This is a linear

Directional derivatives in a direction

v=0

in

is denoted

Dv f (a) W
and is dened as the limit

t0

lim

f (a + tv ) f (a) = Dv f (a) = dfa (v ) t


of

This results in a vector in the range.

When we choose a basis

{w1 , ...wm }

so that

f :V W
dened by

v f (v ) =
i

f i ( v ) vi

we can dene the total derivative of a function as a matrix.

17

We deal a lot with the special case by

f : V R.

In this case, we choose a Partial derivatives are denoted

f Dei f xi
where we have implicitly chosen an orthonormal basis for gradient of tha function at a point in its domain by

to be

{e1 , ..., em }.

Now we can write the

f (a) =
Now this is the unique vector that satises

f f f , , ..., x1 a x2 a xn

f (a), v = dfa (v ) = Dv f (a)


Operations of Calculus

Chain rule Suppose we have the functions

g:U f :V
so that

V W

f g :U V
Then the chain rule is the statement

d(f g )a = dfg(a) dga


in

L(V, W )

Mean Value Theorem Suppose we have the special case

:R f :V
then

V W

f :RR
Then the statement of the Mean Value theorem in several variables is that

f (b) f (a) = f (c), b a


for some

along the straight line in the domain between

and

b.

Equality of Second Partials We only talked about this for the case

f :V R
We remember that the second partials commute if all partials exist and are continuous. If this is the case, we have

Dv Dw = Dw Dv

 

Maxima and minima of

Maxima and minima exist when What is or isnt a

f :V R f (a) = 0. M = g (Rk )
then

You should remember the proof of this.

dimensional manifold?

g : Rk V

is one-to-one

dga

is one to one. Then of

Tg(a) = image
Another way to make a manifold is

dga (Rk )
for all

g : V Rnk

where

Ta M =

null space of

dga .

We normally dealt with

dga is onto n 1 manifolds

a M = {v : g (v ) = c}

then

(which is equivalent to having one

constraint).

18

Level sets of

with a function

f : V R.

Then we dene

graph(f ) as the zero set of

=M V R

d:V RR
dened by

(v, y ) y f (v )
Question: suppose we have a fuction

f :V V
dened by

a f (a).

Does there exist a

with

g f (v ) = v ?

The anwer is yes if

dfa

is an invertible

linear operator such that

dgf (a) dfa = I


This is the

inverse function theorem.

This is not a topic on the exam, but we will come to it soon!

Non-Exam Material
Now we return to Taylor Polynomials. Remember

f (k) (0) k 1 h + O(hk ) f (h) = f (0) + f (0)h + f (o)h2 + ... + 2 k!


Now what if we wanted to make a multivariable version of this Taylor Approximation. We hypothesize

f (h) f (0) + D1 f (0)h1 + D2 f (0)h2 + ... + Dn f (0)hn +


ij
where

ki Di Dj j f (0)hi hj ki !kj !

ki + kj = 0.

We can write out the summed term explicitly as

ij
Now the

ki Di Dj j 1 2 f (0)hi hj = D1 f (0)h2 1 + D1 D2 f (0)h1 h2 + ... ki !kj ! 2

k th

order term can be written as

j1 j2 jn jn 1 f (0)/(g1 !g2 !...gn !) hj D1 D2 ...Dn 1 ...hn ji +j2 +j3 +...+jn =k

Lecture #12
One problem on the exam that gave people trouble was proving that the graph is a manifold.

f :V
graph(f ) Then we have that

R M V RR g (v, z ) = (f (v ), 1) = 0
in

{(v, f (v ) : v V } = M .

Then we see that

v )R.

Therefore, we

conclude that the graph of a function is always a manifold. Now lets return to Taylor Series. Supposet we have a function

f :V R
which is

k+1

dierentiable (we will denote this as

C k+1 )

which has a basis

(v1 , ..., vn )

of

V.

The statememt that

is

is then the statement that

j1 j2 j3 D1 D2 ...D3 f (a)
exists provided that in

V.

We dene the

j1 + j2 + ... + jn k . Now k th Taylor polynomial Pk (h) =

we will try to approximate

f (a + h)

for a small

vector

(h1 , ..., hn )

j1 +...+jk k

j1 j2 j3 D1 D2 ...D3 f (a) j1 n h1 ...hj n j1 !...jn !

19

Now we will have to assume that the domain contains the line between

and

a + h.

The MVT tells us then that

on that line where

f (a + h) f (a) = f (a), h

or in other words

f (a + h) = f (a) +
i=1
The

Di f (c)...hi

multi-dimensional Taylor Theorem is then:


f (a + h) Pk (h) =
j1 +...+jk =k+1 j3 j1 j2 f (a) j1 D2 ...D3 D1 n h1 ...hj n j1 !...jn !

Note that

P0 (h) = f (a)

by how we dened the Taylor Polynomial.

Proving the several variable Taylor Theorem will follow a similar logic to the proof for the multivariable MVT and the 1-variable Taylor Theorem. We do not do this in class, but basically:

Proof:

Reduce to 1-variable Taylor theorem by using

g = f ((t)) where is the standard parametrization from


, then

the points

Corallary: If all the k + 1 derivatives of f Example (k = 2):


f (a + h) = f (a) =
i=1

to

b.

near

j1 jn ...Dn f a are continuous, D1

f (a + h) = Pk (h) + O(||h||k )

Di f (a)hi +
i=j

Di Dj f (a)hi hj +

1 2

2 2 Di f (a)h2 i + O (||h|| ) i
or in other words, that

Now to analyze the second derivatives, we will rst assume that

dfa = 0,

Di f (a) = 0

for all

i.

Now how do we know if a critical point is a max or min?

f (a + h) = f (a) +
i=j
Now writing this down explicitly for

Di Dj f (a)hi +
we have

1 2

2 2 Di f (a)h2 i + O (||h|| ) i

n = 2,

1 2 f (a + h) = f (a) + (a h2 1 + 2bh1 h2 + ch2 ) 2


Now we are asking, when is this quadratic polynomial,

a > 0, c > 0

and that

ac > b

2 (a h2 1 + 2bh1 h2 + ch2 ) > 0

for

(h1 , h2 ) = (0, 0)?

Need

and in this case, we can be ensured that we are on a maximum or a minimum (to see

this simply solve for the roots of the quadratic). Now what does this mean in terms of the matrix

1 1 2 d f (h, h) = (h1 , ..., hn ) (Di Dj f (a)) 2 2


whe will denote this matrix

h1
. . .

hn

A=
Then we see that there is a Min Max and that the det(A) transformed matrix

a b

b c

a a

> 0, c > 0 < 0, c < 0 (v1 , ..., vn ) (v1 , ...vn )


so that the

> 0.

In general, we can transform the original basis

is diagonal. Then we can write the Taylor Polynomial in terms of the eigenvalues

f (a + h) = f (a) +
where the if all

1 2

i (hi )2
i=1

i = Di2 f (a). i > 0.

We then see that

i < 0 means that the function is a minimum and, conversely a maximum

20

Lecture #13
Today we will start the Inverse Function Theorem. Suppose we have a linear map:

T :V V
which takes

0 0.

Can we solve

T (v ) = w

for

v,

given an arbitrary

w V.

Clearly, this can only be true if

is

onto and is injective (i.e. bijective). We showed in 25a that there is a unigue linear map that satises this:

T S (w ) = w
and we typically call this corresponds to an

T 1 .

Now if we choose a basis

{v1 , ..., vn } y1
. . .

of

Rn

dened by

v (c1 , ..., cn ),

the

nn

matrix

A.

In general, we have a system of linear equations

A
This is

x1
. . .

xn n
linear equations in

yn

unknowns. We remember that we can nd such an invesre function if and only if the

determinant of

is non-zero.

The whole point of the inverse function theorem is to generalize this to non linear maps. Suppose we have a (generally non-linear function)

f :V V
which takes

0 0.

The question is: can we invert

in a small neighborhood of 0. In other words, we are asking

if there exists a function

g:V V

dened in an open neighborhood of 0 such that

f (g (w)) = w.

It turns out that

sometimes you can do this and sometimes you can't. Here is an example when you can't:

Example:

f ( x) = x2

clearly, we have instance,

f (0) = 0 as required, but we see that for every y in the domain of f there is a non-unique x. For 2 2 and 2 2. A necessary condition for the existence of g is assuming f and g are dierentiable at f g = dfg(0) dg0 = = I I L(V, V )

0. Applying the the Chain Rule, we have

Theorem:
function

If

in a neighborhood of with

Expansion of

0 0 is continuous at v = 0 and df0 is invertible in L(V, V ), then an inverse f g (w) = w and dg0 = (df0 )1 L(V, V ). Now we can apply the Taylor g (h) = g (0) + dg0 (h) + ... where we already know that dg0 (h) = (df0 )1 (h) + ... We can then calculate
where

f :V V

the ... using an iterative process. Now lets talk about the inverse function theorem in 1 variable. Lets assume that function as a power series:

is a eld. Lets dene our

f (x) = a1 x + a2 x2 + ...
where we assume that the exists a power series is:

a are in F . Notice that automatically f (0) = 0. The question on is whether or not there g (x) = b1 x + b2 x2 + ... such that f (g (x)) = x. Now how do I evaluate f (g (x))? Explicitly, this a1 (b1 x + b2 x2 + ...) + a2 (b1 x + b2 x2 + ...)2 + a3 (b1 x + b2 x2 + ...)3 + ...

Note that in the second term there are no  x terms and in the third term there are no  x  terms. So lets reorganize out function in orders of

x.
2 = a1 b1 x + (a1 b2 + a2 b2 1 )x + ...

Now if we want the inverse function is to exist, we want all of the coecients of higher powers of for the coecient of

to be zero, and

to be equal to 1. This immediately puts two conditions on

a1

and

b1 :

b1 a1

= =

1 a 1

21

Now that we have solved for

b1

we can iteratively solve for all of the

bi

. Now lets look at the second coecient:

(a1 b2 + a2 b2 1) = 0
Now that we know by etc., etc. So why have we not just proved the inverse function theorem? Well, the series does not necessarily converge: we do not actually know anything about the coecients assure convergence, we need an iterative procedure to calculate Let's do an example.

b1 ,

this is just one equation for one unknown so we can solve for

b2 .

You can then solve for

b3

bi .

To

that gives a convergent series.

f (x) =
Now we write

dx = log(1 + x) = 1+x

(1 x + x2 x3 + = x xn n!

x2 x3 + 2 3

g (x) = x + b2 x2 + b3 x3 + b4 x4 + ... =
n1

1 so b2 = 2 . Then we can solve for bi through an iterative process. Now we will look at how Newton solved for inverses. Suppose we have a function

f (x) =
and we wish to solve for

1 1+x

such that

f (g (x)) = x
Then we have that

g (x) = 1 1 + g (x) = g (x) = 1 + g (x)


We then note that we can write

g (x)

as the power series

g (x) =
n0
and similarly

nbn xn1

g ( x) =
n1
Then we can get the coecients of

bn x n

n1

since we have

nbn = bn1
Then we have

bn

= =

bn1 n 1 n! r
such that

Now we talked briey about the Euclidean algorithm (which is not central to the course). Newton's method for nding roots of algorithm for nding the roots of

f (x).

We dene a root as a number

f (r) = 0.

Newton's

is the following:

f (x0 ) f (r) = f (c)(x0 r)


Where we have chosen

x0

arbitrarily (but in practice, it is generally near the root). Now we make the approximation

r x0
Now we do the same process for

f (x0 ) = x1 f ( x0 )

x1

until we nd the root.

22

Lecture #14
Today we will prove Newton's method for nding rootsr, the solutions to the equation nding roots will actually give us a general method for nding solutions to as

f (r) = 0.

(A method of

f (x) = b = 0,

because we can take

f b

f.)
Suppose we have a function from a closed interval to the reals

f : [a, b] R
and that

f (a) < f (b) >


Suppose

0 0
i.e.

is continuously dierentiable and

f ( x) > 0

on

[a, b],

is increasing on

[a, b].

(Why does the former

imply the latter?

Try the mean value theorem.)

Now, intermediate value theorem imples that there is a root

r [a, b],

and because

is increasing, the root must be unique.

Newton uses an iterative method to nd what this root is. 1) Choose some 2) Take

x0 [a, b]. f (x0 ) f (r) f (r)(x0 r) f (x0 )(x0 r) xn+1 = xn


f (xn ) f (x0 ) .

and solve for

to get

r x1 = x0
Let

f (x0 ) f (x0 )

3) Iterate: Calculate

Example :
Let

To calculate the square root of 2, take the function

f (x) = x2 2.

a=1

and

positive derivative, since

x0 = 1.

Then

f (x) = 2x and is greater than zero on x1 = 1 + 1 4 and we can continue this process.

our interval. (The maximum of

b = 2. f (x)

We have a is

M = 4.)
or

All of these ideas rely on one simple mathematical concept, which is the to prove the Inverse Function Theorem.

Contraction Mapping Theorem

the contraction xed point theorem. We will need the theorem to prove validity of Newton's Method and eventually

: [a, b] [a.b] with |(x) (y )| < k |x y | for x, y [a, b]. Contraction Mapping Theorem: Let be a contraction mapping. Then has a xed point, that is a point such that (x ) = x . More precisely, let x0 [a, b], and then x1 = (x0 ), and xn = (xn1 ). Then {xn } x (the
A contraction mapping is a continous function for all some xed

Denition :

0 < k < 1,

sequence converges) and

|xn x |

kn |xn x | 1k
Continue this bounding process to get

Proof: |xn+1 xn | = |(xn ) (xn1 )| k|xn xn1 |.

|xn+1 xn | k n |x1 x0 |
Now choose

m < n,

and calculate

|xn xm | = |xn xn1 + xn1 xn2 ...xm+1 xm | |xn xn1 | + + |xm+1 xm | (k n1 + + k m )|x1 x0 |
Simplifying the right-hand side using the expression for geometric series, we get

|xn xm |
The sequence is Cauchy and we are working over point, call it

km kn 1k

R,

so we at least know the sequence will converge to some

x .

Then for any

n kn kn + |xn x | 1k n,
we have proved the

|xn x | |xn xn | + |xn x |


Because the above is true for any

n,

and because

|xn x |

is arbitrarily small for large

inequality given in the statement of the theorem,

|xn x |

kn |xn x | 1k

23

x is a xed point of . By an inequality above, |(xn ) xn | = |xn+1 xn | k n , so |(xn ) xn | 0 as n . Meanwhile, by the continuity of the absolute value function and of , |(xn ) xn | |(x ) x | as n , so |(x ) x | = 0, and so (x ) = x as desired. Finally, to show the uniqueness of the xed point x , consider a case of two xed points x and x . Then |(x ) (x )| < |x x | = 0, a contradiction of the denition of the contraction mapping. f (x) Back to Newton's method. Let (x) = x M where M is the maximum of f (x) on [a, b]. We will show this
Now we can show that is a contraction mapping, so by the contraction mapping theorem we have a unique xed point. This xed point will be the root, since

(x ) = x

Why can we apply the theorem? Assume so

: [a, b] [a, b].

Then

f (x ) = 0. 0 < m < f (x) < M on [a, b]. We have a < (a) < (x) < (b) < b, (x) m 0 < (x) = 1 f M 1 M = k < 1 so is indeed a contraction mapping, so we
is precisely

can indeed apply the theorem. Now we are going to use this to help us show the inverse function theorem. Choose

f : R R with f (a) = b and

f (a) = 0.

We need this condition, because by the chain rule, we can only nd an inverse for a nonzero derivative.

Thus our question is: for did have such a function Assume

y close g , then

to

b,

is there a function

we would also have

exists. We are going to try

and solve for to get

x.

Then

x a

f (a)y f (a) to

g such that f (g (y )) = y ? As we will show later that if we g (f (x)) = x, or g (y ) = x. to nd x such that g (y ) = x. Consider the equation y b = f (a)(x a) get what g is approximately. Then we are going to iterate this sequence

g.
The sequence

Proposition:

x0 = a, xn+1 = xn

f (x) = y for y (b , b + ) where interval ((b , b + ). Moreover, this g (f (x)) = f 1 (x) .

f (xn )y f (xn ) will converge to a point x = x which satises will depend on f (a). Thus x = g (y ) denes the inverse function in the

function is dierentiable and it's derivative is given by the chain rule, so

We are going to prove this proposition next time. But just imagine how this is going to work in several variables. Even worse, we will need a better mean value thoerem than what we have right now, since we need a mean value theorem for functions from

to

V,

not just from

to

and that will either kill us, or we will kill it.

Lecture #15
We are continuing to prove the inverse function theorem using the method of contraction mapping.

Contraction Theorem:

Suppose we have

: [c, d] [c, d]
and for

k < 1,

then

|(x) (y )| k |x y | x, y . Then there is a unique xed point x [c, d] : (x ) = x and this is obtained as the limit of the x0 , x1 = (x0 ), x2 = (x1 ), ... Generally, we will use a that is continuous on [c, d] and we assume that it is dierentiable with | (x|) k on [c, d]. We ran through a whole proof of the contraction theorem last class.
for all sequence

Inverse Function Theorem:

Suppose we have a function

f :DR
which is continuously dierentiable with that

f (a) = 0.

We want to construct a

g : (b , b + ) (a , a + )

such

f (g (y )) = y . Fix a y near b.

Then dene

y (x) = y (x)

= =

= |y (x)| =
Since

(f (x) y ) f (a) f (x) 1 f (a) |f (a) f [(x)| 1 |f (a)| 2 x


Now we can precisely state the theorem:

is continuous,

> 0

such that

x [a , b + ] = [c, d]

24

Let =1 2 |f (a)|. Then for all y [b , b + ] the map y (x) is a contraction map of 1 with k = . 2 We will take a moment to prove this, but notice the immediate, corallary:

Theorem:

[a , a + ]

[a , a + ].

x0 = a, x1 = (x0 ), ..., xn+1 = (xn ) converges to the unique xed point xn of in x in this interval where f (x ) = y . Since this works for all y [b , b + ], we get an inverse with y the unique xed point of y ,equal to g (y ). Proof that y maps the interval [a , a + ] within itself, for |y b| < : 1 1 We already have that |x a| < . Then |y (x) a| |y (x) y (a)| + |y (a) a| < + = . Note that 2 2
The sequence This is the unique

Corallary:

the second term in this string of inequalities has the properties

|y (x) y (a)|
and

1 |x a| 2

|y (a) a| =
Now we have a sequence of functions we have

1 |f (a)| by f (a) y = < = 2 f (a) f (a) |f (a)| |f (a)|


on

gn g

[b = , b + ]

with

g0 (y ) = a, g1 (y ) = a

n+1

h f (a) . Generalizing this, we have Taylor polynomial of g (b + h).

g1 (b + h) = a +

gn+1 (y = b + h) = gn (y )

f (a)y by f (a) = a f (a) . Then f (gn (y ))y . This gives us the f (a)

Now we need to prove the inverse function theorem for multiple variables? Lets start with the contraction theorem for a vector space

Contraction theorem for V :

V.

Suppose we have

:CC
where

is a compact subset of

V.

This compact subset is a generalization of the interval

[c, d] in the one dimensional

theorem. Then

||(x) (y )|| < k ||x y ||


for

k < 1.

Note that we will rst have to ensure that

is an inner product space. Then we will dene the convergent

subsequence as before, as

x1 = (x0 ), ...
In the one dimensional case, the mean value theorem was critical in proving the contraction theorem was applicable. Thus we will need to prove the mean value theorem on vector spaces in order to prove the multi-dimensional inverse function theorem. But before we do this, we will have to dene the operators

operator norm.

To put a norm on the linear

T : V W,

you need to x norms on both

and

since

v, v = ||v ||2
is dened only if both spaces have an inner product. If we have this, we can dene the operator norm as

Denition: ||T || is the smallest positive real number such that ||T (v )|| ||T || ||v || for all v V .
For example, if we have that

T = 0,

then

||T || = 0.

Usually we considered the Euclidean norm on vector spaces.

It will be useful to talk about dierent types of norms on vector spaces (the topic of next class).

Lecture #16
V and W , then we will also have a norm on L(V, W ). Now we dene the operator norm ||T ||, T : V W, to be the least positive real number M such that ||T (v )||W M ||v ||v for all v V . In fact, M is the maximum value of ||T (v )|| on the sphere ||v || = 1. This means that we don't have to worry about extremizing v M over all of V . This M works for v B1 (0), but any v = 0 has a friend ||v || B1 (0) where we have dened B1 (0) as the boundary of the unit ball around the origin. To see that we only need to look at this subspace of V .
If we x norms on where

v ||v || 1 v ||T (v )|| = ||T ||v || ||v || ||T (v )|| T


25

1 T (v ) ||v ||

M M ||v ||

The proof that this number satises the properties of the norm is left for you to prove!

Non-Example:

Consider a linear operator

on the innite dimensional vector space

V = P (R).

T :V xn
so that Now

V nxn1 T . P (R) = {an xn + ... + ap : ai R} = a0 v0 + ... + an vn .


n

P x dP dx .

We can see that there is no

for this

||P || =
i=0

|ai | 1i

||vi || =
Therefore

||T (vi )||

= ||nvi || = n

so

T :V V

is not continuous for the topology assigned to

V. || ||2 and R with norm | |. Then this operator ||Tw || = ||w||. Then we will take v = w so

Now let's compute the norm of an operator that really does admit a norm. Suppose we have a linear operator

T :V R

so that

T (v ) = v, w

Consider

with an inner product

norm will also put a norm on the dual-space

V = L(V, R)

V.

In fact

||Tw (w)|| = | w, w | = ||w||2 = ||w|| ||w||V


Then

||TW (v )|| = |T (v )| = | v, w | ||v || ||w||


for all

v.

Now let's consider another norm on V, the sup norm defined by ||v||_∞ = max |xᵢ|, and again the absolute value norm |·| on R. Then what is the norm on the dual space L(V, R)? We see that

T(Σᵢ xᵢvᵢ) = Σᵢ aᵢxᵢ, where aᵢ = T(vᵢ) ∈ R.

Then T = Σᵢ aᵢvᵢ*, where (v₁*, ..., vₙ*) is the dual basis of V*, with

vᵢ*(vⱼ) = 1 if i = j, and 0 otherwise.

Then

|T(v)| = |Σᵢ aᵢxᵢ| ≤ Σᵢ |aᵢ||xᵢ| ≤ (max |xᵢ|) Σᵢ |aᵢ| = ||v||_∞ Σᵢ |aᵢ|,

so ||T|| ≤ Σᵢ |aᵢ|. To show equality, find a v such that |T(v)| = Σᵢ |aᵢ| (take xᵢ = sign(aᵢ), so that ||v||_∞ = 1). Then we have ||T|| = Σᵢ |aᵢ|, and we have a 1-norm on V*. We see a general pattern: if we have a p-norm on V, then we will have a q-norm on V*, where

1/p + 1/q = 1, 1 ≤ p ≤ ∞.

We ended class with finding the norm of T when we define a 1-norm on V and the absolute value on R.

Lecture #17
Recall our discussion of norms from last class. We described how a norm on the spaces V and W gives a norm on L(V, W). For T: V → W, we define a norm like this:

||T||_{L(V,W)} = max value of ||T(v)||_W over ||v||_V = 1,

i.e. the maximum value of ||T(v)||_W for v ∈ ∂B₁. We can equivalently define the norm as the least positive number M such that ||T(v)|| ≤ M ||v|| for all v ∈ V. A general strategy for finding ||T|| is to:

(1) find some upper bound for ||T(v)|| on the boundary of the unit ball, and call the bound M; this shows ||T|| ≤ M;
(2) find a vector v ∈ ∂B₁ with ||T(v)|| = M; this shows ||T|| ≥ M.

If you show the above two statements, you will have shown ||T|| = M.

Theorem: For T self-adjoint, ||T|| = maxᵢ |λᵢ| = |λ_largest| when the operator norm is taken with respect to the 2-norm on the domain and range of T. This can be proven using the schematic above.

Now consider ||T|| for general operators T: V → W, with T given by a matrix A. Let V and W both have the ∞-norm.

Claim: ||T|| = M = maxᵢ(||Aᵢ||₁), the largest 1-norm of a row of A.

Show: We follow the schematic outlined above.

(1) Write v = Σⱼ xⱼvⱼ and y = T(v), so that yᵢ = Σⱼ aᵢⱼxⱼ. Then

||T(v)||_∞ = maxᵢ |yᵢ| = |y_k| (for some k) = |Σⱼ a_{kj}xⱼ| ≤ Σⱼ |a_{kj}||xⱼ| ≤ ||v||_∞ Σⱼ |a_{kj}| ≤ M ||v||_∞,

where in the second-to-last step we have used the fact that we are taking the ∞-norm on V.

(2) Suppose M = ||A_k||₁ for some particular row k. Let εⱼ = sign(a_{kj}), or εⱼ = 0 if a_{kj} = 0. Let v = (ε₁, ..., εₙ). Note ||v||_∞ = 1. Then we easily show that ||T(v)|| ≥ M, since the k-th entry of T(v) is Σⱼ |a_{kj}| = M.

So now we have proved the claim ||T|| = M = maxᵢ(||Aᵢ||₁).
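Here is a quick numerical sanity check of the claim (my own sketch, not part of the notes): for a random matrix, compare the max row 1-norm against ||Av||_∞ over sign vectors, which are the extreme points of the ∞-norm unit ball.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 5))           # an arbitrary matrix for T

M = np.abs(A).sum(axis=1).max()           # claimed norm: max row 1-norm

# (1) random v with ||v||_inf = 1 never exceed M ...
samples = np.sign(rng.standard_normal((10000, 5)))
upper = max(np.abs(A @ v).max() for v in samples)

# (2) ... and v = sign of the maximizing row attains M exactly.
k = np.abs(A).sum(axis=1).argmax()
attained = np.abs(A @ np.sign(A[k])).max()

print(M, upper, attained)                 # upper <= M and attained == M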
Now we are going to talk about the mean value theorem for higher dimensions. You should recall the statement of the mean value theorem for f: V → R. The statement and proof of the analogous (but weaker) statement for f: V → W relies on operator norms. In fact, the theorem says that

||f(a) - f(b)|| ≤ ||df_c|| ||b - a||

for some c on the line between a and b. We won't prove this in class, but you should look in the book to get the flavor of the proof.

We will use this multiple-dimensional mean value theorem when we prove the inverse function theorem. We will prove the inverse function theorem in the special case where f: V → V maps 0 ↦ 0, f is continuously differentiable, and df₀ is the identity in L(V, V). The general case only requires df₀ invertible. The general case can be reduced to the special case, so our argument for the special case is sufficient. We reduce the general case to the special case by replacing f first with a translated function, so that f maps 0 ↦ 0, and then by replacing f with df₀⁻¹ ∘ f, so that f has df₀ = the identity.

Lecture #18
Today is a guest lecture with Sarah Koch. Today we prove (or almost finish proving) the inverse function theorem as stated in Edwards (Theorem 3.3).

The theorem reads: Suppose that the mapping f: Rⁿ → Rⁿ is C¹ in a neighborhood W ⊂ Rⁿ of a point a ∈ Rⁿ. (Recall that C¹ means continuously differentiable.) Suppose that df_a is invertible. Then f is locally invertible at a; that is, there exist open sets U ⊂ W with a ∈ U and V containing b = f(a), and a one-to-one map g: V → U such that g(f(x)) = x for all x ∈ U and f(g(y)) = y for all y ∈ V.

Some remarks on the inverse function theorem: the theorem basically tells you that if a function's derivative is invertible at a point, then the function itself is invertible in a neighborhood of that point. In analysis, you often find cases like this where a function is shown to mimic some behavior of its derivative. Another important note: the inverse function theorem is local! Even if you have local inverses at all points, you cannot necessarily stitch them together into a global inverse. Consider, for example, the function

f: R² → R², (r, θ) ↦ (e^r cos θ, e^r sin θ).

You can visualize this as the collapse of a helix to a circle. At every point the function satisfies the conditions of the inverse function theorem, so the function has local inverses everywhere. But the function is not globally injective (θ and θ + 2π have the same image), so it is not globally invertible.

We are going to prove some lemmas before we prove the inverse function theorem. You should already have the preliminaries in place: operator norms, the sup norm (written ||·||_∞ or |·|₀) that takes the maximum component of the vector, etc. Here is a useful corollary to the multivariable MVT:

Edwards Corollary 2.6: Let f: U → Rᵐ be a C¹ map, with U ⊂ Rⁿ a neighborhood of the line segment L with endpoints a and a + h. If λ: Rⁿ → Rᵐ is a linear transformation, then

|f(a+h) - f(a) - λ(h)|₀ ≤ |h|₀ max_{x∈L} ||df_x - λ||.

The idea to prove the above is to apply the MVT to the function x ↦ f(x) - λ(x). The proof is in Edwards.

We will also use another corollary to the MVT:

Edwards Corollary 2.7: Let U ⊂ Rⁿ be an open set containing the cube C_r, and let f: U → Rⁿ be a C¹ map such that f(0) = 0 and df₀ = I. It follows that if ||df_x - I|| ≤ ε for all x ∈ C_r, then f(C_r) ⊂ C_{(1+ε)r}.

This is another corollary in Edwards, and the idea of the proof is to apply the previous corollary with λ = df₀ = I and h = x. We need one last corollary from Edwards.

Edwards Corollary 2.8: Let f: Rⁿ → Rᵐ be a C¹ map at a ∈ Rⁿ. If df_a: Rⁿ → Rᵐ is injective, then f is injective on a neighborhood of a ∈ Rⁿ.

We will also use the contraction mapping theorem proven earlier: Let φ: C → C be a contraction mapping on a compact set C, with contraction constant k < 1. Then φ has a unique fixed point x ∈ C.

We are ready to prove the fundamental lemma in our proof of the inverse function theorem:

Edwards Lemma 3.2 (Fundamental Lemma): Let f: Rⁿ → Rⁿ be a C¹ map such that f(0) = 0 and df₀ = I. Suppose also that ||df_x - I|| ≤ ε < 1 for all x ∈ C_r. Then C_{(1-ε)r} ⊂ f(C_r) ⊂ C_{(1+ε)r}. Moreover, if we define V to be the interior of C_{(1-ε)r} and define U = int C_r ∩ f⁻¹(V), then f: U → V is bijective (and therefore has an inverse g). The map g is differentiable at 0, and it is the limit of the sequence defined by g₀(y) = 0, gₙ₊₁(y) = gₙ(y) - f(gₙ(y)) + y.

Proof of the fundamental lemma (Edwards Lemma 3.2): We already have from Cor 2.7 that f(C_r) ⊂ C_{(1+ε)r}. We can also show that f is injective using the above corollaries. Why? We use Cor 2.6 with λ = df₀ = I to see that

|f(x) - f(y) - (x - y)|₀ ≤ ε|x - y|₀ for all x, y ∈ C_r.

We can rearrange this inequality (via the triangle inequality) to get

(1-ε)|x - y|₀ ≤ |f(x) - f(y)|₀ ≤ (1+ε)|x - y|₀ for all x, y ∈ C_r.

The above gives us that f restricted to C_r is injective. Now we show that C_{(1-ε)r} ⊂ f(C_r). To show this, we cleverly use the contraction mapping theorem. Fix y ∈ C_{(1-ε)r}. Define φ_y: Rⁿ → Rⁿ by

φ_y: x ↦ x - f(x) + y.

To show that φ_y is a contraction mapping, first we need to show that φ_y maps C_r to C_r (and not outside of C_r). We can write

|φ_y(x)|₀ ≤ |f(x) - x|₀ + |y|₀ = |f(x) - f(0) - df₀(x - 0)|₀ + |y|₀ ≤ |x|₀ max_{x∈C_r} ||df_x - df₀|| + |y|₀ ≤ εr + (1-ε)r = r,

which shows that φ_y maps C_r to C_r (and not outside of C_r). Now we need only to show that φ_y actually contracts. Fix z ∈ C_r. Then

|φ_y(x) - φ_y(z)|₀ = |f(x) - f(z) - (x - z)|₀ ≤ ε|x - z|₀

by the inequality we derived at the start of our proof of this fundamental lemma. Then we can use the contraction mapping theorem to say that φ_y has a unique fixed point x. But at this fixed point, x = x - f(x) + y, i.e. f(x) = y, so the unique fixed point is f⁻¹(y)!
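The sequence gₙ₊₁(y) = gₙ(y) - f(gₙ(y)) + y from the lemma is exactly this fixed-point iteration, and it can be run numerically. A minimal sketch of mine (the particular f below is made up, with f(0) = 0 and df₀ = I):

import numpy as np

def f(x):
    # f(0) = 0 and df_0 = I; a small quadratic perturbation of the identity
    return x + 0.1 * np.array([x[0] ** 2, x[0] * x[1]])

y = np.array([0.05, -0.03])      # a point near 0 that we want to invert
g = np.zeros(2)                  # g_0(y) = 0
for _ in range(50):
    g = g - f(g) + y             # g_{n+1}(y) = g_n(y) - f(g_n(y)) + y

print(g, f(g) - y)               # f(g) - y is ~0, so g approximates f^{-1}(y)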


Lecture #19
Today we start integration. We are familiar with the interpretation of the integral in single variable calculus. If we have a function f: R → R, then the integral ∫ₐᵇ f(x)dx is interpreted as the area under the graph of f on the interval [a, b]. In order to generalize this notion of integration, we have to first generalize our idea of area. The way that we are going to do this is to use simple regions in Rⁿ (namely rectangles) and then use these to approximate more general regions. We define the area of a rectangle R = ∏ᵢ₌₁ⁿ [aᵢ, bᵢ] as

a(R) = ∏ᵢ₌₁ⁿ (bᵢ - aᵢ).

Definition: We say that a subset S ⊂ V has an area (or content) a(S) ≥ 0 if the following is true:

1. For every ε > 0 there is a finite inner cover of S by disjoint open rectangles Rᵢ⁰, where ∪ᵢ Rᵢ⁰ ⊂ S and Σᵢ a(Rᵢ⁰) > a(S) - ε. Here Rᵢ⁰ has the same definition as Rᵢ, except now we are considering open rectangles ∏(aᵢ, bᵢ).

2. For every ε > 0 there is a finite outer cover of S by closed rectangles Rᵢ, where S ⊂ ∪ᵢ Rᵢ and Σᵢ a(Rᵢ) < a(S) + ε.

Not all sets have an area (or content).

Example: If S = R is a rectangle, then a(S) = a(R).

Non-Example: Let S ⊂ [0, 1] ⊂ R be the set of rational numbers in [0, 1]. You can see that the greatest lower bound for the outer covers is 1, while the least upper bound for the inner covers is 0. Since these are not the same, there is no defined area.

The function a(S) has the following properties:

1. S ⊂ S′ ⟹ a(S) ≤ a(S′);
2. S, S′ disjoint ⟹ a(S ∪ S′) = a(S) + a(S′).

Suppose that we have a function f: [a, b] → R₊, and let S be the region under its graph.

Proposition: If f is continuous on [a, b], then a(S) exists.

Proof: f is uniformly continuous on the compact set [a, b]. So for every ε > 0 there is a δ > 0 such that |x - x′| < δ implies |f(x) - f(x′)| < ε/(b-a) on the entire interval [a, b]. Now we choose an N so large that (b-a)/N < δ, and we divide the interval into N equal parts. We can then construct a lower (inner) cover by choosing the minimum value of f(x) on each interval [xₙ, xₙ₊₁]. To create an upper cover, we can do the same thing but just choose the maximum value of f(x) on each of the intervals [xₙ, xₙ₊₁]. We define the minimal rectangles Rₙ⁰ and the maximal rectangles Rₙ. Then on each subinterval

a(Rₙ) - a(Rₙ⁰) = ((b-a)/N) |f(x_max) - f(x_min)| ≤ ((b-a)/N)(ε/(b-a)) = ε/N,

so summing over the N subintervals, the outer and inner covers differ by at most ε.
Now we will have to generalize this to the case where we have a continuous function f: Rⁿ → R₊. This technique of determining the area, by finding the upper limit of an inner cover and a lower limit of an outer cover, is identical to the way that Archimedes found an approximation for π. He did this by calculating the areas of inscribed and circumscribed n-gons of a circle. He went up to n = 96 to determine that

3 + 10/71 < π < 3 + 1/7,

which gives us the first two decimals of π.
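Archimedes' computation itself makes a nice algorithm. The sketch below (mine, not from the lecture) doubles the number of sides of an inscribed polygon starting from the hexagon, then derives the circumscribed bound; the doubling formula is the classical one.

import math

# From an inscribed regular n-gon of side s in the unit circle, the inscribed
# 2n-gon has side sqrt(2 - sqrt(4 - s^2)). Start from the hexagon (s = 1).
n, s = 6, 1.0
for _ in range(4):                               # 6 -> 12 -> 24 -> 48 -> 96
    s = math.sqrt(2.0 - math.sqrt(4.0 - s * s))
    n *= 2

lower = n * s / 2                                # inscribed semiperimeter
upper = lower / math.sqrt(1.0 - (s / 2) ** 2)    # circumscribed semiperimeter
print(n, lower, upper)  # 96 3.14103... 3.14271..., bracketing pi like Archimedes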
We will define the integral ∫ₐᵇ f(x)dx in terms of the converging value of inner covers and outer covers. The properties of the integral function are the same as the properties of a(S):

1. ∫ₐᵇ f = ∫ₐᶜ f + ∫_c^b f for any c ∈ [a, b].

2. If m is the min of f on [a, b] and M is the max of f on [a, b], then

m(b-a) ≤ ∫ₐᵇ f ≤ M(b-a).

3. Consider the function F(x) = ∫ₐˣ f for x ∈ [a, b]. Then F(x) is differentiable and F′(x) = f(x).

Proof: F(x+h) - F(x) = ∫ₐ^{x+h} f - ∫ₐˣ f = ∫ₓ^{x+h} f, so

min(f on [x, x+h]) ≤ (F(x+h) - F(x))/h ≤ max(f on [x, x+h]);

then as h → 0 we have that min(f) → f(x) and max(f) → f(x), so F′(x) = f(x).

4. If G is a function with G′ = f, then ∫ₐᵇ f = G(b) - G(a).

Proof: F(x) = ∫ₐˣ f also satisfies F′ = f. So F(x) - G(x) has a zero derivative. The Mean Value Theorem then tells us that F - G is constant: F = G + C. Evaluating at x = a gives 0 = F(a) = G(a) + C, so C = -G(a), and therefore ∫ₐᵇ f = F(b) = G(b) - G(a).

Lecture #20
Review of the end of last lecture: Consider a continuous function f: [a, b] → R₊, for which the greatest lower bound (of outer covers) and the least upper bound (of inner covers) of the content approach each other. This means that ∫ₐᵇ f = a(S). For x ∈ [a, b], we define the function F(x) = ∫ₐˣ f.

Theorem: F(x) is differentiable on [a, b] and F′ = f, with F(a) = 0.

Proof: First we need to prove that the derivative exists at x = c. We have

lim_{h→0} (F(c+h) - F(c))/h = lim_{h→0} (∫ₐ^{c+h} f - ∫ₐᶜ f)/h = lim_{h→0} (1/h) ∫_c^{c+h} f.

We will show that this exists by using our bounds on the integral:

min(f) ≤ (1/h) ∫_c^{c+h} f ≤ max(f),

where the min and max are taken over the interval between c and c+h. As h → 0, min(f) → f(c) and max(f) → f(c), since f is continuous at c. Then F′(c) = f(c), as required. A corollary to this theorem is known as the fundamental theorem of calculus.

Fundamental Theorem of Calculus: If G is any differentiable function on [a, b] with G′ = f, then

∫ₐᵇ f = G(b) - G(a).

Proof: We know that G′ = F′ = f on [a, b]. Then (G - F)′ = 0, so G = F + C on [a, b], where we have used the mean value theorem. Now we evaluate this at x = a:

G(a) = F(a) + C = C.

Then F(b) = G(b) - C = G(b) - G(a). The consequence of this theorem is that we can guess the integral of a function and know that it is a unique solution (up to an additive constant).

Linearity of the Integral: We want to show that

∫ₐᵇ (f + g) = ∫ₐᵇ f + ∫ₐᵇ g.

Proof: Let F and G be antiderivatives of f and g. Then (F + G)′ = F′ + G′ = f + g, where we have exploited the fact that the derivative is linear. Then we have that

∫ₐᵇ (f + g) = (F + G)(b) - (F + G)(a) = F(b) + G(b) - F(a) - G(a) = ∫ₐᵇ f + ∫ₐᵇ g.

Substitution: If f is continuous on [a, b] and has a continuous, positive derivative f′(x) ≥ 0, and g is continuous on [c, d] = [f(a), f(b)], then

∫_c^d g(u)du = ∫ₐᵇ g(f(x)) f′(x) dx.

Proof: We will use the chain rule and the Fundamental Theorem of Calculus (FTC). Let G′ = g on [c, d]. Now consider

(G ∘ f)′(x) = G′(f(x)) f′(x) = ((g ∘ f) f′)(x).

Then we will have

∫ₐᵇ (g ∘ f) f′ = (G ∘ f)(b) - (G ∘ f)(a) = G(d) - G(c) = ∫_c^d g(u)du.

Eventually, we will generalize this theorem to several variables. It will turn out to be

∫_{f(A)} g = ∫_A (g ∘ f) |det(df)|.

The term det(df) is called the Jacobian.

Integration by Parts: We will state this a little differently than is conventional, so that the proof will be more apparent:

∫ₐᵇ f′g + ∫ₐᵇ fg′ = f(b)g(b) - f(a)g(a).

The proof of this relies on the product rule: (fg)′ = f′g + fg′. We can just plug this into the integral ∫ₐᵇ (fg)′ and see, by linearity of the integral, the desired result.

Cute Example of Integration by Parts:

∫₀^{π/2} sin²(x)dx = [-sin x cos x]₀^{π/2} + ∫₀^{π/2} cos²(x)dx = ∫₀^{π/2} (1 - sin²(x))dx.

So we have

2 ∫₀^{π/2} sin²(x)dx = ∫₀^{π/2} 1 dx = π/2, hence ∫₀^{π/2} sin²(x)dx = π/4.

Theorem: A set A in Rⁿ has a content iff its boundary ∂A is negligible. We define the boundary ∂A = Ā - A⁰ as the set of limit points from both A and its complement.

Example: In R, take A = {rational numbers in [0, 1]}. Then Ā = [0, 1] and A⁰ = ∅, so ∂A = [0, 1], which is not negligible.

Proof idea: Choose ε > 0. Take a finite inner cover by open rectangles Rᵢ⁰ and an outer cover by closed rectangles Rⱼ, with ∪ Rᵢ⁰ ⊂ A ⊂ ∪ Rⱼ. Since we know that A has a content, we know

Σⱼ a(Rⱼ) - Σᵢ a(Rᵢ⁰) < ε,

and ∂A is contained in the difference of the two unions of rectangles. (I will defer to the proof in Edwards for this one.)

Theorem: If f: Rⁿ → R is bounded with bounded support and is continuous except on a negligible set in Rⁿ, then ∫_S f exists. Such functions are the most general types of functions that we will integrate (but they are not the only integrable functions).

Lecture #21
There are three major topics for the upcoming midterm exam:

1. Local analysis of f: V → R near a ∈ V. This involved many different types of derivatives: df_a, D_v f(a), D_v(D_w f)(a), etc. You will be expected to know the form of Taylor polynomials in both one dimension and multiple dimensions, as well as how to analyze critical points of a function using the second derivative.

2. The inverse function theorem. You should know the statement of this theorem, although you will not be required to prove it. It will be important to understand the tools used to prove the inverse function theorem, such as contraction mappings, operator norms, and the MVT in single and several variables.

3. Integrals of functions f: R → R. You should know the fundamental theorem of calculus, integration by parts, and substitution.

The emphasis of this exam will be on the first two parts, though integration will be included. We will skip over the next section in the book, which is called Fubini's Theorem. This theorem allows us to integrate functions on rectangular domains by integrating in each variable separately. Instead, we will focus on the change of variables formula.

The statement of this formula is as follows. Suppose we have functions g: V → V and f: V → R₊, where f is continuous and g is differentiable. If g takes a subset A ⊂ V to g(A), where dg is invertible on A, that is, det(dg_a) ≠ 0 for all a ∈ A, then

∫_{g(A)} f = ∫_A (f ∘ g) |det dg|.

In one dimension, the change of variables formula is just the substitution formula.

Example (Polar Coordinates): The goal is to make the change of variables

g: (r, θ) ↦ (x, y) = (r cos θ, r sin θ).

Consider the matrix form of dg:

dg_(r,θ) = [ cos θ   -r sin θ ]
           [ sin θ    r cos θ ]

We can then compute the determinant:

det(dg_(r,θ)) = r cos²θ + r sin²θ = r.

This is why in general we have

∫∫ f(x, y) dx dy = ∫∫ f(r cos θ, r sin θ) r dr dθ.

Consider the case when g = T: V → V is an invertible linear transformation. Then dg_a = T is independent of a ∈ A, and dg_a is invertible for all a ∈ A since det(T) ≠ 0. Then

∫_{T(A)} f = ∫_A (f ∘ T) |det T| = |det T| ∫_A f ∘ T.

Now consider the super-special case where f = 1 on V. Then the LHS = a(T(A)) and the RHS = |det T| · a(A). So when g = T is linear and f = 1, the change of variables theorem is just the statement of the following theorem:

Theorem: If A ⊂ V is contented and T: V → V is linear, then T(A) is contented and a(TA) = |det T| · a(A).

So when g = T is linear, the change of variables theorem is a statement about linear operators on real vector spaces with a specified basis (v₁, ..., vₙ). For example, if we have

T = [ λ₁  0  ]
    [ 0   λ₂ ]

then T(x, y) = (λ₁x, λ₂y), and a(TA) = λ₁λ₂ · a(A). If λ₂ = 0, then T(A) lands in a proper subspace W ⊂ V (and both sides are 0).
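A Monte Carlo sanity check of a(T(A)) = |det T| · a(A) (my own sketch, with an arbitrary made-up T and A the unit disk): a point p lies in T(A) exactly when T⁻¹p ∈ A.

import numpy as np

rng = np.random.default_rng(1)
T = np.array([[2.0, 1.0],
              [0.5, 1.5]])                        # arbitrary invertible T
Tinv = np.linalg.inv(T)

pts = rng.uniform(-4, 4, size=(200000, 2))        # a box containing T(A)
inside = ((pts @ Tinv.T) ** 2).sum(axis=1) <= 1   # is T^{-1}(p) in the unit disk?

area_TA = inside.mean() * 64                      # box area 8*8 times hit rate
print(area_TA, abs(np.linalg.det(T)) * np.pi)     # both ~ 7.85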

Now we go about proving the change of variables formula.

Proof of Change of Variables (outline):

1. If the theorem is true for T and S, two linear operators, then it is true for T ∘ S; more generally, if it is true for T₁, ..., Tₙ, then it is true for T₁ ∘ T₂ ∘ ... ∘ Tₙ. Indeed,

a((T ∘ S)(A)) = a(T(SA)) = |det T| a(SA) = |det T| |det S| a(A),

and det(T ∘ S) = det T · det S.

2. Show that any T is the product of some simple matrices, for which we will check the formula by hand.

3. Verify the formula for our special matrices and any contented set A by checking it when A = R is a rectangle. In this case

a(T(R)) = |det T| · a(R).

Good luck preparing for the exam!

Lecture #22
Today we will prove the change of variables theorem: if we have a function f: V → R along with a map g: V → V, where dg is invertible on A, then

∫_{g(A)} f = ∫_A (f ∘ g) |det dg|.

We will start by proving this for the special case where we have:

1. f = 1;
2. g an invertible linear map, so that dg = g is constant.

Then we need to show the change of volumes formula

vol(g(A)) = |det g| · vol(A).

It is enough to prove this for rectangles: take Rᵢ ⊂ A ⊂ Rⱼ with the difference vol(Rⱼ) - vol(Rᵢ) < ε, and note that g(Rᵢ) ⊂ g(A) ⊂ g(Rⱼ) for linear g. It is also enough to prove the special case for linear maps T₁, ..., Tₙ with T = T₁ ∘ T₂ ∘ ... ∘ Tₙ. We will check the formula for special linear maps and rectangles A. We will consider only the special linear maps

Cᵢ = the identity matrix with the (i, i) entry replaced by cᵢ;
Eᵢⱼ = the identity matrix with an extra 1 in the (i, j) entry, i ≠ j.

We first look at the properties of these matrices:

1. det Cᵢ = cᵢ and det Eᵢⱼ = 1;
2. ACᵢ scales column i of A by cᵢ, and CᵢA scales row i of A by cᵢ;
3. AEᵢⱼ adds column i of A to column j, and EᵢⱼA adds row j of A to row i.

Now we note that we can use these special matrices in combination with each other to turn an arbitrary invertible matrix A into the identity. This will look like

... Cᵢ A Eᵢⱼ ... = I.

We will check the volume formula first for a rectangle A = R = [a₁, b₁] × ... × [aₙ, bₙ] and its image g(R).

Lecture #23
Last time we proved the change of variables formula for the special case of a linear map T: V → V acting on a contented subset A ⊂ V: we found that

vol(T(A)) = |det T| · vol(A).

This corresponds to the original statement of the change of variables formula in the even more special case where g = T and f = 1. Consider

A = [0, 1] × [0, 1] × ... × [0, 1].

Then we have vol(A) = 1, so we need to show that vol(T(A)) = |det T|.

Let's start with n independent vectors p₁, p₂, ..., pₙ given by a basis of V. Then T: V → V defined by eᵢ ↦ pᵢ takes the standard cube of volume 1 to the parallelepiped defined by the pᵢ:

P = {Σᵢ₌₁ⁿ xᵢpᵢ : 0 ≤ xᵢ ≤ 1}.

This has volume |det(p₁, p₂, ..., pₙ)| = |det T|, where the matrix is composed of the pᵢ as column vectors.

Now suppose we have any vector v = Σᵢ yᵢpᵢ ∈ V, where yᵢ = xᵢ + mᵢ with mᵢ an integer and 0 ≤ xᵢ ≤ 1. If I define the lattice

L = {Σᵢ₌₁ⁿ mᵢpᵢ : mᵢ ∈ Z},

then every v ∈ V is of the form λ + z, where λ ∈ L and z ∈ P. In the case where pᵢ = eᵢ, this is the square lattice.

Let {pᵢ} be a basis of Rⁿ. Then L = {Σ mᵢpᵢ : mᵢ ∈ Z} is called the associated lattice. If we also take B = a large ball around the origin, what is a good estimate for #(L ∩ B)? It is

#(L ∩ B) ≈ vol(B) / |det T|.

A sensible question would be: if we try to pack round balls into V by using equal radii and centering the balls at the points of L, how much of V do we cover? It is not obvious, for example, that packing the spheres according to a

square lattice is better than packing them with respect to a parallelepiped lattice. First we note that if we have two spheres centered at λ and μ, then the condition that they do not overlap is equivalent to

2r ≤ ||λ - μ||,

where r is the radius of the balls. We can take

r = ½ ||λ||_min,

where ||λ||_min is the minimal distance between two lattice points. What proportion of V are we covering with these spheres? Writing vol(Ball_r) = Cₙrⁿ, we have that

proportion of B covered by the balls = #(L ∩ B) · vol(Ball_r) / vol(B) ≈ Cₙ rⁿ / |det T|,

so the quantity we want is

Cₙ (½ ||λ||_min)ⁿ / |det T|.

Let's compute this for a square lattice. For this case, pᵢ = eᵢ and L = Zⁿ ⊂ Rⁿ, so we cover Rⁿ by balls of radius r = ½ centered at integral points; in the plane this gives C₂(½)²/1 = π/4 ≈ .7854. For the hexagonal lattice, with

T = [ 1   1/2  ]
    [ 0   √3/2 ],

we have det T = √3/2 and ||λ||_min = 1, so the proportion is

C₂ (½)² / det T = (π/4)/(√3/2) = π/(2√3) ≈ .9069.

Interestingly enough, the best spherical packing in 2 dimensions is attained by this hexagonal lattice... this is what bees use! There is a sketchy proof of this in 3 dimensions. What we are interested in is an optimal packing in n dimensions. We know what is going on precisely only in special dimensions such as n = 2, 4, 8, 24; work of Cohn and Elkies on linear programming bounds comes very close to settling the best packing problem in 24 dimensions. The 2-dimensional case can be proven rigorously.
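The density formula Cₙ(½||λ||_min)ⁿ / |det T| is easy to evaluate; here is a small sketch of mine for the two planar lattices above.

import math

def planar_density(basis, min_dist):
    # C_2 (min_dist/2)^2 / |det T|, with C_2 = pi (area of the unit disk)
    det = abs(basis[0][0] * basis[1][1] - basis[0][1] * basis[1][0])
    return math.pi * (min_dist / 2) ** 2 / det

square = [[1.0, 0.0], [0.0, 1.0]]                  # Z^2, min distance 1
hexagonal = [[1.0, 0.5], [0.0, math.sqrt(3) / 2]]  # hexagonal, min distance 1

print(planar_density(square, 1.0))      # pi/4           ~ 0.7854
print(planar_density(hexagonal, 1.0))   # pi/(2 sqrt(3)) ~ 0.9069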

Now let's move to proving the change of variables formula for a nonlinear g: V → V, continuous and invertible, with a ↦ b and dg_a invertible. Then there is a small rectangle R around a such that for all x in this small rectangle we have the estimate on the operator norm

||dg_a⁻¹ dg_x - I|| < ε.

Then

(1-ε)ⁿ |det dg_a| vol(R) ≤ vol(g(R)) ≤ (1+ε)ⁿ |det dg_a| vol(R).

So vol(g(R)) ≈ |det dg_a| vol(R), which would be an equality in the linear case.

Lecture #24
Theorem (Change of Variables): Suppose f: V → R is continuous, A ⊂ V, and g: V → V is continuously differentiable at all points of A, with dg_a invertible for all a ∈ A, that is, det(dg_a) ≠ 0. Then

∫_{g(A)} f = ∫_A (f ∘ g) |det(dg_a)|.

Note that we could cover A with small rectangles Rᵢ such that vol(g(Rᵢ)) ≈ |det(dg_{aᵢ})| vol(Rᵢ). Then we have

∫_{g(A)} f = Σᵢ ∫_{g(Rᵢ)} f ≈ Σᵢ f(g(aᵢ)) vol(g(Rᵢ)),

where the last (approximate) equality holds by the continuity of f. This will be approximately equal to

Σᵢ f(g(aᵢ)) |det(dg_{aᵢ})| vol(Rᵢ) ≈ ∫_A (f ∘ g) |det(dg_a)|.

Proof: We just went over the heuristic proof, but now we need to make it precise. We will not use epsilons (see the book if you want to), but we will at least make it more rigorous.

Lemma: Take g: V → V, a ↦ b, dg_a invertible, dg continuous. Let R be a small rectangle around a such that

||dg_a⁻¹ dg_x - I|| ≤ ε for all x ∈ R,

with respect to the sup norm on V. Let C₁ be the rectangle around the origin equal to R translated by -a; in other words, C₁ = τ_a⁻¹(R), where τ_a is the translation map τ_a(v) = v + a. Finally, our lemma is the statement that

dg_a(C_{1-ε}) ⊂ τ_b⁻¹(g(R)) ⊂ dg_a(C_{1+ε}).

A corollary of our lemma is that

|det dg_a| (1-ε)ⁿ vol(R) ≤ vol(g(R)) ≤ (1+ε)ⁿ |det dg_a| vol(R).

Now, how can we prove our lemma? Actually, we essentially proved the lemma when we proved the inverse function theorem. Recall that we proved: if f: V → V, f(0) = 0, df₀ = I, and ||df_x - I|| ≤ ε for all x in a small rectangle C around the origin, then C_{1-ε} ⊂ f(C) ⊂ C_{1+ε}. This older lemma is a special case of the lemma we want to prove, so we reduce the lemma we want to prove to this special case by using

f = dg_a⁻¹ ∘ τ_b⁻¹ ∘ g ∘ τ_a, so that df₀ = dg_a⁻¹ dg_a = I.

Improper integrals: Now we discuss improper integrals. An improper integral is an integral over a non-contented region. An example is ∫_{-∞}^{∞} e^{-x²} dx = √π. Another is ∫₁^∞ (1/x²) dx. We define these improper integrals as limits; for instance,

∫₁^∞ (1/x²) dx = lim_{t→∞} ∫₁^t (1/x²) dx = lim_{t→∞} (1 - 1/t) = 1.

In general, we define the improper integral as the limit of the integral as the region of integration approaches the region we desire. Of course, these limits are not always defined. For instance,

∫₁^∞ x^a dx = lim_{t→∞} ∫₁^t x^a dx

has a well-defined finite value if a < -1, but not otherwise. When a = -1, we get a logarithm, and the limit diverges. A similar calculation shows that

∫₀^1 x^a dx = lim_{ε→0} ∫_ε^1 x^a dx

converges for a > -1. This is a different sort of improper integral: the region under the graph is non-contented because the function itself diverges.

Now we do the famous Gaussian integral: we claim ∫_{-∞}^{∞} e^{-x²} dx = √π. Let I = ∫_{-∞}^{∞} e^{-x²} dx. Then by Fubini,

I² = ∫ e^{-x²} dx · ∫ e^{-y²} dy = ∫∫_{R²} e^{-(x²+y²)} dx dy.

That allows us to make the polar coordinate change of variables,

I² = ∫₀^{2π} ∫₀^∞ e^{-r²} r dr dθ = 2π · ½ = π.
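Both the convergence and the value √π are easy to check numerically; a midpoint-rule sketch of mine:

import math

# Midpoint rule on [-8, 8]; the tails beyond |x| = 8 contribute only ~e^{-64}.
N, a, b = 100000, -8.0, 8.0
h = (b - a) / N
I = h * sum(math.exp(-(a + (k + 0.5) * h) ** 2) for k in range(N))

print(I, math.sqrt(math.pi))   # both 1.77245385...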

Next time we'll learn about another famous integral, perhaps the second most important integral, from Euler. It's called the gamma function, and it's defined by an improper integral:

Γ(x) = ∫₀^∞ t^{x-1} e^{-t} dt.

We'll show that Γ(x+1) = xΓ(x), so that Γ(x) = (x-1)! for integers x.

Lecture #25
Last time we discussed the notion of an improper integral. For example, consider the integral

∫₁^∞ x^a dx = lim_{t→∞} ∫₁^t x^a dx = lim_{t→∞} [x^{a+1}/(a+1)]₁^t = lim_{t→∞} (t^{a+1}/(a+1) - 1/(a+1)) = -1/(a+1).

We note that, as expected, this converges if a < -1. As an exercise, you should also try evaluating the improper integral ∫₀^1 x^a dx.

Now let's compute the improper integral that defines Euler's gamma function:

Γ(x) = ∫₀^∞ t^{x-1} e^{-t} dt = lim_{ε→0, N→∞} ∫_ε^N t^{x-1} e^{-t} dt.

We note that this limit exists as ε → 0: if x > 0, then x - 1 > -1, and also e^{-t} ≤ 1, so near 0 the integrand is bounded by t^{x-1}, and ∫₀^1 t^a dt converges for a > -1. As N → ∞, we have

t^{x-1} e^{-t} = (t^{x+1} e^{-t}) / t² < 1/t² for t ≥ m (some fixed point m),

since e^t eventually grows larger than the fixed power t^{x+1}; and ∫_m^∞ (1/t²) dt converges. Therefore, the gamma function is finite (i.e. it exists).

Why is this integral important? When x > 0, we see that

Γ(x+1) = ∫₀^∞ t^x e^{-t} dt = [-t^x e^{-t}]₀^∞ + ∫₀^∞ x t^{x-1} e^{-t} dt = xΓ(x),

where in the second step we have used integration by parts. You can see that the gamma function is closely related to the ! (factorial) operator: Γ(n+1) = n! for integers n.

Now for another interesting property of the gamma function. We note that the gamma function is defined for all positive x, not just the integers (for which it is essentially the factorial operator). Then we can iteratively extend it to Γ(x): R - {0, -1, -2, ...} → R, and

Γ(x)Γ(1-x) = π / sin(πx).

This function has poles at all x = n ∈ Z. Notice how the gamma function acts on half-integers:

Γ(½)² = π / sin(π/2) = π, so Γ(½) = √π.

We can use the gamma function to approximate n!.

Fun Fact: 52! ≈ 10⁶⁷. The number of particles in the universe is on the order of 10⁸⁰.

De Moivre was the first to come up with a basic approximation for log(n!). Consider

log(n!) = log 1 + log 2 + log 3 + ... + log n.

Let F(x) = x log x - x; then F′(x) = log x + 1 - 1 = log x. Now De Moivre used the trapezoid method to make the approximation

log 1 + log 2 + ... + log(n-1) < ∫₁ⁿ log x dx < log 2 + log 3 + ... + log n.

The LHS is log(n!) - log n. The integral is n log n - n + 1. The RHS is log(n!). So we have the approximation

log(n!) ≈ n log n - n + ½ log n.

Stirling was the one to find the constant √(2π):

n! ≈ nⁿ √(2πn) e^{-n}.
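A short comparison of mine between n!, Γ(n+1), and Stirling's formula:

import math

for n in (5, 10, 20, 52):
    exact = math.factorial(n)
    gamma = math.gamma(n + 1)          # Gamma(n+1) = n!
    stirling = n ** n * math.sqrt(2 * math.pi * n) * math.exp(-n)
    print(n, gamma / exact, stirling / exact)   # Stirling ratio -> 1 as n grows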

Now we will talk about integration over manifolds. Consider a manifold M ⊂ V, where the manifold is given as the image of a differentiable curve

γ: [a, b] → Rⁿ, γ(t) = (x₁(t), ..., xₙ(t)).

We will integrate over the entire curve by using the approximation of dividing it up into many small line segments along the path. To compute the length of the path, we compute the limit obtained by dividing the path at points t₀, t₁, ..., tₙ. The length of the inscribed polygon is

s(γ, T) = Σᵢ ||γ(tᵢ) - γ(tᵢ₋₁)|| = Σᵢ √((x₁(tᵢ) - x₁(tᵢ₋₁))² + ...).

Now we can define the length of the curve as

s(γ) = lim_{|T|→0} s(γ, T),

where |T| is the length of your subdivisions in t.

Proposition: If γ(t) is continuously differentiable, then s(γ), defined as the above limit, exists and is equal to

s(γ) = ∫ₐᵇ ||γ′(t)|| dt.

We will prove this next time.
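We can already see the limit numerically (a sketch of mine): for one turn of a helix, ||γ′(t)|| = √2 for every t, so the inscribed polygon lengths should approach 2π√2.

import math

def gamma(t):
    return (math.cos(t), math.sin(t), t)       # helix; ||gamma'(t)|| = sqrt(2)

def inscribed_length(n):
    pts = [gamma(2 * math.pi * k / n) for k in range(n + 1)]
    return sum(math.dist(pts[k], pts[k + 1]) for k in range(n))

for n in (10, 100, 1000):
    print(n, inscribed_length(n))
print(2 * math.pi * math.sqrt(2))              # the integral: ~8.8858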


Lecture #27
The notation for a 1-form ω, using the basis {e₁, ..., eₙ}: we write fᵢ(a) = ω(a)(eᵢ) for i = 1, 2, ..., n. This gives n functions fᵢ: U → R which determine the 1-form completely. Remember from linear algebra that if v = Σ xᵢeᵢ, then

ω(a)(v) = Σᵢ xᵢ fᵢ(a).

If ω = df, then ω(a)(eᵢ) = df_a(eᵢ) = ∂f/∂xᵢ, so fᵢ = ∂f/∂xᵢ for ω = df. For the basis {e₁, e₂}, this reads (f₁, f₂) = (∂f/∂x₁, ∂f/∂x₂). But remember that the partial derivatives are symmetric:

∂²f/∂xᵢ∂xⱼ = ∂²f/∂xⱼ∂xᵢ.

So for this specific case ω = df, we have

∂f₁/∂x₂ = ∂f₂/∂x₁,

but this is not necessarily true for arbitrary 1-forms. Our notation for 1-forms once we have a basis is

ω = f₁dx₁ + ... + fₙdxₙ, and in particular df = (∂f/∂x₁)dx₁ + ... + (∂f/∂xₙ)dxₙ.

Now how do we integrate a 1-form ω over some curve C ⊂ U ⊂ V, where C is the image γ([a, b]) of a parameterization γ?

Definition: ∫_C ω = ∫ₐᵇ ω(γ(t))(γ′(t)) dt.

Note that this does NOT depend on the parameterization of C. Choose a basis {e₁, ..., eₙ}, so that ω corresponds to functions (f₁, ..., fₙ) on U. Then γ′(t) = γ₁′(t)e₁ + ... + γₙ′(t)eₙ, and we see that

ω(γ(t))(γ′(t)) = ω(γ(t))(Σᵢ γᵢ′(t)eᵢ) = Σᵢ γᵢ′(t) ω(γ(t))(eᵢ),

so

∫_C ω = Σᵢ ∫ₐᵇ fᵢ(γ(t)) γᵢ′(t) dt,

which is the sum of n integrals in 1 variable.

If ω = df, then ∫_C ω = f(γ(b)) - f(γ(a)). The fact that the integral of such a 1-form depends only on the endpoints of the path is a generalization of the fundamental theorem of calculus. It then follows that if C is closed, then

∫_C df = 0.

Note that even if ∂fᵢ/∂xⱼ = ∂fⱼ/∂xᵢ, we may not have that ω = df if the region U contains holes. To show this, let's consider the region U = R² - {(0, 0)}, where we define ω = P dx + Q dy with

P = -y/(x² + y²), Q = x/(x² + y²),

so that

∫_C ω = ∫ₐᵇ [P(γ(t))γ₁′(t) + Q(γ(t))γ₂′(t)] dt.

Then

∂P/∂y = (-(x² + y²) + 2y²)/(x² + y²)² = (y² - x²)/(x² + y²)²,
∂Q/∂x = ((x² + y²) - 2x²)/(x² + y²)² = ∂P/∂y.

Now we ask: is ω = P dx + Q dy = df? Let's compute the integral of this 1-form over the unit circle, parameterized by γ(t) = (cos t, sin t):

∫_C ω = ∫₀^{2π} [(-sin t)(-sin t) + (cos t)(cos t)]/(cos²t + sin²t) dt = ∫₀^{2π} 1 dt = 2π.

Therefore ω ≠ df, since the integral of df around a closed curve would be 0.
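The same computation can be done numerically, and it also shows the answer is 0 for a loop that misses the hole (a sketch of mine):

import math

def loop_integral(cx, cy, radius, n=100000):
    # Integrate omega = (-y dx + x dy)/(x^2 + y^2) over a circle of given center.
    total = 0.0
    dt = 2 * math.pi / n
    for k in range(n):
        t = k * dt
        x, y = cx + radius * math.cos(t), cy + radius * math.sin(t)
        dx, dy = -radius * math.sin(t), radius * math.cos(t)    # gamma'(t)
        total += (-y * dx + x * dy) / (x * x + y * y) * dt
    return total

print(loop_integral(0.0, 0.0, 1.0))   # ~2*pi: the loop winds around the hole
print(loop_integral(3.0, 0.0, 1.0))   # ~0: this loop misses the hole entirely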
Now we will consider 2-forms on U ⊂ V where dim V = 2. A 2-form is given by a function F: U → R via ω = F dxdy. If we start with df = f_x dx + f_y dy, then

d(df) = (f_yx - f_xy) dxdy = 0 = d²f.

This will be a fundamental, general property: d² = 0. For now, we will have to take it as a given that we differentiate 1-forms in the following way to produce 2-forms:

ω = P dx + Q dy ⟹ dω = (Q_x - P_y) dxdy.

Now we can state an important theorem.

Green's Theorem in R²: Let D be a compact, connected region in R² whose boundary ∂D consists of oriented, closed curves. Let ω be a 1-form on U ⊃ D and dω the corresponding 2-form. Then

∫_D dω = ∫_{∂D} ω.

This should remind you of the fundamental theorem of calculus:

∫_{[a,b]} df = f(b) - f(a).

As an example, if ω = x dy, then ∫_{∂D} ω = ∫_D dxdy = vol(D).
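Here is a numeric check of the example (mine): for the ellipse with semi-axes a and b, the boundary integral of x dy should equal the area πab.

import math

a, b, n = 3.0, 2.0, 100000        # ellipse (a cos t, b sin t)
dt = 2 * math.pi / n
total = sum(a * math.cos(k * dt) * b * math.cos(k * dt) * dt for k in range(n))
print(total, math.pi * a * b)     # both ~ 18.8496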

Lecture #29
Last time we discussed functions f: U → R with U ⊂ Rⁿ. We called these 0-forms. We called 1-forms maps

ω: U → L(V, R) = V*;

an example is ω = df. Today we will discuss 2-forms. These are maps

ω: U → {alternating bilinear forms on V} = Λ²(V*),

which is a vector space of dimension n(n-1)/2 = (n choose 2). An alternating bilinear form B: V × V → R is bilinear in each factor and satisfies

B(v, w) = -B(w, v), so that B(v, v) = 0.

A basis for Λ²(V*) is given by the bilinear forms Bᵢⱼ for i < j, with Bᵢⱼ(eᵢ, eⱼ) = 1 = -Bᵢⱼ(eⱼ, eᵢ). Here is some notation: we will define dxᵢdxⱼ = Bᵢⱼ for i < j. Now our general 2-form looks like

ω = Σ_{i<j} fᵢⱼ(a) dxᵢdxⱼ,

and our general 1-form looks like ω = Σᵢ fᵢ(a)dxᵢ.

As an easy example, let's write a 2-form in 2-space. Suppose that dim V = 2; then there is only one pair i < j, and we have f₁₂(a)dx₁dx₂. In three dimensions we would have the 2-form

f₁₂(a)dx₁dx₂ + f₁₃(a)dx₁dx₃ + f₂₃(a)dx₂dx₃.

As a matrix, such an alternating form looks like

[  0     f₁₂    f₁₃ ]
[ -f₁₂    0     f₂₃ ]
[ -f₁₃  -f₂₃     0  ]

There is an operator d: (1-forms on U) → (2-forms on U). If ω = Σᵢ fᵢdxᵢ, then

dω = Σᵢ dfᵢ dxᵢ = Σᵢ (Σⱼ (∂fᵢ/∂xⱼ) dxⱼ) dxᵢ = Σ_{i<j} (∂fⱼ/∂xᵢ - ∂fᵢ/∂xⱼ) dxᵢdxⱼ.

Remember that for 1-forms, we had the definition of a line integral,

∫_C ω = ∫ₐᵇ ω(γ(t))(γ′(t)) dt,

and the theorem ∫_C df = f(γ(b)) - f(γ(a)). There are equivalent relations for 2-forms. We will state, but not prove, them. We can integrate a 2-form over a surface D ⊂ U. If we have a map φ: R → U, where R is a rectangle, the image of φ is D, and

φ(s, t) = (φ₁(s, t), φ₂(s, t), ..., φₙ(s, t)),

then we have the definition

∫_D ω = ∫_R Σ_{i<j} fᵢⱼ(φ(s, t)) · det [ ∂φᵢ/∂s  ∂φᵢ/∂t ; ∂φⱼ/∂s  ∂φⱼ/∂t ] ds dt.

Theorem: If ω = dη for a 1-form η, then

∫_D dη = ∫_{∂D} η.

This is a generalization of Green's theorem in R².

But why stop there?! The next logical question to ask is: what is a k-form, for the cases 3 ≤ k ≤ dim V? We define a k-form as a map

ω: U → {alternating, multilinear k-forms on V} = Λᵏ(V*).

We have dim(Λᵏ V*) = (n choose k): a form B(v₁, ..., vₖ) is determined by the values B(e_{i₁}, e_{i₂}, ..., e_{iₖ}) on basis elements with i₁ < i₂ < ... < iₖ. To gain a little intuition on these forms, let's just write some out explicitly. A 3-form on R³ is determined by the single value

B(e₁, e₂, e₃);

a 3-form on R⁴ is determined by the four values

B(e₁, e₂, e₃), B(e₁, e₂, e₄), B(e₁, e₃, e₄), B(e₂, e₃, e₄).

In general, we can take any k-form to a (k+1)-form. Let's try this in 3 dimensions, V = R³. First let's take a 0-form to a 1-form:

f(x, y, z) ⟹ df = f_x dx + f_y dy + f_z dz.

Now let's take a 1-form ω₁ = f₁dx + f₂dy + f₃dz to a 2-form:

ω₂ = dω₁ = (∂f₃/∂y - ∂f₂/∂z) dydz + (∂f₁/∂z - ∂f₃/∂x) dzdx + (∂f₂/∂x - ∂f₁/∂y) dxdy;

this is known as the curl. Now let's differentiate a general 2-form ω₂ = g₁dydz + g₂dzdx + g₃dxdy:

ω₃ = dω₂ = (∂g₁/∂x + ∂g₂/∂y + ∂g₃/∂z) dxdydz;

this is called the divergence. We see in this example some general properties of differential forms:

k-form → (k+1)-form → (k+2)-form.

The statement that d² = 0 then becomes the equality of mixed partials (for instance, the divergence of a curl is zero). The other property of differential forms is the generalized Stokes Theorem:

∫_{Dₖ} dω_{k-1} = ∫_{∂Dₖ} ω_{k-1}.
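The two identities d²f = 0 (the curl of a gradient) and d²ω₁ = 0 (the divergence of a curl) can be checked symbolically. A sketch of mine using sympy (which may need to be installed):

import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.Function('f')(x, y, z)                    # an arbitrary 0-form

# d(df): the curl of the gradient, component by component.
grad = [sp.diff(f, v) for v in (x, y, z)]
curl_grad = [sp.diff(grad[2], y) - sp.diff(grad[1], z),
             sp.diff(grad[0], z) - sp.diff(grad[2], x),
             sp.diff(grad[1], x) - sp.diff(grad[0], y)]
print([sp.simplify(c) for c in curl_grad])       # [0, 0, 0]

# d(d(omega_1)): the divergence of a curl, for an arbitrary 1-form.
F = [sp.Function(n)(x, y, z) for n in ('f1', 'f2', 'f3')]
curl = [sp.diff(F[2], y) - sp.diff(F[1], z),
        sp.diff(F[0], z) - sp.diff(F[2], x),
        sp.diff(F[1], x) - sp.diff(F[0], y)]
div = sum(sp.diff(curl[i], v) for i, v in enumerate((x, y, z)))
print(sp.simplify(div))                          # 0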

Next time, we will talk about the calculus of variations!

Lecture #30
Today we will start the calculus of variations and derive the Euler-Lagrange equation. One basic problem in the calculus of variations is to minimize the value of some integral

F(γ) = ∫ₐᵇ f(γ(t), γ′(t), t) dt

over a space of differentiable functions γ: [a, b] → R with γ(a) = α and γ(b) = β. We could, for instance, minimize the length of a curve between α and β (the minimizer is clearly the straight line). So suppose we have a functional F: C¹[a, b] → R, where C¹[a, b] is the space of class-C¹ differentiable functions on the closed interval [a, b]. Now suppose we want to extremize F on the subset M ⊂ C¹[a, b], where M is the subset of functions that terminate on α and β. If α and β happen to be 0, then M is a subspace. This will generally not be the case, so M is generally not a subspace; M is an affine space. That is, it is the translate of a true subspace:

M = C¹[a, b]_(0,0) + γ₀.

Let's investigate dF_γ: C¹[a, b] → R, the linear approximation to F at γ. At a critical point, dF_γ = 0 on TM = C¹[a, b]_(0,0). Let's define another function c: (-ε, ε) → M with the constraints that

c(0) = γ and c′(0) = any vector in TM.

Then the function g = F ∘ c: (-ε, ε) → R has a critical point at 0, and

dF_γ(any vector in TM) = dF_{c(0)}(c′(0)) = g′(0) = 0.

Now the question is: how do we calculate dF_γ? Note that

F(γ + h) = F(γ) + dF_γ(h) + o(||h||),

but notice that we need a norm on C¹[a, b]. Then

F(γ + h) - F(γ) = ∫ₐᵇ [f(γ + h, γ′ + h′, t) - f(γ, γ′, t)] dt.

Now our strategy will be to fix t and estimate this.

For fixed t,

f(γ + h, γ′ + h′, t) - f(γ, γ′, t) = f_x(γ, γ′, t)h + f_y(γ, γ′, t)h′ + (quadratic terms in h, h′),

where f_x and f_y denote the partial derivatives of f(x, y, t) with respect to its first and second slots. One might guess that

dF_γ(h) = ∫ₐᵇ [f_x(γ, γ′, t)h + f_y(γ, γ′, t)h′] dt.

To extract an equation from dF_γ(h) = 0, we evaluate the second part of this integral using integration by parts, exploiting the fact that h(a) = h(b) = 0. Let's deal with the second integral:

∫ₐᵇ f_y(γ, γ′, t)h′ dt = [f_y(γ, γ′, t)h(t)]ₐᵇ - ∫ₐᵇ (d/dt) f_y(γ, γ′, t) · h(t) dt;

the boundary term is zero, since h ∈ TM vanishes at a and b. The second term is similar to the other term in the original integral. Now we have

0 = ∫ₐᵇ [f_x(γ, γ′, t) - (d/dt) f_y(γ, γ′, t)] h(t) dt.

Now, since this = 0 for all h, we have the following famous second-order differential equation, the Euler-Lagrange equation:

f_x(γ, γ′, t) = (d/dt) f_y(γ, γ′, t).

Let's use this to solve the simplest application of the calculus of variations: the path of shortest length. We are asked to minimize the function

F(γ) = ∫ₐᵇ √(1 + γ′(t)²) dt.

So we have f(x, y, z) = √(1 + y²), and

f_x = 0, f_y = y/√(1 + y²).

Let's plug this into the Euler-Lagrange equation:

f_y(γ, γ′, t) = γ′(t)/√(1 + γ′(t)²).

For a minimum or a maximum, E-L says that

(d/dt) [γ′(t)/√(1 + γ′(t)²)] = 0, i.e. γ″(t)/(1 + γ′(t)²)^{3/2} = 0.

This implies that γ″(t) = 0, so γ(t) = mt + b, as we would have expected.
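A numerical counterpart (my sketch): discretize F(γ) = ∫√(1 + γ′²) on a grid with fixed endpoints and minimize over the interior values; the minimizer should come out as the straight line. (This uses scipy, which may need to be installed.)

import numpy as np
from scipy.optimize import minimize

a, b, alpha, beta, n = 0.0, 1.0, 0.0, 2.0, 50
t = np.linspace(a, b, n + 1)
dt = t[1] - t[0]

def length(interior):
    g = np.concatenate(([alpha], interior, [beta]))   # endpoints held fixed
    return np.sum(np.sqrt(1 + (np.diff(g) / dt) ** 2)) * dt

g0 = np.random.default_rng(0).uniform(0, 2, n - 1)    # a wiggly initial guess
res = minimize(length, g0)

line = alpha + (beta - alpha) * (t - a) / (b - a)     # the Euler-Lagrange answer
print(np.max(np.abs(res.x - line[1:-1])))             # ~0: minimizer is the line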

Now we will start talking about what a reasonable norm on our vector space C¹[a, b] is. It has to satisfy all of the usual properties of a norm (triangle inequality, etc.). Remember that on Rⁿ we had

||v||₁ = Σ|vᵢ|, ||v||₂ = √(Σvᵢ²), ||v||_∞ = max(|vᵢ|).

This suggests that a norm on a function space will require an integral. We will start with this next class, but we will use the norm

||γ|| = max_t |γ(t)| + max_t |γ′(t)|.

Lecture #31
We continue discussing the calculus of variations, using the same notation from last class. Recall that we have a functional F: V → R,

F(γ) = ∫ₐᵇ f(γ, γ′, t) dt,

where V = C¹[a, b] is the set of continuously differentiable functions γ: [a, b] → R. We defined M ⊂ V, where M is the subset of V such that γ(a) = α and γ(b) = β for some fixed α and β that define M. The definition of F rests on the definition of some function f: R³ → R, with f differentiable. We found that when we minimize F on M, a necessary condition for a minimum is that γ satisfies the Euler-Lagrange equation,

f_x(γ, γ′, t) = (d/dt) f_y(γ, γ′, t).

Let's imagine minimizing the surface area of a surface of revolution. We have the surface area

S = ∫ₐᵇ γ(t) √(1 + γ′(t)²) dt

(dropping the constant factor 2π, which does not affect the minimizer), so we would want to use the function f(x, y, z) = x√(1 + y²).

We won't finish solving this problem with the Euler-Lagrange equation directly. We'll use another method, discussed now. Euler noticed that there is a simplification of the Euler-Lagrange equation when f does not depend on z, i.e. f does not vary explicitly with the third parameter, t. Re-writing the Euler-Lagrange equation with this condition gets us Euler's first integral. The Euler-Lagrange equation reads

f_x(γ, γ′) - (d/dt) f_y(γ, γ′) = 0.

Note that the LHS of the above appears in the first derivative with respect to t of the function

f(γ, γ′) - γ′(t) f_y(γ, γ′),

whose derivative has terms

f_x γ′ + f_y γ″ - γ″ f_y - γ′ (d/dt) f_y = γ′ [f_x - (d/dt) f_y] = 0.

So we have Euler's first integral equation, which says that

f(γ, γ′) - γ′ f_y(γ, γ′) = c = constant.

Let's try applying this to the problem of minimizing the surface area of a surface of revolution. We had f(x, y, z) = x√(1 + y²).

Compute

f_y = xy/√(1 + y²), f_x = √(1 + y²).

Then Euler's first integral equation becomes

γ√(1 + γ′²) - γγ′²/√(1 + γ′²) = c,

which simplifies to γ/√(1 + γ′²) = c, i.e.

γ′² = (1/c²)(γ² - c²).

We aren't sure how to solve a differential equation of this form directly, but we can imagine solving it for a fixed value of c, say c = 1. Then we have

γ² - γ′² = 1,

which is the familiar form of a hyperbola. A hyperbola can be parameterized by cosh(t) and sinh(t), the hyperbolic cosine and sine (look them up on Wikipedia!), so solutions to the above are parameterized by

t ↦ (γ, γ′) = (cosh(t), sinh(t)).

Sinh and cosh have the special property that they are each the derivative of the other, so we have the solution

γ(t) = cosh(t + d).

These curves are called catenary curves, and they're the way that strings hang from posts under the weight of gravity. If we had specified γ(-a) = γ(a) = β, then (for the appropriate constant c) we would have

γ = c cosh(t/c).
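A quick check of mine that Euler's first integral really is constant along the catenary γ(t) = c·cosh(t/c): the quantity γ/√(1 + γ′²) should print c every time.

import math

c = 0.8
for t in (-1.0, -0.3, 0.0, 0.5, 1.2):
    g = c * math.cosh(t / c)           # gamma(t)
    dg = math.sinh(t / c)              # gamma'(t)
    print(g / math.sqrt(1 + dg * dg))  # prints c = 0.8 every time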

We return to a point in our discussion from last lecture. We had a map F: V → R and its differential, a linear map dF_γ: V → R. We glossed over the definition of this differential in the infinite-dimensional vector space V. Recall our old definition of the derivative for finite-dimensional vector spaces: for F: V → W, we defined dF_a as the unique linear map satisfying

lim_{h→0} [F(a+h) - F(a) - dF_a(h)] / ||h|| = 0.

But this definition requires a norm on V and a norm on W. Of course, in our case, we already have a norm on W = R. And the norm that we're going to put on V is the norm

||γ|| = max_{t∈[a,b]} |γ(t)| + max_{t∈[a,b]} |γ′(t)|.

A norm on V and W gives us an operator norm on dF_a. But not all linear functions on infinite-dimensional vector spaces are continuous. For instance, consider the map on the space of polynomials that sends xⁿ to n. To be continued next class!

Lecture #32
We have dealt a lot with linear operators between finite-dimensional vector spaces, T: V → W. But what about infinite-dimensional vector spaces V, W, such as the vector space of polynomials

P(F) = {aₙxⁿ + ... + a₀ : aᵢ ∈ F},

where the field is, for example, F = R or C?

First we insist that V and W have norms ||·||: V → R. Then we insist that T: V → W be continuous with respect to these norms. On Rⁿ,

||v||₁ = Σ|vᵢ|, ||v||₂ = √⟨v, v⟩, ||v||_∞ = max|vᵢ|.

In all of these cases, the norms tend to zero simultaneously. In an infinite-dimensional vector space, however, this may not be the case. For example, on C[a, b] we have

||γ||₁ = ∫ₐᵇ |γ|, ||γ||₂ = (∫ₐᵇ γ²)^{1/2}, ||γ||_∞ = max_{a≤t≤b} |γ(t)|.

These are not equivalent norms. Consider a function γ whose graph is a thin triangular spike of height 1. As the base of this triangle goes to zero (fixing the height of the triangle), the 1- and 2-norms go to zero, while the ∞-norm remains 1. Therefore, the norms are not equivalent.
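A numeric sketch of mine of the spike example on [0, 1]: as the support width w shrinks, the 1- and 2-norms vanish while the sup norm stays 1.

import math

def spike_norms(w, n=20000):
    # gamma(t) = max(0, 1 - t/w): a height-1 triangular spike supported on [0, w]
    h = 1.0 / n
    vals = [max(0.0, 1.0 - (k * h) / w) for k in range(n + 1)]
    one = h * sum(vals)
    two = math.sqrt(h * sum(v * v for v in vals))
    return one, two, max(vals)

for w in (0.5, 0.05, 0.005):
    print(w, spike_norms(w))   # 1- and 2-norms -> 0, sup norm stays 1.0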

Theorem: With respect to the sup norm, the vector space V = C[a, b] is complete.

Proof: Suppose γₙ is Cauchy. This means that

∀ε, ∃N: n, m > N ⟹ ||γₙ - γₘ||₀ < ε, i.e. max_{a≤x≤b} |γₙ(x) - γₘ(x)| < ε.

This means that for a fixed x ∈ [a, b], the sequence of real numbers {γₙ(x)} is Cauchy, so there is a real number γ(x) to which γₙ(x) → γ(x). Let's show that γₙ → γ in the sup norm. First fix ε > 0. Then there exists N such that ||γₙ - γₘ|| < ε for n, m > N. Now for n > N,

|γₙ(x) - γ(x)| = |γₙ(x) - γₘ(x) + γₘ(x) - γ(x)| ≤ |γₙ(x) - γₘ(x)| + |γₘ(x) - γ(x)|,

and letting m → ∞ gives |γₙ(x) - γ(x)| ≤ ε for every x. Hence ||γₙ - γ||₀ ≤ ε, so γₙ → γ. But γ is so far just defined as a function [a, b] → R. We need to show that γ is continuous on [a, b] (i.e. that γ is indeed in C[a, b]). To do this, we need to show that a uniformly convergent sequence of continuous functions is continuous. To show that γ is continuous at x, choose ε > 0, then choose N so that ||γₙ - γ||₀ < ε/3, and use the triangle inequality through γ_N near x. (Proof erased before I could copy it.)
on two complete normed vector spaces is dierentiable at

aV

if there is a continuous linear map

T :V W

48

such that

F (a + h) F (a) T (h) =0W h0 ||h|| lim T = dFa

Then

is unique.

Theorem:

The following are equivalent for a linear map

T :V W

of normed vector spaces

1.

a real number

M : ||T v || M ||v ||

for all

vV

2. T is a continuous map 3.

is continuous at

v=0

Proof:
1 = 2
By the continuity at

v.

choose

2 = 3
because this is just a special case of

||v w|| < = ||T v T w|| = ||T (v w)|| M

1 = 2
But remember that

3 = 1
consider

= 1. > 0

such that

||h|| = ||h 0|| = ||T h T (o)|| 1. ||T v || = ||v || T v ||v || ||v ||

||T h T 0|| =

||T h||

by linearity. Now for

v=0

for

1 works as a bound. We also have the following theorems

with

||h|| = .
If

Therefore,

M=

We also have the following theorems:

Theorem: If W is complete, then so is BL(V, W), the space of bounded (continuous) linear maps. In particular, since R is complete, we should call BL(V, R) = V*, the continuous dual space. We also have a natural inclusion V ↪ (V*)*. These are the topics of functional analysis, a subject developed by Hilbert and Banach.