Calculus of Variations

Fix a space S (more details on this shortly). Let f be a real-valued
function on the space S, so, for each point p of S, f(p) is a real number.
Also, fix one particular point, $p_o$, of S.
Consider now a curve, $\gamma$, in S, so this $\gamma$ assigns, to each value of its
parameter $\lambda$, a point, $\gamma(\lambda)$, of S. Let the initial point of this curve be
the point $p_o$ of S fixed above, i.e., let $\gamma(0) = p_o$. Denote by g the result of
evaluating the fixed function f along this curve, so this g takes, at
parameter-value $\lambda$, the value given by $g(\lambda) = f(\gamma(\lambda))$. This g, then, is just
an ordinary real-valued function of one real variable.
We say that the point $p_o$ of S is an extremum point of the function f
provided: For every such curve $\gamma$, we have $dg/d\lambda|_{\lambda=0} = 0$. This definition
requires, in other words, that as you go away from $p_o$ (that's what the curve
accomplishes), in any old way you wish (that's what allowing any such
curve accomplishes), the value of the function (that's what evaluating f
along the curve accomplishes) doesn't change to first order (that's what
setting $dg/d\lambda|_{\lambda=0} = 0$ accomplishes).
The calculus of variations consists of going through the definition above,
for various choices of the space S and of the function f on S. The goal, in
general, is to find some simpler criterion for when a point $p_o$ of S is an extremum point of f.
S the Space $\mathbb{R}^s$
Fix a positive integer s, and let $S = \mathbb{R}^s$, the set of s-tuples of real numbers.
Fix any smooth, real-valued function f on S, so f is a function of s real
variables: $f(x^1, \ldots, x^s)$. Let's go through the definition in this case.
The point $p_o$ (our candidate for an extremum point) is now represented as
an s-tuple of real numbers, $(x^1_o, x^2_o, \ldots, x^s_o)$. The curve $\gamma$ is now represented
as s functions of one variable, $(x^1(\lambda), x^2(\lambda), \ldots, x^s(\lambda))$. That this curve have
$p_o$ as its initial point becomes: $x^1(0) = x^1_o, \ldots, x^s(0) = x^s_o$. The result, g, of
evaluating the function f along the curve is: $g(\lambda) = f(x^1(\lambda), \ldots, x^s(\lambda))$.
Note that this g is indeed one function of one variable. The condition that
$p_o$ be an extremum point of f is that $dg/d\lambda|_{\lambda=0} = 0$, for every such curve $\gamma$.

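To see what this means, we use the chain rule, to obtain

$$\frac{dg}{d\lambda}\Big|_{\lambda=0} = \frac{\partial f}{\partial x^1}\Big|_{p_o} \frac{\partial x^1}{\partial \lambda}\Big|_{\lambda=0} + \cdots + \frac{\partial f}{\partial x^s}\Big|_{p_o} \frac{\partial x^s}{\partial \lambda}\Big|_{\lambda=0}. \tag{1}$$
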
Note that the first factor in each term on the right depends only on the
function f (and not at all on the curve $\gamma$). In fact, each such factor is
merely a partial derivative of f at $p_o$. The second factor in each term on
the right depends only on the curve $\gamma$ (and not at all on the function f).
In fact, each such factor is just a component of the tangent vector to the
curve at $p_o$. Now, in order that our function f have an extremum at $p_o$ it
is necessary that the right side of Eqn. (1) vanish for every curve $\gamma$ with
initial point $p_o$. But we can choose that curve to have any tangent vector,
$((\partial x^1/\partial\lambda)|_{\lambda=0}, \ldots, (\partial x^s/\partial\lambda)|_{\lambda=0})$, at $p_o$ that we wish. So, given that we have
complete control (as a result of our freedom to choose the curve $\gamma$) over the second
factors in each term on the right in (1), how is it possible for this right side to
vanish for every curve? Clearly, a necessary and sufficient condition is that
the coefficients on the right, the $(\partial f/\partial x^1)|_{p_o}, \ldots, (\partial f/\partial x^s)|_{p_o}$, all vanish. In
other words, we have shown: Point $p_o$ of S is an extremum of function f if
and only if the gradient of f vanishes at $p_o$.
Note that, for this particular S, we have solved the extremum problem
for an arbitrary function f on S, in that we have given a simple necessary
and sufficient condition for a point $p_o$ to be an extremum point.
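The criterion just derived can be checked numerically. The following is only a minimal sketch under stated assumptions: the function f, the points, the tangent vectors, and the helper names `make_curve` and `dg_dlambda_at_zero` are all hypothetical choices made for illustration, and the derivative along each curve is estimated by a central difference rather than computed exactly.

```python
import math

def f(x, y):
    # Hypothetical example function on R^2; its gradient vanishes at (1, -2).
    return (x - 1.0) ** 2 + 3.0 * (y + 2.0) ** 2

def make_curve(p, tangent):
    """Return a curve through p at lambda = 0 with the given tangent vector there.
    The extra quadratic and sine terms are arbitrary: to first order only the
    tangent vector at lambda = 0 matters."""
    px, py = p
    tx, ty = tangent
    return lambda lam: (px + tx * lam + lam ** 2, py + ty * math.sin(lam))

def dg_dlambda_at_zero(curve, h=1e-6):
    """Central-difference estimate of d/dlambda of f(curve(lambda)) at lambda = 0."""
    return (f(*curve(h)) - f(*curve(-h))) / (2.0 * h)

tangents = [(1.0, 0.0), (0.0, 1.0), (2.0, -3.0)]

# At (1, -2) the gradient vanishes, so every curve should give (nearly) zero.
at_critical = [dg_dlambda_at_zero(make_curve((1.0, -2.0), t)) for t in tangents]

# At (0, 0) the gradient is (-2, 12), so the derivative depends on the tangent.
at_other = [dg_dlambda_at_zero(make_curve((0.0, 0.0), t)) for t in tangents]

print(at_critical)
print(at_other)
```

At the point where the gradient vanishes, all three estimates come out at the level of numerical noise. At the other point they approximate the dot product of the gradient with the chosen tangent vector, which is the right side of Eqn. (1).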
S a Space of Curves in $\mathbb{R}^n$
First, we must specify S. To this end, fix a closed interval, I = [a, b], so a
and b are real numbers, with a < b. Then $\partial I$, the boundary of I, consists of
the two endpoints, a and b. Next, fix a positive integer n, and fix also any
two points of $\mathbb{R}^n$, which we denote $\hat\gamma(a)$ and $\hat\gamma(b)$. Thus, this $\hat\gamma$ is a mapping
from $\partial I$ into $\mathbb{R}^n$. Finally, let S consist of all curves, $\gamma : I \to \mathbb{R}^n$, with $\gamma(a) = \hat\gamma(a)$
and $\gamma(b) = \hat\gamma(b)$. That is, each of these curves is parameterized by t in the
interval I (i.e., as t varies in I, $\gamma(t)$ traces out our curve in $\mathbb{R}^n$), and is
such that the curve begins (t = a) at the given point $\hat\gamma(a)$ of $\mathbb{R}^n$ and ends
(t = b) at the given point $\hat\gamma(b)$ of $\mathbb{R}^n$. We could represent our curve $\gamma$, then,
by n functions of one variable, $(x^1(t), \ldots, x^n(t))$, subject to the requirement
that $(x^1(a), \ldots, x^n(a))$ be the point $\hat\gamma(a)$ of $\mathbb{R}^n$, and $(x^1(b), \ldots, x^n(b))$ be the
point $\hat\gamma(b)$ of $\mathbb{R}^n$. In short, here S is the set of all curves in $\mathbb{R}^n$, parameterized
by the interval I, and having fixed endpoints, $\hat\gamma(a)$ and $\hat\gamma(b)$. Do not read on
until you understand thoroughly the previous sentence.
We could have made many other choices for our space S. For example, we
could have defined S to consist of curves, but without the present condition
on the endpoints of those curves. The reason for this particular choice for S
is that it results in an interesting extremum problem. It will turn out that,
for the S introduced above, there will normally be just a couple of extremum
points. Other choices would result, typically, in no extremum points at all.
We must next specify the function f. This f is to be a real-valued function
on S, i.e., it is to assign a real number to each curve, $\gamma$, in our space S of
such curves. We consider the f given by

$$f(\gamma) = \int_I L(\gamma, d\gamma/dt)\, dt. \tag{2}$$

This function requires some explanation. First, we are to be given, once and
for all, some function L, of 2n variables (say, $(x^1, \ldots, x^n)$ and $(v^1, \ldots, v^n)$).
Let the curve $\gamma$ be represented, as above, by n functions $(x^1(t), \ldots, x^n(t))$.
Then the integrand in Eqn. (2) is the function of one variable, t, given by
$L(x^1(t), \ldots, x^n(t); dx^1(t)/dt, \ldots, dx^n(t)/dt)$. In other words, you evaluate L
on the 2n variables describing the position and the velocity of the curve $\gamma$.
This is done for each value of the curve-parameter t, resulting in a function
of t. Eqn. (2) instructs that this function of t is to be integrated over the
interval I = [a, b]. The resulting number is $f(\gamma)$. Choose a different curve,
say $\gamma'$, and the integrand will change, and so the integral will, in general,
yield a different number: This is the value of $f(\gamma')$. Repeating this procedure
for all curves in our space S yields the full function f of such curves.
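To make the recipe of Eqn. (2) concrete, here is a rough numerical sketch. Everything specific in it is an assumption made for illustration: the choice of L (half the squared speed), the interval [0, 1], the two sample curves in the plane, and the helper name `evaluate_f`. The integral is approximated by the midpoint rule and the velocity by a central difference, so the results are approximations only.

```python
def L(x, v):
    # Hypothetical choice of L: half the squared speed. It happens to depend
    # only on the velocity arguments v, not on the position arguments x.
    return 0.5 * sum(vi * vi for vi in v)

def evaluate_f(curve, a, b, steps=2000, h=1e-6):
    """Approximate f(gamma), the integral over [a, b] of L(gamma, d gamma/dt) dt.

    curve: function t -> tuple of n coordinates.
    Uses the midpoint rule in t and a central difference for the velocity."""
    dt = (b - a) / steps
    total = 0.0
    for k in range(steps):
        t = a + (k + 0.5) * dt
        x = curve(t)
        v = [(p - m) / (2.0 * h) for p, m in zip(curve(t + h), curve(t - h))]
        total += L(x, v) * dt
    return total

# Two curves in R^2 with the same endpoints, (0, 0) at t = 0 and (1, 2) at t = 1.
straight = lambda t: (t, 2.0 * t)
bent = lambda t: (t, 2.0 * t + t * (1.0 - t))

f_straight = evaluate_f(straight, 0.0, 1.0)
f_bent = evaluate_f(bent, 0.0, 1.0)
print(f_straight, f_bent)
```

For this particular L the integrals can also be done by hand, giving 5/2 for the straight curve and 8/3 for the bent one. The two curves share their endpoints yet receive different values of f, which is the point of the paragraph above.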
There are many possible functions on curves other than those of
Eqn. (2). For example, the integrand could have depended also on the
seventh derivative of the curve; or on no derivatives at all. We have chosen
here a particular function f (or, rather, a class of functions, depending on
what L is chosen) that results in an interesting extremum problem.
We wish to find the extremum "points" of this function f on this space
S. ["Points" is in quotes because a point of the space S is actually a certain
curve in $\mathbb{R}^n$.] To this end, we fix a point of S, say curve $\gamma_o$. Since this $\gamma_o$ is
to be in the space S, it must satisfy the required boundary conditions at $\partial I$:
$\gamma_o(a) = \hat\gamma(a)$, and $\gamma_o(b) = \hat\gamma(b)$. In more detail, this $\gamma_o$ is represented by n
functions of one variable, $(x^1_o(t), \ldots, x^n_o(t))$. This $\gamma_o$ is our candidate for an
extremum point of f; and we want to find out what equation it must satisfy
in order that it actually be such an extremum.
We now proceed as outlined at the beginning.
First, we must introduce a curve, in the space S, with initial point $\gamma_o$.
Let $\lambda$ be the parameter of this curve, so, for each value of $\lambda$, our curve must
specify some point of S. We may describe our curve by a function, $\gamma(\lambda, t)$,
of two variables, valued in $\mathbb{R}^n$. Then, given any value of the parameter, say,
$\lambda_o$, our curve will specify that point of S represented by the curve $\gamma(\lambda_o, t)$
in $\mathbb{R}^n$. This $\gamma(\lambda, t)$, in turn, must satisfy two conditions. First, we demand
that $\gamma(\lambda, a) = \hat\gamma(a)$ and $\gamma(\lambda, b) = \hat\gamma(b)$ for every $\lambda$. This is the demand that
every one of our curves (i.e., that given by each $\lambda$) satisfy our initial and
final conditions. In other words, this is the demand that each of our curves
actually lie in the space S. Second, we demand that $\gamma(0, t) = \gamma_o(t)$ for all t
in the interval I. This is the demand that the curve labeled by $\lambda = 0$ be the
curve $\gamma_o$ chosen above. In other words, this is the demand that our curve in
S have the correct initial point. In more detail, our curve in S is represented
by n functions of two variables, $(x^1(\lambda, t), \ldots, x^n(\lambda, t))$. For each fixed value
of $\lambda$, these become n functions of one variable t, i.e., a point of S.
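One simple way to build such a two-variable function is to add to the base curve a multiple of a displacement that vanishes at both endpoints. The sketch below assumes this particular construction, which is only one of many admissible families. The base curve, the displacement `eta`, and the interval [0, 1] are hypothetical choices made for illustration.

```python
a, b = 0.0, 1.0

def gamma_o(t):
    # Hypothetical base curve in R^2, running from (0, 0) to (1, 2).
    return (t, 2.0 * t)

def eta(t):
    # Hypothetical displacement; both components vanish at t = a and at t = b.
    return (t * (1.0 - t), t * (1.0 - t) ** 2)

def gamma(lam, t):
    """A curve in S through gamma_o, written as a function of two variables:
    gamma(lambda, t) = gamma_o(t) + lambda * eta(t)."""
    return tuple(x + lam * e for x, e in zip(gamma_o(t), eta(t)))

# First condition: every member of the family keeps the fixed endpoints.
for lam in (-1.0, 0.0, 0.5, 3.0):
    print(lam, gamma(lam, a), gamma(lam, b))

# Second condition: the member labeled by lambda = 0 is the base curve itself.
print(gamma(0.0, 0.25), gamma_o(0.25))
```

Because the displacement vanishes at t = a and t = b, the first condition holds for every value of the parameter, and setting the parameter to zero recovers the base curve, which is the second condition.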
Next, we evaluate the function f of Eqn. (2) along this curve. Note that
this makes sense, for the function f is on the space S; and the curve is in
the space S. The result is
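$$g(\lambda) = \int_I L\big(x^1(\lambda, t), \ldots, x^n(\lambda, t);\ \partial x^1(\lambda, t)/\partial t, \ldots, \partial x^n(\lambda, t)/\partial t\big)\, dt. \tag{3}$$

Note that t is integrated away on the right, so the right side is, indeed, a
function solely of the parameter $\lambda$.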
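The next step is to impose the extremum condition, $dg/d\lambda|_{\lambda=0} = 0$. Since
the $\lambda$-dependence of the right side enters only through the $\lambda$-dependence of
the arguments of L, we must use the chain rule. The result is

$$\begin{aligned}
\frac{dg}{d\lambda} = \int_I \Big[ &\frac{\partial L}{\partial x^1}\frac{\partial x^1}{\partial \lambda} + \cdots + \frac{\partial L}{\partial x^n}\frac{\partial x^n}{\partial \lambda} \\
&+ \frac{\partial L}{\partial v^1}\frac{\partial^2 x^1}{\partial \lambda\, \partial t} + \cdots + \frac{\partial L}{\partial v^n}\frac{\partial^2 x^n}{\partial \lambda\, \partial t} \Big]\, dt.
\end{aligned} \tag{4}$$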


Note that the derivatives of L in the first line are with respect to the first
n variables of L; in the second line, with respect to the second n variables.
Next, we integrate by parts each term on the second line above. For the first
term on the second line, for example, we have


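$$\begin{aligned}
\int_I \frac{\partial L}{\partial v^1}\frac{\partial^2 x^1}{\partial \lambda\, \partial t}\, dt
&= \Big[\frac{\partial L}{\partial v^1}\frac{\partial x^1}{\partial \lambda}\Big]_{t=b} - \Big[\frac{\partial L}{\partial v^1}\frac{\partial x^1}{\partial \lambda}\Big]_{t=a} - \int_I \frac{\partial}{\partial t}\Big(\frac{\partial L}{\partial v^1}\Big)\frac{\partial x^1}{\partial \lambda}\, dt \\
&= -\int_I \frac{\partial}{\partial t}\Big(\frac{\partial L}{\partial v^1}\Big)\frac{\partial x^1}{\partial \lambda}\, dt.
\end{aligned} \tag{5}$$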

In the second step, we used the fact that $\gamma(\lambda, t)$ is equal to $\hat\gamma(a)$ when t = a,
and to $\hat\gamma(b)$ when t = b, for all $\lambda$; with the result that $(\partial x^1/\partial\lambda)$ vanishes at
t = a and t = b. Repeating this integration by parts for all n terms on the
second line of (4), we obtain, finally,
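$$\begin{aligned}
\frac{dg}{d\lambda} = \int_I \Big[ &\frac{\partial L}{\partial x^1}\frac{\partial x^1}{\partial \lambda} + \cdots + \frac{\partial L}{\partial x^n}\frac{\partial x^n}{\partial \lambda} \\
&- \frac{\partial}{\partial t}\Big(\frac{\partial L}{\partial v^1}\Big)\frac{\partial x^1}{\partial \lambda} - \cdots - \frac{\partial}{\partial t}\Big(\frac{\partial L}{\partial v^n}\Big)\frac{\partial x^n}{\partial \lambda} \Big]\, dt.
\end{aligned} \tag{6}$$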


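Finally, we evaluate at $\lambda = 0$, i.e., at our initial curve $\gamma_o$. On rearranging
terms, there results

$$\begin{aligned}
\frac{dg}{d\lambda}\Big|_{\lambda=0} = \int_I \Big[ &\Big(\frac{\partial L}{\partial x^1} - \frac{d}{dt}\frac{\partial L}{\partial v^1}\Big)\Big|_{\gamma_o} \frac{\partial x^1}{\partial \lambda}\Big|_{\lambda=0} \\
&+ \cdots + \Big(\frac{\partial L}{\partial x^n} - \frac{d}{dt}\frac{\partial L}{\partial v^n}\Big)\Big|_{\gamma_o} \frac{\partial x^n}{\partial \lambda}\Big|_{\lambda=0} \Big]\, dt.
\end{aligned} \tag{7}$$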


This equation is analogous to Eqn. (1). It expresses $dg/d\lambda|_{\lambda=0}$ as an integral
of a sum of terms. The first factor in each term depends only on the function
f (well, actually, only on the function L that gives rise, via Eqn. (2), to f),
and not at all on our curve in S. The second factor in each term depends
only on our curve in S (i.e., only on the $x^1(\lambda, t), \ldots, x^n(\lambda, t)$), and not at all
on our function f.
So, we have arrived at our final equation, (7). We now demand that
$dg/d\lambda|_{\lambda=0}$ vanish for every curve. But how can the right side of (7) vanish
for every curve, when we have the ability to choose the second factor in each
term, the $(\partial x^1/\partial\lambda)|_{\lambda=0}$, etc., to be any function of t we wish (vanishing at
t = a and t = b), merely by our choice of curve? The only way this can happen
is if the coefficients of these factors vanish. We conclude: Curve $\gamma_o$, given
by $(x^1_o(t), \ldots, x^n_o(t))$, is an extremal, among curves in S, for the function
given by Eqn. (2) if and only if it satisfies the following:
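$$\begin{aligned}
\Big(\frac{\partial L}{\partial x^1} - \frac{d}{dt}\frac{\partial L}{\partial v^1}\Big)\Big|_{\gamma_o} &= 0, \\
&\ \ \vdots \\
\Big(\frac{\partial L}{\partial x^n} - \frac{d}{dt}\frac{\partial L}{\partial v^n}\Big)\Big|_{\gamma_o} &= 0.
\end{aligned} \tag{8}$$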
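As a sanity check on the conclusion, one can compare a curve that satisfies Eqn. (8) with one that does not. The sketch below is illustrative only and rests on several assumptions: n = 1, the interval [0, 1], the hypothetical choice L = v^2/2 - x^2/2 (for which Eqn. (8) reads -x - d^2x/dt^2 = 0), and families of the form x_o(t) + lambda * eta(t) with eta vanishing at both endpoints. All derivatives and the integral are approximated numerically.

```python
import math

def L(x, v):
    # Hypothetical L for n = 1. With this choice Eqn. (8) becomes -x - x'' = 0.
    return 0.5 * v * v - 0.5 * x * x

def g(lam, x_o, eta, a=0.0, b=1.0, steps=2000, h=1e-5):
    """Approximate g(lambda) of Eqn. (3) for the family x_o(t) + lambda * eta(t),
    using the midpoint rule in t and a central difference for the velocity."""
    x = lambda t: x_o(t) + lam * eta(t)
    dt = (b - a) / steps
    total = 0.0
    for k in range(steps):
        t = a + (k + 0.5) * dt
        v = (x(t + h) - x(t - h)) / (2.0 * h)
        total += L(x(t), v) * dt
    return total

def dg_at_zero(x_o, eta, h=1e-4):
    """Central-difference estimate of dg/dlambda at lambda = 0."""
    return (g(h, x_o, eta) - g(-h, x_o, eta)) / (2.0 * h)

# Two displacements that vanish at t = 0 and t = 1.
etas = [lambda t: t * (1.0 - t), lambda t: math.sin(math.pi * t)]

solution = math.sin                      # satisfies x'' = -x, i.e. Eqn. (8)
straight = lambda t: t * math.sin(1.0)   # same endpoints, violates Eqn. (8)

dg_solution = [dg_at_zero(solution, eta) for eta in etas]
dg_straight = [dg_at_zero(straight, eta) for eta in etas]
print(dg_solution)
print(dg_straight)
```

For the curve satisfying Eqn. (8), the estimated derivative is zero up to numerical error for both displacements. For the straight line with the same endpoints it is clearly nonzero, so that curve is not an extremum of this f.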
