
Calculus Revisited: Multivariable Calculus

As given by Herbert Gross, MIT


Notes by Aleksandar Petrov
March 2015

Contents

1 Vector Arithmetic
  1.1 The Game of Mathematics
  1.2 The Structure of Vector Arithmetic
  1.3 Applications to 3-Dimensional Space
  1.4 The Dot Product
  1.5 The Cross Product
  1.6 Equations of Lines and Planes

2 Vector Calculus
  2.1 Vector Functions of a Scalar Variable
  2.2 Tangential and Normal Vectors
  2.3 Polar Coordinates
  2.4 Vectors in Polar Coordinates

3 Partial Derivatives
  3.1 n-Dimensional Vector Spaces
  3.2 An Introduction to Partial Derivatives
  3.3 Differentiability and the Gradient
  3.4 The Chain Rule
  3.5 Exact Differentials

4 Matrix Algebra
  4.1 Linearity Revisited
  4.2 Introduction to Matrix Algebra
  4.3 Inverting a Matrix
  4.4 Maxima and Minima in Several Variables

5 Multiple Integration
  5.1 The Fundamental Theorem
  5.2 Multiple Integration and the Jacobian
  5.3 Line Integrals
  5.4 Green's Theorem
Chapter 1
Vector Arithmetic

1.1 The Game of Mathematics

We can define a game to be any system consisting of definitions, rules, and objectives, where the objectives are reached as inescapable consequences of the definitions and the rules by means of strategy. Not all definitions can be stated precisely. For example, no definition of "number" can be given, so some of the definitions remain subjective. However, using only specific objective facts about these concepts (the rules) allows us to draw inescapable conclusions.
It is important to distinguish between truth and validity. Truth is subjective and can change with time. Validity means that the conclusion is an inescapable result of the definitions and the rules. Simply put, an argument is valid when it follows logically from the definitions and the rules. If our premises are true and our argument is valid, then the conclusions are also going to be true. However, a conclusion may also happen to be true when the premises are not true or the argument is not valid. Mathematics deals with the argumentation part of this problem: it draws valid conclusions. It is not guaranteed that they are true; that will be the case only if the premises are true. To conclude, one can be sure that a conclusion is true only if the assumptions (definitions and rules) are true and the argumentation is valid.

This allows us to draw a line between pure and applied mathematics. Applied mathematics deals with problems whose definitions and rules we believe to reflect reality. Pure mathematics focuses on the consistency of the rules and the validity of the arguments; the rules need not be true. An example is Lobachevsky's geometry, which was pure mathematics, since it did not correspond to any physical truths known at the time, until Einstein noticed that it served as a realistic model for his theory of relativity.

To show that an argument is invalid, all we need to do is give one set of conditions in which the assumptions are obeyed but the conclusion is false. On the other hand, proving that something is true is rather difficult: one has to find a way to show that the statement is always true.

1.2 The Structure of Vector Arithmetic

An important fact to keep in mind is that the operations in vector arithmetic are the result of definitions, not of nature. We define a vector to be an object that has a magnitude and a direction. A more modern approach defines a vector as an ordered sequence of numbers. A vector has magnitude (length), direction (orientation) and sense (each direction has two possible senses). The mathematical concept of a vector is geometrically represented by an arrow. Furthermore, as a vector is defined solely by its magnitude, direction and sense, two vectors can be equal even if they do not have the same starting and ending points. The equality of vectors, the zero vector, and the summation and subtraction of vectors are all operations that are defined in such a way that they are easy to use and useful. Bear in mind that the mathematical structure of vector arithmetic is different from the structure of scalar arithmetic. For example, we talk about summation of vectors, but this operation is not the same operation as summation of scalars. Furthermore, multiplication of vectors is ambiguous, while multiplication of scalars is clearly defined.
Some of the properties that vectors have are:
\[
\vec{a} + \vec{b} = \vec{b} + \vec{a} \qquad\qquad \vec{a} + \vec{b} = \vec{c} \;\Rightarrow\; \vec{a} = \vec{c} - \vec{b}
\]
\[
\vec{a} + (\vec{b} + \vec{c}) = (\vec{a} + \vec{b}) + \vec{c} \qquad\qquad c(\vec{a} + \vec{b}) = c\vec{a} + c\vec{b}
\]
\[
\vec{a} + \vec{0} = \vec{a} \qquad\qquad (c + d)\vec{a} = c\vec{a} + d\vec{a}
\]

Keep in mind that these properties were defined; they are not intrinsic to all mathematical structures. For example, if you subtract the set $B$ from $A \cup B$ you will not get the set $A$ back (except when $A$ and $B$ have no common elements).

1.3 Applications to 3-Dimensional Space

When talking about three-dimensional vectors we actually mean vectors with three components. Of course, a vector is geometrically represented by an arrow, and an arrow is a two-dimensional element. One extremely useful property (or, if you wish, definition) of vectors is that the components of a vector connecting the origin of a Cartesian coordinate system with a point are the same as the coordinates of that point. An important note about the mathematical structure of vectors is that the recipes stay the same regardless of the dimensionality of the vector. That means the property stated above holds for two-, three-, four- and n-dimensional vectors. Furthermore, the definition of the magnitude of a vector as the square root of the sum of the squares of its components is also valid for any n-dimensional vector, provided that the components are given in a Cartesian coordinate system.
But why do we use the Cartesian coordinate system? In a Cartesian system $\vec{a} + \vec{b} = (a_1 + b_1, a_2 + b_2)$. In a polar system, however, $\vec{a} + \vec{b} \neq (r_a + r_b, \theta_a + \theta_b)$. The vector properties themselves do not change with the coordinate system; the convenient component-wise recipes above are a consequence of the properties of the coordinate system. As a result, it is suggested to use a Cartesian coordinate system as often as possible. A very important consideration, though, is that since vector properties do not depend on the coordinate system, if a given property is proven to hold in one coordinate system, then it is a vector property that holds in all coordinate systems.
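
As a quick numerical illustration (a minimal NumPy sketch; the two vectors are arbitrary choices, not taken from the lectures), adding polar coordinates component-wise does not reproduce vector addition:

```python
import numpy as np

# Two plane vectors given in polar form (r, theta); the values are arbitrary.
r_a, th_a = 2.0, np.pi / 6
r_b, th_b = 3.0, np.pi / 3

# Correct sum: convert to Cartesian components, then add component-wise.
a = np.array([r_a * np.cos(th_a), r_a * np.sin(th_a)])
b = np.array([r_b * np.cos(th_b), r_b * np.sin(th_b)])
correct = a + b

# "Polar component-wise" sum: add the r's and the thetas. This is NOT vector addition.
r_wrong, th_wrong = r_a + r_b, th_a + th_b
wrong = np.array([r_wrong * np.cos(th_wrong), r_wrong * np.sin(th_wrong)])

print(correct)   # approximately [3.232, 3.598]
print(wrong)     # approximately [0.0, 5.0] -- a different vector entirely
```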

1.4 The Dot Product

Let's start with a physical motivation for the dot product. We know that work is the product of a displacement and the component of a force in the direction of that displacement. That can be written in vector notation as $W = |\vec{A}||\vec{B}|\cos\theta$, where $\theta$ is the angle between the force vector $\vec{A}$ and the path $\vec{B}$. Now, we define this to be the dot product of the two vectors:
\[
\vec{A} \cdot \vec{B} = |\vec{A}||\vec{B}|\cos\theta \tag{1.1}
\]

As finding the angle between the two vectors is often a pretty hard task, let's try to get rid of the cosine.

[Figure 1.1: the triangle formed by $\vec{A}$, $\vec{B}$ and $\vec{A} - \vec{B}$]

[Figure 1.2: vector projection; the projection of $\vec{A}$ on $\vec{B}$ has length $|\vec{A}|\cos\theta$]

As can be seen from Figure 1.1, using the cosine theorem we get the following:
\[
|\vec{A} - \vec{B}|^2 = |\vec{A}|^2 + |\vec{B}|^2 - 2|\vec{A}||\vec{B}|\cos\theta
\]
\[
|\vec{A}||\vec{B}|\cos\theta = \frac{|\vec{A}|^2 + |\vec{B}|^2 - |\vec{A} - \vec{B}|^2}{2}
\]
\[
\vec{A} \cdot \vec{B} = \frac{|\vec{A}|^2 + |\vec{B}|^2 - |\vec{A} - \vec{B}|^2}{2} \tag{1.2}
\]

Note that this result does not depend on the coordinate system in use. Only in a Cartesian coordinate system does this simplify to
\[
\vec{A} \cdot \vec{B} = a_1 b_1 + a_2 b_2 + a_3 b_3 \tag{1.3}
\]

Projections. Let's take a look at Figure 1.2. We can see that the projection of $\vec{A}$ on $\vec{B}$ looks like a dot product, but with the magnitude of one vector missing. Furthermore, note that the length of the projection of $\vec{A}$ on $\vec{B}$ does not depend on the magnitude of $\vec{B}$. We define a unit vector $\vec{u}_B$ to have the same direction and sense as $\vec{B}$ but magnitude one:
\[
\vec{u}_B = \frac{\vec{B}}{|\vec{B}|} \tag{1.4}
\]
Then, as $|\vec{u}_B| = 1$, $|\vec{A}|\cos\theta = |\vec{u}_B||\vec{A}|\cos\theta$. As a result,
\[
\mathrm{Proj}_B\,\vec{A} = \vec{u}_B \cdot \vec{A} \tag{1.5}
\]

Structural properties:
\[
\vec{a} \cdot \vec{b} = \vec{b} \cdot \vec{a}
\]
\[
\vec{a} \cdot (\vec{b} + \vec{c}) = \vec{a} \cdot \vec{b} + \vec{a} \cdot \vec{c}
\]
\[
(c\vec{A}) \cdot \vec{B} = c(\vec{A} \cdot \vec{B})
\]
This may seem trivial, but one always has to keep in mind which operations and conclusions are applicable in which situations. For example, $\vec{A} \cdot \vec{B} = 0$ does not mean that $\vec{A}$ or $\vec{B}$ is zero. It could be that they are orthogonal vectors, so the cosine of the angle between them is zero.
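
The following is a small NumPy sketch of Equations 1.2, 1.3 and 1.5 (the vectors are arbitrary examples):

```python
import numpy as np

A = np.array([2.0, -1.0, 3.0])
B = np.array([1.0, 4.0, 0.0])

# Component formula (1.3) versus the coordinate-free form (1.2).
dot_components = A @ B
dot_cosine_law = (np.dot(A, A) + np.dot(B, B) - np.dot(A - B, A - B)) / 2
print(np.isclose(dot_components, dot_cosine_law))   # True

# Projection of A on B (Equation 1.5): dot A with the unit vector along B.
u_B = B / np.linalg.norm(B)
print(u_B @ A)    # scalar projection; note it does not depend on |B|

# A . B = 0 only says the vectors are orthogonal, not that one of them is zero.
print(np.dot([1, 0, 0], [0, 5, 0]))   # 0
```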

1.5 The Cross Product

Although the cross product has vast applications in the physical sciences, our focus will be on its geometry. The cross product is also called the vector product because its result is a vector (in contrast to the dot product, which yields a scalar). As a result we need to define three parameters of the cross product: magnitude, direction and sense. For $\vec{A} \times \vec{B}$ the magnitude is defined to be $|\vec{A}||\vec{B}|\sin\theta$, the direction is perpendicular to both $\vec{A}$ and $\vec{B}$ (that is, perpendicular to the plane defined by the two vectors), and the sense comes from the right-hand rule: going from the first vector to the second through the smaller angle.

As a result of the definition of the sense of the cross product, $\vec{A} \times \vec{B}$ is not equal to $\vec{B} \times \vec{A}$. They have the same magnitude and direction but opposite sense, thus:
\[
\vec{A} \times \vec{B} = -\vec{B} \times \vec{A} \tag{1.6}
\]
Now, let's consider $\vec{A} \times (\vec{B} \times \vec{C})$. What is the direction of this vector? It should be perpendicular to a vector that is itself perpendicular to the plane defined by $\vec{B}$ and $\vec{C}$. That means that $\vec{A} \times (\vec{B} \times \vec{C})$ is in fact parallel to the plane containing $\vec{B}$ and $\vec{C}$. However, this vector is not equal to $(\vec{A} \times \vec{B}) \times \vec{C}$, for the simple reason that each of the two is parallel to one of two non-parallel planes.
Finally, for the cross product the following holds:
\[
\vec{A} \times (\vec{B} + \vec{C}) = \vec{A} \times \vec{B} + \vec{A} \times \vec{C} \tag{1.7}
\]
The cross product of two vectors can be found through direct multiplication of their components (keeping in mind the signs of the cross products of the unit vectors) or through the determinant method.

An interesting conclusion is that the magnitude of the cross product equals the area of the parallelogram enclosed by the two vectors.
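
A short NumPy sketch of these properties, with arbitrarily chosen vectors:

```python
import numpy as np

A = np.array([1.0, 2.0, 0.0])
B = np.array([3.0, -1.0, 0.0])
C = np.array([0.0, 1.0, 2.0])

# Anticommutativity (Equation 1.6).
print(np.allclose(np.cross(A, B), -np.cross(B, A)))          # True

# The cross product is not associative: A x (B x C) != (A x B) x C in general.
print(np.allclose(np.cross(A, np.cross(B, C)),
                  np.cross(np.cross(A, B), C)))              # False

# |A x B| equals the area of the parallelogram spanned by A and B.
print(np.linalg.norm(np.cross(A, B)))   # 7.0 here, since A and B lie in the xy-plane
```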

1.6 Equations of Lines and Planes

Planes are to surfaces what lines are to curves. In single-variable calculus the topic of the tangent line comes up frequently; in the calculus of two variables we will use the concept of a tangent plane.

Let's start with the derivation of the equation of a plane. There are several ways to define a plane, but for this discussion it is useful to define it by a point that the plane passes through and the normal vector to the plane. We call the fixed point $P_0(x_0, y_0, z_0)$ and the normal vector $\vec{N} = (a, b, c)$. We want to find an equation for the coordinates of any point $P(x, y, z)$ lying in the plane. A smart way to approach this problem is to see that the vector $\overrightarrow{P_0P} = (x - x_0, y - y_0, z - z_0)$ lies in the plane and thus is perpendicular to $\vec{N}$. That means that $\vec{N} \cdot \overrightarrow{P_0P} = 0$. As a result:
\[
\vec{N} \cdot \overrightarrow{P_0P} = 0
\]
\[
(a, b, c) \cdot (x - x_0, y - y_0, z - z_0) = 0
\]
\[
a(x - x_0) + b(y - y_0) + c(z - z_0) = 0 \tag{1.8}
\]

Now, several things can be observed from this equation. First, this is the equation of a plane that has normal vector $(a, b, c)$ and passes through the point $(x_0, y_0, z_0)$. Second, a plane can be expressed by an infinite number of different equations of this kind. This can be easily deduced from the fact that the equation can be derived from any point $P_0$ on the plane: the normal vector stays the same and only the values of $x_0$, $y_0$ and $z_0$ change. Third, if we replace the coordinates of $P_0$ with those of a point that does not lie on the original plane, we get the equation of another plane that is parallel to the original one. This is because we keep the normal vector (the component that determines the orientation of the plane) and change only its position in space.

Finally, there are two important things to note. The equation of a plane is linear. Furthermore, it has two degrees of freedom: we have to fix two of the variables in order to calculate the third.
Next, let's shift our focus to the equation of a line. We choose to fix a line by a point $P_0(x_0, y_0, z_0)$ that it passes through and a vector parallel to the line (giving its direction) $\vec{v} = (a, b, c)$. Just as we did in the case of a plane, we want to find an equation for an arbitrary point $P(x, y, z)$ on the line. The vector $\overrightarrow{P_0P} = (x - x_0, y - y_0, z - z_0)$ lies on the line, as it connects two points of the line. Since the line is parallel to $\vec{v}$, $\overrightarrow{P_0P}$ is a scalar multiple of $\vec{v}$. As a result:
\[
t\vec{v} = \overrightarrow{P_0P}
\]
\[
t(a, b, c) = (x - x_0, y - y_0, z - z_0)
\]
\[
ta = x - x_0, \quad tb = y - y_0, \quad tc = z - z_0
\quad\Longrightarrow\quad
\frac{x - x_0}{a} = \frac{y - y_0}{b} = \frac{z - z_0}{c} \tag{1.9}
\]
One can observe that the components of the vector $\overrightarrow{P_0P}$ are proportional to the components of the direction vector $\vec{v}$, with the same constant of proportionality $t$. Furthermore, the equation of a line has one degree of freedom: if we fix one of the three coordinates we can easily find the other two.
A very important point to stress is that the three parts of the equation define a line together. If you use only two of the parts you will get the equation of a plane (although it will contain only two variables). This can be understood once the difference between the following two sets is understood:
\[
\{(x, y) : 4y - 3x = 17\}
\]
\[
\{(x, y, z) : 4y - 3x = 17\}
\]
In the second case, which is our case, $z$ is free to take any value. However, if you want to define a line, all three variables should be constrained.
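
A minimal NumPy sketch of Equations 1.8 and 1.9, using an arbitrary point, normal and direction vector:

```python
import numpy as np

# Plane through P0 with normal N (Equation 1.8): a(x-x0) + b(y-y0) + c(z-z0) = 0.
P0 = np.array([1.0, -2.0, 3.0])
N  = np.array([2.0,  1.0, -1.0])

def on_plane(P):
    """True if the point P satisfies the plane equation (up to rounding)."""
    return bool(np.isclose(N @ (np.asarray(P) - P0), 0.0))

print(on_plane(P0))               # True: the fixed point lies on the plane
print(on_plane([2.0, -3.0, 4.0])) # True: this point also satisfies the equation

# Line through P0 with direction v (Equation 1.9): points are P0 + t*v.
v = np.array([1.0, 2.0, 2.0])
for t in (0.0, 0.5, -1.0):
    P = P0 + t * v
    # The components of P - P0 are proportional to v with the same constant t.
    print((P - P0) / v)           # [t, t, t]
```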

Chapter 2
Vector Calculus

2.1 Vector Functions of a Scalar Variable

Functions can be divided into four types. They can have a scalar or a vector
as an input and a scalar or a vector as an output - four different combinations
in total. Single variable Calculus deals only with the case of scalar input and
output. In this section we will discuss functions that have a scalar input and
a vector output.
What is suggested by Mr. Gross is that if there is a direct correspondence between the definitions and rules of scalar limits and vector limits, then all the consequences that follow from scalar limits, using only rules that are also accepted in vector arithmetic, will be true for vectors as well. That means that if we define the limit of a vector in a way analogous to the limit of a scalar, and we use only rules (operations) that are defined both for scalars and vectors, then vector limits and derivatives behave the same as the scalar ones, with the appropriate variables vectorized.
Following this strategy, the following conclusions can be drawn:

- $\lim_{x \to a} \vec{f}(x) = \vec{L}$ means that given any $\epsilon > 0$ we can find $\delta > 0$ such that whenever $0 < |x - a| < \delta$, then $|\vec{f}(x) - \vec{L}| < \epsilon$
- $\vec{f}\,'(x) = \lim_{\Delta x \to 0} \dfrac{\vec{f}(x + \Delta x) - \vec{f}(x)}{\Delta x}$
- if $\vec{h}(x) = \vec{f}(x) + \vec{g}(x)$, then $\vec{h}'(x) = \vec{f}\,'(x) + \vec{g}\,'(x)$
- $\dfrac{d}{dx}[f(x)\vec{g}(x)] = f(x)\vec{g}\,'(x) + f'(x)\vec{g}(x)$
- $\dfrac{d}{dx}[\vec{f}(x) \cdot \vec{g}(x)] = \vec{f}(x) \cdot \vec{g}\,'(x) + \vec{f}\,'(x) \cdot \vec{g}(x)$
- $\dfrac{d}{dx}[\vec{f}(x) \times \vec{g}(x)] = \vec{f}(x) \times \vec{g}\,'(x) + \vec{f}\,'(x) \times \vec{g}(x)$

2.2 Tangential and Normal Vectors

Curves in the plane or in space have shapes that are independent of the coordinate system and the parametrization. Thus it makes sense to try to express them and their properties solely in terms of their shape rather than an external coordinate system. For this reason local coordinates shall be used; we call these the tangential, normal and binormal vectors. That is what we will do here.
Let $\vec{r}(t)$ be the position vector as a function of the parameter $t$. The derivative of the position vector (the so-called velocity vector) is always tangent to the curve. Then the tangent vector $\vec{T}$ should just be a unit vector in the direction of $d\vec{r}/dt$:
\[
\vec{T} = \frac{d\vec{r}/dt}{ds/dt} = \frac{d\vec{r}}{ds} \tag{2.1}
\]

This result comes naturally, as $d\vec{r}$ is in the tangent direction. Furthermore, as we are talking about infinitesimal quantities, the magnitudes of $d\vec{r}$ and $ds$ are the same; here $ds$ is the arc length differential, so $ds/dt$ has the physical interpretation of speed. So we divide the velocity vector $d\vec{r}/dt$ by the speed $ds/dt$, and it makes sense that the result is a unit vector in the direction of the velocity (velocity is always tangential to the path). This is a beautiful result, as it does not depend on any coordinate system or parametrization. However, it is quite impractical in real life, as one is rarely given $s(t)$. That is why the unit tangent vector is usually calculated as $d\vec{r}/dt$ divided by its magnitude.
Next, let's derive the unit normal vector $\vec{N}$. Intuitively, it should be in the direction of the derivative of the unit tangent vector. Why? Because the magnitude of a unit vector is always one, so no change in that direction is possible. Moreover, as the only change of the unit tangent vector can be in its direction, its derivative has to be perpendicular to the unit tangent vector itself. We also define the normal vector to be perpendicular to the tangent vector. So:
\[
\vec{N} = \frac{d\vec{T}/dt}{\left|\,d\vec{T}/dt\,\right|} \tag{2.2}
\]


Now we can prove this very same conclusion with more rigor. First, consider any function $\vec{r}(t)$ such that $|\vec{r}(t)| = c$. Then the dot product of $\vec{r}(t)$ with itself equals:
\[
\vec{r}(t) \cdot \vec{r}(t) = |\vec{r}(t)||\vec{r}(t)|\cos 0 = |\vec{r}(t)|^2 = c^2
\]
Let's take the derivative of this expression:
\[
\frac{d}{dt}[\vec{r}(t) \cdot \vec{r}(t)] = \frac{dc^2}{dt} = 0
\]
However, recall from the previous section that:
\[
\frac{d}{dx}[\vec{r}(x) \cdot \vec{r}(x)] = \vec{r}(x) \cdot \vec{r}\,'(x) + \vec{r}\,'(x) \cdot \vec{r}(x) = 2\vec{r}(x) \cdot \vec{r}\,'(x)
\]
Combining the two expressions we get:
\[
2\vec{r}(x) \cdot \vec{r}\,'(x) = 0
\]
This proves that the derivative of a vector with constant magnitude is always perpendicular to the original vector. In our discussion the magnitude of $\vec{T}$ is always one, so its derivative is always orthogonal to it. Thus $\vec{T}'$ is in the normal direction. The only thing left is to make sure that its length is one, so we divide it by its magnitude. In this way we get Equation 2.2.
As the definitions of the tangent and normal unit vectors do not depend on the coordinate system, they also hold in three dimensions. However, when we deal with space curves we can also define a third unit vector that is normal to the osculating plane (the plane defined by the unit tangent and normal vectors):
\[
\vec{B} = \vec{T} \times \vec{N} \tag{2.3}
\]
We call $\vec{B}$ the binormal vector.
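
A SymPy sketch that computes $\vec{T}$, $\vec{N}$ and $\vec{B}$ for a helix (the curve itself is an arbitrary example):

```python
import sympy as sp

t = sp.symbols('t', real=True)
r = sp.Matrix([sp.cos(t), sp.sin(t), t])      # a helix, chosen as an example

v = r.diff(t)                                  # velocity dr/dt
T = sp.simplify(v / v.norm())                  # unit tangent (Equation 2.1)

dT = T.diff(t)
N = sp.simplify(dT / dT.norm())                # unit normal (Equation 2.2)

B = sp.simplify(T.cross(N))                    # binormal (Equation 2.3)

print(T.T)                     # e.g. [-sin(t)/sqrt(2), cos(t)/sqrt(2), sqrt(2)/2]
print(N.T)                     # e.g. [-cos(t), -sin(t), 0]
print(B.T)                     # e.g. [sin(t)/sqrt(2), -cos(t)/sqrt(2), sqrt(2)/2]
print(sp.simplify(T.dot(N)))   # 0: tangent and normal are orthogonal
```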

2.3 Polar Coordinates

Polar coordinates are another way of representing points in a plane. A point $P$ is defined by its distance $r$ from the origin and the angle $\theta$ between the line connecting $P$ with the origin and the horizontal axis. One can easily go from polar coordinates to Cartesian or the other way around:
\[
x = r\cos\theta \qquad\qquad r = \sqrt{x^2 + y^2}
\]
\[
y = r\sin\theta \qquad\qquad \theta = \arctan(y/x)
\]

[Figure 2.1: the polar unit vectors $\vec{u}_r$ and $\vec{u}_\theta$ relative to the position vector $\vec{r}$]
A complication that arises when using polar coordinates is that one point can have many representations. Recall that in a Cartesian coordinate system each point has one set of coordinates, and no point with different coordinates can be the same point. This is not the case with polar coordinates. For example, we have the following two cases in which one point can be represented by different sets of coordinates:
\[
(r, \theta) = (r, \theta + 2k\pi)
\]
\[
(r, \theta) = (-r, \theta + \pi)
\]
An extremely important observation is that it is the point that has to satisfy an equation, not a particular representation of it. There can be a case in which a representation does not satisfy the equation but the point does, because there is another representation that fits the equation. An example is the equation $r = \sin^2\theta$. The point $P(-\tfrac{1}{4}, \tfrac{7\pi}{6})$ clearly does not satisfy the equation in this representation, as $r$ cannot be negative. However, the very same point $P$ can be represented as $(\tfrac{1}{4}, \tfrac{\pi}{6})$, which does satisfy $r = \sin^2\theta$.
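
A quick numerical check of this example (a sketch using only the standard library):

```python
import math

def polar_to_cartesian(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

# Two representations of the same point P.
p1 = polar_to_cartesian(-0.25, 7 * math.pi / 6)
p2 = polar_to_cartesian(0.25, math.pi / 6)
print(p1, p2)   # both approximately (0.2165, 0.125)

# The second representation satisfies r = sin^2(theta), the first does not.
print(math.isclose(0.25, math.sin(math.pi / 6) ** 2))          # True
print(math.isclose(-0.25, math.sin(7 * math.pi / 6) ** 2))     # False
```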

2.4 Vectors in Polar Coordinates

In order to use vectors in polar coordinates we define two new unit vectors, $\vec{u}_r$ and $\vec{u}_\theta$. $\vec{u}_r$ is a unit vector in the direction of increasing $r$, and $\vec{u}_\theta$ is a positive 90-degree rotation of it. This can be seen in Figure 2.1. The position of a point is given by the position vector $\vec{r} = r\vec{u}_r$. It can easily be found that:
\[
\vec{u}_r = \cos\theta\,\vec{i} + \sin\theta\,\vec{j} \tag{2.4}
\]

$\vec{u}_\theta$ is a positive 90-degree rotation of $\vec{u}_r$, so:
\[
\vec{u}_\theta = \cos(90^\circ + \theta)\,\vec{i} + \sin(90^\circ + \theta)\,\vec{j} = -\sin\theta\,\vec{i} + \cos\theta\,\vec{j} = \frac{d\vec{u}_r}{d\theta} \tag{2.5}
\]

In fact, it turns out that each differentiation of one of these unit vectors with respect to $\theta$ gives a vector rotated a positive 90 degrees from the original one (hence normal to it). Thus differentiating $\vec{u}_\theta$ gives a unit vector in the same direction as $\vec{u}_r$ but with opposite sense.

Another important thing to note is that the velocity vector expressed in polar coordinates will generally have components along both $\vec{u}_r$ and $\vec{u}_\theta$. This is because neither of the two polar unit vectors is always tangent to the path. Furthermore, straightforward differentiation of the position vector gives the velocity vector, and differentiation of the velocity vector gives the acceleration vector.
The instantaneous velocity $\vec{v}$ is obtained by taking the time derivative of the position vector:
\[
\vec{v} = \frac{d\vec{r}}{dt} = \frac{dr}{dt}\vec{u}_r + r\frac{d\vec{u}_r}{dt}
\]
Now it can be seen from Equation 2.5 that $\dfrac{d\vec{u}_r}{dt} = \dfrac{d\theta}{dt}\vec{u}_\theta$. Thus,
\[
\vec{v} = \frac{dr}{dt}\vec{u}_r + r\frac{d\theta}{dt}\vec{u}_\theta \tag{2.6}
\]

If we differentiate Equation 2.6 with respect to time we obtain the instantaneous acceleration:
\[
\vec{a} = \frac{d^2r}{dt^2}\vec{u}_r + \frac{dr}{dt}\frac{d\vec{u}_r}{dt} + \frac{dr}{dt}\frac{d\theta}{dt}\vec{u}_\theta + r\frac{d^2\theta}{dt^2}\vec{u}_\theta + r\frac{d\theta}{dt}\frac{d\vec{u}_\theta}{dt}
\]
\[
\vec{a} = \frac{d^2r}{dt^2}\vec{u}_r + \frac{dr}{dt}\frac{d\vec{u}_r}{d\theta}\frac{d\theta}{dt} + \frac{dr}{dt}\frac{d\theta}{dt}\vec{u}_\theta + r\frac{d^2\theta}{dt^2}\vec{u}_\theta + r\frac{d\theta}{dt}\frac{d\vec{u}_\theta}{d\theta}\frac{d\theta}{dt}
\]
\[
\vec{a} = \frac{d^2r}{dt^2}\vec{u}_r + \frac{dr}{dt}\frac{d\theta}{dt}\vec{u}_\theta + \frac{dr}{dt}\frac{d\theta}{dt}\vec{u}_\theta + r\frac{d^2\theta}{dt^2}\vec{u}_\theta - r\left(\frac{d\theta}{dt}\right)^2\vec{u}_r
\]
\[
\vec{a} = \left[\frac{d^2r}{dt^2} - r\left(\frac{d\theta}{dt}\right)^2\right]\vec{u}_r + \left[2\frac{dr}{dt}\frac{d\theta}{dt} + r\frac{d^2\theta}{dt^2}\right]\vec{u}_\theta \tag{2.7}
\]
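
A SymPy sketch verifying Equation 2.7 against direct differentiation in Cartesian coordinates; the particular $r(t)$ and $\theta(t)$ are arbitrary choices:

```python
import sympy as sp

t = sp.symbols('t', real=True)
r = 1 + t**2            # arbitrary r(t)
th = sp.sin(t)          # arbitrary theta(t)

# Position in Cartesian coordinates, then differentiate twice directly.
x, y = r * sp.cos(th), r * sp.sin(th)
a_cart = sp.Matrix([x, y]).diff(t, 2)

# Polar unit vectors and the acceleration from Equation 2.7.
u_r  = sp.Matrix([sp.cos(th), sp.sin(th)])
u_th = sp.Matrix([-sp.sin(th), sp.cos(th)])
a_r  = r.diff(t, 2) - r * th.diff(t)**2
a_th = 2 * r.diff(t) * th.diff(t) + r * th.diff(t, 2)
a_polar = a_r * u_r + a_th * u_th

print(sp.simplify(a_cart - a_polar))   # zero vector: the two expressions agree
```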

Chapter 3
Partial Derivatives

3.1 n-Dimensional Vector Spaces

In the last section we discussed the case of the function box having a scalar as an input and a vector as an output. Now we will consider the opposite idea: a vector input and a scalar output. Such functions are called scalar functions of a vector variable.
Although until now we have used vectors and arrows interchangeably, vectors do not need to be arrows. Consider the following function:
\[
V(r, h) = \pi r^2 h
\]
This is a function that gives the volume of a cylinder with radius $r$ and height $h$. The arrow representation of the input $(r, h)$ has no physical meaning. That is why it is more natural to view this input not as an arrow but as an ordered 2-tuple. Furthermore, as we no longer link vectors with arrows, the notation $x$ shall be used for denoting n-tuples.

Now that we have outgrown the graphical representation of a vector, we can talk about vectors that have more than three components. It makes perfect sense for 4-tuples, 5-tuples and n-tuples to exist. Furthermore, as we have liberated ourselves from the constraints of physical space, spatial coordinates like $(x, y, z)$ do not make much sense anymore. That is why an n-tuple is written as:
\[
x = (x_1, x_2, x_3, \ldots, x_n)
\]
Let's talk about the mathematical structure of vectors. We have already defined n-tuples; however, they are useless without operations we can perform on them. We need to endow them with structure. The insight here is that only when our set (the n-tuples) is endowed with the structure of equality, summation and scalar multiplication can we call the resulting structure an n-dimensional vector space. What is to be remembered is that the n-tuples together with this structure (equality, summation and scalar multiplication) are called a vector space, not the n-tuples alone. This structure is easily defined:
1. If $a = (a_1, a_2, \ldots, a_n)$ and $b = (b_1, b_2, \ldots, b_n)$, then $a = b$ means that $a_1 = b_1$, $a_2 = b_2$, ..., $a_n = b_n$.
2. If $a = (a_1, a_2, \ldots, a_n)$ and $b = (b_1, b_2, \ldots, b_n)$, then $a + b = (a_1 + b_1, a_2 + b_2, \ldots, a_n + b_n)$.
3. If $c$ is any scalar, scalar multiplication is defined as $c(a_1, a_2, \ldots, a_n) = (ca_1, ca_2, \ldots, ca_n)$.
We should also note that the length of an n-tuple can be found by:
\[
\|x\| = \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2} \tag{3.1}
\]

Furthermore, the dot product and its properties are also applicable to n-tuples. Finding limits is quite tricky, as in n-dimensional space you can approach a point from an infinite number of directions. A limit exists only if the limits along all paths are the same, so one needs to show that all (infinitely many) paths approach the same limit. The epsilon-delta limit definition can be used in n dimensions to handle this. An important consequence is that a function is continuous at a point if its limit exists at this point and equals the value of the function there.
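
A small numerical illustration of path dependence, using the standard textbook example $f(x, y) = xy/(x^2 + y^2)$ (not part of the original notes):

```python
# f(x, y) = x*y / (x**2 + y**2) has no limit at the origin:
# approaching along different straight lines gives different values.

def f(x, y):
    return x * y / (x**2 + y**2)

for slope in (0.0, 1.0, 2.0):
    # approach (0, 0) along the line y = slope * x
    values = [f(h, slope * h) for h in (1e-1, 1e-3, 1e-6)]
    print(slope, values)   # constant along each line: 0.0, 0.5 and 0.4 respectively
```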

3.2 An Introduction to Partial Derivatives

We can't take the derivative of a function of multiple variables in the same fashion as we do with a function of a single variable, because we only know how to take the derivative with respect to one variable. However, there is a workaround for functions of several variables: let all variables but one be fixed, treat them as constants, and take the derivative with respect to the variable that was left unfixed. Note that for this method to work we need to take the derivative with respect to an independent variable. Most of the usual derivative properties still hold. However, we can no longer treat differentials as fractions, at least not in all cases:
\[
\frac{\partial x}{\partial u} \neq \left(\frac{\partial u}{\partial x}\right)^{-1}
\]

We can do this only in the case when the same variables are held constant in both derivatives:
\[
\left(\frac{\partial x}{\partial u}\right)_y = \left(\frac{\partial u}{\partial x}\right)_y^{-1}
\]
Derivatives of functions of multiple variables can be taken in an infinite number of directions. Partial derivatives are only a few of these, but they are very representative. A partial derivative with respect to one variable gives the slope of the function in the direction in which all variables but that one are held constant.

If we narrow our discussion to functions of two variables we can obtain some intuition and valuable results. If we have a function $w(x, y)$, then for each point in the $xy$-plane for which the function is defined there exists a value $w$. This can be depicted graphically as a surface in three dimensions. Now, the partial derivative with respect to $x$ at some point $P(x_0, y_0)$ corresponds to a slice of the surface by the plane $y = y_0$ passing through $P$. If we vectorize these derivatives we get vectors that are tangent to the surface at the point $P$. To do this, remember that the derivative is the change in the function ($w$) per unit length of the variable ($x$ or $y$). We don't care about the magnitude of the vector, so we can just take one unit in $\vec{i}$ (or $\vec{j}$) and the value of the derivative in $\vec{k}$; the remaining component will be zero:
\[
\vec{V}_1 = \vec{j} + \left.\frac{\partial w}{\partial y}\right|_{(x_0,y_0)} \vec{k}
\qquad\qquad
\vec{V}_2 = \vec{i} + \left.\frac{\partial w}{\partial x}\right|_{(x_0,y_0)} \vec{k}
\]
The normal vector to the surface at the point $P$ can be found from the cross product of the two tangent vectors:
\[
\vec{N} = \vec{V}_1 \times \vec{V}_2 = \left.\frac{\partial w}{\partial x}\right|_{(x_0,y_0)} \vec{i} + \left.\frac{\partial w}{\partial y}\right|_{(x_0,y_0)} \vec{j} - \vec{k} \tag{3.2}
\]
Then, as was shown in Section 1.6, the equation of the tangent plane with normal vector $\vec{N}$ at the point $P(x_0, y_0)$ is:
\[
\left.\frac{\partial w}{\partial x}\right|_{(x_0,y_0)} (x - x_0) + \left.\frac{\partial w}{\partial y}\right|_{(x_0,y_0)} (y - y_0) - (w - w_0) = 0 \tag{3.3}
\]
From here, the change in $w$ on the tangent plane as a function of the change in $x$ and $y$ is:
\[
\Delta w_{\tan} = \left.\frac{\partial w}{\partial x}\right|_{(x_0,y_0)} \Delta x + \left.\frac{\partial w}{\partial y}\right|_{(x_0,y_0)} \Delta y \tag{3.4}
\]
Note that this equation is exact for the tangent plane but is only an approximation to the function itself.
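
A SymPy sketch of Equations 3.2 and 3.4 for an arbitrary surface $w = x^2 + xy$ at the point $(1, 2)$:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
w = x**2 + x*y                      # arbitrary example surface
x0, y0 = 1, 2
w0 = w.subs({x: x0, y: y0})

wx = sp.diff(w, x).subs({x: x0, y: y0})   # partial derivative values at (x0, y0)
wy = sp.diff(w, y).subs({x: x0, y: y0})

# Normal vector (Equation 3.2) and the tangent-plane change (Equation 3.4).
N = (wx, wy, -1)
dw_tan = lambda dx, dy: wx * dx + wy * dy

dx, dy = 0.01, -0.02
exact_change = w.subs({x: x0 + dx, y: y0 + dy}) - w0
print(N)                            # (4, 1, -1)
print(float(dw_tan(dx, dy)))        # 0.02
print(float(exact_change))          # close to 0.02 for small dx, dy
```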

3.3 Differentiability and the Gradient

Let's continue our two-dimensional discussion. Why should we restrict ourselves to derivatives only in the $x$ and $y$ directions? It makes perfect sense to talk about a derivative in a direction $s$ at $(a, b)$, written $f_s(a, b)$ or $dw/ds$. Note that we are using $dw/ds$ instead of $\partial w/\partial s$. That is because when we move along an arbitrary path no variable is held constant; rather, $x$ and $y$ are no longer independent, as they are linked through the equation of the line $s$. Now, how do we find $dw/ds$? We can start with the definition of a limit:
\[
f_s(a, b) = \frac{dw}{ds} = \lim_{\Delta s \to 0} \frac{\Delta w}{\Delta s}
\]

Recall from the previous section that
\[
\Delta w_{\tan} = f_x(a, b)\Delta x + f_y(a, b)\Delta y
\]
Then dividing both sides by $\Delta s$ we get
\[
\frac{\Delta w_{\tan}}{\Delta s} = f_x(a, b)\frac{\Delta x}{\Delta s} + f_y(a, b)\frac{\Delta y}{\Delta s}
\]
If we let $\Delta s \to 0$,
\[
f_s(a, b) = \frac{dw}{ds} = f_x(a, b)\frac{dx}{ds} + f_y(a, b)\frac{dy}{ds}
\]
One can recognize this as the dot product of two vectors:
\[
\nabla f = (f_x(a, b), f_y(a, b)) \qquad\qquad \vec{u}_s = \left(\frac{dx}{ds}, \frac{dy}{ds}\right)
\]
Note that we call the second vector a unit vector. Why is that? One can easily see that its magnitude is always one, as $ds = \sqrt{dx^2 + dy^2}$. Additionally, the first vector is the gradient of $f$. Now, rewriting the equation for $f_s(a, b)$ as a dot product we get:
\[
f_s(a, b) = \nabla f \cdot \vec{u}_s \tag{3.5}
\]

An interesting observation is that the maximum possible directional derivative occurs when $\vec{u}_s$ is parallel to $\nabla f$; in this case the direction of $s$ is the same as the direction of $\nabla f$. Thus the directional derivative is maximum in the direction of the gradient. Keep in mind that the definition of the gradient does not depend on a coordinate system, although in Cartesian coordinates $\nabla f = f_x(a, b)\,\vec{i} + f_y(a, b)\,\vec{j}$. For example, the gradient vector expressed in polar coordinates is:
\[
\nabla f = \frac{\partial w}{\partial r}\vec{u}_r + \frac{1}{r}\frac{\partial w}{\partial \theta}\vec{u}_\theta \tag{3.6}
\]
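
A NumPy sketch of Equation 3.5, sampling directions to show that the directional derivative is largest along the gradient (the function and the point are arbitrary choices):

```python
import numpy as np

# f(x, y) = x**2 * y, evaluated at (a, b); its gradient is (2ab, a**2).
a, b = 1.0, 2.0
grad = np.array([2 * a * b, a**2])            # (4, 1)

best_dir, best_value = None, -np.inf
for phi in np.linspace(0, 2 * np.pi, 361):
    u = np.array([np.cos(phi), np.sin(phi)])  # unit vector u_s
    f_s = grad @ u                            # directional derivative (Eq. 3.5)
    if f_s > best_value:
        best_dir, best_value = u, f_s

print(best_value, np.linalg.norm(grad))       # max value ~ |grad f|
print(best_dir, grad / np.linalg.norm(grad))  # maximizing direction ~ grad f / |grad f|
```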
These conclusions seem very nice, but recall that we derived all these results only after restricting ourselves to a 2-dimensional vector space. Is it possible to scale the idea of differentiation to vectors of more than two variables? It seems reasonable. Let's first see how the limit definition of the derivative would look:
\[
f'(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x} \tag{3.7}
\]

This looks good at first sight. However, if one looks closely, one will see that the numerator of this fraction is a real number while the denominator is a vector. Wait, have we defined how to divide a scalar by a vector? Not yet. Let's first see how we define division of scalars: the number $\frac{c}{x}$ is the number that, when multiplied by $x$, equals $c$. This definition also explains why it is impossible to divide by zero: if one tried to do so, one would need $\frac{c}{0}$ multiplied by zero to equal $c$, but there is no number that multiplied by zero gives anything other than zero. Now, going back to our problem of dividing a scalar by a vector, we can use the very same definition: $\frac{c}{x}$ is a vector that, multiplied by $x$, equals $c$. What kind of multiplication is this? It should obviously be the dot product, as the result of the multiplication has to be a scalar. Notice that we said "the number" in the first case and "a vector" in the second. The reason is that while there is only one number equal to the division of two scalars, there is an infinite number of vectors satisfying the division of a scalar by a vector. For the sake of simplicity, and as we would like to get only one derivative out of the differentiation, we reduce the possible answers to just one:
\[
\frac{c}{v} = \frac{c}{\|v\|^2}\,v \tag{3.8}
\]

Now let's rewrite Equation 3.7 a bit. First of all, note that $\Delta x$ is a vector. What do we mean by $\Delta x \to 0$? We mean that its direction is kept constant while its magnitude approaches zero. Then we can substitute $\Delta x$ by $tu$, where $u$ is a unit vector in the direction of $\Delta x$ and $t$ is a positive real number. It is obvious that if $t \to 0$, then $\Delta x \to 0$. If we make this substitution in Equation 3.7 and use Equation 3.8 we get:
\[
f'(x) = \lim_{t \to 0} \frac{f(x + tu) - f(x)}{tu}
\]
\[
f'(x) = \lim_{t \to 0} \frac{f(x + tu) - f(x)}{\|tu\|^2}\,tu
\]
\[
f'(x) = \lim_{t \to 0} \frac{f(x + tu) - f(x)}{t^2\|u\|^2}\,tu
\]
\[
f'(x) = \lim_{t \to 0} \frac{f(x + tu) - f(x)}{t}\,u \tag{3.9}
\]
Here Equation 3.9 represents the instantaneous rate of change in the direction of $u$. This can also be denoted as the directional derivative $f_u(x)$. Recall that we did not place any constraints on $x$, so this result holds for any function of an n-tuple.

Now we can define differentiability. A function $f(x)$ is differentiable at $x = a$ if and only if $f_u(a)$ exists in every direction $u$. That means that the existence of Equation 3.9 should be independent of the direction $u$. We can also define what a smooth surface is: a smooth surface is a surface for which, at each point, the directional derivative exists in every direction. Finally, another definition we can make is of the derivative of $f(x)$: it is defined to be the directional derivative of $f$ at $a$ which has the greatest magnitude.

3.4 The Chain Rule

The Chain Rule allows one to link a function to the functions that determine its variables. As an illustration, consider the following case:
\[
w = f(x, y) \qquad x = g(r, s) \qquad y = h(r, s)
\]
It is easy to see that $w$ can be expressed as a function of $r$ and $s$. Then one can find the partial derivative with respect to $r$ or $s$. The Chain Rule, however, allows us to do this without substituting variables:
\[
\frac{\partial w}{\partial r} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial r}
\]
This holds only if the functions are continuously differentiable. Furthermore, it is not allowed to cancel the $\partial x$'s and $\partial y$'s as if they were fractions; this would lead to an expression of the type $1 = 2$. The reason is that the different partial derivatives are taken with different variables being held fixed. This can be illustrated as:
\[
\left(\frac{\partial w}{\partial r}\right)_s = \left(\frac{\partial w}{\partial x}\right)_y \left(\frac{\partial x}{\partial r}\right)_s + \left(\frac{\partial w}{\partial y}\right)_x \left(\frac{\partial y}{\partial r}\right)_s
\]
The Chain Rule works for any functions whose parameters are functions of other parameters, and this nesting can continue even further. The Chain Rule also holds for higher-order derivatives. The logic behind this is that if $\partial w/\partial x$ is a partial derivative of $w$, which is a function of both $x$ and $y$, then in the general case $\partial w/\partial x$ is also a function of both $x$ and $y$. If $\partial w/\partial x$ is a continuous function, then it can be differentiated again.
In most cases $f_{xy} = f_{yx}$; however, this is not always the case.

Theorem: If $f$, $f_x$, $f_y$ and $f_{xy}$ exist and are continuous in a neighborhood of the point $(a, b)$, then $f_{yx}$ also exists at $(a, b)$ and in fact $f_{yx}(a, b) = f_{xy}(a, b)$.

Even if $f$, $f_x$ and $f_y$ exist and are continuous, it is possible that $f_{xy}$ and $f_{yx}$ are not continuous.
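
A SymPy check of the Chain Rule formula above, with arbitrary choices for $f$, $g$ and $h$:

```python
import sympy as sp

x, y, r, s = sp.symbols('x y r s', real=True)

f = x**2 * y + sp.sin(y)     # w = f(x, y)
g = r * s                    # x = g(r, s)
h = r + s**2                 # y = h(r, s)

# Chain Rule: dw/dr = (dw/dx)(dx/dr) + (dw/dy)(dy/dr).
chain = (sp.diff(f, x) * sp.diff(g, r) + sp.diff(f, y) * sp.diff(h, r)).subs({x: g, y: h})

# Direct computation: substitute first, then differentiate.
direct = sp.diff(f.subs({x: g, y: h}), r)

print(sp.simplify(chain - direct))   # 0
```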

3.5 Exact Differentials

Although for illustration purposes we will use an example with a function $w = f(x, y)$, the principles are the same for functions of more than two variables. Recall that
\[
\Delta w = f_x \Delta x + f_y \Delta y
\]
If $w$ is differentiable we can turn $\Delta x$ and $\Delta y$ into the infinitesimals $dx$ and $dy$:
\[
dw = f_x\,dx + f_y\,dy \tag{3.10}
\]
This is the equation for the total differential of $w$. Any expression of the form $M(x, y)\,dx + N(x, y)\,dy$ is called a differential.
If we want to get back to $w$ from the differential, we integrate with respect to the first variable, keeping in mind that the "constant" of integration is a function of the other variables (or a constant). Then we differentiate with respect to the second variable and establish that constant of integration, and repeat this for all variables; the last constant is a number. We call a differential exact if this method is able to find a function $w$ whose partial derivatives form the differential. If such a function does not exist, the differential is inexact. Finally, recall that for functions of two variables $f_{xy} = f_{yx}$, so for an exact differential $M\,dx + N\,dy$ we must have $M_y = N_x$. The converse is also true: if $M_y = N_x$, then the differential is exact.
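
A SymPy sketch of the exactness test and of recovering $w$ from its differential; the differential used here is an arbitrary example:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)

# Candidate differential M dx + N dy.
M = 2 * x * y + 1
N = x**2 + 3 * y**2

# Exactness test: M_y == N_x.
print(sp.simplify(sp.diff(M, y) - sp.diff(N, x)))    # 0, so the differential is exact

# Recover w: integrate M in x; the "constant" of integration is a function g(y).
w_partial = sp.integrate(M, x)                       # x**2*y + x
g_prime = sp.simplify(N - sp.diff(w_partial, y))     # whatever is left must be g'(y)
w = w_partial + sp.integrate(g_prime, y)

print(w)                                             # x**2*y + x + y**3
print(sp.simplify(sp.diff(w, x) - M), sp.simplify(sp.diff(w, y) - N))   # 0 0
```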

Chapter 4
Matrix Algebra

4.1 Linearity Revisited

Linear functions are simple and nice to work with. One property they share is that linear functions have inverse functions. Unfortunately, most functions are non-linear. However, most functions are locally linear. By this we mean that, provided the function $f$ is differentiable at $x = a$, then $\Delta f \approx f'(a)\Delta x$ near $x = a$. In other words, if $f$ is continuously differentiable at $x = a$, then locally (near $x = a$) $f(x) \approx f(a) + f'(a)(x - a)$. This is also true for functions of multiple variables. If $w = f(x_1, \ldots, x_n)$ and $f$ is continuously differentiable at $x = a$, then:
\[
\Delta w_{\mathrm{lin}} = f_{x_1}(a)\Delta x_1 + \ldots + f_{x_n}(a)\Delta x_n \tag{4.1}
\]
This motivates the use of linear systems:
\[
\begin{aligned}
a_{11}x_1 + \ldots + a_{1n}x_n &= b_1\\
&\;\;\vdots\\
a_{m1}x_1 + \ldots + a_{mn}x_n &= b_m
\end{aligned}
\]
Let's start with the definition of a matrix: by an $m$ by $n$ matrix we mean a rectangular array of numbers arranged in $m$ rows and $n$ columns. Now, one can put the coefficients of $m$ linear equations in $n$ unknowns into a matrix. It turns out that the chain rule motivates the definition of matrix multiplication. We can multiply two matrices if the number of columns of the first one equals the number of rows of the second: we dot the $i$-th row of the first matrix with the $j$-th column of the second to obtain the term in the $i$-th row, $j$-th column of the product. This can be used to change the variables of a system of linear equations. Namely, if the first matrix gives $y_1, y_2, y_3$ as functions of $x_1, x_2$ and the second one gives $x_1, x_2$ as functions of $q_1, q_2, q_3, q_4$, then the product of the two matrices gives $y_1, y_2, y_3$ as functions of $q_1, q_2, q_3, q_4$.
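
A NumPy sketch of this change-of-variables interpretation, with arbitrary coefficient matrices:

```python
import numpy as np

# y = A x (3 outputs from 2 variables), x = B q (2 variables from 4 variables).
A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])            # 3x2
B = np.array([[1.0, 0.0, 2.0, 1.0],
              [0.0, 1.0, 1.0, -1.0]])  # 2x4

q = np.array([1.0, 2.0, 0.0, 3.0])

# Composing the two linear maps step by step ...
y_two_steps = A @ (B @ q)

# ... is the same as applying the single matrix AB that gives y directly from q.
y_direct = (A @ B) @ q

print(np.allclose(y_two_steps, y_direct))   # True
```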
4.2 Introduction to Matrix Algebra

We defined what matrices are, but without a structure they are pretty useless. Let's start with equality of matrices. Any two $m \times n$ matrices (with the same dimensions) are equal if they are equal term by term, that is, $[a_{ij}] = [b_{ij}]$. Next, the sum of two $m \times n$ matrices is the matrix obtained by term-by-term summation: $[c_{ij}] = [a_{ij}] + [b_{ij}]$. The same holds for scalar multiplication: each term is multiplied by the scalar. For all these definitions the sizes of the matrices do not matter, as long as both matrices have the same size.

If we want to define multiplication of matrices, though, this won't be the case. Of course, we could define the multiplication of matrices to be term-by-term; then we would be able to do it with matrices of any (equal) size, and it would be a perfectly feasible abstract definition. However, it would have little physical application. That is why we define multiplication of matrices by dotting the $i$-th row of the first matrix with the $j$-th column of the second to obtain the term in the $i$-th row, $j$-th column of the product. One consequence of this is that the order of the matrices does matter: in general $AB \neq BA$. Of course, there are cases when equality holds, but generally you get a different result if you switch the matrices.
Some other properties also follow:

1. $A + B = B + A$
2. $A + (B + C) = (A + B) + C$
3. If $0 = \begin{pmatrix} 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{pmatrix}$, then $A + 0 = A$
4. $-A = [-a_{ij}]$
5. $A(BC) = (AB)C$
6. $A(B + C) = AB + AC$
7. If $I_n = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$, then $AI_n = I_nA = A$
The last result is pretty important. The identity matrix $I_n$ is an $n \times n$ matrix that has ones on the main (top left to bottom right) diagonal and zeros everywhere else. It follows from our definition of multiplication that this result is true. Note that although in general $AB \neq BA$, in this case $AI_n = I_nA$.
The inverse $A^{-1}$ of a matrix $A$ is another matrix that, multiplied by the original matrix, gives the identity matrix. A very important fact is that $A^{-1}$ need not exist. An interesting link to systems of linear equations is that if $A^{-1}$ does not exist, then for some reason we cannot invert the system of linear equations. The matrices for which $A^{-1}$ exists are called non-singular matrices.
We can prove that if $AB = AC$ for non-singular $A$, then $B = C$. First, take a look at the following equation, where $a$, $b$ and $c$ are real numbers:
\[
ab = ac
\]
It is clear that unless $a = 0$, $b = c$. But why is that? If we multiply both sides of the equation by the inverse of $a$, namely $1/a$, we get $b = c$. The very same train of thought can be applied to matrices:
\[
\begin{aligned}
AB &= AC\\
A^{-1}AB &= A^{-1}AC\\
I_nB &= I_nC\\
B &= C
\end{aligned}
\]
Keep in mind that in this derivation we assumed that $A^{-1}$ exists; if it did not, this would not be possible. So, if $AB = AC$, then $B = C$ provided $A$ is non-singular. If $A$ is singular, it could be that $AB = AC$ yet $A \neq 0$ and $B \neq C$. When we talk about $A^{-1}$ we assume that $A$ is a square matrix; non-square matrices do not have inverses.

Finally, if $A$ is any matrix, we define the transpose of $A$, written $A^T$, to be the matrix obtained when we interchange the rows and columns of $A$. That is, the columns of $A$ are the rows of $A^T$.
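
A NumPy illustration of how cancellation fails for a singular $A$ (the matrices are arbitrary examples constructed so that $A(B - C) = 0$):

```python
import numpy as np

# A singular matrix (determinant zero): its rows are linearly dependent.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(np.linalg.det(A))        # 0.0

B = np.array([[1.0, 0.0],
              [0.0, 1.0]])
# C differs from B by a matrix whose columns lie in the null space of A.
C = B + np.array([[2.0, 4.0],
                  [-1.0, -2.0]])

# AB == AC even though B != C, so we cannot "cancel" the singular A.
print(np.allclose(A @ B, A @ C))   # True
print(np.allclose(B, C))           # False

# For a non-singular matrix the cancellation does work, via its inverse.
A2 = np.array([[1.0, 1.0],
               [0.0, 2.0]])
print(np.allclose(np.linalg.inv(A2) @ (A2 @ C), C))   # True
```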

4.3 Inverting a Matrix

If we have
\[
\begin{aligned}
y_1 &= ax_1 + bx_2 + cx_3\\
y_2 &= dx_1 + ex_2 + fx_3\\
y_3 &= gx_1 + hx_2 + ix_3
\end{aligned}
\]
we can rewrite it as
\[
\begin{pmatrix} y_1\\ y_2\\ y_3 \end{pmatrix} =
\begin{pmatrix} a & b & c\\ d & e & f\\ g & h & i \end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ x_3 \end{pmatrix}
\]
And this can be further rewritten as
\[
Y = AX \tag{4.2}
\]
Now, if one wants to solve for $X$, meaning find $x_1$, $x_2$ and $x_3$, we can rearrange the equation provided that the inverse of $A$ exists. $A^{-1}$ exists if there are as many linear equations as $x$ variables and the equations are independent, meaning that none of them is a constant multiple of another. If $A^{-1}$ exists, then $x_1$, $x_2$ and $x_3$ can be expressed as functions of $y_1$, $y_2$ and $y_3$:
\[
\begin{aligned}
A^{-1}Y &= A^{-1}AX\\
A^{-1}Y &= I_nX\\
A^{-1}Y &= X
\end{aligned} \tag{4.3}
\]

So far so good. It is clear that we can solve a system of linear equations if only we knew the inverse of the matrix that contains the coefficients. But how do we compute the inverse? We perform matrix row operations: row switching, multiplication of a row by a scalar, and addition or subtraction of one row from another. To start, write down the matrix that contains the coefficients and, to the right of it, the identity matrix:
\[
\left(\begin{array}{ccc|ccc}
a & b & c & 1 & 0 & 0\\
d & e & f & 0 & 1 & 0\\
g & h & i & 0 & 0 & 1
\end{array}\right)
\]
Now, using the row operations stated above, transform this matrix to:
\[
\left(\begin{array}{ccc|ccc}
1 & 0 & 0 & j & k & l\\
0 & 1 & 0 & m & n & o\\
0 & 0 & 1 & p & r & s
\end{array}\right)
\]
The right-hand part is the inverse:
\[
A^{-1} = \begin{pmatrix} j & k & l\\ m & n & o\\ p & r & s \end{pmatrix}
\]
If the determinant of a matrix is non-zero, then it has an inverse. Furthermore, if the determinant of the matrix of coefficients of a system of linear equations is non-zero, then the system has a unique solution; otherwise, when the determinant is zero, there are either no solutions or infinitely many solutions.
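
A minimal Gauss-Jordan sketch in NumPy following the recipe above (partial pivoting only, no other safeguards; the matrix is an arbitrary invertible example):

```python
import numpy as np

def invert(A):
    """Invert A by row-reducing the augmented matrix [A | I]."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])   # [A | I]
    for col in range(n):
        # Swap in a row with the largest pivot, then scale the pivot to 1.
        pivot = col + np.argmax(np.abs(M[col:, col]))
        M[[col, pivot]] = M[[pivot, col]]
        M[col] /= M[col, col]
        # Subtract multiples of the pivot row to clear the rest of the column.
        for row in range(n):
            if row != col:
                M[row] -= M[row, col] * M[col]
    return M[:, n:]                               # the right-hand block is A^{-1}

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 1.0]])
print(np.allclose(invert(A), np.linalg.inv(A)))   # True
```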

4.4 Maxima and Minima in Several Variables

A local maximum is a point $a$ for which $f(a) \geq f(x)$ for each $x$ in the neighborhood of $a$. The definition of a local minimum is similar. There are three steps one should take when looking for max-min candidates. Why candidates? All the maxima and minima that the function has (on the given domain) will be in the set of candidate points; however, some of these candidates might not be minima or maxima, so further investigation is necessary.

1. Solve the system
\[
\begin{aligned}
f_{x_1}(x_1, \ldots, x_n) &= 0\\
&\;\;\vdots\\
f_{x_n}(x_1, \ldots, x_n) &= 0
\end{aligned}
\]
This will give the points where all the partial derivatives are zero. Note that such a point will not be a maximum or a minimum if a directional derivative in some other direction is non-zero.

2. Find the points where $f$ is not differentiable, as these points were not included in the analysis in the previous step.

3. Check the boundaries of the domain. If the domain is bounded, then there is at least one maximum and one minimum, and it is possible that these occur on the boundary.
When we have found a candidate point $(a, b)$ we must look at the sign of $f(a + \Delta x, b + \Delta y) - f(a, b)$. For a maximum this should be negative for all small values of $\Delta x$ and $\Delta y$, and for a minimum it should be positive. This is not always easy to show, in which case it is usually easier to use the second derivatives. We will restrict the further discussion of this matter to functions of two variables. We will use the values of $f_{xx}$, $f_{yy}$ and $f_{xy} = f_{yx}$, so the test holds only if the function and its partial derivatives are differentiable at $(a, b)$. If $f_x(a, b) = f_y(a, b) = 0$, then:

1. If $f_{xx}f_{yy} - f_{xy}^2 > 0$, then $(a, b)$ is a local minimum if $f_{xx} > 0$ and a local maximum if $f_{xx} < 0$.
2. If $f_{xx}f_{yy} - f_{xy}^2 < 0$, then $(a, b)$ is a saddle point.
3. If $f_{xx}f_{yy} - f_{xy}^2 = 0$, the test is inconclusive and the sign of $f(a + \Delta x, b + \Delta y) - f(a, b)$ should be used to investigate further.
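
A SymPy sketch that finds the candidate points of an arbitrary function and classifies them with this test:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x**3 - 3*x + y**2            # arbitrary example function

# Step 1: points where both partial derivatives vanish.
candidates = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)

fxx, fyy, fxy = sp.diff(f, x, 2), sp.diff(f, y, 2), sp.diff(f, x, y)
for pt in candidates:
    D = (fxx * fyy - fxy**2).subs(pt)
    if D > 0:
        kind = 'local minimum' if fxx.subs(pt) > 0 else 'local maximum'
    elif D < 0:
        kind = 'saddle point'
    else:
        kind = 'test inconclusive'
    print(pt, kind)
# {x: -1, y: 0} is a saddle point, {x: 1, y: 0} is a local minimum
```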

Chapter 5
Multiple Integration

5.1 The Fundamental Theorem

Multiple integrals of a continuous function of several variables are evaluated in an iterative manner. First, the innermost integral is computed while keeping all other variables constant and evaluating at the limits. In this fashion all instances of the variable that we integrated with respect to are gone, and we are left with an integral of a function depending only on the other variables. Then the same procedure is repeated with the next integral. Keep in mind that although the order of integration does not matter provided that the function is continuous, it is impossible to evaluate integrals with limits of integration that depend on $x$ after one has already integrated with respect to $x$. That means the order of integration should be such that no limits of integration depend on variables that have already been integrated with respect to.
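
A SymPy sketch of iterated integration, including limits that depend on the outer variable (the integrand and region are arbitrary):

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)

# Integrate f(x, y) = x*y over the region 0 <= x <= 1, 0 <= y <= x.
# Inner integral in y first (its limits depend on x), then the outer integral in x.
inner = sp.integrate(x * y, (y, 0, x))       # x**3 / 2
print(sp.integrate(inner, (x, 0, 1)))        # 1/8

# Reversing the order is fine, but the limits must be rewritten for the same region:
# for a fixed y, x runs from y to 1.
print(sp.integrate(sp.integrate(x * y, (x, y, 1)), (y, 0, 1)))   # 1/8
```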

5.2 Multiple Integration and the Jacobian

Let's discuss variable substitution. We know that our integration can often be greatly simplified if we use substitution. However, one thing we always need to keep in mind when mapping integrals is that the area elements get scaled: we not only have to perform the substitution and change the limits of integration, but also introduce a scaling factor. This can be illustrated with the following example: with $u = x^2 + 1$,
\[
\int_1^3 2x\sqrt{x^2 + 1}\,dx \neq \int_2^{10} 2\sqrt{u - 1}\,\sqrt{u}\,du
\]
because $dx$ itself must also be rewritten in terms of $du$. The key idea is that although the scaling is not always linear, the fact that we are dealing with infinitesimal values means that the error that arises from using a linearization goes to zero. The general form of the scaling factor (also known as the Jacobian) is:
\[
J = \frac{d\mathbf{F}}{d\mathbf{x}} =
\left[\frac{\partial \mathbf{F}}{\partial x_1} \cdots \frac{\partial \mathbf{F}}{\partial x_n}\right] =
\begin{pmatrix}
\dfrac{\partial F_1}{\partial x_1} & \cdots & \dfrac{\partial F_1}{\partial x_n}\\
\vdots & \ddots & \vdots\\
\dfrac{\partial F_m}{\partial x_1} & \cdots & \dfrac{\partial F_m}{\partial x_n}
\end{pmatrix} \tag{5.1}
\]
or, component-wise:
\[
J_{i,j} = \frac{\partial F_i}{\partial x_j} \tag{5.2}
\]
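
A SymPy sketch of Equations 5.1 and 5.2 for the polar-to-Cartesian map, whose Jacobian determinant is the familiar scaling factor $r$:

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)

# The mapping F(r, theta) = (x, y).
F = sp.Matrix([r * sp.cos(th), r * sp.sin(th)])

# Jacobian matrix (Equation 5.1) and its determinant.
J = F.jacobian([r, th])
print(sp.simplify(J.det()))       # r -- the area-element scaling factor

# Example: integral of x**2 + y**2 over the unit disk, using the scaled element r dr dtheta.
integrand = sp.simplify(r**2 * J.det())       # x**2 + y**2 equals r**2 in polar coordinates
print(sp.integrate(integrand, (r, 0, 1), (th, 0, 2 * sp.pi)))   # pi/2
```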

5.3 Line Integrals

Line integrals are often used in physics to calculate the work a force has done. They can be written in several ways:
\[
\int \vec{F} \cdot d\vec{r}
\]
or, if $\vec{F} = (M, N)$ and $d\vec{r} = (dx, dy)$,
\[
\int M\,dx + N\,dy
\]
Often $\vec{F}$ and $\vec{r}$ are expressed in terms of the same parameter, so the integration is further simplified. Line integrals depend not only on the starting and final positions but also on the path taken; that is taken care of by $\vec{r}$.
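
A SymPy sketch of a line integral along a parametrized path, showing that the value depends on the path (the field and paths are arbitrary choices):

```python
import sympy as sp

t = sp.symbols('t', real=True)

# Field F = (M, N) = (-y, x), an arbitrary non-conservative choice.
# Path 1: the parabola (x, y) = (t, t**2) from (0, 0) to (1, 1).
x1, y1 = t, t**2
work1 = sp.integrate((-y1) * sp.diff(x1, t) + x1 * sp.diff(y1, t), (t, 0, 1))

# Path 2: the straight line (x, y) = (t, t) between the same endpoints.
x2, y2 = t, t
work2 = sp.integrate((-y2) * sp.diff(x2, t) + x2 * sp.diff(y2, t), (t, 0, 1))

print(work1, work2)   # 1/3 and 0: the line integral depends on the path taken
```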

5.4 Green's Theorem

Let's first introduce the concept of a connected region. A connected region is a region in which any point can be connected to any other point by a line (not necessarily straight) that does not leave the region. Furthermore, connected regions are divided into simply-connected and multiply-connected. Intuitively, a simply-connected region is a region without any holes. Rigorously defined, a simply-connected region is a region whose complement is also connected.
Green's Theorem: If $R$ is a simply-connected region with boundary $C$, then
\[
\oint_C M\,dx + N\,dy = \iint_R \left(\frac{\partial N}{\partial x} - \frac{\partial M}{\partial y}\right) dA
\]
provided that $M$, $N$, $M_y$ and $N_x$ exist and are continuous on $R$.


Here the positive direction of C is defined to be such that when one is moving
along C the region is on their left side.
There are two interesting consequences of Green's theorem. First, note that if $M\,dx + N\,dy$ is an exact differential (if there exists a potential function whose partial derivatives are $M$ and $N$), then $\partial N/\partial x - \partial M/\partial y = 0$ and the integral equals zero. That is to be expected, as a closed line integral in a conservative field is always zero.

Note that Green's theorem is stated for a simply-connected region. However, it is easy to show that it in fact holds for any closed region. If we have a region with a hole, we can split it into two separate simply-connected regions. As the cut has no thickness, the sum of the areas of the new regions is the same as the original area. Furthermore, the cuts are traversed twice but in opposite directions, so they cancel out, and the total boundary traversed is the outside plus the inside boundary. How do we evaluate integrals like this? An integral over a region with a hole is the sum of the line integrals over the outside boundary and the inside boundary (provided that both are traversed in the positive direction defined above).
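
A SymPy check of Green's theorem over the unit square for an arbitrary field $(M, N)$:

```python
import sympy as sp

x, y, t = sp.symbols('x y t', real=True)

M = -y**2          # arbitrary field components
N = x * y

# Right-hand side: double integral of (N_x - M_y) over the unit square.
rhs = sp.integrate(sp.diff(N, x) - sp.diff(M, y), (x, 0, 1), (y, 0, 1))

# Left-hand side: the closed line integral around the square, counterclockwise,
# built from four parametrized edges (x(t), y(t)) with t from 0 to 1.
edges = [
    (t, sp.Integer(0)),      # bottom: (0,0) -> (1,0)
    (sp.Integer(1), t),      # right:  (1,0) -> (1,1)
    (1 - t, sp.Integer(1)),  # top:    (1,1) -> (0,1)
    (sp.Integer(0), 1 - t),  # left:   (0,1) -> (0,0)
]
lhs = sum(
    sp.integrate(M.subs({x: ex, y: ey}) * sp.diff(ex, t)
                 + N.subs({x: ex, y: ey}) * sp.diff(ey, t), (t, 0, 1))
    for ex, ey in edges
)

print(lhs, rhs)   # both equal 3/2
```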
