Michael Singer
E-mail address: michael.singer@ucl.ac.uk
Department of Mathematics, University College, London WC1E 6BT
Contents
Chapter 1. Introductory mathematical material: some geometry
1.1. Vector spaces and affine spaces
1.2. Quadratic forms and bilinear forms
1.3. Curves, tangent vectors and so on
1.4. Calculus of variations
6.5. Curvature
6.6. Curvature at a point
6.7. Ricci and scalar curvature
6.8. Relative acceleration and geodesic deviation
6.9. Comparison with the newtonian theory
6.10. Weak field limit
6.11. Physical differential equations
CHAPTER 1

Introductory mathematical material: some geometry

λ(x, y, z) = (λx, λy, λz)
Example 1.1.3. More generally, the space of lists of n real numbers (x1 , . . . , xn ) is a real
vector space denoted by Rn .
1.1.2. Bases and matrices. We recall that a (finite) basis for a vector space V is a set
of elements (v1 , . . . , vn ) such that every element v in V has a unique representation
v = λ1 v1 + . . . + λn vn ,   (λj ∈ R).   (1.1.2)

In the expansion (1.1.2), the numbers λj are called the coefficients (or sometimes coordinates)
of v with respect to the basis (v1 , . . . , vn ). If V does have a finite basis consisting of n elements,
then any other basis of V will also consist of n elements. This number n is defined to be the
dimension of V .
If V does not have a finite basis, then it is said to be infinite-dimensional.
Example 1.1.4. The set Rn of n-tuples (x1 , . . . , xn ) (see above) has dimension n. This has
a standard basis
e1 = (1, 0, . . . , 0), e2 = (0, 1, 0 . . . , 0), . . . , en = (0, . . . , 0, 1).
Example 1.1.5. The set of differentiable functions on R is an infinite-dimensional vector
space.
Theorem 1.1.6. If V is a real vector space of dimension n, then V is isomorphic to Rn .
We won't need the proof. But it's important to understand that an isomorphism of V with
Rn is exactly the same thing as a choice of basis of V . For if T : Rn → V is an isomorphism,
we define
vj = T (ej ),
where the ej form the standard basis of Rn , and then you check that the vj form a basis of V .
Conversely, given a basis vj of V , for every v ∈ V we have its n-tuple of coefficients λj as in
(1.1.2), and the map from v to its coefficients is an isomorphism from V to Rn .
In particular, without more structure there is no natural or unique isomorphism between our
n-dimensional vector space and Rn . Returning to our original motivation (the observation that
our world appears to be very well described by a space with coordinates (x, y, z), though we don't
want to specify particular coordinates), we see that these ideas are quite well captured by saying
that our world is well described by a 3-dimensional real vector space.
Remark 1.1.7. When we are using vectors to describe physical space we often call their
components in a basis coordinates instead.
1.1.3. Symmetries of a vector space. Another way of thinking about choices of basis
in a vector space is in terms of the symmetry of the space.
Definition 1.1.8. Let V be a finite-dimensional real vector space. The set of all invertible
linear maps T : V → V (i.e. linear isomorphisms of V with itself) is a group, denoted GL(V ).
If V = Rn , we also write GLn (R) or GL(n, R) for GL(V ).
"GL" here stands for "general linear". Recall "group" just means that there is an associative
multiplication with inverses; in this case it is composition of linear maps. The group GLn (R)
is the same as the group of n × n invertible matrices M . Such an M defines a map from Rn to
itself by matrix multiplication, where we write the typical element x of Rn as a column vector
with coefficients (x1 , . . . , xn ), so

x ↦ M x.
1.1.4. Affine spaces. A linear map T between vector spaces automatically takes the zero
vector to the zero vector. In particular, thinking about R3 , the translation
(x, y, z) ↦ (x + a, y + b, z + c)
for some fixed vector (a, b, c) is not a linear map. However, from the physical point of view, we
would certainly want to be able to consider such transformations as part of our story.
There is a formal abstract definition of affine space, but we shall not give it. Instead, we
work with our vector space V and just enlarge the symmetries.
Definition 1.1.9. Let V be a finite-dimensional real vector space. The affine group A(V )
is the group of all transformations of the form
x ↦ T x + b

where T ∈ GL(V ) (i.e. is an invertible linear transformation) and b ∈ V is a vector.
Note in particular that, by taking T = the identity I, we get the translations as part of A(V ).
On the other hand, it contains GL(V ) as the subgroup of elements with b = 0.
We shall somewhat loosely talk about "the affine space V " to mean the space V , but where
we are allowing the whole of A(V ) to act as symmetries.
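The group law in A(V ) can be made explicit: composing x ↦ T1 x + b1 with x ↦ T2 x + b2 gives x ↦ (T1 T2 )x + (T1 b2 + b1 ). A small sketch in plain Python for V = R2 (the helper names and the example maps are just for illustration):

```python
# Sketch: the affine group A(V) for V = R^2, with elements stored as pairs (T, b).
# Composition (T1, b1) o (T2, b2) acts as x |-> T1(T2 x + b2) + b1.

def mat_vec(T, x):
    """Multiply a 2x2 matrix (given as a list of rows) by a vector."""
    return [T[0][0]*x[0] + T[0][1]*x[1],
            T[1][0]*x[0] + T[1][1]*x[1]]

def apply_affine(Tb, x):
    T, b = Tb
    y = mat_vec(T, x)
    return [y[0] + b[0], y[1] + b[1]]

def compose(Tb1, Tb2):
    """Group law of A(V): (T1, b1) o (T2, b2) = (T1 T2, T1 b2 + b1)."""
    (T1, b1), (T2, b2) = Tb1, Tb2
    T = [[sum(T1[i][k]*T2[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    b = [mat_vec(T1, b2)[i] + b1[i] for i in range(2)]
    return (T, b)

# A translation (T = identity) and a linear scaling (b = 0):
translate = ([[1, 0], [0, 1]], [3, -1])
scale     = ([[2, 0], [0, 2]], [0, 0])

x = [1, 1]
# Applying one map after the other agrees with the composed transformation:
assert apply_affine(compose(translate, scale), x) == apply_affine(translate, apply_affine(scale, x))
```

In particular, taking T = I recovers the translations, and b = 0 recovers GL(V ), exactly as in the definition above.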
The following example may entertain the mathematicians.
Example 1.1.10. Let T be a linear mapping from a vector space V to another vector space
W . Let w ∈ W be any given vector. Let P be the solution set of the equation T x = w, i.e.

P = {x ∈ V : T x = w}.

Suppose that P is not empty. Then P is a natural example of an affine space if w ≠ 0 and is a
vector space if w = 0.
To picture what is going on here, suppose that V = R3 and W = R. Then we picture P
as a two-dimensional plane inside of R3 (usually) which goes through the origin if w = 0 and
does not (necessarily) otherwise. When w = 0, P is a linear (or vector) subspace of R3 , hence a
vector space K in its own right. If w ≠ 0, P is identifiable with K; geometrically, P is a plane
parallel to K, but not going through 0. We can identify K with P by picking any element p of
P and mapping k ∈ K to k + p ∈ P . There is no preferred choice of p and hence no given origin
in P .
A further interesting fact is that the group of linear transformations of R3 which map P
into itself is identifiable with A(P ).
If we think of V as an affine space, then given any two points p and q, we have the displacement
vector →pq of q relative to p, given in terms of the linear structure of V by

→pq = q − p.   (1.1.3)
Example 1.2.1. In one variable, the only possibility is ax2 , where a is real. In two variables
(x, y), we have

ax2 + 2gxy + by 2 = xt M x,

where x = (x, y)t and M is the symmetric matrix with rows (a, g) and (g, b). In three variables,
similarly,

ax2 + by 2 + cz 2 + 2gxy + 2f xz + 2hyz = xt M x,

with x = (x, y, z)t and M the symmetric matrix with rows (a, g, f ), (g, b, h) and (f, h, c).

In general, a quadratic form f is homogeneous of degree 2:

f (λv) = λ2 f (v) for all λ ∈ R and v ∈ V .
Remark 1.2.4. This explains what homogeneous of degree 2 means above. Homogeneity
is also important later on in this chapter, when we look at Lagrangians and the calculus of
variations (cf. Proposition 1.4.4).
There are two reasons for the introduction of quadratic forms and bilinear forms. First of
all, quadratic forms on vector spaces provide the additional structure needed to define distance.
The other reason for bilinear forms is that they, and their multilinear cousins, are essential in the
development of multivariable calculus (see Chapter 4).
The basic example is x2 + y 2 + z 2 in R3 . By Pythagoras's theorem, this is the square of the
distance of the point (x, y, z) from (0, 0, 0) if we are thinking of a standard system of mutually
perpendicular axes.
In special relativity (see Chapter 2) the physics is captured by a quadratic form in 4 variables,
c2 t2 − x2 − y 2 − z 2 . The significance of this will be that

c2 t2 − x2 − y 2 − z 2 = 0

if and only if a photon (particle of light) emitted at the origin at t = 0 (and travelling in a straight
line) can pass through the point (x, y, z) at time t.
Thus one should think generally of quadratic forms as defining squares of distances, or
squares of lengths of vectors on a vector space. This needs to be taken with a pinch of salt,
though, since in the above 4-dimensional example, the square of the distance can be 0 or even
negative! We shall explore this in detail in the chapter on special relativity.
We define these homogeneous quadratic polynomials in terms of bilinear forms.
Definition 1.2.5. A bilinear form B on a vector space V is a map

B : V × V → R

with the property that for each fixed v, the maps

w ↦ B(v, w) and w ↦ B(w, v) are linear in w.

To spell this out,

B(v, λu + µw) = λB(v, u) + µB(v, w),

and similarly

B(λu + µw, v) = λB(u, v) + µB(w, v).
Definition 1.2.6. A bilinear form B is said to be symmetric if B(v, w) = B(w, v) for all
v, w. A bilinear form is said to be skew-symmetric (or just skew) if B(v, w) = −B(w, v).
Definition 1.2.7. If B is a symmetric bilinear form, then the associated quadratic form is
Q(v) = B(v, v).
A bilinear form is called non-degenerate if for any v ≠ 0, there exists w ∈ V such that
B(v, w) ≠ 0. This is equivalent to the corresponding condition with the roles reversed: B is also
non-degenerate if for any v ≠ 0, there exists w such that

B(w, v) ≠ 0.
We shall see later that any bilinear form (on a finite-dimensional vector space) can be
represented by a square matrix B̂. Then B is symmetric if and only if the matrix B̂
is symmetric (B̂ = B̂ t ); B is skew-symmetric if and only if the matrix B̂ is skew (B̂ = −B̂ t ).

Definition 1.2.8. A Q-isometry is a linear transformation T : V → V such that

Q(T v) = Q(v)   (1.2.1)

for all vectors v in V . The set of all Q-isometries forms a group denoted O(V, Q).

Remark 1.2.9. Here the "O" stands for "orthogonal".
Example 1.2.10. If V = R2 and Q(x, y) = x2 + y 2 , then Q gives the ordinary euclidean
length-squared of the vector from (0, 0) to (x, y). You can verify that the linear transformation

(x, y) ↦ (cx + sy, −sx + cy)

is a Q-isometry whenever c2 + s2 = 1.

In terms of a basis, a bilinear form B is computed from its representing matrix B̂ by

B(x, y) = xt B̂y,

where xt is the transpose of x, i.e. the row vector with coefficients (x1 , . . . , xn ).
The isometry condition in terms of Q is equivalent to its polarized version

B(T v, T v′ ) = B(v, v′ ) for all v, v′ ∈ V.   (1.2.2)

In matrix terms, this is

T t B̂T = B̂.   (1.2.3)

Taking determinants and using det(T t ) = det(T ), we get

det(T )2 det(B̂) = det(B̂),   (1.2.4)

and so det(T ) = ±1, since B̂ is non-degenerate and so det B̂ ≠ 0.
Definition 1.2.11. The set of Q-orthogonal T 's with det(T ) = 1 forms a subgroup denoted
SO(V, Q), read "the special orthogonal group".
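For the euclidean form Q(x, y) = x2 + y 2 on R2 the matrix B̂ is the identity, so the isometry condition T t B̂T = B̂ reduces to T t T = I. A quick numerical sanity check in plain Python (the value of θ is arbitrary) that the rotation of Example 1.2.10, with c = cos θ and s = sin θ, lies in SO(2):

```python
import math

def transpose(M):
    return [[M[j][i] for j in range(2)] for i in range(2)]

def mat_mul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(M):
    return M[0][0]*M[1][1] - M[0][1]*M[1][0]

theta = 0.7
c, s = math.cos(theta), math.sin(theta)
T = [[c, s], [-s, c]]          # the rotation (x, y) |-> (cx + sy, -sx + cy)

# Isometry condition T^t B T = B with B = identity:
TtT = mat_mul(transpose(T), T)
assert all(abs(TtT[i][j] - (1 if i == j else 0)) < 1e-12
           for i in range(2) for j in range(2))

# det(T) = +1, so T lies in SO(2), not merely O(2):
assert abs(det(T) - 1) < 1e-12
```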
Example 1.2.12. Suppose that B is a symmetric bilinear form on R2 such that

B(e1 , e1 ) = 0, B(e2 , e2 ) = 0.

Is this enough to determine B uniquely?

Solution 1.2.13. The answer is no: we need to know B(e1 , e2 ) as well to determine B. If
B(e1 , e2 ) = λ, however, then by symmetry also B(e2 , e1 ) = λ. So the matrix representation of
this symmetric bilinear form will be

B̂ = the matrix with rows (0, λ) and (λ, 0).   (1.2.5)
For a non-degenerate form, a suitable choice of basis puts the matrix into the standard diagonal
form

Q̂ = diag(+1, . . . , +1, −1, . . . , −1),   (1.2.6)

with r entries equal to +1 and s entries equal to −1.
Remark 1.2.14. In the pure mathematical literature, the difference r − s of the number of
+1s and the number of −1s is often called the signature. On the other hand it is also common
to refer to a quadratic form as having signature +, +, +, + or +, −, −, − rather than 4 or 2.
There is no real risk of confusion with these slight variations.
The corresponding orthogonal group is denoted by Or,s or O(r, s) and the corresponding
subgroup of elements with determinant equal to +1 is denoted SOr,s or SO(r, s).
One can make the case that the most important cases are s = 0, when the groups are also
just denoted O(n) and SO(n) (for ordinary classical n-dimensional euclidean geometry) and
s = 1, r = 3, for the study of special relativity. In this case O(1, 3) is called the Lorentz group.
The following is a basic fact about non-degenerate quadratic forms:
Theorem 1.2.15. Let V be a finite-dimensional vector space and let Q be a non-degenerate
quadratic form on V . Then there exists a basis of V so that the matrix Q̂ of Q in this basis
takes the standard form (1.2.6) for some particular r and s (which depend on Q).
Proof. (Sketch, useful to be aware of it, but not examinable.)
Pick any v at random with Q(v) 6= 0. [If Q(v) = 0 for all v, then by polarization (see
problem set) B(v, w) = 0 for all v and w, contradicting the non-degeneracy of Q. So such v
does exist.]
Replacing v by e1 = v/√|Q(v)|, we get

Q(e1 ) = Q(v)/|Q(v)| = ±1.
Extending e1 by a basis of the orthogonal complement V ′ = {w ∈ V : B(e1 , w) = 0}, the matrix
of Q takes the form

Q̂ = the matrix with first row and first column (±1, 0, . . . , 0), the remaining entries forming a
symmetric (n − 1) × (n − 1) block (b′ij ).   (1.2.7)
The top left-hand element is Q(e1 ). The zeros in the first row and first column come from the
orthogonality condition B(e1 , w) = 0 for all w ∈ V ′ .
We suppose by induction that the theorem has already been proved for non-degenerate
quadratic forms on vector spaces of dimension < n. Then we can choose e2 , . . . , en so that the
matrix of b′ij is diagonal with entries ±1.
Recall that if all the signs in (1.2.6) are + then we call Q (or B) positive-definite; if they
are all −, negative-definite. If the signature is r − s, then r is the largest possible dimension of
subspaces of V on which Q is positive-definite; and similarly s is the largest possible dimension
of subspaces of V on which Q is negative-definite. Note, however, that such subspaces are not
unique.
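The proof sketch of Theorem 1.2.15 can be turned into a small computation. The following plain-Python sketch counts r and s by repeated completion of the square; for simplicity it assumes a nonzero diagonal pivot can always be found (the general case needs the extra basis change indicated in the proof):

```python
def signature(B, eps=1e-12):
    """Count (r, s) = (#positive, #negative) entries in the standard diagonal
    form of the symmetric matrix B, by the inductive step in the proof of
    Theorem 1.2.15: record the sign of a diagonal pivot Q(e1), then pass to
    the matrix (b'_ij) on the Q-orthogonal complement (a Schur complement).
    Assumes a nonzero diagonal pivot exists at each step."""
    B = [row[:] for row in B]            # work on a copy
    r = s = 0
    while B:
        n = len(B)
        p = next(i for i in range(n) if abs(B[i][i]) > eps)
        d = B[p][p]
        if d > 0:
            r += 1
        else:
            s += 1
        # b'_ij = b_ij - b_ip * b_pj / d on the complement of the pivot:
        B = [[B[i][j] - B[i][p]*B[p][j]/d
              for j in range(n) if j != p]
             for i in range(n) if i != p]
    return (r, s)

# The Minkowski form t^2 - x^2 - y^2 - z^2 has signature (r, s) = (1, 3):
eta = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]
assert signature(eta) == (1, 3)
```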
1.2.3. Affine version. Suppose we want to incorporate translations. We may define the
group of affine isometries of (V, Q) to be all maps of the form
x ↦ T x + b, where T ∈ O(V, Q) and b ∈ V.

In the case of interest in relativity, where Q has signature (+, −, −, −), this is called the
Poincaré group.
In this affine case, we should think of Q as giving the length-squared of position vectors of
one point relative to another, that is

Q-length-squared of →AB = Q(→AB).

In a euclidean space (i.e. Q is positive-definite) we can also measure angles between displacement
vectors: if X, Y and Z are three points, then the angle θ that u = →XY makes with
v = →XZ satisfies

cos θ = B(u, v)/(√Q(u) √Q(v)).

Thus we have lengths and angles as in ordinary two- or three-dimensional euclidean geometry.
Notation 1.2.16. In a euclidean vector space with fixed positive-definite quadratic form it
is common to refer to the associated bilinear form as an inner product and denote it B(u, v) = ⟨u, v⟩.
Similarly the length of a vector in this context is often simply denoted |u| = √Q(u) = √B(u, u).
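In coordinates, these length and angle formulas are immediate to compute. A minimal sketch in plain Python, using the standard inner product on Rn (the sample points are arbitrary):

```python
import math

def inner(u, v):
    """Standard euclidean inner product <u, v> = B(u, v) on R^n."""
    return sum(ui*vi for ui, vi in zip(u, v))

def length(u):
    """|u| = sqrt(Q(u)) = sqrt(B(u, u))."""
    return math.sqrt(inner(u, u))

def angle(u, v):
    """The angle theta with cos(theta) = B(u, v)/(sqrt(Q(u)) sqrt(Q(v)))."""
    return math.acos(inner(u, v) / (length(u) * length(v)))

# Displacement vectors from X = (0, 0) to Y = (1, 0) and to Z = (1, 1):
u, v = [1, 0], [1, 1]
# The angle between them is 45 degrees, as elementary geometry predicts:
assert abs(angle(u, v) - math.pi/4) < 1e-12
```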
Remark 1.2.17. Why have I gone through all this? First of all, why should it be that vector
spaces or affine spaces are the right thing for describing the world?
We agree that triples of real numbers (x, y, z) are very good for describing where things are
in the world (or indeed the solar system...). But the key thing about vector and affine spaces
(which we'll discuss next) is that there is a distinguished set of curves (or paths or trajectories),
namely the straight lines. These are distinguished in the sense that if you take a straight line
and apply an affine transformation (translation and linear transformation), you get another
straight line.
Fast forward to Newton's laws of motion: the first of these says that a particle remains at
rest or continues to move with a constant velocity unless acted upon by an external force.
Thus straight lines have a physical significance, as the trajectories of free particles. In this
section, we have learned or recalled the underlying geometry of 3-dimensional spaces of points
(x, y, z), as vector or affine spaces. We've discussed the symmetries, both linear and affine, of
such spaces. We've introduced bilinear forms so that we have a notion of length and distance
in our vector space. And this flat geometry seems a good basis for Newtonian physics, because
it has straight lines in some sense built in, and these are the trajectories of particles not acted
upon by any force.
1.3. Curves, tangent vectors and so on
For our purposes, the best definition of a curve in a vector space (or an affine space) is as a
smooth map

γ : I → V

where I is some interval which may be open, closed, (semi-)infinite, whatever. If I is closed
(and bounded), I = [u, v], say, then the endpoints of the curve are γ(u) and γ(v).

We generally assume the parameterisation is regular, i.e. γ′(t) ≠ 0 for every t ∈ I. Then
γ′(t) is called the velocity vector and it is tangent to the curve. If we choose a basis of V , then
γ(t) is represented in terms of its components (γ1 (t), . . . , γn (t)), where each of the γj is just an
ordinary smooth function of the variable t. Then γ′(t) has components (γ1′ (t), . . . , γn′ (t)).
If γ′(t0 ) = 0 for a point t0 of I, then the curve can be singular (have a sharp corner) at that
point. We don't want to consider such singularities.

The image of γ is sometimes called the trace of the curve.

The vector γ′′(t) with components (γ1′′ (t), . . . , γn′′ (t)) along the curve is called the acceleration
vector.
Example 1.3.1. If

a = (a1 , . . . , an ) and b = (b1 , . . . , bn )

are two (constant) vectors in Rn , with b ≠ 0, then we may construct the parameterized straight
line

γ(t) = a + bt   (1.3.1)

through a in the direction b. In fact the tangent vector γ′(t) = b (i.e. b is the velocity vector in
this case) and the acceleration γ′′(t) is zero.
Example 1.3.2. In R2 , consider the parameterized circle

γ(t) = (a cos t, a sin t).   (1.3.2)

Here the velocity is γ′(t) = (−a sin t, a cos t) and the acceleration is γ′′(t) = −γ(t).
1.3.1. Curves in a euclidean space. Let V be a euclidean space, i.e. a real vector space
equipped with a positive-definite inner product ⟨ , ⟩ and length² denoted by | |2 (cf. Notation 1.2.16).

Definition 1.3.3. A curve γ is said to be parameterized by arc length s if

s′(t) = |γ′(t)|,

where we choose the positive square root consistently with s being an increasing function of t. Since
we assume that |γ′(t)| > 0 (regularity of the curve), we get an equation which determines s as
a function of t, uniquely up to the addition of a constant.
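For the circle of Example 1.3.2, for instance, |γ′(t)| = a identically, so s(t) = at up to a constant. A numerical sketch in plain Python (the finite-difference step and the trapezium rule are ad hoc choices made here for illustration):

```python
import math

a = 2.0
gamma = lambda t: (a*math.cos(t), a*math.sin(t))   # the circle (1.3.2)

def speed(t, h=1e-6):
    """|gamma'(t)| approximated by a central finite difference."""
    (x1, y1), (x0, y0) = gamma(t + h), gamma(t - h)
    return math.hypot((x1 - x0)/(2*h), (y1 - y0)/(2*h))

def arclength(t, n=1000):
    """s(t) = integral of |gamma'| from 0 to t, by the trapezium rule."""
    ts = [t*i/n for i in range(n + 1)]
    return sum((speed(ts[i]) + speed(ts[i+1]))/2 * (t/n) for i in range(n))

# For the circle, |gamma'(t)| = a identically, so s(t) = a*t:
assert abs(arclength(1.5) - a*1.5) < 1e-4
```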
The length of a curve γ : [u, v] → V is then the integral of |γ̇(t)| for t from u to v. (Here, for
consistency with later sections, we use γ̇ for the derivative of the curve γ.) This is
the main example we shall consider, though it will be substantially generalized when we come
to consider geodesics in general manifolds or space-times.
1.4. Calculus of variations

Calculus of variations was treated in a previous course, and we shall remind you of some of
the notation and other facts here.

First of all, the thing being integrated is called the Lagrangian. The simplest (time-independent)
Lagrangians are functions of 2n variables, L = L(x, y), and then the functional to
be minimized/maximized will be

F[γ] = ∫₀¹ L(γ(t), γ̇(t)) dt, where γ(0) = p and γ(1) = q.   (1.4.2)
Notation 1.4.1. In this section, for consistency with later applications we are going to
denote the components of x and y with upstairs indices. To be clear about this, in the previous
paragraph (taking n = 3 for notational simplicity) x and y both stand for triples of numbers, or
points in a 3D vector space. If we choose a basis to expand these vectors, we would get triples
of real numbers which we are going to denote
x = (x1 , x2 , x3 ) and y = (y 1 , y 2 , y 3 ).
This notation takes some getting used to: x2 is not the square of x! With luck you'll get used
to this pretty quickly, and will learn not to think that x2 is the square of x.
We shall also write x = (xa ), y = (y a ), with the understanding that the index a runs from
1 to n.
Theorem 1.4.2. If γ is a sufficiently smooth curve which is an extremal for (1.4.2), and
the Lagrangian is sufficiently nice, then we have

d/dt(∂L/∂yᵃ) − ∂L/∂xᵃ = 0, xᵃ(0) = pᵃ, xᵃ(1) = qᵃ.   (1.4.3)
Extremal means that γ is a local maximum or minimum of (1.4.2) amongst all curves with
the same endpoints.

It is very important to understand the meaning of the d/dt here. It means that we calculate
the partials of L with respect to the yᵃ, and then we substitute the values xᵃ(t) for xᵃ and ẋᵃ(t)
for yᵃ, before differentiating with respect to t.
These equations (1.4.3) are called the Euler–Lagrange equations. The Lagrangian is often
written as a function L(xᵃ, ẋᵃ) and the Euler–Lagrange equations as

d/dt(∂L/∂ẋᵃ) − ∂L/∂xᵃ = 0.   (1.4.4)

With experience, there is no problem with this, except that one has to say something about
regarding the xᵃ and ẋᵃ as independent variables when computing the partials, and then
regarding ∂L/∂ẋᵃ as a function of t before differentiating with respect to t.
Example 1.4.3. Consider the energy functional with

L(x, y) = ½|y|² = ½((y¹)² + . . . + (yⁿ)²).   (1.4.5)
We have

∂L/∂xᵃ = 0,  ∂L/∂yᵃ = yᵃ.   (1.4.6)

Inserting yᵃ = ẋᵃ(t) and subbing in, we get the EL equations:

ẍᵃ = 0.   (1.4.7)
We learn that xᵃ(t) = pᵃ + vᵃt where pᵃ and vᵃ are constants. Of course, this is just
the parametric form of the straight line through p in the direction v. Note that with this
parameterization, the straight line is traversed at constant velocity v as well. In fact, given
xᵃ(0) = pᵃ and xᵃ(1) = qᵃ, we get vᵃ = qᵃ − pᵃ.
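A discretized version of this example can be checked directly: among paths with the same endpoints, the straight line has the smallest (discretized) energy. A sketch in plain Python (the grid size and the competitor path are arbitrary choices made for illustration):

```python
# Discretized energy F[gamma] ~ sum of (1/2)|difference quotient|^2 * dt,
# comparing the straight line from p to q with a perturbed competitor path.
p, q, n = 0.0, 1.0, 200                  # endpoints and number of grid steps

def energy(path):
    dt = 1.0/n
    return sum(0.5*((path[i+1] - path[i])/dt)**2 * dt for i in range(n))

straight = [p + (q - p)*i/n for i in range(n + 1)]
# A competitor with the same endpoints (the quadratic bump vanishes at i = 0, n):
wiggly = [straight[i] + 0.4*i*(n - i)/n**2 for i in range(n + 1)]

assert wiggly[0] == straight[0] and wiggly[-1] == straight[-1]
# The straight line has strictly smaller energy, as Example 1.4.3 predicts:
assert energy(straight) < energy(wiggly)
```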
1.4.1. Coordinate invariance. If we have a change of coordinates, z a as an invertible
function of the xa , then the result of minimizing a particular functional should not change. Of
course the explicit formulae giving the z a as a function of t will be different from the formulae
giving the xa as a function of t. The relation will be
z a (t) = z a (x(t)).
(1.4.8)
As a telling example, consider the same energy functional in spherical polars (r, θ, φ). We
have

E[γ] = ∫ ½(ṙ² + r²(θ̇² + sin²θ φ̇²)) dt.   (1.4.9)
We calculate

∂L/∂ṙ = ṙ,  ∂L/∂θ̇ = r²θ̇,  ∂L/∂φ̇ = r² sin²θ φ̇,   (1.4.10)

and

∂L/∂r = r(θ̇² + sin²θ φ̇²),  ∂L/∂θ = r² sin θ cos θ φ̇²,  ∂L/∂φ = 0.   (1.4.11)
Hence the Euler–Lagrange equations are as follows:

r̈ = r(θ̇² + sin²θ φ̇²),   (1.4.12)

d/dt(r²θ̇) = r² sin θ cos θ φ̇²,   (1.4.13)

d/dt(r² sin²θ φ̇) = 0.   (1.4.14)
It hardly needs saying that these equations are much more daunting than the same system in
euclidean coordinates. There are, however, some tricks to lead us to a solution, which we shall
discuss next.
1.4.2. Symmetries imply conservation laws. A Lagrangian is said to have a symmetry
if it is independent of a coordinate. For example the energy Lagrangian expressed in
euclidean coordinates is independent of all the xᵃ (it depends only upon the ẋᵃ), while in polar
coordinates it is independent of φ. By the Euler–Lagrange equations, if ∂L/∂xᵃ = 0 for a particular
a, then ∂L/∂ẋᵃ is constant along a solution curve. We also sometimes say ∂L/∂ẋᵃ is a constant
of the motion.
This principle is very important in many cases, because by knowing that certain quantities
are constant we can render the EulerLagrange equations easier to solve.
Another important symmetry principle is the following: Suppose that we have a Lagrangian
of the form

L(x, y) = T (x, y) − U (x, y).   (1.4.15)
Proposition 1.4.4. Suppose that L is as in (1.4.15) and moreover T (x, y) is homogeneous
of degree 2 in y for each fixed x. Then
E(x, y) = T (x, y) + U (x, y)
is constant along solutions of the EulerLagrange equations (1.4.3).
Example 1.4.5. Let us see how to use these ideas to solve the system (1.4.12)–(1.4.14), or at least
how to obtain some of the solutions. We use the principles we've just discovered: along a
solution, the energy itself is constant,

ṙ² + r²(θ̇² + sin²θ φ̇²) = 2E,   (1.4.16)

and because L is independent of φ the angular momentum

J = r² sin²θ φ̇   (1.4.17)

is also constant. The full analysis is complicated, but we see that if θ = π/2, then the θ-equation
is satisfied, so we can start by considering those solutions with θ = π/2 identically¹. (Note that
θ = 0 or π are not such good choices, as these values of θ are singularities of the coordinate
system.)

So, let us substitute θ = π/2 and see what we're left with:

ṙ² + r²φ̇² = 2E,  J = r²φ̇.   (1.4.18)
Thus if u = 1/r, then ṙ = (dr/dφ)φ̇ and so

(dr/dφ)² + r² = 2E/φ̇².   (1.4.19)

Since dr/dφ = −(1/u²)(du/dφ), this becomes

(1/u⁴)(du/dφ)² + r² = 2E/φ̇²,   (1.4.20)

so multiplying up by u⁴,

(du/dφ)² + u² = 2E/(r⁴φ̇²) = 2E/J².   (1.4.21)
We can now integrate this to obtain an equation of a non-radial, equatorial geodesic.
We can solve this by differentiating with respect to φ:

d²u/dφ² + u = 0.

Hence u = u₀ cos(φ − φ₀) for arbitrary constants u₀ and φ₀. This is the equation of a straight
line in polar coordinates, consistently with Example 1.4.3.
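The claim that u = u₀ cos(φ − φ₀) is a straight line can be checked directly: it rearranges to r cos(φ − φ₀) = 1/u₀, i.e. x cos φ₀ + y sin φ₀ = 1/u₀ in cartesian coordinates. A quick numerical check in plain Python (the constants are arbitrary):

```python
import math

u0, phi0 = 0.5, 0.3        # arbitrary constants in u = u0*cos(phi - phi0)

def line_residual(phi):
    """Distance of the curve point at angle phi from the candidate straight
    line x*cos(phi0) + y*sin(phi0) = 1/u0 (zero means the point is on it)."""
    r = 1.0/(u0*math.cos(phi - phi0))
    x, y = r*math.cos(phi), r*math.sin(phi)
    return x*math.cos(phi0) + y*math.sin(phi0) - 1.0/u0

# Sample angles with u > 0; every point lies on the straight line:
max_dev = max(abs(line_residual(phi0 - 1.0 + 2.0*k/19)) for k in range(20))
assert max_dev < 1e-9
```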
In Problem 1.10, you are encouraged to work through the slightly more involved problem of
finding the orbit of a particle around a heavy star, using the same ideas.
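The conservation laws used above can also be confirmed numerically. A sketch in plain Python (a simple RK4 integrator; the step size and initial data are arbitrary choices) integrating the planar (θ = π/2) Euler–Lagrange equations and checking that E and J of (1.4.18) stay constant:

```python
# Planar equations of motion: r'' = r*phi'^2 and d/dt(r^2 phi') = 0,
# the latter rewritten as phi'' = -2 r' phi' / r.

def deriv(y):
    r, rdot, phi, phidot = y
    return [rdot, r*phidot**2, phidot, -2*rdot*phidot/r]

def rk4_step(y, h):
    k1 = deriv(y)
    k2 = deriv([y[i] + h/2*k1[i] for i in range(4)])
    k3 = deriv([y[i] + h/2*k2[i] for i in range(4)])
    k4 = deriv([y[i] + h*k3[i] for i in range(4)])
    return [y[i] + h/6*(k1[i] + 2*k2[i] + 2*k3[i] + k4[i]) for i in range(4)]

def invariants(y):
    r, rdot, phi, phidot = y
    return 0.5*(rdot**2 + r**2*phidot**2), r**2*phidot     # (E, J)

y = [1.0, 0.3, 0.0, 0.8]       # arbitrary initial conditions (r away from 0)
E0, J0 = invariants(y)
for _ in range(2000):          # integrate to t = 2 with step h = 0.001
    y = rk4_step(y, 0.001)
E1, J1 = invariants(y)

# Both quantities are conserved up to integration error:
assert abs(E1 - E0) < 1e-6 and abs(J1 - J0) < 1e-6
```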
¹Actually, if we have any solution curve, then by using the spherical symmetry of the problem and re-orienting
the coordinate axes, we may arrange that the solution has θ = π/2 identically.
Remark 1.4.6. In the calculus of variations, you can consider more general time-dependent
Lagrangians, i.e. functions L(x, y, t). While the EL equations continue to give extrema in this
case also, Proposition 1.4.4 is false in general.
CHAPTER 2

Special relativity

Two inertial coordinate systems (t, x, y, z) (Alice's) and (t′, x′, y′, z′) (Bob's), in relative motion
along the x-axis, are classically related by

x′ = x − vt, y′ = y, z′ = z,   (2.1.1)

where v is the relative speed of the frames. (If Bob is sitting at (x′, y′, z′) = (0, 0, 0), then Alice
gives Bob's coordinates as x = vt, y = z = 0.)
gives Bobs coordinates as x = vt, y = z = 0.)
This transformation (often known as the Galilean transformation between inertial frames)
is clearly incompatible with the idea that the speed of light is the same in all inertial frames.
The idea that this should be the case, and more generally that all physics should appear the
same to all (inertial) observers, is the first of Einstein's famous postulates:

1. The laws by which the states of physical systems undergo change are not affected,
whether these changes of state be referred to the one or the other of two systems of
coordinates in uniform translatory motion.
2. As measured in any inertial frame of reference, light is always propagated in empty
space with a definite velocity c that is independent of the state of motion of the emitting
body.
In slogan form:
1. The laws of physics are the same in all inertial frames of reference.
2. The speed of light in free space has the same value c in all inertial frames of reference.
If the laws of physics are the same for any inertial frame, then it follows that no experiment
can be performed that will single out one frame as preferred above all others. In particular,
there can be no absolute standard of rest and only relative motion is physically meaningful.
Note that the constancy of the speed of light means that something has to go wrong with
the obvious change of coordinates (2.1.1). More explicitly, this change of coordinates is not
compatible with Maxwells equations of electrodynamics.
It is convenient to make certain subsidiary assumptions explicit as well:
P1 Free particles and photons (light particles) appear to inertial observers to travel in
straight lines at constant speeds.
P2 Photons appear to travel at the same speed c to all inertial observers.
P3 The standard clock of one inertial observer appears to any other observer to run at a
constant rate.
P4 No particle is ever observed to travel faster than light.
Hypothesis 2.2.1. An inertial coordinate system corresponds to a choice of basis (e0 , e1 , e2 , e3 )
of M in which the form η is diagonal:

η(e0 , e0 ) = 1, η(e1 , e1 ) = η(e2 , e2 ) = η(e3 , e3 ) = −1, η(ea , eb ) = 0 for a ≠ b.   (2.2.1)
Note that there are many choices of inertial coordinate systems. From the mathematical
point of view, this is because there are many different choices of basis of M with respect to
which η takes standard diagonal form. From the physical point of view this is because there are
many different inertial observers, all on an equal footing.
Remark 2.2.2. Note that if X and Y are two vectors in M , and if (e0 , e1 , e2 , e3 ) is a basis
as in (2.2.1), then if

X = X⁰e0 + X¹e1 + X²e2 + X³e3 , Y = Y⁰e0 + Y¹e1 + Y²e2 + Y³e3 ,

we have

η(X, Y ) = X⁰Y⁰ − X¹Y¹ − X²Y² − X³Y³.
2.3. Worldlines
Recall that anything (particle, observer, photon) which exists for an extended period of
time is described in Minkowski spacetime by a worldline. This is a curve in M consisting of all
the events through which our particle, observer or photon passes.
For example, suppose Alice is at rest at the spatial origin of an inertial coordinate system
(t, x, y, z). The events on Alice's worldline have coordinates of the form (t, 0, 0, 0), t being the
time on the clock that Alice has beside her.
More generally, Alice observes a particle by noting its (x, y, z) coordinates for different times
t. In other words she observes the particle's worldline in the form of a curve

γ(t) = (t, x(t), y(t), z(t))

in the given coordinates.
It is often useful to decouple the parameter which parameterises the curve from an observer's
time coordinate, replacing the above by the more general form

γ(τ) = (t(τ), x(τ), y(τ), z(τ))

so that all 4 coordinates depend on the parameter τ.
Example 2.3.1. If Alice is an inertial observer who sets up an inertial coordinate system
as above with herself at the (spatial) origin x = y = z = 0, then her worldline will be

t(τ) = τ, x(τ) = y(τ) = z(τ) = 0.
Definition 2.3.2. For the worldline γ(τ) of a particle, observer or photon in M, dγ/dτ is
called the velocity 4-vector.
The use of the term 4-vector is traditional. It helps to distinguish this vector from ordinary
velocity vectors: e.g. the velocity vector of a particle as measured by an observer.
Note that in terms of the original parameterization, γ(t) = (t, x(t), y(t), z(t)),

dγ/dt = (1, dx/dt, dy/dt, dz/dt),

and the spatial part of this is the 3-vector

(dx/dt, dy/dt, dz/dt),

which is the instantaneous velocity of the particle as calculated by Alice when her clock says time
t.
For now, we shall mainly be concerned with straight, constant-speed worldlines: i.e. where γ
has the form

γ(τ) = X + τV   (2.3.1)

where X and V are constant vectors in M . (Here again we are regarding M and M as the same
by choice of an event E of M corresponding to the zero-vector in M .)
We now have to see how P2 and P4 are to be interpreted: photons travel at speed c = 1 as
measured by any inertial observer, and no particle is ever observed to travel faster than light.
2.3.1. What are the photon worldlines? Suppose that a photon is emitted by a laser
at the event with coordinates (0, 0, 0, 0) and passes through the event (t, x, y, z), relative to the
above inertial coordinate system. In other words, at the later time t, its spatial coordinates are
(x, y, z). Then the distance covered is √(x² + y² + z²), but this must be equal to t as the speed
is 1. In particular, if E and P are two events on the worldline of a (free) photon, then →EP is
null in the sense that η(→EP, →EP) = 0. [NB, a null vector need not be the zero vector!]
The following definition is useful:
Definition 2.3.3. Two events P and Q are null-separated if the displacement vector X =
→PQ is null, i.e. η(X, X) = 0.
Remark 2.3.4. This definition depends only upon the events P and Q, and the form η; it
does not depend upon any choice of inertial basis or coordinate system.

To flesh this remark out: we saw by calculation in a particular inertial frame, that if P and
Q are two events on a photon worldline, then →PQ is a null vector. But the latter is a statement
purely about the geometry of M: it uses only the basic facts that given any two events we have
a displacement vector, and that we can feed vectors to η. In particular all inertial observers
agree about when a pair of events are null-separated, and hence the speed of light is the same
for all such observers.
This leads us to the following

Hypothesis 2.3.5. The worldline of a photon has the form

γ(τ) = X + τN,   (2.3.2)

where X and N are constant vectors and N is null:

η(N, N) = 0.   (2.3.3)

Indeed, if γ(τ₁) and γ(τ₂) are two events on such a worldline, then their displacement vector is
(τ₂ − τ₁)N, and

η((τ₂ − τ₁)N, (τ₂ − τ₁)N) = (τ₂ − τ₁)²η(N, N) = 0.   (2.3.4)
In summary, we have seen that if inertial coordinate systems are defined as in Hypothesis 2.2.1 and free photon worldlines are as in Hypothesis 2.3.5, then all inertial observers agree
on the speed at which photons travel.
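This frame-independence can be illustrated numerically: the Minkowski form of a null displacement vanishes in every inertial frame. A sketch in plain Python (the boost formula used is the standard x-direction Lorentz boost, verified here rather than derived; the speed 0.6 is an arbitrary choice):

```python
import math

def eta(X, Y):
    """Minkowski form: eta(X, Y) = X0*Y0 - X1*Y1 - X2*Y2 - X3*Y3."""
    return X[0]*Y[0] - X[1]*Y[1] - X[2]*Y[2] - X[3]*Y[3]

# Displacement along a photon worldline gamma(tau) = X + tau*N, with N null:
N = [1.0, 1.0, 0.0, 0.0]
assert eta(N, N) == 0.0

# An x-direction Lorentz boost with speed v (units with c = 1):
v = 0.6
g = 1/math.sqrt(1 - v*v)
boost = lambda X: [g*(X[0] - v*X[1]), g*(X[1] - v*X[0]), X[2], X[3]]

# The boosted displacement is still null, so the boosted observer also
# measures the photon's speed to be 1:
assert abs(eta(boost(N), boost(N))) < 1e-12
```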
2.3.2. What are the free particle worldlines? Return to the two events E and P at
the beginning of the previous section, and suppose now that they are on the worldline of a
particle travelling at uniform speed v with 0 ≤ v < 1. Then we must have

√(x² + y² + z²) = |vt| < |t|   (2.3.5)

and so

t² − x² − y² − z² > 0.   (2.3.6)
Definition 2.3.7. A vector X ∈ M is timelike if η(X, X) > 0. Two events P and Q are
timelike separated if the displacement vector →PQ is timelike.
We now make the free-particle hypothesis:

Hypothesis 2.3.8. The worldline of a free particle has the form

γ(τ) = X + τV,   (2.3.7)

where X and V are constant vectors of M and V is timelike. The parameter τ is called proper
time if η(V, V) = 1. In this case, if P₁ and P₂ are two events on the worldline with parameter
values τ₁ and τ₂ respectively, then τ₂ − τ₁ is interpreted as the elapsed time between these two
events as measured by a clock carried by an observer on this worldline.
Remark 2.3.9. As for null vectors, the notion of a vector being timelike is independent of
any choice of observer or coordinate system. In particular if one inertial observer thinks that a
particle is travelling at speed less than that of light, all observers will agree on this.
Remark 2.3.10. Note the analogy between a curve being parameterized by proper time
here and the idea of unit-speed curves being parameterized by arc-length for ordinary curves
in euclidean space.
As in the case of photon worldlines, we started in Alice's coordinate system (t, x, y, z), and
calculated that the events E and P are on the worldline of a particle moving at speed < 1 if
(and only if) the displacement vector →EP is timelike. This, however, is a statement which is
independent of any particular choice of inertial coordinate system. Thus it must be the case
that Bob, with an inertial coordinate system (t′, x′, y′, z′), will also calculate that E and P are
events on the worldline of a particle moving at speed less than 1.
2.4. Why do clocks carried by inertial observers all go at uniform rates?
Let us remember that the postulate P3 states that if Alice and Bob are inertial observers,
possibly in relative motion, then if Alice looks at Bobs clock, she will see that it is ticking at
a uniform rate, but that this rate may be dierent from the rate at which her own (identical)
clock is ticking.
Suppose that Bob has worldline
γ(τ) = τV    (2.4.1)
(so that he passes through the event E at parameter value τ = 0). Recall the hypothesis that τ is proper time (i.e. the time as measured on the clock he's carrying with him) if (V, V) = 1. In Alice's coordinate system (t, x, y, z), this has the form
t(τ) = τV⁰, x(τ) = τV¹, y(τ) = τV², z(τ) = τV³,    (2.4.2)
where
(V⁰)² − (V¹)² − (V²)² − (V³)² = 1.    (2.4.3)
In particular V⁰ ≠ 0, and we find the time measured on Bob's clock is related to Alice's time coordinate t by the fixed multiple V⁰. So with all our hypotheses made about what inertial frames are, we see that each of P1–P4 are now satisfied.
2.5. Spacetime diagrams
Minkowski spacetime can be pictured by suppressing one or two of the spatial dimensions
and drawing a picture with time going up the page and x or x and y going across.
Suppressing y and z, leaving just a space variable x and a time variable t in play, a typical spacetime diagram is shown below. I've drawn in the worldlines of two free massive particles, two photon worldlines and the axes of two inertial coordinate systems.
Note that the photon worldlines are at 45° and that the free particles have worldlines inclined at less than 45° to the vertical.
[Figure: spacetime diagram showing the t- and x-axes and the t′- and x′-axes of two inertial coordinate systems, two photon worldlines, and a free particle worldline.]
If Bob's 4-velocity in Alice's coordinates is V = V⁰(1, v), where v is his ordinary velocity, then (V, V) = 1 gives (V⁰)²(1 − |v|²) = 1. Thus
V⁰ = 1/√(1 − |v|²).
The RHS here is usually denoted by γ(v) (where as usual v = |v|),
γ(v) = 1/√(1 − |v|²).    (2.6.1)
In units in which c is not set equal to 1, this has to be replaced by
γ(v) = 1/√(1 − |v|²/c²).    (2.6.2)
Note that
γ(v) ≥ 1    (2.6.3)
with equality if and only if v = 0. Moreover γ(v) → ∞ as v → c.
Now the relation
t = V⁰τ = γ(v)τ    (2.6.4)
encodes the time dilatation or "moving clocks run slowly": indeed, if P₁ and P₂ are two events on Bob's world line which occur at parameter values τ₁ and τ₂, then he reckons that the time difference is τ₂ − τ₁. Alice, however, reckons that the time difference between these two events is γ(v)(τ₂ − τ₁), so the time elapsed is greater, according to Alice, by a factor of γ(v).
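The time dilatation formula is easy to check numerically. The following minimal sketch (units with c = 1; the helper name gamma is our own, not from the notes) computes γ(v) and the dilated time for a sample speed:

```python
import math

def gamma(v):
    """Lorentz factor gamma(v) = 1/sqrt(1 - v^2), in units with c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

# Bob moves at v = 0.8 relative to Alice.  An interval of proper time
# dtau = 1 on Bob's clock corresponds, by t = gamma(v) * tau, to
v = 0.8
dtau = 1.0
dt = gamma(v) * dtau
print(dt)  # ≈ 1.667: Alice reckons the elapsed time is longer by gamma(v)
```

Note that gamma(0) = 1 and gamma(v) grows without bound as v approaches 1, in line with (2.6.3).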
2.7. Simultaneity and distance
We have made a hypothesis about how coordinates introduced by diagonalizing the inner product are inertial coordinates introduced by inertial observers, and how unit-speed straight lines are worldlines parameterized by proper time.
There is a good question, though, which is how an observer would actually try to set up coordinates without appealing to an absolute standard of rest. So suppose Alice wants to set up such coordinates.
Suppose that F is any event in M. Alice is travelling on her straight world line, which doesn't pass through F. She sends out light signals and lets them scatter off F. She finds that the signal emitted at time τ₁ on her clock scatters off F and is picked up by her at time τ₂. She infers two things: that the distance to F is c(τ₂ − τ₁)/2, and that F should have time coordinate ½(τ₁ + τ₂). This is the radar method of assigning times and measuring distances.
So, using only allowable methods, she assigns a position and time coordinate to F.
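The radar method is simple enough to express directly. Here is a minimal sketch (units with c = 1; the function name radar_coordinates is our own label for the procedure just described):

```python
def radar_coordinates(tau1, tau2):
    """Radar method: from the emission time tau1 and reception time tau2
    on Alice's clock, assign the scattering event F a time coordinate
    and a distance (units with c = 1)."""
    time = (tau1 + tau2) / 2.0       # midpoint of emission and reception
    distance = (tau2 - tau1) / 2.0   # light travels out and back
    return time, distance

# Signal sent at tau = 3, echo received at tau = 7:
print(radar_coordinates(3.0, 7.0))  # (5.0, 2.0)
```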
Let's see how all this looks in terms of trajectories and worldlines.
If Alice's trajectory is γ(τ) = τU, and F has position vector Y relative to the chosen origin, the photon trajectory is
σ ↦ τ₁U + σN
on the outward leg and
σ′ ↦ τ₂U + σ′N′
on the return leg. Here N and N′ are null vectors, and we may suppose that the parameters σ and σ′ are chosen so that these trajectories hit Y at σ = 1, σ′ = 1:
τ₁U + N = Y = τ₂U + N′.    (2.7.1)
If E = ½(τ₁ + τ₂)U is the event on Alice's worldline that she judges simultaneous with F, then the displacement vector EF is
Y − ½(τ₁ + τ₂)U = ½(τ₂ − τ₁)U + N′ = ½(τ₁ − τ₂)U + N.    (2.7.2)
We also assume (U, U) = 1, so that Alice's worldline is parameterised by her proper time τ.
Proposition 2.7.1. (U, EF) = 0.
Proof. From (2.7.1),
U(τ₁ − τ₂) = N′ − N.    (2.7.3)
Take the inner product of (2.7.3) with N and with N′ to get
(τ₁ − τ₂)(U, N) = (N, N′)    (2.7.4)
(τ₁ − τ₂)(U, N′) = −(N, N′)    (2.7.5)
since (N, N) = (N′, N′) = 0. Also (N′ − N, N′ − N) = (τ₁ − τ₂)²(U, U), so
−2(N, N′) = (τ₁ − τ₂)².    (2.7.6)
Combining this with (2.7.4),
(U, N) = −½(τ₁ − τ₂).    (2.7.7)
Now we calculate
(U, EF) = (U, N + ½(τ₁ − τ₂)U)
        = (U, N) + ½(τ₁ − τ₂)
        = −½(τ₁ − τ₂) + ½(τ₁ − τ₂)
        = 0
as required.
If you didn't like that proof, here is another. There are often several different ways to accomplish the same thing.
Proof. The idea of this proof is to write everything in terms of the null vectors N and N′. It is perhaps a more symmetrical proof than the previous one. From (2.7.3), we obtain
U = (N′ − N)/(τ₁ − τ₂).    (2.7.8)
We also have the two formulae for EF in (2.7.2). Adding these, we get
2 EF = N + N′.    (2.7.9)
Now we calculate
(U, EF) = (1/(2(τ₁ − τ₂))) (N + N′, N′ − N) = 0,    (2.7.10)
by using again the bilinearity of the inner product to expand the RHS.
[Figure: Alice's worldline γ(τ) = τU, with the outgoing photon from τ₁U to F, the return photon from F to τ₂U, and E the midpoint event on her worldline.]
Definition 2.7.2. If Alice is moving uniformly with 4-velocity vector U, then she reckons two events E and F to be simultaneous if
(U, EF) = 0.
These events then have a well-defined spatial separation d, where
d² = −(EF, EF).
However we look at it, the key point is that if we have two observers, Alice and Bob, moving relative to each other, then they will generally disagree about which pairs of distant events are simultaneous.
From the mathematical or geometric point of view, they have different 4-velocity vectors U and V. If F and G are distant events, Alice thinks they are simultaneous if (U, FG) = 0, while Bob thinks they are simultaneous if (V, FG) = 0. These are different conditions, and if one is satisfied, then there is no guarantee that the other one will also be.
Definition 2.7.3. Let P and Q be two particles. If Alice is an inertial observer with 4-velocity U, she measures the distance between these two particles at time τ on her clock by
finding events F on the world-line of P and G on the worldline of Q which are simultaneous with the event E at time τ on her worldline, in other words, finding F and G such that (U, EF) = 0, (U, EG) = 0;
calculating the distance as
d = √(−(FG, FG)).
2.8. Length contraction
Let's push this operational definition of distance or length to see how an inertial observer will measure the length of a moving rod. Suppose that we have a rod of length d, whose endpoints have world-lines
γ(τ) = τV, γ̃(τ′) = D + τ′V
where (D, V) = 0. The length of the rod should be defined as the length of the rod as measured by an observer at rest with respect to the rod. For such an observer, with 4-velocity V, we ask: which pairs of events γ(τ) and γ̃(τ′) are simultaneous? Plugging in the definition, we need
(V, D + V(τ′ − τ)) = (V, D) + τ′ − τ = 0,
so τ′ = τ (using (V, D) = 0 and (V, V) = 1). In the rod's rest frame, γ̃(τ) = (τ, d, 0, 0).
If Alice's worldline is γ(σ) = σU, to measure the length she has to find events E and F on the world-lines that she considers to be simultaneous.
If these are τV and D + τ′V, then the displacement vector is
X = D + V(τ′ − τ)    (2.8.1)
and the simultaneity condition is
(U, D + V(τ′ − τ)) = 0,    (2.8.2)
which gives
X = D − ((U, D)/(U, V)) V.    (2.8.3)
Thus we compute
(X, X) = (D, D) + (U, D)²/(U, V)²,    (2.8.4)
so the length d′ that Alice measures satisfies
d′² = −(X, X) = d² − (U, D)²/(U, V)².    (2.8.5)
Recall that −(D, D) = d² is the square of the length of the rod as measured in its rest-frame, so d′ ≤ d in general.
To understand this calculation, we may work explicitly in the frame in which the rod is at rest. Then
V = (1, 0, 0, 0), D = (0, d, 0, 0).    (2.8.6)
Alice's 4-velocity vector U has the form
U = γ(u)(1, u) = γ(u)(1, u₁, u₂, u₃)    (2.8.7)
where u is the velocity of Alice relative to Bob, (u₁, u₂, u₃) are its components in Bob's coordinates, and
γ(u) = 1/√(1 − |u|²)    (2.8.8)
as in (2.6.1). Hence
(U, D) = −γ(u)du₁, (U, V) = γ(u),    (2.8.9)
and so from (2.8.5), we find
d′ = d√(1 − u₁²).
This is the famous Lorentz-Fitzgerald length contraction: if the component u₁ of the relative velocity of the observer in the direction of the rod is non-zero, then the observer judges the length of the rod to be less than d by the factor √(1 − u₁²). Note that there is no length contraction if the observer is moving at right-angles to the rod.
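The computation (2.8.3)-(2.8.5) can be checked numerically. In the sketch below (units with c = 1, signature (+, −, −, −); the helper names are our own) we build the displacement X explicitly and compare the measured length with the contraction factor:

```python
import math

def mink(x, z):
    """Minkowski inner product with signature (+,-,-,-), c = 1."""
    return x[0]*z[0] - x[1]*z[1] - x[2]*z[2] - x[3]*z[3]

def gamma(speed):
    return 1.0 / math.sqrt(1.0 - speed * speed)

# Rod at rest in Bob's frame: V = (1,0,0,0), D = (0,d,0,0), as in (2.8.6).
d = 2.0
V = (1.0, 0.0, 0.0, 0.0)
D = (0.0, d, 0.0, 0.0)

# Alice moves with velocity u = (0.6, 0.3, 0.0) relative to Bob.
u = (0.6, 0.3, 0.0)
g = gamma(math.sqrt(sum(c * c for c in u)))
U = (g, g * u[0], g * u[1], g * u[2])

# X = D - ((U,D)/(U,V)) V joins simultaneous (for Alice) events on the
# endpoint worldlines, as in (2.8.3).
lam = mink(U, D) / mink(U, V)
X = tuple(D[i] - lam * V[i] for i in range(4))

d_measured = math.sqrt(-mink(X, X))
d_predicted = d * math.sqrt(1.0 - u[0]**2)  # Lorentz-Fitzgerald contraction
print(d_measured, d_predicted)  # both ≈ 1.6
```

Only the component of u along the rod enters: changing u₂ or u₃ leaves the measured length unchanged.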
2.9. Lorentz transformations
The set of linear transformations of M which preserve the inner product is called the Lorentz group. This means, concretely, the set of linear maps L : M → M such that
(LX, LY) = (X, Y) for all X, Y ∈ M.
The group of all Lorentz transformations, or the Lorentz group, is also denoted by O(1, 3). We shall use Lorentz transformations mainly to compare different inertial frames with the same event in M as origin. More precisely, suppose that Alice introduces an inertial basis (e₀, e₁, e₂, e₃) and Bob introduces an inertial basis (ẽ₀, ẽ₁, ẽ₂, ẽ₃). Their coordinates are respectively (t, x, y, z) and (t̃, x̃, ỹ, z̃).
Because (eᵢ) and (ẽᵢ) are both diagonalizing bases, there is a Lorentz transformation L with the property
(ẽ₀, ẽ₁, ẽ₂, ẽ₃) = (e₀, e₁, e₂, e₃)L.    (2.9.1)
This is shorthand for expressing the tilded basis vectors as linear combinations of the untilded ones.
Write η = diag(1, −1, −1, −1) for the matrix of the inner product in a diagonalizing basis. Note that the matrix product
(ẽ₀, ẽ₁, ẽ₂, ẽ₃)ᵗ η (ẽ₀, ẽ₁, ẽ₂, ẽ₃)    (2.9.2)
has as its ab component the scalar (ẽₐ, ẽ_b). Hence, as the (ẽₐ) are diagonalizing, this matrix is diagonal, with entries
(ẽ₀, ẽ₀) = 1, (ẽ₁, ẽ₁) = (ẽ₂, ẽ₂) = (ẽ₃, ẽ₃) = −1;
that is, it equals η. On the other hand, substituting (2.9.1), the same product is Lᵗ(eᵗηe)L = LᵗηL. So LᵗηL = η, confirming the relevance of the Lorentz group for changing frame between observers.
Suppose that our frames are related by (2.9.1). Multiplying on the right by the column vector (t̃, x̃, ỹ, z̃)ᵗ, we get
t̃ẽ₀ + x̃ẽ₁ + ỹẽ₂ + z̃ẽ₃ = (e₀, e₁, e₂, e₃) L (t̃, x̃, ỹ, z̃)ᵗ.    (2.9.3)
The LHS is the position vector of the event with tilded coordinates (t̃, x̃, ỹ, z̃); writing the same vector as te₀ + xe₁ + ye₂ + ze₃ and comparing, we read off
(t, x, y, z)ᵗ = L (t̃, x̃, ỹ, z̃)ᵗ.    (2.9.4)
Note the usual confusing point that the L appears multiplying the untilded e's in (2.9.1) but the tilded coordinates in (2.9.4).
2.9.0.1. Examples. A particular example (with c = 1) is
L = ( γ(v)   γ(v)v   0   0 )
    ( γ(v)v  γ(v)    0   0 )    (2.9.5)
    ( 0      0       1   0 )
    ( 0      0       0   1 )
This is often referred to as the standard 2D Lorentz transformation. It is, of course, four-dimensional, but y = ỹ and z = z̃, so all the action is going on in the way the (t, x) and (t̃, x̃) variables are related to each other. To save writing, we'll ignore the y, z and ỹ, z̃ variables in the rest of the discussion of this example.
Here, as before, γ(v) = 1/√(1 − v²). Suppose Bob is sitting at the origin of the tilded coordinate system. Then his world line is x̃ = 0. Inserting (t̃, 0) into the coordinate transformation, we see that
t = γ(v)t̃, x = γ(v)vt̃.
This gives Bob's worldline, as a parameterized curve in Alice's coordinate system (the parameter being t̃). Since x/t = v for this curve, we see that Bob is moving at speed v in the direction of Alice's positive x-axis. The conclusion is that this Lorentz transformation corresponds precisely to two inertial observers, one moving at speed v relative to the other.
It is of interest to derive this from the postulates P1-P4 and the relativity principle R. I shall omit this here: you can find it in Woodhouse, SR (new edition, 4.4-4.6).
Consideration of this transformation gives a different way to derive the standard counterintuitive properties of SR: time dilatation, length contraction, and so on.
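One can verify numerically that the boost (2.9.5) has the defining property of a Lorentz transformation, LᵗηL = η, where η = diag(1, −1, −1, −1). The following sketch (helper names of our choosing) does so for v = 0.6:

```python
import math

def boost(v):
    """The standard boost (2.9.5) mixing t and x, in units with c = 1."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, g*v, 0, 0],
            [g*v, g, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(A):
    return [list(row) for row in zip(*A)]

eta = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

L = boost(0.6)
# Defining property of a Lorentz transformation: L^t eta L = eta.
LtEtaL = matmul(transpose(L), matmul(eta, L))
print(LtEtaL[0][0], LtEtaL[1][1])  # 1.0 and -1.0 up to rounding
```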
2.9.1. The Lorentz and Poincaré Groups. The Poincaré group is the Lorentz group with (4-dimensional) translations included. Thinking in terms of coordinate transformations, the typical element of the Poincaré group has the form
(t, x, y, z)ᵗ = L (t̃, x̃, ỹ, z̃)ᵗ + (p, q, r, s)ᵗ    (2.9.6)
where (p, q, r, s) are constants and L is a Lorentz transformation.
From a more sophisticated point of view, it is the natural symmetry group of the affine space M, preserving the bilinear form on the set ME of all position vectors relative to E.
Remark 2.9.1. The 3-dimensional euclidean group is contained in the Poincaré group, via
(t, x, y, z)ᵗ = (1 ⊕ R)(t̃, x̃, ỹ, z̃)ᵗ + (0, q, r, s)ᵗ    (2.9.7)
where R is a 3×3 rotation matrix and 1 ⊕ R denotes the block-diagonal matrix with blocks 1 and R.
Just as the reflections are isometries of euclidean space that do not seem to be realized physically, so there are some Lorentz transformations that are less relevant than others. The physically most relevant Lorentz transformations are those that preserve the spatial orientation (rotations rather than reflections) and a time orientation or "time's arrow".
One can show that if a Lorentz transformation maps one future-pointing timelike vector to a future-pointing timelike vector, then it maps all future-pointing timelike vectors to future-pointing timelike vectors.
Definition 2.10.3. A Lorentz transformation is orthochronous if it maps the future-pointing
nappe to the future-pointing nappe, in other words, if it preserves the time-orientation. The
subgroup of such transformations is denoted O+ (1, 3).
Definition 2.10.4. A Lorentz transformation is called proper if its determinant is 1. The
group of proper, orthochronous Lorentz transformations is denoted SO+ (1, 3).
We note that spatial reflections are excluded from SO+ (1, 3), and so is any transformation
that reverses the arrow of time. Thus this seems to be the most physically appropriate subgroup
of O(1, 3).
Alongside these restricted groups, we should also restrict the allowable inertial bases. We say that a basis is oriented and time-oriented if e₀ is future-pointing and (e₁, e₂, e₃) is right-handed. Then SO⁺(1, 3) maps any oriented and time-oriented basis to another such basis, and conversely any two such bases are related by an element of SO⁺(1, 3).
Remark 2.10.5. It is worth mentioning that X is future-pointing (timelike or null) if and only if −X is past-pointing (timelike or null).
2.10.1. Causality in Special Relativity. If M is given a time-orientation, then the non-zero null vectors also fall into two distinct sets, the future-pointing and the past-pointing. A null vector N is future-pointing if (X, N) > 0 for any given future-pointing timelike X. Similarly, a null vector is past-pointing if (X, N) < 0 for future-pointing timelike X. (Of course, if N is null future-pointing, then (Y, N) < 0 if Y is timelike past-pointing.)
A future-pointing null vector is in the boundary of the future-pointing nappe of the cone: it is a limiting case of future-pointing timelike vectors. For example, if we consider
Xₜ = e₀ + te₁,
where e₀ is future-pointing timelike and e₁ is spacelike with (e₁, e₁) = −1 and (e₀, e₁) = 0, then
(Xₜ, Xₜ) = 1 − t²,
so Xₜ is timelike for |t| < 1 and becomes null as t → ±1.
The set Fut(E) can be pictured as the solid half-cone whose boundary is the set of future-pointing null vectors emanating from E.
Similarly, the set of all events G which can affect or influence E is
Past(E) = {G ∈ M such that GE is timelike or null future-pointing}
        = {G ∈ M such that EG is timelike or null past-pointing}.    (2.10.2)
This can be pictured as the solid half-cone whose boundary is the set of past-pointing null vectors emanating from E.
So although space and time are mixed up in the geometry of special relativity, there is still a well-defined notion of causality.
We have defined timelike and null vectors. If a vector is not timelike or null, it is called spacelike:
Definition 2.10.6. If X ∈ M, then X is called spacelike if (X, X) < 0. Two events E and F in M are said to be spacelike separated if (EF, EF) < 0.
We end by noting the following:
Proposition 2.10.7. Suppose that E and F are events such that EF is future-pointing timelike. Then there exist inertial coordinates such that E has coordinates (0, 0, 0, 0) and F has coordinates (t, 0, 0, 0) with t > 0.
Suppose that E and F are spacelike separated events. Then there exists an inertial frame with respect to which E and F are simultaneous (e.g. E has coordinates (0, 0, 0, 0) and F has coordinates (0, d, 0, 0)). Moreover, there exist other coordinate systems in which E occurs before F.
Proof. The first follows from the basic fact that if X is any future-pointing timelike vector, then there is an oriented and time-oriented basis (e₀, e₁, e₂, e₃) with respect to which the inner product is diagonal, and such that X is a positive multiple of e₀. In such a basis, EF = (t, 0, 0, 0), where t > 0, and if we choose the origin so that the coordinates of E are (0, 0, 0, 0), then the coordinates of F will be (t, 0, 0, 0).
Similarly, if (EF, EF) < 0, we can pick a multiple e₁ of EF such that (e₁, e₁) = −1. We extend this to a diagonalizing (oriented and time-oriented) basis, and then EF has the desired form. In particular, it is orthogonal to e₀, and so these events will be judged simultaneous by an observer with 4-velocity e₀.
For the last part, let V = e₀ + λe₁ with |λ| < 1, so that V is timelike. Then
(V, e₁) = −λ.
So if V is the 4-velocity vector of an observer, Bob, he will reckon that F happens after E if λ < 0 and that F happens before E if λ > 0.
Remark 2.10.8. The above makes complete sense from the point of view of the radar method. See the picture below. Consider two inertial observers, Alice and Bob, and suppose that E is an event on both of their worldlines. If we suppose that Alice judges E and F to be simultaneous, this means that if Alice bounces a light signal off F, sending it out at τ₁ and receiving it at τ₂, she assigns F the time (τ₁ + τ₂)/2. In the diagram, Alice sends her light signal out at event A₁ and receives it at A₂, and E is the midpoint of the segment A₁A₂.
Now if Bob is heading towards F, it is clear that the light signal he needs to send to bounce off F has to be transmitted at event B₁ and received at event B₂. The event on his worldline that he judges to be simultaneous with F is therefore the midpoint of B₁B₂, shown as E′. It is clear from the geometry that the segment EB₁ is longer than EB₂, so E′ will be, as shown, before E on his worldline.
Similarly, if Bob is heading away from F, the event he judges simultaneous with F will be later on his worldline than the event E.
30
A2
B2
As world-line
E0
Bs world-line
A1
B1
2.10.2. Spatial and temporal components. We have seen that inertial observers, free particles and photons have straight worldlines, and the basic feature of a worldline is the 4-velocity vector. It is often annoying to choose a full inertial basis to solve particular problems, but it is important to be able to split Minkowski vectors into their spatial and temporal components with respect to a particular timelike vector.
Suppose that V is a timelike future-pointing 4-vector. Then we can write any vector X in terms of its components parallel to and orthogonal to V. That is,
X = λV + Y, where (V, Y) = 0,    (2.10.3)
with
λ = (V, X)/(V, V).    (2.10.4)
Then
Y = X − ((V, X)/(V, V)) V.    (2.10.5)
More concretely, relative to an inertial basis in which V is a positive multiple of e₀,
V = (V⁰, 0), X = (X⁰, ξ)    (2.10.6)
where
(V, X) = V⁰X⁰, (V, V) = (V⁰)²    (2.10.7)
and
Y = (0, ξ).    (2.10.8)
Here 0 is the ordinary 3-dimensional zero-vector and ξ is also a euclidean 3-vector.
It is worth spelling out that if X and Z are any two Minkowski vectors with components
X = (X⁰, ξ), Z = (Z⁰, ζ),
then
(X, Z) = X⁰Z⁰ − ξ · ζ.    (2.10.9)
A photon's velocity 4-vector will take the form
ω(1, e),
where ω > 0 (the photon is travelling forward in time) and e is a unit vector. For physical reasons, ω is identified with the frequency of the photon as measured by an observer with 4-velocity V.
Example 2.10.9. Relative velocity. Suppose that Alice and Bob are inertial observers with 4-velocity vectors U and V, with (U, U) = (V, V) = 1.
In Alice's rest-frame,
U = (1, 0), V = γ(1, v)    (2.10.10)
for some constant γ. This is, of course, the γ(v) factor we have met before: (V, V) = 1 says
γ²(1 − |v|²) = 1.    (2.10.11)
Now let Chris be a third inertial observer, and work in Bob's rest frame, writing
U = γ(u)(1, u), W = γ(w)(1, w),    (2.10.12), (2.10.13)
where u is the velocity vector of Alice relative to Bob and w is the velocity vector of Chris relative to Bob.
Then, if ν is the relative speed of Alice and Chris,
γ(ν) = (U, W) = γ(u)γ(w)(1 − u · w).    (2.10.14)
This is an answer, but it is instructive to rearrange it a bit. By squaring and taking the reciprocal,
1 − ν² = 1/γ(ν)² = (1 − u²)(1 − w²)/(1 − u · w)².    (2.10.15)
Hence
ν² = ((1 − u · w)² − (1 − u²)(1 − w²))/(1 − u · w)²    (2.10.16)
   = (1 − 2u · w + (u · w)² − 1 + u² + w² − u²w²)/(1 − u · w)²    (2.10.17)
   = ((u² − 2u · w + w²) + (u · w)² − u²w²)/(1 − u · w)²    (2.10.18)
   = (|u − w|² + (u · w)² − u²w²)/(1 − u · w)².    (2.10.19)
Note that (u · w)² − u²w² ≤ 0, with equality if and only if u and w are parallel, so the numerator compares |u − w|² to a correction term that vanishes for parallel velocities.
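Formulas (2.10.14) and (2.10.15) can be checked against each other numerically; the following sketch (the velocities are arbitrary illustrative values, helper names ours) confirms in particular that the relative speed comes out below that of light:

```python
import math

def gamma_of(speed):
    return 1.0 / math.sqrt(1.0 - speed * speed)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Velocities of Alice and Chris relative to Bob (sample data).
u = (0.5, 0.2, 0.0)
w = (-0.3, 0.4, 0.0)
gu = gamma_of(math.sqrt(dot(u, u)))
gw = gamma_of(math.sqrt(dot(w, w)))

# gamma of the relative speed nu, from (2.10.14):
g_nu = gu * gw * (1.0 - dot(u, w))

# Recover nu^2 and compare with the rearranged form (2.10.15):
nu_sq = 1.0 - 1.0 / g_nu**2
rhs = 1.0 - (1.0 - dot(u, u)) * (1.0 - dot(w, w)) / (1.0 - dot(u, w))**2
print(nu_sq, rhs)  # equal; and nu_sq < 1, so the relative speed is < c
```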
Let E be the event "spaceship leaves earth", let A be the event "spaceship reaches destination" and let R be the event "spaceship arrives back at earth". We assume that the spaceship travels at constant speed v on the outward and return trips. Continuing the idealization, let's suppose that the earth is inertial, with 4-velocity U. Taking E to be the origin of M, the world-line of the earth is then τ ↦ τU. On the outward leg, the spaceship's trajectory is τ ↦ τV₁ and on the inward leg it is τ ↦ SV₁ + τV₂, where SV₁ is the displacement vector from E to the arrival event A (S being the proper time of each leg).
We may choose the frame so that U = (1, 0, 0, 0), V₁ = γ(v)(1, v, 0, 0), V₂ = γ(v)(1, −v, 0, 0). By geometry (see the picture below, in which the y and z directions have been suppressed),
T(1, 0) = Sγ(v)(1, v) + Sγ(v)(1, −v),    (2.11.2)
where R = (T, 0) is the return event. Hence T = 2γ(v)S: the time elapsed on earth exceeds the total proper time 2S experienced on board the spaceship by the factor γ(v).
[Figure: the earth's worldline from E = (0, 0) to R = (T, 0), and the spaceship's worldline out to A = Sγ(v)(1, v) and back.]
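The relation (2.11.2) gives T = 2γ(v)S, which can be evaluated for sample values (units with c = 1; a sketch only, the numbers are arbitrary):

```python
import math

def gamma(v):
    return 1.0 / math.sqrt(1.0 - v * v)

# Each leg takes S units of proper time on board, at speed v; by (2.11.2)
# the round trip takes earth time T = 2 * gamma(v) * S.
v, S = 0.8, 3.0
T = 2.0 * gamma(v) * S
print(T, 2 * S)  # 10.0 vs 6.0: the travelling twin ages less
```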
2.12. Summary of key notation and definitions
Definition 2.12.1. If X ∈ M is a vector, we say that X is
timelike if (X, X) > 0;
null if (X, X) = 0;
spacelike if (X, X) < 0.
If X is the displacement vector EF from an event E to an event F in M, we have the following corresponding definitions.
Definition 2.12.2. Let E and F be events in M and let X = EF be the displacement vector. Then
CHAPTER 3

If τ is proper time along the worldline (t(τ), x(τ)) (with y = z = 0), then
ṫ² − ẋ² = 1,    (3.1.4)
and uniform proper acceleration a means
ẗ² − ẍ² = −a².    (3.1.5)
Write
ṫ = cosh u, ẋ = sinh u    (3.1.6)
for a function u = u(τ). Then
ẗ = u̇ sinh u, ẍ = u̇ cosh u.    (3.1.7)
Substituting in to (3.1.5),
u̇² = a².
So u = a(τ − τ₀) (if a > 0, so the curve is future-pointing). Taking τ₀ = 0 and integrating the equations
ṫ = cosh aτ, ẋ = sinh aτ    (3.1.8), (3.1.9)
yields
t(τ) = (1/a) sinh aτ, x(τ) = (1/a)(cosh aτ − 1),    (3.1.10)
choosing the constants of integration so that the particle is at (t, x) = (0, 0) when τ = 0.
3.1.1. Interstellar travel revisited. Suppose that a spaceship starts from rest and its engines deliver uniform acceleration a. What happens?
One thing is that the velocity remains below that of light, as it must. For large τ,
t(τ) ≈ (1/2a)e^{aτ}, x(τ) ≈ (1/2a)e^{aτ},    (3.1.11)
which is a parameterization (not by proper time) of the null ray t = x. [The trajectory is a hyperbola.]
The relation t = (1/a) sinh aτ relates the time τ which elapses on board the ship to the time t measured by clocks left behind on earth.
To reach distance D, you have to solve D = (1/a)(cosh aτ − 1). If D is reasonably large, cosh aτ ≈ exp(aτ)/2, so
τ ≈ (1/a) log(2aD).    (3.1.12)
This logarithmic relationship means that in principle, with modest accelerations from rest, a uniformly accelerating spaceship can cover interstellar distances in reasonable times (as measured by the astronauts). For example, suppose that we measure distance in light-years and time in years. There are about 10¹⁶ metres in a light-year and 3 × 10⁷ seconds in a year.
So the acceleration due to gravity, 10 m s⁻², is equal to 10 × 10⁻¹⁶ × (3 × 10⁷)² light-years per year². This is (miraculously) approximately 1. So according to the above, a spaceship accelerating so that the astronauts would feel one earth gravity on board would cover a distance of D light-years in τ years, where τ ≈ log(2D) if D is reasonably large. If D = 100, then
τ = log(200) ≈ 5.3 years.
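This estimate is easy to test numerically (units: light-years and years, with a = 1; a sketch of the approximation, not an exact solution):

```python
import math

# Units: light-years and years, with a = 1 (approximately earth gravity).
a = 1.0
D = 100.0
tau = math.log(2.0 * a * D) / a          # approximate proper time, (3.1.12)
D_exact = (math.cosh(a * tau) - 1) / a   # distance actually covered, (3.1.10)
print(tau, D_exact)  # tau ≈ 5.3 years; D_exact ≈ 99, close to the target 100
```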
3.1.2. Relativistic motion in a circle. (Cf. Woodhouse, SR, p. 111.)
Suppose that a particle moves on a circle
x(t) = R cos ωt, y(t) = R sin ωt, z(t) = 0.
In other words,
γ(t) = (t, R cos ωt, R sin ωt, 0).
We do not claim t is proper time, and the first thing to work out is the relation between t and τ. We have
dγ/dt = (1, −Rω sin ωt, Rω cos ωt, 0),
which has Minkowski length-squared equal to 1 − R²ω². Thus
dτ/dt = √((dγ/dt, dγ/dt)) = √(1 − R²ω²).
Differentiating again, one finds that the magnitude of the particle's proper acceleration is ω²R/(1 − R²ω²).
Suppose we have k particles with 4-momenta P₁, …, Pₖ which interact (for example in the LHC) and after the interaction there are m outgoing particles with momenta Q₁, …, Qₘ. The basic assumption is the conservation of total 4-momentum:
Q₁ + ⋯ + Qₘ = P₁ + ⋯ + Pₖ.    (3.2.2)
If Alice is an inertial observer with 4-velocity U, and we have a particle of rest-mass m and 4-velocity V, so that its 4-momentum is P = mV, then Alice can look at the spatial and temporal parts of P. If she measures the velocity of the particle in her rest frame as v, then
P = mγ(v)(1, v)    (3.2.3)
where as usual
γ(v) = 1/√(1 − v²), v = |v|.    (3.2.4)
Expanding this using the binomial expansion for small v,
γ(v) = 1 + v²/2 + O(v⁴),
we see that
P ≈ m(1 + v²/2)(1, v) = (m + mv²/2, mv) + O(v³).
Now the term mv²/2 appearing here is the classical kinetic energy of a particle of mass m moving at speed v. The spatial component mv is just the classical momentum.
Einstein's conclusion from these considerations was the equivalence of mass and energy: that a particle of rest-mass m should have total energy mc²: an observer with 4-velocity U assigns total energy (U, P) to a particle with 4-momentum P. The very surprising conclusion is that a free particle of mass m has to be assigned energy mc² by an inertial observer for whom the particle is at rest.
Suppose that a particle moving with speed u collides elastically with an identical particle (rest-mass m) at rest, the outgoing particles having speeds v and w. The 4-momenta are then
P₁ = mγ(u)(1, u), P₂ = m(1, 0), Q₁ = mγ(v)(1, v), Q₂ = mγ(w)(1, w).    (3.2.6)
Squaring both sides of the conservation equation
P₁ + P₂ = Q₁ + Q₂    (3.2.7)
gives
(P₁ + P₂, P₁ + P₂) = (Q₁ + Q₂, Q₁ + Q₂),    (3.2.8)
which yields
2m² + 2(P₁, P₂) = 2m² + 2(Q₁, Q₂)    (3.2.9)
since (P₁, P₁) = (P₂, P₂) = (Q₁, Q₁) = (Q₂, Q₂) = m². Using (3.2.6) to compute the cross-terms in (3.2.9), we get
m²γ(u) = m²γ(v)γ(w)(1 − v · w).    (3.2.10)
This is useful because v · w = vw cos θ, and cos θ is what we are looking for. On the other hand, we need to eliminate γ(u). This can be done by taking the scalar product with P₂, or, in down-to-earth terms, just by inspecting the temporal component of the conservation equation (3.2.7), which gives
γ(u) + 1 = γ(v) + γ(w).    (3.2.11)
Combining this with (3.2.10) gives
1 − vw cos θ = (γ(v) + γ(w) − 1)/(γ(v)γ(w))    (3.2.12)
so
vw cos θ = 1 − (γ(v) + γ(w) − 1)/(γ(v)γ(w))
         = (γ(v)γ(w) − γ(v) − γ(w) + 1)/(γ(v)γ(w))
         = (γ(v) − 1)(γ(w) − 1)/(γ(v)γ(w))
         = (1 − 1/γ(v))(1 − 1/γ(w)).    (3.2.13)
This is the result. Notice that vγ(v) = v/√(1 − v²) = √(γ(v)² − 1), so a nice way of writing this is
cos θ = √( (γ(v) − 1)(γ(w) − 1) / ((γ(v) + 1)(γ(w) + 1)) ).    (3.2.14)
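The two forms (3.2.13) and (3.2.14) of the answer can be compared numerically (the outgoing speeds below are arbitrary illustrative values):

```python
import math

def gamma(v):
    return 1.0 / math.sqrt(1.0 - v * v)

# Outgoing speeds (sample data):
v, w = 0.6, 0.5
gv, gw = gamma(v), gamma(w)

# Form (3.2.13): v w cos(theta) = (1 - 1/gamma(v))(1 - 1/gamma(w)).
cos_theta_1 = (1.0 - 1.0 / gv) * (1.0 - 1.0 / gw) / (v * w)

# Form (3.2.14), using v * gamma(v) = sqrt(gamma(v)^2 - 1):
cos_theta_2 = math.sqrt((gv - 1.0) * (gw - 1.0) / ((gv + 1.0) * (gw + 1.0)))
print(cos_theta_1, cos_theta_2)  # agree
```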
Example 3.3.1. A photon with frequency ω collides with an electron at rest in an inertial frame. After the collision, the frequency of the photon is ω′. Obtain a relation between the scattering angle θ of the photon, the frequencies and the rest-mass of the electron.
The initial momenta (in the electron's rest frame) are
P₁ = ℏω(1, e), P₂ = (m, 0),    (3.3.3)
and the final momenta are
Q₁ = ℏω′(1, e′), Q₂ = mγ(v)(1, v),    (3.3.4)
where e and e′ are unit vectors and the scattering angle θ satisfies e · e′ = cos θ. Conservation of 4-momentum reads
P₁ + P₂ = Q₁ + Q₂.    (3.3.5)
Since we want to know cos θ, we want to square (3.3.5) in such a way as to get e · e′ as a cross term. For this purpose we rearrange it so that both photon momenta are on the same side of the equation:
P₁ − Q₁ = Q₂ − P₂,    (3.3.6)
which gives
(P₁ − Q₁, P₁ − Q₁) = −2(Q₁, P₁) = (Q₂ − P₂, Q₂ − P₂) = 2m² − 2m²γ(v).    (3.3.7)
Simplifying,
ℏ²ωω′(1 − cos θ) = m²(γ(v) − 1).    (3.3.8)
Looking at the temporal component of (3.3.5), we find
ℏω + m = mγ(v) + ℏω′, i.e. m(γ(v) − 1) = ℏ(ω − ω′),    (3.3.9)
so finally
ℏωω′(1 − cos θ) = m(ω − ω′).    (3.3.10)
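The chain (3.3.8)-(3.3.10) can be checked numerically: solve (3.3.10) for ω′ and substitute back (units with ℏ = c = 1; the particular numbers are arbitrary):

```python
import math

# Compton scattering check, in units with c = hbar = 1.
m = 1.0                     # electron rest mass
omega = 0.8                 # incoming photon frequency
theta = math.radians(60.0)  # scattering angle

# Solve (3.3.10) for the outgoing frequency omega':
omega_p = m * omega / (m + omega * (1.0 - math.cos(theta)))

# Consistency check: the electron's gamma factor from (3.3.9) ...
gamma_v = 1.0 + (omega - omega_p) / m
# ... makes both sides of (3.3.8) agree:
lhs = omega * omega_p * (1.0 - math.cos(theta))
rhs = m * m * (gamma_v - 1.0)
print(lhs, rhs)  # equal
```

Note that ω′ < ω whenever θ > 0: the scattered photon always loses energy to the electron.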
CHAPTER 4
Multivariable calculus
4.1. Smooth functions and changes of coordinates
4.1.1. Smooth functions. Let Ω be an open set of Rⁿ. Open means that Ω is a possibly infinite union of open balls. The open balls of Rⁿ are all of the form
B(a, r) = {x ∈ Rⁿ such that |x − a| < r}    (4.1.1)
where a is any point of Rⁿ and r > 0. It is the strict inequality in (4.1.1) that makes B(a, r) an open ball.
Let f : Ω → R be a real-valued function in Ω. I assume you know what partial derivatives are.
Definition 4.1.1. A function f is smooth, also written C^∞, if all partial derivatives, of any order, exist. That is, for any non-negative integers α₁, …, αₙ,
(∂^{α₁}/∂(x¹)^{α₁}) ⋯ (∂^{αₙ}/∂(xⁿ)^{αₙ}) f(p)    (4.1.2)
exists for every point p of Ω. The set of all functions which are smooth in Ω is denoted by C^∞(Ω). Then C^∞(Ω) is an infinite-dimensional vector space.
Remark 4.1.2. The order of the partial derivative in (4.1.2) is α = α₁ + ⋯ + αₙ.
Recall that for smooth functions, partial differentiation with respect to different variables commutes, in the sense that
(∂/∂xⁱ)(∂/∂xʲ) f(x) = (∂/∂xʲ)(∂/∂xⁱ) f(x)    (4.1.3)
for all 1 ≤ i, j ≤ n.
Remark 4.1.3. Smooth functions model scalar physical quantities such as density, pressure, charge-density, temperature, ...
4.1.2. Changes of coordinates. We have tacitly taken the (x¹, …, xⁿ) to be standard linear coordinates in Rⁿ (i.e. associated with the standard basis of Rⁿ). It is fairly clear that the idea of smoothness of functions should be independent of the choice of such linear coordinates, but here we are going to take things further by considering more general coordinate systems. As far as GR goes, this is a requirement of trying to make a theory that is generally covariant (i.e. transforms predictably under general changes of coordinates).
So, what are changes of coordinates?
Definition 4.1.4. A change of coordinates, written in compact form x = x(y), is a collection of smooth functions
x¹ = x¹(y¹, …, yⁿ)
x² = x²(y¹, …, yⁿ)
⋮
xⁿ = xⁿ(y¹, …, yⁿ)
with the further properties:
x = x(y) gives a 1:1 correspondence between points x ∈ Ω and points y in some other open set Ω′;
the inverse map y = y(x), from Ω′ to Ω, is also given by smooth functions.
Example 4.1.5. A linear map
x = Ly    (4.1.4)
is invertible if the matrix L is invertible. (In this case, the Jacobian of the transformation is L.)
Example 4.1.6. Plane polar coordinates:
x = r cos θ, y = r sin θ.
(To make this look like the above, write it as
x¹ = y¹ cos y², x² = y¹ sin y²,
i.e. (x, y) = (x¹, x²), (r, θ) = (y¹, y²).) The Jacobian is
( x_r  x_θ )   ( cos θ   −r sin θ )
( y_r  y_θ ) = ( sin θ    r cos θ )
This gives a change of coordinates between suitable open sets, for example between the set r > 0, 0 < θ < 2π and the plane with the non-negative x-axis removed.
Remark 4.1.7. Make sure you understand why we have to restrict the values of (y¹, y²) to get a change of coordinates.
We can think of a change of coordinates more actively as follows. Given a function f in C^∞(Ω), we get a new function f̃ ∈ C^∞(Ω′) by the formula
f̃(y) = f(x(y)).    (4.1.5)
Similarly, given g̃ ∈ C^∞(Ω′), we get g ∈ C^∞(Ω) by the formula
g(x) = g̃(y(x)).    (4.1.6)
Remark 4.1.8. The fancy terminology for this is pull-back: f̃ is obtained from f by pulling back by the change-of-coordinates map Ω′ → Ω. If you find this helpful (because you've seen it elsewhere), fine. If you haven't, don't worry.
The fact that f̃ is smooth if f is smooth follows from the chain rule:
(∂f̃/∂yʲ)(y) = Σᵢ (∂xⁱ/∂yʲ)(∂f/∂xⁱ)(x) where x = x(y),    (4.1.7)
which I assume you've seen before and are happy with. The matrix whose components are (∂xʲ/∂yⁱ) entering in (4.1.7) is called the Jacobian of the coordinate transformation x = x(y).
The Jacobian matrix (∂yʲ/∂xⁱ) of the inverse transformation y = y(x) is the inverse of the Jacobian matrix (∂xʲ/∂yⁱ). As an exercise, you can verify this by using the chain rule to differentiate the equation
xʲ(y(x)) = xʲ for j = 1, …, n,    (4.1.8)
which is true by definition of the transformations being inverse to each other. In particular, for a change of coordinates, the Jacobian matrix must be invertible everywhere.
The explicit forms of these inverse relationships are
Σⱼ (∂xᵏ/∂yʲ)(∂yʲ/∂xⁱ) = δᵏᵢ,  Σⱼ (∂yᵏ/∂xʲ)(∂xʲ/∂yⁱ) = δᵏᵢ,    (4.1.9)
where the sums run over j = 1, …, n and δᵏᵢ is the Kronecker delta.
Remark 4.1.9. The inverse function theorem says that if x = x(y) is just smooth for y ∈ Ω′ and x ∈ Ω, and if the Jacobian matrix is invertible at a point q, say, in Ω′, then in fact x = x(y) is invertible, at least if you restrict the transformation to a small ball B′ containing q inside Ω′ and its image W. That is, after restricting in this way, there is a smooth y = y(x), for x ∈ W with y(x) ∈ B′, inverting x = x(y).
Thus the inverse function theorem is a (partial) converse to the fact that Jacobians of coordinate transformations must be invertible.
Example 4.1.10. In the case of polar coordinates, the determinant of the Jacobian is just r. This is invertible if and only if r ≠ 0. This ties in with the fact that polar coordinates go wrong at r = 0.
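The inverse relationship (4.1.9) between the two Jacobian matrices can be verified numerically for polar coordinates (helper names are ours):

```python
import math

def jacobian_xy_of_rtheta(r, theta):
    """Jacobian of (x, y) = (r cos(theta), r sin(theta))."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]

def jacobian_rtheta_of_xy(x, y):
    """Jacobian of the inverse map r = sqrt(x^2 + y^2), theta = atan2(y, x)."""
    r = math.hypot(x, y)
    return [[x / r, y / r],
            [-y / r**2, x / r**2]]

r, theta = 2.0, 0.7
x, y = r * math.cos(theta), r * math.sin(theta)
A = jacobian_xy_of_rtheta(r, theta)
B = jacobian_rtheta_of_xy(x, y)

# Their product should be the 2x2 identity, as in (4.1.9).
P = [[sum(B[i][k] * A[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]
print(P)  # ≈ [[1, 0], [0, 1]]
```

Note also that the determinant of A is r cos²θ + r sin²θ = r, which is exactly the determinant computed in Example 4.1.10.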
4.2. Two types of vector
Many physical quantities are vectorial, as we know. We now consider vectorial quantities in the context of general coordinate transformations. A major subtlety is that there are two different kinds of vectorial quantities, and we need to be clear on the difference between them.
4.2.1. Vector fields. A vector field in Ω is a smooth first-order differential operator of the form
V = Σⱼ Vʲ(x) ∂/∂xʲ    (4.2.1)
where the Vʲ are smooth functions in Ω. If f ∈ C^∞(Ω), we obtain a new function Vf ∈ C^∞(Ω), called the derivative of f along V:
Vf(x) = Σⱼ Vʲ(x) (∂f/∂xʲ)(x).    (4.2.2)
4.2.2. Covector fields. A covector field in Ω is an expression of the form
ω = Σⱼ ωⱼ(x) dxʲ    (4.2.3)
where the ωⱼ are smooth functions in Ω. The basic example is
df = Σⱼ (∂f/∂xʲ) dxʲ,    (4.2.4)
also known as the exterior derivative or differential of f. To match up the notation between (4.2.3) and (4.2.4), df is the covector whose components are ωⱼ = ∂f/∂xʲ.
Remark 4.2.1. If $\omega$ is a covector field in $\Omega$ it is not generally true that $\omega = df$ for some function $f$. (Indeed the condition
$$\frac{\partial \omega_j}{\partial x^i} = \frac{\partial \omega_i}{\partial x^j}$$
is necessary for $\omega = df$, by (4.1.3).)
A covector field is also known as a tensor (field) of type $(0, 1)$.
4.2.3. Dual pairing (or contraction). Given a vector field $V$ and a covector field $\omega$, the contraction $\langle V, \omega\rangle$ is a scalar function defined in terms of components by
$$\langle V, \omega\rangle = \sum_{j=1}^n V^j(x)\,\omega_j(x). \qquad (4.2.5)$$
The directional derivative is an example of this contraction: from the above formulae,
$$\langle V, df\rangle = Vf. \qquad (4.2.6)$$
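The contraction (4.2.5) and the identity (4.2.6) can be checked in a small symbolic sketch; the particular $f$ and components $V^j$ below are made-up examples, not taken from the notes:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 * sp.sin(x2)           # a scalar function f (hypothetical example)
V = [x2, x1]                     # components V^j of a vector field (hypothetical example)

# Components of the covector field df: omega_j = df/dx^j, as in (4.2.4)
omega = [sp.diff(f, x1), sp.diff(f, x2)]

# Contraction <V, omega> = sum_j V^j omega_j, eq. (4.2.5)
contraction = sum(Vj * oj for Vj, oj in zip(V, omega))

# Directional derivative V f = sum_j V^j df/dx^j, eq. (4.2.2)
Vf = V[0] * sp.diff(f, x1) + V[1] * sp.diff(f, x2)

assert sp.simplify(contraction - Vf) == 0   # <V, df> = Vf, eq. (4.2.6)
```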
4. MULTIVARIABLE CALCULUS
4.2.4. Transformation laws for vector fields and covector fields. Let $V$ be a vector field in $\Omega$ and let $x = x(y)$ (for $y \in \Omega'$) be a change of coordinates with inverse $y = y(x)$. We get a vector field $\widetilde V$ in $\Omega'$ as follows: $\widetilde V$ is supposed to differentiate functions in $\Omega'$. We know how to differentiate functions in $\Omega$. But we can transfer a function in $\Omega'$ to $\Omega$ by change of variables. So we define
$$\widetilde V \widetilde g\,(y) = (Vg)(x(y)), \qquad (4.2.7)$$
where $g$ and $\widetilde g$ are related as in (4.1.6).
Proposition 4.2.2. If $V = \sum_j V^j(\partial/\partial x^j)$ in terms of the $x^j$ in $\Omega$ and $\widetilde V = \sum_j \widetilde V^j(\partial/\partial y^j)$ in terms of the $y^j$ in $\Omega'$, then
$$\widetilde V^i = \sum_j V^j\,\frac{\partial y^i}{\partial x^j}. \qquad (4.2.8)$$
Proof. The chain rule tells us everything: from $g(x) = \widetilde g(y(x))$, we get
$$\frac{\partial g}{\partial x^j} = \sum_i \frac{\partial y^i}{\partial x^j}\,\frac{\partial \widetilde g}{\partial y^i}.$$
Hence
$$Vg = \sum_j V^j\,\frac{\partial g}{\partial x^j} = \sum_{i,j} V^j\,\frac{\partial y^i}{\partial x^j}\,\frac{\partial \widetilde g}{\partial y^i}.$$
But the RHS is supposed to be $\sum_i \widetilde V^i(\partial \widetilde g/\partial y^i)$, so the result follows by equating coefficients.
Remark 4.2.3. This is an example of covariance (as opposed to invariance). The coefficients
of a vector depend on a choice of coordinates, but they transform in a predictable and linear
way. In particular if the coefficients are all zero at a given point in one coordinate system then
they are also zero in any other coordinate system. This is as it should be: if there is no wind
at a particular point (and time) in the atmosphere, then all observers should agree on this fact,
regardless of how they choose their coordinates!
The classical formula
$$dx^j = \sum_i \frac{\partial x^j}{\partial y^i}\,dy^i \qquad (4.2.9)$$
suggests that if $\widetilde\omega = \sum_j \widetilde\omega_j\,dy^j$ is to agree with
$$\sum_j \omega_j\,dx^j = \sum_{i,j} \omega_j\,\frac{\partial x^j}{\partial y^i}\,dy^i,$$
then
$$\widetilde\omega_i = \sum_j \omega_j\,\frac{\partial x^j}{\partial y^i}. \qquad (4.2.10)$$
Note that even allowing for the difference between upstairs and downstairs indices, (4.2.8) and (4.2.10) are different transformation laws.
The rule (4.2.10) gives a way of transferring a covector field $\omega$ in $\Omega$ to a new covector field $\widetilde\omega$ in $\Omega'$. We already have a rule for transferring vector fields from $\Omega$ to $\Omega'$. These are compatible in the sense that the contraction is invariant:
Proposition 4.2.4. Let $\omega$ and $V$ be a covector field and a vector field on $\Omega$ and let $\widetilde\omega$ and $\widetilde V$ be the corresponding covector field and vector field on $\Omega'$. Then we have
$$\langle \widetilde V, \widetilde\omega\rangle = \langle V, \omega\rangle. \qquad (4.2.11)$$
Using the fact that the Jacobians $(\partial y^i/\partial x^j)$ and $(\partial x^i/\partial y^j)$ are inverse to each other, (4.2.10) is seen to be equivalent to
$$\omega_j = \sum_i \widetilde\omega_i\,\frac{\partial y^i}{\partial x^j}. \qquad (4.2.13)$$
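The invariance of the contraction under the two transformation laws (4.2.8) and (4.2.10) can be checked symbolically. The sketch below uses the polar change of coordinates as a concrete example; the components V1, V2, w1, w2 are arbitrary symbols:

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
V1, V2, w1, w2 = sp.symbols('V1 V2 w1 w2')   # arbitrary components at a point

# Change of coordinates x = x(y): Cartesian (x1, x2) from polar y = (r, theta).
x_of_y = sp.Matrix([r * sp.cos(th), r * sp.sin(th)])
J = x_of_y.jacobian([r, th])      # J[j, i] = dx^j / dy^i
Jinv = J.inv()                    # Jinv[i, j] = dy^i / dx^j

V = sp.Matrix([V1, V2])           # vector components V^j in the x coordinates
w = sp.Matrix([w1, w2])           # covector components w_j in the x coordinates

V_tilde = Jinv * V                # (4.2.8):  V~^i = V^j dy^i/dx^j
w_tilde = J.T * w                 # (4.2.10): w~_i = w_j dx^j/dy^i

# The contraction is invariant, eq. (4.2.11):
diff = sp.simplify((V_tilde.T * w_tilde)[0, 0] - (V.T * w)[0, 0])
assert diff == 0
```

The cancellation happens exactly because the two Jacobians are inverse to each other, which is the content of (4.2.13).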
Remark 4.2.5. It is possible to change the logic around: given the relation between $V$ and $\widetilde V$, which is natural in terms of the way vector fields are supposed to differentiate functions, we could have defined $\widetilde\omega$ in terms of $\omega$ so that (4.2.11) holds. This would then have implied the transformation law (4.2.10) for the coefficients of the covector field $\omega$.
4.2.5. Tangent space and cotangent space. We shall not make great use of the following, but they are really important in a more systematic development of these ideas.
If $p$ is a point of $\Omega$ then $T_p\Omega$, the tangent space to $\Omega$ at $p$, is defined to be the space of all directional derivatives acting at $p$. A typical element of $T_p\Omega$ is thus written
$$V = \sum_{j=1}^n V^j\,\left.\frac{\partial}{\partial x^j}\right|_p, \qquad (4.2.14)$$
acting on functions by
$$Vf = \sum_{j=1}^n V^j\,\frac{\partial f}{\partial x^j}(p). \qquad (4.2.15)$$
The cotangent space $T^*_p\Omega$ is the dual space of $T_p\Omega$; a typical element is written
$$\omega = \sum_{j=1}^n \omega_j\,dx^j|_p. \qquad (4.2.16)$$
This is also an $n$-dimensional vector space, independent of choice of coordinates. The duality between $T_p\Omega$ and $T^*_p\Omega$ is given by
$$\langle V, \omega\rangle_p = \sum_j V^j \omega_j, \qquad (4.2.17)$$
as above but now producing a number rather than a function on the RHS.
4.3. The Einstein summation convention
In the above (and in the previous chapters), there are many expressions involving a summation over repeated indices, one upstairs and one downstairs. The Einstein summation convention is to omit the summation symbol $\sum$, so that whenever a repeated index appears in an expression it is to be understood that you sum over the range of that index (in this case from 1 to $n$).
In order for this to work, of course, it is essential that if an index is repeated then it must not occur anywhere else in the expression, so for example
$$A_i B^i C_i \text{ is not OK.}$$
Multiple sums (unfortunately) very often occur. For instance, if $L$ and $\bar L$ are matrices with components $L^i{}_j$ and $\bar L^i{}_j$, so that $LX$ has components $L^i{}_j X^j$ and $\bar L X$ has components $\bar L^i{}_j X^j$, then $\bar L L X$ has components
$$[\bar L L X]^i = \bar L^i{}_p [LX]^p = \bar L^i{}_p [L^p{}_q X^q] = \bar L^i{}_p L^p{}_q X^q. \qquad (4.3.1)$$
When the summation convention is in operation, repeated indices are dummy indices in the sense that
$$A_i B^i = A_p B^p = A_s B^s,$$
as each of these is equal to
$$A_1 B^1 + A_2 B^2 + \cdots + A_n B^n.$$
The expression for the components of $\bar L L X$ in (4.3.1) is unpacked as
$$\sum_{p=1}^n \sum_{q=1}^n \bar L^i{}_p L^p{}_q X^q.$$
Suppose now that we want to write down the product $P$ of the traces $X^i{}_i$ and $Y^i{}_i$ of two matrices. The naive expression $X^i{}_i Y^i{}_i$ is ambiguous because the index $i$ has been overworked, appearing 4 times. So before putting them together we should change one of the dummy indices, writing (say)
$$Y^i{}_i = Y^j{}_j.$$
Thus
$$P = X^i{}_i\,Y^j{}_j$$
is an unambiguous way to write $P$ using the summation convention.
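The summation convention translates directly into numpy's einsum notation, and the statements above about dummy indices can be checked numerically (a sketch with random data, not from the notes):

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)
A = rng.standard_normal(n)
B = rng.standard_normal(n)
L = rng.standard_normal((n, n))
Lbar = rng.standard_normal((n, n))
X = rng.standard_normal(n)

# A_i B^i: the repeated index is summed; the name of the dummy index is irrelevant.
s1 = np.einsum('i,i->', A, B)
s2 = np.einsum('p,p->', A, B)
assert np.isclose(s1, s2)

# (Lbar L X)^i = Lbar^i_p L^p_q X^q, eq. (4.3.1): two separate dummy indices p and q.
Y1 = np.einsum('ip,pq,q->i', Lbar, L, X)
Y2 = Lbar @ (L @ X)
assert np.allclose(Y1, Y2)
```

Note that einsum enforces the convention's hygiene for you: reusing an index name in two unrelated slots changes the meaning of the expression, just as in the $X^i{}_i Y^i{}_i$ example.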
Example 4.3.1. Write the expression $X^i{}_j Y^j$ ...
If $f \in C^\infty(\Omega)$, then we get a function of $\tau$,
$$F(\tau) = f(\gamma(\tau)). \qquad (4.4.1)$$
This is the function $f$ along the curve. If $\gamma$ is the world-line of an observer and $f$ is some physical quantity (like pressure) then $F(\tau)$ would be the pressure measured by the observer at different times along her worldline.
If the components of $\gamma$ are $(x^1(\tau), \dots, x^n(\tau))$ as before, then we compute
$$\frac{dF}{d\tau} = \dot x^j(\tau)\,\frac{\partial f}{\partial x^j}, \qquad (4.4.2)$$
so that $\gamma$ defines the vector field
$$\frac{d}{d\tau} = \dot x^j\,\frac{\partial}{\partial x^j} \qquad (4.4.3)$$
along $\gamma$. Here the LHS will be used as short-hand for the RHS!
In contrast to the vector fields we've considered before, this one is only defined along the curve $\gamma$ and not in an open set $\Omega$. Notice that the definition (4.4.2) is independent of any choice of coordinates and gives a suitably invariant definition of tangent vector to the curve $\gamma$.
In so far as $\gamma$ is a mapping from a subset $I$ of $\mathbb{R}$ into a subset $\Omega$ of $V$, its derivative $\dot\gamma(\tau)$ is a mapping from $I$ into $V$. Again, it is better to regard $\dot\gamma(\tau)$ as being in the tangent space at $\gamma(\tau)$,
$$\dot\gamma(\tau) \in T_{\gamma(\tau)}\Omega.$$
Then $\dot\gamma(\tau)$ is called the tangent vector to $\gamma$ at the point $\gamma(\tau)$.
4.5.1. Tensor fields of type (0, 2). A tensor field of type $(0, 2)$ in $\Omega$ is an expression of the form
$$B = B_{ij}\,dx^i\,dx^j, \qquad (4.5.1)$$
where the $B_{ij}$ are smooth functions in $\Omega$ (and the summation convention is in force).
(In the mathematical literature, you will often see the notation $dx^i \otimes dx^j$ on the RHS. I think this can be fairly safely ignored in this course. $\otimes$ is pronounced 'tensor', by the way.)
The transformation law for the components $B_{ij}$ is deduced as for covector fields: from (4.2.9),
$$B = B_{ij}\,dx^i\,dx^j = B_{ij}\,\frac{\partial x^i}{\partial y^p}\,\frac{\partial x^j}{\partial y^q}\,dy^p\,dy^q, \qquad (4.5.2)$$
so if $\widetilde B_{pq}$ are the components of $B$ in the $y$ coordinates,
$$\widetilde B_{pq} = B_{ij}\,\frac{\partial x^i}{\partial y^p}\,\frac{\partial x^j}{\partial y^q}. \qquad (4.5.3)$$
Note that if $X$ and $Y$ are vector fields and $B$ is a tensor field of type $(0, 2)$ we can form
$$\omega = \omega_i\,dx^i, \qquad \omega_i = B_{ij} Y^j. \qquad (4.5.4)$$
As a differential geometer, I might write this as $\omega = B(\cdot, Y)$ if I wanted to avoid using indices and components.
Proposition 4.5.1. $\omega$ defined from $B$ and $Y$ as above is a covector field.
The proof is left as an exercise: you have to write down the transformation laws and check that the components of $\omega$ transform correctly.
We can further form the scalar
$$B(X, Y) = B_{ij}\,X^i Y^j. \qquad (4.5.5)$$
As the notation suggests, this is a well-defined scalar function; its value at a point does not depend on the coordinates used to write out the components of $B$, $X$ and $Y$.
Remark 4.5.2. The formula (4.5.5) gives another way to think about tensors of type $(0, 2)$. Namely, you can reverse the logic and define such a $B$ to be a smoothly varying bilinear form $B_p$ on the tangent space $T_p\Omega$, for each $p \in \Omega$. Smoothness can be defined by saying that the coefficients $B_{ij}$ are smooth functions in $\Omega$ for any choice of coordinates $x$. If we do this for the tangent space, and require $B(X, Y)$ to be invariant (i.e. independent of choice of coordinates), then we say that $B$ is a tensor of type $(0, 2)$.
Tensors of type (0, 2) are important because the metric tensor, which is the fundamental
object in GR, is an example.
Remark 4.5.3. It is very important not to switch the order of the $dx^i$ symbols in computations of this kind. In other words, $dx^i\,dx^j \neq dx^j\,dx^i$. Indeed the first one is the bilinear form $B$ such that $B(X, Y) = 0$ unless $X = \partial_i$, $Y = \partial_j$, whereas the second represents the bilinear form $C$ such that $C(X, Y) = 0$ unless $X = \partial_j$, $Y = \partial_i$.
4.5.2. Tensor fields of type (1, 1). Suppose that for each $p$ in $\Omega$, we have a linear map $A(p)$ from $T_p\Omega$ to $T_p\Omega$ which varies smoothly with $p$. Such a thing is called a smooth tensor field of type $(1, 1)$. In coordinates, $A$ has an expression of the form
$$A = A^k{}_j\,dx^j \otimes \frac{\partial}{\partial x^k}, \qquad (4.5.6)$$
where the $A^k{}_j$ are a collection of $n^2$ functions of $x$. $A$ can be pictured as an $n \times n$ matrix whose entries are smooth functions of $x$.
We obtain the transformation law under a change of coordinates by substituting
$$\frac{\partial}{\partial x^k} = \frac{\partial y^q}{\partial x^k}\,\frac{\partial}{\partial y^q}, \qquad dx^j = \frac{\partial x^j}{\partial y^p}\,dy^p \qquad (4.5.7)$$
into (4.5.6), getting
$$A = A^k{}_j\,\frac{\partial x^j}{\partial y^p}\,\frac{\partial y^q}{\partial x^k}\;dy^p \otimes \frac{\partial}{\partial y^q}. \qquad (4.5.8)$$
Hence the transformation law is
$$\widetilde A^q{}_p = A^k{}_j\,\frac{\partial x^j}{\partial y^p}\,\frac{\partial y^q}{\partial x^k}. \qquad (4.5.9)$$
Example 4.5.4. The identity matrix is an example of a $(1, 1)$ tensor. Its components in any coordinate system are the Kronecker $\delta$, $\delta^k_j$.
The transformation law (4.5.9) means that if $X$ is a vector field on $\Omega$ then $Y = AX$, with components $A^j{}_k X^k$, is again a vector field on $\Omega$. This means that under coordinate transformations we have $\widetilde Y = \widetilde A \widetilde X$, where the relation between $\widetilde A$ and $A$ is given by (4.5.9) and the transformation law (4.2.8) is used for the components of the vector fields $X$ and $Y$.
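That $Y = AX$ is coordinate-independent can be checked numerically: transform $A$ by (4.5.9) and $X$, $Y$ by (4.2.8) using a randomly chosen (hypothetical) Jacobian, and compare. A sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))      # components A^k_j of a (1,1) tensor at a point
X = rng.standard_normal(n)           # components X^j of a vector

# A (hypothetical) invertible Jacobian dy^i/dx^j at the point in question.
P = rng.standard_normal((n, n)) + 3 * np.eye(n)
Pinv = np.linalg.inv(P)              # dx^j/dy^p

Y = A @ X                            # Y^k = A^k_j X^j

# Transform everything to the y coordinates:
A_tilde = np.einsum('qk,kj,jp->qp', P, A, Pinv)   # (4.5.9)
X_tilde = P @ X                                   # (4.2.8)
Y_tilde = P @ Y

# Y = AX is coordinate-independent: Y~ = A~ X~.
assert np.allclose(A_tilde @ X_tilde, Y_tilde)
```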
4.5.3. Tensor fields of type (2, 0). We've seen tensors with two downstairs indices and one up and one down. The zoo of two-index tensors is completed by the ones with two upstairs indices.
We give the transformation law first:
Definition 4.5.5. A tensor field $H$ of type $(2, 0)$ is an object whose components $H^{ij}$ after a choice of coordinates transform according to the rule
$$\widetilde H^{pq} = H^{ij}\,\frac{\partial y^p}{\partial x^i}\,\frac{\partial y^q}{\partial x^j}. \qquad (4.5.10)$$
More generally, a tensor field $T$ of type $(r, s)$ has components $T^{i_1 \dots i_r}{}_{j_1 \dots j_s}$ which transform according to the rule
$$\widetilde T^{p_1 \dots p_r}{}_{q_1 \dots q_s} = T^{i_1 \dots i_r}{}_{j_1 \dots j_s}\,\frac{\partial y^{p_1}}{\partial x^{i_1}} \cdots \frac{\partial y^{p_r}}{\partial x^{i_r}}\,\frac{\partial x^{j_1}}{\partial y^{q_1}} \cdots \frac{\partial x^{j_s}}{\partial y^{q_s}}. \qquad (4.6.2)$$
The tensor product of a tensor $T$ of type $(r, s)$ with a tensor $S$ of type $(p, q)$ is the tensor of type $(r + p, s + q)$ whose components are the products of the components of $T$ and $S$.
If $T$ is of type $(r, s)$, then picking a pair of indices, one up and one down, we have a contraction of $T$, a tensor of type $(r - 1, s - 1)$. For example if we pick the first indices upstairs and downstairs, we get
$$T^{i\,i_2 \dots i_r}{}_{i\,j_2 \dots j_s}.$$
Recall that by the summation convention, this is actually a sum over the index $i$.
Contraction of a different pair of indices will generally give a different tensor.
We note that the tensor product is distributive over addition.
4.7. Manifolds
A manifold is, roughly speaking, a topological space $M$, with the additional structure necessary to be able to speak of smooth functions from $M$ to $\mathbb{R}$. This additional structure is called a smooth atlas and consists of systems of local coordinates satisfying certain compatibility conditions. A function $M \to \mathbb{R}$ is then called smooth if it is smooth when written in terms of any of these local coordinate systems.
We are not going to get into the details of what a topological space is: it is a set of points with enough structure (open sets) to be able to define continuous functions.
As in the definition of curvilinear coordinates on an open subset of $\mathbb{R}^n$, suppose we have a set of $n$ continuous functions $x(p) = (x^1(p), \dots, x^n(p))$ from $U \subset M$ with image some open set $\Omega$ of $\mathbb{R}^n$.
Definition 4.7.1. The functions $(x^1, \dots, x^n)$ from $U$ to $\Omega$ form a local coordinate system on $M$ if the map $U \to \Omega$ is one-one and onto, and if the inverse is continuous.
Thus every point $p$ of $U$ gets labelled by an ordered set of $n$ real numbers which we're calling the coordinates of $p$, and conversely if this set of labels is taken from $\Omega$, then it is the label of one and only one point of $U$.
Definition 4.7.2. If p0 2 U is a given point, we say the coordinate system is centred at p0
if xj (p0 ) = 0 for all j.
Definition 4.7.3. An atlas on $M$ is a collection of local coordinate systems $x_\alpha : U_\alpha \to \Omega_\alpha$, where the open sets $U_\alpha$ cover $M$.
In this definition the subscript $\alpha$ does not refer to the different components of the coordinate system, but rather to the different local coordinate systems needed to cover $M$.
Remark 4.7.4. The individual local coordinate systems $x_\alpha : U_\alpha \to \Omega_\alpha$ are often called charts: thus an atlas is a set of a lot of charts (which made a lot more sense before everyone was using GPS to find their way around).
On an overlap $U_\alpha \cap U_\beta$, a point $p$ carries two sets of coordinates, $x_\alpha(p)$ and $x_\beta(p)$, related by the transition map
$$x_\beta \circ x_\alpha^{-1}, \qquad (4.7.1)$$
and the compatibility condition in the definition of a smooth atlas is that these transition maps are smooth wherever they are defined.
The previous example generalises to functions of any number of variables, but the condition on the non-vanishing of at least one of the partials at every point on the level-set $f = c$ continues to be essential.
It is interesting to note that the null cone, defined by
$$t^2 - x^2 - y^2 - z^2 = 0,$$
exactly fails to satisfy this condition on the partials at the origin: all partials vanish there as well.
4.7.1. The tangent space. Let M be a smooth manifold. For each point p in M , we
can use the local coordinates defined near p to define the tangent space. We can either say
that it is the abstract vector space spanned by the partials corresponding to any choice of
local coordinates from the atlas or we can say that it is the space of directional derivatives at
p (and then show that this space is an n-dimensional vector space). Either way Tp M is an
n-dimensional vector space naturally associated with the point p.
We can now define a vector field $X$ on $M$ as a function which assigns to each $p$ in $M$ a vector $X_p$ in $T_pM$, which is required to vary smoothly with $p$. As in the case of an open subset of $\mathbb{R}^n$, varying smoothly with $p$ means: when expanded as a linear combination of the $\partial/\partial x^j$, the coefficients are smooth (in the domain of the coordinate system).
4.7.2. The cotangent space. This is the dual to the tangent space. If $f$ is a smooth function on $M$, then $df$ is a smooth covector field on $M$: at each point it is in the dual space $T^*_pM$ and varies smoothly with $p$.
If $X$ is a vector field and $f$ is a function, then $Xf$ is the directional derivative of $f$ with respect to $X$. It is again a smooth function on $M$. It can also be written $\langle X, df\rangle$, where $\langle \cdot, \cdot\rangle$ is the pairing between $TM$ and $T^*M$.
4.7.3. General tensors. Taking it further, we can extend the idea of tensor field of any type $(r, s)$ to a manifold $M$, using the low-tech definition above: in any chart a tensor is given by a collection of components, and these are required to transform according to (4.6.2), where $x = x(y)$ is any of the change-of-coordinates maps arising from a smooth atlas on $M$.
4.7.4. Other smooth gadgets. With the aid of a smooth atlas we can define more than just smooth functions on $M$. For example, a smooth curve on $M$ is defined as a continuous map $\gamma : I \to M$ ($I$ is an interval) with the property that the corresponding maps $\gamma_\alpha : I \to \Omega_\alpha$ are all smooth, where
$$\gamma_\alpha(\tau) = x_\alpha(\gamma(\tau)) \quad \text{if } \gamma(\tau) \in U_\alpha.$$
Similarly (I'll omit the details) if $M$ and $M'$ are two manifolds, and $F : M \to M'$ is a mapping, we can define what it means for $F$ to be a smooth map between manifolds. The idea is that we can look at $F$ using the charts on $M$ and $M'$ and define $F$ to be smooth if and only if all these functions are smooth. The interested reader is referred to any standard introductory book on differential geometry.
Remark 4.7.13. The formalism of general relativity works most naturally on the assumption
that space-time is a smooth 4-dimensional manifold. This is particularly important when trying
to understand black holes and the large-scale structure of the universe. For the purposes of this
course we shall mostly work with space-times that are subsets of R4 : but we shall need to work
as if it were a manifold, in other words, without assigning any privileged role to the standard
flat coordinates on R4 .
CHAPTER 5
The metric is written
$$ds^2 = g_{ab}\,dx^a\,dx^b. \qquad (5.1.1)$$
Here the summation convention is in force and the components $g_{ab}$ of $g$ with respect to the coordinates $x^a$ form a $4 \times 4$ symmetric matrix whose entries are smooth functions. To say the metric is lorentzian is to say that at any point $x$, the matrix $(g_{ab}(x))$ is invertible and has signature $(+, -, -, -)$.
The inverse metric has components $g^{ab}$, characterised by
$$g^{ab}\,g_{bc} = \delta^a_c. \qquad (5.1.2)$$
$$= dr^2 + r^2\,d\theta^2.$$
Note that the coefficient of the cross-term $dr\,d\theta + d\theta\,dr$ is zero.
Example 5.1.6. If
$$ds^2 = dr^2 + r^2\,d\theta^2,$$
then writing $x^1 = r$, $x^2 = \theta$,
$$g_{11} = 1, \quad g_{12} = g_{21} = 0, \quad g_{22} = r^2.$$
Hence
$$g^{11} = 1, \quad g^{12} = g^{21} = 0, \quad g^{22} = \frac{1}{r^2}.$$
This is because
$$g_{ij} = \begin{pmatrix} 1 & 0 \\ 0 & r^2 \end{pmatrix} \quad \text{and} \quad g^{ij} = \begin{pmatrix} 1 & 0 \\ 0 & r^{-2} \end{pmatrix}.$$
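The claim of Example 5.1.6 — that the matrix $(g^{ij})$ is the inverse of $(g_{ij})$ in the sense of (5.1.2) — is easy to confirm symbolically (a sketch using sympy):

```python
import sympy as sp

r = sp.symbols('r', positive=True)

g = sp.Matrix([[1, 0],
               [0, r**2]])          # g_ij for ds^2 = dr^2 + r^2 dtheta^2

g_inv = g.inv()                      # g^ij, defined by g^ab g_bc = delta^a_c
assert g_inv == sp.Matrix([[1, 0], [0, r**-2]])
assert sp.simplify(g_inv * g) == sp.eye(2)
```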
Example 5.1.9. Spherical polars. 3-dimensional spherical polar coordinates are given by
$$x = r\sin\theta\cos\varphi, \quad y = r\sin\theta\sin\varphi, \quad z = r\cos\theta. \qquad (5.1.3)$$
Thus, very near any given point $p$ of $M$, the geometry of $M$ is approximately the same as Minkowski space.
Remark 5.1.11. We shall do better than this: in §5.5, we shall see that we can choose coordinates so that
$$\widetilde g_{ab}(\widetilde x) = \eta_{ab} + O(|\widetilde x|^2)$$
for small $\widetilde x$. Such a choice of coordinates will be called local inertial coordinates. They give the best approximating Minkowski space at the point with coordinates $\widetilde x = 0$.
Proof. It is sufficient to make a change of coordinates which is linear:
$$x^a = J^a{}_b\,y^b.$$
Then by the transformation law for tensors of type $(0, 2)$,
$$\widetilde g_{ab} = g_{pq}\,J^p{}_a\,J^q{}_b, \quad \text{i.e.} \quad \widetilde g = J^t g J.$$
By the basic theorem about diagonalization of symmetric bilinear forms, $J$ can be chosen to make $\widetilde g(0)$ diagonal, with diagonal entries $\pm 1$. The signs are determined by the signature of $g$. If the latter is lorentzian, this yields the Minkowski metric $\eta = \mathrm{diag}(1, -1, -1, -1)$.
5.1.2. Timelike/spacelike/null.
Definition 5.1.12. A tangent vector $X = X^a\,\partial_a$ is called timelike at $p$ in $M$ if
$$g(X, X)(p) = g_{ab}X^aX^b|_p > 0,$$
null if
$$g(X, X)(p) = g_{ab}X^aX^b|_p = 0,$$
and spacelike if
$$g(X, X)(p) = g_{ab}X^aX^b|_p < 0.$$
The set of null vectors at a point $p$ forms a cone (in $T_pM$), whose elements are supposed to be tangent to photon worldlines through $p$.
A parameterized curve $\gamma(t)$ is called timelike if its tangent vector is timelike for every value of the parameter $t$,
$$g(\dot\gamma, \dot\gamma) > 0. \qquad (5.2.1)$$
Similarly, a parameterized curve $\gamma(t)$ is called null if its tangent vector is null for every value of the parameter $t$,
$$g(\dot\gamma, \dot\gamma) = 0. \qquad (5.2.2)$$
As in SR, where $(M, g)$ reduces to $(\mathbb{M}, \eta)$, massive particles follow timelike curves in $M$: this corresponds to the speed of the particle being everywhere less than the speed of light. Similarly photons follow null curves. Also as in Minkowski space, these curves are called worldlines.
Hypothesis 5.2.2. A timelike curve $\gamma(\tau)$ is parameterized by proper time $\tau$ if
$$g\!\left(\frac{d\gamma}{d\tau}, \frac{d\gamma}{d\tau}\right) = 1. \qquad (5.2.3)$$
Then $\tau$ is the time that would be shown on a clock with worldline $\gamma(\tau)$. More precisely, if Alice's worldline is $\gamma(\tau)$ and $p = \gamma(\tau_1)$ and $q = \gamma(\tau_2)$ are two events on her worldline, then her clock will show an elapsed time $\tau_2 - \tau_1$ between these two events if (5.2.3) holds.
Given any timelike curve $\widetilde\gamma(u)$, there is always a reparameterization
$$\gamma(\tau) = \widetilde\gamma(u(\tau))$$
which is parameterized by proper time.
5.3. Geodesics
In special relativity, inertial observers were taken to travel at constant speed along straight lines in Minkowski space $\mathbb{M}$. One of the definitions of straight line is a curve which minimises the energy (cf. §1.4), amongst all those with fixed endpoints.
Using the space-time metric $g$ on $M$, we can do the same thing.
Definition 5.3.1. The energy of a curve $\gamma : [t_0, t_1] \to M$ is defined to be
$$E[\gamma] = \frac12 \int_{t_0}^{t_1} g(\dot\gamma(t), \dot\gamma(t))\,dt. \qquad (5.3.1)$$
A geodesic with end-points $p$ and $q$ is a curve which extremizes the energy among all curves with $\gamma(t_0) = p$, $\gamma(t_1) = q$.
Hypothesis 5.3.2. In GR, freely falling particles (and free photons) follow geodesics, timelike for particles and null for photons. Here freely falling means acted upon by no force except
gravity.
Definition 5.3.3. Let $x^a = (x^0, x^1, x^2, x^3)$ be a given coordinate system, such that the metric coefficients are $g_{ab}$,
$$ds^2 = g_{ab}\,dx^a\,dx^b.$$
The Christoffel symbols of $g_{ab}$ are defined by the formula
$$\Gamma^c{}_{ab} = \frac12\,g^{cs}\left(\partial_a g_{bs} + \partial_b g_{as} - \partial_s g_{ab}\right). \qquad (5.3.2)$$
Remark 5.3.4. Note the symmetry of $\Gamma$ in its two lower indices,
$$\Gamma^c{}_{ab} = \Gamma^c{}_{ba}. \qquad (5.3.3)$$
Theorem 5.3.5. Let $\gamma(t)$ be a curve in $M$ and suppose that in some coordinate system, it is given by $t \mapsto x^c(t)$. Then the Euler-Lagrange equations for $E(\gamma)$ are equivalent to the equations
$$\ddot x^c + \Gamma^c{}_{ab}\,\dot x^a \dot x^b = 0, \qquad (5.3.4)$$
where the $\Gamma^c{}_{ab}$ are as in (5.3.2).
Remark 5.3.6. The system of equations (5.3.4) are called the geodesic equations. They are frequently a more convenient way of getting at the $\Gamma$s, as we shall see in examples.
Proof. This is a calculus of variations problem with Lagrangian $L(x, \dot x) = \frac12 g(x)[\dot x, \dot x] = \frac12 g_{ab}(x)\,\dot x^a \dot x^b$. We have
$$\frac{\partial L}{\partial \dot x^s} = g_{sb}\,\dot x^b, \qquad \frac{\partial L}{\partial x^s} = \frac12\,\partial_s g_{ab}\,\dot x^a \dot x^b, \qquad (5.3.5)$$
so
$$\frac{d}{dt}\frac{\partial L}{\partial \dot x^s} = g_{sb}\,\ddot x^b + \frac{\partial g_{as}}{\partial x^b}\,\dot x^a \dot x^b. \qquad (5.3.6)$$
Thus the Euler-Lagrange equations are
$$g_{sb}\,\ddot x^b + \frac{\partial g_{as}}{\partial x^b}\,\dot x^a \dot x^b - \frac12\,\partial_s g_{ab}\,\dot x^a \dot x^b = 0, \qquad (5.3.7)$$
or, multiplying by $g^{cs}$,
$$\ddot x^c + g^{cs}\left(\frac{\partial g_{as}}{\partial x^b}\,\dot x^a \dot x^b - \frac12\,\partial_s g_{ab}\,\dot x^a \dot x^b\right) = 0. \qquad (5.3.8)$$
Now
$$\frac{\partial g_{as}}{\partial x^b}\,\dot x^a \dot x^b = \frac12\left(\frac{\partial g_{bs}}{\partial x^a} + \frac{\partial g_{as}}{\partial x^b}\right)\dot x^a \dot x^b, \qquad (5.3.9)$$
and substituting this into (5.3.8), taking into account the definition of the $\Gamma$s, yields (5.3.4).
For the last part, the Lagrangian is homogeneous of degree 2 in the velocities, and so is conserved along a solution curve (Proposition 1.4.4: the potential energy part is zero in the case at hand).
Computing the $\Gamma$s is the first step in computing the curvature of the metric, and computing the geodesic equations is needed to understand particle (and photon) motion in GR. We therefore give some worked examples. In each case, the $\Gamma$s are read off the geodesic equations (5.3.4) rather than from the formula (5.3.2).
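As a cross-check, the formula (5.3.2) can also be evaluated symbolically and compared with what one reads off the geodesic equations. The sketch below does this for the flat metric of Example 5.1.6 (the loop structure is just a direct transcription of (5.3.2)):

```python
import sympy as sp

coords = sp.symbols('r theta', positive=True)
r, th = coords
g = sp.Matrix([[1, 0], [0, r**2]])   # the flat plane in polar coordinates
g_inv = g.inv()
n = 2

# Gamma^c_ab = (1/2) g^{cs} (d_a g_bs + d_b g_as - d_s g_ab)   -- formula (5.3.2)
Gamma = [[[sum(sp.Rational(1, 2) * g_inv[c, s] *
               (sp.diff(g[b, s], coords[a]) + sp.diff(g[a, s], coords[b])
                - sp.diff(g[a, b], coords[s]))
               for s in range(n))
           for b in range(n)] for a in range(n)] for c in range(n)]

# Compare with reading the symbols off the geodesic equations:
#   r'' - r theta'^2 = 0  and  theta'' + (2/r) r' theta' = 0.
assert Gamma[0][1][1] == -r          # Gamma^r_{theta theta}
assert Gamma[1][0][1] == 1 / r       # Gamma^theta_{r theta}
assert Gamma[1][1][0] == 1 / r       # Gamma^theta_{theta r}
assert Gamma[0][0][0] == 0
```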
Example 5.3.7. Minkowski space. This is
$$ds^2 = \eta_{ab}\,dx^a\,dx^b.$$
The Lagrangian is
$$L = \frac12\,\eta_{ab}\,\dot x^a \dot x^b.$$
Then
$$\frac{\partial L}{\partial \dot x^a} = \eta_{ab}\,\dot x^b, \qquad \frac{\partial L}{\partial x^a} = 0, \qquad (5.3.10)$$
so the geodesic equations are $\ddot x^a = 0$ and all the Christoffel symbols vanish.
Example 5.3.8. 2D hyperbolic space. This is the half-plane $x > 0$ with metric
$$ds^2 = \frac{dx^2 + dy^2}{x^2}. \qquad (5.3.12)$$
The Lagrangian is $L = \frac12(\dot x^2 + \dot y^2)/x^2$, and the Euler-Lagrange equations are
$$\frac{d}{d\tau}\!\left(\frac{\dot x}{x^2}\right) + \frac{\dot x^2 + \dot y^2}{x^3} = 0; \qquad \frac{d}{d\tau}\!\left(\frac{\dot y}{x^2}\right) = 0.$$
Rearranging, these become
$$\ddot x - \frac{\dot x^2 - \dot y^2}{x} = 0, \qquad \ddot y - \frac{2\dot x\dot y}{x} = 0.$$
With $x^1 = x$, $x^2 = y$, these are supposed to be identical to the geodesic equations
$$\ddot x^1 + \Gamma^1{}_{ij}\,\dot x^i\dot x^j = 0, \qquad \ddot x^2 + \Gamma^2{}_{ij}\,\dot x^i\dot x^j = 0. \qquad (5.3.13)$$
Hence
$$\Gamma^1{}_{11} = -\frac1x, \quad \Gamma^1{}_{22} = \frac1x, \quad \Gamma^2{}_{12} = \Gamma^2{}_{21} = -\frac1x, \quad \text{others} = 0. \qquad (5.3.14)$$
NB the factor of 2 in $\Gamma^2{}_{12}$ coming from the symmetry $\Gamma^i{}_{12} = \Gamma^i{}_{21}$.
Example 5.3.9. The geodesics in 2D hyperbolic space. Rather than tackle the second-order equations (5.3.13) directly, it is better to work with conserved quantities. The first is the length of the velocity vector. If we assume $\tau$ is arc-length (i.e. the velocity has constant length 1), then
$$\dot x^2 + \dot y^2 = x^2. \qquad (5.3.15)$$
The second comes from the second equation of motion: $\dot y/x^2$ is constant, say
$$\frac{\dot y}{x^2} = C. \qquad (5.3.16)$$
If $C = 0$ then $y$ is constant and the geodesics are the vertical half-lines $x > 0$. If $C \neq 0$, then along the geodesic
$$\frac{dx}{dy} = \frac{\dot x}{\dot y}. \qquad (5.3.18)$$
Combining this with (5.3.15) and (5.3.16),
$$\frac{Cx\,dx}{\sqrt{1 - C^2x^2}} = \pm\,dy, \qquad \sqrt{1 - C^2x^2} = \mp C(y - y_0),$$
so
$$x^2 + (y - y_0)^2 = C^{-2}, \quad x > 0, \qquad (5.3.19)$$
i.e. a semicircle with diameter along the $y$ axis. It is pleasing that as $C \to 0$, the radius $C^{-1}$ tends to infinity and these semicircles approach the straight half-lines that we saw previously.
The curve
$$\gamma(\tau) = \left(C^{-1}\operatorname{sech}\tau,\; y_0 + C^{-1}\tanh\tau\right) \qquad (5.3.20)$$
is a parameterization by arclength. (To obtain this, you eliminate $x$ from (5.3.16) using (5.3.19), to get
$$\frac{dy}{d\tau} = C\left(C^{-2} - (y - y_0)^2\right)$$
and integrate this up to get $y$ as a function of $\tau$.)
Note that in this example it is relatively easy to obtain the geodesics as an implicit relation between $x$ and $y$ (5.3.19), whereas finding $x$ and $y$ as functions of $\tau$ is rather more involved. This is quite typical of the simple examples we shall see in this course.
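The parameterization (5.3.20) can be checked symbolically against (5.3.15), (5.3.16) and (5.3.19) — a sketch:

```python
import sympy as sp

tau, C, y0 = sp.symbols('tau C y0', positive=True)

# The curve (5.3.20):
x = sp.sech(tau) / C
y = y0 + sp.tanh(tau) / C

xd, yd = sp.diff(x, tau), sp.diff(y, tau)

# Unit-speed condition (5.3.15):  x'^2 + y'^2 = x^2
assert sp.simplify((xd**2 + yd**2 - x**2).rewrite(sp.exp)) == 0
# Conserved quantity (5.3.16):    y' / x^2 = C
assert sp.simplify((yd / x**2 - C).rewrite(sp.exp)) == 0
# The image is the semicircle (5.3.19):  x^2 + (y - y0)^2 = C^{-2}
assert sp.simplify((x**2 + (y - y0)**2 - C**-2).rewrite(sp.exp)) == 0
```

(Rewriting the hyperbolic functions in terms of exponentials before simplifying makes all three identities reduce cleanly to zero.)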
Example 5.3.10. Minkowski space in polar coordinates. In spherical polars, the Minkowski metric takes the form
$$ds^2 = dt^2 - dr^2 - r^2(d\theta^2 + \sin^2\theta\,d\varphi^2). \qquad (5.3.21)$$
The Lagrangian is
$$L = \frac12\left(\dot t^2 - \dot r^2 - r^2\dot\theta^2 - r^2\sin^2\theta\,\dot\varphi^2\right). \qquad (5.3.22)$$
Then
$$\frac{\partial L}{\partial \dot t} = \dot t, \quad \frac{\partial L}{\partial \dot r} = -\dot r, \quad \frac{\partial L}{\partial \dot\theta} = -r^2\dot\theta, \quad \frac{\partial L}{\partial \dot\varphi} = -r^2\sin^2\theta\,\dot\varphi, \qquad (5.3.23)$$
$$\frac{\partial L}{\partial t} = 0, \quad \frac{\partial L}{\partial r} = -r(\dot\theta^2 + \sin^2\theta\,\dot\varphi^2), \quad \frac{\partial L}{\partial \theta} = -r^2\sin\theta\cos\theta\,\dot\varphi^2, \quad \frac{\partial L}{\partial \varphi} = 0. \qquad (5.3.24)$$
Thus the geodesic equations are
$$\ddot t = 0, \qquad (5.3.25)$$
$$\ddot r - r(\dot\theta^2 + \sin^2\theta\,\dot\varphi^2) = 0, \qquad (5.3.26)$$
$$\ddot\theta + \frac2r\,\dot r\dot\theta - \sin\theta\cos\theta\,\dot\varphi^2 = 0, \qquad (5.3.27)$$
$$\ddot\varphi + \frac2r\,\dot r\dot\varphi + 2\cot\theta\,\dot\theta\dot\varphi = 0. \qquad (5.3.28)$$
With $x^0 = t$, $x^1 = r$, $x^2 = \theta$, $x^3 = \varphi$, we read off that
$$\Gamma^1{}_{22} = -r, \qquad \Gamma^1{}_{33} = -r\sin^2\theta, \qquad (5.3.29)$$
$$\Gamma^2{}_{12} = \Gamma^2{}_{21} = \frac1r, \qquad \Gamma^2{}_{33} = -\sin\theta\cos\theta, \qquad (5.3.30)$$
$$\Gamma^3{}_{13} = \Gamma^3{}_{31} = \frac1r, \qquad \Gamma^3{}_{23} = \Gamma^3{}_{32} = \cot\theta, \qquad (5.3.31)$$
while all others are zero. Note again that there are factors of 2 between the $\Gamma$s and the coefficients in the geodesic equations for the $\Gamma^a{}_{bc}$ with $b \neq c$.
We will not go into finding the geodesics here as the moves you have to make were already described in Example 1.4.5. And you are urged to review that example now! Of course this is a bit different because of having the variable $t$ as an additional coordinate in the problem. But since $\dot t$ is a constant, $\varepsilon$, say, the constancy of $L$ means that
$$\dot r^2 + r^2(\dot\theta^2 + \sin^2\theta\,\dot\varphi^2) = \varepsilon^2 - 2L$$
is a constant. Also $J = r^2\sin^2\theta\,\dot\varphi$ is a constant and we can restrict to equatorial curves where $\theta = \pi/2$ identically. Then we find that
$$\dot r^2 + r^2\dot\varphi^2 = \varepsilon^2 - 2L \quad \text{and} \quad J = r^2\dot\varphi. \qquad (5.3.32)$$
If $X = \dot\gamma$ is the tangent vector of a curve $\gamma(\tau)$, the acceleration of $\gamma$ is defined to be
$$A = \nabla_X X, \qquad (5.4.4)$$
with components
$$A^c = \dot X^c + \Gamma^c{}_{ab}\,X^a X^b. \qquad (5.4.5)$$
Proposition 5.4.2. The curve $\tau \mapsto \gamma(\tau)$ is a geodesic if and only if its acceleration is zero.
This is pleasing because it is a natural generalization of what happens in Minkowski space. In that case, the geodesics are of the form
$$\tau \mapsto X + \tau U,$$
where $X$ and $U$ are constant vectors. Then the tangent vector is $U$ and the acceleration is zero. And conversely any curve with zero acceleration in Minkowski space is of the above form and extremizes the energy.
Definition 5.4.3. If $\gamma(\tau)$ is a curve in $M$ with tangent vector $\dot\gamma = X$, then a vector field $Y$ along $\gamma$ is said to be parallel, parallel-transported, or parallel-propagated along $\gamma$ if
$$\nabla_X Y = 0 \text{ along } \gamma. \qquad (5.4.6)$$
Explicitly, in local coordinates, the parallel propagation equation (5.4.6) has the form
$$\dot Y^c + \Gamma^c{}_{ab}\,\dot x^a Y^b = 0. \qquad (5.4.7)$$
Note that to be completely explicit, the $\Gamma$s here are evaluated at the point $x^a(\tau)$. With the curve fixed, $x^a(\tau)$ and $\dot x^a(\tau)$ are known, and so (5.4.7) is a first-order system of equations for the unknown components $Y^b(\tau)$ as functions of $\tau$. Thus given any point $\tau_0$ and a given tangent vector $Z$ at $\gamma(\tau_0)$, there is a unique solution of (5.4.7) with initial condition $Y(\tau_0) = Z$. In this situation, we say that $Y(\tau_1)$ is obtained from $Z$ by parallel transport along $\gamma$.
In Minkowski space we know when two vectors at different points are parallel (or point in the same direction). In a general curved space $M$ there is no such global notion of parallelism, and the above is the best one can do.
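The parallel propagation equation (5.4.7) is an ODE system that can be integrated numerically. The sketch below transports a vector around the unit circle in the flat plane, written in polar coordinates (the Christoffel symbols used are those of the metric $dr^2 + r^2\,d\theta^2$); since the plane is flat, the vector must return to itself after a full loop:

```python
import numpy as np

# Nonzero Christoffel symbols of dr^2 + r^2 dtheta^2:
#   Gamma^r_{th th} = -r,   Gamma^th_{r th} = Gamma^th_{th r} = 1/r.
# Curve: r = 1, theta = tau, so xdot = (0, 1), and (5.4.7) reads
#   Yr' = r Yth,   Yth' = -(1/r) Yr,   evaluated at r = 1.

def rhs(Y):
    Yr, Yth = Y
    return np.array([Yth, -Yr])        # minus Gamma^c_ab xdot^a Y^b at r = 1

def rk4(Y, h, steps):
    for _ in range(steps):
        k1 = rhs(Y)
        k2 = rhs(Y + h / 2 * k1)
        k3 = rhs(Y + h / 2 * k2)
        k4 = rhs(Y + h * k3)
        Y = Y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return Y

steps = 1000
h = 2 * np.pi / steps
Y0 = np.array([1.0, 0.0])              # start with Y = d/dr at theta = 0
Y = rk4(Y0, h, steps)

# In the flat plane, parallel transport around a closed loop is trivial:
# the vector comes back to itself.
assert np.allclose(Y, Y0, atol=1e-6)
```

In a genuinely curved space the same computation around a closed loop generally does not return the starting vector; that failure is one way of seeing curvature.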
Theorem 5.5.2. Let $p$ be a point of $M$. Then there is a choice of coordinates $\widetilde x$, centred at $p$, in which
$$\widetilde g_{ab}(\widetilde x) = \eta_{ab} + O(|\widetilde x|^2). \qquad (5.5.5)$$
Such coordinates are called local inertial coordinates at $p$. Note that no statement is made about the form of the metric at other points near, but not equal to, $p$.
The notation is motivated as follows: from the Proposition, if the metric has the form (5.5.5) then $\Gamma^c{}_{ab} = 0$ at $\widetilde x = 0$ with respect to these coordinates. Thus the geodesic equations
$$\ddot x^c + \Gamma^c{}_{ab}\,\dot x^a \dot x^b = 0$$
reduce to $\ddot x^c = 0$ at $x^a = 0$, so at this point, at least, the equation is the same as for inertial worldlines in Minkowski space. In particular freely falling particles and free photons will have worldlines that appear straight in a very small neighbourhood of $x = 0$ in these special coordinates. This is the precise sense in which GR reduces to SR over small length- and time-scales, but it only works well relative to one of these inertial coordinate systems.
5.5.1. Proof of Theorem 5.5.2. We may suppose that
$$g_{ab} = \eta_{ab} + H_{abc}\,x^c + O(|x|^2),$$
as the next term in the Taylor expansion is already of quadratic order. Consider the coordinate transformation
$$x^c = y^c - \frac12\,G^c{}_{ab}\,y^a y^b, \qquad (5.5.6)$$
where $G^c{}_{ab}$ is an array of numbers to be determined, symmetric in the indices $ab$. The Jacobian of the transformation is
$$\frac{\partial x^c}{\partial y^p} = J^c_p = \delta^c_p + j^c_p, \quad \text{where } j^c_p = -G^c{}_{ap}\,y^a. \qquad (5.5.7)$$
This is invertible when $y = 0$, so by the inverse function theorem, (5.5.6) has a smooth inverse as a mapping between a neighbourhood of $y = 0$ and a neighbourhood of $x = 0$.
Now we calculate, keeping only the leading terms (because we don't really care what's happening at $O(|y|^2)$ and beyond):
$$g_{pq}\,dx^p\,dx^q = g_{pq}(x)\left(dy^p - G^p{}_{ac}\,y^a\,dy^c\right)\left(dy^q - G^q{}_{bd}\,y^b\,dy^d\right) \qquad (5.5.8)$$
$$= g_{pq}\,dy^p\,dy^q - G_{acq}\,y^a\,dy^c\,dy^q - G_{bdp}\,y^b\,dy^p\,dy^d + O(|y|^2). \qquad (5.5.9)$$
Hence
$$\widetilde g_{pq}(y) = g_{pq}(x(y)) - G_{apq}\,y^a - G_{aqp}\,y^a + O(|y|^2). \qquad (5.5.10)$$
Now on the RHS we still have $g(x)$ and we need to write this in terms of $y$. The inverse to our transformation has the form
$$y^a = x^a + \frac12\,G^a{}_{bc}\,x^b x^c + O(|x|^3), \qquad (5.5.11)$$
as you can see by inserting (5.5.6) on the RHS of this equation. It follows that
$$g_{pq}(x(y)) = g_{pq}(y) + O(|y|^2) = \eta_{pq} + H_{pqr}\,y^r + O(|y|^2),$$
and so
$$\widetilde g_{pq}(y) = \eta_{pq} + H_{pqa}\,y^a - G_{apq}\,y^a - G_{aqp}\,y^a + O(|y|^2). \qquad (5.5.12)$$
Thus we will get rid of the first order terms in $y$ if we can choose the numbers $G$ so that
$$G_{apq} + G_{aqp} = H_{pqa}. \qquad (5.5.13)$$
(Here we have used the symmetry of the indices of $G$ to neaten up the equation.) This is an equation for the array of numbers $G$ in terms of the array of numbers $H$. In Lemma 5.5.4 below it is shown that this can always be solved, the solution being
$$G_{abc} = \frac12\left(H_{acb} + H_{bca} - H_{abc}\right). \qquad (5.5.14)$$
Note that this formula depends in an important way on the symmetry of $G$ in its first two indices. Raising the last index we define
$$G^d{}_{ab} = \eta^{cd}\,G_{abc} = \frac12\,\eta^{cd}\left(H_{acb} + H_{bca} - H_{abc}\right).$$
Thus $G$ is uniquely determined by $H$, and by the above calculations, the change of coordinates (5.5.6) gives metric components $\widetilde g_{ab}$ which satisfy the conditions of the Theorem. The proof is complete.
Remark 5.5.3. The array of numbers $G^c{}_{ab}$ is nothing but $\Gamma^c{}_{ab}(0)$, the Christoffel symbols of the metric components $g_{ab}$, evaluated at $x = 0$ (cf. Proposition 5.5.1).
It remains to prove:
Lemma 5.5.4. The equation (5.5.13) is solved by (5.5.14),
$$G_{abc} = \frac12\left(H_{cab} + H_{cba} - H_{abc}\right).$$
Proof. One proof of this is simply to substitute (5.5.14) into (5.5.13) and see that it works. Namely, $G$ is symmetric in its first two indices: simply switch them, and use the symmetry of $H$ in its first two indices. And then
$$G_{abc} + G_{acb} = \frac12\left(H_{cab} + H_{cba} - H_{abc} + H_{bac} + H_{bca} - H_{acb}\right). \qquad (5.5.15)$$
Now use the symmetry of $H$ in its first two indices to arrange the indices as far as possible in alphabetical order:
$$G_{abc} + G_{acb} = \frac12\left(H_{acb} + H_{bca} - H_{abc} + H_{abc} + H_{bca} - H_{acb}\right) = H_{bca},$$
as required.
There is also a derivation of this formula in the Problem set, Problem 4.9.
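The index algebra of Lemma 5.5.4 can also be confirmed numerically, with a random array $H$ symmetric in its first two indices (a sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# A random H_{abc} symmetric in its first two indices, H_{abc} = H_{bac}.
H = rng.standard_normal((n, n, n))
H = H + H.transpose(1, 0, 2)

# The solution (5.5.14):  G_{abc} = (1/2)(H_{cab} + H_{cba} - H_{abc}).
G = 0.5 * (H.transpose(1, 2, 0) + H.transpose(2, 1, 0) - H)

# Check the defining equation (5.5.13):  G_{apq} + G_{aqp} = H_{pqa}.
lhs = G + G.transpose(0, 2, 1)
rhs = H.transpose(2, 0, 1)
assert np.allclose(lhs, rhs)

# G is symmetric in its first two indices, as required by (5.5.6):
assert np.allclose(G, G.transpose(1, 0, 2))
```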
Suppose then that we have chosen local inertial coordinates at $p$, so that
$$g_{ab} = \eta_{ab} + \frac12\,P_{abcd}\,x^c x^d + O(|x|^3). \qquad (5.6.1)$$
We ask: is there a change of coordinates which can get rid of the $P$ term here? The answer is no in general, but it is interesting to try.
It would be natural to try
$$\widetilde x^a = x^a + \frac16\,W^a{}_{bcd}\,x^b x^c x^d \qquad (5.6.3)$$
to change $P$. Note that here the array of numbers $W$ satisfies
$$W^a{}_{bcd} \text{ is totally symmetric in } bcd. \qquad (5.6.4)$$
In Problem 4.8, you are invited to show that if $g_{ab}$ is as in (5.6.1) and $\widetilde x$ and $x$ are related by (5.6.3), then
$$\widetilde g_{ab} = \eta_{ab} + \frac12\,\widetilde P_{abcd}\,\widetilde x^c \widetilde x^d + O(|\widetilde x|^3), \qquad (5.6.5)$$
where
$$\widetilde P_{abcd} = P_{abcd} - W_{abcd} - W_{abdc}. \qquad (5.6.6)$$
Now I claim that this cannot be solved in general, because $W$ just does not have enough parameters! For this, we need to do some counting.
5.6.1. Counting tensor components.
Definition 5.6.1. A tensor $T$ of type $(0, m)$ in $n$ dimensions is said to be totally symmetric if for the corresponding $m$-linear form we have
$$T(v_1, \dots, v_i, \dots, v_j, \dots, v_m) = T(v_1, \dots, v_j, \dots, v_i, \dots, v_m)$$
for any $i$ and $j$. In components this is the same as saying
$$T_{p_1 \dots p_i \dots p_j \dots p_m} = T_{p_1 \dots p_j \dots p_i \dots p_m}.$$
Proposition 5.6.2. The dimension of the vector space of all totally symmetric tensors of type $(0, m)$ in $n$ dimensions is
$$\binom{n + m - 1}{m}. \qquad (5.6.7)$$
Definition 5.6.3. A tensor of type $(0, m)$ in $n$ dimensions is totally skew (or totally skew symmetric) if the corresponding $m$-linear form has the property
$$T(v_1, \dots, v_i, \dots, v_j, \dots, v_m) = -T(v_1, \dots, v_j, \dots, v_i, \dots, v_m)$$
for any $i$ and $j$.
Proposition 5.6.4. The dimension of the vector space of totally skew tensors of type $(0, m)$ in $n$ dimensions is
$$\binom{n}{m}. \qquad (5.6.8)$$
(In particular the dimension is 1 if $n = m$ and 0 if $m > n$.)
Proof. We shall prove both of these together. By way of a warm-up let's make sure that we understand that
the dimension of the space of all tensors of type $(0, m)$ in $n$ dimensions is $n^m$. \qquad (5.6.9)
To see this, note that such a tensor has $m$ indices. We can choose each one of these in $n$ ways (since they vary from 1 to $n$). All choices are independent because there are no symmetry conditions. So the total number of possibilities is $n^m$. The number of these choices is equal to the dimension of the space of these tensors.
Let us move on to the proof of Proposition 5.6.4. Again we have a tensor with $m$ indices. For ease of exposition, suppose that $m = 3$. Any component with two indices the same must be zero, because (for example)
$$T_{115} = -T_{115}$$
by switching the first two indices. So the only non-zero components of $T$ have distinct indices. If we have a set of 3 distinct indices, say 523, then we can use the skew symmetry to relate $T_{523}$ to $T_{235}$, where the indices are now in increasing order:
$$T_{523} = -T_{253} = T_{235}$$
(switching first the first two indices and then the last two). Thus the number of independent components of a totally skew tensor of type $(0, 3)$ is equal to the number of unordered subsets of 3 elements of the set $\{1, \dots, n\}$. This is the binomial coefficient $\binom{n}{3}$.
The general case, with 3 replaced by $m$, works in the same way.
Finally let us prove Proposition 5.6.2. The big difference from the case of skew symmetry is
that now indices can take the same value without that component being zero. For a small
number of indices (e.g. 3) it's possible to count by hand. There are n components where the
indices are all the same:
T_{111}, T_{222}, …, T_{nnn}.
There are n(n − 1) with precisely two indices the same:
T_{112}, T_{113}, …, T_{11n}; T_{221}, T_{223}, ….
And there are n(n − 1)(n − 2)/6 where the indices are all distinct. Thus the total number of
independent components of a totally symmetric tensor of type (0, 3) in n dimensions is
n + n(n − 1) + n(n − 1)(n − 2)/6 = n(n + 1)(n + 2)/6,    (5.6.10)
which checks with (5.6.7) if m = 3.
This approach can be generalized to tensors of any rank, but it's pretty messy. The following
is the cunning way of doing it.
Consider a collection of indices on our totally symmetric tensor with m indices. It will
consist of m_1 1's, m_2 2's, and so on, up to m_n n's. Here the m_j are allowed to be 0, but the
constraint that the tensor is of type (0, m) is
m_1 + m_2 + ⋯ + m_n = m.    (5.6.11)
This combinatorial problem can be visualised in the following way. Consider an arrangement of
m coins and n − 1 pencils in a line, as in the example below (o = coin, | = pencil):
ooo | oo | o | oo | |
¹Since they vary from 1 to n.
Given such an arrangement, we count the coins to the left of the first pencil, and call that m_1.
Then we count the coins between the first and second pencils, and call that m_2. Proceeding
in this way, we get a collection of n integers m_j ≥ 0, satisfying the constraint (5.6.11). (Note
that m_j = 0 if two of the pencils are right next to each other, or if there is a pencil at the very
beginning or the very end of the line.)
In the pictured configuration there are 5 pencils and 8 coins, and
m1 = 3, m2 = 2, m3 = 1, m4 = 2, m5 = 0, m6 = 0.
This would correspond to the component
T11122344
of a totally symmetric tensor of type (0, 8) in 6 dimensions. The number of arrangements of m
coins and n − 1 pencils is the same as the number of ways of choosing m objects (the ones to
be called coins) from a total of m + n − 1. This is the binomial coefficient (5.6.7).
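These counting arguments are easy to sanity-check by brute force. The following Python sketch (our own illustration, not part of the notes) enumerates index strings with itertools and compares them with the binomial-coefficient formulas (5.6.7) and (5.6.8):

```python
# Brute-force check of the component counts for totally symmetric and
# totally skew tensors of type (0, m) in n dimensions.
# Independent components of a symmetric tensor <-> non-decreasing index
# strings (multisets); of a skew tensor <-> strictly increasing strings.
from itertools import combinations, combinations_with_replacement
from math import comb

def dim_symmetric(n, m):
    # one independent component per non-decreasing index string
    return sum(1 for _ in combinations_with_replacement(range(n), m))

def dim_skew(n, m):
    # one independent component per strictly increasing index string
    return sum(1 for _ in combinations(range(n), m))

for n in range(1, 7):
    for m in range(1, 5):
        assert dim_symmetric(n, m) == comb(n + m - 1, m)   # (5.6.7)
        assert dim_skew(n, m) == comb(n, m)                # (5.6.8)

# e.g. totally symmetric (0,3) tensors in 6 dimensions: 6*7*8/6
print(dim_symmetric(6, 3))  # 56
```

The same enumeration with `product` instead of `combinations` recovers the unconstrained count n^m of (5.6.9).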
5.6.2. Sneak preview of curvature, continued. We've seen that the coordinate transformation (5.6.3) allows us to change the array P_{abcd}, where
P_{abcd} = P_{bacd} = P_{abdc},    (5.6.12)
to
P̃_{abcd} = P_{abcd} − W_{abcd} − W_{abdc},    (5.6.13)
where W is symmetric in its last 3 indices.
What is the dimension of the space of P's? P is symmetric in its first two indices and in its
third and fourth indices, but there is no other symmetry. So it is like a 2-index object, P_{IJ},
where I and J run over a basis of the space of symmetric 2-index tensors. If that dimension is
N, the dimension of the space of P's will be N². But we've seen that N = n(n + 1)/2 (if the
dimension is n; we work generally for the moment). Hence:
In an n-dimensional manifold, the dimension of the space of P's is
(1/4) n²(n + 1)².
On the other hand, the dimension of the space of W's is n (for the extra index) times
n(n + 1)(n + 2)/6 (the dimension of the space of totally symmetric tensors of type (0, 3)).
In an n-dimensional manifold, the dimension of the space of W's is
(1/6) n²(n + 1)(n + 2).
Thus the dimension of the space of P's minus the dimension of the space of W's is
(1/4) n²(n + 1)² − (1/6) n²(n + 1)(n + 2)
  = (1/12) n²(n + 1) (3(n + 1) − 2(n + 2))
  = (1/12) n²(n + 1)(n − 1)
  = (1/12) n²(n² − 1).    (5.6.14)
Since this number is positive for n ≥ 2, it will be impossible to solve (5.6.13) to make P̃ = 0 in
general. (Our calculation shows that P̃ = 0 is a system of linear equations with more equations
than unknowns.)
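The arithmetic in (5.6.14) can be checked symbolically; here is a short sympy sketch (an illustration of the dimension counts derived above):

```python
# Check of the dimension count (5.6.14): the dimension of the space of
# arrays P (symmetric in its first and in its last pair of indices)
# minus the dimension of the space of W's.
from sympy import symbols, simplify

n = symbols('n', positive=True, integer=True)
dim_P = (n*(n + 1)/2)**2           # N^2 with N = n(n+1)/2
dim_W = n * n*(n + 1)*(n + 2)/6    # n times dim of symmetric (0,3) tensors
excess = simplify(dim_P - dim_W)

# agrees with n^2 (n^2 - 1)/12
assert simplify(excess - n**2*(n**2 - 1)/12) == 0
print(excess.factor())
```

For n = 4 (the space-time case) the excess is 20, which is the number of independent curvature components found again in Chapter 6.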
Thus, while some components of P can be killed by coordinate transformations, there are
others, in fact n²(n² − 1)/12 of them, which cannot. These unkillable components of P form
a tensor called the curvature of g at the point x = 0.
In fact, the Riemann curvature tensor at x = 0 is built out of P in the following way:
R_{abcd} = ½ (P_{acbd} + P_{bdac} − P_{adbc} − P_{bcad}).    (5.6.15)
It can be checked that if P is changed to P̃ as in (5.6.13), then the components of R do not
change! Thus if this particular combination of components of P is non-zero then it cannot be
killed by a coordinate transformation.
These matters will be discussed much more extensively in the next chapter, where we shall
see a different, but equivalent, definition of curvature.
CHAPTER 6

In the previous chapter we introduced the quantities
X^a ∂_a Y^c + Γ^c_{ab} X^a Y^b,    (6.1.1)
which we claimed form the components of a vector field ∇_X Y, the covariant derivative of Y with
respect to X. In this chapter we shall sketch a proof of this important fact and shall define the
curvature tensor. One definition of this is as follows:
R(X, Y)Z = (∇_X ∇_Y − ∇_Y ∇_X − ∇_{[X,Y]})Z.    (6.1.2)

Consider a smooth 2-parameter family of curves H(τ, λ), defined for
α ≤ τ ≤ β.    (6.2.1)
Define the vector fields
X = ∂H/∂τ, i.e. X = (∂H^a/∂τ) ∂/∂x^a,    (6.2.2)
and
Y = ∂H/∂λ = (∂H^a/∂λ) ∂/∂x^a.    (6.2.3)
(Here x^a = H^a(τ, λ) is the description of the family of curves with respect to the local coordinate
system x^a.) Define
E(λ) = ½ ∫_α^β g(X(τ, λ), X(τ, λ)) dτ,    (6.2.4)
the energy of the curve labelled by λ in our family. (This is a small abuse of notation.) We
claim
dE/dλ (0) = −∫_α^β g(∇_X X, Y) dτ + [g_q(X_q, Y_q) − g_p(X_p, Y_p)],    (6.2.5)
where the integrand is evaluated at λ = 0, q = H(β, 0), p = H(α, 0), and the subscripts on the
terms in square brackets denote evaluation of the quantities at the indicated points.
If you grant me (6.2.5) for the moment, then the lemma follows rapidly. The issue is the
coordinate independence, and the term in square brackets is certainly invariant under coordinate
changes, as is the LHS of (6.2.5). It follows that for any fixed curve and vector fields along the
curve, the integral in (6.2.5) is also coordinate independent.
Now letting the curve and the variation vary, we conclude that the integrand g(∇_X X, Y) must be a scalar
quantity, and since Y is a vector field, ∇_X X must also be a vector field.
So it remains to prove (6.2.5). This involves going through the calculus of variations, keeping
track of the boundary term. We have
E(λ) = ½ ∫_α^β g(∂H(τ, λ)/∂τ, ∂H(τ, λ)/∂τ) dτ.    (6.2.6)
Differentiate with respect to λ to get
dE/dλ = ∫_α^β [ ½ (Y^c ∂_c g_{ab}) X^a X^b + g(∂²H/∂λ∂τ, ∂H/∂τ) ] dτ.    (6.2.7)
The usual calculus-of-variations trick is to integrate by parts here. For this, note
(∂/∂τ) g(∂H/∂λ, ∂H/∂τ) = (X^c ∂_c g_{ab}) Y^a X^b + g(∂²H/∂τ∂λ, ∂H/∂τ) + g(∂H/∂λ, ∂²H/∂τ²)    (6.2.8)
  = ½ (Y^c ∂_c g_{ab}) X^a X^b + g(∂²H/∂τ∂λ, ∂H/∂τ) + g(∇_X X, Y),    (6.2.9)
using the definition of the Γ's. Integration from α to β yields (6.2.5).
Lemma 6.2.2. If X and Y are vector fields, then so is
∇_X Y − ∇_Y X.    (6.2.10)
Proof. We have
(∇_X Y)^c = X^a ∂_a Y^c + Γ^c_{ab} X^a Y^b.    (6.2.11)
Hence the components of (6.2.10) are
X^a ∂_a Y^c − Y^a ∂_a X^c,    (6.2.12)
again using Γ^c_{ab} = Γ^c_{ba}. So it is sufficient to check that this combination of components transforms
as a vector field. Under a change of coordinates x^a → x̃^a,
X̃^a = (∂x̃^a/∂x^b) X^b,  Ỹ^a = (∂x̃^a/∂x^b) Y^b.    (6.2.13)
Hence
X̃^a (∂/∂x̃^a) Ỹ^b = X^a ∂_a (Y^c ∂x̃^b/∂x^c) = (∂x̃^b/∂x^c) X^a ∂_a Y^c + X^a Y^c ∂²x̃^b/∂x^c∂x^a.
If we switch X and Y and subtract, the second term on the RHS drops out because of the
symmetry of the mixed partials of x̃^b with respect to the x variables. Thus we are left with the
transformation law
X̃^a (∂/∂x̃^a) Ỹ^b − Ỹ^a (∂/∂x̃^a) X̃^b = (∂x̃^b/∂x^c)(X^a ∂_a Y^c − Y^a ∂_a X^c),    (6.2.14)
as required for a vector field.
Now we can complete our proof. We know that rX X is a vector field for every vector field
X. Replacing X by X + Y , we conclude that
rX X + r X Y + rY X + r Y Y
is a vector field for every pair of vector fields X, Y . Hence rX Y + rY X is a vector field for
every X and Y . But so is the dierence, by the above lemma. Now
1
rX Y = (rX Y + rY X + [X, Y ])
(6.2.15)
2
and we have written rX Y as a sum of two things that we know are vector fields. So it must
itself be a vector field.
Remark 6.2.3. There is a direct, computational proof, in Woodhouse, GR, 4.5.
6.2.1. The Lie bracket.
Definition 6.2.4. The quantity ∇_X Y − ∇_Y X is called the Lie bracket [X, Y] of the vector
fields X and Y.
We proved in Lemma 6.2.2 that [X, Y] is a vector field by a local coordinate calculation. In
this section we give a more conceptual proof of the same fact. The formula (6.2.12) shows that
the Lie bracket depends only on the vector fields X and Y and not on the metric g. That is,
we can define an operator on functions f:
[X, Y]f = X(Y f) − Y(Xf).    (6.2.16)
Lemma 6.2.5. Let T be a linear operator on smooth functions which satisfies the Leibniz rule
at a point p: T[fg] = f(p)T[g] + g(p)T[f]. Then there is a set of numbers T^j such that, in any
local coordinate system centred at p, T[f] = T^j (∂f/∂x^j)(0).
Proof. Choose local coordinates x^j such that x^j = 0 corresponds to the point p. We aim
to show that there is a set of numbers T^j such that
T[f] = T^j (∂f/∂x^j)(0).
Define T^j = T[x^j].
By linearity T annihilates constants, for
T[1] = T[1²] = 2T[1] by the Leibniz rule.
Hence T[1] = 0 and so T[c] = 0 for any constant c.
Now let f be any smooth function. We can write
f(x) = f(0) + x^j ∂_j f(0) + O(|x|²).
Applying T, we see
T[f] = T^j ∂_j f(0),
because, by the Leibniz rule, T applied to a smooth function vanishing to order 2 or more must
vanish.
Now we have the following slick proof that [X, Y ] is a vector field if X and Y are vector
fields.
Proposition 6.2.6. If X and Y are vector fields, then so is [X, Y ].
Proof. We calculate
XY(φψ) = (Xφ)(Y ψ) + φ XY ψ + (Xψ)(Y φ) + ψ XY φ.
Similarly,
Y X(φψ) = (Y φ)(Xψ) + φ Y Xψ + (Y ψ)(Xφ) + ψ Y Xφ.
Subtracting, we obtain
[X, Y](φψ) = φ[X, Y]ψ + ψ[X, Y]φ.
Thus [X, Y] satisfies the Leibniz rule at each point, so by Lemma 6.2.5 it is given by a set of
components, i.e. it is a vector field.
Example 6.2.7. If we pick coordinates x^a then we have the corresponding (locally defined)
vector fields ∂_0, ∂_1, ∂_2, ∂_3. We have
[∂_a, ∂_b] = 0.    (6.2.17)
6.2.2. The covariant differential. For functions we had df. This was a covector field.
For vector fields Z we can consider ∇Z. This is a (1, 1) tensor. We denote it in indices by
∇_a Z^c. The covariant directional derivative with respect to X is then the contraction X^a ∇_a Z^c.
In index form,
∇_a Z^c = ∂_a Z^c + Γ^c_{ab} Z^b.
6.3. Extension to all tensors
We already know how to differentiate functions with respect to vector fields. From now on
we also denote Xf by ∇_X f. We now have the covariant derivative of vector fields as well, ∇_X Y.
There is now a natural way also to differentiate covector fields, which respects the natural dual
pairing between vectors and covectors.
Definition 6.3.1. If α is a covector (tensor of type (0, 1)), then we define ∇α so that
⟨∇_X α, Y⟩ + ⟨α, ∇_X Y⟩ = X⟨α, Y⟩
for all vector fields Y.
In index notation, using ∇_a Y^b = ∂_a Y^b + Γ^b_{ac} Y^c, we get
(∇_a α_b) Y^b + α_b (∂_a Y^b + Γ^b_{ac} Y^c) = (∂_a α_b) Y^b + α_b ∂_a Y^b.
Hence we require, for all Y^b,
(∇_a α_b) Y^b = (∂_a α_b − Γ^c_{ab} α_c) Y^b,
so
∇_a α_b = ∂_a α_b − Γ^c_{ab} α_c.
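The cancellation of the Γ-terms in the pairing, which is what forces the minus sign in ∇_a α_b = ∂_a α_b − Γ^c_{ab} α_c, can be verified symbolically. A sympy sketch, with the Γ^c_{ab} left as arbitrary symbols:

```python
# Check that with nabla_a alpha_b = d_a alpha_b - G^c_ab alpha_c and
# nabla_a Y^b = d_a Y^b + G^b_ac Y^c, the G-terms cancel in the pairing:
# (nabla_a alpha_b) Y^b + alpha_b nabla_a Y^b = d_a(alpha_b Y^b),
# whatever the symbols G^c_ab are.
import sympy as sp

n = 2
xs = sp.symbols('x0 x1')
G = [[[sp.Symbol(f'G{c}_{a}{b}') for b in range(n)] for a in range(n)]
     for c in range(n)]   # G[c][a][b] = Gamma^c_{ab}, arbitrary constants
alpha = [sp.Function(f'al{b}')(*xs) for b in range(n)]
Yv = [sp.Function(f'Y{b}')(*xs) for b in range(n)]

for a in range(n):
    lhs = sum((sp.diff(alpha[b], xs[a])
               - sum(G[c][a][b]*alpha[c] for c in range(n))) * Yv[b]
              + alpha[b]*(sp.diff(Yv[b], xs[a])
                          + sum(G[b][a][c]*Yv[c] for c in range(n)))
              for b in range(n))
    rhs = sp.diff(sum(alpha[b]*Yv[b] for b in range(n)), xs[a])
    assert sp.simplify(lhs - rhs) == 0
```

Only the relative sign matters here; with a plus sign in both formulas the Γ-terms would fail to cancel and the pairing would not reduce to an ordinary derivative.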
(2) ∇_X is real-linear:
∇_X(λT + μS) = λ ∇_X T + μ ∇_X S
for any two tensors T, S of type (r, s) and real numbers λ and μ.
(3) ∇_X satisfies the Leibniz rule:
∇_X(T ⊗ S) = (∇_X T) ⊗ S + T ⊗ (∇_X S).
Remark 6.3.5. I omit the proof, but refer you to the problem set for related exercises.
It is a pain to write out the general case, but here are some examples:
∇_a T_{bc} = ∂_a T_{bc} − Γ^s_{ab} T_{sc} − Γ^s_{ac} T_{bs};    (6.3.2)
∇_a A^c_b = ∂_a A^c_b − Γ^s_{ab} A^c_s + Γ^c_{as} A^s_b;    (6.3.3)
∇_a P^{bc} = ∂_a P^{bc} + Γ^b_{as} P^{sc} + Γ^c_{as} P^{bs}.    (6.3.4)
6.4. Properties
The covariant derivative operator we have been discussing has the following further properties:
Symmetry or Torsion-free:
∇_a ∇_b f = ∇_b ∇_a f for all functions f.    (6.4.1)
Metric preservation:
∇_a g_{bc} = 0.    (6.4.2)
There are more general differentiation operators which satisfy the Leibniz-rule type properties of the previous section. These are called connections. Given a metric, there is a unique
such connection which satisfies the two boxed properties here. This is also called the metric
connection or the Levi-Civita connection, after Tullio Levi-Civita (1873–1941).
Proof. (Of boxed properties.) We have
∇_a ∇_b f = ∂_a ∂_b f − Γ^c_{ab} ∂_c f,    (6.4.3)
which is symmetric in a and b because ∂_a ∂_b f = ∂_b ∂_a f and Γ^c_{ab} = Γ^c_{ba}. The metric
preservation property (6.4.2) follows by a direct computation from the formula for Γ^c_{ab} in
terms of the metric.
The metric preservation property has important consequences. Recall that the metric g can
be used to lower indices and that its inverse can be used to raise indices. Because ∇g = 0 it
follows that ∇g⁻¹ = 0, and raising and lowering indices commutes with differentiation.
For example,
∇_a (g^{bs} α_s) = ∇_a α^b = g^{bs} ∇_a α_s.    (6.4.5)
The middle expression is the covariant derivative of the index-raised version of α; the right-hand
expression is the derivative of α, with its index raised afterwards.
Example 6.4.1. If Y is parallel propagated along a curve γ, then its length is constant.
For if X = γ̇, we have ∇_X Y = 0. Then
∇_X (g(Y, Y)) = 2 g(Y, ∇_X Y) = 0.
More explicitly,
∇_X (Y^a Y_a) = (∇_X Y^a) Y_a + Y^a ∇_X Y_a = 2 Y_a ∇_X Y^a = 0.
6.5. Curvature

Theorem 6.5.1. There is a tensor R of type (1, 3), with components R_{abc}{}^d, such that for
every vector field X,
(∇_a ∇_b − ∇_b ∇_a) X^d = R_{abc}{}^d X^c.    (6.5.1)

Definition 6.5.2. The tensor R with components R_{abc}{}^d is called the Riemann curvature
tensor (or just curvature tensor, for short).

Proof. We prove first the following formula:
R_{abc}{}^d = ∂_a Γ^d_{bc} − ∂_b Γ^d_{ac} + Γ^d_{ap} Γ^p_{bc} − Γ^d_{bp} Γ^p_{ac}.    (6.5.2)
Although it is important, and you need to know that it exists, I DO NOT RECOMMEND
THAT YOU COMMIT THIS TO MEMORY.
Recall that if T_b{}^d is a (1, 1) tensor, then
∇_a T_b{}^d = ∂_a T_b{}^d + Γ^d_{as} T_b{}^s − Γ^s_{ab} T_s{}^d.    (6.5.3)
We are going to apply this with T_b{}^d = ∇_b X^d and then subtract the corresponding equation
with a and b switched. In fact if we do this first, we get
∇_a T_b{}^d − ∇_b T_a{}^d = ∂_a T_b{}^d + Γ^d_{as} T_b{}^s − ∂_b T_a{}^d − Γ^d_{bs} T_a{}^s,    (6.5.4)
because Γ^s_{ab} = Γ^s_{ba}.
Now put T_b{}^d = ∇_b X^d and expand ∇X = ∂X + ΓX. The first two terms on the RHS of
(6.5.4) are
∂_a (∂_b X^d + Γ^d_{bs} X^s) + Γ^d_{as} (∂_b X^s + Γ^s_{bp} X^p)
  = (∂_a Γ^d_{bs}) X^s + Γ^d_{as} Γ^s_{bp} X^p    (6.5.5)
    + ∂_a ∂_b X^d + Γ^d_{bs} ∂_a X^s + Γ^d_{as} ∂_b X^s.    (6.5.6)
We have rearranged the terms in this order because those on line (6.5.6) are symmetric in ab.
Hence these disappear when we subtract the corresponding expression with a and b switched.
Rabc d X c = (ra rb
rb ra )X d = [@a
d
bc
@b
d
ac
d p
ap bc
d p
c
bp ac ]X
(6.5.7)
Now change coordinates from x^a to x̃^p, and write
J^a_p = ∂x^a/∂x̃^p.    (6.5.8)
It follows that
R̃_{pqr}{}^s X̃^r = J^a_p J^b_q (J⁻¹)^s_d R_{abc}{}^d X^c,    (6.5.9)
where J is the Jacobian of the transformation. But we are assuming that X is a vector field as
well, so
X^c = J^c_r X̃^r.
Hence
R̃_{pqr}{}^s X̃^r = J^a_p J^b_q J^c_r (J⁻¹)^s_d R_{abc}{}^d X̃^r    (6.5.10)
for any components X̃^r. Hence
R̃_{pqr}{}^s = J^a_p J^b_q J^c_r (J⁻¹)^s_d R_{abc}{}^d,    (6.5.11)
which is the transformation law for a tensor field of type (1, 3).
Remark 6.5.3. Cf. 5.6 of Woodhouse, GR.
Example 6.5.4. Curvature of hyperbolic space.
In the previous chapter, we considered the 2-dimensional hyperbolic metric
ds² = (dx² + dy²)/x².
Its non-zero Christoffel symbols are
Γ^1_{22} = 1/x,  Γ^1_{11} = −1/x,  Γ^2_{12} = Γ^2_{21} = −1/x,  others = 0.    (6.5.12)
From (6.5.2),
R_{121}{}^2 = ∂_1 Γ^2_{21} − ∂_2 Γ^2_{11} + Γ^2_{1p} Γ^p_{21} − Γ^2_{2p} Γ^p_{11}.    (6.5.13)
There is quite a bit of simplification because many of the Γ's are zero in this case:
R_{121}{}^2 = ∂_1 Γ^2_{21} + Γ^2_{12} Γ^2_{21} − Γ^2_{21} Γ^1_{11}    (6.5.14)
  = x⁻² + x⁻² − x⁻² = x⁻².    (6.5.15)
It turns out that all other components of the curvature are either ± this one, or are zero. In
particular if either 1 or 2 is repeated three or more times, then the corresponding curvature
component is zero. If 1 and 2 appear precisely twice each, then the corresponding component
is ±x⁻². From (6.5.15) we also have
R_{1212} = x⁻⁴
because g_{22} = x⁻².
Example 6.5.5. Curvature of flat space in polar coordinates. In 2D polar coordinates, the
Γ's are
Γ^1_{22} = −r,  Γ^2_{12} = Γ^2_{21} = 1/r,    (6.5.16)
all others zero (r = x^1, θ = x^2). This time, (6.5.13) reduces to
R_{121}{}^2 = ∂_1 Γ^2_{21} + Γ^2_{12} Γ^2_{21} = −1/r² + 1/r² = 0.
This illustrates the fact that curvature is a tensor: we knew this result ahead of time, because
it is clear that the curvature of a flat metric is zero, and we've just changed coordinates.
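Both examples can be reproduced in a few lines of sympy, computing the Γ's from the metric and then R_{121}{}^2 from (6.5.2). A sketch (indices here are 0-based, so the notes' R_{121}{}^2 is `riemann(..., 0, 1, 0, 1)`):

```python
# Compute R_{121}^2 from (6.5.2) for (i) the hyperbolic metric
# (dx^2+dy^2)/x^2 and (ii) the flat metric in polar coordinates,
# recovering x^{-2} and 0 respectively.
import sympy as sp

def christoffel(g, xs):
    n = len(xs)
    ginv = g.inv()
    # Gamma^c_{ab} = (1/2) g^{cs} (d_a g_{bs} + d_b g_{as} - d_s g_{ab})
    return [[[sp.simplify(sum(ginv[c, s]*(sp.diff(g[b, s], xs[a])
                                          + sp.diff(g[a, s], xs[b])
                                          - sp.diff(g[a, b], xs[s]))
                              for s in range(n))/2)
              for b in range(n)] for a in range(n)] for c in range(n)]

def riemann(G, xs, a, b, c, d):
    n = len(xs)
    # R_{abc}^d = d_a G^d_bc - d_b G^d_ac + G^d_ap G^p_bc - G^d_bp G^p_ac
    return sp.simplify(sp.diff(G[d][b][c], xs[a]) - sp.diff(G[d][a][c], xs[b])
                       + sum(G[d][a][p]*G[p][b][c] - G[d][b][p]*G[p][a][c]
                             for p in range(n)))

x, y = sp.symbols('x y', positive=True)
g_hyp = sp.Matrix([[1/x**2, 0], [0, 1/x**2]])
G_h = christoffel(g_hyp, (x, y))
assert sp.simplify(riemann(G_h, (x, y), 0, 1, 0, 1) - 1/x**2) == 0

r, th = sp.symbols('r theta', positive=True)
g_pol = sp.Matrix([[1, 0], [0, r**2]])
G_p = christoffel(g_pol, (r, th))
assert riemann(G_p, (r, th), 0, 1, 0, 1) == 0   # flat space
```

The helper functions are generic, so the same sketch works for any metric given as a sympy matrix.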
Now recall the definition (6.1.2):
R(X, Y)Z = (∇_X ∇_Y − ∇_Y ∇_X − ∇_{[X,Y]})Z.    (6.5.17)
If X = ∂_a and Y = ∂_b, then [X, Y] = 0 and this reduces to
(∇_a ∇_b − ∇_b ∇_a)Z,    (6.5.18)
and comparing with the previous section, the RHS is R_{abc}{}^d Z^c in components. Thus the component version of the curvature tensor arises from this definition by taking X = ∂_a and Y = ∂_b.
We shall show that the operation
(X, Y, Z) ↦ (∇_X ∇_Y − ∇_Y ∇_X − ∇_{[X,Y]})Z    (6.5.19)
is C∞-linear in each argument, that is,
(∇_X ∇_Y − ∇_Y ∇_X − ∇_{[X,Y]})(fZ) = f (∇_X ∇_Y − ∇_Y ∇_X − ∇_{[X,Y]})Z    (6.5.20)
and
(∇_{fX} ∇_Y − ∇_Y ∇_{fX} − ∇_{[fX,Y]})Z = f (∇_X ∇_Y − ∇_Y ∇_X − ∇_{[X,Y]})Z    (6.5.21)
for any smooth function f. We then prove a technical lemma which explains why being C∞-linear implies that the operation (6.5.19) is then given by a tensor:
(X, Y, Z) ↦ R(X, Y)Z, or R_{abc}{}^d X^a Y^b Z^c in components.    (6.5.22)
6.5.3. Proof of (6.5.20). Using the Leibniz rule twice,
[∇_X ∇_Y − ∇_Y ∇_X](fZ) = f [∇_X ∇_Y − ∇_Y ∇_X]Z + (XY f − Y Xf) Z.
But
∇_{[X,Y]}(fZ) = ([X, Y]f) Z + f ∇_{[X,Y]} Z,
and so subtracting this from each side gives the C∞-linearity of Z ↦ R(X, Y)Z for fixed X and
Y.
Remark 6.5.8. Note that all we have used in this proof is that directional derivatives satisfy
the Leibniz rule.
6.5.4. Proof of (6.5.21). This is left as an exercise for the reader. It is somewhat shorter
than the proof of (6.5.20). You will need the formula
[X, fY]u = (Xf)(Y u) + f [X, Y]u,    (6.5.23)
which is also a good exercise.
The operation
(X, Y, Z) ↦ (∇_X ∇_Y − ∇_Y ∇_X − ∇_{[X,Y]})Z
is certainly a differential operator of order 2 in Z and order 1 in X and Y. The next lemma
looks complicated, but it just says that a differential operator of order 2 which is also C∞-linear
must actually be algebraic, i.e. given by multiplication by a tensor field.
Lemma 6.5.9. Let P be a second-order differential operator which maps vector fields to vector
fields. Suppose further that P is C∞-linear, that is,
P(fZ) = f P(Z)    (6.5.24)
for any smooth function f. Then in fact P is given by a (1, 1) tensor, in the sense that
P(Z)^c = P^c_a Z^a.    (6.5.25)
Proof. In local coordinates we may write
P(Z)^d = A^{bcd}_a ∂_b ∂_c Z^a + B^{cd}_a ∂_c Z^a + C^d_a Z^a,
where we may assume A^{bcd}_a = A^{cbd}_a. Then
P(fZ)^d = f P(Z)^d + A^{bcd}_a (∂_b f ∂_c Z^a + ∂_c f ∂_b Z^a)    (6.5.26)
  + A^{bcd}_a (∂_b ∂_c f) Z^a + B^{cd}_a ∂_c f Z^a.    (6.5.27)
Because P(fZ) = f P(Z), we obtain
A^{bcd}_a (∂_b ∂_c f Z^a + ∂_b f ∂_c Z^a + ∂_c f ∂_b Z^a) + B^{cd}_a ∂_c f Z^a = 0    (6.5.28)
for any f and Z. If we pick f to be a product x^p x^q of two coordinate functions, substitute into
this and then set x = 0, the only surviving term is
A^{bcd}_a (∂_b ∂_c (x^p x^q)) Z^a = 2 A^{pqd}_a Z^a,
and this is supposed to vanish. This being true for all vectors Z, we conclude that A^{pqd}_a vanishes
at x = 0. This point was arbitrary, so A^{pqd}_a vanishes everywhere, and (6.5.28) now reads
B^{cd}_a ∂_c f Z^a = 0    (6.5.29)
for all f and Z. We apply the same argument with f = x^p, and this gives B^{pd}_a = 0 at x = 0.
Again the point was arbitrary, so it follows that B vanishes everywhere. Hence P(fZ) = f P(Z)
implies that P(Z)^d = C^d_a Z^a, as required.
Applying this Lemma to our map (6.5.19), first as an operator on Z, with (X, Y) fixed, then
in X, with (Y, Z) fixed, we see that R(X, Y)Z is indeed given by R_{abc}{}^d X^a Y^b Z^c, where R_{abc}{}^d are
the components of a tensor of type (1, 3).
Remark 6.5.10. This lemma is a bit fiddly to prove, but the idea that a differential operator
which is also C∞-linear has to be given algebraically (by multiplication by a tensor) is a powerful
one in differential geometry, which saves a lot of computation in local coordinates.
6.6. Curvature at a point

Proposition 6.6.1. Let p be a point, and choose coordinates centred at p in which all first
derivatives of the metric vanish at the origin:
∂_a g_{bc} = 0 at x = 0.    (6.6.1)
Then at x = 0,
R_{abcd} = ½ (∂_a ∂_c g_{bd} + ∂_b ∂_d g_{ac} − ∂_a ∂_d g_{bc} − ∂_b ∂_c g_{ad}).    (6.6.2)

Proof. This follows because in such coordinates all first derivatives of g_{bc} and g^{bc} vanish
at x = 0, so the Γ's vanish at x = 0. Hence the quadratic terms in (6.5.2) vanish at p, giving
R_{abc}{}^d = ∂_a Γ^d_{bc} − ∂_b Γ^d_{ac} at x = 0,    (6.6.3)
and
∂_a Γ^d_{bc} = ½ g^{ds} (∂_a ∂_b g_{cs} + ∂_a ∂_c g_{bs} − ∂_a ∂_s g_{bc}) at x = 0,    (6.6.4)
so
g_{ds} ∂_a Γ^s_{bc} = ½ (∂_a ∂_b g_{cd} + ∂_a ∂_c g_{bd} − ∂_a ∂_d g_{bc}) at x = 0.    (6.6.5)
Formula (6.6.2) follows from this and (6.6.3).
(1 + t2 )dx2 .
(6.6.6)
1.
(6.6.7)
Remark 6.6.4. There is a converse statement which we shall not prove: if R = 0 in a small
neighbourhood of a point p, then there are local coordinates centred at p with respect to which
g_{ab} = η_{ab}.
6.6.1. Commutators on tensors of higher rank. Recall that the covariant derivative
on vectors has been extended to act on all tensors in a way compatible with the basic algebraic
operations on tensors. This implies that when the commutator ∇_a ∇_b − ∇_b ∇_a is applied to any
tensor, the result can also be expressed in terms of the algebraic operation of the curvature on
the tensor.
There are good ways to derive and remember these results and bad ways to do it. The worst
way is to work directly with the Γ's.
Proposition 6.6.5. If α is a covector, then
(∇_a ∇_b − ∇_b ∇_a) α_c = −R_{abc}{}^d α_d.    (6.6.8)

Proof. One proof is to use the fact that ∇_a preserves the metric, together with a symmetry of
the curvature tensor proved below. If α^b = g^{ab} α_a, then we have
(∇_a ∇_b − ∇_b ∇_a) α^d = R_{abc}{}^d α^c.    (6.6.9)
If we lower the index, we get
(∇_a ∇_b − ∇_b ∇_a) α_d = R_{abcd} α^c,    (6.6.10)
and by the antisymmetry of R_{abcd} in its last two indices,
(∇_a ∇_b − ∇_b ∇_a) α_d = −R_{abdc} α^c = −R_{abd}{}^c α_c.    (6.6.11)
A second proof runs as follows. Let X be any vector field. Then α_d X^d is a function, and we know
(∇_a ∇_b − ∇_b ∇_a)(α_d X^d) = (∂_a ∂_b − ∂_b ∂_a)(α_d X^d) = 0.    (6.6.12)
The terms in the first derivatives of α and X are symmetric in ab, so expanding by the Leibniz
rule and subtracting the corresponding expression with a and b switched, we obtain
0 = (∇_a ∇_b − ∇_b ∇_a)(α_d X^d) = X^d (∇_a ∇_b − ∇_b ∇_a) α_d + α_d (∇_a ∇_b − ∇_b ∇_a) X^d.    (6.6.14)
Hence
X^d (∇_a ∇_b − ∇_b ∇_a) α_d = −α_d R_{abc}{}^d X^c = −X^d R_{abd}{}^c α_c,    (6.6.15)
where in the last equation we have switched the two dummy indices c and d so as to have X^d
on each side. We now make the usual argument that as X is arbitrary, this equation implies the result.
In the same way, one obtains, for tensors of higher rank,
(∇_a ∇_b − ∇_b ∇_a) T^{cd} = R_{abs}{}^c T^{sd} + R_{abs}{}^d T^{cs},    (6.6.16)
(∇_a ∇_b − ∇_b ∇_a) A^d_c = −R_{abc}{}^s A^d_s + R_{abs}{}^d A^s_c,    (6.6.17)
(∇_a ∇_b − ∇_b ∇_a) S_{cd} = −R_{abc}{}^s S_{sd} − R_{abd}{}^s S_{cs}.    (6.6.18)
(6.6.18)
For a tensor T of type (r, s) the structure of the formula on the RHS will be a sum of r + s
all of the form RT ; there will be r terms with + signs, corresponding to the upstairs indices of
T and s terms with signs, corresponding to the lower indices of T .
6.6.2. Symmetries of the curvature tensor. The formula for the curvature at a point
in Proposition 6.6.1 allows us to write down the general symmetry properties of Rabcd .
Theorem 6.6.6. For any metric, the curvature tensor has the following symmetries:
R_{abcd} = −R_{bacd},  R_{abcd} = −R_{abdc};    (6.6.19)
R_{abcd} + R_{bcad} + R_{cabd} = 0;    (6.6.20)
R_{abcd} = R_{cdab}.    (6.6.21)
Proof. All follow by inspection of the formula in Proposition 6.6.1 and the fact that
∂_a ∂_b f = ∂_b ∂_a f for any smooth function f.
How many independent components does such a tensor have? By (6.6.19), R may be regarded
as a matrix R_{IJ}, where I and J run over the N = n(n − 1)/2 antisymmetric index pairs; by
(6.6.21) this matrix is symmetric, so it has
½ N(N + 1) = ⅛ n(n − 1)(n² − n + 2)    (6.6.23)
independent entries.
It turns out that (6.6.20) is only independent of the other two if all four indices are distinct.
So this imposes another C(n, 4) = n(n − 1)(n − 2)(n − 3)/24 conditions on the components. Hence
the number of independent components is
⅛ n(n − 1)(n² − n + 2) − (1/24) n(n − 1)(n − 2)(n − 3)
  = (1/24) n(n − 1)(3n² − 3n + 6 − (n − 2)(n − 3))
  = (1/12) n²(n² − 1).    (6.6.24)
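This count is easy to check numerically; a short Python sketch:

```python
# Check of the component count (6.6.24): with N = n(n-1)/2, the symmetries
# (6.6.19) and (6.6.21) leave N(N+1)/2 components, and the cyclic identity
# (6.6.20) removes a further C(n, 4), leaving n^2 (n^2 - 1)/12.
from math import comb

def riemann_components(n):
    N = n*(n - 1)//2
    return N*(N + 1)//2 - comb(n, 4)

# n = 2: 1 component; n = 3: 6; n = 4 (space-time): the famous 20
assert [riemann_components(n) for n in (2, 3, 4)] == [1, 6, 20]
for n in range(2, 10):
    assert riemann_components(n) == n**2 * (n**2 - 1) // 12
```

Note that this agrees with the count of unkillable components of P found in the sneak preview (5.6.14).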
Theorem 6.6.7 (Bianchi identity). The covariant derivatives of the curvature tensor satisfy
∇_a R_{bcd}{}^e + ∇_b R_{cad}{}^e + ∇_c R_{abd}{}^e = 0.    (6.6.25)
Proof. A very bad way to try to do this would be from the explicit formula for R in terms
of the Γ's. Instead we use the definition
(∇_b ∇_c − ∇_c ∇_b) α_d = −R_{bcd}{}^s α_s    (6.6.26)
and apply ∇_a:
∇_a (∇_b ∇_c − ∇_c ∇_b) α_d = −(∇_a R_{bcd}{}^s) α_s − R_{bcd}{}^s ∇_a α_s.    (6.6.27)
Now skew-symmetrise on abc; in other words, add this expression to what you get by cyclically
permuting the indices a, b and c. On the LHS, the cyclic sum of ∇_a (∇_b ∇_c − ∇_c ∇_b) α_d equals
the cyclic sum of the commutators (∇_a ∇_b − ∇_b ∇_a) applied to the (0, 2) tensor ∇_c α_d, which
can be expanded by (6.6.18). Using the symmetry (6.6.20) and relabelling the cyclic terms, the
∇α terms from this cancel exactly with those on the RHS of (6.6.27), leaving us with
0 = (∇_a R_{bcd}{}^s + ∇_b R_{cad}{}^s + ∇_c R_{abd}{}^s) α_s.    (6.6.31)
As α is arbitrary, the Bianchi identity follows.
6.6.3. Alternative proof of Bianchi identity. (Cf. Woodhouse, GR, §5.7.) If we choose
coordinates such that Γ = 0 at x = 0, we have
∇_a R_{bcd}{}^e = ∂_a R_{bcd}{}^e at x = 0,    (6.6.32)
so, from (6.5.2),
∇_a R_{bcd}{}^e = ∂_a ∂_b Γ^e_{cd} − ∂_a ∂_c Γ^e_{bd} at x = 0,    (6.6.33)
since the derivative of each quadratic ΓΓ term contains a factor Γ, which vanishes at x = 0.
Summing over the cyclic permutations of a, b, c, the terms on the RHS cancel out, showing that
the Bianchi identity holds at x = 0. But the LHS is a tensor equation and the point is arbitrary,
so we have obtained the Bianchi identity.
6.7. Ricci and scalar curvature

Definition 6.7.1. The Ricci curvature, which I shall denote by r, is defined by contracting the
Riemann tensor,
r_{bc} = R_{bac}{}^a.    (6.7.1)

Definition 6.7.2. The scalar curvature, which I shall denote by s, is defined by contracting
the Ricci,
s = g^{ab} r_{ab}.    (6.7.2)
The scalar curvature is, as the name implies, a scalar quantity.
Remark 6.7.3. Many texts denote the Ricci tensor by Rab and the scalar curvature by R
(and call it the Ricci scalar). You have been warned.
Theorem 6.7.4. The Riemann curvature, Ricci curvature and scalar curvature are related
by the following identities:
∇_a r_{bd} − ∇_b r_{ad} = −∇_e R_{abd}{}^e    (6.7.3)
and
∇^b r_{ab} = ½ ∇_a s.    (6.7.4)
Proof. Starting from the Bianchi identity (6.6.25), contract the indices c and e (and sum).
Taking into account the symmetries of R, we obtain (6.7.3). Similarly, from this equation
multiply by g^{bd} (and sum). This yields (6.7.4).
Remark 6.7.5. The two-dimensional hyperbolic metric has the property that its Ricci curvature is proportional to the metric. Indeed,
R_{121}{}^2 = 1/x², so r_{11} = 1/x²,
because R_{111}{}^1 = 0. Similarly
R_{212}{}^1 = 1/x², and so r_{22} = 1/x².
Also r_{12} = 0, so
r_{ab} = g_{ab}.    (6.7.5)
6.8. Relative acceleration and geodesic deviation

Consider again a 2-parameter family of curves H(τ, λ), but suppose now that each curve in
the family is a geodesic:
∇_X X = 0    (6.8.1)
for each fixed λ. To be definite, imagine that the curve τ ↦ H(τ, 0) is Alice's worldline, and we may
as well suppose that it is parameterized by proper time. Then her velocity 4-vector is
X = (∂H/∂τ)(τ, 0).
We suppose that Bob's worldline is the nearby geodesic τ ↦ H(τ, λ), where λ is very small. To
first order, then, Bob's worldline is
τ ↦ H(τ, 0) + λ (∂H/∂λ)(τ, 0).
The vector Y = ∂H/∂λ is called the connecting vector, as it connects events on Alice's worldline
to events on Bob's worldline.

Definition 6.8.2. The relative acceleration of the family of geodesics, as measured by Alice,
is the vector field ∇²_X Y along her worldline τ ↦ H(τ, 0), where X and Y are as above.
Theorem 6.8.3. We have
∇²_X Y = R(X, Y)X.    (6.8.2)

Proof. First we claim that
[X, Y] = 0.    (6.8.3)
From the proof of Lemma 6.2.2, this can be computed in local coordinates as
Y^a ∂_a X^b − X^a ∂_a Y^b.    (6.8.4)
Now
Y^a ∂_a X^b = ∂²H^b/∂λ∂τ    (6.8.5)
and
X^a ∂_a Y^b = ∂²H^b/∂τ∂λ.    (6.8.6)
Hence (6.8.3) follows from the symmetry of the mixed partials of H.
Next, X = ∂H/∂τ is the tangent vector field of a geodesic for each fixed value of λ. Then
∇_X X = 0    (6.8.7)
and so
∇_Y ∇_X X = 0.    (6.8.8)
(This would not be true if the neighbouring curves were not geodesics.) On the other hand, by
Definition 6.5.6,
(∇_X ∇_Y − ∇_Y ∇_X)Z = R(X, Y)Z    (6.8.9)
for any vector Z, since [X, Y] = 0. Putting Z = X,
∇_X ∇_Y X = R(X, Y)X.    (6.8.10)
Finally, ∇_Y X = ∇_X Y by (6.8.3), so the LHS is ∇_X ∇_X Y = ∇²_X Y, and the theorem follows.
6.9. Comparison with the newtonian theory

In newtonian gravity, the potential φ satisfies Poisson's equation
∇²φ = 4πρ    (6.9.1)
in units where the gravitational constant G is 1, where ρ is the mass density. The equation of
motion of a particle in the gravitational field due to the mass density is
ẍ = −∇φ.    (6.9.2)
This is an absolute statement. For comparison with relativity we need the corresponding
statement about relative acceleration. Thus we consider a set-up similar to that of the previous
section, where we have a 1-parameter family x(t, s), each of which satisfies (6.9.2) for fixed s:
(∂²/∂t²) x(t, s) = −∇φ.    (6.9.3)
The relative acceleration is obtained by differentiating with respect to s:
(∂²/∂t²) (∂/∂s) x(t, s) = −(∂/∂s) ∇φ.    (6.9.4)
Writing y = ∂x/∂s for the newtonian connecting vector, this is
∂²y^j/∂t² = −y^i ∂_i ∂^j φ.    (6.9.5)
This is the formula for relative acceleration in Newton's theory.
In order to match it up with the GR version, suppose that x^a are coordinates which are
inertial at a point x^a = 0 and that, as in the previous section, we have a 1-parameter family
H(τ, λ) of timelike geodesics. We assume that H(0, 0) = 0, so that at proper time τ = 0, Alice is
at the event x = 0. We may assume further that her 4-velocity vector at that event is standard,
(∂H^a/∂τ)(0, 0) = (1, 0).
With this choice, x^0 should be equated (approximately) with the newtonian time-variable t. If
we assume that Y(0, 0) is orthogonal to X, then
Y(0, 0) = (0, y).    (6.9.6)
This choice corresponds physically to Alice choosing to connect the event τ = 0 on her worldline
with the event on Bob's worldline that she judges also to happen at τ = 0. Then the geodesic
deviation equation (6.8.2),
∇²_X Y = R(X, Y)X, or (∇²_X Y)^d = R_{abc}{}^d X^a Y^b X^c,    (6.9.7)
translates as follows: the LHS should be (0, ÿ) at τ = λ = 0, while for the RHS
R_{abc}{}^d X^a X^c = R_{0b0}{}^d.    (6.9.8)
Since R_{000}{}^d = 0, the matrix R_{0c0}{}^d has the block form
R_{0c0}{}^d =
  [ 0        0      ]
  [ 0   R_{0i0}{}^j ],
where the 3 × 3 matrix R_{0i0}{}^j is symmetric in i and j (after lowering an index). Hence (6.9.7)
reduces to the 3-dimensional equation
ÿ^j = R_{0i0}{}^j y^i.    (6.9.9)
This leads to a suggestion for the GR analogue of Poisson's equation. Actually I'll discuss
this only in the case that there is no matter, i.e. ∇²φ = 0. This translates into
R_{0i0}{}^i = 0 (summation over i from 1 to 3),    (6.9.10)
that is, r_{00} = 0, by definition of the Ricci tensor. Hence we are led to Einstein's vacuum equations:
Hypothesis 6.9.2. In empty space, the space-time metric g is such that
r_{ab} = 0.    (6.9.11)
Remark 6.9.3. In going from r_{ac} X^a X^c = 0 for all timelike X to r_{ac} = 0 there is something
to prove. We can argue that if r_{ac} X^a X^c = 0 for all timelike X, then differentiation with respect
to X yields r_{ac} = 0.
Remark 6.9.4. Tidal forces. The relative acceleration (in either GR or newtonian gravity)
often goes under the heading of tidal forces: for example, if you have the misfortune to be freely
falling feet-first towards a black hole, then the attractive force on your feet will be stronger
than on your head, and this will translate into an eventually unbearable stretching effect. The
bulges of water on either side of the earth, due to the non-uniform gravitational field of the moon,
are the more classical example of tidal forces.
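For the newtonian point mass the tidal tensor ∂_i∂_j φ can be computed explicitly; the following sympy sketch (with the illustrative choice φ = −M/ρ, ρ = |x|) confirms the stretching/squeezing pattern just described:

```python
# Newtonian tidal tensor d_i d_j phi for the point-mass potential
# phi = -M/rho: stretching along the radial direction, compression
# transverse, and trace zero (Laplace's equation away from the source).
import sympy as sp

x, y, z, M = sp.symbols('x y z M', positive=True)
rho = sp.sqrt(x**2 + y**2 + z**2)
phi = -M/rho

tidal = sp.hessian(phi, (x, y, z))

# trace = Laplacian of phi = 0 away from the origin
assert sp.simplify(sum(tidal[i, i] for i in range(3))) == 0

# on the x-axis the tidal matrix is diag(-2M/x^3, M/x^3, M/x^3):
# since the relative acceleration is -d_i d_j phi y^i, a falling body
# is stretched radially and squeezed transversally
on_axis = tidal.subs({y: 0, z: 0}).applyfunc(sp.simplify)
diff = (on_axis - sp.diag(-2*M/x**3, M/x**3, M/x**3)).applyfunc(sp.simplify)
assert diff == sp.zeros(3)
```

The vanishing trace is the empty-space Poisson equation ∇²φ = 0, mirroring the vacuum condition r_{ab} = 0 introduced above.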
6.10. Weak field limit
We can get another angle on the interplay between full GR and newtonian gravity by
considering the so-called weak field limit. This is the study of a lorentzian metric g = η + h
on R⁴, where η is the Minkowski metric and h is a small, slowly varying perturbation. We
neglect terms quadratic and higher in h, and terms linear in ∂_0 h. In this section we shall compute the
curvature and the geodesics in this approximation and match them up with ẍ = −∇φ. All this
is a very good exercise in understanding the material presented in this chapter.
Lemma 6.10.1. If
g_{ab} = η_{ab} + h_{ab},
then to first order in h,
g^{ab} = η^{ab} − h^{ab},  Γ^c_{ab} = ½ η^{cs} (∂_a h_{bs} + ∂_b h_{as} − ∂_s h_{ab}).

Knowing the Γ's we can compute the slow geodesics:

Lemma 6.10.2. If x^a(τ) is a slow geodesic, we may assume τ = x^0, so that the velocity vector
is
ẋ^a = (1, ẋ),    (6.10.1)
and in this approximation the geodesic equation reduces to
ẍ^j = −½ ∂_j h_{00}.    (6.10.2)
A further computation gives the linearized curvature:
R_{abc}{}^d = ½ (∂_a ∂_c h^d_b + ∂_b ∂^d h_{ac} − ∂_b ∂_c h^d_a − ∂_a ∂^d h_{bc}).    (6.10.5)
In particular,
r_{ac} = ½ (∂_a ∂_c h^b_b + ∂_b ∂^b h_{ac} − ∂_b ∂_c h^b_a − ∂_a ∂^b h_{bc}).    (6.10.6)
If we consider the 00 component of this and neglect the ∂_0 h_{ab} terms, then
r_{00} = ½ ∂_b ∂^b h_{00} = −½ ∇² h_{00}.    (6.10.7)
This leads, once again, to the proposal that the newtonian empty-space postulate ∇²φ = 0
should translate into r_{ab} = 0, since we've already seen that h_{00}/2 should be identified (up to a
constant) with the newtonian potential.
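Equation (6.10.7) can be checked directly from the linearized Ricci formula (6.10.6). A sympy sketch for a static perturbation in which only h_{00} = 2φ(x, y, z) is non-zero:

```python
# Check of (6.10.7): for a static perturbation with only h_{00} = 2*phi
# non-zero, the linearized Ricci component r_{00} from (6.10.6) equals
# minus the Laplacian of phi.
import sympy as sp

t, x, y, z = sp.symbols('t x y z')
xs = (t, x, y, z)
eta = sp.diag(1, -1, -1, -1)
phi = sp.Function('phi')(x, y, z)   # static: no t-dependence

h = sp.zeros(4, 4)
h[0, 0] = 2*phi                     # h_{00} = 2*phi, all others zero

def d(i, expr):
    return sp.diff(expr, xs[i])

# raise an index with eta (eta is its own inverse): hup[b][a] = h^b_a
hup = [[sum(eta[b, s]*h[s, a] for s in range(4)) for a in range(4)]
       for b in range(4)]

a = c = 0
r00 = sp.Rational(1, 2)*(
    d(a, d(c, sum(hup[b][b] for b in range(4))))                          # d_a d_c h^b_b
    + sum(eta[b, s]*d(b, d(s, h[a, c])) for b in range(4) for s in range(4))  # d_b d^b h_ac
    - sum(d(b, d(c, hup[b][a])) for b in range(4))                        # d_b d_c h^b_a
    - sum(eta[b, s]*d(a, d(s, h[b, c])) for b in range(4) for s in range(4)))  # d_a d^b h_bc

laplacian = sum(sp.diff(phi, v, 2) for v in (x, y, z))
assert sp.simplify(r00 + laplacian) == 0   # r_{00} = -nabla^2 phi
```

With h_{00} = 2φ this is exactly r_{00} = −½∇²h_{00}, so the vacuum condition r_{00} = 0 reproduces Laplace's equation for φ.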
6.11. Physical differential equations

In SR, the electromagnetic field is described by a skew 2-tensor F^{ab}. In an inertial frame,
the components are
F^{ab} =
  [  0    E_1   E_2   E_3 ]
  [ −E_1   0   −B_3   B_2 ]
  [ −E_2   B_3   0   −B_1 ]
  [ −E_3  −B_2   B_1   0  ].    (6.11.1)
It can be verified that Maxwell's equations in vacuum are equivalent to the system
∂_a F^{ab} = 0,  ∂_a F_{bc} + ∂_b F_{ca} + ∂_c F_{ab} = 0.    (6.11.2)
There is a natural generalization of this to a general space-time (M, g). We assume that the
electromagnetic field is given by a skew tensor of type (2, 0), F, with components F^{ab}, on M,
satisfying the equations
∇_a F^{ab} = 0,  ∇_a F_{bc} + ∇_b F_{ca} + ∇_c F_{ab} = 0.    (6.11.3)
Similarly, the natural wave operator on a function u is now
□u = g^{ab} ∇_a ∇_b u = g^{ab} (∂_a ∂_b u − Γ^c_{ab} ∂_c u).    (6.11.4)
In Minkowski space, recall that Maxwell's equations imply that each of the components E_i,
B_i of the electric and magnetic fields satisfies the flat-space wave equation, □E_i = 0 = □B_i.
In general M, we have:
Proposition 6.11.1. If F_{ab} = −F_{ba} satisfies Maxwell's equations (6.11.3) in (M, g), then
F_{ab} satisfies a modified wave equation:
∇^a ∇_a F_{bc} = R_{bcas} F^{sa} − r_b{}^a F_{ca} − r_c{}^a F_{ab}.    (6.11.5)
Proof. See the Problem set.
CHAPTER 7

We introduce spherical polar coordinates (t, r, θ, φ), where
x = r sin θ cos φ,  y = r sin θ sin φ,  z = r cos θ,    (7.1.1)
and θ is the colatitude (i.e. latitude, but measured from the north pole rather than the
equator) and φ is longitude.
The Minkowski metric in these coordinates is
ds² = dt² − dr² − r²(dθ² + sin²θ dφ²).    (7.1.2)
Let us write
dω² = dθ² + sin²θ dφ²,    (7.1.3)
which is the round metric on the unit sphere x² + y² + z² = 1 in R³. This will save writing later.
A spherically symmetric, static² metric is obtained from this by introducing functions of r
as coefficients:
ds² = A(r) dt² − B(r) dr² − C(r) r² dω²,    (7.1.4)
where we require A > 0, B > 0 and C > 0 in the region of interest. The dependence of these
functions only on r encodes the spherical symmetry of the metric and also its time-invariance.
One can, of course, consider more general metric forms, but that is beyond the scope of this
course.
¹closest point to the sun
²i.e. with coefficients independent of t
7.2. Schwarzschild³
Proposition 7.2.1. By a change of r variable, ρ = f(r), (7.1.4) can be made to take the
form
ds² = Ã(ρ) dt² − B̃(ρ) dρ² − ρ² dω².    (7.2.2)
Proof. If we define ρ = √(C(r)) r, then we shall get the coefficient of dω² correct. Since
C > 0 this is certainly invertible for large enough r, so we define
Ã(ρ) = A(r(ρ)).
For the dr² term,
B(r) dr² = B(r) (dr/dρ)² dρ²,
so
B̃(ρ) = B(r(ρ)) (dr/dρ)².
This completes the proof.
We use this proposition, then rechristen ρ as r. So we may as well look at metrics in the
slightly simpler form
A(r) dt² − B(r) dr² − r² dω².    (7.2.3)
We saw in the previous chapter that in the weak field limit, the component g_{00} should be
matched with twice the newtonian potential computed by an observer with 4-vector (1, 0), up
to an additive constant. So the simplest possible guess for A(r) is 1 − 2m/r: the value 1 comes
from the required asymptotic form of the metric.
It turns out that there is a choice of B which then gives a metric which satisfies Einstein's
equations where the metric is defined:
Theorem 7.2.2. The Schwarzschild metric
$$ds^2 = \left(1 - \frac{2m}{r}\right)dt^2 - \left(1 - \frac{2m}{r}\right)^{-1}dr^2 - r^2 d\omega^2, \quad (r > 2m)$$
satisfies $r_{ab} = 0$.
We shall not prove this in full. You can see most of the details in Woodhouse. We shall, however, record the geodesic equations and the Christoffel symbols for this metric.
Proposition 7.2.3. The geodesic equations for the metric (7.2.3) are
$$\ddot{t} + \frac{A'}{A}\dot{t}\dot{r} = 0 \quad\text{or}\quad \frac{d}{du}(A\dot{t}) = 0,$$
$$\ddot{r} + \frac{A'}{2B}\dot{t}^2 + \frac{B'}{2B}\dot{r}^2 - \frac{r}{B}\dot{\theta}^2 - \frac{r}{B}\sin^2\theta\,\dot{\varphi}^2 = 0,$$
$$\ddot{\theta} + \frac{2}{r}\dot{r}\dot{\theta} - \sin\theta\cos\theta\,\dot{\varphi}^2 = 0,$$
$$\ddot{\varphi} + \frac{2}{r}\dot{r}\dot{\varphi} + 2\cot\theta\,\dot{\theta}\dot{\varphi} = 0 \quad\text{or}\quad \frac{d}{du}(r^2\sin^2\theta\,\dot{\varphi}) = 0. \eqno(7.2.4)$$
³Named in honour of Karl Schwarzschild, 1873–1916
Proof. The Lagrangian is
$$L = \frac{1}{2}\left[A\dot{t}^2 - B\dot{r}^2 - r^2(\dot{\theta}^2 + \sin^2\theta\,\dot{\varphi}^2)\right]. \eqno(7.2.5)$$
Hence
$$\frac{\partial L}{\partial \dot{t}} = A\dot{t}, \qquad \frac{\partial L}{\partial t} = 0,$$
$$\frac{\partial L}{\partial \dot{r}} = -B\dot{r}, \qquad \frac{\partial L}{\partial r} = \frac{1}{2}\left[A'\dot{t}^2 - B'\dot{r}^2 - 2r(\dot{\theta}^2 + \sin^2\theta\,\dot{\varphi}^2)\right],$$
$$\frac{\partial L}{\partial \dot{\theta}} = -r^2\dot{\theta}, \qquad \frac{\partial L}{\partial \theta} = -r^2\sin\theta\cos\theta\,\dot{\varphi}^2,$$
$$\frac{\partial L}{\partial \dot{\varphi}} = -r^2\sin^2\theta\,\dot{\varphi}, \qquad \frac{\partial L}{\partial \varphi} = 0.$$
Here dot is differentiation with respect to the parameter $u$, prime is differentiation with respect to $r$. Next,
$$\frac{d}{du}(A\dot{t}) = A\left(\ddot{t} + (A'/A)\dot{r}\dot{t}\right),$$
$$\frac{d}{du}(-B\dot{r}) = -B\left(\ddot{r} + (B'/B)\dot{r}^2\right),$$
$$\frac{d}{du}(-r^2\dot{\theta}) = -r^2\left(\ddot{\theta} + (2/r)\dot{r}\dot{\theta}\right),$$
$$\frac{d}{du}(-r^2\sin^2\theta\,\dot{\varphi}) = -r^2\sin^2\theta\left(\ddot{\varphi} + (2/r)\dot{r}\dot{\varphi} + 2\cot\theta\,\dot{\theta}\dot{\varphi}\right),$$
and combining these with the previous calculations we get the equations of the Proposition. $\square$
Proposition 7.2.4. The non-zero Christoffel symbols for the metric (7.2.3) are as follows:
$$\Gamma^0_{01} = \Gamma^0_{10} = \frac{A'}{2A}; \qquad \Gamma^1_{00} = \frac{A'}{2B}, \quad \Gamma^1_{11} = \frac{B'}{2B}, \quad \Gamma^1_{22} = -\frac{r}{B}, \quad \Gamma^1_{33} = -\frac{r\sin^2\theta}{B};$$
$$\Gamma^2_{12} = \Gamma^2_{21} = \frac{1}{r}, \quad \Gamma^2_{33} = -\sin\theta\cos\theta; \qquad \Gamma^3_{13} = \Gamma^3_{31} = \frac{1}{r}, \quad \Gamma^3_{23} = \Gamma^3_{32} = \cot\theta.$$

Proof. These are read off from the geodesic equations in the usual way. $\square$
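The closed forms in Proposition 7.2.4 can be spot-checked numerically. The following Python sketch (not part of the notes; geometric units $G = c = 1$, with the illustrative Schwarzschild choice $A = 1 - 2m/r$, $B = 1/A$, $m = 1$) differentiates the diagonal metric by central differences and applies the usual formula $\Gamma^d_{ab} = \tfrac{1}{2}g^{dc}(\partial_a g_{cb} + \partial_b g_{ca} - \partial_c g_{ab})$:

```python
import math

m = 1.0  # illustrative mass, units G = c = 1

def metric(x):
    # Diagonal entries of g for ds^2 = A dt^2 - B dr^2 - r^2 dw^2,
    # with the sample (Schwarzschild) choice A = 1 - 2m/r, B = 1/A.
    t, r, th, ph = x
    A = 1 - 2*m/r
    return [A, -1/A, -r**2, -(r*math.sin(th))**2]

def dg(a, x, h=1e-6):
    # Central-difference derivative of the metric with respect to x[a].
    xp, xm = list(x), list(x)
    xp[a] += h
    xm[a] -= h
    return [(p - q)/(2*h) for p, q in zip(metric(xp), metric(xm))]

def christoffel(d, a, b, x):
    # Gamma^d_ab = (1/2) g^{dd} (d_a g_{db} + d_b g_{da} - d_d g_{ab});
    # only the c = d term of g^{dc} survives because g is diagonal.
    term = 0.0
    if d == b:
        term += dg(a, x)[d]
    if d == a:
        term += dg(b, x)[d]
    if a == b:
        term -= dg(d, x)[a]
    return 0.5*term/metric(x)[d]

x = (0.0, 3.0, 1.0, 0.5)     # sample point: r = 3, theta = 1
A, Ap = 1 - 2*m/3.0, 2*m/9.0
print(christoffel(0, 0, 1, x) - Ap/(2*A))                      # ~ 0
print(christoffel(2, 3, 3, x) + math.sin(1.0)*math.cos(1.0))   # ~ 0
```

Any other entry of the table can be compared in the same way, e.g. $\Gamma^1_{22} = -r/B$.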
Recall that the curvature tensor is given in terms of the Christoffel symbols by
$$R_{abc}{}^d = \partial_a\Gamma^d_{bc} - \partial_b\Gamma^d_{ac} + \Gamma^d_{ap}\Gamma^p_{bc} - \Gamma^d_{bp}\Gamma^p_{ac} \eqno(7.2.6)$$
and
$$r_{ac} = R_{abc}{}^b = R_{a0c}{}^0 + R_{a1c}{}^1 + R_{a2c}{}^2 + R_{a3c}{}^3. \eqno(7.2.7)$$
To give a flavour of these calculations, let us compute $R_{a0c}{}^0$. We shall show:
$$R_{101}{}^0 = \frac{A''}{2A} - \frac{(A')^2}{4A^2} - \frac{A'B'}{4AB}.$$
We shall also compute $R_{121}{}^2$ and $R_{131}{}^3$, by putting $a = c = 1$ in (7.2.6). We have
$$R_{a0c}{}^0 = \partial_a\Gamma^0_{0c} - \partial_0\Gamma^0_{ac} + \Gamma^0_{as}\Gamma^s_{0c} - \Gamma^0_{0s}\Gamma^s_{ac}. \eqno(7.2.8)$$
Since $\Gamma^0_{01} = \Gamma^0_{10} = A'/2A$ are the only non-zero $\Gamma^0_{ab}$, and nothing depends on $t$,
$$R_{a0c}{}^0 = \partial_a\Gamma^0_{0c} + \Gamma^0_{as}\Gamma^s_{0c} - \frac{A'}{2A}\Gamma^1_{ac}. \eqno(7.2.9)$$
Taking $a = 1$,
$$R_{10c}{}^0 = \partial_1\Gamma^0_{0c} + \Gamma^0_{1s}\Gamma^s_{0c} - \frac{A'}{2A}\Gamma^1_{1c},$$
and the only non-zero $\Gamma^1_{1c}$ is $\Gamma^1_{11} = B'/2B$.
So
$$R_{101}{}^0 = \partial_1\Gamma^0_{01} + \Gamma^0_{1s}\Gamma^s_{01} - \frac{A'}{2A}\Gamma^1_{11}
= \left(\frac{A'}{2A}\right)' + \left(\frac{A'}{2A}\right)^2 - \frac{A'}{2A}\cdot\frac{B'}{2B}
= \frac{A''}{2A} - \frac{(A')^2}{4A^2} - \frac{A'B'}{4AB},$$
and $R_{10c}{}^0 = 0$ for $c \neq 1$.
We have
$$R_{121}{}^2 = \partial_1\Gamma^2_{21} - \partial_2\Gamma^2_{11} + \Gamma^2_{1s}\Gamma^s_{21} - \Gamma^2_{2s}\Gamma^s_{11}
= -\frac{1}{r^2} + \frac{1}{r^2} - \Gamma^2_{21}\Gamma^1_{11}
= -\frac{B'}{2rB}.$$
Similarly
$$R_{131}{}^3 = \partial_1\Gamma^3_{31} - \partial_3\Gamma^3_{11} + \Gamma^3_{1s}\Gamma^s_{31} - \Gamma^3_{3s}\Gamma^s_{11}
= -\frac{1}{r^2} + \frac{1}{r^2} - \Gamma^3_{31}\Gamma^1_{11}
= -\frac{B'}{2rB},$$
because $\Gamma^3_{13} = 1/r$ and $\Gamma^1_{11} = B'/2B$.
Hence
$$r_{11} = R_{101}{}^0 + R_{111}{}^1 + R_{121}{}^2 + R_{131}{}^3 = R_{101}{}^0 + R_{121}{}^2 + R_{131}{}^3
= \frac{A''}{2A} - \frac{(A')^2}{4A^2} - \frac{A'B'}{4AB} - \frac{B'}{rB}.$$
Proposition 7.2.5. The non-vanishing components of the Ricci tensor of the spherically symmetric metric (7.2.3) are
$$r_{00} = \frac{A''}{2B} - \frac{A'B'}{4B^2} - \frac{(A')^2}{4AB} + \frac{A'}{rB},$$
$$r_{11} = \frac{A''}{2A} - \frac{(A')^2}{4A^2} - \frac{A'B'}{4AB} - \frac{B'}{rB},$$
$$r_{22} = \frac{rA'}{2AB} - \frac{rB'}{2B^2} + \frac{1}{B} - 1,$$
$$r_{33} = \sin^2\theta\; r_{22}.$$
Now we can verify that the Schwarzschild metric of Theorem 7.2.2 does indeed satisfy the Einstein vacuum equations $r_{ab} = 0$.
Eliminating the $A''$ terms between $r_{00}$ and $r_{11}$ gives $AB' + BA' = 0$, so $AB$ is constant. And this should be 1 by the boundary condition.
Inserting $B = 1/A$, $B' = -A'/A^2$, we find
$$r_{00} = \frac{1}{2}AA'' + \frac{1}{r}AA', \qquad r_{11} = \frac{1}{A^2}\,r_{00},$$
$$r_{22} = rA' + A - 1, \qquad r_{33} = \sin^2\theta\,(rA' + A - 1).$$
Solving $r_{22} = 0$ gives $A = 1 - 2m/r$, where $m$ is a constant, and one checks that this also solves the $r_{00} = 0$ equation. Hence we arrive at the Schwarzschild metric
$$ds^2 = \left(1 - \frac{2m}{r}\right)dt^2 - \left(1 - \frac{2m}{r}\right)^{-1}dr^2 - r^2(d\theta^2 + \sin^2\theta\,d\varphi^2),$$
where $m > 0$ and, for the moment anyway, $r > 2m$.
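The algebra above is easy to spot-check. A short Python sketch (not part of the notes; units $G = c = 1$, illustrative $m = 1$) verifies that $A = 1 - 2m/r$ makes both reduced Ricci components vanish:

```python
# Numerical spot-check that A(r) = 1 - 2m/r, B = 1/A solves the reduced
# vacuum equations: r22 = r*A' + A - 1 = 0 and r00 = A*A''/2 + A*A'/r = 0.
m = 1.0
for r in (2.5, 5.0, 50.0):
    A, Ap, App = 1 - 2*m/r, 2*m/r**2, -4*m/r**3
    assert abs(r*Ap + A - 1) < 1e-12        # r22 vanishes
    assert abs(A*App/2 + A*Ap/r) < 1e-12    # r00 vanishes
print("vacuum equations satisfied")
```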
Woodhouse, GR, Sect. 7.1–7.2
7.3. Physical consequences
7.3.1. Gravitational time dilatation: heavy clocks run slowly. Suppose that Alice and Bob have positions $r = r_A$ and $r = r_B$ in the Schwarzschild space-time (angular positions also fixed), with both $r_A$ and $r_B > 2m$. How do they compare the rates at which their ideal clocks run?
Imagine two ticks of Alice's ideal clock, separated by a small proper time interval $\Delta\tau_A$. To compare, we assume that Alice's clock emits photons at each of the two ticks. Bob receives these photons and in particular can record the elapsed time between receiving the first and second photon. This gives him a time interval $\Delta\tau_B$, and the ratio $\Delta\tau_A/\Delta\tau_B$ is the amount by which Alice's clock appears to run slowly as compared with Bob's.
Let's do it. Suppose first that $\Delta t_A$ is the difference in $t$-coordinate between the two ticks of Alice's clock and suppose that $\Delta t_B$ is the difference in $t$-coordinates of when the two photons are received by Bob. (NB the Schwarzschild time-coordinate is NOT proper time for either Alice or Bob!) Then it is pretty clear that $\Delta t_A = \Delta t_B$ because the metric coefficients are all independent of $t$. We shall make this computation explicitly below, just to be sure. Thus we need to see how $\Delta t_A$ is related to $\Delta\tau_A$, and similarly for $\Delta t_B$ and $\Delta\tau_B$.
Now Alice's world line has the simple form $\tau_A \mapsto (U\tau_A, r_A, \theta_A, \varphi_A)$, where $r_A$, $\theta_A$ and $\varphi_A$ are constants, and $\tau_A$ is a proper time parameter if the associated velocity 4-vector
$$U\frac{\partial}{\partial t} \eqno(7.3.1)$$
is of unit length, i.e. $g(U\partial_t, U\partial_t) = 1$. This entails
$$(1 - 2m/r_A)U^2 = 1, \quad\text{so}\quad U = \frac{dt}{d\tau_A} = (1 - 2m/r_A)^{-1/2}. \eqno(7.3.2)$$
Hence
$$\Delta\tau_B = \frac{d\tau_B}{dt}\,\Delta t_B = \frac{d\tau_B}{dt}\,\Delta t_A = \frac{d\tau_B}{dt}\left(\frac{d\tau_A}{dt}\right)^{-1}\Delta\tau_A, \eqno(7.3.3)$$
so
$$\frac{\Delta\tau_B}{\Delta\tau_A} = \sqrt{\frac{1 - 2m/r_B}{1 - 2m/r_A}}. \eqno(7.3.4)$$
Thus if Alice is nearer to $r = 2m$ then this factor is greater than one, and so Bob will record a longer elapsed time between two ticks of Alice's clock, this becoming (in principle) infinite as $r_A$ approaches $2m$.⁴
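Formula (7.3.4) is simple to evaluate. A quick Python sketch (not part of the notes; geometric units $G = c = 1$, with an illustrative default $m = 1$):

```python
import math

def rate_ratio(r_a, r_b, m=1.0):
    """Delta tau_B / Delta tau_A from (7.3.4); both radii must exceed 2m."""
    return math.sqrt((1 - 2*m/r_b)/(1 - 2*m/r_a))

print(rate_ratio(3.0, 100.0))    # > 1: Bob (far away) records the longer interval
print(rate_ratio(2.01, 100.0))   # much larger: blows up as Alice approaches r = 2m
```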
Remark 7.3.1. Note that observers with constant $(r, \theta, \varphi)$ coordinates in Schwarzschild are not freely falling.
It is interesting to compute the trajectory of a photon sent by Alice at $(t_A, r_A, \theta_0, \varphi_0)$ to Bob at $(t_B, r_B, \theta_0, \varphi_0)$. This is a radial⁵ null geodesic for the Schwarzschild metric.
[Figure: radial null geodesics in the $(r, t)$ plane, with the lines $r = 2m$, $r = r_A$ and $r = r_B$ marked.]
Along such a geodesic,
$$\left(1 - \frac{2m}{r}\right)\dot{t}^2 - \left(1 - \frac{2m}{r}\right)^{-1}\dot{r}^2 = 0, \eqno(7.3.5)$$
so
$$\frac{dr}{dt} = \pm\left(1 - \frac{2m}{r}\right) \eqno(7.3.6)$$
(plus sign if the photon is travelling outwards as $t$ increases). Hence
$$dt = \pm\frac{r\,dr}{r - 2m} \eqno(7.3.7)$$
⁴We shall later identify the surface $r = 2m$ with the event horizon of a black hole.
⁵i.e. $\theta$ and $\varphi$ are constant along the geodesic
and so
$$t_B - t_A = r_B - r_A + 2m\log\frac{r_B - 2m}{r_A - 2m}. \eqno(7.3.8)$$
More generally, along any radial null geodesic,
$$t \mp \left(r + 2m\log(r - 2m)\right) = \text{const}. \eqno(7.3.9)$$
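The travel time (7.3.8) can be evaluated directly; note how it diverges as the emission radius approaches the horizon. A Python sketch (not part of the notes; units $G = c = 1$, illustrative $m = 1$):

```python
import math

def coord_time(r_a, r_b, m=1.0):
    """t_B - t_A for an outgoing radial photon, equation (7.3.8)."""
    return r_b - r_a + 2*m*math.log((r_b - 2*m)/(r_a - 2*m))

# The coordinate travel time grows without bound as r_a -> 2m:
for r_a in (3.0, 2.1, 2.001):
    print(coord_time(r_a, 10.0))
```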
The Lagrangian for geodesics in Schwarzschild is
$$L = \frac{1}{2}\left[(1 - 2m/r)\dot{t}^2 - (1 - 2m/r)^{-1}\dot{r}^2 - r^2(\dot{\theta}^2 + \sin^2\theta\,\dot{\varphi}^2)\right].$$
We take the parameter to be proper time and $\dot{}\,$ to mean differentiation with respect to it.
We have conserved quantities
$$E = (1 - 2m/r)\dot{t}, \qquad J = r^2\sin^2\theta\,\dot{\varphi}, \eqno(7.3.10)$$
and
$$L = 1/2 \text{ for timelike}, \qquad L = 0 \text{ for null geodesics}. \eqno(7.3.11)$$
Remark 7.3.2. The conserved quantity $E$ is the total energy of our particle, assumed to have unit rest-mass. (Not the gravitating one, the one that's orbiting.) If the rest-mass of the orbiting particle is $\mu$, we claim that
$$E = \mu(1 - 2m/r)\dot{t} \eqno(7.3.12)$$
is the total energy. Remember that total energy is a relative concept in relativity, so this statement needs careful interpretation. Suppose Alice is an observer sitting at constant $(r, \theta, \varphi)$ in the Schwarzschild space-time. She measures the energy of the orbiting particle as it passes her, i.e. when its spatial coordinates are $(r, \theta, \varphi)$. Let Alice's 4-velocity vector be $U$, and that of the orbiting particle $V$. Then
$$U = (U^a) = (1 - 2m/r)^{-1/2}(1, 0, 0, 0) \quad\text{and}\quad V = (V^a) = (\dot{t}, \dot{r}, \dot{\theta}, \dot{\varphi}),$$
where the dot denotes differentiation with respect to the particle's proper time parameter. The instantaneous speed $v$ of the particle as measured by Alice as it passes satisfies
$$\gamma(v) = g(U, V)$$
just as in special relativity. In this case,
$$g(U, V) = (1 - 2m/r)^{1/2}\,\dot{t}.$$
Hence
$$E = \mu(1 - 2m/r)\dot{t} = \mu(1 - 2m/r)^{1/2}g(U, V) = \mu(1 - 2m/r)^{1/2}(1 - v^2)^{-1/2} \simeq \mu\left(1 + \frac{v^2}{2} - \frac{m}{r}\right) \eqno(7.3.13)$$
if $m/r$ is small and so is $v$. Recall that $G = 1$, $c = 1$ here; if we restore units, then this becomes
$$E \simeq \mu c^2 + \frac{1}{2}\mu v^2 - \frac{Gm\mu}{r}. \eqno(7.3.14)$$
The terms here are the rest-energy of the particle, its kinetic energy and its gravitational potential energy. So this approximation is in perfect agreement with newtonian gravity and special relativity.
Proposition 7.3.3. Equatorial⁶ timelike geodesics in Schwarzschild are given by the equations
$$\left(\frac{dr}{d\tau}\right)^2 = E^2 - (1 - 2m/r) \quad\text{(radial geodesics)} \eqno(7.3.15)$$
and by
$$\frac{d^2u}{d\varphi^2} + u - 3mu^2 = \frac{m}{J^2} \quad\text{(non-radial geodesics)}, \eqno(7.3.16)$$
where $u = 1/r$ and the angular momentum $J = r^2\dot{\varphi} \neq 0$ is a constant. The equation (7.3.16) has the first integral
$$\left(\frac{du}{d\varphi}\right)^2 + u^2 - 2mu^3 = \frac{E^2 - 1}{J^2} + \frac{2m}{J^2}u. \eqno(7.3.17)$$
⁶i.e. with $\theta = \pi/2$
Similarly,

Proposition 7.3.4. Radial null geodesics in Schwarzschild are given by (7.3.5)–(7.3.9). Non-radial, equatorial null geodesics in Schwarzschild satisfy
$$\frac{d^2u}{d\varphi^2} + u - 3mu^2 = 0, \eqno(7.3.18)$$
which has the first integral
$$\left(\frac{du}{d\varphi}\right)^2 + u^2 - 2mu^3 = \frac{E^2}{J^2}. \eqno(7.3.19)$$
Proof. We have seen that for Schwarzschild geodesics with $\theta = \pi/2$ we have the conserved quantities
$$E = (1 - 2m/r)\dot{t}, \qquad J = r^2\dot{\varphi}, \eqno(7.3.20)$$
and the further conservation equation
$$(1 - 2m/r)\dot{t}^2 - (1 - 2m/r)^{-1}\dot{r}^2 - r^2\dot{\varphi}^2 = 2L, \eqno(7.3.21)$$
where $L = 1/2$ for timelike and $L = 0$ for null geodesics. In the radial case, $\dot{\varphi} = 0$ and for timelike geodesics, (7.3.21) gives
$$\left(\frac{dr}{d\tau}\right)^2 = E^2 - 1 + \frac{2m}{r}. \eqno(7.3.22)$$
In the non-radial case, write $u = 1/r$; then (7.3.21) becomes
$$(1 - 2mu)^{-1}E^2 - (1 - 2mu)^{-1}\dot{r}^2 - r^2\dot{\varphi}^2 = 2L, \eqno(7.3.23)$$
and dividing by $J^2 = r^4\dot{\varphi}^2$,
$$(1 - 2mu)^{-1}\frac{E^2}{J^2} - (1 - 2mu)^{-1}u^4\left(\frac{dr}{d\varphi}\right)^2 - u^2 = \frac{2L}{J^2}. \eqno(7.3.24)$$
Now
$$\frac{dr}{d\varphi} = -\frac{1}{u^2}\frac{du}{d\varphi}, \eqno(7.3.25)$$
so (7.3.24) becomes
$$(1 - 2mu)^{-1}\frac{E^2}{J^2} - (1 - 2mu)^{-1}\left(\frac{du}{d\varphi}\right)^2 - u^2 = \frac{2L}{J^2}. \eqno(7.3.26)$$
Multiplying by $1 - 2mu$ and rearranging,
$$\left(\frac{du}{d\varphi}\right)^2 + u^2 - 2mu^3 = \frac{E^2 - 2L}{J^2} + \frac{4mL}{J^2}u. \eqno(7.3.27)$$
The results in the Propositions follow from this by setting $L = 1/2$ for timelike and $L = 0$ for null. The second-order equation follows by differentiation with respect to $\varphi$, and cancelling $u'$. $\square$
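The consistency of (7.3.16) with its first integral (7.3.17) can be seen numerically: integrate the second-order equation and watch the combination on the left of (7.3.17) stay constant. A Python sketch (RK4; not part of the notes; units $G = c = 1$, illustrative values $m = 0.03$, $J = 0.5$):

```python
def orbit(u0, m, J, dphi=1e-3, steps=20000):
    """RK4 integration of u'' = m/J**2 - u + 3*m*u**2, equation (7.3.16),
    from u(0) = u0, u'(0) = 0."""
    def f(u, du):
        return du, m/J**2 - u + 3*m*u**2
    u, du, out = u0, 0.0, []
    for _ in range(steps):
        k1u, k1d = f(u, du)
        k2u, k2d = f(u + 0.5*dphi*k1u, du + 0.5*dphi*k1d)
        k3u, k3d = f(u + 0.5*dphi*k2u, du + 0.5*dphi*k2d)
        k4u, k4d = f(u + dphi*k3u, du + dphi*k3d)
        u  += dphi*(k1u + 2*k2u + 2*k3u + k4u)/6
        du += dphi*(k1d + 2*k2d + 2*k3d + k4d)/6
        out.append((u, du))
    return out

m, J = 0.03, 0.5
traj = orbit(0.1, m, J)
# The left side of (7.3.17) minus the term linear in u is conserved:
const = lambda u, du: du**2 + u**2 - 2*m*u**3 - (2*m/J**2)*u
c0 = const(*traj[0])
drift = max(abs(const(u, du) - c0) for u, du in traj)
print(drift)   # tiny: the first integral is conserved along the orbit
```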
Remark 7.3.5. For newtonian gravity, the equation for orbits (see homework problem 1.10) is
$$\frac{d^2u}{d\varphi^2} + u = \frac{m}{J^2} \eqno(7.3.28)$$
with first integral
$$\left(\frac{du}{d\varphi}\right)^2 + u^2 = \frac{A}{J^2} + \frac{2m}{J^2}u. \eqno(7.3.29)$$
Thus the GR correction to this equation is the $-2mu^3$ term on the LHS of (7.3.27).
Remember that $u = 1/r$, so large $r$ corresponds to small $u$, and the effect of the cubic correction term is stronger for small radii. Remember also that the Schwarzschild metric appears only to be OK for $r > 2m$, which corresponds to $0 < u < 1/2m$.
7.3.3. Circular timelike orbits and the precession of perihelion. You can find an extensive analysis of the timelike geodesics in Schwarzschild in Woodhouse's book, Chapter 8. We shall just look at circular orbits and small perturbations of them. This already leads to the precession of perihelion, which I may have mentioned was one of the first verifications of GR.
Consider a circular timelike orbit in Schwarzschild. For such an orbit, evidently $u_{\varphi\varphi} = 0$, $u_\varphi = 0$. Setting $u_{\varphi\varphi} = 0$ in (7.3.16) gives the equation
$$3mu^2 - u + \frac{m}{J^2} = 0, \eqno(7.3.30)$$
so solving the quadratic,
$$u = \frac{1 \pm \sqrt{1 - 12m^2/J^2}}{6m}. \eqno(7.3.31)$$
Thus we have circular orbits if $J^2 > 12m^2$. For $m$ small, the larger value of $u$ is approximately $1/3m$ and is just less than this value. The smaller value of $u$ is
$$u = u_0 = \frac{1 - \sqrt{1 - 12m^2/J^2}}{6m} \simeq \frac{m}{J^2} \eqno(7.3.32)$$
if $m/J^2$ is small, and this is the newtonian value of the radius of a circular orbit for given $m$ and $J$ (cf. (7.3.28)).
[Figure 2. Plot of $r = 1/(0.1 + 0.03\cos(0.95\varphi))$, showing the precession of the perihelion. The successive perihelia are shown and occur at $\varphi = 0,\ 2\pi/0.95,\ 4\pi/0.95,\ 6\pi/0.95, \ldots$. The red arc is the part of the orbit from $\varphi = 0$ to $\varphi = 2\pi/0.95$; the blue arc is the part from $\varphi = 2\pi/0.95$ to $4\pi/0.95$; the grey arc is the part from $\varphi = 4\pi/0.95$ to $6\pi/0.95$.]
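For the curve plotted in Figure 2, the perihelia (minima of $r$) occur each time $0.95\varphi$ increases by $2\pi$, so the perihelion advances by $2\pi/0.95 - 2\pi \approx 0.33$ radians per revolution. A quick Python check (illustrative numbers taken from the caption, not physical values):

```python
import math

# The plotted curve from the caption of Figure 2:
omega = 0.95
def r(phi):
    return 1.0/(0.1 + 0.03*math.cos(omega*phi))

# Perihelia (minima of r, maxima of u) occur when omega*phi is a multiple of 2*pi:
perihelia = [2*math.pi*k/omega for k in range(4)]
advance = perihelia[1] - perihelia[0] - 2*math.pi
print(advance)  # about 0.33 radians of precession per revolution
```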
7.3.4. Photon trajectories: gravitational bending of light. From Proposition 7.3.4, non-radial photon trajectories satisfy:
$$\frac{d^2u}{d\varphi^2} + u - 3mu^2 = 0. \eqno(7.3.34)$$
We note first that circular orbits exist if $u - 3mu^2 = 0$, i.e. $u = 1/3m$. The existence of circular photon orbits shows clearly that light is affected by gravity in Einstein's theory.
These orbits are unstable. Indeed, try $u = \frac{1}{3m} + v$. Then
$$\frac{d^2v}{d\varphi^2} = v + O(v^2)$$
and $v \sim e^{\pm\varphi}$, so these perturbed solutions tend to grow exponentially.
Let us consider instead the trajectory of a photon which comes in from infinity (in a straight line with respect to the asymptotic coordinate system) and passes near to our gravitating object.
Recall that in polar coordinates, straight lines not through the origin are given by equations of the form
$$r\cos(\varphi - \varphi_0) = C. \eqno(7.3.35)$$
Indeed, remembering $x = r\cos\varphi$, $y = r\sin\varphi$, (7.3.35) is equivalent to
$$x\cos\varphi_0 + y\sin\varphi_0 = C. \eqno(7.3.36)$$
Thus this straight line is inclined at an angle $\varphi_0$ to the $y$-axis. In terms of the reciprocal coordinate $u = 1/r$, (7.3.35) takes the form
$$u = \epsilon\cos(\varphi - \varphi_0), \eqno(7.3.37)$$
where $\epsilon = 1/C$. Taking $\varphi_0 = 0$, we look for a solution of (7.3.34) of the form
$$u = \epsilon\cos\varphi + v(\varphi), \eqno(7.3.38)$$
where $v(\varphi)$ is small and
$$v(-\pi/2) = 0, \eqno(7.3.39)$$
so that as $\varphi \to -\pi/2$, $u(\varphi) \to 0$ and $r \to \infty$. (This corresponds to being asymptotically parallel to the $y$-axis, with $y \ll -1$.)
[Figure 3. Bending of light by a star: plots of $u = \epsilon\cos\varphi + \epsilon^2(1 + \sin\varphi)^2$ for $\epsilon = 0.08, 0.06, 0.05$.]
This is now an exercise in differential equations. It is better to work with the first integral (7.3.19):
$$\left(\frac{du}{d\varphi}\right)^2 + u^2 - 2mu^3 = \frac{E^2}{J^2}.$$
We have to follow our noses and compute:
$$u' = -\epsilon\sin\varphi + v', \eqno(7.3.40)$$
so that, discarding terms quadratic in $v$ and of order $mv$,
$$(u')^2 + u^2 = \epsilon^2 - 2\epsilon(v'\sin\varphi - v\cos\varphi) + \cdots \eqno(7.3.41)$$
and
$$2mu^3 = 2m\epsilon^3\cos^3\varphi + \cdots. \eqno(7.3.42)$$
Hence, comparing with (7.3.19),
$$\epsilon^2 = E^2/J^2 \eqno(7.3.43)$$
and
$$v'\sin\varphi - v\cos\varphi = -m\epsilon^2\cos^3\varphi. \eqno(7.3.44)$$
We use the integrating factor method to solve this. The integrating factor is $1/\sin^2\varphi$, so
$$\frac{d}{d\varphi}\left(\frac{v}{\sin\varphi}\right) = -m\epsilon^2\frac{\cos^3\varphi}{\sin^2\varphi} = -m\epsilon^2\left(\frac{\cos\varphi}{\sin^2\varphi} - \cos\varphi\right). \eqno(7.3.45)$$
Hence
$$v(\varphi) = C\sin\varphi + m\epsilon^2(1 + \sin^2\varphi).$$
Applying the boundary condition $v(-\pi/2) = 0$, we get $C = 2m\epsilon^2$ and
$$v(\varphi) = m\epsilon^2(1 + \sin\varphi)^2.$$
We conclude that
$$u(\varphi) = \epsilon\cos\varphi + m\epsilon^2(1 + \sin\varphi)^2 \eqno(7.3.46)$$
is an approximate null geodesic in Schwarzschild for $\epsilon$ small.
We calculate the angle of deflection by looking for the values of $\varphi$ for which $u = 0$. We already know one of them: $\varphi = -\pi/2$. We expect the other one to be approximately $\varphi = \pi/2$ (since this would correspond to zero deflection). So try to solve $u(\pi/2 + \delta) = 0$, assuming $\delta$ is small. Substituting this into (7.3.46),
$$u(\pi/2 + \delta) = 0 \iff -\epsilon\sin\delta + m\epsilon^2(1 + \cos\delta)^2 = 0, \eqno(7.3.47)$$
and putting $\sin\delta \simeq \delta$, $\cos\delta \simeq 1$,
$$\delta \simeq 4m\epsilon. \eqno(7.3.48)$$
So the asymptotic direction of the light-ray is approximately $\pi/2 + 4m\epsilon$, showing that the photon has been deflected by an angle $4m\epsilon$ due to the gravitational pull of the star.
Again, putting in the units, the deflection is approximately $4mG/Dc^2$, where $D$ is the impact parameter (the smallest value of $r$ on the trajectory).
This was observed by Eddington during the 1919 total eclipse of the sun.
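The approximation $\delta \simeq 4m\epsilon$ can be tested against a direct numerical integration of (7.3.34). A Python sketch (RK4; not part of the notes; units $G = c = 1$, illustrative values $m = 1$, $\epsilon = 0.01$):

```python
import math

def deflection(m, eps, dphi=1e-4):
    """Integrate u'' = -u + 3*m*u**2, equation (7.3.34), by RK4 starting at
    phi = -pi/2 with u = 0, u' = eps (matching the straight line
    u = eps*cos(phi)), until u returns to 0.  The angle swept exceeds pi
    by the deflection angle."""
    def f(u, du):
        return du, -u + 3*m*u**2
    phi, u, du = -math.pi/2, 0.0, eps
    while u >= 0.0:
        k1u, k1d = f(u, du)
        k2u, k2d = f(u + 0.5*dphi*k1u, du + 0.5*dphi*k1d)
        k3u, k3d = f(u + 0.5*dphi*k2u, du + 0.5*dphi*k2d)
        k4u, k4d = f(u + dphi*k3u, du + dphi*k3d)
        u  += dphi*(k1u + 2*k2u + 2*k3u + k4u)/6
        du += dphi*(k1d + 2*k2d + 2*k3d + k4d)/6
        phi += dphi
    return phi - math.pi/2   # total swept angle minus pi

print(deflection(1.0, 0.01), 4*1.0*0.01)  # the two agree closely for small eps
```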
7.4. Extensions of Schwarzschild: introduction to black holes
We are going to consider the significance of the set $r = 2m$ in Schwarzschild. This is a 3-dimensional surface inside the 4-dimensional space-time. If we fix $t$ we have a two-dimensional sphere of radius $2m$, so the surface as a whole looks like a cylinder of some kind.
We do have to worry about $r = 2m$, because a particle in free fall along a radial timelike geodesic will reach this set in finite proper time. Indeed, recall that for such a geodesic,
$$\frac{dr}{d\tau} = -\sqrt{E^2 - 1 + \frac{2m}{r}} \eqno(7.4.1)$$
for some constant $E > 1$. If $\tau = 0$ when $r = R$, this gives
$$\tau(r) = \int_r^R \frac{ds}{\sqrt{E^2 - 1 + 2m/s}}. \eqno(7.4.2)$$
So the particle reaches $r = 2m$ at proper time $\tau(2m) < \infty$. This means that the problem of understanding the metric in the vicinity of $r = 2m$ is of real physical relevance.
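The integrand in (7.4.2) is bounded near $s = 2m$, so the integral is indeed finite. A quick quadrature in Python (not part of the notes; units $G = c = 1$, illustrative $m = 1$):

```python
import math

def infall_time(R, E, m=1.0, n=20000):
    """Proper time to fall from r = R to r = 2m, the integral (7.4.2),
    computed by the midpoint rule."""
    a, b = 2*m, R
    h = (b - a)/n
    return sum(h/math.sqrt(E**2 - 1 + 2*m/(a + (k + 0.5)*h)) for k in range(n))

print(infall_time(10.0, 1.2))  # finite: the horizon is reached in finite proper time
```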
7.4.1. Toy examples.

Example 7.4.1. The metric
$$ds_1^2 = \frac{dt^2}{2t} + 2t\,d\theta^2, \quad (0 < t < \infty) \eqno(7.4.3)$$
is singular at $t = 0$. However, if we define
$$dr = \frac{dt}{\sqrt{2t}}, \quad\text{so}\quad r = \sqrt{2t},$$
the metric becomes
$$ds_1^2 = dr^2 + r^2d\theta^2 = dx^2 + dy^2$$
($x = r\cos\theta$, $y = r\sin\theta$). The origin $(x, y) = (0, 0)$ corresponds to the singularity $t = 0$ of (7.4.3): despite appearances, the metric is perfectly good there.
Example 7.4.2. The metric (7.4.4), defined for $-\infty < u, v < \infty$, likewise becomes well defined after a change of coordinates, and the metric extends (as the flat metric) to the whole $xy$-plane.
[Figure: the coordinate change of Example 7.4.2, from the $(u, v)$ plane to the $(x, y)$ plane.]
Definition 7.4.3. In cases where a metric is ill-defined at a point, but after a change of
coordinates it becomes well defined, we say that we have a coordinate singularity.
A coordinate singularity is not a true geometric singularity: it just corresponds to looking
at the metric in a poor choice of coordinates.
We shall now see that the surface r = 2m is a coordinate singularity rather than a metric
singularity of Schwarzschild by exhibiting new coordinates in which the singularity disappears!
7.4.2. Eddington–Finkelstein coordinates. Define
$$r_* = r + 2m\log|r - 2m|, \eqno(7.4.5)$$
so
$$\frac{dr_*}{dr} = 1 + \frac{2m}{r - 2m} = \frac{r}{r - 2m} = \left(1 - \frac{2m}{r}\right)^{-1}. \eqno(7.4.6)$$
The graph of $r_*$ as a function of $r$ is shown in the diagram:
[Figure: graph of $r_*$ against $r$, for $r > 2m$, with vertical asymptote at $r = 2m$.]
Now define $v = t + r_*$.
Proposition 7.4.4. Changing variables from $(t, r)$ to $(v, r)$, Schwarzschild becomes
$$ds^2 = (1 - 2m/r)dv^2 - (dr\,dv + dv\,dr) - r^2d\omega^2. \eqno(7.4.7)$$

Proof. Since
$$dv = dt + dr_* = dt + (1 - 2m/r)^{-1}dr, \eqno(7.4.8)$$
we have
$$(1 - 2m/r)dt^2 = (1 - 2m/r)\left(dv - (1 - 2m/r)^{-1}dr\right)^2 \eqno(7.4.9)$$
$$= (1 - 2m/r)dv^2 - dr\,dv - dv\,dr + (1 - 2m/r)^{-1}dr^2, \eqno(7.4.10)$$
so
$$(1 - 2m/r)dt^2 - (1 - 2m/r)^{-1}dr^2 = (1 - 2m/r)dv^2 - dr\,dv - dv\,dr. \eqno(7.4.11)$$
$\square$

The metric
$$(1 - 2m/r)dv^2 - dr\,dv - dv\,dr - r^2d\omega^2$$
makes sense for every $r > 0$: in the $(v, r)$ block its coefficient matrix is
$$\begin{pmatrix} 1 - 2m/r & -1 \\ -1 & 0 \end{pmatrix},$$
which has determinant $-1$, so it is non-degenerate wherever $r > 0$.
Let us discuss what's happened here. To avoid later confusion, we shall rechristen the $r$ coordinate here $\rho$. Thus (suppressing $\theta$ and $\varphi$) we have two coordinate systems: the original $(t, r)$ and the new $(v, \rho)$ with
$$v = t + r + 2m\log|r - 2m|, \quad \rho = r. \eqno(7.4.12)$$
Then
$$\frac{\partial}{\partial t} = \frac{\partial v}{\partial t}\frac{\partial}{\partial v} + \frac{\partial \rho}{\partial t}\frac{\partial}{\partial \rho} = \frac{\partial}{\partial v} \eqno(7.4.13)$$
and
$$\frac{\partial}{\partial r} = \frac{\partial v}{\partial r}\frac{\partial}{\partial v} + \frac{\partial \rho}{\partial r}\frac{\partial}{\partial \rho} = (1 - 2m/r)^{-1}\frac{\partial}{\partial v} + \frac{\partial}{\partial \rho}. \eqno(7.4.14)$$
The following picture may help to visualize what's going on:
[Figure: the $(v, \rho)$ plane, with the lines $\rho = 0$ and $\rho = 2m$ marked.]
In the original coordinates, the radial null directions are spanned by
$$\frac{\partial}{\partial t} + (1 - 2m/r)\frac{\partial}{\partial r} \eqno(7.4.15)$$
and
$$\frac{\partial}{\partial t} - (1 - 2m/r)\frac{\partial}{\partial r}. \eqno(7.4.16)$$
Hence a pair of radial future-pointing null vectors in the new coordinates is:
$$2\partial_v + (1 - 2m/r)\partial_\rho, \quad -\partial_\rho. \eqno(7.4.17)$$
Remark 7.4.5. The radius r = 2m is called the Schwarzschild radius. For our sun this is
about 3 kilometres, well inside the sun itself. In particular the Schwarzschild metric is not valid
there, because matter is present in this region. The above discussion is only of significance if
matter is so highly concentrated that the region r = 2m is contained in a region of empty space.
The region r < 2m, to which we have now extended the Schwarzschild metric, is then called the
black hole region of the space-time.
7.4.3. What happens near the event horizon? Suppose Alice and Bob are near a black hole described by the Schwarzschild metric. Alice is sitting at $r = R$ and the unfortunate Bob⁸ falls through $r = 2m$. How can we analyze this?
If Bob is freely falling, radially, so $\dot{\theta} = \dot{\varphi} = 0$, the Lagrangian in Eddington–Finkelstein coordinates is
$$L = \frac{1}{2}\left((1 - 2m/r)\dot{v}^2 - 2\dot{v}\dot{r}\right). \eqno(7.4.18)$$
Here $L$ is independent of $v$ and so
$$\frac{\partial L}{\partial \dot{v}} = (1 - 2m/r)\dot{v} - \dot{r} \eqno(7.4.19)$$
is a constant, $F$, say. Also $L = 1/2$ along a timelike geodesic parameterized by proper time.
Suppose that $\tau = 0$ when $r = 2m$. For small $\tau$, these equations reduce to
$$\dot{r} = -F, \qquad \dot{v}\dot{r} = -1/2, \eqno(7.4.20)$$
so
$$\dot{v} = \frac{1}{2F}, \qquad r \simeq 2m - F\tau, \quad\text{for small } \tau. \eqno(7.4.21)$$
Thus Bob doesn't notice anything particularly strange in crossing the event horizon: though in reality, for a typical sized black hole, the tidal forces (the difference between the force felt on your head and your feet) get a bit strong well before you encounter the event horizon.
What does Alice see of Bob's descent? To answer this question, suppose she is sitting at $r = R > 2m$, and receives a photon from Bob at every tick of his clock. Then Alice's world-line is $\tau_A \mapsto (V\tau_A, R)$, her velocity 4-vector is $(V, 0)$ and so $\tau_A$ is proper time if this has length$^2$ equal to 1 with respect to our metric. This gives $V = (1 - 2m/R)^{-1/2}$ (cf. 7.3.1). So
$$v(\tau_A) = (1 - 2m/R)^{-1/2}\tau_A. \eqno(7.4.22)$$
A photon emitted by Bob's clock at proper time $\tau_B < 0$ and heading out to Alice will satisfy
$$(1 - 2m/r)\dot{v}^2 - 2\dot{r}\dot{v} = 0 \eqno(7.4.23)$$
with initial conditions
$$v(0) = \frac{\tau_B}{2F}, \qquad r(0) = 2m - F\tau_B. \eqno(7.4.24)$$
Dividing by $\dot{r}\dot{v}$, (7.4.23) gives
$$\frac{dv}{dr} = \frac{2r}{r - 2m} \eqno(7.4.25)$$
and hence, integrating from $r(0) = 2m + F|\tau_B|$ to $R$,
$$v(R) = 2\left[R - 2m + 2m\log\frac{R - 2m}{F|\tau_B|} + \left(F + \frac{1}{4F}\right)\tau_B\right], \eqno(7.4.26)$$
and by (7.4.22) the corresponding value of Alice's proper time is
$$\tau_A = 2\left(1 - \frac{2m}{R}\right)^{1/2}\left[R - 2m + 2m\log\frac{R - 2m}{F|\tau_B|} + \left(F + \frac{1}{4F}\right)\tau_B\right]. \eqno(7.4.27)$$
As $\tau_B \to 0^-$ the logarithm dominates, so
$$\tau_A \simeq -4m\left(1 - \frac{2m}{R}\right)^{1/2}\log|\tau_B| + \text{const}, \eqno(7.4.28)$$
i.e.
$$|\tau_B| \simeq \text{const}\times\exp\left(-\frac{\tau_A}{4m(1 - 2m/R)^{1/2}}\right). \eqno(7.4.29)$$
Thus Alice never sees Bob cross the event horizon: the ticks of his clock reach her at intervals which grow without bound.
⁸Perhaps Bob's a robot
Now define
$$v = t + r + 2m\log(r - 2m), \qquad w = t - r - 2m\log(r - 2m). \eqno(7.4.30)$$
(This is more symmetrical than changing to $(v, r)$ coordinates as we did in the previous section.)
A simple calculation shows
$$\frac{1}{2}(dv\,dw + dw\,dv) = dt^2 - (dr_*)^2 = dt^2 - (1 - 2m/r)^{-2}dr^2, \eqno(7.4.31)$$
so
$$ds^2 = (1 - 2m/r)\cdot\frac{1}{2}(dv\,dw + dw\,dv) - r^2d\omega^2, \quad -\infty < v, w < \infty, \eqno(7.4.32)$$
where
$$\frac{1}{2}(v - w) = r + 2m\log(r - 2m), \quad r > 2m. \eqno(7.4.33)$$
Note that as $r$ goes from $2m$ to $\infty$, so the RHS of (7.4.33) goes from $-\infty$ to $\infty$, and so given any value of $(v - w)/2$, there's a unique value $r > 2m$ which solves (7.4.33). This metric degenerates at $r = 2m$, which corresponds to $v - w \to -\infty$. There is a remarkable trick (analogous to what happened in Example 7.4.2 above) which allows us to extend this metric through $r = 2m$.
We set
$$v = 4m\log v', \qquad w = -4m\log(-w'),$$
or
$$v' = \exp(v/4m), \qquad w' = -\exp(-w/4m).$$
This maps the whole $(v, w)$ plane to the quadrant $\{v' > 0,\ w' < 0\}$ in the $(v', w')$ plane (compare Example 7.4.2). Note that for $r$ just a bit bigger than $2m$, $v$ has some finite value, $w \to +\infty$, so $v'$ has some positive value and $w'$ is just less than zero.
Then
$$dv\,dw = -16m^2\frac{dv'\,dw'}{v'w'} \eqno(7.4.34)$$
and
$$-v'w' = \exp\left(\frac{v - w}{4m}\right) \eqno(7.4.35)$$
$$= \exp\left(\frac{r}{2m} + \log(r - 2m)\right) = e^{r/2m}(r - 2m). \eqno(7.4.36)$$
Hence
$$ds^2 = \frac{16m^2}{r}e^{-r/2m}\,dv'\,dw' - r^2d\omega^2, \quad\text{where}\quad e^{r/2m}(r - 2m) = -v'w'. \eqno(7.4.37)$$
Remark 7.4.7. This extension was obtained in 1960 by Kruskal. The crucial point is that $g$ is a well-defined lorentzian metric wherever $r > 0$. From (7.4.37),
$$r > 0 \iff e^{r/2m}(r - 2m) > -2m \iff -v'w' > -2m \iff v'w' < 2m.$$
Thus the metric is defined in the set $v'w' < 2m$, but the coordinates $(v', w')$ can have either sign.
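On the extended manifold, $r$ is defined implicitly by (7.4.37): $e^{r/2m}(r - 2m) = -v'w'$. The left-hand side is strictly increasing in $r$, so the relation can be inverted numerically, e.g. by bisection. A Python sketch (not part of the notes; units $G = c = 1$, illustrative $m = 1$):

```python
import math

def radius(vp, wp, m=1.0):
    """Solve exp(r/2m)*(r - 2m) = -vp*wp for r > 0 by bisection.
    The left side is increasing in r; the Kruskal domain is vp*wp < 2m."""
    f = lambda r: math.exp(r/(2*m))*(r - 2*m) + vp*wp
    lo, hi = 1e-12, 2*m
    while f(hi) < 0:          # expand the bracket outward if necessary
        hi *= 2
    for _ in range(200):
        mid = 0.5*(lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5*(lo + hi)

print(radius(1.0, -1.0))   # > 2m: a point with v'w' < 0 (outside the horizon)
print(radius(1.0, 0.5))    # < 2m: a point with 0 < v'w' < 2m (inside)
```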
Remark 7.4.8. If you prefer something looking more obviously lorentzian, define
$$t' = v' + w', \quad x' = v' - w', \quad\text{so}\quad (dt')^2 - (dx')^2 = 4\,dv'\,dw',$$
and
$$g = \frac{4m^2}{r}e^{-r/2m}\left[(dt')^2 - (dx')^2\right] - r^2d\omega^2.$$
[Figure: the Kruskal diagram in the $(x', t')$ plane. The singularity $r = 0$ appears as two hyperbolae (top and bottom); the lines $r = 2m$ lie at $45°$; the regions I, I′, II, II′ are the four quadrants they bound; a typical hyperbola $r = \mathrm{const} > 2m$ lies in region I.]
Note:
Radial null geodesics are given by
$$v' = \sigma, \quad w', \theta, \varphi \text{ constant},$$
and
$$w' = \sigma, \quad v', \theta, \varphi \text{ constant}.$$
We emphasise that the $(v', w')$ axes are at $45°$ in this diagram. Also, the hyperbola at the top of the picture is the true singularity of the metric and represents the black hole itself. The ultimate fate of every particle or photon in region II (inside the event horizon) is to end up terminated by this singularity.
The singularity is the set $v'w' = 2m$ and is shielded from view by the event horizon. GR and all known laws of physics break down at the singularity.