Michael Singer
E-mail address: michael.singer@ucl.ac.uk
Department of Mathematics, University College, London WC1E 6BT
Contents
Chapter 1. Introductory mathematical material: some geometry
1.1. Vector spaces and affine spaces
1.2. Quadratic forms and bilinear forms
1.3. Curves, tangent vectors and so on
1.4. Calculus of variations
6.5. Curvature
6.6. Curvature at a point
6.7. Ricci and scalar curvature
6.8. Relative acceleration and geodesic deviation
6.9. Comparison with the newtonian theory
6.10. Weak field limit
6.11. Physical differential equations
CHAPTER 1

Introductory mathematical material: some geometry

λ(x, y, z) = (λx, λy, λz)
Example 1.1.3. More generally, the space of lists of n real numbers (x1 , . . . , xn ) is a real
vector space denoted by Rn .
1.1.2. Bases and matrices. We recall that a (finite) basis for a vector space V is a set
of elements (v1 , . . . , vn ) such that every element v in V has a unique representation
v = λ1 v1 + . . . + λn vn ,   (λj ∈ R).   (1.1.2)

In the expansion (1.1.2), the numbers λj are called the coefficients (or sometimes coordinates)
of v with respect to the basis (v1 , . . . , vn ). If V does have a finite basis consisting of n elements,
then any other basis of V will also consist of n elements. This number n is defined to be the
dimension of V .
If V does not have a finite basis, then it is said to be infinite-dimensional.
Example 1.1.4. The set Rn of n-tuples (x1 , . . . , xn ) (see above) has dimension n. This has
a standard basis
e1 = (1, 0, . . . , 0), e2 = (0, 1, 0 . . . , 0), . . . , en = (0, . . . , 0, 1).
Example 1.1.5. The set of differentiable functions on R is an infinite-dimensional vector
space.
Theorem 1.1.6. If V is a real vector space of dimension n, then V is isomorphic to Rn .
We won't need the proof. But it's important to understand that an isomorphism of V with
Rn is exactly the same thing as a choice of basis of V . For if T : Rn → V is an isomorphism,
we define
vj = T (ej ),
where the ej form the standard basis of Rn , and then you check that the vj form a basis of V .
Conversely, given a basis vj of V , for every v ∈ V we have its n-tuple of coefficients λj as in
(1.1.2), and the map from v to its coefficients is an isomorphism from V to Rn .
In particular, without more structure there is no natural or unique isomorphism between our
n-dimensional vector space and Rn . Returning to our original motivation (the observation that
our world appears to be very well described by a space with coordinates (x, y, z), though we don't
want to specify particular coordinates), we see that these ideas are quite well captured by saying
that our world is well described by a 3-dimensional real vector space.
Remark 1.1.7. When we are using vectors to describe physical space we often call their
components in a basis coordinates instead.
1.1.3. Symmetries of a vector space. Another way of thinking about choices of basis
in a vector space is in terms of the symmetry of the space.
Definition 1.1.8. Let V be a finite-dimensional real vector space. The set of all invertible
linear maps T : V → V (i.e. linear isomorphisms of V with itself) is a group, denoted GL(V ).
If V = Rn , we also write GLn (R) or GL(n, R) for GL(V ).
"GL" here stands for "general linear". Recall "group" just means that there is an associative
multiplication with inverses; in this case it is composition of linear maps. The group GLn (R)
is the same as the group of n × n invertible matrices M . Such an M defines a map from Rn to
itself by matrix multiplication, where we write the typical element x of Rn as a column vector
with coefficients (x1 , . . . , xn ), so

x ↦ M x.
1.1.4. Affine spaces. A linear map T between vector spaces automatically takes the zero
vector to the zero vector. In particular, thinking about R3 , the translation
(x, y, z) ↦ (x + a, y + b, z + c)
for some fixed vector (a, b, c) is not a linear map. However, from the physical point of view, we
would certainly want to be able to consider such transformations as part of our story.
There is a formal abstract definition of affine space, but we shall not give it. Instead, we
work with our vector space V and just enlarge the symmetries.
Definition 1.1.9. Let V be a finite-dimensional real vector space. The affine group A(V )
is the group of all transformations of the form
x ↦ T x + b

where T ∈ GL(V ) (i.e. is an invertible linear transformation) and b ∈ V is a vector.
Note in particular that, by taking T = the identity I, we get the translations as part of A(V ).
On the other hand, it contains GL(V ) as the subgroup of elements with b = 0.
We shall somewhat loosely talk about "the affine space V " to mean the space V , but where
we are allowing the whole of A(V ) to act as symmetries.
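The group law in A(V ) can be made explicit: composing x ↦ T1 x + b1 with x ↦ T2 x + b2 gives x ↦ (T1 T2 )x + (T1 b2 + b1 ). A small sketch in plain Python for V = R2 (the helper names and the example maps are just for illustration):

```python
# Sketch: the affine group A(V) for V = R^2, with elements stored as pairs (T, b).
# Composition (T1, b1) o (T2, b2) acts as x |-> T1(T2 x + b2) + b1.

def mat_vec(T, x):
    """Multiply a 2x2 matrix (given as a list of rows) by a vector."""
    return [T[0][0]*x[0] + T[0][1]*x[1],
            T[1][0]*x[0] + T[1][1]*x[1]]

def apply_affine(Tb, x):
    T, b = Tb
    y = mat_vec(T, x)
    return [y[0] + b[0], y[1] + b[1]]

def compose(Tb1, Tb2):
    """Group law of A(V): (T1, b1) o (T2, b2) = (T1 T2, T1 b2 + b1)."""
    (T1, b1), (T2, b2) = Tb1, Tb2
    T = [[sum(T1[i][k]*T2[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    b = [mat_vec(T1, b2)[i] + b1[i] for i in range(2)]
    return (T, b)

# A translation (T = identity) and a linear scaling (b = 0):
translate = ([[1, 0], [0, 1]], [3, -1])
scale     = ([[2, 0], [0, 2]], [0, 0])

x = [1, 1]
# Applying one map after the other agrees with the composed transformation:
assert apply_affine(compose(translate, scale), x) == apply_affine(translate, apply_affine(scale, x))
```

In particular, taking T = I recovers the translations, and b = 0 recovers GL(V ), exactly as in the definition above.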
The following example may entertain the mathematicians.
Example 1.1.10. Let T be a linear mapping from a vector space V to another vector space
W . Let w ∈ W be any given vector. Let P be the solution set of the equation T x = w, i.e.

P = {x ∈ V : T x = w}.

Suppose that P is not empty. Then P is a natural example of an affine space if w ≠ 0 and is a
vector space if w = 0.
To picture what is going on here, suppose that V = R3 and W = R. Then we picture P
as a two-dimensional plane inside of R3 (usually) which goes through the origin if w = 0 and
does not (necessarily) otherwise. When w = 0, P is a linear (or vector) subspace of R3 , hence a
vector space K in its own right. If w ≠ 0, P is identifiable with K; geometrically, P is a plane
parallel to K, but not going through 0. We can identify K with P by picking any element p of
P and mapping k ∈ K to k + p ∈ P . There is no preferred choice of p and hence no given origin
in P .
A further interesting fact is that the group of linear transformations of R3 which map P
into itself is identifiable with A(P ).
If we think of V as an affine space, then given any two points p and q, we have the displacement
vector →pq of q relative to p, given in terms of the linear structure of V by

→pq = q − p.   (1.1.3)
Example 1.2.1. In one variable, the only possibility is ax2 , where a is real. In two variables
(x, y), we have

ax2 + 2gxy + by 2 = xt M x,

where x = (x, y)t and M is the symmetric matrix with rows (a, g) and (g, b). In three variables,
similarly,

ax2 + by 2 + cz 2 + 2gxy + 2f xz + 2hyz = xt M x,

with x = (x, y, z)t and M the symmetric matrix with rows (a, g, f ), (g, b, h) and (f, h, c).

In general, a quadratic form f is homogeneous of degree 2:

f (λv) = λ2 f (v) for all λ ∈ R and v ∈ V .
Remark 1.2.4. This explains what homogeneous of degree 2 means above. Homogeneity
is also important later on in this chapter, when we look at Lagrangians and the calculus of
variations (cf. Proposition 1.4.4).
There are two reasons for the introduction of quadratic forms and bilinear forms. First of
all, quadratic forms on vector spaces provide the additional structure needed to define distance.
The other reason for bilinear forms is that they, and their multilinear cousins, are essential in the
development of multivariable calculus (see Chapter 4).
The basic example is x2 + y 2 + z 2 in R3 . By Pythagoras's theorem, this is the square of the
distance of the point (x, y, z) from (0, 0, 0) if we are thinking of a standard system of mutually
perpendicular axes.
In special relativity (see Chapter 2) the physics is captured by a quadratic form in 4 variables,
c2 t2 − x2 − y 2 − z 2 . The significance of this will be that

c2 t2 − x2 − y 2 − z 2 = 0

if and only if a photon (particle of light) emitted at the origin at t = 0 (and travelling in a straight
line) can pass through the point (x, y, z) at time t.
Thus one should think generally of quadratic forms as defining squares of distances, or
squares of lengths of vectors on a vector space. This needs to be taken with a pinch of salt,
though, since in the above 4-dimensional example, the square of the distance can be 0 or even
negative! We shall explore this in detail in the chapter on special relativity.
We define these homogeneous quadratic polynomials in terms of bilinear forms.
Definition 1.2.5. A bilinear form B on a vector space V is a map

B : V × V → R

with the property that for each fixed v, the maps

w ↦ B(v, w) and w ↦ B(w, v) are linear in w.

To spell this out,

B(v, λu + µw) = λB(v, u) + µB(v, w),

and similarly

B(λu + µw, v) = λB(u, v) + µB(w, v).
Definition 1.2.6. A bilinear form B is said to be symmetric if B(v, w) = B(w, v) for all
v, w. A bilinear form is said to be skew-symmetric (or just skew) if B(v, w) = −B(w, v).
Definition 1.2.7. If B is a symmetric bilinear form, then the associated quadratic form is
Q(v) = B(v, v).
A bilinear form is called non-degenerate if for any v ≠ 0, there exists w ∈ V such that
B(v, w) ≠ 0. This is equivalent to the corresponding condition with the roles reversed: B is also
non-degenerate if for any v ≠ 0, there exists w such that

B(w, v) ≠ 0.
We shall see later that any bilinear form (on a finite-dimensional vector space) can be
represented by a square matrix B̂. Then B is symmetric if and only if the matrix B̂
is symmetric (B̂ = B̂ t ); B is skew-symmetric if and only if the matrix B̂ is skew (B̂ = −B̂ t ).

Definition 1.2.8. A Q-isometry is a linear transformation T : V → V such that

Q(T v) = Q(v)   (1.2.1)

for all vectors v in V . The set of all Q-isometries forms a group denoted O(V, Q).

Remark 1.2.9. Here the "O" stands for "orthogonal".
Example 1.2.10. If V = R2 and Q(x, y) = x2 + y 2 , then Q gives the ordinary euclidean
length-squared of the vector from (0, 0) to (x, y). You can verify that the linear transformation

(x, y) ↦ (cx + sy, −sx + cy)

is a Q-isometry whenever c2 + s2 = 1.

In terms of a basis, a bilinear form B is computed from its representing matrix B̂ by

B(x, y) = xt B̂y,

where xt is the transpose of x, i.e. the row vector with coefficients (x1 , . . . , xn ).
The isometry condition in terms of Q is equivalent to its polarized version

B(T v, T v′ ) = B(v, v′ ) for all v, v′ ∈ V.   (1.2.2)

In matrix terms, this is

T t B̂T = B̂.   (1.2.3)

Taking determinants and using det(T t ) = det(T ), we get

det(T )2 det(B̂) = det(B̂),   (1.2.4)

and so det(T ) = ±1, since B̂ is non-degenerate and so det B̂ ≠ 0.
Definition 1.2.11. The set of Q-orthogonal T 's with det(T ) = 1 forms a subgroup denoted
SO(V, Q), read "the special orthogonal group".
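For the euclidean form Q(x, y) = x2 + y 2 on R2 the matrix B̂ is the identity, so the isometry condition T t B̂T = B̂ reduces to T t T = I. A quick numerical sanity check in plain Python (the value of θ is arbitrary) that the rotation of Example 1.2.10, with c = cos θ and s = sin θ, lies in SO(2):

```python
import math

def transpose(M):
    return [[M[j][i] for j in range(2)] for i in range(2)]

def mat_mul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(M):
    return M[0][0]*M[1][1] - M[0][1]*M[1][0]

theta = 0.7
c, s = math.cos(theta), math.sin(theta)
T = [[c, s], [-s, c]]          # the rotation (x, y) |-> (cx + sy, -sx + cy)

# Isometry condition T^t B T = B with B = identity:
TtT = mat_mul(transpose(T), T)
assert all(abs(TtT[i][j] - (1 if i == j else 0)) < 1e-12
           for i in range(2) for j in range(2))

# det(T) = +1, so T lies in SO(2), not merely O(2):
assert abs(det(T) - 1) < 1e-12
```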
Example 1.2.12. Suppose that B is a symmetric bilinear form on R2 such that

B(e1 , e1 ) = 0, B(e2 , e2 ) = 0.

Is this enough to determine B uniquely?

Solution 1.2.13. The answer is no: we need to know B(e1 , e2 ) as well to determine B. If
B(e1 , e2 ) = λ, however, then by symmetry also B(e2 , e1 ) = λ. So the matrix representation of
this symmetric bilinear form will be

B̂ = the matrix with rows (0, λ) and (λ, 0).   (1.2.5)
For a non-degenerate form, a suitable choice of basis puts the matrix into the standard diagonal
form

Q̂ = diag(+1, . . . , +1, −1, . . . , −1),   (1.2.6)

with r entries equal to +1 and s entries equal to −1.
Remark 1.2.14. In the pure mathematical literature, the difference r − s of the number of
+1s and the number of −1s is often called the signature. On the other hand it is also common
to refer to a quadratic form as having signature +, +, +, + or +, −, −, − rather than 4 or 2.
There is no real risk of confusion with these slight variations.
The corresponding orthogonal group is denoted by Or,s or O(r, s) and the corresponding
subgroup of elements with determinant equal to +1 is denoted SOr,s or SO(r, s).
One can make the case that the most important cases are s = 0, when the groups are also
just denoted O(n) and SO(n) (for ordinary classical n-dimensional euclidean geometry) and
s = 1, r = 3, for the study of special relativity. In this case O(1, 3) is called the Lorentz group.
The following is a basic fact about non-degenerate quadratic forms:
Theorem 1.2.15. Let V be a finite-dimensional vector space and let Q be a non-degenerate
quadratic form on V . Then there exists a basis of V so that the matrix Q̂ of Q in this basis
takes the standard form (1.2.6) for some particular r and s (which depend on Q).
Proof. (Sketch, useful to be aware of it, but not examinable.)
Pick any v at random with Q(v) 6= 0. [If Q(v) = 0 for all v, then by polarization (see
problem set) B(v, w) = 0 for all v and w, contradicting the non-degeneracy of Q. So such v
does exist.]
Replacing v by e1 = v/√|Q(v)|, we get

Q(e1 ) = Q(v)/|Q(v)| = ±1.
Extending e1 by a basis of the orthogonal complement V ′ = {w ∈ V : B(e1 , w) = 0}, the matrix
of Q takes the form

Q̂ = the matrix with first row and first column (±1, 0, . . . , 0), the remaining entries forming a
symmetric (n − 1) × (n − 1) block (b′ij ).   (1.2.7)
The top left-hand element is Q(e1 ). The zeros in the first row and first column come from the
orthogonality condition B(e1 , w) = 0 for all w ∈ V ′ .
We suppose by induction that the theorem has already been proved for non-degenerate
quadratic forms on vector spaces of dimension < n. Then we can choose e2 , . . . , en so that the
matrix of b′ij is diagonal with entries ±1.
Recall that if all the signs in (1.2.6) are + then we call Q (or B) positive-definite; if they
are all −, negative-definite. If the signature is r − s, then r is the largest possible dimension of
subspaces of V on which Q is positive-definite; and similarly s is the largest possible dimension
of subspaces of V on which Q is negative-definite. Note, however, that such subspaces are not
unique.
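The proof sketch of Theorem 1.2.15 can be turned into a small computation. The following plain-Python sketch counts r and s by repeated completion of the square; for simplicity it assumes a nonzero diagonal pivot can always be found (the general case needs the extra basis change indicated in the proof):

```python
def signature(B, eps=1e-12):
    """Count (r, s) = (#positive, #negative) entries in the standard diagonal
    form of the symmetric matrix B, by the inductive step in the proof of
    Theorem 1.2.15: record the sign of a diagonal pivot Q(e1), then pass to
    the matrix (b'_ij) on the Q-orthogonal complement (a Schur complement).
    Assumes a nonzero diagonal pivot exists at each step."""
    B = [row[:] for row in B]            # work on a copy
    r = s = 0
    while B:
        n = len(B)
        p = next(i for i in range(n) if abs(B[i][i]) > eps)
        d = B[p][p]
        if d > 0:
            r += 1
        else:
            s += 1
        # b'_ij = b_ij - b_ip * b_pj / d on the complement of the pivot:
        B = [[B[i][j] - B[i][p]*B[p][j]/d
              for j in range(n) if j != p]
             for i in range(n) if i != p]
    return (r, s)

# The Minkowski form t^2 - x^2 - y^2 - z^2 has signature (r, s) = (1, 3):
eta = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]
assert signature(eta) == (1, 3)
```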
1.2.3. Affine version. Suppose we want to incorporate translations. We may define the
group of affine isometries of (V, Q) to be all maps of the form
x ↦ T x + b, where T ∈ O(V, Q) and b ∈ V.

In the case of interest in relativity, where Q has signature (+, −, −, −), this is called the
Poincaré group.
In this affine case, we should think of Q as giving the length-squared of position vectors of
one point relative to another, that is

Q-length-squared of →AB = Q(→AB).

In a euclidean space (i.e. Q is positive-definite) we can also measure angles between displacement
vectors: if X, Y and Z are three points, then the angle θ that u = →XY makes with
v = →XZ satisfies

cos θ = B(u, v)/(√Q(u) √Q(v)).

Thus we have lengths and angles as in ordinary two- or three-dimensional euclidean geometry.
Notation 1.2.16. In a euclidean vector space with fixed positive-definite quadratic form it
is common to refer to the associated bilinear form as an inner product and denote it B(u, v) = ⟨u, v⟩.
Similarly the length of a vector in this context is often simply denoted |u| = √Q(u) = √B(u, u).
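In coordinates, these length and angle formulas are immediate to compute. A minimal sketch in plain Python, using the standard inner product on Rn (the sample points are arbitrary):

```python
import math

def inner(u, v):
    """Standard euclidean inner product <u, v> = B(u, v) on R^n."""
    return sum(ui*vi for ui, vi in zip(u, v))

def length(u):
    """|u| = sqrt(Q(u)) = sqrt(B(u, u))."""
    return math.sqrt(inner(u, u))

def angle(u, v):
    """The angle theta with cos(theta) = B(u, v)/(sqrt(Q(u)) sqrt(Q(v)))."""
    return math.acos(inner(u, v) / (length(u) * length(v)))

# Displacement vectors from X = (0, 0) to Y = (1, 0) and to Z = (1, 1):
u, v = [1, 0], [1, 1]
# The angle between them is 45 degrees, as elementary geometry predicts:
assert abs(angle(u, v) - math.pi/4) < 1e-12
```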
Remark 1.2.17. Why have I gone through all this? First of all, why should it be that vector
spaces or affine spaces are the right thing for describing the world?
We agree that triples of real numbers (x, y, z) are very good for describing where things are
in the world (or indeed the solar system...). But the key thing about vector and affine spaces
(which we'll discuss next) is that there is a distinguished set of curves (or paths or trajectories),
namely the straight lines. These are distinguished in the sense that if you take a straight line
and apply an affine transformation (translation and linear transformation), you get another
straight line.
Fast forward to Newton's laws of motion: the first of these says that a particle remains at
rest or continues to move with a constant velocity unless acted upon by an external force.
Thus straight lines have a physical significance, as the trajectories of free particles. In this
section, we have learned or recalled the underlying geometry of 3-dimensional spaces of points
(x, y, z), as vector or affine spaces. We've discussed the symmetries, both linear and affine, of
such spaces. We've introduced bilinear forms so that we have a notion of length and distance
in our vector space. And this flat geometry seems a good basis for Newtonian physics, because
it has straight lines in some sense built in, and these are the trajectories of particles not acted
upon by any force.
1.3. Curves, tangent vectors and so on
For our purposes, the best definition of a curve in a vector space (or an affine space) is as a
smooth map

γ : I → V

where I is some interval which may be open, closed, (semi-)infinite, whatever. If I is closed
(and bounded), I = [u, v], say, then the endpoints of the curve are γ(u) and γ(v).

We generally assume the parameterisation is regular, i.e. γ′(t) ≠ 0 for every t ∈ I. Then
γ′(t) is called the velocity vector and it is tangent to the curve. If we choose a basis of V , then
γ(t) is represented in terms of its components (γ1 (t), . . . , γn (t)), where each of the γj is just an
ordinary smooth function of the variable t. Then γ′(t) has components (γ1′ (t), . . . , γn′ (t)).
If γ′(t0 ) = 0 for a point t0 of I, then the curve can be singular (have a sharp corner) at that
point. We don't want to consider such singularities.

The image of γ is sometimes called the trace of the curve.

The vector γ′′(t) with components (γ1′′ (t), . . . , γn′′ (t)) along the curve is called the acceleration
vector.
Example 1.3.1. If

a = (a1 , . . . , an ) and b = (b1 , . . . , bn )

are two (constant) vectors in Rn , with b ≠ 0, then we may construct the parameterized straight
line

γ(t) = a + bt   (1.3.1)

through a in the direction b. In fact the tangent vector γ′(t) = b (i.e. b is the velocity vector in
this case) and the acceleration γ′′(t) is zero.
Example 1.3.2. In R2 , consider the parameterized circle

γ(t) = (a cos t, a sin t).   (1.3.2)

Here the velocity is γ′(t) = (−a sin t, a cos t) and the acceleration is γ′′(t) = −γ(t).
1.3.1. Curves in a euclidean space. Let V be a euclidean space, i.e. a real vector space
equipped with a positive-definite inner product ⟨ , ⟩ and length² denoted by | |2 (cf. Notation 1.2.16).

Definition 1.3.3. A curve γ is said to be parameterized by arc length s if

s′(t) = |γ′(t)|,

where we choose the positive square root consistently with s being an increasing function of t. Since
we assume that |γ′(t)| > 0 (regularity of the curve), we get an equation which determines s as
a function of t, uniquely up to the addition of a constant.
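For the circle of Example 1.3.2, for instance, |γ′(t)| = a identically, so s(t) = at up to a constant. A numerical sketch in plain Python (the finite-difference step and the trapezium rule are ad hoc choices made here for illustration):

```python
import math

a = 2.0
gamma = lambda t: (a*math.cos(t), a*math.sin(t))   # the circle (1.3.2)

def speed(t, h=1e-6):
    """|gamma'(t)| approximated by a central finite difference."""
    (x1, y1), (x0, y0) = gamma(t + h), gamma(t - h)
    return math.hypot((x1 - x0)/(2*h), (y1 - y0)/(2*h))

def arclength(t, n=1000):
    """s(t) = integral of |gamma'| from 0 to t, by the trapezium rule."""
    ts = [t*i/n for i in range(n + 1)]
    return sum((speed(ts[i]) + speed(ts[i+1]))/2 * (t/n) for i in range(n))

# For the circle, |gamma'(t)| = a identically, so s(t) = a*t:
assert abs(arclength(1.5) - a*1.5) < 1e-4
```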
The length of a curve γ : [u, v] → V is then the integral of |γ̇(t)| for t from u to v. (Here, for
consistency with later sections, we use γ̇ for the derivative of the curve γ.) This is
the main example we shall consider, though it will be substantially generalized when we come
to consider geodesics in general manifolds or space-times.
1.4. Calculus of variations

Calculus of variations was treated in a previous course, and we shall remind you of some of
the notation and other facts here.

First of all, the thing being integrated is called the Lagrangian. The simplest (time-independent)
Lagrangians are functions of 2n variables, L = L(x, y), and then the functional to
be minimized/maximized will be

F[γ] = ∫₀¹ L(γ(t), γ̇(t)) dt, where γ(0) = p and γ(1) = q.   (1.4.2)
Notation 1.4.1. In this section, for consistency with later applications we are going to
denote the components of x and y with upstairs indices. To be clear about this, in the previous
paragraph (taking n = 3 for notational simplicity) x and y both stand for triples of numbers, or
points in a 3D vector space. If we choose a basis to expand these vectors, we would get triples
of real numbers which we are going to denote
x = (x1 , x2 , x3 ) and y = (y 1 , y 2 , y 3 ).
This notation takes some getting used to: x2 is not the square of x! With luck you'll get used
to this pretty quickly, and will learn not to think that x2 is the square of x.
We shall also write x = (xa ), y = (y a ), with the understanding that the index a runs from
1 to n.
Theorem 1.4.2. If γ is a sufficiently smooth curve which is an extremal for (1.4.2), and
the Lagrangian is sufficiently nice, then we have

d/dt(∂L/∂yᵃ) − ∂L/∂xᵃ = 0, xᵃ(0) = pᵃ, xᵃ(1) = qᵃ.   (1.4.3)
Extremal means that γ is a local maximum or minimum of (1.4.2) amongst all curves with
the same endpoints.

It is very important to understand the meaning of the d/dt here. It means that we calculate
the partials of L with respect to the yᵃ, and then we substitute the values xᵃ(t) for xᵃ and ẋᵃ(t)
for yᵃ, before differentiating with respect to t.
These equations (1.4.3) are called the Euler–Lagrange equations. The Lagrangian is often
written as a function L(xᵃ, ẋᵃ) and the Euler–Lagrange equations as

d/dt(∂L/∂ẋᵃ) − ∂L/∂xᵃ = 0.   (1.4.4)

With experience, there is no problem with this, except that one has to say something about
regarding the xᵃ and ẋᵃ as independent variables when computing the partials, and then
regarding ∂L/∂ẋᵃ as a function of t before differentiating with respect to t.
Example 1.4.3. Consider the energy functional with

L(x, y) = ½|y|² = ½((y¹)² + . . . + (yⁿ)²).   (1.4.5)
We have

∂L/∂xᵃ = 0,  ∂L/∂yᵃ = yᵃ.   (1.4.6)

Inserting yᵃ = ẋᵃ(t) and subbing in, we get the EL equations:

ẍᵃ = 0.   (1.4.7)
We learn that xᵃ(t) = pᵃ + vᵃt where pᵃ and vᵃ are constants. Of course, this is just
the parametric form of the straight line through p in the direction v. Note that with this
parameterization, the straight line is traversed at constant velocity v as well. In fact, given
xᵃ(0) = pᵃ and xᵃ(1) = qᵃ, we get vᵃ = qᵃ − pᵃ.
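A discretized version of this example can be checked directly: among paths with the same endpoints, the straight line has the smallest (discretized) energy. A sketch in plain Python (the grid size and the competitor path are arbitrary choices made for illustration):

```python
# Discretized energy F[gamma] ~ sum of (1/2)|difference quotient|^2 * dt,
# comparing the straight line from p to q with a perturbed competitor path.
p, q, n = 0.0, 1.0, 200                  # endpoints and number of grid steps

def energy(path):
    dt = 1.0/n
    return sum(0.5*((path[i+1] - path[i])/dt)**2 * dt for i in range(n))

straight = [p + (q - p)*i/n for i in range(n + 1)]
# A competitor with the same endpoints (the quadratic bump vanishes at i = 0, n):
wiggly = [straight[i] + 0.4*i*(n - i)/n**2 for i in range(n + 1)]

assert wiggly[0] == straight[0] and wiggly[-1] == straight[-1]
# The straight line has strictly smaller energy, as Example 1.4.3 predicts:
assert energy(straight) < energy(wiggly)
```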
1.4.1. Coordinate invariance. If we have a change of coordinates, z a as an invertible
function of the xa , then the result of minimizing a particular functional should not change. Of
course the explicit formulae giving the z a as a function of t will be different from the formulae
giving the xa as a function of t. The relation will be
z a (t) = z a (x(t)).
(1.4.8)
As a telling example, consider the same energy functional in spherical polars (r, θ, φ). We
have

E[γ] = ∫ ½(ṙ² + r²(θ̇² + sin²θ φ̇²)) dt.   (1.4.9)
We calculate

∂L/∂ṙ = ṙ,  ∂L/∂θ̇ = r²θ̇,  ∂L/∂φ̇ = r² sin²θ φ̇,   (1.4.10)

and

∂L/∂r = r(θ̇² + sin²θ φ̇²),  ∂L/∂θ = r² sin θ cos θ φ̇²,  ∂L/∂φ = 0.   (1.4.11)
Hence the Euler–Lagrange equations are as follows:

r̈ = r(θ̇² + sin²θ φ̇²),   (1.4.12)

d/dt(r²θ̇) = r² sin θ cos θ φ̇²,   (1.4.13)

d/dt(r² sin²θ φ̇) = 0.   (1.4.14)
It hardly needs saying that these equations are much more daunting than the same system in
euclidean coordinates. There are, however, some tricks to lead us to a solution, which we shall
discuss next.
1.4.2. Symmetries imply conservation laws. A Lagrangian is said to have a symmetry
if it is independent of a coordinate. For example the energy Lagrangian expressed in
euclidean coordinates is independent of all the xᵃ (it depends only upon the ẋᵃ), while in polar
coordinates it is independent of φ. By the Euler–Lagrange equations, if ∂L/∂xᵃ = 0 for a particular
a, then ∂L/∂ẋᵃ is constant along a solution curve. We also sometimes say ∂L/∂ẋᵃ is a constant
of the motion.
This principle is very important in many cases, because by knowing that certain quantities
are constant we can render the EulerLagrange equations easier to solve.
Another important symmetry principle is the following: Suppose that we have a Lagrangian
of the form

L(x, y) = T (x, y) − U (x, y).   (1.4.15)
Proposition 1.4.4. Suppose that L is as in (1.4.15) and moreover T (x, y) is homogeneous
of degree 2 in y for each fixed x. Then
E(x, y) = T (x, y) + U (x, y)
is constant along solutions of the EulerLagrange equations (1.4.3).
Example 1.4.5. Let us see how to use these ideas to solve the system (1.4.12)–(1.4.14), or at least
how to obtain some of the solutions. We use the principles we've just discovered: along a
solution, the energy itself is constant,

ṙ² + r²(θ̇² + sin²θ φ̇²) = 2E,   (1.4.16)

and because L is independent of φ the angular momentum

J = r² sin²θ φ̇   (1.4.17)

is also constant. The full analysis is complicated, but we see that if θ = π/2, then the θ-equation
is satisfied, so we can start by considering those solutions with θ = π/2 identically¹. (Note that
θ = 0 or π are not such good choices, as these values of θ are singularities of the coordinate
system.)

So, let us substitute θ = π/2 and see what we're left with:

ṙ² + r²φ̇² = 2E,  J = r²φ̇.   (1.4.18)
Thus if u = 1/r, then ṙ = (dr/dφ)φ̇ and so

(dr/dφ)² + r² = 2E/φ̇².   (1.4.19)

Since dr/dφ = −(1/u²)(du/dφ), this becomes

(1/u⁴)(du/dφ)² + r² = 2E/φ̇²,   (1.4.20)

so multiplying up by u⁴,

(du/dφ)² + u² = 2E/(r⁴φ̇²) = 2E/J².   (1.4.21)
We can now integrate this to obtain an equation of a non-radial, equatorial geodesic.
We can solve this by differentiating with respect to φ:

d²u/dφ² + u = 0.

Hence u = u₀ cos(φ − φ₀) for arbitrary constants u₀ and φ₀. This is the equation of a straight
line in polar coordinates, consistently with Example 1.4.3.
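The claim that u = u₀ cos(φ − φ₀) is a straight line can be checked directly: it rearranges to r cos(φ − φ₀) = 1/u₀, i.e. x cos φ₀ + y sin φ₀ = 1/u₀ in cartesian coordinates. A quick numerical check in plain Python (the constants are arbitrary):

```python
import math

u0, phi0 = 0.5, 0.3        # arbitrary constants in u = u0*cos(phi - phi0)

def line_residual(phi):
    """Distance of the curve point at angle phi from the candidate straight
    line x*cos(phi0) + y*sin(phi0) = 1/u0 (zero means the point is on it)."""
    r = 1.0/(u0*math.cos(phi - phi0))
    x, y = r*math.cos(phi), r*math.sin(phi)
    return x*math.cos(phi0) + y*math.sin(phi0) - 1.0/u0

# Sample angles with u > 0; every point lies on the straight line:
max_dev = max(abs(line_residual(phi0 - 1.0 + 2.0*k/19)) for k in range(20))
assert max_dev < 1e-9
```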
In Problem 1.10, you are encouraged to work through the slightly more involved problem of
finding the orbit of a particle around a heavy star, using the same ideas.
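The conservation laws used above can also be confirmed numerically. A sketch in plain Python (a simple RK4 integrator; the step size and initial data are arbitrary choices) integrating the planar (θ = π/2) Euler–Lagrange equations and checking that E and J of (1.4.18) stay constant:

```python
# Planar equations of motion: r'' = r*phi'^2 and d/dt(r^2 phi') = 0,
# the latter rewritten as phi'' = -2 r' phi' / r.

def deriv(y):
    r, rdot, phi, phidot = y
    return [rdot, r*phidot**2, phidot, -2*rdot*phidot/r]

def rk4_step(y, h):
    k1 = deriv(y)
    k2 = deriv([y[i] + h/2*k1[i] for i in range(4)])
    k3 = deriv([y[i] + h/2*k2[i] for i in range(4)])
    k4 = deriv([y[i] + h*k3[i] for i in range(4)])
    return [y[i] + h/6*(k1[i] + 2*k2[i] + 2*k3[i] + k4[i]) for i in range(4)]

def invariants(y):
    r, rdot, phi, phidot = y
    return 0.5*(rdot**2 + r**2*phidot**2), r**2*phidot     # (E, J)

y = [1.0, 0.3, 0.0, 0.8]       # arbitrary initial conditions (r away from 0)
E0, J0 = invariants(y)
for _ in range(2000):          # integrate to t = 2 with step h = 0.001
    y = rk4_step(y, 0.001)
E1, J1 = invariants(y)

# Both quantities are conserved up to integration error:
assert abs(E1 - E0) < 1e-6 and abs(J1 - J0) < 1e-6
```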
¹Actually, if we have any solution curve, then by using the spherical symmetry of the problem and re-orienting
the coordinate axes, we may arrange that the solution has θ = π/2 identically.
Remark 1.4.6. In the calculus of variations, you can consider more general time-dependent
Lagrangians, i.e. functions L(x, y, t). While the EL equations continue to give extrema in this
case also, Proposition 1.4.4 is false in general.
CHAPTER 2

Special relativity

Two inertial coordinate systems (t, x, y, z) (Alice's) and (t′, x′, y′, z′) (Bob's), in relative motion
along the x-axis, are classically related by

x′ = x − vt, y′ = y, z′ = z,   (2.1.1)

where v is the relative speed of the frames. (If Bob is sitting at (x′, y′, z′) = (0, 0, 0), then Alice
gives Bob's coordinates as x = vt, y = z = 0.)
gives Bobs coordinates as x = vt, y = z = 0.)
This transformation (often known as the Galilean transformation between inertial frames)
is clearly incompatible with the idea that the speed of light is the same in all inertial frames.
The idea that this should be the case, and more generally that all physics should appear the
same to all (inertial) observers, is the first of Einstein's famous postulates:

1. The laws by which the states of physical systems undergo change are not affected,
whether these changes of state be referred to the one or the other of two systems of
coordinates in uniform translatory motion.
2. As measured in any inertial frame of reference, light is always propagated in empty
space with a definite velocity c that is independent of the state of motion of the emitting
body.
In slogan form:
1. The laws of physics are the same in all inertial frames of reference.
2. The speed of light in free space has the same value c in all inertial frames of reference.
If the laws of physics are the same for any inertial frame, then it follows that no experiment
can be performed that will single out one frame as preferred above all others. In particular,
there can be no absolute standard of rest and only relative motion is physically meaningful.
Note that the constancy of the speed of light means that something has to go wrong with
the obvious change of coordinates (2.1.1). More explicitly, this change of coordinates is not
compatible with Maxwells equations of electrodynamics.
It is convenient to make certain subsidiary assumptions explicit as well:
P1 Free particles and photons (light particles) appear to inertial observers to travel in
straight lines at constant speeds.
P2 Photons appear to travel at the same speed c to all inertial observers.
P3 The standard clock of one inertial observer appears to any other observer to run at a
constant rate.
P4 No particle is ever observed to travel faster than light.
Hypothesis 2.2.1. An inertial coordinate system corresponds to a choice of basis (e0 , e1 , e2 , e3 )
of M in which the form η is diagonal:

η(e0 , e0 ) = 1, η(e1 , e1 ) = η(e2 , e2 ) = η(e3 , e3 ) = −1, η(ea , eb ) = 0 for a ≠ b.   (2.2.1)
Note that there are many choices of inertial coordinate systems. From the mathematical
point of view, this is because there are many different choices of basis of M with respect to
which η takes standard diagonal form. From the physical point of view this is because there are
many different inertial observers, all on an equal footing.
Remark 2.2.2. Note that if X and Y are two vectors in M , and if (e0 , e1 , e2 , e3 ) is a basis
as in (2.2.1), then if

X = X⁰e0 + X¹e1 + X²e2 + X³e3 , Y = Y⁰e0 + Y¹e1 + Y²e2 + Y³e3 ,

we have

η(X, Y ) = X⁰Y⁰ − X¹Y¹ − X²Y² − X³Y³.
2.3. Worldlines
Recall that anything (particle, observer, photon) which exists for an extended period of
time is described in Minkowski spacetime by a worldline. This is a curve in M consisting of all
the events through which our particle, observer or photon passes.
For example, suppose Alice is at rest at the spatial origin of an inertial coordinate system
(t, x, y, z). The events on Alice's worldline have coordinates of the form (t, 0, 0, 0), t being the
time on the clock that Alice has beside her.
More generally, Alice observes a particle by noting its (x, y, z) coordinates for different times
t. In other words she observes the particle's worldline in the form of a curve

γ(t) = (t, x(t), y(t), z(t))

in the given coordinates.
It is often useful to decouple the parameter which parameterises the curve from an observer's
time coordinate, replacing the above by the more general form

γ(τ) = (t(τ), x(τ), y(τ), z(τ))

so that all 4 coordinates depend on the parameter τ.
Example 2.3.1. If Alice is an inertial observer who sets up an inertial coordinate system
as above with herself at the (spatial) origin x = y = z = 0, then her worldline will be

t(τ) = τ, x(τ) = y(τ) = z(τ) = 0.
Definition 2.3.2. For the worldline γ(τ) of a particle, observer or photon in M, dγ/dτ is
called the velocity 4-vector.
The use of the term 4-vector is traditional. It helps to distinguish this vector from ordinary
velocity vectors: e.g. the velocity vector of a particle as measured by an observer.
Note that in terms of the original parameterization, γ(t) = (t, x(t), y(t), z(t)),

dγ/dt = (1, dx/dt, dy/dt, dz/dt),

and the spatial part of this is the 3-vector

(dx/dt, dy/dt, dz/dt),

which is the instantaneous velocity of the particle as calculated by Alice when her clock says time
t.
For now, we shall mainly be concerned with straight, constant-speed worldlines: i.e. where γ
has the form

γ(τ) = X + τV   (2.3.1)

where X and V are constant vectors in M . (Here again we are regarding M and M as the same
by choice of an event E of M corresponding to the zero-vector in M .)
We now have to see how P2 and P4 are to be interpreted: photons travel at speed c = 1 as
measured by any inertial observer, and no particle is ever observed to travel faster than light.
2.3.1. What are the photon worldlines? Suppose that a photon is emitted by a laser
at the event with coordinates (0, 0, 0, 0) and passes through the event (t, x, y, z), relative to the
above inertial coordinate system. In other words, at the later time t, its spatial coordinates are
(x, y, z). Then the distance covered is √(x² + y² + z²), but this must be equal to t as the speed
is 1. In particular, if E and P are two events on the worldline of a (free) photon, then →EP is
null in the sense that η(→EP, →EP) = 0. [NB, a null vector need not be the zero vector!]
The following definition is useful:
Definition 2.3.3. Two events P and Q are null-separated if the displacement vector X =
→PQ is null, i.e. η(X, X) = 0.
Remark 2.3.4. This definition depends only upon the events P and Q, and the form η; it
does not depend upon any choice of inertial basis or coordinate system.

To flesh this remark out: we saw by calculation in a particular inertial frame, that if P and
Q are two events on a photon worldline, then →PQ is a null vector. But the latter is a statement
purely about the geometry of M: it uses only the basic facts that given any two events we have
a displacement vector, and that we can feed vectors to η. In particular all inertial observers
agree about when a pair of events are null-separated, and hence the speed of light is the same
for all such observers.
This leads us to the following

Hypothesis 2.3.5. The worldline of a photon has the form

γ(τ) = X + τN,   (2.3.2)

where X and N are constant vectors and N is null:

η(N, N) = 0.   (2.3.3)

Indeed, if γ(τ₁) and γ(τ₂) are two events on such a worldline, then their displacement vector is
(τ₂ − τ₁)N, and

η((τ₂ − τ₁)N, (τ₂ − τ₁)N) = (τ₂ − τ₁)²η(N, N) = 0.   (2.3.4)
In summary, we have seen that if inertial coordinate systems are defined as in Hypothesis 2.2.1 and free photon worldlines are as in Hypothesis 2.3.5, then all inertial observers agree
on the speed at which photons travel.
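This frame-independence can be illustrated numerically: the Minkowski form of a null displacement vanishes in every inertial frame. A sketch in plain Python (the boost formula used is the standard x-direction Lorentz boost, verified here rather than derived; the speed 0.6 is an arbitrary choice):

```python
import math

def eta(X, Y):
    """Minkowski form: eta(X, Y) = X0*Y0 - X1*Y1 - X2*Y2 - X3*Y3."""
    return X[0]*Y[0] - X[1]*Y[1] - X[2]*Y[2] - X[3]*Y[3]

# Displacement along a photon worldline gamma(tau) = X + tau*N, with N null:
N = [1.0, 1.0, 0.0, 0.0]
assert eta(N, N) == 0.0

# An x-direction Lorentz boost with speed v (units with c = 1):
v = 0.6
g = 1/math.sqrt(1 - v*v)
boost = lambda X: [g*(X[0] - v*X[1]), g*(X[1] - v*X[0]), X[2], X[3]]

# The boosted displacement is still null, so the boosted observer also
# measures the photon's speed to be 1:
assert abs(eta(boost(N), boost(N))) < 1e-12
```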
2.3.2. What are the free particle worldlines? Return to the two events E and P at
the beginning of the previous section, and suppose now that they are on the worldline of a
particle travelling at uniform speed v with 0 ≤ v < 1. Then we must have

√(x² + y² + z²) = |vt| < |t|   (2.3.5)

and so

t² − x² − y² − z² > 0.   (2.3.6)
Definition 2.3.7. A vector X ∈ M is timelike if η(X, X) > 0. Two events P and Q are
timelike separated if the displacement vector →PQ is timelike.
We now make the free-particle hypothesis:

Hypothesis 2.3.8. The worldline of a free particle has the form

γ(τ) = X + τV,   (2.3.7)

where X and V are constant vectors of M and V is timelike. The parameter τ is called proper
time if η(V, V) = 1. In this case, if P₁ and P₂ are two events on the worldline with parameter
values τ₁ and τ₂ respectively, then τ₂ − τ₁ is interpreted as the elapsed time between these two
events as measured by a clock carried by an observer on this worldline.
Remark 2.3.9. As for null vectors, the notion of a vector being timelike is independent of
any choice of observer or coordinate system. In particular if one inertial observer thinks that a
particle is travelling at speed less than that of light, all observers will agree on this.
Remark 2.3.10. Note the analogy between a curve being parameterized by proper time
here and the idea of unit-speed curves being parameterized by arc-length for ordinary curves
in euclidean space.
As in the case of photon worldlines, we started in Alice's coordinate system (t, x, y, z), and
calculated that the events E and P are on the worldline of a particle moving at speed < 1 if
(and only if) the displacement vector →EP is timelike. This, however, is a statement which is
independent of any particular choice of inertial coordinate system. Thus it must be the case
that Bob, with an inertial coordinate system (t′, x′, y′, z′), will also calculate that E and P are
events on the worldline of a particle moving at speed less than 1.
2.4. Why do clocks carried by inertial observers all go at uniform rates?
Let us remember that the postulate P3 states that if Alice and Bob are inertial observers,
possibly in relative motion, then if Alice looks at Bobs clock, she will see that it is ticking at
a uniform rate, but that this rate may be dierent from the rate at which her own (identical)
clock is ticking.
Suppose that Bob has worldline
γ(τ) = τV    (2.4.1)
(so that he passes through the event E at parameter value τ = 0). Recall the hypothesis that τ is proper time (i.e. the time as measured on the clock he's carrying with him) if (V, V) = 1. In Alice's coordinate system (t, x, y, z), this has the form
t(τ) = τV⁰, x(τ) = τV¹, y(τ) = τV², z(τ) = τV³,    (2.4.2)
where
(V⁰)² − (V¹)² − (V²)² − (V³)² = 1.    (2.4.3)
In particular V⁰ ≠ 0, and we find the time measured on Bob's clock is related to Alice's time coordinate t by the fixed multiple V⁰. So with all our hypotheses made about what inertial frames are, we see that each of P1–P4 are now satisfied.
2.5. Spacetime diagrams
Minkowski spacetime can be pictured by suppressing one or two of the spatial dimensions
and drawing a picture with time going up the page and x or x and y going across.
Suppressing y and z, leaving just a space variable x and a time variable t in play, a typical spacetime diagram is shown below. I've drawn in the worldlines of two free massive particles, two photon worldlines and the axes of two inertial coordinate systems.
Note that the photon worldlines are at 45° and that the free particles have worldlines inclined at less than 45° to the vertical.
[Figure: spacetime diagram showing the t- and x-axes and the t′- and x′-axes of two inertial coordinate systems, two photon worldlines, and a free particle worldline.]
If Bob's 4-velocity in Alice's coordinates is V = V⁰(1, v), where v is his ordinary velocity, then (V, V) = 1 gives (V⁰)²(1 − |v|²) = 1. Thus
V⁰ = 1/√(1 − |v|²).
The RHS here is usually denoted by γ(v) (where as usual v = |v|),
γ(v) = 1/√(1 − |v|²).    (2.6.1)
In units in which c is not set equal to 1, this has to be replaced by
γ(v) = 1/√(1 − |v|²/c²).    (2.6.2)
Note that
γ(v) ≥ 1    (2.6.3)
with equality if and only if v = 0. Moreover γ(v) → ∞ as v → c.
Now the relation
t = V⁰τ = γ(v)τ    (2.6.4)
encodes the time dilatation or "moving clocks run slowly": indeed, if P₁ and P₂ are two events on Bob's world line which occur at parameter values τ₁ and τ₂, then he reckons that the time difference is τ₂ − τ₁. Alice, however, reckons that the time difference between these two events is γ(v)(τ₂ − τ₁), so the time elapsed is greater, according to Alice, by a factor of γ(v).
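The time dilatation formula is easy to check numerically. The following minimal sketch (units with c = 1; the helper name gamma is our own, not from the notes) computes γ(v) and the dilated time for a sample speed:

```python
import math

def gamma(v):
    """Lorentz factor gamma(v) = 1/sqrt(1 - v^2), in units with c = 1."""
    return 1.0 / math.sqrt(1.0 - v * v)

# Bob moves at v = 0.8 relative to Alice.  An interval of proper time
# dtau = 1 on Bob's clock corresponds, by t = gamma(v) * tau, to
v = 0.8
dtau = 1.0
dt = gamma(v) * dtau
print(dt)  # ≈ 1.667: Alice reckons the elapsed time is longer by gamma(v)
```

Note that gamma(0) = 1 and gamma(v) grows without bound as v approaches 1, in line with (2.6.3).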
2.7. Simultaneity and distance
We have made a hypothesis about how coordinates introduced by diagonalizing the inner product are inertial coordinates introduced by inertial observers, and how unit-speed straight lines are worldlines parameterized by proper time.
There is a good question, though, which is how an observer would actually try to set up coordinates without appealing to an absolute standard of rest. So suppose Alice wants to set up such coordinates.
Suppose that F is any event in M. Alice is travelling on her straight world line, which doesn't pass through F. She sends out light signals and lets them scatter off F. She finds that the signal emitted at time τ₁ on her clock scatters off F and is picked up by her at time τ₂. She infers two things: that the distance to F is c(τ₂ − τ₁)/2, and that F should have time coordinate ½(τ₁ + τ₂). This is the radar method of assigning times and measuring distances.
So, using only allowable methods, she assigns a position and time coordinate to F.
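The radar method is simple enough to express directly. Here is a minimal sketch (units with c = 1; the function name radar_coordinates is our own label for the procedure just described):

```python
def radar_coordinates(tau1, tau2):
    """Radar method: from the emission time tau1 and reception time tau2
    on Alice's clock, assign the scattering event F a time coordinate
    and a distance (units with c = 1)."""
    time = (tau1 + tau2) / 2.0       # midpoint of emission and reception
    distance = (tau2 - tau1) / 2.0   # light travels out and back
    return time, distance

# Signal sent at tau = 3, echo received at tau = 7:
print(radar_coordinates(3.0, 7.0))  # (5.0, 2.0)
```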
Let's see how all this looks in terms of trajectories and worldlines.
If Alice's trajectory is γ(τ) = τU, and F has position vector Y relative to the chosen origin, the photon trajectory is
σ ↦ τ₁U + σN
on the outward leg and
σ′ ↦ τ₂U + σ′N′
on the return leg. Here N and N′ are null vectors, and we may suppose that the parameters σ and σ′ are chosen so that these trajectories hit Y at σ = 1, σ′ = 1:
τ₁U + N = Y = τ₂U + N′.    (2.7.1)
If E = ½(τ₁ + τ₂)U is the event on Alice's worldline that she judges simultaneous with F, then the displacement vector EF is
Y − ½(τ₁ + τ₂)U = ½(τ₂ − τ₁)U + N′ = ½(τ₁ − τ₂)U + N.    (2.7.2)
We also assume (U, U) = 1, so that Alice's worldline is parameterised by her proper time τ.
Proposition 2.7.1. (U, EF) = 0.
Proof. From (2.7.1),
U(τ₁ − τ₂) = N′ − N.    (2.7.3)
Take the inner product of (2.7.3) with N and with N′ to get
(τ₁ − τ₂)(U, N) = (N, N′)    (2.7.4)
(τ₁ − τ₂)(U, N′) = −(N, N′)    (2.7.5)
since (N, N) = (N′, N′) = 0. Also (N′ − N, N′ − N) = (τ₁ − τ₂)²(U, U), so
−2(N, N′) = (τ₁ − τ₂)².    (2.7.6)
Combining this with (2.7.4),
(U, N) = −½(τ₁ − τ₂).    (2.7.7)
Now we calculate
(U, EF) = (U, N + ½(τ₁ − τ₂)U)
        = (U, N) + ½(τ₁ − τ₂)
        = −½(τ₁ − τ₂) + ½(τ₁ − τ₂)
        = 0
as required.
If you didn't like that proof, here is another. There are often several different ways to accomplish the same thing.
Proof. The idea of this proof is to write everything in terms of the null vectors N and N′. It is perhaps a more symmetrical proof than the previous one. From (2.7.3), we obtain
U = (N′ − N)/(τ₁ − τ₂).    (2.7.8)
We also have the two formulae for EF in (2.7.2). Adding these, we get
2 EF = N + N′.    (2.7.9)
Now we calculate
(U, EF) = (1/(2(τ₁ − τ₂))) (N + N′, N′ − N) = 0,    (2.7.10)
by using again the bilinearity of the inner product to expand the RHS.
[Figure: Alice's worldline γ(τ) = τU, with the outgoing photon from τ₁U to F, the return photon from F to τ₂U, and E the midpoint event on her worldline.]
Definition 2.7.2. If Alice is moving uniformly with 4-velocity vector U, then she reckons two events E and F to be simultaneous if
(U, EF) = 0.
These events then have a well-defined spatial separation d, where
d² = −(EF, EF).
However we look at it, the key point is that if we have two observers, Alice and Bob, moving relative to each other, then they will generally disagree about which pairs of distant events are simultaneous.
From the mathematical or geometric point of view, they have different 4-velocity vectors U and V. If F and G are distant events, Alice thinks they are simultaneous if (U, FG) = 0, while Bob thinks they are simultaneous if (V, FG) = 0. These are different conditions, and if one is satisfied, then there is no guarantee that the other one will also be.
Definition 2.7.3. Let P and Q be two particles. If Alice is an inertial observer with 4-velocity U, she measures the distance between these two particles at time τ on her clock by
finding events F on the world-line of P and G on the worldline of Q which are simultaneous with the event E at time τ on her worldline, in other words, finding F and G such that (U, EF) = 0, (U, EG) = 0;
calculating the distance as
d = √(−(FG, FG)).
2.8. Length contraction
Let's push this operational definition of distance or length to see how an inertial observer will measure the length of a moving rod. Suppose that we have a rod of length d, whose endpoints have world-lines
γ(τ) = τV, γ̃(τ′) = D + τ′V
where (D, V) = 0. The length of the rod should be defined as the length of the rod as measured by an observer at rest with respect to the rod. For such an observer, with 4-velocity V, we ask: which pairs of events γ(τ) and γ̃(τ′) are simultaneous? Plugging in the definition, we need
(V, D + V(τ′ − τ)) = (V, D) + τ′ − τ = 0,
so τ′ = τ (using (V, D) = 0 and (V, V) = 1). In the rod's rest frame, γ̃(τ) = (τ, d, 0, 0).
If Alice's worldline is γ(σ) = σU, to measure the length she has to find events E and F on the world-lines that she considers to be simultaneous.
If these are τV and D + τ′V, then the displacement vector is
X = D + V(τ′ − τ)    (2.8.1)
and the simultaneity condition is
(U, D + V(τ′ − τ)) = 0,    (2.8.2)
which gives
X = D − ((U, D)/(U, V)) V.    (2.8.3)
Thus we compute
(X, X) = (D, D) + (U, D)²/(U, V)²,    (2.8.4)
so the length d′ that Alice measures satisfies
d′² = −(X, X) = d² − (U, D)²/(U, V)².    (2.8.5)
Recall that −(D, D) = d² is the square of the length of the rod as measured in its rest-frame, so d′ ≤ d in general.
To understand this calculation, we may work explicitly in the frame in which the rod is at rest. Then
V = (1, 0, 0, 0), D = (0, d, 0, 0).    (2.8.6)
Alice's 4-velocity vector U has the form
U = γ(u)(1, u) = γ(u)(1, u₁, u₂, u₃)    (2.8.7)
where u is the velocity of Alice relative to Bob, (u₁, u₂, u₃) are its components in Bob's coordinates, and
γ(u) = 1/√(1 − |u|²)    (2.8.8)
as in (2.6.1). Hence
(U, D) = −γ(u)du₁, (U, V) = γ(u),    (2.8.9)
and so from (2.8.5), we find
d′ = d√(1 − u₁²).
This is the famous Lorentz-Fitzgerald length contraction: if the component u₁ of the relative velocity of the observer in the direction of the rod is non-zero, then the observer judges the length of the rod to be less than d by the factor √(1 − u₁²). Note that there is no length contraction if the observer is moving at right-angles to the rod.
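The computation (2.8.3)-(2.8.5) can be checked numerically. In the sketch below (units with c = 1, signature (+, −, −, −); the helper names are our own) we build the displacement X explicitly and compare the measured length with the contraction factor:

```python
import math

def mink(x, z):
    """Minkowski inner product with signature (+,-,-,-), c = 1."""
    return x[0]*z[0] - x[1]*z[1] - x[2]*z[2] - x[3]*z[3]

def gamma(speed):
    return 1.0 / math.sqrt(1.0 - speed * speed)

# Rod at rest in Bob's frame: V = (1,0,0,0), D = (0,d,0,0), as in (2.8.6).
d = 2.0
V = (1.0, 0.0, 0.0, 0.0)
D = (0.0, d, 0.0, 0.0)

# Alice moves with velocity u = (0.6, 0.3, 0.0) relative to Bob.
u = (0.6, 0.3, 0.0)
g = gamma(math.sqrt(sum(c * c for c in u)))
U = (g, g * u[0], g * u[1], g * u[2])

# X = D - ((U,D)/(U,V)) V joins simultaneous (for Alice) events on the
# endpoint worldlines, as in (2.8.3).
lam = mink(U, D) / mink(U, V)
X = tuple(D[i] - lam * V[i] for i in range(4))

d_measured = math.sqrt(-mink(X, X))
d_predicted = d * math.sqrt(1.0 - u[0]**2)  # Lorentz-Fitzgerald contraction
print(d_measured, d_predicted)  # both ≈ 1.6
```

Only the component of u along the rod enters: changing u₂ or u₃ leaves the measured length unchanged.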
2.9. Lorentz transformations
The set of linear transformations of M which preserve the inner product is called the Lorentz group. This means, concretely, the set of linear maps L : M → M such that
(LX, LY) = (X, Y) for all X, Y ∈ M.
The group of all Lorentz transformations, or the Lorentz group, is also denoted by O(1, 3). We shall use Lorentz transformations mainly to compare different inertial frames with the same event in M as origin. More precisely, suppose that Alice introduces an inertial basis (e₀, e₁, e₂, e₃) and Bob introduces an inertial basis (ẽ₀, ẽ₁, ẽ₂, ẽ₃). Their coordinates are respectively (t, x, y, z) and (t̃, x̃, ỹ, z̃).
Because (eᵢ) and (ẽᵢ) are both diagonalizing bases, there is a Lorentz transformation L with the property
(ẽ₀, ẽ₁, ẽ₂, ẽ₃) = (e₀, e₁, e₂, e₃)L.    (2.9.1)
This is shorthand for expressing the tilded basis vectors as linear combinations of the untilded ones.
Write η = diag(1, −1, −1, −1) for the matrix of the inner product in a diagonalizing basis. Note that the matrix product
(ẽ₀, ẽ₁, ẽ₂, ẽ₃)ᵗ η (ẽ₀, ẽ₁, ẽ₂, ẽ₃)    (2.9.2)
has as its ab component the scalar (ẽₐ, ẽ_b). Hence, as the (ẽₐ) are diagonalizing, this matrix is diagonal, with entries
(ẽ₀, ẽ₀) = 1, (ẽ₁, ẽ₁) = (ẽ₂, ẽ₂) = (ẽ₃, ẽ₃) = −1;
that is, it equals η. On the other hand, substituting (2.9.1), the same product is Lᵗ(eᵗηe)L = LᵗηL. So LᵗηL = η, confirming the relevance of the Lorentz group for changing frame between observers.
Suppose that our frames are related by (2.9.1). Multiplying on the right by the column vector (t̃, x̃, ỹ, z̃)ᵗ, we get
t̃ẽ₀ + x̃ẽ₁ + ỹẽ₂ + z̃ẽ₃ = (e₀, e₁, e₂, e₃) L (t̃, x̃, ỹ, z̃)ᵗ.    (2.9.3)
The LHS is the position vector of the event with tilded coordinates (t̃, x̃, ỹ, z̃); writing the same vector as te₀ + xe₁ + ye₂ + ze₃ and comparing, we read off
(t, x, y, z)ᵗ = L (t̃, x̃, ỹ, z̃)ᵗ.    (2.9.4)
Note the usual confusing point that the L appears multiplying the untilded e's in (2.9.1) but the tilded coordinates in (2.9.4).
2.9.0.1. Examples. A particular example (with c = 1) is
L = ( γ(v)   γ(v)v   0   0 )
    ( γ(v)v  γ(v)    0   0 )    (2.9.5)
    ( 0      0       1   0 )
    ( 0      0       0   1 )
This is often referred to as the standard 2D Lorentz transformation. It is, of course, four-dimensional, but y = ỹ and z = z̃, so all the action is going on in the way the (t, x) and (t̃, x̃) variables are related to each other. To save writing, we'll ignore the y, z and ỹ, z̃ variables in the rest of the discussion of this example.
Here, as before, γ(v) = 1/√(1 − v²). Suppose Bob is sitting at the origin of the tilded coordinate system. Then his world line is x̃ = 0. Inserting (t̃, 0) into the coordinate transformation, we see that
t = γ(v)t̃, x = γ(v)vt̃.
This gives Bob's worldline, as a parameterized curve in Alice's coordinate system (the parameter being t̃). Since x/t = v for this curve, we see that Bob is moving at speed v in the direction of Alice's positive x-axis. The conclusion is that this Lorentz transformation corresponds precisely to two inertial observers, one moving at speed v relative to the other.
It is of interest to derive this from the postulates P1-P4 and the relativity principle R. I shall omit this here: you can find it in Woodhouse, SR (new edition, 4.4-4.6).
Consideration of this transformation gives a different way to derive the standard counterintuitive properties of SR: time dilatation, length contraction, and so on.
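One can verify numerically that the boost (2.9.5) has the defining property of a Lorentz transformation, LᵗηL = η, where η = diag(1, −1, −1, −1). The following sketch (helper names of our choosing) does so for v = 0.6:

```python
import math

def boost(v):
    """The standard boost (2.9.5) mixing t and x, in units with c = 1."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    return [[g, g*v, 0, 0],
            [g*v, g, 0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(A):
    return [list(row) for row in zip(*A)]

eta = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

L = boost(0.6)
# Defining property of a Lorentz transformation: L^t eta L = eta.
LtEtaL = matmul(transpose(L), matmul(eta, L))
print(LtEtaL[0][0], LtEtaL[1][1])  # 1.0 and -1.0 up to rounding
```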
2.9.1. The Lorentz and Poincaré Groups. The Poincaré group is the Lorentz group with (4-dimensional) translations included. Thinking in terms of coordinate transformations, the typical element of the Poincaré group has the form
(t, x, y, z)ᵗ = L (t̃, x̃, ỹ, z̃)ᵗ + (p, q, r, s)ᵗ    (2.9.6)
where (p, q, r, s) are constants and L is a Lorentz transformation.
From a more sophisticated point of view, it is the natural symmetry group of the affine space M, preserving the bilinear form on the set ME of all position vectors relative to E.
Remark 2.9.1. The 3-dimensional euclidean group is contained in the Poincaré group, via
(t, x, y, z)ᵗ = (1 ⊕ R)(t̃, x̃, ỹ, z̃)ᵗ + (0, q, r, s)ᵗ    (2.9.7)
where R is a 3×3 rotation matrix and 1 ⊕ R denotes the block-diagonal matrix with blocks 1 and R.
Just as the reflections are isometries of euclidean space that do not seem to be realized physically, so there are some Lorentz transformations that are less relevant than others. The physically most relevant Lorentz transformations are those that preserve the spatial orientation (rotations rather than reflections) and a time orientation or "time's arrow".
One can show that if a Lorentz transformation maps one future-pointing timelike vector to a future-pointing timelike vector, then it maps all future-pointing timelike vectors to future-pointing timelike vectors.
Definition 2.10.3. A Lorentz transformation is orthochronous if it maps the future-pointing
nappe to the future-pointing nappe, in other words, if it preserves the time-orientation. The
subgroup of such transformations is denoted O+ (1, 3).
Definition 2.10.4. A Lorentz transformation is called proper if its determinant is 1. The
group of proper, orthochronous Lorentz transformations is denoted SO+ (1, 3).
We note that spatial reflections are excluded from SO+ (1, 3), and so is any transformation
that reverses the arrow of time. Thus this seems to be the most physically appropriate subgroup
of O(1, 3).
Alongside these restricted groups, we should also restrict the allowable inertial bases. We say that a basis is oriented and time-oriented if e₀ is future-pointing and (e₁, e₂, e₃) is right-handed. Then SO⁺(1, 3) maps any oriented and time-oriented basis to another such basis, and conversely any two such bases are related by an element of SO⁺(1, 3).
Remark 2.10.5. It is worth mentioning that X is future-pointing (timelike or null) if and only if −X is past-pointing (timelike or null).
2.10.1. Causality in Special Relativity. If M is given a time-orientation, then the non-zero null vectors also fall into two distinct sets, the future-pointing and the past-pointing. A null vector N is future-pointing if (X, N) > 0 for any given future-pointing timelike X. Similarly, a null vector is past-pointing if (X, N) < 0 for future-pointing timelike X. (Of course, if N is null future-pointing, then (Y, N) < 0 if Y is timelike past-pointing.)
A future-pointing null vector is in the boundary of the future-pointing nappe of the cone: it is a limiting case of future-pointing timelike vectors. For example, if we consider
Xₜ = e₀ + te₁,
where e₀ is future-pointing timelike and e₁ is spacelike with (e₁, e₁) = −1 and (e₀, e₁) = 0, then
(Xₜ, Xₜ) = 1 − t²,
so Xₜ is timelike for |t| < 1 and becomes null as t → ±1.
The set Fut(E) can be pictured as the solid half-cone whose boundary is the set of future-pointing null vectors emanating from E.
Similarly, the set of all events G which can affect or influence E is
Past(E) = {G ∈ M such that GE is timelike or null future-pointing}
        = {G ∈ M such that EG is timelike or null past-pointing}.    (2.10.2)
This can be pictured as the solid half-cone whose boundary is the set of past-pointing null vectors emanating from E.
So although space and time are mixed up in the geometry of special relativity, there is still a well-defined notion of causality.
We have defined timelike and null vectors. If a vector is not timelike or null, it is called spacelike:
Definition 2.10.6. If X ∈ M, then X is called spacelike if (X, X) < 0. Two events E and F in M are said to be spacelike separated if (EF, EF) < 0.
We end by noting the following:
Proposition 2.10.7. Suppose that E and F are events such that EF is future-pointing timelike. Then there exist inertial coordinates such that E has coordinates (0, 0, 0, 0) and F has coordinates (t, 0, 0, 0) with t > 0.
Suppose that E and F are spacelike separated events. Then there exists an inertial frame with respect to which E and F are simultaneous (e.g. E has coordinates (0, 0, 0, 0) and F has coordinates (0, d, 0, 0)). Moreover, there exist other coordinate systems in which E occurs before F.
Proof. The first follows from the basic fact that if X is any future-pointing timelike vector, then there is an oriented and time-oriented basis (e₀, e₁, e₂, e₃) with respect to which the inner product is diagonal, and such that X is a positive multiple of e₀. In such a basis, EF = (t, 0, 0, 0), where t > 0, and if we choose the origin so that the coordinates of E are (0, 0, 0, 0), then the coordinates of F will be (t, 0, 0, 0).
Similarly, if (EF, EF) < 0, we can pick a multiple e₁ of EF such that (e₁, e₁) = −1. We extend this to a diagonalizing (oriented and time-oriented) basis, and then EF has the desired form. In particular, it is orthogonal to e₀, and so these events will be judged simultaneous by an observer with 4-velocity e₀.
For the last part, let V = e₀ + λe₁ with |λ| < 1, so that V is timelike. Then
(V, e₁) = −λ.
So if V is the 4-velocity vector of an observer, Bob, he will reckon that F happens after E if λ < 0 and that F happens before E if λ > 0.
Remark 2.10.8. The above makes complete sense from the point of view of the radar method. See the picture below. Consider two inertial observers, Alice and Bob, and suppose that E is an event on both of their worldlines. If we suppose that Alice judges E and F to be simultaneous, this means that if Alice bounces a light signal off F, sending it out at τ₁ and receiving it at τ₂, she assigns F the time (τ₁ + τ₂)/2. In the diagram, Alice sends her light signal out at event A₁ and receives it at A₂, and E is the midpoint of the segment A₁A₂.
Now if Bob is heading towards F, it is clear that the light signal he needs to send to bounce off F has to be transmitted at event B₁ and received at event B₂. The event on his worldline that he judges to be simultaneous with F is therefore the midpoint of B₁B₂, shown as E′. It is clear from the geometry that the segment EB₁ is longer than EB₂, so E′ will be, as shown, before E on his worldline.
Similarly, if Bob is heading away from F, the event he judges simultaneous with F will be later on his worldline than the event E.
30
A2
B2
As world-line
E0
Bs world-line
A1
B1
2.10.2. Spatial and temporal components. We have seen that inertial observers, free particles and photons have straight worldlines, and the basic feature of a worldline is the 4-velocity vector. It is often annoying to choose a full inertial basis to solve particular problems, but it is important to be able to split Minkowski vectors into their spatial and temporal components with respect to a particular timelike vector.
Suppose that V is a timelike future-pointing 4-vector. Then we can write any vector X in terms of its components parallel to and orthogonal to V. That is,
X = λV + Y, where (V, Y) = 0,    (2.10.3)
with
λ = (V, X)/(V, V).    (2.10.4)
Then
Y = X − ((V, X)/(V, V)) V.    (2.10.5)
More concretely, relative to an inertial basis in which V is a positive multiple of e₀,
V = (V⁰, 0), X = (X⁰, ξ)    (2.10.6)
where
(V, X) = V⁰X⁰, (V, V) = (V⁰)²    (2.10.7)
and
Y = (0, ξ).    (2.10.8)
Here 0 is the ordinary 3-dimensional zero-vector and ξ is also a euclidean 3-vector.
It is worth spelling out that if X and Z are any two Minkowski vectors with components
X = (X⁰, ξ), Z = (Z⁰, ζ),
then
(X, Z) = X⁰Z⁰ − ξ · ζ.    (2.10.9)
A photon's velocity 4-vector will take the form
ω(1, e),
where ω > 0 (the photon is travelling forward in time) and e is a unit vector. For physical reasons, ω is identified with the frequency of the photon as measured by an observer with 4-velocity V.
Example 2.10.9. Relative velocity. Suppose that Alice and Bob are inertial observers with 4-velocity vectors U and V, with (U, U) = (V, V) = 1.
In Alice's rest-frame,
U = (1, 0), V = γ(1, v)    (2.10.10)
for some constant γ. This is, of course, the γ(v) factor we have met before: (V, V) = 1 says
γ²(1 − |v|²) = 1.    (2.10.11)
Now let Chris be a third inertial observer, and work in Bob's rest frame, writing
U = γ(u)(1, u), W = γ(w)(1, w),    (2.10.12), (2.10.13)
where u is the velocity vector of Alice relative to Bob and w is the velocity vector of Chris relative to Bob.
Then, if ν is the relative speed of Alice and Chris,
γ(ν) = (U, W) = γ(u)γ(w)(1 − u · w).    (2.10.14)
This is an answer, but it is instructive to rearrange it a bit. By squaring and taking the reciprocal,
1 − ν² = 1/γ(ν)² = (1 − u²)(1 − w²)/(1 − u · w)².    (2.10.15)
Hence
ν² = ((1 − u · w)² − (1 − u²)(1 − w²))/(1 − u · w)²    (2.10.16)
   = (1 − 2u · w + (u · w)² − 1 + u² + w² − u²w²)/(1 − u · w)²    (2.10.17)
   = ((u² − 2u · w + w²) + (u · w)² − u²w²)/(1 − u · w)²    (2.10.18)
   = (|u − w|² + (u · w)² − u²w²)/(1 − u · w)².    (2.10.19)
Note that (u · w)² − u²w² ≤ 0, with equality if and only if u and w are parallel, so the numerator compares |u − w|² to a correction term that vanishes for parallel velocities.
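Formulas (2.10.14) and (2.10.15) can be checked against each other numerically; the following sketch (the velocities are arbitrary illustrative values, helper names ours) confirms in particular that the relative speed comes out below that of light:

```python
import math

def gamma_of(speed):
    return 1.0 / math.sqrt(1.0 - speed * speed)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Velocities of Alice and Chris relative to Bob (sample data).
u = (0.5, 0.2, 0.0)
w = (-0.3, 0.4, 0.0)
gu = gamma_of(math.sqrt(dot(u, u)))
gw = gamma_of(math.sqrt(dot(w, w)))

# gamma of the relative speed nu, from (2.10.14):
g_nu = gu * gw * (1.0 - dot(u, w))

# Recover nu^2 and compare with the rearranged form (2.10.15):
nu_sq = 1.0 - 1.0 / g_nu**2
rhs = 1.0 - (1.0 - dot(u, u)) * (1.0 - dot(w, w)) / (1.0 - dot(u, w))**2
print(nu_sq, rhs)  # equal; and nu_sq < 1, so the relative speed is < c
```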
Let E be the event "spaceship leaves earth", let A be the event "spaceship reaches destination" and let R be the event "spaceship arrives back at earth". We assume that the spaceship travels at constant speed v on the outward and return trips. Continuing the idealization, let's suppose that the earth is inertial, with 4-velocity U. Taking E to be the origin of M, the world-line of the earth is then τ ↦ τU. On the outward leg, the spaceship's trajectory is τ ↦ τV₁ and on the inward leg it is τ ↦ SV₁ + τV₂, where SV₁ is the displacement vector from E to the arrival event A (S being the proper time of each leg).
We may choose the frame so that U = (1, 0, 0, 0), V₁ = γ(v)(1, v, 0, 0), V₂ = γ(v)(1, −v, 0, 0). By geometry (see the picture below, in which the y and z directions have been suppressed),
T(1, 0) = Sγ(v)(1, v) + Sγ(v)(1, −v),    (2.11.2)
where R = (T, 0) is the return event. Hence T = 2γ(v)S: the time elapsed on earth exceeds the total proper time 2S experienced on board the spaceship by the factor γ(v).
[Figure: the earth's worldline from E = (0, 0) to R = (T, 0), and the spaceship's worldline out to A = Sγ(v)(1, v) and back.]
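The relation (2.11.2) gives T = 2γ(v)S, which can be evaluated for sample values (units with c = 1; a sketch only, the numbers are arbitrary):

```python
import math

def gamma(v):
    return 1.0 / math.sqrt(1.0 - v * v)

# Each leg takes S units of proper time on board, at speed v; by (2.11.2)
# the round trip takes earth time T = 2 * gamma(v) * S.
v, S = 0.8, 3.0
T = 2.0 * gamma(v) * S
print(T, 2 * S)  # 10.0 vs 6.0: the travelling twin ages less
```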
2.12. Summary of key notation and definitions
Definition 2.12.1. If X ∈ M is a vector, we say that X is
timelike if (X, X) > 0;
null if (X, X) = 0;
spacelike if (X, X) < 0.
If X is the displacement vector EF from an event E to an event F in M, we have the following corresponding definitions.
Definition 2.12.2. Let E and F be events in M and let X = EF be the displacement vector. Then
CHAPTER 3

If τ is proper time along the worldline (t(τ), x(τ)) (with y = z = 0), then
ṫ² − ẋ² = 1,    (3.1.4)
and uniform proper acceleration a means
ẗ² − ẍ² = −a².    (3.1.5)
Write
ṫ = cosh u, ẋ = sinh u    (3.1.6)
for a function u = u(τ). Then
ẗ = u̇ sinh u, ẍ = u̇ cosh u.    (3.1.7)
Substituting in to (3.1.5),
u̇² = a².
So u = a(τ − τ₀) (if a > 0, so the curve is future-pointing). Taking τ₀ = 0 and integrating the equations
ṫ = cosh aτ, ẋ = sinh aτ    (3.1.8), (3.1.9)
yields
t(τ) = (1/a) sinh aτ, x(τ) = (1/a)(cosh aτ − 1),    (3.1.10)
choosing the constants of integration so that the particle is at (t, x) = (0, 0) when τ = 0.
3.1.1. Interstellar travel revisited. Suppose that a spaceship starts from rest and its engines deliver uniform acceleration a. What happens?
One thing is that the velocity remains below that of light, as it must. For large τ,
t(τ) ≈ (1/2a)e^{aτ}, x(τ) ≈ (1/2a)e^{aτ},    (3.1.11)
which is a parameterization (not by proper time) of the null ray t = x. [The trajectory is a hyperbola.]
The relation t = (1/a) sinh aτ relates the time τ which elapses on board the ship to the time t measured by clocks left behind on earth.
To reach distance D, you have to solve D = (1/a)(cosh aτ − 1). If D is reasonably large, cosh aτ ≈ exp(aτ)/2, so
τ ≈ (1/a) log(2aD).    (3.1.12)
This logarithmic relationship means that in principle, with modest accelerations from rest, a uniformly accelerating spaceship can cover interstellar distances in reasonable times (as measured by the astronauts). For example, suppose that we measure distance in light-years and time in years. There are about 10¹⁶ metres in a light-year and 3 × 10⁷ seconds in a year.
So the acceleration due to gravity, 10 m s⁻², is equal to 10 × 10⁻¹⁶ × (3 × 10⁷)² light-years per year². This is (miraculously) approximately 1. So according to the above, a spaceship accelerating so that the astronauts would feel one earth gravity on board would cover a distance of D light-years in τ years, where τ ≈ log(2D) if D is reasonably large. If D = 100, then
τ = log(200) ≈ 5.3 years.
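This estimate is easy to test numerically (units: light-years and years, with a = 1; a sketch of the approximation, not an exact solution):

```python
import math

# Units: light-years and years, with a = 1 (approximately earth gravity).
a = 1.0
D = 100.0
tau = math.log(2.0 * a * D) / a          # approximate proper time, (3.1.12)
D_exact = (math.cosh(a * tau) - 1) / a   # distance actually covered, (3.1.10)
print(tau, D_exact)  # tau ≈ 5.3 years; D_exact ≈ 99, close to the target 100
```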
3.1.2. Relativistic motion in a circle. (Cf. Woodhouse, SR, p. 111.)
Suppose that a particle moves on a circle
x(t) = R cos ωt, y(t) = R sin ωt, z(t) = 0.
In other words,
γ(t) = (t, R cos ωt, R sin ωt, 0).
We do not claim t is proper time, and the first thing to work out is the relation between t and τ. We have
dγ/dt = (1, −Rω sin ωt, Rω cos ωt, 0),
which has Minkowski length-squared equal to 1 − R²ω². Thus
dτ/dt = √((dγ/dt, dγ/dt)) = √(1 − R²ω²).
Differentiating again, one finds that the magnitude of the particle's proper acceleration is ω²R/(1 − R²ω²).
Suppose we have k particles with 4-momenta P₁, …, Pₖ which interact (for example in the LHC) and after the interaction there are m outgoing particles with momenta Q₁, …, Qₘ. The basic assumption is the conservation of total 4-momentum:
Q₁ + ⋯ + Qₘ = P₁ + ⋯ + Pₖ.    (3.2.2)
If Alice is an inertial observer with 4-velocity U, and we have a particle of rest-mass m and 4-velocity V, so that its 4-momentum is P = mV, then Alice can look at the spatial and temporal parts of P. If she measures the velocity of the particle in her rest frame as v, then
P = mγ(v)(1, v)    (3.2.3)
where as usual
γ(v) = 1/√(1 − v²), v = |v|.    (3.2.4)
Expanding this using the binomial expansion for small v,
γ(v) = 1 + v²/2 + O(v⁴),
we see that
P ≈ m(1 + v²/2)(1, v) = (m + mv²/2, mv) + O(v³).
Now the term mv²/2 appearing here is the classical kinetic energy of a particle of mass m moving at speed v. The spatial component mv is just the classical momentum.
Einstein's conclusion from these considerations was the equivalence of mass and energy: that a particle of rest-mass m should have total energy mc²: an observer with 4-velocity U assigns total energy (U, P) to a particle with 4-momentum P. The very surprising conclusion is that a free particle of mass m has to be assigned energy mc² by an inertial observer for whom the particle is at rest.
Suppose that a particle moving with speed u collides elastically with an identical particle (rest-mass m) at rest, the outgoing particles having speeds v and w. The 4-momenta are then
P₁ = mγ(u)(1, u), P₂ = m(1, 0), Q₁ = mγ(v)(1, v), Q₂ = mγ(w)(1, w).    (3.2.6)
Squaring both sides of the conservation equation
P₁ + P₂ = Q₁ + Q₂    (3.2.7)
gives
(P₁ + P₂, P₁ + P₂) = (Q₁ + Q₂, Q₁ + Q₂),    (3.2.8)
which yields
2m² + 2(P₁, P₂) = 2m² + 2(Q₁, Q₂)    (3.2.9)
since (P₁, P₁) = (P₂, P₂) = (Q₁, Q₁) = (Q₂, Q₂) = m². Using (3.2.6) to compute the cross-terms in (3.2.9), we get
m²γ(u) = m²γ(v)γ(w)(1 − v · w).    (3.2.10)
This is useful because v · w = vw cos θ, and cos θ is what we are looking for. On the other hand, we need to eliminate γ(u). This can be done by taking the scalar product with P₂, or, in down-to-earth terms, just by inspecting the temporal component of the conservation equation (3.2.7), which gives
γ(u) + 1 = γ(v) + γ(w).    (3.2.11)
Combining this with (3.2.10) gives
1 − vw cos θ = (γ(v) + γ(w) − 1)/(γ(v)γ(w))    (3.2.12)
so
vw cos θ = 1 − (γ(v) + γ(w) − 1)/(γ(v)γ(w))
         = (γ(v)γ(w) − γ(v) − γ(w) + 1)/(γ(v)γ(w))
         = (γ(v) − 1)(γ(w) − 1)/(γ(v)γ(w))
         = (1 − 1/γ(v))(1 − 1/γ(w)).    (3.2.13)
This is the result. Notice that vγ(v) = v/√(1 − v²) = √(γ(v)² − 1), so a nice way of writing this is
cos θ = √( (γ(v) − 1)(γ(w) − 1) / ((γ(v) + 1)(γ(w) + 1)) ).    (3.2.14)
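The two forms (3.2.13) and (3.2.14) of the answer can be compared numerically (the outgoing speeds below are arbitrary illustrative values):

```python
import math

def gamma(v):
    return 1.0 / math.sqrt(1.0 - v * v)

# Outgoing speeds (sample data):
v, w = 0.6, 0.5
gv, gw = gamma(v), gamma(w)

# Form (3.2.13): v w cos(theta) = (1 - 1/gamma(v))(1 - 1/gamma(w)).
cos_theta_1 = (1.0 - 1.0 / gv) * (1.0 - 1.0 / gw) / (v * w)

# Form (3.2.14), using v * gamma(v) = sqrt(gamma(v)^2 - 1):
cos_theta_2 = math.sqrt((gv - 1.0) * (gw - 1.0) / ((gv + 1.0) * (gw + 1.0)))
print(cos_theta_1, cos_theta_2)  # agree
```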
Example 3.3.1. A photon with frequency ω collides with an electron at rest in an inertial frame. After the collision, the frequency of the photon is ω′. Obtain a relation between the scattering angle θ of the photon, the frequencies and the rest-mass of the electron.
The initial momenta (in the electron's rest frame) are
P₁ = ℏω(1, e), P₂ = (m, 0),    (3.3.3)
and the final momenta are
Q₁ = ℏω′(1, e′), Q₂ = mγ(v)(1, v),    (3.3.4)
where e and e′ are unit vectors and the scattering angle θ satisfies e · e′ = cos θ. Conservation of 4-momentum reads
P₁ + P₂ = Q₁ + Q₂.    (3.3.5)
Since we want to know cos θ, we want to square (3.3.5) in such a way as to get e · e′ as a cross term. For this purpose we rearrange it so that both photon momenta are on the same side of the equation:
P₁ − Q₁ = Q₂ − P₂,    (3.3.6)
which gives
(P₁ − Q₁, P₁ − Q₁) = −2(Q₁, P₁) = (Q₂ − P₂, Q₂ − P₂) = 2m² − 2m²γ(v).    (3.3.7)
Simplifying,
ℏ²ωω′(1 − cos θ) = m²(γ(v) − 1).    (3.3.8)
Looking at the temporal component of (3.3.5), we find
ℏω + m = mγ(v) + ℏω′, i.e. m(γ(v) − 1) = ℏ(ω − ω′),    (3.3.9)
so finally
ℏωω′(1 − cos θ) = m(ω − ω′).    (3.3.10)
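The chain (3.3.8)-(3.3.10) can be checked numerically: solve (3.3.10) for ω′ and substitute back (units with ℏ = c = 1; the particular numbers are arbitrary):

```python
import math

# Compton scattering check, in units with c = hbar = 1.
m = 1.0                     # electron rest mass
omega = 0.8                 # incoming photon frequency
theta = math.radians(60.0)  # scattering angle

# Solve (3.3.10) for the outgoing frequency omega':
omega_p = m * omega / (m + omega * (1.0 - math.cos(theta)))

# Consistency check: the electron's gamma factor from (3.3.9) ...
gamma_v = 1.0 + (omega - omega_p) / m
# ... makes both sides of (3.3.8) agree:
lhs = omega * omega_p * (1.0 - math.cos(theta))
rhs = m * m * (gamma_v - 1.0)
print(lhs, rhs)  # equal
```

Note that ω′ < ω whenever θ > 0: the scattered photon always loses energy to the electron.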
CHAPTER 4
Multivariable calculus
4.1. Smooth functions and changes of coordinates
4.1.1. Smooth functions. Let Ω be an open set of Rⁿ. Open means that Ω is a possibly infinite union of open balls. The open balls of Rⁿ are all of the form
B(a, r) = {x ∈ Rⁿ such that |x − a| < r}    (4.1.1)
where a is any point of Rⁿ and r > 0. It is the strict inequality in (4.1.1) that makes B(a, r) an open ball.
Let f : Ω → R be a real-valued function in Ω. I assume you know what partial derivatives are.
Definition 4.1.1. A function f is smooth, also written C^∞, if all partial derivatives, of any order, exist. That is, for any non-negative integers α₁, …, αₙ,
(∂^{α₁}/∂(x¹)^{α₁}) ⋯ (∂^{αₙ}/∂(xⁿ)^{αₙ}) f(p)    (4.1.2)
exists for every point p of Ω. The set of all functions which are smooth in Ω is denoted by C^∞(Ω). Then C^∞(Ω) is an infinite-dimensional vector space.
Remark 4.1.2. The order of the partial derivative in (4.1.2) is α = α₁ + ⋯ + αₙ.
Recall that for smooth functions, partial differentiation with respect to different variables commutes, in the sense that
(∂/∂xⁱ)(∂/∂xʲ) f(x) = (∂/∂xʲ)(∂/∂xⁱ) f(x)    (4.1.3)
for all 1 ≤ i, j ≤ n.
Remark 4.1.3. Smooth functions model scalar physical quantities such as density, pressure, charge-density, temperature, ...
4.1.2. Changes of coordinates. We have tacitly taken the (x¹, …, xⁿ) to be standard linear coordinates in Rⁿ (i.e. associated with the standard basis of Rⁿ). It is fairly clear that the idea of smoothness of functions should be independent of the choice of such linear coordinates, but here we are going to take things further by considering more general coordinate systems. As far as GR goes, this is a requirement of trying to make a theory that is generally covariant (i.e. transforms predictably under general changes of coordinates).
So, what are changes of coordinates?
Definition 4.1.4. A change of coordinates, written in compact form x = x(y), is a collection of smooth functions
x¹ = x¹(y¹, …, yⁿ)
x² = x²(y¹, …, yⁿ)
⋮
xⁿ = xⁿ(y¹, …, yⁿ)
with the further properties:
x = x(y) gives a 1:1 correspondence between points x ∈ Ω and points y in some other open set Ω′;
the inverse map y = y(x), from Ω′ to Ω, is also given by smooth functions.
Example 4.1.5. A linear map
x = Ly    (4.1.4)
is invertible if the matrix L is invertible. (In this case, the Jacobian of the transformation is L.)
Example 4.1.6. Plane polar coordinates:
x = r cos θ, y = r sin θ.
(To make this look like the above, write it as
x¹ = y¹ cos y², x² = y¹ sin y²,
i.e. (x, y) = (x¹, x²), (r, θ) = (y¹, y²).) The Jacobian is
( x_r  x_θ )   ( cos θ   −r sin θ )
( y_r  y_θ ) = ( sin θ    r cos θ )
This gives a change of coordinates between suitable open sets, for example between the set r > 0, 0 < θ < 2π and the plane with the non-negative x-axis removed.
Remark 4.1.7. Make sure you understand why we have to restrict the values of (y¹, y²) to get a change of coordinates.
We can think of a change of coordinates more actively as follows. Given a function f in C^∞(Ω), we get a new function f̃ ∈ C^∞(Ω′) by the formula
f̃(y) = f(x(y)).    (4.1.5)
Similarly, given g̃ ∈ C^∞(Ω′), we get g ∈ C^∞(Ω) by the formula
g(x) = g̃(y(x)).    (4.1.6)
Remark 4.1.8. The fancy terminology for this is pull-back: f̃ is obtained from f by pulling back by the change-of-coordinates map Ω′ → Ω. If you find this helpful (because you've seen it elsewhere), fine. If you haven't, don't worry.
The fact that f̃ is smooth if f is smooth follows from the chain rule:
(∂f̃/∂yʲ)(y) = Σᵢ (∂xⁱ/∂yʲ)(∂f/∂xⁱ)(x) where x = x(y),    (4.1.7)
which I assume you've seen before and are happy with. The matrix whose components are (∂xʲ/∂yⁱ) entering in (4.1.7) is called the Jacobian of the coordinate transformation x = x(y).
The Jacobian matrix (∂yʲ/∂xⁱ) of the inverse transformation y = y(x) is the inverse of the Jacobian matrix (∂xʲ/∂yⁱ). As an exercise, you can verify this by using the chain rule to differentiate the equation
xʲ(y(x)) = xʲ for j = 1, …, n,    (4.1.8)
which is true by definition of the transformations being inverse to each other. In particular, for a change of coordinates, the Jacobian matrix must be invertible everywhere.
The explicit forms of these inverse relationships are
Σⱼ (∂xᵏ/∂yʲ)(∂yʲ/∂xⁱ) = δᵏᵢ,  Σⱼ (∂yᵏ/∂xʲ)(∂xʲ/∂yⁱ) = δᵏᵢ,    (4.1.9)
where the sums run over j = 1, …, n and δᵏᵢ is the Kronecker delta.
Remark 4.1.9. The inverse function theorem says that if x = x(y) is just smooth for y ∈ Ω′ and x ∈ Ω, and if the Jacobian matrix is invertible at a point q, say, in Ω′, then in fact x = x(y) is invertible, at least if you restrict the transformation to a small ball B′ containing q inside Ω′ and its image W. That is, after restricting in this way, there is a smooth y = y(x), for x ∈ W with y(x) ∈ B′, inverting x = x(y).
Thus the inverse function theorem is a (partial) converse to the fact that Jacobians of coordinate transformations must be invertible.
Example 4.1.10. In the case of polar coordinates, the determinant of the Jacobian is just r. This is invertible if and only if r ≠ 0. This ties in with the fact that polar coordinates go wrong at r = 0.
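The inverse relationship (4.1.9) between the two Jacobian matrices can be verified numerically for polar coordinates (helper names are ours):

```python
import math

def jacobian_xy_of_rtheta(r, theta):
    """Jacobian of (x, y) = (r cos(theta), r sin(theta))."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]

def jacobian_rtheta_of_xy(x, y):
    """Jacobian of the inverse map r = sqrt(x^2 + y^2), theta = atan2(y, x)."""
    r = math.hypot(x, y)
    return [[x / r, y / r],
            [-y / r**2, x / r**2]]

r, theta = 2.0, 0.7
x, y = r * math.cos(theta), r * math.sin(theta)
A = jacobian_xy_of_rtheta(r, theta)
B = jacobian_rtheta_of_xy(x, y)

# Their product should be the 2x2 identity, as in (4.1.9).
P = [[sum(B[i][k] * A[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]
print(P)  # ≈ [[1, 0], [0, 1]]
```

Note also that the determinant of A is r cos²θ + r sin²θ = r, which is exactly the determinant computed in Example 4.1.10.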
4.2. Two types of vector
Many physical quantities are vectorial, as we know. We now consider vectorial quantities in the context of general coordinate transformations. A major subtlety is that there are two different kinds of vectorial quantities, and we need to be clear on the difference between them.
4.2.1. Vector fields. A vector field in Ω is a smooth first-order differential operator of the form
V = Σⱼ Vʲ(x) ∂/∂xʲ    (4.2.1)
where the Vʲ are smooth functions in Ω. If f ∈ C^∞(Ω), we obtain a new function Vf ∈ C^∞(Ω), called the derivative of f along V:
Vf(x) = Σⱼ Vʲ(x) (∂f/∂xʲ)(x).    (4.2.2)
4.2.2. Covector fields. A covector field in Ω is an expression of the form
ω = Σⱼ ωⱼ(x) dxʲ    (4.2.3)
where the ωⱼ are smooth functions in Ω. The basic example is
df = Σⱼ (∂f/∂xʲ) dxʲ,    (4.2.4)
also known as the exterior derivative or differential of f. To match up the notation between (4.2.3) and (4.2.4), df is the covector whose components are ωⱼ = ∂f/∂xʲ.
Remark 4.2.1. If $\omega$ is a covector field in $\Omega$ it is not generally true that $\omega = df$ for some function $f$. (Indeed the condition
$$\frac{\partial \omega_j}{\partial x^i} = \frac{\partial \omega_i}{\partial x^j}$$
is necessary for $\omega = df$, by (4.1.3).)
A covector field is also known as a tensor (field) of type $(0, 1)$.
4.2.3. Dual pairing (or contraction). Given a vector field $V$ and a covector field $\omega$, the contraction $\langle V, \omega\rangle$ is a scalar function defined in terms of components by
$$\langle V, \omega\rangle = \sum_{j=1}^n V^j(x)\,\omega_j(x). \qquad (4.2.5)$$
The directional derivative is an example of this contraction: from the above formulae,
$$\langle V, df\rangle = Vf. \qquad (4.2.6)$$
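The contraction (4.2.5) and the identity (4.2.6) can be checked in a small symbolic sketch; the particular $f$ and components $V^j$ below are made-up examples, not taken from the notes:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 * sp.sin(x2)           # a scalar function f (hypothetical example)
V = [x2, x1]                     # components V^j of a vector field (hypothetical example)

# Components of the covector field df: omega_j = df/dx^j, as in (4.2.4)
omega = [sp.diff(f, x1), sp.diff(f, x2)]

# Contraction <V, omega> = sum_j V^j omega_j, eq. (4.2.5)
contraction = sum(Vj * oj for Vj, oj in zip(V, omega))

# Directional derivative V f = sum_j V^j df/dx^j, eq. (4.2.2)
Vf = V[0] * sp.diff(f, x1) + V[1] * sp.diff(f, x2)

assert sp.simplify(contraction - Vf) == 0   # <V, df> = Vf, eq. (4.2.6)
```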
4. MULTIVARIABLE CALCULUS
4.2.4. Transformation laws for vector fields and covector fields. Let $V$ be a vector field in $\Omega$ and let $x = x(y)$ (for $y \in \Omega'$) be a change of coordinates with inverse $y = y(x)$. We get a vector field $\widetilde V$ in $\Omega'$ as follows: $\widetilde V$ is supposed to differentiate functions in $\Omega'$. We know how to differentiate functions in $\Omega$. But we can transfer a function in $\Omega'$ to $\Omega$ by change of variables. So we define
$$\widetilde V \widetilde g\,(y) = (Vg)(x(y)), \qquad (4.2.7)$$
where $g$ and $\widetilde g$ are related as in (4.1.6).
Proposition 4.2.2. If $V = \sum_j V^j(\partial/\partial x^j)$ in terms of the $x^j$ in $\Omega$ and $\widetilde V = \sum_j \widetilde V^j(\partial/\partial y^j)$ in terms of the $y^j$ in $\Omega'$, then
$$\widetilde V^i = \sum_j V^j\,\frac{\partial y^i}{\partial x^j}. \qquad (4.2.8)$$
Proof. The chain rule tells us everything: from $g(x) = \widetilde g(y(x))$, we get
$$\frac{\partial g}{\partial x^j} = \sum_i \frac{\partial y^i}{\partial x^j}\,\frac{\partial \widetilde g}{\partial y^i}.$$
Hence
$$Vg = \sum_j V^j\,\frac{\partial g}{\partial x^j} = \sum_{i,j} V^j\,\frac{\partial y^i}{\partial x^j}\,\frac{\partial \widetilde g}{\partial y^i}.$$
But the RHS is supposed to be $\sum_i \widetilde V^i(\partial \widetilde g/\partial y^i)$, so the result follows by equating coefficients.
Remark 4.2.3. This is an example of covariance (as opposed to invariance). The coefficients
of a vector depend on a choice of coordinates, but they transform in a predictable and linear
way. In particular if the coefficients are all zero at a given point in one coordinate system then
they are also zero in any other coordinate system. This is as it should be: if there is no wind
at a particular point (and time) in the atmosphere, then all observers should agree on this fact,
regardless of how they choose their coordinates!
The classical formula
$$dx^j = \sum_i \frac{\partial x^j}{\partial y^i}\,dy^i \qquad (4.2.9)$$
suggests that if $\widetilde\omega = \sum_j \widetilde\omega_j\,dy^j$ is to agree with
$$\sum_j \omega_j\,dx^j = \sum_{i,j} \omega_j\,\frac{\partial x^j}{\partial y^i}\,dy^i,$$
then
$$\widetilde\omega_i = \sum_j \omega_j\,\frac{\partial x^j}{\partial y^i}. \qquad (4.2.10)$$
Note that even allowing for the difference between upstairs and downstairs indices, (4.2.8) and (4.2.10) are different transformation laws.
The rule (4.2.10) gives a way of transferring a covector field $\omega$ in $\Omega$ to a new covector field $\widetilde\omega$ in $\Omega'$. We already have a rule for transferring vector fields from $\Omega$ to $\Omega'$. These are compatible in the sense that the contraction is invariant:
Proposition 4.2.4. Let $\omega$ and $V$ be a covector field and a vector field on $\Omega$ and let $\widetilde\omega$ and $\widetilde V$ be the corresponding covector field and vector field on $\Omega'$. Then we have
$$\langle \widetilde V, \widetilde\omega\rangle = \langle V, \omega\rangle. \qquad (4.2.11)$$
Using the fact that the Jacobians $(\partial y^i/\partial x^j)$ and $(\partial x^i/\partial y^j)$ are inverse to each other, (4.2.10) is seen to be equivalent to
$$\omega_j = \sum_i \widetilde\omega_i\,\frac{\partial y^i}{\partial x^j}. \qquad (4.2.13)$$
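The invariance of the contraction under the two transformation laws (4.2.8) and (4.2.10) can be checked symbolically. The sketch below uses the polar change of coordinates as a concrete example; the components V1, V2, w1, w2 are arbitrary symbols:

```python
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
V1, V2, w1, w2 = sp.symbols('V1 V2 w1 w2')   # arbitrary components at a point

# Change of coordinates x = x(y): Cartesian (x1, x2) from polar y = (r, theta).
x_of_y = sp.Matrix([r * sp.cos(th), r * sp.sin(th)])
J = x_of_y.jacobian([r, th])      # J[j, i] = dx^j / dy^i
Jinv = J.inv()                    # Jinv[i, j] = dy^i / dx^j

V = sp.Matrix([V1, V2])           # vector components V^j in the x coordinates
w = sp.Matrix([w1, w2])           # covector components w_j in the x coordinates

V_tilde = Jinv * V                # (4.2.8):  V~^i = V^j dy^i/dx^j
w_tilde = J.T * w                 # (4.2.10): w~_i = w_j dx^j/dy^i

# The contraction is invariant, eq. (4.2.11):
diff = sp.simplify((V_tilde.T * w_tilde)[0, 0] - (V.T * w)[0, 0])
assert diff == 0
```

The cancellation happens exactly because the two Jacobians are inverse to each other, which is the content of (4.2.13).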
Remark 4.2.5. It is possible to change the logic around: given the relation between $V$ and $\widetilde V$, which is natural in terms of the way vector fields are supposed to differentiate functions, we could have defined $\widetilde\omega$ in terms of $\omega$ so that (4.2.11) holds. This would then have implied the transformation law (4.2.10) for the coefficients of the covector field $\omega$.
4.2.5. Tangent space and cotangent space. We shall not make great use of the following, but they are really important in a more systematic development of these ideas.
If $p$ is a point of $\Omega$ then $T_p\Omega$, the tangent space to $\Omega$ at $p$, is defined to be the space of all directional derivatives acting at $p$. A typical element of $T_p\Omega$ is thus written
$$V = \sum_{j=1}^n V^j\,\left.\frac{\partial}{\partial x^j}\right|_p, \qquad (4.2.14)$$
acting on functions by
$$Vf = \sum_{j=1}^n V^j\,\frac{\partial f}{\partial x^j}(p). \qquad (4.2.15)$$
The cotangent space $T^*_p\Omega$ is the dual space of $T_p\Omega$; a typical element is written
$$\omega = \sum_{j=1}^n \omega_j\,dx^j|_p. \qquad (4.2.16)$$
This is also an $n$-dimensional vector space, independent of choice of coordinates. The duality between $T_p\Omega$ and $T^*_p\Omega$ is given by
$$\langle V, \omega\rangle_p = \sum_j V^j \omega_j, \qquad (4.2.17)$$
as above but now producing a number rather than a function on the RHS.
4.3. The Einstein summation convention
In the above (and in the previous chapters), there are many expressions involving a summation over repeated indices, one upstairs and one downstairs. The Einstein summation convention is to omit the summation symbol $\sum$, so that whenever a repeated index appears in an expression it is to be understood that you sum over the range of that index (in this case from 1 to $n$).
In order for this to work, of course, it is essential that if an index is repeated then it must not occur anywhere else in the expression, so for example
$$A_i B^i C_i \text{ is not OK.}$$
Multiple sums (unfortunately) very often occur. For instance, if $L$ and $\bar L$ are matrices with components $L^i{}_j$ and $\bar L^i{}_j$, so that $LX$ has components $L^i{}_j X^j$ and $\bar L X$ has components $\bar L^i{}_j X^j$, then $\bar L L X$ has components
$$[\bar L L X]^i = \bar L^i{}_p [LX]^p = \bar L^i{}_p [L^p{}_q X^q] = \bar L^i{}_p L^p{}_q X^q. \qquad (4.3.1)$$
When the summation convention is in operation, repeated indices are dummy indices in the sense that
$$A_i B^i = A_p B^p = A_s B^s,$$
as each of these is equal to
$$A_1 B^1 + A_2 B^2 + \cdots + A_n B^n.$$
The expression for the components of $\bar L L X$ in (4.3.1) is unpacked as
$$\sum_{p=1}^n \sum_{q=1}^n \bar L^i{}_p L^p{}_q X^q.$$
Suppose now that we want to write down the product $P$ of the traces $X^i{}_i$ and $Y^i{}_i$ of two matrices. The naive expression $X^i{}_i Y^i{}_i$ is ambiguous because the index $i$ has been overworked, appearing 4 times. So before putting them together we should change one of the dummy indices, writing (say)
$$Y^i{}_i = Y^j{}_j.$$
Thus
$$P = X^i{}_i\,Y^j{}_j$$
is an unambiguous way to write $P$ using the summation convention.
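The summation convention translates directly into numpy's einsum notation, and the statements above about dummy indices can be checked numerically (a sketch with random data, not from the notes):

```python
import numpy as np

n = 4
rng = np.random.default_rng(0)
A = rng.standard_normal(n)
B = rng.standard_normal(n)
L = rng.standard_normal((n, n))
Lbar = rng.standard_normal((n, n))
X = rng.standard_normal(n)

# A_i B^i: the repeated index is summed; the name of the dummy index is irrelevant.
s1 = np.einsum('i,i->', A, B)
s2 = np.einsum('p,p->', A, B)
assert np.isclose(s1, s2)

# (Lbar L X)^i = Lbar^i_p L^p_q X^q, eq. (4.3.1): two separate dummy indices p and q.
Y1 = np.einsum('ip,pq,q->i', Lbar, L, X)
Y2 = Lbar @ (L @ X)
assert np.allclose(Y1, Y2)
```

Note that einsum enforces the convention's hygiene for you: reusing an index name in two unrelated slots changes the meaning of the expression, just as in the $X^i{}_i Y^i{}_i$ example.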
Example 4.3.1. Write the expression $X^i{}_j Y^j$ ...
If $f \in C^\infty(\Omega)$, then we get a function of $\tau$,
$$F(\tau) = f(\gamma(\tau)). \qquad (4.4.1)$$
This is the function $f$ along the curve. If $\gamma$ is the world-line of an observer and $f$ is some physical quantity (like pressure) then $F(\tau)$ would be the pressure measured by the observer at different times along her worldline.
If the components of $\gamma$ are $(x^1(\tau), \dots, x^n(\tau))$ as before, then we compute
$$\frac{dF}{d\tau} = \dot x^j(\tau)\,\frac{\partial f}{\partial x^j}, \qquad (4.4.2)$$
so that $\gamma$ defines the vector field
$$\frac{d}{d\tau} = \dot x^j\,\frac{\partial}{\partial x^j} \qquad (4.4.3)$$
along $\gamma$. Here the LHS will be used as short-hand for the RHS!
In contrast to the vector fields we've considered before, this one is only defined along the curve $\gamma$ and not in an open set $\Omega$. Notice that the definition (4.4.2) is independent of any choice of coordinates and gives a suitably invariant definition of tangent vector to the curve $\gamma$.
In so far as $\gamma$ is a mapping from a subset $I$ of $\mathbb{R}$ into a subset $\Omega$ of $V$, its derivative $\dot\gamma(\tau)$ is a mapping from $I$ into $V$. Again, it is better to regard $\dot\gamma(\tau)$ as being in the tangent space at $\gamma(\tau)$,
$$\dot\gamma(\tau) \in T_{\gamma(\tau)}\Omega.$$
Then $\dot\gamma(\tau)$ is called the tangent vector to $\gamma$ at the point $\gamma(\tau)$.
4.5.1. Tensor fields of type (0, 2). A tensor field of type $(0, 2)$ in $\Omega$ is an expression of the form
$$B = B_{ij}\,dx^i\,dx^j, \qquad (4.5.1)$$
where the $B_{ij}$ are smooth functions in $\Omega$ (and the summation convention is in force).
(In the mathematical literature, you will often see the notation $dx^i \otimes dx^j$ on the RHS. I think this can be fairly safely ignored in this course. $\otimes$ is pronounced 'tensor', by the way.)
The transformation law for the components $B_{ij}$ is deduced as for covector fields: from (4.2.9),
$$B = B_{ij}\,dx^i\,dx^j = B_{ij}\,\frac{\partial x^i}{\partial y^p}\,\frac{\partial x^j}{\partial y^q}\,dy^p\,dy^q, \qquad (4.5.2)$$
so if $\widetilde B_{pq}$ are the components of $B$ in the $y$ coordinates,
$$\widetilde B_{pq} = B_{ij}\,\frac{\partial x^i}{\partial y^p}\,\frac{\partial x^j}{\partial y^q}. \qquad (4.5.3)$$
Note that if $X$ and $Y$ are vector fields and $B$ is a tensor field of type $(0, 2)$ we can form
$$\omega = \omega_i\,dx^i, \qquad \omega_i = B_{ij} Y^j. \qquad (4.5.4)$$
As a differential geometer, I might write this as $\omega = B(\cdot, Y)$ if I wanted to avoid using indices and components.
Proposition 4.5.1. $\omega$ defined from $B$ and $Y$ as above is a covector field.
The proof is left as an exercise: you have to write down the transformation laws and check that the components of $\omega$ transform correctly.
We can further form the scalar
$$B(X, Y) = B_{ij}\,X^i Y^j. \qquad (4.5.5)$$
As the notation suggests, this is a well-defined scalar function; its value at a point does not depend on the coordinates used to write out the components of $B$, $X$ and $Y$.
Remark 4.5.2. The formula (4.5.5) gives another way to think about tensors of type $(0, 2)$. Namely, you can reverse the logic and define such a $B$ to be a smoothly varying bilinear form $B_p$ on the tangent space $T_p\Omega$, for each $p \in \Omega$. Smoothness can be defined by saying that the coefficients $B_{ij}$ are smooth functions in $\Omega$ for any choice of coordinates $x$. If we do this for the tangent space, and require $B(X, Y)$ to be invariant (i.e. independent of choice of coordinates), then we say that $B$ is a tensor of type $(0, 2)$.
Tensors of type (0, 2) are important because the metric tensor, which is the fundamental
object in GR, is an example.
Remark 4.5.3. It is very important not to switch the order of the $dx^i$ symbols in computations of this kind. In other words, $dx^i\,dx^j \neq dx^j\,dx^i$. Indeed the first one is the bilinear form $B$ such that $B(X, Y) = 0$ unless $X = \partial_i$, $Y = \partial_j$, whereas the second represents the bilinear form $C$ such that $C(X, Y) = 0$ unless $X = \partial_j$, $Y = \partial_i$.
4.5.2. Tensor fields of type (1, 1). Suppose that for each $p$ in $\Omega$, we have a linear map $A(p)$ from $T_p\Omega$ to $T_p\Omega$ which varies smoothly with $p$. Such a thing is called a smooth tensor field of type $(1, 1)$. In coordinates, $A$ has an expression of the form
$$A = A^k{}_j\,dx^j \otimes \frac{\partial}{\partial x^k}, \qquad (4.5.6)$$
where the $A^k{}_j$ are a collection of $n^2$ functions of $x$. $A$ can be pictured as an $n \times n$ matrix whose entries are smooth functions of $x$.
We obtain the transformation law under a change of coordinates by substituting
$$\frac{\partial}{\partial x^k} = \frac{\partial y^q}{\partial x^k}\,\frac{\partial}{\partial y^q}, \qquad dx^j = \frac{\partial x^j}{\partial y^p}\,dy^p \qquad (4.5.7)$$
into (4.5.6), getting
$$A = A^k{}_j\,\frac{\partial x^j}{\partial y^p}\,\frac{\partial y^q}{\partial x^k}\;dy^p \otimes \frac{\partial}{\partial y^q}. \qquad (4.5.8)$$
Hence the transformation law is
$$\widetilde A^q{}_p = A^k{}_j\,\frac{\partial x^j}{\partial y^p}\,\frac{\partial y^q}{\partial x^k}. \qquad (4.5.9)$$
Example 4.5.4. The identity matrix is an example of a $(1, 1)$ tensor. Its components in any coordinate system are the Kronecker $\delta$, $\delta^k_j$.
The transformation law (4.5.9) means that if $X$ is a vector field on $\Omega$ then $Y = AX$, with components $A^j{}_k X^k$, is again a vector field on $\Omega$. This means that under coordinate transformations we have $\widetilde Y = \widetilde A \widetilde X$, where the relation between $\widetilde A$ and $A$ is given by (4.5.9) and the transformation law (4.2.8) is used for the components of the vector fields $X$ and $Y$.
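That $Y = AX$ is coordinate-independent can be checked numerically: transform $A$ by (4.5.9) and $X$, $Y$ by (4.2.8) using a randomly chosen (hypothetical) Jacobian, and compare. A sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n))      # components A^k_j of a (1,1) tensor at a point
X = rng.standard_normal(n)           # components X^j of a vector

# A (hypothetical) invertible Jacobian dy^i/dx^j at the point in question.
P = rng.standard_normal((n, n)) + 3 * np.eye(n)
Pinv = np.linalg.inv(P)              # dx^j/dy^p

Y = A @ X                            # Y^k = A^k_j X^j

# Transform everything to the y coordinates:
A_tilde = np.einsum('qk,kj,jp->qp', P, A, Pinv)   # (4.5.9)
X_tilde = P @ X                                   # (4.2.8)
Y_tilde = P @ Y

# Y = AX is coordinate-independent: Y~ = A~ X~.
assert np.allclose(A_tilde @ X_tilde, Y_tilde)
```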
4.5.3. Tensor fields of type (2, 0). We've seen tensors with two downstairs indices and one up and one down. The zoo of two-index tensors is completed by the ones with two upstairs indices.
We give the transformation law first:
Definition 4.5.5. A tensor field $H$ of type $(2, 0)$ is an object whose components $H^{ij}$ after a choice of coordinates transform according to the rule
$$\widetilde H^{pq} = H^{ij}\,\frac{\partial y^p}{\partial x^i}\,\frac{\partial y^q}{\partial x^j}. \qquad (4.5.10)$$
More generally, a tensor field $T$ of type $(r, s)$ has components $T^{i_1 \dots i_r}{}_{j_1 \dots j_s}$ which transform according to the rule
$$\widetilde T^{p_1 \dots p_r}{}_{q_1 \dots q_s} = T^{i_1 \dots i_r}{}_{j_1 \dots j_s}\,\frac{\partial y^{p_1}}{\partial x^{i_1}} \cdots \frac{\partial y^{p_r}}{\partial x^{i_r}}\,\frac{\partial x^{j_1}}{\partial y^{q_1}} \cdots \frac{\partial x^{j_s}}{\partial y^{q_s}}. \qquad (4.6.2)$$
The tensor product of a tensor $T$ of type $(r, s)$ with a tensor $S$ of type $(p, q)$ is the tensor of type $(r + p, s + q)$ whose components are the products of the components of $T$ and $S$.
If $T$ is of type $(r, s)$, then picking a pair of indices, one up and one down, we have a contraction of $T$, a tensor of type $(r - 1, s - 1)$. For example if we pick the first indices upstairs and downstairs, we get
$$T^{i\,i_2 \dots i_r}{}_{i\,j_2 \dots j_s}.$$
Recall that by the summation convention, this is actually a sum over the index $i$.
Contraction of a different pair of indices will generally give a different tensor.
We note that the tensor product is distributive over addition.
4.7. Manifolds
A manifold is, roughly speaking, a topological space $M$, with the additional structure necessary to be able to speak of smooth functions from $M$ to $\mathbb{R}$. This additional structure is called a smooth atlas and consists of systems of local coordinates satisfying certain compatibility conditions. A function $M \to \mathbb{R}$ is then called smooth if it is smooth when written in terms of any of these local coordinate systems.
We are not going to get into the details of what a topological space is: it is a set of points with enough structure (open sets) to be able to define continuous functions.
As in the definition of curvilinear coordinates on an open subset of $\mathbb{R}^n$, suppose we have a set of $n$ continuous functions $x(p) = (x^1(p), \dots, x^n(p))$ from $U \subset M$ with image some open set $\Omega$ of $\mathbb{R}^n$.
Definition 4.7.1. The functions $(x^1, \dots, x^n)$ from $U$ to $\Omega$ form a local coordinate system on $M$ if the map $U \to \Omega$ is one-one and onto, and if the inverse is continuous.
Thus every point $p$ of $U$ gets labelled by an ordered set of $n$ real numbers which we're calling the coordinates of $p$, and conversely if this set of labels is taken from $\Omega$, then it is the label of one and only one point of $U$.
Definition 4.7.2. If p0 2 U is a given point, we say the coordinate system is centred at p0
if xj (p0 ) = 0 for all j.
Definition 4.7.3. An atlas on $M$ is a collection of local coordinate systems $x_\alpha : U_\alpha \to \Omega_\alpha$, where the open sets $U_\alpha$ cover $M$.
In this definition the subscript $\alpha$ does not refer to the different components of the coordinate system, but rather to the different local coordinate systems needed to cover $M$.
Remark 4.7.4. The individual local coordinate systems $x_\alpha : U_\alpha \to \Omega_\alpha$ are often called charts: thus an atlas is a set of a lot of charts (which made a lot more sense before everyone was using GPS to find their way around).
On an overlap $U_\alpha \cap U_\beta$, a point $p$ carries two sets of coordinates, $x_\alpha(p)$ and $x_\beta(p)$, related by the transition map
$$x_\beta \circ x_\alpha^{-1}, \qquad (4.7.1)$$
and the compatibility condition in the definition of a smooth atlas is that these transition maps are smooth wherever they are defined.
The previous example generalises to functions of any number of variables, but the condition on the non-vanishing of at least one of the partials at every point on the level-set $f = c$ continues to be essential.
It is interesting to note that the null cone, defined by
$$t^2 - x^2 - y^2 - z^2 = 0,$$
exactly fails to satisfy this condition on the partials at the origin: all partials vanish there as well.
4.7.1. The tangent space. Let M be a smooth manifold. For each point p in M , we
can use the local coordinates defined near p to define the tangent space. We can either say
that it is the abstract vector space spanned by the partials corresponding to any choice of
local coordinates from the atlas or we can say that it is the space of directional derivatives at
p (and then show that this space is an n-dimensional vector space). Either way Tp M is an
n-dimensional vector space naturally associated with the point p.
We can now define a vector field $X$ on $M$ as a function which assigns to each $p$ in $M$ a vector $X_p$ in $T_pM$, which is required to vary smoothly with $p$. As in the case of an open subset of $\mathbb{R}^n$, varying smoothly with $p$ means: when expanded as a linear combination of the $\partial/\partial x^j$, the coefficients are smooth (in the domain of the coordinate system).
4.7.2. The cotangent space. This is the dual to the tangent space. If $f$ is a smooth function on $M$, then $df$ is a smooth covector field on $M$: at each point it is in the dual space $T^*_pM$ and varies smoothly with $p$.
If $X$ is a vector field and $f$ is a function, then $Xf$ is the directional derivative of $f$ with respect to $X$. It is again a smooth function on $M$. It can also be written $\langle X, df\rangle$, where $\langle \cdot, \cdot\rangle$ is the pairing between $TM$ and $T^*M$.
4.7.3. General tensors. Taking it further, we can extend the idea of tensor field of any type $(r, s)$ to a manifold $M$, using the low-tech definition above: in any chart a tensor is given by a collection of components, and these are required to transform according to (4.6.2), where $x = x(y)$ is any of the change-of-coordinates maps arising from a smooth atlas on $M$.
4.7.4. Other smooth gadgets. With the aid of a smooth atlas we can define more than just smooth functions on $M$. For example, a smooth curve on $M$ is defined as a continuous map $\gamma : I \to M$ ($I$ is an interval) with the property that the corresponding maps $\gamma_\alpha : I \to \Omega_\alpha$ are all smooth, where
$$\gamma_\alpha(\tau) = x_\alpha(\gamma(\tau)) \quad \text{if } \gamma(\tau) \in U_\alpha.$$
Similarly (I'll omit the details) if $M$ and $M'$ are two manifolds, and $F : M \to M'$ is a mapping, we can define what it means for $F$ to be a smooth map between manifolds. The idea is that we can look at $F$ using the charts on $M$ and $M'$ and define $F$ to be smooth if and only if all these functions are smooth. The interested reader is referred to any standard introductory book on differential geometry.
Remark 4.7.13. The formalism of general relativity works most naturally on the assumption
that space-time is a smooth 4-dimensional manifold. This is particularly important when trying
to understand black holes and the large-scale structure of the universe. For the purposes of this
course we shall mostly work with space-times that are subsets of R4 : but we shall need to work
as if it were a manifold, in other words, without assigning any privileged role to the standard
flat coordinates on R4 .
CHAPTER 5
The metric is written
$$ds^2 = g_{ab}\,dx^a\,dx^b. \qquad (5.1.1)$$
Here the summation convention is in force and the components $g_{ab}$ of $g$ with respect to the coordinates $x^a$ form a $4 \times 4$ symmetric matrix whose entries are smooth functions. To say the metric is lorentzian is to say that at any point $x$, the matrix $(g_{ab}(x))$ is invertible and has signature $(+, -, -, -)$.
The inverse metric has components $g^{ab}$, characterised by
$$g^{ab}\,g_{bc} = \delta^a_c. \qquad (5.1.2)$$
$$= dr^2 + r^2\,d\theta^2.$$
Note that the coefficient of the cross-term $dr\,d\theta + d\theta\,dr$ is zero.
Example 5.1.6. If
$$ds^2 = dr^2 + r^2\,d\theta^2,$$
then writing $x^1 = r$, $x^2 = \theta$,
$$g_{11} = 1, \quad g_{12} = g_{21} = 0, \quad g_{22} = r^2.$$
Hence
$$g^{11} = 1, \quad g^{12} = g^{21} = 0, \quad g^{22} = \frac{1}{r^2}.$$
This is because
$$g_{ij} = \begin{pmatrix} 1 & 0 \\ 0 & r^2 \end{pmatrix} \quad \text{and} \quad g^{ij} = \begin{pmatrix} 1 & 0 \\ 0 & r^{-2} \end{pmatrix}.$$
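The claim of Example 5.1.6 — that the matrix $(g^{ij})$ is the inverse of $(g_{ij})$ in the sense of (5.1.2) — is easy to confirm symbolically (a sketch using sympy):

```python
import sympy as sp

r = sp.symbols('r', positive=True)

g = sp.Matrix([[1, 0],
               [0, r**2]])          # g_ij for ds^2 = dr^2 + r^2 dtheta^2

g_inv = g.inv()                      # g^ij, defined by g^ab g_bc = delta^a_c
assert g_inv == sp.Matrix([[1, 0], [0, r**-2]])
assert sp.simplify(g_inv * g) == sp.eye(2)
```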
Example 5.1.9. Spherical polars. 3-dimensional spherical polar coordinates are given by
$$x = r\sin\theta\cos\varphi, \quad y = r\sin\theta\sin\varphi, \quad z = r\cos\theta. \qquad (5.1.3)$$
Thus, very near any given point $p$ of $M$, the geometry of $M$ is approximately the same as Minkowski space.
Remark 5.1.11. We shall do better than this: in §5.5, we shall see that we can choose coordinates so that
$$\widetilde g_{ab}(\widetilde x) = \eta_{ab} + O(|\widetilde x|^2)$$
for small $\widetilde x$. Such a choice of coordinates will be called local inertial coordinates. They give the best approximating Minkowski space at the point with coordinates $\widetilde x = 0$.
Proof. It is sufficient to make a change of coordinates which is linear:
$$x^a = J^a{}_b\,y^b.$$
Then by the transformation law for tensors of type $(0, 2)$,
$$\widetilde g_{ab} = g_{pq}\,J^p{}_a\,J^q{}_b, \quad \text{i.e.} \quad \widetilde g = J^t g J.$$
By the basic theorem about diagonalization of symmetric bilinear forms, $J$ can be chosen to make $\widetilde g(0)$ diagonal, with diagonal entries $\pm 1$. The signs are determined by the signature of $g$. If the latter is lorentzian, this yields the Minkowski metric $\eta = \mathrm{diag}(1, -1, -1, -1)$.
5.1.2. Timelike/spacelike/null.
Definition 5.1.12. A tangent vector $X = X^a\,\partial_a$ is called timelike at $p$ in $M$ if
$$g(X, X)(p) = g_{ab}X^aX^b|_p > 0,$$
null if
$$g(X, X)(p) = g_{ab}X^aX^b|_p = 0,$$
and spacelike if
$$g(X, X)(p) = g_{ab}X^aX^b|_p < 0.$$
The set of null vectors at a point $p$ forms a cone (in $T_pM$), whose elements are supposed to be tangent to photon worldlines through $p$.
A parameterized curve $\gamma(t)$ is called timelike if its tangent vector is timelike for every value of the parameter $t$,
$$g(\dot\gamma, \dot\gamma) > 0. \qquad (5.2.1)$$
Similarly, a parameterized curve $\gamma(t)$ is called null if its tangent vector is null for every value of the parameter $t$,
$$g(\dot\gamma, \dot\gamma) = 0. \qquad (5.2.2)$$
As in SR, where $(M, g)$ reduces to $(\mathbb{M}, \eta)$, massive particles follow timelike curves in $M$: this corresponds to the speed of the particle being everywhere less than the speed of light. Similarly photons follow null curves. Also as in Minkowski space, these curves are called worldlines.
Hypothesis 5.2.2. A timelike curve $\gamma(\tau)$ is parameterized by proper time $\tau$ if
$$g\!\left(\frac{d\gamma}{d\tau}, \frac{d\gamma}{d\tau}\right) = 1. \qquad (5.2.3)$$
Then $\tau$ is the time that would be shown on a clock with worldline $\gamma(\tau)$. More precisely, if Alice's worldline is $\gamma(\tau)$ and $p = \gamma(\tau_1)$ and $q = \gamma(\tau_2)$ are two events on her worldline, then her clock will show an elapsed time $\tau_2 - \tau_1$ between these two events if (5.2.3) holds.
Given any timelike curve $\widetilde\gamma(u)$, there is always a reparameterization
$$\gamma(\tau) = \widetilde\gamma(u(\tau))$$
which is parameterized by proper time.
5.3. Geodesics
In special relativity, inertial observers were taken to travel at constant speed along straight lines in Minkowski space $\mathbb{M}$. One of the definitions of straight line is a curve which minimises the energy (cf. §1.4), amongst all those with fixed endpoints.
Using the space-time metric $g$ on $M$, we can do the same thing.
Definition 5.3.1. The energy of a curve $\gamma : [t_0, t_1] \to M$ is defined to be
$$E[\gamma] = \frac12 \int_{t_0}^{t_1} g(\dot\gamma(t), \dot\gamma(t))\,dt. \qquad (5.3.1)$$
A geodesic with end-points $p$ and $q$ is a curve which extremizes the energy among all curves with $\gamma(t_0) = p$, $\gamma(t_1) = q$.
Hypothesis 5.3.2. In GR, freely falling particles (and free photons) follow geodesics, timelike for particles and null for photons. Here freely falling means acted upon by no force except
gravity.
Definition 5.3.3. Let $x^a = (x^0, x^1, x^2, x^3)$ be a given coordinate system, such that the metric coefficients are $g_{ab}$,
$$ds^2 = g_{ab}\,dx^a\,dx^b.$$
The Christoffel symbols of $g_{ab}$ are defined by the formula
$$\Gamma^c{}_{ab} = \frac12\,g^{cs}\left(\partial_a g_{bs} + \partial_b g_{as} - \partial_s g_{ab}\right). \qquad (5.3.2)$$
Remark 5.3.4. Note the symmetry of $\Gamma$ in its two lower indices,
$$\Gamma^c{}_{ab} = \Gamma^c{}_{ba}. \qquad (5.3.3)$$
Theorem 5.3.5. Let $\gamma(t)$ be a curve in $M$ and suppose that in some coordinate system, it is given by $t \mapsto x^c(t)$. Then the Euler-Lagrange equations for $E(\gamma)$ are equivalent to the equations
$$\ddot x^c + \Gamma^c{}_{ab}\,\dot x^a \dot x^b = 0, \qquad (5.3.4)$$
where the $\Gamma^c{}_{ab}$ are as in (5.3.2).
Remark 5.3.6. The system of equations (5.3.4) are called the geodesic equations. They are frequently a more convenient way of getting at the $\Gamma$s, as we shall see in examples.
Proof. This is a calculus of variations problem with Lagrangian $L(x, \dot x) = \frac12 g(x)[\dot x, \dot x] = \frac12 g_{ab}(x)\,\dot x^a \dot x^b$. We have
$$\frac{\partial L}{\partial \dot x^s} = g_{sb}\,\dot x^b, \qquad \frac{\partial L}{\partial x^s} = \frac12\,\partial_s g_{ab}\,\dot x^a \dot x^b, \qquad (5.3.5)$$
so
$$\frac{d}{dt}\frac{\partial L}{\partial \dot x^s} = g_{sb}\,\ddot x^b + \frac{\partial g_{as}}{\partial x^b}\,\dot x^a \dot x^b. \qquad (5.3.6)$$
Thus the Euler-Lagrange equations are
$$g_{sb}\,\ddot x^b + \frac{\partial g_{as}}{\partial x^b}\,\dot x^a \dot x^b - \frac12\,\partial_s g_{ab}\,\dot x^a \dot x^b = 0, \qquad (5.3.7)$$
or, multiplying by $g^{cs}$,
$$\ddot x^c + g^{cs}\left(\frac{\partial g_{as}}{\partial x^b}\,\dot x^a \dot x^b - \frac12\,\partial_s g_{ab}\,\dot x^a \dot x^b\right) = 0. \qquad (5.3.8)$$
Now
$$\frac{\partial g_{as}}{\partial x^b}\,\dot x^a \dot x^b = \frac12\left(\frac{\partial g_{bs}}{\partial x^a} + \frac{\partial g_{as}}{\partial x^b}\right)\dot x^a \dot x^b, \qquad (5.3.9)$$
and substituting this into (5.3.8), taking into account the definition of the $\Gamma$s, yields (5.3.4).
For the last part, the Lagrangian is homogeneous of degree 2 in the velocities, and so is conserved along a solution curve (Proposition 1.4.4: the potential energy part is zero in the case at hand).
Computing the $\Gamma$s is the first step in computing the curvature of the metric, and computing the geodesic equations is needed to understand particle (and photon) motion in GR. We therefore give some worked examples. In each case, the $\Gamma$s are read off the geodesic equations (5.3.4) rather than from the formula (5.3.2).
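As a cross-check, the formula (5.3.2) can also be evaluated symbolically and compared with what one reads off the geodesic equations. The sketch below does this for the flat metric of Example 5.1.6 (the loop structure is just a direct transcription of (5.3.2)):

```python
import sympy as sp

coords = sp.symbols('r theta', positive=True)
r, th = coords
g = sp.Matrix([[1, 0], [0, r**2]])   # the flat plane in polar coordinates
g_inv = g.inv()
n = 2

# Gamma^c_ab = (1/2) g^{cs} (d_a g_bs + d_b g_as - d_s g_ab)   -- formula (5.3.2)
Gamma = [[[sum(sp.Rational(1, 2) * g_inv[c, s] *
               (sp.diff(g[b, s], coords[a]) + sp.diff(g[a, s], coords[b])
                - sp.diff(g[a, b], coords[s]))
               for s in range(n))
           for b in range(n)] for a in range(n)] for c in range(n)]

# Compare with reading the symbols off the geodesic equations:
#   r'' - r theta'^2 = 0  and  theta'' + (2/r) r' theta' = 0.
assert Gamma[0][1][1] == -r          # Gamma^r_{theta theta}
assert Gamma[1][0][1] == 1 / r       # Gamma^theta_{r theta}
assert Gamma[1][1][0] == 1 / r       # Gamma^theta_{theta r}
assert Gamma[0][0][0] == 0
```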
Example 5.3.7. Minkowski space. This is
$$ds^2 = \eta_{ab}\,dx^a\,dx^b.$$
The Lagrangian is
$$L = \frac12\,\eta_{ab}\,\dot x^a \dot x^b.$$
Then
$$\frac{\partial L}{\partial \dot x^a} = \eta_{ab}\,\dot x^b, \qquad \frac{\partial L}{\partial x^a} = 0, \qquad (5.3.10)$$
so the geodesic equations are $\ddot x^a = 0$ and all the Christoffel symbols vanish.
Example 5.3.8. 2D hyperbolic space. This is the half-plane $x > 0$ with metric
$$ds^2 = \frac{dx^2 + dy^2}{x^2}. \qquad (5.3.12)$$
The Lagrangian is $L = \frac12(\dot x^2 + \dot y^2)/x^2$, and the Euler-Lagrange equations are
$$\frac{d}{d\tau}\!\left(\frac{\dot x}{x^2}\right) + \frac{\dot x^2 + \dot y^2}{x^3} = 0; \qquad \frac{d}{d\tau}\!\left(\frac{\dot y}{x^2}\right) = 0.$$
Rearranging, these become
$$\ddot x - \frac{\dot x^2 - \dot y^2}{x} = 0, \qquad \ddot y - \frac{2\dot x\dot y}{x} = 0.$$
With $x^1 = x$, $x^2 = y$, these are supposed to be identical to the geodesic equations
$$\ddot x^1 + \Gamma^1{}_{ij}\,\dot x^i\dot x^j = 0, \qquad \ddot x^2 + \Gamma^2{}_{ij}\,\dot x^i\dot x^j = 0. \qquad (5.3.13)$$
Hence
$$\Gamma^1{}_{11} = -\frac1x, \quad \Gamma^1{}_{22} = \frac1x, \quad \Gamma^2{}_{12} = \Gamma^2{}_{21} = -\frac1x, \quad \text{others} = 0. \qquad (5.3.14)$$
NB the factor of 2 in $\Gamma^2{}_{12}$ coming from the symmetry $\Gamma^i{}_{12} = \Gamma^i{}_{21}$.
Example 5.3.9. The geodesics in 2D hyperbolic space. Rather than tackle the second-order equations (5.3.13) directly, it is better to work with conserved quantities. The first is the length of the velocity vector. If we assume $\tau$ is arc-length (i.e. the velocity has constant length 1), then
$$\dot x^2 + \dot y^2 = x^2. \qquad (5.3.15)$$
The second comes from the second equation of motion: $\dot y/x^2$ is constant, say
$$\frac{\dot y}{x^2} = C. \qquad (5.3.16)$$
If $C = 0$ then $y$ is constant and the geodesics are the vertical half-lines $x > 0$. If $C \neq 0$, then along the geodesic
$$\frac{dx}{dy} = \frac{\dot x}{\dot y}. \qquad (5.3.18)$$
Combining this with (5.3.15) and (5.3.16),
$$\frac{Cx\,dx}{\sqrt{1 - C^2x^2}} = \pm\,dy, \qquad \sqrt{1 - C^2x^2} = \mp C(y - y_0),$$
so
$$x^2 + (y - y_0)^2 = C^{-2}, \quad x > 0, \qquad (5.3.19)$$
i.e. a semicircle with diameter along the $y$ axis. It is pleasing that as $C \to 0$, the radius $C^{-1}$ tends to infinity and these semicircles approach the straight half-lines that we saw previously.
The curve
$$\gamma(\tau) = \left(C^{-1}\operatorname{sech}\tau,\; y_0 + C^{-1}\tanh\tau\right) \qquad (5.3.20)$$
is a parameterization by arclength. (To obtain this, you eliminate $x$ from (5.3.16) using (5.3.19), to get
$$\frac{dy}{d\tau} = C\left(C^{-2} - (y - y_0)^2\right)$$
and integrate this up to get $y$ as a function of $\tau$.)
Note that in this example it is relatively easy to obtain the geodesics as an implicit relation between $x$ and $y$ (5.3.19), whereas finding $x$ and $y$ as functions of $\tau$ is rather more involved. This is quite typical of the simple examples we shall see in this course.
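The parameterization (5.3.20) can be checked symbolically against (5.3.15), (5.3.16) and (5.3.19) — a sketch:

```python
import sympy as sp

tau, C, y0 = sp.symbols('tau C y0', positive=True)

# The curve (5.3.20):
x = sp.sech(tau) / C
y = y0 + sp.tanh(tau) / C

xd, yd = sp.diff(x, tau), sp.diff(y, tau)

# Unit-speed condition (5.3.15):  x'^2 + y'^2 = x^2
assert sp.simplify((xd**2 + yd**2 - x**2).rewrite(sp.exp)) == 0
# Conserved quantity (5.3.16):    y' / x^2 = C
assert sp.simplify((yd / x**2 - C).rewrite(sp.exp)) == 0
# The image is the semicircle (5.3.19):  x^2 + (y - y0)^2 = C^{-2}
assert sp.simplify((x**2 + (y - y0)**2 - C**-2).rewrite(sp.exp)) == 0
```

(Rewriting the hyperbolic functions in terms of exponentials before simplifying makes all three identities reduce cleanly to zero.)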
Example 5.3.10. Minkowski space in polar coordinates. In spherical polars, the Minkowski metric takes the form
$$ds^2 = dt^2 - dr^2 - r^2(d\theta^2 + \sin^2\theta\,d\varphi^2). \qquad (5.3.21)$$
The Lagrangian is
$$L = \frac12\left(\dot t^2 - \dot r^2 - r^2\dot\theta^2 - r^2\sin^2\theta\,\dot\varphi^2\right). \qquad (5.3.22)$$
Then
$$\frac{\partial L}{\partial \dot t} = \dot t, \quad \frac{\partial L}{\partial \dot r} = -\dot r, \quad \frac{\partial L}{\partial \dot\theta} = -r^2\dot\theta, \quad \frac{\partial L}{\partial \dot\varphi} = -r^2\sin^2\theta\,\dot\varphi, \qquad (5.3.23)$$
$$\frac{\partial L}{\partial t} = 0, \quad \frac{\partial L}{\partial r} = -r(\dot\theta^2 + \sin^2\theta\,\dot\varphi^2), \quad \frac{\partial L}{\partial \theta} = -r^2\sin\theta\cos\theta\,\dot\varphi^2, \quad \frac{\partial L}{\partial \varphi} = 0. \qquad (5.3.24)$$
Thus the geodesic equations are
$$\ddot t = 0, \qquad (5.3.25)$$
$$\ddot r - r(\dot\theta^2 + \sin^2\theta\,\dot\varphi^2) = 0, \qquad (5.3.26)$$
$$\ddot\theta + \frac2r\,\dot r\dot\theta - \sin\theta\cos\theta\,\dot\varphi^2 = 0, \qquad (5.3.27)$$
$$\ddot\varphi + \frac2r\,\dot r\dot\varphi + 2\cot\theta\,\dot\theta\dot\varphi = 0. \qquad (5.3.28)$$
With $x^0 = t$, $x^1 = r$, $x^2 = \theta$, $x^3 = \varphi$, we read off that
$$\Gamma^1{}_{22} = -r, \qquad \Gamma^1{}_{33} = -r\sin^2\theta, \qquad (5.3.29)$$
$$\Gamma^2{}_{12} = \Gamma^2{}_{21} = \frac1r, \qquad \Gamma^2{}_{33} = -\sin\theta\cos\theta, \qquad (5.3.30)$$
$$\Gamma^3{}_{13} = \Gamma^3{}_{31} = \frac1r, \qquad \Gamma^3{}_{23} = \Gamma^3{}_{32} = \cot\theta, \qquad (5.3.31)$$
while all others are zero. Note again that there are factors of 2 between the $\Gamma$s and the coefficients in the geodesic equations for the $\Gamma^a{}_{bc}$ with $b \neq c$.
We will not go into finding the geodesics here as the moves you have to make were already described in Example 1.4.5. And you are urged to review that example now! Of course this is a bit different because of having the variable $t$ as an additional coordinate in the problem. But since $\dot t$ is a constant, $\varepsilon$, say, the constancy of $L$ means that
$$\dot r^2 + r^2(\dot\theta^2 + \sin^2\theta\,\dot\varphi^2) = \varepsilon^2 - 2L$$
is a constant. Also $J = r^2\sin^2\theta\,\dot\varphi$ is a constant and we can restrict to equatorial curves where $\theta = \pi/2$ identically. Then we find that
$$\dot r^2 + r^2\dot\varphi^2 = \varepsilon^2 - 2L \quad \text{and} \quad J = r^2\dot\varphi. \qquad (5.3.32)$$
If $X = \dot\gamma$ is the tangent vector of a curve $\gamma(\tau)$, the acceleration of $\gamma$ is defined to be
$$A = \nabla_X X, \qquad (5.4.4)$$
with components
$$A^c = \dot X^c + \Gamma^c{}_{ab}\,X^a X^b. \qquad (5.4.5)$$
Proposition 5.4.2. The curve $\tau \mapsto \gamma(\tau)$ is a geodesic if and only if its acceleration is zero.
This is pleasing because it is a natural generalization of what happens in Minkowski space. In that case, the geodesics are of the form
$$\tau \mapsto X + \tau U,$$
where $X$ and $U$ are constant vectors. Then the tangent vector is $U$ and the acceleration is zero. And conversely any curve with zero acceleration in Minkowski space is of the above form and extremizes the energy.
Definition 5.4.3. If $\gamma(\tau)$ is a curve in $M$ with tangent vector $\dot\gamma = X$, then a vector field $Y$ along $\gamma$ is said to be parallel, parallel-transported, or parallel-propagated along $\gamma$ if
$$\nabla_X Y = 0 \text{ along } \gamma. \qquad (5.4.6)$$
Explicitly, in local coordinates, the parallel propagation equation (5.4.6) has the form
$$\dot Y^c + \Gamma^c{}_{ab}\,\dot x^a Y^b = 0. \qquad (5.4.7)$$
Note that to be completely explicit, the $\Gamma$s here are evaluated at the point $x^a(\tau)$. With the curve fixed, $x^a(\tau)$ and $\dot x^a(\tau)$ are known, and so (5.4.7) is a first-order system of equations for the unknown components $Y^b(\tau)$ as functions of $\tau$. Thus given any point $\tau_0$ and a given tangent vector $Z$ at $\gamma(\tau_0)$, there is a unique solution of (5.4.7) with initial condition $Y(\tau_0) = Z$. In this situation, we say that $Y(\tau_1)$ is obtained from $Z$ by parallel transport along $\gamma$.
In Minkowski space we know when two vectors at different points are parallel (or point in the same direction). In a general curved space $M$ there is no such global notion of parallelism, and the above is the best one can do.
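The parallel propagation equation (5.4.7) is an ODE system that can be integrated numerically. The sketch below transports a vector around the unit circle in the flat plane, written in polar coordinates (the Christoffel symbols used are those of the metric $dr^2 + r^2\,d\theta^2$); since the plane is flat, the vector must return to itself after a full loop:

```python
import numpy as np

# Nonzero Christoffel symbols of dr^2 + r^2 dtheta^2:
#   Gamma^r_{th th} = -r,   Gamma^th_{r th} = Gamma^th_{th r} = 1/r.
# Curve: r = 1, theta = tau, so xdot = (0, 1), and (5.4.7) reads
#   Yr' = r Yth,   Yth' = -(1/r) Yr,   evaluated at r = 1.

def rhs(Y):
    Yr, Yth = Y
    return np.array([Yth, -Yr])        # minus Gamma^c_ab xdot^a Y^b at r = 1

def rk4(Y, h, steps):
    for _ in range(steps):
        k1 = rhs(Y)
        k2 = rhs(Y + h / 2 * k1)
        k3 = rhs(Y + h / 2 * k2)
        k4 = rhs(Y + h * k3)
        Y = Y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return Y

steps = 1000
h = 2 * np.pi / steps
Y0 = np.array([1.0, 0.0])              # start with Y = d/dr at theta = 0
Y = rk4(Y0, h, steps)

# In the flat plane, parallel transport around a closed loop is trivial:
# the vector comes back to itself.
assert np.allclose(Y, Y0, atol=1e-6)
```

In a genuinely curved space the same computation around a closed loop generally does not return the starting vector; that failure is one way of seeing curvature.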
Theorem 5.5.2. Let $p$ be a point of $M$. Then there is a choice of coordinates $\widetilde x$, centred at $p$, in which
$$\widetilde g_{ab}(\widetilde x) = \eta_{ab} + O(|\widetilde x|^2). \qquad (5.5.5)$$
Such coordinates are called local inertial coordinates at $p$. Note that no statement is made about the form of the metric at other points near, but not equal to, $p$.
The notation is motivated as follows: from the Proposition, if the metric has the form (5.5.5) then $\Gamma^c{}_{ab} = 0$ at $\widetilde x = 0$ with respect to these coordinates. Thus the geodesic equations
$$\ddot x^c + \Gamma^c{}_{ab}\,\dot x^a \dot x^b = 0$$
reduce to $\ddot x^c = 0$ at $x^a = 0$, so at this point, at least, the equation is the same as for inertial worldlines in Minkowski space. In particular freely falling particles and free photons will have worldlines that appear straight in a very small neighbourhood of $x = 0$ in these special coordinates. This is the precise sense in which GR reduces to SR over small length- and time-scales, but it only works well relative to one of these inertial coordinate systems.
5.5.1. Proof of Theorem 5.5.2. We may suppose that
$$g_{ab} = \eta_{ab} + H_{abc}\,x^c + O(|x|^2),$$
as the next term in the Taylor expansion is already of quadratic order. Consider the coordinate transformation
$$x^c = y^c - \frac12\,G^c{}_{ab}\,y^a y^b, \qquad (5.5.6)$$
where $G^c{}_{ab}$ is an array of numbers to be determined, symmetric in the indices $ab$. The Jacobian of the transformation is
$$\frac{\partial x^c}{\partial y^p} = J^c_p = \delta^c_p + j^c_p, \quad \text{where } j^c_p = -G^c{}_{ap}\,y^a. \qquad (5.5.7)$$
This is invertible when $y = 0$, so by the inverse function theorem, (5.5.6) has a smooth inverse as a mapping between a neighbourhood of $y = 0$ and a neighbourhood of $x = 0$.
Now we calculate, keeping only the leading terms (because we don't really care what's happening at $O(|y|^2)$ and beyond):
$$g_{pq}\,dx^p\,dx^q = g_{pq}(x)\left(dy^p - G^p{}_{ac}\,y^a\,dy^c\right)\left(dy^q - G^q{}_{bd}\,y^b\,dy^d\right) \qquad (5.5.8)$$
$$= g_{pq}\,dy^p\,dy^q - G_{acq}\,y^a\,dy^c\,dy^q - G_{bdp}\,y^b\,dy^p\,dy^d + O(|y|^2). \qquad (5.5.9)$$
Hence
$$\widetilde g_{pq}(y) = g_{pq}(x(y)) - G_{apq}\,y^a - G_{aqp}\,y^a + O(|y|^2). \qquad (5.5.10)$$
Now on the RHS we still have $g(x)$ and we need to write this in terms of $y$. The inverse to our transformation has the form
$$y^a = x^a + \frac12\,G^a{}_{bc}\,x^b x^c + O(|x|^3), \qquad (5.5.11)$$
as you can see by inserting (5.5.6) on the RHS of this equation. It follows that
$$g_{pq}(x(y)) = g_{pq}(y) + O(|y|^2) = \eta_{pq} + H_{pqr}\,y^r + O(|y|^2),$$
and so
$$\widetilde g_{pq}(y) = \eta_{pq} + H_{pqa}\,y^a - G_{apq}\,y^a - G_{aqp}\,y^a + O(|y|^2). \qquad (5.5.12)$$
Thus we will get rid of the first order terms in $y$ if we can choose the numbers $G$ so that
$$G_{apq} + G_{aqp} = H_{pqa}. \qquad (5.5.13)$$
(Here we have used the symmetry of the indices of $G$ to neaten up the equation.) This is an equation for the array of numbers $G$ in terms of the array of numbers $H$. In Lemma 5.5.4 below it is shown that this can always be solved, the solution being
$$G_{abc} = \frac12\left(H_{acb} + H_{bca} - H_{abc}\right). \qquad (5.5.14)$$
Note that this formula depends in an important way on the symmetry of $G$ in its first two indices. Raising the last index we define
$$G^d{}_{ab} = \eta^{cd}\,G_{abc} = \frac12\,\eta^{cd}\left(H_{acb} + H_{bca} - H_{abc}\right).$$
Thus $G$ is uniquely determined by $H$, and by the above calculations, the change of coordinates (5.5.6) gives metric components $\widetilde g_{ab}$ which satisfy the conditions of the Theorem. The proof is complete.
Remark 5.5.3. The array of numbers $G^c{}_{ab}$ is nothing but $\Gamma^c{}_{ab}(0)$, the Christoffel symbols of the metric components $g_{ab}$, evaluated at $x = 0$ (cf. Proposition 5.5.1).
It remains to prove:
Lemma 5.5.4. The equation (5.5.13) is solved by (5.5.14),
$$G_{abc} = \frac12\left(H_{cab} + H_{cba} - H_{abc}\right).$$
Proof. One proof of this is simply to substitute (5.5.14) into (5.5.13) and see that it works. Namely, $G$ is symmetric in its first two indices: simply switch them, and use the symmetry of $H$ in its first two indices. And then
$$G_{abc} + G_{acb} = \frac12\left(H_{cab} + H_{cba} - H_{abc} + H_{bac} + H_{bca} - H_{acb}\right). \qquad (5.5.15)$$
Now use the symmetry of $H$ in its first two indices to arrange the indices as far as possible in alphabetical order:
$$G_{abc} + G_{acb} = \frac12\left(H_{acb} + H_{bca} - H_{abc} + H_{abc} + H_{bca} - H_{acb}\right) = H_{bca},$$
as required.
There is also a derivation of this formula in the Problem set, Problem 4.9.
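The index algebra of Lemma 5.5.4 can also be confirmed numerically, with a random array $H$ symmetric in its first two indices (a sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4

# A random H_{abc} symmetric in its first two indices, H_{abc} = H_{bac}.
H = rng.standard_normal((n, n, n))
H = H + H.transpose(1, 0, 2)

# The solution (5.5.14):  G_{abc} = (1/2)(H_{cab} + H_{cba} - H_{abc}).
G = 0.5 * (H.transpose(1, 2, 0) + H.transpose(2, 1, 0) - H)

# Check the defining equation (5.5.13):  G_{apq} + G_{aqp} = H_{pqa}.
lhs = G + G.transpose(0, 2, 1)
rhs = H.transpose(2, 0, 1)
assert np.allclose(lhs, rhs)

# G is symmetric in its first two indices, as required by (5.5.6):
assert np.allclose(G, G.transpose(1, 0, 2))
```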
Suppose then that we have chosen local inertial coordinates at $p$, so that
$$g_{ab} = \eta_{ab} + \frac12\,P_{abcd}\,x^c x^d + O(|x|^3). \qquad (5.6.1)$$
We ask: is there a change of coordinates which can get rid of the $P$ term here? The answer is no in general, but it is interesting to try.
It would be natural to try
$$\widetilde x^a = x^a + \frac16\,W^a{}_{bcd}\,x^b x^c x^d \qquad (5.6.3)$$
to change $P$. Note that here the array of numbers $W$ satisfies
$$W^a{}_{bcd} \text{ is totally symmetric in } bcd. \qquad (5.6.4)$$
In Problem 4.8, you are invited to show that if $g_{ab}$ is as in (5.6.1) and $\widetilde x$ and $x$ are related by (5.6.3), then
$$\widetilde g_{ab} = \eta_{ab} + \frac12\,\widetilde P_{abcd}\,\widetilde x^c \widetilde x^d + O(|\widetilde x|^3), \qquad (5.6.5)$$
where
$$\widetilde P_{abcd} = P_{abcd} - W_{abcd} - W_{abdc}. \qquad (5.6.6)$$
Now I claim that this cannot be solved in general, because $W$ just does not have enough parameters! For this, we need to do some counting.
5.6.1. Counting tensor components.
Definition 5.6.1. A tensor $T$ of type $(0, m)$ in $n$ dimensions is said to be totally symmetric if for the corresponding $m$-linear form we have
$$T(v_1, \dots, v_i, \dots, v_j, \dots, v_m) = T(v_1, \dots, v_j, \dots, v_i, \dots, v_m)$$
for any $i$ and $j$. In components this is the same as saying
$$T_{p_1 \dots p_i \dots p_j \dots p_m} = T_{p_1 \dots p_j \dots p_i \dots p_m}.$$
Proposition 5.6.2. The dimension of the vector space of all totally symmetric tensors of type $(0, m)$ in $n$ dimensions is
$$\binom{n + m - 1}{m}. \qquad (5.6.7)$$
Definition 5.6.3. A tensor of type $(0, m)$ in $n$ dimensions is totally skew (or totally skew symmetric) if the corresponding $m$-linear form has the property
$$T(v_1, \dots, v_i, \dots, v_j, \dots, v_m) = -T(v_1, \dots, v_j, \dots, v_i, \dots, v_m)$$
for any $i$ and $j$.
Proposition 5.6.4. The dimension of the vector space of totally skew tensors of type $(0, m)$ in $n$ dimensions is
$$\binom{n}{m}. \qquad (5.6.8)$$
(In particular the dimension is 1 if $n = m$ and 0 if $m > n$.)
Proof. We shall prove both of these together. By way of a warm-up let's make sure that we understand that
the dimension of the space of all tensors of type $(0, m)$ in $n$ dimensions is $n^m$. \qquad (5.6.9)
To see this, note that such a tensor has $m$ indices. We can choose each one of these in $n$ ways (since they vary from 1 to $n$). All choices are independent because there are no symmetry conditions. So the total number of possibilities is $n^m$. The number of these choices is equal to the dimension of the space of these tensors.
Let us move on to the proof of Proposition 5.6.4. Again we have a tensor with $m$ indices. For ease of exposition, suppose that $m = 3$. Any component with two indices the same must be zero, because (for example)
$$T_{115} = -T_{115}$$
by switching the first two indices. So the only non-zero components of $T$ have distinct indices. If we have a set of 3 distinct indices, say 523, then we can use the skew symmetry to relate $T_{523}$ to $T_{235}$, where the indices are now in increasing order:
$$T_{523} = -T_{253} = T_{235}$$
(switching first the first two indices and then the last two). Thus the number of independent components of a totally skew tensor of type $(0, 3)$ is equal to the number of unordered subsets of 3 elements of the set $\{1, \dots, n\}$. This is the binomial coefficient $\binom{n}{3}$.
The general case, with 3 replaced by $m$, works in the same way.
Finally let us prove Proposition 5.6.2. The big difference from the case of skew symmetry is
that now indices can take the same value without that component being zero. For a small
number of indices (e.g. 3) it's possible to count by hand. There are n components where the
indices are all the same:
T_{111}, T_{222}, …, T_{nnn}.
There are n(n − 1) with precisely two indices the same:
T_{112}, T_{113}, …, T_{11n}; T_{221}, T_{223}, ….
And there are n(n − 1)(n − 2)/6 where the indices are all distinct. Thus the total number of
independent components of a totally symmetric tensor of type (0, 3) in n dimensions is
n + n(n − 1) + n(n − 1)(n − 2)/6 = n(n + 1)(n + 2)/6,    (5.6.10)
which checks with (5.6.7) if m = 3.
This approach can be generalized to tensors of any rank, but it's pretty messy. The following
is the cunning way of doing it.
Consider a collection of indices on our totally symmetric tensor with m indices. It will
consist of m_1 1's, m_2 2's, and so on, up to m_n n's. Here the m_j are allowed to be 0, but the
constraint that the tensor is of type (0, m) is
m_1 + m_2 + ⋯ + m_n = m.    (5.6.11)
This combinatorial problem can be visualised in the following way. Consider an arrangement of
m coins and n − 1 pencils in a line, as in the example below (o = coin, | = pencil):
ooo | oo | o | oo | |
¹Since they vary from 1 to n.
Given such an arrangement, we count the coins to the left of the first pencil, and call that m_1.
Then we count the coins between the first and second pencils, and call that m_2. Proceeding
in this way, we get a collection of n integers m_j ≥ 0, satisfying the constraint (5.6.11). (Note
that m_j = 0 if two of the pencils are right next to each other, or if there is a pencil at the very
beginning or the very end of the line.)
In the pictured configuration there are 5 pencils and 8 coins, and
m1 = 3, m2 = 2, m3 = 1, m4 = 2, m5 = 0, m6 = 0.
This would correspond to the component
T11122344
of a totally symmetric tensor of type (0, 8) in 6 dimensions. The number of arrangements of m
coins and n − 1 pencils is the same as the number of ways of choosing m objects (the ones to
be called coins) from a total of m + n − 1. This is the binomial coefficient (5.6.7).
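These counting arguments are easy to sanity-check by brute force. The following Python sketch (our own illustration, not part of the notes) enumerates index strings with itertools and compares them with the binomial-coefficient formulas (5.6.7) and (5.6.8):

```python
# Brute-force check of the component counts for totally symmetric and
# totally skew tensors of type (0, m) in n dimensions.
# Independent components of a symmetric tensor <-> non-decreasing index
# strings (multisets); of a skew tensor <-> strictly increasing strings.
from itertools import combinations, combinations_with_replacement
from math import comb

def dim_symmetric(n, m):
    # one independent component per non-decreasing index string
    return sum(1 for _ in combinations_with_replacement(range(n), m))

def dim_skew(n, m):
    # one independent component per strictly increasing index string
    return sum(1 for _ in combinations(range(n), m))

for n in range(1, 7):
    for m in range(1, 5):
        assert dim_symmetric(n, m) == comb(n + m - 1, m)   # (5.6.7)
        assert dim_skew(n, m) == comb(n, m)                # (5.6.8)

# e.g. totally symmetric (0,3) tensors in 6 dimensions: 6*7*8/6
print(dim_symmetric(6, 3))  # 56
```

The same enumeration with `product` instead of `combinations` recovers the unconstrained count n^m of (5.6.9).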
5.6.2. Sneak preview of curvature, continued. We've seen that the coordinate transformation (5.6.3) allows us to change the array P_{abcd}, where
P_{abcd} = P_{bacd} = P_{abdc},    (5.6.12)
to
P̃_{abcd} = P_{abcd} − W_{abcd} − W_{abdc},    (5.6.13)
where W is symmetric in its last 3 indices.
What is the dimension of the space of P's? P is symmetric in its first two indices and in its
third and fourth indices, but there is no other symmetry. So it is like a 2-index object, P_{IJ},
where I and J run over a basis of the space of symmetric 2-index tensors. If that dimension is
N, the dimension of the space of P's will be N². But we've seen that N = n(n + 1)/2 (if the
dimension is n; we work generally for the moment). Hence:
In an n-dimensional manifold, the dimension of the space of P's is
(1/4) n²(n + 1)².
On the other hand, the dimension of the space of W's is n (for the extra index) times
n(n + 1)(n + 2)/6 (the dimension of the space of totally symmetric tensors of type (0, 3)).
In an n-dimensional manifold, the dimension of the space of W's is
(1/6) n²(n + 1)(n + 2).
Thus the dimension of the space of P's minus the dimension of the space of W's is
(1/4) n²(n + 1)² − (1/6) n²(n + 1)(n + 2)
  = (1/12) n²(n + 1) (3(n + 1) − 2(n + 2))
  = (1/12) n²(n + 1)(n − 1)
  = (1/12) n²(n² − 1).    (5.6.14)
Since this number is positive for n ≥ 2, it will be impossible to solve (5.6.13) to make P̃ = 0 in
general. (Our calculation shows that P̃ = 0 is a system of linear equations with more equations
than unknowns.)
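The arithmetic in (5.6.14) can be checked symbolically; here is a short sympy sketch (an illustration of the dimension counts derived above):

```python
# Check of the dimension count (5.6.14): the dimension of the space of
# arrays P (symmetric in its first and in its last pair of indices)
# minus the dimension of the space of W's.
from sympy import symbols, simplify

n = symbols('n', positive=True, integer=True)
dim_P = (n*(n + 1)/2)**2           # N^2 with N = n(n+1)/2
dim_W = n * n*(n + 1)*(n + 2)/6    # n times dim of symmetric (0,3) tensors
excess = simplify(dim_P - dim_W)

# agrees with n^2 (n^2 - 1)/12
assert simplify(excess - n**2*(n**2 - 1)/12) == 0
print(excess.factor())
```

For n = 4 (the space-time case) the excess is 20, which is the number of independent curvature components found again in Chapter 6.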
Thus, while some components of P can be killed by coordinate transformations, there are
others, in fact n²(n² − 1)/12 of them, which cannot. These unkillable components of P form
a tensor called the curvature of g at the point x = 0.
In fact, the Riemann curvature tensor at x = 0 is built out of P in the following way:
R_{abcd} = ½ (P_{acbd} + P_{bdac} − P_{adbc} − P_{bcad}).    (5.6.15)
It can be checked that if P is changed to P̃ as in (5.6.13), then the components of R do not
change! Thus if this particular combination of components of P is non-zero then it cannot be
killed by a coordinate transformation.
These matters will be discussed much more extensively in the next chapter, where we shall
see a different, but equivalent, definition of curvature.
CHAPTER 6

In the previous chapter we introduced the quantities
X^a ∂_a Y^c + Γ^c_{ab} X^a Y^b,    (6.1.1)
which we claimed form the components of a vector field ∇_X Y, the covariant derivative of Y with
respect to X. In this chapter we shall sketch a proof of this important fact and shall define the
curvature tensor. One definition of this is as follows:
R(X, Y)Z = (∇_X ∇_Y − ∇_Y ∇_X − ∇_{[X,Y]})Z.    (6.1.2)

Consider a smooth 2-parameter family of curves H(τ, λ), defined for
α ≤ τ ≤ β.    (6.2.1)
Define the vector fields
X = ∂H/∂τ, i.e. X = (∂H^a/∂τ) ∂/∂x^a,    (6.2.2)
and
Y = ∂H/∂λ = (∂H^a/∂λ) ∂/∂x^a.    (6.2.3)
(Here x^a = H^a(τ, λ) is the description of the family of curves with respect to the local coordinate
system x^a.) Define
E(λ) = ½ ∫_α^β g(X(τ, λ), X(τ, λ)) dτ,    (6.2.4)
the energy of the curve labelled by λ in our family. (This is a small abuse of notation.) We
claim
dE/dλ (0) = −∫_α^β g(∇_X X, Y) dτ + [g_q(X_q, Y_q) − g_p(X_p, Y_p)],    (6.2.5)
where the integrand is evaluated at λ = 0, q = H(β, 0), p = H(α, 0), and the subscripts on the
terms in square brackets denote evaluation of the quantities at the indicated points.
If you grant me (6.2.5) for the moment, then the lemma follows rapidly. The issue is the
coordinate independence, and the term in square brackets is certainly invariant under coordinate
changes, as is the LHS of (6.2.5). It follows that for any fixed curve and vector fields along the
curve, the integral in (6.2.5) is also coordinate independent.
Now letting the curve and the variation vary, we conclude that the integrand g(∇_X X, Y) must be a scalar
quantity, and since Y is a vector field, ∇_X X must also be a vector field.
So it remains to prove (6.2.5). This involves going through the calculus of variations, keeping
track of the boundary term. We have
E(λ) = ½ ∫_α^β g(∂H(τ, λ)/∂τ, ∂H(τ, λ)/∂τ) dτ.    (6.2.6)
Differentiate with respect to λ to get
dE/dλ = ∫_α^β [ ½ (Y^c ∂_c g_{ab}) X^a X^b + g(∂²H/∂λ∂τ, ∂H/∂τ) ] dτ.    (6.2.7)
The usual calculus-of-variations trick is to integrate by parts here. For this, note
(∂/∂τ) g(∂H/∂λ, ∂H/∂τ) = (X^c ∂_c g_{ab}) Y^a X^b + g(∂²H/∂τ∂λ, ∂H/∂τ) + g(∂H/∂λ, ∂²H/∂τ²)    (6.2.8)
  = ½ (Y^c ∂_c g_{ab}) X^a X^b + g(∂²H/∂τ∂λ, ∂H/∂τ) + g(∇_X X, Y),    (6.2.9)
using the definition of the Γ's. Integration from α to β yields (6.2.5).
Lemma 6.2.2. If X and Y are vector fields, then so is
∇_X Y − ∇_Y X.    (6.2.10)
Proof. We have
(∇_X Y)^c = X^a ∂_a Y^c + Γ^c_{ab} X^a Y^b.    (6.2.11)
Hence the components of (6.2.10) are
X^a ∂_a Y^c − Y^a ∂_a X^c,    (6.2.12)
again using Γ^c_{ab} = Γ^c_{ba}. So it is sufficient to check that this combination of components transforms
as a vector field. Under a change of coordinates x^a → x̃^a,
X̃^a = (∂x̃^a/∂x^b) X^b,  Ỹ^a = (∂x̃^a/∂x^b) Y^b.    (6.2.13)
Hence
X̃^a (∂/∂x̃^a) Ỹ^b = X^a ∂_a (Y^c ∂x̃^b/∂x^c) = (∂x̃^b/∂x^c) X^a ∂_a Y^c + X^a Y^c ∂²x̃^b/∂x^c∂x^a.
If we switch X and Y and subtract, the second term on the RHS drops out because of the
symmetry of the mixed partials of x̃^b with respect to the x variables. Thus we are left with the
transformation law
X̃^a (∂/∂x̃^a) Ỹ^b − Ỹ^a (∂/∂x̃^a) X̃^b = (∂x̃^b/∂x^c)(X^a ∂_a Y^c − Y^a ∂_a X^c),    (6.2.14)
as required for a vector field.
Now we can complete our proof. We know that rX X is a vector field for every vector field
X. Replacing X by X + Y , we conclude that
rX X + r X Y + rY X + r Y Y
is a vector field for every pair of vector fields X, Y . Hence rX Y + rY X is a vector field for
every X and Y . But so is the dierence, by the above lemma. Now
1
rX Y = (rX Y + rY X + [X, Y ])
(6.2.15)
2
and we have written rX Y as a sum of two things that we know are vector fields. So it must
itself be a vector field.
Remark 6.2.3. There is a direct, computational proof, in Woodhouse, GR, 4.5.
6.2.1. The Lie bracket.
Definition 6.2.4. The quantity ∇_X Y − ∇_Y X is called the Lie bracket [X, Y] of the vector
fields X and Y.
We proved in Lemma 6.2.2 that [X, Y] is a vector field by a local coordinate calculation. In
this section we give a more conceptual proof of the same fact. The formula (6.2.12) shows that
the Lie bracket depends only on the vector fields X and Y and not on the metric g. That is,
we can define an operator on functions f:
[X, Y]f = X(Y f) − Y(Xf).    (6.2.16)
Lemma 6.2.5. Let T be a linear operator on smooth functions which satisfies the Leibniz rule
at a point p: T[fg] = f(p)T[g] + g(p)T[f]. Then there is a set of numbers T^j such that, in any
local coordinate system centred at p, T[f] = T^j (∂f/∂x^j)(0).
Proof. Choose local coordinates x^j such that x^j = 0 corresponds to the point p. We aim
to show that there is a set of numbers T^j such that
T[f] = T^j (∂f/∂x^j)(0).
Define T^j = T[x^j].
By linearity T annihilates constants, for
T[1] = T[1²] = 2T[1] by the Leibniz rule.
Hence T[1] = 0 and so T[c] = 0 for any constant c.
Now let f be any smooth function. We can write
f(x) = f(0) + x^j ∂_j f(0) + O(|x|²).
Applying T, we see
T[f] = T^j ∂_j f(0),
because, by the Leibniz rule, T applied to a smooth function vanishing to order 2 or more must
vanish.
Now we have the following slick proof that [X, Y ] is a vector field if X and Y are vector
fields.
Proposition 6.2.6. If X and Y are vector fields, then so is [X, Y ].
Proof. We calculate
XY(φψ) = (Xφ)(Y ψ) + φ XY ψ + (Xψ)(Y φ) + ψ XY φ.
Similarly,
Y X(φψ) = (Y φ)(Xψ) + φ Y Xψ + (Y ψ)(Xφ) + ψ Y Xφ.
Subtracting, we obtain
[X, Y](φψ) = φ[X, Y]ψ + ψ[X, Y]φ.
Thus [X, Y] satisfies the Leibniz rule at each point, so by Lemma 6.2.5 it is given by a set of
components, i.e. it is a vector field.
Example 6.2.7. If we pick coordinates x^a then we have the corresponding (locally defined)
vector fields ∂_0, ∂_1, ∂_2, ∂_3. We have
[∂_a, ∂_b] = 0.    (6.2.17)
6.2.2. The covariant differential. For functions we had df. This was a covector field.
For vector fields Z we can consider ∇Z. This is a (1, 1) tensor. We denote it in indices by
∇_a Z^c. The covariant directional derivative with respect to X is then the contraction X^a ∇_a Z^c.
In index form,
∇_a Z^c = ∂_a Z^c + Γ^c_{ab} Z^b.
6.3. Extension to all tensors
We already know how to differentiate functions with respect to vector fields. From now on
we also denote Xf by ∇_X f. We now have the covariant derivative of vector fields as well, ∇_X Y.
There is now a natural way also to differentiate covector fields, which respects the natural dual
pairing between vectors and covectors.
Definition 6.3.1. If α is a covector (tensor of type (0, 1)), then we define ∇α so that
⟨∇_X α, Y⟩ + ⟨α, ∇_X Y⟩ = X⟨α, Y⟩
for all vector fields Y.
In index notation, using ∇_a Y^b = ∂_a Y^b + Γ^b_{ac} Y^c, we get
(∇_a α_b) Y^b + α_b (∂_a Y^b + Γ^b_{ac} Y^c) = (∂_a α_b) Y^b + α_b ∂_a Y^b.
Hence we require, for all Y^b,
(∇_a α_b) Y^b = (∂_a α_b − Γ^c_{ab} α_c) Y^b,
so
∇_a α_b = ∂_a α_b − Γ^c_{ab} α_c.
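The cancellation of the Γ-terms in the pairing, which is what forces the minus sign in ∇_a α_b = ∂_a α_b − Γ^c_{ab} α_c, can be verified symbolically. A sympy sketch, with the Γ^c_{ab} left as arbitrary symbols:

```python
# Check that with nabla_a alpha_b = d_a alpha_b - G^c_ab alpha_c and
# nabla_a Y^b = d_a Y^b + G^b_ac Y^c, the G-terms cancel in the pairing:
# (nabla_a alpha_b) Y^b + alpha_b nabla_a Y^b = d_a(alpha_b Y^b),
# whatever the symbols G^c_ab are.
import sympy as sp

n = 2
xs = sp.symbols('x0 x1')
G = [[[sp.Symbol(f'G{c}_{a}{b}') for b in range(n)] for a in range(n)]
     for c in range(n)]   # G[c][a][b] = Gamma^c_{ab}, arbitrary constants
alpha = [sp.Function(f'al{b}')(*xs) for b in range(n)]
Yv = [sp.Function(f'Y{b}')(*xs) for b in range(n)]

for a in range(n):
    lhs = sum((sp.diff(alpha[b], xs[a])
               - sum(G[c][a][b]*alpha[c] for c in range(n))) * Yv[b]
              + alpha[b]*(sp.diff(Yv[b], xs[a])
                          + sum(G[b][a][c]*Yv[c] for c in range(n)))
              for b in range(n))
    rhs = sp.diff(sum(alpha[b]*Yv[b] for b in range(n)), xs[a])
    assert sp.simplify(lhs - rhs) == 0
```

Only the relative sign matters here; with a plus sign in both formulas the Γ-terms would fail to cancel and the pairing would not reduce to an ordinary derivative.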
(2) ∇_X is real-linear:
∇_X(λT + μS) = λ ∇_X T + μ ∇_X S
for any two tensors T, S of type (r, s) and real numbers λ and μ.
(3) ∇_X satisfies the Leibniz rule:
∇_X(T ⊗ S) = (∇_X T) ⊗ S + T ⊗ (∇_X S).
Remark 6.3.5. I omit the proof, but refer you to the problem set for related exercises.
It is a pain to write out the general case, but here are some examples:
∇_a T_{bc} = ∂_a T_{bc} − Γ^s_{ab} T_{sc} − Γ^s_{ac} T_{bs};    (6.3.2)
∇_a A^c_b = ∂_a A^c_b − Γ^s_{ab} A^c_s + Γ^c_{as} A^s_b;    (6.3.3)
∇_a P^{bc} = ∂_a P^{bc} + Γ^b_{as} P^{sc} + Γ^c_{as} P^{bs}.    (6.3.4)
6.4. Properties
The covariant derivative operator we have been discussing has the following further properties:
Symmetry or Torsion-free:
∇_a ∇_b f = ∇_b ∇_a f for all functions f.    (6.4.1)
Metric preservation:
∇_a g_{bc} = 0.    (6.4.2)
There are more general differentiation operators which satisfy the Leibniz-rule type properties of the previous section. These are called connections. Given a metric, there is a unique
such connection which satisfies the two boxed properties here. This is also called the metric
connection or the Levi-Civita connection, after Tullio Levi-Civita (1873–1941).
Proof. (Of boxed properties.) We have
∇_a ∇_b f = ∂_a ∂_b f − Γ^c_{ab} ∂_c f,    (6.4.3)
which is symmetric in a and b because ∂_a ∂_b f = ∂_b ∂_a f and Γ^c_{ab} = Γ^c_{ba}. The metric
preservation property (6.4.2) follows by a direct computation from the formula for Γ^c_{ab} in
terms of the metric.
The metric preservation property has important consequences. Recall that the metric g can
be used to lower indices and that its inverse can be used to raise indices. Because ∇g = 0 it
follows that ∇g⁻¹ = 0, and raising and lowering indices commutes with differentiation.
For example,
∇_a (g^{bs} α_s) = ∇_a α^b = g^{bs} ∇_a α_s.    (6.4.5)
The middle expression is the covariant derivative of the index-raised version of α; the right-hand
expression is the derivative of α, with its index raised afterwards.
Example 6.4.1. If Y is parallel propagated along a curve γ, then its length is constant.
For if X = γ̇, we have ∇_X Y = 0. Then
∇_X (g(Y, Y)) = 2 g(Y, ∇_X Y) = 0.
More explicitly,
∇_X (Y^a Y_a) = (∇_X Y^a) Y_a + Y^a ∇_X Y_a = 2 Y_a ∇_X Y^a = 0.
6.5. Curvature

Theorem 6.5.1. There is a tensor R of type (1, 3), with components R_{abc}{}^d, such that for
every vector field X,
(∇_a ∇_b − ∇_b ∇_a) X^d = R_{abc}{}^d X^c.    (6.5.1)

Definition 6.5.2. The tensor R with components R_{abc}{}^d is called the Riemann curvature
tensor (or just curvature tensor, for short).

Proof. We prove first the following formula:
R_{abc}{}^d = ∂_a Γ^d_{bc} − ∂_b Γ^d_{ac} + Γ^d_{ap} Γ^p_{bc} − Γ^d_{bp} Γ^p_{ac}.    (6.5.2)
Although it is important, and you need to know that it exists, I DO NOT RECOMMEND
THAT YOU COMMIT THIS TO MEMORY.
Recall that if T_b{}^d is a (1, 1) tensor, then
∇_a T_b{}^d = ∂_a T_b{}^d + Γ^d_{as} T_b{}^s − Γ^s_{ab} T_s{}^d.    (6.5.3)
We are going to apply this with T_b{}^d = ∇_b X^d and then subtract the corresponding equation
with a and b switched. In fact if we do this first, we get
∇_a T_b{}^d − ∇_b T_a{}^d = ∂_a T_b{}^d + Γ^d_{as} T_b{}^s − ∂_b T_a{}^d − Γ^d_{bs} T_a{}^s,    (6.5.4)
because Γ^s_{ab} = Γ^s_{ba}.
Now put T_b{}^d = ∇_b X^d and expand ∇X = ∂X + ΓX. The first two terms on the RHS of
(6.5.4) are
∂_a (∂_b X^d + Γ^d_{bs} X^s) + Γ^d_{as} (∂_b X^s + Γ^s_{bp} X^p)
  = (∂_a Γ^d_{bs}) X^s + Γ^d_{as} Γ^s_{bp} X^p    (6.5.5)
    + ∂_a ∂_b X^d + Γ^d_{bs} ∂_a X^s + Γ^d_{as} ∂_b X^s.    (6.5.6)
We have rearranged the terms in this order because those on line (6.5.6) are symmetric in ab.
Hence these disappear when we subtract the corresponding expression with a and b switched.
Rabc d X c = (ra rb
rb ra )X d = [@a
d
bc
@b
d
ac
d p
ap bc
d p
c
bp ac ]X
(6.5.7)
Now change coordinates from x^a to x̃^p, and write
J^a_p = ∂x^a/∂x̃^p.    (6.5.8)
It follows that
R̃_{pqr}{}^s X̃^r = J^a_p J^b_q (J⁻¹)^s_d R_{abc}{}^d X^c,    (6.5.9)
where J is the Jacobian of the transformation. But we are assuming that X is a vector field as
well, so
X^c = J^c_r X̃^r.
Hence
R̃_{pqr}{}^s X̃^r = J^a_p J^b_q J^c_r (J⁻¹)^s_d R_{abc}{}^d X̃^r    (6.5.10)
for any components X̃^r. Hence
R̃_{pqr}{}^s = J^a_p J^b_q J^c_r (J⁻¹)^s_d R_{abc}{}^d,    (6.5.11)
which is the transformation law for a tensor field of type (1, 3).
Remark 6.5.3. Cf. 5.6 of Woodhouse, GR.
Example 6.5.4. Curvature of hyperbolic space.
In the previous chapter, we considered the 2-dimensional hyperbolic metric
ds² = (dx² + dy²)/x².
Its non-zero Christoffel symbols are
Γ^1_{22} = 1/x,  Γ^1_{11} = −1/x,  Γ^2_{12} = Γ^2_{21} = −1/x,  others = 0.    (6.5.12)
From (6.5.2),
R_{121}{}^2 = ∂_1 Γ^2_{21} − ∂_2 Γ^2_{11} + Γ^2_{1p} Γ^p_{21} − Γ^2_{2p} Γ^p_{11}.    (6.5.13)
There is quite a bit of simplification because many of the Γ's are zero in this case:
R_{121}{}^2 = ∂_1 Γ^2_{21} + Γ^2_{12} Γ^2_{21} − Γ^2_{21} Γ^1_{11}    (6.5.14)
  = x⁻² + x⁻² − x⁻² = x⁻².    (6.5.15)
It turns out that all other components of the curvature are either ± this one, or are zero. In
particular if either 1 or 2 is repeated three or more times, then the corresponding curvature
component is zero. If 1 and 2 appear precisely twice each, then the corresponding component
is ±x⁻². From (6.5.15) we also have
R_{1212} = x⁻⁴
because g_{22} = x⁻².
Example 6.5.5. Curvature of flat space in polar coordinates. In 2D polar coordinates, the
Γ's are
Γ^1_{22} = −r,  Γ^2_{12} = Γ^2_{21} = 1/r,    (6.5.16)
all others zero (r = x^1, θ = x^2). This time, (6.5.13) reduces to
R_{121}{}^2 = ∂_1 Γ^2_{21} + Γ^2_{12} Γ^2_{21} = −1/r² + 1/r² = 0.
This illustrates the fact that curvature is a tensor: we knew this result ahead of time, because
it is clear that the curvature of a flat metric is zero, and we've just changed coordinates.
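Both examples can be reproduced in a few lines of sympy, computing the Γ's from the metric and then R_{121}{}^2 from (6.5.2). A sketch (indices here are 0-based, so the notes' R_{121}{}^2 is `riemann(..., 0, 1, 0, 1)`):

```python
# Compute R_{121}^2 from (6.5.2) for (i) the hyperbolic metric
# (dx^2+dy^2)/x^2 and (ii) the flat metric in polar coordinates,
# recovering x^{-2} and 0 respectively.
import sympy as sp

def christoffel(g, xs):
    n = len(xs)
    ginv = g.inv()
    # Gamma^c_{ab} = (1/2) g^{cs} (d_a g_{bs} + d_b g_{as} - d_s g_{ab})
    return [[[sp.simplify(sum(ginv[c, s]*(sp.diff(g[b, s], xs[a])
                                          + sp.diff(g[a, s], xs[b])
                                          - sp.diff(g[a, b], xs[s]))
                              for s in range(n))/2)
              for b in range(n)] for a in range(n)] for c in range(n)]

def riemann(G, xs, a, b, c, d):
    n = len(xs)
    # R_{abc}^d = d_a G^d_bc - d_b G^d_ac + G^d_ap G^p_bc - G^d_bp G^p_ac
    return sp.simplify(sp.diff(G[d][b][c], xs[a]) - sp.diff(G[d][a][c], xs[b])
                       + sum(G[d][a][p]*G[p][b][c] - G[d][b][p]*G[p][a][c]
                             for p in range(n)))

x, y = sp.symbols('x y', positive=True)
g_hyp = sp.Matrix([[1/x**2, 0], [0, 1/x**2]])
G_h = christoffel(g_hyp, (x, y))
assert sp.simplify(riemann(G_h, (x, y), 0, 1, 0, 1) - 1/x**2) == 0

r, th = sp.symbols('r theta', positive=True)
g_pol = sp.Matrix([[1, 0], [0, r**2]])
G_p = christoffel(g_pol, (r, th))
assert riemann(G_p, (r, th), 0, 1, 0, 1) == 0   # flat space
```

The helper functions are generic, so the same sketch works for any metric given as a sympy matrix.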
Now recall the definition (6.1.2):
R(X, Y)Z = (∇_X ∇_Y − ∇_Y ∇_X − ∇_{[X,Y]})Z.    (6.5.17)
If X = ∂_a and Y = ∂_b, then [X, Y] = 0 and this reduces to
(∇_a ∇_b − ∇_b ∇_a)Z,    (6.5.18)
and comparing with the previous section, the RHS is R_{abc}{}^d Z^c in components. Thus the component version of the curvature tensor arises from this definition by taking X = ∂_a and Y = ∂_b.
We shall show that the operation
(X, Y, Z) ↦ (∇_X ∇_Y − ∇_Y ∇_X − ∇_{[X,Y]})Z    (6.5.19)
is C∞-linear in each argument, that is,
(∇_X ∇_Y − ∇_Y ∇_X − ∇_{[X,Y]})(fZ) = f (∇_X ∇_Y − ∇_Y ∇_X − ∇_{[X,Y]})Z    (6.5.20)
and
(∇_{fX} ∇_Y − ∇_Y ∇_{fX} − ∇_{[fX,Y]})Z = f (∇_X ∇_Y − ∇_Y ∇_X − ∇_{[X,Y]})Z    (6.5.21)
for any smooth function f. We then prove a technical lemma which explains why being C∞-linear implies that the operation (6.5.19) is then given by a tensor:
(X, Y, Z) ↦ R(X, Y)Z, or R_{abc}{}^d X^a Y^b Z^c in components.    (6.5.22)
6.5.3. Proof of (6.5.20). Using the Leibniz rule twice,
[∇_X ∇_Y − ∇_Y ∇_X](fZ) = f [∇_X ∇_Y − ∇_Y ∇_X]Z + (XY f − Y Xf) Z.
But
∇_{[X,Y]}(fZ) = ([X, Y]f) Z + f ∇_{[X,Y]} Z,
and so subtracting this from each side gives the C∞-linearity of Z ↦ R(X, Y)Z for fixed X and
Y.
Remark 6.5.8. Note that all we have used in this proof is that directional derivatives satisfy
the Leibniz rule.
6.5.4. Proof of (6.5.21). This is left as an exercise for the reader. It is somewhat shorter
than the proof of (6.5.20). You will need the formula
[X, fY]u = (Xf)(Y u) + f [X, Y]u,    (6.5.23)
which is also a good exercise.
The operation
(X, Y, Z) ↦ (∇_X ∇_Y − ∇_Y ∇_X − ∇_{[X,Y]})Z
is certainly a differential operator of order 2 in Z and order 1 in X and Y. The next lemma
looks complicated, but it just says that a differential operator of order 2 which is also C∞-linear
must actually be algebraic, i.e. given by multiplication by a tensor field.
Lemma 6.5.9. Let P be a second-order differential operator which maps vector fields to vector
fields. Suppose further that P is C∞-linear, that is,
P(fZ) = f P(Z)    (6.5.24)
for any smooth function f. Then in fact P is given by a (1, 1) tensor, in the sense that
P(Z)^c = P^c_a Z^a.    (6.5.25)
Proof. In local coordinates we may write
P(Z)^d = A^{bcd}_a ∂_b ∂_c Z^a + B^{cd}_a ∂_c Z^a + C^d_a Z^a,
where we may assume A^{bcd}_a = A^{cbd}_a. Then
P(fZ)^d = f P(Z)^d + A^{bcd}_a (∂_b f ∂_c Z^a + ∂_c f ∂_b Z^a)    (6.5.26)
  + A^{bcd}_a (∂_b ∂_c f) Z^a + B^{cd}_a ∂_c f Z^a.    (6.5.27)
Because P(fZ) = f P(Z), we obtain
A^{bcd}_a (∂_b ∂_c f Z^a + ∂_b f ∂_c Z^a + ∂_c f ∂_b Z^a) + B^{cd}_a ∂_c f Z^a = 0    (6.5.28)
for any f and Z. If we pick f to be a product x^p x^q of two coordinate functions, substitute into
this and then set x = 0, the only surviving term is
A^{bcd}_a (∂_b ∂_c (x^p x^q)) Z^a = 2 A^{pqd}_a Z^a,
and this is supposed to vanish. This being true for all vectors Z, we conclude that A^{pqd}_a vanishes
at x = 0. This point was arbitrary, so A^{pqd}_a vanishes everywhere, and (6.5.28) now reads
B^{cd}_a ∂_c f Z^a = 0    (6.5.29)
for all f and Z. We apply the same argument with f = x^p, and this gives B^{pd}_a = 0 at x = 0.
Again the point was arbitrary, so it follows that B vanishes everywhere. Hence P(fZ) = f P(Z)
implies that P(Z)^d = C^d_a Z^a, as required.
Applying this Lemma to our map (6.5.19), first as an operator on Z, with (X, Y) fixed, then
in X, with (Y, Z) fixed, we see that R(X, Y)Z is indeed given by R_{abc}{}^d X^a Y^b Z^c, where R_{abc}{}^d are
the components of a tensor of type (1, 3).
Remark 6.5.10. This lemma is a bit fiddly to prove, but the idea that a differential operator
which is also C∞-linear has to be given algebraically (by multiplication by a tensor) is a powerful
one in differential geometry, which saves a lot of computation in local coordinates.
6.6. Curvature at a point

Proposition 6.6.1. Let p be a point, and choose coordinates centred at p in which all first
derivatives of the metric vanish at the origin:
∂_a g_{bc} = 0 at x = 0.    (6.6.1)
Then at x = 0,
R_{abcd} = ½ (∂_a ∂_c g_{bd} + ∂_b ∂_d g_{ac} − ∂_a ∂_d g_{bc} − ∂_b ∂_c g_{ad}).    (6.6.2)

Proof. This follows because in such coordinates all first derivatives of g_{bc} and g^{bc} vanish
at x = 0, so the Γ's vanish at x = 0. Hence the quadratic terms in (6.5.2) vanish at p, giving
R_{abc}{}^d = ∂_a Γ^d_{bc} − ∂_b Γ^d_{ac} at x = 0,    (6.6.3)
and
∂_a Γ^d_{bc} = ½ g^{ds} (∂_a ∂_b g_{cs} + ∂_a ∂_c g_{bs} − ∂_a ∂_s g_{bc}) at x = 0,    (6.6.4)
so
g_{ds} ∂_a Γ^s_{bc} = ½ (∂_a ∂_b g_{cd} + ∂_a ∂_c g_{bd} − ∂_a ∂_d g_{bc}) at x = 0.    (6.6.5)
Formula (6.6.2) follows from this and (6.6.3).
(1 + t2 )dx2 .
(6.6.6)
1.
(6.6.7)
Remark 6.6.4. There is a converse statement which we shall not prove: if R = 0 in a small
neighbourhood of a point p, then there are local coordinates centred at p with respect to which
g_{ab} = η_{ab}.
6.6.1. Commutators on tensors of higher rank. Recall that the covariant derivative
on vectors has been extended to act on all tensors in a way compatible with the basic algebraic
operations on tensors. This implies that when the commutator ∇_a ∇_b − ∇_b ∇_a is applied to any
tensor, the result can also be expressed in terms of the algebraic operation of the curvature on
the tensor.
There are good ways to derive and remember these results and bad ways to do it. The worst
way is to work directly with the Γ's.
Proposition 6.6.5. If α is a covector, then
(∇_a ∇_b − ∇_b ∇_a) α_c = −R_{abc}{}^d α_d.    (6.6.8)

Proof. One proof is to use the fact that ∇_a preserves the metric, together with a symmetry of
the curvature tensor proved below. If α^b = g^{ab} α_a, then we have
(∇_a ∇_b − ∇_b ∇_a) α^d = R_{abc}{}^d α^c.    (6.6.9)
If we lower the index, we get
(∇_a ∇_b − ∇_b ∇_a) α_d = R_{abcd} α^c,    (6.6.10)
and by the antisymmetry of R_{abcd} in its last two indices,
(∇_a ∇_b − ∇_b ∇_a) α_d = −R_{abdc} α^c = −R_{abd}{}^c α_c.    (6.6.11)
A second proof runs as follows. Let X be any vector field. Then α_d X^d is a function, and we know
(∇_a ∇_b − ∇_b ∇_a)(α_d X^d) = (∂_a ∂_b − ∂_b ∂_a)(α_d X^d) = 0.    (6.6.12)
The terms in the first derivatives of α and X are symmetric in ab, so expanding by the Leibniz
rule and subtracting the corresponding expression with a and b switched, we obtain
0 = (∇_a ∇_b − ∇_b ∇_a)(α_d X^d) = X^d (∇_a ∇_b − ∇_b ∇_a) α_d + α_d (∇_a ∇_b − ∇_b ∇_a) X^d.    (6.6.14)
Hence
X^d (∇_a ∇_b − ∇_b ∇_a) α_d = −α_d R_{abc}{}^d X^c = −X^d R_{abd}{}^c α_c,    (6.6.15)
where in the last equation we have switched the two dummy indices c and d so as to have X^d
on each side. We now make the usual argument that as X is arbitrary, this equation implies the result.
In the same way, one obtains, for tensors of higher rank,
(∇_a ∇_b − ∇_b ∇_a) T^{cd} = R_{abs}{}^c T^{sd} + R_{abs}{}^d T^{cs},    (6.6.16)
(∇_a ∇_b − ∇_b ∇_a) A^d_c = −R_{abc}{}^s A^d_s + R_{abs}{}^d A^s_c,    (6.6.17)
(∇_a ∇_b − ∇_b ∇_a) S_{cd} = −R_{abc}{}^s S_{sd} − R_{abd}{}^s S_{cs}.    (6.6.18)
(6.6.18)
For a tensor T of type (r, s) the structure of the formula on the RHS will be a sum of r + s
all of the form RT ; there will be r terms with + signs, corresponding to the upstairs indices of
T and s terms with signs, corresponding to the lower indices of T .
6.6.2. Symmetries of the curvature tensor. The formula for the curvature at a point
in Proposition 6.6.1 allows us to write down the general symmetry properties of Rabcd .
Theorem 6.6.6. For any metric, the curvature tensor has the following symmetries:
R_{abcd} = −R_{bacd},  R_{abcd} = −R_{abdc};    (6.6.19)
R_{abcd} + R_{bcad} + R_{cabd} = 0;    (6.6.20)
R_{abcd} = R_{cdab}.    (6.6.21)
Proof. All follow by inspection of the formula in Proposition 6.6.1 and the fact that
∂_a ∂_b f = ∂_b ∂_a f for any smooth function f.
How many independent components does such a tensor have? By (6.6.19), R may be regarded
as a matrix R_{IJ}, where I and J run over the N = n(n − 1)/2 antisymmetric index pairs; by
(6.6.21) this matrix is symmetric, so it has
½ N(N + 1) = ⅛ n(n − 1)(n² − n + 2)    (6.6.23)
independent entries.
It turns out that (6.6.20) is only independent of the other two if all four indices are distinct.
So this imposes another C(n, 4) = n(n − 1)(n − 2)(n − 3)/24 conditions on the components. Hence
the number of independent components is
⅛ n(n − 1)(n² − n + 2) − (1/24) n(n − 1)(n − 2)(n − 3)
  = (1/24) n(n − 1)(3n² − 3n + 6 − (n − 2)(n − 3))
  = (1/12) n²(n² − 1).    (6.6.24)
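This count is easy to check numerically; a short Python sketch:

```python
# Check of the component count (6.6.24): with N = n(n-1)/2, the symmetries
# (6.6.19) and (6.6.21) leave N(N+1)/2 components, and the cyclic identity
# (6.6.20) removes a further C(n, 4), leaving n^2 (n^2 - 1)/12.
from math import comb

def riemann_components(n):
    N = n*(n - 1)//2
    return N*(N + 1)//2 - comb(n, 4)

# n = 2: 1 component; n = 3: 6; n = 4 (space-time): the famous 20
assert [riemann_components(n) for n in (2, 3, 4)] == [1, 6, 20]
for n in range(2, 10):
    assert riemann_components(n) == n**2 * (n**2 - 1) // 12
```

Note that this agrees with the count of unkillable components of P found in the sneak preview (5.6.14).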
Theorem 6.6.7 (Bianchi identity). The covariant derivatives of the curvature tensor satisfy
∇_a R_{bcd}{}^e + ∇_b R_{cad}{}^e + ∇_c R_{abd}{}^e = 0.    (6.6.25)
Proof. A very bad way to try to do this would be from the explicit formula for R in terms
of the Γ's. Instead we use the definition
(∇_b ∇_c − ∇_c ∇_b) α_d = −R_{bcd}{}^s α_s    (6.6.26)
and apply ∇_a:
∇_a (∇_b ∇_c − ∇_c ∇_b) α_d = −(∇_a R_{bcd}{}^s) α_s − R_{bcd}{}^s ∇_a α_s.    (6.6.27)
Now skew-symmetrise on abc; in other words, add this expression to what you get by cyclically
permuting the indices a, b and c. On the LHS, the cyclic sum of ∇_a (∇_b ∇_c − ∇_c ∇_b) α_d equals
the cyclic sum of the commutators (∇_a ∇_b − ∇_b ∇_a) applied to the (0, 2) tensor ∇_c α_d, which
can be expanded by (6.6.18). Using the symmetry (6.6.20) and relabelling the cyclic terms, the
∇α terms from this cancel exactly with those on the RHS of (6.6.27), leaving us with
0 = (∇_a R_{bcd}{}^s + ∇_b R_{cad}{}^s + ∇_c R_{abd}{}^s) α_s.    (6.6.31)
As α is arbitrary, the Bianchi identity follows.
6.6.3. Alternative proof of Bianchi identity. (Cf. Woodhouse, GR, §5.7.) If we choose
coordinates such that Γ = 0 at x = 0, we have
∇_a R_{bcd}{}^e = ∂_a R_{bcd}{}^e at x = 0,    (6.6.32)
so, from (6.5.2),
∇_a R_{bcd}{}^e = ∂_a ∂_b Γ^e_{cd} − ∂_a ∂_c Γ^e_{bd} at x = 0,    (6.6.33)
since the derivative of each quadratic ΓΓ term contains a factor Γ, which vanishes at x = 0.
Summing over the cyclic permutations of a, b, c, the terms on the RHS cancel out, showing that
the Bianchi identity holds at x = 0. But the LHS is a tensor equation and the point is arbitrary,
so we have obtained the Bianchi identity.
6.7. Ricci and scalar curvature

Definition 6.7.1. The Ricci curvature, which I shall denote by r, is defined by contracting the
Riemann tensor,
r_{bc} = R_{bac}{}^a.    (6.7.1)

Definition 6.7.2. The scalar curvature, which I shall denote by s, is defined by contracting
the Ricci,
s = g^{ab} r_{ab}.    (6.7.2)
The scalar curvature is, as the name implies, a scalar quantity.
Remark 6.7.3. Many texts denote the Ricci tensor by Rab and the scalar curvature by R
(and call it the Ricci scalar). You have been warned.
Theorem 6.7.4. The Riemann curvature, Ricci curvature and scalar curvature are related
by the following identities:
∇_a r_{bd} − ∇_b r_{ad} = −∇_e R_{abd}{}^e    (6.7.3)
and
∇^b r_{ab} = ½ ∇_a s.    (6.7.4)
Proof. Starting from the Bianchi identity (6.6.25), contract the indices c and e (and sum).
Taking into account the symmetries of R, we obtain (6.7.3). Similarly, from this equation
multiply by g^{bd} (and sum). This yields (6.7.4).
Remark 6.7.5. The two-dimensional hyperbolic metric has the property that its Ricci curvature is proportional to the metric. Indeed,
R_{121}{}^2 = 1/x², so r_{11} = 1/x²,
because R_{111}{}^1 = 0. Similarly
R_{212}{}^1 = 1/x², and so r_{22} = 1/x².
Also r_{12} = 0, so
r_{ab} = g_{ab}.    (6.7.5)
6.8. Relative acceleration and geodesic deviation

Consider again a 2-parameter family of curves H(τ, λ), but suppose now that each curve in
the family is a geodesic:
∇_X X = 0    (6.8.1)
for each fixed λ. To be definite, imagine that the curve τ ↦ H(τ, 0) is Alice's worldline, and we may
as well suppose that it is parameterized by proper time. Then her velocity 4-vector is
X = (∂H/∂τ)(τ, 0).
We suppose that Bob's worldline is the nearby geodesic τ ↦ H(τ, λ), where λ is very small. To
first order, then, Bob's worldline is
τ ↦ H(τ, 0) + λ (∂H/∂λ)(τ, 0).
The vector Y = ∂H/∂λ is called the connecting vector, as it connects events on Alice's worldline
to events on Bob's worldline.

Definition 6.8.2. The relative acceleration of the family of geodesics, as measured by Alice,
is the vector field ∇²_X Y along her worldline τ ↦ H(τ, 0), where X and Y are as above.
Theorem 6.8.3. We have
∇²_X Y = R(X, Y)X.    (6.8.2)

Proof. First we claim that
[X, Y] = 0.    (6.8.3)
From the proof of Lemma 6.2.2, this can be computed in local coordinates as
Y^a ∂_a X^b − X^a ∂_a Y^b.    (6.8.4)
Now
Y^a ∂_a X^b = ∂²H^b/∂λ∂τ    (6.8.5)
and
X^a ∂_a Y^b = ∂²H^b/∂τ∂λ.    (6.8.6)
Hence (6.8.3) follows from the symmetry of the mixed partials of H.
Next, X = ∂H/∂τ is the tangent vector field of a geodesic for each fixed value of λ. Then
∇_X X = 0    (6.8.7)
and so
∇_Y ∇_X X = 0.    (6.8.8)
(This would not be true if the neighbouring curves were not geodesics.) On the other hand, by
Definition 6.5.6,
(∇_X ∇_Y − ∇_Y ∇_X)Z = R(X, Y)Z    (6.8.9)
for any vector Z, since [X, Y] = 0. Putting Z = X,
∇_X ∇_Y X = R(X, Y)X.    (6.8.10)
Finally, ∇_Y X = ∇_X Y by (6.8.3), so the LHS is ∇_X ∇_X Y = ∇²_X Y, and the theorem follows.
6.9. Comparison with the newtonian theory

In newtonian gravity, the potential φ satisfies Poisson's equation
∇²φ = 4πρ    (6.9.1)
in units where the gravitational constant G is 1, where ρ is the mass density. The equation of
motion of a particle in the gravitational field due to the mass density is
ẍ = −∇φ.    (6.9.2)
This is an absolute statement. For comparison with relativity we need the corresponding
statement about relative acceleration. Thus we consider a set-up similar to that of the previous
section, where we have a 1-parameter family x(t, s), each of which satisfies (6.9.2) for fixed s:
(∂²/∂t²) x(t, s) = −∇φ.    (6.9.3)
The relative acceleration is obtained by differentiating with respect to s:
(∂²/∂t²) (∂/∂s) x(t, s) = −(∂/∂s) ∇φ.    (6.9.4)
Writing y = ∂x/∂s for the newtonian connecting vector, this is
∂²y^j/∂t² = −y^i ∂_i ∂^j φ.    (6.9.5)
This is the formula for relative acceleration in Newton's theory.
In order to match it up with the GR version, suppose that x^a are coordinates which are
inertial at a point x^a = 0 and that, as in the previous section, we have a 1-parameter family
H(τ, λ) of timelike geodesics. We assume that H(0, 0) = 0, so that at proper time τ = 0, Alice is
at the event x = 0. We may assume further that her 4-velocity vector at that event is standard,
(∂H^a/∂τ)(0, 0) = (1, 0).
With this choice, x^0 should be equated (approximately) with the newtonian time-variable t. If
we assume that Y(0, 0) is orthogonal to X, then
Y(0, 0) = (0, y).    (6.9.6)
This choice corresponds physically to Alice choosing to connect the event τ = 0 on her worldline
with the event on Bob's worldline that she judges also to happen at τ = 0. Then the geodesic
deviation equation (6.8.2),
∇²_X Y = R(X, Y)X, or (∇²_X Y)^d = R_{abc}{}^d X^a Y^b X^c,    (6.9.7)
translates as follows: the LHS should be (0, ÿ) at τ = λ = 0, while for the RHS
R_{abc}{}^d X^a X^c = R_{0b0}{}^d.    (6.9.8)
Since R_{000}{}^d = 0, the matrix R_{0c0}{}^d has the block form
R_{0c0}{}^d =
  [ 0        0      ]
  [ 0   R_{0i0}{}^j ],
where the 3 × 3 matrix R_{0i0}{}^j is symmetric in i and j (after lowering an index). Hence (6.9.7)
reduces to the 3-dimensional equation
ÿ^j = R_{0i0}{}^j y^i.    (6.9.9)
This leads to a suggestion for the GR analogue of Poisson's equation. Actually I'll discuss
this only in the case that there is no matter, i.e. ∇²φ = 0. This translates into
R_{0i0}{}^i = 0 (summation over i from 1 to 3),    (6.9.10)
that is, r_{00} = 0, by definition of the Ricci tensor. Hence we are led to Einstein's vacuum equations:
Hypothesis 6.9.2. In empty space, the space-time metric g is such that
r_{ab} = 0.    (6.9.11)
Remark 6.9.3. In going from r_{ac} X^a X^c = 0 for all timelike X to r_{ac} = 0 there is something
to prove. We can argue that if r_{ac} X^a X^c = 0 for all timelike X, then differentiation with respect
to X yields r_{ac} = 0.
Remark 6.9.4. Tidal forces. The relative acceleration (in either GR or newtonian gravity)
often goes under the heading of tidal forces: for example, if you have the misfortune to be freely
falling feet-first towards a black hole, then the attractive force on your feet will be stronger
than on your head, and this will translate into an eventually unbearable stretching effect. The
bulges of water on either side of the earth, due to the non-uniform gravitational field of the moon,
are the more classical example of tidal forces.
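For the newtonian point mass the tidal tensor ∂_i∂_j φ can be computed explicitly; the following sympy sketch (with the illustrative choice φ = −M/ρ, ρ = |x|) confirms the stretching/squeezing pattern just described:

```python
# Newtonian tidal tensor d_i d_j phi for the point-mass potential
# phi = -M/rho: stretching along the radial direction, compression
# transverse, and trace zero (Laplace's equation away from the source).
import sympy as sp

x, y, z, M = sp.symbols('x y z M', positive=True)
rho = sp.sqrt(x**2 + y**2 + z**2)
phi = -M/rho

tidal = sp.hessian(phi, (x, y, z))

# trace = Laplacian of phi = 0 away from the origin
assert sp.simplify(sum(tidal[i, i] for i in range(3))) == 0

# on the x-axis the tidal matrix is diag(-2M/x^3, M/x^3, M/x^3):
# since the relative acceleration is -d_i d_j phi y^i, a falling body
# is stretched radially and squeezed transversally
on_axis = tidal.subs({y: 0, z: 0}).applyfunc(sp.simplify)
diff = (on_axis - sp.diag(-2*M/x**3, M/x**3, M/x**3)).applyfunc(sp.simplify)
assert diff == sp.zeros(3)
```

The vanishing trace is the empty-space Poisson equation ∇²φ = 0, mirroring the vacuum condition r_{ab} = 0 introduced above.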
6.10. Weak field limit
We can get another angle on the interplay between full GR and newtonian gravity by
considering the so-called weak field limit. This is the study of a lorentzian metric g = η + h
on R⁴, where η is the Minkowski metric and h is a small, slowly varying perturbation. We
neglect terms quadratic and higher in h, and terms linear in ∂_0 h. In this section we shall compute the
curvature and the geodesics in this approximation and match them up with ẍ = −∇φ. All this
is a very good exercise in understanding the material presented in this chapter.
Lemma 6.10.1. If
g_{ab} = η_{ab} + h_{ab},
then to first order in h,
g^{ab} = η^{ab} − h^{ab},  Γ^c_{ab} = ½ η^{cs} (∂_a h_{bs} + ∂_b h_{as} − ∂_s h_{ab}).

Knowing the Γ's we can compute the slow geodesics:

Lemma 6.10.2. If x^a(τ) is a slow geodesic, we may assume τ = x^0, so that the velocity vector
is
ẋ^a = (1, ẋ),    (6.10.1)
and in this approximation the geodesic equation reduces to
ẍ^j = −½ ∂_j h_{00}.    (6.10.2)
A further computation gives the linearized curvature:
R_{abc}{}^d = ½ (∂_a ∂_c h^d_b + ∂_b ∂^d h_{ac} − ∂_b ∂_c h^d_a − ∂_a ∂^d h_{bc}).    (6.10.5)
In particular,
r_{ac} = ½ (∂_a ∂_c h^b_b + ∂_b ∂^b h_{ac} − ∂_b ∂_c h^b_a − ∂_a ∂^b h_{bc}).    (6.10.6)
If we consider the 00 component of this and neglect the ∂_0 h_{ab} terms, then
r_{00} = ½ ∂_b ∂^b h_{00} = −½ ∇² h_{00}.    (6.10.7)
This leads, once again, to the proposal that the newtonian empty-space postulate ∇²φ = 0
should translate into r_{ab} = 0, since we've already seen that h_{00}/2 should be identified (up to a
constant) with the newtonian potential.
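Equation (6.10.7) can be checked directly from the linearized Ricci formula (6.10.6). A sympy sketch for a static perturbation in which only h_{00} = 2φ(x, y, z) is non-zero:

```python
# Check of (6.10.7): for a static perturbation with only h_{00} = 2*phi
# non-zero, the linearized Ricci component r_{00} from (6.10.6) equals
# minus the Laplacian of phi.
import sympy as sp

t, x, y, z = sp.symbols('t x y z')
xs = (t, x, y, z)
eta = sp.diag(1, -1, -1, -1)
phi = sp.Function('phi')(x, y, z)   # static: no t-dependence

h = sp.zeros(4, 4)
h[0, 0] = 2*phi                     # h_{00} = 2*phi, all others zero

def d(i, expr):
    return sp.diff(expr, xs[i])

# raise an index with eta (eta is its own inverse): hup[b][a] = h^b_a
hup = [[sum(eta[b, s]*h[s, a] for s in range(4)) for a in range(4)]
       for b in range(4)]

a = c = 0
r00 = sp.Rational(1, 2)*(
    d(a, d(c, sum(hup[b][b] for b in range(4))))                          # d_a d_c h^b_b
    + sum(eta[b, s]*d(b, d(s, h[a, c])) for b in range(4) for s in range(4))  # d_b d^b h_ac
    - sum(d(b, d(c, hup[b][a])) for b in range(4))                        # d_b d_c h^b_a
    - sum(eta[b, s]*d(a, d(s, h[b, c])) for b in range(4) for s in range(4)))  # d_a d^b h_bc

laplacian = sum(sp.diff(phi, v, 2) for v in (x, y, z))
assert sp.simplify(r00 + laplacian) == 0   # r_{00} = -nabla^2 phi
```

With h_{00} = 2φ this is exactly r_{00} = −½∇²h_{00}, so the vacuum condition r_{00} = 0 reproduces Laplace's equation for φ.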
6.11. Physical differential equations

In SR, the electromagnetic field is described by a skew 2-tensor F^{ab}. In an inertial frame,
the components are
F^{ab} =
  [  0    E_1   E_2   E_3 ]
  [ −E_1   0   −B_3   B_2 ]
  [ −E_2   B_3   0   −B_1 ]
  [ −E_3  −B_2   B_1   0  ].    (6.11.1)
It can be verified that Maxwell's equations in vacuum are equivalent to the system
∂_a F^{ab} = 0,  ∂_a F_{bc} + ∂_b F_{ca} + ∂_c F_{ab} = 0.    (6.11.2)
There is a natural generalization of this to a general space-time (M, g). We assume that the
electromagnetic field is given by a skew tensor of type (2, 0), F, with components F^{ab}, on M,
satisfying the equations
∇_a F^{ab} = 0,  ∇_a F_{bc} + ∇_b F_{ca} + ∇_c F_{ab} = 0.    (6.11.3)
Similarly, the natural wave operator on a function u is now
□u = g^{ab} ∇_a ∇_b u = g^{ab} (∂_a ∂_b u − Γ^c_{ab} ∂_c u).    (6.11.4)
In Minkowski space, recall that Maxwell's equations imply that each of the components E_i,
B_i of the electric and magnetic fields satisfies the flat-space wave equation, □E_i = 0 = □B_i.
In general M, we have:
Proposition 6.11.1. If F_{ab} = −F_{ba} satisfies Maxwell's equations (6.11.3) in (M, g), then
F_{ab} satisfies a modified wave equation:
∇^a ∇_a F_{bc} = R_{bcas} F^{sa} − r_b{}^a F_{ca} − r_c{}^a F_{ab}.    (6.11.5)
Proof. See the Problem set.
CHAPTER 7

We introduce spherical polar coordinates (t, r, θ, φ), where
x = r sin θ cos φ,  y = r sin θ sin φ,  z = r cos θ,    (7.1.1)
and θ is the colatitude (i.e. latitude, but measured from the north pole rather than the
equator) and φ is longitude.
The Minkowski metric in these coordinates is
ds² = dt² − dr² − r²(dθ² + sin²θ dφ²).    (7.1.2)
Let us write
dω² = dθ² + sin²θ dφ²,    (7.1.3)
which is the round metric on the unit sphere x² + y² + z² = 1 in R³. This will save writing later.
A spherically symmetric, static² metric is obtained from this by introducing functions of r
as coefficients:
ds² = A(r) dt² − B(r) dr² − C(r) r² dω²,    (7.1.4)
where we require A > 0, B > 0 and C > 0 in the region of interest. The dependence of these
functions only on r encodes the spherical symmetry of the metric and also its time-invariance.
One can, of course, consider more general metric forms, but that is beyond the scope of this
course.
¹closest point to the sun
²i.e. with coefficients independent of t
7.2. Schwarzschild³
Proposition 7.2.1. By a change of r variable, ρ = f(r), (7.1.4) can be made to take the
form
ds² = Ã(ρ) dt² − B̃(ρ) dρ² − ρ² dω².    (7.2.2)
Proof. If we define ρ = √(C(r)) r, then we shall get the coefficient of dω² correct. Since
C > 0 this is certainly invertible for large enough r, so we define
Ã(ρ) = A(r(ρ)).
For the dr² term,
B(r) dr² = B(r) (dr/dρ)² dρ²,
so
B̃(ρ) = B(r(ρ)) (dr/dρ)².
This completes the proof.
We use this proposition, then rechristen ρ as r. So we may as well look at metrics in the
slightly simpler form
A(r) dt² − B(r) dr² − r² dω².    (7.2.3)
We saw in the previous chapter that in the weak field limit, the component g_{00} should be
matched with twice the newtonian potential computed by an observer with 4-vector (1, 0), up
to an additive constant. So the simplest possible guess for A(r) is 1 − 2m/r: the value 1 comes
from the required asymptotic form of the metric.
It turns out that there is a choice of B which then gives a metric which satisfies Einstein's
equations where the metric is defined:
Theorem 7.2.2. The Schwarzschild metric
$$ds^2 = \left(1 - \frac{2m}{r}\right)dt^2 - \left(1 - \frac{2m}{r}\right)^{-1}dr^2 - r^2 d\omega^2, \quad (r > 2m)$$
satisfies $r_{ab} = 0$.
We shall not prove this in full. You can see most of the details in Woodhouse. We shall, however, record the geodesic equations and the Christoffel symbols for this metric.
Proposition 7.2.3. The geodesic equations for the metric (7.2.3) are
$$\ddot{t} + \frac{A'}{A}\dot{t}\dot{r} = 0 \quad\text{or}\quad \frac{d}{du}(A\dot{t}) = 0,$$
$$\ddot{r} + \frac{A'}{2B}\dot{t}^2 + \frac{B'}{2B}\dot{r}^2 - \frac{r}{B}\dot{\theta}^2 - \frac{r}{B}\sin^2\theta\,\dot{\varphi}^2 = 0,$$
$$\ddot{\theta} + \frac{2}{r}\dot{r}\dot{\theta} - \sin\theta\cos\theta\,\dot{\varphi}^2 = 0,$$
$$\ddot{\varphi} + \frac{2}{r}\dot{r}\dot{\varphi} + 2\cot\theta\,\dot{\theta}\dot{\varphi} = 0 \quad\text{or}\quad \frac{d}{du}(r^2\sin^2\theta\,\dot{\varphi}) = 0. \eqno(7.2.4)$$
³Named in honour of Karl Schwarzschild, 1873–1916
Proof. The Lagrangian is
$$L = \frac{1}{2}\left[A\dot{t}^2 - B\dot{r}^2 - r^2(\dot{\theta}^2 + \sin^2\theta\,\dot{\varphi}^2)\right]. \eqno(7.2.5)$$
Hence
$$\frac{\partial L}{\partial \dot{t}} = A\dot{t}, \qquad \frac{\partial L}{\partial t} = 0,$$
$$\frac{\partial L}{\partial \dot{r}} = -B\dot{r}, \qquad \frac{\partial L}{\partial r} = \frac{1}{2}\left[A'\dot{t}^2 - B'\dot{r}^2 - 2r(\dot{\theta}^2 + \sin^2\theta\,\dot{\varphi}^2)\right],$$
$$\frac{\partial L}{\partial \dot{\theta}} = -r^2\dot{\theta}, \qquad \frac{\partial L}{\partial \theta} = -r^2\sin\theta\cos\theta\,\dot{\varphi}^2,$$
$$\frac{\partial L}{\partial \dot{\varphi}} = -r^2\sin^2\theta\,\dot{\varphi}, \qquad \frac{\partial L}{\partial \varphi} = 0.$$
Here dot is differentiation with respect to the parameter $u$, prime is differentiation with respect to $r$. Next,
$$\frac{d}{du}(A\dot{t}) = A\left(\ddot{t} + (A'/A)\dot{r}\dot{t}\right),$$
$$\frac{d}{du}(-B\dot{r}) = -B\left(\ddot{r} + (B'/B)\dot{r}^2\right),$$
$$\frac{d}{du}(-r^2\dot{\theta}) = -r^2\left(\ddot{\theta} + (2/r)\dot{r}\dot{\theta}\right),$$
$$\frac{d}{du}(-r^2\sin^2\theta\,\dot{\varphi}) = -r^2\sin^2\theta\left(\ddot{\varphi} + (2/r)\dot{r}\dot{\varphi} + 2\cot\theta\,\dot{\theta}\dot{\varphi}\right),$$
and combining these with the previous calculations we get the equations of the Proposition. $\square$
Proposition 7.2.4. The non-zero Christoffel symbols for the metric (7.2.3) are as follows:
$$\Gamma^0_{01} = \Gamma^0_{10} = \frac{A'}{2A}; \qquad \Gamma^1_{00} = \frac{A'}{2B}, \quad \Gamma^1_{11} = \frac{B'}{2B}, \quad \Gamma^1_{22} = -\frac{r}{B}, \quad \Gamma^1_{33} = -\frac{r\sin^2\theta}{B};$$
$$\Gamma^2_{12} = \Gamma^2_{21} = \frac{1}{r}, \quad \Gamma^2_{33} = -\sin\theta\cos\theta; \qquad \Gamma^3_{13} = \Gamma^3_{31} = \frac{1}{r}, \quad \Gamma^3_{23} = \Gamma^3_{32} = \cot\theta.$$

Proof. These are read off from the geodesic equations in the usual way. $\square$
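The closed forms in Proposition 7.2.4 can be spot-checked numerically. The following Python sketch (not part of the notes; geometric units $G = c = 1$, with the illustrative Schwarzschild choice $A = 1 - 2m/r$, $B = 1/A$, $m = 1$) differentiates the diagonal metric by central differences and applies the usual formula $\Gamma^d_{ab} = \tfrac{1}{2}g^{dc}(\partial_a g_{cb} + \partial_b g_{ca} - \partial_c g_{ab})$:

```python
import math

m = 1.0  # illustrative mass, units G = c = 1

def metric(x):
    # Diagonal entries of g for ds^2 = A dt^2 - B dr^2 - r^2 dw^2,
    # with the sample (Schwarzschild) choice A = 1 - 2m/r, B = 1/A.
    t, r, th, ph = x
    A = 1 - 2*m/r
    return [A, -1/A, -r**2, -(r*math.sin(th))**2]

def dg(a, x, h=1e-6):
    # Central-difference derivative of the metric with respect to x[a].
    xp, xm = list(x), list(x)
    xp[a] += h
    xm[a] -= h
    return [(p - q)/(2*h) for p, q in zip(metric(xp), metric(xm))]

def christoffel(d, a, b, x):
    # Gamma^d_ab = (1/2) g^{dd} (d_a g_{db} + d_b g_{da} - d_d g_{ab});
    # only the c = d term of g^{dc} survives because g is diagonal.
    term = 0.0
    if d == b:
        term += dg(a, x)[d]
    if d == a:
        term += dg(b, x)[d]
    if a == b:
        term -= dg(d, x)[a]
    return 0.5*term/metric(x)[d]

x = (0.0, 3.0, 1.0, 0.5)     # sample point: r = 3, theta = 1
A, Ap = 1 - 2*m/3.0, 2*m/9.0
print(christoffel(0, 0, 1, x) - Ap/(2*A))                      # ~ 0
print(christoffel(2, 3, 3, x) + math.sin(1.0)*math.cos(1.0))   # ~ 0
```

Any other entry of the table can be compared in the same way, e.g. $\Gamma^1_{22} = -r/B$.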
Recall that the curvature tensor is given in terms of the Christoffel symbols by
$$R_{abc}{}^d = \partial_a\Gamma^d_{bc} - \partial_b\Gamma^d_{ac} + \Gamma^d_{ap}\Gamma^p_{bc} - \Gamma^d_{bp}\Gamma^p_{ac} \eqno(7.2.6)$$
and
$$r_{ac} = R_{abc}{}^b = R_{a0c}{}^0 + R_{a1c}{}^1 + R_{a2c}{}^2 + R_{a3c}{}^3. \eqno(7.2.7)$$
To give a flavour of these calculations, let us compute $R_{a0c}{}^0$. We shall show:
$$R_{101}{}^0 = \frac{A''}{2A} - \frac{(A')^2}{4A^2} - \frac{A'B'}{4AB}.$$
We shall also compute $R_{121}{}^2$ and $R_{131}{}^3$, by putting $a = c = 1$ in (7.2.6). We have
$$R_{a0c}{}^0 = \partial_a\Gamma^0_{0c} - \partial_0\Gamma^0_{ac} + \Gamma^0_{as}\Gamma^s_{0c} - \Gamma^0_{0s}\Gamma^s_{ac}. \eqno(7.2.8)$$
Since $\Gamma^0_{01} = \Gamma^0_{10} = A'/2A$ are the only non-zero $\Gamma^0_{ab}$, and nothing depends on $t$,
$$R_{a0c}{}^0 = \partial_a\Gamma^0_{0c} + \Gamma^0_{as}\Gamma^s_{0c} - \frac{A'}{2A}\Gamma^1_{ac}. \eqno(7.2.9)$$
Taking $a = 1$,
$$R_{10c}{}^0 = \partial_1\Gamma^0_{0c} + \Gamma^0_{1s}\Gamma^s_{0c} - \frac{A'}{2A}\Gamma^1_{1c},$$
and the only non-zero $\Gamma^1_{1c}$ is $\Gamma^1_{11} = B'/2B$.
So
$$R_{101}{}^0 = \partial_1\Gamma^0_{01} + \Gamma^0_{1s}\Gamma^s_{01} - \frac{A'}{2A}\Gamma^1_{11}
= \left(\frac{A'}{2A}\right)' + \left(\frac{A'}{2A}\right)^2 - \frac{A'}{2A}\cdot\frac{B'}{2B}
= \frac{A''}{2A} - \frac{(A')^2}{4A^2} - \frac{A'B'}{4AB},$$
and $R_{10c}{}^0 = 0$ for $c \neq 1$.
We have
$$R_{121}{}^2 = \partial_1\Gamma^2_{21} - \partial_2\Gamma^2_{11} + \Gamma^2_{1s}\Gamma^s_{21} - \Gamma^2_{2s}\Gamma^s_{11}
= -\frac{1}{r^2} + \frac{1}{r^2} - \Gamma^2_{21}\Gamma^1_{11}
= -\frac{B'}{2rB}.$$
Similarly
$$R_{131}{}^3 = \partial_1\Gamma^3_{31} - \partial_3\Gamma^3_{11} + \Gamma^3_{1s}\Gamma^s_{31} - \Gamma^3_{3s}\Gamma^s_{11}
= -\frac{1}{r^2} + \frac{1}{r^2} - \Gamma^3_{31}\Gamma^1_{11}
= -\frac{B'}{2rB},$$
because $\Gamma^3_{13} = 1/r$ and $\Gamma^1_{11} = B'/2B$.
Hence
$$r_{11} = R_{101}{}^0 + R_{111}{}^1 + R_{121}{}^2 + R_{131}{}^3 = R_{101}{}^0 + R_{121}{}^2 + R_{131}{}^3
= \frac{A''}{2A} - \frac{(A')^2}{4A^2} - \frac{A'B'}{4AB} - \frac{B'}{rB}.$$
Proposition 7.2.5. The non-vanishing components of the Ricci tensor of the spherically symmetric metric (7.2.3) are
$$r_{00} = \frac{A''}{2B} - \frac{A'B'}{4B^2} - \frac{(A')^2}{4AB} + \frac{A'}{rB},$$
$$r_{11} = \frac{A''}{2A} - \frac{(A')^2}{4A^2} - \frac{A'B'}{4AB} - \frac{B'}{rB},$$
$$r_{22} = \frac{rA'}{2AB} - \frac{rB'}{2B^2} + \frac{1}{B} - 1,$$
$$r_{33} = \sin^2\theta\; r_{22}.$$
Now we can verify that the Schwarzschild metric of Theorem 7.2.2 does indeed satisfy the Einstein vacuum equations $r_{ab} = 0$.
Eliminating the $A''$ terms between $r_{00}$ and $r_{11}$ gives $AB' + BA' = 0$, so $AB$ is constant. And this should be 1 by the boundary condition.
Inserting $B = 1/A$, $B' = -A'/A^2$, we find
$$r_{00} = \frac{1}{2}AA'' + \frac{1}{r}AA', \qquad r_{11} = \frac{1}{A^2}\,r_{00},$$
$$r_{22} = rA' + A - 1, \qquad r_{33} = \sin^2\theta\,(rA' + A - 1).$$
Solving $r_{22} = 0$ gives $A = 1 - 2m/r$, where $m$ is a constant, and one checks that this also solves the $r_{00} = 0$ equation. Hence we arrive at the Schwarzschild metric
$$ds^2 = \left(1 - \frac{2m}{r}\right)dt^2 - \left(1 - \frac{2m}{r}\right)^{-1}dr^2 - r^2(d\theta^2 + \sin^2\theta\,d\varphi^2),$$
where $m > 0$ and, for the moment anyway, $r > 2m$.
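The algebra above is easy to spot-check. A short Python sketch (not part of the notes; units $G = c = 1$, illustrative $m = 1$) verifies that $A = 1 - 2m/r$ makes both reduced Ricci components vanish:

```python
# Numerical spot-check that A(r) = 1 - 2m/r, B = 1/A solves the reduced
# vacuum equations: r22 = r*A' + A - 1 = 0 and r00 = A*A''/2 + A*A'/r = 0.
m = 1.0
for r in (2.5, 5.0, 50.0):
    A, Ap, App = 1 - 2*m/r, 2*m/r**2, -4*m/r**3
    assert abs(r*Ap + A - 1) < 1e-12        # r22 vanishes
    assert abs(A*App/2 + A*Ap/r) < 1e-12    # r00 vanishes
print("vacuum equations satisfied")
```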
Woodhouse, GR, Sect. 7.1–7.2
7.3. Physical consequences
7.3.1. Gravitational time dilatation: heavy clocks run slowly. Suppose that Alice and Bob have positions $r = r_A$ and $r = r_B$ in the Schwarzschild space-time (angular positions also fixed), with both $r_A$ and $r_B > 2m$. How do they compare the rates at which their ideal clocks run?
Imagine two ticks of Alice's ideal clock, separated by a small proper time interval $\Delta\tau_A$. To compare, we assume that Alice's clock emits photons at each of the two ticks. Bob receives these photons and in particular can record the elapsed time between receiving the first and second photon. This gives him a time interval $\Delta\tau_B$, and the ratio $\Delta\tau_A/\Delta\tau_B$ is the amount by which Alice's clock appears to run slowly as compared with Bob's.
Let's do it. Suppose first that $\Delta t_A$ is the difference in $t$-coordinate between the two ticks of Alice's clock and suppose that $\Delta t_B$ is the difference in $t$-coordinates of when the two photons are received by Bob. (NB the Schwarzschild time-coordinate is NOT proper time for either Alice or Bob!) Then it is pretty clear that $\Delta t_A = \Delta t_B$ because the metric coefficients are all independent of $t$. We shall make this computation explicitly below, just to be sure. Thus we need to see how $\Delta t_A$ is related to $\Delta\tau_A$, and similarly for $\Delta t_B$ and $\Delta\tau_B$.
Now Alice's world line has the simple form $\tau_A \mapsto (U\tau_A, r_A, \theta_A, \varphi_A)$, where $r_A$, $\theta_A$ and $\varphi_A$ are constants, and $\tau_A$ is a proper time parameter if the associated velocity 4-vector
$$U\frac{\partial}{\partial t} \eqno(7.3.1)$$
is of unit length, i.e. $g(U\partial_t, U\partial_t) = 1$. This entails
$$(1 - 2m/r_A)U^2 = 1, \quad\text{so}\quad U = \frac{dt}{d\tau_A} = (1 - 2m/r_A)^{-1/2}. \eqno(7.3.2)$$
Hence
$$\Delta\tau_B = \frac{d\tau_B}{dt}\,\Delta t_B = \frac{d\tau_B}{dt}\,\Delta t_A = \frac{d\tau_B}{dt}\left(\frac{d\tau_A}{dt}\right)^{-1}\Delta\tau_A, \eqno(7.3.3)$$
so
$$\frac{\Delta\tau_B}{\Delta\tau_A} = \sqrt{\frac{1 - 2m/r_B}{1 - 2m/r_A}}. \eqno(7.3.4)$$
Thus if Alice is nearer to $r = 2m$ then this factor is greater than one, and so Bob will record a longer elapsed time between two ticks of Alice's clock, this becoming (in principle) infinite as $r_A$ approaches $2m$.⁴
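Formula (7.3.4) is simple to evaluate. A quick Python sketch (not part of the notes; geometric units $G = c = 1$, with an illustrative default $m = 1$):

```python
import math

def rate_ratio(r_a, r_b, m=1.0):
    """Delta tau_B / Delta tau_A from (7.3.4); both radii must exceed 2m."""
    return math.sqrt((1 - 2*m/r_b)/(1 - 2*m/r_a))

print(rate_ratio(3.0, 100.0))    # > 1: Bob (far away) records the longer interval
print(rate_ratio(2.01, 100.0))   # much larger: blows up as Alice approaches r = 2m
```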
Remark 7.3.1. Note that observers with constant $(r, \theta, \varphi)$ coordinates in Schwarzschild are not freely falling.
It is interesting to compute the trajectory of a photon sent by Alice at $(t_A, r_A, \theta_0, \varphi_0)$ to Bob at $(t_B, r_B, \theta_0, \varphi_0)$. This is a radial⁵ null geodesic for the Schwarzschild metric.
[Figure: radial null geodesics in the $(r, t)$ plane, with the lines $r = 2m$, $r = r_A$ and $r = r_B$ marked.]
Along such a geodesic,
$$\left(1 - \frac{2m}{r}\right)\dot{t}^2 - \left(1 - \frac{2m}{r}\right)^{-1}\dot{r}^2 = 0, \eqno(7.3.5)$$
so
$$\frac{dr}{dt} = \pm\left(1 - \frac{2m}{r}\right) \eqno(7.3.6)$$
(plus sign if the photon is travelling outwards as $t$ increases). Hence
$$dt = \pm\frac{r\,dr}{r - 2m} \eqno(7.3.7)$$
⁴We shall later identify the surface $r = 2m$ with the event horizon of a black hole.
⁵i.e. $\theta$ and $\varphi$ are constant along the geodesic
and so
$$t_B - t_A = r_B - r_A + 2m\log\frac{r_B - 2m}{r_A - 2m}. \eqno(7.3.8)$$
More generally, along any radial null geodesic,
$$t \mp \left(r + 2m\log(r - 2m)\right) = \text{const}. \eqno(7.3.9)$$
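The travel time (7.3.8) can be evaluated directly; note how it diverges as the emission radius approaches the horizon. A Python sketch (not part of the notes; units $G = c = 1$, illustrative $m = 1$):

```python
import math

def coord_time(r_a, r_b, m=1.0):
    """t_B - t_A for an outgoing radial photon, equation (7.3.8)."""
    return r_b - r_a + 2*m*math.log((r_b - 2*m)/(r_a - 2*m))

# The coordinate travel time grows without bound as r_a -> 2m:
for r_a in (3.0, 2.1, 2.001):
    print(coord_time(r_a, 10.0))
```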
The Lagrangian for geodesics in Schwarzschild is
$$L = \frac{1}{2}\left[(1 - 2m/r)\dot{t}^2 - (1 - 2m/r)^{-1}\dot{r}^2 - r^2(\dot{\theta}^2 + \sin^2\theta\,\dot{\varphi}^2)\right].$$
We take the parameter to be proper time and $\dot{}\,$ to mean differentiation with respect to it.
We have conserved quantities
$$E = (1 - 2m/r)\dot{t}, \qquad J = r^2\sin^2\theta\,\dot{\varphi}, \eqno(7.3.10)$$
and
$$L = 1/2 \text{ for timelike}, \qquad L = 0 \text{ for null geodesics}. \eqno(7.3.11)$$
Remark 7.3.2. The conserved quantity $E$ is the total energy of our particle, assumed to have unit rest-mass. (Not the gravitating one, the one that's orbiting.) If the rest-mass of the orbiting particle is $\mu$, we claim that
$$E = \mu(1 - 2m/r)\dot{t} \eqno(7.3.12)$$
is the total energy. Remember that total energy is a relative concept in relativity, so this statement needs careful interpretation. Suppose Alice is an observer sitting at constant $(r, \theta, \varphi)$ in the Schwarzschild space-time. She measures the energy of the orbiting particle as it passes her, i.e. when its spatial coordinates are $(r, \theta, \varphi)$. Let Alice's 4-velocity vector be $U$, and that of the orbiting particle $V$. Then
$$U = (U^a) = (1 - 2m/r)^{-1/2}(1, 0, 0, 0) \quad\text{and}\quad V = (V^a) = (\dot{t}, \dot{r}, \dot{\theta}, \dot{\varphi}),$$
where the dot denotes differentiation with respect to the particle's proper time parameter. The instantaneous speed $v$ of the particle as measured by Alice as it passes satisfies
$$\gamma(v) = g(U, V)$$
just as in special relativity. In this case,
$$g(U, V) = (1 - 2m/r)^{1/2}\,\dot{t}.$$
Hence
$$E = \mu(1 - 2m/r)\dot{t} = \mu(1 - 2m/r)^{1/2}g(U, V) = \mu(1 - 2m/r)^{1/2}(1 - v^2)^{-1/2} \simeq \mu\left(1 + \frac{v^2}{2} - \frac{m}{r}\right) \eqno(7.3.13)$$
if $m/r$ is small and so is $v$. Recall that $G = 1$, $c = 1$ here; if we restore units, then this becomes
$$E \simeq \mu c^2 + \frac{1}{2}\mu v^2 - \frac{Gm\mu}{r}. \eqno(7.3.14)$$
The terms here are the rest-energy of the particle, its kinetic energy and its gravitational potential energy. So this approximation is in perfect agreement with newtonian gravity and special relativity.
Proposition 7.3.3. Equatorial⁶ timelike geodesics in Schwarzschild are given by the equations
$$\left(\frac{dr}{d\tau}\right)^2 = E^2 - (1 - 2m/r) \quad\text{(radial geodesics)} \eqno(7.3.15)$$
and by
$$\frac{d^2u}{d\varphi^2} + u - 3mu^2 = \frac{m}{J^2} \quad\text{(non-radial geodesics)}, \eqno(7.3.16)$$
where $u = 1/r$ and the angular momentum $J = r^2\dot{\varphi} \neq 0$ is a constant. The equation (7.3.16) has the first integral
$$\left(\frac{du}{d\varphi}\right)^2 + u^2 - 2mu^3 = \frac{E^2 - 1}{J^2} + \frac{2m}{J^2}u. \eqno(7.3.17)$$
⁶i.e. with $\theta = \pi/2$
Similarly,

Proposition 7.3.4. Radial null geodesics in Schwarzschild are given by (7.3.5)–(7.3.9). Non-radial, equatorial null geodesics in Schwarzschild satisfy
$$\frac{d^2u}{d\varphi^2} + u - 3mu^2 = 0, \eqno(7.3.18)$$
which has the first integral
$$\left(\frac{du}{d\varphi}\right)^2 + u^2 - 2mu^3 = \frac{E^2}{J^2}. \eqno(7.3.19)$$
Proof. We have seen that for Schwarzschild geodesics with $\theta = \pi/2$ we have the conserved quantities
$$E = (1 - 2m/r)\dot{t}, \qquad J = r^2\dot{\varphi}, \eqno(7.3.20)$$
and the further conservation equation
$$(1 - 2m/r)\dot{t}^2 - (1 - 2m/r)^{-1}\dot{r}^2 - r^2\dot{\varphi}^2 = 2L, \eqno(7.3.21)$$
where $L = 1/2$ for timelike and $L = 0$ for null geodesics. In the radial case, $\dot{\varphi} = 0$ and for timelike geodesics, (7.3.21) gives
$$\left(\frac{dr}{d\tau}\right)^2 = E^2 - 1 + \frac{2m}{r}. \eqno(7.3.22)$$
In the non-radial case, write $u = 1/r$; then (7.3.21) becomes
$$(1 - 2mu)^{-1}E^2 - (1 - 2mu)^{-1}\dot{r}^2 - r^2\dot{\varphi}^2 = 2L, \eqno(7.3.23)$$
and dividing by $J^2 = r^4\dot{\varphi}^2$,
$$(1 - 2mu)^{-1}\frac{E^2}{J^2} - (1 - 2mu)^{-1}u^4\left(\frac{dr}{d\varphi}\right)^2 - u^2 = \frac{2L}{J^2}. \eqno(7.3.24)$$
Now
$$\frac{dr}{d\varphi} = -\frac{1}{u^2}\frac{du}{d\varphi}, \eqno(7.3.25)$$
so (7.3.24) becomes
$$(1 - 2mu)^{-1}\frac{E^2}{J^2} - (1 - 2mu)^{-1}\left(\frac{du}{d\varphi}\right)^2 - u^2 = \frac{2L}{J^2}. \eqno(7.3.26)$$
Multiplying by $1 - 2mu$ and rearranging,
$$\left(\frac{du}{d\varphi}\right)^2 + u^2 - 2mu^3 = \frac{E^2 - 2L}{J^2} + \frac{4mL}{J^2}u. \eqno(7.3.27)$$
The results in the Propositions follow from this by setting $L = 1/2$ for timelike and $L = 0$ for null. The second-order equation follows by differentiation with respect to $\varphi$, and cancelling $u'$. $\square$
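The consistency of (7.3.16) with its first integral (7.3.17) can be seen numerically: integrate the second-order equation and watch the combination on the left of (7.3.17) stay constant. A Python sketch (RK4; not part of the notes; units $G = c = 1$, illustrative values $m = 0.03$, $J = 0.5$):

```python
def orbit(u0, m, J, dphi=1e-3, steps=20000):
    """RK4 integration of u'' = m/J**2 - u + 3*m*u**2, equation (7.3.16),
    from u(0) = u0, u'(0) = 0."""
    def f(u, du):
        return du, m/J**2 - u + 3*m*u**2
    u, du, out = u0, 0.0, []
    for _ in range(steps):
        k1u, k1d = f(u, du)
        k2u, k2d = f(u + 0.5*dphi*k1u, du + 0.5*dphi*k1d)
        k3u, k3d = f(u + 0.5*dphi*k2u, du + 0.5*dphi*k2d)
        k4u, k4d = f(u + dphi*k3u, du + dphi*k3d)
        u  += dphi*(k1u + 2*k2u + 2*k3u + k4u)/6
        du += dphi*(k1d + 2*k2d + 2*k3d + k4d)/6
        out.append((u, du))
    return out

m, J = 0.03, 0.5
traj = orbit(0.1, m, J)
# The left side of (7.3.17) minus the term linear in u is conserved:
const = lambda u, du: du**2 + u**2 - 2*m*u**3 - (2*m/J**2)*u
c0 = const(*traj[0])
drift = max(abs(const(u, du) - c0) for u, du in traj)
print(drift)   # tiny: the first integral is conserved along the orbit
```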
Remark 7.3.5. For newtonian gravity, the equation for orbits (see homework problem 1.10) is
$$\frac{d^2u}{d\varphi^2} + u = \frac{m}{J^2} \eqno(7.3.28)$$
with first integral
$$\left(\frac{du}{d\varphi}\right)^2 + u^2 = \frac{A}{J^2} + \frac{2m}{J^2}u. \eqno(7.3.29)$$
Thus the GR correction to this equation is the $-2mu^3$ term on the LHS of (7.3.27).
Remember that $u = 1/r$, so large $r$ corresponds to small $u$, and the effect of the cubic correction term is stronger for small radii. Remember also that the Schwarzschild metric appears only to be OK for $r > 2m$, which corresponds to $0 < u < 1/2m$.
7.3.3. Circular timelike orbits and the precession of perihelion. You can find an extensive analysis of the timelike geodesics in Schwarzschild in Woodhouse's book, Chapter 8. We shall just look at circular orbits and small perturbations of them. This already leads to the precession of perihelion, which I may have mentioned was one of the first verifications of GR.
Consider a circular timelike orbit in Schwarzschild. For such an orbit, evidently $u_{\varphi\varphi} = 0$, $u_\varphi = 0$. Setting $u_{\varphi\varphi} = 0$ in (7.3.16) gives the equation
$$3mu^2 - u + \frac{m}{J^2} = 0, \eqno(7.3.30)$$
so solving the quadratic,
$$u = \frac{1 \pm \sqrt{1 - 12m^2/J^2}}{6m}. \eqno(7.3.31)$$
Thus we have circular orbits if $J^2 > 12m^2$. For $m$ small, the larger value of $u$ is approximately $1/3m$ and is just less than this value. The smaller value of $u$ is
$$u = u_0 = \frac{1 - \sqrt{1 - 12m^2/J^2}}{6m} \simeq \frac{m}{J^2} \eqno(7.3.32)$$
if $m/J^2$ is small, and this is the newtonian value of the radius of a circular orbit for given $m$ and $J$ (cf. (7.3.28)).
[Figure 2. Plot of $r = 1/(0.1 + 0.03\cos(0.95\varphi))$, showing the precession of the perihelion. The successive perihelia are shown and occur at $\varphi = 0,\ 2\pi/0.95,\ 4\pi/0.95,\ 6\pi/0.95, \ldots$. The red arc is the part of the orbit from $\varphi = 0$ to $\varphi = 2\pi/0.95$; the blue arc is the part from $\varphi = 2\pi/0.95$ to $4\pi/0.95$; the grey arc is the part from $\varphi = 4\pi/0.95$ to $6\pi/0.95$.]
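For the curve plotted in Figure 2, the perihelia (minima of $r$) occur each time $0.95\varphi$ increases by $2\pi$, so the perihelion advances by $2\pi/0.95 - 2\pi \approx 0.33$ radians per revolution. A quick Python check (illustrative numbers taken from the caption, not physical values):

```python
import math

# The plotted curve from the caption of Figure 2:
omega = 0.95
def r(phi):
    return 1.0/(0.1 + 0.03*math.cos(omega*phi))

# Perihelia (minima of r, maxima of u) occur when omega*phi is a multiple of 2*pi:
perihelia = [2*math.pi*k/omega for k in range(4)]
advance = perihelia[1] - perihelia[0] - 2*math.pi
print(advance)  # about 0.33 radians of precession per revolution
```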
7.3.4. Photon trajectories: gravitational bending of light. From Proposition 7.3.4, non-radial photon trajectories satisfy:
$$\frac{d^2u}{d\varphi^2} + u - 3mu^2 = 0. \eqno(7.3.34)$$
We note first that circular orbits exist if $u - 3mu^2 = 0$, i.e. $u = 1/3m$. The existence of circular photon orbits shows clearly that light is affected by gravity in Einstein's theory.
These orbits are unstable. Indeed, try $u = \frac{1}{3m} + v$. Then
$$\frac{d^2v}{d\varphi^2} = v + O(v^2)$$
and $v \sim e^{\pm\varphi}$, so these perturbed solutions tend to grow exponentially.
Let us consider instead the trajectory of a photon which comes in from infinity (in a straight line with respect to the asymptotic coordinate system) and passes near to our gravitating object.
Recall that in polar coordinates, straight lines not through the origin are given by equations of the form
$$r\cos(\varphi - \varphi_0) = C. \eqno(7.3.35)$$
Indeed, remembering $x = r\cos\varphi$, $y = r\sin\varphi$, (7.3.35) is equivalent to
$$x\cos\varphi_0 + y\sin\varphi_0 = C. \eqno(7.3.36)$$
Thus this straight line is inclined at an angle $\varphi_0$ to the $y$-axis. In terms of the reciprocal coordinate $u = 1/r$, (7.3.35) takes the form
$$u = \epsilon\cos(\varphi - \varphi_0), \eqno(7.3.37)$$
where $\epsilon = 1/C$. Taking $\varphi_0 = 0$, we look for a solution of (7.3.34) of the form
$$u = \epsilon\cos\varphi + v(\varphi), \eqno(7.3.38)$$
where $v(\varphi)$ is small and
$$v(-\pi/2) = 0, \eqno(7.3.39)$$
so that as $\varphi \to -\pi/2$, $u(\varphi) \to 0$ and $r \to \infty$. (This corresponds to being asymptotically parallel to the $y$-axis, with $y \ll -1$.)
[Figure 3. Bending of light by a star: plots of $u = \epsilon\cos\varphi + \epsilon^2(1 + \sin\varphi)^2$ for $\epsilon = 0.08, 0.06, 0.05$.]
This is now an exercise in differential equations. It is better to work with the first integral (7.3.19):
$$\left(\frac{du}{d\varphi}\right)^2 + u^2 - 2mu^3 = \frac{E^2}{J^2}.$$
We have to follow our noses and compute:
$$u' = -\epsilon\sin\varphi + v', \eqno(7.3.40)$$
so that, discarding terms quadratic in $v$ and of order $mv$,
$$(u')^2 + u^2 = \epsilon^2 - 2\epsilon(v'\sin\varphi - v\cos\varphi) + \cdots \eqno(7.3.41)$$
and
$$2mu^3 = 2m\epsilon^3\cos^3\varphi + \cdots. \eqno(7.3.42)$$
Hence, comparing with (7.3.19),
$$\epsilon^2 = E^2/J^2 \eqno(7.3.43)$$
and
$$v'\sin\varphi - v\cos\varphi = -m\epsilon^2\cos^3\varphi. \eqno(7.3.44)$$
We use the integrating factor method to solve this. The integrating factor is $1/\sin^2\varphi$, so
$$\frac{d}{d\varphi}\left(\frac{v}{\sin\varphi}\right) = -m\epsilon^2\frac{\cos^3\varphi}{\sin^2\varphi} = -m\epsilon^2\left(\frac{\cos\varphi}{\sin^2\varphi} - \cos\varphi\right). \eqno(7.3.45)$$
Hence
$$v(\varphi) = C\sin\varphi + m\epsilon^2(1 + \sin^2\varphi).$$
Applying the boundary condition $v(-\pi/2) = 0$, we get $C = 2m\epsilon^2$ and
$$v(\varphi) = m\epsilon^2(1 + \sin\varphi)^2.$$
We conclude that
$$u(\varphi) = \epsilon\cos\varphi + m\epsilon^2(1 + \sin\varphi)^2 \eqno(7.3.46)$$
is an approximate null geodesic in Schwarzschild for $\epsilon$ small.
We calculate the angle of deflection by looking for the values of $\varphi$ for which $u = 0$. We already know one of them: $\varphi = -\pi/2$. We expect the other one to be approximately $\varphi = \pi/2$ (since this would correspond to zero deflection). So try to solve $u(\pi/2 + \delta) = 0$, assuming $\delta$ is small. Substituting this into (7.3.46),
$$u(\pi/2 + \delta) = 0 \iff -\epsilon\sin\delta + m\epsilon^2(1 + \cos\delta)^2 = 0, \eqno(7.3.47)$$
and putting $\sin\delta \simeq \delta$, $\cos\delta \simeq 1$,
$$\delta \simeq 4m\epsilon. \eqno(7.3.48)$$
So the asymptotic direction of the light-ray is approximately $\pi/2 + 4m\epsilon$, showing that the photon has been deflected by an angle $4m\epsilon$ due to the gravitational pull of the star.
Again, putting in the units, the deflection is approximately $4mG/Dc^2$, where $D$ is the impact parameter (the smallest value of $r$ on the trajectory).
This was observed by Eddington during the 1919 total eclipse of the sun.
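The approximation $\delta \simeq 4m\epsilon$ can be tested against a direct numerical integration of (7.3.34). A Python sketch (RK4; not part of the notes; units $G = c = 1$, illustrative values $m = 1$, $\epsilon = 0.01$):

```python
import math

def deflection(m, eps, dphi=1e-4):
    """Integrate u'' = -u + 3*m*u**2, equation (7.3.34), by RK4 starting at
    phi = -pi/2 with u = 0, u' = eps (matching the straight line
    u = eps*cos(phi)), until u returns to 0.  The angle swept exceeds pi
    by the deflection angle."""
    def f(u, du):
        return du, -u + 3*m*u**2
    phi, u, du = -math.pi/2, 0.0, eps
    while u >= 0.0:
        k1u, k1d = f(u, du)
        k2u, k2d = f(u + 0.5*dphi*k1u, du + 0.5*dphi*k1d)
        k3u, k3d = f(u + 0.5*dphi*k2u, du + 0.5*dphi*k2d)
        k4u, k4d = f(u + dphi*k3u, du + dphi*k3d)
        u  += dphi*(k1u + 2*k2u + 2*k3u + k4u)/6
        du += dphi*(k1d + 2*k2d + 2*k3d + k4d)/6
        phi += dphi
    return phi - math.pi/2   # total swept angle minus pi

print(deflection(1.0, 0.01), 4*1.0*0.01)  # the two agree closely for small eps
```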
7.4. Extensions of Schwarzschild: introduction to black holes
We are going to consider the significance of the set $r = 2m$ in Schwarzschild. This is a 3-dimensional surface inside the 4-dimensional space-time. If we fix $t$ we have a two-dimensional sphere of radius $2m$, so the surface as a whole looks like a cylinder of some kind.
We do have to worry about $r = 2m$, because a particle in free fall along a radial timelike geodesic will reach this set in finite proper time. Indeed, recall that for such a geodesic,
$$\frac{dr}{d\tau} = -\sqrt{E^2 - 1 + \frac{2m}{r}} \eqno(7.4.1)$$
for some constant $E > 1$. If $\tau = 0$ when $r = R$, this gives
$$\tau(r) = \int_r^R \frac{ds}{\sqrt{E^2 - 1 + 2m/s}}. \eqno(7.4.2)$$
So the particle reaches $r = 2m$ at proper time $\tau(2m) < \infty$. This means that the problem of understanding the metric in the vicinity of $r = 2m$ is of real physical relevance.
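The integrand in (7.4.2) is bounded near $s = 2m$, so the integral is indeed finite. A quick quadrature in Python (not part of the notes; units $G = c = 1$, illustrative $m = 1$):

```python
import math

def infall_time(R, E, m=1.0, n=20000):
    """Proper time to fall from r = R to r = 2m, the integral (7.4.2),
    computed by the midpoint rule."""
    a, b = 2*m, R
    h = (b - a)/n
    return sum(h/math.sqrt(E**2 - 1 + 2*m/(a + (k + 0.5)*h)) for k in range(n))

print(infall_time(10.0, 1.2))  # finite: the horizon is reached in finite proper time
```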
7.4.1. Toy examples.

Example 7.4.1. The metric
$$ds_1^2 = \frac{dt^2}{2t} + 2t\,d\theta^2, \quad (0 < t < \infty) \eqno(7.4.3)$$
is singular at $t = 0$. However, if we define
$$dr = \frac{dt}{\sqrt{2t}}, \quad\text{so}\quad r = \sqrt{2t},$$
the metric becomes
$$ds_1^2 = dr^2 + r^2d\theta^2 = dx^2 + dy^2$$
($x = r\cos\theta$, $y = r\sin\theta$). The origin $(x, y) = (0, 0)$ corresponds to the singularity $t = 0$ of (7.4.3): despite appearances, the metric is perfectly good there.
Example 7.4.2. The metric (7.4.4), defined for $-\infty < u, v < \infty$, likewise becomes well defined after a change of coordinates, and the metric extends (as the flat metric) to the whole $xy$-plane.
[Figure: the coordinate change of Example 7.4.2, from the $(u, v)$ plane to the $(x, y)$ plane.]
Definition 7.4.3. In cases where a metric is ill-defined at a point, but after a change of
coordinates it becomes well defined, we say that we have a coordinate singularity.
A coordinate singularity is not a true geometric singularity: it just corresponds to looking
at the metric in a poor choice of coordinates.
We shall now see that the surface r = 2m is a coordinate singularity rather than a metric
singularity of Schwarzschild by exhibiting new coordinates in which the singularity disappears!
7.4.2. Eddington–Finkelstein coordinates. Define
$$r_* = r + 2m\log|r - 2m|, \eqno(7.4.5)$$
so
$$\frac{dr_*}{dr} = 1 + \frac{2m}{r - 2m} = \frac{r}{r - 2m} = \left(1 - \frac{2m}{r}\right)^{-1}. \eqno(7.4.6)$$
The graph of $r_*$ as a function of $r$ is shown in the diagram:
[Figure: graph of $r_*$ against $r$, for $r > 2m$, with vertical asymptote at $r = 2m$.]
Now define $v = t + r_*$.
Proposition 7.4.4. Changing variables from $(t, r)$ to $(v, r)$, Schwarzschild becomes
$$ds^2 = (1 - 2m/r)dv^2 - (dr\,dv + dv\,dr) - r^2d\omega^2. \eqno(7.4.7)$$

Proof. Since
$$dv = dt + dr_* = dt + (1 - 2m/r)^{-1}dr, \eqno(7.4.8)$$
we have
$$(1 - 2m/r)dt^2 = (1 - 2m/r)\left(dv - (1 - 2m/r)^{-1}dr\right)^2 \eqno(7.4.9)$$
$$= (1 - 2m/r)dv^2 - dr\,dv - dv\,dr + (1 - 2m/r)^{-1}dr^2, \eqno(7.4.10)$$
so
$$(1 - 2m/r)dt^2 - (1 - 2m/r)^{-1}dr^2 = (1 - 2m/r)dv^2 - dr\,dv - dv\,dr. \eqno(7.4.11)$$
$\square$

The metric
$$(1 - 2m/r)dv^2 - dr\,dv - dv\,dr - r^2d\omega^2$$
makes sense for every $r > 0$: in the $(v, r)$ block its coefficient matrix is
$$\begin{pmatrix} 1 - 2m/r & -1 \\ -1 & 0 \end{pmatrix},$$
which has determinant $-1$, so it is non-degenerate wherever $r > 0$.
Let us discuss what's happened here. To avoid later confusion, we shall rechristen the $r$ coordinate here $\rho$. Thus (suppressing $\theta$ and $\varphi$) we have two coordinate systems: the original $(t, r)$ and the new $(v, \rho)$ with
$$v = t + r + 2m\log|r - 2m|, \quad \rho = r. \eqno(7.4.12)$$
Then
$$\frac{\partial}{\partial t} = \frac{\partial v}{\partial t}\frac{\partial}{\partial v} + \frac{\partial \rho}{\partial t}\frac{\partial}{\partial \rho} = \frac{\partial}{\partial v} \eqno(7.4.13)$$
and
$$\frac{\partial}{\partial r} = \frac{\partial v}{\partial r}\frac{\partial}{\partial v} + \frac{\partial \rho}{\partial r}\frac{\partial}{\partial \rho} = (1 - 2m/r)^{-1}\frac{\partial}{\partial v} + \frac{\partial}{\partial \rho}. \eqno(7.4.14)$$
The following picture may help to visualize what's going on:
[Figure: the $(v, \rho)$ plane, with the lines $\rho = 0$ and $\rho = 2m$ marked.]
In the original coordinates, the radial null directions are spanned by
$$\frac{\partial}{\partial t} + (1 - 2m/r)\frac{\partial}{\partial r} \eqno(7.4.15)$$
and
$$\frac{\partial}{\partial t} - (1 - 2m/r)\frac{\partial}{\partial r}. \eqno(7.4.16)$$
Hence a pair of radial future-pointing null vectors in the new coordinates is:
$$2\partial_v + (1 - 2m/r)\partial_\rho, \quad -\partial_\rho. \eqno(7.4.17)$$
Remark 7.4.5. The radius r = 2m is called the Schwarzschild radius. For our sun this is
about 3 kilometres, well inside the sun itself. In particular the Schwarzschild metric is not valid
there, because matter is present in this region. The above discussion is only of significance if
matter is so highly concentrated that the region r = 2m is contained in a region of empty space.
The region r < 2m, to which we have now extended the Schwarzschild metric, is then called the
black hole region of the space-time.
7.4.3. What happens near the event horizon? Suppose Alice and Bob are near a black hole described by the Schwarzschild metric. Alice is sitting at $r = R$ and the unfortunate Bob⁸ falls through $r = 2m$. How can we analyze this?
If Bob is freely falling, radially, so $\dot{\theta} = \dot{\varphi} = 0$, the Lagrangian in Eddington–Finkelstein coordinates is
$$L = \frac{1}{2}\left((1 - 2m/r)\dot{v}^2 - 2\dot{v}\dot{r}\right). \eqno(7.4.18)$$
Here $L$ is independent of $v$ and so
$$\frac{\partial L}{\partial \dot{v}} = (1 - 2m/r)\dot{v} - \dot{r} \eqno(7.4.19)$$
is a constant, $F$, say. Also $L = 1/2$ along a timelike geodesic parameterized by proper time.
Suppose that $\tau = 0$ when $r = 2m$. For small $\tau$, these equations reduce to
$$\dot{r} = -F, \qquad \dot{v}\dot{r} = -1/2, \eqno(7.4.20)$$
so
$$\dot{v} = \frac{1}{2F}, \qquad r \simeq 2m - F\tau, \quad\text{for small } \tau. \eqno(7.4.21)$$
Thus Bob doesn't notice anything particularly strange in crossing the event horizon: though in reality, for a typical sized black hole, the tidal forces (the difference between the force felt on your head and your feet) get a bit strong well before you encounter the event horizon.
What does Alice see of Bob's descent? To answer this question, suppose she is sitting at $r = R > 2m$, and receives a photon from Bob at every tick of his clock. Then Alice's world-line is $\tau_A \mapsto (V\tau_A, R)$, her velocity 4-vector is $(V, 0)$ and so $\tau_A$ is proper time if this has length$^2$ equal to 1 with respect to our metric. This gives $V = (1 - 2m/R)^{-1/2}$ (cf. 7.3.1). So
$$v(\tau_A) = (1 - 2m/R)^{-1/2}\tau_A. \eqno(7.4.22)$$
A photon emitted by Bob's clock at proper time $\tau_B < 0$ and heading out to Alice will satisfy
$$(1 - 2m/r)\dot{v}^2 - 2\dot{r}\dot{v} = 0 \eqno(7.4.23)$$
with initial conditions
$$v(0) = \frac{\tau_B}{2F}, \qquad r(0) = 2m - F\tau_B. \eqno(7.4.24)$$
Dividing by $\dot{r}\dot{v}$, (7.4.23) gives
$$\frac{dv}{dr} = \frac{2r}{r - 2m} \eqno(7.4.25)$$
and hence, integrating from $r(0) = 2m + F|\tau_B|$ to $R$,
$$v(R) = 2\left[R - 2m + 2m\log\frac{R - 2m}{F|\tau_B|} + \left(F + \frac{1}{4F}\right)\tau_B\right], \eqno(7.4.26)$$
and by (7.4.22) the corresponding value of Alice's proper time is
$$\tau_A = 2\left(1 - \frac{2m}{R}\right)^{1/2}\left[R - 2m + 2m\log\frac{R - 2m}{F|\tau_B|} + \left(F + \frac{1}{4F}\right)\tau_B\right]. \eqno(7.4.27)$$
As $\tau_B \to 0^-$ the logarithm dominates, so
$$\tau_A \simeq -4m\left(1 - \frac{2m}{R}\right)^{1/2}\log|\tau_B| + \text{const}, \eqno(7.4.28)$$
i.e.
$$|\tau_B| \simeq \text{const}\times\exp\left(-\frac{\tau_A}{4m(1 - 2m/R)^{1/2}}\right). \eqno(7.4.29)$$
Thus Alice never sees Bob cross the event horizon: the ticks of his clock reach her at intervals which grow without bound.
⁸Perhaps Bob's a robot
Now define
$$v = t + r + 2m\log(r - 2m), \qquad w = t - r - 2m\log(r - 2m). \eqno(7.4.30)$$
(This is more symmetrical than changing to $(v, r)$ coordinates as we did in the previous section.)
A simple calculation shows
$$\frac{1}{2}(dv\,dw + dw\,dv) = dt^2 - (dr_*)^2 = dt^2 - (1 - 2m/r)^{-2}dr^2, \eqno(7.4.31)$$
so
$$ds^2 = (1 - 2m/r)\cdot\frac{1}{2}(dv\,dw + dw\,dv) - r^2d\omega^2, \quad -\infty < v, w < \infty, \eqno(7.4.32)$$
where
$$\frac{1}{2}(v - w) = r + 2m\log(r - 2m), \quad r > 2m. \eqno(7.4.33)$$
Note that as $r$ goes from $2m$ to $\infty$, so the RHS of (7.4.33) goes from $-\infty$ to $\infty$, and so given any value of $(v - w)/2$, there's a unique value $r > 2m$ which solves (7.4.33). This metric degenerates at $r = 2m$, which corresponds to $v - w \to -\infty$. There is a remarkable trick (analogous to what happened in Example 7.4.2 above) which allows us to extend this metric through $r = 2m$.
We set
$$v = 4m\log v', \qquad w = -4m\log(-w'),$$
or
$$v' = \exp(v/4m), \qquad w' = -\exp(-w/4m).$$
This maps the whole $(v, w)$ plane to the quadrant $\{v' > 0,\ w' < 0\}$ in the $(v', w')$ plane (compare Example 7.4.2). Note that for $r$ just a bit bigger than $2m$, $v$ has some finite value, $w \to +\infty$, so $v'$ has some positive value and $w'$ is just less than zero.
Then
$$dv\,dw = -16m^2\frac{dv'\,dw'}{v'w'} \eqno(7.4.34)$$
and
$$-v'w' = \exp\left(\frac{v - w}{4m}\right) \eqno(7.4.35)$$
$$= \exp\left(\frac{r}{2m} + \log(r - 2m)\right) = e^{r/2m}(r - 2m). \eqno(7.4.36)$$
Hence
$$ds^2 = \frac{16m^2}{r}e^{-r/2m}\,dv'\,dw' - r^2d\omega^2, \quad\text{where}\quad e^{r/2m}(r - 2m) = -v'w'. \eqno(7.4.37)$$
Remark 7.4.7. This extension was obtained in 1960 by Kruskal. The crucial point is that $g$ is a well-defined lorentzian metric wherever $r > 0$. From (7.4.37),
$$r > 0 \iff e^{r/2m}(r - 2m) > -2m \iff -v'w' > -2m \iff v'w' < 2m.$$
Thus the metric is defined in the set $v'w' < 2m$, but the coordinates $(v', w')$ can have either sign.
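On the extended manifold, $r$ is defined implicitly by (7.4.37): $e^{r/2m}(r - 2m) = -v'w'$. The left-hand side is strictly increasing in $r$, so the relation can be inverted numerically, e.g. by bisection. A Python sketch (not part of the notes; units $G = c = 1$, illustrative $m = 1$):

```python
import math

def radius(vp, wp, m=1.0):
    """Solve exp(r/2m)*(r - 2m) = -vp*wp for r > 0 by bisection.
    The left side is increasing in r; the Kruskal domain is vp*wp < 2m."""
    f = lambda r: math.exp(r/(2*m))*(r - 2*m) + vp*wp
    lo, hi = 1e-12, 2*m
    while f(hi) < 0:          # expand the bracket outward if necessary
        hi *= 2
    for _ in range(200):
        mid = 0.5*(lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5*(lo + hi)

print(radius(1.0, -1.0))   # > 2m: a point with v'w' < 0 (outside the horizon)
print(radius(1.0, 0.5))    # < 2m: a point with 0 < v'w' < 2m (inside)
```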
Remark 7.4.8. If you prefer something looking more obviously lorentzian, define
$$t' = v' + w', \quad x' = v' - w', \quad\text{so}\quad (dt')^2 - (dx')^2 = 4\,dv'\,dw',$$
and
$$g = \frac{4m^2}{r}e^{-r/2m}\left[(dt')^2 - (dx')^2\right] - r^2d\omega^2.$$
[Figure: the Kruskal diagram in the $(x', t')$ plane. The singularity $r = 0$ appears as two hyperbolae (top and bottom); the lines $r = 2m$ lie at $45°$; the regions I, I′, II, II′ are the four quadrants they bound; a typical hyperbola $r = \mathrm{const} > 2m$ lies in region I.]
Note:
Radial null geodesics are given by
$$v' = \sigma, \quad w', \theta, \varphi \text{ constant},$$
and
$$w' = \sigma, \quad v', \theta, \varphi \text{ constant}.$$
We emphasise that the $(v', w')$ axes are at $45°$ in this diagram. Also, the hyperbola at the top of the picture is the true singularity of the metric and represents the black hole itself. The ultimate fate of every particle or photon in region II (inside the event horizon) is to end up terminated by this singularity.
The singularity is the set $v'w' = 2m$ and is shielded from view by the event horizon. GR and all known laws of physics break down at the singularity.