Relativity

Contents
I
Relativity
2
1 Why do we need relativity?
2
2 Special Relativity (SR)
2.1 Einstein’s 2 postulates . . . . . . . . . . . .
2.2 Lorentz transformations . . . . . . . . . . .
2.2.1 Addition of velocities . . . . . . . . .
2.2.2 The role of observers in SR . . . . .
2.2.3 Simultaneity and causality . . . . . .
2.2.4 Time dilation and length contraction
2.2.5 Relativistic Doppler . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
3 Machinery of Relativity
3.1 Vectors, metric, etc. . . . . . . . . . . . .
3.2 Tensors . . . . . . . . . . . . . . . . . . .
3.3 Basis and co-basis vector sets . . . . . . .
3.4 Two types of vectors and transformations
3.5 Invariant quantities . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
6
. 6
. 7
. 8
. 9
. 10
.
.
.
.
.
2
2
3
4
5
5
5
6
4 Some Special Relativistic physics
11
4.1 Maxwell’s Equations are Lorentz invariant . . . . . . . . . . . . . . . . . . . . . . . . 11
4.2 Particle mechanics: Displacement, velocity, momentum and acceleration 4-vectors . . 12
4.3 Fluid mechanics: Energy-momentum (or stress-energy) Tensor . . . . . . . . . . . . 12
5 General Relativity (GR)
5.1 Motivation for GR . . . . . . . . . . . . . . . . . . . . . . .
5.2 The Equivalence Principle (EP) . . . . . . . . . . . . . . . .
5.3 The Weak Field limit of GR . . . . . . . . . . . . . . . . . .
5.4 Motion of a single particle . . . . . . . . . . . . . . . . . . .
5.5 Differentiation of vectors in non-Cartesian coordinates . . .
5.6 Motion of a collection of particles: characterizing curvature
5.7 Field equations . . . . . . . . . . . . . . . . . . . . . . . . .
5.8 Non-linearity of GR . . . . . . . . . . . . . . . . . . . . . .
1
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
13
13
13
15
15
16
17
18
19
Part I
Relativity
1
Why do we need relativity?
Some early experiments and observations produces strange results, apparently not consistent with
‘common sense’.
• c is finite. Romer, 1675: Observed Io’s orbital period around Jupiter as Jupiter approached
and receded from Earth; concluded that velocity of light is finite.
• c is independent of the velocity of the emitter. In an eclipsing binary with equal mass stars
maximum redshift of star 1 and maximum blueshift of star 2 are observed to take place at
the same time. This challenges the Newtonian prescription for addition of velocities (Galilean
transformations).
• c is independent of the frame from which it is measured. Michelson-Morley, 1887: looked for
ether, but measured same value for c regardless of the lab frame’s orientation and motion
with respect to ether.
• Maxwell’s equations are not invariant under Galilean transformations, but are invariant under Lorentz transformtions. If Galilean transformations (GT) are the physically correct ones,
then Maxwell’s equations are valid only in one special frame, defined by ‘ether’, where c would
take on its speed of light value. In all other frames, GT would make Maxwell’s equations take
on a different form in different frames, and c would have different values, determined by the
Galilean law of addition of velocities, c0 = c + v. It was generally known that Maxwell’s equations were invariant under Lorentz transformations, but that was considered a mathematical
curiosity, rather than physical reality.
Einstein chose to start with the premise that Lorentz transformations did represent physical
reality, and that Maxwell’s equations retain the same form irrespective of the reference frame.
Conclusion: there is no ether; therefore Galilean transformation, and the corresponding ‘common
sense’ law of addition of velocities, v = v1 + v2 , does not hold in general. One needed to formulate
a new theory, where c is the same in all inertial frames, as it already is in Maxwell’s equations.
Then that theory needed to be extended to extend gravity.
Our main goal in these notes is to understand how Einstein field equations that describe gravity
arise. These lecture notes are not a comprehensive treatment of relativity, but hopefully they are
self-contained. (The lectures will probably contain more detailed material than these notes.)
2
2.1
Special Relativity (SR)
Einstein’s 2 postulates
(1) Principle of Relativity. Physics experiments must be reproducible, therefore physical laws
must have the same form in all inertial frames, i.e. physical laws must be form invariant, or
covariant. Inertial frames are defined by Newton’s 1st law. Any reference frame moving at constant
velocity w.r.t. an inertial frame is itself an inertial frame. In inertial frames free particles travel
2
along straight lines with constant velocity. PR restated: all inertial frames are completely equivalent
for the performance of physical experiments.
(2) Constancy of c. Speed of light is constant in vacuum, and is independent of the velocity
of the emitter and the absorber. It is like a property of space rather than property of the emitter/absorber.
We will use (2) to construct the basic machinery of SR, in particular, derive transformations,
LT, that will allow us to go between frames which are moving with v ≤ c with respect to each other.
Once that’s done, (1) and Lorentz transformations can be used to reformulate/redefine concepts
like velocity, momentum, acceleration, force, etc. such that they are Lorentz-invariant, and reduce
to their pre-SR forms in the low velocity limit.
2.2
Lorentz transformations
We start with the 2nd postulate (speed of light is the same in all frames) and consider propagation
of light. Consider 2 events: event 1 is the emission of light signal, and event 2 is the absorption
of that light signal. The spatial and temporal locations of these 2 events are diligently recorded
0
by observers
p in two inertial frames: S is moving at v as seen from S. In frame S light travels a
distance dx2 + dy 2 + dz 2 in an interval of time equal to dt: dx2 + dy 2 + dz 2 = c2 dt2 . In another
coordinate system, S 0 that same interval between emission and absorption can be described as
dx02 + dy 02 + dz 02 = c2 dt02 . (One can then ask what linear transformation will satisfy these, and
thereby deduce Lorentz transformations, but we will take a simpler route.) Placing all the terms
in each of the above equations on the same side we write,
2
−c2 dt2 + dx2 + dy 2 + dz 2 = 0 = −c2 dt02 + dx0 + dy 02 + dz 02
(1)
where −c2 dt2 + dx2 + dy 2 + dz 2 , or its equivalent in any other frame, is an interval between two
null-separated events in space-time. Note that this keeps c constant in all frames. This interval in
space-time has the same length, equal to zero for light, in both coordinate systems, and thus in all
inertial coordinate frames. For particles and objects other than photons this interval is not null,
but it is still invariant in all frames. It is called proper time, dτ , and is the time measured in the
rest frame of the particle.
Since dτ is the same in all frames, we should look for a transformation between coordinate
systems that preserves the magnitude of the interval, dτ 2 = −c2 dt2 + dx2 + dy 2 + dz 2 , i.e. keeps it
invariant, and also preserves the form of the interval. This interval looks just like
p the distance between two points in the usual Cartesian system of Eucledian space, where dr = dx2 + dy 2 + dz 2 ,
except that here we have an additional dimension of time, and the dt2 segment has a sign opposite
to that of the spatial-dimension segments. Note that because the squares of dt and dx, dy, dz intervals do not all have the same sign we are dealing with non-Euclidean geometry. The space-time
defined by these is called Minkowski space-time.
The interval dτ 2 = −c2 dt2 + dx2 + dy 2 + dz 2 between any two given events has a fixed length in
all frames. A transformation that has the property of preserving length of segments is a rotation
or translation of coordinate frames. Familiar rotation involves the spatial axes only, for example,
for rotation in the x-y plane by angle θ in the anti-clockwise direction we have:

 


ict0
1
0
0
0
ict
 x0   0 cos θ − sin θ 0   x 

 


(2)
 y 0  =  0 sin θ cos θ 0   y 
z0
0
0
0
1
z
3
Here t and z coordinates remain the same in both frames, while
x0 = x cos θ − y sin θ
and y 0 = x sin θ + y cos θ
(3)
If a rotation involves the time axis, say, is in the ict-x plane, the angle of rotation has to be not θ,
but iθ. Using hyperbolic trigonometry we have
cos iθ = cosh θ
and
sin iθ = i sinh θ
With these, the rotation matrix in the ict-x plane becomes,

 
ict0
cosh θ −i sinh θ 0
 x0   i sinh θ
cosh θ
0

 
 y0  = 
0
0
1
0
z
0
0
0

0
ict


0  x
0  y
1
z
(4)




(5)
And the new, i.e. primed time and x coordinates become
ict0 = ict cosh θ − ix sinh θ
and x0 = −ct sinh θ + x cosh θ
(6)
We will assume from now on that frame S 0 is moving with respect to S in the +ve x direction, with
v; and y and z velocities of S 0 w.r.t. S are 0. How does the rotation angle θ relate to v? A person
sitting at the origin of S 0 will record x0 = 0, and from the second of eqns. 6 we get x/t = v = tanh θ.
Using the identity cosh2 θ − sinh2 θ = 1, and those derived from it we get:
cosh θ = (1 − v 2 )−1/2 = γ
and
sinh θ = γv,
(7)
where we have introduced the Lorentz-γ factor, which is never less than 1 and approaches infinity
for v close to c. Lorentz transformation now becomes (from eqs. 6)
t0 = γ(t − xv/c2 )
and x0 = γ(x − tv)
(8)
and you can verify directly that the length of space-time interval is preserved. Equivalently, the
Lorentz transformations can be written as follows, avoiding imaginary numbers:


 0  
ct
γ
−γv 0 0
ct


 x0   −γv
γ
0 0 
 x 
 

(9)


 y0  =  0
y 
0
1 0
z
0
0
0 1
z0
Rotations involving time axis are called boosts. The entire set of Lorentz transformations
includes boosts, rotations, and linear translations. However, it is only the rotations involving the
time axis (.e. boosts) that can cause confusion because they are so different from our everyday
experience.
2.2.1
Addition of velocities
Frame S 00 moves with respect to frame S 0 at velocity v1 , while frame S 0 moves with respect to frame
S at v2 . What is the speed of S 00 as seen from S? The rotation angles add up linearly (and their
subscripts indicate which velocities they correspond to), so that
v = tanh θ = tanh(θ1 + θ2 ) =
tanh θ1 + tanh θ2
v1 + v2
=
(1 + tanh θ1 tanh θ2 )
1 + v1 v2
(10)
this is always less than 1, i.e. less than c, in accordance with the 1st postulate. (Recall the shape
of tanh θ vs. θ.)
4
2.2.2
The role of observers in SR
A major source of confusion in studying relativity arises from the choice of observers who record
time and location of events. Because light has a finite speed of propagation (signals travel at finite
velocities), it is very important to keep track of who is observing what, when, and according to
whose clock. One can divide most of SR experiments (thought experiments or real experiments)
into two classes:
(i) Those where each S and S 0 are populated with numerous observers stationed at every
coordinate location, each carying their own clock that tells the local time. Clocks of all these
observers in a given frame are synchronized. Coordinates t, x, etc. are those measured by these
local observers. Lorentz transformations allow you to go between frames. The standard time
dilation and length contraction experiments (§ 2.2.4 ) are in this category.
(ii) Those where only a single observer is involved. Here your calculations have to take into
account the finite speed of propagation of signals; Lorentz tranformations do not give the full
answer. Examples are relativistic Doppler effect and rotation and shape change of moving boxes
(§ 2.2.5).
2.2.3
Simultaneity and causality
Frame S 0 is moving with respect to frame S with velocity v along the positive x-axis. A person
standing still in S 0 throws a ball into a basket (also stationary in S 0 ) with velocity u0 , along the
positive x0 -axis. Event #1 corresponds to the ball being released from the person’s hands; event
#2 is the ball falling into the basket. By design, let the two frames’ origins coincide at the moment
and location of event #1 (hence all the entries in the left hand column of the table below are zeros).
Person stationary at the origin in S seen the ball’s velocity as u = (v + u0 )/(1 + vu0 ), same as eq. 10.
Transforming between the frames gives us (each entry is a pair of t and x coordinates):
S
S0
event #1
event #2
(0, 0)
(t, x = ut)
(0, 0)
(t0 = γ[t − xv], x0 = u0 t0 = γ[x − tv])
(11)
t0 = γt(1 − vu). t0 and γ are always positive, by definition, and u cannot exceed 1 (in units of c). v
also cannot exceed 1, therefore (1 − vu) is always positive, and hence t is always positive. In other
words, event #2 in S always takes place after event #1 (at t = 0), or, if both v and u are 1, then
the two events happen simultaneously. This establishes causality in SR, i.e. event #2 can never
precede event #1. So causality is universally agreed upon, while simultaneity is not, since t0 6= t,
i.e. observers in the two frames do not agree on the time when the ball hits the basket.
2.2.4
Time dilation and length contraction
These effects can be shown to result from the same transformations as we’ve just used.
The fact that moving clocks tick slower (time dilation) can be demonstrated as follows. Suppose
a clock is sitting at the origin of frame S 0 . It makes the first tick at (t0 , x0 ) = (0, 0) and the second
tick at (t0 , x0 ) = (dt0 , 0) as seen by observers in its own frame. In frame S the first tick is detected
at (t, x) = (0, 0), while the second tick is detected at (dt, dx), where dx = vdt (second of eqns. 8).
Note that dx is not zero, i.e. the second tick in frame S is detected by a different observer than
the one who detected the first tick. Since the spacetime interval between the two ticks is invariant,
dt0 2 = dt2 − dx2 = dt2 [1 − v 2 ]. Therefore, γdt0 = dt, and so interval between ticks always takes
longer if the clock is seen to be moving (here, by observers in S). Hence “moving clocks runs slow”.
The situation is completely symmetric: a clock in S will be seen to sun slow by those in S 0 .
5
Substituting clocks with people and ticks by heartbeats, and remembering that person’s age is
proportional to the duration of a heartbeat, we see that the traveling person (once they gets back
home, i.e. after decelerating, turning around, and making the return journey) would have aged less
by a factor of γ compared to a person who stayed at home.
Length contraction means that moving objects are always shorter along their direction of motion,
as seen from the lab frame (by two observers). Objects are always longest in their own rest frames.
The effect is symmetric: a spaceship moving through the Galaxy looks shorter to the inhabitants
of Galaxy’s planets; and the Galaxy looks shorter to the space-travelers, by the same γ factor.
Finally, note that time dilation and length contraction are real, and not optical illusions.
2.2.5
Relativistic Doppler
In contrast to the time dilation measurement here we have only one observer, standing still at the
origin of frame S. The light source moves at velocity v along with frame S 0 . In this case the
observer needs to wait for the light signal from the moving source to get back to him, which adds
an extra vdt, in the observer’s S frame, or vdt0 γ, where dt0 is the interval between emitted signals
in clock’s rest frame S 0 . The interval between ticks seen by observer in S is, dt = γ(1 + v)dt0 , and
so λ = γ(1 + v)λ0 . This reduces to the usual Doppler when γ approaches 1. The fractional change
in wavelength is (λ/λ0 ) − 1 = [(1 + v)/(1 − v)]−1/2 − 1.
3
3.1
Machinery of Relativity
Vectors, metric, etc.
Let’s define a vector,
~ = Ai ~ei = A1 ~e1 + A2 ~e2 + A3 ~e3
A
(12)
where we have used the summation convention: repeated up and down indeces are summed over.
Ai ’s are components of the vector and ~ei ’s are basis vectors of the coordinate system being used to
~
represent vector A.
Squared length of a vector is calculated as usual,
~2=A
~·A
~ = (Ai ~ei ) · (Aj ~ej ) = (~ei · ~ej ) Ai Aj = gij Ai Aj ,
|A|
(13)
where we have defined the metric tensor gij as the dot product of basis vectors. In Cartesian
~ 2 = (A1 )2 + (A2 )2 + (A3 )2 , which means gij = δij , but
coordinates we are used to having |A|
that is not true in Minkowski or curved spaces, or even in flat Eucledian space with a curvilinear
~ can be
coordinate system. Note that if the basis set is orthogonal, then gij is diagonal. Vector A
an infinitesimal displacement ds, from some origin, or between two points, in which case we can
rewrite the above as
ds2 = gµν dxµ dxν ,
(14)
where have also used greek indeces which are usually reserved for the 4D space-time, and range
from 0 to 3, with 0 representing the time dimension. (Indeces i, j, k are used for the 3 spatial
dimensions.) We already used eq. 14 in eq. 1, with gij given by (−1, 1, 1, 1). This SR Minkowski
metric is usually denoted by ηαβ , while gαβ is used to represent a general metric.
Transformation between coordinate systems, from the unprimed to the primed system, of vector
components dxν to vctor components dx0µ is
dx0µ =
∂x0µ ν
∂x0µ 0 ∂x0µ 1 ∂x0µ 2 ∂x0µ 3
dx
=
dx +
dx +
dx +
dx ,
∂xν
∂x0
∂x1
∂x2
∂x3
6
(15)
where ν is explicitly shown to run over 4 possible values. µ can assume any 4 values, so the above
is really 4 equations. This expression says that each component in the new (primed) system is a
linear combination of the all the components in the old (unprimed) system. For the components of
a general vector,
∂x0µ ν
A0µ =
A .
(16)
∂xν
In both cases
∂x0µ
0
= pµ ν
(17)
ν
∂x
is the 4 × 4 transformation matrix (which is not a tensor). (Strictly speaking, the two indeces of
p should not be written on top of each other because the first index next to p indicates the row
of the corresponding matrix and the second one, the column. When both indeces are either up or
down then the correct ordering happens automatically.)
An example of a transformation between frames is provided by Special Relativity and its Lorentz
transformations, Λ, so eq. 16 can be written as
0
dx0µ = Λµ ν dxν .
(18)
This is the same as eq. 9.
Vectors have physical significance, so transformation between frames should not change them,
but their components can and do change. (Scalars are invariant under coordinate transformations,
while tensors of rank higher than 1 also have components that would change between coordinate
systems.) For example, a squared length of a given vector stays the same
~ 2 = gµν Aµ Aν = gµν (
|A|
∂xµ 0α ∂xν 0β
∂xµ ∂xν 0α 0β
0
~ 0 |2
A0α A0β = |A
A
)(
A
)
=
g
(
)A A = gαβ
µν
∂x0α
∂x0α ∂x0β
∂x0β
(19)
The last two equalities are there because we know that the vector has to stay the same, and
that it can be represented using the metric tensor in that new (primed) coordinate system, and the
components of the vector in that same system. We conclude that this must be true:
gµν
∂xµ ∂xν
0
,
= gαβ
∂x0α ∂x0β
(20)
which demonstrates that the metric tensor transforms using the transformation matrix. This makes
it a tensor, like vector A. (With tensors, if it walks like a duck, it is a duck. Note that not all sets of
4 numbers will transform like a vector, i.e. a tensor of rank 1. For example, values of gravitational
potential at the 4 corners of some square do not transform like a vector, they are just 4 scalars,
and each individually stays invariant in all frames.) But unlike A, gµν is a rank 2 tensor (has 2
indices), so we need two transformation matrices to transform it, as is shown in eq. 20.
3.2
Tensors
Instead of 3D vector quantities that we are familiar with from Newtonian mechanics, SR and GR
use 4-vectors. The 4-vectors are a subclass of tensors, objects that transform in a certain way
between frames (discussed earlier, for example, as in eq. 16), and in doing so preserve their form,
and physical meaning. Vectors are tensors of rank 1, scalars are tensors of rank 0, while tensors
that are written as ‘matrices’, for example the metric tensor, are of rank 2. Tensors can have
higher ranks too, and we will encouter some later. Note that the components of tensors are not
scalars because the individual components change in different frames. A good starting example of
7
a tensor is the displacement 4-vector, with components (dt, dx, dy, dz). We already know that it
preserves its form under LT. Like other rank 1 tensors, it preserves its magnitude under Lorentz
transformations. Other tensors can be defined based on dτ = (dt, dx, dy, dz), as we will see later.
Tensor equations (that include tensors and operators) preserve their form under coordinate
transformations and hence are true in any coordinate system.
3.3
Basis and co-basis vector sets
If one is working in anything other than Eucledian space with a Cartesian coordinate system, then
any coordinate system you chose will have another, related system. In other words, there will be
two related basis vector sets. One set has its basis vectors oriented along the coordinate axes, and
is called the basis set, ~eµ . This is the set we have been using so far, starting with eq. 12. The
other basis, or co-basis or dual basis is ~eµ , such that the two are related by this dot product (not
a summation) resulting in a Kronecker delta:
~eµ · ~eν = δµ ν .
(21)
This does not necessarily mean that eµ and eµ are parallel and have equal lengths of unity. For a
general curved space they will be misaligned (not parallel) and have unequal lengths. In Minkowski
space the two sets are also different even though the space is not curved.
P
(There is another property that the two basis sets are related by, the outer product, or µ ~eµ ⊗
~eµ = ~1. In 3D, this translates in to a cross product and means that ~e1 of the co-basis set is
perpendicular to the plane containing ~e2 and ~e3 of the basis set, etc. We will not need this relation.
Co-basis, or dual basis is very useful when discussing curved spaces.)
Given the two types of basis sets there are two ways to represent a vector, using either basis or
co-basis:
~ = B µ eµ = Bν eν ,
B
(22)
so the magnitude squared a vector can be written in two ways, this way:
~ 2 = (B µ eµ ) · (Bν eν ) = B µ Bν eµ eν = B µ Bν δ ν µ = B ν Bν ,
|B|
(23)
~ 2 = gµν B µ B ν . Combining the two we have
and the one we had from before: |B|
B ν Bν = gµν B µ B ν
=⇒
Bν = gµν B µ
(24)
which is a rule to lower (or raise) indeces, i.e. go from using one type of basis set to the other
one (from basis to co-basis in this example). So, raising and lowering of indeces is done using the
metric tensor.
Here is a different look at it: let’s find a projection of a vector on to one of the basis vectors,
eν , where ν can be 0,1,2,3. By definition, this is Bν , but we can also write it as follows,
~ · ~eν = (B µ eµ )eν = B µ (eµ eν ) eq.13
B
= B µ gµν = Bν ,
(25)
and the last step shows that we can lower indeces using the metric tensor.
By analogy with eq. 13 we can write,
~ 2 = g µν Aµ Aν .
|A|
(26)
Let’s investigate g µν , by multiplying it with gµν , and using our method of raising/lowering indeces
gµν g να = gµ α .
8
(27)
We can also write out the metric tensor(s) using its definition (see eq. 13):
eq.21
eq.21
eµ eν eν eα = eµ δν ν eα = eµ eα = δµ α ,
(28)
gµ α = δµ α .
(29)
which leads to
When combined with eq. 27, it tells us that g µν is the inverse of gµν :
gµν g να = δµ α .
3.4
(30)
Two types of vectors and transformations
Two different basis sets gives us two ways to represent the same vector, with components Aµ and
Aµ , respectively. It is not surprising that the two will transform differently between frames.
Let’s look at our vector, eq. 12 again. The vector will be the same regardless of the coordinate
system used to decompose it into components. Suppose we tranform the components to a different
coordinate systems. If the basis vectors e0µ stretch or curve with respect to the original basis vectors
eµ , then the components of A in the new primed system must stretch or curve in the “opposite” way
to keep vector A the same. Thus we say that the components of vector A transform contravariantly
compared to the basis vectors. If basis vectors expand, components of A must shrink. Hence
“contravariantly”, and Aµ is called a contravariant vector. These are written with the indeces as
superscipts. A contravariant transformation law is eq. 16. So co-basis transforms contravariantly,
e0µ =
∂x0µ ν
e .
∂xν
(31)
Vectors that stretch or curve in the same way as the basis vectors (and “opposite” to the cobasis) are called covariant; they co-vary with the basis vectors upon transformation to a different
coordinate system. Basis vectors are obviously an example.
Another example, are the components of the derivative vector,
d
∂xµ d
∂x0 d
∂x1 d
∂x2 d
∂x3 d
=
=
+
+
+
,
dx0ν
∂x0ν dxµ
∂x0ν dx0 ∂x0ν dx1 ∂x0ν dx2 ∂x0ν dx3
or in general,
A0ν =
∂xµ
Aµ .
∂x0ν
(32)
(33)
The transformation matrix in these cases is
∂xµ
= pµ ν 0
∂x0ν
(34)
0
which should be compared to that in eq. 16, pµ ν . The two transformation are different because the
µ0
∂xµ
elements of the matrices, ∂x
∂xν are not the same as ∂xν0 . One can show that by using the Lorentz
transformations, and recalling that the partial derivative with respect to x, keeping t constant, is
d
d
∂x0 ∂t0 d
=
· 0+
·
dx
∂x t dx
∂x t dt0
(35)
And similarly for t. One can also show that the two transformations are inverses of each other.
9
To sum up, there are two types of tensors defined by how they transform between frames.
Contravariant tensors transforms as (here I have chosen to write the summations explicitly)
dx
0µ
=
3
X
∂x0µ
ν=0
∂xν
dxν
(36)
Covariant tensors transforms between frames as (where f is the function being differentiated)
3
X ∂xν df
df
=
0µ
dx
∂x0µ dxν
(37)
ν=0
One can also show that the distinction between covariant and contravariant transformations
vanishes in Eucledian (flat) geometry when transforming between two sets of Cartesian coordinates;
the two transformations become the same. Verify this by using the usual x − y rotation matrix
as an example of transformation between two Cartesian frames, and calculate the transformation
df
coefficients for dx and dx
. One can also show that covariant and contravariant thansformations
between a Cartesian and a Polar coordinate systems in Eucledian space are different, even though
space is flat.
Equations 16 and 33 show how to transform a contravariant and a covariant tensor of rank-1,
a vector, between frames. Some tensors have higher ranks, and their transformation laws involve
0
multiple applications of pµ ν or pµ ν 0 , one for each index. This is an example of a rank 3 mixed
tensor, i.e. it has upstairs and downstairs indeces:
Aα0 β
0
0
γ0
= pµ α0 pβ ν pξ γ 0 Aµ ν ξ .
(38)
Tensor Aµ ν ξ and its transform each has 43 = 64 elements, and each p has 42 = 16 elements.
3.5
Invariant quantities
We have already encountered some quantities that stay the same in different frames, like squared
~
lengths of vectors, eq. 14, 19, and 23. All these involve dot products of two vectors, for example B
2
ν
~
with itself, |B| = B Bν . This shows that a product of contravariant and a covariant vector is an
invariant. We will meet physics examples of these types of invariants in Section 4.2.
One can extend that to include these two product-type operations as well:
∂
∂
∂2
µ
µ ν
=
∂
∂
=
g
∂
∂
=
−
+ ∇2 = t
u
µ
µν
∂xµ ∂xµ
∂t2
(39)
∂µ Aµ ,
(40)
and
~ is covariant gradient operator, and ∂/∂xµ = (−∂/∂t; ∇)
~ is a contravariant
where ∂/∂xµ = (∂/∂t; ∇)
~ represents current, then eq. 40
gradient operator, and their product is an invariant, eq. 39. If A
represents continuity equation. The fact that eq. 39 and 40 are invariant can be demontrated
explicitly using Lorentz transformations, as we will probably do in class. (Note that in curved
spacetime or in flat spacetime with non-Cartesian coordinate system partial derivaties, ∂µ are not
the same as covariant differentives, ∇µ , or Dµ (Section 5.5) but in Minkowski space they are the
same.)
This gives us sufficient tools to show that Maxwell’s equations are Lorentz invariant and so can
be written using tensors and tensor equations.
10
4
4.1
Some Special Relativistic physics
Maxwell’s Equations are Lorentz invariant
The homogeneous ME’s are
~ ·B
~ =0
∇
absence of magnetic monopoles
~
~ ×E
~ + 1 ∂B = 0
∇
c ∂t
Faraday0 s Law
(41)
(42)
The inhomogeneous ME’s are
~ ·E
~ =ρ
∇
Gauss0 Law for electric charges
~
~
~ ×B
~ − 1 ∂E = J
∇
c ∂t
c
Ampere0 s Law
(43)
(44)
~ and B
~ are related to forces that act on charged particles. By analogy with Newtonian gravity
E
we can rewrite ME’s using potentials, recalling that one can often define force as a gradient of a
potential.
~ and φ as follows. Using eq. 41, B
~ =∇
~ × A,
~ because
Define vector and scalar E&M potentials, A
~
~ · (∇
~ × A)
~ = 0. Using eq. 42, define E
~ = − 1 ∂ A − ∇φ,
~ because ∇
~ × (∇
~ · φ) = 0.
∇
c ∂t
~ and φ? The Lorenz condition is that the three components of A
~
What is the relation between A
~
and one component of φ form a 4-vector, Φ = (φ, A), and the divergence of Φ is zero, establishing
~ and φ. Divergence operator is Lorentz invariant, and so the following
the connection between A
statement is true in all Lorentz frames:
α
~ ·A
~ + 1 ∂φ = ∂Φ = 0
∇
c ∂t
∂xα
(45)
Using the two inhomogeneous equations, 43 and 44, one can show that
α
~ · J~ + ∂ρ = ∂J = 0,
∇
∂t
∂xα
(46)
so the charge and current density form a 4-vector, and its divergence is Lorentz invariant.
Now we have two tensors, Φ and J, and one can show that they are connected by
(
1 ∂2
uΦ = J
− ∇2 )Φ = t
c2 ∂t2
(47)
Hence all 4 Maxwell’s equations can be written as one Lorentz invariant equation, which preserves its form under coordinate transformations, even though the specific values of the components
~ and B
~ are not Lorentz invariant; they
of Φ and J will change. Also note that the components of E
αβ
are part of E&M field-strength rank 2 tensor, F which transforms under LT.
Equation 47 is the E&M equivalent of Newtonian gravity’s Poisson equation, ∇2 ΦN ewt = 4πGρ;
both are field equations for the corresponding theories. Recall that our eventual goal in these notes
is to find the field equations for relativistic gravity.
11
4.2
Particle mechanics: Displacement, velocity, momentum and acceleration
4-vectors
The length squared of the displacement 4-vector, which we already encountered (eq. 1), can be
written as
dτ 2 = dxα dxβ = ηαβ dxα dxβ ,
(48)
where ηαβ is the metric tensor and encodes the geometry of space. In Minkowski space η has no off
diagonal components (it diagonal components are -1, 1, 1, 1) so there are no α 6= β terms. Here,
dτ is the proper time, which is the time measured by the moving observer, in his own rest frame.
dτ 2 is an invariant quantity (same in all frames), and so can be used to define new tensors, based
on existing, already established ones.
We define the components of the velocity 4-vector as follows:
U α = dxα /dτ =
(dt, dx, dy, dz)
(dt, dx, dy, dz)
=
= γ(1, vx , vy , vz ).
dτ
dt/γ
(49)
The magnitude of the 4-velocity is
|U |2 = Uα U β = ηαβ U α U β =
(−dt2 + dx2 + dy 2 + dz 2 )
= −γ 2 (1 − |~v |2 ) = −1.
(dt/γ)2
(50)
The magnitude of the velocity 4-vector, being a scalar, is invariant under coordinate transformations. The 4-velocity is a tensor because its constructed by dividing the displacement 4-vector,
dτ = (dt, dx, dy, dz) (a tensor), by an invariant, dτ .
Next, we define momentum 4-vector by multiplying the velocity 4-vector by m0 , P = m0 U =
m0 γ(1, vx , vy , vz ), where m0 is the mass as measured in the object’s rest frame, and is called the
rest mass; it is an invariant. Thus by construction P is also a tensor of rank 1. Just constructing
a tensor is nice, but what is more important is to see if it is useful. The time component of the
4-momentum is P 0 = m0 (1 − v 2 )−1/2 ≈ m0 + 21 m0 v 2 + ..., suggesting that “total mass” should
include, in addition to the rest mass, the object’s ‘classical’ kinetic energy. So it appears that this
term represents the energy of the particle, E. The other three components correspond to classical
momenta, p~. The squared magnitude of the 4-momentum is
|P |2 = −E 2 /c2 + |~
p|2 = −m20 c2
(51)
and is an invariant because it is related to another invariant, |U |2 by multiplicative constants,
m0 and c. The last expression is the 4-momentum magnitude expressed in the rest frame of the
particle, and also says that the maximum energy content of a particle is m0 c2 . So the newly defined
4-momentum turns out to be a useful construct; it encapsulated useful physics.
Four-acceleration is defined as dU/dτ . One can show that the dot product A · U = 0, is
an invariant, and represents a statement that does not have a direct counterpart in classical 3mechanics (where acceleration can be parallel to velocity).
4.3
Fluid mechanics: Energy-momentum (or stress-energy) Tensor
Four momentum describes individual particles. How do we describe fluid? Consider how its density
is different when fluid moves with respect to the lab frame. The mass of each particle increases by
γ, and the length of the volume element (along the direction of motion) is contacted by γ, so the
volume density is now ργ 2 . The two factors of γ suggest that the appropriate ‘momentum’ must
have two factors of U , so we define a 4x4 entity as T αβ = ρU α U β . This is a tensor because it
12
consists of two 4-velocity tensors. In general, the α, β component is the flow of the α component of
momentum through the xβ =constant surface. The T 0i (where i = 1, 3) and the T i0 components are
then the energy flux and the momentum density, respectively, and are the same thing. Energy flux
is conduction of energy in some spatial direction, i.e. ‘heat’ conductivity. The diagonal elements
T ii are the 3 pressure components, and the off-diagonal T ij elements are the stresses, or viscosities.
Within SR, one can show that the divergence of stress-energy tensor is zero in the absence of
external forces: ∂x∂ β T αβ = ∂β T αβ = T αβ ,β= 0. For α = 0, i.e. the time component, this becomes
the equation of conservation of mass, or continuity equation, while for α = 1, 2, 3, it gives the three
(x , y, z) components of the equation of motion, or the equations for the conservation of momentum
in three separate spatial directions. This shows that the stress-energy tensor we have constructed,
guided by the concept of momentum of an individual particle, is a valuable quantity for describing
relativistic fluids.
In cosmology we will be interested in a special type of fluid: a perfect fluid is one that has
no viscosity, stresses, or conductivity. In its rest frame is it characterized by energy density, ρ
and internal pressure, p. Because of assumed isotropy, p is the same in all three directions. Off
diagonal components are zero (no stresses), so the stress-energy tensor of a perfect fluid has only
the diagonal components, and we can write it with the help of the following. A general expression
for the energy-momentum tensor, for a general metric is
T αβ = (ρ + p)U α U β + pg αβ ,
(52)
and one can show that for a flat, uniformly expanding Universe at a rate a(t), whose diagonal
metric is gαβ = (−1, a2 , a2 , a2 ), energy-momentum tensor becomes


ρ
0
0
0
 0 p/a2
0
0 

T αβ = 
(53)
2
 0
0
p/a
0 
0
0
0
p/a2
Stress-energy tensor contains information about the matter and energy content of the Universe
in a relativistically appropriate form, i.e. in a tensor form, and so is a good candidate for the right
hand side of the yet-to-be-obtained GR field equation, i.e. the replacement for ρ of the Newtonian
Poisson equation. (In GR, the SR partial derivative used above, T αβ ,β , is replaced by the covariant
derivative, Section 5.5, often written as T αβ ;β .) The left hand side, i.e. the replacement for ∇2 Φ,
will take more work.
5
5.1
General Relativity (GR)
Motivation for GR
We have ‘updated’ concepts like velocity, acceleration, and force to be Lorentz invariant. The
Maxwell equations of electromagnetism are already Lorentz invariant. What about gravity? A
simple-minded inclusion of gravity into SR does not work: Poisson equation where the Laplacian
is augmented to include ∂ 2 /∂ t2 predicts wrong amplitude and sense for the precession of Mercury,
and does not predict bending of light. So obviously including gravity is more difficult than that.
The search for the Poisson equation equivalent for GR leads to the Einstein’s field equations.
5.2
The Equivalence Principle (EP)
This is equivalence between gravitating and inertial masses. The proportionality of the two can
be experimentally verified to a great precision, but it took a leap of faith for Einstein to assume
13
that the two are not just proportional, but are exactly the same. Making the assumption that
the two types of masses are the same, and so equating mä and GmM/r2 , we see that acceleration
due to gravity will be the same for objects of all masses, ä = GM/r2 . The observational result
of this—that a feather and a bowling ball when released simultaneously from a tower through a
vacuum tube will land on the ground at the same time—was known since Galileo. Einstein fully
exploited the far reaching consequences of this observation. If two particles of very different masses
(both much smaller than the mass of Earth in this particular example) follow same paths, they
must be following ‘tracks’ determined by curved geometry of space-time. Replacing the concept of
gravity with the concept of curved geometry allows us to maintain that the particle is following a
straight path. This substitution of geometry in place of gravity also gets rid of the spooky actionat-a-distance concept: for example, the falling motion of a ball is merely a response to the local
shape of space-time, not Earth’s gravitational attraction.
It is important to remember that the EP only provides a path to GR. It does not give the full
description of the relativistic space-time because it ignores the key feature of gravity, tidal forces,
which make the distances between neighboring falling particles change.
Equivalence principle predicts two effects: gravitational bending of light and gravitational time
dilation.
In its approximate form time dilation can be calculated entirely from the Newtonian point
of view, because Newton and Einstein must agree in the limit of weak gravitational fields and
nonrelativistic velocities. First, we replace the Earth with an elevator, height h, which is being
accelerated upwards at g. Then we place observer A next to a source of EM radiation, which is
affixed to the top of the elevator, and radiates at frequency ν0 . Observer B is at the bottom of the
elevator. It takes light t = h/c to get to B. In that time the velocity of B changes by v = gt, so
B sees radiation blueshifted, with ν → ν + ∆ν. Since v is non-relativistic, we can use the regular
Doppler formula, with
∆ν/ν = v/c = hg/c2 = (ΦA − ΦB )/c2 = −∆Φ/c2 ,
which is solved by
2
ν/ν0 = e−∆Φ/c ≈ (1 − ∆Φ/c2 )
(54)
(55)
So the intervals between two successive wave crests (or clock ticks) as viewed by A and B are
related by ∆tB = ∆tA (1 + ∆Φ/c2 ), where ∆tB is the interval that B ascribed to A’s clock. Since
∆Φ/c2 < 0, B concludes that his clock is running slower than A’s. Now, replacing back the elevator
with Earth, we note that A is located at a higher gravitational potential (higher above the surface
of Earth), and B is located deeper within the potential well. This result can be extended to a case
where observer A is at zero potential (i.e. very far from any source of gravity), and measures clock
tick interval as ∆t0 , and B is at some non-zero potential, and measures ∆t = ∆t0 (1 + Φ/c2 ). A and
B are not moving with respect to one another so this time dilation is not Doppler. There are two
possible interpretations (1) clocks have different rates, but space-time is Minkowskian, (2) clock
rates are the same, but the time component in the metric of the geometry has value different from
that in ηαβ , i.e. space-time is not Minkowskian. Einstein adopted (2), and the factor (1 + ∆Φ/c2 )
is related to the time coefficient of the metric tensor.
The amount of deflection of light calculated based on the EP is not the full correct value, as
determined by experiment, for example, the bending of light from distant stars that just grazes the
edge of the Sun’s surface as seen from Earth. This is because the EP does not give the full GR.
14
5.3
The Weak Field limit of GR
In the Minkowski case the metric tensor ηµν is diagonal with the elements (−1, 1, 1, 1). In a general
geometry, the elements take on other values, and are represented by gµν . In other words, the
interval squaread between two neighboring events in space-time is given by
dτ 2 = g00 dt2 + g11 dx2 + g22 dy 2 + g33 dz 2 .
(56)
Using the results of the previous section we can get the time-time component of the metric: since
∆t ≈ ∆t0 (1 + Φ), and Φ is small, ∆t2 ≈ ∆t20 (1 + 2Φ), hence g00 ≈ −(1 + 2Φ). (The minus sign
appears because when Φ = 0, g00 must become -1. I’ve also dropped factors of c2 that divide the
pontential.) We see that the GR analogs of the Newtonian potential Φ are the gµν ’s.
The spatial components of the metric cannot be obtained from a simple thought experiment.
They can be deduced from the full theory, in the weak field regime. They are gii = (1 − 2Φ). The
full weak field metric is
dτ 2 = −(1 + 2Φ)dt2 + (1 − 2Φ)dr2
(57)
where dτ is the invariant interval. This metric is used in cosmological applications of propagation of
gravitational waves, and gravitational lensing. For example, imagine an intervening galaxy causing
light from a background quasar to take more than one
R path to us because the corresponding action
(integral of the Lagrangian along the path, S = dτ ) is extremized along more than one path
from quasar to us. What we see as a result are multiple images of the same background quasar.
Light travels along null geodesics, so dτ 2 = 0 for light along each path/image. So from eq. 57 we
get dt = (1 − 2Φ)dr, where we have used approximations of the sort (1 − 2Φ)−1 ≈ (1 + 2Φ) and
(1 − 2Φ)1/2 ≈ (1 − Φ) because Φ is small. Once we find all the paths from the source to us that
the light can take, and hence locate all the images, we can calculate how long (in our frame) it
took light to travel along these paths. Integrating dt along these paths
gives different results for
R
different paths/images, because
both
the
geometrical
path
length
(
dr)
and the time delay due
R
to gravitational potential (−2 Φdr) are different for the two paths. These time delays have been
measured for a number of distant quasars, providing yet another observational verification of GR’s
predictions.
5.4
Motion of a single particle
The EP suggests that gravity can be expressed as geometry with non-Minkowskian metric, so
space-time will have some curvature. We need to figure out how to decribe that curvature.
For a given particle moving through space-time we can always find an instantaneous frame in
which the particle is not being accelerated, i.e.
dU 0α
d2 x0α
=
=0
dτ
dτ 2
(58)
This is the freely falling frame, where one would experience weightlessness. (In terms of geometry
going into a freely-falling frame corresponds to finding a ‘tangent space’ to a curved space at a given
location, similar to a tangent plane to a sphere at some point. In a small enough patch around
that point a plane approximates a sphere pretty well.) This instantaneous frame is described by
the Minkowski metric ηαβ . Suppose we now look at the same particle from another frame, in which
the particle is seen to be accelerating. We can change coordinate systems, and rewrite eq. 58 in
the accelerated (unprimed) coordinate frame. The result is the geodesic equation of motion,
β
γ
d2 xα
α dx dx
=
Γ
βγ
dτ 2
dτ dτ
15
(59)
where Γ’s are called Christoffel symbols (a.k.a. the affine connection, or just the connection).
These are analogous to forces (first gradients of the potential) in Newtonian language. Since in GR
gravity is a fictions force, one should be able to have Γ’s disappear by an appropriate selection of a
coordinate frame, just what we had in eq. 58. Γαβγ ’s are 0 in some frames and non-0 in others, hence
they are not coordinate invariant, and not tensors. Γαβγ are symmetric in β and γ, i.e. Γαβγ = Γαγβ ,
which helps when we construct more complicated objects out of Γ’s.
The curvature of a given geometry is an intrinsic property of that geometry, and as such, its
characterization should not depend on the coordinate system that we paint on that geometry. Γαβγ ’s
are coodinate dependent to the degree that they can be made to vanish in some coordinate systems.
We conclude that quantities arising from considering motion of a single particle will not be able to
characterize the geometry. We need to consider motion of a collection of particles.
5.5
Differentiation of vectors in non-Cartesian coordinates
Before discussing motion of a collection of particles, this is good place to talk about differentiation
in relativity.
When differentiating with respect to coordinates dxµ , first ask what is being differentiated
(scalar, or tensor or rank 1 or higher), whether spacetime is flat or curved, and whether the
coordinates being used are Cartesian or curvilinear. The usual partial differentiation (∂µ ) works
fine in two cases: (i) differentiation of scalars in any spacetime and any coordinates, and (ii)
differentiation of tensors in Minkowski spacetime and Cartesian coordinates. In all other cases,
for example in GR, differentiation requires more attention, and is called covariant differentiation.
Another way of saying that is: in GR, partial differentiation of tensors does not produce tensors,
and since in relativity we use equations that retain their tensor form, differentiation has to be
changed (to covariant differentiation) so that equations retain their tensor properties.
In GR basic operations with vectors are defined only for vectors located at the same point.
Differentiating a vector involves taking a vector at the initial position and comparing it to a vector
at an infinitesimally close, but displaced location. So we “carry” one of the vectors to the location
of the other, subtract them, and divide that by the distance between the two. This “carrying” is
called parallel transport. If the space is flat and the coordinate system used is orthonormal (all
unit vectors are mutually perpendicular, ~ei · ~ej = δij , and the lengths of all unit vectors are unity,
|~ei | = 1) then parallel transport does not change the values of the vector’s components.
If either the space is not flat, or curvilinear coordinate system is used in a flat space, then parallel
transport changes the values of the components of a vector because the basis set is changing. Also,
parallel transporting a vector along different closed curves (and returning it to the same place) is
path-dependent. Then differentiation of a vector involves accounting for the real changes in the
components of a vector (because the vector itself has changed between two infinitesimmaly spaced
locations), and accounting for the changes in the components because the coordinate system’s basis
set had changed. So a general expression for covariant differentiation of vector components v α
consists of two terms. Here it is in vector notation:
D~v
∂v α
∂~eα
α
=
D
~
v
=
D
(v
~
e
)
=
~eα + v α β ,
α
β
β
β
β
Dx
∂x
∂x
and in component notation:
(60)
Dv α
= Dβ v α = ∂β v α + X α βγ v γ
(61)
Dxβ
Covariant differentiation has to be coordinate independent, and partial differentiation (the first
term on the RHS) does not satisfy that.
16
What is X? We can see from eq. 60 and eq. 61 that it is related to the derivatives of the
basis set. Here’s another way to find out. Suppose vector v α represents velocity of a free particle
floating through space. Its covariant derivative w.r.t. coordinates has to be the same in all frames.
We know that in the inertial frame that derivative is zero because the particle is not accelerating,
therefore covariant derivative, eq. 61, must be zero in all frames. We can multiply both sides of
eq. 61 by v β :
∂v α
γ
α
0 = vβ
v
(62)
+
X
βγ
∂xβ
Since v β = ∂xβ /∂τ , the above equation becomes
∂xβ ∂v α
∂v α
∂ 2 xα
+ X α βγ v γ v β =
+ X α βγ v γ v β =
+ X α βγ v γ v β = 0.
β
∂τ ∂x
∂τ
∂τ 2
(63)
The last expression is exactly eq. 59, if X α βγ = Γα βγ , so eq. 61 shows how to differentiate a vector
covariantly, and it involves Christoffel symbols (which are not tensors). So neither of the two terms
on the RHS of eq. 61 are tensors, but together they form a tensor expression.
Differentiation of tensors of higher ranks is similar, with one additional term with Γ added for
each additional index. Γ is often called the ‘connection’ because it connects a vector based at point
to that another infinitesimally displaced point. Or: a connection is a prescription for moving vectors
from one point to another. The act of moving a vector is parallel transport. Curved geometry is
sometimes defined as a space where parallel transporting a vector along two different paths with
the same endpoints produced two different vectors.
Equation 62 can help us formlulate the concept of the geodesic, the straightest line in any given
geometry. A geodesic curve is a curve whose tangent vector’s covariant gradient is zero.
The covariant derivative of the metric tensor is zero, because it is zero in the local inertial frame
where all its first derivatives are zero. The reason for the last statement is that gµν ’s correspond
to ΦN ewt (in weak field limit), and potential gradients are zero in inertial frames.
5.6
Motion of a collection of particles: characterizing curvature
The basic idea can be summarized by a simple example: two particles, displaced from each other
parallel to the ground, and freely falling in Earth’s gravitational field, will (in addition to approaching Earth) also come closer to each other. If the two particles are arranged along a vertical line,
they will move further apart as they both approachthe ground. This relative motion of particles is
the way to tell that gravity is present; acceleration towards Earth of just one of the particles is not
enough. This coming together (or separating) of particles is described by tidal forces, which are
2nd derivatives of the potential,
d2 ∆xi
∂2Φ
=
−Σ
∆xj
j
dt2
∂xi ∂xj
(64)
where ∆xi is the separation between two neighboring test particles. (Ignore that the indeces in
this expression don’t quite do the right thing.) This is called the equation of Newtonian deviation.
In GR the equivalent tidal forces are described by this geodesic deviation equation:
D2 ∆xα
dxβ dxγ
α
=
R
∆xµ
βµγ
Dτ 2
dτ dτ
(65)
Since tidal forces can be detected from any reference frame, and cannot be made to vanish they
α
are the true measure of gravity. Rβγβ
is the Riemann curvature tensor, a rank 4 tensor, and is a
17
function of the metric tensor and its first and second derivatives only, or alternatively, as a function
of Γ’s and their first derivatives.
Here’s an intuitive way of understanding why the curvature tensor is needed and why it has
rank 4. The geometry is specified by the metric tensor. However, gµν ’s are coordinate dependent
(many different sets of gµν ’s can describe the same space.) whereas the curvature of a given space
is an intrinsic property of that space and is coordinate independent. So we need to come up with a
tensor that will describe the intrinsic curvature of space regardless of how we draw the coordinate
axes in that space. The 1st derivatives of gµν ’s by themselves are not useful because they vanish
in inertial frames; in the Newtonian analog these correspond to the spatial gradients of Φ, i.e.
the gravitational force. So we have to use the second derivatives of gµν ’s. Second derivatives of
the potential are related to the tidal forces, and the matter density in the Newtonian analoque,
Σi ∂ 2 Φi /∂x2i = ∇2 Φ ∝ ρ, and do not go away as long as there are sources of gravity. Each order of
differentiation introduces an additional index, so the Riemann curvature tensor has to be rank 4.
5.7
Field equations
Our goal is to incorporate gravity into relativity; in particular we are looking for equation(s) that
would replace Poisson equation,
∇2 Φ = 4πGρ,
(66)
that connects the potential which characterizes gravity (LHS) with density, the source term that
generates gravity (RHS). Relating these is the big problem that took Einstein a long time to solve.
In GR good candidates for the LHS would contain some function of the Riemann curvature tensor,
while the RHS should contain the source term, i.e. some function of the stress-energy tensor.
Einstein arrived at the right equations (as far as we know by testing these against experiments)
by simple arguments plus some guesswork. Guesswork was involved because it is a new piece of
physics that was being sought; nothing that can be derived from existing physical laws would be
new physics. Here are some helpful hints, i.e. conditions that have to be satisfied by the field
equations:
• (1) Only tensors should go into the equation because the whole equation should remain form
invariant under generalized coordinate transformations.
• (2) Einstein limited his search to tensors that contain only up to second order derivates of
the metric tensor. Riemann curvature tensor is an excellent candidate.
• (3) Because stress-energy tensor which will go in to the RHS of the field equation is a
rank 2 tensor, the LHS should also contain only 2nd rank tensor(s). Because of the various
µ
symmetries of Rνσδ
(only 20 of its 44 = 256 components are independent) only two unique
rank 2 tensors can be constructed out of it, the Ricci tensor Rµν and the Ricci scalar R, which
when multiplied by the metric tensor gives rank 2 tensor, so the LHS will be some linear
combination of Rµν and Rgµν . Ricci tensor can be expressed as
Rµν = Γα µν,α − Γα µα,ν + Γα βα Γβ µν − Γα βν Γβ µα .
(67)
Ricci scalar can be obtained as the trace of the Ricci tensor:
R = g µν Rµν
(68)
• (4) We saw earleir (in the case of SR, through the GR version can also be used)) that in the
absence of external forces or sources the divergence of the stress-energy, which is on the RHS
18
of the field equation, is 0. Therefore the covariant derivative (in the case of GR) of the LHS
must also be 0. This condition fixes the constants multiplying the two terms on the LHS.
• (5) When all the components of Tµν are zeros then the geometry of space-time must be flat,
i.e. zero space-time curvature, and zero Ricci tensor.
All these conditions lead to a unique solution:
1
Rµν − Rgµν = 8πGTµν
2
(69)
The constant 8πG comes from comparison with the result in the Newtonian limit, and G is determined from experiments.
If condition (5) above is not applied then we are free to add a constant to the equation, −Λgµν ,
and (1)-(4) conditions will still be satisfied. In particular (4) is satisfied because the covariant
derivative of the metric tensor is zero. Comparing the form of gµν to the form of the stress energy
tensor of a perfect fluid in its rest frame, we conclude that this extra term must satisfy ρ = −p, i.e.
must have negative pressure, because there is no such thing as negative mass. These days the Λ
term is written on the RHS of eq. 69, and is considered to be a contribution to the stress-energy.
What is its physical origin? It is generally thought to be due to the vacuum energy density, though
other interpretations exist as well.
5.8
Non-linearity of GR
A major difference between GR and Newtonian gravity is that the latter is a linear theory, while
the former is not: if there are two sources of gravity, like two stars, than in Newtonian gravity one
would linearly superimpose the two gravitational potentials and there is no additional ‘interaction’
term. In GR there is: the energy in the gravitational field acts like additional source of gravitation
(and changes with the separation of the two stars). Einstein’s field equations are the answer to
the complex problem of how to include this effect. Because of their non-linearity, ‘solving’ the
field equations proceeds by a different route than usual equation-solving: one has to guess at the
right form of Tµν and the corresponding Rµ νλκ and see if they satisfy the field equations. Exact
solutions are rare: Schwarzschild, Kerr, Friedmann, and Weak Field equations. All these are highly
symmetric. Friedmann equation governs the evolution of a perfectly smooth Universe.
19