General Relativity Notes

Table of Contents

Preface

These are some notes on general relativity. As in most of my pages, this is not meant to be comprehensive, but rather, to cover the basics and select topics that I feel I can better remember if I write them down.

As usual, explanatory notes can be accessed by clicking the appropriate link, often labeled “here.” To close the explanatory note, click the link again.

I. Introduction

Albert Einstein published his well-known paper on special relativity in 1905, a paper that described how lengths and time intervals vary depending on one’s frame of reference. However, this theory applies only to inertial frames of reference (i.e., frames of reference involving bodies that are at rest or moving at constant velocity). He wished to extend this theory to non-inertial (i.e., accelerating) frames.

I.A Equivalence Principle

His path to accomplishing this began in 1907 when, while sitting in his office in Bern, Switzerland, he had what he called “the happiest thought of [his] life.” He realized that a person in free fall from a rooftop does not feel his/her own weight and that objects falling with the person will continue to fall alongside the person until they all hit the ground. The concepts embodied in this thought experiment are referred to as the equivalence principle and have been presented in several alternative forms over the years, most notably as an observer in an elevator that’s been dropped off a cliff, plummeting toward earth under the influence of gravity with an object (such as a ball) falling beside the observer. Such an arrangement is depicted in figure I.A.1.

Elevator in free fall
Figure I.A.1

In the diagram, we see an elevator in free fall with an observer and 2 red balls “floating” within it. We know from Newtonian physics that:

    \[ F=ma \quad \text{(I.A.1)}  \]

where

F is the force on an object
m is the object’s mass, sometimes referred to as the inertial mass
a is the object’s acceleration

The force, in this case, is the force of gravity which, in Newtonian physics, is given by:

    \[ F = G\frac{mM}{r^2} \quad \text{(I.A.2)}   \]

where

F is the gravitational force between 2 objects
G is Newton’s gravitational constant = 6.674×10^{−11} m^3⋅kg^{−1}⋅s^{−2}
m and M are the masses of the 2 objects, sometimes referred to as the gravitational masses
r is the distance between the centers of the 2 objects

Here, I use m and M for the masses in Newton’s equation of gravitation because, in our example, I’ll let M represent the mass of the earth and m the mass of the smaller objects (i.e., the elevator, the observer, the balls).

Now we substitute eq. (I.A.2) into eq. (I.A.1). We get:

    \[  ma = G\frac{mM}{r^2} \quad \text{(I.A.3)}  \]

We can cancel m on both sides. This gives us:

    \[ a = G\frac{M}{r^2} \quad \text{(I.A.4)}  \]

Now let’s examine the units of the right side of eq. (I.A.4):

    \[ \frac{\cancel{L^3}}{\bcancel{M} \cdot T^2}\frac{\bcancel{M}}{\cancel{L^2}} = \frac{L}{T^2} = g \quad \text{(I.A.5)} \]

Here,

M is mass
L is length
T is time

We’re left with a term that has the units of acceleration, a term I’ve called g, the acceleration due to gravity. Thus, a, the acceleration of the object affected by gravity, equals g, the acceleration created by the gravitational force. Note that these accelerations do not depend on the mass of the falling object, a well-known experimental result that legend has it Galileo established by dropping things from the Leaning Tower of Pisa. A basic principle we can extract from this little exercise is that inertial mass is equivalent to gravitational mass.
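To make eq. (I.A.4) concrete, here’s a quick numerical check; the mass and radius of the earth below are standard approximate values that I’m supplying just for illustration:

```python
# Quick numerical check of eq. (I.A.4). The earth's mass and radius below are
# standard approximate values, supplied here only for illustration.
G = 6.674e-11        # Newton's gravitational constant, m^3 kg^-1 s^-2
M_earth = 5.972e24   # kg
r_earth = 6.371e6    # m (mean radius of the earth)

a = G * M_earth / r_earth**2
print(f"a = {a:.2f} m/s^2")   # ~9.82 m/s^2, with no dependence on the falling object's mass
```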

This seemingly simple idea has some profound consequences. First off, it explains why our observer in the elevator and the objects next to them float weightlessly in the elevator as they fall: because the acceleration due to gravity doesn’t depend on mass, the elevator, the observer and the balls all fall at exactly the same rate, and an observer in their own frame of reference doesn’t perceive themselves as moving. Thus, over a sufficiently small region of spacetime, the frame of reference of an object in free fall is equivalent to an inertial frame of reference.

Accelerating rocket to show acceleration equals gravity
Figure I.A.2

To see additional consequences of the equivalence of inertial and gravitational mass, let’s expand our thought experiment and imagine an observer in a rocket with a (red) ball at the rocket’s ceiling (figure I.A.2). Of course, if the rocket is moving at constant velocity (relative to an observer outside the rocket) or is in free fall, the observer and the ball do not move with respect to the rocket. But now imagine that the rocket accelerates upward (relative to the observer in the rocket, to the ball in the rocket and to an observer outside the rocket) at 9.8 m/s², the acceleration known to occur due to the earth’s gravity. The observer outside the rocket sees 1) the rocket accelerating upward, 2) the observer in the rocket plastered to the floor of the rocket and 3) the ball remaining at the same position while the floor of the rocket moves upward to meet it. By contrast, the observer in the rocket feels like he’s being pulled down against the floor of the rocket in a manner identical to being pulled down toward the earth while standing on the earth’s surface. In addition, the observer in the rocket sees the ball fall from the ceiling to the floor at a rate identical to the rate it would fall from the same height to the surface of the earth.

From this, we can conclude – like Einstein did – that acceleration and gravity are the same thing.

Light ray bent in accelerating rocket.
Figure I.A.3

A similar phenomenon can be seen for light, as shown in figure I.A.3. In this figure, a light beam is shot in a rightward direction (let’s call it the x-direction) across an accelerating rocket. To an observer outside the rocket, the light beam appears to go straight across the rocket but the rocket’s floor gets increasingly close to the light beam as the rocket moves. On the other hand, an observer inside the rocket sees the light beam follow a parabolic path from the upper left to the lower righthand portion of the rocket. That is, the light gets “bent” in the accelerating frame of reference. And since Einstein considered acceleration and gravity to be equivalent, he concluded that gravity should “bend” light rays.

A more quantitative analysis of this problem yields further insight into what’s happening. In an inertial frame of reference, with the rocket moving in the z-direction with constant velocity, v, relative to an observer outside the rocket, we can write the following equations:

z^{\prime} = z -vt \quad \text{(I.A.6)}
t^{\prime} = t

where the primed frame is that of an observer outside the rocket and the unprimed frame is that of an observer inside the rocket. Obviously, in special relativity, the expressions for spatial and time coordinate transformations would be more complex but this Newtonian view will suffice for our purposes here. We can also draw a spacetime diagram representing this inertial coordinate system, as in figure I.A.4.

Spacetime diagram of inertial coordinates
Figure I.A.4

The dotted lines represent the worldlines of objects within the rocket from the point of view of an observer in the rocket. Such an observer doesn’t think he’s moving in space; thus, worldlines in this frame of reference are straight vertical lines (i.e., the observer inside the rocket sees himself and objects inside the rocket as moving in time but not space). By contrast, the observer outside the rocket sees the rocket and its contents moving in both space and time. Worldlines for this observer are depicted as diagonal solid lines. Again, the lines are straight.

Now let’s write equations for the accelerating rocket. Specifically, we’ll write Galilean transformation equations relating the primed coordinates of an observer outside the rocket and the unprimed coordinates of an observer inside the rocket:

z^{\prime} = z - \frac12 gt^2 \quad \text{(I.A.7)}
t^{\prime} = t

If we plot the primed coordinates on a spacetime diagram, we get something like this:

Spacetime diagram of accelerated motion
Figure I.A.5

Instead of straight lines we get curved lines (i.e., parabolas) for the z^{\prime} coordinates.

Furthermore, we can write a transformation equation for the light ray in the accelerating rocket. In this case, an observer outside the rocket is stationary, in the z-direction, with respect to the light beam. This observer would write the following equations:

x = ct
z = 0

The observer in the rocket, however, is accelerating with respect to the light beam in the z-direction. Because of this relative acceleration, we’ll consider this observer’s coordinates to be the primed coordinates. This observer would write:

x^{\prime} = ct
z^{\prime} = - \frac12 gt^2  \quad \text{(I.A.8)}

That means that:

    \[  t^2 =  \frac{(x^{\prime})^2}{c^2} \quad \text{(I.A.9)} \]

Plugging eq. (I.A.9) into eq. (I.A.8), we obtain:

    \[ z^{\prime} = - \frac12  g \frac{(x^{\prime})^2}{c^2} \quad \text{(I.A.10)}  \]

When we plot this equation, we get something like:

Light path due to acceleration
Figure I.A.6
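Before moving on, it’s worth getting a feel for the size of this effect. The short sketch below evaluates eq. (I.A.10) for a beam crossing a rocket; the 10 m width is a number I’ve chosen purely for illustration:

```python
# Order-of-magnitude check of eq. (I.A.10). The 10 m rocket width is an
# arbitrary illustrative choice.
g = 9.8          # m/s^2
c = 2.998e8      # m/s
x_prime = 10.0   # m, horizontal distance the beam travels across the rocket

z_prime = -0.5 * g * x_prime**2 / c**2
print(f"z' = {z_prime:.2e} m")   # about -5.5e-15 m
```

The predicted drop is on the order of 10^{-15} m, which is why the bending of light only becomes noticeable over very large distances or near very strong gravitational fields.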

To Einstein, it was no coincidence that the path of light in an accelerated frame of reference and the coordinates in an accelerated frame are both curved in the same way (i.e., both are parabolas). Instead, it suggested to him that the light (or any other object for that matter) was simply following the shortest path along curved coordinates, coordinates that were the result of the acceleration. And because he presumed acceleration and gravity to be equivalent, he postulated that 1) gravity would also “bend” spacetime coordinates and 2) that light (and other objects) would follow these bent coordinates. Thus, it would appear that a massive object was exerting a gravitational force that was bending the light path (or the path of some other object), but really, the light (or the object) was just taking the shortest course it could along the bent coordinates. Such paths are referred to as geodesics; we will discuss them in greater detail later.

At any rate, Einstein developed these ideas further with another thought experiment. He imagined an observer (let’s call her observer A) at the center of a disc and another observer, observer B, at the outside of the disc. Across the diameter of the disc, we lay measuring rods. Let’s say it takes 10 of them to span the diameter of the disc. We also lay measuring rods around the circumference of the disc with the disc at rest with respect to the 2 observers. It takes 31.4 measuring rods to span the circumference of the disc.

Now we set the disc rotating. Each observer, in their own frame of reference – like any observer in their own frame of reference – sees themself at rest. However, observer A at the center of the rotating disc sees the observer B along the circumference of the disc as moving. From the theory of special relativity, we know that, to observer A, the measuring rods on the circumference of the disc undergo length contraction. Thus, it takes more than 31.4 measuring rods to span the circumference of the disc when the disc is in rotating motion relative to observer A. I don’t believe Einstein actually discussed this – in the context of this thought experiment – but per the theory of special relativity, to observer A, a clock on the circumference of the disc should tick slower than a clock at the center of the disc (i.e., time dilation takes place).

It has been noted that if such a disc were made of rigid material, it might break apart since, at each circumference level away from the disc center, differing degrees of circumferential length contraction should occur when the disc begins rotating while no length contraction should occur in the radial direction. Several solutions to this conundrum have been proposed in the literature such as, perhaps, the disc is made of malleable, fluid-like material or the disc is made of molten material which is allowed to cool only after the disc begins rotating. At any rate, this is a side argument that’s irrelevant to the point Einstein was trying to make.

Of course, observer B on the circumference of the rotating disc would also experience an outward (centrifugal) acceleration which would feel to him like gravity was pulling him outward. And since, in Einstein’s view, acceleration and gravity are equivalent, he reasoned that gravity should induce length contraction and time dilation as well.

So we have this idea that gravity has something to do with spacetime geometry, with balls and light rays following curved paths that resemble the curved worldlines one sees with accelerated reference frames. And we also have this notion that gravity appears to cause length contraction and time dilation. If you’re like I was, you may be wondering, What’s the link between length contraction/time dilation and curved spacetime? Figure I.A.7 provides one way to visualize this relationship.

How time dilation and length contraction translate to spacetime curvature
Figure I.A.7

Figure I.A.7a shows the time axes of 2 objects as seen by an observer very far away from a massive body. The blue time axis, t, is that of the far-away observer, and thus, measures proper time for that observer. The red time axis, t^{\prime}, corresponds to an object close to and moving toward the massive body, as seen by the far-away observer. An observer accompanying the object close to the massive body would – like any observer in their own frame of reference – think they aren’t moving. To this observer, each point in spacetime has, associated with it, a small set of coordinate axes that looks like flat (Minkowski) space (depicted in figure I.A.7a as little black axes at right angles to each other). These axes represent tiny local spacetime diagrams. The axis tangent to the red curve is the time axis. The axis perpendicular to the time axis is a spatial axis. In this setup, we’re only considering one spatial direction, which we’ll call the z-axis. The dotted lines extending outward from the origin of each set of axes represent the path of light, oriented at 45°. The spacing between the origins of successive sets of axes along the red time axis equals the spacing of the tick marks on the blue axis, that spacing representing 1 unit of time on both the blue and red axes. To see how the blue and red time axes relate to each other, an object moving along the red time axis sends out a light signal to the far-away observer from each set of axes, each pulse being – to the observer near the massive body – one time unit apart.

What time interval between these light pulses does the far-away observer detect? This is shown by the green dots at the intersection between the path of the light rays (black dotted lines) and the blue time axis. What we see is that, as the object near the massive body gets closer to that body, the light signals it sends are detected by the far-away observer at time intervals that get farther and farther apart. If the path followed by the object close to the massive body had been straight, then the time intervals between light pulses would have been equally spaced, with length equal to the time intervals between the blue tick marks on the far-away observer’s time axis. Thus, we can see how time dilation can be thought of as a result of curved spacetime.

What about length contraction? This is addressed in figure I.A.7b. In that figure, the blue axis (labeled z) is the spatial axis of an observer far from a massive body. The red curved axis (labeled z^{\prime}) is the spatial axis of an object moving toward a massive body, as seen by the far-away observer. Of course, an observer moving with the object near the massive body – like any observer in their own frame of reference – thinks they’re not moving. They see space as flat and see the length intervals between points on their spatial axis as equidistant, each 1 length unit apart, the same length as the blue tick marks on the far-away observer’s (blue) spatial axis. To communicate his position to the far-away observer, the observer near the massive body sends a light signal from each black dot to the far-away observer. The light moves in a straight line to the far-away observer, being received by the far-away observer, on the blue axis, at the green dots. Whereas the observer near the massive body measures the distance between each black dot as 1 length unit, the far-away observer sees the units of length as being less than one unit, with that length interval decreasing as the object close to the massive body gets closer to that source of mass. That is, the far-away observer sees length contraction. If the spatial axis in the frame of reference of the object near the massive body had been straight, then the black dots on that spatial axis would have coincided exactly with the blue tick marks on the blue axis. Therefore, we can see how length contraction can be thought of as the result of curved spacetime.

So we can see how Einstein’s thought experiment about objects in free fall, which led to the simple idea that inertial mass equals gravitational mass, got the ball rolling toward a theory – general relativity – that revolutionized scientific thought. Before we move on, we should note that this simple idea has a name – the equivalence principle – and that this principle has been expressed in 3 forms:

  • Weak Equivalence Principle: Over sufficiently small spacetime regions, the motion of freely falling objects due to gravity cannot be distinguished from uniform acceleration.
  • Einstein Equivalence Principle: In sufficiently small regions of spacetime, we can find a representation such that the laws of physics reduce to those of special relativity.
  • Strong Equivalence Principle: Gravity is ultimately a form of energy, and as such, behaves like mass in a gravitational field.

We won’t talk much about the strong equivalence principle in this article. We’ve already discussed the weak equivalence principle in some detail. At this point, we need to say a few words about the Einstein equivalence principle, mainly because it will be a good lead-in to our next section.

To illustrate the Einstein equivalence principle, consider the rocket ship containing an observer and a ball that we talked about earlier. If the observer and ball took on a frame of reference in free fall relative to the rocket – one that does not share the rocket’s constant acceleration – then the observer would no longer be plastered to the floor of the rocket and the ball would no longer fall. They would float weightlessly like an observer and a ball in a rocket in inertial motion (relative to some observer watching both rockets) in far outer space, far away from any massive objects. We know the spacetime in such an inertial frame of reference is the spacetime of special relativity – Minkowski space. And because the observer and ball in the freely falling frame behave like the observer, ball and rocket in outer space, we can surmise that the spacetime governing the uniformly accelerating rocket is also, at least locally, that of special relativity. Thus, a coordinate transformation has been found that changes things from a uniformly accelerating frame, with behavior that looks like motion due to gravity, to one with behavior that looks like special relativity.

Another way of thinking about this is to consider the earth whose surface is spherical – clearly curved. However, if we look at a small patch of the earth, it looks flat to us. Similarly, in curved spacetime (like spacetime axes in the region of a massive object), if we examine a small enough region, it will look flat (analogous to the flat coordinate system of special relativity). For those who are interested, I’ll offer a detailed proof of this but I’ll have to defer this proof until I’ve introduced Christoffel symbols.

I.B Tidal Forces

At any rate, we keep saying that acceleration equals gravity and that the laws of physics reduce to those of special relativity (which plays out in flat Minkowski space) “in sufficiently small regions of spacetime.” This qualifier is needed because, over larger regions where gravitational effects are in play, the equivalence is not exact. The deviations are called tidal effects (or tidal forces) and they will ultimately allow us to differentiate accelerated motion in flat space from accelerated motion due to gravity.

Renowned physicist Leonard Susskind, in his lecture series and book on general relativity, illustrates these effects by an example that he calls the 2000 mile man.

Tidal forces
Figure I.B.1

Figure I.B.1a shows the 2000-mile-long man floating in space far from any massive body. Figure I.B.1b shows the 2000-mile-long man near a massive body like the earth. Looking at things from a Newtonian point of view, the earth exerts a gravitational force, g(r), on the man – a force that’s a function of the distance from the center of the earth.

This gravitational force is oriented at an angle with respect to the long axis of the man. Thus, the man feels 2 components of force: one downward along his long axis and another directed inward toward his center. The latter inwardly directed forces squeeze him together. Meanwhile, the longitudinal forces vary from head to foot, the man experiencing a higher gravitational force near his feet (nearer the center of the earth) than at his head. This relative difference in longitudinal forces causes the man to stretch in the longitudinal direction.

When an accelerating frame of reference causes what appears to be a gravitational effect, we can always find a global coordinate transformation (i.e., a transformation to an inertial frame) that “gets rid” of the effects of gravity, leading to a spacetime that looks flat.

On the other hand, this is impossible in a real gravitational field like that shown in figure I.B.1. Why? Because the gravitational field varies as a function of position, and thus, acceleration varies with position. If we try to make a global coordinate transformation that gets rid of the gravitational effect at one place (e.g., at the head of the 2000 mile man) then it won’t get rid of the gravitational effect in another place (e.g., at the feet of the 2000 mile man).

An even more extreme example would be to consider a person standing on the earth in Seville, Spain and another in Auckland, New Zealand, 2 locations separated, approximately, by a straight line through the center of the earth. Both are influenced by earth’s gravity, but in opposite directions. Thus, if we make a global coordinate transformation that gets rid of the gravitational effect in Seville, it will double the gravitational effect in Auckland, and vice versa.

Einstein realized that frames of reference where there is no gravitational acceleration are equivalent to inertial frames of reference like the Minkowski space of special relativity, which we know is flat. By contrast, frames of reference where there is gravitational acceleration are associated with curved spacetime. We’ll defer proof of this until we acquire some additional mathematical tools, but for now, suffice it to say that we can always find a local coordinate transformation that makes a small region of curved spacetime flat. However, we can never identify a global coordinate transformation to make all of curved spacetime flat (proof), and this inability to find such a global coordinate transformation is what tells us that a true gravitational effect (not just acceleration in flat space) and, thus, curved spacetime, is present.

How, then, can one determine whether true gravity/curved spacetime or mere acceleration in flat spacetime exists? Testing all possible coordinate transformations by trial and error could give us the answer but this, of course, is impractical. Einstein realized that he needed some novel method to make the differentiation. For this task, he turned to mathematician colleagues like Marcel Grossmann who helped him find such a method: differential geometry. We’ll see how this helped to solve Einstein’s dilemma in the next section.

II. Curvature

Everything we do in this section is geared toward developing a mathematical method of differentiating flat spacetime from curved spacetime. However, this will involve several steps. To understand this process, the reader will need a basic understanding of tensors and Einstein summation notation. For those readers who need to learn about these topics, I have a tensors page on this website that gives an introduction to these subjects. There are several more comprehensive treatments of these topics elsewhere on the web; I list a few references to such sites on my tensors page. Here, I’ll only provide an introduction to the topics I discuss on my tensors page but will go into more depth on topics I haven’t covered elsewhere.

II.A Metric Tensor

We talked previously about Einstein’s thought experiment involving the rotating disc which showed how accelerating frames of reference, and thus gravity, can give rise to time dilation and length contraction. We then discussed how the variability in the magnitude of time and spatial units is related to curved spacetime. The metric tensor is what tells us the lengths of line elements, and therefore, the magnitude of the time and spatial units in spacetime.

Specifically, the squared length of a vector is given by its dot product with itself:

\displaystyle \vec{V} \cdot \vec{V} = V^i e_i \cdot V^j e_j
      =(e_i \cdot e_j)\, V^i V^j \quad \text{eq. (II.A.1)}

Suppose we have an infinitesimal displacement vector d \vec{r} =  dx^i e_i. To find its length, ds, we take its dot product with itself, which gives the squared length, ds^2:

\displaystyle ds^2  = dx^i e_i \cdot dx^j e_j
     =(e_i \cdot e_j)\, dx^i dx^j
     =g_{ij}\, dx^i dx^j \quad \text{eq. (II.A.2)}

To get the length, ds, we take the square root of eq. (II.A.2).

Notice that, in eq. (II.A.2), we’ve defined the entity g_{ij} = e_i \cdot e_j

where, if we’re working in 2 dimensions, g_{ij} could be expressed as a matrix:

g_{ij} = \begin{bmatrix} \vec{e}_1 \cdot \vec{e}_1 & \vec{e}_1 \cdot \vec{e}_2 \\ \vec{e}_2 \cdot \vec{e}_1 & \vec{e}_2 \cdot \vec{e}_2 \end{bmatrix} = \begin{bmatrix} g_{11} & g_{12 }\\ g_{21} & g_{22} \end{bmatrix} \quad \text{eq. (II.A.3)}

As it applies to physics, we can imagine that a tiny coordinate axis exists at every point in spacetime and that a metric exists at each point where the metric gives the relative lengths of the basis vectors in these infinitesimal coordinate systems. In flat spacetime, the metric is the same everywhere. An example is the spacetime of special relativity, Minkowski space, where the metric is the same at every point and is given by:

    \[ \eta = \begin{bmatrix} -1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1  \end{bmatrix} \quad \text{or} \quad  \begin{bmatrix} 1&0&0&0 \\ 0&-1&0&0 \\ 0&0&-1&0 \\ 0&0&0&-1  \end{bmatrix} \quad \text{eq. (II.A.4)}   \]

In curved spacetime, this is not the case. The metric varies from point to point. This might cause one to wonder, Is calculation of the metric a way to check to see if spacetime is flat? (i.e., if the metric is the same everywhere, the space is flat; if not, it isn’t). Unfortunately, it’s not quite that simple: the metric components can vary from point to point simply because curvilinear coordinates are being used (think of polar coordinates on a flat plane), so a position-dependent metric does not, by itself, imply curvature. It turns out that we can always find a set of coordinates in which, at any chosen point, g_{\mu \nu} = \eta_{\mu \nu} and \displaystyle \frac{\partial g_{\mu \nu}}{\partial x^{\alpha}} = 0 (i.e., the metric is not changing to first order). However, if the space is curved, we can never find a coordinate system in which \displaystyle \frac{\partial^2 g_{\mu \nu}}{\partial x^{\alpha} \partial x^{\beta}} = 0. A proof of this can be found here:

Thus, we can’t use the metric alone to determine whether or not a space is flat. As we’ll see later, to do this, we’ll have to come up with a mathematical object that incorporates the second derivatives of the metric.
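If it helps to see this with a concrete computation, the short sketch below builds g_{ij} = e_i \cdot e_j for the flat plane in polar coordinates (an example of my own choosing), starting from the embedding in Cartesian coordinates:

```python
import sympy as sp

# Build g_ij = e_i . e_j (eq. II.A.3) for the flat plane in polar coordinates,
# starting from the embedding x = r cos(phi), y = r sin(phi). This example is
# an illustration of mine, not something taken from the text above.
r, phi = sp.symbols('r phi', positive=True)
X = sp.Matrix([r * sp.cos(phi), r * sp.sin(phi)])   # Cartesian position vector

e_r = X.diff(r)      # basis vector along the r coordinate
e_phi = X.diff(phi)  # basis vector along the phi coordinate

g = sp.simplify(sp.Matrix([[e_r.dot(e_r),   e_r.dot(e_phi)],
                           [e_phi.dot(e_r), e_phi.dot(e_phi)]]))
print(g)   # Matrix([[1, 0], [0, r**2]])  ->  ds^2 = dr^2 + r^2 dphi^2
```

Even though g_{\phi\phi} = r^2 changes from point to point, the space it describes is perfectly flat, which is exactly why the metric by itself can’t serve as a test for curvature.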

There are a few properties of the metric that we should briefly mention here:

  • The metric tensor is a symmetric tensor.
  • The covariant derivative of the metric tensor is zero: \nabla_{\alpha}\, g_{\mu \nu} = 0.
  • The metric tensor can be used to raise and lower indices on tensors. For example: D^i_{jk}=g_{js} D^{is}_k. (A short sketch of index lowering follows this list.)
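As a small illustration of the index lowering mentioned in the last bullet, here’s a sketch using the Minkowski metric of eq. (II.A.4); the 4-vector and its component names are placeholders of my own:

```python
import sympy as sp

# Index lowering with the metric: V_mu = g_{mu nu} V^nu.
# The (-,+,+,+) Minkowski metric of eq. (II.A.4) is used; the component names
# of the 4-vector are placeholders chosen for this example.
E, px, py, pz = sp.symbols('E p_x p_y p_z')
eta = sp.diag(-1, 1, 1, 1)
V_upper = sp.Matrix([E, px, py, pz])

V_lower = eta * V_upper      # contract g_{mu nu} with V^nu
print(list(V_lower))         # [-E, p_x, p_y, p_z]
```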

II.B Covariant Derivative

The first topic we need to touch upon on our way to understanding curvature is the covariant derivative. Briefly, if we are working in Cartesian coordinates, we can take derivatives and partial derivatives in the manner that’s taught in undergraduate calculus classes, i.e., by taking the derivatives of vector components. This works because, in Cartesian coordinates, basis vectors are the same everywhere. However, in non-Cartesian coordinates, even in flat space (e.g., polar coordinates or non-orthogonal axes), the basis vectors are not constant. The problem is even more severe in curved spacetime where basis vectors necessarily vary from point to point. We want our derivative operation to return a tensor, because the laws of physics should take the same form in all frames of reference, and the only way that this can happen is if we use tensor equations to describe them. And the only way to return a tensor from our derivative operation is to take the changing basis vectors into account.

Let’s look at a vector

\vec{V} = V^1e_1  + V^2e_2 + V^3e_3 = V^ie_i \quad \text{eq (II.B.1)}

where

V^i are the components of the vector
e_i are the basis vectors

If we want to take the derivative of this vector \vec{V}, since both the components and basis vectors can vary, we need to use the product rule:

\displaystyle \frac{\partial \vec{V}}{\partial x^j}=\displaystyle \frac{\partial (V^i\vec{e}_i)}{\partial x^j}

    =\displaystyle \frac{\partial V^i}{\partial x^j}\vec{e}_i+V^i\displaystyle \frac{\partial \vec{e}_i}{\partial x^j} \quad \text{eq (II.B.2)}

The result, referred to as the covariant derivative, is a tensor.

II.C Christoffel Symbols

The righthand-most term in eq. II.B.2 is frequently expressed in the following way:

\displaystyle \frac{\partial \vec{e}_i}{\partial x^j} = \Gamma^k_{ij}\vec{e}_k \quad \text{eq (II.C.1)}

And so, eq. II.B.2 becomes:

\displaystyle \frac{\partial \vec{V}}{\partial x^j}=\frac{\partial V^i}{\partial x^j}e_i+V^i\Gamma^k_{ij}\,e_k \quad \text{eq (II.C.2)}

Changing some indices and rearranging, we get:

\displaystyle \frac{\partial \vec{V}}{\partial x^j}=\left(\frac{\partial V^i}{\partial x^j}+V^k\Gamma^i_{kj}\right)e_i\quad \text{eq (II.C.3)}

The term in parentheses is called the covariant derivative:

\displaystyle \nabla_j V^i = \frac{\partial V^i}{\partial x^j}+V^k\Gamma^i_{kj}\quad \text{eq (II.C.4)}

It represents the i component of the derivative of vector \vec{V} and – as opposed to the Christoffel symbols alone – it is a tensor.

The meaning of the Christoffel symbol is shown in figure II.C.1.

Christoffel symbol explanation
Figure II.C.1

There are 2 kinds of Christoffel symbols, depending on how they’re expressed in terms of the metric. Christoffel symbols of the first kind are written as:

\Gamma_{lij}=\displaystyle \frac12 \left[\displaystyle \frac{\partial g_{li}}{\partial x^j} +  \frac{\partial g_{lj}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^l}\right] \quad \text{eq (II.C.5)}

Christoffel symbols of the second kind are written as:

\Gamma^l_{ij}=\displaystyle \frac12 g^{kl}\left[\frac{\partial g_{ik}}{\partial x^j} +  \frac{\partial g_{jk}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^k}\right] \quad \text{eq (II.C.6)}

Derivations of eq. (II.C.6) can be found here:

It’s the Christoffel symbols of the second kind, the ones most often used in general relativity, that we’ll be using in this article.

It should be noted that the Christoffel symbol is a kind of connection coefficient. There are many ways we can define connection coefficients. The Christoffel symbols of the second kind, as defined above, are the coefficients of what is called the Levi-Civita connection.
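To see eq. (II.C.6) in action, here’s a short sympy sketch that computes the Christoffel symbols of the second kind for the flat plane in polar coordinates; the choice of metric is mine, made purely for illustration:

```python
import sympy as sp

# Christoffel symbols of the second kind, eq. (II.C.6), for the flat plane in
# polar coordinates, ds^2 = dr^2 + r^2 dphi^2 (an illustrative example metric).
r, phi = sp.symbols('r phi', positive=True)
coords = (r, phi)
g = sp.Matrix([[1, 0], [0, r**2]])
g_inv = g.inv()

def gamma(l, i, j):
    """Gamma^l_{ij} = (1/2) g^{kl} (d_j g_{ik} + d_i g_{jk} - d_k g_{ij})"""
    return sp.simplify(sp.Rational(1, 2) * sum(
        g_inv[k, l] * (sp.diff(g[i, k], coords[j])
                       + sp.diff(g[j, k], coords[i])
                       - sp.diff(g[i, j], coords[k]))
        for k in range(2)))

for l in range(2):
    for i in range(2):
        for j in range(2):
            G = gamma(l, i, j)
            if G != 0:
                print(f"Gamma^{coords[l]}_({coords[i]},{coords[j]}) = {G}")
# Output: Gamma^r_(phi,phi) = -r, Gamma^phi_(r,phi) = Gamma^phi_(phi,r) = 1/r
```

The nonzero symbols come entirely from the rotating basis vectors of polar coordinates; the plane itself is still flat.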

II.D Parallel Transport

We would expect that, to take the covariant derivative in a curved space, we could simply take the difference between 2 vectors, \eval{V}_P and \eval{V}_Q, separated by a short distance, \Delta \lambda, then take the limit as \Delta \lambda goes to zero:

\displaystyle \frac{DV}{d\lambda} = \lim_{\Delta \lambda \rightarrow 0} \frac{\eval{V}_Q - \eval{V}_P}{\Delta \lambda} \quad \text{eq (II.D.1)}

However, we can’t do this because \eval{V}_Q - \eval{V}_P is undefined. This is due to the fact that, in order to compare 2 vectors, they have to be in the same tangent plane. But the tangent spaces at points P and Q in a curved space are not the same, as shown in figure II.D.1.

Tangent spaces in flat and curved spaces
Figure II.D.1

We can show this algebraically as well. We know we can express vectors as a sum of components and basis vectors:

\vec{V} = V^u e_u

In flat space, we can write the quantity \eval{V}_Q - \eval{V}_P as:

\eval{V}_Q - \eval{V}_P = \eval{V^u}_Q \eval{e_u}_Q - \eval{V^u}_P\eval{e_u}_P \quad \text{eq (II.D.2)}

Because, in flat space, the tangent space and basis vectors are the same everywhere, we have \eval{e_u}_P = \eval{e_u}_Q = e_u. Therefore:

\eval{V}_Q - \eval{V}_P = (\eval{V^u}_Q - \eval{V^u}_P)\,e_u \quad \text{eq (II.D.3)}

and we can subtract components in the numerator of our expression for the derivative in eq. (II.D.1).

However, if coordinates are curved, basis vectors (and in the case of curved space, tangent planes) aren’t the same at the points P and Q. Thus, we can’t just subtract components in the numerator of the expression for the derivative.

In order to compare 2 vectors in curved space, we need to transport one vector to the tangent space of the other vector, then compare them. How do we transport them? By a process called parallel transport.

What does it mean to parallel transport a vector? In words, it means “as we progress along a path (e.g., worldline or coordinate line), keep the vector parallel to itself.” This translates, in mathematical terms, to “the covariant derivative of the vector, in the direction of the tangent vector, is zero” which means that, when we move along the tangent vector (i.e., move along a worldline or other parameterized curve because the tangent vector is everywhere parallel to the curve), the vector does not change. The equation that expresses this idea is:

U^{\beta}\displaystyle \nabla_{\beta} V^{\mu} = \frac{D V^{\mu}}{d \lambda} = \frac{d V^{\mu}}{d \lambda} + \Gamma^{\mu}_{\alpha \beta}V^{\alpha} \frac{d x ^{\beta}}{d \lambda}= 0 \quad \text{eq (II.D.4)}

where

\displaystyle U^{\beta} = \frac{dx^{\beta}}{d\lambda}, \qquad U^{\beta}\nabla _{\beta} = \frac{D}{d\lambda} \quad \text{eq (II.D.5)}

In pictorial form:

Figure II.D.2

In figure II.D.2a, we see parallel transport in flat space. The vector just points in the same direction at every point along a closed loop and it ends up pointing in the same direction that it started at the apex of the loop.

In figure II.D.2b, we see parallel transport in a curved space, the surface of a sphere. The way we can visualize this is to imagine that the vector we’re parallel transporting is a spear being held by a person walking along the curve. He starts at the top of the closed curve and walks along the curve with the spear pointing along the direction he’s walking. When he reaches the equator, the curve he is traversing courses to the right. In an attempt to keep the spear pointing in the same direction, the person keeps the spear pointing down (the same direction it was pointing when it reached the equator). To do this, he walks sideways and rightward. At the point where the curve courses upward from the equator, again, keeping the spear pointing down, the person carrying the spear keeps it pointing down but walks backward. As he walks backward, the spear remains in the tangent plane but the tangent plane gets “tilted backward” as he walks. When he reaches the starting point, at the apex of the curve, the spear (i.e., the orange vector) remains in the tangent plane (as it does all along the curve) except that it points in a direction different from that in which it pointed at the start of its journey (green vector). Thus, despite “trying to keep the spear (vector) parallel” as it was transported around the closed loop, when it reaches the origin of the trip, the spear (vector) points in a different direction than that in which it started because the direction of the basis vectors changed as the curve was traversed.

I go through this little exercise here because parallel transport will play a central role in understanding geodesics and the Riemann tensor, the ultimate tool we’ll use to determine spacetime curvature.
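To make the idea concrete, here’s a small numerical sketch that integrates eq. (II.D.4) around a closed loop on a sphere. The unit sphere, the colatitude of the loop (70°), the starting vector and the step count are all assumptions of my own, chosen for illustration:

```python
import math

# Parallel transport of a tangent vector around the circle of constant
# colatitude theta0 on a unit sphere, by integrating eq. (II.D.4),
#   dV^mu/dlambda + Gamma^mu_{alpha beta} V^alpha dx^beta/dlambda = 0,
# with the sphere's Christoffel symbols
#   Gamma^theta_{phi phi} = -sin(theta) cos(theta),
#   Gamma^phi_{theta phi} = Gamma^phi_{phi theta} = cos(theta)/sin(theta).
# The sphere, colatitude and step count are illustrative choices.

theta0 = math.radians(70.0)   # colatitude of the loop
steps = 200_000
dphi = 2.0 * math.pi / steps

V_theta, V_phi = 1.0, 0.0     # start with a unit vector pointing toward the south pole

for _ in range(steps):
    dV_theta = math.sin(theta0) * math.cos(theta0) * V_phi * dphi
    dV_phi = -(math.cos(theta0) / math.sin(theta0)) * V_theta * dphi
    V_theta += dV_theta
    V_phi += dV_phi

# Read off the rotation angle in an orthonormal basis (V_theta, sin(theta0)*V_phi).
angle = math.atan2(math.sin(theta0) * V_phi, V_theta)
print(f"rotation after one loop : {abs(angle):.4f} rad")
print(f"2*pi*cos(theta0)        : {2.0 * math.pi * math.cos(theta0):.4f} rad")
```

Even though the vector is kept “as parallel as possible” at every step, it comes back rotated by about 2π cos θ₀, which is the numerical counterpart of the spear-carrier story above.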

II.E Lie Derivative

Lie transport is an alternative way to look at transporting vectors and gives rise to the Lie derivative, Lie bracket, Killing vector fields and spacetime symmetries. While these topics are useful for advanced applications in general relativity, they aren’t immediately important for what I’m covering in this article. However, in my first go-round in learning general relativity, I admit I was befuddled by Lie derivatives and their associated topics so I’ve decided to discuss them here, largely to enhance my own understanding. Because they are a bit of an aside, I’ll provide links which can be clicked to see the information on these subjects, if the reader so desires.

II.E.1 Derivation

In the covariant derivative, we use parallel transport to transport one vector to another so they can be compared. The mathematical object that facilitates this transport is the Christoffel symbol which is a function of the metric. But there are other ways to transport vectors so they can be compared. One is Lie transport. A vector undergoing Lie transport follows the “flow” of integral curves created by a vector field. The comparison it allows is called the Lie derivative. Figure II.E.1 depicts Lie transport along integral curves and defines the quantities we will need to find the Lie derivative.

Lie transport along integral curves
Figure II.E.1

As shown in figure II.E.1a, we start with a vector field with tangent vectors (shown in green) that can be connected to constitute integral curves (also called flow lines). Figure II.E.1b shows a small section of the vector field noted in figure II.E.1a with tangent vectors (green), t, along 2 integral curves (blue) parameterized by a variable \lambda with coordinates X^{\mu}(\lambda).

We define a displacement vector, oriented along an integral curve parameterized by the variable \gamma, pointing in a direction defined by coordinates X^{\mu}(\gamma), with tangent vector V_P. The displacement vector extends from point P to point R, point R lying along one of the t vector field’s flow lines. Its length is \Delta \gamma V_P (which is what we’ll call this vector).

We define another displacement vector along another flow line of vector field V with tangent vector V_Q. This displacement vector starts at point Q and has length \Delta \gamma V_Q so we’ll call it \Delta \gamma V_Q.

We define a third displacement vector from point P to point Q along the tangent vector between the 2 points, t_P. Its length is \Delta \lambda and we’ll call it \Delta \lambda t_P.

Finally, we construct a fourth displacement vector along the upper t flow line, beginning at point R and extending a distance \Delta \lambda along the tangent vector, t_R, on that flow line. We’ll call this displacement vector \Delta \lambda t_R.

We now transport vector \Delta \gamma V_P to point Q, dragging its base along \Delta \lambda t_P to point Q, its tip ending up at the tip of \Delta \lambda t_R. This is called Lie transport. We’ll call this transported vector \Delta \gamma V_{P \rightarrow Q}.

Now we subtract V_{P \rightarrow Q} from V_Q and, after dividing by \Delta \lambda and taking a limit, we get the Lie derivative.

Let’s make this a bit more mathematically precise.

First, t, the tangent vectors along the X^{\mu}(\lambda) and X^{\prime\,\,\mu}(\lambda) flow lines can be expressed as a differential operator:

t = \displaystyle \frac{\partial}{\partial \lambda} \quad \text{(1)}

The components of t are:

t^{\mu} = \displaystyle \frac{\partial X^{\mu}}{\partial \lambda} \quad \text{(2)}

That means that:

\Delta \lambda t^{\mu} = \Delta x^{\mu} = X^{\mu}_Q - X^{\mu}_P \quad \text{(3)}

Now we define the Lie derivative as:

\displaystyle \frac{"dV"}{d \lambda} = \displaystyle \lim_{\Delta \lambda \rightarrow 0} \frac{V_Q -V_{P \rightarrow Q}}{\Delta \lambda} \quad \text{(4)}

where, for the time being, we’ll use \displaystyle \frac{"dV"}{d \lambda} to refer to the Lie derivative.

Next, we multiply both sides of eq. (4) by \Delta \gamma. We get:

\displaystyle \Delta \gamma \frac{"dV"}{d \lambda} = \displaystyle \lim_{\Delta \lambda \rightarrow 0} \frac{\Delta \gamma V_Q -\Delta \gamma V_{P \rightarrow Q}}{\Delta \lambda} \quad \text{(5)}

We know that the components of a displacement vector are just the difference between its coordinates at its tip and its tail. So:

\Delta \gamma V^{\mu}_{P \rightarrow Q} = (\underbrace{\cancel{X^{\mu}_P} + \Delta \gamma V_P^{\mu} + \Delta \lambda t_R^{\mu}}_{\text{Tip coordinates}}) - (\underbrace{\cancel{X^{\mu}_P} + \Delta \lambda t_P^{\mu}}_{\text{Tail coordinates}})

                   =\Delta \gamma V_P^{\mu} + \Delta \lambda t_R^{\mu} - \Delta \lambda t_P^{\mu} \quad \text{(6)}

Let’s look at the \mu^{th} component of eq. (5):

\displaystyle \Delta \gamma \left( \frac{"dV"}{d \lambda} \right)^{\mu} = \lim_{\Delta \lambda \rightarrow 0} \frac{\Delta \gamma V_Q^{\mu} - (\Delta \gamma V_P^{\mu} + \Delta \lambda t_R^{\mu} - \Delta \lambda t_P^{\mu})}{\Delta \lambda} \quad \text{(7)}

Let’s examine the terms V_Q^{\mu} and t_R^{\mu}:

V_Q^{\mu} = V^{\mu}(X^{\alpha}_Q) = V^{\mu}(X^{\alpha}_P + \Delta \lambda t^{\alpha}_P) \quad \text{expand in Taylor series}
                            =V^{\mu}_P + \Delta \lambda t^{\alpha}_P \partial_{\alpha} V^{\mu}_P \quad \text{(8)}

and

t_R^{\mu} = t^{\mu}(X^{\alpha}_R) = t^{\mu}(X^{\alpha}_P + \Delta \gamma V^{\alpha}_P) \quad \text{expand in Taylor series}
                         =t^{\mu}_P + \Delta \gamma V^{\alpha}_P \partial_{\alpha} t^{\mu}_P \quad \text{(9)}

We put the results of eq. (8) and eq. (9) back into eq. (7) and obtain:

\displaystyle \Delta \gamma \left( \frac{"dV"}{d \lambda} \right)^{\mu} = \lim_{\Delta \lambda \rightarrow 0} \frac{1}{\Delta \lambda} \Bigl[ \Delta \gamma V_Q^{\mu} - (\Delta \gamma V_P^{\mu} + \Delta \lambda t_R^{\mu} - \Delta \lambda t_P^{\mu}) \Bigr]

                   = \lim_{\Delta \lambda \rightarrow 0} \frac{1}{\Delta \lambda} \Bigl[ \Delta \gamma (V^{\mu}_P + \Delta \lambda t^{\alpha}_P \partial_{\alpha} V^{\mu}_P) -  \Delta \gamma V^{\mu}_P
                                            - \Delta \lambda(t^{\mu}_P + \Delta \gamma V^{\alpha}_P \partial_{\alpha} t^{\mu}_P) + \Delta \lambda t_P^{\mu} \Bigr]

                    = \lim_{\Delta \lambda \rightarrow 0} \frac{1}{\Delta \lambda} \Bigl[ \cancel{\Delta \gamma V^{\mu}_P} + \Delta \gamma \Delta \lambda t^{\alpha}_P \partial_{\alpha} V^{\mu}_P - \cancel{\Delta \gamma V^{\mu}_P} - \bcancel{\Delta \lambda t^{\mu}_P} - \Delta \lambda \Delta \gamma V^{\alpha}_P \partial_{\alpha} t^{\mu}_P + \bcancel{\Delta \lambda t_P^{\mu}} \Bigr]

We can cancel the \Delta \gamma‘s and \Delta \lambda‘s, leaving us with:

\displaystyle \cancel{\Delta \gamma} \left( \frac{"dV"}{d \lambda} \right)^{\mu} = \lim_{\Delta \lambda \rightarrow 0} \frac{1}{\bcancel{\Delta \lambda}} \Bigl[ \cancel{\Delta \gamma} \bcancel{\Delta \lambda} t^{\alpha}_P \partial_{\alpha} V^{\mu}_P - \bcancel{\Delta \lambda} \cancel{\Delta \gamma} V^{\alpha}_P \partial_{\alpha} t^{\mu}_P \Bigr]

and ultimately:

\displaystyle  \left( \frac{"dV"}{d \lambda} \right)^{\mu} =t^{\alpha} \partial_{\alpha} V^{\mu} - V^{\alpha} \partial_{\alpha} t^{\mu} \quad \text{(10)}

In arriving at eq. (10), we drop the P subscripts since we’ve taken the limit as \Delta \lambda goes to zero meaning the points P and Q essentially coincide.

The Lie derivative is usually denoted by \mathcal{L}. So we have:

\mathcal{L}_t V^{\mu} = t^{\alpha} \partial_{\alpha}  V^{\mu} - V^{\alpha} \partial_{\alpha} t^{\mu}  \quad \text{(11)}

which denotes the \mu^{th} component of the Lie derivative of V with respect to t.
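Eq. (11) is easy to check symbolically for concrete fields. In the sketch below, the two 2-dimensional vector fields (a rotation generator and a constant field) are illustrative choices of my own, not anything taken from the figures above:

```python
import sympy as sp

# Symbolic evaluation of eq. (11) in 2 dimensions for two example fields.
x, y = sp.symbols('x y')
coords = (x, y)

t = (-y, x)   # generates rotations about the origin
V = (1, 0)    # a constant field pointing in the x direction

lie_tV = [sp.simplify(sum(t[a] * sp.diff(V[m], coords[a])
                          - V[a] * sp.diff(t[m], coords[a])
                          for a in range(2)))
          for m in range(2)]
print(lie_tV)   # [0, -1]
```

The nonzero result says the constant field V is not invariant as it’s dragged along the rotational flow of t, which is exactly the kind of change the Lie derivative is built to detect.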

II.E.2 Other Lie Derivatives

II.E.2.a Lie Derivative of a Scalar

\mathcal{L}_t f = t^{\alpha} \partial_{\alpha} f \quad \text{(12)}

II.E.2.b Lie Derivative of a (0,1) Tensor

We know that when a (0,1) tensor (i.e., a covector) acts on a vector, we get a scalar, so we have:

W(V) = W_{\mu} V^{\mu}

We know how the Lie derivative acts on a scalar, so we get:

\mathcal{L}_t W_{\mu} V^{\mu} = t^{\alpha} \partial_{\alpha}(W_{\mu} V^{\mu}) \quad \text{(13)}

Applying the product rule to the Lie derivative gives us:

\mathcal{L}_t W_{\mu} V^{\mu} = (\mathcal{L}_t W_{\mu})V^{\mu} + W_{\mu} (\mathcal{L}_t  V^{\mu}) \quad \text{(14)}

We know how to take the Lie derivative of a vector i.e., a (1,0) tensor. Thus, eq. (14) becomes:

\mathcal{L}_t W_{\mu} V^{\mu} = (\mathcal{L}_t W_{\mu})V^{\mu} + W_{\mu} (t^{\alpha} \partial_{\alpha}  V^{\mu} - V^{\alpha} \partial_{\alpha} t^{\mu}) \quad \text{(15)}

We expand the righthand side of eq. (13) using the product rule. This yields:

t^{\alpha} \partial_{\alpha}(W_{\mu} V^{\mu}) = (t^{\alpha} \partial_{\alpha} W_{\mu})  V^{\mu} + W_{\mu} (t^{\alpha} \partial_{\alpha} V^{\mu} )  \quad \text{(16)}

When we equate the righthand sides of eq. (15) and eq. (16), we obtain:

\mathcal{L}_t  W_{\mu} V^{\mu} + \cancel{W_{\mu}  t^{\alpha} \partial_{\alpha}  V^{\mu}} - W_{\mu} V^{\alpha} \partial_{\alpha} t^{\mu} = t^{\alpha} \partial_{\alpha} W_{\mu}  V^{\mu} + \cancel{W_{\mu} t^{\alpha} \partial_{\alpha} V^{\mu}}  \quad \text{(17)}

\mathcal{L}_t  W_{\mu} V^{\mu}  = t^{\alpha} \partial_{\alpha} W_{\mu} V^{\mu} + W_{\mu} V^{\alpha} \partial_{\alpha} t^{\mu} \quad \text{(18)}

We want to cancel the term V^{\mu} on both sides. To do this, we rename dummy indices in the righthand term on the righthand side of eq. (18):

\alpha \rightarrow \mu
\mu \rightarrow \alpha

Doing this gives us:

\mathcal{L}_t  W_{\mu} \cancel{V^{\mu}}  = t^{\alpha} \partial_{\alpha} W_{\mu} \cancel{V^{\mu}} + W_{\alpha} \cancel{V^{\mu}} \partial_{\mu} t^{\alpha}

After making these cancellations, we are left with:

\mathcal{L}_t  W_{\mu}  = t^{\alpha} \partial_{\alpha} W_{\mu} + W_{\alpha} \partial_{\mu} t^{\alpha} \quad \text{(19)}

II.E.2.c Lie Derivative of (n,m) Tensor

We can generalize the expression of the Lie derivative to tensors of any rank as follows:

Let \displaystyle T^{\mu_1 \dots \mu_n}_{\nu_1 \dots \nu_m} be a tensor of rank \displaystyle \begin{pmatrix} n \\ m \end{pmatrix}. Then the Lie derivative is:

    \begin{align*} \mathcal{L}_t\,  T^{\mu_1 \dots \mu_n}_{\nu_1 \dots \nu_m} &= t^{\alpha} \partial_{\alpha} T^{\mu_1 \dots \mu_n}_{\nu_1 \dots \nu_m} \\ \\ &\,\,- T^{\alpha\, \mu_2 \dots  \mu_n}_{\nu_1 \dots \nu_m} \partial_{\alpha} t^{\mu_1}  - \ldots\,\text{(1 term for each upper index)} \\ \\ &\,\, + T^{\mu_1 \dots \mu_n}_{\alpha\, \nu_2 \dots \nu_m}  \partial_{\nu_1} t^{\alpha} + \ldots \, \text{(1 term for each lower index)} \quad \text{(20)} \end{align*}

II.E.3 Spacetime Symmetries

Spacetime symmetries
Figure II.E.2

In figure II.E.2, we have integral curves, x^{\mu} (shown in blue), of a vector field k. What we mean by spacetime symmetry is that all of the points along the flow lines of the vector field remain separated by the same spacetime distance. That means that, if there is spacetime symmetry, then the spacetime separation between points separated by dx^{\mu} equals the separation after both points are carried a parameter distance \Delta \lambda along the flow, d(x^{\mu} + \Delta \lambda k^{\mu}). That is:

g_{\mu \nu}(x^{\alpha})dx^{\mu}dx^{\nu} = g_{\mu \nu}(x^{\alpha} + \Delta \lambda k^{\alpha}) d(x^{\mu} + \Delta \lambda k^{\mu}) d(x^{\nu} + \Delta \lambda k^{\nu}) \quad \text{(1)}

We expand each of the terms on the righthand side of eq. (1) in a Taylor series to first order:

g_{\mu \nu}(x^{\alpha})dx^{\mu}dx^{\nu}=\Bigl( g_{\mu \nu}(x^{\alpha}) + \Delta \lambda k^{\alpha} \partial_{\alpha} g_{\mu \nu} \Bigr) \Bigl( dx^{\mu} + \Delta \lambda \partial_{\alpha} k^{\mu} dx^{\alpha} \Bigr)
                                        \Bigl(  dx^{\nu} + \Delta \lambda \partial_{\beta} k^{\nu} dx^{\beta}  \Bigr)

=\Bigl( g_{\mu \nu}(x^{\alpha}) + \Delta \lambda k^{\alpha} \partial_{\alpha} g_{\mu \nu} \Bigr)\Bigl( dx^{\mu} dx^{\nu} + dx^{\mu} \Delta \lambda \partial_{\alpha} k^{\nu} dx^{\alpha} + \Delta \lambda \partial_{\alpha} k^{\mu} dx^{\alpha} dx^{\nu}
    + \cancel{\Delta \lambda \partial_{\alpha} k^{\mu} dx^{\alpha}\, \Delta \lambda \partial_{\beta} k^{\nu} dx^{\beta}} \Bigr) \quad \text{(2)}

All of the (\Delta \lambda) terms raised to powers of \geq 2 are negligible and can be ignored. Thus, we have:

\cancel{g_{\mu \nu}(x^{\alpha})dx^{\mu}dx^{\nu}} = \cancel{g_{\mu \nu}(x^{\alpha})dx^{\mu}dx^{\nu}} + g_{\mu \nu}(x^{\alpha})dx^{\mu} \Delta \lambda \partial_{\alpha} k^{\nu} dx^{\alpha}
                              + g_{\mu \nu}(x^{\alpha})\Delta \lambda \partial_{\alpha} k^{\mu} dx^{\alpha} dx^{\nu} + \Delta \lambda k^{\alpha} \partial_{\alpha} g_{\mu \nu} dx^{\mu} dx^{\nu}

0 = g_{\mu \nu}(x^{\alpha})dx^{\mu} \Delta \lambda \partial_{\alpha} k^{\nu} dx^{\alpha} + g_{\mu \nu}(x^{\alpha})\Delta \lambda \partial_{\alpha} k^{\mu} dx^{\alpha} dx^{\nu} + \Delta \lambda k^{\alpha} \partial_{\alpha} g_{\mu \nu} dx^{\mu} dx^{\nu} \quad \text{(3)}

We can divide through by \Delta \lambda to obtain:

0 = g_{\mu \nu}(x^{\alpha})\partial_{\alpha} k^{\nu} dx^{\mu} dx^{\alpha} + g_{\mu \nu}(x^{\alpha}) \partial_{\alpha} k^{\mu} dx^{\alpha} dx^{\nu}
            + k^{\alpha} \partial_{\alpha} g_{\mu \nu} dx^{\mu} dx^{\nu}  \quad \text{(4)}

Next, we rename some dummy indices. Specifically, in the first term on the righthand side, we change \nu \leftrightarrow \alpha; in the second term on the righthand side, we change \mu \leftrightarrow \alpha. That gives us:

0 = g_{\mu \alpha } \partial_{\nu} k^{\alpha} dx^{\mu} dx^{\nu} + g_{\alpha \nu} \partial_{\mu} k^{\alpha} dx^{\mu} dx^{\nu}
            + k^{\alpha} \partial_{\alpha} g_{\mu \nu} dx^{\mu} dx^{\nu} \quad \text{(5)}

Since this must hold for arbitrary displacements, we can divide through by dx^{\mu} dx^{\nu}. When we do this and rearrange, we are left with:

0 = k^{\alpha} \partial_{\alpha} g_{\mu \nu} + g_{\alpha \nu} \partial_{\mu} k^{\alpha} + g_{\mu \alpha } \partial_{\nu} k^{\alpha} \quad \text{(6)}

But we recognize eq. (6) as the Lie derivative of the metric along the vector field k:

0 = \mathcal{L}_k\,g_{\mu \nu} \quad \text{(7)}

What this means is that spacetime is symmetric along the integral curves of a vector field k if \mathcal{L}_k\,g_{\mu \nu} = 0. And the vector field k that defines this symmetry is called a Killing vector field.

Now let’s examine an important consequence of these findings. Consider what happens if one of the coordinate basis vectors is a Killing vector field. Suppose, for example, that the x^0 coordinate basis vectors are a Killing vector field. The basis vector along the flow lines that make up the x^0 field is given by \displaystyle \frac{\partial}{\partial x^0}. The fact that the x^0 field is a Killing vector field means that \mathcal{L}_k\,g_{\mu \nu} = 0. Expanding this out, we get:

k^{\alpha} \partial_{\alpha} g_{\mu \nu} + g_{\alpha \nu} \partial_{\mu} k^{\alpha} + g_{\mu \alpha} \partial_{\nu} k^{\alpha} = 0 \quad \text{(8)}

The Killing vector field consists of just the x^0 flow lines so we can say:

k^{\alpha} = (1,0,0,0) \quad \text{(9)}

Because k^{\alpha} is constant, the terms in eq. (8) containing \partial k become zero. The first term in eq. (8) becomes 1 \cdot \partial_0 g_{\mu \nu}, so eq. (8) reduces to \partial_0 g_{\mu \nu} = 0.

This tells us that, if the metric components are independent of one of the coordinates (i.e., their derivatives with respect to that coordinate are zero, meaning they don’t change along it), then 1) the coordinate basis vector field associated with that coordinate is a Killing vector field and 2) the spacetime is symmetric under translations along those coordinate lines. Such symmetry under translations along a coordinate line can indicate the presence of conservation laws. For example, if the basis vector field of the time coordinate is a Killing vector field, then there is time translation symmetry, which is associated with conservation of energy.

Note, however, that if g_{\mu \nu} components are not independent of any coordinates, this does not imply that there are no symmetries. There may well be a symmetry, just not along a coordinate line.
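Before moving on, here’s a short symbolic check of the Killing condition of eq. (8). The formula takes the same form in any number of dimensions, so for simplicity I use the flat plane in polar coordinates; the metric and the two candidate vector fields are choices of my own, made purely for illustration:

```python
import sympy as sp

# Check of eq. (8) for the flat plane in polar coordinates (example metric):
# is d/dphi a Killing vector field? Is d/dr?
r, phi = sp.symbols('r phi', positive=True)
coords = (r, phi)
g = sp.Matrix([[1, 0], [0, r**2]])

def lie_of_metric(k):
    """(L_k g)_{mu nu} = k^a d_a g_{mu nu} + g_{a nu} d_mu k^a + g_{mu a} d_nu k^a"""
    L = sp.zeros(2, 2)
    for mu in range(2):
        for nu in range(2):
            L[mu, nu] = sum(k[a] * sp.diff(g[mu, nu], coords[a])
                            + g[a, nu] * sp.diff(k[a], coords[mu])
                            + g[mu, a] * sp.diff(k[a], coords[nu])
                            for a in range(2))
    return sp.simplify(L)

print(lie_of_metric((0, 1)))   # d/dphi: Matrix([[0, 0], [0, 0]])   -> Killing
print(lie_of_metric((1, 0)))   # d/dr:   Matrix([[0, 0], [0, 2*r]]) -> not Killing
```

Because none of the metric components depend on φ, ∂/∂φ passes the test, while ∂/∂r fails because g_{\phi\phi} = r^2 changes along it, just as the argument above predicts.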

II.E.4 Lie Bracket

Another way to express the Lie derivative is the Lie bracket. This discussion is drawn, in part, from the YouTube video from eigenchris, Tensor Calculus 21: Lie Bracket, Flow, Torsion Tensor. Figure II.E.3 shows how the two are analogous. Figure II.E.3a, taken from figure II.E.1b, shows, diagrammatically, the Lie derivative. Figure II.E.3b, patterned after the eigenchris video, depicts the Lie bracket.

The Lie bracket can be defined as:

\vec{u}(\vec{v}) - \vec{v}(\vec{u}) = [\vec{u},\vec{v}] \quad \text{(1)}

where [\vec{u},\vec{v}] is the commutator of \vec{u} and \vec{v}.

We can expand out the first term on the lefthand side of eq. (1):

\vec{u}(\vec{v}) = u^j \vec{e}_j(v^i \vec{e}_i)
         =u^j \partial_j(v^i \partial_i)
         =u^j\Bigl[ \bigl( \partial_j v^i\bigr)\partial_i + v^i\bigl( \partial_j \partial_i  \bigr) \Bigr]
         = u^j \bigl( \partial_j v^i\bigr)\partial_i + u^jv^i\bigl( \partial_j \partial_i  \bigr) \quad \text{(2)}

Next, we expand out the second term on the lefthand side of eq. (1):

\vec{v}(\vec{u}) = v^i \vec{e}_i(u^j \vec{e}_j)
         =v^i \partial_i(u^j \partial_j)
         =v^i\Bigl[ \bigl( \partial_i u^j\bigr)\partial_j + u^j\bigl( \partial_i \partial_j  \bigr) \Bigr]
         = v^i \bigl( \partial_i u^j\bigr)\partial_j + u^jv^i\bigl( \partial_j \partial_i  \bigr)  \quad \text{(3)}

Now we subtract eq. (3) from eq. (2):

[\vec{u},\vec{v}] = \vec{u}(\vec{v}) - \vec{v}(\vec{u})
         =u^j \bigl( \partial_j v^i\bigr)\partial_i + \cancel{u^jv^i\bigl( \partial_j \partial_i  \bigr)}
           - v^i \bigl( \partial_i u^j\bigr)\partial_j - \cancel{v^i u^j\bigl( \partial_i \partial_j  \bigr)}
         =u^j \bigl( \partial_j v^i\bigr)\partial_i - v^i \bigl( \partial_i u^j\bigr)\partial_j
         =u^j \bigl( \partial_j v^i\bigr)\vec{e}_i - v^i \bigl( \partial_i u^j\bigr)\vec{e}_j
         =u^i \bigl( \partial_i v^j\bigr)\vec{e}_j - v^i \bigl( \partial_i u^j\bigr)\vec{e}_j
         = \Bigl[u^i \bigl( \partial_i v^j\bigr) - v^i \bigl( \partial_i u^j\bigr)\Bigr]\vec{e}_j \quad \text{(4)}

We can compare eq. (4) with the expression for the Lie derivative given in eq. (11):

\mathcal{L}_t V^{\mu} = t^{\alpha} \partial_{\alpha}  V^{\mu} - V^{\alpha} \partial_{\alpha} t^{\mu}

[\vec{u},\vec{v}] = u^i \bigl( \partial_i v^j\bigr) - v^i \bigl( \partial_i u^j\bigr)

The similarities are obvious.
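For completeness, here’s a quick symbolic check of eq. (4); the two 2-dimensional vector fields are, again, arbitrary picks of mine:

```python
import sympy as sp

# Symbolic evaluation of eq. (4) for two simple example fields.
x, y = sp.symbols('x y')
coords = (x, y)

def bracket(u, v):
    """[u, v]^j = u^i d_i v^j - v^i d_i u^j"""
    return [sp.simplify(sum(u[i] * sp.diff(v[j], coords[i])
                            - v[i] * sp.diff(u[j], coords[i])
                            for i in range(2)))
            for j in range(2)]

u = (x, 0)
v = (y, 0)
print(bracket(u, v))   # [-y, 0]
print(bracket(v, u))   # [y, 0]   (antisymmetry: [v, u] = -[u, v])
```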

It’s certainly not a proof but the following is interesting to note nonetheless:

\displaystyle t^{\alpha} = \frac{\partial x^{\alpha}}{\partial \lambda}   and   \displaystyle \frac{\partial x^{\alpha}}{\partial \lambda} \frac{\partial}{\partial  x^{\alpha}} = \frac{\partial}{\partial \lambda}

This implies:

\displaystyle \mathcal{L}_t V^{\mu} = t^{\alpha} \partial_{\alpha}  V^{\mu} - V^{\alpha} \partial_{\alpha} t^{\mu}
           =\frac{\partial V^{\mu}}{\partial \lambda} - V^{\alpha} \partial_{\alpha} t^{\mu}
           = \left[\frac{\partial}{\partial \lambda}, V\right]^{\mu}

\left[\frac{\partial}{\partial \lambda}, V\right] is, of course, a commutator, just like the Lie bracket.

II.E.5 Torsion Tensor

Using what we learned about the Lie bracket, we can now use it to introduce the torsion tensor. The torsion tensor is defined as:

T(u,v) = \nabla_{\vec{u}} \vec{v} -\nabla_{\vec{v}} \vec{u} - \bigl[\vec{u}, \vec{v}\bigr]\quad \text{(1)}

where

\nabla_{\vec{u}} \vec{v} is the covariant derivative of \vec{v} along \vec{u} (it measures how \vec{v} changes relative to a parallel transported copy of itself as we move along \vec{u})

\nabla_{\vec{v}} \vec{u} is the covariant derivative of \vec{u} along \vec{v}

\displaystyle \bigl[\vec{u}, \vec{v}\bigr] is the Lie bracket of \vec{u} and \vec{v}

Recall that parallel transport involves moving a vector along a curve keeping it “as straight as possible.” The connection coefficient or Christoffel symbol helps define this motion. In Lie transport, a vector is “swept along” the flow line (or integral curve) of a vector field. Figure II.E.4 shows the difference and gives a geometric interpretation of the torsion tensor.

Torsion tensor: geometric interpretation
Figure II.E.4

Figure II.E.4a represents vectors that are part of vector fields v (shown in blue) and u (shown in red). The vector \displaystyle v^{||}_Q represents the vector v_P which has been parallel transported from point P to point Q. Similarly, the vector \displaystyle u^{||}_R represents the vector u_P which has been parallel transported from point P to point R. The difference between them is what the torsion tensor T(u,v) measures.

Also in figure II.E.4a, v_Q represents the vector v_P which has been Lie transported from P to Q, and u_R represents the vector u_P after it’s been Lie transported from P to R. The difference between these 2 vectors, shown in purple, is the Lie bracket \displaystyle \bigl[u,v\bigr].

In the last 2 paragraphs and moving forward in this section, I’ve dropped the arrow above the letter notation since the context makes it clear that we’re dealing with vectors.

Figure 1b is a visual depiction of eq. (1). We start with the magenta vector labeled \nabla_{\vec{v}} \vec{u}, which represents the difference between the parallel transported vector \displaystyle u^{||}_R and u_R, a quantity given by the covariant derivative (as suggested by the label). We subtract from that the difference between \displaystyle v^{||}_Q and v_Q which, again, is a covariant derivative, \nabla_{\vec{u}} \vec{v} (represented by the cyan vector). Next we subtract the Lie bracket (represented by the purple vector). The result is the torsion tensor T(u,v), shown in black. The torsion tensor is an operator that takes in 2 vectors and yields a vector representing the difference between the 2 vectors parallel transported along each other.

If we set the torsion tensor to zero, the separation between the parallel transported vectors is zero, which means we get a closed 4-sided shape, as shown in figure 2.

Figure 2

Mathematically:

\displaystyle T(u,v) = \nabla_{\vec{u}} \vec{v} -\nabla_{\vec{v}} \vec{u} - \bigl[\vec{u}, \vec{v}\bigr] = 0
    \displaystyle \Rightarrow \quad \nabla_{\vec{u}} \vec{v} -\nabla_{\vec{v}} \vec{u} = \bigl[\vec{u}, \vec{v}\bigr] \quad \text{(2)}

When the above condition holds for all vector fields, we say that the connection is “torsion free.” It’s important to note, before going forward, that when we talk about torsion, we’re talking about the nature of the connection coefficients (e.g., Christoffel symbols). It does not depend on the vector fields.

Now let’s find the components of the torsion tensor.

    \begin{align*}  T(u,v) &= \nabla_{\vec{u}} \vec{v} -\nabla_{\vec{v}} \vec{u} - \bigl[\vec{u}, \vec{v}\bigr] \\ \\ &= \nabla_{\vec{u}} \vec{v} -\nabla_{\vec{v}} \vec{u} -\Bigl( \vec{u}(\vec{v}) -  \vec{v}(\vec{u}) \Bigr) \\ \\ &=u^i \bigl( \partial _i v^k + v^j \Gamma^{k}_{ij} \bigr) \partial_k - v^i \bigl( \partial _i u^k + u^j \Gamma^{k}_{ij} \bigr) \partial_k \\ &\quad - \biggl[ \Bigl[ u^i \partial_i \bigl( v^k \partial_k \bigr) \Bigr] - \Bigl[ v^i \partial_i \bigl( u^k \partial_k \bigr) \Bigr]  \biggr] \\ \\ &= u^i \bigl( \partial _i v^k + v^j \Gamma^{k}_{ij} \bigr) \partial_k - v^i \bigl( \partial _i u^k + u^j \Gamma^{k}_{ij} \bigr) \partial_k \\ &\quad - \biggl[ \Bigl[ u^i (\partial_i v^k)\partial_k + \cancel{u^iv^k(\partial_i\partial_k)} \Bigr]  -  \Bigl[ v^i (\partial_i u^k)\partial_k + \cancel{v^i u^k(\partial_i\partial_k)} \Bigr] \biggr]\\ \\ &= \bigl(u^i  \partial _i v^k \partial_k + u^i v^j \Gamma^{k}_{ij} \partial_k \bigr)  - \bigl( v^i \partial _i u^k \partial_k + v^i u^j \Gamma^{k}_{ij} \partial_k \bigr)  \\ &\quad - \Bigl[ u^i (\partial_i v^k)\partial_k -  v^i (\partial_i u^k)\partial_k   \Bigr] \\ \\ &=  \bigl( \cancel{u^i  \partial _i v^k \partial_k} + u^i v^j \Gamma^{k}_{ij} \partial_k \bigr)  - \bigl( \bcancel{v^i \partial _i u^k \partial_k} + v^i u^j \Gamma^{k}_{ij} \partial_k \bigr)  \\ &\quad - \Bigl[ \cancel{u^i (\partial_i v^k)\partial_k} -  \bcancel{v^i (\partial_i u^k)\partial_k}   \Bigr] \\ \\ &= u^i v^j \Gamma^{k}_{ij} \partial_k - v^i u^j \Gamma^{k}_{ij} \partial_k \\ \\ &= u^i v^j \Gamma^{k}_{ij} \partial_k - v^j u^i \Gamma^{k}_{ji} \partial_k \\ \\ &= u^i v^j \Bigl( \Gamma^{k}_{ij} - \Gamma^{k}_{ji}  \Bigr) \partial_k \quad \text{(3)}  \end{align*}

The term in parentheses gives the components of the torsion tensor:

    \[  T^k_{ij} = \Gamma^{k}_{ij} - \Gamma^{k}_{ji} \quad \text{(4)} \]

When the torsion tensor is zero, \displaystyle \Gamma^{k}_{ij} = \Gamma^{k}_{ji}. Thus, the lower indices of the Christoffel symbols can be freely interchanged.

Eq. (4) also shows what we stated previously: namely, that the torsion tensor depends only on the connection coefficients, not on the vector fields.
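
As a small illustration of eq. (4), here is a sketch in Python with sympy. The flat plane in polar coordinates is just an arbitrary example metric, and the Christoffel symbols are built from the standard metric (Levi-Civita) formula, so every torsion component should come out zero.

    import sympy as sp

    # Plane in polar coordinates (an arbitrary example): ds^2 = dr^2 + r^2 dphi^2
    r, phi = sp.symbols('r phi', positive=True)
    coords = [r, phi]
    g = sp.Matrix([[1, 0], [0, r**2]])
    g_inv = g.inv()
    n = len(coords)

    # Christoffel symbols of the metric (Levi-Civita connection):
    # Gamma^k_{ij} = (1/2) g^{kl} (d_i g_{lj} + d_j g_{li} - d_l g_{ij})
    Gamma = [[[sp.simplify(sum(sp.Rational(1, 2)*g_inv[k, l]
                               *(sp.diff(g[l, j], coords[i])
                                 + sp.diff(g[l, i], coords[j])
                                 - sp.diff(g[i, j], coords[l]))
                               for l in range(n)))
               for j in range(n)] for i in range(n)] for k in range(n)]

    # Torsion components T^k_{ij} = Gamma^k_{ij} - Gamma^k_{ji}  (eq. 4)
    torsion = [[[sp.simplify(Gamma[k][i][j] - Gamma[k][j][i])
                 for j in range(n)] for i in range(n)] for k in range(n)]
    print(torsion)   # nested lists of zeros: the connection is torsion free

Even though some of the Christoffel symbols themselves are nonzero here (for example, \Gamma^{r}_{\phi\phi} = -r), their antisymmetric part – the torsion – vanishes.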

II.F Geodesics

We’ve noted previously that, in an accelerated reference frame, and thus, in a gravitational field, light is predicted to follow a curved course. In fact, all objects tend to move along a path that’s as straight as possible (if not influenced by a non-gravitational force), but in curved spacetime (like in a gravitational field), that path may be curved. Such motion is the general relativity version of Newton’s first law. Indeed, the natural state of motion of objects is said to be free fall, like that of the observer in the elevator plummeting toward earth, described in the introduction section.

We can formalize this mathematically in 2 ways:

II.F.1 Method 1

Since geodesic motion is the motion of free fall, and an observer in free fall experiences no acceleration, any mathematical description of such motion must include the fact that acceleration is zero. Accordingly, we need to start by coming up with a tensor expression for acceleration.

Acceleration tensor
Figure 2.5.1

Referring to figure 2.5.1a, we can write a tensorial expression for acceleration as follows:

\displaystyle a = \lim_{\Delta \tau \rightarrow 0} \frac{U_{Q  \rightarrow P} - U_P}{\Delta \tau} \quad \text{(2.5.1)}

where U_P and U_Q are velocity vectors on a worldline x^{\mu}(\tau), a geodesic parameterized by the proper time \tau. Of course, velocity vectors are tangent vectors, and like all vectors, can be thought of as a derivative operator, in this case \displaystyle U =  \frac{\partial}{\partial \tau}. U_{Q  \rightarrow P} represents the vector U_Q parallel transported to point P so that a meaningful covariant derivative can be taken.

We’ve seen previously how to take a covariant derivative in general. Referring to figure 2.5.1b, we take V_Q and parallel transport it from point Q to point P on a worldline parameterized by \lambda, then use the following equation:

\displaystyle \frac{DV}{d\lambda} = \lim_{\Delta \lambda \rightarrow 0} \frac{V_{Q \rightarrow P} - V_P}{\Delta \lambda} \quad \text{(2.5.2)}

The components of the covariant derivative, \displaystyle \left(\frac{DV}{d\lambda}\right)^{\mu}, are:

\displaystyle \left(\frac{DV}{d\lambda}\right)^{\mu} = t^{\alpha}\nabla_{\alpha}V^{\mu} \quad \text{(2.5.3)}

where

\displaystyle t = \frac{\partial}{\partial \lambda}
\displaystyle t^{\alpha} = \frac{\partial x^{\alpha}}{\partial \lambda}
\displaystyle t^{\alpha}\nabla_{\alpha} = \frac{\partial x^{\alpha}}{\partial \lambda}\frac{\partial}{\partial x^{\alpha}} = \frac{\partial}{\partial \lambda} \quad \text{(2.5.4)}

Making the analogies t^{\alpha} \rightarrow U^{\alpha} and V^{\mu} \rightarrow U^{\mu}, we come up with the following tensor equation for the components of the acceleration:

\displaystyle a^{\mu} = U^{\alpha} \nabla_{\alpha} U^{\mu}
        =U^{\alpha}\bigl( \partial_{\alpha}U^{\mu} + \Gamma^{\mu}_{\alpha \beta} U^{\beta} \bigr) \quad \text{(2.5.5)}

Now, \displaystyle U^{\alpha} = \frac{\partial x^{\alpha}}{\partial \tau}. Therefore:

\displaystyle a^{\mu} = \underbrace{\frac{\partial x^{\alpha}}{\partial \tau} \frac{\partial}{\partial x^{\alpha}}}_{\frac{\partial}{\partial \tau}} \left(  \frac{\partial x^{\mu}}{\partial \tau} \right) + \Gamma^{\mu}_{\alpha \beta} \frac{\partial x^{\alpha}}{\partial \tau} \frac{\partial x^{\beta}}{\partial \tau}

        \displaystyle = \frac{\partial^2 x^{\mu}}{\partial \tau^2} + \Gamma^{\mu}_{\alpha \beta} \frac{\partial x^{\alpha}}{\partial \tau} \frac{\partial x^{\beta}}{\partial \tau} \quad \text{(2.5.6)}

Since when traveling along a geodesic, there is no acceleration, we set eq. (2.5.6) to zero. We get:

    \[  \displaystyle \frac{\partial^2 x^{\mu}}{\partial \tau^2} + \Gamma^{\mu}_{\alpha \beta} \frac{\partial x^{\alpha}}{\partial \tau} \frac{\partial x^{\beta}}{\partial \tau} = 0 \quad \text{(2.5.7)}   \]

Eq. (2.5.7) is called the geodesic equation. This equation, in this form, applies to a timelike geodesic. If we were dealing with a spacelike geodesic, we’d replace \tau with s. Lightlike geodesics are a little trickier. To write a valid geodesic equation for a lightlike geodesic, we’d use 4-momentum instead of 4-velocity in our equation as well as a parameter \lambda where \displaystyle \Delta \lambda \equiv \frac{\Delta \tau}{m}, m being mass. We take the limits m \rightarrow 0 and \Delta \tau \rightarrow 0 with \displaystyle \frac{\Delta \tau}{m} remaining constant. For a more detailed explanation of the geodesic equation for lightlike geodesics, I refer you to

Prof. Scott Hughes. 8.962 General Relativity. Spring 2020. Massachusetts Institute of Technology: MIT OpenCourseWare, https://ocw.mit.edu/. License: Creative Commons BY-NC-SA.
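
To make eq. (2.5.7) a bit more concrete, here is a symbolic sketch in Python with sympy. It uses the unit 2-sphere as an arbitrary example surface (with arc length playing the role of proper time), builds the Christoffel symbols from the metric, and spells out the resulting geodesic equations.

    import sympy as sp

    # Unit 2-sphere (an arbitrary example): ds^2 = dtheta^2 + sin^2(theta) dphi^2
    th, ph = sp.symbols('theta phi')
    coords = [th, ph]
    g = sp.Matrix([[1, 0], [0, sp.sin(th)**2]])
    g_inv = g.inv()
    n = 2

    # Christoffel symbols Gamma^m_{ab} = (1/2) g^{ml} (d_a g_{lb} + d_b g_{la} - d_l g_{ab})
    Gamma = [[[sp.simplify(sum(sp.Rational(1, 2)*g_inv[m, l]
                               *(sp.diff(g[l, b], coords[a]) + sp.diff(g[l, a], coords[b])
                                 - sp.diff(g[a, b], coords[l]))
                               for l in range(n)))
               for b in range(n)] for a in range(n)] for m in range(n)]

    # Print the nonzero symbols
    for m in range(n):
        for a in range(n):
            for b in range(n):
                if Gamma[m][a][b] != 0:
                    print(f"Gamma^{coords[m]}_({coords[a]} {coords[b]}) =", Gamma[m][a][b])

    # The nonzero symbols are Gamma^theta_{phi phi} = -sin(theta)cos(theta) and
    # Gamma^phi_{theta phi} = Gamma^phi_{phi theta} = cos(theta)/sin(theta)
    # (sympy may display equivalent trigonometric forms). Plugging them into
    # eq. (2.5.7) gives the geodesic equations
    #   theta'' - sin(theta) cos(theta) (phi')^2 = 0
    #   phi''   + 2 (cos(theta)/sin(theta)) theta' phi' = 0
    # whose solutions are great circles.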

II.F.2 Method 2

This derivation is taken from Physics Unsimplified.

This second method of deriving the geodesic equation borrows the principle of stationary action from Lagrangian mechanics.

Geodesic variations between points P and Q
Figure 2.5.2

We imagine 2 points in spacetime, P and Q. There are an infinite number of possible paths, parameterized by some variable (say \tau – proper time), that the particle could take in moving from P to Q. The path that the particle actually takes is the one that extremizes the proper time between P and Q, \Delta \tau. More information on this process and Lagrangian mechanics, in general, can be found here. We can apply this technique to arrive at the geodesic equation.

Figure 2.5.2 shows several worldlines – parameterized by \lambda – that a particle could follow between the points P and Q. The red worldline is the actual path taken by the particle (although we don’t know that before we start). The worldlines in the diagram are meant to be timelike but some, from the way they’re drawn, appear spacelike. Please excuse the deficiencies in my artwork. At any rate, we want to find the path that minimizes or maximizes the proper time it takes to traverse the path. We know that:

\displaystyle  \tau = \int_P^Q d\tau \quad \text{(2.5.8)}

The worldlines are given by x^{\mu}(\lambda). The velocity vectors (which are the tangent vectors) are given by \displaystyle \dot{x}^{\mu} \equiv \frac{\partial x^{\mu}}{\partial \lambda}.

Now

ds^2 = g_{\mu \nu} dx^{\mu} dx^{\nu} = -d\tau^2 \quad \text{(2.5.9)}

d\tau = \sqrt{-g_{\mu \nu} dx^{\mu} dx^{\nu}}

      \displaystyle =\sqrt{-g_{\mu \nu} \frac{\partial x^{\mu}}{\partial \lambda} \frac{\partial x^{\nu}}{\partial \lambda} d\lambda^2 }

      =\sqrt{-g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}} d\lambda \quad \text{(2.5.10)}

Placing the results of eq. (2.5.10) into eq. (2.5.8), we get:

\displaystyle  \tau = \int_P^Q \sqrt{-g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}} d\lambda \quad \text{(2.5.11)}

Next, we want to examine all of the possible paths by varying x^{\mu} slightly, add up all the changes and see what the effect is on \tau. The path that results in a minimum or maximum of \tau – and the equation that describes that path – is the one that’s “picked out” by the process. It’s known that the minimum or maximum value of a function occurs where its first derivative is zero. Likewise, the extremal value of \tau is found where \delta \tau = 0. So the equation we need to solve is:

\displaystyle \delta \tau = \int_P^Q  \delta \sqrt{-g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}} d\lambda \quad \text{(2.5.12)}

Carrying out the variation using the chain and product rules, we have:

\displaystyle \delta \tau =\displaystyle  \int_P^Q  d\lambda \biggl[ \frac{1}{2 \sqrt{-g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}}} \Bigl( -g_{\mu \nu} \dot{x}^{\mu}\delta \dot{x}^{\nu} -g_{\mu \nu} \delta \dot{x}^{\mu} \dot{x}^{\nu}
                          - \partial_{\alpha} g_{\mu \nu} \delta x^{\alpha} \dot{x}^{\mu} \dot{x}^{\nu}  \Bigr) \biggr] \quad \text{(2.5.13)}

But g_{\mu \nu} \dot{x}^{\mu}\delta \dot{x}^{\nu} = g_{\mu \nu} \dot{x}^{\nu}\delta \dot{x}^{\mu} \quad \text{(2.5.14)}

Using this relationship in eq. (2.5.13), we get:

\displaystyle \delta \tau = \int_P^Q  d\lambda \biggl[ \frac{1}{\cancel{2} \sqrt{-g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}}} \Bigl(  -\cancel{2} g_{\mu \nu} \delta \dot{x}^{\mu} \dot{x}^{\nu} -  \partial_{\alpha} g_{\mu \nu} \delta x^{\alpha} \dot{x}^{\mu} \dot{x}^{\nu}  \Bigr) \biggr] \quad \text{(2.5.15)}

Next, we integrate by parts. The following is a brief recap of how integration by parts works.

\displaystyle \int_a^b \dfrac{d}{dx}\bigl[ F(x)G(x)\bigr]dx = \int_a^b F\left( x\right) \dfrac{dG}{dx}dx+\int_a^b G\left( x\right) \dfrac{dF}{dx}dx

\displaystyle \int_a^b \dfrac{d}{dx}\bigl[ F(x)G(x)\bigr] dx-\int_a^b G\left( x\right) \dfrac{dF}{dx}dx=\int_a^b F\left( x\right) \dfrac{dG}{dx}dx

\displaystyle \eval{F(x)G(x)}_{a}^{b}-\int_a^b G\left( x\right)\dfrac{dF}{dx}dx =\int_a^b F\left( x\right) \dfrac{dG}{dx}dx

In our case:

F(x) \rightarrow \displaystyle \frac{g_{\mu \nu}\dot{x}^{\nu}}{\sqrt{-g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}}}

\displaystyle \frac{dF}{dx} \rightarrow \frac{d}{d\lambda}\frac{g_{\mu \nu}\dot{x}^{\nu}}{\sqrt{-g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}}}

G(x) \rightarrow \delta x^{\mu}

\displaystyle \frac{dG}{dx} \rightarrow \delta \dot{x}^{\mu}

Applying integration by parts to the term g_{\mu \nu} \delta \dot{x}^{\mu} \dot{x}^{\nu}, we obtain:

\displaystyle \delta \tau = -\displaystyle \frac{g_{\mu \nu}\dot{x}^{\nu}}{\sqrt{-g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}}} \eval{\delta x^{\mu}}_P^Q  + \int_P^Q  d\lambda \biggl[  \frac{d}{d\lambda}\left( \frac{g_{\mu \nu}\dot{x}^{\nu}}{\sqrt{-g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}}} \right) \delta x^{\mu}  - \frac{\partial_{\alpha} g_{\mu \nu}  \dot{x}^{\mu} \dot{x}^{\nu}}{2\sqrt{-g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}}} \delta x^{\alpha} \biggr] \quad \text{(2.5.16)}

Points P and Q are fixed; they don’t vary. Thus, \delta x^{\mu} = 0 at the endpoints. Therefore the term -\displaystyle \frac{g_{\mu \nu}\dot{x}^{\nu}}{\sqrt{-g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}}} \eval{\delta x^{\mu}}_P^Q = 0. This makes eq. (2.5.16):

\displaystyle \delta \tau = \int_P^Q  d\lambda \biggl[  \frac{d}{d\lambda}\left( \frac{g_{\mu \nu}\dot{x}^{\nu}}{\sqrt{-g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}}} \right) \delta x^{\mu}  - \frac{\partial_{\alpha} g_{\mu \nu}  \dot{x}^{\mu} \dot{x}^{\nu}}{2\sqrt{-g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}}} \delta x^{\alpha} \biggr] \quad \text{(2.5.17)}

We’d like to factor out the \delta x terms but they have different indices. We can fix this by making the following dummy index changes in the righthand term: \mu \leftrightarrow \alpha and \nu\leftrightarrow \beta. That gives us:

\displaystyle \delta \tau = \int_P^Q  d\lambda \biggl[  \frac{d}{d\lambda}\left( \frac{g_{\mu \nu}\dot{x}^{\nu}}{\sqrt{-g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}}} \right)  - \frac{\partial_{\mu} g_{\alpha \beta}  \dot{x}^{\alpha} \dot{x}^{\beta}}{2\sqrt{-g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}}} \biggr] \delta x^{\mu} \quad \text{(2.5.18)}

We want to extremize the proper time so we set \delta \tau = 0. The only way this can happen for arbitrary variations \delta x^{\mu} is if the term in brackets is zero. That yields:

0 = \displaystyle \frac{d}{d\lambda}\left( \frac{g_{\mu \nu}\dot{x}^{\nu}}{\sqrt{-g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}}} \right)  - \frac12 \frac{\partial_{\mu} g_{\alpha \beta}  \dot{x}^{\alpha} \dot{x}^{\beta}}{\sqrt{-g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}}} \quad \text{(2.5.19)}

Eq. (2.5.19) is true for any arbitrary parameter \lambda. Thus, it still holds if we set \lambda = \tau. When we do this, the \displaystyle \dot{x} = \frac{\partial x}{\partial \lambda} terms become \displaystyle \dot{x} = \frac{\partial x}{\partial \tau} = U where U represents the 4-velocity. But we know that the dot product of the 4-velocity with itself is -1 (in units where c=1). Thus:

\displaystyle \sqrt{-g_{\mu \nu} \dot{x}^{\mu} \dot{x}^{\nu}} = \sqrt{-g_{\mu \nu} \frac{\partial x^{\mu}}{\partial \tau}  \frac{\partial x^{\nu}}{\partial \tau}}  = \sqrt{-g_{\mu \nu} U^{\mu} U^{\nu}} = \sqrt{-(-1)} = 1

When we substitute this result into eq. (2.5.19), eq. (2.5.19) becomes:

\displaystyle 0 = \frac{\partial}{\partial \tau}\bigl( g_{\mu \nu} \dot{x}^{\nu}  \bigr) - \frac12 \partial_{\mu} g_{\alpha \beta} \dot{x}^{\alpha} \dot{x}^{\beta}

    =g_{\mu \nu} \ddot{x}^{\nu} + \partial_{\alpha} g_{\mu \nu} \dot{x}^{\alpha} \dot{x}^{\nu} - \frac12 \partial_{\mu} g_{\alpha \beta} \dot{x}^{\alpha} \dot{x}^{\beta} \quad \text{(2.5.20)}

Multiplying both sides of eq. (2.5.20) by g^{\sigma \mu} changes eq. (2.5.20) to:

\displaystyle 0 = g^{\sigma \mu} g_{\mu \nu} \ddot{x}^{\nu} + g^{\sigma \mu} \bigl( \partial_{\alpha} g_{\mu \nu} \dot{x}^{\alpha} \dot{x}^{\nu} - \frac12 \partial_{\mu} g_{\alpha \beta} \dot{x}^{\alpha} \dot{x}^{\beta} \bigr) \quad \text{(2.5.21)}

We’d like to factor out a factor of \dot{x}^{\alpha} \dot{x}^{\beta} from the righthand term of eq. (2.5.21). To do this, we have to change the dummy index \nu \rightarrow \beta in the lefthand term in the parentheses. We have:

\displaystyle 0 = \ddot{x}^{\sigma} + g^{\sigma \mu} \bigl(  \partial_{\alpha} g_{\beta \mu} - \frac12  \partial_{\mu} g_{\alpha \beta} \bigr) \dot{x}^{\alpha} \dot{x}^{\beta}\quad \text{(2.5.22)}

Because \dot{x}^{\alpha} \dot{x}^{\beta} is symmetric in \alpha and \beta, only the part of \partial_{\alpha} g_{\beta \mu} that is symmetric in those indices survives the contraction, so we can replace the term \partial_{\alpha} g_{\beta \mu} with the expression \displaystyle \frac12 \partial_{\alpha} g_{\beta \mu} + \frac12 \partial_{\beta} g_{\alpha \mu}. We get:

\displaystyle 0 = \ddot{x}^{\sigma} + g^{\sigma \mu} \bigl(  \frac12 \partial_{\alpha} g_{\beta \mu} + \frac12 \partial_{\beta} g_{\alpha \mu} - \frac12  \partial_{\mu} g_{\alpha \beta} \bigr) \dot{x}^{\alpha} \dot{x}^{\beta}

    =\ddot{x}^{\sigma} + \underbrace{\frac12 g^{\sigma \mu} \bigl( \partial_{\alpha} g_{\beta \mu} + \partial_{\beta} g_{\alpha \mu} - \partial_{\mu} g_{\alpha \beta} \bigr)}_{\Gamma^{\sigma}_{\alpha \beta}} \dot{x}^{\alpha} \dot{x}^{\beta}

    =\ddot{x}^{\sigma} + \Gamma^{\sigma}_{\alpha \beta}\dot{x}^{\alpha} \dot{x}^{\beta}

    =\displaystyle \ddot{x}^{\sigma} + \Gamma^{\sigma}_{\alpha \beta} \frac{\partial x^{\alpha}}{\partial \tau} \frac{\partial x^{\beta}}{\partial \tau} \quad \text{(2.5.23)}

Eq. (2.5.23), of course, is the geodesic equation, which is what we were trying to derive.
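
As a quick sanity check of eq. (2.5.23): in flat spacetime with Cartesian-type coordinates, all of the Christoffel symbols vanish, and the geodesic equation reduces to

    \[ \ddot{x}^{\sigma} = 0 \]

whose solutions, x^{\sigma}(\tau) = a^{\sigma}\tau + b^{\sigma}, are straight worldlines traversed at constant velocity – exactly the free motion we expect in the absence of gravity.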

Local Inertial Frame

I mentioned earlier that, even in curved spacetime, over small regions, spacetime looks flat. And I said that I’d offer a proof later. Now, with some additional mathematical tools in our toolkit, I’m in a position to provide that proof.

The above statement – that in small regions, spacetime will look flat – can also be expressed in the following manner: For each tiny region in spacetime, coordinates can be chosen such that the observed metric is the Minkowski metric (i.e., spacetime appears flat). Such coordinates are called Fermi normal coordinates. The proof is tedious; however, for those interested, it can be found here.

Riemann Curvature Tensor

With the concepts of the covariant derivative and parallel transport in hand, we’re now ready to find a method for determining whether a space is flat or curved.

Derivation

The derivation that follows is taken from Robert Davie, “Riemann Curvature Tensor,” which is part of his Tensor Calculus series on YouTube.

Parallel transport around closed loop in flat vs curved space
Figure 2.6.1

We can see, from figure 2.6.1a, that when we parallel transport a vector from point A to B to C, it will remain unchanged as compared with the same vector parallel transported from A to D to C. However, in curved space like the surface of a sphere (see figure 2.6.2), when we parallel transport a vector from A to B to C, the vector that we end up with at C will differ from the resultant vector after parallel transporting the same vector from A to D to C. To determine whether a space is flat or curved, we’ll apply this same basic idea of parallel transport of a vector in different directions around a closed loop, but will examine this process on a loop of infinitesimal length.

Riemann curvature tensor derivation. Parallel transport around a closed loop.
Figure 2.6.2

Specifically, as shown in figure 2.6.2, we’ll take a vector, \vec{V}, and, like in figure 2.6.1 parallel transport it to point C along 2 paths – A to B to C and A to D to C. For each path, we’ll write equations describing these motions with a goal of arriving at a mathematical expression that characterizes the curvature of the space in which we’re working.

Before we get into the derivation proper, we know that we can consider derivatives as operators that bring about changes in functions. Allowing for minor abuses in notation, we can write:

\displaystyle \frac{df}{dx} = \frac{f(x + dx) -f(x)}{dx} \quad \Rightarrow \quad f(x + dx) = f(x) + dx \frac{df}{dx} \quad \text{(2.6.1)}

Similarly, we can think of the covariant derivative as an operator that brings about a change in a vector as it’s parallel transported along a worldline. For example:

\displaystyle V^{\alpha}_{\text{Final}} = V^{\alpha}_{\text{Initial}} +\int \frac{\partial V^{\alpha}}{\partial x}  dx \quad \text{(2.6.2)}

where we sum up all the little changes brought about by our operator (the covariant derivative) in each little interval, dx, take the limit as dx goes to zero, and add the result to the initial vector to get the final vector.

We also know that, since the covariant derivative of a vector that’s parallel transported is zero, we can use equations like the following:

\displaystyle \frac{\partial v^{\alpha}}{\partial x^i} + v^{\mu}\Gamma^{\alpha}_{\mu i} = 0 \quad \text{(2.6.3)}

which implies:

\displaystyle \frac{\partial v^{\alpha}}{\partial x^i} = -v^{\mu}\Gamma^{\alpha}_{\mu i} \quad \text{(2.6.4)}

Given these tools, let’s begin with parallel transport of vector v from A to B along the x^1 direction. We have:

    \begin{align*} \frac{\partial v^{\alpha}}{\partial x^{1}} & =-v^{\mu} \Gamma_{\mu 1}^{\alpha} \\ v^{\alpha}(B)-v^{\alpha}(A) & =\int_{a}^{a+\delta a} \frac{\partial v^{\alpha}}{\partial x^{1}} d x^{1} \\ & =-\int_{a}^{a+\delta a} v^{\mu} \Gamma_{\mu 1}^{\alpha} d x^{1}  \quad \text{(2.6.5)} \end{align*}

Note that in eq. (2.6.5), and subsequently in this discussion, we’ve moved v^{\alpha}(A) to the left side of the equation.

The equations for parallel transport of v^{\alpha} from B to C are:

    \[ \frac{\partial v^{\alpha}}{\partial x^{2}} =-v^{\mu} \Gamma_{\mu 2}^{\alpha} \quad \text{(2.6.6)} \]

and

    \begin{align*} v^{\alpha}(C)-v^{\alpha}(B) & =\int_{b}^{b+\delta b} \frac{\partial v^{\alpha}}{\partial x^{2}} d x^{2} \\ & =-\int_{b}^{b+\delta b} v^{\mu} \Gamma_{\mu 2}^{\alpha} d x^{2} \quad \text{(2.6.7)}  \end{align*}

The net change for Path 1 (i.e., the path from A to B to C) is:

    \begin{align*} v^{\alpha}(C)-v^{\alpha}(A) & =v^{\alpha}(B)-v^{\alpha}(A)+v^{\alpha}(C)-v^{\alpha}(B) \\ & =-\int_{a}^{a+\delta a} v^{\mu} \Gamma_{\mu 1}^{\alpha} d x^{1}-\int_{b}^{b+\delta b} v^{\mu} \Gamma_{\mu 2}^{\alpha} d x^{2} \quad \text{(2.6.8)}   \end{align*}

Next, we turn our attention to Path 2, the path from A to D to C. We’ll begin with parallel transport from A to D. We have:

    \begin{align*} v^{\alpha}(D)-v^{\alpha}(A) & =\int_{b}^{b+\delta b} \frac{\partial v^{\alpha}}{\partial x^{2}} d x^{2} \\ & =-\int_{b}^{b+\delta b} v^{\mu} \Gamma_{\mu 2}^{\alpha} d x^{2}  \quad \text{(2.6.9)}\end{align*}

The equations for parallel transport from D to C are:

    \begin{align*} v^{\alpha}(C)-v^{\alpha}(D) & =\int_{a}^{a+\delta a} \frac{\partial v^{\alpha}}{\partial x^{1}} d x^{1} \\ & =-\int_{a}^{a+\delta a} v^{\mu} \Gamma_{\mu 1}^{\alpha} d x^{1}  \quad \text{(2.6.10)}\end{align*}

For the entirety of Path 2, the equations are:

    \begin{align*} v^{\alpha}(C)-v^{\alpha}(A) & =v^{\alpha}(D)-v^{\alpha}(A)+v^{\alpha}(C)-v^{\alpha}(D) \\ & =-\int_{b}^{b+\delta b} v^{\mu} \Gamma_{\mu 2}^{\alpha} d x^{2}-\int_{a}^{a+\delta a} v^{\mu} \Gamma_{\mu 1}^{\alpha} d x^{1} \quad \text{(2.6.11)}   \end{align*}

The difference between the vectors transported along Path 1 and Path 2, when they get to C, is given by:

    \begin{align*} \delta v^{\alpha} & =v^{\alpha}(C)_{\text {Path } 2}-v^{\alpha}(C)_{\text {Path } 1} \\ \\ & =-\int_{b}^{b+\delta b} v^{\mu} \Gamma_{\mu 2}^{\alpha} d x^{2}-\int_{a}^{a+\delta a} v^{\mu} \Gamma_{\mu 1}^{\alpha} d x^{1}+\int_{a}^{a+\delta a} v^{\mu} \Gamma_{\mu 1}^{\alpha} d x^{1}\\&\quad+\int_{b}^{b+\delta b} v^{\mu} \Gamma_{\mu 2}^{\alpha} d x^{2} \quad \text{(2.6.12)}  \end{align*}

We rearrange terms such that dx^1 terms and dx^2 terms are together:

    \begin{align*} \delta v^{\alpha} &=-\int_{A D} v^{\mu} \Gamma_{\mu 2}^{\alpha} d x^{2}+\int_{B C} v^{\mu} \Gamma_{\mu 2}^{\alpha} d x^{2}-\int_{D C} v^{\mu} \Gamma_{\mu 1}^{\alpha} d x^{1}\\ &\quad +\int_{A B} v^{\mu} \Gamma_{\mu 1}^{\alpha} d x^{1} \quad \text{(2.6.13)}  \end{align*}

Notice that the integrands in the integrals \displaystyle -\int_{A D} v^{\mu} \Gamma_{\mu 2}^{\alpha} d x^{2} (which represents v^{\alpha} transported from A to D) and \displaystyle \int_{B C} v^{\mu} \Gamma_{\mu 2}^{\alpha} d x^{2} (which represents v^{\alpha} transported from B to C) are the same. These vectors are parallel but separated by \delta a in the x^1 direction. What we’d like to do is parallel transport the \int_{AD} vector to the \int_{BC} vector so we can compare them directly. This will give us the x^2 component of \delta v. To do this, we take the Taylor series expansion of the vector at D in the x^1 direction. Or, the way I like to think about it is: take the vector at D and parallel transport it to the resultant BC vector at C by taking the derivative of the vector at D with respect to x^1, then multiply it by \delta a – the infinitesimal displacement to C – similar to what we did in parallel transporting our other vectors.

Likewise, the vectors resulting from parallel transport from D to C and A to B are parallel. Thus, we’ll compare them in the manner described above for the A to D and B to C parallel transported vectors to get the x^1 component of \delta v. The results are:

    \begin{align*} \delta v^{\alpha}& =-\int_{A D}\left(v^{\mu} \Gamma_{\mu 2}^{\alpha}+\delta a \frac{\partial}{\partial x^{1}}\left(v^{\mu} \Gamma_{\mu 2}^{\alpha}\right) \ldots\right) d x^{2}+\int_{B C} v^{\mu} \Gamma_{\mu 2}^{\alpha} d x^{2} \\ &\quad-\int_{D C} v^{\mu} \Gamma_{\mu 1}^{\alpha} d x^{1}+\int_{A B}\left(v^{\mu} \Gamma_{\mu 1}^{\alpha}+\delta b \frac{\partial}{\partial x^{2}}\left(v^{\mu} \Gamma_{\mu 1}^{\alpha}\right) \ldots\right) d x^{1} \\ \\   &= \int_{b}^{b +\delta b} \Biggl[-v^{\mu} \Gamma_{\mu 2}^{\alpha}-\left(\delta a \frac{\partial}{\partial x^{1}}\left(v^{\mu} \Gamma_{\mu 2}^{\alpha}\right) \ldots\right) +v^{\mu} \Gamma_{\mu 2}^{\alpha}\Biggr] d x^{2} \\ &\quad+\int_a^{a+\delta a} \Biggl[-v^{\mu} \Gamma_{\mu 1}^{\alpha} +v^{\mu} \Gamma_{\mu 1}^{\alpha}+\left(\delta b \frac{\partial}{\partial x^{2}}\left(v^{\mu} \Gamma_{\mu 1}^{\alpha}\right) \ldots\right)\Biggr] d x^{1} \quad \text{(2.6.14)}  \end{align*}

After cancelling the v^{\mu}\Gamma terms of opposite sign in eq. (2.6.14), we’re left with:

    \[ \int_{b}^{b +\delta b}  -\delta a \frac{\partial}{\partial x^{1}}\Bigl(v^{\mu} \Gamma_{\mu 2}^{\alpha}\Bigr)\,\,dx^2 + \int_a^{a+\delta a} \delta b \frac{\partial}{\partial x^{2}}\Bigl(v^{\mu} \Gamma_{\mu 1}^{\alpha}\Bigr)\,\, dx^1 \quad \text{(2.6.15)} \]

This next step is where I become confused. In the video from Robert Davie and other sources that present this derivation (e.g., Scott Hughes, General Relativity, at 1h 05min of Lecture 10, MIT OpenCourseWare; Bernard Schutz, A First Course in General Relativity, p. 159), it’s said that, because we’re working with infinitesimal distances, the integrals in eq. (2.6.15) become just the area of the loop, \delta a \delta b. Thus, we can pull out the \delta a, \delta b terms and dispense with the integral to get:

    \[  \delta a \delta b \Bigl[  -\frac{\partial}{\partial x^{1}}\Bigl(v^{\mu} \Gamma_{\mu 2}^{\alpha}\Bigr)  +  \frac{\partial}{\partial x^{2}}\Bigl(v^{\mu} \Gamma_{\mu 1}^{\alpha}\Bigr) \Bigr] \quad \text{(2.6.16)} \]

To see how I reconcile this, click here.

At this point, we expand out the expression in the brackets in eq. (2.6.16) using the product rule, swap some dummy indices and, where appropriate, make the following substitution for the partial derivatives: \displaystyle \frac{\partial v}{\partial x} = -v\Gamma. This leads to:

    \begin{align*} \delta v^\alpha & \approx \delta a \delta b\left[-\frac{\partial}{\partial x^1}\left(v^\mu \Gamma_{\mu 2}^\alpha\right)+\frac{\partial}{\partial x^2}\left(v^\mu \Gamma_{\mu 1}^\alpha\right)\right] \\ & =\delta a \delta b\left[\frac{\partial v^\mu}{\partial x^2} \Gamma_{\mu 1}^\alpha+\frac{\partial \Gamma_{\mu 1}^\alpha}{\partial x^2} v^\mu-\frac{\partial v^\mu}{\partial x^1} \Gamma_{\mu 2}^\alpha-\frac{\partial \Gamma_{\mu 2}^\alpha}{\partial x^1} v^\mu\right] \\ & =\delta a \delta b\left[-v^\nu \Gamma_{\nu 2}^\mu \Gamma_{\mu 1}^\alpha+\frac{\partial \Gamma_{\mu 1}^\alpha}{\partial x^2} v^\mu-\left(-v^\nu \Gamma_{\nu 1}^\mu \Gamma_{\mu 2}^\alpha\right)-\frac{\partial \Gamma_{\mu 2}^\alpha}{\partial x^1} v^\mu\right] \\ & =\delta a \delta b\left[-v^\mu \Gamma_{\mu 2}^\nu \Gamma_{\nu 1}^\alpha+\frac{\partial \Gamma_{\mu 1}^\alpha}{\partial x^2} v^\mu+v^\mu \Gamma_{\mu 1}^\nu \Gamma_{\nu 2}^\alpha-\frac{\partial \Gamma_{\mu 2}^\alpha}{\partial x^1} v^\mu\right] \\ & =\delta a \delta b\left[\frac{\partial \Gamma_{\mu 1}^\alpha}{\partial x^2} v^\mu-\frac{\partial \Gamma_{\mu 2}^\alpha}{\partial x^1} v^\mu+\Gamma_{\mu 1}^\nu \Gamma_{\nu 2}^\alpha v^\mu-\Gamma_{\mu 2}^\nu \Gamma_{\nu 1}^\alpha v^\mu\right] \\ & =\delta a \delta b\left[\frac{\partial \Gamma_{\mu 1}^\alpha}{\partial x^2}-\frac{\partial \Gamma_{\mu 2}^\alpha}{\partial x^1}+\Gamma_{\mu 1}^\nu \Gamma_{\nu 2}^\alpha-\Gamma_{\mu 2}^\nu \Gamma_{\nu 1}^\alpha\right] v^{\mu} \quad \text{(2.6.17)}  \end{align*}

Taking the loop to be infinitesimal (and dividing out the loop area \delta a \delta b, which just sets the size of the loop), the difference in the parallel transported vectors at point C is characterized by:

    \[ \delta v^\alpha  = \Bigl[ \frac{\partial \Gamma_{\mu 1}^\alpha}{\partial x^2}-\frac{\partial \Gamma_{\mu 2}^\alpha}{\partial x^1}+\Gamma_{\mu 1}^\nu \Gamma_{\nu 2}^\alpha-\Gamma_{\mu 2}^\nu \Gamma_{\nu 1}^\alpha\Bigr] v^{\mu} \quad \text{(2.6.18)}\]

When \delta v^\alpha = 0, it means the space we’re working with is flat. If it’s nonzero, it means the space is curved. The operator that ultimately makes this distinction is the expression in brackets in eq. (2.6.18):

    \[\delta v^{\alpha} =  R^{\alpha}_{\mu 2 1}\, v^{\mu}, \qquad R^{\alpha}_{\mu 2 1} =  \frac{\partial \Gamma_{\mu 1}^\alpha}{\partial x^2}-\frac{\partial \Gamma_{\mu 2}^\alpha}{\partial x^1}+\Gamma_{\mu 1}^\nu \Gamma_{\nu 2}^\alpha-\Gamma_{\mu 2}^\nu \Gamma_{\nu 1}^\alpha \quad \text{(2.6.19)} \]

We can make this more general by generalizing the coordinate lines x^1 and x^2, replacing them with x^{\beta} and x^{\gamma}:

    \[ R^{\alpha}_{\mu \gamma \beta} =  \frac{\partial \Gamma_{\mu \beta}^\alpha}{\partial x^\gamma}-\frac{\partial \Gamma_{\mu \gamma}^\alpha}{\partial x^\beta}+\Gamma_{\mu \beta}^\nu \Gamma_{\nu \gamma}^\alpha-\Gamma_{\mu \gamma}^\nu \Gamma_{\nu \beta}^\alpha \quad \text{(2.6.20)} \]

So we can think of the Riemann curvature tensor as an operator which takes 3 vectors as inputs (2 infinitesimal displacement vectors and the vector it will parallel transport) and spits out the difference vector \delta v, which tells us whether a space is flat or curved.
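
Eq. (2.6.20) is easy to check by brute force. Below is a sketch in Python with sympy; the two example metrics (the flat plane in polar coordinates and the unit 2-sphere) are arbitrary choices, and the helper function names christoffel and riemann are just made up for this sketch.

    import sympy as sp

    def christoffel(g, coords):
        # Gamma^m_{ab} = (1/2) g^{ml} (d_a g_{lb} + d_b g_{la} - d_l g_{ab})
        n, g_inv = len(coords), g.inv()
        return [[[sp.simplify(sum(sp.Rational(1, 2)*g_inv[m, l]
                                  *(sp.diff(g[l, b], coords[a]) + sp.diff(g[l, a], coords[b])
                                    - sp.diff(g[a, b], coords[l])) for l in range(n)))
                  for b in range(n)] for a in range(n)] for m in range(n)]

    def riemann(g, coords):
        # R^a_{m c b} of eq. (2.6.20): d_c Gamma^a_{m b} - d_b Gamma^a_{m c}
        #                              + Gamma^v_{m b} Gamma^a_{v c} - Gamma^v_{m c} Gamma^a_{v b}
        n, G = len(coords), christoffel(g, coords)
        return [[[[sp.simplify(sp.diff(G[a][m][b], coords[c]) - sp.diff(G[a][m][c], coords[b])
                               + sum(G[v][m][b]*G[a][v][c] - G[v][m][c]*G[a][v][b]
                                     for v in range(n)))
                   for b in range(n)] for c in range(n)] for m in range(n)] for a in range(n)]

    # Flat plane in polar coordinates: every component of R vanishes
    r, phi = sp.symbols('r phi', positive=True)
    print(riemann(sp.Matrix([[1, 0], [0, r**2]]), [r, phi]))

    # Unit 2-sphere: some components are nonzero, e.g. R^theta_{phi theta phi}
    th, ph = sp.symbols('theta phi')
    R = riemann(sp.Matrix([[1, 0], [0, sp.sin(th)**2]]), [th, ph])
    print(R[0][1][0][1])   # equals sin^2(theta) (possibly in an equivalent form)

The polar-coordinate plane has nonzero Christoffel symbols but a vanishing Riemann tensor – it's flat space in curvilinear coordinates – while the sphere's nonzero components signal genuine curvature.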

Method 2

Riemann curvature tensor derivation method 2
Figure 2.6.3

Another way to derive an expression for the Riemann curvature tensor is to make use of a mathematical object referred to as a commutator. Again, we consider parallel transport of a vector, V^{\lambda}, around an infinitesimal square closed loop defined by 2 coordinate directions, \mu and \nu. The lengths of the sides of this infinitesimal square are \delta \mu and \delta \nu, respectively.

We start by parallel transporting V^{\lambda} from A to B. We know that the operator that brings about parallel transport is the covariant derivative. Thus, we can describe this movement from A to B mathematically as:

    \[  \nabla_{\nu}V^{\lambda} \delta \nu\quad \text{(2.6.21)} \]

We then parallel transport the resultant vector at B to point C via the covariant derivative in the \mu direction as follows:

    \[ \nabla_{\mu} \bigl(\nabla_{\nu}V^{\lambda}\, \delta \nu\bigr)\, \delta \mu \quad \text{(2.6.22)}\]

Therefore, the expression describing the transport of V^{\lambda} from A to B to C is:

    \[ \nabla_{\mu} \nabla_{\nu}V^{\lambda} \delta \mu \delta \nu  \quad \text{(2.6.23)}\]

Now, like in the method 1 derivation described above, we want to parallel transport V^{\lambda} from A to D to C. Then we subtract the vector at C resulting from this path from the vector resulting from transport along the A to B to C path. Again, if the result is zero (i.e., no difference), then we infer that the space is flat along this loop. If the result is nonzero (i.e., there is a difference) then we presume that the space is curved in the area of the loop. And if we take the limit as \delta \mu and \delta \nu go to zero, we can determine the flatness or curvature at the point C.

So, in parallel transporting V^{\lambda} from A to D, we move along the \mu direction. Thus, we have:

    \[  \nabla_{\mu}V^{\lambda} \delta \mu\quad \text{(2.6.24)} \]

In moving from D to C, we go in the \nu direction. Therefore, we get:

    \[ \nabla_{\nu} \bigl(\nabla_{\mu}V^{\lambda}\, \delta \mu\bigr)\, \delta \nu \quad \text{(2.6.25)}\]

The expression for the path from A to D to C, then, is:

    \[ \nabla_{\nu} \nabla_{\mu}V^{\lambda} \delta \mu \delta \nu  \quad \text{(2.6.26)}\]

To get \delta V – the difference in resultant vectors at C being created by the A to B to C versus A to D to C transport paths – we subtract eq. (2.6.26) from eq. (2.6.23):

    \begin{align*} \delta V &= \nabla_{\mu} \nabla_{\nu} V^{\lambda} \delta \mu \delta \nu - \nabla_{\nu} \nabla_{\mu} V^{\lambda} \delta \mu \delta \nu \\ &= \bigl( \nabla_{\mu} \nabla_{\nu}  - \nabla_{\nu} \nabla_{\mu} \bigr)V^{\lambda} \delta \mu \delta \nu \\ &= \bigl[  \nabla_{\mu}, \nabla_{\nu}\bigr] V^{\lambda} \delta \mu \delta \nu  \quad \text{(2.6.27)} \end{align*}

where \bigl[  \nabla_{\mu}, \nabla_{\nu}\bigr] is called the commutator of \nabla_{\mu} and \nabla_{\nu}. The commutator is the general expression for the Riemann curvature tensor.

We should note that the complete form of eq. (2.6.27) is actually:

\delta V = R(\vec{\mu}, \vec{\nu})\vec{V} = \nabla_{\vec{\mu}} \nabla_{\vec{\nu}}\vec{V} - \nabla_{\vec{\nu}} \nabla_{\vec{\mu}}\vec{V} - \nabla_{[\vec{\mu}, \vec{\nu}]}\vec{V}

where R(\vec{\mu}, \vec{\nu})\vec{V} represents the Riemann Curvature Tensor which takes in vectors \vec{\mu} and \vec{\nu} and works on \vec{V} to give us \delta V. The third term on the righthand side is the covariant derivative of \vec{V} in the direction of the Lie bracket [\vec{\mu}, \vec{\nu}]. I won’t say much about this except to say: when we parallel transport our vector around our infinitesimal loop to derive the Riemann tensor, we’re traversing integral curves that are essentially coordinate lines. If we were traversing general integral curves, there would be a chance that the loop wouldn’t close (i.e., that the Lie bracket would be nonzero). The term \nabla_{[\vec{\mu}, \vec{\nu}]}\vec{V} would be needed to handle such a situation. However, all of the coordinate lines that we’ll be dealing with on this webpage form closed loops. (In fact, I can’t even imagine what coordinate axes that don’t form closed loops would look like.) Such closed loops are associated with a Lie bracket of zero. Therefore, the derivative in the direction of the Lie bracket is zero. Thus, for all intents and purposes, in this article, we can ignore the term \nabla_{[\vec{\mu}, \vec{\nu}]}\vec{V}.

To find the components of the Riemann Curvature Tensor, we need to do some calculations, which are taken from the YouTube video by Andrew Dotson, Riemann Curvature Tensor.

We’ll need 2 equations to make these calculations. Recall:

1. Covariant derivative of a vector:

\displaystyle \nabla_{\mu} V^{\lambda} = \partial_{\mu} V^{\lambda} + \Gamma^{\lambda}_{\mu \nu} V^{\nu} \quad \text{(2.6.28)}

2. Covariant derivative of a mixed rank 2 tensor:

\displaystyle \nabla_{\mu} T^{\lambda}_{\nu} = \partial_{\mu} T^{\lambda}_{\nu} + \Gamma^{\lambda}_{\alpha \mu} T^{\alpha}_{\nu} - \Gamma^{\sigma}_{\mu \nu} T^{\lambda}_{\sigma} \quad \text{(2.6.29)}

We begin with the following form of eq. (2.6.27):

\displaystyle \delta V = \nabla_{\mu} \nabla_{\nu} V^{\lambda}  - \nabla_{\nu} \nabla_{\mu} V^{\lambda} \quad \text{(2.6.30)}

Here we’re dropping the \delta \mu \, \delta \nu factor; it just sets the (infinitesimal) size of the loop, carries through every term, and plays no role in what follows.

We start our derivation in earnest by expanding the innermost covariant derivatives, \nabla_{\nu} and \nabla_{\mu}, in eq. (2.6.30):

\displaystyle = \nabla_{\mu} \underbrace{\bigl(  \partial_{\nu} V^{\lambda} + \Gamma^{\lambda}_{\alpha \nu} V^{\alpha} \bigr)}_{T_{\nu}^{\lambda}}  - \nabla_{\nu} \underbrace{\bigl( \partial_{\mu} V^{\lambda} + \Gamma^{\lambda}_{\alpha \mu } V^{\alpha}  \bigr)}_{T_{\mu}^{\lambda}} \quad \text{(2.6.31)}

The covariant derivatives in parentheses in eq. (2.6.31) are, in essence, mixed rank 2 tensors. Thus, we’ll treat them as such. Eq. (2.6.31) becomes:

\displaystyle = \nabla_{\mu} T_{\nu}^{\lambda} - \nabla_{\nu} T_{\mu}^{\lambda} \quad \text{(2.6.32)}

We’ll evaluate the lefthand-most term first. Then, because the 2 parts of eq. (2.6.32) have the same form with \mu and \nu interchanged, we can take whatever we get and swap \mu and \nu to obtain the righthand-most term. So:

\nabla_{\mu} T_{\nu}^{\lambda} = \partial_{\mu} T_{\nu}^{\lambda} + \Gamma^{\lambda}_{\sigma \mu} T_{\nu}^{\sigma} - \Gamma^{\delta}_{\mu \nu} T_{\delta}^{\lambda}

            =\partial_{\mu} \bigl( \partial_{\nu} V^{\lambda} + \Gamma^{\lambda}_{\alpha \nu} V^{\alpha} \bigr) + \Gamma^{\lambda}_{\sigma \mu} \bigl(  \partial_{\nu} V^{\sigma} + \Gamma^{\sigma}_{\alpha \nu} V^{\alpha} \bigr)
            - \Gamma^{\delta}_{\mu \nu} \bigl(  \partial_{\delta} V^{\lambda} + \Gamma^{\lambda}_{\alpha \delta} V^{\alpha} \bigr)

          = \partial_{\mu} \partial_{\nu} V^{\lambda} + V^{\alpha} \partial_{\mu} \Gamma^{\lambda}_{\alpha \nu} + \Gamma^{\lambda}_{\alpha \nu} \partial_{\mu} V^{\alpha}
            + \Gamma^{\lambda}_{\sigma \mu} \partial_{\nu} V^{\sigma} + \Gamma^{\lambda}_{\sigma \mu} \Gamma^{\sigma}_{\alpha \nu} V^{\alpha}
            - \Gamma^{\delta}_{\mu \nu} \partial_{\delta} V^{\lambda} - \Gamma^{\delta}_{\mu \nu} \Gamma^{\lambda}_{\alpha \delta} V^{\alpha} \quad \text{(2.6.33)}

We can quickly obtain the term \nabla_{\nu} T_{\mu}^{\lambda} simply by making the index switch \mu \leftrightarrow \nu. We then subtract this expression from the expression for \nabla_{\mu} T_{\nu}^{\lambda}, recognizing that \partial_a \partial_b = \partial_b \partial_a and \Gamma^{x}_{a b} = \Gamma^{x}_{b a}. That gives us:

\displaystyle \delta V = \cancel{\partial_{\mu} \partial_{\nu} V^{\lambda}} + V^{\alpha} \partial_{\mu} \Gamma^{\lambda}_{\alpha \nu} + \Gamma^{\lambda}_{\alpha \nu} \partial_{\mu} V^{\alpha}
            + \Gamma^{\lambda}_{\sigma \mu} \partial_{\nu} V^{\sigma} + \Gamma^{\lambda}_{\sigma \mu} \Gamma^{\sigma}_{\alpha \nu} V^{\alpha}
            - \bcancel{\Gamma^{\delta}_{\mu \nu} \partial_{\delta} V^{\lambda}} - \bcancel{\Gamma^{\delta}_{\mu \nu} \Gamma^{\lambda}_{\alpha \delta} V^{\alpha}}

            -\cancel{\partial_{\nu} \partial_{\mu} V^{\lambda}} - V^{\alpha} \partial_{\nu} \Gamma^{\lambda}_{\alpha \mu} - \Gamma^{\lambda}_{\alpha \mu} \partial_{\nu} V^{\alpha}
            - \Gamma^{\lambda}_{\sigma \nu} \partial_{\mu} V^{\sigma} - \Gamma^{\lambda}_{\sigma \nu} \Gamma^{\sigma}_{\alpha \mu} V^{\alpha}
            + \bcancel{\Gamma^{\delta}_{\nu \mu} \partial_{\delta} V^{\lambda}} + \bcancel{\Gamma^{\delta}_{\nu \mu} \Gamma^{\lambda}_{\alpha \delta} V^{\alpha}} \quad \text{(2.6.34)}

After the cancellations noted above, we are left with:

\delta V = V^{\alpha} \partial_{\mu} \Gamma^{\lambda}_{\alpha \nu} + \Gamma^{\lambda}_{\alpha \nu} \partial_{\mu} V^{\alpha}
        + \Gamma^{\lambda}_{\sigma \mu} \partial_{\nu} V^{\sigma} + \Gamma^{\lambda}_{\sigma \mu} \Gamma^{\sigma}_{\alpha \nu} V^{\alpha}

        - V^{\alpha} \partial_{\nu} \Gamma^{\lambda}_{\alpha \mu} - \Gamma^{\lambda}_{\alpha \mu} \partial_{\nu} V^{\alpha}
        - \Gamma^{\lambda}_{\sigma \nu} \partial_{\mu} V^{\sigma} - \Gamma^{\lambda}_{\sigma \nu} \Gamma^{\sigma}_{\alpha \mu} V^{\alpha} \quad \text{(2.6.35)}

In the terms \Gamma^{\lambda}_{\sigma \mu} \partial_{\nu} V^{\sigma} and - \Gamma^{\lambda}_{\sigma \nu} \partial_{\mu} V^{\sigma}, we make the dummy index swap \sigma \rightarrow \alpha. We get:

\delta V = V^{\alpha} \partial_{\mu} \Gamma^{\lambda}_{\alpha \nu} + \Gamma^{\lambda}_{\alpha \nu} \partial_{\mu} V^{\alpha}
        + \Gamma^{\lambda}_{\alpha \mu} \partial_{\nu} V^{\alpha} + \Gamma^{\lambda}_{\sigma \mu} \Gamma^{\sigma}_{\alpha \nu} V^{\alpha}

        - V^{\alpha} \partial_{\nu} \Gamma^{\lambda}_{\alpha \mu} - \Gamma^{\lambda}_{\alpha \mu} \partial_{\nu} V^{\alpha}
        - \Gamma^{\lambda}_{\alpha \nu} \partial_{\mu} V^{\alpha} - \Gamma^{\lambda}_{\sigma \nu} \Gamma^{\sigma}_{\alpha \mu} V^{\alpha} \quad \text{(2.6.36)}

That allows us to make further cancellations:

\delta V = V^{\alpha} \partial_{\mu} \Gamma^{\lambda}_{\alpha \nu} + \cancel{\Gamma^{\lambda}_{\alpha \nu} \partial_{\mu} V^{\alpha}}
        + \bcancel{\Gamma^{\lambda}_{\alpha \mu} \partial_{\nu} V^{\alpha}} + \Gamma^{\lambda}_{\sigma \mu} \Gamma^{\sigma}_{\alpha \nu} V^{\alpha}

        - V^{\alpha} \partial_{\nu} \Gamma^{\lambda}_{\alpha \mu} - \bcancel{\Gamma^{\lambda}_{\alpha \mu} \partial_{\nu} V^{\alpha}}
        - \cancel{\Gamma^{\lambda}_{\alpha \nu} \partial_{\mu} V^{\alpha}} - \Gamma^{\lambda}_{\sigma \nu} \Gamma^{\sigma}_{\alpha \mu} V^{\alpha} \quad \text{(2.6.37)}

After factoring out V^{\alpha} and rearranging, we obtain:

\delta V = \underbrace{\Bigl( \partial_{\mu} \Gamma^{\lambda}_{\alpha \nu} - \partial_{\nu} \Gamma^{\lambda}_{\alpha \mu} + \Gamma^{\lambda}_{\sigma \mu} \Gamma^{\sigma}_{\alpha \nu} - \Gamma^{\lambda}_{\sigma \nu} \Gamma^{\sigma}_{\alpha \mu} \Bigr)}_{R^{\lambda}_{\alpha \mu \nu}} V^{\alpha} \quad \text{(2.6.38)}

The expression in parentheses is the Riemann curvature tensor, R^{\lambda}_{\alpha \mu \nu}, which – with the exception of some different index names – is the same result as obtained with method 1.
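
If you’d like to confirm eq. (2.6.38) without grinding through the algebra a second time, here is a sketch in Python with sympy. The unit 2-sphere and the vector field V are arbitrary test choices; the sketch applies eqs. (2.6.28) and (2.6.29) directly and compares the commutator with R^{\lambda}_{\alpha\mu\nu} V^{\alpha}.

    import sympy as sp

    # Unit 2-sphere as an arbitrary test metric
    th, ph = sp.symbols('theta phi')
    coords = [th, ph]
    g = sp.Matrix([[1, 0], [0, sp.sin(th)**2]])
    g_inv = g.inv()
    n = 2

    # Christoffel symbols Gamma^l_{ij}
    G = [[[sp.simplify(sum(sp.Rational(1, 2)*g_inv[l, k]
                           *(sp.diff(g[k, j], coords[i]) + sp.diff(g[k, i], coords[j])
                             - sp.diff(g[i, j], coords[k])) for k in range(n)))
           for j in range(n)] for i in range(n)] for l in range(n)]

    # An arbitrary vector field V^lam to act on
    V = [sp.sin(ph), th**2]

    # nabla_nu V^lam, eq. (2.6.28)
    def nablaV(nu, lam):
        return sp.diff(V[lam], coords[nu]) + sum(G[lam][nu][a]*V[a] for a in range(n))

    # nabla_mu of the mixed tensor T^lam_nu = nabla_nu V^lam, eq. (2.6.29)
    def nabla_nablaV(mu, nu, lam):
        return (sp.diff(nablaV(nu, lam), coords[mu])
                + sum(G[lam][a][mu]*nablaV(nu, a) for a in range(n))
                - sum(G[s][mu][nu]*nablaV(s, lam) for s in range(n)))

    # Riemann components from the bracket in eq. (2.6.38): R^lam_{alpha mu nu}
    def Riem(lam, alpha, mu, nu):
        return (sp.diff(G[lam][alpha][nu], coords[mu]) - sp.diff(G[lam][alpha][mu], coords[nu])
                + sum(G[lam][s][mu]*G[s][alpha][nu] - G[lam][s][nu]*G[s][alpha][mu]
                      for s in range(n)))

    # Check [nabla_mu, nabla_nu] V^lam = R^lam_{alpha mu nu} V^alpha for all index choices
    print(all(sp.simplify(nabla_nablaV(mu, nu, lam) - nabla_nablaV(nu, mu, lam)
                          - sum(Riem(lam, a, mu, nu)*V[a] for a in range(n))) == 0
              for mu in range(n) for nu in range(n) for lam in range(n)))   # expect True

Of course, a check on one example doesn’t replace the derivation above; it’s just a convenient way to catch index mistakes.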

There are other methods of deriving the Riemann curvature tensor, including showing whether or not a geodesic remains parallel, but we won’t discuss them here.

We also won’t prove that the Riemann curvature tensor is, in fact, a tensor. However, for proof of this important fact, I refer you to the discussion in Sean Carroll, Lecture Notes on General Relativity, p. 75.

Instead, topics I’d like to spend some time on include symmetries of the Riemann curvature tensor and the Bianchi identity.

Symmetries

Since the Riemann curvature tensor has 4 indices, each with 4 spacetime components, overall, this tensor has 4 x 4 x 4 x 4 = 256 components. Fortunately, the Riemann tensor has multiple symmetries that reduce the number of components that must be calculated. We’ll list the important ones and prove them. This discussion is based on the YouTube video of eigenchris, Riemann Curvature Tensor Components and Symmetries.

34 Symmetry

From our method 2 derivation of the Riemann Curvature Tensor, we can write:

\delta w = R(\vec{u}, \vec{v})\vec{w} = \nabla_{\vec{u}} \nabla_{\vec{v}}\vec{w} - \nabla_{\vec{v}} \nabla_{\vec{u}}\vec{w} \quad \text{(1)}

where R(\vec{u}, \vec{v})\vec{w} represents the Riemann Curvature Tensor which takes in vectors \vec{u} and \vec{v} and works on \vec{w} to give us \delta w, the difference in \vec{w} after it’s parallel transported around an infinitesimal loop in 2 different directions.

We can expand out the vectors the Riemann tensor acts on as follows:

\vec{u} = u^i e_i
\vec{v} = v^j e_j
\vec{w} = w^k e_k

Since the Riemann Curvature Tensor is linear [proof: eigenchris, Tensor Calculus 22: Riemann Curvature Tensor Geometric Meaning (Holonomy + Geodesic Deviation), at 18:51], we can pull out the components u^i, v^j and w^k giving us:

R(\vec{u}, \vec{v})\vec{w} = u^i  v^j w^k R(\vec{e}_i, \vec{e}_j)\vec{e}_k
                  =u^i  v^j w^k R^m_{kij} \vec{e}_m \quad \text{(2)}

For simplicity, we’ll ignore the vector components and just focus on the basis vectors for our calculations.

I’ll remind you of 2 relationships that will help us in our derivations:

Metric Compatibility: \nabla_{\vec{z}}(\vec{u} \cdot \vec{v}) = (\nabla_{\vec{z}} \vec{u}) \cdot  \vec{v} + \vec{u} \cdot (\nabla_{\vec{z}} \vec{v}) \quad \text{(3)}

Torsion Free: \nabla_{\vec{e}_i} \vec{e}_j = \nabla_{\vec{e}_j} \vec{e}_i \quad \Rightarrow \quad \Gamma^k_{ij}  = \Gamma^k_{ji} \quad \text{(4)}

Now compare the following 2 expressions:

R(\vec{u}, \vec{v}) = \nabla_{\vec{u}} \nabla_{\vec{v}} - \nabla_{\vec{v}} \nabla_{\vec{u}} \quad \text{(5)}

and

R(\vec{v}, \vec{u}) = \nabla_{\vec{v}} \nabla_{\vec{u}} - \nabla_{\vec{u}} \nabla_{\vec{v}} \quad \text{(6)}

Notice that the terms on the righthand side of eq. (6) are just the negative of the righthand terms in eq. (5). Therefore:

R(\vec{v}, \vec{u}) = -R(\vec{u}, \vec{v}) \quad \text{(7)}

Now, we’ll replace \vec{u} by \vec{e}_i and \vec{v} by \vec{e}_j. Looking back at eq. (2), we see that the basis vectors with indices i and j correspond to the last 2 lower indices of the Riemann Curvature Tensor. So:

R(\vec{e}_j, \vec{e}_i) = - R(\vec{e}_i, \vec{e}_j)\quad \text{(8)}

and

R^m_{kji} = - R^m_{kij} \quad \text{(9)}

If we think of the Riemann Curvature Tensor like this:

R^1_{234}

Then we’d say that the Riemann Curvature Tensor is antisymmetric in its 34 indices. Therefore, eq. (9) is referred to as the 34 symmetry of the Riemann Curvature Tensor.

Bianchi Identity

This next relationship we’re going to derive is not so much a symmetry as it is an identity but it will be used to prove other symmetries.

Consider the expression:

    \[ R(\vec{e}_a, \vec{e}_b)\vec{e}_c +  R(\vec{e}_c, \vec{e}_a)\vec{e}_b + R(\vec{e}_b, \vec{e}_c)\vec{e}_a \quad \text{(1)} \]

where we’ve just cycled through the lower indices of the basis vectors. We can expand this as follows:

    \begin{align*}  &= \nabla_{\vec{e}_a} \nabla_{\vec{e}_b} \vec{e}_c - \nabla_{\vec{e}_b} \nabla_{\vec{e}_a} \vec{e}_c \\ &+ \nabla_{\vec{e}_c} \nabla_{\vec{e}_a} \vec{e}_b - \nabla_{\vec{e}_a} \nabla_{\vec{e}_c} \vec{e}_b \\ &+ \nabla_{\vec{e}_b} \nabla_{\vec{e}_c} \vec{e}_a - \nabla_{\vec{e}_c} \nabla_{\vec{e}_b} \vec{e}_a  \quad \text{(2)}  \end{align*}

Recall our torsion free rule:

\nabla_{\vec{e}_i} \vec{e}_j = \nabla_{\vec{e}_j} \vec{e}_i

We can, therefore, make index swaps that will allow cancellations. For example, we can make the following change to the fourth term in eq. (2):

\nabla_{\vec{e}_a} \nabla_{\vec{e}_c} \vec{e}_b \quad \rightarrow \quad \nabla_{\vec{e}_a} \nabla_{\vec{e}_b} \vec{e}_c \quad \text{(3)}

Thus, the first and fourth terms in eq. (2) cancel. We can make the further index swaps and cancellations:

    \begin{align*}  &= \cancel{\nabla_{\vec{e}_a} \nabla_{\vec{e}_b} \vec{e}_c} - \bcancel{\nabla_{\vec{e}_b} \nabla_{\vec{e}_a} \vec{e}_c} \\ &+ \cancel{\nabla_{\vec{e}_c} \nabla_{\vec{e}_a} \vec{e}_b} - \cancel{\nabla_{\vec{e}_a} \nabla_{\vec{e}_b} \vec{e}_c} \\ &+ \bcancel{\nabla_{\vec{e}_b} \nabla_{\vec{e}_a} \vec{e}_c} - \cancel{\nabla_{\vec{e}_c} \nabla_{\vec{e}_a} \vec{e}_b}\\ &= 0  \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \text{(4)}  \end{align*}

So we have:

R(\vec{e}_a, \vec{e}_b)\vec{e}_c +  R(\vec{e}_c, \vec{e}_a)\vec{e}_b + R(\vec{e}_b, \vec{e}_c)\vec{e}_a  = 0 \quad \text{(5)}

We can write eq. (5) in terms of Riemann tensor components:

R^d_{cab}\vec{e}_d + R^d_{bca}\vec{e}_d + R^d_{abc}\vec{e}_d = 0 \quad \text{(6)}

This is the first Bianchi Identity, which applies under torsion-free conditions (the only conditions we’ll be considering here).

12 Symmetry

When the Riemann Curvature Tensor acts on a dot product, it’s similar to using the product rule in calculus. Proof of this can be found at eigenchris, Tensor Calculus 22: Riemann Curvature Tensor Geometric Meaning (Holonomy + Geodesic Deviation), at 10:32. It looks like this:

R(\vec{e}_a, \vec{e}_b)(\vec{r} \cdot \vec{s}) = R(\vec{e}_a, \vec{e}_b)\vec{r} \cdot \vec{s} + \vec{r} \cdot R(\vec{e}_a, \vec{e}_b) \vec{s} \quad \text{(1)}

Now,

R(\vec{e}_a, \vec{e}_b)(\vec{r} \cdot \vec{s}) = \nabla_{\vec{e}_a} \nabla_{\vec{e}_b}(\vec{r} \cdot \vec{s}) - \nabla_{\vec{e}_b} \nabla_{\vec{e}_a}(\vec{r} \cdot \vec{s}) \quad \text{(2)}

Since \vec{r} \cdot \vec{s} is a scalar, we can use partial derivatives rather than covariant derivatives:

R(\vec{e}_a, \vec{e}_b)(\vec{r} \cdot \vec{s}) = \partial_a \partial_b(\vec{r} \cdot \vec{s}) - \partial_b \partial_a (\vec{r} \cdot \vec{s}) \quad \text{(3)}

But \partial_a \partial_b = \partial_b \partial_a, so:

R(\vec{e}_a, \vec{e}_b)(\vec{r} \cdot \vec{s}) = \partial_a \partial_b(\vec{r} \cdot \vec{s}) - \partial_b \partial_a (\vec{r} \cdot \vec{s}) = 0 \quad \text{(4)}

Thus, we can write:

    \begin{align*}    0 &= R(\vec{e}_a, \vec{e}_b)(\vec{r} \cdot \vec{s})\\ &= R(\vec{e}_a, \vec{e}_b)(\vec{e}_c \cdot \vec{e}_d)\\ &= \Bigl(R(\vec{e}_a, \vec{e}_b)\vec{e}_c \Bigr) \cdot \vec{e}_d + \vec{e}_c \cdot \Bigl( R(\vec{e}_a, \vec{e}_b)\vec{e}_d \Bigr)   \\ &= \Bigl( R^i_{cab} \vec{e}_i\Bigr)\cdot \vec{e}_d + \vec{e}_c \cdot \Bigl( R^i_{dab} \vec{e}_i  \Bigr) \\ &= \vec{e}_i \cdot \vec{e}_d \Bigl( R^i_{cab}\Bigr) + \vec{e}_c \cdot \vec{e}_i \Bigl( R^i_{dab} \Bigr)\\ &= g_{id}\Bigl( R^i_{cab}\Bigr) + g_{ci}\Bigl( R^i_{dab}\Bigr)  \quad \text{(5)} \end{align*}

The metrics in eq. (5) – g_{id} and g_{ci} – lower the i-indices on the Riemann tensors R^i_{cab} and R^i_{dab}, respectively. We’re left with:

    \[  0 = R_{dcab} + R_{cdab}\]

Which means that:

    \[ R_{dcab} = - R_{cdab} \quad \text{(6)}  \]

Eq. (6) is the so-called Riemann tensor 12 symmetry.

“Flip” Symmetry

The last symmetry we’ll discuss is what we’ll call the flip symmetry. We can express it as:

    \[ R_{abcd}  =  R_{cdab}  \]

Here’s the proof:

We start with the first Bianchi identity in lower index form:

    \[  R_{abcd} +  R_{adbc} +  R_{acdb}  = 0 \quad \text{(1)} \]

Subtract the second and third terms on the lefthand side of eq. (1) from both sides:

    \[  R_{abcd} = - R_{adbc} -  R_{acdb} \quad \text{(2)} \]

We apply 12 symmetry to both terms on the righthand side of eq. (2). (Remember: the 12 symmetry operation changes the sign.) We get:

    \[  R_{abcd} =  R_{dabc}  + R_{cadb} \quad \text{(3)} \]

Now rewrite equivalent expressions for the terms on the righthand side of eq. (3) using the Bianchi identity, like we did with eq. (2):

    \[  R_{abcd} =  -R_{dcab}  - R_{dbca} - R_{cbad}  - R_{cdba} \quad \text{(4)} \]

Next:

  • Use the 12 symmetry relationship for the first, second and third terms on the righthand side of eq. (4).
  • Use the 34 symmetry relationship for the fourth term on the righthand side of eq. (4). (Remember the sign change.)
  • Add the first and fourth terms on the righthand side of eq. (4) (which will be the same after the above 2 symmetry operations).

That gives us:

    \[  R_{abcd} =  2R_{cdab}  + R_{bdca} + R_{bcad}  \quad \text{(5)} \]

Note that the last 2 terms on the righthand side of eq. (5) constitute 2 terms of the Bianchi identity, so we can write: 

    \[  R_{bdca} + R_{bcad}  = -R_{badc} \]

We plug this into eq. (5) to obtain:

    \[  R_{abcd} =  2R_{cdab}  -R_{badc}  \quad \text{(6)}  \]

We can apply the 12 and 34 symmetry relationships to the second term on the righthand side of eq. (6). This flips the sign twice so, overall, the sign of this term doesn’t change. This yields:

    \[  R_{abcd} =  2R_{cdab}  -R_{abcd}  \quad \text{(7)}  \]

Because the Riemann tensor in the second term on the righthand side of eq. (7) is the same as the term on the lefthand side, we can add this to both sides giving us:

    \[  2R_{abcd} =  2R_{cdab}  \quad \text{(8)}  \]

Divide eq. (8) through by 2. We get:

    \[  R_{abcd} =  R_{cdab}  \quad \text{(9)}  \]

Eq. (9) is what we were trying to derive. It says that we can swap the 12 indices of the Riemann tensor with the 34 indices and the resulting 2 Riemann Curvature Tensors will be equal.
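
All of these symmetries, plus the first Bianchi identity, can be spot-checked symbolically. Here’s a sketch in Python with sympy; the unit 2-sphere is again just an arbitrary test metric. It lowers the first index of the Riemann tensor built from eq. (2.6.20) and then tests each relationship component by component.

    import sympy as sp
    from itertools import product

    th, ph = sp.symbols('theta phi')
    coords = [th, ph]
    g = sp.Matrix([[1, 0], [0, sp.sin(th)**2]])   # unit 2-sphere, an arbitrary test metric
    g_inv = g.inv()
    n = 2

    # Christoffel symbols and Riemann components R^a_{m c b} as in eq. (2.6.20)
    G = [[[sp.simplify(sum(sp.Rational(1, 2)*g_inv[m, l]
                           *(sp.diff(g[l, b], coords[a]) + sp.diff(g[l, a], coords[b])
                             - sp.diff(g[a, b], coords[l])) for l in range(n)))
           for b in range(n)] for a in range(n)] for m in range(n)]
    R_up = [[[[sp.simplify(sp.diff(G[a][m][b], coords[c]) - sp.diff(G[a][m][c], coords[b])
                           + sum(G[v][m][b]*G[a][v][c] - G[v][m][c]*G[a][v][b] for v in range(n)))
               for b in range(n)] for c in range(n)] for m in range(n)] for a in range(n)]

    # Lower the first index: R_{d m c b} = g_{d a} R^a_{m c b}
    R = [[[[sp.simplify(sum(g[d, a]*R_up[a][m][c][b] for a in range(n)))
            for b in range(n)] for c in range(n)] for m in range(n)] for d in range(n)]

    idx = list(product(range(n), repeat=4))
    ok_34      = all(sp.simplify(R[a][b][c][d] + R[a][b][d][c]) == 0 for a, b, c, d in idx)
    ok_12      = all(sp.simplify(R[a][b][c][d] + R[b][a][c][d]) == 0 for a, b, c, d in idx)
    ok_flip    = all(sp.simplify(R[a][b][c][d] - R[c][d][a][b]) == 0 for a, b, c, d in idx)
    ok_bianchi = all(sp.simplify(R[a][b][c][d] + R[a][c][d][b] + R[a][d][b][c]) == 0
                     for a, b, c, d in idx)
    print(ok_34, ok_12, ok_flip, ok_bianchi)   # expect: True True True True

A single example metric is not a proof, of course – the derivations above are what establish these symmetries in general – but it’s a handy consistency check.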

Ricci Tensor

This discussion is drawn from the YouTube videos by eigenchris, Tensor Calculus 22 and Tensor Calculus 24.

In order to understand the Ricci tensor, we first have to develop 3 additional concepts:

  • Geodesic deviation
  • Sectional curvature
  • Ricci curvature

Geodesic Deviation

Geodesic deviation for 1) a flat space (a) and 2) a curved space i.e., a spherical surface (b)
Figure 1

Figure 1a shows a vector field consisting of geodesics (yellow lines). We’ll call this vector field \vec{v}. We can represent the separation between these geodesics by the vectors shown in red. They also form a vector field which we’ll call \vec{s}. We can think of the 3 separation vectors as moving, in time, along a geodesic. The separation vectors increase in length linearly with time. Thus, we can write:

    \[ \nabla_{\vec{v}} \vec{s} = c \quad \text{(1)} \]

where c is a constant. If we take the covariant derivative of both sides of eq. (1) with respect to \vec{v} once more (giving the second covariant derivative of \vec{s}), we get:

    \[ \nabla_{\vec{v}} \nabla_{\vec{v}} \vec{s} = 0 \quad \text{(2)} \]

since the derivative of a constant is zero.

Contrast this with the situation on a curved space like the surface of a sphere (figure 1b). If we start at the top of the sphere, the separation vector is small. As we move downward, the separation vector gets bigger and bigger until it reaches a maximum at the equator. Then it gets smaller and smaller again. Obviously, because the magnitude of the separation vector is changing, the first derivative of the separation field with respect to \vec{v} is nonzero. However, unlike the case in flat space, the rate at which the separation vectors change as we move along \vec{v} is not constant. Thus, the second derivative is also nonzero:

    \[ \nabla_{\vec{v}} \nabla_{\vec{v}} \vec{s} \neq 0 \quad \text{(3)} \]

In fact, the geodesic deviation gives us an alternative to the Riemann tensor for determining whether a space is flat or curved. We can quantify this as follows:

Vector field v and separation vector field s
Figure 2

Consider figure 2 which shows: 1) a vector field made of geodesics (field lines depicted in orange; tangent vectors, \vec{v}, depicted in brown) and 2) another vector field created by separation vectors (field lines depicted as red dotted lines; tangent vectors, \vec{s}, represented by solid red arrows).

We know, by the definition of a geodesic, that:

    \[ \nabla_{\vec{v}} \vec{v} = 0 \quad \text{(4)}  \]

Taking the covariant derivative with respect to \vec{s} on both sides, we get:

    \[ \nabla_{\vec{s}} \nabla_{\vec{v}} \vec{v} = 0 \quad \text{(5)}   \]

We add zero to the left side of eq. (5) in the form -\nabla_{\vec{v}} \nabla_{\vec{s}} \vec{v} + \nabla_{\vec{v}} \nabla_{\vec{s}} \vec{v}. That gives us:

    \begin{align*}    \nabla_{\vec{s}} \nabla_{\vec{v}} \vec{v} -\nabla_{\vec{v}} \nabla_{\vec{s}} \vec{v} + \nabla_{\vec{v}} \nabla_{\vec{s}} \vec{v} &= 0 \\ R(\vec{s},\vec{v})\vec{v} + \nabla_{\vec{v}} \nabla_{\vec{s}} \vec{v} &= 0  \quad \text{(6)}    \end{align*}

The \vec{v} and \vec{s} field lines form closed rectangles. Therefore, the Lie bracket [\vec{v}, \vec{s}] = \nabla_{\vec{v}} \vec{s} - \nabla_{\vec{s}} \vec{v} = 0 which implies that \nabla_{\vec{v}} \vec{s} = \nabla_{\vec{s}} \vec{v} (i.e., we can swap \vec{v} and \vec{s} in the covariant derivative expression). Applying this to eq. (6) yields:

    \begin{align*}    \nabla_{\vec{v}} \nabla_{\vec{s}} \vec{v} &= -R(\vec{s},\vec{v})\vec{v} \\ \nabla_{\vec{v}} \nabla_{\vec{v}} \vec{s} &= -R(\vec{s},\vec{v})\vec{v} \quad \text{(7)}  \end{align*}

\nabla_{\vec{v}} \nabla_{\vec{v}} \vec{s} is called the geodesic deviation. Depending on the sign conventions used for the Riemann tensor, you’ll also see it written as:

\nabla_{\vec{v}} \nabla_{\vec{v}} \vec{s} = -R(\vec{s},\vec{v})\vec{v} = +R(\vec{v},\vec{s})\vec{v}

or

\nabla_{\vec{v}} \nabla_{\vec{v}} \vec{s} = +R(\vec{s},\vec{v})\vec{v} = -R(\vec{v},\vec{s})\vec{v}

We’ll use the form given in eq. (7). And we can use the geodesic deviation to determine if a space is curved or flat.

Geodesic deviation in flat space
Figure 3

In figure 3a, we have 2 geodesics in flat space (shown in orange) forming a vector field. Another vector field is formed by the separation vectors between the geodesics (shown in red). In figure 3a, the separation vectors are not changing as we go upward (think of moving upward as moving forward in time). Thus, the derivative of \vec{s} is zero.

On the other hand, in figure 3b – again in flat space – the separation vectors are increasing with time but at a constant rate (i.e., the first derivative of \vec{s} is a constant). However, the second derivative is zero (i.e., the derivative of a constant is zero). That makes the geodesic deviation zero.

Contrast this with geodesics in curved space. Figure 4 shows such geodesics.

Geodesic deviation in curved space
Figure 4

In figure 4a, the geodesics converge as one might see on a spherical surface. The separation vectors, \vec{s}, are decreasing, and they decrease ever faster as we move along the geodesics, meaning that the second derivative is negative (depicted by the blue arrow). If we take the inner product of the geodesic deviation with \vec{s}, it will be negative because \vec{s} (red arrows) and the geodesic deviation (blue arrow) point in opposite directions. Because the geodesic deviation and the Riemann tensor term have opposite signs [eq. (7)], the inner product of the Riemann tensor term and \vec{s} is positive.

Figure 4b shows a curved space where geodesics diverge. In this case, the separation vectors are increasing, so the first derivative is positive. The rate at which they’re increasing grows with increasing time (i.e., the upward direction). Thus, the geodesic deviation (i.e., second derivative) is positive. It points in the same direction as \vec{s}, so the inner product of the geodesic deviation and \vec{s} is positive. Because the geodesic deviation and the Riemann tensor term have opposite signs [eq. (7)], the inner product of the Riemann tensor term and \vec{s} is negative.

Sectional Curvature

The inner product of the Riemann tensor with the separation vector tells us whether geodesics in a curved space are converging or diverging. However, this quantity depends on the magnitudes of the vectors \vec{v} and \vec{s}. We’d like to use a quantity that’s normalized. We can do this by dividing by the square of the area of the parallelogram formed by \vec{v} and \vec{s}.

Consider the parallelogram shown in figure 5.

Area of a parallelogram
Figure 5

We know that the area of this parallelogram is given by:

    \begin{align*}    A &= \underbrace{\left \| \vec{u} \right \|}_{\text{base}} \underbrace{\left \| \vec{w} \right \| \sin \theta}_{\text{height}} \\ &= \left \| \vec{u} \times \vec{w} \right \| \end{align*}

Therefore,

    \begin{align*}    A^2 &= \left \| \vec{u} \right \|^2 \left \| \vec{w} \right \|^2 (\sin \theta)^2 \\ &= \left \| \vec{u} \right \|^2 \left \| \vec{w} \right \|^2 \bigl(1 - (\cos \theta)^2 \bigr) \\ &= \left \| \vec{u} \right \|^2 \left \| \vec{w} \right \|^2 - \bigl( \left \| \vec{u} \right \| \left \| \vec{w} \right \| \cos \theta  \bigr)^2 \\ &= \bigl( \vec{u} \cdot \vec{u} \bigr) \bigl( \vec{w} \cdot \vec{w} \bigr) - \bigl( \vec{u} \cdot \vec{w} \bigr)^2 \quad \text{(1)} \end{align*}

To normalize the inner product of the Riemann Curvature Tensor and the separation vector, we divide that entity by eq. (1). What we get is the sectional curvature:

    \[ K(\vec{s}, \vec{v}) = \frac{\bigl[ R(\vec{s}, \vec{v}) \vec{v} \bigr] \cdot \vec{s}}{\bigl( \vec{s} \cdot \vec{s} \bigr) \bigl( \vec{v} \cdot \vec{v} \bigr) - \bigl( \vec{s} \cdot \vec{v} \bigr)^2} \quad \text{(2)}  \]

eigenchris, at 12:35 of the YouTube video Tensor Calculus 24: Ricci Tensor Geometric Meaning (Sectional Curvature), proves that this relationship holds for any vectors in the plane of \vec{v} and \vec{s}.

Ricci Curvature

From the sectional curvature, we can define the Ricci curvature. We do this by taking an orthonormal basis \{  \vec{e}_1, \vec{e}_2,  \dots, \vec{e}_n \} and selecting a direction vector \vec{v} = \vec{e}_n. We then define the Ricci curvature in the direction of \vec{v} as the sum of the sectional curvatures over every other basis vector direction aside from \vec{v}. Thus, we can think of the Ricci curvature in the direction of \vec{v} as (n-1) times the average of the sectional curvatures of the planes containing \vec{v}:

    \[  Ric(\vec{v}, \vec{v}) =  \sum_{i=1}^{n-1} K(\vec{e}_i, \vec{v}) = \sum_i \frac{\bigl[ R(\vec{e}_i, \vec{v}) \vec{v} \bigr] \cdot \vec{e}_i}{\bigl( \vec{e}_i \cdot \vec{e}_i \bigr) \bigl( \vec{v} \cdot \vec{v} \bigr) - \bigl( \vec{e}_i \cdot \vec{v} \bigr)^2} \quad \text{(1)}  \]

The geometric interpretation of the Ricci curvature is that it reflects how the volume of an object changes as it flows along geodesics.

Ricci curvature = 0 in flat space
Figure 1

Figure 1 shows a sphere in flat space. The sectional curvatures in both the \vec{e}_1 and \vec{e}_2 directions are zero. Thus, the Ricci curvature, which is the sum of these sectional curvatures, is also zero. In such a setting, the sphere’s volume remains unchanged at all points along the geodesic.

Ricci curvature > 0
Figure 2

Figure 2 shows a sphere in curved space where geodesics converge. The sectional curvatures in both the \vec{e}_1 and \vec{e}_2 directions are positive. Thus, the Ricci curvature is also positive. In this setting, the sphere’s volume decreases from the lefthand portion to the righthand portion of the geodesics.

Ricci curvature < 0
Figure 3

Figure 3 shows a sphere in curved space where geodesics diverge. The sectional curvatures in both the \vec{e}_1 and \vec{e}_2 directions are negative. Thus, the Ricci curvature is negative. In this setting, the sphere’s volume increases from the lefthand portion to the righthand portion of the geodesics.

Ricci curvature = 0 in curved space
Figure 4

The situation depicted in figure 4 is a bit more complicated. The separation vectors in the \vec{e}_1 direction diverge while the separation vectors in the \vec{e}_2 direction converge. Accordingly, the sectional curvature associated with the \vec{e}_1 direction is negative but the sectional curvature in the \vec{e}_2 direction is positive. In this case, the sectional curvatures in each direction counteract each other, making the Ricci curvature zero. However, this, obviously, is not flat space. Here, the sphere elongates in the \vec{e}_1 direction and shortens in the \vec{e}_2 direction, but its volume remains the same. Thus, the Ricci curvature can tell us about changes in volume as we move along geodesics but not about changes in shape.

Note that figures 1-4 contain an error. Since we only have 2 curvature directions, I should have represented the “volume” element changing with geodesics, not as a sphere but as a circular disc – a 2D object. The general arguments given in the explanations, however, still apply.

Our next task is to see how the Ricci curvature relates to the Ricci tensor. Starting with the definition of the Ricci curvature:

    \begin{align*}   Ric(\vec{v}, \vec{v}) &= \sum_i \frac{\bigl[ R(\vec{e}_i, \vec{v}) \vec{v} \bigr] \cdot \vec{e}_i}{\bigl( \vec{e}_i \cdot \vec{e}_i \bigr) \bigl( \vec{v} \cdot \vec{v} \bigr) - \bigl( \vec{e}_i \cdot \vec{v} \bigr)^2} \\ &= \sum_i \frac{\bigl[ R(\vec{e}_i, \vec{v}) \vec{v} \bigr] \cdot \vec{e}_i}{\bigl( \underbrace{\cancel{\vec{e}_i \cdot \vec{e}_i}}_{1} \bigr) \bigl( \underbrace{ \cancel{\vec{v} \cdot \vec{v}}}_{1} \bigr) - \bigl( \underbrace{\cancel{\vec{e}_i \cdot \vec{v}}}_{0} \bigr)^2} \\ &= \bigl[ R(\vec{e}_i, \vec{v}) \vec{v} \bigr] \cdot \vec{e}_i \\ &= \bigl[ R(\vec{e}_i, v^j \vec{e}_j) v^k \vec{e}_k \bigr] \cdot \vec{e}_i \\ &= v^j v^k \bigl[ R(\vec{e}_i, \vec{e}_j) \vec{e}_k \bigr] \cdot \vec{e}_i \\ &= v^j v^k R^i_{kij} \\ &= v^j v^k R_{kj} \end{align*}

R_{ij} is the Ricci tensor. When the Ricci tensor acts twice on the same vector, we get the Ricci curvature. It can be shown that this is true for any orthonormal basis.

We can generalize the above concepts to any arbitrary basis (not just orthonormal bases) by making use of the volume element derivative. Proof of this can be found at the eigenchris YouTube video Tensor Calculus 25 – Geometric Meaning Ricci Tensor/Scalar (Volume Form).

Note that, up to sign, the only nonzero contraction of the Riemann curvature tensor is the contraction of its first and third indices, the contraction that results in the Ricci tensor (contracting the upper index with the first lower index gives zero by antisymmetry, and contracting it with the last index just gives minus the Ricci tensor).

Ricci Scalar

This discussion is largely taken from the eigenchris YouTube video Tensor Calculus 25 – Geometric Meaning Ricci Tensor/Scalar (Volume Form).

The Ricci scalar is defined as the contraction of the Ricci tensor with an upper and a lower index:

    \[ R = R^i_i = g^{ij} R_{ij} \quad \text{(1)} \]
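To tie the sectional curvature, Ricci tensor, and Ricci scalar together, here is a small symbolic sketch (Python with sympy; the variable names and the sign convention for R^{\rho}_{\sigma \mu \nu} are my own choices, not taken from these notes). Starting only from the metric of a 2-sphere of radius a, it computes the Christoffel symbols, the Riemann tensor, the sectional curvature (which should come out to 1/a^2), the Ricci tensor, and the Ricci scalar (which should come out to 2/a^2).

    import sympy as sp

    # 2-sphere of radius a with coordinates (theta, phi); metric g = diag(a^2, a^2 sin^2 theta)
    a, th, ph = sp.symbols('a theta phi', positive=True)
    x = [th, ph]
    g = sp.diag(a**2, a**2 * sp.sin(th)**2)
    ginv = g.inv()
    n = 2

    # Christoffel symbols: Gamma^i_{jk} = (1/2) g^{il} (d_j g_{lk} + d_k g_{lj} - d_l g_{jk})
    Gam = [[[sp.simplify(sum(sp.Rational(1, 2) * ginv[i, l] *
             (sp.diff(g[l, k], x[j]) + sp.diff(g[l, j], x[k]) - sp.diff(g[j, k], x[l]))
             for l in range(n)))
             for k in range(n)] for j in range(n)] for i in range(n)]

    # Riemann tensor: R^r_{s m q} = d_m Gamma^r_{q s} - d_q Gamma^r_{m s}
    #                              + Gamma^r_{m l} Gamma^l_{q s} - Gamma^r_{q l} Gamma^l_{m s}
    def riemann(r, s, m, q):
        expr = sp.diff(Gam[r][q][s], x[m]) - sp.diff(Gam[r][m][s], x[q])
        expr += sum(Gam[r][m][l] * Gam[l][q][s] - Gam[r][q][l] * Gam[l][m][s] for l in range(n))
        return sp.simplify(expr)

    # sectional curvature of the (theta, phi) plane: K = R_{theta phi theta phi} / (g_tt g_pp - g_tp^2)
    R_lower = sp.simplify(sum(g[0, r] * riemann(r, 1, 0, 1) for r in range(n)))
    K = sp.simplify(R_lower / (g[0, 0] * g[1, 1] - g[0, 1]**2))
    print('sectional curvature K =', K)            # 1/a**2

    # Ricci tensor R_{s q} = R^r_{s r q} and Ricci scalar R = g^{s q} R_{s q}
    Ric = sp.Matrix(n, n, lambda s, q: sp.simplify(sum(riemann(r, s, r, q) for r in range(n))))
    Rscalar = sp.simplify(sum(ginv[s, q] * Ric[s, q] for s in range(n) for q in range(n)))
    print('Ricci tensor =', Ric)                   # Matrix([[1, 0], [0, sin(theta)**2]])
    print('Ricci scalar =', Rscalar)               # 2/a**2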

The geometric meaning of the Ricci scalar can be thought of as representing how the volume of a ball in curved space deviates from its volume in flat space. This is illustrated in figure 1.

Ricci Scalar: Area of circle of same circumference is greater in curved space than in flat space
Figure 1

Figure 1a shows a circle with area = A in flat space. A circle of the same circumference in a space with positive curvature has a greater area (figure 1b). In this case, the curved space is a 2D spherical surface and the area that we’re talking about is like a cap at the north pole of the sphere. This idea can be expanded to higher dimensions but visualization in such higher dimensions is more difficult.

On the other hand, for a circle of radius r in flat space, the area is \pi r^2. However, in a positively curved space like the spherical surface shown in figure 1b, the area of a circle of the same radius is less than \pi r^2. (Here, the radius is not the radius from the center of the circular base to the circumference of the circle, but rather, the arc length on the sphere’s cap measured from the north pole to the circumference of the circular base.)

Things are exactly opposite for a negatively curved space like a saddle-shaped 2D space.

These concepts are summarized in the figures below, taken from the eigenchris video cited above.

Ricci scalar: area in flat vs. positively curved space
Figure 2
Ricci scalar: area in flat vs. negatively curved space
Figure 3

Properties of the Ricci Tensor/Scalar

There are some important properties of the Ricci tensor and scalar that we should discuss. I’ll prove a couple of them, but I’ll just state the second Bianchi identity since its proof is tedious. However, proof of this identity can be found at 8:43 of the YouTube video by eigenchris, Tensor Calculus 26 – Ricci Tensor/Scalar Properties.

Ricci Tensor Symmetry

    \[ R_{ij} = R_{ji}  \quad \text{(1)} \]

Second Bianchi Identity

    \[ R^d_{cab;i} +  R^d_{cia;b} +  R^d_{cbi;a} = 0  \quad \text{(2)} \]

Contracted Bianchi Identity

    \[ \nabla_n \bigl( R^{mn} -\frac12 Rg^{mn}  \bigr) = \nabla_n G^{mn} = 0   \quad \text{(3)}  \]

Einstein Field Equation

At the center of general relativity is Einstein’s field equation. It’s what we’ve been working toward and solutions to this equation 1) supply the answer to questions Newtonian gravity cannot answer and 2) provide a description of some of the most interesting phenomena in all of physics.

I’ll present 2 derivations of this equation here. The first follows, in a general way, how Einstein originally arrived at the equation. The second is a derivation from first principles, a derivation based on the principle of stationary action.

Derivation 1

Intuition

We’ve already talked about the geodesic equation, which says that the covariant acceleration along a geodesic is zero:

    \[\frac{d^2 x^{\sigma}}{d\tau^2} + \Gamma^{\sigma}_{\mu \nu} \frac{dx^{\mu}}{d\tau} \frac{dx^{\nu}}{d\tau} =0 \quad\text{(1)}\]

so

    \[\frac{d^2 x^{\sigma}}{d\tau^2} = -\Gamma^{\sigma}_{\mu \nu} \frac{dx^{\mu}}{d\tau} \frac{dx^{\nu}}{d\tau} \quad \text{(2)} \]

In non-relativistic physics, Newton’s second law is:

    \[  F=ma = mg  \quad \Rightarrow \quad a = g =\frac{F}{m} \quad \text{(3)}\]

where g is the acceleration due to gravity of the earth.

Let’s assume our mass is a unit mass (i.e. m=1). Then:

    \[ a = F \quad \text{(4)}  \]

We know that \displaystyle a = \frac{d^2 x}{dt^2}  \quad \text{(5)}

Thus:

    \[\frac{d^2 x}{dt^2}  = F  \quad \text{(6)}  \]

If we compare the Newtonian equation, eq. (6), with the geodesic equation, eq. (2), we can see a similarity. On the left side of both equations are acceleration terms. On the righthand side of the geodesic equation is a term containing the Christoffel symbol that we can consider analogous to a force.

Any relativistic theory of gravity should reduce to Newtonian gravitation in the appropriate limit. Thus, let’s see what happens to eq. (2) under the non-relativistic conditions in which Newtonian gravitation is applicable. These conditions are:

  1. Speeds are slow (i.e., v \ll c)
  2. Gravity is weak (i.e., we can consider it a perturbation of flat space)
  3. The gravitational field is static (i.e., unchanging with time)

Condition 1 means that velocities in the spatial directions are negligible compared to velocity in the time direction, which is the speed of light. Thus,

    \[ \frac{dx^i}{d\tau} \ll \frac{dt}{d\tau}  \quad \text{(7)}   \]

Therefore:

    \[\frac{d^2 x^{\sigma}}{d\tau^2} + \Gamma^{\sigma}_{\mu \nu} \frac{dx^{\mu}}{d\tau} \frac{dx^{\nu}}{d\tau} =0 \quad \Rightarrow \quad \frac{d^2 x^{\sigma}}{d\tau^2} + \Gamma^{\sigma}_{00} \left(\frac{dt}{d\tau}\right)^2  =0 \quad\text{(8)} \]

Next, it’s known from special relativity that:

    \[ \frac{dt}{d\tau} = \gamma = \frac{1}{\sqrt{1-\frac{v^2}{c^2}}}   \quad\text{(9)} \]

But \displaystyle v \ll c. So, \displaystyle \frac{v^2}{c^2} \approx 0 and therefore:

    \[  \frac{dt}{d\tau} \approx 1 \quad\text{(10)}  \]

This makes eq. (8):

    \[ \frac{d^2 x^{\sigma}}{d\tau^2} + \Gamma^{\sigma}_{00}   =0 \quad\text{(11)}   \]

\displaystyle \frac{dt}{d\tau} \approx 1 also implies that d\tau \approx dt. Thus:

    \[ \frac{d^2 x^{\sigma}}{dt^2} + \Gamma^{\sigma}_{00}   =0 \quad\text{(11)}   \]

If we let \sigma = 0, then \displaystyle \frac{d x^{0}}{dt} = \frac{d t}{dt} =1. Then \displaystyle \frac{d^2 x^{0}}{dt^2} = \frac{d^2 t}{dt^2} =\frac{d(1)}{dt} = 0. Thus:

    \begin{align*} \frac{d^2 x^0}{dt^2} + \Gamma^{0}_{00}   &=0 \\ 0 +  \Gamma^0_{00} &= 0\end{align*}

which implies that

    \[ \Gamma^0_{00} = 0  \]

This means that the only nonzero Christoffel symbols are those where \sigma = 1,\,2 \text{ or } 3 (i.e., \Gamma^i_{00} \neq 0).

So eq. (11) becomes:

    \[ \frac{d^2 x^i}{dt^2} + \Gamma^i_{00}   =0 \quad\text{(12)}   \]

Expressing \Gamma^i_{00} in terms of the metric, we have:

    \[ \Gamma^i_{00} = \frac12 g^{i \lambda}\bigl(  \partial_0 g_{ \lambda 0} +  \partial_0 g_{0 \lambda} - \partial_{\lambda} g_{00}\bigr)  \quad\text{(13)}  \]

Because the field is static, the metric isn’t changing which means that the time derivatives of the metric are zero. Therefore:

    \[ \Gamma^i_{00} = -\frac12 g^{i \lambda}  \partial_{\lambda} g_{00}   \quad\text{(14)} \]

Now the second condition under which we’re operating (i.e., the gravitational field is weak) means that we can decompose the metric into the Minkowski metric plus a small perturbation:

    \[ g_{\mu \nu} = \eta_{\mu \nu} +  h_{\mu \nu}   \quad\text{(15)} \]

We need to find an expression for the inverse metric that we can use in eq. (14). We can do this as follows:

    \begin{align*} \bigl( \eta^{\alpha \mu} - h^{\alpha \mu} \bigr) \bigl( \eta_{\mu \nu} + h_{\mu \nu} \bigr) &= g^{\alpha \mu}g_{\mu \nu} = \delta^{\alpha}_{\nu} \quad\text{(16)} \\ \underbrace{\eta^{\alpha \mu} \eta_{\mu \nu}}_{\delta^{\alpha}_{\nu}} + \cancel{\eta^{\alpha \mu} h_{\mu \nu}} - \cancel{h^{\alpha \mu} \eta_{\mu \nu}} - \underbrace{h^{\alpha \mu} h_{\mu \nu}}_{h^2 \approx 0} &= \delta^{\alpha}_{\nu} \end{align*}

To see why \eta^{\alpha \mu} h_{\mu \nu} - h^{\alpha \mu} \eta_{\mu \nu} = 0, click here.

Looking at eq. (16), we can identify \eta^{\alpha \mu} - h^{\alpha \mu} with the inverse metric g^{\alpha \mu}.

Renaming indices and recognizing the symmetry of the Minkowski metric and metric tensor, we get:

    \[  g^{\mu \nu} = \eta^{ \mu \nu} - h^{ \mu \nu}   \quad\text{(17)}  \]

We then have:

    \begin{align*} g_{00} &=  \eta_{00} + h_{00} \\  &= 1 + h_{00}   \quad\text{(18)}  \end{align*}

So:

    \begin{align*} \partial_{\lambda}g_{00} &=  \partial_{\lambda}\eta_{00} + \partial_{\lambda}h_{00} \\  &= 0 + \partial_{\lambda} h_{00}  \\ &=  \partial_{\lambda} h_{00}   \quad\text{(18)}  \end{align*}

Putting the results of eq. (18) into eq. (14), we get:

    \begin{align*} \Gamma^i_{00} &= -\frac12 g^{i \lambda}  \partial_{\lambda} h_{00} \\ &= -\frac12 \bigl( \eta^{i \lambda} - h^{i \lambda} \bigr) \partial_{\lambda} h_{00} \\ &= -\frac12 \eta^{i \lambda} \partial_{\lambda} h_{00} + \underbrace{\frac12 h^{i \lambda} \partial_{\lambda} h_{00}}_{\approx 0} \\ &= -\frac12 \eta^{i \lambda} \partial_{\lambda} h_{00} \quad\text{(19)} \end{align*}

The spatial components of the inverse Minkowski metric are \eta^{i \lambda} = -\delta^{i \lambda} (the off-diagonal terms vanish and the diagonal spatial terms equal -1), so eq. (19) becomes:

    \[  \Gamma^i_{00} = \frac12  \partial_i h_{00} \quad\text{(20)} \]

Substituting eq. (20) into eq. (12) gives us:

    \begin{align*} \frac{d^2x^i}{dt^2} &= -\Gamma^i_{00} \\ &= -\frac12 \partial_i h_{00} \quad \text{(21)} \end{align*}

What this means is that, in Newtonian gravitational theory, it is the gravitational field that tells an object how to move. In general relativity, however, the thing that tells an object how to move is the curvature of spacetime (which in eq. (21) is represented by the small perturbation of the metric tensor, h_{00}).

We can continue this line of thought by noting that, in Newtonian gravity, for a unit mass:

    \[ \frac{d^2x^i}{dt^2} = a =  F = -\nabla \phi \quad \text{(22)}  \]

But we just saw that, in the Newtonian limit of general relativity:

    \[ a = -\frac12 \partial_i h_{00}  \quad \text{(23)}  \]

Therefore:

    \begin{align*} -\partial_i \phi  &= -\frac12 \partial_i h_{00}  \\ \partial_i \phi  &= \frac12 \partial_i h_{00}  \\ \phi &= \frac12 h_{00}  + C  \quad \text{(24)}  \end{align*}

where C is a constant of integration.
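Eq. (24) is written in units where c = 1; restoring the factors of c gives h_{00} = 2\phi/c^2. To get a feel for how small this perturbation really is, here is a minimal numeric sketch (Python; the Earth mass and radius figures are values I’m assuming for illustration, not part of these notes) evaluating h_{00} at the Earth’s surface. Its tiny size is what justifies treating gravity as a weak perturbation of flat spacetime.

    G = 6.674e-11        # Newton's gravitational constant, m^3 kg^-1 s^-2
    c = 2.998e8          # speed of light, m/s
    M_earth = 5.972e24   # kg (assumed value)
    R_earth = 6.371e6    # m  (assumed value)

    phi = -G * M_earth / R_earth   # Newtonian potential at the Earth's surface
    h00 = 2 * phi / c**2           # metric perturbation h_00 = 2 phi / c^2
    print(phi)                     # roughly -6.3e7 m^2/s^2
    print(h00)                     # roughly -1.4e-9, i.e. |h00| << 1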

It’s well-known that Poisson’s equation is the field equation of Newtonian gravity:

    \[ \nabla^2 \phi = 4 \pi G \rho (x,t)  \quad \text{(25)}  \]

where

  • \nabla^2 is the Laplacian operator, an operator that takes second spatial derivatives of whatever it operates on
  • \phi is the gravitational potential
  • \rho(x,t) is the mass density, which is a function of space and time

Derivation and solution of eq. (25) can be found here.

Substituting the value for \phi obtained in eq. (24) into eq. (25) yields:

    \begin{align*} \nabla^2 \bigl( \frac12 h_{00}\bigr) &= 4 \pi G \rho (x,t) \\  \nabla^2  h_{00} &= 8 \pi G \rho (x,t) \quad \text{(26)}  \end{align*}

In eq. (26), we begin to see the beginnings of what a field equation in general relativity might look like:

    \[ \text{spacetime curvature } \propto \text{ mass density}  \quad \text{(27)} \]

There’s a problem with eq. (26) though; it’s not a tensor equation. We know the laws of physics need to hold in all coordinate systems. The only way to make this happen is to express these laws with tensor equations. Our goal for the rest of this derivation is to convert eq. (26) into a tensor equation and to generalize this tensor equation so it applies to all conditions, not just to conditions where Newtonian mechanics are valid.

The first dilemma we encounter is that mass density, itself, is not tensorial. Why? Because the mass density differs in differing reference frames. Figure 1 illustrates such a situation.

Effect of length contraction on mass density
Figure 1

In the diagram, an observer A, in the rest frame of a box of particles, sees a mass density of \displaystyle \frac{m}{V} = \rho where m is mass, V is volume and \rho is the density. However, an observer B, moving to the left with respect to the box, because of length contraction, sees the volume of the box decrease by one-half. Therefore, he sees the mass density of the box as 2 times that noted by observer A: \displaystyle \frac{m}{0.5\,V} = 2\rho. But it’s the same box; this should not be.

The answer to this conundrum is to somehow make the mass density part of a tensor. In special relativity, the way non-tensorial quantities were made into rank 1 tensors that transform properly under Lorentz transformations was to make them into 4-vectors. Furthermore, it’s worth noting that mass is but one component of the 4-momentum:

    \[  P^{\mu} = \left(  \frac{E}{c}, p_x,  p_y,  p_z  \right) \quad \text{(1)} \]

where E=mc^2. It makes sense, then, that use of this fact might play a role in developing a tensor for mass density. Indeed, this is the case. The tensor that’s found on the right side of the Einstein field equation, the energy-momentum tensor, describes the flow of each component of the 4-momentum in each direction of spacetime. In what follows, we’ll offer a more detailed account of the energy-momentum tensor.

Energy-Momentum Tensor

Energy-Momentum Tensor
Figure 1 (Maschen, based on File:StressEnergyTensor.svg created by Bamse, CC0, via Wikimedia Commons)

Figure 1 depicts the energy-momentum tensor. The remainder of this section will be devoted to explaining what it means. The derivation that follows is taken largely from the YouTube video by dXoverdteqprogress, Energy-momentum tensor.

Suppose we have a cluster of particles that don’t interact with each other but which are all moving with the same velocity, \displaystyle \frac{dx}{dt}.

The number of particles per unit volume is given by:

    \[ n_0 = \frac{N}{V}  \quad \text{(1)} \]

where n_0 is the number density and N is the number of particles in volume V.

We saw above that, because of length contraction, observer A, who measures the number density of our particle cluster in his own rest frame as \displaystyle n_0 = \frac{N}{V}, will measure the cluster’s number density, when it’s moving relative to him, as:

    \[ n = \frac{N}{V}\gamma = n_0 \gamma  \quad \text{(2)} \]

We define current as the flow of “stuff” across a unit surface per unit time:

    \[  J^i = \rho v^i  \quad \text{(3)} \]

where J^i is the ith component of current, \rho is the density of the “stuff,” and v^i is the ith component of the velocity at which the “stuff” is flowing.

The mass density \epsilon is given by the particle number density times the mass, m, of each particle (which is assumed to be the same for all the particles):

    \[ \epsilon = n_0 m  \quad \text{(4)}  \]

When special relativity is considered, eq. (4) becomes:

    \[ \epsilon_{\text{Rel}}  = n_0 m \gamma^2  \quad \text{(5)}   \]

Notice that we have 2 factors of \gamma in eq. (5): one due to length contraction and one due to relativistic increase in mass in the moving frame.

Now, the time component of the 4-velocity equals \gamma:

    \[ u^0 = \frac{dt}{d\tau} = \gamma  \quad \text{(6)}  \]

Substituting eq. (6) into eq. (5), we get:

    \[ \epsilon_{\text{Rel}}  = n_0 m u^0 u^0  \quad \text{(7)}  \]

From now on, since we are working with relativistic physics, we’ll drop the “Rel” subscript on \epsilon and simply take \epsilon_{\text{Rel}} = \epsilon.

So, combining the current equation, eq. (3), with eq. (7), we find that the flow of energy per unit time across a unit surface perpendicular to the ith component of space (which is also referred to as energy flux, \sigma) is:

    \begin{align*} \sigma^i &= \epsilon v^i  \\ &= n_0 m u^0 u^0 v^i \quad \text{(8)}  \end{align*}

We note the following relationships involving the 4-velocity:

    \[  u^0 = \frac{dt}{d\tau} = \gamma   \]

    \[ u^i = \frac{dx^i}{d\tau} =  \frac{dx^i}{dt} \frac{dt}{d\tau} = \gamma v^i = u^0 v^i  \quad \text{(9)}  \]

Substituting eq. (9) into eq. (8), we have:

    \[ \sigma^i =  n_0 m u^0 u^i   \quad \text{(10)}  \]

We know that energy cannot be created or destroyed. Thus, the energy density in a given volume must decrease if net energy is leaving the volume through the “surfaces” of the volume, or increase if net energy enters the volume through its “surfaces.” The equation that describes this concept is called the continuity equation:

    \[ \frac{\partial \epsilon}{\partial t} + \nabla \cdot \vec{\sigma} = 0   \quad \text{(11)}  \]

which can be written as:

    \[ \frac{\partial \epsilon^{\mu}}{\partial x^{\mu}} = 0, \quad \epsilon^{\mu} = (\epsilon, \sigma^1, \sigma^2, \sigma^3 )  \quad \text{(12)}  \]

In eq. (12), \epsilon is the flow of energy in the time direction (i.e., across a surface perpendicular to the ct-axis). This means that there is no velocity in the spatial directions (i.e., the clump of energy per unit volume we’re talking about is at rest); which means that \epsilon is just the energy density. The \sigma^i terms, on the other hand, represent the flow of energy in the i-direction (i.e., across a surface perpendicular to the i-axis).
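The continuity equation is just local bookkeeping: energy that leaves one small region must appear in a neighboring one, so the total is conserved. Below is a minimal 1-D numerical sketch (Python/NumPy, a flux-form update with periodic boundaries; the grid size, flow speed, and pulse shape are arbitrary choices of mine) showing that when \partial \epsilon / \partial t + \partial \sigma / \partial x = 0 is enforced cell by cell, the total energy on the grid stays fixed even as the density profile moves.

    import numpy as np

    nx = 200                                   # number of grid cells
    dx, dt, v = 1.0 / nx, 0.5 / nx, 1.0        # cell size, time step, flow speed (CFL number 0.5)
    x = (np.arange(nx) + 0.5) * dx
    eps = np.exp(-((x - 0.3) / 0.05) ** 2)     # initial energy density: a Gaussian blob

    total_before = eps.sum() * dx
    for _ in range(400):
        sigma = v * eps                        # energy flux sigma = v * eps
        # flux-form (upwind) update of d(eps)/dt + d(sigma)/dx = 0 with periodic boundaries
        eps = eps - (dt / dx) * (sigma - np.roll(sigma, 1))

    print(total_before, eps.sum() * dx)        # the two totals agree to machine precision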

We can make a similar analysis for each of the components of momentum.

Momentum density, P^i, is given by:

    \[  P^i = n_0 m v^i \gamma^2 = n_0 m \underbrace{v^i u^0}_{u^i} u^0  = n_0 m u^i u^0  \quad \text{(13)}   \]

where mv^i is the momentum of the particles in the i-direction. The ith component of momentum density, P^i, traversing a surface perpendicular to the jth spatial direction, then, is:

    \[ P^{ij }= P^i v^j = n_0 m u^i \underbrace{u^0 v^j}_{u^j} = n_0 m u^i u^j \quad \text{(14)}  \]

We can write a continuity equation for momentum, just as we did for energy:

    \[ \frac{\partial P^i}{\partial t} + \partial_j P^{ij} = 0   \quad \text{(15)}  \]

which can be written as:

    \[ \frac{\partial P^{i \mu}}{\partial x^{\mu}} = 0, \quad P^{i \mu}= (P^i, P^{i1}, P^{i2}, P^{i3} )  \quad \text{(16)}  \]

To summarize, we have:

Entity | Physical Meaning | Equation
Energy (E) density | Flow of E in time (x^0) direction | \epsilon = T^{00} = n_0 m u^0 u^0
Energy (E) flux | Flow of E in spatial (x^i) directions | \sigma^i = T^{0i} = n_0 m u^0 u^i
Momentum (P) density | Flow of P in time (x^0) direction | P^i = T^{i0} = n_0 m u^i u^0
Momentum (P) flux | Flow of P in spatial (x^j) directions | P^{ij} = T^{ij} = n_0 m u^i u^j

At this point, we can put the elements that we derived into matrix form.

Energy momentum tensor matrix with terms like in text
Figure 2
Energy momentum tensor matrix with equations like in text
Figure 3
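The entire matrix in figures 2 and 3 can be generated as a single outer product, which also makes its symmetry obvious. Here is a small numeric sketch (Python/NumPy, in units where c = 1; the number density, particle mass, and drift velocity are made-up values of mine) that builds T^{\mu \nu} = n_0 m u^{\mu} u^{\nu} for a cloud of identical particles drifting with 3-velocity \vec{v}.

    import numpy as np

    n0, m = 1.0e3, 2.0e-3                        # rest-frame number density and particle mass (made up; c = 1)
    v = np.array([0.3, 0.1, 0.0])                # drift 3-velocity as a fraction of c
    gamma = 1.0 / np.sqrt(1.0 - v @ v)
    u = np.concatenate(([gamma], gamma * v))     # 4-velocity: (u^0, u^i) = (gamma, gamma v^i)

    T = n0 * m * np.outer(u, u)                  # T^{mu nu} = n0 m u^mu u^nu
    print(T)
    print(np.allclose(T, T.T))                   # True: the tensor is symmetric
    print(T[0, 0], n0 * m * gamma**2)            # T^{00} is the energy density n0 m gamma^2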
Symmetry

The energy-momentum tensor is symmetric. In what follows, I’ll provide some intuition as to why this is so. This discussion is patterned after

Schutz, Bernard. A First Course in General Relativity. Second ed., Cambridge University Press, 2009, pp. 97-8.

T^{0i} = T^{i0} symmetry

The energy flux terms and momentum density terms are symmetric. We can see this in the following way:

  • In special relativity, energy is equivalent to mass (in units where c = 1): E = m
  • Energy density is energy per unit volume: \displaystyle \frac{E}{Vol}
  • Energy flux is given by energy density times velocity: \displaystyle \frac{E}{Vol} \cdot v^i = \frac{m}{Vol} \cdot v^i where v^i is the velocity in direction i
  • But mass times velocity in direction i equals momentum in direction i: P^i = m \cdot v^i
  • Therefore,

    \[ \displaystyle \underbrace{\frac{E}{Vol} \cdot v^i }_{\text{Energy flux}}\,\,\,=\,\,\, \frac{m}{Vol} \cdot v^i \,\,\,= \underbrace{\frac{P^i}{Vol}}_{\substack{\text{Momentum } \\ \text{density}}} \quad \text{(1)}  \]

T^{ij} = T^{ji} symmetry

Understanding why the stress terms of the energy-momentum tensor are symmetric is more difficult.

Cube-shaped volume element for proof Tij = Tji.
Figure 1

Consider the cube-shaped element with sides of (very small) length \ell, as shown in figure 1. We will consider forces on the sides labeled 1, 2, 3 and 4. The force by the element on an adjacent element due to flow across side 1 is given by \mathbf{F}^i_1 = T^{ix}\ell^2 where \ell^2 is the area of the surface. Likewise, the force by the element through side 2 on an adjacent element is \mathbf{F}^i_2 = T^{iy}\ell^2. Here, i runs from 1-3 because \mathbf{F} is not necessarily perpendicular to the surface under consideration. Because the force associated with flow through side 1 is in the +x-direction, flow (and thus force) through side 3 (the side opposite side 1) must be in the opposite direction. Thus, \mathbf{F}^i_3 = -\mathbf{F}^i_1 = -T^{ix}\ell^2. Similarly, \mathbf{F}^i_4 = -\mathbf{F}^i_2 = -T^{iy}\ell^2.

Since our element exerts forces on its surroundings, its surroundings exert equal and opposite forces on our element. Thus, the forces on each face of our cubic element are -\mathbf{F}^i_1, -\mathbf{F}^i_2, \mathbf{F}^i_3 and \mathbf{F}^i_4.

Next, let’s calculate the torque on the cube generated by each of these forces. The torque due to -\mathbf{F}^i_1 is \tau_1 = -(\mathbf{r} \times \mathbf{F}_1)^z = -xF^y_1 = \displaystyle -\frac12 \ell T^{yx} \ell^2. By similar reasoning, the torques related to forces on other faces of the cube are:

\displaystyle \tau_3 = -\frac12 \ell T^{yx} \ell^2
\displaystyle \tau_2 = \frac12 \ell T^{xy} \ell^2
\displaystyle \tau_4 = \frac12 \ell T^{xy} \ell^2

We add all of these to get the total torque, \tau_{\text{tot}}:

    \begin{align*}  \tau_{\text{tot}} &= \underbrace{-\frac12 \ell T^{yx} \ell^2}_{\tau_1} + \underbrace{(-)\frac12 \ell T^{yx} \ell^2}_{\tau_3} + \underbrace{\frac12 \ell T^{xy} \ell^2}_{\tau_2} + \underbrace{\frac12 \ell T^{xy} \ell^2}_{\tau_4} \\ &= \Bigl( -\frac12 T^{yx}  -\frac12 T^{yx} + \frac12 T^{xy} + \frac12 T^{xy} \Bigr) \ell^3 \\ &= \ell^3 \Bigl(  T^{xy} - T^{yx} \Bigr) \quad \text{(2)}  \end{align*}

The moment of inertia, I, is proportional to mr^2 where m is mass and r is the moment arm – the distance from the axis we’re rotating about to the point where the force giving rise to the torque acts – so we have:

    \[ I = \alpha m \ell^2 = \alpha \frac{m}{\ell^3} \cdot \ell^3 \cdot \ell^2 = \alpha \rho \ell^5 \quad \text{(3)}  \]

The angular acceleration, \ddot \theta, created by the total torque is:

    \[ \ddot \theta = \frac{\tau_{\text{tot}} }{I} = \frac{ \ell^3 \Bigl( T^{xy} - T^{yx} \Bigr)}{\alpha \rho \ell^5} = \frac{T^{xy} - T^{yx}}{\alpha \rho \ell^2} \quad \text{(4)} \]

As \ell goes to zero, \ddot \theta goes to infinity (unless the numerator vanishes). But this makes no physical sense; we don’t see tiny fluid elements spinning up with infinite angular acceleration inside fluids. The only way this can be avoided is to say:

    \[ T^{xy} - T^{yx} = 0 \quad \Rightarrow \quad T^{xy} = T^{yx} \quad \text{(5)} \]

Therefore, the stress components of the energy-momentum tensor are symmetric.

Note that, although I understand the overall plan of the argument, I have a hard time visualizing where the author is putting his lever arms and where the forces are acting to arrive at the outcome at which he arrives. But then, I was never very good at torque problems. If anyone can enlighten me, please leave a comment via the contact page.

Types

There are a couple of important forms of the energy-momentum tensor that are commonly encountered: dust and a perfect fluid. We’ll discuss each in turn.

Dust

Dust is a collection of non-interacting particles at rest with respect to each other (although the collection might be moving with some velocity, called the drift velocity). The frame in which the particles are at rest is referred to as the momentarily comoving rest frame (MCRF). Being at rest means the particles have no spatial velocity, which means U^0 = c,\,U^1 = 0,\,U^2 = 0,\,U^3 = 0. Given this, looking at figure 3, we can see that the only term that’s nonzero is n_0 m u^0 u^0. This is the T^{00} = \rho c^2 component of the energy-momentum tensor. So the energy-momentum tensor of dust is:

    \[\begin{pmatrix} \rho c^2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\  0 & 0 & 0 & 0 \\  0 & 0 & 0 & 0 \\ \end{pmatrix}\]

Perfect Fluid

A perfect fluid is one in which particles within a fluid element move randomly in all directions. This creates pressure, but there is no heat conduction (i.e., no energy flux and no momentum density; T^{i0} = T^{0i} = 0) and no viscosity (meaning there are no shear stress forces, and thus, no nonzero T^{ij} terms with i \neq j). There is, however, energy density and pressure (i.e., the T^{ij} terms with i = j). Without providing a detailed derivation, the energy-momentum tensor for a perfect fluid is:

    \[\begin{pmatrix} \rho c^2 & 0 & 0 & 0 \\ 0 & P & 0 & 0 \\  0 & 0 & P & 0 \\  0 & 0 & 0 & P \\ \end{pmatrix}\]

An equation that summarizes the energy-momentum tensor of a perfect fluid is:

    \[ T^{\mu \nu} = \bigl( \rho + \frac{P}{c^2}  \bigr) U^{\mu} U^{\nu} - P \eta^{\mu \nu} \quad \text{(1)} \]

To see this, consider first the case of dust. We know that, in the MCRF,

    \[ U^{\mu} = (c,0,0,0) \quad \text{(2)}  \]

Where there are no super-dense objects around, P \ll \rho c^2. Under these conditions, P \approx 0 and the energy-momentum tensor reduces to that of dust:

    \begin{align*} T^{\mu \nu} &= \bigl( \rho + \frac{0}{c^2}  \bigr) U^{\mu} U^{\nu} - (0) \eta^{\mu \nu} \\ &= \rho U^{\mu} U^{\nu} \\ &= \rho U^0 U^0 \\ &= \rho c^2 \quad \text{(3)} \end{align*}

For a perfect fluid, in the MCRF:

U^0 = c
U^1 = U^2 = U^3 = 0
\displaystyle \eta^{\mu \nu} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\  0 & 0 & -1 & 0 \\  0 & 0 & 0 & -1 \\ \end{pmatrix}

If we used the mostly pluses form of the Minkowski metric, then we’d have to replace -P \eta^{\mu \nu} with +P \eta^{\mu \nu}.

    \begin{align*} T^{00} &= \rho U^0 U^0 + \frac{P}{c^2}U^0 U^0 - P \eta^{00} \\ &= \rho c^2 + \frac{P}{\cancel{c^2}}\cancel{c^2} - P(1) \\ &= \rho c^2 + P - P \\ &= \rho c^2 \end{align*}

Since, in the MCRF, U^i = 0 and \eta^{0i} = \eta^{i0} = 0,

    \[T^{0i} = T^{i0} = 0\]

Finally, for diagonal spatial terms, because U^i = 0 and \eta^{ii} = -1, we have:

    \begin{align*}  T^{ii} &= \bigl( \rho + \frac{P}{c^2} \bigr) U^i U^i - P \eta^{i i} \\ &= \bigl( \rho + \frac{P}{c^2} \bigr) (0)(0) - P(-1) \\ &= 0 + P(1) \\ &= P \end{align*}

From this, we can see that eq. (1) does, indeed, specify the energy-momentum tensor for a perfect fluid.
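Eq. (1) can also be checked numerically. The sketch below (Python/NumPy; the density and pressure values are arbitrary numbers of mine) builds (\rho + P/c^2) U^{\mu} U^{\nu} - P \eta^{\mu \nu} in the MCRF, with U^{\mu} = (c, 0, 0, 0) and the mostly-minuses \eta^{\mu \nu}, and confirms that the result is the diagonal matrix diag(\rho c^2, P, P, P) given above.

    import numpy as np

    c = 2.998e8
    rho, P = 1.0, 2.0e5                          # arbitrary rest-frame density (kg/m^3) and pressure (Pa)
    U = np.array([c, 0.0, 0.0, 0.0])             # 4-velocity in the MCRF
    eta = np.diag([1.0, -1.0, -1.0, -1.0])       # Minkowski metric, mostly-minuses convention

    T = (rho + P / c**2) * np.outer(U, U) - P * eta
    print(np.allclose(T, np.diag([rho * c**2, P, P, P])))   # True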

Derivation Proper

With our intuition and the energy-momentum tensor at hand, we can now recapitulate – more or less – the thought process that permitted Einstein to arrive at his field equation for general relativity.

We’d like to find a tensor form of Poisson’s equation:

    \[ \nabla^2 \phi = 4 \pi G \rho (x,t)  \quad \text{(25)}  \]

As we saw, a major problem with this equation is that \rho(x,t) is not a tensor. Length contraction makes its value different in different reference frames. We solved this by finding the energy-momentum tensor which we can substitute for \rho.

Our task, now, is to find a tensor expression that can replace the lefthand side of eq. (25). We saw that, in lieu of the gravitational field \phi, we’re looking for a term that describes spacetime curvature, one that contains second derivatives of the metric (\nabla^2 \phi \longrightarrow \nabla^2 g).

The Riemann curvature tensor is the ultimate depiction of spacetime curvature but it’s a rank (1,3) tensor. The energy-momentum tensor is a rank 2 tensor. Thus, we need a rank 2 tensor for the left side of our tensorial equation of gravitation.

The metric tensor is a rank 2 tensor but, as noted previously, this, alone, does not characterize curvature adequately.

The Ricci tensor is a rank 2 tensor that describes spacetime curvature. However, there’s a problem. The divergence of the energy-momentum tensor is zero:

    \[  \nabla_{\mu}T^{\mu \nu} = 0  \quad \text{(1)}  \]

This is just an expression of the law of conservation of energy. For any given volume in space, the change in density of energy equals the sum of the flow of energy in and out of the volume; the same is true with momentum density. The equations that describe this behavior are called continuity equations. I didn’t make a big deal about it at the time, but each row of the energy-momentum tensor can be made into a continuity equation. Recall equations 11 and 15:

    \[ \frac{\partial \epsilon}{\partial t} + \nabla \cdot \vec{\sigma} = 0   \quad \text{(11)}  \]

and

    \[ \frac{\partial P^i}{\partial t} + \partial_j P^{ij} = 0   \quad \text{(15)}  \]

Since the covariant derivative of each row of the energy-momentum tensor equals zero, the covariant derivative of the entire energy-momentum tensor is zero.

From the contracted Bianchi identity (see the Properties of the Ricci Tensor/Scalar section above), we can extract some good news and some bad news.

The bad news: the covariant derivative of the Ricci tensor is NOT zero; instead, the contracted Bianchi identity gives:

    \[ \nabla_{\mu} R^{\mu \nu}  = \frac12 g^{\mu \nu} \nabla_{\mu} R   \quad \text{(2)}  \]

But the good news: By rearranging eq. (2), we can come up with an entity whose covariant derivative IS zero:

    \[ \nabla_{\mu} \bigl( R^{\nu \mu} -\frac12 Rg^{\nu \mu}\bigr) = 0   \quad \text{(3)}  \]

The term in parentheses is referred to as the Einstein tensor, G^{\mu \nu}:

    \[  G^{\mu \nu} =  R^{\nu \mu} -\frac12 Rg^{\nu \mu}  \quad \text{(4)}  \]

Another term, \Lambda g^{\nu \mu}, can be added to (or subtracted from) the Einstein tensor and its covariant derivative will still be zero. This is because:

1) \Lambda is a constant   –   the so-called cosmological constant   –   and the derivative of a constant is zero

and

2) The covariant derivative of the metric tensor is zero (proof here)

Therefore, so far, our candidate for the Einstein field equation is:

    \[ G^{\mu \nu} - \Lambda  g^{\nu \mu} = \kappa T^{\nu \mu}  \quad \text{(5)}  \]

The last thing we need to do is find \kappa. We’ll do this by assuming that eq. (5) reduces to Poisson’s equation under conditions where Newton’s gravitational theory works (low velocity, weak gravity and a time-independent metric).

The consequences of these conditions are:

  • u \ll c
  • \gamma \approx 1
  • \tau \approx t

Because the spatial components are minuscule compared to c, c dominates the 4-velocity expression; the spatial components are approximately zero:

\vec{U} \approx \begin{bmatrix} c \\ 0 \\ 0 \\ 0 \end{bmatrix}  \quad \text{(6)}

And in the weak gravity limit:

    \[ g_{\nu \mu} \approx \eta_{\nu \mu} + h_{\nu \mu}  \quad \text{(7)}   \]

where

\eta_{\nu \mu} = diag(+1,-1,-1,-1)
\left\lVert h_{\nu \mu} \right\rVert \ll 1

This means that:

    \begin{align*}T^{\mu \alpha} g_{\alpha \nu} &= T^{\mu \alpha}\bigl( \eta_{\alpha \nu}  + h_{\alpha \nu} \bigr) \\ &= T^{\mu \alpha} \eta_{\alpha \nu} +  T^{\mu \alpha}  \underbrace{h_{\alpha \nu}}_{\approx 0} \\ &\approx T^{\mu \alpha} \eta_{\alpha \nu}  + 0   \quad \text{(8)}  \end{align*}

but

    \begin{align*} \partial_{\sigma}  g_{\mu \nu} &=  \partial_{\sigma}  \big( \eta  _{\mu \nu} + h_{\mu \nu} \bigr) \\ &=  \underbrace{\partial_{\sigma}  \eta  _{\mu \nu}}_{0}  + \partial_{\sigma} h_{\mu \nu}  \\ &\approx   0 + \partial_{\sigma} h_{\mu \nu}   \quad \text{(9)}   \end{align*}

Also, time-independence of the metric means that:

    \[ \partial_t   g_{\mu \nu} =  \partial_0   g_{\mu \nu} = 0  \,\, \Rightarrow \,\, \partial_0 \Gamma^{\sigma}_{\mu \nu} = 0  \quad \text{(10)}  \]

Finally, we’ll assume \Lambda \approx 0 so we’ll ignore this term.

Under Newtonian conditions, the only component of the energy-momentum tensor that matters is the T^{00} component, and so the energy-momentum tensor that we’ll be working with is that of dust (i.e., T^{00} = \rho c^2 and all other T^{\mu \nu} = 0).

The Einstein field equations are almost always expressed with covariant indices so we’ll do the same, recognizing that we can always raise and lower indices by multiplication with the metric tensor. For example:

    \[ T_{\mu \nu} = T^{\alpha \beta} g_{\alpha \mu} g_{\beta \nu}  \quad \text{(11)}  \]

Under the Newtonian conditions with which we’ve been working, we can, in a similar fashion, lower the indices of T^{00} and find that T_{00} = \rho c^2.

Applying all these facts to the Einstein field equation candidate with lower indices that we’ll henceforth be using, we find:

    \begin{align*}  R_{\mu \nu} - \frac12 R g_{\mu \nu} &= \kappa T_{\mu \nu} \\ R_{ij}  -\frac12 R \eta_{ij} &= \underbrace{\kappa T_{ij}}_{0} \\  R_{ij}  -\frac12 R (-\delta_{ij}) &= 0 \\  R_{ij} & = -\frac12 R \delta_{ij}   \quad \text{(12)}   \end{align*}

Now

    \begin{align*}  R =  R^{\mu}_{\mu} = g^{\mu \nu} R_{\mu \nu} \approx \eta^{\mu \nu} R_{\mu \nu}  \\ R = R_{00} - R_{11} -R_{22} -R_{33} \end{align*}

We just showed that R_{ij} = -\frac12 R \delta_{ij}. Therefore:

    \begin{align*} R &=  R_{00} - 3\left(- \frac12 R  \right)  \\ R &=  R_{00} + \frac32 R \\ R_{00} &= -\frac12 R  \quad \text{(13)}  \end{align*}

After these calculations:

  • The R_{\mu \nu} term of our Einstein field equation candidate looks [from eq. (12) and (13)] like:
    • \displaystyle -\frac12 R \begin{bmatrix} 1 & 0 & 0 & 0 \\  0 & 1 & 0 & 0 \\  0 & 0 & 1 & 0 \\  0 & 0 & 0 & 1 \end{bmatrix}
  • The \displaystyle -\frac12 R g_{\mu \nu} term (given that g_{\mu \nu} \approx \eta_{\mu \nu}) looks like:
    • \displaystyle -\frac12 R \begin{bmatrix} 1 & 0 & 0 & 0 \\  0 & -1 & 0 & 0 \\  0 & 0 & -1 & 0 \\  0 & 0 & 0 & -1 \end{bmatrix}
  • And the \kappa T_{\mu \nu} term looks like:
    • \displaystyle \kappa \begin{bmatrix} \rho c^2 & 0 & 0 & 0 \\  0 & 0 & 0 & 0 \\  0 & 0 & 0 & 0 \\  0 & 0 & 0 & 0 \end{bmatrix}

Then our Einstein field equation candidate:

    \[ R_{\mu \nu} - \frac12 R g_{\mu \nu} = \kappa T_{\mu \nu}  \]

currently has the form:

\displaystyle -\frac12 R \begin{bmatrix} 1 & 0 & 0 & 0 \\  0 & 1 & 0 & 0 \\  0 & 0 & 1 & 0 \\  0 & 0 & 0 & 1 \end{bmatrix}  \displaystyle -\frac12 R \begin{bmatrix} 1 & 0 & 0 & 0 \\  0 & -1 & 0 & 0 \\  0 & 0 & -1 & 0 \\  0 & 0 & 0 & -1 \end{bmatrix} = \displaystyle \kappa \begin{bmatrix} \rho c^2 & 0 & 0 & 0 \\  0 & 0 & 0 & 0 \\  0 & 0 & 0 & 0 \\  0 & 0 & 0 & 0 \end{bmatrix}

This simplifies to:

    \[  -R = \kappa \rho c^2 \quad \text{(14)} \]

But we know from eq. (13) that R = -2R_{00}. Thus, we have:

    \[ 2R_{00} = \kappa \rho c^2 \quad \text{(15)}   \]

The next step in our derivation is to solve for R_{00} under the low-velocity, weak gravity conditions we’ve been considering.

The solution begins in a fashion similar to what we did in the section on intuition for the development of the Einstein field equations:

    \begin{align*}   m \vec{a} &= \vec{F}_g \\ m \vec{a} &= m \vec{g} \\ \vec{a} &= \vec{g} = -\nabla \phi \\ \frac{d^2 x^i}{dt^2} &= -\frac{\partial \phi}{\partial x^i} \\ \frac{d^2 x^i}{dt^2} + \frac{\partial \phi}{\partial x^i} &= 0 \quad \text{(16)}  \end{align*}

We note that eq. (16) bears a close resemblance to the geodesic equation of general relativity:

    \[ \frac{d^2 x^{\sigma}}{d\lambda^2} + \Gamma^{\sigma}_{\mu \nu} \frac{dx^\mu}{d\lambda}  \frac{dx^\nu}{d\lambda} = 0 \quad \text{(17)}  \]

and we expect that the geodesic equation should reduce to eq. (16) under low-velocity, weak gravity conditions.

The first things we can do to make eq. (17) look more like eq. (16) is to change our parameter from \lambda to \tau:

    \[ \frac{d^2 x^{\sigma}}{d\tau^2} + \Gamma^{\sigma}_{\mu \nu} \frac{dx^\mu}{d\tau}  \frac{dx^\nu}{d\tau} = 0 \quad \text{(18)}  \]

This change turns the velocities in eq. (17) into 4-velocities:

    \[ \frac{d^2 x^{\sigma}}{d\tau^2} + \Gamma^{\sigma}_{\mu \nu} U^{\mu}  U^{\nu} = 0 \quad \text{(19)}  \]

In our low-velocity limit, the spatial components of the 4-velocity are minuscule compared with the time component since v^i \ll c. Therefore, the only velocity component that survives in the sum specified by eq. (19) is the U^0 component. Taking this into account, eq. (19) is transformed into:

    \[ \frac{d^2 x^{\sigma}}{d\tau^2} + \Gamma^{\sigma}_{00} U^0  U^0 = 0 \quad \text{(20)}  \]

In our low-velocity limit, \tau \approx t. When we make this replacement, we include only the time component of the 4-velocity, U^0 = c and change the denominator of the lefthand term. We get:

    \[ \frac{d^2 x^{\sigma}}{dt^2} + \Gamma^{\sigma}_{00} c^2 = 0 \quad \text{(21)}  \]

If we set \sigma = 0 \,\, \Rightarrow \,\, x^0 = ct. Then the second derivative of x^0 is 0 (because \displaystyle \frac{dx^0}{dt} = \frac{dct}{dt} = c; c is a constant, so \frac{d(c)}{dt} = 0). That makes eq. (21):

    \begin{align*} \frac{d^2 x^0}{dt^2} + \Gamma^{\sigma}_{00} c^2 &= 0 \\ 0 + \Gamma^{0}_{00} c^2 &= 0\quad \text{(22)}  \end{align*}

The only way that eq. (22) can be true is if \Gamma^{0}_{00} c^2 = 0 and the only way that this can be so is if \Gamma^{0}_{00}  = 0. That means that the only Christoffel symbols that are nonzero are the ones with a spatial contravariant (upper) index (i.e., \Gamma^{i}_{00}  \neq 0).

So, eq. (21) becomes:

    \[ \frac{d^2 x^i}{dt^2} + \Gamma^{i}_{00} c^2 = 0 \quad \text{(22)}  \]

Comparing eq. (22) with the equation from Newtonian mechanics – equation eq. (16) – we see that:

    \begin{align*} \frac{\partial \phi}{\partial x^i} &= \Gamma^{i}_{00} c^2  \\ \Gamma^{i}_{00} &= \frac{\partial \phi}{\partial x^i} \frac{1}{c^2} \quad \text{(23)} \end{align*}

Next, we’ll use \Gamma^{i}_{00} to find R_{00}. By definition:

    \[  R_{00} = R^{\mu}_{0 \mu 0}  \]

But R^0_{000} = 0. Why? Because, if we use the 34 symmetry rule on R^0_{000}, we get R^0_{000} = -R^0_{000}. But the only way a term can be equal to its negative is if it’s zero.

Therefore:

    \[  R_{00} = R^i_{0 i 0}   \quad \text{(24)}  \]

where i are indices that represent spatial variables.

We can express the Riemann curvature tensor in terms of the metric as:

    \[ R^{\rho}_{\sigma \mu \nu} = \partial_{\mu}\bigl( \Gamma^{\rho}_{\nu \sigma}  \bigr)  - \partial_{\nu}\bigl( \Gamma^{\rho}_{\mu \sigma}  \bigr) + \Gamma^{\alpha}_{\nu \sigma} \Gamma^{\rho}_{\mu \alpha} - \Gamma^{\beta}_{\mu \sigma} \Gamma^{\rho}_{\nu \beta}  \quad \text{(25)} \]

But we know that R_{00} = R^i_{0 i 0} so \rho and \mu must both equal the spatial index i. Thus:

    \[ R^{i}_{0 i 0} = \partial_{i}\bigl( \Gamma^{i}_{00}  \bigr)  - \partial_{0}\bigl( \Gamma^{i}_{i0}  \bigr) + \Gamma^{\alpha}_{00} \Gamma^{i}_{i \alpha} - \Gamma^{\beta}_{i 0} \Gamma^{i}_{0 \beta}  \quad \text{(26)} \]

Our metric is time-independent, meaning that the derivative of the metric is zero. The gamma terms are made up of derivatives of the metric. Therefore, \partial_{0}\bigl( \Gamma^{i}_{i0}  \bigr) = 0.

We’ve said previously that the only nonzero Christoffel symbols are those of the form \Gamma^{i}_{00}. From eq. (23), we know that \displaystyle \Gamma^{i}_{00} = \frac{\partial_i \phi}{c^2}. c^2 is very large. Thus, the \Gamma^{i}_{00} terms are small. In turn, terms of the form \Gamma^2 are negligible and can be ignored. Given these facts, eq. (26) becomes:

    \begin{align*} R^{i}_{0 i 0} &= \partial_{i}\bigl( \Gamma^{i}_{00}  \bigr)  - \underbrace{\cancel{\partial_{0}\bigl( \Gamma^{i}_{i0}  \bigr)}}_{0} + \underbrace{\bcancel{\Gamma^{\alpha}_{00} \Gamma^{i}_{i \alpha}}}_{0} - \underbrace{\bcancel{\Gamma^{\beta}_{i 0} \Gamma^{i}_{0 \beta}}}_{0}  \\ &=  \partial_{i}\bigl( \Gamma^{i}_{00}  \bigr) \\ &= \sum_i \partial_i \Bigl( \frac{1}{c^2} \partial_i \phi \Bigr) \\ &=  \frac{1}{c^2}  \nabla^2 \phi \quad \text{(27)} \end{align*}

But R^{i}_{0 i 0} = R_{00}. Thus:

    \[ R_{00} =  \frac{1}{c^2}  \nabla^2 \phi  \quad \text{(28)} \]

Recall from eq. (15):

    \[ 2R_{00} = \kappa \rho c^2  \quad \text{(15)}  \]

Substituting the results of eq. (28) into eq. (15) gives us:

    \[  2 \frac{1}{c^2}  \nabla^2 \phi= \kappa \rho c^2   \quad \text{(29)} \]

We use Poisson’s equation – \nabla^2 \phi = 4 \pi G \rho – to replace \nabla^2 \phi in eq. (29). From this, we obtain:

    \[  2 \frac{1}{c^2}  4  \pi G \cancel{\rho} = \kappa \cancel{\rho} c^2   \quad \text{(30)} \]

After rearranging eq. (30), we are left with:

    \[ \frac{8 \pi G}{c^4} = \kappa  \quad \text{(31)} \]

Putting this value for \kappa into previously-derived equations yields our final Einstein field equation:

    \[ R_{\mu \nu}  -\frac12 R g_{\mu \nu} - \Lambda g_{\mu \nu} = \frac{8 \pi G}{c^4} T_{\mu \nu}  \quad \text{(31)} \]

Notice that we’ve expressed our final Einstein field equation with lower indices in eq. (31) while our preliminary forms of the equation used upper indices. As noted earlier, one should not be troubled by this as indices can be raised and lowered with impunity by using the metric. Indeed, there are other forms of the Einstein field equation (like the trace-reversed form). It should also be noted that different signs can be used for the various terms. It’s kind of a complex issue, so I won’t go into it here. However, for those interested, a discussion can be found beginning at 28:13 of the eigenchris YouTube video General Relativity Basics – Einstein Field Equation Derivation (w/ sign convention).
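To get a feel for why enormous energy densities are needed to curve spacetime appreciably, it helps to plug numbers into eq. (31). A one-line numeric sketch (Python, using the value of G quoted in the Introduction and c = 2.998 \times 10^8 m/s):

    import math

    G, c = 6.674e-11, 2.998e8
    kappa = 8 * math.pi * G / c**4
    print(kappa)   # roughly 2.1e-43 -- the coupling between energy-momentum and curvature is tiny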

Derivation 2

The second derivation of the Einstein field equation that I’ll present utilizes the stationary action principle from Lagrangian mechanics. Thus, familiarity with this technique is required. For readers who need further information, I have a page on Lagrangian mechanics elsewhere on this site.

One question that the reader may be asking themselves is: Why would we want to attempt such a derivation? The main reason for this undertaking is that most modern theories in physics are derived in this manner, and applying this approach to general relativity will be useful when attempting to relate general relativity to these other theories.

Given this motivation, the first thing we need to do is to devise an equation for the action. It has to incorporate terms for both curvature and matter/energy. Thus:

    \[ S = S_c + S_m \quad \text{(1)} \]

where S_c is the part of the action related to curvature and S_m is the part of the action related to matter/energy.

Each S term will be of the form:

    \[ S=\int_V  \mathcal{L}\, dV  \quad \text{(2)}  \]

where

S is the action
\mathcal{L} is the Lagrange density
V is the volume over which we’re integrating

We’ll work on the S_c portion of the equation first.

Since we want a tensor equation and S is a scalar, the Lagrangian density should be a scalar. For this derivation, our Lagrangian for curvature will be:

    \[ \mathcal{L}_c = \alpha(R - 2\Lambda)  \quad \text{(3)}   \]

where

\mathcal{L} is the Lagrange density for curvature
\Lambda is the cosmological constant
\alpha is another nonspecific constant

Our action – called the Einstein-Hilbert action – therefore is:

    \[ S_c = \alpha \int_V (R - 2\Lambda) dV  \quad \text{(4)}   \]

Our next task is to find a coordinate-independent form of the volume element dV. This is needed because, as we saw when we talked about the energy-momentum tensor, volumes change when we change coordinates. The way we do this is to multiply it by a Jacobian term in the form of \sqrt{-g} where g is the determinant of the metric, \lvert g_{\mu \nu} \rvert. Click here to see why this works.

Our action becomes:

    \[ S_c = \alpha \int_V (R - 2\Lambda) \sqrt{-g}\,d^4x  \quad \text{(5)}   \]

Given our action related to spacetime curvature, to obtain Einstein’s field equation, we need to take the variation in that action and set it to zero. The metric – the entity that ultimately determines the “spacetime field” – is what we’ll vary. Specifically, we’ll take the variation of eq. (5) with respect to the (inverse) metric: \displaystyle \delta = \frac{\delta}{\delta g^{\mu \nu}}, taking what amounts to partial derivatives of each variable term inside our integral using the product rule:

    \begin{align*}\delta S_c &= \alpha \int_V \delta \Bigl( (R - 2\Lambda) \sqrt{-g} \Bigr) d^4x \\ &=  \alpha \int_V \Bigl( \delta \bigl( R \sqrt{-g} \bigr)  - 2\Lambda\, \delta \sqrt{-g} \Bigr) d^4x \\  &= \alpha \int_V \Bigl( R\, \delta \sqrt{-g} + \sqrt{-g}\, \delta R - 2\Lambda\, \delta \sqrt{-g} \Bigr) d^4x  \quad \text{(6)} \end{align*}

The variation of \sqrt{-g} can be shown to be:

    \[ \delta  \sqrt{-g} =-\frac12 \frac{1}{\sqrt{-g}} \delta g \quad \text{(7)} \]
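Eq. (7), combined with Jacobi’s formula for the variation of a determinant, \delta g = g\, g^{\mu \nu} \delta g_{\mu \nu}, is equivalent to \delta \sqrt{-g} = \frac12 \sqrt{-g}\, g^{\mu \nu} \delta g_{\mu \nu}. Here is a quick finite-difference sketch (Python/NumPy; the sample metric and perturbation are random choices of mine, not anything from these notes) checking that statement numerically:

    import numpy as np

    rng = np.random.default_rng(0)
    sym = lambda a: 0.5 * (a + a.T)

    eta = np.diag([1.0, -1.0, -1.0, -1.0])
    g = eta + 0.05 * sym(rng.standard_normal((4, 4)))    # a generic metric close to Minkowski
    dg = 1e-6 * sym(rng.standard_normal((4, 4)))         # a small symmetric variation of g_{mu nu}

    sqrt_neg_g = np.sqrt(-np.linalg.det(g))
    lhs = np.sqrt(-np.linalg.det(g + dg)) - sqrt_neg_g              # actual change in sqrt(-g)
    rhs = 0.5 * sqrt_neg_g * np.trace(np.linalg.inv(g) @ dg)        # (1/2) sqrt(-g) g^{mu nu} delta g_{mu nu}
    print(lhs, rhs)                                                 # agree to first order in the perturbation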

Substituting eq. (7) into eq. (6), we get:

    \[ \delta S_c = \alpha \int_V \Bigl(-\frac{R}{2 \sqrt{-g}} \delta g + \sqrt{-g}\, \delta R + \frac{\Lambda}{\sqrt{-g}} \delta g \Bigr) \,d^4x \quad \text{(8)} \]

At this point, we’ll evaluate the variation of the matter action and add it back in.

    \[ S_m = \int_V \mathcal{L}_m   \sqrt{-g} d^4x  \quad \text{(9)} \]

    \begin{align*} \delta S_m &= \int_V \delta \bigl( \mathcal{L}_m   \sqrt{-g} \bigr)d^4x \\  &=  \int_V \bigl( \sqrt{-g} \delta \mathcal{L}_m + \mathcal{L}_m \delta \sqrt{-g} \bigr) d^4x \\ &= \int_V \bigl( \sqrt{-g} \delta \mathcal{L}_m - \frac{1}{2 \sqrt{-g}} \mathcal{L}_m \delta g \bigr) d^4x \quad \text{(10)} \end{align*}

Combining \delta S_m with eq. (8), we have:

    \begin{align*} \delta S &= \alpha \int_V \Bigl(-\frac{R}{2 \sqrt{-g}} \delta g + \sqrt{-g}\, \delta R + \frac{\Lambda}{\sqrt{-g}} \delta g \Bigr)\,d^4x \\ &\quad + \int_V \Bigl( \sqrt{-g}\, \delta \mathcal{L}_m - \frac{1}{2 \sqrt{-g}} \mathcal{L}_m \delta g \Bigr) \, d^4x = 0 \\ \\ \Rightarrow \,\, \alpha \int_V &\Bigl(-\frac{R}{2 \sqrt{-g}} \delta g + \sqrt{-g}\, \delta R + \frac{\Lambda}{\sqrt{-g}} \delta g \Bigr)\,d^4x \\ &= - \int_V \Bigl( \sqrt{-g}\, \delta \mathcal{L}_m - \frac{1}{2 \sqrt{-g}} \mathcal{L}_m \delta g \Bigr) \, d^4x  \quad \text{(11)} \end{align*}

Since the integrals on both sides of eq. (11) are over the same volume V (which is essentially all of spacetime), the integrands must be equal:

    \[  \alpha \Bigl(-\frac{R}{2 \sqrt{-g}} \delta g + \sqrt{-g} \delta R + \frac{\Lambda}{\sqrt{-g}} \delta g \Bigr) =  \frac{1}{2 \sqrt{-g}} \mathcal{L}_m \delta g  -  \sqrt{-g} \delta  \mathcal{L}_m \quad \text{(12)} \]

We multiply both sides of eq. (12) by \displaystyle \frac{2}{\sqrt{-g}}. We get:

    \[  \alpha \Bigl( \frac{R}{g} \delta g + 2 \delta R - \frac{2\Lambda}{g} \delta g \Bigr) =  -\frac{1}{g} \mathcal{L}_m \delta g  -  2 \delta  \mathcal{L}_m \quad \text{(13)} \]

This is about as specific as we can make the right (energy-matter) side of eq. (13) since \mathcal{L}_m represents whatever matter is present in the spacetime we’re studying. For example, if our spacetime contains electromagnetic radiation, we would use the Maxwell Lagrangian:

    \[ \mathcal{L}_m = -\frac{1}{4 \mu_0 c} F_{\mu \nu} F^{\mu \nu} \]

We can see, though, that the energy-matter portion (righthand side) of eq. (13) is a rank 2 tensor. So we’ll keep things general and define the right side of eq. (13) as:

    \[ T_{\mu \nu} = -\frac{1}{g} \mathcal{L}_m \delta g  -  2 \delta  \mathcal{L}_m \quad \text{(14)}  \]

Making this change, we obtain:

    \begin{align*}  \alpha \Bigl( \frac{R}{g} \delta g + 2 \delta R - \frac{2\Lambda}{g} \delta g \Bigr)  &= T_{\mu \nu} \\ \frac{R}{2g} \delta g + \delta R - \frac{\Lambda}{g} \delta g &=  \frac{1}{2\alpha} T_{\mu \nu}  \quad \text{(15)}  \end{align*}

Looking at eq. (15), we see that we have additional work to do on the lefthand side. We need to calculate:

  • \delta g
  • \delta R

These are no easy tasks. I’ll list the result for each and provide a proof for those interested.

Variation of g

\delta g = - gg_{\mu \nu}   \quad \text{(16)}

Variation of R

\delta R = R_{\mu \nu}   \quad \text{(17)}

Putting the results in eq. (16) and (17) into eq. (15), we are left with:

    \begin{align*} \frac{R}{2g} \delta g + \delta R - \frac{\Lambda}{g} \delta g &=  \frac{1}{2\alpha} T_{\mu \nu} \\  \frac{R}{2\cancel{g}} (-) \cancel{g}g_{\mu \nu} + R_{\mu \nu} - \frac{\Lambda}{\cancel{g}} (-) \cancel{g}g_{\mu \nu} &=  \frac{1}{2\alpha} T_{\mu \nu} \\ R_{\mu \nu} - \frac12 R g_{\mu \nu} + \Lambda g_{\mu \nu} &=  \frac{1}{2\alpha} T_{\mu \nu} \quad \text{(18)}  \end{align*}

The constant that multiplies the energy-momentum tensor can be found in the same way we found it previously: by taking the low-energy, weak gravity version of Einstein’s field equation. If we do this, we will get:

    \[  \frac{1}{2\alpha} = \kappa = \frac{8 \pi G}{c^4} \quad \text{(19)}  \]

Substituting eq. (19) into eq. (18) gives us:

    \[ R_{\mu \nu} - \frac12 R g_{\mu \nu} + \Lambda g_{\mu \nu} =  \frac{8 \pi G}{c^4}  T_{\mu \nu} \quad \text{(20)}  \]

Notice that we have a + \Lambda g_{\mu \nu} term resulting from this derivation but we got a - \Lambda g_{\mu \nu} term from the previous derivation. As explained in the eigenchris video, General Relativity Basics – Einstein Field Equation Derivation (w/ sign convention), the sign of the \Lambda g_{\mu \nu} term depends on which form of the metric we’re using. If we’re using the mainly pluses form of the metric (-,+,+,+) then the \Lambda g_{\mu \nu} term is positive. If we’re using the mainly minuses form of the metric (+,-,-,-), then the sign of the \Lambda g_{\mu \nu} term is negative.

Schwarzschild Metric

Now that we have Einstein’s field equation, we can start to look at solutions to this equation. These solutions can be used to describe such physical phenomena as black holes, gravitational waves and the expansion of the universe.

We’ll start with the Schwarzschild metric which will allow us to understand, among other things:

  • Gravitational time dilation
  • Gravitational Doppler effect
  • Bending of light due to gravity
  • Shift in the perihelion of orbits
  • Non-rotating black holes

But we have to learn to walk before we can run. That is, we need to find out what the Schwarzschild metric is and where it comes from. Thus, we’ll begin with its derivation.

IV.A Derivation

There’s no easy way to derive the Schwarzschild metric. Much of this derivation is taken from:

eigenchris. “Relativity 108a: Schwarzschild Metric – Derivation.” Oct 10, 2021.


We start by describing the assumptions that underlie the Schwarzschild solution. The setup is as follows:

We have a non-rotating, spherically symmetric, uncharged massive object. We wish to examine the curvature of spacetime outside this object. Although there is energy-mass inside the object, there is no energy-mass in the area where we want to determine the spacetime curvature. Therefore, in this area:

    \[ T_{\mu \nu}  = 0  \quad \text{(IV.A.1)}  \]

And the Einstein field equation we want to solve is:

    \[ R_ {\mu \nu} - \frac12 R g_{\mu \nu} = 0  \quad \text{(IV.A.2)} \]

This means that, in the vacuum outside our object, R_ {\mu \nu} = 0. But because spacetime is being curved by the massive object, R^{\rho}_ {\sigma \mu \nu} \neq 0.

We also make the assumption that the metric is static in time. This means that:

  1. The metric doesn’t change with time: \displaystyle \frac{\partial g_{\mu \nu}}{\partial x^0} = 0
  2. t \rightarrow -t does not change g_{\mu \nu}

#2 guarantees that the black hole is non-rotating because, if the black hole were rotating and we reversed time, the gravitational effects due to rotation would also reverse direction.

Now, the metric far away from the massive object is the Minkowski metric. Since the gravitational effects from our massive object are spherically symmetric, it makes sense to use spherical coordinates. In spherical coordinates, using the mainly minuses form, the metric far away from the object is:

    \[ \begin{bmatrix}  1&0&0&0 \\ 0&-1&0&0 \\ 0&0&-r^2&0 \\ 0&0&0&-r^2(\sin \theta)^2 \end{bmatrix}   \quad \text{(IV.A.3)}  \]

Derivation of this metric can be found here.

The metric closer to the object is indeterminate but is of the form:

    \[ \begin{bmatrix} g_{tt}&g_{tr}&g_{t\theta}&g_{t\phi} \\ g_{rt}&g_{rr}&g_{r\theta}&g_{r\phi} \\ g_{\theta t}&g_{\theta  r}&g_{\theta  \theta }&g_{\theta \phi} \\ g_{\phi  t}&g_{\phi  r}&g_{\phi \theta }&g_{\phi  \phi } \end{bmatrix} \quad \text{(IV.A.4)} \]

Based on our assumption of time reversal symmetry, the spherical symmetry of our setup, and the symmetry of the metric (proof here), we can simplify the metric given in eq. (IV.A.4), as follows:

\displaystyle \vec{e}_t = \frac{\partial}{\partial ct}. If we reverse the time coordinate, then \displaystyle \frac{\partial}{\partial c(-t)} =  -\frac{\partial}{\partial ct} = -\vec{e}_t. Under this reversal, g_{tt} = \vec{e}_t \cdot \vec{e}_t \rightarrow (-\vec{e}_t) \cdot (-\vec{e}_t) = + g_{tt}, so time-reversal symmetry allows this term to be nonzero.

However, g_{ti} =  \vec{e}_t \cdot \vec{e}_i. If we reverse the time coordinate, we have g_{ti} = (- \vec{e}_t) \cdot \vec{e}_i = -g_{ti}. But the only way that g_{ti} can equal -g_{ti} is if g_{ti}=0. A similar argument tells us that g_{it} terms also equal zero. So we’ve got:

    \[  \begin{bmatrix} g_{tt}&0&0&0 \\ 0&g_{rr}&g_{r\theta}&g_{r\phi} \\0&g_{r  \theta }&g_{\theta  \theta }&g_{\theta \phi} \\ 0&g_{r  \phi }&g_{\phi \theta }&g_{\phi  \phi } \end{bmatrix} \quad \text{(IV.A.5)} \]

Because their basis vectors are perpendicular, g_{r \theta} = \vec{e}_r \cdot \vec{e}_{\theta} = 0 and g_{r \phi} = \vec{e}_r \cdot \vec{e}_{\phi} = 0. And because the metric is a symmetric tensor, g_{\theta r} and g_{\phi r} are also zero. So we have:

    \[ \begin{bmatrix} g_{tt}&0&0&0 \\ 0&g_{rr}&0&0 \\0&0&g_{\theta  \theta }&g_{\theta \phi} \\ 0&0&g_{\phi \theta }&g_{\phi  \phi } \end{bmatrix} \quad \text{(IV.A.6)} \]

Finally, we find that g_{\theta \theta} = \vec{e}_{\theta} \cdot  \vec{e}_{\theta} = \vec{e}_{(-\theta)} \cdot  \vec{e}_{(-\theta)} and g_{\phi \phi} = \vec{e}_{\phi} \cdot  \vec{e}_{\phi} = \vec{e}_{(-\phi)} \cdot  \vec{e}_{(-\phi)}, so reversing \theta or \phi places no restriction on the diagonal terms. On the other hand, g_{\theta \phi} = \vec{e}_{\theta} \cdot  \vec{e}_{\phi}. If we reverse the direction of \theta, we get: \vec{e}_{(-\theta)} \cdot \vec{e}_{\phi} = -\vec{e}_{\theta} \cdot  \vec{e}_{\phi} = -g_{\theta \phi}. But the only way that g_{\theta \phi} can equal -g_{\theta \phi} is if g_{\theta \phi} = 0. By a similar argument, g_{\phi \theta} = 0. Thus, our metric ends up as:

    \[ \begin{bmatrix} g_{tt}&0&0&0 \\ 0&g_{rr}&0&0 \\0&0&g_{\theta  \theta }&0\\ 0&0&0&g_{\phi  \phi } \end{bmatrix} \quad \text{(IV.A.7)} \]

So we can generalize the form of the metric as follows:

    \[ds^2 = A(r)\,dt^2  - B(r)\,dr^2 - C(r)\,r^2 d\theta^2 - D(r)\,r^2 \sin^2 \theta \, d\phi^2 \quad \text{(IV.A.8)} \]

Because of spherical symmetry, C(r) = D(r), and we can absorb this common factor into a redefinition of the radial coordinate, so we may take C(r) = D(r) = 1. Therefore, we have:

    \[ds^2 = A(r)\,dt^2  - B(r)\,dr^2 - r^2 d\theta^2 - r^2 \sin^2 \theta \, d\phi^2 \quad \text{(IV.A.9)} \]

From here on, we’ll just write A and B instead of A(r) and B(r), respectively, recognizing that A and B are, in fact, functions of r. We’ll also write (\sin \theta)^2 as \sin^2 \theta.

In matrix form, eq. (IV.A.9) is:

    \[g_{\mu \nu} = \begin{bmatrix} A&0&0&0 \\ 0&-B&0&0 \\0&0&-r^2&0\\ 0&0&0&-r^2 \sin^2 \theta \end{bmatrix} \quad \text{(IV.A.10)} \]

and, from this, the inverse metric:

    \[g^{\mu \nu} = \begin{bmatrix} \frac{1}{A}&0&0&0 \\ 0&-\frac{1}{B}&0&0 \\0&0&-\frac{1}{r^2}&0\\ 0&0&0&-\frac{1}{r^2 \sin^2 \theta} \end{bmatrix} \quad \text{(IV.A.11)} \]

Having found g_{\mu \nu}, we need to find the Christoffel symbols (which are functions of the metric). From the Christoffel symbols, we can find R_{\mu \nu}. We’ll then use R_{\mu \nu} to solve the vacuum equation \displaystyle R_{\mu \nu} - \frac12 R g_{\mu \nu} = 0, fixing the remaining constants by taking the low velocity/weak gravitational field limit. This will allow us to solve for A and B.

We’ll begin with the Christoffel symbols. To simplify their calculation, we can utilize the following rules:

  1. Because the solution is static with respect to time, all time derivatives of the metric equal zero
  2. Because our metric is diagonal, g_{\mu \nu} and g^{\mu \nu} are zero when \mu \neq \nu
  3. Because the spacetime we’re dealing with is torsion-free, Christoffel symbols are symmetric in their lower indices: \Gamma^{\alpha}_{\mu \nu} = \Gamma^{\alpha}_{\nu \mu}
  4. We’ll employ the commonly used conventions: a) t\rightarrow x^0,\, r \rightarrow x^1, \, \theta \rightarrow x^2, \, \phi \rightarrow x^3 b) spacetime indices will be represented with greek letters (e.g., \mu and \nu) and will run from 0 to 3; spatial indices will be represented with latin letters (e.g., i and j) and will run from 1 to 3.

The general formula for the Christoffel symbol is:

    \[ \Gamma^{\sigma}_{\mu \nu} = \frac12 g^{\sigma \alpha} \bigl( \partial_{\nu} g_{\alpha \mu} +  \partial_{\mu} g_{\alpha \nu} -  \partial_{\alpha} g_{\mu \nu} \bigr) \quad \text{(IV.A.12)} \]

Terms are nonzero only if the indices on the inverse metric are equal. Therefore, we’ll let \alpha \rightarrow \sigma. Making this substitution in eq. (IV.A.12) gives us:

    \[ \Gamma^{\sigma}_{\mu \nu} = \frac12 g^{\sigma \sigma} \bigl( \partial_{\nu} g_{\sigma \mu} +  \partial_{\mu} g_{\sigma  \nu} -  \partial_{\sigma} g_{\mu \nu} \bigr) \quad \text{(IV.A.13)} \]

𝛔 = 0

    \[  \Gamma^{0}_{\mu \nu} = \frac12 g^{0 0} \bigl( \partial_{\nu} g_{0 \mu} +  \partial_{\mu} g_{0  \nu} -  \partial_{0} g_{\mu \nu} \bigr) \quad \text{(IV.A.14)}   \]

With \mu = 0,\,\nu = 0:

    \[  \Gamma^{0}_{0 0} = \frac12 g^{0 0} \bigl( \partial_{0} g_{0 0} +  \partial_{0} g_{0  0} -  \partial_{0} g_{0 0} \bigr)  = 0 \]

(because time derivatives of metric are 0.)

With \mu = i,\,\nu = i:

    \[  \Gamma^{0}_{ii} = \frac12 g^{0 0} \bigl( \partial_{i} g_{0 i} +  \partial_{i} g_{0  i} -  \partial_{0} g_{ii} \bigr)  = 0 \]

(because g_{0i} are off-diagonal elements = 0 and the time derivative of the metric \partial_{0} g_{ii} = 0.)

With \mu = i,\,\nu = j, \, i\neq j:

    \[  \Gamma^{0}_{ij} = \frac12 g^{0 0} \bigl( \partial_{j} g_{0 i} +  \partial_{i} g_{0 j} -  \partial_{0} g_{ij} \bigr)  = 0 \]

With \mu = 0,\,\nu = i:

    \[  \Gamma^{0}_{0i} = \frac12 g^{0 0} \bigl( \partial_{i} g_{0 0} + \cancel{ \partial_{0} g_{0 i}} -  \cancel{\partial_{0} g_{0i}} \bigr)  = \frac12 g^{00}(\partial_i g_{00}) \]

Since g_{00} = A(r), only \partial _1 g_{00} = \partial_r A(r) is nonzero. So we get:

    \[  \Gamma^{0}_{01}  = \Gamma^{0}_{10} = \frac12 \frac{1}{A}(\partial_r A)\]

𝛔 = 1

    \[  \Gamma^{1}_{\mu \nu} = \frac12 g^{11} \bigl( \partial_{\nu} g_{1 \mu} +  \partial_{\mu} g_{1  \nu} -  \partial_{1} g_{\mu \nu} \bigr) \quad \text{(IV.A.15)}   \]

With \mu = 0,\,\nu = 0:

    \begin{align*} \Gamma^{1}_{00} &= \frac12 g^{11} \bigl( \cancel{\partial_{0} g_{1 0}} + \cancel{ \partial_{0} g_{1  0}} -  \partial_{1} g_{00} \bigr) = \frac12 \Bigl( -\frac{1}{B} \Bigr) \Bigl(  -\partial_r A \Bigr) \\ &= \frac12 \Bigl( \frac{1}{B} \Bigr) \Bigl(  \partial_r A \Bigr) \end{align*}

With \mu = 1,\,\nu = 1:

    \begin{align*}  \Gamma^{1}_{11} &= \frac12 g^{11} \bigl( \partial_{1} g_{1 1} +  \cancel{\partial_{1} g_{1  1}} -  \cancel{\partial_{1} g_{11} \bigr)} = \frac12  g^{11} (\partial_1 g_{11}) \\ &= \frac12 \Bigl( - \frac {1}{B}\Bigr) \Bigl(  \partial_r (-B)\Bigr)  \\ &=   \frac12 \Bigl( \frac {1}{B}\Bigr) \Bigl(  \partial_r  B   \Bigr) \end{align*}

With \mu = 2,\,\nu = 2:

    \begin{align*}  \Gamma^{1}_{22} &= \frac12 g^{11} \bigl( \cancel{\partial_{2} g_{1 2}} +  \cancel{\partial_{2} g_{1  2}} -  \partial_{1} g_{22} \bigr) = \frac12  g^{11} (-\partial_1 g_{22}) \\ &= \frac12 \Bigl( - \frac {1}{B}\Bigr) \Bigl( - \partial_r (-r^2)\Bigr)  \\ &=  - \frac12 \Bigl( \frac {1}{B}\Bigr) \Bigl(  2r  \Bigr)  \\ &=- \frac{r}{B}\end{align*}

With \mu = 3,\,\nu = 3:

    \begin{align*}  \Gamma^{1}_{33} &= \frac12 g^{11} \bigl( \cancel{\partial_{3} g_{1 3}} + \cancel{ \partial_{3} g_{1  3}} -  \partial_{1} g_{33} \bigr) =   \frac12 g^{11}\Bigl(  \partial_1  g_{33}  \Bigr) \\ &= \frac12 \Bigl( - \frac {1}{B}\Bigr) \Bigl( -\partial_r (- r^2 \sin^2 \theta) \Bigr) \\ &= - \frac{r \sin^2 \theta}{B}\end{align*}

With \nu = 1,\, \mu \neq 1:

    \[  \Gamma^{1}_{01} = \frac12 g^{11} \bigl( \cancel{\partial_{1} g_{1 0}} +  \bcancel{\partial_{0} g_{1  1}} -  \cancel{\partial_{1} g_{01}} \bigr) = 0 \]

    \[  \Gamma^{1}_{21} = \frac12 g^{11} \bigl( \cancel{\partial_{1} g_{1 2}} +  \bcancel{\partial_{2} g_{1  1}} -  \cancel{\partial_{1} g_{21}} \bigr) = 0 \]

    \[  \Gamma^{1}_{31} = \frac12 g^{11} \bigl( \cancel{\partial_{1} g_{1 3}} +  \bcancel{\partial_{3} g_{1  1}} -  \cancel{\partial_{1} g_{31}} \bigr) = 0 \]

All of the above Christoffel symbols equal zero, either because

  1. g_{ab} = 0 when a \neq b
  2. g_{11} = B(r) is a function of r. The coordinate with which we’re taking the partial derivative (call it a) is not equal to r. So \partial_a B(r) = 0

With \mu = i, \, \nu = j;\, (i \neq j \neq 1):

    \[  \Gamma^{1}_{ij} = \frac12 g^{11} \bigl( \cancel{\partial_{j} g_{1i}} +  \cancel{\partial_{i} g_{1  j}} -  \cancel{\partial_{1} g_{ij}} \bigr) = 0 \]

All of the Christoffel symbols above equal zero because all metric terms are off-diagonal terms which equal zero.

𝛔 = 2

    \[  \Gamma^{2}_{\mu \nu} = \frac12 g^{22} \bigl( \partial_{\nu} g_{2\mu} +  \partial_{\mu} g_{2  \nu} -  \partial_{2} g_{\mu \nu} \bigr) \quad \text{(IV.A.16)}   \]

With \mu = \nu = 0,1, \text{ or }2:

    \[  \Gamma^{2}_{00} = \frac12 g^{22} \bigl( \partial_{0} g_{20} +  \partial_{0} g_{2  0} -  \partial_{2} g_{00} \bigr) = 0 \]

    \[  \Gamma^{2}_{11} = \frac12 g^{22} \bigl( \partial_{1} g_{21} +  \partial_{1} g_{2  1} -  \partial_{2} g_{11} \bigr) = 0 \]

    \[  \Gamma^{2}_{22} = \frac12 g^{22} \bigl( \partial_{2} g_{22} +  \partial_{2} g_{2  2} -  \partial_{2} g_{22} \bigr) = 0 \]

By now, you probably can figure out why terms evaluate to zero in the 3 equations above. I’ll just comment on the \partial_2 g_{22} term. g_{22} = -r^2 but \partial_2 = \partial_{\theta}. There is no \theta term in g_{22}. Therefore, the \partial_{\theta} operator “sees” the g_{22} = -r^2 term as a constant, and of course, the derivative of a constant is zero.

With \mu = 3,\,\nu=3:

    \begin{align*}  \Gamma^{2}_{33} &= \frac12 g^{22} \bigl( \cancel{\partial_{3} g_{23}} +  \cancel{\partial_{3} g_{23}} -  \partial_{2} g_{33} \bigr) = \frac12 \left(  -\frac{1}{r^2} \right) \Bigl(  -\partial_{\theta} \left(- r^2 \sin^2 \theta  \right)  \Bigr) \\ &= \frac{1}{\cancel{2}} \left(  -\frac{1}{\cancel{r^2}} \right) \Bigl( \bcancel{ -} (\bcancel{-}) \cancel{2} \, \cancel{r^2} \sin \theta \cos \theta   \Bigr)  \\ &= -\sin \theta \cos \theta \end{align*}

With \mu = 0 \text{ or } 3,\,\nu=2:

    \[  \Gamma^{2}_{02} = \frac12 g^{22} \bigl( \partial_{2} g_{20} +  \partial_{0} g_{2  2} -  \partial_{2} g_{02} \bigr) = 0 \]

    \[  \Gamma^{2}_{32} = \frac12 g^{22} \bigl( \partial_{2} g_{23} +  \partial_{3} g_{2  2} -  \partial_{2} g_{32} \bigr) = 0 \]

With \mu=1,\,\nu=2:

    \begin{align*}  \Gamma^{2}_{12} &= \frac12 g^{22} \bigl(\cancel{ \partial_{2} g_{21}} +  \partial_{1} g_{2 2} -  \cancel{\partial_{2} g_{12}} \bigr) \\ &=  \frac12 g^{22} (\partial_1 g_{22} )  \\ &= \frac12 \left(  -\frac{1}{r^2} \right) \Bigl(\partial_{r} (-r^2)  \Bigr) \\ &= \frac{1}{\cancel{2}} \left(  -\frac{1}{r^\bcancel{2}} \right) (-\cancel{2}\bcancel{r}) \\ \Gamma^{2}_{12}  = \Gamma^{2}_{21} &=  \frac{1}{r}\end{align*}

𝛔 = 3

    \[  \Gamma^{3}_{\mu \nu} = \frac12 g^{33} \bigl( \partial_{\nu} g_{3\mu} +  \partial_{\mu} g_{3 \nu} -  \partial_{3} g_{\mu \nu} \bigr) \quad \text{(IV.A.17)}   \]

With \mu = \nu,\, \mu \neq 3:

    \[  \Gamma^{3}_{\mu \mu} = \frac12 g^{33} \bigl( \partial_{\mu} g_{3\mu} +  \partial_{\mu} g_{3 \mu} -  \partial_{3} g_{\mu \mu} \bigr) = 0 \]

With \mu = \nu = 3:

    \[  \Gamma^{3}_{33} = \frac12 g^{33} \bigl( \partial_{3} g_{33} + \cancel{ \partial_{3} g_{3 3}} -  \cancel{\partial_{3} g_{33}} \bigr) = \frac12 g^{33} \bigl( \partial_{\phi} g_{33} \bigr) = 0\]

With \mu = 0,\, \nu = 3:

    \[  \Gamma^{3}_{03} = \frac12 g^{33} \bigl( \partial_{3} g_{30} +  \partial_{0} g_{3 3} -  \partial_{3} g_{03} \bigr) = 0 \]

With \mu = 1,\, \nu = 3:

    \begin{align*}  \Gamma^{3}_{13} &= \frac12 g^{33} \bigl( \cancel{\partial_{3} g_{31}} +  \partial_{1} g_{3 3} -  \cancel{\partial_{3} g_{13}} \bigr) \\ &= \frac12 g^{33} \bigl( \partial_{1} g_{3 3}\bigr) \\ &= \frac12 \left(- \frac{1}{r^2 \sin^2 \theta}  \right) \Bigl( \partial_r (-r^2 \sin^2 \theta)  \Bigr) \\ &= \frac{1}{\cancel{2}} \left(- \frac{1}{r^{\bcancel{2}} \cancel{\sin^2 \theta}}  \right)  \Bigl( -\cancel{2}\bcancel{r }\cancel{\sin^2 \theta)}  \Bigr) \\ \Gamma^{3}_{13} = \Gamma^{3}_{31} &= \frac{1}{r} \end{align*}

With \mu = 2,\, \nu = 3:

    \begin{align*}  \Gamma^{3}_{23} &= \frac12 g^{33} \bigl( \cancel{\partial_{3} g_{32}} +  \partial_{2} g_{3 3} -  \cancel{\partial_{3} g_{23}} \bigr) \\ &= \frac12 g^{33} \bigl( \partial_{2} g_{3 3}\bigr) \\ &= \frac12 \left(\cancel{-} \frac{1}{\cancel{r^2} \sin^2 \theta} \right) \Bigl( \partial_{\theta} (\cancel{-r^2} \sin^2 \theta) \Bigr) \\  &= \frac{1}{\cancel{2}} \left( \frac{1}{\sin^{\bcancel{2}} \theta} \right) (\cancel{2} \bcancel{\sin \theta} \cos \theta) \\ &= \frac{\cos \theta}{\sin \theta} \\ \Gamma^{3}_{23} = \Gamma^{3}_{32} &= \cot \theta \end{align*}

Here is a summary of nonzero Christoffel symbols:

    \begin{align*}   \Gamma^{0}_{01} = \Gamma^{0}_{10} &= \frac12  \frac{1}{A}(\partial_r A) \\ \Gamma^{1}_{00} &= \frac12 \frac{1}{B}(\partial_r A) \\    \Gamma^{1}_{11} &= \frac12 \frac{1}{B}(\partial_r B) \\ \Gamma^{1}_{22} &= -\frac{r}{B} \\ \Gamma^{1}_{33} &= -\frac{r \sin^2 \theta}{B} \\ \Gamma^{2}_{12} = \Gamma^{2}_{21} &= \frac{1}{r} \\ \Gamma^{2}_{33} &= -\sin \theta \cos \theta \\ \Gamma^{3}_{13} = \Gamma^{3}_{31} &= \frac{1}{r} \\ \Gamma^{3}_{23} = \Gamma^{3}_{32} &= \cot \theta \quad \text{(IV.A.18)}  \end{align*}

Note that, although we’ve been using the (+,-,-,-) form of the metric here, the Christoffel symbol values are the same for the (-,+,+,+) form.
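
If you want to check eq. (IV.A.18) without grinding through all of the index gymnastics by hand, a computer algebra system can reproduce the whole table. Below is a minimal sympy sketch (the variable and function names are my own, not from any of the sources above) that builds the metric of eq. (IV.A.10) and evaluates eq. (IV.A.12) for every index combination, printing only the nonzero symbols.

    # Minimal sympy sketch: Christoffel symbols of the ansatz metric
    # diag(A(r), -B(r), -r^2, -r^2 sin^2(theta)) from eq. (IV.A.12).
    import sympy as sp

    t, r, th, ph = sp.symbols('t r theta phi')
    A = sp.Function('A')(r)
    B = sp.Function('B')(r)
    x = [t, r, th, ph]

    g = sp.diag(A, -B, -r**2, -r**2 * sp.sin(th)**2)   # eq. (IV.A.10)
    g_inv = g.inv()

    def christoffel(sig, mu, nu):
        # Gamma^sigma_{mu nu} = (1/2) g^{sigma alpha} (d_nu g_{alpha mu}
        #                       + d_mu g_{alpha nu} - d_alpha g_{mu nu})
        return sp.simplify(sp.Rational(1, 2) * sum(
            g_inv[sig, al] * (sp.diff(g[al, mu], x[nu])
                              + sp.diff(g[al, nu], x[mu])
                              - sp.diff(g[mu, nu], x[al]))
            for al in range(4)))

    for sig in range(4):
        for mu in range(4):
            for nu in range(mu, 4):      # symmetric in the lower indices
                Gam = christoffel(sig, mu, nu)
                if Gam != 0:
                    print(f"Gamma^{sig}_{mu}{nu} =", Gam)

Running this should print the same nine independent symbols listed in eq. (IV.A.18), with \partial_r A and \partial_r B appearing as Derivative(A(r), r) and Derivative(B(r), r).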

Having found the nonzero Christoffel symbols, we can now use them to calculate components of the Ricci tensor. Recall:

    \[ R^{\rho}_{\sigma \mu \nu} = \partial_{\mu} \Gamma^{\rho}_{\nu \sigma} - \partial_{\nu} \Gamma^{\rho}_{\mu \sigma} + \Gamma^{\alpha}_{\nu \sigma} \Gamma^{\rho}_{\mu \alpha} - \Gamma^{\beta}_{\mu \sigma} \Gamma^{\rho}_{\nu \beta} \quad \text{(IV.A.19)} \]

The Ricci tensor is made by contraction of the first and third indices of the Riemann curvature tensor. So:

R00

    \[ R_{00} = R^{\mu}_{0 \mu 0} = \partial_{\mu} \Gamma^{\mu}_{00} - \partial_{0} \Gamma^{\mu}_{\mu 0} + \Gamma^{\alpha}_{00} \Gamma^{\mu}_{\mu \alpha} - \Gamma^{\beta}_{\mu 0} \Gamma^{\mu}_{0 \beta} \quad \text{(IV.A.20)}   \]

\partial_{\mu} \Gamma^{\mu}_{00}:

The only nonzero \Gamma with two lower 0 indices is \Gamma^{1}_{00}.

\partial_{0} \Gamma^{\mu}_{\mu 0}:

There are no Christoffel symbols with a right lower 0 index and matching upper and left lower indices so this term goes to zero.

\Gamma^{\alpha}_{00} \Gamma^{\mu}_{\mu \alpha}:

The \alpha indices must match. The only nonzero left \Gamma in this expression is \Gamma^{1}_{00}. Thus, \alpha = 1. This expression then becomes \Gamma^{1}_{00} \Gamma^{\mu}_{\mu 1}.

\Gamma^{\beta}_{\mu 0} \Gamma^{\mu}_{0 \beta}:

Setting \beta equal to 0 or 1 will yield the possibilities of nonzero \Gamma terms.

Putting this information into eq. (IV.A.20) gives us:

    \[ R_{00} = R^{\mu}_{0 \mu 0} = \partial_{1} \Gamma^{1}_{00} - 0 + \Gamma^{1}_{00} \Gamma^{\mu}_{\mu 1} - \Gamma^{0}_{\mu 0} \Gamma^{\mu}_{00} - \Gamma^{1}_{\mu 0} \Gamma^{\mu}_{01} \quad \text{(IV.A.21)}   \]

Setting \mu = 0,\,1,\,2 \text{ or } 3 in the \Gamma^{1}_{00} \Gamma^{\mu}_{\mu 1} term will yield nonzero values. Setting \mu = 1 for the \Gamma^{0}_{\mu 0} \Gamma^{\mu}_{00} term and \mu = 0 for the \Gamma^{1}_{\mu 0} \Gamma^{\mu}_{01} term will give us nonzero values. Making these changes, eq. (IV.A.21) becomes:

    \[ R_{00} = R^{\mu}_{0 \mu 0} = \partial_{1} \Gamma^{1}_{00}  + \cancel{\Gamma^{1}_{00} \Gamma^{0}_{0 1}} + \Gamma^{1}_{00} \Gamma^{1}_{1 1} + \Gamma^{1}_{00} \Gamma^{2}_{2 1} + \Gamma^{1}_{00} \Gamma^{3}_{3 1}    - \Gamma^{0}_{1 0} \Gamma^{1}_{00} - \cancel{\Gamma^{1}_{0 0} \Gamma^{0}_{01}} \quad \text{(IV.A.22)}   \]

    \[ R_{00} = R^{\mu}_{0 \mu 0} = \partial_{1} \Gamma^{1}_{00}  + \Gamma^{1}_{00} \Gamma^{1}_{1 1} + \Gamma^{1}_{00} \Gamma^{2}_{2 1} + \Gamma^{1}_{00} \Gamma^{3}_{3 1}    - \Gamma^{0}_{1 0} \Gamma^{1}_{00}  \quad \text{(IV.A.23)}   \]

Now we can substitute in some values for the Christoffel symbol. We have:

\displaystyle R_{00} = \partial_r \left(  \frac{\partial_r A}{2B} \right) + \left(  \frac{\partial_r A}{2B} \right) \left(  \frac{\partial_r B}{2B} \right)  + \Bigl(  \frac{\partial_r A}{2B} \Bigr) \Bigl(  \frac{1}{r} \Bigr) +  \Bigl(  \frac{\partial_r A}{2B} \Bigr) \Bigl(  \frac{1}{r} \Bigr) - \Bigl(  \frac{\partial_r A}{2A} \Bigr) \Bigl(  \frac{\partial_r A}{2B} \Bigr) \quad \text{(IV.A.24)}

We can just multiply out most of the terms in eq. (IV.A.24). However, we need to apply the product rule to the first term:

    \begin{align*} \partial_r \Bigl( \frac{\partial_r A}{2} B^{-1} \Bigr) &= \frac{\partial_r \partial_r A}{2B} + \frac{\partial_r A}{2} B^{-2}(-1)(\partial_r B) \\ &= \frac{\partial_r \partial_r A}{2B} - \frac{\partial_r A \partial_r B}{2B^2}  \end{align*}

Making these changes transforms eq. (IV.A.24) into:

    \begin{align*} R_{00} &= \frac{\partial_r \partial_r A}{2B} - \frac{\partial_r A \partial_r B}{2B^2} + \frac{\partial_r A \partial_r B}{4B^2} + \frac{\partial_r A}{Br} - \frac{(\partial_r A)^2}{4AB} = 0 \\ &= \frac{\partial_r \partial_r A}{2B} - \frac{\partial_r A \partial_r B}{4B^2} + \frac{\partial_r A}{Br} - \frac{(\partial_r A)^2}{4AB} = 0   \quad \text{(IV.A.25)} \end{align*}

To make things neater, let \partial_r A = A^{\prime} and \partial_r B = B^{\prime}. We’ll also find a common denominator for eq. (IV.A.25), then multiply both sides of the equation by that common denominator to get rid of it. When we do this, we get:

    \begin{align*} R_{00} &= \frac{A^{\prime \prime}}{2B} \frac{2ABr}{2ABr} -   \frac{A^{\prime} B^{\prime}}{4B^2} \frac{rA}{rA} \\ & + \frac{A^{\prime}}{Br} \frac{4AB}{4AB} - \frac{(A^{\prime})^2}{4AB} \frac{rB}{rB} = 0  \\ \\  &=  \frac{2rABA^{\prime \prime} - rAA^{\prime} B^{\prime} + 4ABA^{\prime} - rB(A^{\prime})^2}{4AB^2 r} = 0 \\ \\ &=  2rABA^{\prime \prime} - rAA^{\prime} B^{\prime} + 4ABA^{\prime} - rB(A^{\prime})^2 = 0     \quad \text{(IV.A.26)}  \end{align*}

R11

    \[ R^{\rho}_{\sigma \mu \nu} = \partial_{\mu} \Gamma^{\rho}_{\nu \sigma} - \partial_{\nu} \Gamma^{\rho}_{\mu \sigma} + \Gamma^{\alpha}_{\nu \sigma} \Gamma^{\rho}_{\mu \alpha} - \Gamma^{\beta}_{\mu \sigma} \Gamma^{\rho}_{\nu \beta} \quad \text{(IV.A.19)} \]

    \begin{align*}    R_{11} &= R^{\mu}_{1 \mu 1} = \partial_{\mu}\Gamma^{\mu}_{11} - \partial_{1} \Gamma^{\mu}_{\mu 1} + \Gamma^{\alpha}_{11} \Gamma^{\mu}_{\mu \alpha} - \Gamma^{\beta}_{\mu 1} \Gamma^{\mu}_{1 \beta} \\ \\ &= \partial_{1}\Gamma^{1}_{11} - \partial_{1} \Gamma^{0}_{01} - \partial_{1} \Gamma^{1}_{11} - \partial_{1} \Gamma^{2}_{21} - \partial_{1} \Gamma^{3}_{31} \\ & \quad + \Gamma^{1}_{11} \Gamma^{\mu}_{\mu 1} - \Gamma^{0}_{\mu 1} \Gamma^{\mu}_{10} - \Gamma^{1}_{\mu 1} \Gamma^{\mu}_{11} - \Gamma^{2}_{\mu 1} \Gamma^{\mu}_{12} - \Gamma^{3}_{\mu 1} \Gamma^{\mu}_{13} \\ \\ &= \cancel{\partial_{1}\Gamma^{1}_{11}} - \partial_{1} \Gamma^{0}_{01} - \cancel{\partial_{1} \Gamma^{1}_{11}} - \partial_{1} \Gamma^{2}_{21} - \partial_{1} \Gamma^{3}_{31} \\ & \quad + \Gamma^{1}_{11} \Gamma^{0}_{01} + \bcancel{\Gamma^{1}_{11} \Gamma^{1}_{11}} + \Gamma^{1}_{11} \Gamma^{2}_{21} + \Gamma^{1}_{11} \Gamma^{3}_{31} \\ & \quad - \Gamma^{0}_{01} \Gamma^{0}_{10} - \bcancel{\Gamma^{1}_{11} \Gamma^{1}_{11}} - \Gamma^{2}_{21} \Gamma^{2}_{12} - \Gamma^{3}_{31} \Gamma^{3}_{13} \quad \text{(IV.A.26)}  \end{align*}

Some of the Christoffel symbols in eq. (IV.A.26) are the same (i.e., \displaystyle \Gamma^2_{12} = \Gamma^2_{21} = \Gamma^3_{13} = \Gamma^3_{31} = \frac{1}{r}). These can be combined. Combining these terms and removing the terms that are crossed out in the equation above, we get:

R_{11} =  - \partial_{1} \Gamma^{0}_{01} - 2\partial_{1} \Gamma^{2}_{21} + \Gamma^{1}_{11} \Gamma^{0}_{01} + 2\Gamma^{1}_{11} \Gamma^{2}_{21} - \Gamma^{0}_{01} \Gamma^{0}_{10} - 2\Gamma^{2}_{21} \Gamma^{2}_{12} \quad \text{(IV.A.27)}

Next we need to plug the values of the Christoffel symbols into eq. (IV.A.27). I’ll re-write their values for reference.

\displaystyle  \Gamma^{0}_{01} = \Gamma^{0}_{10} = \frac12  \frac{1}{A}(\partial_r A), \quad \Gamma^{1}_{00} = \frac12 \frac{1}{B}(\partial_r A)

\displaystyle  \Gamma^{1}_{11} = \frac12 \frac{1}{B}(\partial_r B), \quad \Gamma^{1}_{22} = -\frac{r}{B}, \quad \Gamma^{1}_{33} = -\frac{r \sin^2 \theta}{B}

\displaystyle   \Gamma^{2}_{12} = \Gamma^{2}_{21} =  \Gamma^{3}_{13} = \Gamma^{3}_{31} =\frac{1}{r}

\displaystyle  \Gamma^{2}_{33} = -\sin \theta \cos \theta, \quad  \Gamma^{3}_{23} = \Gamma^{3}_{32} = \cot \theta \quad \text{(from eq. (IV.A.18))}

So:

    \begin{align*} R_{11} &=  -\partial_r \Bigl( \frac{\partial_r A}{2}  A^{-1} \Bigr)  - 2\partial_r  \Bigl( \frac{1}{r}  \Bigr)   +  \Bigl( \frac{\partial_r B}{2B}  \Bigr)\Bigl( \frac{\partial_r A}{2A}  \Bigr) \\ & \quad +2\Bigl( \frac{\partial_r B}{2B} \Bigr) \Bigl( \frac{1}{r} \Bigr) -\Bigl( \frac{\partial_r A}{2A} \Bigr)\Bigl( \frac{\partial_r A}{2A} \Bigr) -2\Bigl( \frac{1}{r} \Bigr)\Bigl( \frac{1}{r} \Bigr) \\ \\ &= -\Bigl( \frac{\partial_r \partial_r A}{2A} +  \frac{\partial_r A}{2A^2}(-1)\partial_r A \Bigr) - \cancel{2 \Bigl( \frac{1}{r^2}  \Bigr)(-1)} + \Bigl(\frac{ \partial_r A \partial_r B}{4AB}   \Bigr) \\ & \quad +2\frac{\partial_r B }{2Br} -  \frac{(\partial_r A )^2}{4A^2} - \cancel{\frac{2}{r^2}} \\ \\ &= -\frac{\partial_r \partial_r A}{2A} + \frac{(\partial_r A )^2}{4A^2} + \frac{ \partial_r A \partial_r B}{4AB}  +  \frac{\partial_r B }{Br} \\ \\ &= \Biggl(-\frac{\partial_r \partial_r A}{2A} \Biggr) \Biggl(\frac{2ABr}{2ABr} \Biggr)  + \Biggl(\frac{(\partial_r A )^2}{4A^2} \Biggr)  \Biggl(\frac{rB}{rB} \Biggr)  \\ & \quad + \Biggl(\frac{ \partial_r A \partial_r B}{4AB} \Biggr)  \Biggl(\frac{rA}{rA} \Biggr)   +  \Biggl(\frac{\partial_r B }{Br}  \Biggr) \Biggl(\frac{4A^2}{4A^2} \Biggr)  \\ \\ &=  \frac{-2rABA^{\prime \prime} + rB(A^{\prime})^2 + rAA^{\prime} B^{\prime} + 4A^2 B^{\prime}}{4A^2 Br} = 0  \\ \\ &=  -2rABA^{\prime \prime} + rB(A^{\prime})^2 + rAA^{\prime} B^{\prime} + 4A^2 B^{\prime} = 0  \quad \text{(IV.A.28)}   \end{align*}

R22

    \[ R^{\rho}_{\sigma \mu \nu} = \partial_{\mu} \Gamma^{\rho}_{\nu \sigma} - \partial_{\nu} \Gamma^{\rho}_{\mu \sigma} + \Gamma^{\alpha}_{\nu \sigma} \Gamma^{\rho}_{\mu \alpha} - \Gamma^{\beta}_{\mu \sigma} \Gamma^{\rho}_{\nu \beta} \quad \text{(IV.A.19)} \]

    \begin{align*}    R_{22} &= R^{\mu}_{2 \mu 2} = \partial_{\mu}\Gamma^{\mu}_{22} - \partial_{2} \Gamma^{\mu}_{\mu 2} + \Gamma^{\alpha}_{22} \Gamma^{\mu}_{\mu \alpha} - \Gamma^{\beta}_{\mu 2} \Gamma^{\mu}_{2 \beta} \\ \\ &= \partial_{1}\Gamma^{1}_{22} - \partial_{2} \Gamma^{3}_{32} + \Gamma^{1}_{22} \Gamma^{\mu}_{\mu 1} - \Gamma^{1}_{\mu 2} \Gamma^{\mu}_{2 1}  - \Gamma^{2}_{\mu 2} \Gamma^{\mu}_{2 2}  - \Gamma^{3}_{\mu 2} \Gamma^{\mu}_{2 3} \\ \\ &= \partial_{1}\Gamma^{1}_{22} - \partial_{2} \Gamma^{3}_{32} + \Gamma^{1}_{22} \Gamma^{0}_{0 1} + \Gamma^{1}_{22} \Gamma^{1}_{1 1} + \cancel{\Gamma^{1}_{22} \Gamma^{2}_{2 1}} + \bcancel{\Gamma^{1}_{22} \Gamma^{3}_{3 1}} \\ & \quad - \bcancel{\Gamma^{1}_{2 2} \Gamma^{2}_{2 1}} -\cancel{\Gamma^{2}_{1 2}} \Gamma^{1}_{2 2} - \Gamma^{3}_{3 2} \Gamma^{3}_{2 3} \\ \\ &= \partial_{1}\Gamma^{1}_{22} - \partial_{2} \Gamma^{3}_{32} + \Gamma^{1}_{22} \Gamma^{0}_{0 1} + \Gamma^{1}_{22} \Gamma^{1}_{1 1} - \Gamma^{3}_{3 2} \Gamma^{3}_{2 3} \\ \\ &= \partial_{1}\Gamma^{1}_{22} - \partial_{2} \Gamma^{3}_{32} + \Gamma^{1}_{22} \Bigl(\Gamma^{0}_{0 1} + \Gamma^{1}_{1 1} \Bigr) - \Gamma^{3}_{3 2} \Gamma^{3}_{2 3} \quad \text{(IV.A.29)}  \end{align*}

I’ll reproduce the values of the Christoffel symbols again for reference.

\displaystyle  \Gamma^{0}_{01} = \Gamma^{0}_{10} = \frac12  \frac{1}{A}(\partial_r A), \quad \Gamma^{1}_{00} = \frac12 \frac{1}{B}(\partial_r A)

\displaystyle  \Gamma^{1}_{11} = \frac12 \frac{1}{B}(\partial_r B), \quad \Gamma^{1}_{22} = -\frac{r}{B}, \quad \Gamma^{1}_{33} = -\frac{r \sin^2 \theta}{B}

\displaystyle   \Gamma^{2}_{12} = \Gamma^{2}_{21} =  \Gamma^{3}_{13} = \Gamma^{3}_{31} =\frac{1}{r}

\displaystyle  \Gamma^{2}_{33} = -\sin \theta \cos \theta, \quad  \Gamma^{3}_{23} = \Gamma^{3}_{32} = \cot \theta \quad \text{(from eq. (IV.A.18))}

So:

    \begin{align*}    R_{22} &= \partial_r(-rB^{-1}) - \partial_{\theta}\bigl(\cos \theta (\sin \theta)^{-1} \bigr) \\ & + \Biggl( -\frac{r}{B} \Biggr) \Biggl(\frac{\partial_r A}{2A} + \frac{\partial_r B}{2B} \Biggr) - (\cot \theta)^2 \\ \\ &= -B^{-1} +(rB^{-2})(\partial_r B) +(\sin \theta)^{-1}(\sin \theta) + (\cos \theta)(\sin \theta)^{-2}(\cos \theta) \\ & - \Biggl( \frac{r}{B} \Biggr) \Biggl(\frac{\partial_r A}{2A} + \frac{\partial_r B}{2B} \Biggr) - (\cot \theta)^2 \\ \\ &= -\frac{1}{B} + \frac{r \partial_r B}{B^2} + 1 \\ & + \cancel{(\cot \theta)^2} - \Biggl( \frac{r}{B} \Biggr) \Biggl(\frac{\partial_r A}{2A} + \frac{\partial_r B}{2B} \Biggr) - \cancel{(\cot \theta)^2} \\ \\ &= -\frac{1}{B} + 1  - \frac{r\partial_r A}{2AB} + \frac{r\partial_r B}{2B^2} \\ \\ &= -\frac{1}{B}\Biggl( \frac{2AB}{2AB} \Biggr) + 1\Biggl( \frac{2AB^2}{2AB^2} \Biggr)  - \frac{r\partial_r A}{2AB}\Biggl( \frac{B}{B} \Biggr) + \frac{r\partial_r B}{2B^2}\Biggl( \frac{A}{A} \Biggr) \\ \\  &= \frac{-2AB + 2AB^2 -rA^{\prime}B + rAB^{\prime}}{2AB^2} = 0 \\ \\ &\Rightarrow  -2AB + 2AB^2 -rA^{\prime}B + rAB^{\prime} = 0  \quad \text{(IV.A.30)} \end{align*}

We now have equations for the 3 Ricci tensor components that all equal zero:

R_{00}: \quad 2rABA^{\prime \prime} - rAA^{\prime} B^{\prime} + 4ABA^{\prime} - rB(A^{\prime})^2 = 0 \quad \text{(IV.A.26)}
R_{11}: \quad -2rABA^{\prime \prime} + rB(A^{\prime})^2 + rAA^{\prime} B^{\prime} + 4A^2 B^{\prime} = 0 \quad \text{(IV.A.28)}
R_{22}: \quad -2AB + 2AB^2 -rA^{\prime}B + rAB^{\prime} = 0 \quad \text{(IV.A.30)}

Since all of the equations above equal zero:

    \[ R_{00} + R_{11} = 0\quad \text{(IV.A.31)} \]

So:

\begin{array}{cccccccccl} \,&\quad \cancel{2rABA^{\prime \prime}}&- & \bcancel{rAA^{\prime} B^{\prime}} & + & 4ABA^{\prime} & - & \cancel{rB(A^{\prime})^2}& = &0 \\ +&-\cancel{2rABA^{\prime \prime}} & + & \cancel{rB(A^{\prime})^2} & + & \bcancel{rAA^{\prime} B^{\prime}} & + & 4A^2 B^{\prime} & = & 0 \\ \hline\\ \,&\,&\,&\,&\,&\cancel{4A}BA^{\prime}&+&\cancel{4}A\cancel{^2} B^{\prime} & = & 0 \\ \,&\,&\,&\,&\,&BA^{\prime}&+&AB^{\prime} & = & 0 \\   \,&\,&\,&\,&\,&\,&\,&\partial_r(AB) & = & 0 \\ \,&\,&\,&\,&\,&\,&\,&\Rightarrow\,AB & = & K \text{ (constant)}\\  \,&\,&\,&\,&\text{(IV.A.32)}\,&\,&\,&\,&\, \end{array}

Currently, the metric close to our massive object is:

    \[g_{\mu \nu} = \begin{bmatrix} A&0&0&0 \\ 0&-B&0&0 \\0&0&-r^2&0\\ 0&0&0&-r^2 \sin^2 \theta \end{bmatrix} \quad \text{(IV.A.33)} \]

As r goes to infinity (i.e., far from the massive object, where spacetime approaches flat Minkowski spacetime), the metric becomes:

    \[g_{\mu \nu} = \begin{bmatrix} 1&0&0&0 \\ 0&-1&0&0 \\0&0&-r^2&0\\ 0&0&0&-r^2 \sin^2 \theta \end{bmatrix} \quad \text{(IV.A.34)} \]

We established that AB = K close to our massive object. Far away, from eq. (IV.A.34), AB = 1 \cdot 1 = 1. Therefore, K = 1 and:

    \[ B = \frac{1}{A} \quad \text{(IV.A.35)} \]

Taking the derivative of eq. (IV.A.35) gives us:

    \[ B^{\prime} = \partial_r(A^{-1} ) = -\frac{A^{\prime}}{A^2} \quad \text{(IV.A.36)} \]

Substituting the results of eq. (IV.A.35) and eq. (IV.A.36) into the expression for R_{22} (eq. (IV.A.30)), we get:

    \begin{align*} R_{22} &= -2AB + 2AB^2 -rA^{\prime}B + rAB^{\prime} = 0 \\ &= -2\cancel{A}\frac{1}{\cancel{A}} + 2\cancel{A}\Bigl( \frac{1}{A} \Bigr)\cancel{^2} - rA^{\prime}\Bigl( \frac{1}{A} \Bigr) +r\cancel{A}\Bigl( -\frac{A^{\prime}}{A^{\cancel{2}}} \Bigr) = 0 \\ &= -2 + \frac{2}{A} - r\frac{A^{\prime}}{A} - r\frac{A^{\prime}}{A} = 0\\ &= -\cancel{2}A + \cancel{2} - \cancel{2}r A^{\prime} = 0\\ rA^{\prime} &= 1 - A  \quad \text{(IV.A.37)} \end{align*}

The solution to the differential equation given in eq. (IV.A.37) is:

    \[  A(r) = 1 - \frac{k}{r} \quad \text{(IV.A.38)} \]

To see how we arrived at eq. (IV.A.38), click here.
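
As a quick cross-check on eq. (IV.A.38), we can also hand the differential equation in eq. (IV.A.37) to a computer algebra system. Here is a minimal sympy sketch (my own naming) that does so:

    # Check that r A'(r) = 1 - A(r) has the general solution A(r) = 1 + C/r,
    # i.e. A(r) = 1 - k/r with k = -C (eq. (IV.A.38)).
    import sympy as sp

    r = sp.symbols('r', positive=True)
    A = sp.Function('A')

    ode = sp.Eq(r * A(r).diff(r), 1 - A(r))    # eq. (IV.A.37)
    print(sp.dsolve(ode, A(r)))                # -> Eq(A(r), C1/r + 1)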

So:

\displaystyle A(r) = 1 - \frac{k}{r} \,\, \Rightarrow \,\, B(r) = \frac{1}{1 - \frac{k}{r}}

That makes the metric close to the massive object:

    \[g_{\mu \nu} = \begin{bmatrix} 1 - \frac{k}{r}&0&0&0 \\ 0&-\Bigl( 1 - \frac{k}{r}\Bigr)^{-1}&0&0 \\0&0&-r^2&0\\ 0&0&0&-r^2 \sin^2 \theta \end{bmatrix} \quad \text{(IV.A.39)} \]
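
It’s worth verifying that the metric in eq. (IV.A.39) really does satisfy the vacuum equation R_{\mu \nu} = 0 for any value of k. The following is a minimal sympy sketch (the helper functions are my own, written directly from eqs. (IV.A.12) and (IV.A.19), not taken from any of the sources cited above):

    # Minimal sympy sketch: confirm that eq. (IV.A.39) gives R_mu_nu = 0.
    import sympy as sp

    t, r, th, ph, k = sp.symbols('t r theta phi k', positive=True)
    x = [t, r, th, ph]
    f = 1 - k / r
    g = sp.diag(f, -1/f, -r**2, -r**2 * sp.sin(th)**2)   # eq. (IV.A.39)
    ginv = g.inv()

    def Gamma(s, m, n):
        # Christoffel symbol Gamma^s_{mn}, eq. (IV.A.12)
        return sp.Rational(1, 2) * sum(
            ginv[s, a] * (sp.diff(g[a, m], x[n]) + sp.diff(g[a, n], x[m])
                          - sp.diff(g[m, n], x[a])) for a in range(4))

    def Ricci(s, n):
        # R_{sn} = R^m_{smn}, contracting eq. (IV.A.19) on its 1st and 3rd indices
        return sp.simplify(sum(
            sp.diff(Gamma(m, n, s), x[m]) - sp.diff(Gamma(m, m, s), x[n])
            + sum(Gamma(a, n, s) * Gamma(m, m, a)
                  - Gamma(a, m, s) * Gamma(m, n, a) for a in range(4))
            for m in range(4)))

    print(all(Ricci(i, j) == 0 for i in range(4) for j in range(4)))  # True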

We can solve for k by evaluating the Schwarzschild metric in the low velocity/weak gravity limit.

In our derivation of the constant, \kappa, that multiplies the energy-momentum tensor, we arrived at the following expression:

    \begin{align*} \frac{\partial \phi}{\partial x^i} &= \Gamma^{i}_{00} c^2  \\ \Gamma^{i}_{00} &= \frac{\partial \phi}{\partial x^i} \frac{1}{c^2} \quad \text{(23)} \end{align*}

For a refresher on where that came from, click here:

In the weak gravitational limit, we can approximate the metric as:

    \[ g_{\mu \nu} \approx \eta _{\mu \nu} + h_{\mu \nu} \quad \| h_{\mu \nu} \| \ll 1 \quad \text{(IV.A.40)} \]

Eq. (IV.A.40) means that when gravity is weak, the metric deviates only slightly from the Minkowski metric (i.e., spacetime is only slightly curved). Also, we’ll use Cartesian coordinates for the Minkowski metric, not spherical coordinates.

Note that when we do summations using this approximation, h_{\mu \nu} can be neglected. When taking derivatives, the Minkowski metric is made up of constants so its derivatives are zero. However, the derivative of h_{\mu \nu} is nonzero:

Summations

    \begin{align*}   T^{\mu \alpha} g_{\alpha \nu} &= T^{\mu \alpha} (\eta _{\alpha \nu} + h_{\alpha  \nu}) \\ &= T^{\mu \alpha}\eta _{\alpha \nu} +   T^{\mu \alpha}h_{\alpha  \nu} \\ &\approx  T^{\mu \alpha}\eta _{\alpha \nu} + 0 \\ &\approx  T^{\mu \alpha}\eta _{\alpha \nu} \end{align*}

Derivatives

    \begin{align*} \partial_{\sigma}g_{\mu \nu} &=  \partial_{\sigma}(\eta _{\mu \nu} + h_{\mu \nu})\\ &= \partial_{\sigma} \eta _{\mu \nu}  + \partial_{\sigma}h_{\mu \nu} \\&\approx 0 +   \partial_{\sigma}h_{\mu \nu}  \\ &\approx  \partial_{\sigma}h_{\mu \nu}\end{align*}

Now we solve for \Gamma^{i}_{00}:

    \[  \Gamma^{\sigma}_{\mu \nu} = \frac12 g^{\sigma \alpha} (\partial_{\nu}g_{\alpha \mu} +  \partial_{\mu}g_{\alpha \nu} -  \partial_{\alpha}g_{\mu \nu} ) \quad \text{(IV.A.41)} \]

    \[  \Gamma^{i}_{00} = \frac12 g^{i \alpha} (\partial_{0}g_{\alpha 0} +  \partial_{0}g_{\alpha 0} -  \partial_{\alpha}g_{00} )   \quad \text{(IV.A.42)}  \]

As discussed above, in this summation, g^{i \alpha} can be approximated by \eta^{i \alpha} and h^{i \alpha} can be ignored. For the derivatives, the \partial \eta terms vanish and the \partial h terms are potentially nonzero.

    \[  \Gamma^{i}_{00} = \frac12 \eta^{i \alpha} (\partial_{0}h_{\alpha 0} +  \partial_{0}h_{\alpha 0} -  \partial_{\alpha}h_{00} )  \quad \text{(IV.A.43)}   \]

All of the Minkowski spatial components are -1 so we can replace \eta^{ii} with -1.

    \[ \Gamma^{i}_{00} = \frac12 \eta^{ii} (\cancel{\partial_{0}h_{i0}} + \cancel{\partial_{0}h_{i0} }- \partial_{i}h_{00} ) \quad \text{(IV.A.44)} \]

    \begin{align*}  \Gamma^{i}_{00} &= \frac12 (-1)(- \partial_{i}h_{00})  \\ &= \frac12 \partial_{i}h_{00} \quad \text{(IV.A.45)} \end{align*}

So:

    \[ \Gamma^{i}_{00} = \frac12 \partial_{i}h_{00} = \frac{1}{c^2} \partial_{i} \phi  \quad \text{(IV.A.46)} \]

    \begin{align*}\int  \frac12 \partial_{i}h_{00} &= \int \frac{1}{c^2} \partial_{i} \phi  \\  \frac12 h_{00} &= \frac{1}{c^2} \phi + b\\  h_{00} &= \frac{2 \phi}{c^2}  \quad \text{(IV.A.47)}  \end{align*}

b is a constant of integration. Requiring that h_{00} \rightarrow 0 far from the source (so that g_{00} \rightarrow \eta_{00}) forces b to equal 0.

    \[ F = ma = \frac{G\cancel{m}M}{r^2}(-\vec{e}_r)  = -\cancel{m}\nabla \phi  \quad \text{(IV.A.48)}  \]

If we take the integral of both sides, we get:

    \[ \phi = -\frac{GM}{r}  \quad \text{(IV.A.49)}   \]

As with eq. (IV.A.47), we get a constant of integration when we integrate but, like b in eq. (IV.A.47), it can be ignored since, when we take the gradient of \phi, it disappears.

Therefore:

    \[ h_{00} = -\frac{2GM}{c^2 r}   \quad \text{(IV.A.50)}   \]

Now we know what h_{00} = h_{tt} is in Cartesian coordinates. But the time coordinates are the same (ct = ct) in both Cartesian and Spherical coordinates. Therefore, we can use our value for h_{00} to find the value of k in our metric in spherical coordinates.

We said that:

    \[  g_{\mu \nu} \approx \eta _{\mu \nu} + h_{\mu \nu}  \]

Therefore:

    \[  g_{00} \approx \eta _{00} + h_{00}  \]

and

    \begin{align*} g_{00} &\approx 1 +  (-\frac{2GM}{c^2 r})  \\ &= 1 -\frac{2GM}{c^2 r}    \quad \text{(IV.A.51)}  \end{align*}

We can see from eq. (IV.A.51) that \displaystyle k=\frac{2GM}{c^2}. This is referred to as the Schwarzschild radius R_s, which, as will be discussed below, defines the event horizon of a black hole.

We put this value of g_{00} into our metric that describes spacetime near our massive object:

    \[g_{\mu \nu} = \begin{bmatrix} 1 -\frac{2GM}{c^2 r}&0&0&0 \\ 0&-\Bigl( 1 -\frac{2GM}{c^2 r}\Bigr)^{-1}&0&0 \\0&0&-r^2&0\\ 0&0&0&-r^2 \sin^2 \theta \end{bmatrix} \quad \text{(IV.A.52)} \]

Eq. (IV.A.52) is the Schwarzschild metric.

IV.B Implications

IV.B.1 Gravitational Time Dilation

IV.B.1.a Derivation

Suppose we have an uncharged, non-rotating, spherically symmetric planet of mass M in outer space, far from any other matter, and an observer, A, at a coordinate distance r from the center of the planet, at rest with respect to the planet. Observer B is also located in empty space, far enough away from the planet such that spacetime around him closely approximates the Minkowski metric. And there is no matter between him and Observer A. Observer A has a light source and emits 2 short pulses of light in rapid succession. He measures the infinitesimal time interval between pulses with a highly accurate clock. Because the light pulse events are local to Observer A, the time interval between pulses is the proper time interval which we’ll, therefore, call d \tau.

Observer B, who also possesses a highly accurate clock and is at rest with respect to the planet, eventually receives the light pulses and measures the interval between them which we’ll denote dt.

Since the planet is an uncharged, non-rotating, spherically symmetric massive object, we can apply the Schwarzschild metric. And since Observer A is at rest with respect to the gravitational source, dr = d\theta = d \phi = 0 along his worldline. Using the mainly minuses form of the metric, we have:

    \begin{align*} ds^2 = c^2 d\tau^2 &= \Biggl(1 -\frac{2GM}{c^2 r}\Biggr)c^2dt^2 -  \Biggl( 1 -\frac{2GM}{c^2 r}\Biggr)^{-1}dr^2 -r^2 d\theta^2 -r^2 \sin^2 \theta \, d\phi^2 \\ &=  \Biggl( 1 -\frac{2GM}{c^2 r}\Biggr)c^2dt^2 -  \Biggl( 1 -\frac{2GM}{c^2 r}\Biggr)^{-1}(0) -r^2 (0) -r^2 \sin^2 \theta\,(0) \\ &= \Biggl( 1 -\frac{2GM}{c^2 r}\Biggr)c^2dt^2  \quad \quad \text{(IV.B.1.a.1)}  \end{align*}

Dividing both sides of eq. (IV.B.1.a.1) by c^2 and taking the square root gives us:

    \[ d\tau = \Biggl( 1 -\frac{2GM}{c^2 r}\Biggr)^{\frac12} dt  \quad \quad \text{(IV.B.1.a.2)}  \]

Because 1 -\frac{2GM}{c^2 r} < 1, eq. (IV.B.1.a.2) means that an observer (such as A) near a massive object measures a proper time interval that is smaller than the time interval measured by an observer (such as B) far away.

Or conversely:

    \[ dt =\frac{d\tau}{ \Biggl( 1 -\frac{2GM}{c^2 r}\Biggr)^{\frac12}} \quad \quad \text{(IV.B.1.a.3)}  \]

an observer B far away from a massive object measures a larger time interval occurring near the massive object than an observer A near the object.

Put another way, an observer far from a massive object sees a clock near the massive object as ticking more slowly than their own clock, similar to the way, in special relativity, an observer sees a moving clock as ticking more slowly than their own clock.

There are differences between time dilation in special and general relativity though:

  • In special relativity, observers are moving relative to each other while in general relativity, they are stationary with respect to each other.
  • In special relativity, 2 observers moving relative to each other each see the other observer’s clock to be running slow. On the other hand, in general relativity, both observers agree that the clock nearer the massive object runs slower than the clock of an observer further away (i.e., where the gravitational effect is weaker).

As an aside, note that the formula for gravitational time dilation bears a resemblance to the time dilation formula in special relativity:

    \[ d\tau = \Biggl( 1 -\frac{2GM}{c^2 r}\Biggr)^{\frac12} dt \quad \text{vs} \quad d\tau = \Biggl( 1 -\frac{v^2}{c^2 }\Biggr)^{\frac12} dt \]

Indeed, the velocity it takes to generate enough energy to overcome the gravitational effects of a massive object – the escape velocity v_{\text{escape}} – is given by:

    \[  v_{\text{escape}} = \sqrt{\frac{2MG}{r}} \quad \text{(IV.B.1.a.4)}  \]

Thus, the escape velocity squared, \displaystyle \frac{2MG}{r}, in the gravitational time dilation equation is analogous to the v^2 term in the special relativity time dilation formula.
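
To attach some numbers to this, here is a small numeric sketch (using standard textbook values for G, c, and the earth's mass and radius) that evaluates eq. (IV.B.1.a.4) at the earth's surface and at the Schwarzschild radius, where the escape velocity reaches c:

    # Numeric sketch of eq. (IV.B.1.a.4): escape velocity at the earth's
    # surface and at the Schwarzschild radius.
    import math

    G = 6.674e-11           # m^3 kg^-1 s^-2
    c = 2.99792458e8        # m/s
    M = 5.972e24            # kg, mass of the earth
    r_earth = 6.371e6       # m, radius of the earth
    R_s = 2 * G * M / c**2  # Schwarzschild radius of the earth

    v_surface = math.sqrt(2 * G * M / r_earth)
    v_horizon = math.sqrt(2 * G * M / R_s)

    print(f"v_escape at earth's surface: {v_surface/1e3:.1f} km/s")   # ~11.2 km/s
    print(f"v_escape at r = R_s:         {v_horizon/c:.3f} c")        # 1.000 c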

IV.B.1.b Example

Consider a clock at the top of Mount Everest (whose proper time interval over some period we’ll call d\tau_{\text{top}}) and a clock at sea level (whose proper time interval we’ll refer to as d\tau_{\text{base}}). We want to know how these time intervals compare over the same coordinate time interval dt.

We’ll assume the conditions in which the Schwarzschild metric is valid are in play. We know that under these conditions, the equation that describes the proper time measured by the clock on Mount Everest is:

    \[ d\tau_{\text{top}} = \Biggl( 1 -\frac{2GM}{c^2 (r+H)}\Biggr)^{\frac12} dt  \quad \text{(IV.B.1.b.1)}  \]

where

G is Newton’s gravitational constant = \displaystyle 6.6742 \times 10^{-11}\, \frac{Nm^2}{kg^2}
M is the mass of the earth = 5.97219 \times 10^{24}\,kg
c is the speed of light = 2.99792458 \times 10^8\,m/s
r is the radius of the earth = 6.371 \times 10^6\,m
H is the height of Mount Everest = 8.848 \times 10^3\,m

The equation that describes the proper time measured by the clock at sea level is:

    \[ d\tau_{\text{base}} = \Biggl( 1 -\frac{2GM}{c^2 r}\Biggr)^{\frac12} dt  \quad \text{(IV.B.1.b.2)}  \]

To see how they compare, we take the ratio of \displaystyle \gamma = \frac{d\tau_{\text{top}}}{d\tau_{\text{base}} }  \quad \text{(IV.B.1.b.3)}:

    \begin{align*} \gamma = \frac{d\tau_{\text{top}}}{d\tau_{\text{base}} } &= \frac{\Biggl( 1 -\frac{2GM}{c^2 (r+H)}\Biggr)^{\frac12} \cancel{dt} }{\Biggl( 1 -\frac{2GM}{c^2 r}\Biggr)^{\frac12} \cancel{dt}} \\ &= 1.000\, 000\, 000\, 000\,965   \quad \text{(IV.B.1.b.4)}  \end{align*}

Eq. (IV.B.1.b.4) means that the clock on Mount Everest ticks slightly faster than the clock at sea level. To put this in perspective:

\gamma-1 = 9.65 \times 10^{-13}\,\frac{\text{extra sec}}{\text{sec}}

\displaystyle 9.65 \times 10^{-13}  \times \underbrace{60\frac{\text{sec}}{\text{min}}  \times 60 \frac{\text{min}}{\text{hr}} \times 24 \frac{\text{hr}}{\text{d}}  \times 365.25\frac{\text{d}}{\text{yr}}}_{31{,}557{,}600\,\text{sec/yr}} \approx 30.45 \times 10^{-6} \frac{\text{sec}}{\text{yr}}

\displaystyle \frac{1\text{ sec}}{30.45 \times 10^{-6} \frac{\text{sec}}{\text{yr}}} \approx 32,837 \text{ yr}

That means that the clock on top of Mount Everest gets 1 second ahead of the clock at sea level approximately every 32,837 years.
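
If you want to reproduce these numbers yourself, here is a minimal numeric sketch of eqs. (IV.B.1.b.1)–(IV.B.1.b.4) (the constants are standard values; small differences in the inputs will shift the last digits):

    # Numeric sketch of the Mount Everest gravitational time dilation example.
    import math

    G = 6.674e-11            # m^3 kg^-1 s^-2
    c = 2.99792458e8         # m/s
    M = 5.972e24             # kg, mass of the earth
    r = 6.371e6              # m, radius of the earth (sea level)
    H = 8.848e3              # m, height of Mount Everest

    tau_top  = math.sqrt(1 - 2*G*M / (c**2 * (r + H)))   # d(tau_top)/dt
    tau_base = math.sqrt(1 - 2*G*M / (c**2 * r))         # d(tau_base)/dt

    gamma = tau_top / tau_base
    extra = gamma - 1                      # extra seconds gained per second
    sec_per_year = 365.25 * 24 * 3600      # 31,557,600 s/yr

    print(f"gamma - 1              ~ {extra:.2e}")                     # ~9.7e-13
    print(f"drift per year         ~ {extra*sec_per_year*1e6:.1f} microseconds")
    print(f"years to gain 1 second ~ {1/(extra*sec_per_year):,.0f}")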

This experiment has actually been done (using atomic clocks accurate to within 1/15,000,000,000 of a second per year) and yielded the above results. These results have also been replicated in numerous other studies. For example, it’s been shown that time runs slightly faster on jet planes. In addition, GPS systems, which utilize satellites far above the earth’s surface, depend on corrections for gravitational time dilation (as well as corrections for time dilation related to satellite motion) for proper function.

IV.B.2 Gravitational Redshift

We’ll start with a basic review of waves.

Waves of different frequency/wavelength
Figure IV.B.2

The diagram above shows 2 waves. Recall that the distance between peaks is called the wavelength. The number of wavelengths that cross a point per unit time is called the wave’s frequency. The time it takes to complete one wavelength is the wave’s period. The velocity of the wave is the speed at which the wave moves through space. Some pertinent relationships between these entities are:

\displaystyle f = \frac{1}{T}
\displaystyle T = \frac{1}{f}
\displaystyle v = f\lambda

where

f is the wave’s frequency
T is the wave’s period
\lambda is the wave’s wavelength
v is the wave’s velocity

Note that waves of short wavelength (figure IV.B.2a) have a high frequency and higher energy while waves of long wavelength (figure IV.B.2b) have a low frequency and lower energy.

Let’s reproduce the setup we had in section IV.B.1.a where we derived gravitational time dilation. We have observer A near a planet of mass M and an observer B far enough away from the planet that the gravitational effects from it are negligible, i.e., spacetime around him is essentially Minkowski spacetime. All of these objects are floating in empty space and are stationary with respect to each other. Observer A again has a light source. He flashes it twice, at the peaks of the light waves leaving the light source. Thus, the proper time measured by observer A, d\tau, represents the period of the wave. Observer B, again, measures the time interval of the light sent from A as dt. The equation describing the relationship between d\tau and dt is:

    \[ dt =\frac{d\tau}{ \Biggl( 1 -\frac{2GM}{c^2 r}\Biggr)^{\frac12}}  \quad \text{(IV.B.2.1)} \]

From the definitions above, we know that \displaystyle f = \frac{1}{T}. Therefore:

    \[ f_R = f_E \Biggl( 1 -\frac{2GM}{c^2 r}\Biggr)^{\frac12}  \quad \text{(IV.B.2.2)}  \]

where

f_E is the frequency of the light emitted by observer A
f_R is the frequency of the light received by observer B

Because \displaystyle  1 -\frac{2GM}{c^2 r} is < 1, eq. (IV.B.2.2) tells us that the frequency of light received is lower than when it was emitted, and thus, has less energy. The gravitational field causes it to lose energy just as a particle of matter loses energy as it climbs out of a gravitational field.

This effect was confirmed by a famous experiment performed by Pound and Rebka in 1959. They showed that the frequency of gamma rays received at the top of Harvard University’s Jefferson laboratory tower was slightly lower than the frequency with which they were emitted 74 feet below. Their results have since been replicated multiple times.
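
To get a feel for how small this shift is, expanding the ratio of the square-root factors in eq. (IV.B.2.2) for two nearby radii gives a first-order fractional shift of roughly GMh/(c^2 r^2), i.e. gh/c^2. Here is a rough numeric sketch for a 74-foot (about 22.5 m) tower at the earth's surface, using standard constant values:

    # Rough first-order estimate of the Pound-Rebka fractional frequency shift.
    G = 6.674e-11          # m^3 kg^-1 s^-2
    c = 2.99792458e8       # m/s
    M = 5.972e24           # kg, mass of the earth
    r = 6.371e6            # m, radius of the earth
    h = 22.5               # m, ~74 ft tower height

    shift = G * M * h / (c**2 * r**2)      # ~ g*h/c^2
    print(f"fractional frequency shift ~ {shift:.2e}")   # ~2.5e-15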

IV.B.3 Gravitational Length Contraction

In our discussion of the equivalence principle, we spoke of how, on a rotating disc, an observer at the center of the disc would see measuring rods laid end-to-end along the circumference of the periphery as length contracted. This is because the rods are in motion relative to the observer. And because the rods closer to the observer are moving with less relative velocity than the rods further away, the degree of length contraction will vary with the radial distance from the observer. We said that, because – per the equivalence principle – acceleration is equivalent to a gravitational effect, we would expect similar length contraction in the presence of a gravitational field, contraction that would increase the closer to the gravitational source we got.

Subsequently, we talked about tidal effects. We noted that, in the presence of gravity, an object will experience elongation in the radial direction and contraction (as in the rotating disc) in the transverse direction.

We can’t directly demonstrate these effects from mathematical manipulation of the Schwarzschild metric. However, some interesting effects can be deduced by examination of this metric as we move in the radial direction toward a massive body.

The first thing we should establish is that the r in the Schwarzschild metric equation is not simply the proper distance from the center of a spherically symmetric mass to some point outside the mass. The proper distance between 2 points is the distance we get if we lay out a series of rulers between the 2 points. The problem we have with this in curved spacetime is that the units on the rulers are of different lengths for different points in spacetime. Therefore, to calculate the proper distance along a path in curved spacetime, we have to parameterize the trajectory and take an integral:

    \[  L_0 = \int \Bigl\| \frac{d}{d\lambda} \Bigr\| \, d\lambda  \quad \text{(IV.B.3.1)}  \]

where \lambda is the path parameter. In this calculation, we assume that \theta and \phi are constant.

\displaystyle \frac{d}{d\lambda} is spacelike (i.e., time is constant along the path). We’re using the mainly minuses metric, so the squared length of a spacelike vector is negative and we must insert a minus sign under the square root. So our integral becomes:

    \begin{align*}  L_0 &= \int \sqrt{-\frac{d}{d\lambda} \cdot \frac{d}{d\lambda}} d\lambda \\ &= \int \sqrt{-\Bigl( \underbrace{\cancel{\frac{dct}{d\lambda}}}_{0} \frac{\partial}{\partial ct} + \frac{dr}{d\lambda} \frac{\partial}{\partial r} \Bigr) \cdot \Bigl( \underbrace{\cancel{\frac{dct}{d\lambda}}}_{0} \frac{\partial}{\partial ct} + \frac{dr}{d\lambda} \frac{\partial}{\partial r} \Bigr) } d\lambda \\ &\text{Path has constant } ct \,\, \Rightarrow \,\, \frac{dct}{d\lambda} = 0 \\ &= \int \sqrt{- \Bigl( \frac{dr}{d\lambda} \frac{\partial}{\partial r} \Bigr) \cdot \Bigl( \frac{dr}{d\lambda} \frac{\partial}{\partial r} \Bigr) }d\lambda \\ & \text{Use parameter } \lambda = r \\ &= \int \sqrt{- \Bigl( \underbrace{\cancel{\frac{dr}{dr}}}_{1} \frac{\partial}{\partial r} \Bigr) \cdot \Bigl( \frac{dr}{dr} \frac{\partial}{\partial r} \Bigr) }dr \\ &= \int \sqrt{- \frac{\partial}{\partial r} \cdot \frac{\partial}{\partial r} } dr \\ &= \int \sqrt{-g_{rr}} dr \quad \text{(IV.B.3.2)} \end{align*}

Remember that the Schwarzschild metric with constant \theta and \phi is:

g_{\mu \nu} = \begin{bmatrix} 1-\displaystyle \frac{R_s}{r} & 0 \\ 0 & -\displaystyle \frac{1}{1-\displaystyle \frac{R_s}{r}} \end{bmatrix} where \displaystyle R_s = \frac{2GM}{c^2}. This implies that g_{rr} = \displaystyle -\frac{1}{1-\displaystyle\frac{R_s}{r}}. We substitute this value into eq. (IV.B.3.2) and obtain:

    \begin{align*}     L_0 &= \int \sqrt{-g_{rr}} dr \\ &= \int \sqrt{-\Bigl(-\frac{1}{1-\displaystyle\frac{R_s}{r}}\Bigr)}dr \\ &= \int \sqrt{\Bigl(\frac{1}{1-\displaystyle\frac{R_s}{r}}\Bigr)}dr \\ &= \int \sqrt{\Bigl(\frac{1}{1-\displaystyle\frac{R_s}{r}}\Bigr)\frac{r}{r}}dr \\ &= \int \sqrt{ \frac{r}{r-R_s}}dr \quad \text{(IV.B.3.3)} \end{align*}

So the general equation for proper length between points r_1 < r_2 in Schwarzschild spacetime is:

    \[ L_0 =  \int_{r_1}^{r_2} dL_0 = \int_{r_1}^{r_2} \sqrt{ \frac{r}{r-R_s}}dr \quad \text{(IV.B.3.4)} \]

In Minkowski (flat) spacetime, where r \gg R_s \,\, \Rightarrow \,\, \displaystyle \sqrt{ \frac{r}{r-R_s}}  \rightarrow 1, the expression for proper length is simple:

    \[ L_0 = \int_{r_1}^{r_2} 1 \cdot dr = r_2 - r_1  \quad \text{(IV.B.3.5)} \]

But in Schwarzschild spacetime, the relationship is more complicated. Qualitatively, though, we can see that as r\rightarrow R_s, dL_0 gets progressively greater than dr. If we want a quantitative evaluation, we need to evaluate the integral in eq. (IV.B.3.4). This is no small task. To see it in all its glory, click here.

It turns out that the solution to the indefinite form of the integral in eq. (IV.B.3.4) is:

    \[ L_{0}=\sqrt{r} \sqrt{r-R_{s}}+R_{s} \ln \left(\sqrt{r}+\sqrt{r-R_{s}}\right) + C \quad \text{(IV.B.3.5)} \]

If we want to determine the proper distance between 2 points r=r_2>r_1>R_s, we must solve the definite integral:

    \begin{align*} L_0 &= \int_{r_1}^{r_2} \sqrt{ \frac{r}{r-R_s}}dr  = \eval_{r_1}^{r_2} \sqrt{r} \sqrt{r-R_{s}}+R_{s} \ln \left(\sqrt{r}+\sqrt{r-R_{s}}\right) \\ \\ &=  \sqrt{r_2} \sqrt{r_2-R_{s}} - \sqrt{r_1} \sqrt{r_1-R_{s}} \\  &+  R_{s} \ln \left(\sqrt{r_2}+\sqrt{r_2-R_{s}}\right) - R_{s} \ln \left(\sqrt{r_1}+\sqrt{r_1-R_{s}}\right)   \quad \text{(IV.B.3.6)} \end{align*}

We can “simplify” eq. (IV.B.3.6) further, as follows:

    \[  L_0 = \sqrt{r_2} \sqrt{r_2-R_{s}} - \sqrt{r_1} \sqrt{r_1-R_{s}} + R_{s} \ln \left(\frac{\sqrt{r_2}+\sqrt{r_2-R_{s}}}{\sqrt{r_1}+\sqrt{r_1-R_{s}}} \right)  \quad \text{(IV.B.3.7)}   \]

At first glance, eq. (IV.B.3.7) shares elements with – but is not exactly the same as – the equation given by K. Griest, Physics 161: Black Holes: Lecture 9/10: 25/27 Jan (for which he does not supply a derivation):

    \[\Delta \sigma=r_{2} A_{2}-r_{1} A_{1}+\frac{R_{S}}{2} \ln \left(\frac{r_{2} A_{2}+r_{2}-R_{S} / 2}{r_{1} A_{1}+r_{1}-R_{S} / 2}\right)  \quad \text{(IV.B.3.8)}  \]

where

\Delta \sigma = L_0
\displaystyle A_i = \sqrt{1-\frac{2GM}{r_i c^2}} = \sqrt{1-\frac{R_s}{r_i}}

However, with some algebraic manipulation, we can show that eq. (IV.B.3.7) and eq. (IV.B.3.8) are the same. To see these manipulations, click here.

Griest, in his online book, provides an example of the use of this equation. Consider a black hole with r=R_s=8.86\, km. If we start at r_2=30\, km and move inward to r_1=20\, km, we might think that we’ve traveled 10 km. However, if we plug the values above into eq. (IV.B.3.7), we find that the proper distance that we’ve moved is actually 12.51 km. If we then orbit the black hole at r=20\, km, we might expect (having moved 12.51 km inward from r_2 = 30\, km) that the circumference of the circle in which we travel would be 2\pi \times 17.49\, km. However, the circumference that we’d actually measure would be 2\pi \times 20\,km. To see why this is so, click here.
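
Griest's numbers are easy to check directly. Here is a minimal numeric sketch that plugs r_1 = 20 km, r_2 = 30 km, and R_s = 8.86 km into eq. (IV.B.3.7):

    # Numeric check of eq. (IV.B.3.7) with Griest's values.
    import math

    def proper_distance(r1, r2, Rs):
        """Radial proper distance between r1 and r2 (both > Rs), eq. (IV.B.3.7)."""
        term = lambda r: math.sqrt(r) * math.sqrt(r - Rs)
        log_part = math.log((math.sqrt(r2) + math.sqrt(r2 - Rs)) /
                            (math.sqrt(r1) + math.sqrt(r1 - Rs)))
        return term(r2) - term(r1) + Rs * log_part

    print(f"{proper_distance(20.0, 30.0, 8.86):.2f} km")   # ~12.51 km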

Figure IV.B.3.1, referred to as an embedding diagram, may help to visualize why proper length increases the closer we get to a massive object.

Embedding diagram
Figure IV.B.3.1

We can see from the diagram that the change in proper distance, dL_0, is greater than the change in coordinate distance, dr, because spacetime is curved.

The paraboloid surface traced out by the outer margin of the concentric circles (shown as black half-parabola curves in figure IV.B.3.1) is referred to as Flamm’s paraboloid. Its formula is:

    \[ w(r) = 2\sqrt{R_s(r-R_s)}   \]

where ct and \theta = \displaystyle \frac{\pi}{2} are constant and r and \phi vary. To see where this formula comes from, click here.

IV.B.4 Black Holes

IV.B.4.a Introduction

Notice in the Schwarzschild metric

    \[  g_{\mu \nu} = \begin{bmatrix}  1-\displaystyle \frac{R_s}{r} &0&0&0 \\ 0&-\displaystyle \frac{1}{1-\displaystyle \frac{R_s}{r}}&0&0 \\ 0&0&-r^2&0 \\ 0&0&0&-r^2\sin^2 \theta\end{bmatrix}  \quad \text{(IV.B.4.1)}  \]

if r=0, the \displaystyle \frac{R_s}{r} portion of the g_{00} component of the metric goes to infinity. This represents the true singularity that occurs at the center of a black hole, a locus where general relativity is no longer defined.

On the other hand, if r=R_s, the denominator of the g_{rr} component of the metric goes to zero, so the g_{rr} component goes to infinity. However, this apparent singularity is referred to as a coordinate singularity since we can find coordinates in which no singularity occurs at this value of r. This location (or actually, spherical shell of locations) is called the event horizon of a black hole with center at r=0. The value r=R_s is called the Schwarzschild radius.

As eigenchris points out in his YouTube video, Relativity 108b: Schwarzschild Metric – Interpretation (Gravitational Time Dilation, Event Horizon), this is similar to what happens at the origin of the polar coordinate system of flat spacetime. At the origin of polar coordinates, the value of r is always 0 and the \theta variable has an infinite number of values. Thus, the basis vector \vec{e}_r = \displaystyle \frac{\partial}{\partial r} is undefined because r can’t be negative, and the basis vector \vec{e}_{\theta} is undefined because \theta does not have a unique value there. However, flat spacetime in Cartesian coordinates shows no such problems at the origin.

Coordinate systems have been developed (e.g., Eddington-Finkelstein coordinates, Kruskal-Szekeres coordinates) that remove the singularity at r=R_s.

A black hole occurs when all the mass of an object becomes concentrated to within the object’s Schwarzschild radius. Recall that the Schwarzschild radius of an object is defined as:

    \[ R_s = \displaystyle \frac{2GM}{c^2}   \quad \text{(IV.B.4.2)}   \]

where

G = 6.6743 \times 10^{-11}\,\displaystyle \frac{Nm^2}{kg ^2}
c = 2.99792458 \times 10^8\,\displaystyle \frac{m}{s}

G is very small and c is very large. Thus, the Schwarzschild radius of an object is, in general, quite small. For example, the earth’s mass is 5.97 \times 10^{24}\,kg. Therefore:

    \begin{align*} R_{s_{Earth}} &= \displaystyle \frac{5.97\times 10^{24}\cdot 2 \cdot 6.6743 \times 10^{-11} }{(2.99792458\times 10^8)^2}\,kg \cdot \frac{kg\cdot m}{s^2}\cdot \frac{m^2}{kg^2}\cdot \frac{s^2}{m^2} \\ &= 8.87\,mm    \quad \text{(IV.B.4.3)}    \end{align*}

So, to become a black hole, all the mass of the earth would need to be compressed to a radius of less than 1 cm. A similar calculation shows that, to become a black hole, the mass of the sun would need to be compressed down to a radius of less than 2.96 km.
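
Here is a short numeric version of eq. (IV.B.4.2) for the earth and the sun (the mass values are standard figures):

    # Schwarzschild radius R_s = 2GM/c^2 for the earth and the sun.
    G = 6.6743e-11         # m^3 kg^-1 s^-2
    c = 2.99792458e8       # m/s

    def schwarzschild_radius(M):
        return 2 * G * M / c**2

    print(f"Earth: {schwarzschild_radius(5.97e24) * 1e3:.2f} mm")   # ~8.87 mm
    print(f"Sun:   {schwarzschild_radius(1.989e30) / 1e3:.2f} km")  # ~2.95 km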

To understand how spacetime is curved around a black hole, let’s look at the behavior of light around one. We do this by looking at light-like geodesics. For a light-like geodesic, tangent vectors have a squared length of zero:

    \[ \| \frac{d}{d\lambda}  \|^2 = \frac{dx^{\mu}}{d\lambda} \frac{dx^{\nu}}{d\lambda} g_{\mu \nu}= 0   \quad \text{(IV.B.4.4)}   \]

We’ll only consider light beams traveling in the radial direction; \theta and \phi will be kept constant. Therefore, the metric we’ll be working with is reduced to:

    \[ g_{\mu \nu} = \begin{bmatrix} 1-\displaystyle \frac{R_s}{r} &0 \\ 0&  -\displaystyle \frac{1}{1-\displaystyle \frac{R_s}{r}} \end{bmatrix}   \quad \text{(IV.B.4.5)}   \]

Eq. (IV.B.4.4) can be re-written:

    \[ \Bigl( \frac{dct}{d\lambda}  \Bigr)^2\Bigl(  1-\frac{R_s}{r}\Bigr) -  \Bigl( \frac{dr}{d\lambda}  \Bigr)^2  \Bigl(  \displaystyle \frac{1}{1-\displaystyle \frac{R_s}{r}}\Bigr)  = 0 \]

    \[ \Bigl( \frac{dct}{d\lambda}  \Bigr)^2\Bigl(  1-\frac{R_s}{r}\Bigr) =  \Bigl( \frac{dr}{d\lambda}  \Bigr)^2  \Bigl(  \displaystyle \frac{1}{1-\displaystyle \frac{R_s}{r}}\Bigr)    \]

    \begin{align*}     \Biggl[\Bigl( \frac{dct}{d\lambda}  \Bigr)^2\Bigl(  1-\frac{R_s}{r}\Bigr)\Biggr]\Bigl(  1-\frac{R_s}{r}\Bigr) &=  \Biggl[\Bigl( \frac{dr}{d\lambda}  \Bigr)^2  \Bigl(  \displaystyle \frac{1}{1-\displaystyle \frac{R_s}{r}}\Bigr)\Biggr]\Bigl(  1-\frac{R_s}{r}\Bigr) \\ \Bigl( \frac{dct}{d\lambda}  \Bigr)^2\Bigl(  1-\frac{R_s}{r}\Bigr)^2 &= \Bigl( \frac{dr}{d\lambda}  \Bigr)^2 \\ \pm \frac{dct}{d\lambda}\Bigl(  1-\frac{R_s}{r}\Bigr) &= \frac{dr}{d\lambda}   \quad \text{(IV.B.4.6)}    \end{align*}

We can make our parameter anything that we want. Let’s make it \lambda = ct. We get:

    \begin{align*}\pm \underbrace{\cancel{\frac{dct}{dct}}}_{1}\Bigl( 1-\frac{R_s}{r}\Bigr) &= \frac{dr}{d\lambda} \\ \pm \Bigl( 1-\frac{R_s}{r}\Bigr) &= \frac{dr}{dct} \\ \frac{dr}{dt} &= \pm c\Bigl( 1-\frac{R_s}{r}\Bigr) \quad \text{(IV.B.4.7)}  \end{align*}

What eq. (IV.B.4.7) appears to be saying is that the speed of light varies with position as light beams approach the event horizon of a black hole. However, this is an artifact of coordinate choice. If we measure the speed of light locally (i.e., using proper time and proper length) it will be true in all coordinate systems, and thus, an expression of a physical law, that:

    \[\displaystyle \frac{dL_0}{d\tau} = c \quad \text{(IV.B.4.8)} \]

Indeed, \displaystyle \frac{dr}{dt} is just a ratio of coordinates and coordinates don’t always have physical meaning. In this case, remember:

  • t is the proper time only for an observer at r=\infty
  • r doesn’t measure proper length

As eigenchris notes, the lesson here is: Don’t trust your eyes looking at a graph or coordinates; the only thing we should trust for making physical measurements is the metric.

But let’s take a closer look at eq. (IV.B.4.7). Solving this differential equation gives us:

    \[ct(r) = \pm \Bigl( r + R_s\ln(r-R_s) \Bigr) + k \quad \text{(IV.B.4.9)} \]

(To see how we got this, click here.)

If we plot eq. (IV.B.4.9), with r>R_s we get something like figure IV.B.4.1.

Light-like geodesics for r>R_s =2
Figure IV.B.4.1

The purple curves are what an observer at infinity would see for light beams heading toward the event horizon of a black hole with a Schwarzschild radius of 2, for varying values of k. It’s not easy to see from this diagram, but these curves never reach the horizon. The interpretation of this finding is that an observer far away from a black hole will see a light beam approach but never reach the horizon.

The red curves represent light beams moving away from the event horizon. Just as with the ingoing beams represented by the purple curves, the distant observer never sees the light beams represented by the red curves touch the horizon.
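If you’d like to reproduce curves like these yourself, here is a minimal sketch (assuming numpy and matplotlib are available) that plots eq. (IV.B.4.9) for a black hole with R_s = 2 and several values of k:

```python
import numpy as np
import matplotlib.pyplot as plt

R_s = 2.0                               # Schwarzschild radius (geometric units)
r = np.linspace(R_s + 1e-3, 10, 500)    # stay just outside the horizon

for k in range(-10, 11, 2):
    ct_out = (r + R_s * np.log(r - R_s)) + k    # outgoing beams (red in fig. IV.B.4.1)
    ct_in = -(r + R_s * np.log(r - R_s)) + k    # ingoing beams (purple in fig. IV.B.4.1)
    plt.plot(r, ct_out, color="red")
    plt.plot(r, ct_in, color="purple")

plt.axvline(R_s, linestyle="--", color="black")  # the event horizon
plt.xlabel("r")
plt.ylabel("ct")
plt.title("Light-like geodesics outside the horizon (Schwarzschild coordinates)")
plt.show()
```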

Figure IV.B.4.2, taken from 28:51 of the eigenchris YouTube video, Relativity 108b: Schwarzschild Metric – Interpretation (Gravitational Time Dilation, Event Horizon), shows curves representing light beams inside the horizon (on the left side) as well as those outside the horizon (on the right).

Light-like geodesics for r<R_s and r>R_s
Figure IV.B.4.2

To a distant observer, the light beams outside the horizon never reach the horizon. Furthermore, the light cones nearer the horizon are narrower than those farther away. Those farther away become wider, and at infinity, become normal light cones. Again, this is according to the observer at infinity and, as will be discussed later, is an artifact of the coordinates we’re using.

On the other hand, the light beams inside the horizon all point toward the singularity at the center of the black hole and none of them ever touch the horizon. This can be interpreted as meaning that light cannot escape from a black hole, which, in fact, turns out to be true.

IV.B.4.b Alternative Coordinates
IV.B.4.b.i Eddington-Finkelstein

In progress

IV.B.4.b.ii Kruskal-Szekeres

In progress

V Confirmatory Evidence

V.A Mercury Perihelion

Since the mid-nineteenth century, astronomers have known that the orbits of planets around the sun are not perfect ellipses. Instead, each time a planet completes an orbit, the next orbit is slightly “tilted” compared to the previous orbit (figure V.1). This can be made more precise by defining the tilt by the

Precession of the perihelion of Mercury as it orbits the sun
Figure V.1

relative position of the planet’s perihelion between successive orbits. As seen in figure V.1, the perihelion is the point in the orbit at which the planet is closest to the sun. Specifically, each time a planet orbits the sun, it doesn’t return to the point from which it started. Instead, it goes a little past that point. The perihelion of the new orbit makes a nonzero angle with the perihelion of the previous orbit. (If that angle were zero, the planet would have returned to its original position.) This is called the advance (or precession) of the perihelion of the planet. It is most pronounced for Mercury, the planet closest to the sun.

The effect is tiny. Mercury, for example, gains an extra angle of only 575 arcseconds each century. There are 3,600 arcseconds per degree, so that’s a little less than 1/6 of a degree per century (0.1597 of a degree, to be exact; obviously, figure V.1 is not drawn to scale).

Astronomers since the late 1800’s have been able to explain most of this perihelion advance, mainly via the gravitational effect of planets and other objects with mass in the solar system. However, until the advent of general relativity, 43 arcseconds per century of perihelion advance remained unaccounted for. In 1915, Albert Einstein used his theory of general relativity to account for these last 43 arcseconds.

The discussion that follows shows how general relativity solved this problem.

V.A.1 Orbits in Newtonian Gravity

Orbit in Newtonian Gravity
Figure V.A.1.1

    \[\frac{d\vec{R}}{dr} =\vec{e}_r \quad  \frac{d\vec{R}}{d\phi} =\vec{e}_{\phi} \quad \text{(V.A.1)} \]

    \[ \frac{d\vec{R}}{dt} = \frac{dr}{dt}\frac{d\vec{R}}{dr} + \frac{d\phi}{dt}\frac{d\vec{R}}{d\phi}  \quad \text{(V.A.2)}  \]

    \[\frac{d\vec{R}}{dt} = \dot{r}\vec{e}_r + \dot{\phi}\vec{e}_{\phi} \quad \text{(V.A.3)} \]

    \begin{align*} \Biggl\|\frac{d\vec{R}}{dt}\Biggr\|^2 &= \frac{d\vec{R}}{dt} \cdot \frac{d\vec{R}}{dt} \\ &= \bigl( \dot{r}\vec{e}_r + \dot{\phi}\vec{e}_{\phi}  \bigr) \cdot \bigl( \dot{r}\vec{e}_r + \dot{\phi}\vec{e}_{\phi}  \bigr) \\ &= \dot{r}^2(\underbrace{\vec{e}_r \cdot \vec{e}_r}_{1}) + 2\dot{r}\dot{\phi}(\vec{e}_r \cdot \vec{e}_{\phi}) + \dot{\phi}^2 (\underbrace{\vec{e}_{\phi} \cdot \vec{e}_{\phi}}_{r^2}) \\ &= \dot{r}^2 + \dot{\phi}^2 r^2   \quad \text{(V.A.4)}  \end{align*}

To see where \vec{e}_{\phi} \cdot \vec{e}_{\phi} = r^2 comes from, click here.

Now we need to consider angular momentum.

    \begin{align*} \vec{L} &= \vec{R} \times \vec{p} = \vec{R} \times m\frac{d\vec{R}}{dt} \\ &= m\Biggl(  \vec{R} \times \frac{d\vec{R}}{dt}  \Biggr)  \quad \text{(V.A.5)}  \end{align*}

From eq. (V.A.1), if \displaystyle \frac{d\vec{R}}{dr} =\vec{e}_r, then \vec{R} = r\vec{e}_r. And from eq. (V.A.3), we saw that \displaystyle \frac{d\vec{R}}{dt} = \dot{r}\vec{e}_r + \dot{\phi}\vec{e}_{\phi}. Substituting these values into eq. (V.A.5), we get:

    \begin{align*} \vec{L} &= m\Bigl((r\vec{e}_r) \times (\dot{r}\vec{e}_r + \dot{\phi}\vec{e}_{\phi})  \Bigr) \\ &= m \Bigl(  r\dot{r}(\underbrace{\cancel{\vec{e}_r \times \vec{e}_r}}_{0})  + r\dot{\phi}(\vec{e}_r \times \vec{e}_{\phi})  \Bigr)  \quad \text{(V.A.6)}  \end{align*}

The magnitude of the cross product \vec{e}_r \times \vec{e}_{\phi} is given by the area of a parallelogram formed from these basis vectors. We know that:

    \[  \vec{e}_r \cdot \vec{e}_r  = 1 \quad \Rightarrow \quad \|\vec{e}_r\| = 1   \quad \text{(V.A.7)}  \]

and

    \[ \vec{e}_{\phi} \cdot  \vec{e}_{\phi} =r^2   \quad \Rightarrow \quad \|\vec{e}_{\phi}\| = r   \quad \text{(V.A.7)}  \]

Strictly speaking, taking square roots gives \|\vec{e}_r\| = \pm 1 and \|\vec{e}_{\phi}\| = \pm r, but lengths can’t be negative.

Therefore, the parallelogram formed by \vec{e}_r and \vec{e}_{\phi} looks like this:

Parallelogram whose area is the magnitude of the cross product of the e(r) x e(phi) basis vectors.
Figure V.A.1.2

and, since the two basis vectors are perpendicular, the area of this parallelogram is \|\vec{e}_r\| \, \|\vec{e}_{\phi}\| = 1 \cdot r = r. Recall that the vector associated with the cross product points in a direction perpendicular to the plane in which the parallelogram is located; we’ll call that direction z. The unit basis vector in the z-direction is given by \vec{e}_z. Thus, eq. (V.A.6) becomes:

    \[ \vec{L} = mr\dot{\phi} (r \vec{e}_z) \quad \text{(V.A.8)}  \]

This means that the magnitude of the angular momentum is:

    \[ L =  mr^2\dot{\phi} \quad \text{(V.A.9)} \]

Rearranging, we get:

    \[ \dot{\phi} = \frac{L}{mr^2} \quad \text{(V.A.10)} \]
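Before moving on to the orbital energy, here is a small numerical sanity check (a sketch assuming numpy; the sample values are arbitrary) of the basis-vector relations above and of eq. (V.A.9):

```python
import numpy as np

m = 2.0
r, phi = 3.0, 0.7            # position in polar coordinates (arbitrary sample values)
r_dot, phi_dot = 0.4, 0.9    # their time derivatives

# Basis vectors e_r = dR/dr and e_phi = dR/dphi, written in Cartesian components
e_r = np.array([np.cos(phi), np.sin(phi)])
e_phi = np.array([-r * np.sin(phi), r * np.cos(phi)])

print(np.dot(e_r, e_r))          # 1.0, matching e_r . e_r = 1
print(np.dot(e_phi, e_phi))      # 9.0, matching e_phi . e_phi = r^2
area = e_r[0] * e_phi[1] - e_r[1] * e_phi[0]   # z-component of e_r x e_phi
print(area)                      # 3.0 = r, the parallelogram area

# Angular momentum: |R x p| should equal m r^2 phi_dot, eq. (V.A.9)
R = r * e_r
velocity = r_dot * e_r + phi_dot * e_phi       # eq. (V.A.3)
L = m * abs(R[0] * velocity[1] - R[1] * velocity[0])
print(L, m * r**2 * phi_dot)     # both 16.2
```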

The total energy E of an object – say a planet – orbiting a larger object (like a star) is the planet’s kinetic energy (K.E.) plus its potential energy (P.E):

    \begin{align*}E &= K.E. + P.E. \\ &= \frac12 m v^2 + mV(r)  \quad \text{(V.A.11)} \end{align*}

where

m is mass
v is the orbiting object’s velocity \displaystyle \frac{d\vec{R}}{dt}
V(r) is the gravitational potential (the potential energy per unit mass)

We know from eq. (V.A.4) that:

    \[ v^2 =  \Biggl\|\frac{d\vec{R}}{dt}\Biggr\|^2  = \dot{r}^2 + \dot{\phi}^2 r^2 \]

and from eq. (V.A.10):

    \[ \dot{\phi} = \frac{L}{mr^2}  \]

Placing these values into eq. (V.A.11) and doing some algebra, we obtain:

    \begin{align*}    E &= \frac12 m(\dot{r}^2 + \dot{\phi}^2 r^2) + mV(r) \\ &= \frac12 m\dot{r}^2 + \frac12mr^2 \dot{\phi}^2 + mV(r) \\ &= \frac12 m\dot{r}^2 + \frac12mr^2\Big(\frac{L}{mr^2} \Bigr)^2 + mV(r) \\ &= \frac12 m\dot{r}^2 + m\frac{L^2}{2m^2r^2} + mV(r)   \quad \text{(V.A.12)}  \end{align*}

From standard Newtonian gravity, the gravitational potential V(r) is given by:

    \[ V(r) = -\frac{GM}{r} \quad \text{(V.A.13)} \]

Plugging eq. (V.A.13) into eq. (V.A.12) gives us:

    \begin{align*} E &= \frac12 m\dot{r}^2 + m\frac{L^2}{2m^2r^2} - m\frac{GM}{r} \\ &= \frac12 m\Bigl(\frac{dr}{dt}\Bigr)^2 + \underbrace{m\Bigl( - \frac{GM}{r} + \frac{L^2}{2m^2r^2}\Bigr)}_{V_{eff}(r)} \quad \text{(V.A.14)}  \end{align*}

Technically, the \displaystyle -\frac{GM}{r} term is related to potential energy and the other terms are related to kinetic energy. However, for this discussion, we group them the way we do in eq. (V.A.14) because the \frac12 m\Bigl(\frac{dr}{dt}\Bigr)^2 term depends on \dot{r} and the other terms depend on r.
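Before moving on to general relativity, here is a sketch (assuming numpy and scipy, with illustrative initial conditions and units in which GM = 1) that integrates a Newtonian orbit and confirms that the total energy of eq. (V.A.14) and the angular momentum of eq. (V.A.9) are conserved along it:

```python
import numpy as np
from scipy.integrate import solve_ivp

G = M = m = 1.0              # illustrative units in which GM = 1

def rhs(t, y):
    """Planar Kepler motion in Cartesian coordinates: y = (x, y, vx, vy)."""
    x, yy, vx, vy = y
    r3 = (x**2 + yy**2) ** 1.5
    return [vx, vy, -G * M * x / r3, -G * M * yy / r3]

y0 = [1.0, 0.0, 0.0, 0.8]                    # a moderately eccentric bound orbit
sol = solve_ivp(rhs, (0.0, 50.0), y0, rtol=1e-10, atol=1e-12)

x, yy, vx, vy = sol.y
r = np.hypot(x, yy)
L = m * (x * vy - yy * vx)                       # angular momentum, eq. (V.A.9)
E = 0.5 * m * (vx**2 + vy**2) - G * M * m / r    # same total energy as eq. (V.A.14)

print("L varies by", L.max() - L.min())          # ~1e-9: conserved
print("E varies by", E.max() - E.min())          # ~1e-9: conserved
```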

V.A.2 Orbits in General Relativity

We start with the geodesic equation:

    \[ 0 = \frac{d^2 x^{\sigma}}{d\lambda^2} + \Gamma^{\sigma}_{\mu \nu} \frac{dx^{\mu}}{d\lambda} \frac{dx^{\nu}}{d\lambda}  \quad \text{(V.A.15)}  \]

Eq. (V.A.15) actually corresponds to 4 separate equations:

    \[ 0 = \frac{d^2 ct}{d\lambda^2} + \Gamma^{t}_{\mu \nu} \frac{dx^{\mu}}{d\lambda} \frac{dx^{\nu}}{d\lambda}  \quad \text{(V.A.16)}    \]

    \[ 0 = \frac{d^2 r}{d\lambda^2} + \Gamma^{r}_{\mu \nu} \frac{dx^{\mu}}{d\lambda} \frac{dx^{\nu}}{d\lambda}  \quad \text{(V.A.17)}    \]

    \[ 0 = \frac{d^2 \theta}{d\lambda^2} + \Gamma^{\theta}_{\mu \nu} \frac{dx^{\mu}}{d\lambda} \frac{dx^{\nu}}{d\lambda}  \quad \text{(V.A.18)}    \]

    \[ 0 = \frac{d^2 \phi}{d\lambda^2} + \Gamma^{\phi}_{\mu \nu} \frac{dx^{\mu}}{d\lambda} \frac{dx^{\nu}}{d\lambda}  \quad \text{(V.A.19)}    \]

We’re going to assume that \displaystyle \theta = \frac{\pi}{2}. This is a valid assumption because the mass our object is orbiting is spherically symmetric, so we can always rotate our coordinate system so that the orbit lies in the equatorial plane \displaystyle \theta = \frac{\pi}{2}. Because \theta is constant, derivatives of \theta are all zero:

    \[\frac{d\theta}{d\lambda} = \frac{d^2\theta}{d^2\lambda}  = 0  \quad \text{(V.A.20)}  \]

Recall from our derivation of the Schwarzschild metric, we found:

    \begin{align*}   \Gamma^{0}_{01} = \Gamma^{0}_{10} &= \frac12  \frac{1}{A}(\partial_r A) \\ \Gamma^{1}_{00} &= \frac12 \frac{1}{B}(\partial_r A) \\    \Gamma^{1}_{11} &= \frac12 \frac{1}{B}(\partial_r B) \\ \Gamma^{1}_{22} &= -\frac{r}{B} \\ \Gamma^{1}_{33} &= -\frac{r \sin^2 \theta}{B} \\ \Gamma^{2}_{12} = \Gamma^{2}_{21} &= \frac{1}{r} \\ \Gamma^{2}_{33} &= -\sin \theta \cos \theta \\ \Gamma^{3}_{13} = \Gamma^{3}_{31} &= \frac{1}{r} \\ \Gamma^{3}_{23} = \Gamma^{3}_{32} &= \cot \theta \quad \text{(IV.A.18)}  \end{align*}

We know that:

A is the g_{00} term in the Schwarzschild metric and equals \displaystyle 1-\frac{R_s}{r}. Therefore:

    \[ \partial_r A = 0 - R_s r^{-2}(-1)  \]

    \[  \partial_r A = \frac{R_s}{r^2}  \quad \text{(V.A.21)}  \]

B is the absolute value of the g_{11} term in the Schwarzschild metric and equals \displaystyle \frac{1}{1-\displaystyle \frac{R_s}{r}} = \frac{r}{r-R_s} = r(r-R_s)^{-1}. Therefore:

    \begin{align*} \partial_r B  &= (r-R_s)^{-1} + r(-1)(r-R_s)^{-2} \\ &= \frac{1}{r-R_s} - \frac{r}{(r-R_s)^{2}}  \\ &= -\frac{R_s}{(r-R_s)^2}  \quad \text{(V.A.22)}  \end{align*}

When we insert these values for A, B and their derivatives into eq. (IV.A.18), we get:

    \begin{align*}   \Gamma^{t}_{tr} = \Gamma^{t}_{rt} &= \frac12  \frac{R_s}{r(r-R_s)} \\ \Gamma^{r}_{tt} &= \frac12 \frac{R_s(r-R_s)}{r^3} \\    \Gamma^{r}_{rr} &= \frac12 \frac{R_s}{r(R_s-r)}\\ \Gamma^{r}_{\theta \theta} &= R_s-r \\ \Gamma^{r}_{\phi \phi} &= (R_s-r )\sin^2\theta \\ \Gamma^{\theta}_{r \theta} = \Gamma^{\theta}_{\theta r} &= \frac{1}{r} \\ \Gamma^{\theta}_{\phi \phi} &= -\sin \theta \cos \theta \\ \Gamma^{\phi}_{r \phi} = \Gamma^{\phi}_{\phi r} &= \frac{1}{r} \\ \Gamma^{\phi}_{\theta \phi} = \Gamma^{\phi}_{\phi \theta} &= \cot \theta  \quad \text{(V.A.23)}  \end{align*}
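As a check on this algebra, here is a sketch (assuming sympy) that computes Christoffel symbols directly from the Schwarzschild metric used in these notes; its output can be compared term by term with eq. (V.A.23):

```python
import sympy as sp

ct, r, theta, phi, Rs = sp.symbols('ct r theta phi R_s', positive=True)
coords = [ct, r, theta, phi]

# Schwarzschild metric with signature (+,-,-,-), as used in these notes
g = sp.diag(1 - Rs/r, -1/(1 - Rs/r), -r**2, -r**2*sp.sin(theta)**2)
g_inv = g.inv()

def christoffel(sigma, mu, nu):
    """Gamma^sigma_{mu nu} = (1/2) g^{sigma a}(d_nu g_{a mu} + d_mu g_{a nu} - d_a g_{mu nu})."""
    expr = sum(sp.Rational(1, 2) * g_inv[sigma, a] *
               (sp.diff(g[a, mu], coords[nu]) +
                sp.diff(g[a, nu], coords[mu]) -
                sp.diff(g[mu, nu], coords[a]))
               for a in range(4))
    return sp.simplify(expr)

print(christoffel(0, 0, 1))   # Gamma^t_{tr}          = Rs / (2 r (r - Rs))
print(christoffel(1, 0, 0))   # Gamma^r_{tt}          = Rs (r - Rs) / (2 r^3)
print(christoffel(1, 1, 1))   # Gamma^r_{rr}          = -Rs / (2 r (r - Rs))
print(christoffel(1, 2, 2))   # Gamma^r_{theta theta} = Rs - r
```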

Now that we have values for our Christoffel symbols, we can substitute these into our 4 geodesic equations.

We’ll start with the ct equation:

    \begin{align*}    0 &= \frac{d^2 ct}{d\lambda^2} + \Gamma^{t}_{\mu \nu} \frac{dx^{\mu}}{d\lambda} \frac{dx^{\nu}}{d\lambda} \\ 0 &= \frac{d^2 ct}{d\lambda^2} + 2\Gamma^{t}_{tr} \frac{dx^{ct}}{d\lambda} \frac{dx^{r}}{d\lambda} \\ 0 &= \frac{d^2 ct}{d\lambda^2} + \Biggl[ 2\frac12 \frac{R_s}{r(r-R_s)}  \Biggr] \frac{dx^{ct}}{d\lambda} \frac{dx^{r}}{d\lambda} \\ 0 &= \frac{d^2 ct}{d\lambda^2} + \Biggl[ \frac{R_s}{r(r-R_s)}  \Biggr] \frac{dx^{ct}}{d\lambda} \frac{dx^{r}}{d\lambda}   \quad \text{(V.A.24)}  \end{align*}

Recall from our discussion of Lie derivatives that if the Lie derivative of a vector field is zero, then there is spacetime symmetry (i.e., a conserved quantity) along that vector field. It turns out we can define a conserved quantity from the geodesic equation for ct. It is:

    \[ \frac{d(ct)}{d\lambda}\Bigl( 1-\frac{R_s}{r}  \Bigr)  \quad \text{(V.A.25)}  \]

We can show that this is true, as follows:

    \begin{align*}    &\frac{d}{d\lambda}\Biggl( \frac{d(ct)}{d\lambda}\Bigl( 1-\frac{R_s}{r} \Bigr) \Biggr) \\ &= \frac{d^2(ct)}{d^2\lambda}\Bigl( 1-\frac{R_s}{r} \Bigr) + \frac{d(ct)}{d\lambda}\frac{d}{d\lambda}\Bigl( 1-\frac{R_s}{r} \Bigr) \\ &= \frac{d^2(ct)}{d^2\lambda}\Bigl( 1-\frac{R_s}{r} \Bigr) - \frac{d(ct)}{d\lambda}R_s\frac{d}{d\lambda}(r^{-1}) \\ &= \frac{d^2(ct)}{d^2\lambda}\Bigl( 1-\frac{R_s}{r} \Bigr) - \frac{d(ct)}{d\lambda}R_s(-1)(r^{-2}\frac{dr}{d\lambda} \\ &= \frac{d^2(ct)}{d^2\lambda} \Bigl( \frac{r-R_s}{r} \Bigr) + \frac{d(ct)}{d\lambda}\frac{dr}{d\lambda}\frac{R_s}{r^2} = 0 \\ &= \frac{d^2(ct)}{d^2\lambda} \underbrace{\Bigl( \frac{r-R_s}{r} \Bigr)\Bigl( \frac{r}{r-R_s} \Bigr)}_{1} + \frac{d(ct)}{d\lambda}\frac{dr}{d\lambda}\frac{R_s}{r^2}\Bigl( \frac{r}{r-R_s} \Bigr) = 0 \\ &= \frac{d^2(ct)}{d^2\lambda} + \frac{d(ct)}{d\lambda}\frac{dr}{d\lambda}\frac{R_s}{r\cancel{^2}}\Bigl( \frac{\cancel{r}}{r-R_s} \Bigr) = 0 \\ &= \underbrace{\frac{d^2(ct)}{d^2\lambda} + \frac{R_s}{r(r-R_s)}\frac{d(ct)}{d\lambda}\frac{dr}{d\lambda}}_{\text{Geodesic equation for ct: (V.A.24)}} = 0   \quad \text{(V.A.26)} \end{align*}

So we’ve shown that the derivative of \frac{d(ct)}{d\lambda}\Bigl( 1-\frac{R_s}{r} \Bigr) equals zero which means that \frac{d(ct)}{d\lambda}\Bigl( 1-\frac{R_s}{r} \Bigr) is a constant.

We can think of \frac{d(ct)}{d\lambda}\Bigl( 1-\frac{R_s}{r} \Bigr) as energy per unit mass:

    \[ \frac{d(ct)}{d\lambda}\Bigl( 1-\frac{R_s}{r} \Bigr) \equiv \frac{E}{m} \equiv \mathcal{E} \quad \text{(V.A.27)} \]

where \mathcal{E} is the Schwarzschild spacetime “Energy.”

Note that if we evaluate \mathcal{E} at r \rightarrow \infty, we get the energy (P^0) term of the 4-momentum:

    \begin{align*} \lim_{r \rightarrow \infty} \frac{d(ct)}{d\lambda}\Bigl( 1-\frac{R_s}{r} \Bigr) \Rightarrow \frac{E}{m} &= c\frac{dt}{d\tau} \\ \frac{E}{m} &= c\gamma \\ E &= mc\gamma = P^0 \quad \text{(V.A.28)} \end{align*}

This justifies our contention that \mathcal{E} is an energy term.

For the r geodesic equation:

    \begin{align*} 0 &= \frac{d^2 r}{d\lambda^2} + \Gamma^{r}_{\mu \nu} \frac{dx^{\mu}}{d\lambda} \frac{dx^{\nu}}{d\lambda} \\ & \theta = \frac{\pi}{2} \quad \Rightarrow \quad \frac{d\theta}{d\lambda} = 0 \\ &= \frac{d^2 r}{d\lambda^2} + \Gamma^r_{tt}\Bigl( \frac{d(ct)}{d\lambda} \Bigr)^2 + \Gamma^r_{rr}\Bigl(\frac{dr}{d\lambda} \Bigr)^2 + \Gamma^r_{\theta \theta}\Bigl( \underbrace{\cancel{\frac{d\theta}{d\lambda}}}_{0} \Bigr)^2 + \Gamma^r_{\phi \phi}\Bigl( \frac{d\phi}{d\lambda} \Bigr)^2 \\ &\text{(V.A.29)} \end{align*}

We previously listed values for nonzero Christoffel symbols in eq. (V.A.23):

    \begin{align*}   \Gamma^{t}_{tr} = \Gamma^{t}_{rt} &= \frac12  \frac{R_s}{r(r-R_s)} \\ \Gamma^{r}_{tt} &= \frac12 \frac{R_s(r-R_s)}{r^3} \\    \Gamma^{r}_{rr} &= \frac12 \frac{R_s}{r(R_s-r)}\\ \Gamma^{r}_{\theta \theta} &= R_s-r \\ \Gamma^{r}_{\phi \phi} &= (R_s-r )\sin^2\theta \\ \Gamma^{\theta}_{r \theta} = \Gamma^{\theta}_{\theta r} &= \frac{1}{r} \\ \Gamma^{\theta}_{\phi \phi} &= -\sin \theta \cos \theta \\ \Gamma^{\phi}_{r \phi} = \Gamma^{\phi}_{\phi r} &= \frac{1}{r} \\ \Gamma^{\phi}_{\theta \phi} = \Gamma^{\phi}_{\phi \theta} &= \cot \theta   \end{align*}

We use these to insert the appropriate values into eq. (V.A.29):

    \begin{align*}    0 &= \frac{d^2 r}{d\lambda^2} + \frac12\frac{R_s(r-R_s)}{r^3}\Bigl(   \frac{d(ct)}{d\lambda} \Bigr)^2 + \frac12\frac{R_s}{r(R_s-r)}\Bigl(  \frac{dr}{d\lambda}\Bigr)^2 + (R_s - r)\underbrace{\sin^2 \theta}_{1} \Bigl( \frac{d\phi}{d\lambda} \Bigr)^2 \\ 0 &= \frac{d^2 r}{d\lambda^2} + \frac12\frac{R_s(r-R_s)}{r^3}\Bigl(   \frac{d(ct)}{d\lambda} \Bigr)^2 + \frac12\frac{R_s}{r(R_s-r)}\Bigl(  \frac{dr}{d\lambda}\Bigr)^2 + (R_s - r) \Bigl( \frac{d\phi}{d\lambda} \Bigr)^2 \quad \text{(V.A.30)} \end{align*}

It turns out that we won’t need to use this complicated equation.

For the \theta geodesic equation, there are 3 nonzero Christoffel symbols. In addition, we know that:

    \[ \theta = \frac{\pi}{2} \quad \Rightarrow \quad \frac{d\theta}{d\lambda} = \frac{d^2\theta}{d\lambda^2} = 0  \]

Using these facts, the \theta geodesic equation becomes:

    \begin{align*}    0 &= \frac{d^2 \theta}{d\lambda^2} + \Gamma^{\theta}_{\mu \nu} \frac{dx^{\mu}}{d\lambda} \frac{dx^{\nu}}{d\lambda} \\ &= \frac{d^2 \theta}{d\lambda^2} + 2\Gamma^{\theta}_{r \theta} \frac{dr}{d\lambda} \frac{d\theta}{d\lambda} + \Gamma^{\theta}_{\phi \phi} \Bigl( \frac{d\phi}{d\lambda} \Bigr)^2\\ &= \underbrace{\frac{d^2 \theta}{d\lambda^2}}_{0} + \frac{2}{r}\frac{dr}{d\lambda} \underbrace{\frac{d\theta}{d\lambda}}_{0} - \sin\theta \cos \theta \Bigl( \frac{d\phi}{d\lambda} \Bigr)^2 \\ &= 0 + 0 -\sin \frac{\pi}{2} \underbrace{\cos \frac{\pi}{2}}_{0} \Bigl( \frac{d\phi}{d\lambda} \Bigr)^2 \\ 0 &= 0 \quad \text{(V.A.31)} \end{align*}

Eq. (V.A.31) shows that setting \displaystyle \theta = \frac{\pi}{2} does not violate our geodesic equations.

For our \phi geodesic equation, we have:

    \begin{align*}  0 &= \frac{d^2 \phi}{d\lambda^2} + \Gamma^{\phi}_{\mu \nu} \frac{dx^{\mu}}{d\lambda} \frac{dx^{\nu}}{d\lambda} \\ &= \frac{d^2 \phi}{d\lambda^2} + 2\Gamma^{\phi}_{r \phi} \frac{dr}{d\lambda} \frac{d\phi}{d\lambda} + 2\Gamma^{\phi}_{\theta \phi} \frac{d\theta}{d\lambda} \frac{d\phi}{d\lambda} \\ &\Gamma^{\phi}_{r \phi} = \Gamma^{\phi}_{\phi r} = \frac{1}{r} \text{ and } \Gamma^{\phi}_{\theta \phi} = \Gamma^{\phi}_{\phi \theta} = \cot \theta \\ &= \frac{d^2 \phi}{d\lambda^2} + 2\frac{1}{r}\frac{dr}{d\lambda} \frac{d\phi}{d\lambda} + 2\cot \theta \underbrace{\frac{d\theta}{d\lambda}}_{0} \frac{d\phi}{d\lambda} \\ &= \frac{d^2 \phi}{d\lambda^2} + 2\frac{1}{r}\frac{dr}{d\lambda} \frac{d\phi}{d\lambda} \quad \text{(V.A.32)} \end{align*}

Like the ct geodesic equation, the \phi geodesic equation is associated with a conserved quantity: \displaystyle r^2 \frac{d\phi}{d\lambda}. Here is the proof:

    \begin{align*}   &\frac{d}{d\lambda}\Bigl( r^2 \frac{d\phi}{d\lambda} \Bigr) \\ &= \frac{d}{d\lambda}(r^2)\frac{d\phi}{d\lambda} + r^2\frac{d^2\phi}{d\lambda^2} \\ &= 2r\frac{dr}{d\lambda}\frac{d\phi}{d\lambda} + r^2\frac{d^2\phi}{d\lambda^2} \\ &= \Bigl[2r\frac{dr}{d\lambda}\frac{d\phi}{d\lambda} + r^2\frac{d^2\phi}{d\lambda^2}\Bigr]\frac{1}{r^2} \\ &= \underbrace{2\frac{1}{r}\frac{dr}{d\lambda} \frac{d\phi}{d\lambda} + \frac{d^2 \phi}{d\lambda^2}}_{\phi \text{ geodesic equation}} \\ &= 0 \quad \text{(V.A.33)} \end{align*}

The fact that the derivative of r^2 \frac{d\phi}{d\lambda} is zero means that r^2 \frac{d\phi}{d\lambda} is a constant. The Newtonian angular momentum is given by:

    \[ L =  mr^2\frac{d\phi}{d\tau} \quad \text{(V.A.34)} \]

Since we can call our parameter whatever we want, let’s call it \lambda. Then:

    \[ L =  mr^2\frac{d\phi}{d\lambda} \quad \text{(V.A.35)} \]

which means:

    \[  r^2\frac{d\phi}{d\lambda} \equiv \frac{L}{m}  \equiv \mathcal{L} \quad \text{(V.A.36)} \]

That is, our conserved quantity represents angular momentum per unit mass. As seen in eq. (V.A.36), we’ll denote this quantity \mathcal{L}.

So, from our geodesic equations, we’ve derived 2 constants of motion:

\displaystyle  \mathcal{E} \equiv \frac{d(ct)}{d\lambda}\Bigl( 1-\frac{R_s}{r} \Bigr)

and

\displaystyle \mathcal{L} \equiv r^2 \frac{d\phi}{d\lambda}

We need one more piece to come up with an equation that describes the orbit of objects in general relativity: an expression for the length of the tangent vector to our object’s orbit.

We know the squared length of a vector is given by:

    \begin{align*}  \Bigl\| \frac{d}{d\lambda} \Bigr\|^2 &= \frac{d}{d\lambda} \cdot \frac{d}{d\lambda} \\  &= \frac{dx^{\mu}}{d\lambda} \frac{dx^{\nu}}{d\lambda}g_{\mu \nu} \\  &= \Bigl( \frac{d\,ct}{d\lambda} \Bigr)^2 g_{tt} +   \Bigl( \frac{dr}{d\lambda} \Bigr)^2 g_{rr} +   \Bigl( \underbrace{\frac{d\theta}{d\lambda} }_{0}\Bigr)^2 g_{\theta \theta} +   \Bigl( \frac{d\phi}{d\lambda} \Bigr)^2 g_{\phi \phi} \\  &= \Bigl( \frac{d\,ct}{d\lambda} \Bigr)^2 \Bigl( 1-\frac{R_s}{r} \Bigr) -   \Bigl( \frac{dr}{d\lambda} \Bigr)^2 \Bigl( 1-\frac{R_s}{r} \Bigr)^{-1} -   \Bigl( \frac{d\phi}{d\lambda} \Bigr)^2 r^2 \underbrace{\sin^2 \theta}_{1} \\ &= \Bigl( \frac{d\,ct}{d\lambda} \Bigr)^2 \Bigl( 1-\frac{R_s}{r} \Bigr) -   \Bigl( \frac{dr}{d\lambda} \Bigr)^2 \Bigl( 1-\frac{R_s}{r} \Bigr)^{-1} -   \Bigl( \frac{d\phi}{d\lambda} \Bigr)^2 r^2 \quad \text{(V.A.37)}  \end{align*}

Recall that:

    \begin{align*} \mathcal{E} &\equiv \frac{d(ct)}{d\lambda}\Bigl(1-\frac{R_s}{r} \Bigr) \\  \mathcal{E} &\equiv \frac{d(ct)}{d\lambda} \Bigl(\frac{r-R_s}{r} \Bigr) \\  \mathcal{E} \Bigl(\frac{r}{r-R_s} \Bigr) &= \frac{d(ct)}{d\lambda} \quad \text{(V.A.38)}  \end{align*}

    \begin{align*} \mathcal{L} &\equiv r^2\frac{d\phi}{d\lambda}  \\ \frac{\mathcal{L}}{r^2}&= \frac{d\phi}{d\lambda}  \quad \text{(V.A.39)}  \end{align*}

Placing the results of eq. (V.A.38) and eq. (V.A.39) into eq. (V.A.37) gives us:

    \begin{align*}  \epsilon &=   \mathcal{E}^2 \Bigl(\frac{r}{r-R_s} \Bigr)^2 \Bigl( 1-\frac{R_s}{r} \Bigr) - \Bigl( \frac{dr}{d\lambda} \Bigr)^2 \Bigl( 1-\frac{R_s}{r} \Bigr)^{-1} - \Bigl( \frac{\mathcal{L}}{r^2} \Bigr)^2 r^2  \\  \epsilon &=   \mathcal{E}^2 \Bigl(\frac{r}{r-R_s} \Bigr)^2 \Bigl( \frac{r-R_s}{r} \Bigr) - \Bigl( \frac{dr}{d\lambda} \Bigr)^2 \Bigl( 1-\frac{R_s}{r} \Bigr)^{-1} - \Bigl( \frac{\mathcal{L}}{r^2} \Bigr)^2 r^2  \\ 0 &=   \mathcal{E}^2 \Bigl(\frac{r}{r-R_s} \Bigr) - \Bigl( \frac{dr}{d\lambda} \Bigr)^2 \Bigl( 1-\frac{R_s}{r} \Bigr)^{-1} - \frac{\mathcal{L^2}}{r^2} - \epsilon  \\ \Bigl(\frac{r-R_s}{r} \Bigr) \cdot 0 &=   \Biggl[\mathcal{E}^2 \Bigl(\frac{r}{r-R_s} \Bigr) - \Bigl( \frac{dr}{d\lambda} \Bigr)^2 \Bigl( 1-\frac{R_s}{r} \Bigr)^{-1} - \frac{\mathcal{L}^2}{r^2} - \epsilon \Biggr] \cdot \Bigl(\frac{r-R_s}{r} \Bigr) \\ 0 &= \mathcal{E}^2 - \Biggl(\frac{dr}{d\lambda} \Biggr)^2 - \Biggl(\frac{r-R_s}{r} \Biggr)\Biggl( \frac{\mathcal{L}^2}{r^2} + \epsilon \Biggr) \\ 0 &= \mathcal{E}^2 - \Biggl(\frac{dr}{d\lambda} \Biggr)^2 - \Biggl( 1-\frac{R_s}{r} \Biggr)\Biggl( \frac{\mathcal{L}^2}{r^2} + \epsilon \Biggr) \\ \mathcal{E}^2 &= \Biggl(\frac{dr}{d\lambda} \Biggr)^2 + \Biggl( 1-\frac{R_s}{r} \Biggr)\Biggl( \frac{\mathcal{L}^2}{r^2} + \epsilon \Biggr) \\ \mathcal{E}^2 &= \Biggl(\frac{dr}{d\lambda} \Biggr)^2 + \frac{\mathcal{L}^2}{r^2} - \frac{R_s \mathcal{L}^2}{r^3} + \epsilon - \epsilon \frac{R_s}{r} \\ \mathcal{E}^2 &= \underbrace{\Biggl(\frac{dr}{d\lambda} \Biggr)^2}_{\text{K.E.}} + \underbrace{\epsilon - \epsilon \frac{R_s}{r}  + \frac{\mathcal{L}^2}{r^2} - \frac{R_s \mathcal{L}^2}{r^3}}_{V_{\text{eff}}(r)}  \\ \quad \text{(V.A.40)}   \end{align*}

For light-like geodesics that govern how light beams move, \epsilon = 0. For time-like geodesics that govern the motion of massive bodies, \epsilon > 0.

We’ll take the case of massive bodies (\epsilon > 0). For this case, we’ll use \lambda = \tau and note that:

    \[  \displaystyle R_s = \frac{2GM}{c^2}\]

and

    \[ \epsilon = \frac{d}{d\lambda} \cdot  \frac{d}{d\lambda} = \frac{d}{d\tau} \cdot  \frac{d}{d\tau} = \vec{U} \cdot \vec{U} = c^2\]

Substituting these values into eq. (V.A.40), we have:

    \begin{align*}    \mathcal{E}^2 &= \Biggl(\frac{dr}{d\lambda} \Biggr)^2 + \epsilon - \epsilon \frac{R_s}{r}  + \frac{\mathcal{L}^2}{r^2} - \frac{R_s \mathcal{L}^2}{r^3} \\  \mathcal{E}^2 &= \Biggl(\frac{dr}{d\lambda} \Biggr)^2 + c^2 - \cancel{c^2} \frac{2GM}{\cancel{c^2}}\frac{1}{r}  + \frac{\mathcal{L}^2}{r^2} - \frac{2GM}{c^2}\frac{\mathcal{L}^2}{r^3} \\  \frac12 \cdot \mathcal{E}^2 &= \Biggl[\Biggl(\frac{dr}{d\lambda} \Biggr)^2 + c^2 -  \frac{2GM}{r}  + \frac{\mathcal{L}^2}{r^2} - \frac{2GM}{c^2}\frac{\mathcal{L}^2}{r^3}\Biggr] \cdot \frac12 \\  \frac12 \mathcal{E}^2 &= \frac12\Biggl(\frac{dr}{d\lambda} \Biggr)^2 + \frac12 c^2 - \frac{GM}{r} + \frac{\mathcal{L}^2}{2r^2} - \frac{GM}{c^2}\frac{\mathcal{L}^2}{r^3} \\  & \text{Substituting } \mathcal{L} = \frac{L}{m} \text{ and } \mathcal{E} = \frac{E}{m} \text{ gives us:} \\  \Biggl[\frac12 \Bigl(\frac{E}{m}\Bigr)^2 &= \frac12\Biggl(\frac{dr}{d\lambda} \Biggr)^2 + \frac12 c^2 - \frac{GM}{r} + \frac{L^2}{2m^2r^2} - \frac{GM}{c^2}\frac{L^2}{m^2r^3}\Biggr] \cdot m \\  \frac12 \Bigl(\frac{E}{m}\Bigr)^2 m &= \frac12 m \Biggl(\frac{dr}{d\lambda} \Biggr)^2 + \frac12 mc^2 + m\Biggl(-\frac{GM}{r} + \frac{L^2}{2m^2r^2} - \frac{GM}{c^2}\frac{L^2}{m^2r^3}\Biggr) \\  \frac12 \Biggl(\frac{E^2}{m} - mc^2\Biggr)  &= \frac12 m \Biggl(\frac{dr}{d\lambda} \Biggr)^2 + m\Biggl(-\frac{GM}{r} + \frac{L^2}{2m^2r^2} - \frac{GM}{c^2}\frac{L^2}{m^2r^3}\Biggr) \\  \quad \text{(V.A.41)}   \end{align*}

We can also arrive at this equation by using the action principle. To see this, click here.

To summarize:

Newtonian Orbits:

    \[ \underbrace{E}_{E_{\text{tot}}} = \underbrace{\frac12  m\Bigl( \frac{dr}{dt} \Bigr) ^2}_{\text{K.E.}}+ \underbrace{m\Bigl(  -\frac{GM}{r} + \frac{L^2}{2m^2r^2} \Bigr) }_{V_{\text{eff}}(r)} \quad \text{(V.A.14)}    \]

General Relativity Orbits:

    \[ \underbrace{\frac12\Bigl(\frac{E^2}{m} - mc^2 \Bigr)}_{E_\text{tot}} = \underbrace{\frac12  m\Bigl( \frac{dr}{d\tau} \Bigr)^2}_{\text{K.E.}} + \underbrace{m\Bigl(  -\frac{GM}{r} + \frac{L^2}{2m^2r^2} -  \frac{GM}{c^2}\frac{L^2}{m^2r^3} \Bigr)}_{V_{\text{eff}}(r)}  \quad \text{(V.A.41)}    \]
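To see what the extra 1/r^3 term does, here is a sketch (assuming numpy and matplotlib; G, M, c and the angular momentum per unit mass are set to illustrative values in geometric units) that plots the Newtonian and general-relativistic effective potentials side by side:

```python
import numpy as np
import matplotlib.pyplot as plt

G = M = c = 1.0          # geometric units, so R_s = 2GM/c^2 = 2 (illustrative)
Lm = 4.0                 # angular momentum per unit mass, script-L (illustrative)

r = np.linspace(2.2, 60, 1000)

V_newton = -G * M / r + Lm**2 / (2 * r**2)           # Newtonian V_eff per unit mass
V_gr = V_newton - (G * M / c**2) * Lm**2 / r**3      # extra -GM L^2/(c^2 m^2 r^3) term

plt.plot(r, V_newton, label="Newtonian $V_{eff}$")
plt.plot(r, V_gr, label="General relativity $V_{eff}$")
plt.xlabel("r")
plt.ylabel("$V_{eff}(r)$ per unit mass")
plt.legend()
plt.show()
```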

V.A.3 Derivations

Our task, ultimately, is to derive the equations of motion of the planet Mercury around the sun. At the time of Einstein’s original paper on the subject, Schwarzschild had not yet come up with his solution to Einstein’s field equation. Therefore, Einstein estimated the advance of the perihelion of Mercury using perturbation theory. I, personally, find Einstein’s original paper difficult to follow. In addition, once the Schwarzschild metric became known, it afforded more accurate calculation of Mercury’s perihelion advance. For these reasons, I won’t discuss Einstein’s solution, but rather, will present a well-known solution that uses the equations we’ve already derived. I’ll also present a second derivation that I believe is easier to follow. Because the derivations are mathematically tedious, I’ll just provide the result in the main text, then plug in some numbers to show that general relativity, indeed, accounts for the unexplained portion of Mercury’s perihelion precession. For those interested in actually seeing the derivations, click here.

It turns out that the equation that tells us the number of the radians in excess of 2\pi that Mercury traverses in each orbit around the sun is:

    \[  \delta = \frac{6\pi G M}{c^2 a\left(1-e^2\right)}  \quad \text{(V.A.42)}   \]

where

\delta is the number of radians/period Mercury travels in its orbit around the sun in excess of the expected 2\pi radians
G is Newton’s gravitational constant
M is the mass of the sun
c is the speed of light in a vacuum
a is the length of the semi-major axis of Mercury’s elliptical orbit (see figure 1)
e = \displaystyle \frac{c}{a} is the eccentricity of the ellipse, where c here denotes the distance from the ellipse’s center to a focus, not the speed of light (see figure 1)

Elliptical orbit of Mercury around the sun
Figure 1

(I use this equation – the result of the second (“easier”) derivation – because the calculations needed to arrive at the advance of the perihelion of Mercury are more intuitive than if the result of the more “difficult” derivation were employed.)

V.A.4 Calculations

We plug the following values into eq. (V.A.42) to arrive at a quantitative answer for the advance of the perihelion of Mercury.

G=6.6742 \times 10^{-11}\,\mathrm{N\,(m/kg)^2}
c=299{,}792{,}458\,\mathrm{m/s}
M_{\text{sun}}=1.98892 \times 10^{30}\,\mathrm{kg}
a_{\text{Mercury}}=57.9091 \times 10^{9}\,\mathrm{m}
e_{\text{Mercury}}=0.20563

    \begin{align*} \delta&=\frac{6 \pi\left(6.6742 \times 10^{-11}\,\mathrm{N\,(m/kg)^2}\right)\left(1.98892 \times 10^{30}\,\mathrm{kg}\right)}{(299{,}792{,}458\,\mathrm{m/s})^{2}\left(57.9091 \times 10^{9}\,\mathrm{m}\right)\left(1-0.20563^{2}\right)} \\ \delta&=5.01987 \times 10^{-7}\, \frac{\mathrm{kg} \cdot \frac{\mathrm{m}}{\mathrm{s}^{2}} \cdot \frac{\mathrm{m}^{2}}{\mathrm{kg}^{2}} \cdot \mathrm{kg}}{\frac{\mathrm{m}^{2}}{\mathrm{s}^{2}} \cdot \mathrm{m}}=5.01987 \times 10^{-7}\, \frac{\text{rad}}{\text{period}} \end{align*}

Revolutions per century

    \begin{align*} N&= \frac{\text{period}}{87.9391 \text{ days}} \cdot \frac{365.242199\text{ days}}{\text{year}} \cdot \frac{100\text{ years}}{\text{century}}  = 415.3354 \frac{\text{periods}}{\text{century}}\\ \Delta \phi&=415.3354 \frac{\text{periods}}{\text{century}} \cdot 5.01987 \times 10^{-7} \frac{\text {rad}}{\text{period}}=2.08493 \times 10^{-4} \frac{\mathrm{rad}}{\mathrm{century}} \\ \Delta \phi&=2.08493 \times 10^{-4} \frac{\mathrm{rad}}{\mathrm{century}} \cdot \frac{180^{\circ}}{\pi \mathrm{rad}} \cdot \frac{3600^{\prime \prime}\text{ arcsec}}{1^{\circ}} \\ \Delta \phi &= 43 \frac{\text{arcsec}}{\text{century}} \end{align*}
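Here is a short sketch (plain Python, standard library only) that reproduces the arithmetic above, using the same input values:

```python
import math

G = 6.6742e-11            # N m^2 / kg^2
c = 299_792_458.0         # m/s
M_sun = 1.98892e30        # kg
a = 57.9091e9             # semi-major axis of Mercury's orbit, m
e = 0.20563               # eccentricity of Mercury's orbit
T_days = 87.9391          # orbital period in days, as used above

delta = 6 * math.pi * G * M_sun / (c**2 * a * (1 - e**2))   # rad per orbit, eq. (V.A.42)
orbits_per_century = 100 * 365.242199 / T_days
arcsec = delta * orbits_per_century * (180 / math.pi) * 3600

print(f"{delta:.5e} rad/orbit")        # ~5.02e-7
print(f"{arcsec:.1f} arcsec/century")  # ~43.0
```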

V.B Bending of Light

Coming eventually

V.C Gravity Waves

V.C.1 Introduction

Our goal in this section is to show that, when we solve Einstein’s field equation in the case of linearized gravity – in specific coordinate systems – we end up with wave equations. These wave equations represent gravity waves.

We know the general form of the wave equation is:

    \[ v^2 \frac{\partial^2 A}{\partial x^2 } =  \frac{\partial^2 A}{\partial t^2 } \]

or

    \[ \frac{\partial^2 A}{\partial x^2 } = \frac{1}{v^2 } \frac{\partial^2 A}{\partial t^2 } \]

where

A is the wave’s amplitude
x is a spatial coordinate
t is time
v is the wave’s velocity
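As a quick symbolic check that any profile traveling at speed v satisfies this equation, here is a sketch (assuming sympy) using an arbitrary twice-differentiable function f(x − vt):

```python
import sympy as sp

x, t, v = sp.symbols('x t v')
f = sp.Function('f')

A = f(x - v * t)   # any (twice-differentiable) profile translating at speed v

lhs = v**2 * sp.diff(A, x, 2)   # v^2 * d^2A/dx^2
rhs = sp.diff(A, t, 2)          # d^2A/dt^2

print(sp.simplify(lhs - rhs))   # 0, so A satisfies the wave equation
```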

The classic example of an entity that obeys the wave equation is electromagnetic radiation. The equations of electromagnetism can be written in several ways:

    \[ \frac{1}{c^2} \frac{\partial^2 E}{\partial t^2} = \frac{\partial^2 E}{\partial x^2} +  \frac{\partial^2 E}{\partial y^2} +  \frac{\partial^2 E}{\partial z^2}\quad \text{(V.C.1.1)}\]

    \[  \frac{\partial^2 E}{\partial (ct)^2} - \frac{\partial^2 E}{\partial x^2} -  \frac{\partial^2 E}{\partial y^2} -  \frac{\partial^2 E}{\partial z^2} = 0 \quad \text{(V.C.1.2)}\]

    \[  \frac{\partial^2 E}{\partial (x^0)^2} - \frac{\partial^2 E}{\partial (x^1)^2} -  \frac{\partial^2 E}{\partial (x^2)^2} -  \frac{\partial^2 E}{\partial (x^3)^2} = 0 \quad \text{(V.C.1.3)}\]

    \[ \partial_0^2E - \partial_1^2E  - \partial_2^2E  - \partial_3^2E = 0   \quad \text{(V.C.1.4)}  \]

    \[ \eta^{\mu \nu} \partial_{\mu}   \partial_{\nu} E = 0 \quad \text{(V.C.1.5)}\]

    \[  \partial_{\mu}   \partial^{\mu} E = 0  \quad \text{(V.C.1.6)}\]

    \[\square E = 0 \quad \text{where } \square \text{ is the d'Alembert operator}  \quad \text{(V.C.1.7)}\]

Gravity waves, on the other hand, are periodic perturbations of the metric of spacetime. The equations that describe them take the following form:

    \begin{align*} & \frac{\partial^2 g_{\mu \nu}}{\partial(c t)^2}-\frac{\partial^2 g_{\mu \nu}}{\partial x^2}-\frac{\partial^2 g_{\mu \nu}}{\partial y^2}-\frac{\partial^2 g_{\mu \nu}}{\partial z^2}=0 \\ & \partial_0^2 g_{\mu \nu}-\partial_1^2 g_{\mu \nu}-\partial_2^2 g_{\mu \nu}-\partial_3^2 g_{\mu \nu}=0 \\ & \eta^{\rho \sigma} \partial_\rho \partial_\sigma g_{\mu \nu}=\partial^\sigma \partial_\sigma g_{\mu \nu}=\square g_{\mu \nu}=0\quad \text{(V.C.1.8)} \end{align*}

The metric can be written:

    \[ g_{\mu \nu} = \eta_{\mu \nu} + h_{\mu \nu} \quad \text{(V.C.1.9)}\]

where \|h_ {\mu \nu}\| \ll 1. It’s actually small perturbations in the h_ {\mu \nu} term that give us gravity waves. Thus, the equations we derive will actually be of the form of eq. (V.C.1.8) but with g_ {\mu \nu} replaced by h_ {\mu \nu}.

V.C.2 Linearized Gravity

V.C.2.a Preliminaries

In the case where gravity is weak, we can approximate the metric by the equation:

    \[  g_{\mu\nu} = \eta _{\mu\nu} + h_{\mu\nu}, \quad  \|h_{\mu\nu}\| \ll 1 \quad \text{(1)}\]

Small perturbations of the h_{\mu\nu} term are responsible for gravity waves.

Recall Einstein’s field equation:

    \[  G_{\mu\nu} \equiv R_{\mu\nu} - \frac12 R  g_{\mu\nu} = \frac{8\pi G}{c^4}T_{\mu\nu} \quad \text{(2)} \]

Our goal, in this section, is to find the G_{\mu\nu} term in Einstein’s field equation under conditions of linearized gravity. To accomplish this, we’ll have to calculate:

  • Inverse metric g^{\mu\nu}
  • Connection Coefficients \Gamma^{\sigma}_{\mu\nu}
  • Riemann Tensor R^p_{\sigma \mu \nu}
  • Ricci Tensor R_{\mu\nu}
  • Ricci Scalar R

Just as a reminder, in linearized gravity:

  • T^{\mu \alpha}(\eta_{\alpha \nu} + h_{\alpha \nu} ) \approx T^{\mu \alpha}\eta_{\alpha \nu} + 0 \quad \text{(3)}
  • \partial_{\sigma}(\eta_{\mu\nu} + h_{\mu\nu}) \approx 0 +  \partial_{\sigma}h_{\mu\nu}  \quad \text{(4)}
  • h_{\alpha \beta} h_{\mu \nu} \approx 0  \quad \text{(5)}
  • \|  \partial_{\sigma} h_{\mu \nu} \| \ll 1   \quad \text{(6)}
  • (\partial_{\rho}h_{\alpha \beta})(\partial_{\sigma}h_{\mu \nu}) \approx 0  \quad \text{(7)}
  • h_{\alpha \beta}(\partial_{\sigma}h_{\mu \nu})\approx 0 \quad \text{(8)}

The inverse metric is given by:

    \[ g_{\mu \sigma} g^{\sigma \nu} = \delta^{\nu}_{\mu} \quad \text{(9)} \]

and

    \begin{align*} g^{\mu \nu} &= \eta^{\mu \nu} + k^{\mu \nu} \\ &= \eta^{\mu \nu} - h^{\mu \nu};\quad h^{\mu \nu} = h_{\rho \sigma} \eta^{\rho \mu} \eta^{\sigma \nu} \quad \text{(10)} \end{align*}

To see how we get this, click here.

    \begin{align*} g_{\mu \sigma} g^{\sigma \nu} &= (\eta _{\mu\sigma} + h_{\mu\sigma})(\eta^{\sigma \nu} + k^{\sigma \nu}) \\ \delta^{\nu}_{\mu} &= \eta _{\mu\sigma} \eta^{\sigma \nu} + h_{\mu\sigma} \eta^{\sigma \nu} + \eta _{\mu\sigma} k^{\sigma \nu} + h_{\mu\sigma}k^{\sigma \nu} \\  \delta^{\nu}_{\mu} &= \delta^{\nu}_{\mu} + h_{\mu\sigma} \eta^{\sigma \nu} + \eta _{\mu\sigma} k^{\sigma \nu} + h_{\mu\sigma}k^{\sigma \nu} \\  0 &=  h_{\mu\sigma} \eta^{\sigma \nu} + \eta _{\mu\sigma} k^{\sigma \nu} + \underbrace{h_{\mu\sigma}k^{\sigma \nu}}_{\approx 0} \\ 0 &= h_{\mu\sigma} \eta^{\sigma \nu} + \eta _{\mu\sigma} k^{\sigma \nu} \\ k^{\sigma \nu} \eta _{\mu\sigma} &= - h_{\mu\sigma} \eta^{\sigma \nu}  \\ k^{\sigma \nu} \eta _{\mu\sigma}  \eta^{\rho \mu} &= - h_{\mu\sigma} \eta^{\sigma \nu} \eta^{\rho \mu} \\ k^{\sigma \nu} \eta _{\mu\sigma}  \eta^{\rho \mu} &= - h_{\mu\sigma} \eta^{\sigma \nu} \eta^{\rho \mu} \\ k^{\sigma \nu} \delta^{\rho}_{\sigma} &= - h_{\mu\sigma} \eta^{\sigma \nu} \eta^{\rho \mu} \\ k^{\rho \nu} &= - h_{\mu\sigma} \eta^{\sigma \nu} \eta^{\rho \mu} \\ k^{\rho \nu} &= - h^{\rho\nu} \\ k^{\mu \nu} &= - h^{\mu\nu} \\ g^{\mu \nu} &= \eta^{\mu \nu} + k^{\mu \nu} \\ g^{\mu \nu} &= \eta^{\mu \nu} - h^{\mu\nu} \\ \end{align*}

Normally, we would raise and lower indices with g^{\mu\nu} and g_{\mu\nu}, respectively. However, since h is small, we will henceforth raise and lower indices with the Minkowski metric. To see why, click here.

    \begin{align*}h^{\mu}_{\nu} &= h_{\sigma \nu} g^{\mu \sigma} \\ &= h_{\sigma \nu} (\eta^{\mu \sigma} - h^{\mu \sigma}) \\ &= h_{\sigma \nu} \eta^{\mu \sigma} - \underbrace{h_{\sigma \nu}h^{\mu \sigma}}_{\approx 0} \\ &= h_{\sigma \nu} \eta^{\mu \sigma} \end{align*}
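Here is a quick numerical sanity check of eq. (10) (a sketch assuming numpy): build a random symmetric h with small entries, form g = \eta + h, and compare the exact matrix inverse with \eta - h; the difference is second order in h.

```python
import numpy as np

rng = np.random.default_rng(0)
eta = np.diag([1.0, -1.0, -1.0, -1.0])             # Minkowski metric

eps = 1e-4
h = eps * rng.standard_normal((4, 4))
h = (h + h.T) / 2                                  # h_{mu nu} is symmetric, |h| << 1

g = eta + h                                        # g_{mu nu} = eta_{mu nu} + h_{mu nu}
h_upper = eta @ h @ eta                            # h^{mu nu} = eta^{mu rho} eta^{nu sigma} h_{rho sigma}

exact_inverse = np.linalg.inv(g)
linearized_inverse = eta - h_upper                 # eq. (10): g^{mu nu} ~ eta^{mu nu} - h^{mu nu}

print(np.max(np.abs(exact_inverse - linearized_inverse)))   # ~1e-8, i.e., O(h^2)
```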

V.C.2.b Connection Coefficients

The general formula for a connection coefficient (or Christoffel symbol) is:

    \[ \Gamma^{\sigma}_{\mu \nu} = \frac 12g^{\sigma \alpha} (\partial_{\nu} g_{\alpha \mu}  + \partial_{\mu} g_{\alpha \nu}  - \partial_{\alpha} g_{\mu \nu}) \quad \text{(11)} \]

However, in linearized (weak) gravity:

    \[  \partial_{\rho} g_{\mu \nu} = \partial_{\rho} (\eta_{\mu \nu} + h_{\mu \nu}) = 0 + \partial_{\rho} h_{\mu \nu} \quad \text{(12)} \]

So we can replace partial derivatives of g with partial derivatives of h. Thus:

    \begin{align*}\Gamma^{\sigma}_{\mu \nu} &= \frac 12g^{\sigma \alpha} (\partial_{\nu} h_{\alpha \mu}  + \partial_{\mu} h_{\alpha \nu}  - \partial_{\alpha} h_{\mu \nu}) \\ \\ &= \frac 12(\eta^{\sigma \alpha} - h^{\sigma \alpha}) (\partial_{\nu} h_{\alpha \mu}  + \partial_{\mu} h_{\alpha \nu}  - \partial_{\alpha} h_{\mu \nu}) \\ \\ &= \frac12 \eta^{\sigma \alpha}(\partial_{\nu} h_{\alpha \mu}  + \partial_{\mu} h_{\alpha \nu}  - \partial_{\alpha} h_{\mu \nu}) \\ &-\frac12 \underbrace{h^{\sigma \alpha}(\partial_{\nu} h_{\alpha \mu}  + \partial_{\mu} h_{\alpha \nu}  - \partial_{\alpha} h_{\mu \nu})}_{\approx 0} \\ &= \frac12 \eta^{\sigma \alpha}(\partial_{\nu} h_{\alpha \mu}  + \partial_{\mu} h_{\alpha \nu}  - \partial_{\alpha} h_{\mu \nu})  \quad \text{(13)} \end{align*}

V.C.2.c Riemann Tensor

Recall from eq. (2.6.38), the Riemann tensor in terms of the connection coefficients is:

    \[ R^{\rho}_{\sigma \mu \nu} = \partial_{\mu} \Gamma^{\rho}_{ \nu \sigma}  - \partial_{\nu} \Gamma^{\rho}_{ \mu \sigma}  +  \Gamma^{\gamma}_{ \nu \sigma}  \Gamma^{\rho}_{ \mu \gamma} -  \Gamma^{\delta}_{ \mu \sigma}  \Gamma^{\rho}_{ \nu \delta} \quad \text{(14)} \]

The connection coefficients in linearized gravity are:

    \[ \Gamma_{\nu \sigma}^\gamma=\frac{1}{2} \eta^{\gamma \alpha}\left(\partial_\sigma h_{\alpha \nu}+\partial_\nu h_{\alpha \sigma}-\partial_\alpha h_{\nu \sigma}\right) \]

    \[ \Gamma_{\mu \gamma}^\rho=\frac{1}{2} \eta^{\rho \alpha}\left(\partial_\gamma h_{\alpha \mu}+\partial_\mu h_{\alpha \gamma}-\partial_\alpha h_{\mu \gamma}\right) \]

The \Gamma \Gamma terms yield results like this:

    \[ \Gamma_{\nu \sigma}^\gamma \Gamma_{\mu \gamma}^\rho=\frac{1}{4} \eta^{\gamma \alpha} \eta^{\rho \beta}\left(\partial_\sigma h_{\alpha \nu}+\partial_\nu h_{\alpha \sigma}-\partial_\alpha h_{\nu \sigma}\right)\left(\partial_\gamma h_{\beta \mu}+\partial_\mu h_{\beta \gamma}-\partial_\beta h_{\mu \gamma}\right) \]

Every term in this product contains two factors of \partial h and is therefore negligible. Thus, the \Gamma \Gamma terms in the expression for the Riemann tensor are zero. Eq. (14), then, becomes:

    \[ R^{\rho}_{\sigma \mu \nu} \approx \partial_{\mu} \Gamma^{\rho}_{ \nu \sigma}  - \partial_{\nu} \Gamma^{\rho}_{ \mu \sigma} \quad \text{(15)} \]

We substitute in the values for the connection coefficients into eq. (15) and rearrange. That gives us:

    \begin{align*} R^{\rho}_{\sigma \mu \nu} &=\partial_\mu \left(\frac{1}{2} \eta^{\rho \alpha}\left(\partial_\sigma h_{\alpha \nu}+\partial_\nu h_{\alpha \sigma}-\partial_\alpha h_{\nu \sigma}\right)\right) \\ & -\partial_\nu\left(\frac{1}{2} \eta^{\rho \alpha}\left(\partial_\sigma h_{\alpha \mu}+\partial_\mu h_{\alpha \sigma}-\partial_\alpha h_{\mu \sigma}\right)\right) \\ &=\frac{1}{2} \eta^{\rho \alpha}\left(\begin{array}{c} \partial_\mu \partial_\sigma h_{\alpha \nu}+\cancel{\partial_\mu \partial_\nu h_{\alpha \sigma}}-\partial_\mu \partial_\alpha h_{\nu \sigma}- \\ \partial_\nu \partial_\sigma h_{\alpha \mu}-\cancel{\partial_\nu \partial_\mu h_{\alpha \sigma}}+\partial_\nu \partial_\alpha h_{\mu \sigma} \end{array}\right) \\ &=\frac{1}{2} \eta^{\rho \alpha}\left( \partial_\mu \partial_\sigma h_{\alpha \nu}-\partial_\mu \partial_\alpha h_{\nu \sigma}-\partial_\nu \partial_\sigma h_{\alpha \mu}+\partial_\nu \partial_\alpha h_{\mu \sigma} \right) \quad \text{(16)}  \end{align*}

V.C.2.d Ricci Tensor

To get the Ricci tensor from the Riemann tensor, we simply contract the Riemann tensor in its third index:

    \[ R_{\sigma \nu} =   R^{\mu}_{\sigma \mu \nu} \quad \text{(17)} \]

    \[ R_{\sigma \mu \nu}^\mu=\frac{1}{2} \eta^{\mu \alpha}\left(\partial_\mu \partial_\sigma h_{\alpha \nu}-\partial_\mu \partial_\alpha h_{\nu \sigma}-\partial_\nu \partial_\sigma h_{\alpha \mu}+\partial_\nu \partial_\alpha h_{\mu \sigma}\right) \quad \text{(18)} \]

The \mu superscript on the \eta^{\mu \alpha} term raises some indices:

    \[R_{\sigma \nu}=\frac{1}{2}\left(\partial_\mu \partial_\sigma h_\nu^\mu-\partial^\mu \partial_\mu h_{\nu \sigma}-\partial_\nu \partial_\sigma h_\mu^\mu+\partial_\nu \partial_\alpha h_\sigma^\alpha\right) \quad \text{(19)} \]

Next, we define the scalar h \equiv h^{\mu}_{\mu} and plug this into eq. (19). We could also substitute the d’Alembert operator (\square \equiv\eta^{\mu\alpha}\partial_{\mu}\partial_{\alpha} = \partial_{\mu}\partial^{\mu}) for \partial^{\mu}\partial_{\mu}, but for ensuing calculations, I think this will just make things more confusing. So I won’t make that substitution here. At any rate, we have – for the Ricci tensor – the following equation:

    \[R_{\sigma \nu}=\frac{1}{2}\left(\partial_\mu \partial_\sigma h_\nu^\mu+\partial_\nu \partial_\alpha h_\sigma^\alpha-\partial^{\mu}\partial_{\mu} h_{\nu \sigma}-\partial_\nu \partial_\sigma h\right) \quad \text{(20)} \]

V.C.2.e Ricci Scalar

We create the Ricci Scalar by contracting the Ricci Tensor:

    \begin{align*}    R &= R^{\nu}_{\nu} = \eta^{\sigma \nu}R_{\sigma \nu} \\ &=\eta^{\sigma \nu} \frac{1}{2}\left(\partial_\mu \partial_\sigma h_\nu^\mu+\partial_\nu \partial_\alpha h_\sigma^\alpha-\partial^{\mu}\partial_{\mu} h_{\nu \sigma}-\partial_\nu \partial_\sigma h\right) \\ &=\frac{1}{2}\left(\partial_\mu \partial_\sigma h^{\mu \sigma}+\partial_\nu \partial_\alpha h^{\alpha \nu}-\partial^{\mu}\partial_{\mu} h_\sigma^\sigma-\partial^\sigma \partial_\sigma h\right) \\ &=\frac{1}{2}\left(\partial_\mu \partial_\sigma h^{\mu \sigma}+\partial_\nu \partial_\alpha h^{\alpha \nu}-\partial^{\mu}\partial_{\mu} h-\partial^\mu \partial_\mu h\right) \quad \text{(21)}  \end{align*}

We make the following change in repeated variables in the second term within parentheses in eq. (21) (\partial_\nu \partial_\alpha h^{\alpha \nu}):

  • \nu \rightarrow \mu
  • \alpha \rightarrow \sigma

Since h is symmetric, h^{\mu \sigma} = h^{\sigma \mu}. Therefore, we have:

    \begin{align*} R &=  \frac{1}{2}\left(\partial_\mu \partial_\sigma h^{\mu \sigma}+\partial_\mu \partial_\sigma h^{\sigma \mu}-\partial^{\mu}\partial_{\mu} h-\partial^\mu \partial_\mu h\right)  \\ &=  \frac{1}{2}\left(\partial_\mu \partial_\sigma h^{\mu \sigma}+\partial_\mu \partial_\sigma h^{\mu \sigma}-\partial^{\mu}\partial_{\mu} h-\partial^\mu \partial_\mu h\right)  \\ &= \frac{1}{2}\left(2\partial_\mu \partial_\sigma h^{\mu \sigma}-2\partial^{\mu}\partial_{\mu} h \right) \\ \text{So}& : \\ \\ R &= \partial_\mu \partial_\sigma h^{\mu \sigma}-\partial^{\mu}\partial_{\mu} h  \quad \text{(22)} \end{align*}

V.C.2.f Einstein Tensor

Now that we have expressions for the Ricci tensor, the Ricci scalar and the metric, we can find an expression for the Einstein tensor.

    \begin{align*}    G_{\mu \nu} &= R_{\mu \nu} - \frac12 g_{\mu \nu} R \\ \\ &=\frac{1}{2}\left(\partial_\alpha \partial_\mu h_\nu^\alpha+\partial_\nu \partial_\alpha h_\mu^\alpha-\partial^{\sigma}\partial_{\sigma} h_{\mu \nu}-\partial_\mu \partial_\nu h\right) \\ & -\frac{1}{2} g_{\mu \nu}\left(\partial_\alpha \partial_\beta h^{\alpha \beta}-\partial^{\sigma}\partial_{\sigma} h\right) \\ \\ &=\frac{1}{2}\left(\partial_\alpha \partial_\mu h_\nu^\alpha+\partial_\nu \partial_\alpha h_\mu^\alpha-\partial^{\sigma}\partial_{\sigma} h_{\mu \nu}-\partial_\mu \partial_\nu h\right) \\ & -\frac{1}{2} \left(\eta_{\mu \nu} + h_{\mu \nu}\right)\left(\partial_\alpha \partial_\beta h^{\alpha \beta}-\partial^{\sigma}\partial_{\sigma} h\right) \\ \\ &=\frac{1}{2}\left(\partial_\alpha \partial_\mu h_\nu^\alpha+\partial_\nu \partial_\alpha h_\mu^\alpha-\partial^{\sigma}\partial_{\sigma} h_{\mu \nu}-\partial_\mu \partial_\nu h\right) \\ & -\frac{1}{2} \eta_{\mu \nu}\left(\partial_\alpha \partial_\beta h^{\alpha \beta}-\partial^{\sigma}\partial_{\sigma} h\right) \\ & -\frac{1}{2} \underbrace{h_{\mu \nu}\left(\partial_\alpha \partial_\beta h^{\alpha \beta}-\partial^{\sigma}\partial_{\sigma} h\right)}_{\approx 0} \\ \\ &=\frac{1}{2}\bigl(\partial_\alpha \partial_\mu h_\nu^\alpha+\partial_\nu \partial_\alpha h_\mu^\alpha-\partial^{\sigma}\partial_{\sigma} h_{\mu \nu}-\partial_\mu \partial_\nu h \\ & -\eta_{\mu \nu} \partial_\alpha \partial_\beta h^{\alpha \beta}+\eta_{\mu \nu} \partial^{\sigma}\partial_{\sigma} h \bigr)  \quad \text{(23)}  \end{align*}

We’d like to convert eq. (23) into a form that will be more convenient for future manipulations. To do this, we need to use the following mathematical trick. For repeated indices in the partial derivative and metric correction term h, we can raise the partial derivative’s index and lower the metric correction term index, as follows:

    \[   h^{\mu \nu} = \eta^{\mu \alpha} h_{\alpha}^{\nu} \]

    \[ \partial_{\mu} h^{\mu \nu} = \partial_{\mu} \eta^{\mu \alpha} h_{\alpha}^{\nu}  \]

    \[ \partial_{\mu} h^{\mu \nu} =  \partial^\alpha h_{\alpha}^{\nu}\quad \text{(24)}\]

Using this technique on eq. (23), we obtain:

    \begin{align*} G_{\mu \nu}&=\frac{1}{2}\bigl(\partial^\alpha \partial_\mu h_{\alpha \nu}+\partial^\alpha \partial_\nu h_{\mu \alpha}-\partial^{\alpha}\partial_{\alpha} h_{\mu \nu}-\partial_\mu \partial_\nu h \\ &\quad -\eta_{\mu \nu} \partial^\alpha \partial^\beta h_{\alpha \beta}+\eta_{\mu \nu} \partial^{\sigma}\partial_{\sigma} h\bigr)\quad \text{(25)}\end{align*}

We now make the following change of variables:

    \[\overline{h}_{\mu \nu} \equiv h_{\mu \nu} - \frac12 \eta _{\mu \nu} h \quad \text{(26)}\]

    \[h_{\mu \nu}  = \overline{h}_{\mu \nu} + \frac12 \eta _{\mu \nu} h \quad \text{(27)}\]

We replace the values of the h terms in eq. (25) with the value on the righthand side of eq. (27), giving us:

    \[G_{\mu \nu}=\frac{1}{2}\left(\partial^\alpha \partial_\mu \bar{h}_{\alpha \nu}+\partial^\alpha \partial_\nu \bar{h}_{\mu \alpha}-\partial^\alpha \partial_\alpha \bar{h}_{\mu \nu}-\eta_{\mu \nu} \partial^\alpha \partial^\beta \bar{h}_{\alpha \beta}\right)\quad \text{(28)}\]

Figure 1 shows how we get from eq. (25) to eq. (28).

Linear gravity equation for Guv with h to Guv to h-bar

V.C.3 Lorenz Gauge

In the last section, we found an equation for the Einstein tensor in terms of a variable that reflects small perturbations of the metric g_{\mu \nu}. Specifically:

    \[G_{\mu \nu}=\frac{1}{2}\left(\partial^\alpha \partial_\mu \bar{h}_{\alpha \nu}+\partial^\alpha \partial_\nu \bar{h}_{\mu \alpha}-\partial^\alpha \partial_\alpha \bar{h}_{\mu \nu}-\eta_{\mu \nu} \partial^\alpha \partial^\beta \bar{h}_{\alpha \beta}\right)\quad \text{(28)}\]

where

    \[  g_{\mu \nu} = \eta_{\mu \nu} + h_{\mu \nu}  \]

and

    \[h_{\mu \nu}  = \overline{h}_{\mu \nu} + \frac12 \eta _{\mu \nu} h \]

Our ultimate goal, though, is to get a wave equation from eq. (28). It turns out that we can do a change of coordinates such that all of the terms on the right side of eq. (28) except \partial^\alpha \partial_\alpha \bar{h}_{\mu \nu} equal zero:

    \[G_{\mu \nu}=\frac{1}{2}\left(\underbrace{\cancel{\partial^\alpha \partial_\mu \bar{h}_{\alpha \nu}}}_{0}+\underbrace{\cancel{\partial^\alpha \partial_\nu \bar{h}_{\mu \alpha}}}_{0}-\partial^\alpha \partial_\alpha \bar{h}_{\mu \nu}-\underbrace{\cancel{\eta_{\mu \nu} \partial^\alpha \partial^\beta \bar{h}_{\alpha \beta}}}_{0}\right)\quad \text{(V.C.3.1)}\]

In the vacuum (i.e., T_{\mu \nu} = 0), eq. (V.C.3.1) becomes:

    \[ \partial^\alpha \partial_\alpha \bar{h}_{\mu \nu} = 0\quad \text{(V.C.3.2)}\]

The solution to eq. (V.C.3.2) is a wave, which is what we want. Our task now is to find the coordinate change(s) that bring about the above conditions. Such a coordinate system is called a Lorenz gauge.
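To make the wave statement concrete, here is a sketch (assuming sympy) checking that a plane-wave perturbation \bar{h}_{\mu\nu} = A_{\mu\nu}\cos(k_\alpha x^\alpha) satisfies \partial^\alpha \partial_\alpha \bar{h}_{\mu \nu} = 0 when the wave moves at speed c (here a wave traveling in the x-direction; the amplitude A and wave number k are arbitrary constants introduced for the illustration):

```python
import sympy as sp

ct, x, y, z, A, k = sp.symbols('ct x y z A k')

# Plane wave moving in the +x direction: phase k(ct - x), i.e., a null wave vector
hbar = A * sp.cos(k * (ct - x))

# d'Alembertian with signature (+,-,-,-): box = d^2/d(ct)^2 - d^2/dx^2 - d^2/dy^2 - d^2/dz^2
box_hbar = (sp.diff(hbar, ct, 2) - sp.diff(hbar, x, 2)
            - sp.diff(hbar, y, 2) - sp.diff(hbar, z, 2))

print(sp.simplify(box_hbar))   # 0, so the perturbation solves eq. (V.C.3.2)
```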

It so happens that, if \partial_{\beta} \overline{h}^{\alpha \beta} = 0 then the 3 terms in eq. (28) that we want to become zero, actually become zero. So let’s see if we can manipulate eq. (28) such that this occurs. Recall from eq. (24) that:

    \[   h^{\mu \nu} = \eta^{\mu \alpha} h_{\alpha}^{\nu} \]

    \[ \partial_{\mu} h^{\mu \nu} = \partial_{\mu} \eta^{\mu \alpha} h_{\alpha}^{\nu}  \]

    \[ \partial_{\mu} h^{\mu \nu} =  \partial^\alpha h_{\alpha}^{\nu}\quad \text{(24)}\]

We’ll apply this to the terms in eq. (28) we wish to negate and see if we can get rid of them. We’ll start with the 4th term on the righthand side of eq. (28) \eta_{\mu \nu} \partial^\alpha \partial^\beta \bar{h}_{\alpha \beta}. We’ll work on the \partial^\alpha \partial^\beta \bar{h}_{\alpha \beta} part:

    \begin{align*}  \partial^\alpha \partial^\beta \bar{h}_{\alpha \beta} &= \partial_\alpha \partial_\beta \bar{h}^{\alpha \beta} \\ &=  \partial_\alpha (\underbrace{\partial_\beta \bar{h}^{\alpha \beta}}_{0})\\&= \partial_\alpha(0) \\ &= 0\end{align*}

So the 4th term becomes zero.

Next, we’ll attack the 1st term on the righthand side, \partial^\alpha \partial_\mu \bar{h}_{\alpha \nu}.

    \begin{align*}  \partial^{\alpha} \partial_{\mu} \bar{h}_{\alpha \nu} &= \partial_{\alpha} \partial_{\mu} \bar{h}^{\alpha}{}_{\nu} \\ &= \partial_{\alpha} \partial_{\mu} \eta_{\beta \nu} \bar{h}^{\alpha \beta} \\ &= \eta_{\beta \nu} \partial_{\mu} (\partial_{\alpha} \bar{h}^{\alpha \beta}) \\ \text{Relabel dummy } & \text{indices } \alpha \leftrightarrow \beta \\  &= \eta_{\alpha \nu} \partial_{\mu} (\partial_{\beta} \bar{h}^{\beta \alpha}) \\ &= \eta_{\alpha \nu} \partial_{\mu} (\underbrace{\partial_{\beta} \bar{h}^{\alpha \beta}}_{0})\\&= \eta_{\alpha \nu} \partial_{\mu} (0)\\&=0\end{align*}

So the 1st term goes to zero, too.

Finally, we manipulate the 2nd term \partial^\alpha \partial_\nu \bar{h}_{\mu \alpha}. We get:

    \begin{align*} & \partial^\alpha \partial_\nu \bar{h}_{\mu \alpha} \\ &= \partial_\alpha \partial_\nu \bar{h}_\mu^\alpha \\ &= \partial_\alpha \partial_\nu \eta_{\beta \mu} \bar{h}^{\alpha \beta} \\ &= \eta_{\beta \mu} \partial_\nu\left(\partial_\alpha \bar{h}^{\alpha \beta}\right) \\ & =\eta_{\alpha \mu} \partial_\nu\left(\partial_\beta \bar{h}^{\beta \alpha}\right) \\ & =\eta_{\alpha \mu} \partial_\nu\left(\partial_\beta \bar{h}^{\alpha \beta}\right) \\ & =\eta_{\alpha \mu} \partial_\nu(0) \\ & =0 \end{align*}

The 2nd term vanishes as well.

Now what we need to do is find a coordinate system(s) that makes \partial_{\beta} \overline{h}^{\alpha \beta} = 0.

The coordinate transformation we’ll make is called a displacement field:

    \[ \widetilde{x}^{\alpha} = x^{\alpha} + \xi^{\alpha} \quad \|\xi^{\alpha}  \| \ll 1 \quad \text{(V.C.3.3)} \]

where

x^{\alpha} are the old coordinates, i.e., (ct, x, y, z)
\tilde{x}^{\alpha} are the new coordinates
\xi^{\alpha} are the components of the displacement field

The displacement field is a tiny vector added to each point in space. The result is a new coordinate system (Figure V.C.3.1):
(Source: eigenchris, Relativity 109c: Gravitational Waves – Wave Derivation (The Lorenz Gauge), Feb 7, 2022.)

Coordinate system made from local field displacement
Figure V.C.3.1

In the diagram, the blue dotted lines represent the original coordinates. The small purple vectors represent the displacement field. And the curved orange lines represent the new coordinate axes associated with the displacement field.

So we have:

    \[ \widetilde{x}^{\alpha} = x^{\alpha} + \xi^{\alpha} \quad \text{(V.C.3.3)} \]

    \[ x^{\alpha} = \widetilde{x}^{\alpha} - \xi^{\alpha}  \quad \text{(V.C.3.4)} \]

We differentiate eq. (V.C.3.3) and eq. (V.C.3.4) and get:

    \[ \frac{\partial \widetilde{x}^{\alpha}}{\partial x^{\beta}} = \frac{\partial x^{\alpha}}{\partial x^{\beta}}  + \frac{\partial \xi^{\alpha}}{\partial x^{\beta}} = \delta^{\alpha}_{\beta} + \frac{\partial \xi^{\alpha}}{\partial x^{\beta}}  \quad \text{(V.C.3.5)}  \]

    \[ \frac{\partial x^{\alpha}}{\partial \widetilde{x}^{\beta}} = \frac{\partial \widetilde{x}^{\alpha}}{\partial \widetilde{x}^{\beta}}  - \frac{\partial \xi^{\alpha}}{\partial \widetilde{x}^{\beta}} = \delta^{\alpha}_{\beta} - \frac{\partial \xi^{\alpha}}{\partial x^{\sigma}}\frac{\partial x^{\sigma}}{\partial \widetilde{x}^{\beta}}  \quad \text{(V.C.3.6)}  \]

Next, we take the \alpha indices on the non-expanded term on the righthand side of eq. (V.C.3.6) and relabel them as \sigma. That gives us:

    \[ \frac{\partial x^{\sigma}}{\partial \widetilde{x}^{\beta}} =  \delta^{\sigma}_{\beta} - \frac{\partial \xi^{\sigma}}{\partial \widetilde{x}^{\beta}}  \quad \text{(V.C.3.7)}   \]

Then we substitute the value of \displaystyle \frac{\partial x^{\sigma}}{\partial \widetilde{x}^{\beta}} into the rightmost term of eq. (V.C.3.6) yielding:

    \begin{align*}    \frac{\partial x^\alpha}{\partial \widetilde{x}^\beta}&=\delta_\beta^\alpha-\frac{\partial \xi^\alpha}{\partial x^\sigma}\left(\delta_\beta^\sigma-\frac{\partial \xi^\sigma}{\partial \widetilde{x}^\beta}\right)\\ &=\delta_\beta^\alpha-\frac{\partial \xi^\alpha}{\partial x^\sigma} \delta_\beta^\sigma+\underbrace{\frac{\partial \xi^\alpha}{\partial x^\sigma} \frac{\partial \xi^\sigma}{\partial \widetilde{x}^\beta}}_{\approx 0}  \quad \text{(V.C.3.8)}   \end{align*}

Since \xi and its derivatives are very small, the product of 2 derivatives of \xi is negligible and we can ignore it. We can use \delta_\beta^\sigma to change the lower \sigma index on the term to its left, to \beta. We get:

    \[ \frac{\partial x^\alpha}{\partial \widetilde{x}^\beta}=\delta_\beta^\alpha-\frac{\partial \xi^\alpha}{\partial x^\beta}  \quad \text{(V.C.3.9)}   \]

The metric changes under a change of coordinates, as follows:

    \[ \widetilde{g}_{\alpha \beta} = \frac{\partial x^{\mu}}{\partial \widetilde{x}^{\alpha}}\frac{\partial x^{\nu}}{\partial \widetilde{x}^{\beta}}  g_{\mu \nu}  \quad \text{(V.C.3.10)}   \]

We plug the results of eq. (V.C.3.9), and substitute the expanded version of the metric g_{\mu \nu} = \eta_{\mu \nu} + h_{\mu \nu}, into eq. (V .C.3.10). That gives us:

    \begin{align*} \widetilde{g}_{\alpha \beta} &=\left(\delta_\alpha^\mu-\frac{\partial \xi^\mu}{\partial x^\alpha}\right)\left(\delta_\beta^\nu-\frac{\partial \xi^\nu}{\partial x^\beta}\right)\left(\eta_{\mu \nu}+h_{\mu \nu}\right) \\ &=\left(\delta_\alpha^\mu \delta_\beta^\nu-\delta_\alpha^\mu \frac{\partial \xi^\nu}{\partial x^\beta}-\delta_\beta^v \frac{\partial \xi^\mu}{\partial x^\alpha}+\underbrace{\frac{\partial \xi^\mu}{\partial x^\alpha} \frac{\partial \xi^\nu}{\partial x^\beta}}_{\approx 0}\right)\left(\eta_{\mu \nu}+h_{\mu \nu}\right) \quad \text{(V.C.3.11)}  \end{align*}

To save space, we’ll use comma notation to indicate derivatives (e.g., \displaystyle \frac{\partial \xi^{\nu}}{\partial x^{\beta}} = \xi^{\nu}_{,\beta}). We institute this comma notation, distribute terms and use the Minkowski metric to lower some indices. We also note that \eta_{\alpha \beta} +h_{\alpha \beta} =g_{\alpha \beta}. Eq. (V.C.3.11) becomes:

    \begin{align*} \tilde{g}_{\alpha \beta}&=\left(\delta_\alpha^\mu \delta_\beta^\nu-\delta_\alpha^\mu \xi_{, \beta}^\nu-\delta_\beta^\nu \xi_{, \alpha}^\mu\right)\left(\eta_{\mu \nu}+h_{\mu \nu}\right) \\ &= \eta_{\alpha \beta}-\eta_{\alpha \nu} \xi_{, \beta}^\nu-\eta_{\mu \beta} \xi_{, \alpha}^\mu+h_{\alpha \beta}-h_{\alpha \nu} \xi_{, \beta}^\nu-h_{\mu \beta} \xi_{, \alpha}^\mu  \\ &=g_{\alpha \beta}-\xi_{\alpha, \beta}-\xi_{\beta, \alpha}  \quad \text{(V.C.3.12)}  \end{align*}

Recognizing that:

    \[  g_{\alpha \beta} = \eta_{\alpha \beta} +h_{\alpha \beta}  \]

and

    \[  \widetilde{g}_{\alpha \beta} = \eta_{\alpha \beta} + \widetilde{h}_{\alpha \beta}  \]

we can transform eq. (V.C.3.12) into:

    \begin{align*} \eta_{\alpha \beta} + \widetilde{h}_{\alpha \beta} &= \eta_{\alpha \beta} +h_{\alpha \beta}-\xi_{\alpha, \beta}-\xi_{\beta, \alpha} \\ \widetilde{h}_{\alpha \beta} &= h_{\alpha \beta}-\xi_{\alpha, \beta}-\xi_{\beta, \alpha}  \quad \text{(V.C.3.13)}  \end{align*}
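For reference, contracting eq. (V.C.3.13) with \eta^{\alpha \beta} shows how the trace of the perturbation transforms; this is the same contraction that appears in the calculation further below:

    \[ \widetilde{h} = \eta^{\alpha \beta}\,\widetilde{h}_{\alpha \beta} = \eta^{\alpha \beta}\left(h_{\alpha \beta} - \xi_{\alpha, \beta} - \xi_{\beta, \alpha}\right) = h - 2\,\xi^{\beta}{}_{, \beta} \]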

As an aside, it turns out that, to first order in \xi, the components of the Riemann curvature tensor don’t change under a coordinate change generated by a small displacement field:

    \[\widetilde{R}^{\rho}_{\sigma \mu \nu}  =  R^{\rho}_{\sigma \mu \nu}\quad \text{(V.C.3.14)}\]

Proof of this can be found in the eigenchris YouTube video, Relativity 109c: Gravitational Waves – Wave Derivation (The Lorenz Gauge) at 10:30.
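One quick way to see this uses the standard first-order expression for the Riemann tensor with all indices lowered:

    \[ R_{\rho \sigma \mu \nu} = \frac{1}{2}\left(\partial_{\sigma}\partial_{\mu} h_{\rho \nu} + \partial_{\rho}\partial_{\nu} h_{\sigma \mu} - \partial_{\sigma}\partial_{\nu} h_{\rho \mu} - \partial_{\rho}\partial_{\mu} h_{\sigma \nu}\right) \]

Substituting \widetilde{h}_{\mu \nu} = h_{\mu \nu} - \xi_{\mu, \nu} - \xi_{\nu, \mu} into this expression, each third derivative of \xi appears twice with opposite signs (because partial derivatives commute), so all the \xi terms cancel and the curvature is unchanged to first order.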

At any rate, recall that our goal is for the change x^\alpha \rightarrow \widetilde{x}^\alpha to lead to a change \overline{h}_{\mu \nu} \rightarrow \overline{\widetilde{h}}_{\mu \nu} such that \partial_{\beta}  \overline{\widetilde{h}}^{\,\,\alpha \beta} = 0, i.e., the Lorenz condition.

We know that:

    \[ \widetilde{h}_{\mu \nu} \equiv h_{\mu \nu}-\xi_{\mu, \nu}-\xi_{\nu, \mu}\quad \text{(V.C.3.15)}  \]

Therefore:

    \begin{align*}  \overline{\widetilde{h}}_{\mu \nu} &\equiv \widetilde{h}_{\mu \nu}-\frac{1}{2} \eta_{\mu \nu} \widetilde{h}= \widetilde{h}_{\mu \nu} - \frac12\eta_{\mu \nu}\eta^{\alpha \beta}\widetilde{h}_{\alpha \beta} \\ \overline{\widetilde{h}}_{\mu \nu}&=\left(h_{\mu \nu}-\xi_{\mu, \nu}-\xi_{\nu, \mu}\right)-\frac{1}{2} \eta_{\mu \nu}\left[\eta^{\alpha \beta}\left(h_{\alpha \beta}-\xi_{\alpha, \beta}-\xi_{\beta, \alpha}\right)\right]\\ &=h_{\mu \nu}-\frac{1}{2} \eta_{\mu \nu} h-\xi_{\mu, \nu}-\xi_{\nu, \mu}+\frac{1}{2} \eta_{\mu \nu} \eta^{\alpha \beta} \xi_{\alpha, \beta}+\frac{1}{2} \eta_{\mu \nu} \eta^{\alpha \beta} \xi_{\beta, \alpha} \\ &=h_{\mu \nu}-\frac{1}{2} \eta_{\mu \nu} h-\xi_{\mu, \nu}-\xi_{\nu, \mu}+\frac{1}{2} \eta_{\mu \nu} \xi_{, \beta}^\beta+\frac{1}{2} \eta_{\mu \nu} \xi_{, \beta}^\beta \\ &=\overline{h}_{\mu \nu}-\xi_{\mu, \nu}-\xi_{\nu, \mu}+\eta_{\mu \nu} \xi_{, \sigma}^\sigma \\ \eta^{\mu \alpha} \eta^{\nu \beta}\, \overline{\widetilde{h}}_{\mu \nu}&=\eta^{\mu \alpha} \eta^{\nu \beta}\left(\overline{h}_{\mu \nu}-\xi_{\mu, \nu}-\xi_{\nu, \mu}+\eta_{\mu \nu} \xi_{, \sigma}^\sigma\right) \\ \overline{\widetilde{h}}^{\alpha \beta}&=\overline{h}^{\alpha \beta}-\eta^{\mu \alpha} \eta^{\nu \beta} \xi_{\mu, \nu}-\eta^{\mu \alpha} \eta^{\nu \beta} \xi_{\nu, \mu}+\eta^{\alpha \beta} \xi_{, \sigma}^\sigma \\ \overline{\widetilde{h}}^{\alpha \beta}&=\overline{h}^{\alpha \beta}-\eta^{\nu \beta} \xi^{\alpha}{}_{, \nu}-\eta^{\mu \alpha} \xi^{\beta}{}_{, \mu}+\eta^{\alpha \beta} \xi_{, \sigma}^\sigma \\ \partial_{\beta}\,\overline{\widetilde{h}}^{\alpha \beta}&=\overline{h}^{\alpha \beta}{}_{, \beta}-\eta^{\nu \beta} \xi^{\alpha}{}_{, \nu \beta}-\eta^{\mu \alpha} \xi^{\beta}{}_{, \mu \beta}+\eta^{\alpha \beta} \xi^{\sigma}{}_{, \sigma \beta} \\ &\qquad \text{(relabel } \beta \rightarrow \mu \text{ and } \sigma \rightarrow \beta \text{ in the rightmost term)} \\ \partial_{\beta}\,\overline{\widetilde{h}}^{\alpha \beta}&=\overline{h}^{\alpha \beta}{}_{, \beta}-\eta^{\nu \beta} \xi^{\alpha}{}_{, \nu \beta}-\cancel{\eta^{\mu \alpha} \xi^{\beta}{}_{, \mu \beta}}+\cancel{\eta^{\alpha \mu} \xi^{\beta}{}_{, \beta \mu}} \\ \partial_{\beta}\,\overline{\widetilde{h}}^{\alpha \beta}&=\partial_{\beta}\,\overline{h}^{\alpha \beta}-\eta^{\nu \beta}\, \partial_{\nu} \partial_{\beta}\,\xi^\alpha \\ \partial_{\beta}\,\overline{\widetilde{h}}^{\alpha \beta}&=\partial_{\beta}\,\overline{h}^{\alpha \beta}- \partial^{\beta} \partial_{\beta}\,\xi^\alpha \\ \partial_{\beta}\,\overline{\widetilde{h}}^{\alpha \beta}&=\partial_{\beta}\,\overline{h}^{\alpha \beta}- \square\,\xi^\alpha \quad \text{(V.C.3.16)}\end{align*}

If we choose \xi^{\alpha} such that \partial_{\beta}\,\overline{h}^{\alpha \beta}= \partial^{\beta} \partial_{\beta}\xi^\alpha, then \partial_{\beta}\,\overline{\tilde{h}}^{\alpha \beta} = 0 which, of course, is the Lorenz condition.
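Such a \xi^{\alpha} can always be found, at least formally: each component of \square\xi^\alpha = \partial_{\beta}\,\overline{h}^{\alpha \beta} is just an inhomogeneous wave equation. For example, with the convention \square = \nabla^2 - \frac{1}{c^2}\frac{\partial^2}{\partial t^2}, one solution is the familiar retarded integral:

    \[ \xi^{\alpha}(t, \vec{x}) = -\frac{1}{4 \pi} \int \frac{\left[\partial_{\beta}\,\overline{h}^{\alpha \beta}\right]\!\left(t - |\vec{x}-\vec{x}\,'|/c,\; \vec{x}\,'\right)}{|\vec{x}-\vec{x}\,'|}\, d^3 x' \]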

Note that the choice of \xi^{\alpha} is not unique. If we add to it any \chi^{\alpha} satisfying \square\chi^\alpha = 0, the Lorenz condition is still met, since:

    \[ \partial_{\beta}\,\overline{\widetilde{h}}^{\alpha \beta} = \partial_{\beta}\,\overline{h}^{\alpha \beta} - \square\left(\xi^{\alpha} + \chi^{\alpha}\right)  = \underbrace{\partial_{\beta}\,\overline{h}^{\alpha \beta} - \square \xi^{\alpha}}_{0} - \underbrace{\square \chi^{\alpha}}_{0} = 0 \quad \text{(V.C.3.17)} \]

Thus, the Lorenz gauge is a whole class of coordinate changes that satisfy the Lorenz condition.

So, if we find a Lorenz gauge, then:

    \[\partial_{\beta}\,\overline{\tilde{h}}^{\alpha \beta} = \partial^{\beta}\,\overline{\tilde{h}}_{\alpha \beta} = 0\]

and

    \[ \widetilde{G}_{\mu \nu}=\frac{1}{2}\left(\underbrace{\cancel{\partial^\alpha \partial_\mu \overline{\widetilde{h}}_{\alpha \nu}}}_{0}+\underbrace{\cancel{\partial^\alpha \partial_\nu \overline{\widetilde{h}}_{\mu \alpha}}}_{0}-\underbrace{\cancel{\eta_{\mu \nu} \partial^\alpha \partial^\beta \overline{\widetilde{h}}_{\alpha \beta}}}_{0}-\partial^\alpha \partial_\alpha \overline{\widetilde{h}}_{\mu \nu}\right) \quad \text{(V.C.3.18)} \]

In linearized gravity and the Lorenz gauge:

    \[ \widetilde{G}_{\mu \nu} = -\frac12  \square  \overline{\widetilde{h}}_{\mu \nu} = \frac{8 \pi G}{c^4}\,\widetilde{T}_{\mu \nu} \quad \text{(V.C.3.19)} \]

If we’re in a vacuum, \widetilde{T}_{\mu \nu} = 0, which means that:

    \[ \square   \overline{\widetilde{h}}_{\mu \nu} = 0 \quad \text{(V.C.3.20)} \]

And, of course, eq. (V.C.3.20) is a wave equation, which is what we were trying to show.
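To see the wave character explicitly, try the standard plane-wave ansatz \overline{\widetilde{h}}_{\mu \nu} = A_{\mu \nu}\, e^{i k_{\sigma} x^{\sigma}}, with constant amplitudes A_{\mu \nu}, in eq. (V.C.3.20):

    \[ \square\,\overline{\widetilde{h}}_{\mu \nu} = \eta^{\alpha \beta}\,\partial_{\alpha}\partial_{\beta}\left(A_{\mu \nu}\, e^{i k_{\sigma} x^{\sigma}}\right) = -\,k^{\beta} k_{\beta}\, A_{\mu \nu}\, e^{i k_{\sigma} x^{\sigma}} = 0 \quad \Longrightarrow \quad k^{\beta} k_{\beta} = 0 \]

The wave vector must be null, so the perturbations \overline{\widetilde{h}}_{\mu \nu} propagate at the speed of light.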