Jesper Toft Kristensen
Spring 2014
Chapter 1

Problem 1.1 (Quantum Dice):
Let us do some quantum mechanics via a game of dice. The die will not have
the 6 sides often encountered in real life; instead we will give it 3 sides,
so it shows either 1 spot, 2 spots, or 3 spots. That is all we consider right
now: a die with 3 sides. With it we can play 3 different games. One game
is called "distinguishable", another "bosons", and the last "fermions".
Each game has its own rules.
What is the meaning of this game?
Of course, each game corresponds to considering a different kind of particle.
E.g., a fermion is a type of particle which behaves in a way similar to the
game "fermions" (not too surprising). What do I mean by "behaves similarly"?
Each die represents a choice among 3 states (remember it has 3 sides) into
which we can put one particle. So when we roll die number 1 it corresponds
to placing a fermion in one of 3 states. Say we roll a 1; then we put one
fermion in state 1. Now we roll die number 2. If it turns out to also show a
1 we cannot put the fermion in state 1 as we are supposed to, because there is
already a fermion occupying that state. Hence, we must re-roll the die until
we get either 2 or 3 spots showing on the die, and we put the second fermion
in state 2 or 3, respectively.
So we see how the number of sides simply corresponds to the number of
states in the system we can put particles in, the number of valid rolls
corresponds to the number of particles we want to put into these states, and
the rules of the game govern how this is done. Pretty neat.
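The three games are easy to enumerate on a computer. Here is a small Python sketch of my own (not part of the problem text); it encodes bosons as non-decreasing roll sequences, as in part (a) below, and fermions as strictly increasing sequences, which counts each legal occupancy of distinct states exactly once. The generalization of the two-roll rules to M rolls is my assumption.

```python
from itertools import product

def legal_turns(n_sides, n_rolls, game):
    """Enumerate the legal turns of each dice game.

    'distinguishable' allows any sequence of rolls; 'bosons' requires
    each roll to be >= the previous one; 'fermions' requires each roll
    to be strictly greater (no state occupied twice).
    """
    turns = []
    for rolls in product(range(1, n_sides + 1), repeat=n_rolls):
        pairs = list(zip(rolls, rolls[1:]))
        if game == "bosons" and any(b < a for a, b in pairs):
            continue
        if game == "fermions" and any(b <= a for a, b in pairs):
            continue
        turns.append(rolls)
    return turns

# two rolls of the 3-sided die:
print(len(legal_turns(3, 2, "distinguishable")))  # 9
print(len(legal_turns(3, 2, "bosons")))           # 6
print(len(legal_turns(3, 2, "fermions")))         # 3
```

The counts 9, 6, and 3 are exactly the number of ways to place two distinguishable particles, two bosons, and two fermions into three states.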
a)
Let’s roll
Let us play bosons and roll a die twice. This game dictates that the second
roll (by which I mean "the number of spots on the die") has to be
equal to or greater than the first roll. First of all, does this problem even
have a solution? We can quickly see it does, since the die can take values 1,
2, or 3, so, e.g., 1+3 = 4 is a valid set of tosses. But of course we
are not done: we have found one possible way of making a 4. Let us list all
allowed turns and add up the spots; this gives us the answer immediately:
    roll 1   roll 2   sum
      1        1      = 2
      1        2      = 3
      1        3      = 4
      2        2      = 4
      2        3      = 5
      3        3      = 6
We see that there are 2 turns giving a sum of 4. The answer is of the form
(number of ways to get 4) divided by (total number of ways of getting
anything). By making the above list the information is staring at us. E.g.,
the total number of ways of getting any sum of spots is of course just the
number of valid tosses (we don't care what the spots sum up to). That is
simply the number of rows in the above list: 6. The number of ways of
getting a sum of 4 we see from the third column is 2. Thus, we have the
result

    ρ(4) = 2/6 = 1/3 ≈ 33 %.
So, on average, every third two-roll gives a sum of 4. Incidentally, we see
from the list that there is something special about "4": it is the only sum
to occur twice in the list. Every other sum occurs with probability just

    1/6 ≈ 17 %.

This state is indeed special: if we consider an odd number of states (that
is, an odd number of sides on the die), the sum of energies we get from
putting two particles in the same state, namely the center state, can also
be obtained by putting one particle in the lowest state and the other in the
highest state. We can form yet another configuration with the same energy if
we move the lower particle one state up and the higher particle one state
down, thus maintaining the sum, and continue this until we reach the
center state. No other state has this feature. It is really due to the
symmetry of the state: there are as many levels below it as above it. We can
quantify how many configurations will have a given energy.
We can immediately get how many arrangements n of the two particles
give rise to the same energy (we can think of the sum of the die spots as the
total energy of the system if that is what each level represents, but it does
not have to be energy; the important thing is that we lose no generality,
hopefully gaining clarity). For a given state, let N↑ be the number of
states above it and N↓ the number of states below it. Take the minimum of
these and add 1 (for the state itself):

    n = min(N↑, N↓) + 1.

Notice that N↓ + N↑ = N − 1 (the minus one from not counting the state
itself), so we can write

    n = min(N↑, (N − 1) − N↑) + 1.
Due to symmetry, the largest n is obtained for the center state, which
has N↓ = N↑ = (N − 1)/2, giving n = (N − 1)/2 + 1 = (N + 1)/2. So if N = 3
then the center state is 2 and the sum is 4. We have n = (3 + 1)/2 = 2
arrangements which give 4. Good, that is what we saw above. Now, if we have
5 states then the center is at 3 and gives an energy of 6. There are
(5 + 1)/2 = 3 such arrangements, and all other sums have fewer combinations
of particles giving rise to the same energy.
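This counting can be checked by brute force. Below is a short Python sketch of my own, using the non-decreasing-rolls rule for bosons from part (a):

```python
from collections import Counter
from itertools import combinations_with_replacement

def boson_sum_counts(n_sides):
    # all turns of two non-decreasing rolls, tallied by their sum
    return Counter(a + b for a, b in
                   combinations_with_replacement(range(1, n_sides + 1), 2))

c3 = boson_sum_counts(3)
print(c3[4], sum(c3.values()))  # 2 ways out of 6 turns give a sum of 4

c5 = boson_sum_counts(5)
print(c5[6])  # with N = 5 the center sum 6 occurs (5 + 1)/2 = 3 ways
```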
What happens, you may ask, when the number of states is even instead
of odd? Well, then there is no longer a single center state. Instead, the
two center states (e.g., with 4 states in total these are states 2 and 3)
play the role of the single center state above (i.e., these 2 states share
the feature that their doubled energies can be formed in more ways than the
energies of all other states), but there is one more: the sum of the two
center states' energies occurs just as frequently. Thus, we actually have 3
sums in total, instead of 1 for the odd case, that occur most often in this
system. Of course, each of those 3 sums differs from the others.
b)
There is just a single legal turn if we are playing fermions with a
three-sided die and tossing 3 times. The physical meaning is that we are
trying to put 3 fermions into 3 quantum states. A quantum state cannot be
occupied by more than one fermion, so the only option is to put a fermion in
each of the 3 states, giving a sum of 1+2+3 = 6. This is the only option,
and hence ρ(6) = 100 %. This is the case for electrons.
One can show that, for the game "Bosons", where each die has N sides and
we roll M dice, we get

    (N + M − 1 choose M) = (N + M − 1)! / (M! (N − 1)!)

legal turns.
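This is the standard stars-and-bars count of multisets, and it can be verified directly by enumeration (a quick check of my own, not part of the original text):

```python
from itertools import combinations_with_replacement
from math import comb

for n_sides, n_rolls in [(3, 2), (3, 3), (6, 4)]:
    # enumerate boson turns (non-decreasing roll sequences) directly...
    count = sum(1 for _ in
                combinations_with_replacement(range(n_sides), n_rolls))
    # ...and compare with (N + M - 1 choose M)
    print(count, comb(n_sides + n_rolls - 1, n_rolls))
```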
c)
In distinguishable, all turns are legal. What is the number of legal turns?
Well, each die can show any of its N sides on any of the M rolls. E.g., we
can roll 112, 411, or 816 (depending on N and M of course, but the point is
that there is no restriction on what integer is allowed in later rolls given
the first ones). Hence, there must be N^M legal turns: each
placeholder (die toss) can take N values, and we toss the die M times.
The number of turns which have all faces showing the same value (e.g., 111),
also called an M-tuple, is N in both Bosons and Distinguishable (one such
turn per side). Since this count is the same in both games, and since we are
getting rid of some turns in Bosons (e.g., 211 is not allowed) but not in
Distinguishable, it must be more likely to roll a turn with all faces
showing the same value in Bosons!
For three states (N = 3) and 3 rolls (M = 3), in Distinguishable there are
27 legal turns (N^M = 3^3) of which 3 are M-tuples. In Bosons we have 10
legal turns of which also 3 are M-tuples. Thus, the enhancement factor (how
many more M-tuples we get in Bosons compared to Distinguishable) is:

    (3/10) / (3/27) = 27/10 = 2.7.

If we had rolled just two dice the enhancement would have been 1.5, so we
see that the more dice we roll, the larger the tendency of the Bosons to
pile up in the M-tuple turns (all particles in the same quantum state).
More generally: for Distinguishable the probability of generating an
M-tuple is

    N / N^M,

and for Bosons it is

    N / (N + M − 1 choose M).

Dividing the Boson case by the Distinguishable case gives the enhancement
factor of M-tuples in Bosons:

    N^M / (N + M − 1 choose M).

Since the numerator is larger than the denominator it is indeed an
enhancement. So, as we stated earlier, the enhancement comes from the fact
that there are more legal turns in Distinguishable than in Bosons while the
number of M-tuples is unchanged between the two games.
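The enhancement factor above can be tabulated in a couple of lines (a sketch of my own; the function name is mine):

```python
from math import comb

def enhancement(n_sides, n_rolls):
    """Ratio of M-tuple probabilities: Bosons over Distinguishable.

    Both games contain exactly n_sides M-tuples, so the ratio reduces
    to N^M / (N + M - 1 choose M).
    """
    return n_sides ** n_rolls / comb(n_sides + n_rolls - 1, n_rolls)

print(enhancement(3, 3))  # 2.7
print(enhancement(3, 2))  # 1.5
```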
Problem 1.2 (Probability distributions):
a)
The probability is found by comparing the number of ways to obtain a number
in the given range to the total number of possible numbers in the entire
range −∞ < x < ∞. Now, the problem is not discrete per se. This is the
probability of getting a number in the desired range:

    ∫_{x=0.70}^{x=0.75} ρ_uniform(x) dx = ∫_{x=0.70}^{x=0.75} dx = 0.75 − 0.70 = 0.05 = 5 %.
Now to the exponential decay. We need to find the probability that the time
is larger than 2τ:

    ∫_{t=2τ}^{∞} ρ_exponential(t) dt = ∫_{t=2τ}^{∞} (1/τ) exp(−t/τ) dt = [−exp(−t/τ)]_{2τ}^{∞}
                                     = 0 − (−exp(−2)) = exp(−2) ≈ 13.5 %.
Let us look at the Gaussian distribution and find the probability that my
score on an exam is larger than 2σ above the mean:

    ∫_{v=2σ}^{∞} ρ_Gaussian(v) dv = ∫_{v=2σ}^{∞} (1/(√(2π) σ)) exp(−v²/(2σ²)) dv.

Introduce the variable y = v/σ; then v = 2σ in the integration limit turns
into y = 2 (the upper limit goes to infinity, so it is unchanged) and the
differential goes from dv to σ dy, which makes the σ's cancel, and we can
write

    ∫_{y=2}^{∞} (1/√(2π)) exp(−y²/2) dy.
One way to perform this integral is to convert it to polar coordinates, but
the problem text gives us the answer:

    ∫_{y=2}^{∞} (1/√(2π)) exp(−y²/2) dy ≈ 2.3 %.
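All three tail probabilities can be evaluated with the standard library alone; in particular `math.erfc` gives the Gaussian tail exactly. A minimal sketch (my own check, not part of the solution):

```python
import math

# uniform on [0, 1): P(0.70 <= x < 0.75)
p_uniform = 0.75 - 0.70

# exponential with mean tau: P(t > 2*tau) = exp(-2), independent of tau
p_exponential = math.exp(-2)

# Gaussian: P(v > 2*sigma) = (1/2)*erfc(2/sqrt(2)), independent of sigma
p_gaussian = 0.5 * math.erfc(2 / math.sqrt(2))

print(p_uniform, p_exponential, p_gaussian)  # 0.05, ~0.135, ~0.023
```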
b)
It is important to have normalized probability distributions. It simply means
that the probability of obtaining some value is 100 %, as it should be; not
more, not less. So are they normalized?

    ∫_{x=−∞}^{∞} ρ_uniform(x) dx = ∫_{x=0}^{1} dx = 1.

How did the integration range change from an infinite range to a finite one?
Simply due to the fact that the uniform is zero outside the range
0 ≤ x < 1. We can find the mean as:

    x₀ = ∫_{x=−∞}^{∞} x ρ_uniform(x) dx = ∫_{x=0}^{1} x dx = [x²/2]_{0}^{1} = 1/2,
which should make sense: the distribution takes the same value from zero to
one, so the mean should be right in the middle, at one half. Of course, if
the distribution took larger values to the left of the mean than to the
right, the mean would be shifted to lower values of x (x₀ < 1/2), but for
this problem the symmetric answer makes sense! Visualizing the distribution
always helps in convincing yourself that the mean is correct (in this case
it is rather simple, of course).
Now let us try the exponential. Is it normalized? Note that it is non-zero
only in the range zero to infinity:

    ∫_{t=−∞}^{∞} ρ_exponential(t) dt = ∫_{t=0}^{∞} (1/τ) exp(−t/τ) dt = [−exp(−t/τ)]_{0}^{∞}
                                     = 0 − (−1) = 1.

What is the mean (which we could call x₀)? With the substitution y = t/τ the
answer is

    x₀ = ∫_{t=−∞}^{∞} t ρ_exponential(t) dt = τ ∫_{y=0}^{∞} y exp(−y) dy.

One way to do this integral is to consider some other function of some
variable a:

    f(a) = τ ∫_{y=0}^{∞} exp(−ay) dy.

By comparison, is it not true that the integral we want to solve is

    x₀ = −df(a)/da |_{a=1}.

Now what is the benefit of this? Well, we can easily find f(a) by doing the
integral of just the exponential:

    f(a) = τ/a,

and then take the derivative of the result:

    x₀ = −df(a)/da |_{a=1} = −(−τ/a²)|_{a=1} = τ.
Finally, consider the Gaussian:

    ∫_{v=−∞}^{∞} ρ_Gaussian(v) dv = ∫_{v=−∞}^{∞} (1/(√(2π) σ)) exp(−v²/(2σ²)) dv.

We can then do the following. Introduce y = v/(√2 σ), and write

    ∫_{y=−∞}^{∞} (1/(√(2π) σ)) exp(−y²) √2 σ dy = (1/√π) ∫_{y=−∞}^{∞} exp(−y²) dy.

One way to do this is to convert to polar coordinates: first multiply by the
same integral again, and take the square root to undo this action (thus
leaving everything unchanged):

    (1/√π) [ ∫_{y=−∞}^{∞} exp(−y²) dy ∫_{x=−∞}^{∞} exp(−x²) dx ]^{1/2}
    = (1/√π) [ ∫_{x=−∞}^{∞} ∫_{y=−∞}^{∞} exp(−(x² + y²)) dx dy ]^{1/2}.

In polar coordinates r² = x² + y² (r is the distance from the origin to any
point in the plane) and the differential is r dr dφ, where φ is the angle
between the radius vector pointing from the origin to some given point in
the plane and the x-axis. Thus, the integral becomes (it is still an
integral over the plane, just in other coordinates than before):

    = (1/√π) [ ∫_{φ=0}^{2π} ∫_{r=0}^{∞} exp(−r²) r dr dφ ]^{1/2}
    = (1/√π) [ 2π ∫_{r=0}^{∞} exp(−r²) (1/2) d(r²) ]^{1/2};

notice how we used the fact that the integrand was independent of φ, so that
the angular integral was easy to do (it simply gave 2π). Also, we changed
from dr to d(r²) = 2r dr, to get the same variable in the differential as in
the exponential. Now we rename r² as z (notice that the integration limits
do not change with this substitution):

    = (1/√π) [ π ∫_{z=0}^{∞} exp(−z) dz ]^{1/2}
    = (1/√π) (π [−exp(−z)]_{0}^{∞})^{1/2}
    = 1.
So the Gaussian is normalized. Now let us find the mean:

    x₀ = ∫_{v=−∞}^{∞} v ρ_Gaussian(v) dv = ∫_{v=−∞}^{∞} v (1/(√(2π) σ)) exp(−v²/(2σ²)) dv.

The Gaussian in v is an even function over the integration domain: on its
own it would give some non-zero value. But here the even function is
multiplied by v, which is an odd function. The product of an even and an odd
function is odd, and integrating an odd function over the entire (symmetric)
domain yields zero. So

    x₀ = 0.
We now look into the standard deviation of each distribution. This quantity
gives us some intuition as to how much we can expect variables drawn from
the distribution to vary. E.g., take the exam results as a case in point. If
the standard deviation of the Gaussian (which describes the exam results of
all students) is huge, then picking any student at random, she will
typically have a score very different from the mean of the Gaussian. Note
that these are just intuitive statements; in reality it can get much more
complicated, for instance if the distribution is multimodal. But the above
feel for the standard deviation suffices for now.
Starting with the uniform (we use the mean x₀ = 1/2 we just found):

    σ₀ = sqrt( ∫_{x=−∞}^{∞} (x − x₀)² ρ_uniform(x) dx ) = sqrt( ∫_{x=0}^{1} (x − 1/2)² dx )
       = sqrt( ∫_{y=−1/2}^{1/2} y² dy ) = sqrt( [y³/3]_{−1/2}^{1/2} ) = sqrt( 1/24 + 1/24 )
       = 1/√12 = 1/(2√3).

So the standard deviation is almost 1/4, but not quite; it is slightly
larger.
Now let us try the exponential. The mean, which we will need, was x₀ = τ.

    σ₀ = sqrt( ∫_{t=−∞}^{∞} (t − x₀)² ρ_exponential(t) dt )
       = sqrt( ∫_{t=0}^{∞} (t − τ)² (1/τ) exp(−t/τ) dt )
       = sqrt( τ² ∫_{z=0}^{∞} (z − 1)² exp(−z) dz ),

where I made the substitution z = t/τ. Now, let us shift the integration
domain to make the polynomial multiplying the exponential a bit nicer (in
principle defining a new integration variable, but I will just call it z
again):

    σ₀ = τ sqrt( exp(−1) ∫_{z=−1}^{∞} z² exp(−z) dz ).

We know the routine by now: define some function f(a) = ∫_{z=−1}^{∞} exp(−az) dz,
and then we have:

    ∫_{z=−1}^{∞} z² exp(−z) dz = d²f(a)/da² |_{a=1}.

The trick is that we know what f(a) is:

    f(a) = [−(1/a) exp(−az)]_{z=−1}^{∞} = (1/a) exp(a)

    ⇒ df(a)/da = exp(a)/a − exp(a)/a² = f(a) (1 − 1/a)

    ⇒ d²f(a)/da² = (df(a)/da) (1 − 1/a) + f(a)/a².

The first term vanishes at a = 1, and the second term reduces to simply
exp(1), so:

    σ₀ = τ sqrt( exp(−1) exp(1) ) = τ.

The standard deviation of the exponential is the same as its mean! Another
interpretation of the standard deviation (for some functions) is the
distance over which the function changes appreciably (sort of a
characteristic scale defining the function). So, for the exponential, the
mean sits right at the scale over which the function changes.
Let us look now at the Gaussian (where the mean x₀ is zero):

    σ₀ = sqrt( ∫_{v=−∞}^{∞} (v − x₀)² ρ_Gaussian(v) dv )
       = sqrt( ∫_{v=−∞}^{∞} v² (1/(√(2π) σ)) exp(−v²/(2σ²)) dv ).

A change of variable is in place, z = v/σ:

    σ₀ = sqrt( σ² ∫_{z=−∞}^{∞} z² (1/√(2π)) exp(−z²/2) dz ).

We can keep going (by defining our usual function f(a), etc.), or use the
hint from the book that this last integral is just one, so:

    σ₀ = σ.
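All three means and standard deviations can be spot-checked by Monte Carlo with the standard library's `random` module. A sketch of my own (the sample size, seed, and scale parameters are arbitrary choices):

```python
import math
import random

random.seed(1)
n = 200_000
tau, sigma = 2.0, 1.5  # arbitrary scale parameters for the check

samples = {
    "uniform":     [random.random() for _ in range(n)],
    "exponential": [random.expovariate(1 / tau) for _ in range(n)],
    "gaussian":    [random.gauss(0, sigma) for _ in range(n)],
}

for name, xs in samples.items():
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    print(name, round(mean, 3), round(std, 3))
# expect roughly: uniform (0.5, 0.289), exponential (tau, tau), gaussian (0, sigma)
```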
c)
We know that we are drawing two numbers from a uniform on [0, 1) and
adding them up. The resulting distribution of the sum z = x + y will be
large for some particular value z₁ if there are many ways z₁ can
occur. Indeed, if x + y can never equal −1 then ρ(z) should be zero there.
If, on the other hand, the sum can equal 1 it should be non-zero there. And
if the sum can attain the value 1 in more ways than it can attain the value
1.5, then ρ(z) had better be larger at 1 than at 1.5.
First, the sum can take all values in the range [0, 2). The lowest value is
obtained when x = y = 0, the largest when both x and y are very close to 1.
In this example there is something special about the value 1, because it
lies right in the middle of the range of values of the sum: it is the value
which can be obtained in more ways than any other. Indeed, consider how many
ways we can obtain 0.2: x can be zero and y = 0.2; then we can increase x
while decreasing y until x = 0.2 and y = 0. Here I am thinking of splitting
the axis into tiny little bins so that a count actually makes sense. So,
essentially, the count becomes "how many times we can decrement x from 0.2
until it reaches zero (while simultaneously incrementing y from 0 to 0.2)".
By this logic there should be more counts for x + y = 0.4, and indeed there
are, because there are more of these little bins between 0 and 0.4 than
between 0 and 0.2. Also, x + y = 0.6 has more counts than x + y = 0.4. This
keeps going until x + y = 1, which has the most counts.
But what about, e.g., x + y = 1.2? How do we obtain this value? Start with x
close to 1 and y at 0.2. Then decrement x all the way down to 0.2 (so we
have decremented over 0.8 units, which is less than for x + y = 1, where we
can decrement over 1.0 units) while y is incremented all the way to 1. So
you see that this sum actually has the same count as x + y = 0.8 (in which
case x is also changed over a range of 0.8 units/bins). More generally,
then, the count for the sum t units above 1 is the same as for the sum t
units below 1, so the graph is symmetric about x + y = 1.
If we draw this in a graph with x + y on the x-axis and the distribution of
the sum on the y-axis, we obtain a triangle with base [0, 2] on the x-axis
and height 1. If we cut the triangle at its apex we can overlay the left
side with the right side (imagine folding one side over), so it is
symmetric about the height.
So, more generally, the question we need to answer to get the distribution
of z = x + y, called ρ(z), where x and y follow distributions ρ₁(x) and
ρ₂(y), respectively, is this: to get the value ρ(z₁) we need to find out in
how many ways the value z₁ can be obtained from the two distributions of
the parts of the sum. Probability theory tells us how to get this. Let
z₁ = x₁ + y₁; then we see that, if we pick some value x₁ from ρ₁(·), we can
only use a value drawn from ρ₂(·) if it results in z₁, in other words the
value z₁ − x₁, with density ρ₂(z₁ − x₁). Summing over all possible x, this
becomes the convolution:

    ρ(z) = ∫ dx ρ₁(x) ρ₂(z − x).
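The triangle described above is easy to see numerically. Here is a short Monte Carlo sketch of my own (bin width and sample size are arbitrary choices):

```python
import random

random.seed(0)
n = 100_000
width = 0.1  # histogram bin width over [0, 2)

counts = [0] * 20
for _ in range(n):
    z = random.random() + random.random()
    counts[int(z / width)] += 1

# estimated density in each bin; the exact answer is the triangle
# rho(z) = z for 0 <= z < 1 and rho(z) = 2 - z for 1 <= z < 2
density = [c / (n * width) for c in counts]
for i in (1, 5, 9, 10, 14, 18):
    print(round(i * width + width / 2, 2), round(density[i], 2))
```

The printed density rises roughly linearly toward 1 near z = 1 and falls symmetrically back toward 0 at z = 2.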
d)
First, we see from the velocity distribution that it factors along each
dimension, and thus we don't have to solve this problem in 3D. We can
simply solve it along, say, x and that is the answer. Kinetic energy is
KE_x = M v_x²/2, so the mean of this quantity is:

    ⟨KE_x⟩ = ∫ dv_x (M v_x²/2) sqrt(M/(2πkT)) exp(−M v_x²/(2kT)).
So the mean kinetic energy is related to the width of the Gaussian
distribution of velocities. With a variable change the answer becomes
obvious. Define z = v_x sqrt(M/(kT)); then dv_x = sqrt(kT/M) dz and

    ⟨KE_x⟩ = (kT/2) ∫ dz z² (1/√(2π)) exp(−z²/2) = kT/2,

where we used the hint about the integral being one. Thus, the mean kinetic
energy along a particular dimension is indeed kT/2. The answer is the same
for each of the other dimensions.
The probability that the speed takes a particular value v = |v| is found
as follows. Velocity vectors of fixed length form a sphere in velocity space
(no dimension is special over any other, so the space had better be
isotropic). The volume between v and v + dv is then the surface area of the
sphere with radius v, which is 4πv², times dv. The total probability is this
volume times the probability density throughout the volume, which is the
Gaussian distribution, so (let d(Vol) be the velocity-space volume just
discussed, and write σ² = kT/M):

    p(v) = d(Vol) ρ(v_x, v_y, v_z) = 4πv² dv (1/(2π(kT/M))^{3/2}) exp(−Mv²/(2kT))
         = (4π/(2π)^{3/2}) (v²/σ³) exp(−v²/(2σ²)) dv
         = sqrt(2/π) (v²/σ³) exp(−v²/(2σ²)) dv.

To get the density from this total probability, simply leave out the
differential (that is, the volume element dv, which is indeed
one-dimensional when considering the distribution of the scalar v, the
length of the velocity vector):

    ρ(v) = sqrt(2/π) (v²/σ³) exp(−v²/(2σ²)).
All we have done is to find the distribution of the length of the velocity
vector (which is one-dimensional: the length is just a scalar) starting from
the distribution of its individual components, which was Gaussian (and
three-dimensional: one component for each dimension). The resulting
distribution is a Maxwellian distribution.
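Both results, equipartition along one axis and the Maxwellian speed distribution, can be sanity-checked by sampling Gaussian velocity components. A sketch of my own, in units where kT = M = 1; the comparison value sqrt(8/π) σ for the mean Maxwell speed is a standard result I bring in for the check:

```python
import math
import random

random.seed(2)
kT = 1.0
M = 1.0                    # work in units where kT = M = 1
sigma = math.sqrt(kT / M)  # width of each velocity component
n = 200_000

# mean kinetic energy along one dimension: should be kT/2
vx = [random.gauss(0, sigma) for _ in range(n)]
ke_x = sum(0.5 * M * v * v for v in vx) / n
print(ke_x)  # ~ 0.5

# mean speed of the 3D Maxwell distribution is sqrt(8/pi)*sigma
speeds = [math.sqrt(random.gauss(0, sigma) ** 2 +
                    random.gauss(0, sigma) ** 2 +
                    random.gauss(0, sigma) ** 2) for _ in range(50_000)]
print(sum(speeds) / len(speeds))  # ~ 1.6
```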
Problem 1.3 (Waiting times):
a)
The average number of passing cars in 1 hour is found as (the times t and τ
are measured in minutes):

    average # of cars in 1 hour = ∫_{t=0}^{60} dt/τ = 60/5 = 12.
b)
I will assume that the 10 minute interval is given as [0, 10). In other
words, the starting point t = 0 is included, but not the end point at
exactly 10 minutes. If both endpoints were included I could get a case with
n = 3 busses included. If not, then I always see 2 busses no matter what
interval I choose. Thus

    P_bus(n) = { 1,  n = 2
               { 0,  otherwise.
For the cars we can, in principle, observe 10, 100, 1000, or even a million
cars in the time interval (of course, given that τ = 5 minutes, observing
1000 cars in 10 minutes should be unlikely, although in principle possible).
So we have to approach this differently. What we do is imagine setting up
some robot by the roadside with a little detector to detect cars passing by.
The detector can be opened for a small period of time and then it closes
again. Think of it as a shutter, like on a camera. Say it is open for
1 minute, detecting everything it can, and then it closes again. This means
that in 10 minutes it is open 10 times. But each time it is open it can
detect either 1 car or no car, not multiple cars. So a 1 minute open time
seems like a bad idea. Why? Because the probability of a car arriving in a
1 minute interval is 1 in 5, or 20 %, so multiple cars will arrive in the
same interval more often than we want. This would give very poor resolution.
So let us improve the robot. How about we decrease the shutter time to
1 second? Now we open the little detector for 1 second; if a car passes in
this interval we record it, but if multiple cars arrive we can still only
detect one of them. We can do better yet: let the detector time approach
zero, so that we open-close-open-close many, many times per second. Now, the
road is such that cars can't move next to each other; it is a 1-dimensional
road. So you see that the smallest shutter time we need is related to the
largest velocity of the cars: if the cars move extremely fast there is a
chance that two cars pass within a 1-second interval, and so on. However, we
don't know the largest velocity.
Let us define some rate λ of cars. It is given by how many times we open the
shutter times the probability that a car arrives each time the shutter is
open. Say that in the 10 minutes we open the shutter N times; then
λ = N dt/τ. In other words, this is our expected number of cars in the
interval. Notice that N and dt are related: by using this formula for λ we
are saying that there are N intervals of length dt, so we cannot make dt
smaller without also changing N. It is a little dangerous mixing
infinitesimal times (dt) with finite counts (N shutter intervals), but the
arguments would still apply if we called the shutter time ∆t and took the
limit at the end. We can write dt/τ = λ/N.
Now, how do we write down the probability of observing n cars in the given
time interval? Well, starting with finite shutter intervals the probability
is binomial: out of N trials (shutter open-close times), what is the
probability of getting n successes (arrivals of cars)? The answer is (look
up the binomial distribution if in doubt):

    P_car(n) = (N choose n) (dt/τ)^n (1 − dt/τ)^{N−n}.
Why the N choose n factor in front? Well, it is because there are many
different ways (exactly N choose n) of obtaining n successes in N trials. In
particular, we don't care whether a car arrived during shutter time 2 as
opposed to another car arriving at shutter time 8. That ordering would leave
two ways of getting 2 cars: a red car arrives at time 2 and a blue car at
time 8, OR the blue car arrives at time 2 and the red car at time 8. We
don't care about the car color (we don't label the cars), and so we can get
the same result (arrival of n cars) in many different ways (N choose n, to
be exact). Let us write this in terms of λ:
    P_car(n) = (N choose n) (λ/N)^n (1 − λ/N)^{N−n}
             = (λ^n/n!) (N!/((N − n)! N^n)) exp(−λ) exp(nλ/N),

where we used that, for small enough x, we can let 1 − x ≈ exp(−x). First,
what happens to the last exponential factor, exp(nλ/N), as N approaches
infinity (that is, as the shutter opens and closes over infinitely small
time intervals) while n remains fixed? It becomes one. Also, let us look at
the following factor from the product above and try to reduce it:
    N!/((N − n)! N^n) = [N(N − 1)(N − 2) ··· 1 / ((N − n)(N − n − 1) ··· 1)] (1/N^n)
                      = N(N − 1)(N − 2) ··· (N − n + 1) / N^n.

How many factors do we have in the numerator? We have n: (N − 0) is the
first, then (N − 1) is the second, and this keeps going until N − (n − 1),
which is the nth (NOT the (n − 1)th, because we start counting at zero). But
we also have n factors of N in the denominator. So we can write:

    (N/N) · ((N − 1)/N) · ((N − 2)/N) ··· ((N − n + 1)/N) → 1.
You see how each factor approaches 1 as N goes to infinity. Thus, in the
limit N → ∞, the binomial becomes:

    P_car(n) = (λ^n/n!) exp(−λ),

which is not called "a binomial in the limit of very large N", but a Poisson
distribution. We recall that λ is the expected number of cars in the
interval.
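The convergence of the binomial to the Poisson as the shutter time shrinks can be watched numerically. A sketch of my own (the specific λ and n are arbitrary):

```python
from math import comb, exp, factorial

lam = 2.0   # expected number of cars in the interval
n_cars = 3  # probability of observing exactly this many

poisson = lam ** n_cars / factorial(n_cars) * exp(-lam)

def binomial(n_shutter):
    # N shutter openings, each with arrival probability lam/N
    p = lam / n_shutter
    return comb(n_shutter, n_cars) * p ** n_cars * (1 - p) ** (n_shutter - n_cars)

for n_shutter in (10, 100, 10_000):
    print(n_shutter, abs(binomial(n_shutter) - poisson))
# the difference shrinks as the shutter time dt = interval/N goes to zero
```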
c)
We know that, if we have just observed a bus, it will be exactly 5 minutes
until the next bus. Therefore, the time interval ∆ between busses is fixed
at one number. To write a genuine function, defined for all values of ∆, we
need the Dirac delta function to single out the one number we are talking
about. In particular:

    ρ_bus(∆) = δ(∆ − 5),

where we work in units of minutes (that is, ∆ is the time between busses in
minutes). What is the mean of this distribution? Let us find out:

    ⟨∆⟩ = ∫_{∆=0}^{∞} d∆ ∆ ρ_bus(∆) = ∫_{∆=0}^{∞} d∆ ∆ δ(∆ − 5) = 5 minutes.
So the mean gap between busses is 5 minutes. Note that this is a gap
average: the mean separation between successive busses, not the mean time a
randomly arriving observer has to wait (we return to that in part (d)).
Now, for the cars we first think of splitting the time interval ∆ into many
tiny slivers, each of size dt. Let us say we split it into N slivers; then
∆ = N dt. Since the slivers are of fixed size, a larger (smaller) ∆ means a
larger (smaller) value of N. Now, in order to answer the question of the
probability for another car to arrive given that we have just observed one,
we need this to happen: in the first N slivers no car arrives, but in the
(N + 1)th sliver a car arrives. Thus, the total probability of observing the
next car after time ∆, given that we have just observed one, is (using
1 − dt/τ ≈ exp(−dt/τ)):

    P_car(∆) = (exp(−dt/τ))^N (dt/τ) = exp(−∆/τ) dt/τ.

So the density (leave out the volume dt) becomes:

    ρ_car(∆) = (1/τ) exp(−∆/τ),
which is an exponential distribution. Is it normalized? Yes it is (we
already checked this back in Problem 1.2):

    ∫_{∆=0}^{∞} d∆ ρ_car(∆) = [−exp(−∆/τ)]_{∆=0}^{∞} = 1.

What is its mean? Well, it is an exponential distribution, so we did this in
an earlier problem (Problem 1.2) and found

    ⟨∆⟩_car = ∫_{∆=0}^{∞} ∆ ρ_car(∆) d∆ = τ = 5 minutes.
d)
So now another observer arrives at some random time. What is the probability
distribution of her waiting time to the next bus? Well, let us introduce the
exact time t at which she arrives. Since the busses come at regular
intervals we can just focus on the interval between two busses (periodicity
means that any other interval is the same). This means that we have a
conditional probability: given that the observer arrives at time t, what is
her waiting time? Well, if she arrives at t = 0 then there are 5 minutes
until the next bus, but if she arrives at t = 2.5 then there are 2.5
minutes, and so on. So we know the exact waiting time given when she
arrived. The entire interval is of length τ, so if she arrives at t there
are τ − t minutes until the next bus. Therefore, we have the conditional
distribution:

    P_bus(∆|t) = δ(∆ − (τ − t)).

Remember that we are working within the time zero to τ (that is,
t ∈ [0, τ]). There is no loss of generality in doing this because of the
periodicity (busses arrive exactly every 5 minutes). Now, we are interested
in the joint distribution P_bus(∆, t). We can get this from the conditional
by multiplying by the probability of t, P_bus(t), which is just a uniform:
P_bus(t) = 1/τ. So we have:

    P_bus(∆, t) = P_bus(∆|t) P_bus(t) = δ(∆ − (τ − t)) (1/τ).
Now we can get P_bus(∆) by integrating out t:

    P_bus(∆) = ∫_{t=0}^{τ} dt P_bus(∆, t) = (1/τ) ∫_{t=0}^{τ} dt δ(∆ − (τ − t)).
This is the probability distribution of waiting times for the next bus. What
is the mean waiting time?

    ⟨∆⟩_bus = ∫_{∆=0}^{∞} d∆ ∆ ∫_{t=0}^{τ} dt (1/τ) δ(∆ − (τ − t)).

How are we going to do this integral? Well, it is not so bad. Look: what
happens when ∆ becomes larger than τ? Then the inner integral is always
zero, because there is no t for which the delta function is non-zero. Thus,
we can restrict the ∆ integration to the range [0, τ]:

    ⟨∆⟩_bus = ∫_{∆=0}^{τ} d∆ ∆ ∫_{t=0}^{τ} dt (1/τ) δ(∆ − (τ − t)).

In this range there is always some value of t satisfying the delta function;
for every choice of ∆ there is exactly one matching t, so the t integration
just gives one:

    ⟨∆⟩_bus = (1/τ) ∫_{∆=0}^{τ} d∆ ∆ = (1/τ) (τ²/2) = τ/2 = 2.5 minutes.
So, on average, you only have to wait half the interval between busses. That
is, if you arrive at random every day, then you wait only 2.5 minutes on
average for the next bus. This seems reasonable: due to symmetry the average
should sit at the center of the interval, since we arrive at a uniformly
random point within it. (If, for some reason, perhaps by looking at the
schedule, we tended to arrive closer to the next bus, then of course the
average waiting time would be less; mathematically, that is because the
uniform assumption P_bus(t) = 1/τ would be changed.)
What about the cars? Well, remember how we derived the probability
distribution of gaps between arrivals of cars? We said: there should be no
cars in N slivers and then exactly one car in the next sliver (number
N + 1). That argument still applies here. We arrive at the road at some
random time. What is the probability distribution of the time to the next
car? We go through the exact same arguments as before and thus get, again,
the exponential distribution:

    ρ_car(∆) = (1/τ) exp(−∆/τ).

The mean waiting time to the next car is thus τ = 5 minutes (we already did
this calculation before).
So, as the problem text next explains, something seems strange: the mean we
computed in part (c) gave 5 minutes between cars. But now, in part (d), it
seems there are 5 minutes until the next car, and also 5 minutes back to the
last car (just run the same arguments backwards in time). So the gap in
part (d) between cars looks more like 10 minutes than 5 minutes? Well, in
part (c) we didn't care about when we arrived at the road; we just asked:
what is the probability of obtaining a given gap size? This is called a gap
average. In part (d) the time of arrival at the road changes the mean. Why?
Because it is more likely to arrive in a larger gap than in a smaller one.
(Say 3 cars are arriving; the gap between the first two is 1 second, the gap
between the second and third is 2 hours. Isn't it more likely for you to
arrive in the larger gap? You could be lucky and arrive right before the
1 second gap, but compared to the 2 hours, not very likely!) So the point is
that the gap will seem larger in the "time average" because we are more
likely to arrive in the larger gaps.
Let us now find the probability that the second observer, arriving at some
random time, will be in a gap of length ∆. This probability should be proportional to ∆: the larger the gap the larger the probability of arriving in
it. If a gap takes 10 years then we are very likely to arrive in it compared to
another one taking 1 minute. But obtaining a 10 year gap in the first place is
exponentially unlikely. So we sense the battle between two opposing forces:
the probability of arriving in gap ∆ increases with the gap size, but, at the
same time, it becomes less and less likely to get a gap of a large size (in other
words, we know that the cars arrive, on the average, every 5 minutes, so a
gap of 10 years should never be observed in practice. Of course, if τ was
on the order of 10 years it would be likely to get this gap size, but that is
a different problem). So the probability is the product of the probability of
arriving in ∆ and that of having ∆ be a possibility in the first place. In other
words, the probability is found by answering: what is the probability that
the interval ∆ exists AND that we arrive in this interval (the “and" hints
that this is the product rule of probability theory, since they are independent
events). The answer is thus:
ρ^time_car(∆) ∝ (∆/τ) exp(−∆/τ).
It must be normalized (call the normalization constant A):

1 = ∫_{∆=0}^{∆=∞} d∆ ρ^time_car(∆) = A ∫_{∆=0}^{∆=∞} d∆ (∆/τ) exp(−∆/τ) = Aτ.
So we see that A = 1/τ, and thus the distribution is:

ρ^time_car(∆) = (∆/τ²) exp(−∆/τ).
What is now the mean gap size?

∫_{∆=0}^{∆=∞} d∆ ∆ ρ^time_car(∆) = ∫_{∆=0}^{∆=∞} d∆ (∆²/τ²) exp(−∆/τ) = τ ∫_{z=0}^{z=∞} dz z² exp(−z) = 2τ = 10 minutes.
So, indeed, when arriving at some random time the time average of the gaps
between cars is 10 minutes, not 5. This is consistent with what we found
earlier, namely that: the mean to the next car is 5 minutes, and that to the
previous car is also 5 minutes, so a total of 10 minutes should be the average
gap (when arriving at any random time).
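The two averages can be checked with a quick Monte Carlo (a sketch of my own, assuming only that the gaps are exponential with τ = 5 minutes): draw many exponential gaps, then compare the plain gap average to the average length of the gap a uniformly random arrival time lands in.

```python
import random

random.seed(0)
tau = 5.0  # mean gap between cars, in minutes

# Lay out many exponential gaps along a time line.
gaps = [random.expovariate(1.0 / tau) for _ in range(200000)]
total_time = sum(gaps)

# Gap average (part (c)): the plain mean of the gap sizes.
gap_average = total_time / len(gaps)

# Time average (part (d)): a uniformly random arrival lands in a gap
# with probability proportional to its length, so weight each gap by
# its own length before averaging.
time_average = sum(g * g for g in gaps) / total_time

print(gap_average)   # close to tau = 5
print(time_average)  # close to 2*tau = 10
```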
Problem 1.4 (Stirling’s approximation):
This is how to convert a sum into an integral: ∫ dk ↔ Σ_k δk, where δk is
the distance on the x-axis between measurements of the function. In our
case, we are summing over the integers. So the distance from one integer to
the next (say, from 3 to 4) is just 1. And thus δk = 1. But, the interval is
centered on the integers. So, e.g., the first interval is centered at 1 (the first
index in the sum) and thus goes from 0.5 to 1.5. The next interval, centered
at 2, goes from 1.5 to 2.5, and so on. The final interval is centered on n and
goes from n − 0.5 to n + 0.5. Thus, the integral extends from 0.5 to n + 0.5
(and not 1 to n). Therefore, we have:
Σ_{k=1}^{n} log(k) ↔ ∫_{k=1/2}^{k=n+1/2} dk log(k)
= (n + 1/2) log(n + 1/2) − (n + 1/2) − [(1/2) log(1/2) − 1/2]
= (n + 1/2) log(n + 1/2) − n − (1/2) log(1/2),    (1)
which is what we had to show first.
Now let us compare this to Stirling's formula n! ≈ (n/e)^n √(2πn), which in log form reads:

log(n!) ≈ n log(n) − n log(e) + (1/2) log(2π) + (1/2) log(n),    (2)

and show that the difference between the two approaches a constant.
First, let us re-write Eq. (1) as:

n log(n) + n log(1 + 1/(2n)) + (1/2) log(n) + (1/2) log(1 + 1/(2n)) − n − (1/2) log(1/2)
→ n log(n) + 1/2 + (1/2) log(n) + 1/(4n) − n − (1/2) log(1/2),
where, in the second line, we used that log(1 + 1/2n) ∼ 1/2n (up to terms
of order 1/n2 ). Then, we compare each term above to the terms in Eq. (2)
(remember we are taking the difference between Eq. (1) and Eq. (2)).
First, see that the terms n log(n) cancel. Next, since log(e) = 1 we see
that the terms with n cancel as well. Finally, the terms 1/2 log(n) cancel
as well and, of course, the 1/2n goes to zero. This leaves the following
expression for the difference “Eq. (1)-Eq. (2)":
Eq. (1) − Eq. (2) → 1/2 + (1/2) log(2) − (1/2) log(2π) = (1/2)(1 − log(π)),
which is indeed a constant. So, as n approaches infinity the difference between the two expressions approaches a constant value. Thus, they are
compatible. The constant does not have to be zero as they are two different
approximations, but they better not diverge from each other or else one must
be wrong.
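We can also check this constant numerically (a quick sketch of my own, using the two log-factorial approximations exactly as written above):

```python
import math

def eq1(n):
    # The midpoint-integral approximation, Eq. (1).
    return (n + 0.5) * math.log(n + 0.5) - n - 0.5 * math.log(0.5)

def eq2(n):
    # Stirling's formula in log form, Eq. (2), using log(e) = 1.
    return n * math.log(n) - n + 0.5 * math.log(2 * math.pi) + 0.5 * math.log(n)

limit = 0.5 * (1 - math.log(math.pi))  # the predicted constant difference
diff = eq1(10**6) - eq2(10**6)
print(diff, limit)  # the two values agree to several digits
```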
Now let us show that the expression (2π/(n + 1))^{1/2} exp(−(n + 1)) (n + 1)^{n+1} is equivalent to the latter expression in the book, that is, Eq. (2) here. Let us do this simply by writing out this new expression in log form:

(1/2) log(2π) − (1/2) log(n + 1) − (n + 1) + (n + 1) log(n + 1)
→ (1/2) log(2π) − (1/2) log(n) − (n + 1) + n log(n) + 1 + log(n)
= (1/2) log(2π) − n + n log(n) + (1/2) log(n),    (3)

where, in the second line, we used log(n + 1) = log(n) + log(1 + 1/n) ≈ log(n) + 1/n for large n.
Again, if using log(e) = 1 in Eq. (2) we can match Eq. (2) and Eq. (3) term
by term. Thus, upon taking their difference it becomes zero. This means
that they are compatible as well.
Problem 1.5 (Stirling and asymptotic series):
a)
Let us be given Γ(z) as described in the problem text. What is special about
the negative integers compared to the positive ones? Well, first, let us say
we want to compute Γ(2) (that is, z = 1), then we do:
Γ(2) = 1 × Γ(1) = 1.
It terminates at Γ(1) because that is one (it is the normalization). So, if we
pick any positive number eventually we reduce the situation to computing
Γ(1). Now, let us try z = −2, e.g.:
Γ(−1) = −2 × Γ(−2) = −2 × (−3 × Γ(−3)) = · · · ,
so we see that, for any negative integer m the value of the factorial function
becomes a product over all negative integers from m to negative infinity.
There is no stopping point anymore like Γ(1) before. This leaves a singularity at all negative integers.
b)
The singularities were at all negative integers when expressed in z. Now,
with the change of variable ζ = 1/z the poles are at ζ = −1, −1/2, −1/3, and so
on. The series, expressed in ζ, is an expansion about the origin ζ = 0 in
the complex plane (because that corresponds to z → ∞). Thus, for larger
and larger negative integers we approach zero more and more (e.g., the pole
−1/1000 is close to zero). Therefore, the radius of convergence, measured
from the origin, is zero.
c)
Let us now show this explicitly. It turns out that the odd coefficients grow asymptotically as:

A_{2j+1} ∼ (−1)^j 2(2j)! / (2π)^{2(j+1)}.
I will use the equal sign from now on, but with the understanding that it holds asymptotically. As is mentioned in the footnote, the radius of convergence is

√| A_{2j−1} / A_{2j+1} |.

To get A_{2j−1} we can just let k = j − 1; then

A_{2k+1} = A_{2j−1} = (−1)^{j−1} 2(2k)! / (2π)^{2(k+1)} = (−1)^{j−1} 2(2j − 2)! / (2π)^{2j}.
The radius of convergence is then:

√| A_{2j−1} / A_{2j+1} | = √[ (2(2j − 2)!/(2π)^{2j}) / (2(2j)!/(2π)^{2j+2}) ] = √[ (2π)² (2j − 2)!/(2j)! ] = 2π / √(2j(2j − 1)) ∼ π/j → 0 as j → ∞.
So we see that the radius of convergence does indeed go to zero as we include
more and more terms for some fixed z.
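This shrinking radius is easy to confirm numerically (a small sketch of my own; I work with log|A| via lgamma to avoid overflowing the factorials):

```python
import math

def log_abs_A(j):
    # log|A_{2j+1}| for A_{2j+1} ~ (-1)^j 2 (2j)! / (2 pi)^(2(j+1))
    return math.log(2) + math.lgamma(2 * j + 1) - 2 * (j + 1) * math.log(2 * math.pi)

for j in (5, 50, 500):
    radius = math.exp(0.5 * (log_abs_A(j - 1) - log_abs_A(j)))
    print(j, radius, math.pi / j)  # the radius tracks pi/j -> 0
```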
d)
We can compute 0! by computing Γ(1) since Γ(1) = (1 − 1)! = 0!. Thus, we
are considering z = 1 in the series expansion. This means that 1.3 from the
problem text becomes:
√(2π) exp(−1) [ 1 + 1/12 + 1/288 − 139/51840 − 571/2488320 + ··· ].
We can include more and more terms. Starting at just including "1" (which
I call A1 ), then including "1" and "1/12" (called A2 ) and so on, we get:
A1 → 0.92214
A2 → 0.99898
A3 → 1.00218
A4 → 0.99971
A5 → 0.99950.
Considering that we expand about infinity (z → ∞) the series is doing really
well at z = 1 (0! is one so 0.9995 is not so bad for only 5 terms).
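These partial sums are easy to reproduce (a sketch of my own, using the series coefficients quoted above for z = 1):

```python
import math

# Coefficients of the asymptotic series for Gamma(1) = 0! at z = 1.
terms = [1.0, 1.0 / 12, 1.0 / 288, -139.0 / 51840, -571.0 / 2488320]
prefactor = math.sqrt(2 * math.pi) * math.exp(-1)

partial = 0.0
for k, t in enumerate(terms, start=1):
    partial += t
    print('A%d -> %.5f' % (k, prefactor * partial))
```

The printed values match the list above, ending with A5 -> 0.99950.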
Problem 1.6 (Random matrix theory):
a)
Okay, let us generate some matrices. I am going to write programs in Python and make them available along with this document. From running this program I do indeed find a repulsion at λ = 0 (λ is the eigenvalue splitting). In other words, it seems like the difference between eigenvalues of these random(!) matrices essentially never vanishes. This is pretty surprising since there is no correlation between the matrix entries at all.
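For reference, the heart of such a program can be sketched as follows (a minimal stand-in for the script mentioned above; the names and parameters are my own):

```python
import numpy as np

# Sample N x N GOE matrices (symmetrized Gaussian matrices) and record
# the splitting between the two lowest eigenvalues.
rng = np.random.default_rng(0)
N, trials = 2, 20000

splittings = []
for _ in range(trials):
    M = rng.standard_normal((N, N))
    H = M + M.T  # symmetrize to get a GOE matrix
    eigs = np.sort(np.linalg.eigvalsh(H))
    splittings.append(eigs[1] - eigs[0])

splittings = np.array(splittings)
# Repulsion at zero: tiny splittings are rare compared to typical ones.
print(np.mean(splittings < 0.1), np.mean(splittings))
```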
b)
In the following we will be working in the N = 2 GOE ensemble (matrices
of size 2 × 2). We will not do anything analytically yet, this comes later. For
now, let us just see where this level repulsion comes from. Now, a GOE matrix is formed from Gaussian random variables. Indeed, the process is (Xi ’s
are Gaussian random variables with zero mean and unit standard deviation):
M = [ X1  X2
      X3  X4 ]

Then, we form the GOE matrix by adding M to its transpose:

M_GOE = M + M^T = [ 2X1       X2 + X3
                    X3 + X2   2X4     ] = [ a  b
                                            b  c ]
The eigenvalues γ of the GOE matrix are:
| a − γ    b     |
| b        c − γ |  = (a − γ)(c − γ) − b² = 0

⇒ γ² − (a + c)γ + (ac − b²) = 0.

The equation has two solutions: the two eigenvalues. We will denote one eigenvalue + and the other −. It is a simple quadratic equation so we get:

γ± = [(a + c) ± √((a + c)² − 4 × 1 × (ac − b²))] / (2 × 1)
   = (a + c)/2 ± (1/2)√(a² + c² + 2ac − 4ac + 4b²)
   = (a + c)/2 ± (1/2)√((c − a)² + 4b²).
If we define d = (c − a)/2 we get:
γ± = (a + c)/2 ± √(d² + b²).
Now, take the difference between the eigenvalues to form the splitting. The first term is the same for both eigenvalues so it cancels:

λ ≡ γ+ − γ− = 2√(d² + b²).

Thus, we see that the trace (the sum of the diagonal elements) is irrelevant. What matters is the off-diagonal element b and the difference d between the diagonal elements.
Eventually what we seek in this exercise is a probability density over λ. But first, we must understand how λ is formed. We see from the result above that it depends on b and d. Since b is formed by adding two Gaussian random variables it is still a Gaussian random variable with mean zero. The standard deviation changes, but call it σ_b for now. The same goes for d. It is formed as the difference between two Gaussian random variables, which is still a Gaussian random variable with mean zero but changed standard deviation. Dividing it by two does not change the mean from zero but further alters the standard deviation. In any event we shall call this σ_d. This all helps us to get the form of the probability density of generating both b and d:

ρ_M(d, b) ∝ exp[ −(1/2)(b²/σ_b² + d²/σ_d²) ]
We want a distribution over λ. But we found earlier that (λ/2)2 = b2 + d2 ,
where the left side tells us that the radius is λ/2. Thus, the eigenvalue splittings form circles in the (b, d) plane (that is, a particular eigenvalue splitting
λ lives on the circumference of a circle in the (b, d) plane). This should motivate us to switch to polar coordinates where the radius alone will then tell
us about the splitting (the angle will be irrelevant due to the isotropy). The
radius r is λ/2. So we ask: the pair (d, b) gives us a particular eigenvalue
splitting λ, but how does this splitting change as we change b and d? In
particular we will be interested in knowing what happens as both b and d
approach zero. So think of the Cartesian (b, d) plane. As we move around in
this plane the eigenvalue splitting changes. Now choose a particular point in
this plane. We ask: how does moving a tiny bit along b and a tiny bit along
d (thus forming a tiny little area) alter the eigenvalue splitting (or, how does
this tiny little region transform to changes in eigenvalue splittings)? The
answer will allow us to let (b, d) go to zero and we can see what happens.
The probability of obtaining a little area in the (b, d) plane is ρ_M(d, b) db dd. How does that translate into the probability ρ(λ)dλ of level splittings between λ and λ + dλ? (We don't care about the angular part in the final result, but there is also a dθ term, with θ the angle from the b-axis to the radius vector.) The answer is in the Jacobian, which is simply the radius (going from Cartesian to polar coordinates is standard; this Jacobian is well known), and thus:

ρ(λ)dλ ∝ λ exp[ −(1/2)(b²/σ_b² + d²/σ_d²) ].
This indeed shows us that the probability of the eigenvalue splitting vanishes as λ → 0 (the Gaussian stays finite for all λ, but the linear factor of λ multiplying it vanishes).
c)
Let us compute analytically the standard deviations of the diagonal and off-diagonal elements. In particular, I ask: what is the probability of generating random Gaussian numbers so as to obtain a symmetric matrix with diagonal elements a and c and off-diagonal element b? Again, to create a GOE matrix we start with a matrix with a Gaussian random number as each element:

M = [ X1  X2
      X3  X4 ]

Then, we form the GOE matrix by adding M to its transpose:

M_GOE = M + M^T = [ 2X1       X2 + X3
                    X3 + X2   2X4     ] = [ a  b
                                            b  c ]
First, the Xi ’s followed a Gaussian with mean zero and standard deviation
of one. Then, let us start with b. It was formed as b = X2 + X3 . Adding
two Gaussians each of mean zero leaves a Gaussian still of mean zero but
the standard deviation changes as:
σ_b = √(σ²_X2 + σ²_X3) = √2.
2
Now, a = 2X1 and hence it is a mean zero Gaussian but its standard deviation changes due to the multiplication by 2 as follows:
σa = 2σX1 = 2.
Same thing for c:
σc = 2σX4 = 2.
Let us now show that σb = σd . d was formed as d = (c − a)/2. Let us
consider what happens just by doing the transformation k = c − a. Forming
the difference of a and c leaves a Gaussian of mean zero but the standard
deviation changes:
σ_k = √(σ_c² + σ_a²) = √(4 + 4) = 2√2.
Now, finally, we want to form d from k by dividing by 2. The mean stays at
zero, but the standard deviation halves:
σ_d = (1/2) σ_k = (1/2) · 2√2 = √2.
And thus we see that

σ_b = σ_d = √2.
QED.
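A quick sampling check of this result (a sketch of my own, using the definitions b = X2 + X3 and d = (c − a)/2 = (2X4 − 2X1)/2 from above):

```python
import random
import math

random.seed(1)
n = 200000
bs, ds = [], []
for _ in range(n):
    X1, X2, X3, X4 = (random.gauss(0, 1) for _ in range(4))
    bs.append(X2 + X3)            # off-diagonal element b
    ds.append((2 * X4 - 2 * X1) / 2)  # d = (c - a)/2

def std(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

print(std(bs), std(ds), math.sqrt(2))  # all three close to 1.414
```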
Let us plot H11 and H12 from the N = 2 GOE ensemble. We know that H11 should follow a Gaussian of standard deviation 2 and H12 a Gaussian of standard deviation √2.
d)
We will start off with the distribution we found in part b). We use σ_b = σ_d = √2 and that λ²/4 = b² + d²:

ρ(λ)dλ ∝ λ exp[ −(1/4)(b² + d²) ] = λ exp(−λ²/16).
We can normalize the probability density and get:

ρ(λ) = (λ/8) exp(−λ²/16).
This is the probability distribution of eigenvalue spacings λ. Now let us rescale this to have unit mean. First, let us compute the mean in its current form:

⟨λ⟩ = ∫_{λ=0}^{λ=∞} dλ λ ρ(λ) = ∫_{λ=0}^{λ=∞} dλ λ (λ/8) exp(−λ²/16) = 2√π.
So, let us work on this equation with the goal of defining a new variable s = λ/(2√π). Substituting λ = 2√π s and dλ = 2√π ds, and dividing both sides by 2√π, gives

⇒ ∫_{s=0}^{s=∞} ds s (πs/2) exp(−πs²/4) = 1.
So we have scaled the variable from λ to s to make sure the mean is one.
The distribution in eigenvalue spacings which has a mean of 1 is the Wigner surmise, given by:

ρ_Wigner(s) = (πs/2) exp(−πs²/4).
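As a sanity check (a small numerical sketch of my own, using simple midpoint integration), the surmise is normalized and has unit mean:

```python
import math

def rho_wigner(s):
    # The Wigner surmise derived above.
    return (math.pi * s / 2) * math.exp(-math.pi * s**2 / 4)

ds, smax = 1e-4, 20.0
grid = [(k + 0.5) * ds for k in range(int(smax / ds))]
norm = sum(rho_wigner(s) * ds for s in grid)       # should be 1
mean = sum(s * rho_wigner(s) * ds for s in grid)   # should be 1
print(norm, mean)
```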
e)
I implemented the plots in the Python program. It does indeed seem like the distribution of eigenvalue splittings does not depend on the matrix size N of the GOE ensemble (the plots look similar no matter the size of the matrices making up the ensemble).
f)
For N = 2 the distribution seems different than what we have seen before.
For N = 4 it starts looking like the GOE distribution (so yes, it does start
looking more universal). We do indeed notice a spike at zero as the footnote
says. This is because of equal columns (which is highly likely at low N but
vanishes rapidly with larger N ).
For N = 10 it resembles the GOE distribution a lot. Plotting the Wigner
surmise is seen to fit the histogram well.
So this does provide more evidence that the eigenvalue distribution is universal!
g)
First, let us show that the trace of H H^T is the sum of the squares of all elements of H.
I will do this using index notation. A general matrix A in index notation is
written A_ij. The product of two matrices is (AB)_ik = A_ij B_jk (sum over the repeated inner index). The trace of a product AB is A_ij B_ji (notice the outer indices are matched as well). The transpose is (A^T)_ij = A_ji. With these rules we have:

Tr(H H^T) = H_ij (H^T)_ji = H_ij H_ij = Σ_ij H_ij².
Now let us show that the trace stays invariant under orthogonal coordinate transformations (H → R^T H R, so H^T → R^T H^T R). Notice that R^T R = R R^T = 1:

Tr(H^T H) → Tr[(R^T H^T R)(R^T H R)] = Tr(R^T H^T H R) = Tr(R R^T H^T H) = Tr(H^T H),
where we used the cyclic invariance of the trace.
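Both facts are easy to spot-check numerically (a sketch of my own; the random orthogonal matrix comes from a QR decomposition):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
H = M + M.T  # a symmetric test matrix

# Random orthogonal matrix R from a QR decomposition.
R, _ = np.linalg.qr(rng.standard_normal((4, 4)))

t1 = np.trace(H.T @ H)
t2 = np.sum(H**2)  # sum of squares of all elements
t3 = np.trace((R.T @ H @ R).T @ (R.T @ H @ R))  # after the rotation

print(t1, t2, t3)  # all three agree
```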
h)
The probability density of generating a GOE matrix H is similar to what we have done before. We use that σ_a = σ_c = 2 and that σ_b = √2. Then, we
need to generate three numbers to create the matrix (3 and not 4, since it
is symmetric). Each number is independent of the others so the probability
breaks into a product of probabilities (the “and" rule):
ρ(H) ∝ exp(−a²/(2σ_a²)) exp(−c²/(2σ_c²)) exp(−b²/(2σ_b²))
     = exp[ −(1/2)(a²/4 + c²/4 + b²/2) ]
     = exp[ −(1/8)(a² + c² + 2b²) ]
     = exp[ −(1/8) Tr(H^T H) ],
where we used that the trace of H^T H is the sum of squares of all elements of H, and since H12 = H21 = b we get b² twice in the expression for the trace.
This expression is invariant under orthogonal transformations, that is, ρ(R^T H R) = ρ(H), due to our result from part g).
The point is that the original matrices in the ensemble did not have rotational symmetries, but the ensemble does start showing symmetries as
N → ∞. This is an emergent symmetry of the system. The same thing happens in random walks on a lattice (take enough steps and the shape starts showing symmetries that aren't there in the individual allowed steps).
In particular, consider a square lattice. Only steps along the horizontal
or vertical are allowed. But as we take enough steps suddenly rotational
symmetry (for any angle) develops in the pattern of steps taken. But the
square lattice has no rotational symmetry for any angle (only specific ones,
multiples of 90 degrees), so where does this symmetry come from? It emerges at the macro-scale. Many little (non-symmetric, or at least less symmetric) microscopic phenomena can add up to macroscopic behavior with symmetries (or at least more symmetries).
Problem 1.7 (Six Degrees of Separation):
a)
I wrote a Python object called Network with the given functions. There is
one attribute of the class: a dictionary which holds nodes as keys and each
key/node has a list of values which are nodes that it connects to. Thus, the
edges are essentially stored as key-value pairs. My implementation is the
following:
import numpy as np
import sys
import matplotlib.pyplot as plt


class Network:
    def __init__(self):
        self.neighbor_dict = {}

    def AddNode(self, node):
        if not self.HasNode(node):
            self.neighbor_dict[node] = []

    def AddEdge(self, node1, node2):
        if node1 == node2:
            return  # a node is connected to itself already
        self.AddNode(node1)
        self.AddNode(node2)
        d = self.neighbor_dict

        if node2 not in d[node1]:
            d[node1].append(node2)
        if node1 not in d[node2]:
            d[node2].append(node1)

    def HasNode(self, node):
        return node in self.neighbor_dict

    def GetNodes(self):
        # return nodes as a list (keys does this)
        return self.neighbor_dict.keys()

    def GetNeighbors(self, node):
        # the neighbors are stored as a list,
        # so simply return a copy:
        return self.neighbor_dict[node][:]

    def __str__(self):
        return str(self.neighbor_dict)
It is pretty much self-explanatory, also in conjunction with the problem text. I think the most non-trivial part is how to store the nodes and the edges. One could also have used an adjacency matrix, but for large systems the dictionary approach is sparse: we only store what we need, nothing more, nothing less. The matrix approach would have a whole bunch of zeros in it (for large L and small Z and p, which is typical).
Using the above class (from part a)) I wrote a function which constructs
a small world network. All it does is: adds all the short edges, then it adds
the random edges. The short edges are just added by going to each node
in the graph (represented as a key in the dictionary, remember) and adding that node's neighbors as values to that key entry (by appending: the values are lists, so we just append each neighbor to the list associated with the given
node/key). The random edges are added just by choosing two nodes in the
network at random and joining them with an edge. The code is:
def AddShortEdges(g, L, Z):
    for node in xrange(0, L):
        for edge in xrange(-int(Z/2.), int(Z/2.)):
            if edge == 0:
                continue
            g.AddEdge(node, (node + edge) % L)  # periodic bc's
    return g


def AddRandomEdges(g, L, Z, p):
    existingNodes = g.GetNodes()
    for i in xrange(int(np.ceil(L*Z*p*0.5))):
        nodes = np.random.choice(existingNodes, size=2,
                                 replace=True)
        g.AddEdge(nodes[0], nodes[1])
    return g


# Routine to construct a small world network
# based on the Network class:
def ConstructSmallWorld(L=10, Z=4, p=0.):
    assert L > Z and L > 0 and Z > 0 and p >= 0.
    g = Network()
    g = AddShortEdges(g, L, Z)
    g = AddRandomEdges(g, L, Z, p)
    return g
So, e.g., to construct a small world network we just call:
if __name__ == '__main__':
    # construct a small world:
    L, Z, p = 1000, 2, 0.02
    g = ConstructSmallWorld(L, Z, p)
This worked. Then I downloaded and imported Prof. Sethna’s plotting tool
for these networks. I simply extended the above code to:
if __name__ == '__main__':
    # construct a small world:
    L, Z, p = 1000, 2, 0.02
    g = ConstructSmallWorld(L, Z, p)

    import NetGraphics as ng
    ng.DisplayCircleGraph(g)
and a nice graph comes up similar to the one in the book. Now, I needed to
install PIL for python which I simply did using “port" (I am on a MacBook):
sudo port install py26-pil. Also, to run my code above you need numpy and
matplotlib, these can be similarly obtained from “port".
b)
(1)
I implemented the FindPathLengthsFromNode(graph, node) function, using the breadth-first algorithm proposed in the book. I used a dictionary to do this: each key is a node in the graph (different from the incoming node) and the corresponding value is the distance from the incoming node to that particular node in the graph. This is the code I got (I am not implementing error checking on the input):
def FindPathLengthsFromNode(graph, node):
    # makes little sense if node is not in graph:
    if not graph.HasNode(node):
        return
    l = 0  # distance to self is zero
    currentShell = [node]
    distances = {}
    while 1:
        nextShell = []
        for cur_node in currentShell:
            for cur_neigh in graph.GetNeighbors(cur_node):
                # distance from node to cur_neigh stored already?
                if (cur_neigh not in distances) and (cur_neigh != node):
                    # no, so we should investigate this
                    # new node "cur_neigh"
                    nextShell.append(cur_neigh)
                    distances[cur_neigh] = l + 1
        l += 1
        currentShell = nextShell[:]
        if len(currentShell) == 0:  # we have seen all nodes in graph
            return distances
(2)
I then implemented FindAllPathLengths(graph). Again I use a dictionary
to store the distances as values with keys being a string of the form “[0,4]"
meaning: node pair 0 and 4. Using a “min" and “max" call I make sure that
I don’t store both [0,4] and [4,0] (the graph is undirected so these should be
the same). I then return the values of this dictionary at the end (because I
need to return a list, not a dictionary). The code is:
def FindAllPathLengths(graph):
    dist = {}
    for node in graph.GetNodes():
        distance_dict = FindPathLengthsFromNode(graph, node)
        for neigh in distance_dict:
            key = '[%d,%d]' % (min(node, neigh), max(node, neigh))
            if key not in dist:
                dist[key] = distance_dict[neigh]
    return dist.values()
I did verify that the distribution of lengths is constant for 0 < l < L/2 (my program shows the histogram when you run it; try setting p = 0 and check for yourself).
The distribution of distances in the network looks like a Gaussian. For small
p (0.02) the Gaussian is centered at around 40. For large p (0.2 – more long
bonds between far-apart nodes) it is centered around 12. The 6 is within 1-2
standard deviations. I needed around p = 0.75 to center the Gaussian at 6
(and thus obtain six degrees of separation). We might not need to exactly
center it at 6 (e.g., we could require that 6 is within 1 standard deviation of
the mean) so a lower p could be possible as well.
(3)
FindAveragePathLength(graph) is straightforward to implement given that
we can find all path lengths between all pairs of nodes. So we simply find
all these lengths (we already wrote this function in part (2)) and return the
mean:
def FindAveragePathLength(graph):
    nodes = graph.GetNodes()
    count, total = 0, 0.
    for node in nodes:
        distances = FindPathLengthsFromNode(graph, node)
        for node2, dist in distances.items():
            count += 1
            total += dist
    return total / count
And that is it. When I do this on the (L, Z, p) = (100, 2, 0.1) network
4 times I get these values: (8.86343434343, 9.27575757576, 9.26888888889,
10.0268686869). So there are indeed some fluctuations, but the value is
around 10 (mean is 9.36) as the problem text says, so things look fine. The
standard deviation is 0.42.
The number of long bonds in the system is roughly equal to the number
of random bonds we add (why “roughly": well a random bond does not
have to be long per se, but typically ends up being substantially longer
than a short edge simply by chance). This amounts to pLZ/2 = 0.1 ×
100 × 2/2 = 10 long bonds, or 10 %. With 10 bonds in a network of 100
nodes there should be fluctuations in the distances yes. Of course, the larger
the value of p the more long bonds we have and hence these fluctuations
should decrease. I tested this prediction by choosing p = 0.5 and I got:
(4.56666666667, 4.7096969697, 4.73616161616, 4.75535353535). We see that
these fluctuations are much smaller, the standard deviation being 0.074 (the
mean is 4.69 and has decreased because of the increase in long bonds in the
system connecting far-apart nodes).
c)
Now we would like to plot the average path length between nodes l(p) divided
by l(0). We find a graph similar to ref. 142 Fig. 2 (from the text book): at
large p (many long bonds) the average distance is small, but it gets larger and larger as we decrease p (meaning: we get rid of more and more long bonds).
It then goes to a constant 1 at very low p. This is because we are getting
rid of all the long bonds. What is left? Just the non-random short-edge
bonds with fixed lengths (we only have short-edge bonds when p = 0, no
random long bonds). Some discrepancies can be due to the randomness of
the addition of random bonds in the network. I did notice that sometimes
an edge is added which is either already there or connects the node to itself.
d), e), and f)
I implemented this as part of my code. Another version of the code is online
at Prof. Sethna’s website (there are answers to exercise 1.7) free for download.
Problem 1.8 (Satisfactory Map Colorings):
a)
Let us assume that each region must be given some color (the Boolean variable AR is true if region A is colored red, and similarly for AG and AB). The logical expression stating that A is colored with only a single color is this:

¬(AR ∧ AG) ∧ ¬(AR ∧ AB) ∧ ¬(AG ∧ AB)

From left to right this states that: A is NOT both red AND green, AND A is NOT both red AND blue, AND A is NOT both green AND blue. Can it be all three colors simultaneously, red, green, and blue? No, because then it would in particular be both red and green, which the first clause forbids. This leaves just one color for A to have (remember we assume it has to be colored).
Next, that A and B cannot have the same color is represented as:
¬(AR ∧ BR ) ∧ ¬(AG ∧ BG ) ∧ ¬(AB ∧ BB )
We see how the above forms conform to the hint given in the problem text:
they are both a conjunction of three clauses each involving two variables.
b)
Writing out all the cases we get the following table:

X   Y   ¬(X ∧ Y)               (¬X) ∨ (¬Y)
1   1   ¬(1 ∧ 1) = ¬(1) = 0    ¬(1) ∨ ¬(1) = 0 ∨ 0 = 0
1   0   ¬(1 ∧ 0) = ¬(0) = 1    ¬(1) ∨ ¬(0) = 0 ∨ 1 = 1
0   1   ¬(0 ∧ 1) = ¬(0) = 1    ¬(0) ∨ ¬(1) = 1 ∨ 0 = 1
0   0   ¬(0 ∧ 0) = ¬(0) = 1    ¬(0) ∨ ¬(0) = 1 ∨ 1 = 1
We see that columns 3 and 4 give the same output for all logical cases of X
and Y and thus, they are indeed equivalent.
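The table can also be verified by brute force (a trivial sketch of my own):

```python
# Check De Morgan's law not(X and Y) == (not X) or (not Y) in all cases.
for X in (True, False):
    for Y in (True, False):
        lhs = not (X and Y)
        rhs = (not X) or (not Y)
        assert lhs == rhs
print("equivalent for all X, Y")
```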
Let us re-write the answer to part a) in conjunctive normal form (which, by the way, means: an AND of OR's). Starting with "A is colored a single color" we get (starting from the result in part a)):
¬(AR ∧ AG ) ∧ ¬(AR ∧ AB ) ∧ ¬(AG ∧ AB )
Now, we know from above that ¬(X ∧ Y ) = (¬X) ∨ (¬Y ), so we can write:
[(¬AR ) ∨ (¬AG )] ∧ [(¬AR ) ∨ (¬AB )] ∧ [(¬AG ) ∨ (¬AB )] ,
which is in conjunctive normal form.
Next, consider the statement that A and B are not the same color. In
part a) we got:
¬(AR ∧ BR ) ∧ ¬(AG ∧ BG ) ∧ ¬(AB ∧ BB ),
which is straightforward to also change to conjunctive normal form (we just
did this):
[(¬AR ) ∨ (¬BR )] ∧ [(¬AG ) ∨ (¬BG )] ∧ [(¬AB ) ∨ (¬BB )] .
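As a final check (a small brute-force sketch of my own; recall we assume A must receive at least one color), the single-color clauses for one region leave exactly the three assignments with one color:

```python
from itertools import product

def single_color(AR, AG, AB):
    # "A has only one color": no two color variables true at once.
    one_color = (not (AR and AG)) and (not (AR and AB)) and (not (AG and AB))
    colored = AR or AG or AB  # A must get at least one color
    return colored and one_color

sat = [bits for bits in product((True, False), repeat=3) if single_color(*bits)]
print(sat)  # exactly the three one-color assignments
```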