Name: Jesper Toft Kristensen
Course:
Chapter:
Spring 2014

Problem 1.1 (Quantum Dice): Let us do some quantum mechanics via a game of dice. The die will not have 6 sides as is often encountered in real life; instead we give it 3 sides, so it shows either 1 spot, 2 spots, or 3 spots. So this is all we think of right now: a die with 3 sides. Now we can play 3 different games. One game is called "Distinguishable", another is "Bosons", and the last is "Fermions". Each game has its own rules. What is the meaning of this game? Of course, each game corresponds to considering a different kind of particle. E.g., a fermion is a type of particle which behaves in a way similar to the game "Fermions" (not too surprising). What do I mean by "behaves similarly"? Well, each die represents one of 3 states (remember it has 3 sides) in which we can put one fermion. So when we roll die number 1 it corresponds to placing a fermion in one of 3 states. Say we roll a 1. Then we put one fermion in state 1. Now we roll die number 2. If it turns out to also show a 1, we cannot put the fermion in state 1 as we are supposed to, because there is already a fermion occupying that state. Hence, we must re-roll the die until we get either 2 or 3 spots showing, and we put the second fermion in state 2 or 3, respectively. So we see how the number of sides simply corresponds to the number of states in the system into which we can put particles, the number of valid rolls corresponds to the number of particles we want to put into these states, and the rules of the game govern how this is done. Pretty neat.

a) Let's roll. Let us play Bosons and roll a die twice. This game dictates that the second roll (when I say this I mean "the number of spots on the die") has to be equal to or greater than the first roll. First of all, does this problem even have a solution? We can quickly see that it does, since the die can take values 1, 2, or 3; e.g., 1 + 3 = 4, which is a valid pair of tosses.
Ok, but of course we are not done: we have found one possible way of making a 4. Let us list all allowed turns and add up the spots; this gives us the answer immediately:

roll 1  roll 2  sum
  1       1      2
  1       2      3
  1       3      4
  2       2      4
  2       3      5
  3       3      6

We see that there are 2 turns giving a sum of 4. The answer is in the form (number of ways to get 4) divided by (total number of ways of getting anything). By making the above list the information is staring at us. The total number of ways of getting any sum of spots is of course just the number of valid turns (we don't care what the spots sum up to); that is simply the number of rows in the list: 6. The number of ways of getting a sum of 4 we read from the third column: 2. Thus, we have the result
\[ \frac{2}{6} = \frac{1}{3} \approx 33\,\%. \]
So, on average, every third pair of rolls gives a sum of 4. Incidentally, we see from the list that there is something special about "4": it is the only sum to occur twice in the list. Every other sum occurs with probability just 1/6 ≈ 17 %. This state is indeed special, because if we consider an odd number of states (that is, an odd number of sides on the die), the sum of energies we get from putting two particles in the same state (the center state) can also be obtained by putting one particle in the lowest state and the other in the highest state. We can form yet another combination with the same energy if we move the lowest-state particle one state up and the highest-state particle one state down, thus maintaining the sum, and continue this until we reach the center state. No other state has this feature. It is really due to the symmetry of the state: there are as many levels below it as above it. We can quantify how many combinations will have a given energy; that is, we can immediately get how many arrangements n of the particles give rise to the same energy. We can think of the sum of the die spots as the total energy of the system, if that is what each level represents, but it does not have to be energy.
The important thing is that we lose no generality, hopefully just gaining clarity. We do this: for the given state, count how many states lie above it (N↑) and how many lie below it (N↓). Take the minimum of these and add 1 (for the state itself):
\[ n = \min(N_\uparrow, N_\downarrow) + 1. \]
Notice that N↓ + N↑ = N − 1 (the minus one from not counting the state itself), so we can write
\[ n = \min(N_\uparrow, (N-1) - N_\uparrow) + 1. \]
Due to symmetry, the largest number is obtained for the center state, which has N↓ = N↑ = (N − 1)/2, giving n = (N − 1)/2 + 1 = (N + 1)/2. So if N = 3 then the center state is 2 and the sum is 4. We have n = (3 + 1)/2 = 2 combinations which give 4. Good, that is what we saw above. Now, if we have 5 states then the center is at 3 and gives an energy of 6. There are (5 + 1)/2 = 3 such combinations, and all other sums have fewer combinations of particles giving rise to the same energy. What happens, you may ask, when the number of states is even instead of odd? Well, then there is no longer a single center state. Instead, you have the two center states (e.g., with 4 states in total these are states 2 and 3) playing the role of the single center state above (i.e., these 2 states share the feature that we can find the most particle combinations giving the energy of one of them, compared to all other states), but you also have one more: the sum which equals the combined energy of these two center states occurs just as frequently. Thus, we actually have 3 sums in total, instead of 1 for the odd case, that occur most often in this system. Of course, each of those 3 sums has a different energy than any other.

b) There is just a single state allowed if we are playing Fermions with three-sided dice and tossing 3 times. The physical meaning is that we are trying to put 3 fermions in 3 quantum states. A quantum state cannot be occupied by more than one fermion, so the only option is to put a fermion in each of the 3 states, giving a sum of 1 + 2 + 3 = 6. This is the only option, and hence ρ(6) = 100 %.
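Both results (ρ(4) = 1/3 for two Boson rolls and ρ(6) = 100 % for three Fermion rolls) can be checked by brute-force enumeration. A minimal sketch; the function name and the encoding of the rules as non-decreasing or strictly increasing tuples are my own:

```python
from itertools import product
from fractions import Fraction

def legal_turns(n_sides, n_rolls, game):
    """Enumerate the legal turns of one of the three dice games."""
    turns = []
    for rolls in product(range(1, n_sides + 1), repeat=n_rolls):
        if game == "bosons":        # each roll >= the previous one
            ok = all(a <= b for a, b in zip(rolls, rolls[1:]))
        elif game == "fermions":    # each roll > the previous one (no repeats)
            ok = all(a < b for a, b in zip(rolls, rolls[1:]))
        else:                       # distinguishable: everything is legal
            ok = True
        if ok:
            turns.append(rolls)
    return turns

# Bosons, two rolls of a 3-sided die: probability that the spots sum to 4.
bosons = legal_turns(3, 2, "bosons")
rho4 = Fraction(sum(1 for t in bosons if sum(t) == 4), len(bosons))
print(len(bosons), rho4)                 # 6 1/3

# Fermions, three rolls of a 3-sided die: only one legal turn, summing to 6.
print(legal_turns(3, 3, "fermions"))     # [(1, 2, 3)]
```

Here a Boson turn is legal when the rolls never decrease, and a Fermion turn when they strictly increase, exactly the re-roll rules described above.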
This is the case for electrons. One can show that, for the game "Bosons" where each die has N sides and we roll M dice, the number of legal turns is
\[ \binom{N+M-1}{M} = \frac{(N+M-1)!}{M!\,(N-1)!}. \]

c) In Distinguishable, all turns are legal. What is the number of legal turns? Well, each die can show any of its N sides on any of the M rolls. E.g., we can roll 112, 411, or 816 (depending on N and M of course, but the point is that there is no restriction on what integer is allowed in the later rolls given the first ones). Hence, there are N^M legal turns: each placeholder (die toss) can take N values, and we toss the die M times. The number of turns with all faces showing the same value (e.g., 111), also called an M-tuple, is N in both Bosons and Distinguishable: one per side. Since this count is the same in both games, and since Bosons discards some turns (e.g., 211 is not allowed) while Distinguishable does not, it must be more likely to roll a turn with all faces showing the same value in Bosons! For three sides (N = 3) and 3 rolls (M = 3), Distinguishable has 27 legal turns (N^M = 3^3), of which 3 are M-tuples. Bosons has 10 legal turns, of which the same 3 are M-tuples. Thus, the enhancement factor (how many more M-tuples we get in Bosons compared to Distinguishable) is
\[ \frac{3/10}{3/27} = \frac{27}{10} = 2.7. \]
If we had rolled just two dice the enhancement would have been 1.5, so we see that the more dice we roll (the more particles we place), the larger the tendency of the Bosons to pile up in the M-tuple states (all particles in the same quantum state). More generally: for Distinguishable, the probability of generating an M-tuple is
\[ \frac{N}{N^M}, \]
and for Bosons it is
\[ \frac{N}{\binom{N+M-1}{M}}. \]
So let us divide the Boson case by the Distinguishable case to get the enhancement factor of M-tuples in Bosons:
\[ \frac{N\Big/\binom{N+M-1}{M}}{N/N^M} = \frac{N^M\, M!\,(N-1)!}{(N+M-1)!}. \]
Since the numerator is larger than the denominator, it is indeed an enhancement.
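The counts and the closed form above can be cross-checked directly; a small sketch using exact arithmetic (the helper name is my own):

```python
from fractions import Fraction
from math import comb, factorial

def boson_count(N, M):
    """Number of legal Boson turns: binom(N+M-1, M) = (N+M-1)! / (M! (N-1)!)."""
    return comb(N + M - 1, M)

N, M = 3, 3
dist = N ** M             # 27 legal Distinguishable turns
bos = boson_count(N, M)   # 10 legal Boson turns
# There are N M-tuples (111, 222, 333) in either game, so the enhancement is
enh = Fraction(N, bos) / Fraction(N, dist)
print(dist, bos, enh)     # 27 10 27/10

# The general closed form N^M M! (N-1)! / (N+M-1)! agrees:
closed = Fraction(N ** M * factorial(M) * factorial(N - 1), factorial(N + M - 1))
print(closed == enh)      # True

# Rolling only two dice gives the quoted enhancement of 1.5:
print(Fraction(N, boson_count(N, 2)) / Fraction(N, N ** 2))   # 3/2
```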
So, as we stated earlier, the enhancement comes from the fact that there are more legal turns in Distinguishable than in Bosons while the number of M-tuples is unchanged between the two games.

Problem 1.2 (Probability distributions): a) The probability is found by comparing the number of ways to obtain a number in the given range to the total number of ways to obtain anything in the entire range −∞ < x < ∞. The problem is not discrete per se, so "counting" here means integrating. For the uniform distribution, the number of ways to get a number in the desired range is
\[ \int_{0.70}^{0.75} \rho_{\mathrm{uniform}}(x)\,dx = \int_{0.70}^{0.75} dx = 0.75 - 0.70 = 0.05 = 5\,\%. \]
Now to the exponential decay. We need to find the probability that the time is larger than 2τ:
\[ \int_{2\tau}^{\infty} \rho_{\mathrm{exponential}}(t)\,dt = \int_{2\tau}^{\infty} \frac{1}{\tau}\, e^{-t/\tau}\,dt = \left[-e^{-t/\tau}\right]_{2\tau}^{\infty} = 0 - \left(-e^{-2}\right) = e^{-2} \approx 13.5\,\%. \]
Let us look at the Gaussian distribution and find the probability that my score on an exam is more than 2σ above the mean:
\[ \int_{2\sigma}^{\infty} \rho_{\mathrm{Gaussian}}(v)\,dv = \int_{2\sigma}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-v^2/2\sigma^2}\,dv. \]
Introduce the variable y = v/σ; then v = 2σ in the integration limit turns into y = 2 (the upper limit goes to infinity, so it is unchanged), and the differential goes from dv to σ dy, which makes the σ's cancel, so we can write
\[ \int_{2}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy. \]
One way to perform this integral is to convert it to polar coordinates, but the problem text gives us the answer:
\[ \int_{2}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\,dy \approx 2.3\,\%. \]

b) It is important to have normalized probability distributions. It simply means that the probability of obtaining some value is 100 %, as it should be. Not more, not less. So are they normalized?
\[ \int_{-\infty}^{\infty} \rho_{\mathrm{uniform}}(x)\,dx = \int_{0}^{1} dx = 1. \]
How did the integration range change from an infinite range to a finite one? Simply due to the fact that the uniform distribution is zero outside the range 0 ≤ x < 1.
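The three tail probabilities from part (a) can be reproduced numerically with the standard library alone; the Gaussian tail is written via the complementary error function (a standard identity, not something taken from the problem text):

```python
from math import exp, sqrt, erfc

# Uniform on [0, 1): probability of landing in [0.70, 0.75).
p_uniform = 0.75 - 0.70

# Exponential with mean tau: probability that the wait exceeds 2*tau.
p_exponential = exp(-2)

# Standard normal: P(Y > 2) = erfc(2 / sqrt(2)) / 2.
p_gaussian = erfc(2 / sqrt(2)) / 2

print(f"{p_uniform:.3f} {p_exponential:.3f} {p_gaussian:.3f}")
# 0.050 0.135 0.023
```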
We can find the mean as:
\[ x_0 = \int_{-\infty}^{\infty} x\,\rho_{\mathrm{uniform}}(x)\,dx = \int_0^1 x\,dx = \left[\tfrac{1}{2}x^2\right]_0^1 = \tfrac{1}{2}, \]
which should make sense: the distribution takes the same value from zero to one, so the mean should be right in the middle, at one half. Of course, if the distribution took larger values to the left of the mean than to the right, the mean should be shifted to lower values of x (x₀ < 1/2), but for this problem it makes sense! Visualizing the distribution always helps in convincing yourself that the mean is correct (in this case it is rather simple, of course). Now let us try the exponential. Is it normalized? Note that it is non-zero only in the range zero to infinity:
\[ \int_{-\infty}^{\infty} \rho_{\mathrm{exponential}}(t)\,dt = \int_0^{\infty} \frac{1}{\tau}\, e^{-t/\tau}\,dt = \left[-e^{-t/\tau}\right]_0^{\infty} = 0 - (-1) = 1. \]
What is the mean (which we could call x₀)? Substituting y = t/τ, the answer is
\[ x_0 = \int_{-\infty}^{\infty} t\,\rho_{\mathrm{exponential}}(t)\,dt = \tau \int_0^{\infty} y\, e^{-y}\,dy. \]
One way to do this integral is to consider some other function of some variable a:
\[ f(a) = \tau \int_0^{\infty} e^{-ay}\,dy. \]
By comparison, is it not true that the integral we want to solve is
\[ x_0 = -\left.\frac{df(a)}{da}\right|_{a=1}? \]
Now what is the benefit of this? Well, we can easily find f(a) by doing the integral of just the exponential,
\[ f(a) = \frac{\tau}{a}, \]
and then take the derivative of the result:
\[ x_0 = -\left.\frac{df(a)}{da}\right|_{a=1} = \left.\frac{\tau}{a^2}\right|_{a=1} = \tau. \]
Finally, consider the Gaussian:
\[ \int_{-\infty}^{\infty} \rho_{\mathrm{Gaussian}}(v)\,dv = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-v^2/2\sigma^2}\,dv. \]
We can then do the following. Introduce y = v/(√2 σ), and write
\[ \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-y^2}\, \sqrt{2}\,\sigma\,dy = \int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi}}\, e^{-y^2}\,dy. \]
One way to do this is to convert to polar coordinates: first multiply by the same integral again, then take the square root to undo this action (thus leaving everything unchanged):
\[ \frac{1}{\sqrt{\pi}} \left( \int_{-\infty}^{\infty} e^{-y^2}\,dy \int_{-\infty}^{\infty} e^{-x^2}\,dx \right)^{1/2} = \frac{1}{\sqrt{\pi}} \left( \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)}\,dx\,dy \right)^{1/2}. \]
In polar coordinates r² = x² + y² (r is the distance from the origin to any point in the plane) and the differentials become r dr dφ, where φ is the angle between the radius vector pointing from the origin to the given point and the x-axis. Thus, the integral becomes (it is still an integral over the plane, just in other coordinates than before):
\[ \frac{1}{\sqrt{\pi}} \left( \int_{\varphi=0}^{2\pi} \int_{r=0}^{\infty} e^{-r^2}\, r\,dr\,d\varphi \right)^{1/2} = \frac{1}{\sqrt{\pi}} \left( 2\pi \int_{r=0}^{\infty} e^{-r^2}\, \tfrac{1}{2}\, d(r^2) \right)^{1/2}; \]
notice how we used the fact that the integrand is independent of φ, so that integral became easy to do (it was simply 2π). Also, we changed from dr to d(r²) = 2r dr, to get the same differential variable as in the exponential. Now we rename r² as z (notice that the integration limits do not change with this substitution):
\[ \frac{1}{\sqrt{\pi}} \left( \pi \int_{z=0}^{\infty} e^{-z}\,dz \right)^{1/2} = \frac{1}{\sqrt{\pi}} \left( \pi \left[-e^{-z}\right]_0^{\infty} \right)^{1/2} = 1. \]
So the Gaussian is normalized. Now let us find the mean:
\[ x_0 = \int_{-\infty}^{\infty} v\,\rho_{\mathrm{Gaussian}}(v)\,dv = \int_{-\infty}^{\infty} v\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-v^2/2\sigma^2}\,dv. \]
The Gaussian in v is an even function over the integration domain: it will give some non-zero value. But the even function is multiplied by v, which is an odd function in the variable v. The product of an even and an odd function is odd, and integrating an odd function over the entire domain yields zero. So x₀ = 0. We now look into the standard deviation of each distribution. This quantity gives us some intuition as to how much we can expect variables drawn from the distribution to vary. E.g., let us take the exam results as a case in mind.
If the standard deviation of the Gaussian (which describes the exam results of all students) is huge, then it means that a student picked at random will typically have a score very different from the mean of the Gaussian. Note that these are just intuitive answers; in reality it can get much more complicated, e.g., if the distribution is multimodal. But the above feel for the standard deviation suffices for now. Starting with the uniform (we use the mean x₀ = 1/2 we just found):
\[ \sigma_0 = \left( \int_{-\infty}^{\infty} (x - x_0)^2\, \rho_{\mathrm{uniform}}(x)\,dx \right)^{1/2} = \left( \int_0^1 \left(x - \tfrac{1}{2}\right)^2 dx \right)^{1/2} = \left( \int_{-1/2}^{1/2} y^2\,dy \right)^{1/2} = \left( \left[\tfrac{1}{3} y^3\right]_{-1/2}^{1/2} \right)^{1/2} = \left( \tfrac{1}{24} + \tfrac{1}{24} \right)^{1/2} = \frac{1}{\sqrt{12}} = \frac{1}{2\sqrt{3}}. \]
So the standard deviation is almost 1/4, but not quite: it is slightly larger (≈ 0.29). Now let us try the exponential. The mean, which we will need, was x₀ = τ:
\[ \sigma_0 = \left( \int_{-\infty}^{\infty} (t - x_0)^2\, \rho_{\mathrm{exponential}}(t)\,dt \right)^{1/2} = \left( \int_0^{\infty} (t - \tau)^2\, \frac{1}{\tau}\, e^{-t/\tau}\,dt \right)^{1/2} = \tau \left( \int_0^{\infty} (z - 1)^2\, e^{-z}\,dz \right)^{1/2}, \]
where I made the substitution z = t/τ. Now, let us shift the integration variable to make the polynomial multiplying the exponential a bit nicer (in principle this defines a new integration variable, but I will just call it z again):
\[ \sigma_0 = \tau \left( e^{-1} \int_{-1}^{\infty} z^2\, e^{-z}\,dz \right)^{1/2}. \]
We know the routine by now: define some function f(a) = ∫_{−1}^{∞} exp(−az) dz, and then we have
\[ \int_{-1}^{\infty} z^2\, e^{-z}\,dz = \left.\frac{d^2 f(a)}{da^2}\right|_{a=1}. \]
The trick is that we know what f(a) is:
\[ f(a) = \left[ -\frac{1}{a}\, e^{-az} \right]_{-1}^{\infty} = \frac{e^{a}}{a} \quad\Rightarrow\quad \frac{df(a)}{da} = \frac{e^a}{a} - \frac{e^a}{a^2} = f(a)\left(1 - \frac{1}{a}\right) \quad\Rightarrow\quad \frac{d^2 f(a)}{da^2} = \frac{df(a)}{da}\left(1 - \frac{1}{a}\right) + \frac{f(a)}{a^2}. \]
The first term vanishes at a = 1, and the second term reduces to simply exp(1), so:
\[ \sigma_0 = \tau \sqrt{e^{-1}\, e^{1}} = \tau. \]
The standard deviation for the exponential is the same as its mean! Another interpretation of the standard deviation (for some functions) is the distance over which the function changes (sort of a characteristic scale defining the function). So, for the exponential, the average value occurs right where the function changes.
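Both standard deviations can be confirmed by direct numerical integration; a sketch (the midpoint rule, the point count, and the tail cutoff are choices of mine):

```python
from math import exp, sqrt

def stddev(rho, mean, lo, hi, n=200_000):
    """Midpoint-rule estimate of sqrt( integral of (x - mean)^2 rho(x) over [lo, hi] )."""
    h = (hi - lo) / n
    total = sum((lo + (i + 0.5) * h - mean) ** 2 * rho(lo + (i + 0.5) * h)
                for i in range(n))
    return sqrt(total * h)

# Uniform on [0, 1): sigma_0 = 1/sqrt(12) ~ 0.2887, slightly above 1/4.
s_uni = stddev(lambda x: 1.0, 0.5, 0.0, 1.0)

# Exponential with tau = 5: sigma_0 = tau (tail truncated at 60*tau).
tau = 5.0
s_exp = stddev(lambda t: exp(-t / tau) / tau, tau, 0.0, 60 * tau)

print(round(s_uni, 4), round(s_exp, 4))   # 0.2887 5.0
```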
Let us look now at the Gaussian (where the mean x₀ is zero):
\[ \sigma_0 = \left( \int_{-\infty}^{\infty} (v - x_0)^2\, \rho_{\mathrm{Gaussian}}(v)\,dv \right)^{1/2} = \left( \int_{-\infty}^{\infty} v^2\, \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-v^2/2\sigma^2}\,dv \right)^{1/2}. \]
A change of variable z = v/σ is in order:
\[ \sigma_0 = \sigma \left( \int_{-\infty}^{\infty} z^2\, \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz \right)^{1/2}. \]
We could keep going (by defining our usual function f(a), etc.), or use the hint from the book that this integral is just one, so σ₀ = σ.

c) We know that we are drawing two numbers from a uniform on [0, 1) and adding them up. The resulting distribution of the sum z = x + y will be large for some particular value z₁ if there are many ways z₁ can occur. Indeed, if x + y can never become −1, then the distribution should be zero there. If, on the other hand, the sum can become 1, it should be non-zero there. And if the value 1 can be obtained in more ways than the value 1.5, then the distribution had better be larger at 1 than at 1.5. First, the sum can take all values in the range [0, 2): the lowest value is obtained when x = y = 0, and the largest when both x and y are very close to 1. In this example there is something special about the value 1, because it lies right in the middle of the range of values of the sum: it is the value which can be obtained in more ways than any other. Indeed, consider how many ways we can obtain 0.2: x can be zero and y = 0.2; then we can increase x while decreasing y until x = 0.2 and y = 0. (I am thinking of splitting the axis into tiny little bins so that a count actually makes sense.) So, essentially, the count becomes "how many times we can decrement x from 0.2 down to zero (while simultaneously incrementing y from 0 to 0.2)". By this logic there should be more counts for x + y = 0.4, and indeed there are, because there are more of these little bins between 0 and 0.4 than between 0 and 0.2. Likewise, x + y = 0.6 has more counts than x + y = 0.4. This keeps going until x + y = 1, which has the most counts. But what about, e.g., x + y = 1.2? Well, how do we obtain this value?
Start with x close to 1 and y at 0.2. Then decrement x all the way down to 0.2 (so we have decremented over 0.8 units; this is less than for x + y = 1, where we can decrement over 1.0 units) while y is incremented all the way up to 1. So you see that this value actually has the same count as x + y = 0.8 (where x also changes over a range of 0.8 units/bins). More generally then, the count for the sum t units above 1 is the same as for the sum t units below 1, so the graph is symmetric about x + y = 1. Drawing this in a graph with x + y on the x-axis and the distribution of the sum on the y-axis, one obtains a triangle with base [0, 2] on the x-axis and height 1. If we cut the triangle at its height we can overlay the left side with the right side (imagine folding one side over), so it is symmetric about the height. So, more generally, the question we need to answer to get the distribution of z = x + y, called ρ(z), where x and y follow distributions ρ₁(x) and ρ₂(y), respectively, is this: to get the value ρ(z₁) we need to find out in how many ways the value z₁ can be obtained from the two distributions of the parts of the sum. Probability theory tells us how to get this. Let z₁ = x₁ + y₁; then we see that, if we pick some value x₁ from ρ₁(·), we can only use a value drawn from ρ₂(·) if it results in z₁. In other words, we need to draw z₁ − x₁, with probability ρ₂(z₁ − x₁). Generally, this becomes
\[ \rho(z) = \int dx\, \rho_1(x)\, \rho_2(z - x). \]

d) First, we see that the velocity distribution can be factored in each dimension, and thus we do not have to solve this problem in 3D. We can simply solve it along, say, x, and that is the answer. The kinetic energy is KE_x = M v_x²/2, so the mean of this quantity is
\[ \langle \mathrm{KE}_x \rangle = \int \frac{M v_x^2}{2}\, \sqrt{\frac{M}{2\pi k T}}\, \exp\!\left(-\frac{M v_x^2}{2 k T}\right) dv_x. \]
So the kinetic energy seems to be related to the standard deviation of the Gaussian distribution of velocities. With a variable change the answer becomes obvious. Define z = v_x √(M/kT); then
\[ \langle \mathrm{KE}_x \rangle = \frac{kT}{2} \int z^2\, \frac{1}{\sqrt{2\pi}}\, \exp\!\left(-\frac{z^2}{2}\right) dz = \frac{kT}{2}, \]
where we used the hint about the integral being one. Thus, the mean kinetic energy along a particular dimension is indeed kT/2, and the answer is the same for each of the other dimensions. The probability that the speed takes a particular value v = |v| is found as follows. Velocity vectors of a given length form a sphere in velocity space (no dimension is special over any other, so the space had better be isotropic). The volume between v and v + dv is the surface area of the sphere of radius v, which is 4πv², times dv. The total probability is this volume times the probability density throughout the volume, which is the Gaussian distribution; writing σ² = kT/M and letting d(Vol) be the volume in velocity space just discussed:
\[ p(v) = d(\mathrm{Vol})\,\rho(v_x, v_y, v_z) = 4\pi v^2\,dv\, \frac{1}{(2\pi kT/M)^{3/2}}\, e^{-M v^2/2kT} = \frac{4\pi}{(2\pi)^{3/2}}\, \frac{v^2}{\sigma^3}\, e^{-v^2/2\sigma^2}\,dv = \sqrt{\frac{2}{\pi}}\, \frac{v^2}{\sigma^3}\, e^{-v^2/2\sigma^2}\,dv. \]
To get the density from this total probability, simply leave out the differential (that is, the volume element dv, which is indeed one-dimensional when considering the distribution in the scalar v, the length of the velocity vector):
\[ \rho(v) = \sqrt{\frac{2}{\pi}}\, \frac{v^2}{\sigma^3}\, e^{-v^2/2\sigma^2}. \]
All we have done is to find the distribution of the length of a velocity vector (which is one-dimensional: the length is just a scalar) starting from the distribution of its individual components, which was Gaussian (and three-dimensional: one component for each dimension). The resulting distribution is a Maxwellian distribution.

Problem 1.3 (Waiting times): a) The average number of passing cars in 1 hour is found as (the times t and τ are measured in minutes):
\[ \text{average number of cars in 1 hour} = \int_0^{60} \frac{dt}{\tau} = \frac{60}{5} = 12. \]

b) I will assume that the 10 minute interval is given as [0, 10). In other words, the starting point t = 0 is included, but not the end point at exactly 10 minutes.
If both end points were included I could get a case with n = 3 busses. With the half-open interval I always see exactly 2 busses no matter which interval I choose. Thus
\[ P_{\mathrm{bus}}(n) = \begin{cases} 1, & n = 2, \\ 0, & \text{otherwise.} \end{cases} \]
For the cars we can, in principle, observe 10, 100, 1000, or even a million cars in the time interval (of course, given that τ = 5 minutes it should be unlikely to observe 1000 cars in 10 minutes, although it is possible in principle). So we have to approach this differently. What we do is imagine setting up a robot by the roadside which has a little detector to detect cars passing by. The detector can be opened for a small period of time and then it closes again. Think of it as a shutter, like on a camera. Say it is open for 1 minute, detecting everything it can, and then it closes again. This means that, in 10 minutes, it is open 10 times. But each time it is open it can detect either 1 car or no car, not multiple cars. So a 1 minute open time seems like a bad idea. Why? Because the probability of a car arriving in a 1 minute interval is 1 in 5, or 20 %, so it will happen more often than we want that multiple cars arrive in the same interval. This would give very poor resolution. So let us improve the robot. How about we decrease the shutter time to 1 second? Now we open the little detector for 1 second; if a car passes in this interval we record it, but if multiple cars arrive we can still only detect one of them. We can do better yet: let the detector time approach zero. Thus, we can open-close-open-close-...-close many, many times in a second. Now, the road is such that cars cannot move next to each other; it is a 1-dimensional road. So you see that the smallest shutter time we need is related to the largest velocity of the cars: if the cars move extremely fast, there is a chance that two cars pass in a 1-second interval, and so on. However, we do not know the largest velocity. Let us define some rate λ of cars.
It is defined by how many times we open the shutter times the probability that a car arrives each time the shutter is open. Let us say that in the 10 minutes we open the shutter N times. Then λ = N dt/τ. In other words, this is our expected number of cars in the interval. Notice that N and dt are related: by using this formula for λ we are saying that there are N intervals of length dt, so we cannot make dt smaller without also changing N. It is a little dangerous to mix infinitesimal times (dt) with finite counts (N shutter intervals), but the arguments would still apply if we called the shutter time ∆t and took the limit at the end. We can write dt/τ = λ/N. Now, how do we write down the probability of observing n cars in the given time interval? Well, starting with finite shutter intervals, the probability is binomial: out of N trials (shutter open-close times), what is the probability of getting n successes (arrivals of cars)? The answer is (look up the binomial distribution if in doubt):
\[ P_{\mathrm{car}}(n) = \binom{N}{n} \left(\frac{dt}{\tau}\right)^{\!n} \left(1 - \frac{dt}{\tau}\right)^{\!N-n}. \]
Why the N-choose-n factor in front? Well, it is because there are many different ways (exactly N choose n ways) of obtaining n successes in N trials. In particular, we do not care whether a car arrived during shutter time 2 as opposed to another car arriving at shutter time 8. This leaves two ways of getting 2 cars: a red car arrives at time 2 and a blue car at time 8, OR the blue car arrives at time 2 and the red car at time 8. We do not care about the car color (we do not label the cars), and so we can get the same result (arrival of n cars) in many different ways (N choose n, to be exact). Let us write this in terms of λ:
\[ P_{\mathrm{car}}(n) = \binom{N}{n} \left(\frac{\lambda}{N}\right)^{\!n} \left(1 - \frac{\lambda}{N}\right)^{\!N-n} \approx \frac{1}{n!}\, \frac{N!}{(N-n)!\, N^n}\, \lambda^n\, e^{-\lambda}\, e^{\lambda n/N}, \]
where we used that, for small enough x, we can let 1 − x ≈ exp(−x).
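Before taking the limit analytically, one can watch this binomial approach its large-N limiting form numerically; a sketch with λ = 2 and n = 2 (arbitrary illustrative choices of mine):

```python
from math import comb, exp, factorial

def binomial_pmf(n, N, lam):
    """Probability of n arrivals in N shutter openings, each with p = lam/N."""
    p = lam / N
    return comb(N, n) * p ** n * (1 - p) ** (N - n)

def poisson_pmf(n, lam):
    """The large-N limit: lam^n exp(-lam) / n! (a Poisson distribution)."""
    return lam ** n * exp(-lam) / factorial(n)

lam, n = 2.0, 2
for N in (10, 100, 10_000):
    print(N, round(binomial_pmf(n, N, lam), 5))
print("limit:", round(poisson_pmf(n, lam), 5))
```

As N grows, the printed binomial values settle onto the limiting value, mirroring the analytic limit taken next.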
First, what happens to the last exponential factor, exp(λn/N), as N approaches infinity (that is, as the shutter opens and closes for infinitely small time intervals) while n remains fixed? It becomes one. Also, let us look at the following factor from the product above and try to reduce it:
\[ \frac{N!}{(N-n)!\, N^n} = \frac{N (N-1)(N-2)\cdots(N-n+1)}{N^n}. \]
How many factors do we have in the numerator? We have n: (N − 0) is the first, (N − 1) is the second, and this keeps going until N − (n − 1), which is the nth (NOT the (n − 1)th: we start counting at zero). But we also have n factors of N in the denominator, so we can write
\[ \frac{N}{N}\cdot\frac{N-1}{N}\cdot\frac{N-2}{N}\cdots\frac{N-n+1}{N} \;\to\; 1. \]
You see how each term approaches 1 as N goes to infinity. Thus, in the limit N → ∞, we can write the binomial as
\[ P_{\mathrm{car}}(n) = \frac{\lambda^n}{n!}\, e^{-\lambda}, \]
which is not called "a binomial in the limit of very large N" but a Poisson distribution. We recall that λ is the expected number of cars in the interval.

c) We know that, if we have just observed a bus, it will be exactly 5 minutes until the next bus. Therefore, the time interval ∆ between busses is fixed at one number. To write a real function, defined for all values of ∆, we need the Dirac delta function to single out the single number we are talking about:
\[ \rho_{\mathrm{bus}}(\Delta) = \delta(\Delta - 5), \]
where we work in units of minutes (that is, ∆ is the time between busses in minutes). What is the mean of this distribution? Let us find out:
\[ \langle\Delta\rangle = \int_0^{\infty} d\Delta\, \Delta\, \rho_{\mathrm{bus}}(\Delta) = \int_0^{\infty} d\Delta\, \Delta\, \delta(\Delta - 5) = 5 \text{ minutes}. \]
So the mean time between busses is 5 minutes, as it must be. Now, for the cars we first think of splitting the time interval ∆ into many tiny slivers, each of size dt. Let us say we split it into N slivers. Then, for some time interval ∆, we have ∆ = N dt.
Since the slivers are of fixed size, a larger (smaller) ∆ means a larger (smaller) value of N. Now, in order to find the probability that the next car arrives after time ∆, given that we have just observed one, we need this to happen: in the first N slivers no car arrives, and in the (N + 1)th sliver a car arrives. Thus, the total probability is (using 1 − dt/τ ≈ exp(−dt/τ)):
\[ P_{\mathrm{car}}(\Delta) = \left(1 - \frac{dt}{\tau}\right)^{\!N} \frac{dt}{\tau} \approx e^{-N\,dt/\tau}\, \frac{dt}{\tau} = \frac{1}{\tau}\, e^{-\Delta/\tau}\, dt. \]
So the density (leave out the differential dt) becomes
\[ \rho_{\mathrm{car}}(\Delta) = \frac{1}{\tau}\, e^{-\Delta/\tau}, \]
which is an exponential distribution. Is it normalized? Yes, it is (we already checked this back in Problem 1.2):
\[ \int_0^{\infty} d\Delta\, \rho_{\mathrm{car}}(\Delta) = \left[-e^{-\Delta/\tau}\right]_0^{\infty} = 1. \]
What is its mean? Well, it is an exponential distribution, so we did this in an earlier problem (Problem 1.2) and found
\[ \langle\Delta\rangle_{\mathrm{car}} = \int_0^{\infty} \Delta\, \rho_{\mathrm{car}}(\Delta)\, d\Delta = \tau = 5 \text{ minutes}. \]

d) So now another observer arrives at some random time. What is the probability distribution of her waiting time to the next bus? Well, let us introduce the exact time t she arrives. Since the busses come at regular time intervals we can just focus on the interval between two busses (periodicity means that any other interval is the same). This means that we have a conditional probability: given that the observer arrives at time t, what is her waiting time? Well, if she arrives at t = 0 then there are 5 minutes until the next bus, but if she arrives at t = 2.5 then there are 2.5 minutes, and so on. So we know the exact waiting time given that we know when she arrived. The entire interval is of length τ, so if she arrives at t there are τ − t minutes until the next bus arrives. Therefore, we have the conditional distribution
\[ P_{\mathrm{bus}}(\Delta \mid t) = \delta(\Delta - (\tau - t)). \]
Remember that we are working within the time zero to τ (that is, t ∈ [0, τ]). There is no loss of generality in doing this because of the periodicity (busses arrive exactly every 5 minutes).
Now, we are interested in the joint distribution P_bus(∆, t). We can get this from the conditional by multiplying by the probability of t, which is just a uniform, P_bus(t) = 1/τ. So we have
\[ P_{\mathrm{bus}}(\Delta, t) = P_{\mathrm{bus}}(\Delta \mid t)\, P_{\mathrm{bus}}(t) = \delta(\Delta - (\tau - t))\, \frac{1}{\tau}. \]
Now we can get P_bus(∆) by integrating out t:
\[ P_{\mathrm{bus}}(\Delta) = \int_0^{\tau} dt\, P_{\mathrm{bus}}(\Delta, t) = \frac{1}{\tau} \int_0^{\tau} dt\, \delta(\Delta - (\tau - t)). \]
This is the probability distribution of waiting times for the next bus. What is the mean waiting time?
\[ \langle\Delta\rangle_{\mathrm{bus}} = \int_0^{\infty} d\Delta\, \Delta\, \frac{1}{\tau} \int_0^{\tau} dt\, \delta(\Delta - (\tau - t)). \]
How are we going to do this integral? Well, it is not so bad. Look, what happens when ∆ becomes larger than τ? Then the inner integral is always zero, because there is no t for which the delta function is non-zero. Thus, we can restrict the ∆ integration to the range [0, τ]:
\[ \langle\Delta\rangle_{\mathrm{bus}} = \frac{1}{\tau} \int_0^{\tau} d\Delta\, \Delta \int_0^{\tau} dt\, \delta(\Delta - (\tau - t)). \]
In this interval there is always exactly one value of t satisfying the inner delta function: for every choice of ∆ there is one match of t. Thus the t integration just gives 1:
\[ \langle\Delta\rangle_{\mathrm{bus}} = \frac{1}{\tau} \int_0^{\tau} d\Delta\, \Delta = \frac{1}{\tau}\, \frac{\tau^2}{2} = \frac{\tau}{2} = 2.5 \text{ minutes}. \]
So, on average, you only have to wait half the interval between busses. That is, if you arrive at random every day, then you only wait 2.5 minutes on average for the next bus. This seems reasonable: due to symmetry it should be half the interval, since we arrive at a uniformly random point in it (if, for some reason, maybe by looking at the schedule, we tended to arrive closer to the next bus, then of course the average waiting time would be less, but mathematically that is because the uniform assumption P_bus(t) = 1/τ would be changed). What about the cars? Well, remember how we derived the probability distribution of times between car arrivals? We said: there should be no cars in N slivers and then exactly one car in the next sliver (number N + 1). That argument still applies here. We arrive at some random time at the road.
What is the probability distribution of the time to the next car? Well, we go through exactly the same arguments as before and thus get, again, the exponential distribution:
\[ \rho_{\mathrm{car}}(\Delta) = \frac{1}{\tau}\, e^{-\Delta/\tau}. \]
The mean waiting time to the next car is thus τ = 5 minutes (we have done this calculation several times before). So, as the problem text next explains, it seems strange: the mean we computed in part (c) gave us 5 minutes between cars, but now, in part (d), it seems like there are 5 minutes until the next car, and also 5 minutes back to the last car (just use the same arguments going back in time). So it seems like the gap in part (d) between cars is more like 10 minutes than 5 minutes? Well, in part (c) we did not care about when we arrived at the road; we just asked for the probability of obtaining a given gap size. This is called a gap average. In part (d) the time of arrival at the road changes the mean. Why? Because it is more likely to arrive in a larger gap than in a smaller one. (Say 3 cars arrive: the gap between the first two is 1 second, and the gap between the second and third is 2 hours. Is it not more likely for you to arrive in the larger gap? You could be lucky and arrive right inside the 1 second gap, but compared to the 2 hours, not really very likely!) So the point is that the gap will seem larger in the "time average" because we are more likely to arrive in the larger gaps. Let us now find the probability that the second observer, arriving at some random time, will be in a gap of length ∆. This probability should be proportional to ∆: the larger the gap, the larger the probability of arriving in it. If a gap lasts 10 years, then we are very likely to arrive in it compared to another one lasting 1 minute. But obtaining a 10 year gap in the first place is exponentially unlikely.
So we sense the battle between two opposing forces: the probability of arriving in a gap $\Delta$ increases with the gap size, but, at the same time, it becomes less and less likely to get a gap of a large size (in other words, we know that the cars arrive, on average, every 5 minutes, so a gap of 10 years should never be observed in practice. Of course, if $\tau$ were on the order of 10 years it would be likely to get this gap size, but that is a different problem). So the probability is the product of the probability of arriving in $\Delta$ and that of having $\Delta$ be a possibility in the first place. In other words, the probability is found by answering: what is the probability that the interval $\Delta$ exists AND that we arrive in this interval (the "and" hints that this is the product rule of probability theory, since they are independent events). The answer is thus:
$$\rho^\text{time}_\text{car}(\Delta) \propto \frac{\Delta}{\tau}\exp(-\Delta/\tau).$$
It must be normalized (call the normalization constant $A$):
$$1 = \int_{\Delta=0}^{\Delta=\infty} \rho^\text{time}_\text{car}(\Delta)\,d\Delta = A\int_{\Delta=0}^{\Delta=\infty} \frac{\Delta}{\tau}\exp(-\Delta/\tau)\,d\Delta = A\tau.$$
So we see that $A = 1/\tau$, and thus the distribution is:
$$\rho^\text{time}_\text{car}(\Delta) = \frac{\Delta}{\tau^2}\exp(-\Delta/\tau).$$
What is now the mean gap size?
$$\int_{\Delta=0}^{\Delta=\infty} d\Delta\,\Delta\,\rho^\text{time}_\text{car}(\Delta) = \int_{\Delta=0}^{\Delta=\infty} d\Delta\,\frac{\Delta^2}{\tau^2}\exp(-\Delta/\tau) = \tau\int_{z=0}^{z=\infty} dz\,z^2\exp(-z) = 2\tau = 10 \text{ minutes}.$$
So, indeed, when arriving at some random time the time average of the gaps between cars is 10 minutes, not 5. This is consistent with what we found earlier, namely that the mean time to the next car is 5 minutes, and that to the previous car is also 5 minutes, so a total of 10 minutes should be the average gap (when arriving at any random time).

Problem 1.4 (Stirling's approximation):
This is how to convert a sum into an integral: $\int dk \leftrightarrow \sum_k \delta k$, where $\delta k$ is the distance on the $x$-axis between measurements of the function. In our case, we are summing over the integers. So the distance from one integer to the next (say, from 3 to 4) is just 1, and thus $\delta k = 1$. But the interval is centered on the integers.
So, e.g., the first interval is centered at 1 (the first index in the sum) and thus goes from 0.5 to 1.5. The next interval, centered at 2, goes from 1.5 to 2.5, and so on. The final interval is centered on $n$ and goes from $n - 0.5$ to $n + 0.5$. Thus, the integral extends from 0.5 to $n + 0.5$ (and not 1 to $n$). Therefore, we have:
$$\sum_{k=1}^{n}\log(k) \leftrightarrow \int_{k=\frac{1}{2}}^{k=n+\frac{1}{2}} dk\,\log(k) = \left(n+\frac{1}{2}\right)\log\left(n+\frac{1}{2}\right) - \left(n+\frac{1}{2}\right) - \frac{1}{2}\log\frac{1}{2} + \frac{1}{2} = \left(n+\frac{1}{2}\right)\log\left(n+\frac{1}{2}\right) - n - \frac{1}{2}\log\frac{1}{2}, \quad (1)$$
which is what we had to show first. Now let us show the difference between this and the following ($n! \approx (n/e)^n\sqrt{2\pi n}$ in log form):
$$\log(n!) \approx n\log(n) - n\log(e) + \frac{1}{2}\log(2\pi) + \frac{1}{2}\log(n). \quad (2)$$
First, let us re-write Eq. (1) as:
$$n\log(n) + n\log\left(1+\frac{1}{2n}\right) + \frac{1}{2}\log(n) + \frac{1}{2}\log\left(1+\frac{1}{2n}\right) - n - \frac{1}{2}\log\frac{1}{2} \;\to\; n\log(n) + \frac{1}{2} + \frac{1}{2}\log(n) + \frac{1}{4n} - n - \frac{1}{2}\log\frac{1}{2},$$
where, in the second expression, we used that $\log(1 + 1/2n) \sim 1/2n$ (up to terms of order $1/n^2$). Then, we compare each term above to the terms in Eq. (2) (remember we are taking the difference between Eq. (1) and Eq. (2)). First, see that the terms $n\log(n)$ cancel. Next, since $\log(e) = 1$ we see that the terms with $n$ cancel as well. Finally, the terms $\frac{1}{2}\log(n)$ cancel as well and, of course, the $1/4n$ goes to zero. This leaves the following expression for the difference "Eq. (1) $-$ Eq. (2)":
$$\text{Eq. (1)} - \text{Eq. (2)} \to \frac{1}{2} + \frac{1}{2}\log(2) - \frac{1}{2}\log(2\pi) = \frac{1}{2}\left(1 - \log(\pi)\right),$$
which is indeed a constant. So, as $n$ approaches infinity the difference between the two expressions approaches a constant value. Thus, they are compatible. The constant does not have to be zero as they are two different approximations, but they had better not diverge from each other or else one must be wrong. Now let us show that the expression $(2\pi/(n+1))^{1/2}\exp(-(n+1))(n+1)^{n+1}$ is equivalent to the latter expression in the book, that is, Eq. (2) here.
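Before that, the constant difference just derived can be checked numerically; a small sketch (the helper name is mine, and the two expressions are exactly Eqs. (1) and (2) above):

```python
import math

def eq1_minus_eq2(n):
    # Eq. (1): the midpoint-rule integral for sum_{k=1}^n log(k).
    eq1 = (n + 0.5) * math.log(n + 0.5) - n - 0.5 * math.log(0.5)
    # Eq. (2): log of Stirling's formula n! ~ (n/e)^n sqrt(2 pi n).
    eq2 = n * math.log(n) - n + 0.5 * math.log(2 * math.pi) + 0.5 * math.log(n)
    return eq1 - eq2
```

As $n$ grows, the returned difference settles at $(1 - \log\pi)/2 \approx -0.0724$, confirming that the two approximations differ by a constant rather than diverging.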
Let us do this simply by writing out this new expression in log form:
$$\frac{1}{2}\log(2\pi) - \frac{1}{2}\log(n+1) - (n+1) + (n+1)\log(n+1) \to \frac{1}{2}\log(2\pi) - \frac{1}{2}\log(n) - (n+1) + n\log(n) + 1 + \log(n) = \frac{1}{2}\log(2\pi) - n + n\log(n) + \frac{1}{2}\log(n). \quad (3)$$
Again, using $\log(e) = 1$ in Eq. (2), we can match Eq. (2) and Eq. (3) term by term. Thus, upon taking their difference it becomes zero. This means that they are compatible as well.

Problem 1.5 (Stirling and asymptotic series):
a) Let us be given $\Gamma(z)$ as described in the problem text. What is special about the negative integers compared to the positive ones? Well, first, let us say we want to compute $\Gamma(2)$ (that is, $z = 1$); then we do: $\Gamma(2) = 1 \times \Gamma(1) = 1$. It terminates at $\Gamma(1)$ because that is one (it is the normalization). So, if we pick any positive integer, eventually we reduce the situation to computing $\Gamma(1)$. Now, let us try a negative integer, e.g.:
$$\Gamma(-1) = -2 \times \Gamma(-2) = -2 \times (-3 \times \Gamma(-3)) = \cdots,$$
so we see that, for any negative integer $m$, the value of the factorial function becomes a product over all negative integers from $m$ to negative infinity. There is no stopping point anymore like $\Gamma(1)$ before. This leaves a singularity at all negative integers.
b) The singularities were at all negative integers when expressed in $z$. Now, with the change of variable $\zeta = 1/z$ the poles are at $-1$, $-1/2$, $-1/3$, and so on. The series, expressed in $\zeta$, is an expansion about the origin $\zeta = 0$ in the complex plane (because that corresponds to $z \to \infty$). Thus, for larger and larger negative integers we approach zero more and more (e.g., the pole at $-1/1000$ is close to zero). Therefore, the radius of convergence, measured from the origin, is zero.
c) Let us now show this explicitly. It turns out that the odd coefficients grow asymptotically as:
$$A_{2j+1} \sim (-1)^j \frac{2(2j)!}{(2\pi)^{2(j+1)}}.$$
I will use the equal sign from now on, but with the understanding that it holds asymptotically.
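As a quick aside, the behavior from part a) is easy to see in code (a sketch; the helper name is mine): for positive integers the recursion $\Gamma(z) = (z-1)\Gamma(z-1)$ terminates at $\Gamma(1) = 1$, while `math.gamma` refuses the non-positive integers, reflecting the poles there.

```python
import math

def positive_integer_gamma(n):
    # Part a): for z a positive integer, repeatedly applying
    # Gamma(z) = (z - 1) * Gamma(z - 1) terminates at Gamma(1) = 1,
    # giving Gamma(n) = (n - 1)!.
    value, z = 1.0, n
    while z > 1:
        value *= (z - 1)
        z -= 1
    return value
```

For instance, `positive_integer_gamma(5)` gives `24.0`, agreeing with `math.gamma(5)`; calling `math.gamma(-1.0)` raises a `ValueError`, since there is no stopping point for the recursion at the negative integers.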
As is mentioned in the footnote, the radius of convergence is
$$\sqrt{\left|\frac{A_{2j-1}}{A_{2j+1}}\right|}.$$
To get $A_{2j-1}$ we can just let $k = j - 1$; then
$$A_{2k+1} = (-1)^k \frac{2(2k)!}{(2\pi)^{2(k+1)}} \;\Rightarrow\; A_{2j-1} = (-1)^{j-1} \frac{2(2j-2)!}{(2\pi)^{2j}}.$$
The radius of convergence is then:
$$\sqrt{\left|\frac{A_{2j-1}}{A_{2j+1}}\right|} = \sqrt{\frac{2(2j-2)!/(2\pi)^{2j}}{2(2j)!/(2\pi)^{2j+2}}} = \sqrt{\frac{(2\pi)^2}{2j(2j-1)}} \to \frac{\pi}{j} \to 0 \text{ as } j \to \infty.$$
So we see that the radius of convergence does indeed go to zero as we include more and more terms for some fixed $z$.
d) We can compute $0!$ by computing $\Gamma(1)$ since $\Gamma(1) = (1-1)! = 0!$. Thus, we are considering $z = 1$ in the series expansion. This means that Eq. 1.3 from the problem text becomes:
$$\sqrt{2\pi}\,\exp(-1)\left(1 + \frac{1}{12} + \frac{1}{288} - \frac{139}{51840} - \frac{571}{2488320} + \cdots\right).$$
We can include more and more terms. Starting with just "1" (which I call $A_1$), then including "1" and "1/12" (called $A_2$), and so on, we get:
$$A_1 \to 0.92214,\quad A_2 \to 0.99898,\quad A_3 \to 1.00218,\quad A_4 \to 0.99971,\quad A_5 \to 0.99950.$$
Considering that we expand about infinity ($z \to \infty$) the series is doing really well at $z = 1$ ($0!$ is one, so 0.9995 is not so bad for only 5 terms).

Problem 1.6 (Random matrix theory):
a) Okay, let us generate some matrices. I am going to write programs in Python and make them available along with this document. From running this program I do indeed find a repulsion at $\lambda = 0$ ($\lambda$ is the eigenvalue splitting). In other words, it seems like the difference between eigenvalues of these random(!) matrices cannot be zero. This is pretty surprising since there is no correlation between the matrix entries at all.
b) In the following we will be working in the $N = 2$ GOE ensemble (matrices of size $2 \times 2$). We will not do anything analytically yet; this comes later. For now, let us just see where this level repulsion comes from. Now, a GOE matrix is formed from Gaussian random variables.
Indeed, the process is ($X_i$'s are Gaussian random variables with zero mean and unit standard deviation):
$$M = \begin{pmatrix} X_1 & X_2 \\ X_3 & X_4 \end{pmatrix}.$$
Then, we form the GOE matrix by adding $M$ to its transpose:
$$M_\text{GOE} = \begin{pmatrix} 2X_1 & X_2 + X_3 \\ X_3 + X_2 & 2X_4 \end{pmatrix} = \begin{pmatrix} a & b \\ b & c \end{pmatrix}.$$
The eigenvalues $\gamma$ of the GOE matrix are found from:
$$\begin{vmatrix} a - \gamma & b \\ b & c - \gamma \end{vmatrix} = (a-\gamma)(c-\gamma) - b^2 = 0 \;\Rightarrow\; \gamma^2 - (a+c)\gamma + (ac - b^2) = 0.$$
The equation has two solutions, the two eigenvalues. We will denote one eigenvalue $+$ and the other $-$. It is a simple quadratic equation, so we get:
$$\gamma_\pm = \frac{(a+c) \pm \sqrt{(a+c)^2 - 4 \times 1 \times (ac - b^2)}}{2 \times 1} = \frac{a+c}{2} \pm \frac{1}{2}\sqrt{a^2 + c^2 + 2ac - 4ac + 4b^2} = \frac{a+c}{2} \pm \frac{1}{2}\sqrt{(c-a)^2 + 4b^2}.$$
If we define $d = (c-a)/2$ we get:
$$\gamma_\pm = \frac{a+c}{2} \pm \sqrt{d^2 + b^2}.$$
Now, take the difference between the eigenvalues to form the splitting. The first term is the same for both eigenvalues so it cancels:
$$\lambda \equiv \gamma_+ - \gamma_- = 2\sqrt{d^2 + b^2}.$$
Thus, we see that the trace (the sum of the diagonal elements) is irrelevant. What matters is the difference between the diagonal elements. Eventually what we seek in this exercise is a probability density over $\lambda$. But first, we must understand how $\lambda$ is formed. We see from the result above that it depends on $b$ and $d$. Since $b$ is formed by adding two Gaussian random variables it is still a Gaussian random variable with mean zero. The standard deviation changes, but call it $\sigma_b$ for now. The same goes for $d$. It is formed as the difference between two Gaussian random variables, which is still a Gaussian random variable with mean zero but changed standard deviation. Dividing it by two does not change the mean from zero but further alters the standard deviation. In any event we shall call this $\sigma_d$. This all helps us to get the form of the probability density of generating both $b$ and $d$:
$$\rho_M(d, b) \propto \exp\left(-\frac{1}{2}\left(\frac{b^2}{\sigma_b^2} + \frac{d^2}{\sigma_d^2}\right)\right).$$
We want a distribution over $\lambda$. But we found earlier that $(\lambda/2)^2 = b^2 + d^2$, where the left side tells us that the radius is $\lambda/2$.
Thus, the eigenvalue splittings form circles in the $(b, d)$ plane (that is, a particular eigenvalue splitting $\lambda$ lives on the circumference of a circle in the $(b, d)$ plane). This should motivate us to switch to polar coordinates, where the radius alone will then tell us about the splitting (the angle will be irrelevant due to the isotropy). The radius $r$ is $\lambda/2$. So we ask: the pair $(d, b)$ gives us a particular eigenvalue splitting $\lambda$, but how does this splitting change as we change $b$ and $d$? In particular we will be interested in knowing what happens as both $b$ and $d$ approach zero. So think of the Cartesian $(b, d)$ plane. As we move around in this plane the eigenvalue splitting changes. Now choose a particular point in this plane. We ask: how does moving a tiny bit along $b$ and a tiny bit along $d$ (thus forming a tiny little area) alter the eigenvalue splitting (or, how does this tiny little region transform to changes in eigenvalue splittings)? The answer will allow us to let $(b, d)$ go to zero and see what happens. The probability of obtaining the area in the $(b, d)$ plane is $\rho_M(d, b)\,db\,dd$, so how does that translate into the probability $\rho(\lambda)\,d\lambda$ of level splittings from $\lambda$ to $\lambda + d\lambda$? (We don't care about the angular part in the final result, but there should be a $d\theta$ term with $\theta$ being the angle from the $b$-axis to the radius vector.) The answer is in the Jacobian, which is simply the radius (going from Cartesian to polar coordinates is standard; this Jacobian is well-known), and thus:
$$\rho(\lambda)\,d\lambda \propto \lambda\,\exp\left(-\frac{1}{2}\left(\frac{b^2}{\sigma_b^2} + \frac{d^2}{\sigma_d^2}\right)\right)d\lambda.$$
This indeed shows us that the probability density of the eigenvalue splitting vanishes as $\lambda \to 0$ (because the Gaussian stays finite for all $\lambda$ but the linear factor of $\lambda$ multiplying it vanishes).
c) Let us compute analytically the standard deviations of the diagonal and off-diagonal elements. In particular, I ask: what is the probability of generating random Gaussian numbers so as to obtain a symmetric matrix with diagonal elements $a$ and $c$ and off-diagonal element $b$?
Again, to create a GOE matrix we start with a matrix with Gaussian random numbers as elements:
$$M = \begin{pmatrix} X_1 & X_2 \\ X_3 & X_4 \end{pmatrix}.$$
Then, we form the GOE matrix by adding $M$ to its transpose:
$$M_\text{GOE} = \begin{pmatrix} 2X_1 & X_2 + X_3 \\ X_3 + X_2 & 2X_4 \end{pmatrix} = \begin{pmatrix} a & b \\ b & c \end{pmatrix}.$$
First, the $X_i$'s follow a Gaussian with mean zero and standard deviation of one. Let us start with $b$. It was formed as $b = X_2 + X_3$. Adding two Gaussians, each of mean zero, leaves a Gaussian still of mean zero, but the standard deviation changes as:
$$\sigma_b = \sqrt{\sigma_{X_2}^2 + \sigma_{X_3}^2} = \sqrt{2}.$$
Now, $a = 2X_1$, and hence it is a mean-zero Gaussian but its standard deviation changes due to the multiplication by 2 as follows: $\sigma_a = 2\sigma_{X_1} = 2$. Same thing for $c$: $\sigma_c = 2\sigma_{X_4} = 2$. Let us now show that $\sigma_b = \sigma_d$. $d$ was formed as $d = (c - a)/2$. Let us consider what happens just by doing the transformation $k = c - a$. Forming the difference of $a$ and $c$ leaves a Gaussian of mean zero, but the standard deviation changes:
$$\sigma_k = \sqrt{\sigma_c^2 + \sigma_a^2} = \sqrt{4 + 4} = 2\sqrt{2}.$$
Now, finally, we want to form $d$ from $k$ by dividing by 2. The mean stays at zero, but the standard deviation halves:
$$\sigma_d = \frac{1}{2}\sigma_k = \frac{2\sqrt{2}}{2} = \sqrt{2}.$$
And thus we see that $\sigma_b = \sigma_d = \sqrt{2}$. QED. Let us plot $H_{11}$ and $H_{12}$ from the $N = 2$ GOE ensemble. We know that $H_{11}$ should follow a Gaussian of standard deviation 2 and $H_{12}$ a Gaussian of standard deviation $\sqrt{2}$.
d) We will start off with the distribution we found in part b). We use $\sigma_b = \sigma_d = \sqrt{2}$ and that $\lambda^2/4 = b^2 + d^2$:
$$\rho(\lambda)\,d\lambda \propto \lambda\,\exp\left(-\frac{1}{4}\left(b^2 + d^2\right)\right)d\lambda = \lambda\,\exp\left(-\frac{1}{16}\lambda^2\right)d\lambda.$$
We can normalize the probability density and get:
$$\rho(\lambda) = \frac{\lambda}{8}\exp\left(-\frac{1}{16}\lambda^2\right).$$
This is the probability distribution of eigenvalue spacings $\lambda$. Now let us rescale this to have a mean of one. First, let us compute the mean in its current form:
$$\langle\lambda\rangle = \int_{\lambda=0}^{\lambda=\infty} d\lambda\,\lambda\,\rho(\lambda) = \int_{\lambda=0}^{\lambda=\infty} d\lambda\,\frac{\lambda^2}{8}\exp\left(-\frac{1}{16}\lambda^2\right) = 2\sqrt{\pi}.$$
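The standard deviations and the mean splitting derived here can all be checked by sampling; a sketch (the function name and sample size are mine), which also cross-checks the splitting formula $\lambda = 2\sqrt{d^2 + b^2}$ against numerically computed eigenvalues:

```python
import numpy as np

def goe_numeric_check(n=200000, seed=3):
    # Sample n 2x2 GOE matrices H = M + M^T, M with independent
    # standard-normal entries; estimate the stds of a = H11, b = H12,
    # d = (c - a)/2, and the mean eigenvalue splitting <lambda>.
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, 2, 2))
    H = M + np.transpose(M, (0, 2, 1))
    a, b, c = H[:, 0, 0], H[:, 0, 1], H[:, 1, 1]
    d = (c - a) / 2.0
    lam = 2.0 * np.sqrt(d * d + b * b)  # splitting formula from the text
    # cross-check against eigenvalues computed numerically (batched):
    ev = np.linalg.eigvalsh(H)  # shape (n, 2), ascending
    assert np.allclose(ev[:, 1] - ev[:, 0], lam)
    return a.std(), b.std(), d.std(), lam.mean()
```

The four returned numbers come out near $2$, $\sqrt{2}$, $\sqrt{2}$, and $2\sqrt{\pi} \approx 3.545$, matching the analytic results above.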
So, let us work on the equation with the goal of defining a new variable $s = \lambda/(2\sqrt{\pi})$:
$$1 = \frac{\langle\lambda\rangle}{2\sqrt{\pi}} = \int_{\lambda=0}^{\lambda=\infty} \frac{\lambda}{2\sqrt{\pi}}\,\frac{\lambda}{8}\exp\left(-\frac{1}{16}\lambda^2\right)d\lambda \;\Rightarrow\; \int_{s=0}^{s=\infty} ds\,s\,\frac{\pi s}{2}\exp\left(-\frac{\pi}{4}s^2\right) = 1.$$
So we have scaled the variable from $\lambda$ to $s$ to make sure the mean is one. The distribution of eigenvalue spacings which has a mean of 1 is the Wigner surmise, given by:
$$\rho_\text{Wigner}(s) = \frac{\pi s}{2}\exp\left(-\frac{\pi s^2}{4}\right).$$
e) I implemented the plots in the Python program. It does indeed seem like the distribution of eigenvalue splittings does not depend on $N$ (the plots look similar no matter the size of the matrices making up the GOE ensemble).
f) For $N = 2$ the distribution seems different from what we have seen before. For $N = 4$ it starts looking like the GOE distribution (so yes, it does start looking more universal). We do indeed notice a spike at zero, as the footnote says. This is because of equal columns (which is fairly likely at low $N$ but vanishes rapidly with larger $N$). For $N = 10$ it resembles the GOE distribution a lot. Plotting the Wigner surmise is seen to fit the histogram well. So this does provide more evidence that the eigenvalue distribution is universal!
g) First, let us show that the trace of $HH^T$ is the sum of squares of all elements of $H$. I will do this using index notation. A general matrix $A$ in index notation is written $A_{ij}$. The product of two matrices is $(AB)_{ik} = A_{ij}B_{jk}$ (summation over the matched inner index). The trace of a product $AB$ is $A_{ij}B_{ji}$ (notice the match of the outer indices as well). The transpose is $(A^T)_{ij} = A_{ji}$. With these rules we have:
$$\operatorname{Tr}\left(HH^T\right) = H_{ij}(H^T)_{ji} = H_{ij}H_{ij} = \sum_{ij}H_{ij}^2.$$
Now let us show that the trace stays invariant under orthogonal coordinate transformations ($H \to R^THR$, so $H^T \to R^TH^TR$). Notice that $R^TR = RR^T = 1$:
$$\operatorname{Tr}\left(H^TH\right) \to \operatorname{Tr}\left((R^TH^TR)(R^THR)\right) = \operatorname{Tr}\left(R^TH^THR\right) = \operatorname{Tr}\left(RR^TH^TH\right) = \operatorname{Tr}\left(H^TH\right),$$
where we used the cyclic invariance of the trace.
h) The probability density of generating a GOE matrix $H$ is similar to what we have done before.
We use that $\sigma_a = \sigma_c = 2$ and that $\sigma_b = \sqrt{2}$. Then, we need to generate three numbers to create the matrix (3 and not 4, since it is symmetric). Each number is independent of the others, so the probability breaks into a product of probabilities (the "and" rule):
$$\rho(H) \propto \exp\left(-\frac{a^2}{2\sigma_a^2}\right)\exp\left(-\frac{c^2}{2\sigma_c^2}\right)\exp\left(-\frac{b^2}{2\sigma_b^2}\right) = \exp\left(-\frac{1}{2}\left(\frac{a^2}{4} + \frac{c^2}{4} + \frac{b^2}{2}\right)\right) = \exp\left(-\frac{1}{8}\left(a^2 + c^2 + 2b^2\right)\right) = \exp\left(-\frac{1}{8}\operatorname{Tr}(H^TH)\right),$$
where we used that the trace of $H^TH$ is the sum of squares of all elements of $H$, and since $H_{12} = H_{21} = b$ we get twice $b^2$ in the expression for the trace. This expression is invariant to orthogonal transformations, that is, $\rho(H) \to \rho(R^THR)$ holds, due to our result from part g). The point is that the original matrices in the ensemble did not have rotational symmetries, but the ensemble does start showing symmetries as $N \to \infty$. This is an emergent symmetry of the system. The same thing happens in random walks on a lattice (take enough steps and the shape starts showing symmetries that aren't there for the detailed valid steps). In particular, consider a square lattice. Only steps along the horizontal or vertical are allowed. But as we take enough steps, suddenly rotational symmetry (for any angle) develops in the pattern of steps taken. But the square lattice has no rotational symmetry for arbitrary angles (only specific ones, multiples of 90 degrees), so where does this symmetry come from? It emerges on the macro-scale. So many little (non-symmetric, or at least lower-symmetry) microscopic phenomena can add up to some macroscopic behavior with symmetries (or at least more symmetries).

Problem 1.7 (Six Degrees of Separation):
a) I wrote a Python class called Network with the given functions. There is one attribute of the class: a dictionary which holds nodes as keys, where each key/node has as its value a list of the nodes it connects to. Thus, the edges are essentially stored as key-value pairs.
My implementation is the following:

import numpy as np
import matplotlib.pyplot as plt

class Network:
    def __init__(self):
        self.neighbor_dict = {}

    def AddNode(self, node):
        if not self.HasNode(node):
            self.neighbor_dict[node] = []

    def AddEdge(self, node1, node2):
        if node1 == node2:
            return  # a node is connected to itself already
        self.AddNode(node1)
        self.AddNode(node2)
        d = self.neighbor_dict
        if not node2 in d[node1]:
            d[node1].append(node2)
        if not node1 in d[node2]:
            d[node2].append(node1)

    def HasNode(self, node):
        return (node in self.neighbor_dict)

    def GetNodes(self):
        # return nodes as a list (keys does this)
        return self.neighbor_dict.keys()

    def GetNeighbors(self, node):
        # the neighbors are stored as a list,
        # so simply return a copy of them:
        return self.neighbor_dict[node][:]

    def __str__(self):
        return str(self.neighbor_dict)

It is pretty much self-explanatory, also in conjunction with the problem text. I think the most non-trivial part is how to store the nodes and the edges. One could also have used an adjacency matrix, but for large systems the dictionary approach is sparse: we only store what we need, nothing more, nothing less. The matrix approach would have a whole bunch of zeros in it (for large L and small Z and p, which is typical). Using the above class (from part a)) I wrote a function which constructs a small world network. All it does is: add all the short edges, then add the random edges. The short edges are just added by going to each node in the graph (represented as a key in the dictionary, remember) and adding that node's neighbors as values to that key entry (by appending: the values are lists, so we just append each neighbor to the list associated with the given node/key).
The random edges are added just by choosing two nodes in the network at random and joining them with an edge. The code is:

def AddShortEdges(g, L, Z):
    for node in xrange(0, L):
        for edge in xrange(-int(Z / 2.), int(Z / 2.)):
            if edge == 0:
                continue
            g.AddEdge(node, (node + edge) % L)  # periodic bc's
    return g

def AddRandomEdges(g, L, Z, p):
    existingNodes = g.GetNodes()
    for i in xrange(int(np.ceil(L * Z * p * 0.5))):
        nodes = np.random.choice(existingNodes, size=2, replace=True)
        g.AddEdge(nodes[0], nodes[1])
    return g

# Routine to construct a small world network
# based on the Network class:
def ConstructSmallWorld(L=10, Z=4, p=0.):
    assert L > Z and L > 0 and Z > 0 and p >= 0.
    g = Network()
    g = AddShortEdges(g, L, Z)
    g = AddRandomEdges(g, L, Z, p)
    return g

So, e.g., to construct a small world network we just call:

if __name__ == '__main__':
    # construct a small world:
    L, Z, p = 1000, 2, 0.02
    g = ConstructSmallWorld(L, Z, p)

This worked. Then I downloaded and imported Prof. Sethna's plotting tool for these networks. I simply extended the above code to:

if __name__ == '__main__':
    # construct a small world:
    L, Z, p = 1000, 2, 0.02
    g = ConstructSmallWorld(L, Z, p)

    import NetGraphics as ng
    ng.DisplayCircleGraph(g)

and a nice graph comes up, similar to the one in the book. I needed to install PIL for Python, which I simply did using "port" (I am on a MacBook): sudo port install py26-pil. Also, to run my code above you need numpy and matplotlib; these can similarly be obtained from "port".
b) (1) I implemented the FindPathLengthsFromNode(graph, node) function, following the algorithm proposed in the book.
I used a dictionary to do this: each key is a node in the graph (different from the incoming node) and the corresponding value is the distance from the incoming node to that particular node in the graph. This is the code I got (I am not implementing error checking on the input):

def FindPathLengthsFromNode(graph, node):
    # makes little sense if node is not in graph:
    if not graph.HasNode(node):
        return
    l = 0  # distance to self is zero
    currentShell = [node]
    distances = {}
    while 1:
        nextShell = []
        for cur_node in currentShell:
            for cur_neigh in graph.GetNeighbors(cur_node):
                # distance from node to cur_neigh stored already?
                if (not cur_neigh in distances) and (cur_neigh != node):
                    # no, so we should investigate this new node "cur_neigh"
                    nextShell.append(cur_neigh)
                    distances[cur_neigh] = l + 1
        l += 1
        currentShell = nextShell[:]
        if len(currentShell) == 0:
            # we have seen all nodes in graph
            return distances

(2) I then implemented FindAllPathLengths(graph). Again I use a dictionary to store the distances as values, with keys being a string of the form "[0,4]", meaning: node pair 0 and 4. Using a "min" and "max" call I make sure that I don't store both [0,4] and [4,0] (the graph is undirected, so these should be the same). I then return the values of this dictionary at the end (because I need to return a list, not a dictionary). The code is:

def FindAllPathLengths(graph):
    dist = {}
    for node in graph.GetNodes():
        distance_dict = FindPathLengthsFromNode(graph, node)
        for neigh in distance_dict:
            key = '[%d,%d]' % (min(node, neigh), max(node, neigh))
            if not key in dist:
                dist[key] = distance_dict[neigh]
    return dist.values()

I did verify that the lengths are constant for 0 < l < L/2 (my program shows the histogram when you run it; try setting p = 0 and check for yourself).
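The shell-by-shell algorithm above can be exercised independently on a tiny graph; a self-contained sketch (my own minimal re-implementation over a plain dict of neighbor lists, not the Network class), here verified on a 6-node ring with Z = 2:

```python
from collections import deque

def bfs_path_lengths(neighbors, start):
    # Shell-by-shell (breadth-first) traversal over a dict mapping
    # node -> list of neighboring nodes, as in the algorithm above.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in neighbors[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    del dist[start]  # store distances to *other* nodes only, as in the text
    return dist

# A ring of 6 nodes with Z = 2 (each node tied to its two nearest neighbors):
ring = dict((i, [(i - 1) % 6, (i + 1) % 6]) for i in range(6))
```

On this ring, `bfs_path_lengths(ring, 0)` gives distances 1, 2, 3, 2, 1 to nodes 1 through 5, consistent with the constant-length check mentioned above for p = 0.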
The distribution of distances in the network looks like a Gaussian. For small p (0.02) the Gaussian is centered at around 40. For large p (0.2, i.e. more long bonds between far-apart nodes) it is centered around 12, and 6 is within 1-2 standard deviations. I needed around p = 0.75 to center the Gaussian at 6 (and thus obtain six degrees of separation). We might not need to center it exactly at 6 (e.g., we could require that 6 is within 1 standard deviation of the mean), so a lower p could be possible as well.
(3) FindAveragePathLength(graph) is straightforward to implement given that we can find all path lengths between all pairs of nodes. So we simply find all these lengths (we already wrote this function in part (2)) and return the mean:

def FindAveragePathLength(graph):
    nodes = graph.GetNodes()
    count, total = 0, 0.
    for node in nodes:
        distances = FindPathLengthsFromNode(graph, node)
        for node2, dist in distances.items():
            count += 1
            total += dist
    return total / count

And that is it. When I run this on the (L, Z, p) = (100, 2, 0.1) network 4 times I get these values: (8.86343434343, 9.27575757576, 9.26888888889, 10.0268686869). So there are indeed some fluctuations, but the value is around 10 (the mean is 9.36), as the problem text says, so things look fine. The standard deviation is 0.42. The number of long bonds in the system is roughly equal to the number of random bonds we add (why "roughly": well, a random bond does not have to be long per se, but it typically ends up being substantially longer than a short edge simply by chance). This amounts to pLZ/2 = 0.1 × 100 × 2/2 = 10 long bonds, or 10 %. With 10 bonds in a network of 100 nodes there should indeed be fluctuations in the distances. Of course, the larger the value of p, the more long bonds we have, and hence these fluctuations should decrease.
I tested this prediction by choosing p = 0.5 and I got: (4.56666666667, 4.7096969697, 4.73616161616, 4.75535353535). We see that these fluctuations are much smaller, the standard deviation being 0.074 (the mean is 4.69 and has decreased because of the increase in long bonds in the system connecting far-apart nodes).
c) Now we would like to plot the average path length between nodes, l(p), divided by l(0). We find a graph similar to ref. 142, Fig. 2 (from the textbook): at large p (many long bonds) the average distance is small, but it gets larger and larger as we decrease p (meaning: we get rid of more and more long bonds). It then goes to a constant of 1 at very low p. This is because we are getting rid of all the long bonds. What is left? Just the non-random short-edge bonds with fixed lengths (we only have short-edge bonds when p = 0, no random long bonds). Some discrepancies can be due to the randomness of the addition of random bonds in the network. I did notice that sometimes an edge is added which is either already there or connects a node to itself.
d), e), and f) I implemented this as part of my code. Another version of the code is online at Prof. Sethna's website (there are answers to exercise 1.7), free for download.

Problem 1.8 (Satisfactory Map Colorings):
a) Let us assume that each Boolean variable states whether a country has some color (e.g., $A_R$ means "country A is red"). The logical expression that A is colored with only a single color is this:
$$\neg(A_R \wedge A_G) \wedge \neg(A_R \wedge A_B) \wedge \neg(A_G \wedge A_B).$$
From left to right this states that: A is NOT both red AND green, AND A is NOT both red AND blue, AND A is NOT both green AND blue. Can it be all colors simultaneously, as in red, green, and blue? No, because this case can be broken into smaller pieces (it would have to be both red and green for this to happen, e.g.). This leaves just one color for A to have (remember we assume it has to be colored).
Next, that A and B cannot have the same color is represented as:
$$\neg(A_R \wedge B_R) \wedge \neg(A_G \wedge B_G) \wedge \neg(A_B \wedge B_B).$$
We see how the above forms conform to the hint given in the problem text: they are both a conjunction of three clauses, each involving two variables.
b) Writing out all the cases we get the following table:

X | Y | ¬(X ∧ Y)            | (¬X) ∨ (¬Y)
1 | 1 | ¬(1 ∧ 1) = ¬(1) = 0 | ¬(1) ∨ ¬(1) = 0 ∨ 0 = 0
1 | 0 | ¬(1 ∧ 0) = ¬(0) = 1 | ¬(1) ∨ ¬(0) = 0 ∨ 1 = 1
0 | 1 | ¬(0 ∧ 1) = ¬(0) = 1 | ¬(0) ∨ ¬(1) = 1 ∨ 0 = 1
0 | 0 | ¬(0 ∧ 0) = ¬(0) = 1 | ¬(0) ∨ ¬(0) = 1 ∨ 1 = 1

We see that columns 3 and 4 give the same output for all logical cases of X and Y, and thus they are indeed equivalent. Let us re-write the answer to part a) in conjunctive normal form (which, by the way, means: an AND of OR's). Starting with "A is colored a single color" we get (starting from the result in part a)):
$$\neg(A_R \wedge A_G) \wedge \neg(A_R \wedge A_B) \wedge \neg(A_G \wedge A_B).$$
Now, we know from above that $\neg(X \wedge Y) = (\neg X) \vee (\neg Y)$, so we can write:
$$\left[(\neg A_R) \vee (\neg A_G)\right] \wedge \left[(\neg A_R) \vee (\neg A_B)\right] \wedge \left[(\neg A_G) \vee (\neg A_B)\right],$$
which is in conjunctive normal form. Next, consider the statement that A and B are not the same color. In part a) we got:
$$\neg(A_R \wedge B_R) \wedge \neg(A_G \wedge B_G) \wedge \neg(A_B \wedge B_B),$$
which is straightforward to also change to conjunctive normal form (we just did this):
$$\left[(\neg A_R) \vee (\neg B_R)\right] \wedge \left[(\neg A_G) \vee (\neg B_G)\right] \wedge \left[(\neg A_B) \vee (\neg B_B)\right].$$
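The truth table and the coloring clauses can be verified mechanically; a small sketch (the function names are mine):

```python
from itertools import product

def demorgan(x, y):
    # Part b) identity: not(X and Y)  ==  (not X) or (not Y)
    return (not (x and y)) == ((not x) or (not y))

def single_color(ar, ag, ab):
    # Part a) clause: A is not colored with two colors at once.
    return (not (ar and ag)) and (not (ar and ab)) and (not (ag and ab))
```

`demorgan` returns True for all four Boolean assignments, reproducing the table; `single_color` is satisfied exactly when at most one of the three color variables is true (it only forbids pairs, so the "A must be colored at all" requirement is a separate assumption, as the text notes).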