Introduction to Markov Chains

Learning goal: Students see the tip of the iceberg of Markov chain theory.

Many situations can be modeled as a set of discrete states, where at fixed time intervals the system switches from one state to another with a fixed probability. Our people-moving-into-and-out-of-California system is one such. Another is the "gambler's ruin": two gamblers, with initial bankrolls of $A and $B, each bet a dollar on the flip of a coin. They keep this up until one gambler has all the other's money. What is the probability that each will win, and how long will it take, on average?

Such processes are called Markov processes or chains. We number the states from 1 to n (e.g. s₁ = "in California", s₂ = "outside CA"; or s₁ = gambler one has all the money, s₂ = gambler one has $(A + B − 1) and gambler two has $1, etc.). We define pᵢⱼ to be the probability of passing from state i to state j, and collect these into the transition matrix P = (pᵢⱼ). For the gambler's ruin, the matrix will look like

$$P = \begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0 \\
1/2 & 0 & 1/2 & 0 & \cdots & 0 \\
0 & 1/2 & 0 & 1/2 & \cdots & 0 \\
0 & 0 & 1/2 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1
\end{pmatrix}.$$

The usual situation also gives us a probability vector v holding the initial probability of being in each state. The standard is to make this a row vector. Then the probability vector after one time step is vP, the probability of moving from state i to state j in two steps is the ij-entry of P², and so on.

We ask several questions:

1. In the long term, what is the probability of being in each state? There may be absorbing states, from which you can't escape (one gambler has all the money, for example). In those cases:
2. If there are several absorbing states, what is the probability of ending in each?
3. On average, how long until we are absorbed?
4. On average, how many times do we visit any particular state before being absorbed?

To get started we notice several important things about the transition matrix:

1. All of its entries are non-negative, because they are probabilities.
2. Each row adds to one, since the probability is one that from state i you will go somewhere on your next turn!
3. P has an eigenvalue of one: since all rows add to one, the column vector (1, 1, …, 1) is an eigenvector with eigenvalue one.
4. No eigenvalue of P is larger than one in modulus. For let λ be an eigenvalue (which might be complex) with corresponding eigenvector x, and let xₖ be the component of x that is largest in modulus. Since Px = λx, we have

$$\lambda x_k = \sum_{j=1}^{n} p_{kj} x_j, \qquad \text{so} \qquad |\lambda|\,|x_k| = \Bigl|\sum_{j=1}^{n} p_{kj} x_j\Bigr| \le \sum_{j=1}^{n} p_{kj} |x_j| \le \sum_{j=1}^{n} p_{kj} |x_k| = |x_k|,$$

and therefore |λ| ≤ 1.

Here's where we break the theory into three parts. The first is where there is complete "mixing" of the states: there is some positive probability of getting from any state to any other, so that some positive power of the matrix P has all positive entries. Such a matrix is called "regular" or "primitive." The second case is where some state or states are "absorbing," in the sense that once you get there you can never get out; the row in P corresponding to such a state is just a one on the diagonal and zeroes elsewhere. The third case is everything else: "inaccessible" states, groups of states that can't reach each other, loops that absorb, and so on. These last cases can sometimes be analyzed by reducing to one of the former cases, but their complete analysis is much more complex.

The first case is also called the ergodic case. Let P be a positive matrix and v a nonzero non-negative vector (all entries ≥ 0). Then vP is positive. Similarly, if Pⁿ is positive, then so is every larger power.
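Before turning to the ergodic theory, here is a minimal numerical sketch of the machinery so far. It is not part of the original notes: it assumes Python with numpy, indexes the states by gambler one's bankroll (a relabeling of the ordering above), and picks $4 in play purely for illustration. It builds the gambler's ruin transition matrix, checks that each row adds to one, and evolves a probability row vector by v → vP.

```python
import numpy as np

def gamblers_ruin_P(total):
    """Transition matrix for gambler's ruin with a fair coin.

    State i means gambler one holds $i; states 0 and `total` are absorbing.
    """
    n = total + 1
    P = np.zeros((n, n))
    P[0, 0] = 1.0            # absorbing: gambler one is broke
    P[total, total] = 1.0    # absorbing: gambler one has all the money
    for i in range(1, total):
        P[i, i - 1] = 0.5    # lose this bet
        P[i, i + 1] = 0.5    # win this bet
    return P

P = gamblers_ruin_P(4)                    # $4 in play altogether
assert np.allclose(P.sum(axis=1), 1.0)    # each row adds to one

v = np.zeros(5)
v[2] = 1.0                                # start surely with $2
print(v @ np.linalg.matrix_power(P, 10))  # distribution after 10 steps
```

Raising P to higher and higher powers shows the probability piling up on the two absorbing states, which is exactly what the absorbing-chain analysis below quantifies.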
There is the following theorem, called the Perron-Frobenius Theorem: if A is a positive matrix, then A has a unique eigenvalue of largest modulus, which is real and positive and has algebraic multiplicity one. Furthermore, its eigenvector is positive, and no other eigenvector is non-negative.

In our case the Perron-Frobenius eigenvalue is λ = 1. We have already proven that |λ| ≤ 1 for every eigenvalue, so it remains to show that 1 is the only eigenvalue of modulus one. Suppose P had another eigenvalue of modulus one. Then some power Pⁿ would have an eigenvalue μ of modulus one with negative real part, and then for small ε > 0 the matrix Pⁿ − εI has the eigenvalue μ − ε, whose modulus is larger than one (since Re μ < 0, we get |μ − ε|² = 1 + 2ε|Re μ| + ε² > 1). But for ε small enough, Pⁿ − εI still has non-negative entries, and all of its rows add to 1 − ε, so by the argument above it cannot have an eigenvalue of modulus larger than one. This is a contradiction.

Nor can there be eigenvectors for λ = 1 other than multiples of (1, 1, …, 1). For if w were such an eigenvector, we could choose ε so that z = (1, 1, …, 1) − εw is non-negative except that one entry is zero. But then z would be an eigenvector with eigenvalue one, yet Pz has all positive entries while z has a zero entry, so z cannot be an eigenvector at all! Thus 1 is the only eigenvalue of modulus one, and it has geometric multiplicity one as well.

Could it have larger algebraic multiplicity? If it did, the upper triangular matrix obtained from P by Schur's lemma would look like the matrix on the left below, with a ≠ 0, and its kth power would look like the matrix on the right:

$$\begin{pmatrix}
1 & a & * & * \\
0 & 1 & * & * \\
0 & 0 & * & * \\
0 & 0 & 0 & *
\end{pmatrix}
\qquad\longrightarrow\qquad
\begin{pmatrix}
1 & ka & * & * \\
0 & 1 & * & * \\
0 & 0 & * & * \\
0 & 0 & 0 & *
\end{pmatrix}.$$

Since the entry ka grows without bound, while every entry of Pᵏ stays between 0 and 1 (and unitary conjugation preserves the norm of a matrix), this is impossible. (The general case of Perron-Frobenius is a bit harder to prove; we had the advantage that all rows added to the same thing!)

Take any matrix A and any unit vector v₀, and create the sequence vₙ₊₁ = Avₙ / ‖Avₙ‖. This produces a sequence of unit vectors (unless v₀ lies in the nullspace of some power of A). In most cases the sequence converges to an eigenvector of A for the largest-modulus eigenvalue: express v₀ as a linear combination of eigenvectors, and as we multiply by higher and higher powers of A, the components belonging to the smaller eigenvalues become insignificant. This is known as the power method for finding eigenvectors. In our case, since the largest eigenvalue (namely 1) has algebraic multiplicity one, the sequence converges. Here we apply the power method to Pᵀ, since the steady-state distribution is a row vector, i.e. a left eigenvector of P. If we start with a non-negative vector v₀, then Pᵀv₀ is positive, so the Perron-Frobenius eigenvector is positive.

What does all this mean? It means that if you start with any probability distribution and run the Markov process long enough, the distribution converges to the eigenvector for λ = 1, normalized to be a probability vector. Thus Pⁿ converges to a matrix all of whose rows are this eigenvector, and this eigenvector is the "steady state" probability of being in each state. Of course, it can most easily be found as a null vector of Pᵀ − I.
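To make the ergodic case concrete, here is a short sketch, again assuming Python with numpy, that finds the steady state of the California chain three ways: by the power method, via the eigenvector of Pᵀ for the eigenvalue 1 (a null vector of Pᵀ − I), and by watching the rows of Pⁿ converge. The transition probabilities are invented purely for illustration.

```python
import numpy as np

# A hypothetical ergodic two-state chain: state 1 = "in California",
# state 2 = "outside CA", with made-up transition probabilities.
P = np.array([[0.90, 0.10],
              [0.05, 0.95]])

# Power method on the distribution: iterate v -> vP.  Because the rows
# of P sum to one, v stays a probability vector automatically.
v = np.array([1.0, 0.0])           # any starting distribution works
for _ in range(500):
    v = v @ P
print("power method:", v)          # -> [1/3, 2/3]

# Steady state as the eigenvector of P^T with eigenvalue 1, i.e. a
# null vector of P^T - I, rescaled into a probability vector.
vals, vecs = np.linalg.eig(P.T)
w = np.real(vecs[:, np.argmax(np.real(vals))])
print("eigenvector :", w / w.sum())

# P^n converges to a matrix whose rows are all the steady state.
print(np.linalg.matrix_power(P, 200))
```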
Now what about absorbing Markov processes? The first thing to do is to re-order the states so that the absorbing states come last. Then the Markov matrix has the block form

$$P = \begin{pmatrix} Q & R \\ 0 & I \end{pmatrix}.$$

Here Q is a non-negative matrix giving the probabilities of transitioning from one transient state to another, R is a non-negative matrix giving the probabilities of transitioning from each transient state to each absorbing state, and I is the identity, because you never leave an absorbing state. It turns out Q will help us answer all our questions. First, the probability that, starting in transient state i, we are in transient state j after exactly n steps is the ij-entry of Qⁿ. That means the expected number of visits to state j, given that we started in state i, is the ij-entry of I + Q + Q² + Q³ + ⋯. But this infinite series converges! And it converges to N = (I − Q)⁻¹, called the fundamental matrix. The expected number of steps before being absorbed, starting in state i, is the ith entry of N1, where 1 is the column vector of all ones: each entry of N1 is simply the sum, over all transient states, of the expected number of visits to that state before absorption. Finally, the probability of being absorbed into absorbing state k, given that you start in transient state i, is the ik-entry of NR.
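Here is the gambler's ruin pushed through these formulas, again in Python/numpy and again with the illustrative $4 in play, so that the transient states are bankrolls $1, $2, $3 and the absorbing states are "broke" and "has everything."

```python
import numpy as np

# Canonical-form blocks for gambler's ruin with $4 in play.
Q = np.array([[0.0, 0.5, 0.0],     # transient -> transient
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
R = np.array([[0.5, 0.0],          # transient -> absorbing (broke, rich)
              [0.0, 0.0],
              [0.0, 0.5]])

N = np.linalg.inv(np.eye(3) - Q)   # fundamental matrix: expected visits
print(N @ np.ones(3))              # expected steps to absorption: [3. 4. 3.]
print(N @ R)                       # absorption probabilities by start state
```

Starting with $2 of the $4, the game lasts 4 bets on average and each gambler wins with probability 1/2; in general a fair game starting from $A against $B lasts A·B bets on average, and gambler one wins with probability A/(A + B).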
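As a final sanity check, the same two numbers can be estimated by simply running the chain many times. This simulation is not from the notes; it uses the same illustrative parameters as above.

```python
import random

def run_once(bankroll=2, total=4):
    """Play one fair gambler's ruin game; return (steps, gambler one won?)."""
    x, steps = bankroll, 0
    while 0 < x < total:
        x += 1 if random.random() < 0.5 else -1
        steps += 1
    return steps, x == total

trials = 100_000
games = [run_once() for _ in range(trials)]
print(sum(s for s, _ in games) / trials)  # ≈ 4.0, the middle entry of N·1
print(sum(w for _, w in games) / trials)  # ≈ 0.5, the matching entry of N·R
```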