MATH 4/5470: MARKOV CHAINS
WINFRIED JUST, OHIO UNIVERSITY
1. Basic examples
A stochastic process is a sequence ξt of random variables, where the index variable t stands for time. If t ∈ [0, ∞) or t ∈ R, then we talk about a continuous
stochastic process; if t ∈ {0, 1, 2, . . .} = N or t ∈ Z, then we talk about a discrete stochastic process. What makes stochastic processes different from merely considering
indexed sets of random variables is that there is some dependence between the
ξt ’s at different times t. In this lecture we will focus on only one kind of discrete
stochastic process, called a Markov Chain.
In this type of process, each variable can take values in a (usually finite) set S
of states. Let us look at some examples:
Example 1. Consider the time course of an infection in a population of P humans. At any given time, a person can be healthy (but susceptible to infection),
infected, or recovered. Let us assume that recovery confers permanent immunity to
subsequent infections. One can model the state ξt of the population at a given time
as a stochastic process, where ξt = (St , It , Rt ) and St , It , Rt stand for the number
of susceptible, infected, and recovered individuals at time t. Alternatively, if P is
large, one can model this situation with a system of ODE’s, where the variables
S(t), I(t), R(t) represent the proportions of susceptible, infected, and recovered individuals at a given time.
Example 2. A drunkard walks along a path of length n steps. At each time step,
he makes either one step forward or one step backward. At position 1.5 of the path,
there is a steep cliff; reaching position 1 is equivalent to falling off the cliff. At the
other end of the path, there is a haystack. If the drunkard reaches position n, he
will stumble into the haystack and fall asleep.
Example 3. Consider a DNA locus. Here the state space S = {a, c, g, t}. Assume
evolution of this locus proceeds in discrete steps (let us say: generations); at each
step the nucleotide at this locus may either be faithfully copied or it may be mutated
into another nucleotide (we ignore insertions and deletions here).
What do all the stochastic processes in these examples have in
common?
• We can conceptualize time t ∈ N as progressing in discrete increments.
• We can conveniently take the state space to be S = [n] = {1, 2, . . . , n}.
• At every step, the system may transition into another state (with a certain
probability).
• The transition probabilities at any given time can be expressed by a square
matrix M (t) = [mij (t)]1≤i,j≤n of state transition probabilities.
2. Properties of discrete finite-state stochastic processes
A stochastic process as above is said to have the Markov property, or to be a
Markov Chain, if the conditional distribution of ξt+1 given the history of the process
depends only on the current state ξt, not on the prior history.
• The process in Example 1 is a Markov Chain iff it takes at most one time
step to recover from the disease. If it takes exactly k time steps to recover,
this process can be modeled by a k-th order Markov Chain.
• The process in Example 2 is a Markov Chain iff the probability of making
a step forward does not depend on whether the previous step was made
forward or backward.
• The process in Example 3 is a Markov Chain iff we assume that mutation
probabilities do not depend on the prior history of mutations at this locus.
Most models of molecular evolution make this assumption.
A Markov Chain is said to be stationary or time-homogeneous if the transition
probability matrix does not depend on time, that is, M (t) = M for all times t.
• The process in Example 1 is stationary in the absence of medical interventions or behavior modifications, but is non-stationary if, for example, an
immunization program is initiated or if people start avoiding contact after
the outbreak of the disease.
• The process in Example 2 is stationary for a while, but as the person sobers
up the transition probability m21 may diminish.
• The process in Example 3 is presumably not stationary due to varying
evolutionary pressures over different periods of evolution. However, we
usually have no specific such information and we model evolution of a locus
as a stationary process in the absence of specific evidence that tells us
otherwise.
A state i of a Markov Chain is said to be an absorbing state iff mii = 1.
• The process in Example 1 has P + 1 absorbing states: (P, 0, 0), (P −
1, 0, 1), (P − 2, 0, 2), . . . , (1, 0, P − 1), (0, 0, P ).
• The process in Example 2 has two absorbing states: 1 and n.
• The process in Example 3 has no absorbing states.
A Markov Chain is said to be irreducible iff every state can eventually
be reached from every state with positive probability.
• The process in Example 1 is not irreducible. More generally, no nontrivial
Markov Chain with an absorbing state can be irreducible.
• Thus the process in Example 2 is not irreducible. However, if we replace
the cliff and the haystack with two vertical walls, the process becomes
irreducible.
• The process in Example 3 is usually assumed irreducible.
3. More on the SIR model
The model of Example 1 is often called the SIR model. Let us make several
simplifying assumptions:
(1) Infected individuals become infective immediately and will recover with
certainty after exactly one time step.
(2) During any time step, there is a fixed number N of interactions between
any two individuals in the population. Each interaction between an infected
and a susceptible individual may result in a transmission of the disease with
probability α, and these events are independent. Let β = αN .
Under these assumptions, one can treat the total number ηt of transmissions
to one susceptible individual during a given time interval from t to t + 1 as a
Poisson random variable with parameter λ = It N α = It β, at least as long as λ has a
moderate value. I wrote “transmissions” here because this count includes multiple
transmissions, of which only the first will cause an actual infection. But note that
this point of view leads to the formula P(ηt = 0) = e^(-λ) = e^(-It β). This is the probability
that a given susceptible individual will not be infected at time step t + 1.
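This escape probability can be put to work in a small simulation. The following is a minimal sketch under the assumptions above; the function name sir_step, the population size 100, and the value β = 0.03 are hypothetical illustrative choices of mine, not from the text.

```python
import math
import random

def sir_step(S, I, R, beta, rng):
    """One step of the stochastic SIR chain sketched above.

    Each susceptible escapes infection with probability exp(-I*beta),
    so the number of new infections is Binomial(S, 1 - exp(-I*beta));
    all currently infected individuals recover after this one step.
    """
    p_infect = 1.0 - math.exp(-I * beta)
    new_infections = sum(1 for _ in range(S) if rng.random() < p_infect)
    return S - new_infections, new_infections, R + I

# A short sample trajectory for a population of P = 100:
rng = random.Random(1)
state = (99, 1, 0)  # (S_0, I_0, R_0)
for t in range(5):
    state = sir_step(*state, 0.03, rng)
```

Note that S + I + R stays equal to P in every step, exactly as in the state space described above.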
Now let us consider a given state (St , It , Rt ). This can transition into any of the
states
(St , 0, Rt + It ), (St − 1, 1, Rt + It ), . . . , (0, St , Rt + It ).
For simplicity of notation, assume that state (St , It , Rt ) has number i, and state
(St − k, k, Rt + It ) has number i + k + 1.
Exercise 1. In the notation introduced above, find a formula for mi,i+k+1 .
Modeling the SIR dynamics with a Markov Chain becomes impractical when P
is large. In this case, we can consider continuous variables S(t), I(t), R(t) ∈ [0, 1]
that represent the proportions of susceptible, infected, and recovered individuals
at a given time t ∈ [0, ∞). Let us consider a very small time interval (t, t + dt] so
that the possibility of multiple “infections” of a given susceptible in this interval
is negligible. Then we can assume that for some constant γ > 0 we will see about
γS(t)I(t) dt new infections in the interval (t, t + dt]. For simplicity, let us also
assume that an infected individual may spontaneously recover, at a fixed per capita
rate δ > 0, during any given time interval, no matter how long ago the infection
took place. Note that this is a different
assumption than we made for the Markov Chain model; it allows us to construct
an ODE model. If we were to translate the assumption of the Markov chain model
literally, we would need to work with a system of delay differential equations, which
is conceptually more difficult. Under our modified assumption, during the time
interval (t, t + dt] about δI(t) dt individuals will recover. Thus we can model the
dynamics with the following system of ODEs:

(1)    dS/dt = −γSI;
       dI/dt = γSI − δI;
       dR/dt = δI.
Since we always have R(t) = 1 − (S(t) + I(t)) we may reduce dimension and
consider (1) as a system of only two autonomous DE’s.
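A quick numerical check of system (1) can be done with the forward Euler method. The sketch below is illustrative only; the step size and the parameter values γ = 0.5, δ = 0.1 (chosen so that γ > δ) are assumptions of mine.

```python
def sir_ode_step(S, I, R, gamma, delta, dt):
    """One forward-Euler step of system (1)."""
    dS = -gamma * S * I
    dI = gamma * S * I - delta * I
    dR = delta * I
    return S + dS * dt, I + dI * dt, R + dR * dt

# Illustrative parameters with gamma > delta, so the infection takes off:
gamma, delta, dt = 0.5, 0.1, 0.01
S, I, R = 0.99, 0.01, 0.0
peak_I = I
for _ in range(10000):  # integrate up to t = 100
    S, I, R = sir_ode_step(S, I, R, gamma, delta, dt)
    peak_I = max(peak_I, I)
# I(t) first rises to its peak and has decayed to nearly 0 by t = 100,
# while S + I + R stays equal to 1 throughout (the derivatives sum to 0).
```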
Exercise 2. (a) Find all steady states of (1).
(b) Do a phase-plane analysis for the two-dimensional version of (1), that is, show
all nullclines and direction arrows.
(c) Assuming that I(0) is sufficiently small and γ > δ, show that I(t) will initially
increase, then peak, and eventually decrease.
(d) Under the assumptions of (c), try to derive a nontrivial upper bound for I and
show that for some constant c > 0 and some time t0 we will have I(t) ≤ e^(-ct) for all
t > t0 .
4. More on drunkards
4.1. Action of the transition matrix. Let us assume for now that the story of
Example 2 can be modeled with a stationary Markov Chain, and that in each of the
non-absorbing states our drunkard makes a step forward (towards the cliff) with
probability p and a step backward (toward the haystack) with probability q = 1−p.
For n = 5 this gives the following transition probability matrix:
(2)
Md,p,5
1
p
=
0
0
0
0
0
p
0
0
0
q
0
p
0
0 0
0 0
q 0
.
0 q
0 1
Let us have a look at this matrix. Note that all entries are nonnegative and each
row sums up to 1; it is a (right) stochastic matrix. Every transition (probability)
matrix of any Markov Chain must be right stochastic. Not all columns sum up
to 1 though; this matrix is not left stochastic, thus it is in particular not doubly
stochastic. Some Markov Chains do have doubly stochastic transition matrices.
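These properties are easy to verify numerically. The sketch below builds Md,p,n for general n and checks the row and column sums; the helper name drunkard_matrix and the value p = 0.4 are my own illustrative choices.

```python
def drunkard_matrix(n, p):
    """Transition matrix M_{d,p,n}: positions 1..n, both ends absorbing.

    From an interior position the drunkard steps toward the cliff
    (one position down) with probability p and toward the haystack
    (one position up) with probability q = 1 - p.
    """
    q = 1.0 - p
    M = [[0.0] * n for _ in range(n)]
    M[0][0] = 1.0          # position 1: fallen off the cliff, absorbing
    M[n - 1][n - 1] = 1.0  # position n: asleep in the haystack, absorbing
    for i in range(1, n - 1):
        M[i][i - 1] = p
        M[i][i + 1] = q
    return M

M = drunkard_matrix(5, 0.4)
row_sums = [sum(row) for row in M]        # all equal 1: right stochastic
col_sums = [sum(col) for col in zip(*M)]  # not all 1: not left stochastic
```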
Suppose our drunk starts at position i. If i ∈ {1, n} we know for sure where
he will be one time step later, but if i ∈ {2, . . . , n − 1} he might be in one of two
positions. In general, given the starting position, we will know, for every t ≥ 0,
a vector π(t) = [π1 (t), . . . , πn (t)] of probabilities πi (t) that he is in position i at
time t. These vectors are probability distributions (aka probability vectors). Given
π(t), we can determine π(t + 1), in any Markov Chain, by:
(3)    π(t + 1) = π(t)M (t).

More generally, if the Markov Chain is stationary, we get for every k ∈ N:

(4)    π(t + k) = π(t)M^k.
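Equations (3) and (4) amount to repeated vector-matrix multiplication. The following sketch iterates (3) for the matrix of (2), starting the drunkard at position 3; the value p = 0.4 is an illustrative choice of mine.

```python
def step_distribution(pi, M):
    """One application of (3): pi(t+1) = pi(t) M, with pi a row vector."""
    n = len(pi)
    return [sum(pi[i] * M[i][j] for i in range(n)) for j in range(n)]

# M_{d,p,5} from (2) with p = 0.4, q = 0.6:
p, q = 0.4, 0.6
M = [
    [1, 0, 0, 0, 0],
    [p, 0, q, 0, 0],
    [0, p, 0, q, 0],
    [0, 0, p, 0, q],
    [0, 0, 0, 0, 1],
]
pi = [0, 0, 1, 0, 0]  # the drunkard starts at position 3 for sure
for _ in range(200):  # by (4), this computes pi(0) M^200
    pi = step_distribution(pi, M)
# Essentially all probability mass now sits on the two absorbing states.
```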
Exercise 3. Characterize the property of irreducibility of a stationary Markov
Chain that was introduced in the previous section in terms of powers M k of the
transition matrix.
Note that (3) and (4) define a deterministic discrete-time linear dynamical system on R^n. Let

Dist = {π ∈ R^n : π1 + · · · + πn = 1 and 0 ≤ πi ≤ 1 for all i ∈ {1, . . . , n}}.
Exercise 4. Show that Dist is forward invariant in the dynamical system defined
by (3) and (4) in the sense that π(t) ∈ Dist implies π(t + k) ∈ Dist for all k ∈ N.
In view of Exercise 4, we may consider Dist as the state space of the dynamical
system defined by (3) and (4).
Notice that in the example of M = Md,p,5 , if the drunkard is never going to sober
up, then with probability 1 he will eventually reach one of the two absorbing states.
These correspond to steady states (1, 0, . . . , 0), (0, . . . , 0, 1) ∈ Dist. Moreover, for
every probability r the vector (r, 0, . . . , 0, 1 − r) ∈ Dist is a steady state.
A nonzero steady state of the linear system is precisely a left eigenvector of M
with eigenvalue 1, and since ~0 ∉ Dist, the steady states above are nonzero. In view
of the results of the previous lecture this tells us that λ1 = 1 must be an eigenvalue of Md,p,5 .
In fact, we can see that 1 must be an eigenvalue with multiplicity at least 2. More
generally, we can see that if a Markov chain with transition matrix M has m
absorbing states, then 1 must be an eigenvalue of M with multiplicity at least m.
Can there be eigenvalues λ of M with |λ| > 1?
Proposition 1. Let M be a stochastic matrix and let λ be an eigenvalue of M .
Then |λ| ≤ 1.
Proof: Define a norm on the space of matrices M = [mij ] by

‖M‖ = max_i Σ_j |mij |,

which is the operator norm induced by the vector norm ‖~x‖ = max_i |xi |. If M is
stochastic, then we must have ‖M‖ = 1. Now for every ~x the inequality

‖M~x‖ ≤ ‖M‖ ‖~x‖ = ‖~x‖

shows that we cannot have ~x with M~x = λ~x and |λ| > 1. Since the right eigenvalues
and left eigenvalues of M are the same, the result follows.

Is there always an eigenvector in Dist with eigenvalue 1?
Yes. This follows from the fact that the map F (~x) = ~xM that defines our system
is continuous, Dist is forward invariant under the action of F , and is a compact
convex subset of R^n. By Brouwer’s Fixed Point Theorem, there must exist at least
one π ∗ ∈ Dist with F (π ∗ ) = π ∗ , that is, π ∗ M = π ∗ . Thus π ∗ must be a left
eigenvector of M with eigenvalue 1. It is called a stationary distribution for the
Markov Chain with transition matrix M .
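Numerically, a stationary distribution can often be approximated by simply iterating π ← πM; this converges for ergodic chains (see the Perron-Frobenius discussion later in this section). The two-state matrix below is a hypothetical example of mine, not one from the text.

```python
def stationary_distribution(M, iters=1000):
    """Approximate a pi* with pi* M = pi* by iterating pi <- pi M
    starting from the uniform distribution."""
    n = len(M)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * M[i][j] for i in range(n)) for j in range(n)]
    return pi

# A hypothetical 2-state chain: from state 1 stay or move with probability
# 1/2 each; from state 2 always jump back to state 1.
M = [[0.5, 0.5],
     [1.0, 0.0]]
pi_star = stationary_distribution(M)
# pi_star approaches (2/3, 1/3); one can check by hand that
# (2/3, 1/3) M = (2/3, 1/3).
```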
Under which conditions is the stationary distribution π ∗ unique?
We have seen that if there is more than one absorbing state, π ∗ is not unique.
Theorem 2. Every irreducible Markov Chain has a unique stationary distribution.
If we start with π(0) ∈ Dist, does π(t) = π(0)M t always converge to a
stationary distribution?
If π(t) always converges to a unique stationary distribution π ∗ , then one can
interpret π ∗ as the long-term average proportions of time steps during which the
system resides in each of the states.
Theorem 3. [Perron-Frobenius Theorem] Suppose M = [mij ] is a real square
matrix with all mij ≥ 0 such that there exists a positive integer k for which all
entries in M k are strictly positive. Then there exists a positive real eigenvalue λ∗
such that
(i) |λ∗ | > |λ| for every (real or complex) eigenvalue λ 6= λ∗ of M .
(ii) There exists an eigenvector ~x of M with eigenvalue λ∗ with all coordinates
xi > 0.
(iii) No other eigenvector of M has exclusively nonnegative coordinates.
(iv) λ∗ has multiplicity 1.
A Markov Chain for which the transition matrix M contains only positive entries
is called transitive; if there exists some k > 0 such that M k contains only positive
entries the Markov Chain is called ergodic. Clearly,
transitive ⇒ ergodic ⇒ irreducible,
but none of these implications can be reversed.
For example, the Markov Chain with transition matrix

        [ 0.5  0.5 ]
M2  =   [  1    0  ]
is ergodic but not transitive, and if we replace the cliff and the haystack by walls
that will force our drunkard to change direction, then we get a transition matrix
like
(5)

             [ 0  1  0  0  0 ]
             [ p  0  q  0  0 ]
Mw,p,5  =    [ 0  p  0  q  0 ]
             [ 0  0  p  0  q ]
             [ 0  0  0  1  0 ]
that defines an irreducible but nonergodic Markov Chain. Can you see why this
one is not ergodic?
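A numerical check makes the obstruction visible: every step flips the parity of the drunkard's position, so an entry (i, j) of the k-th power of Mw,p,5 can be nonzero only if k and j − i have the same parity, and hence no power of the matrix is strictly positive. A sketch (p = 0.4 is my illustrative choice):

```python
def mat_mult(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

p, q = 0.4, 0.6
Mw = [
    [0, 1, 0, 0, 0],
    [p, 0, q, 0, 0],
    [0, p, 0, q, 0],
    [0, 0, p, 0, q],
    [0, 0, 0, 1, 0],
]

power = Mw
all_powers_have_zeros = True
for k in range(1, 21):
    # Entry (i, j) of Mw^k can be nonzero only if k and j - i have the
    # same parity, so every power retains zero entries.
    if all(x > 0 for row in power for x in row):
        all_powers_have_zeros = False
    power = mat_mult(power, Mw)
```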
Corollary 4. Each ergodic Markov Chain has a unique stationary distribution π∗ ,
and π(t) converges to π∗ for every initial distribution π(0) ∈ Dist.
Exercise 5. Prove that for any Markov Chain with a transition matrix as in (5)
there exists π(0) ∈ Dist such that limt→∞ π(t) does not exist.
5. A note on time-reversibility
In the context of studying Markov chains and the dynamics of their distributions,
the notion of “time-reversibility” enters in three different meanings, and we need
to carefully distinguish between them.
The first is time-reversibility of the linear system F (~x) = ~xM on R^n. As we
discussed previously, this is simply equivalent to M being invertible, that is, to M
not having 0 as an eigenvalue.
The second notion is time-reversibility of the dynamical system F (π) = πM
on the state space Dist. For this one to be time-reversible, we would need that
Dist is backward invariant as well as forward invariant, that is, we would need
that ~xM ∈ Dist implies ~x ∈ Dist. This property, however, will almost always be
violated. To see this, consider M that is transitive. Then points on the boundary
of Dist will be mapped by F to the interior of Dist (Can you see why?), which
by linearity implies that some points from outside Dist end up in Dist as well.
The third notion is time-reversibility of the Markov Chain itself, which is totally
unrelated to time-reversibility of the deterministic dynamical system defined by M .
To illustrate this notion, suppose you watch a movie of the drunkard we discussed
earlier, but you don’t know whether the movie is run backwards or forwards. But
since this is your good friend, you know his transition matrix. If based on this
knowledge you can figure out with any level of confidence different from 0.5 whether
the movie is run backward or forward, the Markov Chain is not time-reversible;
otherwise it is.
To illustrate the difference between these notions, consider two transition matrices:

        [ 0.5  0.5   0  ]            [ 1    0    0  ]
M3  =   [  0   0.5  0.5 ]    M4  =   [ 0   0.5  0.5 ]
        [ 0.5   0   0.5 ]            [ 0   0.5  0.5 ]

Exercise 6. For which of these matrices is the corresponding Markov Chain time-reversible? For which is the dynamical system on R^n defined by F (~x) = ~xM time-reversible?
6. The Leslie matrix revisited
Suppose we have a Leslie matrix L where all age groups contribute offspring to
P1 (t + 1). Then L^n has only positive entries and the Perron-Frobenius Theorem
implies that there exists a unique asymptotically stable age distribution that will
be approached whenever we start from a population P~ (0) ≠ ~0.
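As a numerical illustration, here is a sketch with a hypothetical 3-class Leslie matrix in which every age group reproduces, so that some power of L is strictly positive; the fertility and survival values are made up for this example.

```python
def leslie_step(P, L):
    """One time step of the Leslie model: P(t+1) = L P(t), P a column vector."""
    n = len(P)
    return [sum(L[i][j] * P[j] for j in range(n)) for i in range(n)]

# Hypothetical Leslie matrix: fertilities (0.5, 1.2, 0.8) in the first row,
# survival probabilities 0.6 and 0.4 on the subdiagonal.
L = [
    [0.5, 1.2, 0.8],
    [0.6, 0.0, 0.0],
    [0.0, 0.4, 0.0],
]

P = [10.0, 0.0, 0.0]  # start with 10 newborns only
for _ in range(300):
    P = leslie_step(P, L)
    total = sum(P)
    P = [x / total for x in P]  # renormalize to an age *distribution*

# P has converged to the stable age distribution: applying L once more
# just rescales it by the dominant eigenvalue.
Q = leslie_step(P, L)
lam = sum(Q)  # the dominant eigenvalue, since sum(P) = 1
```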
Exercise 7. Show that point (iii) of Theorem 3 may fail if we only assume that M
has nonnegative entries. Hint: Use your findings from Homework 8.
For the remaining exercises make the assumption that σn = 0 (no individuals
in the oldest class survive for one more time step), all other σi are positive, and
let R = {i : βi > 0}.
Exercise 8. Give an example of a Leslie matrix that satisfies these assumptions and
that has at least two distinct stable age distributions so that point (iv) of Theorem 3
fails.
Exercise 9. Prove that if {n − 1, n} ⊆ R, then there exists some k such that L^k
has only positive entries. Then find a necessary and sufficient condition on R that
will assure the latter, and prove the equivalence.