Stochastic processes and Markov chains (part I)

Wessel van Wieringen
[email protected]
Department of Epidemiology and Biostatistics, VUmc
& Department of Mathematics, VU University
Amsterdam, The Netherlands
Stochastic processes
Example 1
• The intensity of the sun.
• Measured every day by the KNMI.
• Stochastic variable Xt represents the sun's intensity at day t, 0 ≤ t ≤ T. Hence, Xt assumes values in R+ (positive values only).
Stochastic processes
Example 2
• DNA sequence of 11 bases long.
• At each base position there is an A, C, G or T.
• Stochastic variable Xi is the base at position i, i = 1,…,11.
• In case the sequence has been observed, say:
(x1, x2, …, x11) = ACCCGATAGCT,
then A is the realization of X1, C that of X2, et cetera.
Stochastic processes
[Figure: at each position t, t+1, t+2, t+3, … the base is one of A, C, G, T; one observed realization runs … A, A, T, C ….]
Stochastic processes
Example 3
• A patient's heart pulse during surgery.
• Measured continuously during interval [0, T].
• Stochastic variable Xt represents the occurrence of a heartbeat at time t, 0 ≤ t ≤ T. Hence, Xt assumes only the values 0 (no heartbeat) and 1 (heartbeat).
Stochastic processes
Example 4
• Brain activity of a human under experimental conditions.
• Measured continuously during interval [0, T].
• Stochastic variable Xt represents the magnetic field at time t, 0 ≤ t ≤ T. Hence, Xt assumes values in R.
Stochastic processes
Differences between examples:

                   Xt discrete    Xt continuous
Time discrete      Example 2      Example 1
Time continuous    Example 3      Example 4
Stochastic processes
The state space S is the collection of all possible values that the random variables of the stochastic process may assume.

If S = {E1, E2, …, Es} is discrete, then Xt is a discrete stochastic variable.
→ examples 2 and 3.

If S = [0, ∞) is continuous, then Xt is a continuous stochastic variable.
→ examples 1 and 4.
Stochastic processes
Stochastic process:
• Discrete time: t = 0, 1, 2, 3, ….
• S = {-3, -2, -1, 0, 1, 2, …}
• P(one step up) = ½ = P(one step down)
• Absorbing state at -3

[Figure: a realization of this random walk for t = 1, …, 18.]
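A minimal R sketch (illustrative, not from the slides) that simulates one realization of this random walk, with up/down probability ½ and absorption at -3:

In R:
set.seed(1)                                  # for reproducibility
nSteps <- 18
x <- numeric(nSteps)                         # x[t] holds the state at time t
x[1] <- 0                                    # start in state 0
for (t in 2:nSteps) {
  if (x[t-1] == -3) x[t] <- -3               # absorbing state: never leave -3
  else x[t] <- x[t-1] + sample(c(-1, 1), 1)  # one step down or up, prob. 1/2 each
}
plot(1:nSteps, x, type="b", xlab="t", ylab="Xt")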
Stochastic processes
The first passage time of a certain state Ei in S is the
time t at which Xt = Ei for the first time since the start
of the process.
[Figure: the random walk realization once more.]
What is the first passage time of 2? And of -1?
Stochastic processes
The absorbing state is a state Ei for which the following holds: if Xt = Ei, then Xs = Ei for all s ≥ t. The process will never leave state Ei.

[Figure: the random walk realization once more.]
The absorbing state is -3.
Stochastic processes
The time of absorption of an absorbing state is the first passage time of that state.

[Figure: the random walk realization once more.]
The time of absorption is 17.
Stochastic processes
The recurrence time is the first time t at which the process has returned to its initial state.

[Figure: the random walk realization once more.]
The recurrence time is 14.
Stochastic processes
A stochastic process is described by a collection of time points, the state space and the simultaneous distribution of the variables Xt, i.e., the distributions of all Xt and their dependency.

There are two important types of processes:
• Poisson process: all variables are identically and independently distributed. Examples: queues for counters, call centers, servers, et cetera.
• Markov process: the variables are dependent in a simple manner.
Markov processes
A 1st order Markov process in discrete time is a stochastic process {Xt}t=1,2, … for which the following holds:
P(Xt+1=xt+1 | Xt=xt, …, X1=x1) = P(Xt+1=xt+1 | Xt=xt).
In other words, only the present determines the future,
the past is irrelevant.
From every state, with probability 0.5, one step up or down (except for state -3).

[Figure: the random walk realization once more.]
Markov processes
The 1st order Markov property, i.e.
P(Xt+1=xt+1 | Xt=xt, …, X1=x1) = P(Xt+1=xt+1 | Xt=xt),
does not imply independence between Xt-1 and Xt+1.
In fact, often P(Xt+1=xt+1 | Xt-1=xt-1) is unequal to zero.
For instance,
P(Xt+1=2 | Xt-1=0) = P(Xt+1=2 | Xt=1) * P(Xt=1 | Xt-1=0) = ½ * ½ = ¼.

[Figure: the random walk realization once more.]
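A quick simulation check of this ¼ in R (a minimal sketch, not from the slides: starting from state 0, away from the absorbing barrier, take two independent ±1 steps):

In R:
set.seed(2)
# state 2 is reached from state 0 in two steps iff both steps are +1
upTwice <- replicate(1e5, sum(sample(c(-1, 1), 2, replace=TRUE)) == 2)
mean(upTwice)   # close to 1/4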
Markov processes
An m-order Markov process in discrete time is a stochastic process {Xt}t=1,2,… for which the following holds:
P(Xt+1=xt+1 | Xt=xt, …, X1=x1) = P(Xt+1=xt+1 | Xt=xt, …, Xt-m+1=xt-m+1).
Loosely, the future depends on the most recent past.

Suppose m=9: the 9 preceding bases.
… CTTCCTCAAGATGCGTCCAATCCCTCAAAC …
The distribution of the (t+1)-th base depends on the 9 preceding ones.
Markov processes
A Markov process is called a Markov chain if the state space is discrete, i.e., is finite or countable.

In these lecture series we consider Markov chains in discrete time.

Recall the DNA example.
Markov processes
The transition probabilities are the
P(Xt+1=xt+1 | Xt=xt),
but also the
P(Xt+1=xt+1 | Xs=xs) for s < t,
where xt+1, xt in {E1, …, Es} = S.

For the random walk, for example:
P(Xt+1=3 | Xt=2) = 0.5
P(Xt+1=0 | Xt=1) = 0.5
P(Xt+1=5 | Xt=7) = 0

[Figure: the random walk realization, annotated with these probabilities.]
Markov processes
A Markov process is called time homogeneous if the transition probabilities are independent of t:
P(Xt+1=x1 | Xt=x2) = P(Xs+1=x1 | Xs=x2).

For example:
P(X2=-1 | X1=2) = P(X6=-1 | X5=2)
P(X584=5 | X583=4) = P(X213=5 | X212=4)

[Figure: the random walk realization once more.]
Markov processes
Example of a time inhomogeneous process: the bathtub curve.

[Figure: the bathtub-shaped hazard rate h(t) against t, with phases of infant mortality, random failures and wearout failures.]
Markov processes
Consider a DNA sequence of 11 bases. Then S = {A, C, G, T}, Xi is the base at position i, and {Xi}i=1,…,11 is a Markov chain if the base at position i only depends on the base at position i-1, and not on those before i-1.

If this is plausible, a Markov chain is an acceptable model for base ordering in DNA sequences.

[Figure: state diagram with the four states A, C, G and T.]
Markov processes
In the remainder, we consider only time homogeneous Markov processes.

We then denote the transition probabilities of a finite time homogeneous Markov chain in discrete time {Xt}t=1,2,… with S = {E1, …, Es} as:
P(Xt+1=Ej | Xt=Ei) = pij (does not depend on t).

Putting the pij in a matrix yields the transition matrix:

P = ( p11 p12 … p1s
      p21 p22 … p2s
       …   …  …  …
      ps1 ps2 … pss ).

The rows of this matrix sum to one.
Markov processes
In the DNA example:

P = ( pAA pAC pAG pAT
      pCA pCC pCG pCT
      pGA pGC pGG pGT
      pTA pTC pTG pTT ),

where
pAA = P(Xt+1 = A | Xt = A)
and:
pAA + pAC + pAG + pAT = 1
pCA + pCC + pCG + pCT = 1
et cetera.

[Figure: state diagram of the four states A, C, G and T, with arrows labeled by the transition probabilities.]
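As a concrete illustration, such a transition matrix can be stored in R. The A- and C-row entries for A and C below match the sampling example later in these slides; the remaining entries are made up so that each row sums to one:

In R:
bases <- c("A", "C", "G", "T")
P <- matrix(c(0.10, 0.10, 0.40, 0.40,    # from A (pAA and pAC from the slides)
              0.20, 0.30, 0.30, 0.20,    # from C (pCA and pCC from the slides)
              0.30, 0.30, 0.20, 0.20,    # from G (assumed)
              0.25, 0.25, 0.25, 0.25),   # from T (assumed)
            nrow=4, byrow=TRUE, dimnames=list(from=bases, to=bases))
rowSums(P)   # each row sums to one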
Markov processes
The initial distribution π = (π1, …, πs)T gives the probabilities of the initial state:
πi = P(X1 = Ei) for i = 1, …, s,
and π1 + … + πs = 1.

Together the initial distribution π and the transition matrix P determine the probability distribution of the process, the Markov chain {Xt}t=1,2,…
Markov processes
We now show how the couple (π, P) determines the probability distribution (transition probabilities) of time steps larger than one. Hereto define

for n = 2:      pij(2) = P(Xt+2 = Ej | Xt = Ei),
for general n:  pij(n) = P(Xt+n = Ej | Xt = Ei).
Markov processes
For n = 2:

pij(2) = P(Xt+2=Ej | Xt=Ei)
(just the definition)

= Σk P(Xt+2=Ej, Xt+1=Ek | Xt=Ei)
(use the fact that P(A, B) + P(A, B^c) = P(A), summing over all states Ek)

= Σk P(Xt+2=Ej | Xt+1=Ek, Xt=Ei) · P(Xt+1=Ek | Xt=Ei)
(use the definition of conditional probability: P(A, B) = P(A | B) P(B))

= Σk P(Xt+2=Ej | Xt+1=Ek) · P(Xt+1=Ek | Xt=Ei)
(use the Markov property)

= Σk pik pkj = (P·P)ij.
Markov processes
In summary, we have shown that:
pij(2) = (P·P)ij, i.e., P(2) = P^2.

In a similar fashion we can show that:
pij(3) = (P^3)ij.

In words: the transition matrix for n steps is the one-step transition matrix raised to the power n.

Markov processes
The general case:
P(n) = P^n
is proven by induction on n. One needs the Kolmogorov-Chapman equations:
pij(n+m) = Σk pik(n) · pkj(m)
for all n, m ≥ 0 and i, j = 1, …, s.
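This is easy to check numerically in R (reusing the illustrative matrix P from the DNA sketch above):

In R:
P2 <- P %*% P                  # two-step transition matrix P^2
rowSums(P2)                    # P^2 is again a transition matrix: rows sum to one
# Kolmogorov-Chapman, e.g. n = 2 and m = 1: P^(2+1) = P^2 %*% P = P %*% P^2
all.equal(P2 %*% P, P %*% P2)  # TRUE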
Markov processes
Proof of the Kolmogorov-Chapman equations:

pij(n+m) = P(Xt+n+m=Ej | Xt=Ei)
= Σk P(Xt+n+m=Ej, Xt+n=Ek | Xt=Ei)
= Σk P(Xt+n+m=Ej | Xt+n=Ek) · P(Xt+n=Ek | Xt=Ei)
= Σk pkj(m) · pik(n).

Markov processes
Induction proof of P(n) = P^n.
Assume P(n-1) = P^(n-1). Then, by the Kolmogorov-Chapman equations with m = 1:
pij(n) = Σk pik(n-1) · pkj = (P^(n-1) · P)ij = (P^n)ij.
Markov processes
A numerical example:

[The slides show a 2×2 transition matrix P for the states I and II; P^2 is computed by matrix multiplication ("rows times columns").]

Thus: 0.6490 = P(Xt+2 = I | Xt = I) is composed of two probabilities: the probability of the route I → I → I plus the probability of the route I → II → I.

[Figure: the two routes from state I at position t to state I at position t+2.]
Markov processes
In similar fashion we may obtain the n-step transition probabilities for larger n: sum over the probabilities of all possible paths between the two states.

[Figure: all paths between states I and II from position t to position t+5.]

Similarly for the other n-step transition probabilities.
Markov process
How do we sample from a Markov process?
Reconsider the DNA example.

[Figure: the state diagram of A, C, G and T, and the corresponding transition matrix P, with "from" on the rows (A, C, G, T) and "to" on the columns (A, C, G, T).]
Markov process
1. Sample the first base from the initial distribution π:
P(X1=A) = 0.45, P(X1=C) = 0.10, et cetera.
Suppose we have drawn an A. Our DNA sequence now consists of one base, namely A. After the first step, our sequence looks like:
A

2. Sample the next base using P given the previous base. Here X1=A, thus we sample from:
P(X2=A | X1=A) = 0.1, P(X2=C | X1=A) = 0.1, et cetera.
In case X1=C, we would have sampled from:
P(X2=A | X1=C) = 0.2, P(X2=C | X1=C) = 0.3, et cetera.
Markov process
2. After the second step:
AT

3. The last step is iterated until the last base. After the third step:
ATC

9. After nine steps:
ATCCGATGC
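A runnable R sketch of this sampling scheme. It reuses the illustrative matrix P and the state labels from the earlier sketch; of the initial distribution only πA = 0.45 and πC = 0.10 are given in the slides, the remaining entries are made up:

In R:
pi0 <- c(A=0.45, C=0.10, G=0.25, T=0.20)   # initial distribution; G and T entries assumed
n <- 10
dna <- character(n)
dna[1] <- sample(bases, 1, prob=pi0)       # step 1: draw X1 from pi
for (t in 2:n) {
  # step 2 (iterated): draw Xt from the row of P belonging to the previous base
  dna[t] <- sample(bases, 1, prob=P[dna[t-1], ])
}
paste(dna, collapse="")                    # the sampled sequence as one string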
Andrey Markov

[Portrait: Andrey Markov, 1856–1922.]
Andrey Markov
In the famous first application of Markov chains, Andrey Markov studied the sequence of 20,000 letters in Pushkin's poem "Eugene Onegin", discovering that
- the stationary vowel probability is 0.432,
- the probability of a vowel following a vowel is 0.128, and
- the probability of a vowel following a consonant is 0.663.

Markov also studied the sequence of 100,000 letters in Aksakov's novel "The Childhood of Bagrov, the Grandson".

This was in 1913, long before the computer age!

Basharin et al. (2004)
Parameter estimation
[Background: the slide is filled with a long observed DNA sequence, the data from which the parameters are to be estimated.]
How do we estimate the parameters
of a Markov chain from the data?
Maximum likelihood!
Parameter estimation
Using the definition of conditional probability, the likelihood can be decomposed as follows:

L(x1, …, xT) = P(X1=x1, …, XT=xT)
             = P(XT=xT | XT-1=xT-1, …, X1=x1) · … · P(X2=x2 | X1=x1) · P(X1=x1),

where x1, …, xT in S = {E1, …, Es}.
Parameter estimation
Using the Markov property, we obtain:

L(x1, …, xT) = P(X1=x1) · P(X2=x2 | X1=x1) · P(X3=x3 | X2=x2) · … · P(XT=xT | XT-1=xT-1).
Parameter estimation
Furthermore, e.g.:

P(Xt=xt | Xt-1=xt-1) = pAA^I(xt-1=A, xt=A) · pAC^I(xt-1=A, xt=C) · … · pTT^I(xt-1=T, xt=T),

where only one transition probability at a time enters the likelihood, due to the indicator functions I(·) in the exponents.
Parameter estimation
The likelihood then becomes:

L = P(X1=x1) · pAA^nAA · pAC^nAC · … · pTT^nTT,

where, e.g., nAC denotes the number of times a C follows an A in the observed sequence.
Parameter estimation
Now note that, e.g.,
pAA + pAC + pAG + pAT = 1,
or,
pAT = 1 - pAA - pAC - pAG.

Substitute this in the likelihood, and take the logarithm to arrive at the log-likelihood.

Note: it is irrelevant which transition probability is substituted.
Parameter estimation
The log-likelihood:

log L = log P(X1=x1) + nAA log(pAA) + nAC log(pAC) + nAG log(pAG)
        + nAT log(1 - pAA - pAC - pAG) + (analogous terms for the C, G and T rows).
Parameter estimation
Differentiating the log-likelihood yields, e.g.:

∂ log L / ∂ pAA = nAA / pAA - nAT / (1 - pAA - pAC - pAG).

Equate the derivatives to zero. This yields four systems of equations to solve, e.g.:

nAA / pAA = nAT / pAT,  nAC / pAC = nAT / pAT,  nAG / pAG = nAT / pAT.
Parameter estimation
The transition probabilities are then estimated by:

p̂AA = nAA / (nAA + nAC + nAG + nAT), et cetera; in general, p̂ij = nij / Σk nik.

Verify that the 2nd order partial derivatives of the log-likelihood are negative!
Parameter estimation
If one may assume the observed data is a realization of a stationary Markov chain, the initial distribution is estimated by the stationary distribution.

If only one realization of a stationary Markov chain is available and stationarity cannot be assumed, the initial distribution is estimated by:

π̂i = 1{x1 = Ei},

i.e., all probability mass is put on the observed initial state.
Parameter estimation
Thus, for the following sequence:
ATCGATCGCA,
tabulation yields the transition counts nij:

      to:  A  C  G  T
from A:    0  0  0  2
from C:    1  0  2  0
from G:    1  1  0  0
from T:    0  2  0  0

The maximum likelihood estimates thus become:

      to:   A    C    G    T
from A:     0    0    0    1
from C:    1/3   0   2/3   0
from G:    1/2  1/2   0    0
from T:     0    1    0    0
Parameter estimation
In R:
> DNAseq <- c("A", "T", "C", "G", "A",
              "T", "C", "G", "C", "A")
> table(DNAseq)
DNAseq
A C G T
3 3 2 2
> table(DNAseq[1:9], DNAseq[2:10])
A C G T
A 0 0 0 2
C 1 0 2 0
G 1 1 0 0
T 0 2 0 0
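The maximum likelihood estimates are these transition counts with each row divided by its total, which in R is a one-liner (a small addition to the slides' code):

> transCounts <- table(DNAseq[1:9], DNAseq[2:10])
> prop.table(transCounts, margin=1)   # estimate of p_ij: n_ij divided by its row total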
Testing the order of the Markov chain
Testing the order of a Markov chain
Often the order of the Markov chain is unknown and needs to be determined from the data. This is done using a Chi-square test for independence.

Consider a DNA sequence. To assess the order, one first tests whether the sequence consists of independent letters. Hereto count the nucleotides, di-nucleotides and tri-nucleotides in the sequence.
Testing the order of a Markov chain
The null hypothesis of independence between the letters of the DNA sequence is evaluated by the χ2 statistic:

χ2 = Σi Σj (nij - eij)^2 / eij,

with nij the observed and eij the expected (under independence) count of di-nucleotide (i, j), which has (4-1) x (4-1) = 9 degrees of freedom.

If the null hypothesis cannot be rejected, one would fit a Markov model of order m=0. However, if H0 can be rejected, one would test for higher order dependence.
Testing the order of a Markov chain
To test for 1st order dependence, consider how often a di-nucleotide, e.g. CT, is followed by, e.g., a T.

The 1st order dependence hypothesis is evaluated by:

χ2 = Σi Σj Σk (nijk - eijk)^2 / eijk,

where nijk is the observed and eijk the expected (under 1st order dependence) count of di-nucleotide (i, j) followed by letter k.

This is χ2 distributed with (16-1) x (4-1) = 45 d.o.f.
Testing the order of a Markov chain
A 1st order Markov chain provides a reasonable description of the sequence of the hlyE gene of the E. coli bacterium.
Independence test:
Chi-sq stat: 22.45717, p-value: 0.00754
1st order dependence test:
Chi-sq stat: 55.27470, p-value: 0.14025

The sequence of the prrA gene of the E. coli bacterium requires a higher order Markov chain model.
Independence test:
Chi-sq stat: 33.51356, p-value: 1.266532e-04
1st order dependence test:
Chi-sq stat: 114.56290, p-value: 5.506452e-08
Testing the order of a Markov chain
In R (the independence case, assuming the DNAseq-object is a character-object containing the sequence):

> # calculate nucleotide and dimer frequencies
> nuclFreq <- matrix(table(DNAseq), ncol=1)
> dimerFreq <- table(DNAseq[1:(length(DNAseq)-1)],
                     DNAseq[2:length(DNAseq)])
> # calculate expected dimer frequencies
> dimerExp <- nuclFreq %*% t(nuclFreq)/(length(DNAseq)-1)
> # calculate test statistic and p-value
> teststat <- sum((dimerFreq - dimerExp)^2 / dimerExp)
> pval <- exp(pchisq(teststat, 9, lower.tail=FALSE,
                     log.p=TRUE))

Exercise: modify the code above for the 1st order test.
Application: sequence discrimination
Application: sequence discrimination
We have fitted 1st order Markov chain models to a representative coding and noncoding sequence of the E. coli bacterium.

Pcoding:
        A      C      G      T
A     0.321  0.319  0.259  0.223
C     0.257  0.207  0.284  0.243
G     0.211  0.266  0.237  0.309
T     0.211  0.208  0.219  0.225

Pnoncoding:
        A      C      G      T
A     0.320  0.295  0.241  0.283
C     0.278  0.205  0.261  0.238
G     0.231  0.286  0.233  0.256
T     0.172  0.214  0.265  0.223

These models can be used to discriminate between coding and noncoding sequences.
Application: sequence discrimination
For a new sequence X with unknown function, calculate the likelihood under the coding and under the noncoding model, e.g.:

Lcoding(X) = P(X1=x1) · ∏t Pcoding(xt-1, xt),

and analogously Lnoncoding(X). These likelihoods are compared by means of their log ratio:

LR(X) = log[ Lcoding(X) / Lnoncoding(X) ],

the log-likelihood ratio.
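A sketch of this computation in R, assuming Pcoding and Pnoncoding are stored as 4×4 matrices with row ("from") and column ("to") names "A", "C", "G", "T", newSeq is a character vector of bases, and the initial term P(X1=x1) is ignored for simplicity:

In R:
loglik <- function(x, P) {
  # sum the log transition probabilities along the sequence;
  # P[cbind(from, to)] picks one matrix entry per observed transition
  sum(log(P[cbind(x[-length(x)], x[-1])]))
}
LR <- loglik(newSeq, Pcoding) - loglik(newSeq, Pnoncoding)
LR   # classify as coding if this exceeds the chosen threshold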
Application: sequence discrimination
If the log-likelihood ratio LR(X) of a new sequence exceeds a certain threshold, the sequence is classified as coding, and as noncoding otherwise.

Back to E. coli. To illustrate the potential of the discrimination approach, we calculate the log-likelihood ratio for a large set of known coding and noncoding sequences. The distributions of the two sets of LR(X)'s are compared.
Application: sequence discrimination
[Figure: overlaid histograms (frequency against log likelihood ratio) for the coding and the noncoding sequences.]

Application: sequence discrimination

[Figure: the same two distributions flipped and mirrored into a violin plot, noncoding versus coding.]
Application: sequence discrimination
Conclusion E. coli example

Comparison of the distributions indicates that one could discriminate reasonably well between coding and noncoding sequences on the basis of a simple 1st order Markov model.

Improvements:
- Higher order Markov model.
- Incorporate more structural information of the DNA.
References & further reading
References and further reading
Basharin, G.P., Langville, A.N., Naumov, V.A. (2004), "The life and work of A.A. Markov", Linear Algebra and its Applications, 386, 3-26.

Ewens, W.J., Grant, G. (2006), Statistical Methods for Bioinformatics, Springer, New York.

Reinert, G., Schbath, S., Waterman, M.S. (2000), "Probabilistic and statistical properties of words: an overview", Journal of Computational Biology, 7, 1-46.
This material is provided under the Creative Commons
Attribution/Share-Alike/Non-Commercial License.
See http://www.creativecommons.org for details.