Continuous Time Markov Chains. Applications in Bioinformatics

Continuous Time Markov Chains.
Applications in Bioinformatics
Javier Campos
Universidad de Zaragoza
http://webdiis.unizar.es/~jcampos/
Outline
•
•
•
•
Definitions
Steady-state distribution
Examples
Alternative presentation of CTMC:
embedded DTMC of a CTMC
• Applications in Bioinformatics
2
Definitions
• Remember DTMC
– pij is the transition probability from i to j over one
time slot
– The time spent in a state is geometrically
distributed
• Result of the Markov (memoryless) property
– When there is a jump from state i, it goes to state
j with probability
3
Definitions
• Continuous time version
qij
i
j
qik
k
– qij is the transition rate from state i to state j
4
Definitions
• Formally:
– A CTMC is a stochastic process {X(t) | t ≥0, t  lR} s.t.
for all t0,...,tn-1,tn,t  lR, 0≤t0<…<tn-1<tn<t , for all n  lN
P( X (t)  x | X (tn )  xn , X (tn1)  xn1,, X (t0 )  x0 ) 
 P( X (t)  x | X (tn )  xn )
– Alternative (equivalent) definition:
{X(t) | t ≥0, t  lR} s.t. for all t,s ≥ 0
P( X (t  s)  x | X (t)  xt , X (u),0  u  t) 
 P( X (t  s)  x | X (t)  xt )
5
Definitions
• Homogeneity
– We are considering discrete state (sample) space, then we
denote
pij(t,s) = P(X(t+s)=j | X(t)=i), for s > 0.
– A CTMC is called (time-)homogeneous if
pij(t,s) = pij(s) for all t ≥ 0
6
Definitions
• Time spent in a state (sojourn time):
– Markov property and time homogeneity imply that if at time
t the process is in state j, the time remaining in state j is
independent of the time already spent in state j :
memoryless property.
Let S be the random variable “time spent in state j”.
 time spent in state j is exponentially distributed.
Sojourn times of a CTMC are exponentially distributed.
7
Definitions
• Transition rates:
– In time-homogeneous CTMC, pij(s) is the probability of
jumping from i to j during an interval time of duration s.
– Therefore, we define the instantaneous transition rate from
state i to state j as:
qij  lim
t0
qij
pij (t)
i
t
– And the exit rate from
state i as – qii
p (t) 1
qii  qij  lim ii
t0
t
ji
j
qik
k
– Q = [qij] is called infinitesimal generator matrix
(or transition rate matrix or Q matrix)
8
Outline
•
•
•
•
Definitions
Steady-state distribution
Examples
Alternative presentation of CTMC:
embedded DTMC of a CTMC
• Applications in Bioinformatics
9
Steady-state distribution
•
Kolmogorov differential equation:
Denote the distribution at instant t: i(t) = P(X(t)=i)
And denote in matrix form: P(t) = [pij(t)]
Then (t) = (u)P(t-u) , for u < t
(we omit vector transposition to simplify notation)
Substituting u = t–t and substracting (t–t):
(t) – (t–t) = (t–t) [P(t) – I], with I the identity matrix
Dividing by t and taking the limit
d
P(t)  I
 (t)   (t) lim
t0
dt
t
Then, by definition of Q = [qij], we obtain the
Kolmogorov differential equation
d
 (t )   (t ) Q
dt
10
Steady-state distribution
• Since also (t)e = 1, with e = (1,1,…,1)
If the following limit exists
lim (t)
t
then taking the limit of Kolmogorov differential
equation we get the equations for the steady-state
probabilities:
Q = 0
e = 1
(balance equations)
(normalizing equation)
11
Outline
•
•
•
•
Definitions
Steady-state distribution
Examples
Alternative presentation of CTMC:
embedded DTMC of a CTMC
• Applications in Bioinformatics
12
Examples
• Example 1: A 2-state CTMC
– Consider a simple
two-state CTMC
– The corresponding
Q matrix is given by
– The Kolmogorov differential
equation yields:
– Given that 1(0) = 1,
we get the transient solution:
   
Q

   
d
0 (t)  0 (t)  1(t)
dt
d
1(t)  0 (t)  1(t)
dt
0 (t)  1(t)  1
 1 (t ) 



e (    ) t
 

 (    ) t

e
 0 (t ) 
 
13
Examples
1  0  0
0  1  0
0  1  1
– And the steady-state
solution comes from
Q = 0; e = 1:
– We get:
0 

 
;
1 

balance eq.
 
• Which can also be obtained by taking the limits as t   of
the equations for 1(t) and 0(t).
 1  lim  1 (t )  lim



e (    ) t 


 

 (    ) t

 0  lim  0 (t )  lim
e


t 
t    


t 
t 
14
Examples
•
Example 2:
A simple open system with loss

(Poisson)


1
2
may be lost
– Works enter to the system with exponentially distributed
(parameter  interarrival time (Poisson process)
– The service time in both processing stations is exponentially
distributed with rate 
– If a work ends in station 1 when station 2 is busy, station 1 is
blocked
– If station 1 is busy or blocked when a work arrives, arriving work is
lost
– Questions:
• proportion of lost works?
• mean number of working stations?
• mean number of works in the system?
15
Examples
– The set of states of the system:
S={(0,0), (1,0), (0,1), (1,1), (b,1)}
0  empty station

1  working station
(Poisson)
b  blocked station


1
2
may be lost
State transition diagram:

0,0

1,0


0,1


1,1

b,1
Infinitesimal generator matrix:
00
10
Q  01
11
b1
 

  



 



 




(  ) 




 

 
 

 


16
Examples
– Steady-state solution:
Q  0 

e  1 

0,0

1,0
balance eq.


0,1


1,1

 00   01
 00   11   10
 10   b1  (   ) 01
 01  2  11
 11   b1
 00   10   01   11   b1  1
b,1
 2 2   2 2  2  2 

, , ,  where  
    ,


  

and
  3 2  4  2
17
Examples
– Proportion of lost works?
• Is the probability of the event “when a new work arrives, the first
station is non-empty”, i.e.:
32 2
10  11  b1  2
3 4 2
– Mean number of working stations?
• In state (0,0) there is no working station and in state (1,1) there are
two; in the rest of states there is only one, thus
42 4
B  01  10  b1 211  2
3 4 2
– Mean number of works in the system?
• In state (0,0) there is no one; in states (1,1) and (b,1) there are two
and in the rest there is only one, thus
L   01   10  2 b1  2 11
52  4
 2
3  4   2
18
Examples
• Example 3: I/O buffer with limited capacity
– Records arrive according to a Poisson process (rate )
– Buffer capacity: M records
– Buffer cleared at times spaced by intervals which are
exponentially distributed (parameter ) and independent of
arrivals







19
Examples
– Steady-state solution:
0  (1  M )
(  )i  i1, 1  i  M 1
M  M1
0  M  1







i
   
, 0  i  M 1

   

i  
M
  
M  

 
Thus, for example, the mean number of records in the buffer in
steady-state:
M 1
B  M   i
M
i 0
i


,
where  


20
Outline
•
•
•
•
Definitions
Steady-state distribution
Examples
Alternative presentation of CTMC:
embedded DTMC of a CTMC
• Applications in Bioinformatics
21
Semi-Markov processes
• Also called Markov renewal processes
– Generalization of CTMC with arbitrary distributed
sojourn times
– As in the case of CTMC, they can be used for
obtaining both steady-state distribution and
transient distribution
22
Semi-Markov processes
• (Time homogeneous) Semi-Markov process:
Qij(t)
Distribution, from i to j in time t:
j
i
Qik(t)
k
pij
Transition probability from i to j:
i
j
pik
Sojourn time distribution in state i when the next state is j:
k
Transition prob. matrix of the embedded Discrete Time Markov Chain:
Sojourn time distribution in state i regardless of the next state:
23
Semi-Markov processes
• Steady-state analysis
– Build transition probability matrix, P,
and mean residence (sojourn) time vector, D.
– From them, compute steady-state distribution:
Steady-state of embedded DTMC
Mean residence time vector
Steady-state distribution of
semi-Markov process
24
Embedded DTMC of a CTMC
• CTMC can be seen as a particular case of
semi-Markov process when sojourn time
distributions are exponential
• In other words, if we consider a DTMC with
null diagonal elements and we add
exponentially distributed soujourn times in
the states, we get a CTMC.
this is called the
Embedded DTMC
of the CTMC
CTMC = DTMC (where to move) +
+ exponential holding times (when to move)
25
Outline
•
•
•
•
Definitions
Steady-state distribution
Examples
Alternative presentation of CTMC:
embedded DTMC of a CTMC
• Applications in Bioinformatics
26
Simple population size model
•
We model the size of the population (n = 0,1,2…)
n is the birth rate in a population of size n; n is the death rate
0 is an absorbing state
A case of relevance to population dynamics is the choice:
n = n and n = nμ, for constants  and μ.
We can ask:
What is the extinction probability starting from state k?
(i.e., probability of being absorbed into state 0 (ever))
If extinction is certain, what is the mean time to extinction?
27
Extinction probability
•
•
Let uk be the probab. of absorption at 0, starting from state k.
Then, since the first step away from k is either to k+1, with
probability k / (k + μk) or to k−1, with probability μk / (k + μk),
and u0 = 1.
•
Let 0 = 1 and j = (μj / j) j–1 for j ≥ 1.
– If
– If
j j =  then uk = 1 for all k (extinction is certain).
j j is finite then
28
Mean time to extinction
•
Let wk be the mean time to absorption at 0, starting from state
k. By first-step analysis
and w0 = 0.
–
–
29
Many other applications…
• As embedded analytical model for more
abstract modelling paradigms:
–
–
–
–
birth-death processes and population dynamics
queueing processes and queueing networks
stochastic Petri nets
stochastic process algebras…
30