Stochastic processes and Markov chains (part I)
Wessel N. van Wieringen, [email protected]
Department of Epidemiology and Biostatistics, VUmc & Department of Mathematics, VU University Amsterdam, The Netherlands

Stochastic processes

Example 1
• The intensity of the sun.
• Measured every day by the KNMI.
• Stochastic variable Xt represents the sun's intensity at day t, 0 ≤ t ≤ T. Hence, Xt assumes values in R+ (positive values only).

Example 2
• A DNA sequence of 11 bases long.
• At each base position there is an A, C, G or T.
• Stochastic variable Xi is the base at position i, i = 1, …, 11.
• In case the sequence has been observed, say (x1, x2, …, x11) = ACCCGATAGCT, then A is the realization of X1, C that of X2, et cetera.
[Figure: the bases at positions t, t+1, t+2, t+3 (here A, A, T, C), each drawn from the alphabet {A, C, G, T}.]

Example 3
• A patient's heart pulse during surgery.
• Measured continuously during the interval [0, T].
• Stochastic variable Xt represents the occurrence of a heartbeat at time t, 0 ≤ t ≤ T. Hence, Xt assumes only the values 0 (no heartbeat) and 1 (heartbeat).

Example 4
• Brain activity of a human under experimental conditions.
• Measured continuously during the interval [0, T].
• Stochastic variable Xt represents the magnetic field at time t, 0 ≤ t ≤ T. Hence, Xt assumes values in R.

Differences between the examples:

                      Xt discrete    Xt continuous
  Time discrete       Example 2      Example 1
  Time continuous     Example 3      Example 4

The state space S is the collection of all possible values that the random variables of the stochastic process may assume.
If S = {E1, E2, …, Es} is discrete, then Xt is a discrete stochastic variable. → examples 2 and 3.
If S = [0, ∞), an interval, then Xt is a continuous stochastic variable. → examples 1 and 4.

Example of a stochastic process (a random walk):
• Discrete time: t = 0, 1, 2, 3, ….
• S = {-3, -2, -1, 0, 1, 2, …}
• P(one step up) = ½ = P(one step down)
• Absorbing state at -3
[Figure: a sample path of this random walk over the time steps 1 to 18.]

The first passage time of a certain state Ei in S is the time t at which Xt = Ei for the first time since the start of the process. What is the first passage time of 2 in the sample path above? And of -1?

An absorbing state is a state Ei for which the following holds: if Xt = Ei, then Xs = Ei for all s ≥ t. The process will never leave state Ei. In the sample path above, the absorbing state is -3.

The time of absorption of an absorbing state is the first passage time of that state. In the sample path above, the time of absorption is 17.

The recurrence time is the first time t at which the process has returned to its initial state. In the sample path above, the recurrence time is 14.

A stochastic process is described by a collection of time points, the state space and the simultaneous distribution of the variables Xt, i.e., the distributions of all Xt and their dependency.
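To make the random walk example concrete, below is a minimal R sketch (not part of the original slides; the function name, seed and number of steps are illustrative). It simulates the walk with an absorbing state at -3 and extracts the first passage time of that state, i.e. the time of absorption.

# simulate the random walk: steps +1/-1 with probability 0.5 each, absorbing at -3
simulateWalk <- function(nSteps, start = 0, absorbing = -3) {
  x <- numeric(nSteps)
  x[1] <- start
  for (t in 2:nSteps) {
    if (x[t - 1] == absorbing) {
      x[t] <- absorbing                       # once absorbed, the walk stays put
    } else {
      x[t] <- x[t - 1] + sample(c(-1, 1), 1)  # one step up or down, probability 1/2 each
    }
  }
  x
}
set.seed(1)                                   # hypothetical seed, for reproducibility
path <- simulateWalk(18)
path
which(path == -3)[1]                          # first passage time of -3 (NA if never reached)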
There are two important types of processes:
• Poisson process: all variables are identically and independently distributed. Examples: queues for counters, call centers, servers, et cetera.
• Markov process: the variables are dependent in a simple manner.

Markov processes

A 1st order Markov process in discrete time is a stochastic process {Xt}t=1,2,… for which the following holds:
P(Xt+1 = xt+1 | Xt = xt, …, X1 = x1) = P(Xt+1 = xt+1 | Xt = xt).
In other words, only the present determines the future; the past is irrelevant. The random walk above is such a process: from every state one step up or down, each with probability 0.5 (except for the absorbing state -3).

The 1st order Markov property does not imply independence between Xt-1 and Xt+1. In fact, often P(Xt+1 = xt+1 | Xt-1 = xt-1) is unequal to zero. For instance, in the random walk:
P(Xt+1 = 2 | Xt-1 = 0) = P(Xt+1 = 2 | Xt = 1) * P(Xt = 1 | Xt-1 = 0) = ¼.

An m-th order Markov process in discrete time is a stochastic process {Xt}t=1,2,… for which the following holds:
P(Xt+1 = xt+1 | Xt = xt, …, X1 = x1) = P(Xt+1 = xt+1 | Xt = xt, …, Xt-m+1 = xt-m+1).
Loosely, the future depends only on the most recent past. Suppose m = 9; then in a DNA sequence such as
… CTTCCTCAAGATGCGTCCAATCCCTCAAAC …
the distribution of the (t+1)-th base depends on the 9 preceding ones.

A Markov process is called a Markov chain if the state space is discrete, i.e., finite or countable. In this lecture series we consider Markov chains in discrete time. Recall the DNA example.

The transition probabilities are the P(Xt+1 = xt+1 | Xt = xt), but also the P(Xt+1 = xt+1 | Xs = xs) for s < t, where xt+1, xt in {E1, …, Es} = S. In the random walk, for example:
P(Xt+1 = 3 | Xt = 2) = 0.5
P(Xt+1 = 0 | Xt = 1) = 0.5
P(Xt+1 = 5 | Xt = 7) = 0

A Markov process is called time homogeneous if the transition probabilities are independent of t:
P(Xt+1 = x1 | Xt = x2) = P(Xs+1 = x1 | Xs = x2).
For example:
P(X2 = -1 | X1 = 2) = P(X6 = -1 | X5 = 2)
P(X584 = 5 | X583 = 4) = P(X213 = 5 | X212 = 4)

Example of a time inhomogeneous process: equipment failure.
[Figure: the bathtub curve, a hazard rate h(t) that is high early on (infant mortality), low and flat in mid-life (random failures), and rising again late in life (wear-out failures).]

Consider a DNA sequence of 11 bases. Then S = {A, C, G, T}, Xi is the base at position i, and {Xi}i=1,…,11 is a Markov chain if the base at position i only depends on the base at position i-1, and not on those before i-1. If this is plausible, a Markov chain is an acceptable model for the base ordering in DNA sequences.
[Figure: state diagram with states A, C, G and T and transition arrows between all pairs of states.]

In the remainder we consider only time homogeneous Markov processes. We then denote the transition probabilities of a finite time homogeneous Markov chain in discrete time {Xt}t=1,2,… with S = {E1, …, Es} as:
P(Xt+1 = Ej | Xt = Ei) = pij (does not depend on t).
Putting the pij in a matrix yields the transition matrix P = (pij)i,j=1,…,s. The rows of this matrix sum to one.
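As a small illustration (a sketch, not from the slides: the numbers are hypothetical, except that the A- and C-row entries for A and C echo the values used in the sampling example further on), a transition matrix for the state space S = {A, C, G, T} can be stored and checked in R as follows:

bases <- c("A", "C", "G", "T")
P <- matrix(c(0.10, 0.10, 0.40, 0.40,   # from A: pAA, pAC, pAG, pAT
              0.20, 0.30, 0.30, 0.20,   # from C
              0.30, 0.30, 0.20, 0.20,   # from G (hypothetical)
              0.25, 0.25, 0.25, 0.25),  # from T (hypothetical)
            nrow = 4, byrow = TRUE, dimnames = list(from = bases, to = bases))
rowSums(P)     # each row sums to one, as required
P["A", "G"]    # pAG = P(Xt+1 = G | Xt = A)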
In the DNA example the transition matrix is

            to:   A     C     G     T
  from:  A       pAA   pAC   pAG   pAT
         C       pCA   pCC   pCG   pCT
         G       pGA   pGC   pGG   pGT
         T       pTA   pTC   pTG   pTT

where pAA = P(Xt+1 = A | Xt = A), and:
pAA + pAC + pAG + pAT = 1
pCA + pCC + pCG + pCT = 1
et cetera.
[Figure: state diagram with the pij as labels on the arrows between A, C, G and T.]

The initial distribution π = (π1, …, πs)T gives the probabilities of the initial state:
πi = P(X1 = Ei) for i = 1, …, s, and π1 + … + πs = 1.
Together the initial distribution π and the transition matrix P determine the probability distribution of the process, the Markov chain {Xt}t=1,2,…

We now show how the couple (π, P) determines the probability distribution (transition probabilities) for time steps larger than one. Hereto define
for n = 2:      pij(2) = P(Xt+2 = Ej | Xt = Ei),
for general n:  pij(n) = P(Xt+n = Ej | Xt = Ei).

For n = 2, start from the definition and sum over the possible intermediate states, using the fact that P(A, B) + P(A, B^c) = P(A):
pij(2) = P(Xt+2 = Ej | Xt = Ei) = Σk P(Xt+2 = Ej, Xt+1 = Ek | Xt = Ei).
Next use the definition of conditional probability, P(A, B) = P(A | B) P(B):
pij(2) = Σk P(Xt+2 = Ej | Xt+1 = Ek, Xt = Ei) P(Xt+1 = Ek | Xt = Ei),
and finally the Markov property:
pij(2) = Σk P(Xt+2 = Ej | Xt+1 = Ek) P(Xt+1 = Ek | Xt = Ei) = Σk pik pkj.

In summary, we have shown that P(2) = P · P = P^2. In a similar fashion we can show that P(n) = P^n. In words: the transition matrix for n steps is the one-step transition matrix raised to the power n.

The general case is proven by induction on n. One needs the Kolmogorov-Chapman equations:
pij(n+m) = Σk pik(n) pkj(m), for all n, m ≥ 0 and i, j = 1, …, s.
These are proven exactly as in the n = 2 case, by conditioning on the state occupied after the first n steps. The induction proof then reads: assume P(n) = P^n; then P(n+1) = P(n) · P(1) = P^n · P = P^(n+1).

A numerical example with two states, I and II: the two-step transition matrix is obtained by matrix multiplication ("rows times columns") of the one-step transition matrix with itself. Thus, for instance, 0.6490 = P(Xt+2 = I | Xt = I) is composed of two probabilities: the probability of the route I → I → I plus the probability of the route I → II → I (positions t, t+1, t+2). In a similar fashion one may obtain, say, the five-step transition probabilities: sum over the probabilities of all possible paths between the two states across positions t, t+1, …, t+5.

How do we sample from a Markov process? Reconsider the DNA example, with its initial distribution π and transition matrix P ("from" in the rows, "to" in the columns).

1. Sample the first base from the initial distribution π: P(X1 = A) = 0.45, P(X1 = C) = 0.10, et cetera. Suppose we have drawn an A. Our DNA sequence now consists of one base, namely A. After the first step our sequence looks like: A

2. Sample the next base from P, given the previous base. Here X1 = A, thus we sample from: P(X2 = A | X1 = A) = 0.1, P(X2 = C | X1 = A) = 0.1, et cetera. In case X1 = C, we would have sampled from: P(X2 = A | X1 = C) = 0.2, P(X2 = C | X1 = C) = 0.3, et cetera. After the second step: AT

3. The last step is iterated until the last base. After the third step: ATC
…
9. After nine steps: ATCCGATGC
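The sampling scheme above can be written as a short R function. A minimal sketch: only P(X1 = A) = 0.45, P(X1 = C) = 0.10 and the four transition probabilities quoted above are taken from the slides; the remaining entries of π and P (the same hypothetical matrix as in the earlier sketch) are assumed for illustration.

bases  <- c("A", "C", "G", "T")
piInit <- c(A = 0.45, C = 0.10, G = 0.25, T = 0.20)  # G and T entries assumed
P <- matrix(c(0.10, 0.10, 0.40, 0.40,   # from A: P(A|A) = P(C|A) = 0.1 as on the slide
              0.20, 0.30, 0.30, 0.20,   # from C: P(A|C) = 0.2, P(C|C) = 0.3 as on the slide
              0.30, 0.30, 0.20, 0.20,   # from G (hypothetical)
              0.25, 0.25, 0.25, 0.25),  # from T (hypothetical)
            nrow = 4, byrow = TRUE, dimnames = list(bases, bases))
sampleChain <- function(n, piInit, P) {
  x <- character(n)
  x[1] <- sample(bases, 1, prob = piInit)            # step 1: draw X1 from pi
  for (t in 2:n) {
    x[t] <- sample(bases, 1, prob = P[x[t - 1], ])   # step 2: draw Xt from the row of P given X(t-1)
  }
  paste(x, collapse = "")
}
set.seed(2)
sampleChain(9, piInit, P)   # a nine-base realization, analogous to ATCCGATGC above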
Andrey Markov (1856–1922)

In the famous first application of Markov chains, Andrey Markov studied the sequence of 20,000 letters in Pushkin's poem "Eugene Onegin", discovering that
- the stationary vowel probability is 0.432,
- the probability of a vowel following a vowel is 0.128, and
- the probability of a vowel following a consonant is 0.663.
Markov also studied the sequence of 100,000 letters in Aksakov's novel "The Childhood of Bagrov, the Grandson". This was in 1913, long before the computer age! (Basharin et al., 2004)

Parameter estimation

[Data: a long observed DNA sequence, displayed on the slides; it serves as the data from which the parameters of a Markov chain are estimated below.]
How do we estimate the parameters of a Markov chain from the data? Maximum likelihood!

Using the definition of conditional probability, the likelihood can be decomposed as follows:
P(X1 = x1, …, XT = xT) = P(X1 = x1) P(X2 = x2 | X1 = x1) P(X3 = x3 | X2 = x2, X1 = x1) … P(XT = xT | XT-1 = xT-1, …, X1 = x1),
where x1, …, xT in S = {E1, …, Es}.

Using the Markov property, we obtain:
P(X1 = x1, …, XT = xT) = P(X1 = x1) P(X2 = x2 | X1 = x1) P(X3 = x3 | X2 = x2) … P(XT = xT | XT-1 = xT-1).

Furthermore, e.g.:
P(Xt = xt | Xt-1 = xt-1) = Πi,j pij^1{xt-1 = Ei, xt = Ej},
where only one transition probability at a time enters the likelihood due to the indicator function.

The likelihood then becomes (for the DNA alphabet):
L = P(X1 = x1) pAA^nAA pAC^nAC … pTT^nTT,
where, e.g., nAC is the number of observed transitions from A to C in the sequence.

Now note that, e.g., pAA + pAC + pAG + pAT = 1, or pAT = 1 - pAA - pAC - pAG. Substitute this in the likelihood and take the logarithm to arrive at the log-likelihood. Note: it is irrelevant which transition probability is substituted.

The log-likelihood:
log L = log P(X1 = x1) + nAA log pAA + nAC log pAC + nAG log pAG + nAT log(1 - pAA - pAC - pAG) + (analogous terms for the rows C, G and T).

Differentiating the log-likelihood yields, e.g.:
d log L / d pAA = nAA / pAA - nAT / (1 - pAA - pAC - pAG).
Equate the derivatives to zero. This yields four systems of equations to solve, one for each row of the transition matrix.

The transition probabilities are then estimated by:
p̂ij = nij / (ni1 + … + nis),
the number of observed Ei → Ej transitions divided by the total number of transitions out of state Ei. (Verify that the 2nd order partial derivatives of the log-likelihood are negative!)
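In R, the estimator p̂ij = nij / Σj nij amounts to tabulating the transitions and normalising each row of the count table. A minimal sketch (not from the slides; the sequence is the worked example used below, and prop.table does the row-wise normalisation):

DNAseq <- c("A", "T", "C", "G", "A", "T", "C", "G", "C", "A")
n <- length(DNAseq)
# transition counts n_ij: rows = from, columns = to
transCounts <- table(factor(DNAseq[1:(n - 1)], levels = c("A", "C", "G", "T")),
                     factor(DNAseq[2:n],       levels = c("A", "C", "G", "T")))
# maximum likelihood estimates: divide each row by its total
Phat <- prop.table(transCounts, margin = 1)
Phat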
If one may assume the observed data is a realization of a stationary Markov chain, the initial distribution is estimated by the stationary distribution. If only one realization of the Markov chain is available and stationarity cannot be assumed, the initial distribution is estimated by the point mass on the observed first state: π̂i = 1 if x1 = Ei, and π̂i = 0 otherwise.

Thus, for the following sequence: ATCGATCGCA, tabulation of the transitions yields the count table shown in the R output below, and the maximum likelihood estimates thus become:
p̂AT = 2/2 = 1, p̂CA = 1/3, p̂CG = 2/3, p̂GA = 1/2, p̂GC = 1/2, p̂TC = 2/2 = 1,
with all other transition probabilities estimated as zero.

In R:
> DNAseq <- c("A", "T", "C", "G", "A", "T", "C", "G", "C", "A")
> table(DNAseq)
DNAseq
A C G T
3 3 2 2
> table(DNAseq[1:9], DNAseq[2:10])
    A C G T
  A 0 0 0 2
  C 1 0 2 0
  G 1 1 0 0
  T 0 2 0 0

Testing the order of the Markov chain

Often the order of the Markov chain is unknown and needs to be determined from the data. This is done using a chi-square test for independence. Consider a DNA sequence. To assess the order, one first tests whether the sequence consists of independent letters. Hereto count the nucleotides, di-nucleotides and tri-nucleotides in the sequence.

The null hypothesis of independence between the letters of the DNA sequence is evaluated by the χ² statistic
χ² = Σi,j (nij - eij)² / eij,
where nij is the observed count of the di-nucleotide EiEj and eij its expected count under independence (as computed in the R code below). This statistic has (4-1) x (4-1) = 9 degrees of freedom. If the null hypothesis cannot be rejected, one would fit a Markov model of order m = 0. However, if H0 can be rejected, one would test for higher order dependence.

To test for 1st order dependence, consider how often a di-nucleotide, e.g., CT, is followed by, e.g., a T. The 1st order dependence hypothesis is evaluated by the analogous χ² statistic, which compares the observed tri-nucleotide counts with their expected counts. This statistic is χ² distributed with (16-1) x (4-1) = 45 degrees of freedom.

A 1st order Markov chain provides a reasonable description of the sequence of the hlyE gene of the E. coli bacterium.
Independence test: chi-square statistic 22.45717, p-value 0.00754.
1st order dependence test: chi-square statistic 55.27470, p-value 0.14025.

The sequence of the prrA gene of the E. coli bacterium requires a higher order Markov chain model.
Independence test: chi-square statistic 33.51356, p-value 1.266532e-04.
1st order dependence test: chi-square statistic 114.56290, p-value 5.506452e-08.

In R (the independence case, assuming the DNAseq-object is a character-object containing the sequence):
> # calculate nucleotide and dimer frequencies
> nuclFreq <- matrix(table(DNAseq), ncol=1)
> dimerFreq <- table(DNAseq[1:(length(DNAseq)-1)], DNAseq[2:length(DNAseq)])
> # calculate expected dimer frequencies
> dimerExp <- nuclFreq %*% t(nuclFreq)/(length(DNAseq)-1)
> # calculate test statistic and p-value
> teststat <- sum((dimerFreq - dimerExp)^2 / dimerExp)
> pval <- exp(pchisq(teststat, 9, lower.tail=FALSE, log.p=TRUE))

Exercise: modify the code above for the 1st order test.

Application: sequence discrimination

We have fitted 1st order Markov chain models to a representative coding and a representative noncoding sequence of the E. coli bacterium (rows = current base, columns = next base; rows sum to one):

Pcoding          A      C      G      T
        A      0.321  0.257  0.211  0.211
        C      0.319  0.207  0.266  0.208
        G      0.259  0.284  0.237  0.219
        T      0.223  0.243  0.309  0.225

Pnoncoding       A      C      G      T
        A      0.320  0.278  0.231  0.172
        C      0.295  0.205  0.286  0.214
        G      0.241  0.261  0.233  0.265
        T      0.283  0.238  0.256  0.223

These models can be used to discriminate between coding and noncoding sequences.
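As a sketch of how these fitted models might be used (the log-likelihood ratio is defined formally on the next slides; the query sequence is hypothetical and, for simplicity, the contribution of the initial base is ignored):

logLikMarkov <- function(x, P) {
  # sum of log transition probabilities along the sequence x under transition matrix P
  sum(log(P[cbind(x[-length(x)], x[-1])]))
}
bases <- c("A", "C", "G", "T")
Pcod  <- matrix(c(0.321, 0.257, 0.211, 0.211,
                  0.319, 0.207, 0.266, 0.208,
                  0.259, 0.284, 0.237, 0.219,
                  0.223, 0.243, 0.309, 0.225),
                nrow = 4, byrow = TRUE, dimnames = list(bases, bases))
Pnonc <- matrix(c(0.320, 0.278, 0.231, 0.172,
                  0.295, 0.205, 0.286, 0.214,
                  0.241, 0.261, 0.233, 0.265,
                  0.283, 0.238, 0.256, 0.223),
                nrow = 4, byrow = TRUE, dimnames = list(bases, bases))
newSeq <- c("A", "T", "C", "G", "A", "T", "C", "G", "C", "A")   # hypothetical query sequence
LR <- logLikMarkov(newSeq, Pcod) - logLikMarkov(newSeq, Pnonc)
LR   # classify as coding if LR exceeds a chosen threshold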
For a new sequence X with unknown function, calculate the likelihood under the coding model and under the noncoding model. These likelihoods are compared by means of their log ratio, the log-likelihood ratio:
LR(X) = log P(X | coding model) - log P(X | noncoding model).
If LR(X) of a new sequence exceeds a certain threshold, the sequence is classified as coding, and as noncoding otherwise.

Back to E. coli. To illustrate the potential of the discrimination approach, we calculate the log-likelihood ratio for a large set of known coding and noncoding sequences, and compare the distributions of the two sets of LR(X) values.
[Figure: histograms (frequency versus log-likelihood ratio) and violin plots of LR(X) for the coding and the noncoding sequences.]

Conclusion of the E. coli example: comparison of the distributions indicates that one could discriminate reasonably well between coding and noncoding sequences on the basis of a simple 1st order Markov model. Possible improvements:
- a higher order Markov model;
- incorporating more structural information of the DNA.

References and further reading

Basharin, G.P., Langville, A.N., Naumov, V.A. (2004), "The life and work of A.A. Markov", Linear Algebra and its Applications, 386, 3-26.
Ewens, W.J., Grant, G. (2006), Statistical Methods for Bioinformatics, Springer, New York.
Reinert, G., Schbath, S., Waterman, M.S. (2000), "Probabilistic and statistical properties of words: an overview", Journal of Computational Biology, 7, 1-46.

This material is provided under the Creative Commons Attribution/Share-Alike/Non-Commercial License. See http://www.creativecommons.org for details.