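Models for DNA substitution
http://www.stat.rice.edu/~mathbio/Polanski/stat655/

Plan
• Basics
• Models in discrete time
• Models in continuous time
• Parameter estimation

Nucleotides
• Adenine (A) or (a): purine
• Guanine (G) or (g): purine
• Cytosine (C) or (c): pyrimidine
• Thymine (T) or (t): pyrimidine

Substitution
• Transitions (purine → purine, pyrimidine → pyrimidine): A→G, G→A, C→T, T→C
• Transversions (purine → pyrimidine, pyrimidine → purine): A→T, T→A, A→C, C→A, G→T, T→G, G→C, C→G
• Other: deletions, insertions, insertions in reverse order

Hypothesis
Substitution of nucleotides in the evolution of DNA sequences can be modeled by a Markov chain or Markov process.

Other assumptions
• Stationarity
• Reversibility

Transition matrix
Rows and columns are ordered a, g, c, t throughout.
\[
P = \begin{pmatrix}
p_{aa} & p_{ag} & p_{ac} & p_{at} \\
p_{ga} & p_{gg} & p_{gc} & p_{gt} \\
p_{ca} & p_{cg} & p_{cc} & p_{ct} \\
p_{ta} & p_{tg} & p_{tc} & p_{tt}
\end{pmatrix}
\]

Models in discrete time

Jukes-Cantor model
All substitutions are equally probable, each with probability \(\alpha\).
\[
P = \begin{pmatrix}
1-3\alpha & \alpha & \alpha & \alpha \\
\alpha & 1-3\alpha & \alpha & \alpha \\
\alpha & \alpha & 1-3\alpha & \alpha \\
\alpha & \alpha & \alpha & 1-3\alpha
\end{pmatrix}
\]
Stationary distribution
\[
(\pi_a, \pi_g, \pi_c, \pi_t) = (0.25,\ 0.25,\ 0.25,\ 0.25)
\]
Spectral decomposition of \(P^n\)
\[
P^n = \begin{pmatrix}
0.25 & 0.25 & 0.25 & 0.25 \\
0.25 & 0.25 & 0.25 & 0.25 \\
0.25 & 0.25 & 0.25 & 0.25 \\
0.25 & 0.25 & 0.25 & 0.25
\end{pmatrix}
+ (1-4\alpha)^n \begin{pmatrix}
0.75 & -0.25 & -0.25 & -0.25 \\
-0.25 & 0.75 & -0.25 & -0.25 \\
-0.25 & -0.25 & 0.75 & -0.25 \\
-0.25 & -0.25 & -0.25 & 0.75
\end{pmatrix}
\]

Remark
• When learning and researching Markov models for nucleotide substitution, it greatly helps to use software for symbolic computation, such as Mathematica, Maple, or Scientific Workplace.

Kimura models
I) Two-parameter model
• \(\alpha\): probability of a transition
• \(\beta\): probability of a specific transversion
\[
P = \begin{pmatrix}
1-\alpha-2\beta & \alpha & \beta & \beta \\
\alpha & 1-\alpha-2\beta & \beta & \beta \\
\beta & \beta & 1-\alpha-2\beta & \alpha \\
\beta & \beta & \alpha & 1-\alpha-2\beta
\end{pmatrix}
\]
II) Kimura 3ST model
• \(\alpha\): probability of A↔G, C↔T
• \(\beta\): probability of A↔C, G↔T
• \(\gamma\): probability of A↔T, C↔G
\[
P = \begin{pmatrix}
1-\alpha-\beta-\gamma & \alpha & \beta & \gamma \\
\alpha & 1-\alpha-\beta-\gamma & \gamma & \beta \\
\beta & \gamma & 1-\alpha-\beta-\gamma & \alpha \\
\gamma & \beta & \alpha & 1-\alpha-\beta-\gamma
\end{pmatrix}
\]
Stationary distribution (both models)
\[
(\pi_a, \pi_g, \pi_c, \pi_t) = (0.25,\ 0.25,\ 0.25,\ 0.25)
\]
Spectral decomposition of \(P^n\) (model I; the uniform entries 0.25 correspond to the uniform stationary distribution)
\[
P^n = \begin{pmatrix}
0.25 & 0.25 & 0.25 & 0.25 \\
0.25 & 0.25 & 0.25 & 0.25 \\
0.25 & 0.25 & 0.25 & 0.25 \\
0.25 & 0.25 & 0.25 & 0.25
\end{pmatrix}
+ (1-4\beta)^n \begin{pmatrix}
0.25 & 0.25 & -0.25 & -0.25 \\
0.25 & 0.25 & -0.25 & -0.25 \\
-0.25 & -0.25 & 0.25 & 0.25 \\
-0.25 & -0.25 & 0.25 & 0.25
\end{pmatrix}
+ (1-2(\alpha+\beta))^n \begin{pmatrix}
0.5 & -0.5 & 0 & 0 \\
-0.5 & 0.5 & 0 & 0 \\
0 & 0 & 0.5 & -0.5 \\
0 & 0 & -0.5 & 0.5
\end{pmatrix}
\]

Generalizations of Kimura models

By Ewens:
• \(\alpha\): probability of A↔G, C↔T
• \(\beta\): probability of A→C, A→T, G→C, G→T
• \(\gamma\): probability of C→A, T→A, C→G, T→G
\[
P = \begin{pmatrix}
1-\alpha-2\beta & \alpha & \beta & \beta \\
\alpha & 1-\alpha-2\beta & \beta & \beta \\
\gamma & \gamma & 1-\alpha-2\gamma & \alpha \\
\gamma & \gamma & \alpha & 1-\alpha-2\gamma
\end{pmatrix}
\]
Stationary distribution
\[
\pi_a = \pi_g = \frac{\gamma}{2(\beta+\gamma)}, \qquad
\pi_c = \pi_t = \frac{\beta}{2(\beta+\gamma)}
\]

By Blaisdell:
• \(\alpha\): probability of A→G, C→T
• \(\delta\): probability of G→A, T→C
• \(\beta\): probability of A→C, A→T, G→C, G→T
• \(\gamma\): probability of C→A, T→A, C→G, T→G
\[
P = \begin{pmatrix}
1-\alpha-2\beta & \alpha & \beta & \beta \\
\delta & 1-\delta-2\beta & \beta & \beta \\
\gamma & \gamma & 1-\alpha-2\gamma & \alpha \\
\gamma & \gamma & \delta & 1-\delta-2\gamma
\end{pmatrix}
\]
Stationary distribution
\[
\pi_a = \frac{\gamma(\beta+\delta)}{(\beta+\gamma)(\alpha+2\beta+\delta)}, \quad
\pi_g = \frac{\gamma(\alpha+\beta)}{(\beta+\gamma)(\alpha+2\beta+\delta)}, \quad
\pi_c = \frac{\beta(\gamma+\delta)}{(\beta+\gamma)(\alpha+2\gamma+\delta)}, \quad
\pi_t = \frac{\beta(\alpha+\gamma)}{(\beta+\gamma)(\alpha+2\gamma+\delta)}
\]
Remark: this model is not reversible.

Felsenstein model
The probability of substitution of any nucleotide by another is proportional to the stationary probability of the substituting nucleotide.
\[
P = \begin{pmatrix}
1-u+u\pi_a & u\pi_g & u\pi_c & u\pi_t \\
u\pi_a & 1-u+u\pi_g & u\pi_c & u\pi_t \\
u\pi_a & u\pi_g & 1-u+u\pi_c & u\pi_t \\
u\pi_a & u\pi_g & u\pi_c & 1-u+u\pi_t
\end{pmatrix}
\]
Stationary distribution: \((\pi_a, \pi_g, \pi_c, \pi_t)\)

HKY model
Hasegawa, Kishino, Yano. Different rates for transitions (\(u\)) and transversions (\(v\)).
\[
P = \begin{pmatrix}
1-u\pi_g-v(\pi_c+\pi_t) & u\pi_g & v\pi_c & v\pi_t \\
u\pi_a & 1-u\pi_a-v(\pi_c+\pi_t) & v\pi_c & v\pi_t \\
v\pi_a & v\pi_g & 1-u\pi_t-v(\pi_a+\pi_g) & u\pi_t \\
v\pi_a & v\pi_g & u\pi_c & 1-u\pi_c-v(\pi_a+\pi_g)
\end{pmatrix}
\]
Eigenvalues of P
\[
\lambda_1 = 1, \qquad
\lambda_2 = 1-v, \qquad
\lambda_3 = 1-u(\pi_c+\pi_t)-v(\pi_a+\pi_g), \qquad
\lambda_4 = 1-v(\pi_c+\pi_t)-u(\pi_a+\pi_g)
\]
Left (row) eigenvectors
\[
\begin{aligned}
l_1 &= \bigl(\pi_a,\ \pi_g,\ \pi_c,\ \pi_t\bigr) \\
l_2 &= \bigl(\pi_a(\pi_c+\pi_t),\ \pi_g(\pi_c+\pi_t),\ -\pi_c(\pi_a+\pi_g),\ -\pi_t(\pi_a+\pi_g)\bigr) \\
l_3 &= \bigl(0,\ 0,\ 1,\ -1\bigr) \\
l_4 &= \bigl(1,\ -1,\ 0,\ 0\bigr)
\end{aligned}
\]
Right (column) eigenvectors
\[
r_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \quad
r_2 = \begin{pmatrix} \dfrac{1}{\pi_a+\pi_g} \\[2mm] \dfrac{1}{\pi_a+\pi_g} \\[2mm] -\dfrac{1}{\pi_c+\pi_t} \\[2mm] -\dfrac{1}{\pi_c+\pi_t} \end{pmatrix}, \quad
r_3 = \begin{pmatrix} 0 \\ 0 \\ \dfrac{\pi_t}{\pi_c+\pi_t} \\[2mm] -\dfrac{\pi_c}{\pi_c+\pi_t} \end{pmatrix}, \quad
r_4 = \begin{pmatrix} \dfrac{\pi_g}{\pi_a+\pi_g} \\[2mm] -\dfrac{\pi_a}{\pi_a+\pi_g} \\ 0 \\ 0 \end{pmatrix}
\]

General 12 parameter model
Tavaré, 1986
\[
P = \begin{pmatrix}
1-uW & uA\pi_g & uB\pi_c & uC\pi_t \\
uD\pi_a & 1-uX & uE\pi_c & uF\pi_t \\
uG\pi_a & uH\pi_g & 1-uY & uI\pi_t \\
uJ\pi_a & uK\pi_g & uL\pi_c & 1-uZ
\end{pmatrix}
\]
\[
\begin{aligned}
W &= A\pi_g + B\pi_c + C\pi_t \\
X &= D\pi_a + E\pi_c + F\pi_t \\
Y &= G\pi_a + H\pi_g + I\pi_t \\
Z &= J\pi_a + K\pi_g + L\pi_c
\end{aligned}
\]
Stationary distribution: \((\pi_a, \pi_g, \pi_c, \pi_t)\)
Reversibility: A = D, B = G, C = J, E = H, F = K, I = L
Conclusion: the most general reversible model has 12 - 6 = 6 free parameters.

Models in continuous time
Matrix of transition probabilities
\[
P(t) = \exp(Qt)
\]
Q: intensity matrix

Jukes-Cantor model
\[
Q = \begin{pmatrix}
-3\alpha & \alpha & \alpha & \alpha \\
\alpha & -3\alpha & \alpha & \alpha \\
\alpha & \alpha & -3\alpha & \alpha \\
\alpha & \alpha & \alpha & -3\alpha
\end{pmatrix}
\]
Spectral decomposition of P(t)
\[
P(t) = \exp(Qt) = \begin{pmatrix}
0.25 & 0.25 & 0.25 & 0.25 \\
0.25 & 0.25 & 0.25 & 0.25 \\
0.25 & 0.25 & 0.25 & 0.25 \\
0.25 & 0.25 & 0.25 & 0.25
\end{pmatrix}
+ \exp(-4\alpha t) \begin{pmatrix}
0.75 & -0.25 & -0.25 & -0.25 \\
-0.25 & 0.75 & -0.25 & -0.25 \\
-0.25 & -0.25 & 0.75 & -0.25 \\
-0.25 & -0.25 & -0.25 & 0.75
\end{pmatrix}
\]

Kimura model
\[
Q = \begin{pmatrix}
-\alpha-2\beta & \alpha & \beta & \beta \\
\alpha & -\alpha-2\beta & \beta & \beta \\
\beta & \beta & -\alpha-2\beta & \alpha \\
\beta & \beta & \alpha & -\alpha-2\beta
\end{pmatrix}
\]
Spectral decomposition of P(t)
\[
P(t) = \begin{pmatrix}
0.25 & 0.25 & 0.25 & 0.25 \\
0.25 & 0.25 & 0.25 & 0.25 \\
0.25 & 0.25 & 0.25 & 0.25 \\
0.25 & 0.25 & 0.25 & 0.25
\end{pmatrix}
+ \exp(-4\beta t) \begin{pmatrix}
0.25 & 0.25 & -0.25 & -0.25 \\
0.25 & 0.25 & -0.25 & -0.25 \\
-0.25 & -0.25 & 0.25 & 0.25 \\
-0.25 & -0.25 & 0.25 & 0.25
\end{pmatrix}
+ \exp(-2(\alpha+\beta)t) \begin{pmatrix}
0.5 & -0.5 & 0 & 0 \\
-0.5 & 0.5 & 0 & 0 \\
0 & 0 & 0.5 & -0.5 \\
0 & 0 & -0.5 & 0.5
\end{pmatrix}
\]

Parameter estimation

Jukes-Cantor model
Three configurations of an ancestor (A) and two descendants (D1, D2) are equivalent due to reversibility:
• A → D1 and A → D2
• D1 → A → D2
• D2 → A → D1
Probability that the nucleotides are different in two descendants
\[
p = p(t) = 0.75\,\bigl(1 - \exp(-8\alpha t)\bigr)
\]

Estimating p
We have two DNA sequences of length N:
D1: ACAATACAGGGCAGATAGATACAGATAGACACAGACAGAGCAGAGACAG
D2: ACAATACAGGACAGTTAGATACAGATAGACACAGACAGAGCAGAGACAG
\[
\hat{p} = \frac{\text{number of differences}}{N}, \qquad
\widehat{\alpha t} = -\frac{1}{8}\log\Bigl(1 - \frac{4}{3}\hat{p}\Bigr)
\]

Kimura model
• p: probability of two different purines or two different pyrimidines
• q: probability of a purine and a pyrimidine
\[
p = p(t) = 0.25 + 0.25\exp(-4\beta t) - 0.5\exp(-2(\alpha+\beta)t)
\]
\[
q = q(t) = 0.5 - 0.5\exp(-4\beta t)
\]
These are \(p = P_{ag}(t)\) and \(q = P_{ac}(t) + P_{at}(t)\) from the spectral decomposition of P(t) for the Kimura model, so t here is the total time separating the two sequences.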
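The estimation steps for the two example sequences can be sketched in Python as follows. This is an illustrative sketch, not part of the original slides: the function names are hypothetical, and the symbols alpha and beta are assumed names for the transition and transversion rates. The Kimura inversion is obtained by solving the p(t) and q(t) formulas for the rates, with t taken as the total time separating the two sequences. For Jukes-Cantor the slide's convention is kept, where p depends on exp(-8 alpha t).

```python
import math

# Example sequences copied from the slide.
D1 = "ACAATACAGGGCAGATAGATACAGATAGACACAGACAGAGCAGAGACAG"
D2 = "ACAATACAGGACAGTTAGATACAGATAGACACAGACAGAGCAGAGACAG"

PURINES = frozenset("AG")  # assumes upper-case A, C, G, T input


def difference_fractions(s1, s2):
    """Return (p_hat, q_hat): the fraction of sites showing a
    transition-type difference and a transversion-type difference."""
    if len(s1) != len(s2):
        raise ValueError("sequences must have equal length")
    transitions = transversions = 0
    for x, y in zip(s1, s2):
        if x == y:
            continue
        if (x in PURINES) == (y in PURINES):
            transitions += 1    # purine/purine or pyrimidine/pyrimidine
        else:
            transversions += 1  # purine versus pyrimidine
    n = len(s1)
    return transitions / n, transversions / n


def jukes_cantor_alpha_t(p_hat):
    """Invert p = 0.75 (1 - exp(-8 alpha t)) for the product alpha * t."""
    arg = 1.0 - 4.0 * p_hat / 3.0
    if arg <= 0.0:
        raise ValueError("p_hat must be below 0.75")
    return -math.log(arg) / 8.0


def kimura_alpha_t_beta_t(p_hat, q_hat):
    """Invert the Kimura p(t), q(t) formulas for (alpha * t, beta * t)."""
    a = 1.0 - 2.0 * q_hat          # equals exp(-4 beta t)
    b = 1.0 - 2.0 * p_hat - q_hat  # equals exp(-2 (alpha + beta) t)
    if a <= 0.0 or b <= 0.0:
        raise ValueError("sequences too divergent for this estimator")
    beta_t = -math.log(a) / 4.0
    alpha_plus_beta_t = -math.log(b) / 2.0
    return alpha_plus_beta_t - beta_t, beta_t


p_hat, q_hat = difference_fractions(D1, D2)
print("Jukes-Cantor alpha*t:", jukes_cantor_alpha_t(p_hat + q_hat))
print("Kimura (alpha*t, beta*t):", kimura_alpha_t_beta_t(p_hat, q_hat))
```

Only the products alpha*t and beta*t are identifiable from a single pair of sequences; separating rate from time needs extra information such as a known divergence date.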
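The discrete-time spectral decompositions for the Jukes-Cantor and Kimura two-parameter models can be checked numerically by comparing them with direct matrix powers. The sketch below uses only the Python standard library. The function names are hypothetical, alpha and beta are assumed names for the substitution probabilities, and the numeric values are arbitrary test values.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, n):
    """Compute the n-th power of p by repeated multiplication (n >= 0)."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result


def jukes_cantor_P(alpha):
    """One-step Jukes-Cantor matrix, state order a, g, c, t."""
    return [[1 - 3 * alpha if i == j else alpha for j in range(4)]
            for i in range(4)]


def jukes_cantor_Pn(alpha, n):
    """P^n from the spectral decomposition with eigenvalue 1 - 4 alpha."""
    lam = (1 - 4 * alpha) ** n
    return [[0.25 + lam * (0.75 if i == j else -0.25) for j in range(4)]
            for i in range(4)]


def kimura_P(alpha, beta):
    """One-step Kimura two-parameter matrix, state order a, g, c, t."""
    d = 1 - alpha - 2 * beta
    return [[d, alpha, beta, beta],
            [alpha, d, beta, beta],
            [beta, beta, d, alpha],
            [beta, beta, alpha, d]]


def kimura_Pn(alpha, beta, n):
    """P^n from eigenvalues 1, 1 - 4 beta and 1 - 2 (alpha + beta)."""
    lam2 = (1 - 4 * beta) ** n
    lam3 = (1 - 2 * (alpha + beta)) ** n
    result = []
    for i in range(4):
        row = []
        for j in range(4):
            same_class = (i < 2) == (j < 2)  # both purines or both pyrimidines
            m2 = 0.25 if same_class else -0.25
            m3 = 0.5 if i == j else (-0.5 if same_class else 0.0)
            row.append(0.25 + lam2 * m2 + lam3 * m3)
        result.append(row)
    return result


def max_abs_diff(a, b):
    """Largest absolute entrywise difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


jc_error = max_abs_diff(mat_pow(jukes_cantor_P(0.05), 10),
                        jukes_cantor_Pn(0.05, 10))
k2_error = max_abs_diff(mat_pow(kimura_P(0.08, 0.03), 10),
                        kimura_Pn(0.08, 0.03, 10))
print("Jukes-Cantor max error:", jc_error)
print("Kimura max error:", k2_error)
```

Both reported errors should be at the level of floating-point rounding if the decompositions are consistent with the one-step matrices.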