Realistic evolutionary models
Marjolijn Elsinga & Lars Hemel

Contents
• Models with different rates at different sites
• Models which allow gaps
• Evaluating different models
• Break
• Probabilistic interpretation of parsimony
• Maximum likelihood distances

Unrealistic assumptions
1 The same rate of evolution at each site in the substitution matrix
  – In reality, the structure of proteins and the base pairing of RNA result in different rates at different sites
2 Ungapped alignments
  – These discard the useful information given by the pattern of deletions and insertions

Different rates in the matrix
Under maximum likelihood, the sites X_j, j = 1…n, are treated as independent.

Different rates in the matrix (2)
Introduce a site-dependent rate variable r_u.

Different rates in the matrix (3)
We don't know r_u, so we use a prior. Yang [1993] suggests a gamma distribution g(r; α, α), with mean 1 and variance 1/α.

Problem
The number of terms grows exponentially with the number of sequences, so the computation is slow.

Solution: approximation
– Replace the integral by a discrete sum
– Subdivide the domain into m intervals
– Let r_k denote the mean of the gamma distribution within the k-th interval

Solution
Yang [1993] found that m = 3–4 already gives a good approximation, at only m times the computation needed for non-varying sites.

Evolutionary models with gaps (1)
Idea 1: introduce '_' as an extra character of the alphabet of K residues and replace the K×K substitution matrix with a (K+1)×(K+1) matrix.
Drawback: there is no way to assign a lower cost to a gap that follows another gap; gaps are treated independently.

Evolutionary models with gaps (2)
Idea 2: Allison, Wallace & Yee [1992] introduce delete and insert states to ensure affine-type gaps.
Drawback: computationally intractable.

Evolutionary models with gaps (3)
Idea 3: Thorne, Kishino & Felsenstein [1992] use fragment substitution to get a degree of biological plausibility.
Drawback: usable for only two sequences.

Finally
We want affine-type gap penalties at a reasonable computational cost. Mitchison & Durbin [1995] made a
tree HMM, which uses a profile-HMM architecture and treats paths through the model as objects that undergo evolutionary change.

Assumptions needed again
We use an architecture rather simpler than the profile HMM of Krogh et al. [1994]: it has only match and delete states.
Match state: M_k
Delete state: D_k
k = position in the model

Tree HMM with gaps (1)
Sequence y is the ancestor of sequence x. Both sequences are aligned to the model, so both follow a prescribed path through it.

Tree HMM with gaps (2)
x emits residue x_i at M_k; y emits residue y_j at M_k.
The probability of the substitution y_j → x_i is P(x_i | y_j, t).

Tree HMM with gaps (3)
What if x follows a different path than y?
x: M_k → D_{k+1} (= MD)
y: M_k → M_{k+1} (= MM)
This has probability P(MD | MM, t).

Tree HMM with gaps (4)
x: D_{k+1} → M_{k+2} (= DM)
y: M_{k+1} → M_{k+2} (= MM)
We assume that the choice between DD and DM is controlled by a mutational process that operates independently of y.

Substitution matrix
The probabilities of the transitions on x's path are then given by priors: D_{k+1} → M_{k+2} has probability q_DM.

How it works
At position k: q_{y_j} P(x_i | y_j, t)
Transition k → k+1: q_MM P(MD | MM, t)
Transition k+1 → k+2: q_MM q_DM

Another example

Evaluating models: evidence
Comparing models is difficult. One approach is to compare the probabilities P(D | M1) and P(D | M2), obtained by integrating over all parameters θ of each model, with prior probabilities P(θ).

Comparing two models
A natural way to compare M1 and M2 is to compute the posterior probability of M1.

Parametric bootstrap
Let L1 be the maximum likelihood of the data D for the model M1, and let L2 be the maximum likelihood of D for the model M2. Define Δ = log L2 − log L1.

Parametric bootstrap (2)
Simulate datasets D_i with the values of the parameters of M1 that gave the maximum likelihood for D, and compute Δ_i for each.
If Δ exceeds almost all of the Δ_i, then M2 captured aspects of the data that M1 did not mimic, and M1 is rejected.

Break

Probabilistic interpretation of various models
Lars Hemel

Overview
• Review of last week's method: parsimony
  – Assumptions, properties
• Probabilistic interpretation of parsimony
• Maximum likelihood distances
  – Example: neighbour joining
• More probabilistic interpretations
  – Sankoff & Cedergren
  – Hein's affine cost algorithm
• Conclusion / Questions

Review
Parsimony = finding a tree which can explain the observed sequences with a minimal number of substitutions.

Parsimony
Remember the following assumptions:
– Sequences are aligned
– Alignments do not have gaps
– Each site is treated independently
Furthermore, many families of models have:
– A multiplicative substitution matrix: S(t + s) = S(t) S(s)
– Reversibility: P(b | a, t) q_a = P(a | b, t) q_b

Parsimony
The basic step is counting the minimal number of changes at one site; the final number of substitutions is found by summing over all the sites. Weighted parsimony uses different weights for different substitutions.

Probabilistic interpretation of parsimony
Given: a set of substitution probabilities P(b | a), in which we neglect the dependence on the length t.
Calculate substitution costs S(a, b) = −log P(b | a).
Felsenstein [1981] showed that with these substitution costs, the minimal cost at site u for the whole tree T obtained by the weighted parsimony algorithm can be regarded as an approximation to the likelihood.

Probabilistic interpretation of parsimony
The performance of tree-building algorithms can be tested by generating data probabilistically by sampling on a known tree and then seeing how often a given algorithm reconstructs that tree correctly. Sampling is done as follows:
– Pick a residue a at the root with probability q_a
– Accept a substitution to b along the edge down to node i with probability P(b | a, t_i), and repeat this along each edge down the tree
– Sequences of length N are generated by N independent repetitions of this procedure
– Maximum likelihood should reconstruct the correct tree for large N

Probabilistic interpretation of parsimony
[Figure: a four-leaf tree T with the two-symbol substitution matrix
  ( 1−p    p  )
  (  p    1−p )
where p = 0.3 on the edges to leaves 1 and 3, p = 0.1 on the edges to leaves 2 and 4, and p = 0.09 on the internal edge.]

Probabilistic interpretation of parsimony
A tree with n leaves has
(2n−5)!! unrooted trees.
[Figure: the three possible unrooted topologies T1, T2, T3 on leaves 1–4.]

Probabilistic interpretation of parsimony
How often each topology was selected, out of 1000 sampled datasets (T1 is the true tree):

Maximum likelihood
N      T1    T2    T3
20     419   339   242
100    638   204   158
500    904    61    35
2000   997     3     0

Parsimony
N      T1    T2    T3
20     396   378   224
100    405   515    79
500    404   594     2
2000   353   646     0

Parsimony can construct the wrong tree even for large N.

Probabilistic interpretation of parsimony
Consider the following example: a tree with residues A, A, B, B at leaves 1, 2, 3 and 4.

Probabilistic interpretation of parsimony
With parsimony, the number of substitutions is counted for each candidate tree.
[Figure: two candidate trees; the left one requires 2 substitutions, the right one requires 1.]
Parsimony constructs the right-hand tree (1 substitution) more often than the left-hand tree (2 substitutions).

Maximum likelihood distances
Suppose we have a tree T with edge lengths t = (t1, …, tn) and sampled sequences x^i at the leaves. We will try to compute the distance between x^1 and x^3.
[Figure: leaves x^1, …, x^5, internal nodes x^6, x^7, and root x^8, with edge lengths t1, t3, t4, t6, t7.]

Maximum likelihood distances
By multiplicativity:
P(a^1 | a^8, t1 + t6) = Σ_{a^6} P(a^1 | a^6, t1) P(a^6 | a^8, t6)

By reversibility and multiplicativity:
Σ_{a^8} P(a^1 | a^8, t1 + t6) P(a^3 | a^8, t7 + t3) q_{a^8}
  = Σ_{a^8} P(a^1 | a^8, t1 + t6) P(a^8 | a^3, t3 + t7) q_{a^3}
  = P(a^1 | a^3, t1 + t6 + t3 + t7) q_{a^3}

Maximum likelihood distances
If the path between leaves i and j consists of the edges t_{k1}, …, t_{kr}, then
P(x^i, x^j | T, t) = Π_u q_{x^j_u} P(x^i_u | x^j_u, t_{k1} + … + t_{kr})
and the maximum likelihood distance is
d^ML_ij = argmax_t Π_u q_{x^j_u} P(x^i_u | x^j_u, t) = argmax_t Π_u P(x^i_u | x^j_u, t)

Maximum likelihood distances
d^ML_ij estimates the path length t_{k1} + … + t_{kr}: ML distances between leaf sequences are close to additive, given a large amount of data.

Example: Neighbour joining
[Figure: leaves i and j are joined at a new node m; k is any other leaf.]
d_im = d_ik − d_km
d_jm = d_jk − d_km
d_km = ½ (d_ik + d_jk − d_ij)

Example: Neighbour joining
Use maximum likelihood distances, and suppose that:
– the model is multiplicative and reversible
– we have plenty of data
– the underlying probabilistic model is correct
Then neighbour joining will construct any tree correctly.
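The neighbour-joining distance relations above can be checked on a toy additive tree. A minimal sketch, with branch lengths invented for illustration (the function name is mine, not from the slides):

```python
def dist_to_new_node(d_ik, d_jk, d_ij):
    # When neighbours i and j are joined at a new node m, additivity of
    # tree distances gives the distance from any other leaf k to m:
    #   d_km = (d_ik + d_jk - d_ij) / 2
    return (d_ik + d_jk - d_ij) / 2

# Toy additive tree: branch i--m has length 2, j--m has length 3, and the
# path from m to leaf k has length 4, so d_ij = 5, d_ik = 6, d_jk = 7.
d_ij, d_ik, d_jk = 5, 6, 7

d_km = dist_to_new_node(d_ik, d_jk, d_ij)  # (6 + 7 - 5) / 2 = 4.0
d_im = d_ik - d_km                         # 6 - 4.0 = 2.0
d_jm = d_jk - d_km                         # 7 - 4.0 = 3.0
print(d_km, d_im, d_jm)
```

The recovered values d_km = 4, d_im = 2 and d_jm = 3 are exactly the branch lengths the tree was built from; this additivity is what neighbour joining relies on, and why near-additive ML distances make it work.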
Example: Neighbour joining
Neighbour joining using ML distances:
N      T1    T2    T3
20     477   301   222
100    635   231   134
500    896    85    19
2000   997     5     0
It constructs the correct tree where parsimony failed.

More probabilistic interpretations
Sankoff & Cedergren:
– Simultaneously align the sequences and find their phylogeny, using a character substitution model.
– It becomes probabilistic when the scores are interpreted as log probabilities and the procedure sums instead of maximizing (Allison, Wallace & Yee [1992]).
– But, like the original Sankoff & Cedergren method, it is not practical for most problems.

More probabilistic interpretations
Hein's affine cost algorithm:
– Simultaneously aligns the sequences and finds their phylogeny, using affine gap penalties.
– It becomes probabilistic when the scores are interpreted as log probabilities and the procedure sums instead of maximizing.
– But when we use sums instead of max, we have to include all paths, which costs N^2 at the first node above the leaves, N^3 at the next, and so on: all the speed advantages are gone.

Conclusion
• Probabilistic interpretations can be better: compare ML with parsimony.
• They can also be less useful, because the computational costs get too high: Sankoff & Cedergren.
• Neighbour joining constructs the correct tree if its assumptions are correct.
• So the trick is to know your problem and decide which method is best.

Questions??
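Backup: the parsimony counts in the A, A, B, B example can be reproduced with the weighted-parsimony (Sankoff) recursion with unit substitution costs. A sketch only — the tree encoding and function name are mine, not from the slides:

```python
def sankoff(tree, alphabet="AB"):
    # Minimal substitution count for one site (Sankoff's recursion).
    # `tree` is either a leaf residue (a one-character string) or a
    # (left, right) pair of subtrees.  Returns a dict mapping each residue
    # this node could carry to the minimal cost of the subtree below it.
    if isinstance(tree, str):
        return {a: (0 if a == tree else float("inf")) for a in alphabet}
    cost = lambda a, b: 0 if a == b else 1  # unit substitution cost
    subs = [sankoff(child, alphabet) for child in tree]
    return {a: sum(min(s[b] + cost(a, b) for b in alphabet) for s in subs)
            for a in alphabet}

# The two groupings of leaves 1-4 carrying residues A, A, B, B:
grouping_12_34 = (("A", "A"), ("B", "B"))  # leaves {1,2} vs {3,4}
grouping_13_24 = (("A", "B"), ("A", "B"))  # leaves {1,3} vs {2,4}

print(min(sankoff(grouping_12_34).values()))  # 1 substitution
print(min(sankoff(grouping_13_24).values()))  # 2 substitutions
```

Grouping the two A leaves (and the two B leaves) together explains the site with a single substitution, so parsimony prefers that topology, as on the slide.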