Model assumptions and violations

Nucleotide frequencies by taxon ("..." marks taxa not shown):

Taxon          A     C     G     T
Hemiechinus    0.34  0.20  0.11  0.36
Hedgehog       0.32  0.21  0.12  0.36
Echinosorex    0.34  0.21  0.11  0.34
Pipistrellus   0.32  0.24  0.12  0.32
Echinops       0.30  0.24  0.12  0.34
Mogera         0.34  0.24  0.12  0.29
Urotrichus     0.34  0.25  0.12  0.29
Dormouse       0.31  0.25  0.12  0.33
Thryonomys     0.33  0.25  0.11  0.30
...
Horse          0.31  0.30  0.12  0.26
Rhinolophus    0.30  0.30  0.13  0.27
Dugong         0.29  0.30  0.14  0.27
Hippo          0.32  0.31  0.13  0.25
Donkey         0.31  0.31  0.12  0.26
Pika           0.30  0.31  0.12  0.27
SpermWhale     0.31  0.32  0.12  0.25
Macaca         0.31  0.32  0.12  0.26
Baboon         0.31  0.32  0.12  0.25
Gorilla        0.29  0.32  0.12  0.26
Chimp          0.29  0.33  0.12  0.26
Human          0.29  0.33  0.12  0.26
Gibbon         0.29  0.33  0.13  0.25
Orangutan      0.29  0.34  0.12  0.25

Recall: nucleotide frequency bias. The table shows large variability in base composition across taxa. Contrast especially the hedgehogs and the hominoid primates.

• All models are subsets of the General Time Reversible (GTR) model. Assumptions:
  – Stationarity: no base composition bias across taxa (i.e., across the tree).
  – Symmetric substitution matrix, implying time reversibility: p(A>T) = p(T>A), p(A>C) = p(C>A), ...
• Actual sequence data:
  – Asymmetric substitution matrix: time irreversibility.
  – Nonstationary nucleotide frequencies: frequencies vary across taxa (base composition bias).

[Figure: taxa plotted on the 1st and 2nd principal components of nucleotide frequency. Hedgehogs (low C, high T) have very different nucleotide frequencies from the other mammals. The 1st PC axis is labeled "high C and low T".]

Models of sequence evolution III: time reversibility

General Time Reversible model (GTR), the most general time-reversible model: symmetric substitution matrix, p(A>T) = p(T>A), p(A>C) = p(C>A), ...

     A   C   G   T
A    -   α   β   δ
C    α   -   γ   ε
G    β   γ   -   µ
T    δ   ε   µ   -

Strictly, the symmetric entries are exchangeabilities. The rate of change from one base to another is the exchangeability times the equilibrium frequency of the target base, so with stationary base frequencies π the expected numbers of changes in the two directions are equal: π_A·q(A>T) = π_T·q(T>A), and likewise for every pair.

Model violations (ML & Bayesian): asymmetric substitution matrix from MacClade

[Figure: MacClade charts of observed and expected changes among A, C, G and T.]

• Easy to check graphically: the observed chart appears asymmetric.
  – e.g. observed p(C>T) > p(T>C), whereas equal rates are expected.
• Statistical test: view the graph as a table, export it, and analyze it statistically.
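One standard way to test H0 of symmetric substitution on an exported square table of change counts is Bowker's test of symmetry, sketched below. The slides do not say which test produced their χ² values, so treat this as an assumed, illustrative choice. The sketch assumes counts[i][j] holds the number of inferred changes from base i to base j. The helper name bowker_symmetry_test, the constant CRITICAL_CHI2_6DF_05 and the count matrix are all hypothetical; the counts are invented only to mimic an excess of C>T over T>C.

```python
from itertools import combinations

BASES = "ACGT"
# Upper 5% point of the chi-square distribution with 6 degrees of freedom.
CRITICAL_CHI2_6DF_05 = 12.592


def bowker_symmetry_test(counts):
    """Hypothetical helper: Bowker's test of symmetry for a square table.

    counts[i][j] is the number of inferred changes from BASES[i] to BASES[j]
    (e.g. exported from a chart of changes). Returns (statistic, df).
    Pairs with no changes in either direction carry no information and are
    left out of both the statistic and the degrees of freedom.
    """
    k = len(counts)
    statistic = 0.0
    df = 0
    for i, j in combinations(range(k), 2):
        forward, backward = counts[i][j], counts[j][i]
        if forward + backward == 0:
            continue
        # Each reciprocal pair contributes (n_ij - n_ji)^2 / (n_ij + n_ji).
        statistic += (forward - backward) ** 2 / (forward + backward)
        df += 1
    return statistic, df


# Hypothetical change counts (rows = from, columns = to, order A, C, G, T);
# invented to show an excess of C>T over T>C, not real data.
example_counts = [
    [0, 10, 40, 5],
    [12, 0, 6, 90],
    [20, 4, 0, 3],
    [6, 50, 2, 0],
]

stat, df = bowker_symmetry_test(example_counts)
print(f"chi2 = {stat:.2f} on {df} df")  # chi2 = 18.97 on 6 df
if df == 6 and stat > CRITICAL_CHI2_6DF_05:
    print("Reject H0 of symmetric substitution at the 5% level")
```

Each term compares the two directions of a single base pair, so one strongly unbalanced pair can dominate: here the C/T pair alone contributes about 11.4 of the 19.0 total.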
• Asymmetric substitution matrix, e.g. p(A>G) > p(G>A), ...
• H0: symmetric substitution rates. The 12 directional nucleotide changes form 6 reciprocal pairs, so the test statistic has 6 degrees of freedom (χ²₆).

63 mammals, 11 kb mtDNA (asymmetry of the 12 nucleotide changes)
  1st position   χ²₆ = 322    p < 0.001
  2nd position   χ²₆ = 4.3    NS
  3rd position   χ²₆ = 1539   p << 0.001
[Figure panels: 63 mammals, 11 kb mtDNA; transitions; transitions & transversions.]

Whale "lice" (cyamids), COI sequence
• Asymmetry is especially problematic for ancient divergences.
• The cyamid tree has a much more recent divergence than the mammal tree, but its substitutions still appear asymmetric.
  1st position   χ²₆ = 10.3   NS           n = 166: looks a little asymmetric, but p > 0.05.
  2nd position   χ²₆ = 6.7    NS           n = 31: small sample; randomly asymmetric.
  3rd position   χ²₆ = 73.5   p << 0.001   n = 848: truly departs from symmetry.

Why do we care?
• No easy solutions at this point
  – Time-irreversible models are still difficult to implement.
• Nevertheless, knowledge helps
  – Gives clues to why we obtain odd results.
  – In turn, lowers or heightens our faith in the results.
  – Inspires math nerds to come up with better (not bigger) models for us.
• Use maximum likelihood:
  – Statistically robust to departures from model assumptions.
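The NS and p < 0.001 labels attached to the χ²₆ statistics for the mammal and cyamid data can be checked by computing upper-tail probabilities. For an even number of degrees of freedom the chi-square survival function has a closed form, so this minimal sketch needs only the standard library. The helper name chi2_sf_even_df is hypothetical; the statistics are the ones reported on the slides.

```python
import math


def chi2_sf_even_df(x, df):
    """Hypothetical helper: P(X > x) for a chi-square variable with an even
    number of degrees of freedom, via exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("the closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)**0 / 0!
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k   # now (x/2)**k / k!
        total += term
    return math.exp(-half) * total


# Chi-square statistics (6 df) as reported on the slides.
reported = [
    ("63 mammals, 11 kb mtDNA", "1st", 322.0),
    ("63 mammals, 11 kb mtDNA", "2nd", 4.3),
    ("63 mammals, 11 kb mtDNA", "3rd", 1539.0),
    ("Cyamids, COI", "1st", 10.3),
    ("Cyamids, COI", "2nd", 6.7),
    ("Cyamids, COI", "3rd", 73.5),
]

for dataset, position, stat in reported:
    p = chi2_sf_even_df(stat, 6)   # very large statistics underflow to 0.0
    verdict = "asymmetric (p < 0.05)" if p < 0.05 else "NS"
    print(f"{dataset:24s} {position} position: chi2 = {stat:7.1f}, p = {p:.2g}, {verdict}")
```

For example, χ²₆ = 10.3 gives p ≈ 0.11 and χ²₆ = 6.7 gives p ≈ 0.35, consistent with the NS labels; the 5% critical value for 6 df is about 12.59.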