Calculating relative abundance values

Let $P$ be an ancestor-to-descendant substitution pattern of length $L$:

\[
P = b_1 b_2 \ldots b_L \rightarrow b'_1 b'_2 \ldots b'_L
\]

where $b_1, b_L \in \{A, T, G, C\}$ and all other $b_i \in \{A, T, G, C, N\}$. We can write each ancestor-descendant nucleotide pair as $B_i = b_i \rightarrow b'_i$. Then

\[
P = B_1 B_2 \ldots B_L
\]

Given a set of ancestor-descendant alignments, the proportion of $P$ is the fraction of ancestral words that convert to the appropriate descendant sequence:

\[
\mathrm{pr}(P) = \frac{\text{Number of observed } b_1 b_2 \ldots b_L \rightarrow b'_1 b'_2 \ldots b'_L}{\text{Number of observed } b_1 b_2 \ldots b_L}
\]

The normal recursive method for calculating relative abundance is:

\[
\rho(P) =
\begin{cases}
\mathrm{pr}(P) & \text{if } L = 1 \\[4pt]
\dfrac{\mathrm{pr}(P)}{\psi(P)} & \text{if } L > 1
\end{cases}
\tag{1}
\]

where $\psi(P)$ is the product of $\rho(s)$ over all elements $s$ of $S_P$, the set of all subpatterns of $P$:

\[
\psi(P) = \prod_{s \in S_P} \rho(s)
\]

$S_P$ contains all gapped and ungapped subpatterns, with $N$ representing any base.

We have proposed a different method of calculating relative abundance, which we refer to as the “seg algorithm.” If we let $G_P$ be the set of all full-length gapped subpatterns $s$ of $P$, we can define a new function $\gamma$:

\[
\gamma(P) = \prod_{s \in G_P} \rho(s)
\]

The seg algorithm is:

\[
\rho(P) =
\begin{cases}
\mathrm{pr}(P) & \text{if } L = 1 \\[4pt]
\dfrac{\mathrm{pr}(P)}{\psi(P)} & \text{if } L = 2 \\[8pt]
\dfrac{\mathrm{pr}(P)\,\mathrm{pr}(B_2 \ldots B_{L-1})}{\mathrm{pr}(B_1 \ldots B_{L-1})\,\mathrm{pr}(B_2 \ldots B_L)\,\gamma(P)} & \text{if } L > 2
\end{cases}
\tag{2}
\]

The algorithms are the same for patterns of length 1 or 2 by definition. We can demonstrate by mathematical induction that they are also equal for all patterns with $L > 2$.

Justification of the “seg algorithm”

Proof. Suppose that $P$ is a substitution pattern with $L = 3$.
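Then from Equation 1, we have

\[
\begin{aligned}
\rho(P) &= \frac{\mathrm{pr}(B_1 B_2 B_3)}{\psi(B_1 B_2 B_3)} \\[4pt]
&= \frac{\mathrm{pr}(B_1 B_2 B_3)}{\mathrm{pr}(B_1)\,\mathrm{pr}(B_2)\,\mathrm{pr}(B_3)\,\rho(B_1 B_2)\,\rho(B_1 N B_3)\,\rho(B_2 B_3)} \\[4pt]
&= \frac{\mathrm{pr}(B_1 B_2 B_3)}{\mathrm{pr}(B_1)\,\mathrm{pr}(B_2)\,\mathrm{pr}(B_3)\left[\dfrac{\mathrm{pr}(B_1 B_2)}{\mathrm{pr}(B_1)\,\mathrm{pr}(B_2)}\right]\left[\dfrac{\mathrm{pr}(B_1 N B_3)}{\mathrm{pr}(B_1)\,\mathrm{pr}(B_3)}\right]\left[\dfrac{\mathrm{pr}(B_2 B_3)}{\mathrm{pr}(B_2)\,\mathrm{pr}(B_3)}\right]} \\[4pt]
&= \frac{\mathrm{pr}(B_1 B_2 B_3)\,\mathrm{pr}(B_1)\,\mathrm{pr}(B_2)\,\mathrm{pr}(B_3)}{\mathrm{pr}(B_1 B_2)\,\mathrm{pr}(B_2 B_3)\,\mathrm{pr}(B_1 N B_3)}
\end{aligned}
\]

Similarly, using the same pattern $P$ and Equation 2,

\[
\begin{aligned}
\rho(P) &= \frac{\mathrm{pr}(B_1 B_2 B_3)\,\mathrm{pr}(B_2)}{\mathrm{pr}(B_1 B_2)\,\mathrm{pr}(B_2 B_3)\,\gamma(B_1 B_2 B_3)} \\[4pt]
&= \frac{\mathrm{pr}(B_1 B_2 B_3)\,\mathrm{pr}(B_2)}{\mathrm{pr}(B_1 B_2)\,\mathrm{pr}(B_2 B_3)\,\rho(B_1 N B_3)} \\[4pt]
&= \frac{\mathrm{pr}(B_1 B_2 B_3)\,\mathrm{pr}(B_2)}{\mathrm{pr}(B_1 B_2)\,\mathrm{pr}(B_2 B_3)\left[\dfrac{\mathrm{pr}(B_1 N B_3)}{\mathrm{pr}(B_1)\,\mathrm{pr}(B_3)}\right]} \\[4pt]
&= \frac{\mathrm{pr}(B_1 B_2 B_3)\,\mathrm{pr}(B_1)\,\mathrm{pr}(B_2)\,\mathrm{pr}(B_3)}{\mathrm{pr}(B_1 B_2)\,\mathrm{pr}(B_2 B_3)\,\mathrm{pr}(B_1 N B_3)}
\end{aligned}
\]

Thus, Equations 1 and 2 are equal for patterns with $L = 3$.

Inductive step. Suppose that Eq. 1 is equal to Eq. 2 for patterns of length $n > 2$. Then for a pattern $P = B_1 \ldots B_n$, combining the equations gives us the following inductive hypothesis:

\[
\rho(P) = \frac{\mathrm{pr}(P)}{\psi(P)} = \frac{\mathrm{pr}(P)\,\mathrm{pr}(B_2 \ldots B_{n-1})}{\mathrm{pr}(B_1 \ldots B_{n-1})\,\mathrm{pr}(B_2 \ldots B_n)\,\gamma(P)}
\tag{3}
\]

Assuming it works for $P$, we want to prove that this holds for a pattern $P^+ = B_1 \ldots B_{n+1}$, with length $n + 1$. Starting with the right side of Eq. 3, for $P^+$ we have:

\[
\begin{aligned}
\rho(P^+) &= \frac{\mathrm{pr}(P^+)\,\mathrm{pr}(B_2 \ldots B_n)}{\mathrm{pr}(B_1 \ldots B_n)\,\mathrm{pr}(B_2 \ldots B_{n+1})\,\gamma(P^+)} \\[4pt]
&= \frac{\mathrm{pr}(P^+)\,\mathrm{pr}(B_2 \ldots B_n)}{\mathrm{pr}(P)\,\mathrm{pr}(B_2 \ldots B_{n+1})\,\gamma(P^+)}
\end{aligned}
\tag{4}
\]

Solving Eq. 3 for $\gamma(P)$ gives

\[
\gamma(P) = \frac{\psi(P)\,\mathrm{pr}(B_2 \ldots B_{n-1})}{\mathrm{pr}(B_1 \ldots B_{n-1})\,\mathrm{pr}(B_2 \ldots B_n)}
\tag{5}
\]

Then from Eq. 4, using the expression for $\gamma$ in Eq. 5, written for $P^+$, leads to

\[
\begin{aligned}
\frac{\mathrm{pr}(P^+)\,\mathrm{pr}(B_2 \ldots B_n)}{\mathrm{pr}(P)\,\mathrm{pr}(B_2 \ldots B_{n+1})\,\gamma(P^+)}
&= \frac{\mathrm{pr}(P^+)\,\mathrm{pr}(B_2 \ldots B_n)}{\mathrm{pr}(P)\,\mathrm{pr}(B_2 \ldots B_{n+1})\left[\dfrac{\psi(P^+)\,\mathrm{pr}(B_2 \ldots B_n)}{\mathrm{pr}(B_1 \ldots B_n)\,\mathrm{pr}(B_2 \ldots B_{n+1})}\right]} \\[4pt]
&= \frac{\mathrm{pr}(P^+)\,\mathrm{pr}(B_1 \ldots B_n)}{\mathrm{pr}(P)\,\psi(P^+)} \\[4pt]
&= \frac{\mathrm{pr}(P^+)\,\mathrm{pr}(P)}{\mathrm{pr}(P)\,\psi(P^+)} \\[4pt]
&= \frac{\mathrm{pr}(P^+)}{\psi(P^+)}
\end{aligned}
\]

We have shown that the two algorithms are equal for patterns of length 1, 2, and 3. We have also shown by induction that if they are equivalent for patterns of length $n > 2$, then they must also be equal for patterns of length $n + 1$.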
As such, we conclude that the algorithms are equivalent for substitution patterns of all lengths.
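The equivalence argued above can also be checked numerically. The following Python sketch is one possible illustration, not the authors' implementation. All function names (`pr`, `normalize`, `subpatterns`, `rho_recursive`, `rho_seg`) are hypothetical, and the values returned by `pr` are arbitrary placeholders rather than proportions counted from alignments, since the identity is algebraic and does not depend on the data. Two further assumptions of the sketch: subpatterns are enumerated per position, so repeated pairs contribute separate factors as in the L = 3 derivation, and the gapped subpatterns inside γ(P) are evaluated with Equation 1.

```python
import itertools
import math
import random
from functools import lru_cache

# A pattern is a tuple whose entries are substitution labels such as "A>G"
# (one B_i) or None (the wildcard N). The first and last entries are never None.

_rng = random.Random(0)
_pr_table = {}

def pr(pattern):
    """Placeholder proportion (assumption): an arbitrary fixed value per pattern."""
    if pattern not in _pr_table:
        _pr_table[pattern] = _rng.uniform(0.1, 0.9)
    return _pr_table[pattern]

def normalize(pattern, keep):
    """Keep the positions in `keep`, mask the others with None, trim to their span."""
    lo, hi = min(keep), max(keep)
    return tuple(pattern[i] if i in keep else None for i in range(lo, hi + 1))

def subpatterns(pattern):
    """S_P: every gapped or ungapped proper subpattern, one per position subset."""
    fixed = [i for i, b in enumerate(pattern) if b is not None]
    return [normalize(pattern, set(keep))
            for k in range(1, len(fixed))
            for keep in itertools.combinations(fixed, k)]

@lru_cache(maxsize=None)
def rho_recursive(pattern):
    """Equation 1: pr(P) for L = 1, otherwise pr(P) / psi(P)."""
    if len(pattern) == 1:
        return pr(pattern)
    psi = math.prod(rho_recursive(s) for s in subpatterns(pattern))
    return pr(pattern) / psi

def rho_seg(pattern):
    """Equation 2 for an ungapped pattern."""
    L = len(pattern)
    if L <= 2:
        return rho_recursive(pattern)
    # G_P: both end positions kept, at least one interior position masked.
    gamma = 1.0
    for k in range(L - 2):  # keeping all L - 2 interior positions would give P itself
        for keep in itertools.combinations(range(1, L - 1), k):
            gamma *= rho_recursive(normalize(pattern, {0, L - 1, *keep}))
    numerator = pr(pattern) * pr(pattern[1:-1])
    denominator = pr(pattern[:-1]) * pr(pattern[1:]) * gamma
    return numerator / denominator

# Compare the two formulas on ungapped patterns of length 1 to 5.
for length in range(1, 6):
    P = tuple(f"B{i}" for i in range(1, length + 1))
    agree = math.isclose(rho_recursive(P), rho_seg(P), rel_tol=1e-9)
    print(length, agree)
```

Equation 1 multiplies over every proper subpattern, so its cost grows exponentially with L. Equation 2 reuses three shorter proportions and only enumerates the full-length gapped subpatterns directly, although in this sketch each of those is still evaluated recursively with Equation 1.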