Appendix A

Appendix A. EM algorithms for TDT1-NGS and TDT2-NGS
The incomplete-data likelihood function is written as
๐‘
15
3
๐ฟ(๐, ๐‘Ÿ1 , ๐‘Ÿ2 , ๐œ€0 , ๐œ€1 |๐‘ฟ, ๐‘ฝ) = โˆ {โˆ‘ ๐œ‹๐‘– (๐ , ๐‘Ÿ1 , ๐‘Ÿ2 ) โˆ ๐‘“๐ต (๐‘ฅ๐‘˜๐‘— |๐‘ฃ๐‘˜๐‘— , ๐œ€0 +
๐‘˜=1
๐‘–=1
๐‘—=1
1 โˆ’ ๐œ€0 โˆ’ ๐œ€1
๐‘”๐‘–๐‘— )}.
2
Let ๐’›๐‘˜ = (๐‘ง1๐‘˜ , โ‹ฏ , ๐‘ง15๐‘˜ ) be the missing indicator random vector representing the
membership of trio ๐‘˜. The value of ๐‘ง๐‘–๐‘˜ = 1 if and only if trio ๐‘˜ has the genotype
vector ๐‘ฎ๐‘– = (๐‘”๐‘–1 , ๐‘”๐‘–2 , ๐‘”๐‘–3 ). Otherwise, ๐‘ง๐‘–๐‘˜ = 0.
The complete-data likelihood function can be written as
๐ฟ๐ถ (๐, ๐‘Ÿ1 , ๐‘Ÿ2 , ๐œ€0 , ๐œ€1 |๐‘ฟ, ๐‘ฝ)
15
3
= โˆ๐‘
๐‘˜=1 {โˆ๐‘–=1 ๐œ‹๐‘– (๐, ๐‘Ÿ1 , ๐‘Ÿ2 ) โˆ๐‘—=1 ๐‘“๐ต (๐‘ฅ๐‘˜๐‘— |๐‘ฃ๐‘˜๐‘— , ๐œ€0 +
1โˆ’๐œ€0 โˆ’๐œ€1
2
๐‘”๐‘–๐‘— )}
๐‘ง๐‘–๐‘˜
The complete-data log-likelihood function is as follow:
๐‘™๐ถ (๐, ๐‘Ÿ1 , ๐‘Ÿ2 , ๐œ€0 , ๐œ€1 |๐‘ฟ, ๐‘ฝ)
15
3
= โˆ‘๐‘
๐‘˜=1 {โˆ‘๐‘–=1 ๐‘ง๐‘–๐‘˜ [log ๐œ‹๐‘– (๐, ๐‘Ÿ1 , ๐‘Ÿ2 ) + โˆ‘๐‘—=1 log ๐‘“๐ต (๐‘ฅ๐‘˜๐‘— |๐‘ฃ๐‘˜๐‘— , ๐œ€0 +
1โˆ’๐œ€0 โˆ’๐œ€1
2
๐‘”๐‘–๐‘— )]}
(A1)
E-step
(๐‘Ÿ)
(๐‘Ÿ)
Given ๐‘Ÿ1 , ๐‘Ÿ2 , ๐(๐‘Ÿ) , ๐œ€0 (๐‘Ÿ) , ๐œ€1 (๐‘Ÿ) that are obtained form the ๐‘Ÿ ๐‘กโ„Ž M-step of the algorithm
and the read counts ๐’™๐‘˜ , the missing random vector is distributed as a multinomial
distribution with the posterior probability estimates of trio k having the genotype
category ๐‘ฎ๐‘– in (๐‘Ÿ + 1)๐‘กโ„Ž step can be obtained by
(๐‘Ÿ)
(๐‘Ÿ)
๐œ๐‘–๐‘˜ (๐‘Ÿ+1) = Pr(trio ๐‘˜ โ€ฒ s genotype vector is ๐‘ฎ๐‘– |๐(๐‘Ÿ) , ๐‘Ÿ1 , ๐‘Ÿ2 , ๐œ€0 (๐‘Ÿ) , ๐œ€1 (๐‘Ÿ) , ๐’™๐‘˜ , ๐’—๐‘˜ )
(๐‘Ÿ)
=
(๐‘Ÿ)
๐œ‹๐‘– (๐(๐‘Ÿ) ,๐‘Ÿ1 ,๐‘Ÿ2 )๐‘“๐‘– (๐’™๐‘˜ ;๐’—๐‘˜ ,๐‘ฎ๐‘– ,๐œ€0 (๐‘Ÿ) ,๐œ€1 (๐‘Ÿ) )
(๐‘Ÿ)
(๐‘Ÿ)
(๐‘Ÿ)
(๐‘Ÿ) ,๐œ€ (๐‘Ÿ) )
โˆ‘15
1
๐‘ =1 ๐œ‹๐‘  (๐ ,๐‘Ÿ1 ,๐‘Ÿ2 )๐‘“๐‘  (๐’™๐‘˜ ;๐’—๐‘˜ ,๐‘ฎ๐‘— ,๐œ€0
(๐‘Ÿ)
=
(๐‘Ÿ) (๐‘Ÿ)
(๐‘Ÿ) (๐‘Ÿ)
(๐‘Ÿ) 1โˆ’๐œ€0 โˆ’๐œ€1
๐‘”๐‘–๐‘— )
2
๐œ‡๐‘” ๐‘” Pr(๐‘”๐‘–3 |๐‘”๐‘–1 ,๐‘”๐‘–2 ,๐‘Ÿ1 ,๐‘Ÿ2 ) โˆ3๐‘—=1 ๐‘“๐ต (๐‘ฅ๐‘˜๐‘— |๐‘ฃ๐‘˜๐‘— ,๐œ€0 +
๐‘–1 ๐‘–2
(๐‘Ÿ)
(๐‘Ÿ) (๐‘Ÿ)
(๐‘Ÿ)
(๐‘Ÿ)
.
(๐‘Ÿ) (๐‘Ÿ)
(๐‘Ÿ) 1โˆ’๐œ€0 โˆ’๐œ€1
๐‘”๐‘ ๐‘— )
2
3
โˆ‘15
๐‘ =1 ๐œ‡๐‘”๐‘ 1 ๐‘”๐‘ 2 Pr(๐‘”๐‘ 3 |๐‘”๐‘ 1 ,๐‘”๐‘ 2 ,๐‘Ÿ1 ,๐‘Ÿ2 ) โˆ๐‘—=1 ๐‘“๐ต (๐‘ฅ๐‘˜๐‘— |๐‘ฃ๐‘˜๐‘— ,๐œ€0 +
Therefore, ๐ธ(๐‘ง๐‘–๐‘˜ |๐‘Ÿ1 , ๐‘Ÿ2 , ๐(๐‘Ÿ) , ๐œ€0 (๐‘Ÿ) , ๐œ€1 (๐‘Ÿ) , ๐’™๐‘˜ , ๐’—๐‘˜ ) = ๐œ๐‘–๐‘˜ (๐‘Ÿ+1) .
M-step
Let ๐‘„ (๐‘Ÿ+1) be the conditional expectation of the complete-data log-likelihood
function:
(๐‘Ÿ)
(๐‘Ÿ)
(๐‘Ÿ)
(๐‘Ÿ)
๐‘„ (๐‘Ÿ+1) = ๐ธ[๐‘™๐ถ |๐(๐‘Ÿ) , ๐‘Ÿ1 , ๐‘Ÿ2 , ๐œ€0 , ๐œ€1 , ๐’™, ๐’—]
(๐‘Ÿ+1)
15
= โˆ‘๐‘
๐‘˜=1 โˆ‘๐‘ =1 ๐œ๐‘ ๐‘˜
1โˆ’๐œ€0 โˆ’๐œ€1
2
[log ๐œ‡๐‘”๐‘ 1 ๐‘”๐‘ 2 + log Pr(๐‘”๐‘ 3 |๐‘”๐‘ 1 , ๐‘”๐‘ 2 , ๐‘Ÿ1 , ๐‘Ÿ2 ) + โˆ‘3๐‘—=1 log ๐‘“๐ต (๐‘ฅ๐‘˜๐‘— |๐‘ฃ๐‘˜๐‘— , ๐œ€0 +
๐‘”๐‘ ๐‘— )]
M-step for mating frequencies
To derive the (๐‘Ÿ + 1)๐‘กโ„Ž step estimates of mating frequencies, differentiating ๐‘„ (๐‘Ÿ)
with respect to ๐œ‡๐‘–๐‘— . It is obtained that
๐œ•๐‘„ (๐‘Ÿ+1)
๐œ•๐œ‡๐‘–๐‘—
๐œ•
15
= โˆ‘๐‘
๐‘˜=1 โˆ‘๐‘ =1 ๐œ๐‘ ๐‘˜ ๐œ•๐œ‡ [log ๐œ‡๐‘”๐‘ 1 ๐‘”๐‘ 2 ] =
โˆ‘15
๐‘ =1 ๐œ๐‘ + ๐ผ(๐‘”๐‘ 1 =๐‘–,๐‘”๐‘ 2 =๐‘—)
๐œ‡๐‘–๐‘—
๐‘–๐‘—
Since โˆ‘ ๐œ‡๐‘–๐‘— = 1, using the Lagrange multiplier ๐œ†,
๐œ‡๐‘–๐‘— =
โˆ‘15
๐‘ =1 ๐œ๐‘ + ๐ผ(๐‘”๐‘ 1 =๐‘–,๐‘”๐‘ 2 =๐‘—)
๐œ†
๐œ•๐‘„ (๐‘Ÿ)
๐œ•๐œ‡๐‘–๐‘—
.
โˆ’ ๐œ† = 0.
. From โˆ‘ ๐œ‡๐‘–๐‘— = 1, the Lagrange multiplier is equal to the
total number of trios ๐œ† = ๐‘ and the (๐‘Ÿ + 1)๐‘กโ„Ž step estimates of the mating
frequencies are
(๐‘Ÿ+1)
๐œ‡๐‘–๐‘—
(๐‘Ÿ+1)
โˆ‘15
๐ผ(๐‘”๐‘ 1 = ๐‘–, ๐‘”๐‘ 2 = ๐‘—)
๐‘ =1 ๐œ๐‘ +
=
.
๐‘
When mating symmetric ๐œ‡๐‘–๐‘— = ๐œ‡๐‘—๐‘– ,
๐œ•๐‘„ (๐‘Ÿ+1)
๐œ•๐œ‡๐‘–๐‘—
=
โˆ‘15
๐‘ =1 ๐œ๐‘ + {๐ผ(๐‘”๐‘ 1 =๐‘–,๐‘”๐‘ 2 =๐‘—)+๐ผ(๐‘”๐‘ 1 =๐‘—,๐‘”๐‘ 2 =๐‘–)}
๐œ‡๐‘–๐‘—
for ๐‘– โ‰  ๐‘— and
๐œ•๐‘„ (๐‘Ÿ+1)
๐œ•๐œ‡๐‘–๐‘–
=
โˆ‘15
๐‘ =1 ๐œ๐‘ + ๐ผ(๐‘”๐‘ 1 =๐‘–,๐‘”๐‘ 2 =๐‘–)
๐œ‡๐‘–๐‘–
.
Since โˆ‘ ๐œ‡๐‘–๐‘— = โˆ‘3๐‘–=1 ๐œ‡๐‘–๐‘– + 2 โˆ‘๐‘–<๐‘— ๐œ‡๐‘–๐‘— = 1,
๐œ•๐‘„ (๐‘Ÿ+1)
๐œ•๐œ‡๐‘–๐‘–
๐œ‡๐‘–๐‘– =
= ๐œ† and
๐œ•๐‘„ (๐‘Ÿ)
๐œ•๐œ‡๐‘–๐‘—
= 2๐œ†. This implies that
โˆ‘15
๐‘ =1 ๐œ๐‘ + ๐ผ(๐‘”๐‘ 1 =๐‘–,๐‘”๐‘ 2 =๐‘–)
๐œ†
=
โˆ‘15
๐‘ =1 ๐œ๐‘ + {๐ผ(๐‘”๐‘ 1 =๐‘–,๐‘”๐‘ 2 =๐‘–)+๐ผ(๐‘”๐‘ 1 =๐‘–,๐‘”๐‘ 2 =๐‘–)}
โˆ‘15
๐‘ =1 ๐œ๐‘ + {๐ผ(๐‘”๐‘ 1 =๐‘–,๐‘”๐‘ 2 =๐‘—)+๐ผ(๐‘”๐‘ 1 =๐‘—,๐‘”๐‘ 2 =๐‘–)}
2๐œ†
2๐œ†
for ๐‘– โ‰  ๐‘—.
and ๐œ‡๐‘–๐‘— =
Similarly, ๐œ† = ๐‘ and
(๐‘Ÿ+1)
โˆ‘15
{๐ผ(๐‘”๐‘ 1 = ๐‘–, ๐‘”๐‘ 2 = ๐‘—) + ๐ผ(๐‘”๐‘ 1 = ๐‘—, ๐‘”๐‘ 2 = ๐‘–)}
๐‘ =1 ๐œ๐‘ +
=
.
2๐‘
(๐‘Ÿ+1)
๐œ‡๐‘–๐‘—
M-step for error parameters
For symmetric error model ๐œ€ = ๐œ€0 = ๐œ€1 ,
๐œ•๐‘„ (๐‘Ÿ+1)
๐œ•๐œ€
๐œ•
๐œ•
(๐‘Ÿ+1)
15
= ๐œ•๐œ€ โˆ‘๐‘
๐‘˜=1 โˆ‘๐‘ =1 ๐œ๐‘ ๐‘˜
(๐‘Ÿ+1)
=
1
(๐‘Ÿ+1)
[โˆ‘3๐‘—=1 log ๐‘“๐ต (๐‘ฅ๐‘˜๐‘— |๐‘ฃ๐‘˜๐‘— , ๐œ€ + (2 โˆ’ ๐œ€)๐‘”๐‘ ๐‘— )]
15
= ๐œ•๐œ€ โˆ‘๐‘
๐‘˜=1 โˆ‘๐‘ =1 ๐œ๐‘ ๐‘˜
15
โˆ‘๐‘
๐‘˜=1 โˆ‘๐‘ =1 ๐œ๐‘ ๐‘˜
[โˆ‘3๐‘—=1 ๐‘ฅ๐‘˜๐‘— ๐ผ(๐‘”๐‘ ๐‘— = 0) log ๐œ€ + (๐‘ฃ๐‘˜๐‘— โˆ’ ๐‘ฅ๐‘˜๐‘— )๐ผ(๐‘”๐‘ ๐‘— = 2) log(1 โˆ’ ๐œ€)]
[โˆ‘3๐‘—=1 ๐‘ฅ๐‘˜๐‘— ๐ผ(๐‘”๐‘ ๐‘— =0)]
๐œ€
(๐‘Ÿ+1)
โˆ’
15
โˆ‘๐‘
๐‘˜=1 โˆ‘๐‘ =1 ๐œ๐‘ ๐‘˜
[โˆ‘3๐‘—=1(๐‘ฃ๐‘˜๐‘— โˆ’๐‘ฅ๐‘˜๐‘— )๐ผ(๐‘”๐‘ ๐‘— =2)]
1โˆ’๐œ€
=0
Therefore,
๐œ€
(๐‘Ÿ+1)
=
15
3
(๐‘Ÿ+1)
โˆ‘๐‘
{๐‘ฅ๐‘˜๐‘— ๐ผ(๐‘”๐‘–๐‘— = 0) + (๐‘ฃ๐‘˜๐‘— โˆ’ ๐‘ฅ๐‘˜๐‘— )๐ผ(๐‘”๐‘–๐‘— = 2)}
๐‘˜=1 โˆ‘๐‘–=1 โˆ‘๐‘—=1 ๐œ๐‘–๐‘˜
15
3
(๐‘Ÿ+1) ๐‘ฃ {๐ผ(๐‘” = 0) + ๐ผ(๐‘” = 2)}
โˆ‘๐‘
๐‘˜๐‘—
๐‘–๐‘—
๐‘–๐‘—
๐‘˜=1 โˆ‘๐‘–=1 โˆ‘๐‘—=1 ๐œ๐‘–๐‘˜
.
For asymmetric error model, it is necessary to introduce another missing random
vector ๐’˜ = (๐‘ค1 , ๐‘ค2 ) for genotype 1. For genotype 1, define ๐‘ค1 to be the number of
correct reads of allele A and ๐‘ค2 to be the number of correct reads of allele B. The
total number of reads ๐‘ฃ๐‘˜ = ๐‘ค2๐‘˜ + (๐‘ฅ๐‘˜ โˆ’ ๐‘ค1๐‘˜ ) + (๐‘ฃ๐‘˜ โˆ’ ๐‘ฅ๐‘˜ โˆ’ ๐‘ค2๐‘˜ ) + ๐‘ค1๐‘˜ . The random
vector (๐‘ค2๐‘˜ , ๐‘ฅ๐‘˜ โˆ’ ๐‘ค1๐‘˜ , ๐‘ฃ๐‘˜ โˆ’ ๐‘ฅ๐‘˜ โˆ’ ๐‘ค2๐‘˜ , ๐‘ค1๐‘˜ ) follows a multinomial distribution with
the probability vector (
1โˆ’๐œ€0 ๐œ€0 ๐œ€1 1โˆ’๐œ€1
,
2
2
,
2
,
2
) unconditionally. The complete-data log-
likelihood for asymmetric error parameters can be written as
๐‘™๐ถโˆ— (๐, ๐‘Ÿ1 , ๐‘Ÿ2 , ๐œ€0 , ๐œ€1 |๐‘ฟ, ๐‘ฝ)
15
3
๐‘ฅ๐‘˜๐‘—
(1 โˆ’ ๐œ€0 )๐‘ฃ๐‘˜๐‘— โˆ’๐‘ฅ๐‘˜๐‘— } +
= โˆ‘๐‘
๐‘˜=1 ๐‘ง๐‘–๐‘˜ {โˆ‘๐‘–=1 [log ๐œ‹๐‘– (๐, ๐‘Ÿ1 , ๐‘Ÿ2 ) + โˆ‘๐‘—=1 (๐ผ(๐‘”๐‘–๐‘— = 0) log{(๐œ€0 )
1โˆ’๐œ€0 ๐‘ค2๐‘˜๐‘—
๐ผ(๐‘”๐‘–๐‘— = 1) log {(
2
)
๐œ€
( 20 )
๐‘ฅ๐‘˜๐‘— โˆ’๐‘ค1๐‘˜๐‘—
๐œ€
๐‘ฃ๐‘˜๐‘— โˆ’๐‘ฅ๐‘˜๐‘— โˆ’๐‘ค2๐‘˜๐‘—
( 21 )
(
1โˆ’๐œ€1 ๐‘ค1๐‘˜๐‘—
2
)
2) log{(1 โˆ’ ๐œ€1 )๐‘ฅ๐‘˜๐‘— (๐œ€1 )๐‘ฃ๐‘˜๐‘—โˆ’๐‘ฅ๐‘˜๐‘— })]}.
} + ๐ผ(๐‘”๐‘–๐‘— =
(A2)
The conditional expectation of ๐‘ค1๐‘˜๐‘— given the ๐‘Ÿ ๐‘กโ„Ž estimates of the parameters and
(๐‘Ÿ)
(๐‘Ÿ)
(๐‘Ÿ)
(๐‘Ÿ)
the data is ๐ธ(๐‘ค1๐‘˜๐‘— |๐(๐‘Ÿ) , ๐‘Ÿ1 , ๐‘Ÿ2 , ๐œ€0 , ๐œ€1 , ๐’™, ๐’—) = ๐‘ฅ๐‘˜๐‘—
(๐‘Ÿ)
1โˆ’๐œ€1
(๐‘Ÿ)
(๐‘Ÿ)
๐œ€0 +1โˆ’๐œ€1
and that of ๐‘ค2๐‘˜๐‘— is
(๐‘Ÿ)
(๐‘Ÿ)
(๐‘Ÿ)
(๐‘Ÿ)
1โˆ’๐œ€0
(๐‘Ÿ)
๐ธ(๐‘ค2๐‘˜๐‘— |๐(๐‘Ÿ) , ๐‘Ÿ1 , ๐‘Ÿ2 , ๐œ€0 , ๐œ€1 , ๐’™, ๐’—) = (๐‘ฃ๐‘˜๐‘— โˆ’ ๐‘ฅ๐‘˜๐‘— )
(๐‘Ÿ)
(๐‘Ÿ)
๐œ€1 +1โˆ’๐œ€0
. The conditional
expectation of Equation (A2) can be given by
(๐‘Ÿ+1)
๐‘„ โˆ—(๐‘Ÿ+1) = โˆ‘๐‘
๐‘˜=1 ๐œ๐‘–๐‘˜
3
{โˆ‘15
๐‘–=1 [log ๐œ‹๐‘– (๐, ๐‘Ÿ1 , ๐‘Ÿ2 ) + โˆ‘๐‘—=1 (๐ผ(๐‘”๐‘–๐‘— = 0){๐‘ฅ๐‘˜๐‘— log ๐œ€0 +
(๐‘Ÿ)
(๐‘ฃ๐‘˜๐‘— โˆ’ ๐‘ฅ๐‘˜๐‘— ) log(1 โˆ’ ๐œ€0 )} + ๐ผ(๐‘”๐‘–๐‘— = 1) {(๐‘ฃ๐‘˜๐‘— โˆ’ ๐‘ฅ๐‘˜๐‘— )
(๐‘Ÿ)
๐‘ฅ๐‘˜๐‘—
๐œ€0
(๐‘Ÿ)
1โˆ’๐œ€0
(๐‘Ÿ)
(๐‘Ÿ)
๐œ€1 +1โˆ’๐œ€0
(๐‘Ÿ)
(๐‘Ÿ)
(๐‘Ÿ) log ๐œ€0 + (๐‘ฃ๐‘˜๐‘— โˆ’ ๐‘ฅ๐‘˜๐‘— )
๐œ€0 +1โˆ’๐œ€1
๐œ€1
(๐‘Ÿ)
log(1 โˆ’ ๐œ€0 ) +
(๐‘Ÿ) log ๐œ€1 + ๐‘ฅ๐‘˜๐‘—
๐œ€1 +1โˆ’๐œ€0
1โˆ’๐œ€1
(๐‘Ÿ)
(๐‘Ÿ)
๐œ€0 +1โˆ’๐œ€1
log(1 โˆ’ ๐œ€1 )} +
๐ผ(๐‘”๐‘–๐‘— = 2){๐‘ฅ๐‘˜๐‘— log(1 โˆ’ ๐œ€1 ) + (๐‘ฃ๐‘˜๐‘— โˆ’ ๐‘ฅ๐‘˜๐‘— ) log ๐œ€1 })]}.
By differentiating ๐‘„ โˆ—(๐‘Ÿ+1) with respect to the error parameters,
๐œ•๐‘„ โˆ—(๐‘Ÿ+1)
๐œ•๐œ€1
๐œ•๐‘„ โˆ—(๐‘Ÿ+1)
๐œ•๐œ€0
= 0 and
= 0 yield
๐œ€0 (๐‘Ÿ+1)
๐œ€0 (๐‘Ÿ)
๐ผ(๐‘”๐‘–๐‘— = 1)}
๐œ€0 (๐‘Ÿ) + 1 โˆ’ ๐œ€1 (๐‘Ÿ)
=
,
๐œ€0 (๐‘Ÿ)
๐‘ฅ๐‘˜๐‘— {๐ผ(๐‘”๐‘–๐‘— = 0) + (๐‘Ÿ)
๐ผ(๐‘”๐‘–๐‘— = 1)}
๐œ€0 + 1 โˆ’ ๐œ€1 (๐‘Ÿ)
15 โˆ‘3
(๐‘Ÿ+1)
โˆ‘๐‘
โˆ‘
๐œ
๐‘˜=1 ๐‘–=1 ๐‘—=1 ๐‘–๐‘˜
1 โˆ’ ๐œ€0 (๐‘Ÿ)
+(๐‘ฃ๐‘˜๐‘— โˆ’ ๐‘ฅ๐‘˜๐‘— ) {๐ผ(๐‘”๐‘–๐‘— = 0) + ( (๐‘Ÿ)
)๐ผ(๐‘”๐‘–๐‘— = 1)}
๐œ€1 + 1 โˆ’ ๐œ€0 (๐‘Ÿ)
[
]
15
3
(๐‘Ÿ+1)
โˆ‘๐‘
๐‘ฅ๐‘˜๐‘— {๐ผ(๐‘”๐‘–๐‘— = 0) +
๐‘˜=1 โˆ‘๐‘–=1 โˆ‘๐‘—=1 ๐œ๐‘–๐‘˜
๐œ€1 (๐‘Ÿ+1)
๐œ€1 (๐‘Ÿ)
๐ผ(๐‘”๐‘–๐‘— = 1)}
๐œ€1 (๐‘Ÿ) + 1 โˆ’ ๐œ€0 (๐‘Ÿ)
=
.
๐œ€ (๐‘Ÿ)
(๐‘ฃ๐‘˜๐‘— โˆ’ ๐‘ฅ๐‘˜๐‘— ) {๐ผ(๐‘”๐‘–๐‘— = 2) + (๐‘Ÿ) 1
๐ผ(๐‘”
=
1)}
๐‘–๐‘—
๐œ€1 + 1 โˆ’ ๐œ€0 (๐‘Ÿ)
15 โˆ‘3
(๐‘Ÿ+1)
โˆ‘๐‘
โˆ‘
๐œ
๐‘˜=1 ๐‘–=1 ๐‘—=1 ๐‘–๐‘˜
1 โˆ’ ๐œ€1 (๐‘Ÿ)
+๐‘ฅ๐‘˜๐‘— {๐ผ(๐‘”๐‘–๐‘— = 2) + ( (๐‘Ÿ)
)๐ผ(๐‘”๐‘–๐‘— = 1)}
๐œ€0 + 1 โˆ’ ๐œ€1 (๐‘Ÿ)
[
]
15
3
(๐‘Ÿ+1)
โˆ‘๐‘
(๐‘ฃ๐‘˜๐‘— โˆ’ ๐‘ฅ๐‘˜๐‘— ) {๐ผ(๐‘”๐‘–๐‘— = 2) +
๐‘˜=1 โˆ‘๐‘–=1 โˆ‘๐‘—=1 ๐œ๐‘–๐‘˜
M-step for transmission parameters
๐‘Ÿ
For multiplicative model ๐‘Ÿ1 = ๐‘Ÿ and ๐‘Ÿ2 = ๐‘Ÿ 2 , let ๐‘ก = 1+๐‘Ÿ.
log Pr(๐‘”๐‘–3 |๐‘”๐‘–1 , ๐‘”๐‘–2 , ๐‘Ÿ1 , ๐‘Ÿ2 ) =
{
๐ผ(๐‘ฎ๐‘– = (0,1,1)) + ๐ผ(๐‘ฎ๐‘– = (1,0,1)) + ๐ผ(๐‘ฎ๐‘– = (1,1,1))
} log ๐‘ก +
+2๐ผ(๐‘ฎ๐‘– = (1,1,2)) + ๐ผ(๐‘ฎ๐‘– = (1,2,2)) + ๐ผ(๐‘ฎ๐‘– = (2,1,2))
{
๐ผ(๐‘ฎ๐‘– = (0,1,0)) + ๐ผ(๐‘ฎ๐‘– = (1,0,0)) + 2๐ผ(๐‘ฎ๐‘– = (1,1,0))
} log(1 โˆ’ ๐‘ก) + ๐ถ๐‘œ๐‘›๐‘ ๐‘ก.
+๐ผ(๐‘ฎ๐‘– = (1,1,1)) + ๐ผ(๐‘ฎ๐‘– = (1,2,1)) + ๐ผ(๐‘ฎ๐‘– = (2,1,1))
๐œ•๐‘„ (๐‘Ÿ+1)
๐œ•๐‘ก
15
= โˆ‘ ๐œ๐‘–+ (๐‘Ÿ+1) {
๐‘–=1
15
โˆ’ โˆ‘ ๐œ๐‘–+ (๐‘Ÿ+1) {
๐‘–=1
๐ผ(๐‘ฎ๐‘– = (0,1,1)) + ๐ผ(๐‘ฎ๐‘– = (1,0,1)) + ๐ผ(๐‘ฎ๐‘– = (1,1,1)) 1
}
+2๐ผ(๐‘ฎ๐‘– = (1,1,2)) + ๐ผ(๐‘ฎ๐‘– = (1,2,2)) + ๐ผ(๐‘ฎ๐‘– = (2,1,2)) ๐‘ก
๐ผ(๐‘ฎ๐‘– = (0,1,0)) + ๐ผ(๐‘ฎ๐‘– = (1,0,0)) + 2๐ผ(๐‘ฎ๐‘– = (1,1,0))
1
}
= 0.
+๐ผ(๐‘ฎ๐‘– = (1,1,1)) + ๐ผ(๐‘ฎ๐‘– = (1,2,1)) + ๐ผ(๐‘ฎ๐‘– = (2,1,1)) 1 โˆ’ ๐‘ก
๐ผ(๐‘ฎ๐‘– = (0,1,1)) + ๐ผ(๐‘ฎ๐‘– = (1,0,1)) + ๐ผ(๐‘ฎ๐‘– = (1,1,1))
}
+2๐ผ(๐‘ฎ๐‘– = (1,1,2)) + ๐ผ(๐‘ฎ๐‘– = (1,2,2)) + ๐ผ(๐‘ฎ๐‘– = (2,1,2))
.
๐ผ(๐‘ฎ๐‘– = (0,1,1)) + ๐ผ(๐‘ฎ๐‘– = (1,0,1)) + ๐ผ(๐‘ฎ๐‘– = (1,1,1))
+2๐ผ(๐‘ฎ๐‘– = (1,1,2)) + ๐ผ(๐‘ฎ๐‘– = (1,2,2)) + ๐ผ(๐‘ฎ๐‘– = (2,1,2))
(๐‘Ÿ+1)
โˆ‘15
{
๐‘–=1 ๐œ๐‘–+
๐‘ก (๐‘Ÿ+1) =
(๐‘Ÿ+1)
โˆ‘15
๐‘–=1 ๐œ๐‘–+
+๐ผ(๐‘ฎ๐‘– = (0,1,0)) + ๐ผ(๐‘ฎ๐‘– = (1,0,0)) + 2๐ผ(๐‘ฎ๐‘– = (1,1,0))
{ +๐ผ(๐‘ฎ๐‘– = (1,1,1)) + ๐ผ(๐‘ฎ๐‘– = (1,2,1)) + ๐ผ(๐‘ฎ๐‘– = (2,1,1)) }
Therefore,
(๐‘Ÿ+1)
๐‘Ÿ1
๐ผ(๐‘ฎ๐‘– = (0,1,1)) + ๐ผ(๐‘ฎ๐‘– = (1,0,1)) + ๐ผ(๐‘ฎ๐‘– = (1,1,1))
}
+2๐ผ(๐‘ฎ๐‘– = (1,1,2)) + ๐ผ(๐‘ฎ๐‘– = (1,2,2)) + ๐ผ(๐‘ฎ๐‘– = (2,1,2))
=
,
๐ผ(๐‘ฎ๐‘– = (0,1,0)) + ๐ผ(๐‘ฎ๐‘– = (1,0,0)) + 2๐ผ(๐‘ฎ๐‘– = (1,1,0))
15
(๐‘Ÿ+1)
โˆ‘๐‘–=1 ๐œ๐‘–+
{
}
+๐ผ(๐‘ฎ๐‘– = (1,1,1)) + ๐ผ(๐‘ฎ๐‘– = (1,2,1)) + ๐ผ(๐‘ฎ๐‘– = (2,1,1))
(๐‘Ÿ+1)
โˆ‘15
{
๐‘–=1 ๐œ๐‘–+
(๐‘Ÿ+1)
๐‘Ÿ2
(๐‘Ÿ+1) 2
= (๐‘Ÿ1
) .
๐‘Ÿ
For genetic model-free TDT2-NGS, let ๐›ผ = log ๐‘Ÿ1 and ๐›ฝ = log ๐‘Ÿ2.
1
log Pr(๐‘”๐‘–3 |๐‘”๐‘–1 , ๐‘”๐‘–2 , ๐‘Ÿ1 , ๐‘Ÿ2 ) = {๐ผ(๐‘ฎ๐‘– = (0,1,1)) + ๐ผ(๐‘ฎ๐‘– = (1,0,1)) + ๐ผ(๐‘ฎ๐‘– = (1,1,1)) +
๐ผ(๐‘ฎ๐‘– = (1,1,2))}๐›ผ โˆ’ {๐ผ(๐‘ฎ๐‘– = (0,1,1)) + ๐ผ(๐‘ฎ๐‘– = (1,0,1)) + ๐ผ(๐‘ฎ๐‘– = (0,1,0)) +
๐ผ(๐‘ฎ๐‘– = (1,0,0))} log(1 + ๐‘’ ๐›ผ ) + {๐ผ(๐‘ฎ๐‘– = (1,2,2)) + ๐ผ(๐‘ฎ๐‘– = (2,1,2)) + ๐ผ(๐‘ฎ๐‘– =
(1,1,2))}๐›ฝ โˆ’ {๐ผ(๐‘ฎ๐‘– = (1,2,2)) + ๐ผ(๐‘ฎ๐‘– = (2,1,2)) + ๐ผ(๐‘ฎ๐‘– = (1,2,1)) + ๐ผ(๐‘ฎ๐‘– =
(2,1,1))} log(1 + ๐‘’ ๐›ฝ ) โˆ’ {๐ผ(๐‘ฎ๐‘– = (1,1,0)) + ๐ผ(๐‘ฎ๐‘– = (1,1,1)) + ๐ผ(๐‘ฎ๐‘– =
(1,1,2))} log(1 + 2๐‘’ ๐›ผ + ๐‘’ ๐›ผ+๐›ฝ )
There is no closed iteration formula in this case. To find the (๐‘Ÿ + 1)๐‘กโ„Ž step estimates
of the transmission parameters, the Fisherโ€™s scoring method is applied. For
(๐‘Ÿ+1)
simplicity, let ๐‘Ž = (โˆ‘15
{๐ผ(๐‘ฎ๐‘– = (0,1,0)) + ๐ผ(๐‘ฎ๐‘– =
๐‘–=1 ๐œ๐‘–+
๐‘‡
(๐‘Ÿ+1)
(1,0,0))} , โˆ‘15
{๐ผ(๐‘ฎ๐‘– = (0,1,1)) + ๐ผ(๐‘ฎ๐‘– = (1,0,1))}) , ๐‘Ž+ = ๐‘Ž1 + ๐‘Ž2,
๐‘–=1 ๐œ๐‘–+
(๐‘Ÿ+1)
(๐‘Ÿ+1)
(๐‘Ÿ+1)
๐‘ = (โˆ‘15
๐ผ(๐‘ฎ๐‘– = (1,1,0)) , โˆ‘15
๐ผ(๐‘ฎ๐‘– = (1,1,1)) , โˆ‘15
๐ผ(๐‘ฎ๐‘– =
๐‘–=1 ๐œ๐‘–+
๐‘–=1 ๐œ๐‘–+
๐‘–=1 ๐œ๐‘–+
๐‘‡
(1,1,2))) , ๐‘+ = ๐‘1 + ๐‘2 + ๐‘3,
(๐‘Ÿ+1)
(๐‘Ÿ+1)
๐‘ = (โˆ‘15
{๐ผ(๐‘ฎ๐‘– = (1,2,1)) + ๐ผ(๐‘ฎ๐‘– = (2,1,1))} , โˆ‘15
{๐ผ(๐‘ฎ๐‘– =
๐‘–=1 ๐œ๐‘–+
๐‘–=1 ๐œ๐‘–+
(1,2,2)) + ๐ผ(๐‘ฎ๐‘– = (2,1,2))}) , ๐‘+ = ๐‘1 + ๐‘2 .
The first and second derivatives of ๐‘„ (๐‘Ÿ+1) in ๐›ผ and ๐›ฝ can be obtained as
๐‘‡
๐œ•๐‘„ (๐‘Ÿ+1) ๐œ•๐‘„ (๐‘Ÿ+1)
(
,
)
๐œ•๐‘Ÿ1
๐œ•๐‘Ÿ2
๐‘’๐›ผ
2๐‘’ ๐›ผ + ๐‘’ ๐›ผ+๐›ฝ
๐‘’๐›ฝ
= (๐‘Ž2 โˆ’ ๐‘Ž+
+ ๐‘2 + ๐‘3 โˆ’ ๐‘+
, ๐‘ โˆ’ ๐‘+
1 + ๐‘’๐›ผ
1 + 2๐‘’ ๐›ผ + ๐‘’ ๐›ผ+๐›ฝ 1
1 + ๐‘’๐›ฝ
๐‘‡
๐‘’ ๐›ผ+๐›ฝ
+ ๐‘3 โˆ’ ๐‘+
)
1 + 2๐‘’ ๐›ผ + ๐‘’ ๐›ผ+๐›ฝ
๐œ•2 ๐‘„ (๐‘Ÿ+1)
๐œ•2 ๐‘„(๐‘Ÿ+1)
๐œ•๐›ผ2
๐œ•๐›ผ๐œ•๐›ฝ
๐œ•2 ๐‘„ (๐‘Ÿ+1)
๐œ•2 ๐‘„(๐‘Ÿ+1)
๐œ•๐›ผ๐œ•๐›ฝ
๐œ•๐›ฝ 2
(
)=
๐‘’๐›ผ
๐‘Ž+ (1+๐‘’ ๐›ผ)2 + ๐‘+
๐‘+
(
(๐‘Ÿ),๐‘ 
Let ๐‘Ÿ1
2๐‘’ ๐›ผ +๐‘’ ๐›ผ+๐›ฝ
๐‘+
2
(1+2๐‘’ ๐›ผ +๐‘’ ๐›ผ+๐›ฝ )
๐‘’ ๐›ผ+๐›ฝ
๐‘+
2
(1+2๐‘’ ๐›ผ +๐‘’ ๐›ผ+๐›ฝ )
(๐‘Ÿ),๐‘ 
= ๐‘’ ๐›ผ and ๐‘Ÿ2
scoring algorithm.
(๐‘Ÿ),๐‘  ๐›ฝ
= ๐‘Ÿ1
๐‘’ ๐›ผ+๐›ฝ
(1+2๐‘’ ๐›ผ +๐‘’ ๐›ผ+๐›ฝ )
๐‘’ ๐›ผ ๐‘’ ๐›ผ+๐›ฝ
(๐‘Ÿ),๐‘ 
(๐‘Ÿ),๐‘  2
(๐‘Ÿ1 +๐‘Ÿ2 )
+ ๐‘+
2
,
(1+2๐‘’ ๐›ผ )๐‘’ ๐›ผ+๐›ฝ
2
(1+2๐‘’ ๐›ผ +๐‘’ ๐›ผ+๐›ฝ )
)
๐‘’ for the ๐‘  ๐‘กโ„Ž step iteration in the following Fisherโ€™s
๐›ฟ (๐‘Ÿ),๐‘  = (๐‘Ž2 โˆ’ ๐‘Ž+
๐‘Ÿ2
(๐‘Ÿ),๐‘ 
1+2๐‘Ÿ1
(๐‘Ÿ),๐‘ 
(๐‘Ÿ),๐‘  + ๐‘2 + ๐‘3 โˆ’ ๐‘+
1+๐‘Ÿ1
(๐‘Ÿ),๐‘ 
+๐‘Ÿ2
(๐‘Ÿ),๐‘ 
1+2๐‘Ÿ1
(๐‘Ÿ),๐‘ 
+๐‘Ÿ2
(๐‘Ÿ),๐‘ 
(๐‘Ÿ),๐‘  , ๐‘1 โˆ’ ๐‘+
+๐‘Ÿ2
๐‘Ÿ2
(๐‘Ÿ),๐‘ 
๐‘Ÿ1
(๐‘Ÿ),๐‘ 
+๐‘Ÿ2
+ ๐‘3 โˆ’
)
(๐‘Ÿ),๐‘ 
๐‘Ž+
๐ป (๐‘Ÿ),๐‘  =
2๐‘Ÿ1
๐‘‡
(๐‘Ÿ),๐‘ 
๐‘+
(๐‘Ÿ),๐‘ 
๐‘Ÿ1
(๐‘Ÿ),๐‘ 
๐‘Ÿ1
(๐‘Ÿ),๐‘  2
(1+๐‘Ÿ1 )
+ ๐‘+
2๐‘Ÿ1
(๐‘Ÿ),๐‘ 
(๐‘Ÿ),๐‘ 
+๐‘Ÿ2
๐‘+
(๐‘Ÿ),๐‘ 
(๐‘Ÿ),๐‘  2
(1+2๐‘Ÿ1 +๐‘Ÿ2 )
(๐‘Ÿ),๐‘ 
(
๐‘+
๐‘Ÿ2
(๐‘Ÿ),๐‘ 
(1+2๐‘Ÿ1
๐‘+
(๐‘Ÿ),๐‘  2
+๐‘Ÿ2
)
๐‘Ÿ2
(๐‘Ÿ),๐‘ 
(1+2๐‘Ÿ1
(๐‘Ÿ),๐‘  (๐‘Ÿ),๐‘ 
๐‘Ÿ2
(๐‘Ÿ),๐‘ 
(๐‘Ÿ),๐‘  2
(๐‘Ÿ1 +๐‘Ÿ2 )
๐‘Ÿ1
(๐‘Ÿ),๐‘  2
+๐‘Ÿ2
)
(๐‘Ÿ),๐‘ 
+ ๐‘+
(1+2๐‘Ÿ1
(๐‘Ÿ),๐‘ 
(1+2๐‘Ÿ1
,
(๐‘Ÿ),๐‘ 
)๐‘Ÿ2
(๐‘Ÿ),๐‘  2
+๐‘Ÿ2
)
)
โˆ’1
๐›ฟ (๐‘Ÿ),๐‘ +1 = (๐ป (๐‘Ÿ),๐‘  ) ๐›ฟ (๐‘Ÿ),๐‘ 
To recover the transmission parameters,
(๐‘Ÿ),๐‘ +1
๐‘Ÿ๐‘–
= exp(๐›ฟ (๐‘Ÿ),๐‘ +1 ), for ๐‘– = 1,2.
(๐‘Ÿ),๐‘  โˆ— +1
Iterate these until |๐‘Ÿ1
(๐‘Ÿ),๐‘ โˆ—
โˆ’ ๐‘Ÿ1
(๐‘Ÿ),๐‘  โˆ— +1
| + |๐‘Ÿ2
(๐‘Ÿ),๐‘ โˆ—
โˆ’ ๐‘Ÿ2
| is less than the specified
tolerance for some ๐‘  โˆ— or ๐‘  โˆ— reaches the pre-specified maximum number of
iterations.
(๐‘Ÿ+1)
๐‘Ÿ1
(๐‘Ÿ),๐‘ โˆ— +1
= ๐‘Ÿ1
(๐‘Ÿ+1)
and ๐‘Ÿ2
(๐‘Ÿ),๐‘ โˆ— +1
= ๐‘Ÿ2
.