Maximum-likelihood estimation of
admixture proportions from
genetic data
Jinliang Wang
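[Figure: the admixture model. An ancestral population P0 with allele frequencies w splits into parental populations P1 and P2 (sizes n1 and n2), which drift for ξ generations to frequencies x1 and x2. An admixture event then forms the hybrid population Ph, taking proportion p1 from P1 and p2 from P2, with frequency xh. P1, Ph and P2 (sizes N1, Nh and N2) drift for a further ψ generations to frequencies y1, yh and y2. Samples S1, Sh and S2 yield allele counts c1, ch and c2.]
Parameters: Ω = {p1, t1, t2, T1, Th, T2}
Scaled drift times before admixture: t1 = ξ/2n1, t2 = ξ/2n2
Scaled drift times since admixture: T1 = ψ/2N1, Th = ψ/2Nh, T2 = ψ/2N2
Data: C = (c1, c2, ch)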
Likelihood function
\[
\begin{aligned}
\Pr(C) = \int\; & \Pr(c_1, c_2, c_h \mid y_1, y_2, y_h) && \text{(random sampling)} \\
\times\; & \Pr(y_1, y_2, y_h \mid x_1, x_2, p_1, T_1, T_2, T_h) && \text{(admixture and genetic drift)} \\
\times\; & \Pr(x_1, x_2 \mid t_1, t_2, w) && \text{(genetic drift)} \\
\times\; & \Pr(w)\; \mathrm{d}(w, x, y) && \text{(prior on } w\text{)}
\end{aligned}
\]
The integral is over the unobserved allele frequencies w, x and y.
Allele frequencies in P0
[Figure: ancestral population P0 with allele frequencies w and prior Pr(w).]
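Genetic drift after population split
[Figure: P0 (frequencies w) splits into P1 and P2, of sizes n1 and n2, which drift for ξ generations to frequencies x1 and x2.]
Pr(x1, x2 | t1, t2, w), with t1 = ξ/2n1 and t2 = ξ/2n2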
Genetic drift in independent populations
\[
\Pr(x_1, x_2 \mid t_1, t_2, w) = \prod_{i=1}^{2} \Pr(x_i \mid t_i, w)
\]
Genetic drift: the diffusion approximation
\[
\Pr(x_i \mid t_i, w) = \sum_{a=1}^{\infty} w(1-w)\, a(a+1)(2a+1)\, H(1-a, a+2, 2, w)\, H(1-a, a+2, 2, x_i)\, \exp\!\left[-\frac{a(a+1)\,\xi}{4 n_i}\right]
\]
where ti = ξ/2ni and H is the hypergeometric function.
Crow and Kimura (1970) p. 382
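The diffusion series can be evaluated numerically by truncating the sum. The following is a minimal sketch, not the implementation used in the talk: the function names `hyp_poly` and `drift_density` and the default of 20 terms are assumptions made for illustration. It relies on the fact that H(1 - a, a + 2, 2, x) is a finite polynomial when a is a positive integer, and it writes the exponent as a(a+1)t/2 using t = ξ/2n. The truncation is only reasonable for moderate to large t; small t needs more terms and more careful numerics.

```python
import math

def hyp_poly(a: int, x: float) -> float:
    """Terminating hypergeometric series H(1 - a, a + 2, 2, x) for integer a >= 1."""
    total, term = 1.0, 1.0
    for k in range(a - 1):
        # Ratio of consecutive series terms: (1-a+k)(a+2+k) / ((2+k)(k+1)) * x
        term *= (1 - a + k) * (a + 2 + k) / ((2 + k) * (k + 1)) * x
        total += term
    return total

def drift_density(x: float, w: float, t: float, n_terms: int = 20) -> float:
    """Truncated series for Pr(x | t, w), where t = xi / (2 n) is the scaled drift time.

    n_terms is a hypothetical truncation point chosen for illustration.
    """
    s = 0.0
    for a in range(1, n_terms + 1):
        s += (a * (a + 1) * (2 * a + 1)
              * hyp_poly(a, w) * hyp_poly(a, x)
              * math.exp(-a * (a + 1) * t / 2.0))
    return w * (1.0 - w) * s

# Density of an intermediate frequency after a long period of drift
print(drift_density(x=0.5, w=0.5, t=2.0))
```

For large t the first term dominates, so the density approaches 6 w(1 - w) exp(-t), which gives a quick sanity check on the output.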
The admixture event
[Figure: P1 (frequency x1) and P2 (frequency x2) contribute proportions p1 and p2 to the hybrid population Ph (frequency xh).]
xh = p1 x1 + p2 x2
Pr(y1, y2, yh | x1, x2, p1, T1, T2, Th)
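As a small worked example of the admixture step, the sketch below computes the hybrid allele frequency from the two parental frequencies. The function name `hybrid_frequency` is hypothetical, and the code assumes p2 = 1 - p1, which the slides imply (only p1 appears in Ω) but do not state.

```python
def hybrid_frequency(x1: float, x2: float, p1: float) -> float:
    """Allele frequency in Ph immediately after admixture: xh = p1*x1 + p2*x2."""
    if not 0.0 <= p1 <= 1.0:
        raise ValueError("p1 must lie in [0, 1]")
    p2 = 1.0 - p1  # assumption: the two parental contributions sum to one
    return p1 * x1 + p2 * x2

# A hybrid population drawing a quarter of its genes from P1
print(hybrid_frequency(x1=0.9, x2=0.1, p1=0.25))
```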
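Genetic drift since admixture event
[Figure: starting from frequencies x1, xh = p1 x1 + p2 x2 and x2, populations P1, Ph and P2 (sizes N1, Nh and N2) drift for ψ generations to frequencies y1, yh and y2.]
Pr(y1, y2, yh | x1, x2, p1, T1, T2, Th), with T1 = ψ/2N1, Th = ψ/2Nh and T2 = ψ/2N2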
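Random sampling
\[
\Pr(c_1, c_2, c_h \mid y_1, y_2, y_h) = \prod_{i = 1, 2, h} \Pr(c_i \mid y_i)
\]
[Figure: samples S1, Sh and S2 drawn from P1, Ph and P2 (frequencies y1, yh and y2) give allele counts c1, ch and c2.]
C = (c1, c2, ch)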
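The sampling term factorises over the three populations, so it can be sketched as a product of per-population probabilities. The example below assumes a biallelic locus with binomial sampling of gene copies, which is a simplification introduced here; the slides do not specify the sampling distribution. The function names and the dictionary keys "1", "2" and "h" are likewise hypothetical.

```python
import math

def sampling_prob(c: int, size: int, y: float) -> float:
    """Binomial probability of c copies of an allele among `size` sampled gene copies."""
    return math.comb(size, c) * y**c * (1.0 - y)**(size - c)

def joint_sampling_prob(counts: dict, sizes: dict, freqs: dict) -> float:
    """Product over populations 1, 2 and h of Pr(c_i | y_i)."""
    prob = 1.0
    for key in ("1", "2", "h"):
        prob *= sampling_prob(counts[key], sizes[key], freqs[key])
    return prob

# Illustrative values only, not data from the talk
counts = {"1": 18, "2": 3, "h": 7}
sizes = {"1": 20, "2": 20, "h": 20}
freqs = {"1": 0.9, "2": 0.1, "h": 0.3}
print(joint_sampling_prob(counts, sizes, freqs))
```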
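Likelihood function
\[
\begin{aligned}
\Pr(C) = \int\; & \prod_{j = 1, 2, h} \Pr(c_j \mid y_j) && \text{(random sampling)} \\
\times\; & \prod_{j = 1, 2, h} \Pr(y_j \mid x_j, T_j) && \text{(admixture and genetic drift)} \\
\times\; & \prod_{i=1}^{2} \Pr(x_i \mid w, t_i) && \text{(genetic drift)} \\
\times\; & \Pr(w)\; \mathrm{d}(w, x, y) && \text{(prior on } w\text{)}
\end{aligned}
\]
The integral is over the unobserved allele frequencies w, x and y, with xh = p1 x1 + p2 x2.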
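One way to build intuition for the four factors is to simulate the model forwards, drawing each quantity from the one before it. The sketch below does this for a single biallelic locus. It does not compute the likelihood. It also swaps in discrete Wright-Fisher resampling for the diffusion approximation, assumes p2 = 1 - p1, and treats the sample size as a number of gene copies. All function and argument names are hypothetical.

```python
import random

def wf_drift(freq, pop_size, generations, rng):
    """Wright-Fisher drift: redraw all 2N gene copies each generation."""
    copies = 2 * pop_size
    for _ in range(generations):
        freq = sum(rng.random() < freq for _ in range(copies)) / copies
    return freq

def simulate_counts(p1, n1, n2, N1, Nh, N2, xi, psi, sample_copies, rng):
    """Draw allele counts (c1, c2, ch) for one biallelic locus."""
    w = rng.random()                       # flat prior on w
    x1 = wf_drift(w, n1, xi, rng)          # drift before admixture
    x2 = wf_drift(w, n2, xi, rng)
    xh = p1 * x1 + (1.0 - p1) * x2         # admixture event, assuming p2 = 1 - p1
    y = {"1": wf_drift(x1, N1, psi, rng),  # drift since admixture
         "2": wf_drift(x2, N2, psi, rng),
         "h": wf_drift(xh, Nh, psi, rng)}
    # Random sampling: count copies of the allele among the sampled gene copies
    return {k: sum(rng.random() < y[k] for _ in range(sample_copies)) for k in y}

rng = random.Random(1)
counts = simulate_counts(p1=0.2, n1=50, n2=50, N1=50, Nh=50, N2=50,
                         xi=10, psi=5, sample_copies=40, rng=rng)
print(counts)
```

Repeating the simulation over many loci and parameter values shows how p1 and the drift times jointly shape the counts, which is the dependence the likelihood integrates over.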
African-American Admixture Proportions
[Figure: bar chart of European ancestry (axis from 0 to 30) for New Orleans; New York; Pittsburgh; Maywood, near Chicago; Houston; Detroit; Baltimore; Philadelphia 1 and 2; Charleston, South Carolina; and Jamaica.]
Profile log-likelihoods for New York
[Figure: three panels: drift before admixture event; proportion of European ancestry; drift since admixture event.]
Application to canid populations:
Grey wolf and coyote in North America
[Figure: model diagram with a common ancestor, grey wolf, coyote and hybrid populations; bar chart of grey wolf ancestry (axis from 0 to 70) for grey wolf, grey wolf-like hybrid, coyote and coyote-like hybrid.]
Discussion
• Suitable data
• Assumptions of the method given the model
• Comparing the model to other scenarios
• Aspects of the data used for inference
Suitable data
• Human data: genotypes at 10 nuclear loci, chosen because they are either African- or European-specific or highly differentiated between the two.
• Canid data: 10 microsatellite loci, neither species-specific nor highly differentiated between wolves and coyotes.
Assumptions of the method given the model
• Alleles are inherited independently across loci in the admixture event.
• Drift acts independently on alleles across loci.
• Alleles in a sampled individual are independent across loci.
• The prior distribution on w is flat, not U-shaped.
• Admixture occurs instantaneously.
• The effect of mutation in perturbing allele frequencies is negligible.
Comparing the model to other scenarios
• Modern ‘pure’ populations need to be sampled.
• Thus the ‘structure’ of the population is assumed to be known.
• If modern ‘pure’ populations cannot be sampled, the method cannot make inferences about the admixture proportions.
Aspects of the data used for inference
• Inference proceeds solely on the basis of allele frequencies.
• Linkage disequilibrium (LD) is, firstly, not used for inference and, secondly, assumed to be negligible.
• LD might be exploited to enhance inference when modern ‘pure’ populations are sampled, or to relax the necessity to sample modern ‘pure’ populations at all.