Relationships Between Wright’s FST and FIS Statistics in a Context of Wahlund Effect

Journal of Heredity, 2015, 306–309
doi:10.1093/jhered/esv019
Brief Communication
Advance Access publication April 17, 2015
Lev A. Zhivotovsky
From the Institute of General Genetics, The Russian Academy of Sciences, Moscow 119991, Russia.
Address correspondence to Lev A. Zhivotovsky, Institute of General Genetics, 3 Gubkin Street, Moscow 119991, Russia, or
e-mail: [email protected].
Received 4 January 2015; First decision 6 February 2015; Accepted 20 March 2015.
Corresponding Editor: Robin Waples
Abstract
Waples (2015) has suggested a formula for the Wahlund effect in a case of unequal contribution of samples
from genetically different populations that relates Wright’s inbreeding coefficient, FIS, and normalized
variance in allele frequencies between populations, FST. I generalize this relationship to a case of multiple
alleles and multiple populations not assuming Hardy–Weinberg ratios prior to mixing. This can help to
evaluate the impact of a Wahlund effect on heterozygote deficiency relative to other factors such as null
alleles, nonrandom mating, or selection. It is suggested that Wahlund effect cannot be an important factor
of deviations from Hardy–Weinberg proportions in natural populations in the majority of instances, but
it can have a substantial contribution to heterozygote deficiency in a population that has low genetic
diversity compared to that among immigrants or in mixed samples that contain comparable fractions of
individuals from genetically different populations.
Subject areas: Population structure and phylogeography, Bioinformatics and computational genetics
Key words: fixation index, gene diversity, heterozygote deficiency, inbreeding coefficient, mixture rate, population differentiation
In a recent article, Waples (2015) found a relationship between Wright’s
inbreeding coefficient, FIS, caused by Wahlund effect (Wahlund 1928),
that is, by mixing individuals from genetically different populations,
and normalized variance in allele frequencies between populations, FST.
Specifically, if a “recipient” population has a fraction m of individuals
that have arrived from a “donor” population with different allele frequencies at a biallelic autosomal locus, then FIS = ( 4m(1 − m) / C ) FST ,
where C is a known function of allele frequencies and migration
(mixture) rate m. Here, I generalize the Waples relationship to a case
of multiple alleles and multiple populations that are not necessarily
at Hardy–Weinberg equilibria, and consider how strongly Wahlund
effects can influence the inbreeding coefficient compared to other factors; for example, null alleles or nonrandom mating.
Results
A General Formula
Let us consider 2 infinite populations, recipient and donor, with
allele frequencies $p_{r1}, p_{r2}, \ldots$ and $p_{d1}, p_{d2}, \ldots$ ($\sum_j p_{rj} = 1$, $\sum_j p_{dj} = 1$),
respectively. Their gene diversities, $H_r$ and $H_d$, are defined as the
expected heterozygosities, $H_r = 1 - \sum_j p_{rj}^2$, $H_d = 1 - \sum_j p_{dj}^2$ (Nei
1977). Wright’s FIS (Wright 1951), or inbreeding coefficient, was
introduced to measure excess of homozygotes, or heterozygote deficiency.
Based on Wright’s definition, $F_{IS} = (H_{exp} - H_{obs}) / H_{exp}$, where $H_{exp}$
and $H_{obs}$ are the expected and observed heterozygosities (Nei 1977).
In general, the recipient and donor populations are not assumed to be
at Hardy–Weinberg equilibrium: their inbreeding coefficients are denoted
by $F_{r,IS}$ and $F_{d,IS}$, respectively. Therefore, the observed heterozygosities in
these populations are $H_{r,obs} = (1 - F_{r,IS}) H_r$ and $H_{d,obs} = (1 - F_{d,IS}) H_d$.
Let the recipient population receive a fraction m of immigrants from the
donor population. What is the value of the inbreeding coefficient in the recipient
population after the event of immigration (mixing), $F'_{r,IS}$? Hereafter,
the prime stands for the values of population statistics in the mixture.
© The American Genetic Association 2015. All rights reserved. For permissions, please e-mail: [email protected]
Journal of Heredity, 2015, Vol. 106, No. 3
Define Wright’s FST statistic as $F_{ST} = (H_T - \bar{H}) / H_T$, where
$\bar{H} = \frac{1}{2}(H_r + H_d)$ is the average within-population gene diversity and
$H_T$ is the total gene diversity of an equal blend of the 2 populations
(Nei 1977). As shown in the Appendix, the inbreeding coefficient
in the recipient population, $F'_{r,IS}$, after immigration at rate m, is as
follows:
$$F'_{r,IS} = 4m(1-m)\frac{H_T}{H'_r}F_{ST} + (1-m)\frac{H_r}{H'_r}F_{r,IS} + m\frac{H_d}{H'_r}F_{d,IS}, \tag{1}$$
or, using an expansion of $H'_r$ (the Appendix),
$$F'_{r,IS} = \frac{4m(1-m)F_{ST} + (1-F_{ST})\{(1-m)(1-\Delta)F_{r,IS} + m(1+\Delta)F_{d,IS}\}}{4m(1-m)F_{ST} + (1-F_{ST})(1-\Delta+2m\Delta)}, \tag{1a}$$
where $H'_r$ is the gene diversity in the recipient population after immigration
and $\Delta = (H_d - H_r)/(H_d + H_r)$ is a relative difference in gene diversity
between the recipient and donor populations prior to mixing; the
value of Δ lies between −1 and 1.
Equation (1) can be expanded to a case of multiple donor populations (see the Appendix, equations A5–A6).
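As a numerical check of Equations (1) and (1a), the following Python sketch computes the post-mixing inbreeding coefficient in two ways: directly from the definition (H_exp − H_obs)/H_exp applied to the mixture, and from Equation (1a). The function names and the example allele frequencies are illustrative assumptions, not part of the original analysis.

```python
import numpy as np


def gene_diversity(p):
    """Expected heterozygosity H = 1 - sum_j p_j^2."""
    return 1.0 - float(np.sum(np.asarray(p, dtype=float) ** 2))


def fis_after_mixing_direct(p_r, p_d, m, f_r=0.0, f_d=0.0):
    """F'_IS from the definition (H_exp - H_obs) / H_exp in the mixture."""
    p_r, p_d = np.asarray(p_r, dtype=float), np.asarray(p_d, dtype=float)
    h_r, h_d = gene_diversity(p_r), gene_diversity(p_d)
    # Observed heterozygosity is a weighted mean of the two source populations.
    h_obs = (1 - m) * (1 - f_r) * h_r + m * (1 - f_d) * h_d
    # Expected heterozygosity uses the mixed allele frequencies.
    h_exp = gene_diversity((1 - m) * p_r + m * p_d)
    return (h_exp - h_obs) / h_exp


def fis_after_mixing_eq1a(p_r, p_d, m, f_r=0.0, f_d=0.0):
    """F'_IS from Equation (1a), written in terms of F_ST, Delta, and m."""
    p_r, p_d = np.asarray(p_r, dtype=float), np.asarray(p_d, dtype=float)
    h_r, h_d = gene_diversity(p_r), gene_diversity(p_d)
    h_t = gene_diversity(0.5 * (p_r + p_d))      # equal blend of the 2 populations
    fst = (h_t - 0.5 * (h_r + h_d)) / h_t
    delta = (h_d - h_r) / (h_d + h_r)
    wahlund = 4 * m * (1 - m) * fst
    prior = (1 - fst) * ((1 - m) * (1 - delta) * f_r + m * (1 + delta) * f_d)
    denom = wahlund + (1 - fst) * (1 - delta + 2 * m * delta)
    return (wahlund + prior) / denom


# Hypothetical three-allele example (values chosen only for illustration).
p_recipient = [0.7, 0.2, 0.1]
p_donor = [0.1, 0.3, 0.6]
print(fis_after_mixing_direct(p_recipient, p_donor, m=0.2, f_r=0.05, f_d=0.10))
print(fis_after_mixing_eq1a(p_recipient, p_donor, m=0.2, f_r=0.05, f_d=0.10))
```

The two printed values should agree up to floating-point error, since Equation (1a) is an algebraic rearrangement of the definition. The sketch assumes both populations are polymorphic enough that the denominators are nonzero.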
A Linear Regression of F-Statistics Transforms
If the expected and observed frequencies of heterozygotes equilibrate, that is, $F_{r,IS} = 0$ and $F_{d,IS} = 0$, Equation (1) simplifies to
$$F'_{r,IS} = 4m(1-m)\frac{H_T}{H'_r}F_{ST},$$
which generalizes Waples’ equation to a case of multiple alleles.
Inverting and taking logarithms, a linear regression between
$\ln(1/F'_{r,IS} - 1)$ and $\ln(1/F_{ST} - 1)$ holds with a slope of 1:
$$\ln\left(\frac{1}{F'_{r,IS}} - 1\right) = \ln\left(\frac{1}{F_{ST}} - 1\right) + \ln\frac{1 - \Delta(1-2m)}{4m(1-m)}. \tag{2}$$
Waples (2015) found a relationship between untransformed F values
that is linear with a slope of 1 only if m = 0.5 or if the populations are fixed
for alternative alleles.
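The slope-of-1 property of Equation (2) can be illustrated with a short Python sketch. It assumes Hardy–Weinberg proportions before mixing (F_r,IS = F_d,IS = 0) and uses hypothetical values of m and Δ; the function names are introduced here for illustration only.

```python
import math


def fis_wahlund_only(fst, m, delta):
    """Equation (1a) with F_r,IS = F_d,IS = 0 (no prior heterozygote deficiency)."""
    wahlund = 4 * m * (1 - m) * fst
    return wahlund / (wahlund + (1 - fst) * (1 - delta + 2 * m * delta))


def log_inverse_minus_one(f):
    """The transform ln(1/F - 1) that appears on both sides of Equation (2)."""
    return math.log(1.0 / f - 1.0)


def intercept_eq2(m, delta):
    """Intercept ln[(1 - Delta(1 - 2m)) / (4m(1 - m))] of Equation (2)."""
    return math.log((1 - delta * (1 - 2 * m)) / (4 * m * (1 - m)))


# Hypothetical mixture rate and diversity difference.
m, delta = 0.1, 0.2
for fst in (0.01, 0.05, 0.2, 0.5):
    lhs = log_inverse_minus_one(fis_wahlund_only(fst, m, delta))
    rhs = log_inverse_minus_one(fst) + intercept_eq2(m, delta)
    print(f"FST={fst:.2f}  transformed FIS={lhs:.6f}  Eq.(2) prediction={rhs:.6f}")
```

Because the intercept does not depend on FST, the transformed values differ by the same constant at every locus that shares m and Δ, which is what a regression slope of 1 means here. Sampling error is ignored in this sketch.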
Wahlund Effect Versus Other Factors
The level of heterozygote deficiency in a recipient population after
mixing depends on both the Wahlund effect itself and heterozygote
deficiencies in the recipient and donor populations prior to migration. The strength of the Wahlund effect is determined by both FST and
m. It follows from Equation (1a) that the Wahlund effect dominates
over other factors that cause positive within-population FIS values if
$$4m(1-m)\tilde{F}_{ST} > (1-m)(1-\Delta)F_{r,IS} + m(1+\Delta)F_{d,IS} \tag{3}$$
and vice versa; here $\tilde{F}_{ST} = F_{ST}/(1 - F_{ST})$ is a linearized FST value. Failure of
the inequality will mean that other factors of heterozygote deficiency
dominate and obscure Wahlund effects.
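Inequality (3) is straightforward to evaluate once FST, m, Δ, and the prior inbreeding coefficients are given. The following Python sketch is one way to do so; the function names and the example inputs are hypothetical, and estimation error in the inputs is not considered.

```python
def linearized_fst(fst):
    """Linearized FST value, FST / (1 - FST)."""
    return fst / (1.0 - fst)


def wahlund_margin(fst, m, delta, f_r, f_d):
    """Left-hand side minus right-hand side of Inequality (3)."""
    lhs = 4 * m * (1 - m) * linearized_fst(fst)
    rhs = (1 - m) * (1 - delta) * f_r + m * (1 + delta) * f_d
    return lhs - rhs


def wahlund_dominates(fst, m, delta, f_r, f_d):
    """True if the Wahlund effect outweighs prior heterozygote deficiency."""
    return wahlund_margin(fst, m, delta, f_r, f_d) > 0


# Even mixture of strongly differentiated populations: Wahlund effect dominates.
print(wahlund_dominates(fst=0.30, m=0.5, delta=0.0, f_r=0.05, f_d=0.05))
# Few immigrants from a weakly differentiated donor: other factors dominate.
print(wahlund_dominates(fst=0.01, m=0.05, delta=0.0, f_r=0.10, f_d=0.10))
```

Returning the margin as well as the Boolean makes it easier to see how close a given parameter set is to the boundary.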
Discussion
The Wahlund effect is frequently invoked to explain heterozygote deficiencies in samples that presumably include a mixture of
individuals from genetically different populations of the same or different species. In this context, $F'_{r,IS}$ is an inbreeding coefficient in a
mixture with fractions 1 − m and m from populations denoted in
Equations (1)–(3) with indexes r and d, respectively. Alternatively,
$F'_{r,IS}$ can be interpreted as heterozygote deficiency in a recipient population due to immigration of (mixing with) individuals from a genetically different donor population. The latter context can be useful in
studies of population genetic processes.
The analytical expressions obtained in the current study can be
used in 2 ways.
1. Equation (2) might serve as a test for Wahlund effects based on
the regression across loci. However, sampling bias, sampling
error, and other statistical properties of the regression parameters are not known; thus, such a test cannot be developed without careful statistical analyses. Nevertheless, this equation can
be used as a transformation of Waples’ relationship between the
corresponding F-statistics that maintains the linearity.
2. Inequality (3) might be useful to find the bounds within which
heterozygote deficiency can be explained at least by Wahlund
effects. Let us assume for simplicity that inbreeding coefficients
in recipient and donor populations prior to migration are equal
to each other, Fr,IS = Fd,IS; denote by FIS their common value. Then,
Inequality (3) simplifies to
$$4m(1-m)\tilde{F}_{ST} > \{1 - \Delta(1-2m)\}F_{IS}, \tag{3a}$$
where $\tilde{F}_{ST} = F_{ST}/(1 - F_{ST})$ is the linearized FST value.
Now, we should compute the value of alternative factors that presumably contribute to FIS. One such factor is null alleles (Waples
2015). Null alleles are alleles that cannot be detected with a given method of
genotyping. For example, polymerase chain reaction amplification
of a DNA locus can fail due to a mutation in the flanking regions for
primer hybridization (Callen et al. 1993). Null alleles lead to the
false discovery of homozygotes and cause heterozygote deficiency;
null alleles can be widely distributed across populations and reach
high frequencies (Zhivotovsky et al. 2015, and references therein).
As follows from Chakraborty et al. (1992), $F_{IS} = 2p_{null}/(1 + p_{null})$, where
$p_{null}$ is a population frequency of null alleles at a target locus. In a
case of similar gene diversities (Δ = 0), the Wahlund effect dominates over the contribution of null alleles to heterozygote deficiency
if $2m(1-m)\tilde{F}_{ST} > p_{null}/(1 + p_{null})$. Therefore, if migration rates are not very
high, FST values should be much greater than the frequency of null
alleles for the Wahlund effect to contribute significantly to heterozygote deficiency. For
example, even if $p_{null}$ is as low as 0.05 (which is almost impossible to
test with small sample sizes), the inequality holds only if $\tilde{F}_{ST}$ exceeds
0.26 (FST > 0.2) if m = 0.1 or exceeds 0.50 (FST > 0.33) if m = 0.05.
Even if the populations are evenly mixed (m = 0.5), FST values need
to exceed 0.09.
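The thresholds quoted in this example follow from Inequality (3a) with Δ = 0 and F_IS = 2p_null/(1 + p_null). A minimal Python sketch of the arithmetic is given below; the helper names are assumptions made for this illustration.

```python
def fis_from_null_alleles(p_null):
    """Apparent F_IS caused by null alleles at frequency p_null."""
    return 2.0 * p_null / (1.0 + p_null)


def min_linearized_fst(f_is, m, delta=0.0):
    """Boundary value of the linearized FST in Inequality (3a)."""
    return (1 - delta * (1 - 2 * m)) * f_is / (4 * m * (1 - m))


def to_fst(fst_linearized):
    """Convert a linearized value FST/(1 - FST) back to FST."""
    return fst_linearized / (1.0 + fst_linearized)


f_is = fis_from_null_alleles(0.05)
for m in (0.1, 0.05, 0.5):
    boundary = min_linearized_fst(f_is, m)
    print(f"m={m:<4}  linearized FST > {boundary:.3f}  (FST > {to_fst(boundary):.3f})")
```

For p_null = 0.05 the boundary is a linearized FST of about 0.26 (FST about 0.21) at m = 0.1 and about 0.50 (FST about 0.33) at m = 0.05; at m = 0.5 the boundary FST is about 0.09.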
Analogous comparisons can be provided for other factors that
decrease heterozygosity and obscure Wahlund effects: selfing,
inbreeding as mating of relatives, assortative mating, and diversifying selection. For example, if there is partial selfing with rate s
in both populations, then $F_{IS} = s/(2 - s)$ at an equilibrium between outcrossing
and selfing (Weir 1996, p. 263). Then, we can use the same
arguments as above and conclude that even low rates of selfing
may contribute to heterozygote deficiency to a greater extent than
Wahlund effects.
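To make the selfing comparison concrete, the sketch below solves for the selfing rate at which s/(2 − s) equals the Wahlund term 4m(1 − m)FST/(1 − FST). It assumes equal gene diversities (Δ = 0) and the same selfing rate in both populations; the function names and the example values of FST and m are hypothetical.

```python
def fis_from_selfing(s):
    """Equilibrium F_IS under partial selfing at rate s."""
    return s / (2.0 - s)


def wahlund_term(fst, m):
    """4m(1 - m) times the linearized FST, the left-hand side of Inequality (3)."""
    return 4.0 * m * (1.0 - m) * fst / (1.0 - fst)


def critical_selfing_rate(fst, m):
    """Selfing rate at which selfing and the Wahlund effect contribute equally."""
    x = wahlund_term(fst, m)
    return 2.0 * x / (1.0 + x)   # solves s / (2 - s) = x for s


# Hypothetical case: modest differentiation and a small immigrant fraction.
s_crit = critical_selfing_rate(fst=0.05, m=0.05)
print(f"selfing above s = {s_crit:.4f} outweighs the Wahlund effect")
```

With FST = 0.05 and m = 0.05 the critical rate is close to 0.02, so in this setting a selfing rate of only a few percent would already contribute more to heterozygote deficiency than the mixture does.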
The Wahlund effect, as a cause of heterozygote deficiency, is
distinguishable if $4m(1-m)\tilde{F}_{ST}$ in Inequality (3) is not small, where
$\tilde{F}_{ST} = F_{ST}/(1 - F_{ST})$ is the linearized FST value. In a
mixture context, large values of $4m(1-m)\tilde{F}_{ST}$ can occur in mixtures
with similar fractions of individuals from genetically different populations, for example, if the sample is collected from migration routes
or feeding areas, where individuals from more than 1 population
often mix but do not interbreed. In a migration context, however,
$4m(1-m)\tilde{F}_{ST}$ does not seem to be great in natural populations, as
immigration rate and genetic differentiation are usually inversely
related. Indeed, $F_{ST} = 1/(1 + 4mN_e)$ at migration-drift equilibrium, where
$N_e$ is the effective size of populations that exchange migrants at
rate m. Therefore, $4m(1-m)\tilde{F}_{ST}$ equals $(1-m)/N_e$, which is simply negligible.
A strong Wahlund effect would mostly occur in nonequilibrium situations where a large fraction of genetically divergent
immigrants occurs. This cannot last for long in nature, because FST will
quickly decline, but it could easily happen over the short term in
human-altered landscapes.
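The equilibrium argument above can be checked numerically. The following Python sketch evaluates the Wahlund term 4m(1 − m)FST/(1 − FST) with FST = 1/(1 + 4mN_e) and compares it with (1 − m)/N_e; the function names and the parameter values are illustrative assumptions.

```python
def fst_migration_drift(m, n_e):
    """FST at migration-drift equilibrium, 1 / (1 + 4 m N_e)."""
    return 1.0 / (1.0 + 4.0 * m * n_e)


def wahlund_term(m, fst):
    """4m(1 - m) times the linearized FST value FST / (1 - FST)."""
    return 4.0 * m * (1.0 - m) * fst / (1.0 - fst)


# Hypothetical combinations of migration rate and effective size.
for m, n_e in ((0.01, 500), (0.05, 100), (0.001, 5000)):
    fst = fst_migration_drift(m, n_e)
    term = wahlund_term(m, fst)
    print(f"m={m}, Ne={n_e}: FST={fst:.4f}, Wahlund term={term:.5f}, (1-m)/Ne={(1 - m) / n_e:.5f}")
```

The last two columns coincide because the linearized FST at equilibrium is 1/(4mN_e), so the migration rate cancels and only (1 − m)/N_e remains.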
Another case in which a Wahlund effect may have a substantial impact
on heterozygote deficiency is low gene diversity in a recipient population relative to that in the donor population. As an extreme, let us
assume that a recipient population is monomorphic, whereas the donor
population is polymorphic; that is, Δ = 1. It follows from Inequality (3)
that the Wahlund effect contributes significantly if $2(1-m)\tilde{F}_{ST}$, with
$\tilde{F}_{ST} = F_{ST}/(1 - F_{ST})$ the linearized FST value, exceeds
$F_{d,IS}$, the inbreeding coefficient in the donor population prior to migration. This might be used for conservation biology purposes, for example, when a population with low genetic variation is at risk of
invasion from populations with higher genetic diversities.
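The monomorphic-recipient case can be explored by splitting the post-mixing inbreeding coefficient into the two kinds of terms in Equation (1): the Wahlund part and the part carried over from prior heterozygote deficiency. The Python sketch below does this for a hypothetical three-allele locus; the function names and allele frequencies are assumptions for illustration.

```python
import numpy as np


def gene_diversity(p):
    """Expected heterozygosity H = 1 - sum_j p_j^2."""
    return 1.0 - float(np.sum(np.asarray(p, dtype=float) ** 2))


def decompose_fis(p_r, p_d, m, f_r, f_d):
    """Return (Wahlund part, prior part, FST) of F'_IS following Equation (1)."""
    p_r, p_d = np.asarray(p_r, dtype=float), np.asarray(p_d, dtype=float)
    h_r, h_d = gene_diversity(p_r), gene_diversity(p_d)
    h_t = gene_diversity(0.5 * (p_r + p_d))          # equal blend
    h_mix = gene_diversity((1 - m) * p_r + m * p_d)  # H'_r after mixing
    fst = (h_t - 0.5 * (h_r + h_d)) / h_t
    wahlund = 4 * m * (1 - m) * (h_t / h_mix) * fst
    prior = ((1 - m) * h_r * f_r + m * h_d * f_d) / h_mix
    return wahlund, prior, fst


# Monomorphic recipient (Delta = 1) receiving 5% polymorphic immigrants.
p_recipient = [1.0, 0.0, 0.0]
p_donor = [0.5, 0.3, 0.2]
for f_d in (0.1, 0.7):
    wahlund, prior, fst = decompose_fis(p_recipient, p_donor, 0.05, 0.0, f_d)
    criterion = 2 * (1 - 0.05) * fst / (1 - fst) > f_d
    print(f"F_d,IS={f_d}: Wahlund={wahlund:.4f}, prior={prior:.4f}, criterion met: {criterion}")
```

In both runs the comparison of the two parts agrees with the criterion stated in the text: the Wahlund part is the larger one exactly when 2(1 − m) times the linearized FST exceeds F_d,IS.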
This brief note does not concern statistical aspects such as
estimation procedures for Equation (2) or tests on inequality (3),
although this is an important issue for practical applications that
include estimates of the sampling biases and sampling errors for
basic parameters of this study, FIS and FST , and their transforms. For
example, the logarithm transforms of the inverse of F-statistics in
Equation (2) seem to be greatly biased for small F values and small
sample sizes. Also, testing Inequality (3) requires the knowledge of
joint sampling errors of F values. Both analytical approaches and
resampling procedures should be used to develop a statistical basis
for estimating relationships between F-statistics.
Funding
The Russian Foundation for Basic Research (grants 14-04-92005NNS and 15-04-02511) and RAS Program “Biodiversity in Life
Systems” to L.A.Zh.
Acknowledgments
I am grateful to Dr. Robin Waples and 2 anonymous reviewers for their valuable comments on the manuscript.
Appendix. A Model of the Wahlund Effect
One Donor Population
Let us consider a recipient population that has received a fraction m
of migrants from a genetically distinct donor population. Hereafter,
the prime stands for the values of population statistics after mixing.
Let $p_{r1}, p_{r2}, \ldots$ and $p_{d1}, p_{d2}, \ldots$ ($\sum_j p_{rj} = 1$, $\sum_j p_{dj} = 1$) be allele frequencies at an autosomal locus in recipient and donor populations,
respectively, prior to migration. Their allele diversities, $H_r$ and
$H_d$, are defined as $H_r = 1 - \sum_j p_{rj}^2$, $H_d = 1 - \sum_j p_{dj}^2$. In general, the
recipient and donor populations are not assumed to be at Hardy–Weinberg equilibrium: their inbreeding coefficients prior to migration are denoted by $F_{r,IS}$ and $F_{d,IS}$, respectively. Therefore, the
observed heterozygosities in these populations prior to migration
are $H_{r,obs} = (1 - F_{r,IS}) H_r$ and $H_{d,obs} = (1 - F_{d,IS}) H_d$.
The strength of differentiation between both populations prior
to migration, Wright’s FST value, is defined as $F_{ST} = (H_T - \bar{H})/H_T$, where
$\bar{H} = \frac{1}{2}(H_r + H_d)$ is the average within-population allele diversity, and
$H_T = 1 - \sum_j \left( \frac{p_{rj} + p_{dj}}{2} \right)^2$ is a total diversity of an equal blend of the 2
populations. Obviously,
$$H_T = \tfrac{1}{4}H_r + \tfrac{1}{4}H_d + \tfrac{1}{2}\Bigl(1 - \sum_j p_{rj}p_{dj}\Bigr). \tag{A1}$$
Let $\Delta_r = (H_r - \bar{H})/\bar{H}$ and $\Delta_d = (H_d - \bar{H})/\bar{H}$ be normalized deviations of the
within-population allele diversities prior to migration from their
average value. Obviously, $\Delta_d = -\Delta_r = \Delta$, where $\Delta = (H_d - H_r)/(H_d + H_r)$; the Δs
are equal to 0 if $H_d = H_r$, and Δ lies between −1 and 1. Further,
$$H_r = \bar{H}(1 + \Delta_r), \quad H_d = \bar{H}(1 + \Delta_d), \quad \bar{H} = (1 - F_{ST})H_T. \tag{A2}$$
After immigration into the recipient population, at rate m, the allele
frequencies and the observed heterozygosity in the recipient population change to
$$p'_{rj} = (1-m)p_{rj} + m p_{dj}, \quad H'_{r,obs} = (1-m)(1 - F_{r,IS})H_r + m(1 - F_{d,IS})H_d \tag{A3}$$
and the expected heterozygosity becomes
$$H'_r = 1 - \sum_j (p'_{rj})^2 = (1-m)^2 H_r + m^2 H_d + 2m(1-m)\Bigl(1 - \sum_j p_{rj}p_{dj}\Bigr).$$
Using equations (A1–A3), obtain
$$H'_r = (1-m)^2 H_r + m^2 H_d + 2m(1-m)(2H_T - \bar{H})$$
$$= 4m(1-m)F_{ST}H_T + (1-m)H_r + mH_d$$
$$= 4m(1-m)F_{ST}H_T + (1 - F_{ST})(1 - \Delta_d + 2m\Delta_d)H_T,$$
and
$$H'_r - H'_{r,obs} = 4m(1-m)F_{ST}H_T + (1-m)F_{r,IS}H_r + mF_{d,IS}H_d$$
$$= 4m(1-m)F_{ST}H_T + \bigl[(1-m)(1+\Delta_r)F_{r,IS} + m(1+\Delta_d)F_{d,IS}\bigr](1 - F_{ST})H_T.$$
This implies
$$F'_{r,IS} = 4m(1-m)\frac{H_T}{H'_r}F_{ST} + (1-m)\frac{H_r}{H'_r}F_{r,IS} + m\frac{H_d}{H'_r}F_{d,IS}. \tag{A4}$$
Multiple Donor Populations
Let us denote by k the number of donor populations; $m_i$ is the fraction of migrants in the recipient population from donor population
i (i = 1, 2, …, k) and m is the total fraction of migrants ($m = m_1 + m_2 + \ldots + m_k$);
$H_{T(ri)}$ is the total diversity of an equal blend of the recipient
and the ith donor population and $H_{T(is)}$ is that for donor populations i and s; $F_{ST(ri)}$ is an FST value between the recipient population
and the ith donor population and $F_{ST(is)}$ is that between donor populations i and s; $H'_r$ and $F'_{r,IS}$ are the allele diversity and the inbreeding
coefficient in the recipient population after migration from all donor
populations. Then
$$p'_{rj} = (1-m)p_{rj} + \sum_{i=1}^{k} m_i p_{ij}, \quad H'_{r,obs} = (1-m)(1 - F_{r,IS})H_r + \sum_{i=1}^{k} m_i(1 - F_{i,IS})H_i,$$
$$H'_r = 1 - \sum_j (p'_{rj})^2 = (1-m)^2 H_r + \sum_{i=1}^{k} m_i^2 H_i + \sum_{i=1}^{k} 2m_i(1-m)\Bigl(1 - \sum_j p_{rj}p_{ij}\Bigr) + \sum_{i=1}^{k-1}\sum_{s=i+1}^{k} 2m_i m_s\Bigl(1 - \sum_j p_{ij}p_{sj}\Bigr).$$
As in Equation (A1),
$$1 - \sum_j p_{ij}p_{sj} = 2H_{T(is)} - \bar{H}_{is} = 2H_{T(is)} - (1 - F_{ST(is)})H_{T(is)} = (1 + F_{ST(is)})H_{T(is)},$$
where $\bar{H}_{is}$ and $H_{T(is)}$ are the average of allele diversities and the
total diversity in an equal blend of populations i and s, and
$F_{ST(is)} = (H_{T(is)} - \bar{H}_{is})/H_{T(is)}$ is an FST value between these populations;
the same relationships hold between the recipient population and the
ith population. Therefore,
$$H'_r = (1-m)^2 H_r + \sum_{i=1}^{k} m_i^2 H_i + \sum_{i=1}^{k} 2m_i(1-m)(1 + F_{ST(ri)})H_{T(ri)} + \sum_{i=1}^{k-1}\sum_{s=i+1}^{k} 2m_i m_s(1 + F_{ST(is)})H_{T(is)},$$
and
$$H'_r - H'_{r,obs} = \sum_{i=1}^{k} 2m_i(1-m)(1 + F_{ST(ri)})H_{T(ri)} + \sum_{i=1}^{k-1}\sum_{s=i+1}^{k} 2m_i m_s(1 + F_{ST(is)})H_{T(is)}$$
$$- m(1-m)H_r - \sum_{i=1}^{k} m_i(1 - m_i)H_i + (1-m)F_{r,IS}H_r + \sum_{i=1}^{k} m_i F_{i,IS}H_i$$
$$= \sum_{i=1}^{k} 2m_i(1-m)F_{ST(ri)}H_{T(ri)} + \sum_{i=1}^{k-1}\sum_{s=i+1}^{k} 2m_i m_s F_{ST(is)}H_{T(is)} + (1-m)F_{r,IS}H_r + \sum_{i=1}^{k} m_i F_{i,IS}H_i$$
$$+ \sum_{i=1}^{k} 2m_i(1-m)H_{T(ri)} + \sum_{i=1}^{k-1}\sum_{s=i+1}^{k} 2m_i m_s H_{T(is)} - m(1-m)H_r - \sum_{i=1}^{k} m_i(1 - m_i)H_i.$$
After simple algebra, the following relationship holds for
$F'_{r,IS} = (H'_r - H'_{r,obs})/H'_r$:
$$F'_{r,IS} = 4\sum_{i=1}^{k} m_i(1-m)\frac{H_{T(ri)}}{H'_r}F_{ST(ri)} + 4\sum_{i=1}^{k-1}\sum_{s=i+1}^{k} m_i m_s\frac{H_{T(is)}}{H'_r}F_{ST(is)} + (1-m)\frac{H_r}{H'_r}F_{r,IS} + \sum_{i=1}^{k} m_i\frac{H_i}{H'_r}F_{i,IS}, \tag{A5}$$
or, in a shorter form,
$$F'_{r,IS} = 4\sum_{i=0}^{k-1}\sum_{s=i+1}^{k} m_i m_s\frac{H_{T(is)}}{H'_r}F_{ST(is)} + \sum_{i=0}^{k} m_i\frac{H_i}{H'_r}F_{i,IS}, \tag{A6}$$
where index “0” stands for the recipient population; that is,
F0, IS = Fr , IS , HT (0 s) = HT (rs), FST (0 s) = FST (rs) , and m0 = 1 − m.
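Equation (A6) can be checked numerically against the definition of the inbreeding coefficient in the mixture. In the Python sketch below, row 0 of the frequency array is the recipient population (weight m_0 = 1 − m) and the remaining rows are donors; the function names, array layout, and example values are assumptions made for this illustration.

```python
from itertools import combinations

import numpy as np


def diversity(p):
    """Allele diversity H = 1 - sum_j p_j^2."""
    return 1.0 - float(np.sum(np.asarray(p, dtype=float) ** 2))


def fis_direct(freqs, weights, f_is):
    """F'_IS in the mixture from (H_exp - H_obs) / H_exp."""
    freqs = np.asarray(freqs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    f_is = np.asarray(f_is, dtype=float)
    h = np.array([diversity(p) for p in freqs])
    h_obs = float(np.sum(weights * (1.0 - f_is) * h))
    h_exp = diversity(weights @ freqs)   # diversity of the mixed allele frequencies
    return (h_exp - h_obs) / h_exp


def fis_eq_a6(freqs, weights, f_is):
    """F'_IS from Equation (A6): pairwise FST terms plus prior F_IS terms."""
    freqs = np.asarray(freqs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    f_is = np.asarray(f_is, dtype=float)
    h = np.array([diversity(p) for p in freqs])
    h_exp = diversity(weights @ freqs)
    total = float(np.sum(weights * h * f_is))
    for i, s in combinations(range(len(weights)), 2):
        h_t = diversity(0.5 * (freqs[i] + freqs[s]))   # equal blend of i and s
        fst = (h_t - 0.5 * (h[i] + h[s])) / h_t
        total += 4.0 * weights[i] * weights[s] * h_t * fst
    return total / h_exp


# Hypothetical recipient (row 0) and two donors; weights are m_0 = 1 - m, m_1, m_2.
freqs = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.1, 0.8]]
weights = [0.80, 0.15, 0.05]
f_is = [0.02, 0.00, 0.10]
print(fis_direct(freqs, weights, f_is), fis_eq_a6(freqs, weights, f_is))
```

The two values should agree up to floating-point error. With a single donor the same functions reduce to the two-population case of Equation (A4).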
References
Callen DF, Thompson AD, Shen Y, Phillips HA, Richards RI, Mulley JC, Sutherland GR. 1993. Incidence and origin of “null” alleles in the (AC)n microsatellite markers. Am J Hum Genet. 52:922–927.
Chakraborty R, De Andrade M, Daiger SP, Budowle B. 1992. Apparent heterozygote deficiencies observed in DNA typing data and their implications in forensic applications. Ann Hum Genet. 56:45–57.
Nei M. 1977. F-statistics and analysis of gene diversity in subdivided populations. Ann Hum Genet. 41:225–233.
Wahlund S. 1928. Zusammensetzung von Populationen und Korrelationserscheinungen vom Standpunkt der Vererbungslehre aus betrachtet. Hereditas. 11:65–106.
Waples RS. 2015. Testing for Hardy-Weinberg proportions: have we lost the plot? J Hered. 106:1–19.
Weir BS. 1996. Genetic data analysis II: Methods for discrete population genetic data. Sunderland (MA): Sinauer Associates.
Wright S. 1951. The genetical structure of populations. Ann Eugen. 15:323–354.
Zhivotovsky LA, Kordicheva SY, Shaikhaev EG, Rubtsova GA, Afanasiev KI, Shitova MV, Fuller SA, Shaikhaev GO, Gharrett AJ. 2015. Efficiency of the inbreeding coefficient f and other estimators in detecting null alleles, as revealed by empirical data of locus oke3 across 65 populations of chum salmon Oncorhynchus keta. J Fish Biol. 86:402–408.