Article Genetic Adaptation to Levels of Dietary

Genetic Adaptation to Levels of Dietary Selenium in Recent
Human History
Louise White,*,1 Frederic Romagne,1 Elias M€uller,1 Eva Erlebach,1 Antje Weihmann,1 Genıs Parra,1
Aida M. Andres,1 and Sergi Castellano*,1
1
Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
*Corresponding author: E-mail: [email protected]; [email protected].
Associate editor: Ryan Hernandez
Abstract
As humans migrated around the world, they came to inhabit environments that differ widely in the soil levels of certain
micronutrients, including selenium (Se). Coupled with cultural variation in dietary practices, these migrations have led to
a wide range of Se intake levels in populations around the world. Both excess and deficiency of Se in the diet can have
adverse health consequences in humans, with severe Se deficiency resulting in diseases of the bone and heart. Se is
required by humans mainly due to its function in selenoproteins, which contain the amino acid selenocysteine as one of
their constituent residues. To understand the evolution of the use of this micronutrient in humans, we surveyed the
patterns of polymorphism in all selenoprotein genes and genes involved in their regulation in 50 human populations. We
find that single nucleotide polymorphisms from populations in Asia, particularly in populations living in the extreme Sedeficient regions of China, have experienced concerted shifts in their allele frequencies. Such differentiation in allele
frequencies across genes is not observed in other regions of the world and is not expected under neutral evolution, being
better explained by the action of recent positive selection. Thus, recent changes in the use and regulation of Se may
harbor the genetic adaptations that helped humans inhabit environments that do not provide adequate levels of Se in the
diet.
Key words: micronutrients, polygenic adaptation, local adaptation.
Introduction
ß The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please
e-mail: [email protected]
Mol. Biol. Evol. 32(6):1507–1518 doi:10.1093/molbev/msv043 Advance Access publication March 3, 2015
1507
Article
As humans spread out of Africa about 60,000–100,000 years
ago (Stringer and Andrews 1988; Fu et al. 2013) and migrated
around the world, they came to inhabit a vast range of environments. These environments differ widely in their geology,
and the abundance of the chemical elements required by
humans varies in their rocks and soils (Selinus and Alloway
2005). Elements such as zinc, iron, manganese, copper, iodine,
molybdenum, and selenium (Se), which are essential to the
human diet but needed only in trace amounts (Mertz 1981),
accumulate in varying quantities in the plants and animals in
these environments. The intake of these micronutrients,
which may have changed with the domestication of plants
and animals in the past 10,000 years (Diamond 2002), can
exceed or fail the dietary needs of human populations today
(Selinus and Alloway 2005).
Both excess and deficiency of micronutrients in the diet
have adverse health consequences in humans, but deficiency
is more common with iron and iodine deficiencies today
affecting almost one-quarter of the world’s population
(Caballero 2002; Bhutta and Salam 2012). Deficiency of
other micronutrients, such as Se, appears less widespread
but severe in some regions of the world. Studies on the
level and bioavailability of Se in plant and animal foods
across the world suggest that it varies widely at both global
and local scales (Oldfield 2002; Selinus and Alloway 2005),
with relatively little Se in countries such as Finland
(Koivistoinen and Huttunen 1986), New Zealand (Thomson
2004), possibly Malawi (Hurst et al. 2013) and, most notably,
some parts of China (Oldfield 2002; Xia et al. 2005). Heart
(Keshan) and bone (Kashin–Beck) diseases, which are treated
with Se supplementation, are endemic to large areas of China
(Keshan Disease Research Group of the Chinese Academy of
Medical Sciences 1979; Moreno-Reyes et al. 2001). More generally, Se deficiency is associated with low immune response,
infertility, cognitive decline, and increased risk of mortality
(Hatfield et al. 2012; Rayman 2012).
Se is required by humans mainly due to its function in 25
selenoproteins (Kryukov et al. 2003), which contain the
amino acid selenocysteine (Sec) as one of their constituent
residues. This amino acid is encoded by a codon (UGA) that
in all other genes represents a stop codon. Sec is analogous to
the amino acid cysteine (Cys) in its molecular structure with
an atom of Se replacing that of sulfur (Hatfield et al. 2012).
These elements confer different properties when incorporated into proteins (Gromer et al. 2003; Steinmann et al.
2010; Snider et al. 2013) and although substitutions between
Sec and Cys in orthologous proteins are uncommon in vertebrates (Castellano et al. 2009), paralogous proteins with Cys
instead of Sec exist in most vertebrate species, with humans
having six of them. Selenoproteins and their Cys-containing
paralogs are generally involved in preventing oxidative
damage (Steinbrenner and Sies 2009).
The homeostasis and metabolism of Se and Sec involves
the transport of dietary Se from the liver to other organs in
the body, its receptor-mediated uptake into cells, the
MBE
White et al. . doi:10.1093/molbev/msv043
synthesis of Sec, and the readthrough of a UGA stop codon
with a tRNA charged with Sec. Se deficiency tightly regulates
these processes to favor the supply of Se to some tissues and
selenoproteins at the expense of others (Schomburg and
Schweizer 2009). The molecular basis of such hierarchy in
the distribution of Se is not fully understood but some 19
proteins may be involved (Allmang et al. 2009; Burk and Hill
2009; Hatfield et al. 2012), including different receptors for the
uptake of Se in cells (Burk and Hill 2009) and proteins directing the recoding of the UGA codon in selenoproteins
(Dumitrescu et al. 2005; Allmang et al. 2009; Bifano et al.
2013) through their differential binding of an RNA secondary
structure in each selenoprotein mRNA (Latreche et al. 2009).
To explore whether changes in dietary Se have shaped the
evolution of genes that use or regulate Se, we have conducted
a survey of single nucleotide polymorphisms (SNPs) in selenoprotein genes and genes involved in the regulation of Se
and Sec in 50 human populations (Cann et al. 2002). We
investigated whether human populations have adapted to
the different levels of Se in their diets by assessing the signatures of positive selection in these genes. We investigate both
classical signatures of selection and signatures of polygenic
adaptation, including population differentiation and the concerted change in allele frequencies in different regions of the
world. We find evidence for polygenic adaptation in the
groups of selenoprotein genes and genes involved in the regulation of Se and Sec, with the strongest signatures in populations from China living in regions characterized by Se
deficiency (Oldfield 2002; Xia et al. 2005).
Results
Candidate Genes and Their Genetic Variation in
Humans
We used a hybridization approach (see Materials and
Methods) to enrich and sequence the protein-coding and
regulatory regions of 25 selenoprotein genes and 19 genes
involved in the regulation of Se and Sec (table 1) in 855
unrelated individuals from the Human Genome Diversity
Panel (HGDP-CEPH) (supplementary tables S1 and S2,
Supplementary Material online) (Cann et al. 2002). These individuals belong to 50 populations and provide an unbiased
sample of Se nutritional histories in humans across Africa,
Middle East, Europe, Central South Asia, East Asia, Oceania,
and America. We also sequenced the six Cys-containing paralogs of selenoproteins (table 1) in the same populations. We
called SNPs in the HGDP-CEPH individuals, with an average
coverage of 20-fold across the 50 candidate genes, and identified 7,989 SNPs (see Materials and Methods). To derive a
neutral expectation for these SNPs and genes, we sequenced
48 neutrally evolving pseudogenes (supplementary table S3,
Supplementary Material online) in the same populations and
identified 1,035 SNPs in them (see Materials and Methods).
Signatures of Positive Selection in the Candidate
Genes
We investigated whether certain genes have differentiated,
in certain populations, beyond what is expected under
1508
Table 1. Candidate Genes Grouped According to Their Type and the
Biological Process They Participate in.
Genes
Selenoproteins
Glutathione peroxidase (GPx) 1, 2, 3, 4, and 6
Iodothyronine deiodinase (DI) 1, 2, and 3
15 kDa selenoprotein (Sel15)
Selenoprotein H (SelH)
Selenoprotein I (SelI)
Selenoprotein K (SelK)
Selenoprotein M (SelM)
Selenoprotein N (SelN)
Selenoprotein O (SelO)
Selenoprotein R 1 (SelR1)
Selenoprotein S (SelS)
Selenoprotein T (SelT)
Selenoprotein V (SelV)
Selenoprotein W 1 (SelW1)
Thioredoxin reductase (TR) 1, 2, and 3
Cys-containing paralogs
Glutathione peroxidase (GPx) 5, 7, and 8
Selenoprotein R (SelR) 2 and 3
Selenoprotein W 2 (SelW2)
Regulation of Se and Sec
Transport and uptake of Se into cells
Selenoprotein P (SelP)
Apolipoprotein E receptor 2 (ApoER2)
Megalin
Metabolism of Se
Selenocysteine lyase (SCLY)
Se binding protein 1 (SELENBP1)
Biosynthesis of Sec
O-phosphoryl tRNASec Kinase (PSTK)
Selenophosphate synthetase 2 (SPS2)
O-phosphoseryl-tRNASec Se transferase (SEPSECS)
Seryl-tRNA synthethase (SARS2)
tRNA
Ser-tRNASec, copy in chromosome 17
Ser-tRNASec, copy in chromosome 19
Ser-tRNASec, copy in chromosome 22
Incorporation of Sec into proteins
CUGBP, Elav-like family member 1 (CELF1)
Elongation factor for Sec (EFSec)
Eukaryotic translation initiation factor 4A3 (EIF4A3)
ELAV like RNA binding protein 1 (ELAVL1)
Ribosomal protein L30 (RPL30)
SECIS binding protein 2 (SBP2)
Selenophosphate synthetase 1 (SPS1)
tRNASec 1 associated protein 1 (TRNAU1AP)
Exportin 1 (XPO1)
NOTE.—SelP and SPS2 are selenoproteins with a regulatory role.
neutrality. Specifically, we tested whether the groups of selenoprotein genes, their Cys-containing paralogs, or the genes
involved in the regulation of Se and Sec (table 1) show
concerted shifts in their SNP allele frequencies in certain
regions of the world. We first computed the FST statistic
MBE
Genetic Adaptation to Dietary Selenium . doi:10.1093/molbev/msv043
(supplementary table S2, Supplementary Material online) to
consider together the extreme FST ranks from populations in
each world region (excluding pairs of populations in the same
world region). We then compared the FST ranks across world
regions with a two-sided Mann–Whitney U test (fig. 1A). The
skewness of the FST ranks measures the deviation of these
genes from the neutral expectation, with stronger skewness
(higher FST ranks) indicating higher allele frequency differentiation in a given world region. The P values from the Mann–
Whitney U pairwise comparisons are represented as colors on
a heat map in figure 2, with red indicating a significantly
stronger departure from the neutral expectation in the lefthand world region than in the world region at the top. Such
unexpected and region-specific allele frequency differentiation is a signature of local adaptation in that region of the
world, through positive selection.
(Weir and Cockerham 1984) for each of these candidate genes
by combining the FST values of all SNPs in each gene (see
Materials and Methods). This FST value takes into account the
differentiation of functional alleles, if any, and other linked
alleles and, thus, measures the overall genetic differentiation
of each gene between each pair of populations in our study.
We then compared the FST values of each gene with the FST
values of our set of neutral loci, for the same population pair
(fig. 1). This provides a rank for each FST value with respect
to its neutral expectation, which measures how unusual the
differentiation for a gene is in a given population pair.
We divided the genes into the three functional groups
above and focused only on genes whose FST values rank in
the top 5% of the distribution of neutral FST values, as
these extreme genes bear the strongest signature of local
adaptation. We then grouped populations by world region
Population in world region A
Compute FST ranks
per population pair
Population in world region B
5% highest FST
5% highest FST
FST neutral distribution between a population in
world region A and a population
from another world region
FST neutral distribution between a population in
world region B and a population
from another world region
Repeat for each
population in
world region B
Repeat for each population in world region A
ST ranks
World
region A
World
region B
Mann-Whitney U-test
World
region B
P-value
World
region A
Red indicates a stronger
FST shift in A than in B
ST
ranks by gene
World
region A
20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
Rank values to sum
X
Gene
X X
Y X
Y
Gene X: 20 + 17 + 16 + 13 = 66 (70.2%)
Gene Y: 14 + 8 + 5 +
Y
Y
Contributes more to the shift in FST ranks
1 = 28 (29.8%)
FIG. 1. Computation of FST ranks for each set of candidate genes (selenoproteins, Cys-containing paralogs, or genes involved in the regulation of Se and
Sec) between pairs of populations. The FST ranks are based on the distribution of FST values from neutral loci (pseudogenes) between the same pair of
populations. (A) Comparison of FST ranks between pairs of world regions: 1) Comparison of the ranks of FST values in the top 5% of the distribution of
neutral FST with a two-sided Mann–Whitney U test between a pair of world regions; and 2) P values from these pairwise comparisons are shown as
colors on a heat map. (B) Contribution of individual genes to the FST ranks in a world region: 1) The first (lowest) FST rank a value of 1, the next rank a
value of 2, and so on. This produces a combined measure of the number of times the FST of a gene (X or Y) significantly departs from the neutral
expectation between region A and the rest of the world; and 2) Gene X has the highest sum of rank values and contributes more to the signatures of
positive selection in region A than gene Y.
1509
MBE
White et al. . doi:10.1093/molbev/msv043
FIG. 2. Comparison of FST values falling in the top 5% of the distribution of neutral FST values across populations from different world regions. Their
relative departure from the neutral expectation is tested, such that populations from the left-hand regions that appear in red are significantly more
differentiated than populations living in the regions at the top. Regions in gray do not have FST in the top 5%. (A) Comparison among regions in the
world. (B) Comparison between populations in China living in either Se deficient or Se adequate regions to other world regions.
Cys-Containing Paralogs
We observe no signatures of local positive selection in
the group of Cys-containing paralogs, as no comparison in
figure 2A shows a significant difference between pairs of
world regions (lowest P = 0.292, for Middle East vs. Africa).
These genes have functions that, while related to those of
selenoproteins, are independent of Se and its abundance
throughout the world. It is perhaps then not surprising that
as a group Cys-containing paralogs have not experienced local
adaptation in recent human history. If changes in dietary Se
pose weak selective pressures in humans, we expect a similar
pattern in selenoprotein genes.
Selenoprotein Genes
The pattern in selenoprotein genes (fig. 2A) is quite different
from their Cys-containing homologs (fig. 2A). We observe
significant signatures of local positive selection in selenoprotein genes, with several world region comparisons below or
close to the 5% significance level. These include comparisons
that involve Central South Asia (P = 0.035; vs. Oceania), East
Asia (P = 0.11 vs. Africa, P = 0.180 vs. Middle East, P = 0.133 vs.
Europe, P = 0.027 vs. Oceania), and America (P = 0.010 vs.
Africa, P = 0.025 vs. Middle East, P = 0.011 vs. Europe,
P = 0.054 vs. Central South Asia, P = 0.053 vs. East Asia,
P = 0.004 vs. Oceania). Thus, local allele frequency differentiation in selenoprotein genes departs from neutral expectations significantly more in Asia and America than in other
regions of the world. Furthermore, in these regions they
depart from neutral expectations significantly more than
their Cys-containing paralogs when all FST ranks are considered (P = 2.53 104 in Central South Asia, P < 2.20 1016
1510
in East Asia, P = 4.87 1011 in America). This is consistent
with selenoprotein genes, unlike genes that use Cys in their
proteins, having experienced local adaptation in some regions
of the world.
Genes Involved in the Regulation of Se and Sec
Interestingly, the pattern in genes that regulate Se and Sec
(fig. 2A) resembles that of selenoprotein genes rather than
that of their Cys-containing paralogs, with the strongest differences in allele frequencies found in Central South Asia
(P = 0.207 vs. Africa, P = 0.057 vs. America) and East Asia
(P = 0.147 vs. Africa, P = 0.173 vs. Europe, P = 0.028 vs.
America). Overall, the differences among pairs of world regions are somewhat weaker in these genes than in selenoprotein genes. Still, they also depart from neutral expectations
significantly more than the Cys-containing paralogs when all
FST ranks are considered (P < 2.20 1016 in Central South
Asia, P < 2.20 1016 in East Asia). This is consistent with
regulatory changes in the homeostasis of Se and Sec having
experienced local adaptation in human populations in Asia.
These regulatory adaptations are likely to have a broader
physiological impact than adaptations in selenoprotein genes.
Se Deficiency in China
The majority of populations from East Asia, a region with
signatures of positive selection (fig. 2A), are from China
(supplementary table S1, Supplementary Material online).
Because China has large Se deficient areas (Oldfield 2002;
Xia et al. 2005), we further investigated whether these signatures are due to local differences in dietary Se intake. To do
this, we compared the FST ranks of Se-related genes from six
Genetic Adaptation to Dietary Selenium . doi:10.1093/molbev/msv043
Chinese populations living in known Se deficient areas to
eight Chinese populations living in areas that are not deficient
(supplementary table S7, Supplementary Material online) (see
Materials and Methods). The group of populations living in Se
deficient areas has a significantly stronger enrichment in
highly differentiated genes than populations living in
nondeficient areas (P = 1.42 1010 for selenoproteins,
P = 2.00 105 for regulation of Se and Sec). Furthermore,
as a group, populations living in Se deficient areas show significant signatures of positive selection compared with other
world regions (fig. 2B) in selenoprotein genes (P = 3.22 104
vs. Africa, P = 3.16 103 vs. Middle East, P = 1.02 104 vs.
Europe, P = 1.51 103 vs. Central South Asia, P = 1.72 103 vs. Oceania) and genes involved in the regulation of
Se and Sec (P = 6.62 103 vs. Africa, P = 0.04 vs. Middle
East, P = 7.49 103 vs. Europe, P = 0.05 vs. Central South
Asia, P = 5.84 104 vs. America). This signature is absent
in the set of Chinese populations living in Se adequate
areas (fig. 2B). This is consistent with Se-related genes
having experienced local adaptation in populations from geographic regions whose soil does not provide adequate levels of
dietary Se.
Polygenic Adaptation
We next investigated which particular genes, among the selenoproteins and genes involved in the regulation of Se and Sec,
contribute most to the signatures of local adaptation in Asia.
We aimed to determine whether the signatures of positive
selection in these regions are spread across genes. For each
gene we counted the number of times its FST values appear in
the upper 5% tail of the neutral FST distribution, and measured the contribution of each gene as its fraction in the total
number of extreme FST values (fig. 1B and Materials and
Methods). Signatures of positive selection in Asia are the
result of allele frequency shifts in many genes, each of them
having a contribution to the extreme FST values in this region
(supplementary table S4, Supplementary Material online). In
fact, only three and six genes do not contribute to the signatures in selenoproteins from East Asia and Central South Asia,
respectively (supplementary tables S4 and S5, Supplementary
Material online). For genes involved in the regulation of
Se and Sec, the number of genes not contributing is 2 and
5, respectively (supplementary tables S4 and S5,
Supplementary Material online). This suggests that local adaptation was polygenic in nature in Asia, with SNPs in several
genes changing in allele frequency in response to differences
in environmental Se.
Still, some genes have a particularly large contribution to
these signatures of local adaptation. The five selenoprotein
genes that show the strongest evidence of local adaptation in
East Asia already have 75.6% of the extreme FST values in this
world region. These genes are, in order: DI2, SelS, GPx1, SelM,
and Sel15 (table 2 and supplementary table S4, Supplementary Material online). With the addition of DI3, the same
genes contribute the most to the signatures of positive selection in Central South Asia (supplementary table S5,
Supplementary Material online). In addition, multiple
MBE
populations contribute to these signatures of positive selection in Asia, with the Hezhen, Naxi and Oroqen populations
in East Asia and the Balochi, Brahui, Kalash, Sindhi and Uygur
in Central South Asia contributing the most. The five regulatory genes that show the strongest evidence of local adaptation in East Asia have 64.4% of the extreme FST values in this
world region. These genes are, in order: CELF1, SPS2, SEPSECS,
ELAVL1 and Ser-tRNASec (table 2, fig. 3, and supplementary
table S4, Supplementary Material online). With the addition
of SelP, the same genes contribute the most to the signatures
in Central South Asia (table 2 and supplementary table S5,
Supplementary Material online). We note that the picture in
America is very different, as the signatures from the allele
frequency differentiation in selenoprotein genes (fig. 2A) are
almost exclusive due to the DI2 gene (supplementary table S6,
Supplementary Material online), particularly in the
Colombian population.
In any case, we do not find a significant excess of low
frequency or high frequency derived alleles in any of the 50
candidate genes using classic neutrality tests, such as Tajima’s
D and Fay & Wu’s H (see Materials and Methods). The neutral
expectation for these tests was based on coalescent simulations for the genes and populations in our study using a
modified demography from Gutenkunst et al. (2009), which
accounts for the low admixture of Europeans with the
American populations analyzed (see Materials and
Methods). We conclude that polygenic adaptation was important in those genes that use or regulate Se and Sec in Asia.
Discussion
An interesting question in biology is whether shifts in micronutrients intake as humans migrated around the world or
changed their dietary practices are an important selective
pressure in humans. This is the case for certain macronutrients, such as starch (Perry et al. 2007) and lactose (Tishkoff
et al. 2007), but the effect of micronutrients such as Se remains largely unexplored. We have addressed this question
surveying the patterns of polymorphism in selenoprotein
genes and genes involved in the regulation of Se and Sec in
50 human populations, which have a range of Se nutritional
histories.
Although a few regions in Africa, middle East, Europe or
Oceania have abnormal levels of Se (Koivistoinen and
Huttunen 1986; Thomson 2004; Hurst et al. 2013), Se deficiency or toxicity has not been broadly reported (Oldfield
2002; Safaralizadeh et al. 2005; Selinus and Alloway 2005;
Novakovic et al. 2014), with no Se-related diseases being
common in these regions (Oldfield 2002; Selinus and
Alloway 2005; Novakovic et al. 2013). This agrees well with
the absence of signatures of local positive selection in these
world regions (fig. 2A).
In contrast, levels of Se in the diet vary widely across some
parts of Asia (Oldfield 2002), with Pakistan, where most
Central South Asian populations were sampled (supplementary tables S1 and S2, Supplementary Material online), having
levels of Se in the soil that are borderline deficient across large
areas (Khan et al. 2006, 2008; Ahmad et al. 2009). This agrees
with the signatures of positive selection identified in
1511
MBE
White et al. . doi:10.1093/molbev/msv043
Table 2. Genes and SNPs Contributing Most to the Signatures of Positive Selection in East Asia (figs. 2 and 3).
Gene Rank
Gene
Selenoproteins
1
DI2
SNP Rank
SNPs ID in SelenoDB 2.0, Chromosome and Coordinate (hg19)
1
2
3
16
2
SelS
1
2
3
15
3
GPx1
1
2
3
10
11
4
SelM
1
2
3
5
Sel15
1
2
3
Biosynthesis and incorporation of Sec into proteins
1
CELF1
1
2
3
2
SPS2
1
2
3
3
SEPSECS
1
2
3
4
ELAVL1
1
2
3
5
Ser-tRNASec
1
2
3
SNP00000190_2.0,
SNP00000163_2.0,
SNP00000193_2.0,
SNP00000160_2.0,
SNP00005709_2.0,
SNP00005693_2.0,
SNP00005729_2.0,
SNP00005750_2.0,
SNP00000352_2.0,
SNP00000327_2.0,
SNP00000309_2.0,
SNP00000330_2.0,
SNP00000377_2.0,
SNP00005262_2.0,
SNP00005302_2.0,
SNP00005326_2.0,
SNP00001985_2.0,
SNP00001936_2.0,
SNP00001965_2.0,
14, 80,677,639
14, 80,669,618
14, 80,677,659
14, 80,669,580
15, 101,815,150
15, 101,814,579
15, 101,817,144
15, 101,817,876
3, 49,395,766
3, 49,394,636
3, 49,393,640
3, 49,394,834
3, 49,396,360
22, 31,498,917
22, 31,500,490
22, 31,501,788
1, 87,381,198
1, 87,379,501
1, 87,380,644
SNP0005905_2.0, 11, 47,500,661
SNP0005814_2.0, 11, 47,490,809
SNP0005961_2.0, 11, 47,521,325
SNP00001698_2.0, 16, 30,457,350
SNP00001710_2.0, 16, 30,459,119
SNP00001663_2.0, 16, 30,455,458
SNP00001861_2.0, 4, 25,161,929
SNP00001794_2.0, 4, 25,146,251
SNP00008709_2.0, 4, 25,161,942
SNP0006324_2.0, 19, 8,027,942
SNP0006321_2.0, 19, 8,027,784
SNP0006262_2.0, 19, 8,026,170
—, 17, 38,273,561
—, 17, 38,270,885
—, 17, 38,274,085
Gene Region, SNP ID in dbSNP
Coding/Intron
Coding/30 -UTR
Coding
30 -UTR/Coding, rs225014
Intron
Intron
Intron
Promoter region, rs34713741
50 -UTR/Promoter region
30 -UTR
Downstream region
30 -UTR/Coding, rs1050450
Promoter region, rs3811699
Downstream region
Downstream region
Intron
Promoter region
Intron
Promoter region
Intron
Downstream region/30 -UTR
Promoter region /Intron
50 -UTR
Promoter region
30 -UTR
Coding
Intron
Coding
30 -UTR/Downstream region
30 -UTR/Downstream region
30 -UTR/Downstream region
—
—
—
NOTE.—The SNP ID in SelenoDB 2.0 is given for all SNPs in protein-coding genes. The SNP ID in dbSNP is given for those SNPs with known functional consequences discussed in
the text.
FIG. 3. Genes involved in the biosynthesis of Sec and its incorporation into proteins (in orange) that contribute most to the signatures of positive
selection in regulatory genes from East Asian populations.
1512
Genetic Adaptation to Dietary Selenium . doi:10.1093/molbev/msv043
populations from these regions in both selenoprotein genes
and genes involved in the regulation of Se and Sec (fig. 2A).
Furthermore, the most Se deficient areas worldwide, where
Se-related diseases are endemic (Keshan Disease Research
Group of the Chinese Academy of Medical Sciences 1979;
Moreno-Reyes et al. 2001; Oldfield 2002; Xia et al. 2005), are
found in China in East Asia. This agrees with the signatures of
local adaptation identified in those populations in China
living in Se deficient areas (fig. 2B). Thus, local adaptation
influenced not only selenoprotein genes but also the tight
regulation in the use of Se that has been observed in humans
under deficiency conditions (Allmang et al. 2009; Burk and
Hill 2009; Hatfield et al. 2012). The absence of signatures in
Cys-containing genes (fig. 2A) shows that the differentiation
observed in Se-related genes is not due to the genetic drift
experienced by populations in East Asia (Keinan et al. 2007).
We are more cautious in the interpretation of the signatures of positive selection in America (fig. 2A). First, they are
observed only in selenoprotein genes and are largely due to a
single gene, the DI2 gene. So, it is unclear whether Se availability (rather than other selective pressures in this gene) is
responsible for them. Second, the signatures are not widespread in this world region but strongest in one particular
population, the Colombian. In any case, little is known about
the levels of dietary Se in humans in America. Although white
muscle disease, a symptom of Se deficiency, is quite prevalent
across livestock from North and South America (Oldfield
2002), and at least ten Latin American and Caribbean
countries are thought to be Se deficient based on the composition of their animal feeds (McDowell 1976), broad Se
deficiency in humans has yet to be reported in America
(Oldfield 2002).
The selenoprotein genes contributing to the signatures of
positive selection in Asia (table 2) contain SNPs with known
functional consequences, some also with high levels of population differentiation. First, DI2, an oxidoreductase that catalyses the conversion of the iodine-dependent hormone T4
to its active form T3 in the thyroid, has a Thr to Ala substitution (rs225014) in exon 2 with high FST values for many East
Asian populations. Tissue samples from individuals homozygous for the Ala allele at this locus have been shown to exhibit
significantly lower DI2 enzyme velocity than samples with
Ala/Thr or Thr/Thr genotypes (Canani et al. 2005). The activity of this enzyme thus depends on two micronutrients, Se
and iodine, whose deficiency has been linked to Kashin–Beck
disease in China (Yao et al. 2011). Second, SelS, a gene involved in stress response in the endoplasmic reticulum and
with a role in mediating inflammation (Curran et al. 2005),
has a noncoding G to A substitution (rs34713741) associated
with colorectal cancer risk in European and Asian populations
(Meplan et al. 2010; Sutherland et al. 2010). This SNP shows
high FST values across many populations in East Asia. Finally,
GPx1, an antioxidant enzyme that reduces hydrogen peroxide
thereby protecting cells from oxidative damage, has a Pro to
Leu amino acid substitution (rs1050450) and a noncoding A
to G SNP (rs3811699) in the promoter region (Hamanishi
et al. 2004) with high FST across many East Asian populations.
This gene and substitution have been previously suggested to
MBE
be under positive selection in Asia (Foster et al. 2006).
Interestingly, the derived Leu allele at rs1050450, which is
found at lower frequencies in East Asia than in the rest of
the world (excluding America), results in 40% lower GPx1
activity than the ancestral Pro allele (Hamanishi et al. 2004).
In addition, the derived G allele at rs3811699 in GPx1 was
shown in the same study to confer a 25% decrease in transcriptional activity and is again found at lower frequencies in
East Asia than elsewhere (except America). This predominance of the ancestral alleles at these loci, with higher transcriptional activity and higher enzymatic activity, could reflect
adaptation to the low Se levels across much of the region.
On the other hand, it is interesting that the regulatory
processes with the strongest signatures of selection in Asia
(table 2 and fig. 3) are the biosynthesis of Sec by SPS2 and
SEPSECS and the insertion of Sec into proteins by CELF1 and
ELAVL1, which regulate SBP2, the SECIS (selenocysteine insertion sequence) binding protein that enables the Sec amino
acid carried by Ser-tRNASec to be inserted at the UGA codon.
It has been suggested that as SPS2 is itself a selenoprotein, and
therefore dependent on the supply of its own product, it
could serve as an auto-regulator of selenoprotein synthesis
(Guimaraes et al. 1996). In this regard, SBP2 has been described by some as the “master regulator” of selenoprotein
synthesis (Bubenik et al. 2009) but SNPs within SBP2 itself
contribute little to the signatures of positive selection found
in Asia. Genes that regulate SBP2 represent an alternative
target where adaptations that would alter the expression of
the entire selenoproteome could occur.
It is also interesting that, despite the hierarchical prioritization of Se supply to different tissues being an important homeostatic mechanism when dietary Se intake is insufficient
(Schomburg and Schweizer 2009), the Se transport protein
SelP and its tissue-specific receptors (Megalin and ApoER2)
do not appear to be the predominant targets for natural
selection in Asia. Nevertheless, SelP and ApoER2, the SelP
receptor in the brain and testis, contribute to the signatures
of selection in Central South Asia and East Asia whereas
Megalin, the kidney SelP receptor, does not. This is compatible with adaptations to maximize the uptake of Se by these
tissues under Se deficiency conditions in this region. Brain and
testis are known to accumulate Se when scarce at the expense
of other tissues (Burk and Hill 2009), which may be explained
by the sensitivity of neurons and sperm to oxidative damage
due to their above average oxygen consumption
(Steinbrenner and Sies 2009). Indeed, the impairment of
SelP and/or ApoER2 leads to severe neurological defects
and reduced fertility in mice (Burk and Hill 2009;
Steinbrenner and Sies 2009).
In conclusion, we provide evidence that genetic changes in
selenoprotein genes and genes that regulate the use of Se
were important for humans to adapt to environments that
do not provide adequate levels of dietary Se. Given the relevance of these adaptions to the ability of humans to inhabit
Se-deficient environments, and to individual susceptibility to
Se-related conditions (Xiong et al. 2010), a better understanding of the functional consequences of the alleles underlying
local adaptation is needed. It is, however, possible that the
1513
MBE
White et al. . doi:10.1093/molbev/msv043
magnitude of their functional impact is in line with the magnitude of their differentiation, which is spread across genes,
making such future assessment challenging. In any case, our
understanding of local adaptions to environmental Se will
only improve with better knowledge of the levels of Se in
the diet of human populations worldwide. Today, the lack
of comparable quantitative data on soil, plant or dietary Se
levels across the world hampers the correlation of human Se
status with genetic data. Progress on the biology of micronutrients and the evolution of their use by humans is thus likely
to come from linking alleles at functional sites with precise
measurement of these essential trace elements across the
world.
Materials and Methods
Candidate Genes
We compiled a list of 50 human genes (table 1). This list
includes the 25 selenoprotein genes in humans and the 6
genes that encode proteins with sulfur (in the form of cysteine, Cys) instead of Se (in the form of selenocysteine, Sec)
(Kryukov et al. 2003). These are paralogs of selenoprotein
genes and their functions may be related to their Sedependent counterparts. We have also included 19 genes
involved in the homeostasis and metabolism of Se and Sec
(Allmang et al. 2009). These are genes whose function and
regulation may depend on levels of Se in the diet.
We used the human selenoprotein gene annotation from
SelenoDB 1.0 (Castellano et al. 2008), which includes the annotation of the SECIS element of each selenoprotein gene.
The annotation of the Cys-containing homologs was also
obtained from SelenoDB 1.0. The annotation of genes involved in the homeostasis and metabolism of Se and Sec, as
well as the alternative transcript forms for all genes, was obtained from RefSeq (Pruitt et al. 2009). For analysis, the 50
genes were split into three groups as defined in table 1: 1)
Selenoprotein genes, 2) Cys-containing paralogous genes, and
3) genes involved in the regulation of Se and Sec, which includes the SelP and SPS2 selenoprotein genes.
Pseudogenes
Following Andres et al. (2010), we used a group of 48 neutrally
evolving loci to derive the neutral expectation for some of the
analyses (supplementary table S3, Supplementary Material
online). These loci are unlinked (physically dispersed)
known pseudogenes that are greater than 400 nt in length
and located over 2,000 nt from any known gene. To ensure
that these pseudogenes are neutral with respect to natural
selection, they 1) do not overlap evolutionary conserved regions of the human genome; 2) do not include simple repeats
or low complexity DNA; 3) do not differ from the rest of the
human genome in terms of recombination rate; 4) are present
as pseudogenes in chimpanzee, orangutan, and rhesus; and 5)
are present in a single copy in the human and chimpanzee
genomes or, when a member of a gene family, only one
member is included. In addition, processed ribosomal pseudogenes and olfactory receptor pseudogenes are not
1514
included. The length of these pseudogenes ranges from 422
to 2,832 nt, and their combined length is 43,700 nt.
Human Samples
We used samples from the Foundation Jean Dausset-Centre
d’Etude du Polymorphisme Humain; Human Genome
Diversity Panel (HGDP-CEPH) (supplementary table S1,
Supplementary Material online). We restricted our analysis
to the H952 HGDP-CEPH set detailed in Rosenberg (2006),
which excludes individuals with second-degree relationships
or closer. Of this set 940 individuals were available for sequencing. Additionally, we removed one Naxi individual
(HGDP01339) prior to variant calling as a result of likely contamination during the sequencing process. The set used for
variant calling consisted of 928 individuals (supplementary
table S2, Supplementary Material online). These individuals
from 53 populations were grouped in one of the following
geographical regions for analysis: Africa, Middle East, Europe,
Central South Asia, East Asia, Oceania, and America. Previous
analyses have shown the Middle East, Central South Asia, and
America to contain inbred populations (Pemberton et al.
2012), which may bias analyses based on allele frequency differences among populations. We therefore excluded from our
analysis the population from each of these regions having the
most inbred individuals (Pemberton et al. 2012) (supplementary table S2, Supplementary Material online). Our final analysis is thus based on 855 individuals from 50 worldwide
populations.
Array Capture Design
We designed an Agilent custom array (Agilent Technologies)
to target all exons (coding and noncoding untranslated regions, UTRs) plus 400 nt of the surrounding introns (200 nt
at each side of each exon) and 2,000 nt upstream (to include promoter regions) of all candidate genes (table 1).
Additionally, for genes where the target region was shorter
than 4,000 nt we added 2,000 nt of extra downstream region
and for tRNA genes 3,000 nt were added upstream and downstream. The complete coding region of each pseudogene was
targeted (supplementary table S3, Supplementary Material
online).
Using the human reference sequence (hg18) as a template,
we designed a total of 469,695 probes (419,468 and 50,227 for
the candidate and the putatively neutrally evolving genes,
respectively). The probes were of 60 nt in length, started
50 nt before and after each targeted region, and were set
every 1 nt to maximize capture efficiency. The probes target
a total of 524,875 nt (470,800 and 54,083 nt for the candidate
and the putatively neutrally evolving genes, respectively) of
the human genome.
Library Preparation, Capture, and Sequencing
We prepared 940 libraries from the HGDP-CEPH samples
following a double index protocol (Meyer and Kircher
2010). The genomic DNA (100 ml, 10 ng/ml) of each sample
was sonicated (Bioruptor) to fragments of 150–300 nt in
length. Target capture was performed in batches of pooled
Genetic Adaptation to Dietary Selenium . doi:10.1093/molbev/msv043
libraries with around 90 samples per pool (Hodges et al. 2009).
The enriched libraries were amplified and sequenced using
the Illumina GAIIx platform yielding 76-bp paired-end reads
Variant Calling
Base-calling was performed with Ibis (Kircher et al. 2009).
Reads were mapped to the human reference genome
(hg19) using BWA (Li and Durbin 2009) yielding an average
on-target coverage of 20-fold per individual and per gene after
filtering out duplicate reads and those with a mapping quality
less than 25. The GATK IndelRealigner was used to improve
read alignment in indel regions (McKenna et al. 2010). We
called genotypes using the GATK UnifiedGenotyper version
2.2 (DePristo et al. 2011), and filtered them to remove genotypes in sites that 1) had a coverage below 8 in more than
50% of samples, 2) had an average coverage above 100, 3)
were indels or SNPs within 5 bp of an indel, 4) were multiallelic sites, 5) had an SNP quality less than 20, and 6) had a
strand bias greater than 10. We additionally filtered out sites
that did not have 1:1 human–chimp correspondence in the
Ensembl EPO 6 primate alignments (Paten et al. 2008) or were
identified as being prone to systematic error (Castellano et al.
2014). Genotypes with a quality (GQ) less than 20 were
masked out but the positions retained unless all genotypes
at that position had a GQ less than 20. This resulted in 7,989
and 1,035 SNPs in the candidate genes and pseudogenes,
respectively, across populations.
Population Differentiation
We measured population differentiation with the FST statistic
according to the formula of Weir and Cockerham (1984). This
formula calculates FST per SNP in an unbiased way (that accounts for both genetic and statistical sampling) using an
analysis of variance approach. The formula represents the
proportion of the overall genetic diversity that is accounted
for by allele frequency differences between pairs of populations. As originally defined by Wright (1951), FST values range
from 0 (no population differentiation) to 1 (no allele sharing
among populations); however, the Weir and Cockerham
(1984) method can result in negative FST values which were
considered as 0 in our analyses. We did not calculate FST
values for sites where fewer than 50% of individuals in a
population had a genotype call. To calculate an FST value
per gene (based on all SNPs across that gene), we used a
weighted average by first summing the numerator over all
SNPs then summing the denominator over all SNPs before
finding the quotient (Weir and Cockerham 1984). Thus, the
resulting value is not the same as the average of single SNP FST
values. We calculated FST between each pair of populations in
supplementary table S2, Supplementary Material online, and
for each pair of populations, an FST value was obtained for
each gene in table 1.
Genes and SNPs Contributing to the Signatures of
Positive Selection
An overview of this analysis can be found in figure 1. To
identify the genes that contribute most to the patterns in
MBE
figure 2, we used the rank of their FST values in the 5% upper
tail of the neutral distribution. We then sorted the ranks of
these extreme FST values in ascending order and gave the first
rank a value of 1, the next rank a value of 2, and so on (fig. 1B).
Any identical ranks were given the same value. All values for
each gene were then summed (fig. 1B). This produces a combined measure of the number of times the FST of a gene
departs from the neutral expectation between populations
in two geographical regions and the strength of such departure. Genes can achieve a high summed value by having many
FST values within the 5% tail or by having a few very extreme
FST values, or more usually by a combination of both. The
genes with the highest summed values contribute most to
the signatures of positive selection shown in figure 2A. We
present these genes in table 2 and supplementary tables
S4–S6, Supplementary Material online. We then took the
five highest ranking genes in East Asia from each group
(table 2 and supplementary table S4, Supplementary
Material online) and performed the same ranking process
using the per-SNP FST values from each of these genes, this
time summing values for each SNP across pairs of populations
(supplementary tables S8 and S9, Supplementary Material
online). This indicates those SNPs within each gene with
the highest contribution to the signatures of positive selection
in East Asia in figure 2A.
We further classified the populations from China into two
groups: Those populations sampled from within the most Se
deficient regions (pink areas of map in Oldfield, page 45; less
than 0.02 ppm of Se based on local animal feeds and forages)
(Oldfield 2002) and those populations that were sampled
from regions likely to be Se adequate (more than 0.02 ppm
of Se) or where it was difficult to tell (supplementary table S7,
Supplementary Material online), making our test conservative. A two-sided Mann–Whitney U test was then performed
using vectors of FST ranks from these two groups to ascertain
whether the signals of positive selection in Se-related genes
are stronger in populations that are likely to be Se deficient
(fig. 2B).
Classic Neutrality Tests
We computed Tajima’s D and Fay & Wu’s H statistics (Tajima
1989; Fay and Wu 2000) using the Perl program “Neutrality
Test Pipeline” (NTP) (Andres et al. 2010). Tajima’s D compares the number of segregating sites (S) against the average
nucleotide diversity () in a gene. S and are both independent estimators of (4Ne) and are expected to be equal
under neutrality, but behave differently when a population
evolves nonneutrally. Significantly negative values of Tajima’s
D are consistent with positive or purifying selection (excess of
low frequency alleles). Fay & Wu’s H can help to distinguish
positive selection from purifying selection and population size
expansion in cases where Tajima’s D is negative. A negative
Fay & Wu’s H detects the excess of high-frequency-derived
alleles produced when an advantageous variant has rapidly
increased to high frequencies pulling linked variation with it.
H has the greatest power to detect selection when the
1515
MBE
White et al. . doi:10.1093/molbev/msv043
selected allele is very close to being fixed or has just fixed in
the population.
NTP runs a modified version of msstats (Thornton 2003)
on genotype input data and computes a P value for each
output statistic based on supplied coalescent simulations.
Each gene and each control region were run individually.
Fay & Wu’s H test requires an outgroup in order to distinguish
the derived allele from the ancestral one. The inferred sequence of the human–chimp ancestor for each gene and
control region was obtained from Ensembl (Flicek et al.
2013) and added to the NTP input. There were a number
of positions at which the inferred ancestral allele did not
match either the present human reference or alternative
allele. These positions were removed prior to the NTP analysis
because in this situation neither allele can be assigned as being
derived. We used the Benjamini and Hochberg method
(Benjamini and Hochberg 1995) for multiple test correction.
Coalescent Simulations
We simulated a neutral expectation using a slightly modified
four population demography (Africa, Europe, Asia, and
America) from Gutenkunst et al. (2009). This demographic
scenario accounts for both population expansion and admixture between populations and includes a 48% admixture between the European and American populations after the split
between the Asian and European populations. This model
was based on a population of Mexican ancestry living in
the United States. The American populations in our data
set show very little European admixture (Li et al. 2008) and
so this European and American admixture was removed from
our demography. The use of this demographic scenario required the assignment of our 50 populations to the four
populations included in the demography. This was done
with reference to the k = 4 plot in Li et al. (2008). This grouped
the populations as shown in supplementary table S2,
Supplementary Material online, for NTP.
We performed 10,000 coalescent simulations with ms
(Hudson 2002) for each gene and each pseudogene in each
HGDP-CEPH population. Population and gene/pseudogenespecific details, for example, the number of chromosomes and
segregating sites were used in each simulation. NTP accounts
for missing genotype data by marking those positions as also
missing in the ms simulations.
Data Deposition
Human gene annotations and their SNPs in the HGDP-CEPH
populations are available in SelenoDB 2.0 (Romagne et al.
2014) at http://www.selenodb.org.
Supplementary Material
Supplementary tables S1–S9 are available at Molecular Biology
and Evolution online (http://www.mbe.oxfordjournals.org/).
Acknowledgments
The authors thank Raymond Burk, Kristina Hill, and Alain
Krol for help with the list of genes involved in the homeostasis
and metabolism of Se and Sec. They also thank Mark
1516
Stoneking for access to the HGDP-CEPH samples and Ryan
Gutenkunst for advice about population admixture in these
samples. They also thank Felix M. Key for help with the NTP
and ms analyses. This work was supported by the Max Planck
Society.
References
Ahmad K, Khan ZI, Ashraf M, Shah ZA, Ejaz A, Valeem EE. 2009. Timecourse chages in selenium status of soil and forage in a pasture in
Sargodha, Punjab, Pakistan. Pak J Bot. 41:2397–2401.
Allmang C, Wurth L, Krol A. 2009. The selenium to selenoprotein pathway in eukaryotes: more molecular partners than anticipated.
Biochim Biophys Acta. 1790:1415–1423.
Andres AM, Dennis MY, Kretzschmar WW, Cannons JL, Lee-Lin SQ,
Hurle B, Schwartzberg PL, Williamson SH, Bustamante CD, Nielsen
R, et al. 2010. Balancing selection maintains a form of ERAP2 that
undergoes nonsense-mediated decay and affects antigen presentation. PLoS Genet. 6:e1001157.
Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a
practical and powerful approach to multiple testing. J R Stat Soc B.
57:289–300.
Bhutta ZA, Salam RA. 2012. Global nutrition epidemiology and trends.
Ann Nutr Metab. 61(Suppl. 1): 19–27.
Bifano AL, Atassi T, Ferrara T, Driscoll DM. 2013. Identification of nucleotides and amino acids that mediate the interaction between
ribosomal protein L30 and the SECIS element. BMC Mol Biol. 14:12.
Bubenik JL, Ladd AN, Gerber CA, Budiman ME, Driscoll DM. 2009.
Known turnover and translation regulatory RNA-binding proteins
interact with the 30 UTR of SECIS-binding protein 2. RNA Biol. 6:
73–83.
Burk RF, Hill KE. 2009. Selenoprotein P-expression, functions, and roles in
mammals. Biochim Biophys Acta. 1790:1441–1447.
Caballero B. 2002. Global patterns of child health: the role of nutrition.
Ann Nutr Metab. 46(Suppl. 1): 3–7.
Canani LH, Capp C, Dora JM, Meyer EL, Wagner MS, Harney JW, Larsen
PR, Gross JL, Bianco AC, Maia AL. 2005. The type 2 deiodinase A/G
(Thr92Ala) polymorphism is associated with decreased enzyme velocity and increased insulin resistance in patients with type 2 diabetes mellitus. J Clin Endocrinol Metab. 90:3472–3478.
Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre L, Bodmer
J, Bodmer WF, Bonne-Tamir B, Cambon-Thomsen A, et al. 2002. A
human genome diversity cell line panel. Science 296:261–262.
Castellano S, Andres AM, Bosch E, Bayes M, Guigo R, Clark AG. 2009.
Low exchangeability of selenocysteine, the 21st amino acid, in vertebrate proteins. Mol Biol Evol. 26:2031–2040.
Castellano S, Gladyshev VN, Guigo R, Berry MJ. 2008. SelenoDB 1.0: a
database of selenoprotein genes, proteins and SECIS elements.
Nucleic Acids Res. 36:D332–D338.
Castellano S, Parra G, Sanchez-Quinto FA, Racimo F, Kuhlwilm M,
Kircher M, Sawyer S, Fu Q, Heinze A, Nickel B, et al. 2014.
Patterns of coding variation in the complete exomes of three
Neandertals. Proc Natl Acad Sci U S A. 111:6666–6671.
Curran JE, Jowett JB, Elliott KS, Gao Y, Gluschenko K, Wang J, Abel Azim
DM, Cai G, Mahaney MC, Comuzzie AG, et al. 2005. Genetic variation in selenoprotein S influences inflammatory response. Nat
Genet. 37:1234–1241.
DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C,
Philippakis AA, del Angel G, Rivas MA, Hanna M, et al. 2011. A
framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 43:491–498.
Diamond J. 2002. Evolution, consequences and future of plant and
animal domestication. Nature 418:700–707.
Dumitrescu AM, Liao XH, Abdullah MS, Lado-Abeal J, Majed FA, Moeller
LC, Boran G, Schomburg L, Weiss RE, Refetoff S. 2005. Mutations in
SECISBP2 result in abnormal thyroid hormone metabolism. Nat
Genet. 37:1247–1252.
Fay JC, Wu CI. 2000. Hitchhiking under positive Darwinian selection.
Genetics 155:1405–1413.
Genetic Adaptation to Dietary Selenium . doi:10.1093/molbev/msv043
Flicek P, Ahmed I, Amode MR, Barrell D, Beal K, Brent S, Carvalho-Silva
D, Clapham P, Coates G, Fairley S, et al. 2013. Ensembl 2013. Nucleic
Acids Res. 41:D48–D55.
Foster CB, Aswath K, Chanock SJ, McKay HF, Peters U. 2006.
Polymorphism analysis of six selenoprotein genes: support for a
selective sweep at the glutathione peroxidase 1 locus (3p21) in
Asian populations. BMC Genet. 7:56.
Fu Q, Mittnik A, Johnson PL, Bos K, Lari M, Bollongino R, Sun C, Giemsch
L, Schmitz R, Burger J, et al. 2013. A revised timescale for human
evolution based on ancient mitochondrial genomes. Curr Biol. 23:
553–559.
Gromer S, Johansson L, Bauer H, Arscott LD, Rauch S, Ballou DP,
Williams CH Jr, Schirmer RH, Arner ES. 2003. Active sites of thioredoxin reductases: why selenoproteins? Proc Natl Acad Sci U S A.
100:12618–12623.
Guimaraes MJ, Peterson D, Vicari A, Cocks BG, Copeland NG, Gilbert DJ,
Jenkins NA, Ferrick DA, Kastelein RA, Bazan JF, et al. 1996.
Identification of a novel selD homolog from eukaryotes, bacteria,
and archaea: is there an autoregulatory mechanism in selenocysteine metabolism? Proc Natl Acad Sci U S A. 93:15086–15091.
Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD. 2009.
Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 5:
e1000695.
Hamanishi T, Furuta H, Kato H, Doi A, Tamai M, Shimomura H,
Sakagashira S, Nishi M, Sasaki H, Sanke T, et al. 2004. Functional
variants in the glutathione peroxidase-1 (GPx-1) gene are associated
with increased intima-media thickness of carotid arteries and risk of
macrovascular diseases in japanese type 2 diabetic patients. Diabetes
53:2455–2460.
Hatfield DL, Berry MJ, Gladyshev VN. 2012. Selenium: its molecular biology and role in human health. New York: Springer.
Hodges E, Rooks M, Xuan Z, Bhattacharjee A, Benjamin Gordon D,
Brizuela L, Richard McCombie W, Hannon GJ. 2009. Hybrid selection
of discrete genomic intervals on custom-designed microarrays for
massively parallel sequencing. Nat Protoc. 4:960–974.
Hudson RR. 2002. Generating samples under a Wright-Fisher neutral
model of genetic variation. Bioinformatics 18:337–338.
Hurst R, Siyame EW, Young SD, Chilimba AD, Joy EJ, Black CR, Ander EL,
Watts MJ, Chilima B, Gondwe J, et al. 2013. Soil-type influences
human selenium status and underlies widespread selenium deficiency risks in Malawi. Sci Rep. 3:1425.
Keinan A, Mullikin JC, Patterson N, Reich D. 2007. Measurement of the
human allele frequency spectrum demonstrates greater genetic drift
in East Asians than in Europeans. Nat Genet. 39:1251–1255.
Keshan Disease Research Group of the Chinese Academy of Medical
Sciences.. 1979. Observations on effect of sodium selenite in prevention of Keshan disease. Chin Med J (Engl). 92:471–476.
Khan ZI, Ashraf M, Danish M, Ahmad K, Valeem EE. 2008. Assessment of
selenium content in pasture and ewes in Punjab, Pakistan. Pak J Bot.
40:1159–1162.
Khan ZI, Hussain A, Ashraf M, McDowell R. 2006. Mineral status of soils
and forages in southwestern Punjab-Pakistan: micro-minerals. AsianAust J Anim Sci. 19:1139–1147.
Kircher M, Stenzel U, Kelso J. 2009. Improved base calling for the
Illumina Genome Analyzer using machine learning strategies.
Genome Biol. 10:R83.
Koivistoinen P, Huttunen JK. 1986. Selenium in food and nutrition in
Finland. An overview on research and action. Ann Clin Res. 18:13–17.
Kryukov GV, Castellano S, Novoselov SV, Lobanov AV, Zehtab O, Guigo
R, Gladyshev VN. 2003. Characterization of mammalian selenoproteomes. Science 300:1439–1443.
Latreche L, Jean-Jean O, Driscoll DM, Chavatte L. 2009. Novel structural
determinants in human SECIS elements modulate the translational
recoding of UGA as selenocysteine. Nucleic Acids Res. 37:5868–5880.
Li H, Durbin R. 2009. Fast and accurate short read alignment with
Burrows-Wheeler transform. Bioinformatics 25:1754–1760.
Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S,
Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, et al. 2008.
MBE
Worldwide human relationships inferred from genome-wide patterns of variation. Science 319:1100–1104.
McDowell LR. 1976. Mineral deficiencies and toxicities and their effect
on beef production in developing countries. In: Smith AJ, editor. Beef
cattle production in developing countries. Edinburgh (United
Kingdom): University of Edinburgh, Centre for Tropical Veterinary
Medicine. p. 216–241.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A,
Garimella K, Altshuler D, Gabriel S, Daly M, et al. 2010. The
Genome Analysis Toolkit: a MapReduce framework for analyzing
next-generation DNA sequencing data. Genome Res. 20:1297–1303.
Meplan C, Hughes DJ, Pardini B, Naccarati A, Soucek P, Vodickova L,
Hlavata I, Vrana D, Vodicka P, Hesketh JE. 2010. Genetic variants in
selenoprotein genes increase risk of colorectal cancer. Carcinogenesis
31:1074–1079.
Mertz W. 1981. The essential trace elements. Science 213:1332–1338.
Meyer M, Kircher M. 2010. Illumina sequencing library preparation for
highly multiplexed target capture and sequencing. Cold Spring Harb
Protoc. 2010:pdb prot5448.
Moreno-Reyes R, Egrise D, Neve J, Pasteels JL, Schoutens A. 2001.
Selenium deficiency-induced growth retardation is associated with
an impaired bone metabolism and osteopenia. J Bone Miner Res. 16:
1556–1563.
Novakovic R, Cavelaars A, Geelen A, Nikolic M, Altaba II, Vinas BR, Ngo J,
Golsorkhi M, Medina MW, Brzozowska A, et al. 2014. Socioeconomic determinants of micronutrient intake and status in
Europe: a systematic review. Public Health Nutr. 17:1031–1045.
Novakovic R, Cavelaars AE, Bekkering GE, Roman-Vinas B, Ngo J,
Gurinovic M, Glibetic M, Nikolic M, Golesorkhi M, Medina MW,
et al. 2013. Micronutrient intake and status in Central and Eastern
Europe compared with other European countries, results from the
EURRECA network. Public Health Nutr. 16:824–840.
Oldfield JE. 2002. Selenium world atlas. Grimbergen (Belgium): STDA.
Paten B, Herrero J, Beal K, Fitzgerald S, Birney E. 2008. Enredo and Pecan:
genome-wide mammalian consistency-based multiple alignment
with paralogs. Genome Res. 18:1814–1828.
Pemberton TJ, Absher D, Feldman MW, Myers RM, Rosenberg NA, Li JZ.
2012. Genomic patterns of homozygosity in worldwide human populations. Am J Hum Genet. 91:275–292.
Perry GH, Dominy NJ, Claw KG, Lee AS, Fiegler H, Redon R, Werner J,
Villanea FA, Mountain JL, Misra R, et al. 2007. Diet and the evolution
of human amylase gene copy number variation. Nat Genet. 39:
1256–1260.
Pruitt KD, Tatusova T, Klimke W, Maglott DR. 2009. NCBI Reference
Sequences: current status, policy and new initiatives. Nucleic Acids
Res. 37:D32–D36.
Rayman MP. 2012. Selenium and human health. Lancet 379:
1256–1268.
Romagne F, Santesmasses D, White L, Sarangi GK, Mariotti M, Hubler R,
Weihmann A, Parra G, Gladyshev VN, Guigo R, et al. 2014. SelenoDB
2.0: annotation of selenoprotein genes in animals and their genetic
diversity in humans. Nucleic Acids Res. 42:D437–D443.
Rosenberg NA. 2006. Standardized subsets of the HGDP-CEPH Human
Genome Diversity Cell Line Panel, accounting for atypical and duplicated samples and pairs of close relatives. Ann Hum Genet. 70:
841–847.
Safaralizadeh R, Kardar GA, Pourpak Z, Moin M, Zare A, Teimourian S.
2005. Serum concentration of selenium in healthy individuals living
in Tehran. Nutr J. 4:32.
Schomburg L, Schweizer U. 2009. Hierarchical regulation of selenoprotein expression and sex-specific effects of selenium. Biochim Biophys
Acta. 1790:1453–1462.
Selinus O, Alloway BJ. 2005. Essentials of medical geology: impacts of
the natural environment on public health. Amsterdam (The
Netherlands): Elsevier Academic Press.
Snider GW, Ruggles E, Khan N, Hondal RJ. 2013. Selenocysteine confers
resistance to inactivation by oxidation in thioredoxin reductase:
comparison of selenium and sulfur enzymes. Biochemistry 52:
5472–5481.
1517
White et al. . doi:10.1093/molbev/msv043
Steinbrenner H, Sies H. 2009. Protection against reactive oxygen species
by selenoproteins. Biochim Biophys Acta. 1790:1478–1485.
Steinmann D, Nauser T, Koppenol WH. 2010. Selenium and sulfur
in exchange reactions: a comparative study. J Org Chem. 75:
6696–6699.
Stringer CB, Andrews P. 1988. Genetic and fossil evidence for the origin
of modern humans. Science 239:1263–1268.
Sutherland A, Kim DH, Relton C, Ahn YO, Hesketh J. 2010.
Polymorphisms in the selenoprotein S and 15-kDa selenoprotein
genes are associated with altered susceptibility to colorectal
cancer. Genes Nutr. 5:215–223.
Tajima F. 1989. Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123:585–595.
Thomson CD. 2004. Selenium and iodine intakes and status in New
Zealand and Australia. Br J Nutr. 91:661–672.
Thornton K. 2003. Libsequence: a C++ class library for evolutionary
genetic analysis. Bioinformatics 19:2325–2327.
1518
MBE
Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS,
Powell K, Mortensen HM, Hirbo JB, Osman M, et al. 2007.
Convergent adaptation of human lactase persistence in Africa and
Europe. Nat Genet. 39:31–40.
Weir BS, Cockerham CC. 1984. Estimating F-statistics for the analysis of
population-structure. Evolution 38:1358–1370.
Wright S. 1951. The genetical structure of populations. Ann Eugen. 15:
323–354.
Xia Y, Hill KE, Byrne DW, Xu J, Burk RF. 2005. Effectiveness of selenium
supplements in a low-selenium area of China. Am J Clin Nutr. 81:
829–834.
Xiong YM, Mo XY, Zou XZ, Song RX, Sun WY, Lu W, Chen Q, Yu YX,
Zang WJ. 2010. Association study between polymorphisms in selenoprotein genes and susceptibility to Kashin-Beck disease.
Osteoarthr Cartil. 18:817–824.
Yao Y, Pei F, Kang P. 2011. Selenium, iodine, and the relation with
Kashin-Beck disease. Nutrition 27:1095–1100.