
Network-Level and Population Genetics Analysis of the
Insulin/TOR Signal Transduction Pathway Across
Human Populations
Pierre Luisi,1 David Alvarez-Ponce,2,3 Giovanni Marco Dall’Olio,1 Martin Sikora,4 Jaume Bertranpetit,*,1
and Hafid Laayouni1
1 Institute of Evolutionary Biology (UPF-CSIC) CEXS-UPF-PRBB, Barcelona, Catalonia, Spain
2 Departament de Genètica, Universitat de Barcelona, Barcelona, Spain
3 Department of Biology, National University of Ireland Maynooth, Maynooth, County Kildare, Ireland
4 Department of Genetics, Stanford University School of Medicine
*Corresponding author: E-mail: [email protected].
Associate editor: Douglas Crawford
Abstract
Genes and proteins rarely act in isolation but rather operate as components of complex networks of interacting
molecules. Therefore, for understanding their evolution, it may be helpful to take into account the interaction networks in
which they participate. It has been shown that selective constraints acting on genes depend on the position that they
occupy in the network. Less understood is how the impact of local adaptation at the intraspecific level is affected by the
network structure. Here, we analyzed the patterns of molecular evolution of 67 genes involved in the insulin/target of
rapamycin (TOR) signal transduction pathway. This well-characterized pathway plays a key role in fundamental processes
such as energetic metabolism, growth, reproduction, and aging and is involved in metabolic disorders such as obesity,
insulin resistance, and diabetes. For that purpose, we combined genotype data from worldwide human populations with
current knowledge of the structure and function of the pathway. We identified the footprint of recent positive selection in
nine of the studied genomic regions. Most of the adaptation signals were observed among Middle East and North African,
European, and Central South Asian populations. We found that positive selection preferentially targets the most central
elements in the pathway, in contrast to previous observations in the whole human interactome. This observation indicates
that the impact of positive selection on genes involved in the insulin/TOR pathway is affected by the pathway structure.
Key words: human populations, positive selection, insulin/TOR signal transduction pathway, network evolution.
Introduction
During their diaspora, humans have been exposed to a wide
range of different environments, to which they have had to
adapt. To study the process of adaptation, we can interrogate genomic data to determine which genes evolved under positive selection. Since positive selection leads to the
adaptation of a specific phenotype to a given environment,
such knowledge may be helpful to understand the complex
phenotype–genotype relationship, which remains largely
unknown. Whereas comparison of the genomes of different
species can provide insight into the evolutionary forces
that acted at a large timescale, using data from individuals
of the same species enables the detection of the genomic
footprints of more recent adaptive events.
Genes in a genome are affected by disparate evolutionary forces, and one of the goals in Evolutionary Biology is to
determine the factors underlying this variability. Genes and
proteins rarely act in isolation but rather operate as components of complex networks of interacting molecules.
Therefore, for understanding their evolution, it may be
helpful to take into account the interaction networks in
which they participate (Loewe 2009; Yamada and Bork
2009; Barton et al. 2010; Wolf et al. 2010). The combination
of comparative genomics and network data has shown that
the patterns of molecular evolution of genes are related to
the structure of the networks in which they participate,
with a number of measures of network position correlating
with levels of selective constraint. First, genes occupying
more central positions in a protein–protein interaction
network are more selectively constrained than those acting
at the periphery (Fraser et al. 2002; Hahn and Kern 2005;
Lemos et al. 2005). Second, genes encoding interacting proteins are similarly affected by purifying selection (Fraser
et al. 2002; Lemos et al. 2005), which can be attributed
to molecular coevolution or to interacting genes being affected by similar selective pressures (Codoñer and Fares
2008; Fares et al. 2011). Third, in a number of pathways, including the insulin/TOR pathway (Alvarez-Ponce
et al. 2009; Alvarez-Ponce, Aguade et al. 2011; Alvarez-Ponce, Guirao-Rico et al. 2011), a polarity in the distribution of levels of selective constraint has been
observed along the upstream/downstream pathway axis
(Rausher et al. 1999; Riley et al. 2003; Sharkey et al.
2005; Livingstone and Anderson 2009; Ramsay et al.
2009; Montanucci et al. 2011). Less understood is how the
patterns of positive selection are affected by the position of
genes in the network, although some tendencies have been
observed: positive selection seems to preferentially target
genes encoding the most peripheral proteins in the human
protein–protein interaction network (Kim et al. 2007), and
in Drosophila, positive selection tends to target genes acting at branch points of the pathways involved in glucose
metabolism (Flowers et al. 2007). More work is warranted
to understand the relationship between the structure of
functional networks and the action of positive selection.
© The Author 2011. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please
e-mail: [email protected]
Mol. Biol. Evol. 29(5):1379–1392. 2012 doi:10.1093/molbev/msr298
Advance Access publication December 1, 2011
Research article
Molecular pathways can be classified as metabolic, signal
transduction, and transcriptional regulatory pathways. Signaling pathways allow the integration of cell function with
extracellular stimuli. The insulin/TOR signal transduction
pathway plays a key role in controlling energetic metabolism, growth, reproduction, and aging in response to nutritional conditions (Oldham and Hafen 2003; LeRoith et al.
2004). The structure and function of this pathway have
been well characterized in a number of organisms, revealing
a high degree of conservation across metazoans (Oldham
and Hafen 2003; Kirkness et al. 2010). Insulin is secreted in
response to high levels of blood sugar. Upon binding to
insulin, the insulin receptor (InR) undergoes autophosphorylation, thus becoming able to recruit a complex of
proteins to the inner cell membrane, including the phosphatidylinositol 3-kinase (PI3K). PI3K catalyzes the biosynthesis of phosphatidylinositol 3,4,5-trisphosphate (PIP3),
a membrane lipid that serves as anchor for pleckstrin homology domain-containing proteins, including PDK1, PKB,
and PKC. Once in the membrane, these proteins become
able to trigger the activation of downstream proteins, including transcription factors and ribosomal proteins, which
in turn mediate the physiological effects of insulin.
The prevalence of obesity and closely related diseases
such as insulin resistance and type 2 diabetes is dramatically different among human populations (King et al.
1998; Reiner et al. 2007; van Vliet-Ostaptchouk et al.
2009). The differences observed among individuals of different ethnic origin in the same country (Diamond 2003;
Wang and Beydoun 2007; Anderson and Whitaker 2009;
Diamond 2011) underline the fact that these metabolic disorders are caused not only by environmental exposures but
also by genetic factors. The insulin/TOR pathway plays
a key role in these diseases; indeed, dysfunction of its components often leads to insulin resistance, diabetes, obesity,
or cancer (Oldham and Hafen 2003; LeRoith et al. 2004;
Taguchi and White 2008). Thus, applying population genetics analysis to describe how natural selection acted in
different populations on the genes involved in this pathway
may provide key insight into the etiology of these diseases.
Here, we used human population genetics data to study
the impact of recent positive selection (within the last
~5,000–45,000 years) on the molecular evolution of 67
genes involved in the human insulin/TOR signal transduction pathway. We identified signals of potential recent positive selection in nine of the studied genes. Most of these
signals were observed in Middle East and North African,
European, and Central South Asian populations. We found
that the impact of positive selection on the insulin/TOR
pathway genes depends on the position that their encoded
products occupy in the interaction network. Contrary to
what was previously observed in the human protein–protein interaction network (Kim et al. 2007), positive selection preferentially targets genes encoding the most central proteins in
the pathway. This result indicates that the structure of the
pathway influences the patterns of molecular evolution of
its components.
Materials and Methods
Genomic Regions Analyzed
We obtained a list of 69 human genes from a previous study
of the vertebrate insulin/TOR pathway (Alvarez-Ponce,
Aguade et al. 2011). Most of these genes are known to
be involved in the pathway according to the literature or
are close paralogs that cluster within these genes in phylogenetic trees. We excluded from the analysis genes that, despite having paralogs that are involved in the pathway, are
known not to play a role in insulin/TOR signaling: INSRR,
encoding the insulin-related receptor, which is not activated
by insulin or IGFs (Zhang and Roth 1992); and PIK3CG, encoding the protein p110γ, which is activated by G protein–coupled
receptors and not by tyrosine kinases (Stephens
et al. 1997; Leopoldt et al. 1998). Additionally, we included
in the present analysis the genes encoding insulin (INS) and
the insulin-like growth factors I (IGF1) and II (IGF2). Some of
the methods used for detecting signals of positive selection
(see below) have been devised for autosomal regions or
provide results that cannot be compared between the
autosomal regions and the ones linked to the X chromosome. Therefore, genes IRS4, FOXO4, and MYCL2, which
are linked to chromosome X, were excluded from the
analysis. The final list therefore consisted of 67 genes, which
belong to 24 groups of paralogs (supplementary table S1,
Supplementary Material online).
For each gene, we analyzed the genomic region including
the gene plus 200 flanking kilobases (100 Kb upstream and
100 Kb downstream of the transcript). Gene coordinates
were obtained from release 36 of the human genome at
NCBI. For genes with multiple splicing variants, we considered the transcript spanning the longest chromosome region. Two pairs of overlapping regions (INS and IGF2;
PDPK1 and AC141586.2) were merged, and therefore,
the analyses were conducted on a total of 65 genomic regions (supplementary table S1, Supplementary Material online), of which 58 were used in network-level analyses (see
below).
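The construction of genomic regions described above (each gene plus 100 Kb on either side, with overlapping regions merged) can be illustrated with a minimal Python sketch. The function name `build_regions`, the input layout, and the coordinates are hypothetical and are not taken from the study; the sketch only shows the flank-and-merge logic.

```python
def build_regions(genes, flank=100_000):
    """Add `flank` bp on each side of every gene and merge overlapping regions.

    genes: list of (name, chromosome, transcript_start, transcript_end).
    Returns a list of (chromosome, start, end, [gene names]).
    """
    flanked = sorted(
        (chrom, max(0, start - flank), end + flank, name)
        for name, chrom, start, end in genes
    )
    merged = []
    for chrom, start, end, name in flanked:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Overlaps the previous region on the same chromosome: extend it.
            merged[-1] = (chrom, last[1], max(last[2], end), last[3] + [name])
        else:
            merged.append((chrom, start, end, [name]))
    return merged


# Hypothetical coordinates, for illustration only.
genes = [
    ("GENE_A", "chr1", 1_000_000, 1_010_000),
    ("GENE_B", "chr1", 1_150_000, 1_160_000),
    ("GENE_C", "chr2", 500_000, 600_000),
]
regions = build_regions(genes)
```

In this toy input, the flanked regions of GENE_A and GENE_B overlap and are merged into one region, mirroring the merging of the INS/IGF2 and PDPK1/AC141586.2 pairs.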
Population Genomic Data
For each of the 65 studied genomic regions, we obtained
single nucleotide polymorphism (SNP) data from the Human Genome Diversity Panel–Centre d’Études du Polymorphisme Humain (HGDP-CEPH) (Cann et al. 2002).
We restricted our analysis to a subset composed of 940
genotyped individuals from 39 populations as described
in Gonzalez-Neira et al. (2006). This subpanel does not contain any pair of individuals with a first or second degree
relationship (Rosenberg 2006). These populations were
grouped into seven geographic regions as defined by Li
et al. (2008): Sub-Saharan Africa, Middle East and North
Africa, Europe, Central South Asia, East Asia, America,
and Oceania. These seven groups differ in both the number
of populations (ranging from two to nine) and the number
of individuals (from 27 to 219; supplementary table S2, Supplementary Material online). For each individual, data for
~650,000 SNPs distributed along the whole genome is
available (Jakobsson et al. 2008; Li et al. 2008).
We removed all SNPs with a minor allele frequency
(MAF) lower than 5% either considering individuals within
each geographic area or considering all individuals from the
seven geographic regions together, depending on whether
the methods were applied within a specific geographic area
or to compare different regions, respectively. After filtering
at a global level, our data set consists of 586,210 SNPs (the
number of SNPs used for separate analysis of each geographic area is provided in supplementary table S3, Supplementary Material online).
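The MAF filter can be sketched as follows. This is an illustrative Python example under stated assumptions: genotypes are coded as the number of copies (0, 1, or 2) of one allele per individual, `None` marks a missing call, and the names `minor_allele_frequency` and `filter_by_maf` are hypothetical. The same function would be applied either to the individuals of one geographic area or to all individuals together.

```python
def minor_allele_frequency(genotypes):
    """MAF from per-individual allele counts (0, 1, 2); None marks missing data."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def filter_by_maf(snps, threshold=0.05):
    """Keep SNPs whose MAF is at least `threshold` (i.e., drop MAF < 5%)."""
    return {
        snp: genotypes
        for snp, genotypes in snps.items()
        if minor_allele_frequency(genotypes) >= threshold
    }


# Hypothetical SNP identifiers and genotypes.
snps = {
    "snp_rare": [0] * 19 + [1],        # MAF = 1/40 = 0.025, removed
    "snp_common": [0, 1, 2, 1, None],  # MAF = 4/8 = 0.5, kept
    "snp_fixed": [2, 2, 2, 2],         # MAF = 0, removed
}
kept = filter_by_maf(snps)
```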
Detection of Positive Selection
In order to detect signals of positive selection in the studied
genomic regions, three different methods were used. The
integrated haplotype score (iHS) is based on the extended
haplotype homozygosity measure, which aims at detecting
the footprint of positive selection from the local haplotype
structure. The two other methods (FST and ΔDAF indexes)
are based on genetic differentiation between geographic
areas.
A selective sweep causes the selected allele to spread
rapidly through the population, leading to an increased
linkage disequilibrium compared with the neutral expectation. In each geographic region, we computed a raw iHS for
each SNP following the method proposed by Voight et al.
(2006). Standardized iHS were obtained by grouping SNPs
into bins of frequency 0.05, subtracting the mean, and dividing by the standard deviation for all SNPs in the same
bin. Extreme positive or negative values indicate high extended haplotype homozygosity of haplotypes carrying
the ancestral or derived allele, respectively. Hence, we consider both extreme positive and negative iHS as potential signatures of positive selection, using |iHS| as in Voight et al.
(2006). Genetic distances were obtained from the genetic
map provided by the HapMap Consortium release 22
(International HapMap Consortium 2007). Genotype
phases were inferred using the fastPhase program (Scheet
and Stephens 2006).
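The standardization of raw iHS within frequency bins can be sketched as below. This is a simplified illustration, not the code used in the study: it assumes that SNPs are binned by derived allele frequency, that the population standard deviation is used, and that a bin without spread yields an undefined score; the function name `standardize_ihs` is hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_ihs(raw_scores, derived_freqs, bin_width=0.05):
    """Standardize raw iHS within bins of derived allele frequency."""
    n_bins = round(1 / bin_width)
    bins = defaultdict(list)
    for idx, freq in enumerate(derived_freqs):
        # Clamp so that a frequency of exactly 1.0 falls in the last bin.
        bins[min(int(freq / bin_width), n_bins - 1)].append(idx)

    standardized = [None] * len(raw_scores)
    for members in bins.values():
        values = [raw_scores[i] for i in members]
        mu, sd = mean(values), pstdev(values)
        for i in members:
            # Leave the score undefined when the bin has no spread.
            standardized[i] = (raw_scores[i] - mu) / sd if sd > 0 else None
    return standardized


# Hypothetical raw scores and derived allele frequencies.
scores = standardize_ihs([1.0, 3.0, 10.0, 14.0], [0.11, 0.12, 0.52, 0.53])
abs_scores = [abs(s) for s in scores]  # |iHS| is what is compared across SNPs
```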
The FST index (Weir and Hill 2002) summarizes genetic
differentiation among populations by estimating the proportion of interindividual variance that can be explained by
the intergroup variance. For each SNP, we computed the
FST index between each geographic region and the remaining ones using the PopGen module from BioPerl (Stajich
et al. 2002). ΔDAF (Grossman et al. 2010) describes the
difference in derived allele frequencies (DAF) between
two sets of individuals. This score was computed as
$\Delta\mathrm{DAF} = D_S - D_{NS}$, where $D_S$ and $D_{NS}$ are the DAF in the studied
geographic region and in the remaining ones, respectively.
ΔDAF scores range from −1 to 1, with negative and
positive values indicating whether the derived allele is
less or more frequent in the given geographic area, respectively. Large FST and |ΔDAF| values can be the result of
the action of positive selection on the studied SNP or in
a linked one in a specific geographic area (Lewontin and
Krakauer 1973; Grossman et al. 2010). Computation
of the iHS and ΔDAF scores requires ancestral allele information. Ancestral states inferred from comparison with
orthologous sequences in the chimpanzee and rhesus
macaque genomes were obtained from the UCSC Genome
Bioinformatics Site (http://genome.ucsc.edu/; table
“snp128OrthoPanTro2RheMac2”; Karolchik et al. 2008).
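As a small worked example of the derived allele frequency difference defined above (the DAF in the studied geographic region minus the DAF in the remaining ones), the following Python sketch computes the score from allele counts. The function name `delta_daf` and the counts are hypothetical.

```python
def delta_daf(derived_in, total_in, derived_out, total_out):
    """Difference in derived allele frequency: studied region minus the rest.

    derived_*: number of derived alleles observed; total_*: number of
    chromosomes genotyped. The result always lies between -1 and 1.
    """
    return derived_in / total_in - derived_out / total_out


# Hypothetical counts: 30 of 40 chromosomes carry the derived allele in the
# studied region, versus 10 of 100 in the remaining regions.
score = delta_daf(30, 40, 10, 100)
```

Here the derived allele is far more frequent inside the studied region, giving a score of about 0.65; swapping the two groups would give the same magnitude with a negative sign.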
Extreme values for the used statistics can also be the result
of nonselective events such as demographic changes and
genetic drift. As these events act on the genome randomly,
in contrast to positive selection, which targets specific genes,
we adopted an outlier approach to infer the action of positive
selection (Kelley et al. 2006). Once scores from the three
methods were calculated for all SNPs in the genome, we computed empirical P values for the SNPs in the studied genomic
regions as in Kelley et al. (2006). Under this approach, the P
value corresponds to the probability that a value randomly
drawn from the whole genome is more extreme than the
value observed at the SNP of interest. If the locus of interest
shows an outlier value (i.e., fewer than 5% of the SNPs from the
background present a greater value than the SNP of interest),
positive selection is invoked. Because differentiation indexes
(Beaumont and Nichols 1996; Barreiro et al. 2008; Gardner
et al. 2008; Jost 2008) strongly correlate with heterozygosity,
we compared these indexes with the ones observed at loci
with similar MAF values. For that purpose, we divided the
background SNP set into bins of 10,000 SNPs of similar
MAF values, and P values for each SNP were computed as
the proportion of SNPs in the same bin with a greater score.
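The outlier approach with MAF-matched bins can be sketched as follows. This is a minimal illustration under assumptions: background SNPs are sorted by MAF and cut into consecutive bins of fixed size, each SNP of interest is assigned to the first bin whose upper MAF bound is not below its own MAF, and the names `empirical_p_values`, `background`, and `targets` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def empirical_p_values(background, targets, bin_size=10_000):
    """Empirical P value of each target SNP against MAF-matched background SNPs.

    background, targets: lists of (maf, score) tuples.
    The P value is the proportion of background SNPs in the matching MAF bin
    with a strictly greater score.
    """
    ordered = sorted(background)  # sort by MAF (then by score)
    bins = [ordered[i:i + bin_size] for i in range(0, len(ordered), bin_size)]
    upper_mafs = [b[-1][0] for b in bins]
    sorted_scores = [sorted(score for _, score in b) for b in bins]

    p_values = []
    for maf, score in targets:
        k = min(bisect_left(upper_mafs, maf), len(bins) - 1)
        scores = sorted_scores[k]
        greater = len(scores) - bisect_right(scores, score)
        p_values.append(greater / len(scores))
    return p_values


# Toy background: two MAF classes with ten SNPs each (bin_size=10 here).
background = [(0.10, s) for s in range(1, 11)] + [(0.40, s) for s in range(11, 21)]
p = empirical_p_values(background, [(0.10, 9), (0.40, 11), (0.45, 20)], bin_size=10)
```

An empirical P value of exactly 0 can occur for the most extreme SNP in a bin; any downstream step that takes logarithms needs to handle that case explicitly.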
Having calculated P values separately for each SNP of
interest, we integrated results for all SNPs within each of
the studied genomic regions. For that purpose, we
performed Fisher’s combination test (Zaykin et al.
2008) for each genomic region, geographic area, and
method. The statistic is computed as $Z_F = -2 \sum_{i=1}^{K} \log P_i$,
where K is the number of SNPs in the studied genomic
region and $P_i$ is the empirical P value for SNP i. This
statistic follows a $\chi^2$ distribution with 2K degrees of
freedom. This method shows a good performance for
combining information from SNPs even if they are not
completely independent (Peng et al. 2009). Moreover,
the HGDP-CEPH data set was generated using Illumina
arrays that mostly contain tag SNPs, which capture most
of the information from all SNPs in the region by linkage
disequilibrium, and hence present a low level of linkage
disequilibrium among them. After obtaining the P values
from this Fisher’s combination test, we evaluated their
significance taking into account the whole-genome
context. For that purpose, we used a genomic region-level
background containing the 65 genomic regions encoding
the insulin/TOR pathway plus a set of 6,329 nonoverlapping regions centered on autosomal genes distributed
across the genome. The complete background gene set obtained thus includes 6,394 genomic regions and 386,401
SNPs. To avoid overlap among background genomic regions, they were selected in each autosomal chromosome
as follows: beginning from the gene at the lowest physical
position, genes located at a minimum distance of 200 Kb
from the previous one were included until reaching the end
of the chromosome. From this set of genes, the genomic
regions including the genes plus 200 flanking kilobases were
considered for the analysis, as genomic regions were constructed for insulin/TOR pathway genes. For each of these
background genomic regions, we also computed a P value
using the Fisher’s combination test. P values were corrected
for multiple testing using the false discovery rate approach
(Benjamini and Hochberg 1995). Genomic regions were
considered to have evolved under positive selection if they
had a corrected P value lower than a given threshold in any
of the studied regions for iHS and at least one of the genetic differentiation-based methods used (i.e., either FST or
ΔDAF). Since we are considering three methods and seven
geographic regions, we used a highly conservative threshold
of $2.381 \times 10^{-3}$ (i.e., $0.050/[3 \times 7]$).
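The region-level steps described above (Fisher's combination of SNP-level P values, false discovery rate correction, and the final threshold) can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the function names are hypothetical, natural logarithms are assumed, and the chi-square tail probability uses the closed form available for even degrees of freedom.

```python
import math


def fisher_combination(p_values):
    """Return (Z_F, P) where Z_F = -2 * sum(log P_i), referred to chi-square(2K).

    For an even number of degrees of freedom 2K, the upper tail is
    exp(-x/2) * sum_{j<K} (x/2)^j / j!. All P values must be > 0.
    """
    k = len(p_values)
    z = -2.0 * sum(math.log(p) for p in p_values)
    half = z / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return z, math.exp(-half) * total


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Threshold for three methods and seven geographic regions.
threshold = 0.05 / (3 * 7)

# Hypothetical SNP-level P values for one region.
z_f, region_p = fisher_combination([0.05, 0.05])
```

With a single SNP the combined P value equals the input P value, which is a convenient sanity check on the tail formula.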
Network-Level Analysis
We evaluated whether the structure of the insulin/TOR
pathway has an effect on the molecular evolution of its
components. For that purpose, we considered whether
1) there is a polarity in the impact of positive selection
along the upstream/downstream axis of the pathway; 2)
the incidence of positive selection is different for central
and peripheral genes in the pathway; and 3) genes encoding physically interacting proteins evolve under similar
selective pressures.
For these analyses, we used the graph constructed by
Alvarez-Ponce, Aguade et al. (2011), which we expanded
by adding an additional node representing insulin and
the insulin-like growth factors I and II (IGF1 and IGF2),
and an arc representing the activation of the insulin
and IGF1 receptors (InR and IGF1R) by these hormones.
As in Alvarez-Ponce, Aguade et al. (2011), we eliminated
from network-level analyses: 1) genes EIF4E2 and EIF4E3,
as their encoded products likely act as negative regulators
of insulin signaling (Joshi et al. 2004; Hernandez et al. 2005);
2) PTEN, as its encoded product does not interact with any
other protein in our data set (for a review, see Vinciguerra
and Foti 2006); and 3) genes CYTH1–4, whose encoded
products occupy an unclear position in the pathway (Fuss
et al. 2006; Hafner et al. 2006). Hence, network-level analyses were conducted on a total of 60 genes located at 58
genomic regions; and the final graph consists of 22 nodes
(representing groups of paralogous genes) connected by 40
arcs (interactions), of which 33 are physical (fig. 1).
We first tested for a polarity in the action of positive
selection along the pathway by comparing pathway
positions of genes evolving under positive selection with
positions of genes without signals of adaptive evolution.
For each gene, pathway position, defined as the number
of steps required for the transduction of the signal from
InR and IGF1R (position 0) to each of the components
of the pathway (positions 1–10), was computed as in
Alvarez-Ponce, Aguade et al. (2011). Genes INS, IGF1,
and IGF2 were assigned the position −1, as their encoded
products act immediately upstream of these receptors (see
fig. 1).
Second, we evaluated whether genes evolving under
positive selection differ from those without signals of adaptive evolution in terms of a number of metrics of centrality
within the network: connectivity, betweenness and closeness (table 1). For each node, connectivity was computed
as the number of nodes to which it is connected, betweenness was calculated as the number of shortest paths passing
through it, and closeness as the inverse of the average
length of the shortest paths to all the other nodes in
the network. These metrics were computed using the Network Analysis plug-in for Cytoscape (Smoot et al. 2011),
taking into account physical interactions among the studied proteins only. Additionally, these centrality measures
were also computed taking into account the entire human
interactome constructed by Bossi and Lehner (2009) and
removing self-interactions (data not shown).
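The three centrality metrics defined above can be illustrated with a pure-Python sketch on an undirected graph. This is not the Cytoscape plug-in used in the study; the function name `centralities` is hypothetical, betweenness is computed with Brandes' algorithm, and dividing it by (n − 1)(n − 2) to obtain values between 0 and 1 is an assumption about the normalization.

```python
from collections import deque


def centralities(edges):
    """Connectivity (degree), closeness, and normalized betweenness per node."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    nodes = list(adj)
    n = len(nodes)
    degree = {v: len(adj[v]) for v in nodes}
    closeness = {}
    betweenness = dict.fromkeys(nodes, 0.0)

    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Closeness: inverse of the average shortest-path length from s.
        reached = len(dist) - 1
        closeness[s] = reached / sum(dist.values()) if reached else 0.0
        # Brandes' back-propagation of path dependencies.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                betweenness[w] += delta[w]

    norm = (n - 1) * (n - 2)  # ordered pairs of other nodes
    if norm > 0:
        betweenness = {v: b / norm for v, b in betweenness.items()}
    return degree, closeness, betweenness


# Toy linear pathway A - B - C - D.
degree, closeness, betweenness = centralities([("A", "B"), ("B", "C"), ("C", "D")])
```

In the toy pathway, the two inner nodes have higher values for all three metrics than the two terminal nodes, which is the sense in which a node is called central.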
Finally, we investigated whether genes evolving under
positive selection encode proteins that tend to interact
with each other. To that end, we performed a Monte Carlo
test using as statistic the number of interactions involving
two paralogous groups (nodes) with at least one gene
evolving under positive selection (X). The statistical significance of X was evaluated by comparing the observed values with those calculated on a set of 10,000 randomizations
of the network, each with the same nodes and the same
number of interactions as the original one. We used
two different randomization techniques. In method 1,
each interaction was assigned by randomly choosing
two different nodes. In method 2, we used
a switching algorithm, which preserves the connectivity
of each particular node: each random network was generated from the original one by repeatedly
choosing two interactions at random (e.g., A–B and C–D)
and exchanging the edges (yielding A–D and C–B or A–C
and B–D). In each random network, this process was iterated 100 × m times, where m is the number of edges.
P values were computed as the proportion of randomized
networks with an X value equal to or higher than the
observed one. We also performed these analyses using
as statistic the number of interactions involving the
encoded products of two particular genes, rather than paralogous groups, evolving under positive selection (X′).
For these analyses, we assumed that, if two groups of paralogs interact, all proteins encoded by genes in one group
interact with all proteins encoded by genes in the other
group.
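The switching randomization (method 2) and the resulting Monte Carlo P value can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: swaps that would create a self-interaction or a duplicated interaction are rejected, the seed and function names are hypothetical, and the statistic counts interactions whose two nodes are both flagged as selected.

```python
import random


def count_selected_edges(edges, selected):
    """Statistic X: interactions joining two nodes that both carry a signal."""
    return sum(1 for a, b in edges if a in selected and b in selected)


def switch_randomize(edges, rng, swaps_per_edge=100):
    """Degree-preserving randomization by repeated edge switching."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(swaps_per_edge * len(edges)):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick between the two possible rewirings
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        # Assumption: reject self-interactions and duplicated interactions.
        if len(new1) < 2 or len(new2) < 2 or new1 == new2:
            continue
        if new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def monte_carlo_p(edges, selected, n_random=10_000, seed=0):
    """Proportion of randomized networks with X >= the observed X."""
    rng = random.Random(seed)
    observed = count_selected_edges(edges, selected)
    hits = sum(
        count_selected_edges(switch_randomize(edges, rng), selected) >= observed
        for _ in range(n_random)
    )
    return observed, hits / n_random


# Toy ring network with three flagged nodes; few randomizations for speed.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F"), ("A", "F")]
observed, p_value = monte_carlo_p(toy_edges, {"A", "B", "C"}, n_random=50)
```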
FIG. 1. Structure of the insulin/TOR signal transduction pathway. Each node represents a paralogous group. Blue nodes contain genes that
activate the pathway, whereas red nodes contain pathway inhibitors. Within each paralogous group, genes are represented as circles with the
number corresponding to genes as listed below (for more details, see supplementary table S1, Supplementary Material online). White circles
represent genes included in our network-level analysis, whereas circles in dark gray represent genes that were excluded. The nine genes for
which we observed the signature of positive selection are represented by diamonds. Black, blue, and red edges represent protein–protein,
metabolic, and transcriptional interactions, respectively. Gray gradations of the background represent the different cellular compartments
where the proteins act in the activated pathway (from lighter to darker gray: extracellular action, plasma membrane, cytosol, and nucleus).
Numbers on the lower part refer to pathway position (i.e., the number of steps required for signal transduction from the insulin receptor).
Gene code: 1, IGF2; 1′, INS; 2, IGF1; 3, IGF1R; 4, INSR; 5, DOK1; 6, IRS1; 7, DOK3; 8, FRS3; 9, DOK2; 10, FRS2; 11, IRS2; 12, DOK4; 13, DOK6; 14,
DOK5; 15, PIK3R3; 16, PIK3R1; 17, PIK3R2; 18, PIK3CD; 19, PIK3CB; 20, PIK3CA; 21, VEPH1; 22, PDPK1; 22′, AC141586.2; 23, AKT3; 24, AKT1; 25,
AKT2; 26, PRKCZ; 27, PRKCE; 28, PRKCD; 29, PRKCI; 30, PRKCQ; 31, PRKCH; 32, PRKCB; 33, PRKCA; 34, PRKCG; 35, TSC1; 36, AC093151.2; 37,
FOXO3; 38, FOXO1; 39, GSK3B; 40, GSK3A; 41, TSC2; 42, EIF2B5; 43, GYS2; 44, GYS1; 45, MYCL1; 46, MYCN; 47, MYC; 48, RHEB; 49, RHEBL1; 50,
MTOR; 51, EIF4EBP3; 52, EIF4EBP1; 53, EIF4EBP2; 54, RPS6KB2; 55, RPS6KB1; 56, EIF4E2; 57, EIF4E3; 58, EIF4E; 59, EIF4E1B; 60, RPS6; 61, CYTH3; 62,
CYTH1; 63, CYTH2; 64, CYTH4; 65, PTEN.
Action of Positive Selection on Paralogous Genes
To test whether genes belonging to the same paralogous group are similarly targeted by positive selection, we
used a Monte Carlo method using as statistic the number
of pairs of paralogs in which both exhibit signals of positive
selection (Y). We tested whether the observed Y value is
higher than expected at random from 10,000 random sets
of paralogous groups. Each randomization had the same 65
genes and the same number of pairs of paralogs (112 pairs).
Each pair of paralogs was assigned by randomly choosing
two different genes.
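The test on paralogous pairs can be sketched in the same spirit. In this hedged Python illustration, each random set has as many pairs as the observed one and each pair is drawn as two different genes; whether a random set may contain the same pair twice is not stated in the text, and this sketch allows it. The function name, gene labels, and seed are hypothetical.

```python
import random


def paralog_pair_test(genes, pairs, selected, n_random=10_000, seed=0):
    """Monte Carlo test on Y, the number of pairs in which both genes are selected."""
    rng = random.Random(seed)

    def y_statistic(candidate_pairs):
        return sum(1 for a, b in candidate_pairs if a in selected and b in selected)

    observed = y_statistic(pairs)
    hits = 0
    for _ in range(n_random):
        # Each random pair is made of two different genes drawn at random.
        random_pairs = [tuple(rng.sample(genes, 2)) for _ in pairs]
        hits += y_statistic(random_pairs) >= observed
    return observed, hits / n_random


# Hypothetical genes: g1-g3 form one paralogous group, g4-g5 another.
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
pairs = [("g1", "g2"), ("g1", "g3"), ("g2", "g3"), ("g4", "g5")]
observed_y, p_value = paralog_pair_test(genes, pairs, {"g1", "g2"}, n_random=200)
```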
Results
High Incidence of Positive Selection in the Insulin/
TOR Pathway
We used population genetics data from seven human
worldwide geographic areas to study the patterns of molecular evolution of the genomic regions containing 67 autosomal genes involved in the insulin/TOR signal
transduction pathway. In total, we analyzed 4,442 SNPs
spanning 19.3 Mb of the human genome (supplementary
table S1, Supplementary Material online). After removing
SNPs with MAF lower than 5% in the entire data set,
4,005 SNPs were used in the study, providing an average
density of approximately 1 SNP/5 Kb.
We combined three complementary methods (iHS, FST,
and ΔDAF) to infer the action of positive selection in these
genomic regions. Supplementary figs. S1–S3 (Supplementary
Material online) show, for each continental group, the iHS,
FST, and ΔDAF scores, respectively, for the SNPs located in
large genomic regions centered on the genes of interest.
These plots allow comparing observations for SNPs within
the gene and nearby with SNPs in the surrounding region;
thus, they provide a better visualization of potential signals
of positive selection. Using a stringent criterion (see Materials
and Methods), we identified footprints of potential positive
selection in nine genes belonging to five different groups of
paralogs (table 2, supplementary tables S4–S6, Supplementary
Material online). For six of these genes, the signature of
positive selection was observed in a single geographic area,
whereas for the three other genes positive selection has acted
in two distinct geographic regions, always from Middle East
and North Africa, Europe, or Central South Asia regions.
Table 1. Metrics of Network Position Taking Into Account Protein–Protein Interactions.

Paralogous Group   Pathway Position^a   Connectivity   Closeness   Betweenness
INS                −1                   1              0.304       0.000
InR                0                    4              0.429       0.121
IRS                1                    6              0.525       0.247
p85                2                    3              0.375       0.095
p110               3                    1              0.276       0.000
MELT               4                    2              0.296       0.005
PDK1               4                    3              0.429       0.038
PKB                5                    7              0.553       0.400
PKC                5                    7              0.525       0.265
TSC1               5                    2              0.300       0.010
FOXO               6                    2              0.382       0.081
GSK3               6                    5              0.438       0.201
TSC2               6                    3              0.396       0.108
EIF2BE             7                    1              0.309       0.000
GYS                7                    1              0.309       0.000
MYC                7                    2              0.368       0.000
RHEB               7                    2              0.356       0.014
TOR                8                    5              0.477       0.155
4EBP               9                    2              0.350       0.013
S6K                9                    4              0.396       0.105
EIF4E              10                   2              0.375       0.023
RPS6               10                   1              0.288       0.000

^a Position across the upstream/downstream axis of the pathway, defined as the number of steps required for signal transduction from the insulin receptor (InR).

First, we observed the footprint of positive selection for
three genes encoding docking proteins (DOK1, DOK2, and
DOK5), belonging to the IRS paralogous group. DOK1
fulfills our criterion in Middle East and North Africa
and in Europe and also presents a significantly high
genetic differentiation between East Asia and the rest
of the regions according to both the FST and the ΔDAF
scores. DOK2 exhibits the signature of positive selection in
Central South Asia, as well as significantly high FST and
ΔDAF scores between Sub-Saharan Africa and the other
regions. We also observed significant iHS at DOK2 in Middle East and North Africa and in Europe. DOK5 meets the
criterion in Middle East and North Africa and in Central
South Asia and shows a significant degree of genetic differentiation in Sub-Saharan Africa (for both the FST and
the ΔDAF scores), Europe (ΔDAF), and East Asia (FST).
From the PKB group, AKT1 and AKT3 satisfy our criterion
in Europe and in Middle East and North Africa, respectively. Both genes also exhibit a significantly high FST score
(AKT1 for East Asia and AKT3 for Sub-Saharan Africa).
Moreover, we detected signatures of positive selection
for two genes that belong to the PKC paralogous group:
PRKCI (in Oceania) and PRKCH (in Middle East and North
Africa and in Europe). For the latter gene, we also identified an important degree of genetic differentiation between Sub-Saharan Africa and the other regions and
a significantly high iHS in Central South Asia and in East
Asia. RPS6KB2, from the S6K group, exhibits signatures of
adaptive evolution in Central South Asia and a significantly high genetic differentiation of the Sub-Saharan Africa area, as well as significant iHS in Middle East
and North Africa and in East Asia. Finally, we observed the
signature of positive selection in East Asia for EIF4E.
It is interesting to note that for the seven genes for
which we observed signals of positive selection in Middle
East and North Africa, Europe, or Central South Asia
(DOK1, DOK2, DOK5, AKT1, AKT3, PRKCH, and RPS6KB2),
the selected haplotypes seem to have spread through all
three areas. Indeed, in most cases, these genes meet our
criterion in two geographic regions or at least exhibit
a statistically significant iHS in two or three areas. Moreover, distributions along the genomic regions of the iHS
at the SNP level are very similar in these geographic areas
(e.g., see DOK1 and DOK2; supplementary fig. S1, Supplementary Material online). We carried out a complementary
analysis using the software Sweep 1.1 (Sabeti et al. 2002),
which indicates that the haplotypes for which we detect a
selective sweep are shared between the three areas (see
supplementary results, Supplementary Material online),
which is consistent with a genome-wide scan performed
by Pickrell et al. (2009).
We considered whether there is an enrichment in genes
with signals of positive selection among the insulin/TOR signal transduction pathway genes as compared with the set of
6,394 genomic regions that compose our genomic background. To that end, we performed a hypergeometric test
in each geographic area separately (table 3). We observed
that there is an overrepresentation of genes with the signature of positive selection in the insulin/TOR pathway genes
in Middle East and North Africa (four genes under positive
selection; P = 0.0033), Europe (three genes; P = 0.0167), and
Central South Asia (three genes; P = 0.0184).
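The enrichment test described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation; the function name and the `strict` flag are assumptions introduced here. The counts are those listed in table 3. The tabulated P-values appear to correspond to the strict upper tail P(X > K) (for instance, Sub-Saharan Africa has K = 0 yet P = 0.168), which is an inference from the numbers rather than a statement in the text.

```python
from math import comb

def hypergeom_upper_tail(N, S, m, K, strict=True):
    """Upper tail of a hypergeometric distribution.

    N: background regions with a score
    S: background regions with the signature of positive selection
    m: pathway regions with a score
    K: pathway regions with the signature of positive selection
    Returns P(X > K) if strict, else P(X >= K).
    """
    start = K + 1 if strict else K
    upper = min(S, m)
    tail = sum(comb(S, k) * comb(N - S, m - k) for k in range(start, upper + 1))
    return tail / comb(N, m)

# (N, S, m, K) per geographic area, as listed in table 3.
table3 = {
    "SSAFR": (6389, 18, 65, 0),
    "MENA": (6394, 100, 65, 4),
    "EUR": (6384, 97, 65, 3),
    "CSASIA": (6393, 100, 65, 3),
    "EASIA": (6365, 151, 65, 1),
    "AME": (6274, 129, 65, 0),
    "OCE": (6308, 136, 65, 1),
}

for area, (N, S, m, K) in table3.items():
    print(area, round(hypergeom_upper_tail(N, S, m, K), 4))
```

Setting `strict=False` gives the inclusive tail P(X ≥ K), which is the more common convention for enrichment tests and yields larger P-values.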
Influence of Network Structure on Positive
Selection
Having identified the genes involved in the insulin/TOR signal transduction pathway that likely evolved under positive
selection (table 2), we investigated the relationship between the patterns of evolution of the studied genes
and the position that their encoded products occupy in
the network. For that purpose, we first evaluated whether
the nine genomic regions showing footprints of positive
selection differed from the 49 remaining ones in terms
of centrality within the network and position along the upstream/downstream pathway axis. The position of genes
along the pathway (defined as the number of steps
required to transduce the signal from the insulin/IGF1
receptor—position 0—to the remaining elements in the
pathway) does not significantly differ between both gene
groups (Mann–Whitney’s U test, P = 0.816; fig. 2A).
Remarkably, genes evolving under positive selection
encode proteins that have a significantly higher connectivity (P = 0.0102; fig. 2B), closeness (P = 0.0042; fig. 2C), and
betweenness (P = 0.0057; fig. 2D) within the insulin/TOR
network. These results remain significant when all kinds of
interactions (i.e., not only protein–protein interactions but
also metabolic and transcriptional activation interactions)
are considered (table 4). Thus, the impact of positive selection on the insulin/TOR pathway genes is related to the
centralities of their encoded products in the interaction
network, with genes acting at the center of the network
being more likely to evolve under positive selection.
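The three centrality measures compared above can be illustrated on a toy network. The sketch below is a generic, standard-library implementation for unweighted, undirected graphs (breadth-first search for closeness, Brandes' algorithm for betweenness); the toy adjacency list and node names are hypothetical and do not represent the insulin/TOR network, and the normalization used by the authors may differ.

```python
from collections import deque

def connectivity(adj):
    """Degree: number of direct interaction partners of each node."""
    return {v: len(adj[v]) for v in adj}

def closeness(adj):
    """(reachable nodes - 1) / sum of shortest-path distances from each node."""
    out = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        out[s] = (len(dist) - 1) / total if total else 0.0
    return out

def betweenness(adj):
    """Unnormalized betweenness via Brandes' algorithm."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair of endpoints was counted from both sides.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical star-shaped toy network: one hub and four peripheral nodes.
toy = {"hub": ["a", "b", "c", "d"], "a": ["hub"], "b": ["hub"],
       "c": ["hub"], "d": ["hub"]}
print(connectivity(toy))
print(closeness(toy))
print(betweenness(toy))
```

In this toy example the hub has the highest value for all three measures, which mirrors the strong correlation among centrality measures that is typical of small networks.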
Table 2. List of Genomic Regions with Footprints of Positive Selection.
Paralogous group
IRS
Gene
DOK1
DOK2
#SNPs
11
13
ZFa
96.2
68.1
Pb
2.914 3 10211
1.244 3 1025
q-valueb
2.245 3 1029***
2.733 3 1024**
iHS
FST
DDAF
11
13
10
101.0
93.9
63.1
4.342 3 10212
1.283 3 1029
2.310 3 1026
3.985 3 10210***
8.428 3 1028***
7.510 3 1025**
EASIA
FST
DDAF
13
11
99.9
79.4
1.330 3 10210
2.009 3 1028
1.318 3 1028***
1.016 3 1026***
SSAFR
FST
DDAF
49
41
162.7
146.1
4.375 3 1025
1.715 3 1025
7.544 3 1024*
4.80 3 1024*
MENA
iHS
45
213.5
4.815 3 10212
4.554 3 10210***
Geographic area
MENAc
Method
iHS
FST
EURc
EUR
3.677 3 10
8.294 3 1026
2.295 3 10231***
2.271 3 1024**
SSAFR
FST
DDAF
89
80
409.0
285.4
3.186 3 10220
4.178 3 1029
3.977 3 10217***
6.361 3 1027***
MENAc
iHS
DDAF
88
83
419.3
241.1
6.75 3 10222
1.255 3 1024
1.831 3 10219***
2.244 3 1023*
DDAF
79
237.4
4.478 3 1025
8.471 3 1024*
iHS
FST
DDAF
72
89
71
310.6
326.8
270.2
2.863 3 10
7.289 3 10211
5.026 3 10210
2.792 3 10212***
1.011 3 1028***
4.753 3 1028***
EASIA
FST
89
276.9
2.886 3 1026
7.665 3 1025***
MENAc
iHS
FST
DDAF
25
25
24
95.0
98.7
89.4
1.273 3 1024
4.874 3 1025
2.671 3 1024
2.089 3 1023 *
8.717 3 1024*
4.194 3 1023*
EASIA
FST
25
100.1
3.391 3 1025
6.12 3 1024*
25
S6K
c
FST
58
185.4
4.542 3 10
7.75 3 1024*
EURc
iHS
FST
57
58
230.1
192.1
7.439 3 10210
1.136 3 1025
4.552 3 1028***
2.445 3 1024**
SSAFR
FST
DDAF
132
116
508.3
365.2
1.190 3 10217
5.331 3 1028
1.661 3 10214***
4.497 3 1026***
MENAc
iHS
DDAF
117
116
749.4
369.6
2.803 3 10255
2.307 3 1028
5.832 3 10252***
1.398 3 1026**
EURc
iHS
DDAF
115
112
808.6
369.7
2.063 3 10265
3.029 3 1029
1.287 3 10261***
2.224 3 1027***
CSASIA
iHS
125
430.8
9.120 3 10212
6.779 3 10210***
PRKCI
OCEc
iHS
FST
DDAF
25
29
24
117.7
135.4
103.5
2.179 3 1027
3.956 3 1028
5.947 3 1026
8.343 3 1026***
1.614 3 1026***
1.360 3 1024**
RPS6KB2
SSAFR
FST
DDAF
20
17
112.5
91.6
8.095 3 1029
3.471 3 1027
5.319 3 1027***
2.167 3 1025***
iHS
16
83.6
1.730 3 1026
5.043 3 1025**
PRKCH
MENA
CSASIA
EASIA
eIF-4E
214
SSAFR
AKT3
PKC
2.879 3 1024**
234
1.382 3 10
357.4
170.3
CSASIA
AKT1
160.2
44
49
c
EUR
PKB
46
iHS
FST
CSASIA
DOK5
iHS
25
EIF4E
EASIAc
U
3
3
3
3
212
iHS
FST
DDAF
iHS
15
20
15
17
116.8
100.8
68.8
118.5
3.546
3.746
7.002
2.780
10
1027
1025
10211
iHS
FST
DDAF
33
42
33
121.9
163.3
125.5
3.452 3 1025
5.059 3 1027
1.400 3 1025
2.766
1.780
1.388
1.059
3
3
3
3
10210***
1025***
1023*
1029***
6.883 3 1024*
1.755 3 1025***
3.055 3 1024**
NOTE.—Genes were considered to have evolved under positive selection in a given geographic area if q-value < 2.381 × 10⁻³ for the iHS test and for either the FST or
ΔDAF tests (see Materials and Methods). Geographic areas for which any of these scores is significant are also displayed.
SSAFR, Sub-Saharan Africa; MENA, Middle East and North Africa; EUR, Europe; CSASIA, Central South Asia; EASIA, East Asia; OCE, Oceania.
^a Fisher’s combination test statistic computed from the empirical P-values obtained for SNPs within each genomic region.
^b Significance associated with ZF.
^c Geographic areas for which the gene is considered to have evolved under positive selection.
*q-value < 2.381 × 10⁻³ (significance at 5% with Bonferroni correction for seven regions studied with three methods).
**q-value < 4.762 × 10⁻⁴ (significance at 1% with Bonferroni correction).
***q-value < 4.762 × 10⁻⁵ (significance at 0.1% with Bonferroni correction).
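The ZF statistic defined in footnote a can be sketched as follows. This is an illustration under a stated assumption: that the associated P-value is taken from a chi-square distribution with 2k degrees of freedom for k SNPs, which is the textbook reference distribution for Fisher's method with independent P-values. Under that assumption, the 11-SNP entry with ZF = 96.2 gives a value close to the tabulated 2.914 × 10⁻¹¹. The function names are introduced here for illustration.

```python
from math import exp, log

def fisher_statistic(p_values):
    """Fisher's combination statistic: -2 times the sum of log P-values."""
    return -2.0 * sum(log(p) for p in p_values)

def chi2_sf_even_df(x, k):
    """Survival function of a chi-square with 2k degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{j<k} (x/2)^j / j!,
    which is exact for even degrees of freedom.
    """
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half / j
        total += term
    return exp(-half) * total

def fisher_combined_p(p_values):
    """Return (ZF, P) for a list of SNP-level empirical P-values."""
    z_f = fisher_statistic(p_values)
    return z_f, chi2_sf_even_df(z_f, len(p_values))

# Check against one table 2 entry: 11 SNPs, ZF = 96.2.
print(chi2_sf_even_df(96.2, 11))
```

Because SNPs within a region are in linkage disequilibrium, their P-values are not independent, so the chi-square reference distribution should be read as an approximation.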
Table 3. Enrichment Analysis for Signals of Positive Selection among the Insulin/TOR Pathway Genes.

Geographic Area   N^a     S^b    m^c    K^d    P^e
SSAFR             6389    18     65     0      0.168
MENA              6394    100    65     4      0.003**
EUR               6384    97     65     3      0.017*
CSASIA            6393    100    65     3      0.018*
EASIA             6365    151    65     1      0.459
AME               6274    129    65     0      0.743
OCE               6308    136    65     1      0.411
All^f             6242    631    65     9      0.116

NOTE.—SSAFR, Sub-Saharan Africa; MENA, Middle East and North Africa; EUR,
Europe; CSASIA, Central South Asia; EASIA, East Asia; AME, America; OCE,
Oceania.
^a Genes in the genomic region-level background distribution for which a score has been computed.
^b Genes evolving under positive selection in the genomic region-level background distribution.
^c Genes in the pathway for which a score has been computed.
^d Genes evolving under positive selection in the pathway.
^e Statistical significance of the hypergeometric test.
^f Union of all geographic regions.
*P < 0.05; **P < 0.01.
Some gene features such as expression level, expression breadth (the number of different tissues in which
a gene is expressed), and the length of the encoded
proteins can also affect the action of natural selection
on genes (Duret and Mouchiroud 2000; Pal et al.
2001; Subramanian and Kumar 2004; Kosiol et al.
2008). Therefore, a putative correlation between these
features and connectivity measures could potentially
account for the observed trend. In order to rule out this
possibility, we used partial correlation analysis to test for
the association between positive selection (which we encoded as a binary variable) and centrality measures while
controlling for the aforementioned variables simultaneously (see Hardy 1993; Cohen et al. 2003). Gene expression levels and breadths and the length of the
encoded proteins were obtained from Alvarez-Ponce,
Aguade et al. (2011). This analysis shows that the relationship between the patterns of positive selection
and the centrality measures remains significant after
controlling for these parameters (Spearman’s partial
rank correlation coefficient, ρ = 0.397, P = 0.005 for connectivity; ρ = 0.417, P = 0.003 for closeness; and ρ =
0.412, P = 0.004 for betweenness). Therefore, the observed relationship between the pathway structure
and the impact of positive selection is not attributable
to confounding factors that influence gene evolution.
[Figure 2, panels A–D: Position, Connectivity, Closeness, and Betweenness (y axes) for genes without (NS) and with (S) signals of positive selection; markers show Mean, Mean±SE, and Mean±1.96*SE.]
FIG. 2. Comparison between genes evolving under positive selection and the remaining ones. Mean and standard error (SE) of topological
measures for the nine genes under positive selection (S) and the 49 genes without signals of adaptive evolution (NS): (A) for pathway position;
(B) for connectivity; (C) for closeness; and (D) for betweenness. Centrality measures were calculated taking into account the physical
interactions within the insulin/TOR pathway.
Table 4. Relationship between the Structure of the Insulin/TOR Pathway and the Impact of Positive Selection.

                          Mean for genes       Mean for genes         Mann–Whitney test    Partial correlation^a
                          under positive       without signal of
Network^b   Parameter     selection            positive selection     U        P           r         P
—           Position      4.667                4.673                  209.5    0.816       −0.203    0.167
(i)         Connectivity  5.778                3.612                  103      0.010*      0.397     0.005**
(i)         Closeness     0.500                0.410                  91       0.004**     0.417     0.003**
(i)         Betweenness   0.244                0.116                  95.5     0.006**     0.412     0.004**
(ii)        Connectivity  6.889                4.571                  116      0.024*      0.432     0.002**
(ii)        Closeness     0.536                0.455                  101      0.009**     0.450     0.001**
(ii)        Betweenness   0.188                0.098                  101.5    0.009**     0.419     0.003**
(iii)       Connectivity  35.444               38.087                 186      0.646       −0.132    0.372
(iii)       Closeness     0.291                0.294                  179      0.538       −0.170    0.250
(iii)       Betweenness   0.0115               0.006                  153      0.228       −0.074    0.618

^a Spearman’s partial correlation between the impact of positive selection and network parameters controlling for gene expression level and breadth and length of the
encoded proteins (see Materials and Methods).
^b Three different sets of interactions were used for centrality calculations: (i) protein-protein interactions within the insulin/TOR pathway; (ii) all kinds of interactions
within the pathway (i.e., protein-protein, metabolic, and transcriptional activation interactions); and (iii) protein-protein interactions from the whole interactome.
*P < 0.05; **P < 0.01.
Likewise, partial correlation analysis confirms the absence of
association between the pathway position and the action of
positive selection (ρ = −0.203, P = 0.167; table 4).
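A partial rank correlation of the kind used above can be sketched with the standard recursive formula for partial correlations applied to ranked data. This is a minimal standard-library illustration, not the authors' procedure (which follows the references cited in the text); the function names and the ten-gene toy data set are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, controls):
    """Correlation of x and y after removing one control variable at a time."""
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def partial_spearman(x, y, controls):
    """Spearman's partial rank correlation: partial correlation on ranks."""
    return partial_corr(average_ranks(x), average_ranks(y),
                        [average_ranks(c) for c in controls])

# Hypothetical data for ten genes (1 = signature of positive selection).
selected = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
connectivity = [7, 3, 2, 6, 4, 1, 8, 3, 5, 2]
expression = [5.1, 3.2, 4.4, 2.0, 6.3, 1.5, 3.9, 4.8, 2.7, 5.6]
breadth = [12, 30, 8, 22, 15, 27, 5, 19, 25, 10]
length = [480, 1200, 350, 900, 610, 760, 1500, 430, 980, 540]
print(partial_spearman(selected, connectivity, [expression, breadth, length]))
```

With an empty list of controls the function reduces to the ordinary Spearman coefficient, which provides a simple sanity check.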
We also considered the effect on our observations of the
threshold used to decide whether genes evolved under
positive selection (supplementary table S7, Supplementary
Material online). The connectivity, closeness, and betweenness within the insulin/TOR network remain significantly
higher for genes that meet the criterion to invoke positive
selection when using a significance level of 0.01. If we reduce this level to 0.001, the centrality measures are still higher for the genes that exhibit the footprint
of positive selection, even if most of these differences do
not reach significance. This can be attributed to a lack
of power since we detected signatures of selection for only
five of the former nine genes while using such a low significance level. Nevertheless, these results strongly support
the association between the impact of positive selection
on genes and the central position they occupy in the
insulin/TOR pathway.
We additionally studied the differences in centrality between the nine genes evolving under positive selection and
the nonselected ones taking into account the entire interactome (Bossi and Lehner 2009) rather than interactions
among the insulin/TOR pathway components only. None
of the three centrality measures presents significant differences between both gene groups (table 4). Equivalent results were obtained using partial correlation analysis to
account for expression breadth, expression level, and
protein length (table 4).
Finally, we considered whether the insulin/TOR pathway
proteins encoded by genes evolving under positive selection tend to interact with each other. Of 33 physical interactions (fig. 1), five connect two nodes containing one or
more genes evolving under positive selection (X = 5).
Of 10,000 randomizations of the network, only 128 exhibit
an X value greater than or equal to the observed one using
method 1 (P = 0.0128). Therefore, genes evolving under
positive selection in the pathway are more likely to encode
proteins that interact with each other than expected in
a random network. However, this tendency vanishes when
a randomization technique that preserves the degree of each node is applied (method 2; P = 0.370),
suggesting that the observed excess of interactions between proteins encoded by genes evolving under positive
selection is the result of the higher degree of these proteins.
We additionally performed a similar analysis considering
a network in which nodes represent particular proteins,
rather than groups of proteins encoded by paralogous
genes. This network contains 58 nodes connected by
420 interactions, of which 21 connect two proteins encoded by genes with signals of positive selection. This value
is again higher than expected in a random network using
method 1 (P = 0.0001), but not using method 2
(P = 0.325).
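The contrast between the two null models can be sketched as follows. The exact definitions of methods 1 and 2 are given in Materials and Methods and are not reproduced here, so both randomization functions below are assumptions: the first draws a random network with the same nodes and number of edges, and the second applies double-edge swaps that keep the degree of every node. The toy network and the set of selected nodes are hypothetical.

```python
import random

def count_selected_edges(edges, selected):
    """X: number of edges whose two endpoints are both selected."""
    return sum(1 for u, v in edges if u in selected and v in selected)

def random_network(nodes, n_edges, rng):
    """Assumed 'method 1': same nodes and edge count, random endpoints."""
    chosen = set()
    while len(chosen) < n_edges:
        u, v = rng.sample(nodes, 2)
        chosen.add((min(u, v), max(u, v)))
    return list(chosen)

def degree_preserving_network(edges, rng, n_swaps):
    """Assumed 'method 2': double-edge swaps that preserve node degrees."""
    edges = [tuple(sorted(e)) for e in edges]
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        # Reject swaps that would create self-loops or duplicate edges.
        if a == d or c == b or new1 == new2 or new1 in present or new2 in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
    return edges

def empirical_p(observed, null_values):
    """Fraction of randomizations with a value >= the observed one."""
    return sum(1 for x in null_values if x >= observed) / len(null_values)

# Hypothetical toy network; nodes A, B, and C play the role of selected genes.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"),
         ("C", "E"), ("D", "F"), ("E", "F"), ("F", "G")]
selected = {"A", "B", "C"}
nodes = sorted({n for e in edges for n in e})
rng = random.Random(1)
x_obs = count_selected_edges(edges, selected)
null1 = [count_selected_edges(random_network(nodes, len(edges), rng), selected)
         for _ in range(1000)]
null2 = [count_selected_edges(
             degree_preserving_network(edges, rng, 10 * len(edges)), selected)
         for _ in range(1000)]
print("X =", x_obs)
print("P (assumed method 1) =", empirical_p(x_obs, null1))
print("P (assumed method 2) =", empirical_p(x_obs, null2))
```

Because the degree-preserving null keeps highly connected nodes highly connected, it typically yields larger null values of X than the unconstrained null, which is the effect described in the text.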
Influence of Paralogy on Positive Selection
We considered whether the probability that a gene evolved
under positive selection is related to the number of paralogous copies. The number of paralogs involved in the
insulin/TOR pathway (supplementary table S1, Supplementary Material online) does not significantly differ
between the nine genomic regions with footprints of positive
selection and the 49 remaining ones (Mann–Whitney’s U
test, P = 0.061). Consequently, the impact of positive selection on the insulin/TOR pathway genes seems to be independent of the size of the paralogous group.
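A two-group comparison of this kind can be sketched with a small Mann–Whitney implementation. The version below is an illustration only: it uses the tie-corrected normal approximation without continuity correction, which may differ from the software used in the study, and the two samples of paralog counts are hypothetical.

```python
from math import erfc, sqrt

def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided P from the normal approximation)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x, tie_term = 0.0, 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0   # tied values share their mean rank
        t = j - i + 1
        tie_term += t ** 3 - t
        rank_sum_x += mean_rank * sum(1 for k in range(i, j + 1)
                                      if pooled[k][1] == 0)
        i = j + 1
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / sqrt(var_u)
    return u_x, erfc(abs(z) / sqrt(2.0))

# Hypothetical numbers of paralogs for selected and nonselected genes.
paralogs_selected = [3, 3, 3, 2, 2, 4, 4, 2, 1]
paralogs_other = [1, 2, 1, 3, 2, 1, 4, 2, 1, 2, 3, 1]
print(mann_whitney_u(paralogs_selected, paralogs_other))
```

For samples as small as nine genes, an exact test would be preferable to the normal approximation whenever there are few ties.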
We tested whether genes in the same paralogous
group are similarly targeted by natural selection by using
a Monte Carlo method to assess whether the number of pairs
of paralogs where both genes show the signature of positive
selection (Y = 5) is higher than expected at random. We
observed a greater or equal Y value for only 466 permutations of 10,000 (P = 0.0466). Therefore, when a gene
evolves under positive selection, the probability that its paralogs also show signatures of adaptation increases.
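The Monte Carlo test on paralog pairs can be sketched as follows. The nine selected genes and their paralogous groups are those named in the text, and they give Y = 5; the remaining group members (names ending in `_other`) are hypothetical placeholders, since the full composition is listed in supplementary table S1. Shuffling the selection labels across all genes is an assumed permutation scheme, not necessarily the one used by the authors.

```python
import random
from itertools import combinations

def count_selected_pairs(groups, selected):
    """Y: within-group gene pairs in which both genes are selected."""
    return sum(
        1
        for members in groups.values()
        for g1, g2 in combinations(members, 2)
        if g1 in selected and g2 in selected
    )

def permutation_p(groups, selected, n_perm=10000, seed=0):
    """Empirical P: fraction of label permutations with Y >= observed Y."""
    rng = random.Random(seed)
    genes = [g for members in groups.values() for g in members]
    observed = count_selected_pairs(groups, selected)
    hits = 0
    for _ in range(n_perm):
        shuffled = set(rng.sample(genes, len(selected)))
        if count_selected_pairs(groups, shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm

# Selected genes as reported; "*_other" members are hypothetical placeholders.
groups = {
    "IRS": ["DOK1", "DOK2", "DOK5", "IRS_other1", "IRS_other2"],
    "PKB": ["AKT1", "AKT3", "PKB_other1"],
    "PKC": ["PRKCI", "PRKCH", "PKC_other1", "PKC_other2", "PKC_other3"],
    "S6K": ["RPS6KB2", "S6K_other1"],
    "eIF-4E": ["EIF4E", "eIF4E_other1"],
    "other": ["other%d" % i for i in range(1, 11)],
}
selected = {"DOK1", "DOK2", "DOK5", "AKT1", "AKT3",
            "PRKCI", "PRKCH", "RPS6KB2", "EIF4E"}
print(permutation_p(groups, selected, n_perm=2000))
```

The resulting P-value depends on the placeholder group sizes and should not be compared with the value reported in the text.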
Discussion
High Incidence of Positive Selection in the Insulin/
TOR Signal Transduction Pathway Genes
The present analysis aims at detecting recent adaptive
events (within the last 5,000–45,000 years) in human populations worldwide at genes involved in the insulin/TOR
signal transduction pathway and relating these events to
the structure of the pathway. To detect footprints of positive selection, we used methods devised to identify long
high-frequency haplotypes (iHS) and high levels of genetic
differentiation among populations (FST and ΔDAF). However, processes such as genetic drift and demographic history can also affect differentiation among populations and
linkage disequilibrium patterns, making it difficult to identify
signals of natural selection. In particular, dramatic changes
in population size such as bottlenecks and population expansions occurred during the recent human history, affecting allele frequencies (Teshima et al. 2006). Moreover,
populations from America and Oceania were isolated
and had small effective population sizes, and consequently
must have faced dramatic genetic drift, resulting in important allele frequency fluctuations that affected the entire
genome. Since nonselective processes affect the whole genome, in contrast to positive selection, which acts on
specific loci, the putative effect of these factors can be ruled
out by comparing the locus of interest with an empirical
distribution built from all SNPs in the genome. However, it
should be noted that genomic regions potentially evolving
under positive selection detected using this approach
might also represent false positives (Teshima et al. 2006;
Pickrell et al. 2009). Therefore, we used a stringent criterion
to reduce the number of false positives arising from alternative
explanations and to maximize the detection of specific
footprints of positive selection. For that purpose, we
combined the results from different methods based on extended haplotype homozygosity and genetic differentiation, which are expected to be differentially affected by
nonselective processes, thus providing nonoverlapping
false negatives. However, despite these precautions, demographic factors can increase the variance among genes, and
thus, the possibility of false positives cannot be completely
discarded.
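The outlier logic described above, together with the stringent thresholds of table 2, can be sketched in a few lines. The empirical P-value below is simply the fraction of background scores at least as large as the score of interest, which is an assumed definition; the Bonferroni divisor (seven geographic regions times three methods) follows the table 2 note. The function names and the toy scores are hypothetical.

```python
from bisect import bisect_left

def empirical_p(score, background_scores):
    """Fraction of background scores that are >= the score of interest."""
    ordered = sorted(background_scores)
    n_at_least = len(ordered) - bisect_left(ordered, score)
    return n_at_least / len(ordered)

def bonferroni_threshold(alpha, n_regions=7, n_methods=3):
    """Per-test threshold after correcting for regions x methods tests."""
    return alpha / (n_regions * n_methods)

# Hypothetical genome-wide scores and a locus of interest.
background = [0.2, 0.5, 0.9, 1.1, 1.4, 1.8, 2.3, 2.9, 3.5, 4.2]
print(empirical_p(3.0, background))
for alpha in (0.05, 0.01, 0.001):
    print(alpha, bonferroni_threshold(alpha))
```

The three thresholds printed by the loop correspond to the *, **, and *** levels used in table 2.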
The polymorphism data that we used is based on SNPs
that were previously ascertained on a reduced number of
HapMap samples from Africa, Europe, and East Asia.
Therefore, the allele frequency spectrum observed is
skewed toward high frequencies, to a different extent
in the seven geographic regions. To reduce this variation
among geographic regions, we removed rare variants by
applying a filter according to the MAF. Moreover, the
methods that we used, which are based on genetic differentiation and long-range haplotypes, reduce the effect
of ascertainment bias on our results as they do not
require low-frequency variants. Indeed, Romero et al.
(2009) showed that this bias has only a moderate effect on
studies of population differentiation using SNP data from
HGDP-CEPH; and Voight et al. (2006) claimed that
the iHS method is robust to regional variation in SNP
ascertainment.
Among the 65 studied genomic regions, we identified
a total of nine that have potentially evolved under positive
selection in human populations (table 2). The nine insulin/
TOR pathway genes that fall within these genomic regions
belong to five paralogous groups: IRS (insulin receptor substrate), PKB (protein kinase B), PKC (protein kinase C), S6K
(S6 kinase), and eIF-4E (eukaryotic translation initiation factor 4E). This relatively high incidence of adaptive evolution
contrasts with a previous comparative genomics analysis
showing little evidence of positive selection in the vertebrate insulin/TOR pathway (Alvarez-Ponce, Aguade et al.
2011). On the other hand, genes detected as evolving under
positive selection in both analyses remarkably overlap. In
vertebrates, footprints of positive selection were detected
at AKT3, PRKCD, and IRS4 (the latter of which was excluded
from the present analysis), which belong to the PKB, PKC,
and IRS paralogous groups, respectively (Alvarez-Ponce,
Aguade et al. 2011). Interestingly, our analyses revealed that
AKT3 also shows a pattern of positive selection and that
the PKC and IRS paralogous groups include two and three
genes, respectively, with signatures of positive selection.
Consistently, in the present study, we found that, in the
insulin/TOR pathway, if a gene is targeted by recent positive selection, the frequency at which genes in the same
paralogous group show signals of positive selection is
higher than expected at random. This observation could
potentially be explained by paralogous genes encoding proteins with similar structures and functions and occupying
similar positions in the pathway. Taken together, these results suggest that positive selection repeatedly targets specific groups of paralogous genes within the insulin/TOR
pathway.
A close examination of the gene content of the regions
that evolved under positive selection, together with an
analysis of the levels of linkage disequilibrium among SNPs,
shows that some of them also contain other genes linked to
the genes of interest that might be responsible for the observed signal of positive selection (supplementary fig. S4,
Supplementary Material online). Two of the studied genomic regions (DOK1 and RPS6KB2) contain a high number of
additional genes with a high level of linkage disequilibrium,
and thus, the signals of selection detected in these regions
cannot be clearly ascribed to the insulin/TOR pathway
genes. However, excluding these two genomic regions from
all our analyses does not change the observed relationship
between the pathway structure and the impact of positive
selection (supplementary table S8, Supplementary Material
online).
The distribution of footprints of positive selection
among the different geographic areas shows that positive
selection in the insulin/TOR pathway genes occurred
mainly in Middle East and North Africa, Europe, and
Central South Asia (table 3). In these regions, the proportion of genes evolving under positive selection in the insulin/TOR pathway is significantly higher than that observed
in the genomic background, whereas in other geographic
regions, we observe a very low incidence of positive selection (table 3). The same tendency is observed if DOK1 and
RPS6KB2 are excluded: the test remains significant in Middle East and North Africa, and it is marginally significant in
Europe and in Central South Asia (supplementary table S9,
Supplementary Material online). It is worth noting that
East Asia, America, and Oceania are the regions in which
we detected the highest incidence of positive selection across
our genomic background containing 6,394 genomic regions. This suggests that the absence of signals of positive
selection in the insulin/TOR pathway genes in these three
geographic areas cannot be attributed to a difficulty of detecting selective events in these areas. More problematic is
the case of Sub-Saharan Africa, where we detected very few
genomic regions that evolved under positive selection
across the genome (18 regions of 6,394). This could be
due to the generally low linkage disequilibrium observed
in these populations. Indeed, in these populations 1) hot
spots are more uniformly distributed along the genome
(1000 Genomes Project Consortium 2010); and 2) historically, recombination has impacted the genome to a greater
extent because they have a higher effective population size,
and they did not face any strong bottleneck like other populations. In a previous study of the nucleotide diversity of
the insulin minisatellite (INS VNTR) region in another set of
populations from Sub-Saharan Africa, Europe, and Asia,
a pattern similar to the one observed for the other insulin/TOR pathway genes was observed, that is, potential selective events in non-African populations (Stead et al.
2003). Moreover, we found that the haplotypes that have
undergone a selective sweep are shared among populations
from Middle East and North Africa, Europe, and Central
South Asia (see supplementary results, Supplementary Material online). Interestingly, most of those potentially
shared sweeps also show significant FST scores between
Sub-Saharan Africa and the other regions (table 2), which
can be the result of the high proportion of lower DAF values observed in Sub-Saharan Africa compared with the
other areas (negative ΔDAF; results not shown). Therefore,
those signals of genetic differentiation could also be a consequence of the shared signatures in non-African populations, rather than evidence for selection in Africans. This
suggests that positive selection acted early during the migration out of Africa (Stead et al. 2003) and that shared
evolutionary history (Gutenkunst et al. 2009; Moorjani
et al. 2011) accounts for the similar patterns observed in
these three areas.
It has been observed that, in the United States, the
prevalence of obesity and related diseases such as insulin
resistance and type 2 diabetes is much lower in individuals with European and Asian ancestry (compared with
Native American, Hispanic, Afro-American, and Pacific
Islander groups) (Wang and Beydoun 2007; Anderson
and Whitaker 2009). This observation underlines the
genetic basis of these diseases, also supported by a number of heritability studies (Rankinen et al. 2006; Mathias
et al. 2009; O’Rahilly 2009; Wang et al. 2009; Hinney
et al. 2010). The implication of the insulin/TOR signal
transduction pathway is clear since dysfunction of its
components often leads to insulin resistance, diabetes,
obesity, and cancer (Oldham and Hafen 2003; LeRoith
et al. 2004; Taguchi and White 2008). The enrichment
of adaptation events at genes involved in the insulin/TOR
pathway in European, Central South Asian, and North African and Middle Eastern populations could potentially be related to the differences among populations in the prevalence
of these metabolic disorders, although a direct link to specific phenotypes cannot be established yet. Thus, further
analysis of the insulin/TOR pathway genes evolving under
positive selection might provide insight into the etiology
of these metabolic disorders.
Distribution of Positive Selection Across the Insulin/
TOR Pathway
In the last decade, the availability of large-scale interaction
data has allowed us to investigate the distribution of evolutionary forces across functional networks. A number of analyses have shown that there is a relationship between the
structure of metabolic, regulatory, and protein–protein interaction networks and the patterns of molecular evolution of
their components (for a review, see Cork and Purugganan
2004; Eanes 2011; Zera 2011). Most of these studies have focused on the levels of selective constraint acting on genes,
which can be inferred from the nonsynonymous to synonymous divergence ratios, dN/dS, obtained from between-species comparisons. Values of dN/dS close to one are expected
for genes evolving neutrally, whereas values lower than one
are indicative of purifying selection maintaining the sequence
of the encoded proteins. Here, we considered whether the
incidence of positive selection in human populations is
related to the structure of the insulin/TOR pathway.
It has been observed that the dN/dS ratios correlate with
the position of genes along the upstream/downstream axis
of the insulin/TOR signal transduction pathway, with
downstream genes exhibiting lower dN/dS values both
in Drosophila (Alvarez-Ponce et al. 2009; Alvarez-Ponce,
Guirao-Rico et al. 2011) and vertebrates (Alvarez-Ponce,
Aguade et al. 2011). In addition to purifying selection,
the dN/dS ratios are also affected by the action of positive
selection, and therefore, this pattern could potentially result from a higher incidence of positive selection in the upstream part of the pathway. However, in the present
intraspecific analysis, we found no evidence for a polarity
in the incidence of positive selection along the pathway (fig.
2A). These results contrast with a number of analyses showing that upstream genes in metabolic and transcription regulatory networks are generally subject to stronger natural
selection (Rausher et al. 1999; Sharkey et al. 2005; Livingstone
and Anderson 2009; Ramsay et al. 2009; Wright and Rausher
2010; Rhone et al. 2011) (but see Montanucci et al. 2011).
Whereas genes whose encoded products occupy more
central positions in a protein–protein interaction network
tend to evolve under stronger purifying selection (Fraser
et al. 2002; Hahn and Kern 2005; Lemos et al. 2005;
Montanucci et al. 2011), positive selection seems to preferentially target genes acting at the periphery of the human
protein interaction network (Kim et al. 2007). In contrast,
we found that adaptive events are more frequent at genes
acting at the most central positions of the insulin/TOR
pathway. Indeed, proteins encoded by genes evolving under positive selection show higher connectivity, betweenness, and closeness centralities (fig. 2B–D). These most
highly connected and central proteins in the pathway
could potentially exert a higher influence on its function.
These contrasting results indicate that the relationship between positive selection and measures of centrality deserves further consideration. Moreover, the three
centrality measures calculated for the insulin/TOR pathway
are highly correlated (0.90 ≤ ρ ≤ 0.98). Therefore, the observed associations between the impact of positive selection and either connectivity, closeness, or betweenness
are not independent and may reflect the same underlying
factors. When centrality measures were calculated from the
entire interactome (Bossi and Lehner 2009) rather than
hand-curated protein–protein interaction data among insulin/TOR pathway proteins (fig. 1), no difference was detected between the two gene groups. Therefore, intrapathway
interactions seem to have a more relevant effect on the
impact of positive selection than interactions between
the insulin/TOR pathway genes and the other genes. However, it should be noted that high-throughput interaction
data are subject to a high proportion of false positives and
false negatives (Bader et al. 2004; Deeds et al. 2006), which
might obscure interactome-level analyses.
Finally, we found that genes evolving under positive selection tend to encode proteins that physically interact with
each other. However, this trend vanishes when network
randomization techniques that preserve the degree of each
node are used. Therefore, the tendency is probably the result of genes with signatures of positive selection being involved in more interactions and hence being more likely to
interact with each other. These contrasting results highlight
the relevance of choosing an adequate null model for testing network properties. Further analyses of other pathways
both at the interspecific and the intraspecific levels are warranted to establish whether the patterns observed in the
insulin/TOR pathway are particular to this pathway or represent a general trend across the whole interactome.
Conclusion
We found that positive selection acted on nine of the studied genes involved in the human insulin/TOR signal transduction pathway, mostly in European, Central South Asian,
and North African and Middle Eastern populations. The
impact of positive selection on gene evolution depends
on the position that the encoded products occupy in
the pathway, with the most highly connected and central
genes of the network being preferentially targeted by positive selection. These observations broaden current knowledge on how natural selection distributes across pathways
within an intraspecific framework, more particularly considering short periods of population-specific adaptation.
Supplementary Material
Supplementary results and figures S1–S4 and tables S1–S9
are available at Molecular Biology and Evolution online
(http://www.mbe.oxfordjournals.org/).
Acknowledgments
Bioinformatics services were kindly provided by the Genomic Diversity node, Spanish Bioinformatic Institute. This
work was funded by grants BFU2010-19443 (subprogram
BMC) awarded by Ministerio de Educación y Ciencia
(Spain) and the Direcció General de Recerca, Generalitat
de Catalunya (Grup de Recerca Consolidat 2009 SGR
1101). P.L. is supported by a PhD fellowship from “Acción
Estratégica de Salud, en el marco del Plan Nacional
de Investigación Científica, Desarrollo e Innovación
Tecnológica 2008–2011” from Instituto de Salud Carlos III.
References
Alvarez-Ponce D, Aguadé M, Rozas J. 2009. Network-level molecular
evolutionary analysis of the insulin/TOR signal transduction
pathway across 12 Drosophila genomes. Genome Res. 19:234–242.
Alvarez-Ponce D, Aguadé M, Rozas J. 2011. Comparative genomics
of the vertebrate insulin/TOR signal transduction pathway:
a network-level analysis of selective pressures. Genome Biol Evol.
3:87–101.
Alvarez-Ponce D, Guirao-Rico S, Orengo DJ, Segarra C, Rozas J,
Aguadé M. 2011. Molecular population genetics of the insulin/
TOR signal transduction pathway: a network-level analysis in
Drosophila melanogaster. Mol Biol Evol. 29:123–132.
Anderson SE, Whitaker RC. 2009. Prevalence of obesity among US
preschool children in different racial and ethnic groups. Arch
Pediatr Adolesc Med. 163:344–348.
Bader JS, Chaudhuri A, Rothberg JM, Chant J. 2004. Gaining
confidence in high-throughput protein interaction networks.
Nat Biotechnol. 22:78–85.
Barreiro LB, Laval G, Quach H, Patin E, Quintana-Murci L. 2008.
Natural selection has driven population differentiation in
modern humans. Nat Genet. 40:340–345.
Barton MD, Delneri D, Oliver SG, Rattray M, Bergman CM. 2010.
Evolutionary systems biology of amino acid biosynthetic cost in
yeast. PLoS One 5:e11935.
Beaumont M, Nichols R. 1996. Evaluating loci for use in the genetic
analysis of population structure. Proc R Soc Lond B Biol Sci.
263:1619–1626.
Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate:
a practical and powerful approach to multiple testing. J R Stat
Soc. 57:289–300.
Bossi A, Lehner B. 2009. Tissue specificity and the human protein
interaction network. Mol Syst Biol. 5:260.
Cann HM, de Toma C, Cazes L, et al. (41 co-authors). 2002. A human
genome diversity cell line panel. Science 296:261–262.
Codoñer FM, Fares MA. 2008. Why should we care about molecular
coevolution? Evol Bioinform Online. 4:29–38.
Cohen J, Cohen P, West S, Aiken L. 2003. Applied multiple
regression/correlation analysis for the behavioral sciences.
Mahwah (NJ): Lawrence Erlbaum Associates.
Cork JM, Purugganan MD. 2004. The evolution of molecular genetic
pathways and networks. Bioessays 26:479–484.
Deeds EJ, Ashenberg O, Shakhnovich EI. 2006. A simple physical
model for scaling in protein-protein interaction networks. Proc
Natl Acad Sci U S A. 103:311–316.
Diamond J. 2003. The double puzzle of diabetes. Nature 423:
599–602.
Diamond J. 2011. Medicine: diabetes in India. Nature 469:478–479.
Duret L, Mouchiroud D. 2000. Determinants of substitution rates in
mammalian genes: expression pattern affects selection intensity
but not mutation rate. Mol Biol Evol. 17:68–74.
Eanes WF. 2011. Molecular population genetics and selection in the
glycolytic pathway. J Exp Biol. 214:165–171.
Fares MA, Ruiz-González MX, Labrador JP. 2011. Protein coadaptation and the design of novel approaches to identify
protein-protein interactions. IUBMB Life. 63:264–271.
Flowers JM, Sezgin E, Kumagai S, Duvernell DD, Matzkin LM,
Schmidt PS, Eanes WF. 2007. Adaptive evolution of metabolic
pathways in Drosophila. Mol Biol Evol. 24:1347–1354.
Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW. 2002.
Evolutionary rate in the protein interaction network. Science
296:750–752.
Fuss B, Becker T, Zinke I, Hoch M. 2006. The cytohesin Steppke is
essential for insulin signalling in Drosophila. Nature 444:945–948.
Gardner M, Bertranpetit J, Comas D. 2008. Worldwide genetic
variation in dopamine and serotonin pathway genes: implications for association studies. Am J Med Genet B Neuropsychiatr
Genet. 147B:1070–1075.
Gonzalez-Neira A, Ke X, Lao O, et al. (14 co-authors). 2006. The
portability of tagSNPs across populations: a worldwide survey.
Genome Res. 16:323–330.
Grossman SR, Shylakhter I, Karlsson EK, et al. (13 co-authors). 2010.
A composite of multiple signals distinguishes causal variants in
regions of positive selection. Science 327:883–886.
Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD.
2009. Inferring the joint demographic history of multiple
populations from multidimensional SNP frequency data. PLoS
Genet. 5:e1000695.
Hafner M, Schmitz A, Grune I, Srivatsan SG, Paul B, Kolanus W,
Quast T, Kremmer E, Bauer I, Famulok M. 2006. Inhibition of
cytohesins by SecinH3 leads to hepatic insulin resistance. Nature
444:941–944.
Hahn MW, Kern AD. 2005. Comparative genomics of centrality and
essentiality in three eukaryotic protein-interaction networks.
Mol Biol Evol. 22:803–806.
Hardy M. 1993. Regression with dummy variables. London: Sage
Publications.
Hernandez G, Altmann M, Sierra JM, Urlaub H, Diez del Corral R,
Schwartz P, Rivera-Pomar R. 2005. Functional analysis of seven
genes encoding eight translation initiation factor 4E (eIF4E)
isoforms in Drosophila. Mech Dev. 122:529–543.
Hinney A, Vogel CI, Hebebrand J. 2010. From monogenic to
polygenic obesity: recent advances. Eur Child Adolesc Psychiatry.
19:297–310.
International HapMap Consortium. 2007. A second generation
human haplotype map of over 3.1 million SNPs. Nature
449:851–861.
Jakobsson M, Scholz SW, Scheet P, et al. (24 co-authors). 2008.
Genotype, haplotype and copy-number variation in worldwide
human populations. Nature 451:998–1003.
Joshi B, Cameron A, Jagus R. 2004. Characterization of
mammalian eIF4E-family members. Eur J Biochem. 271:
2189–2203.
Jost L. 2008. G(ST) and its relatives do not measure differentiation.
Mol Ecol. 17:4015–4026.
Karolchik D, Kuhn RM, Baertsch R, et al. (25 co-authors). 2008. The
UCSC Genome Browser Database: 2008 update. Nucleic Acids
Res. 36:D773–D779.
Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM. 2006.
Genomic signatures of positive selection in humans and the
limits of outlier approaches. Genome Res. 16:980–989.
Kim PM, Korbel JO, Gerstein MB. 2007. Positive selection at the
protein network periphery: evaluation in terms of structural
constraints and cellular context. Proc Natl Acad Sci U S A.
104:20274–20279.
King H, Aubert RE, Herman WH. 1998. Global burden of diabetes,
1995–2025: prevalence, numerical estimates, and projections.
Diabetes Care 21:1414–1431.
Kirkness EF, Haas BJ, Sun W, et al. (71 co-authors). 2010. Genome
sequences of the human body louse and its primary
endosymbiont provide insights into the permanent parasitic
lifestyle. Proc Natl Acad Sci U S A. 107:12168–12173.
Kosiol C, Vinar T, da Fonseca RR, Hubisz MJ, Bustamante CD,
Nielsen R, Siepel A. 2008. Patterns of positive selection in six
mammalian genomes. PLoS Genet. 4:e1000144.
Lemos B, Bettencourt BR, Meiklejohn CD, Hartl DL. 2005. Evolution
of proteins and gene expression levels are coupled in Drosophila
and are independently associated with mRNA abundance,
protein length, and number of protein-protein interactions. Mol
Biol Evol. 22:1345–1354.
Leopoldt D, Hanck T, Exner T, Maier U, Wetzker R, Nurnberg B.
1998. Gbetagamma stimulates phosphoinositide 3-kinase-gamma by direct interaction with two domains of the catalytic
p110 subunit. J Biol Chem. 273:7024–7029.
LeRoith D, Taylor S, Olefsky J. 2004. Diabetes mellitus: a fundamental
and clinical text. Philadelphia (PA): Lippincott.
Lewontin R, Krakauer J. 1973. Distribution of gene frequency as
a test of the theory of the selective neutrality of polymorphisms.
Genetics 74:175–195.
Li JZ, Absher DM, Tang H, et al. (11 co-authors). 2008. Worldwide
human relationships inferred from genome-wide patterns of
variation. Science 319:1100–1104.
Livingstone K, Anderson S. 2009. Patterns of variation in the
evolution of carotenoid biosynthetic pathway enzymes of higher
plants. J Hered. 100:754–761.
Loewe L. 2009. A framework for evolutionary systems biology. BMC
Syst Biol. 3:27.
Mathias RA, Deepa M, Deepa R, Wilson AF, Mohan V. 2009.
Heritability of quantitative traits associated with type 2 diabetes
mellitus in large multiplex families from South India. Metabolism
58:1439–1445.
Montanucci L, Laayouni H, Dall’Olio GM, Bertranpetit J. 2011.
Molecular evolution and network-level analysis of the N-glycosylation metabolic pathway across primates. Mol Biol Evol.
28:813–823.
Moorjani P, Patterson N, Hirschhorn JN, Keinan A, Hao L,
Atzmon G, Burns E, Ostrer H, Price AL, Reich D. 2011. The
history of African gene flow into Southern Europeans,
Levantines, and Jews. PLoS Genet. 7:e1001373.
Oldham S, Hafen E. 2003. Insulin/IGF and target of rapamycin signaling:
a TOR de force in growth control. Trends Cell Biol. 13:79–85.
1000 Genomes Project Consortium. 2010. A map of human genome
variation from population-scale sequencing. Nature 467:
1061–1073.
O’Rahilly S. 2009. Human genetics illuminates the paths to
metabolic disease. Nature 462:307–314.
Pal C, Papp B, Hurst LD. 2001. Highly expressed genes in yeast evolve
slowly. Genetics 158:927–931.
Peng G, Luo L, Siu H, et al. (12 co-authors). 2009. Gene and pathway-based second-wave analysis of genome-wide association studies.
Eur J Hum Genet. 18:111–117.
Pickrell JK, Coop G, Novembre J, et al. (11 co-authors). 2009. Signals
of recent positive selection in a worldwide sample of human
populations. Genome Res. 19:826–837.
Ramsay H, Rieseberg LH, Ritland K. 2009. The correlation of
evolutionary rate with pathway position in plant terpenoid
biosynthesis. Mol Biol Evol. 26:1045–1053.
Rankinen T, Zuberi A, Chagnon YC, Weisnagel SJ, Argyropoulos G,
Walts B, Perusse L, Bouchard C. 2006. The human obesity gene
map: the 2005 update. Obesity (Silver Spring) 14:529–644.
Rausher MD, Miller RE, Tiffin P. 1999. Patterns of evolutionary rate
variation among genes of the anthocyanin biosynthetic
pathway. Mol Biol Evol. 16:266–274.
Reiner AP, Carlson CS, Ziv E, Iribarren C, Jaquish CE,
Nickerson DA. 2007. Genetic ancestry, population substructure, and cardiovascular disease-related traits among
African-American participants in the CARDIA Study. Hum
Genet. 121:565–575.
Rhone B, Brandenburg JT, Austerlitz F. 2011. Impact of selection on
genes involved in regulatory network: a modelling study. J Evol
Biol. 24:2087–2098.
Riley RM, Jin W, Gibson G. 2003. Contrasting selection pressures on
components of the Ras-mediated signal transduction pathway
in Drosophila. Mol Ecol. 12:1315–1323.
Romero IG, Manica A, Goudet J, Handley LL, Balloux F. 2009. How
accurate is the current picture of human genetic variation?
Heredity 102:120–126.
Rosenberg NA. 2006. Standardized subsets of the HGDP-CEPH
Human Genome Diversity Cell Line Panel, accounting for
atypical and duplicated samples and pairs of close relatives. Ann
Hum Genet. 70:841–847.
Sabeti PC, Reich DE, Higgins JM, et al. (17 co-authors). 2002.
Detecting recent positive selection in the human genome from
haplotype structure. Nature 419:832–837.
Scheet P, Stephens M. 2006. A fast and flexible statistical model for
large-scale population genotype data: applications to inferring
missing genotypes and haplotypic phase. Am J Hum Genet.
78:629–644.
Sharkey TD, Yeh S, Wiberley AE, Falbel TG, Gong D, Fernandez DE.
2005. Evolution of the isoprene biosynthetic pathway in kudzu.
Plant Physiol. 137:700–712.
Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T. 2011.
Cytoscape 2.8: new features for data integration and network
visualization. Bioinformatics 27:431–432.
Stajich JE, Block D, Boulez K, et al. (12 co-authors). 2002. The Bioperl
toolkit: Perl modules for the life sciences. Genome Res.
12:1611–1618.
Stead JD, Hurles ME, Jeffreys AJ. 2003. Global haplotype diversity in
the human insulin gene region. Genome Res. 13:2101–2111.
Stephens LR, Eguinoa A, Erdjument-Bromage H, et al. (11 co-authors).
1997. The G beta gamma sensitivity of a PI3K is dependent upon
a tightly associated adaptor, p101. Cell 89:105–114.
Subramanian S, Kumar S. 2004. Gene expression intensity shapes
evolutionary rates of the proteins encoded by the vertebrate
genome. Genetics 168:373–381.
Taguchi A, White MF. 2008. Insulin-like signaling, nutrient
homeostasis, and life span. Annu Rev Physiol. 70:191–212.
Teshima KM, Coop G, Przeworski M. 2006. How reliable are empirical
genomic scans for selective sweeps? Genome Res. 16:702–712.
van Vliet-Ostaptchouk JV, Hofker MH, van der Schouw YT,
Wijmenga C, Onland-Moret NC. 2009. Genetic variation in the
hypothalamic pathways and its role on obesity. Obes Rev.
10:593–609.
Vinciguerra M, Foti M. 2006. PTEN and SHIP2 phosphoinositide
phosphatases as negative regulators of insulin signalling. Arch
Physiol Biochem. 112:89–104.
Voight BF, Kudaravalli S, Wen X, Pritchard JK. 2006. A map of recent
positive selection in the human genome. PLoS Biol. 4:e72.
Wang X, Ding X, Su S, Spector TD, Mangino M, Iliadou A, Snieder H.
2009. Heritability of insulin sensitivity and lipid profile depend
on BMI: evidence for gene-obesity interaction. Diabetologia
52:2578–2584.
Wang Y, Beydoun MA. 2007. The obesity epidemic in the United
States—gender, age, socioeconomic, racial/ethnic, and geographic characteristics: a systematic review and meta-regression
analysis. Epidemiol Rev. 29:6–28.
Weir BS, Hill WG. 2002. Estimating F-statistics. Annu Rev Genet.
36:721–750.
Wolf YI, Gopich IV, Lipman DJ, Koonin EV. 2010. Relative
contributions of intrinsic structural-functional constraints and
translation rate to the evolution of protein-coding genes.
Genome Biol Evol. 2:190–199.
Wright KM, Rausher MD. 2010. The evolution of control and
distribution of adaptive mutations in a metabolic pathway.
Genetics 184:483–502.
Yamada T, Bork P. 2009. Evolution of biomolecular networks:
lessons from metabolic and protein interactions. Nat Rev Mol
Cell Biol. 10:791–803.
Zaykin DV, Zhivotovsky LA, Czika W, Shao S, Russell D. 2008.
Combining p-values in large scale genomics experiments. Pharm
Stat. 6:217–226.
Zera AJ. 2011. Microevolution of intermediary metabolism:
evolutionary genetics meets metabolic biochemistry. J Exp
Biol. 214:179–190.
Zhang B, Roth RA. 1992. The insulin receptor-related receptor.
Tissue expression, ligand binding specificity, and signaling
capabilities. J Biol Chem. 267:18320–18328.