Network-Level and Population Genetics Analysis of the Insulin/TOR Signal Transduction Pathway Across Human Populations Pierre Luisi,1 David Alvarez-Ponce,2,3 Giovanni Marco Dall’Olio,1 Martin Sikora,4 Jaume Bertranpetit,*,1 and Hafid Laayouni1 1 Institute of Evolutionary Biology (UPF-CSIC) CEXS-UPF-PRBB, Barcelona, Catalonia, Spain Departament de Genètica, Universitat de Barcelona, Barcelona, Spain 3 Department of Biology, National University of Ireland Maynooth, Maynooth, County Kildare, Ireland 4 Department of Genetics, Stanford University School of Medicine *Corresponding author: E-mail: [email protected]. Associate editor: Douglas Crawford 2 Genes and proteins rarely act in isolation, but they rather operate as components of complex networks of interacting molecules. Therefore, for understanding their evolution, it may be helpful to take into account the interaction networks in which they participate. It has been shown that selective constraints acting on genes depend on the position that they occupy in the network. Less understood is how the impact of local adaptation at the intraspecific level is affected by the network structure. Here, we analyzed the patterns of molecular evolution of 67 genes involved in the insulin/target of rapamycin (TOR) signal transduction pathway. This well-characterized pathway plays a key role in fundamental processes such as energetic metabolism, growth, reproduction, and aging and is involved in metabolic disorders such as obesity, insulin resistance, and diabetes. For that purpose, we combined genotype data from worldwide human populations with current knowledge of the structure and function of the pathway. We identified the footprint of recent positive selection in nine of the studied genomic regions. Most of the adaptation signals were observed among Middle East and North African, European, and Central South Asian populations. We found that positive selection preferentially targets the most central elements in the pathway, in contrast to previous observations in the whole human interactome. This observation indicates that the impact of positive selection on genes involved in the insulin/TOR pathway is affected by the pathway structure. Key words: human populations, positive selection, insulin/TOR signal transduction pathway, network evolution. Introduction During their diaspora, humans have been exposed to a wide range of different environments, to which they have had to adapt. To study the process of adaptation, we can interrogate genomic data to determine which genes evolved under positive selection. Since positive selection leads to the adaptation of a specific phenotype to a given environment, such knowledge may be helpful to understand the complex phenotype–genotype relationship, which remains largely unknown. Whereas comparison of the genomes of different species can provide insight into the evolutionary forces that acted at a large timescale, using data from individuals of the same species enables the detection of the genomic footprints of more recent adaptive events. Genes in a genome are affected by disparate evolutionary forces, and one of the goals in Evolutionary Biology is to determine the factors underlying this variability. Genes and proteins rarely act in isolation but rather operate as components of complex networks of interacting molecules. Therefore, for understanding their evolution, it may be helpful to take into account the interaction networks in which they participate (Loewe 2009; Yamada and Bork 2009; Barton et al. 2010; Wolf et al. 2010). The combination of comparative genomics and network data has shown that the patterns of molecular evolution of genes are related to the structure of the networks in which they participate, with a number of measures of network position correlating with levels of selective constraint. First, genes occupying more central positions in a protein–protein interaction network are more selectively constrained than those acting at the periphery (Fraser et al. 2002; Hahn and Kern 2005; Lemos et al. 2005). Second, genes encoding interacting proteins are similarly affected by purifying selection (Fraser et al. 2002; Lemos et al. 2005), which can be attributed to molecular coevolution or to interacting genes being affected by similar selective pressures (Codoñer and Fares 2008; Fares et al., 2011). And third, in a number of pathways, including the insulin/TOR pathway (Alvarez-Ponce et al. 2009; Alvarez-Ponce, Aguade et al. 2011; AlvarezPonce, Guirao-Rico et al. 2011), a polarity in the distribution of levels of selective constraint levels has been observed along the upstream/downstream pathway axis (Rausher et al. 1999; Riley et al. 2003; Sharkey et al. 2005; Livingstone and Anderson 2009; Ramsay et al. © The Author 2011. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: [email protected] Mol. Biol. Evol. 29(5):1379–1392. 2012 doi:10.1093/molbev/msr298 Advance Access publication December 1, 2011 1379 Research article Abstract MBE Luisi et al. · doi:10.1093/molbev/msr298 2009; Montanucci et al. 2011). Less understood is how the patterns of positive selection are affected by the position of genes in the network, although some tendencies have been observed: positive selection seems to preferentially target genes encoding the most peripheral proteins in the human protein–protein interaction network (Kim et al. 2007), and in Drosophila, positive selection tends to target genes acting at branch points of the pathways involved in glucose metabolism (Flowers et al. 2007). More work is warranted to understand the relationship between the structure of functional networks and the action of positive selection. Molecular pathways can be classified as metabolic, signal transduction, and transcriptional regulatory pathways. Signaling pathways allow the integration of cell function with extracellular stimuli. The insulin/TOR signal transduction pathway plays a key role in controlling energetic metabolism, growth, reproduction, and aging in response to nutritional conditions (Oldham and Hafen 2003; LeRoith et al. 2004). The structure and function of this pathway have been well characterized in a number of organisms, revealing a high degree of conservation across metazoans (Oldham and Hafen 2003; Kirkness et al. 2010). Insulin is secreted in response to high levels of blood sugar. Upon binding to insulin, the insulin receptor (InR) undergoes autophosphorylation, thus becoming able to recruit a complex of proteins to the inner cell membrane, including the phosphatidylinositol 3-kinase (PI3K). PI3K catalyzes the biosynthesis of phosphatidylinositol 3,4,5-trisphosphate (PIP3), a membrane lipid that serves as anchor for pleckstrin homology domain-containing proteins, including PDK1, PKB, and PKC. Once in the membrane, these proteins become able to trigger the activation of downstream proteins, including transcription factors and ribosomal proteins, which in turn mediate the physiological effects of insulin. The prevalence of obesity and closely related diseases such as insulin resistance and type 2 diabetes is dramatically different among human populations (King et al. 1998; Reiner et al. 2007; van Vliet-Ostaptchouk et al. 2009). The differences observed among individuals of different ethnical origin in the same country (Diamond 2003; Wang and Beydoun 2007; Anderson and Whitaker 2009; Diamond 2011) underline the fact that these metabolic disorders are not only caused by environmental exposures but also by genetic factors. The insulin/TOR pathway plays a key role in these diseases; indeed, dysfunction of its components often leads to insulin resistance, diabetes, obesity, or cancer (Oldham and Hafen 2003; LeRoith et al. 2004; Taguchi and White 2008). Thus, applying population genetics analysis to describe how natural selection acted in different populations on the genes involved in this pathway may provide key insight into the etiology of these diseases. Here, we used human population genetics data to study the impact of recent positive selection (within the last ;5,000–45,000 years) on the molecular evolution of 67 genes involved in the human insulin/TOR signal transduction pathway. We identified signals of potential recent positive selection in nine of the studied genes. Most of these signals were observed in Middle East and North African, 1380 European, and Central South Asian populations. We found that the impact of positive selection on the insulin/TOR pathway genes depends on the position that their encoded products occupy in the interaction network. Contrary to previously observed in the human protein–protein interaction network (Kim et al. 2007), positive selection preferentially targets genes encoding the most central proteins in the pathway. This result indicates that the structure of the pathway influences the patterns of molecular evolution of its components. Materials and Methods Genomic Regions Analyzed We obtained a list of 69 human genes from a previous study of the vertebrate insulin/TOR pathway (Alvarez-Ponce, Aguade et al. 2011). Most of these genes are known to be involved in the pathway according to the literature or are close paralogs that cluster within these genes in phylogenetic trees. We excluded from the analysis genes that, despite having paralogs that are involved in the pathway, are known not to play a role in insulin/TOR signaling: INSRR, encoding the insulin-related receptor, which is not activated by insulin or IGFs (Zhang and Roth 1992); and PIK3CG, encoding the protein p110c, which is activated by G protein– coupled receptors and not by tyrosine kinases (Stephens et al. 1997; Leopoldt et al. 1998). Additionally, we included in the present analysis the genes encoding insulin (INS) and the insulin-like growth factors I (IGF1) and II (IGF2). Some of the methods used for detecting signals of positive selection (see below) have been devised for autosomal regions or provide results that cannot be compared between the autosomal regions and the ones linked to the X chromosome. Therefore, genes IRS4, FOXO4, and MYCL2, which are linked to chromosome X, were excluded from the analysis. The final list therefore consisted of 67 genes, which belong to 24 groups of paralogs (supplementary table S1, Supplementary Material online). For each gene, we analyzed the genomic region including the gene plus 200 flanking kilobases (100 Kb upstream and 100 Kb downstream of the transcript). Gene coordinates were obtained from release 36 of the human genome at NCBI. For genes with multiple splicing variants, we considered the transcript spanning the longest chromosome region. Two pairs of overlapping regions (INS and IGF2; PDPK1 and AC141586.2) were merged, and therefore, the analyses were conducted on a total of 65 genomic regions (supplementary table S1, Supplementary Material online), of which 58 were used in network-level analyses (see below). Population Genomic Data For each of the 65 studied genomic regions, we obtained single nucleotide polymorphism (SNP) data from the Human Genome Diversity Panel–Centre d’Études du Polymorphisme Humain (HGDP-CEPH) (Cann et al. 2002). We restricted our analysis to a subset composed of 940 genotyped individuals from 39 populations as described Network-Level and Population Genetics Analysis · doi:10.1093/molbev/msr298 in Gonzalez-Neira et al. (2006). This subpanel does not contain any pair of individuals with a first or second degree relationship (Rosenberg 2006). These populations were grouped into seven geographic regions as defined by Li et al. (2008): Sub-Saharan Africa, Middle East and North Africa, Europe, Central South Asia, East Asia, America, and Oceania. These seven groups differ in both the number of populations (ranging from two to nine) and the number of individuals (from 27 to 219; supplementary table S2, Supplementary Material online). For each individual, data for ;650,000 SNPs distributed along the whole genome is available (Jakobsson et al. 2008; Li et al. 2008). We removed all SNPs with a minor allele frequency (MAF) lower than 5% either considering individuals within each geographic area or considering all individuals from the seven geographic regions together, depending on whether the methods were applied within a specific geographic area or to compare different regions, respectively. After filtering at a global level, our data set consists of 586,210 SNPs (the number of SNPs used for separate analysis of each geographic area is provided in supplementary table S3, Supplementary Material online). Detection of Positive Selection In order to detect signals of positive selection in the studied genomic regions, three different methods were used. The integrated haplotype score (iHS) is based on the extended haplotype homozygosity measure, which aims at detecting the footprint of positive selection from the local haplotype structure. The two other methods (FST and DDAF indexes) are based on genetic differentiation between geographic areas. A selective sweep causes the selected allele to spread rapidly through the population, leading to an increased linkage disequilibrium compared with the neutral expectation. In each geographic region, we computed a raw iHS for each SNP following the method proposed by Voight et al. (2006). Standardized iHS were obtained by grouping SNPs into bins of frequency 0.05, subtracting the mean, and dividing by the standard deviation for all SNPs in the same bin. Extreme positive or negative values indicate high extended haplotype homozygosity of haplotypes carrying the ancestral or derived allele, respectively. Hence, we consider both extreme positive or negative iHS as potential signatures of positive selection, using jiHSsj as in Voight et al. (2006). Genetic distances were obtained from the genetic map provided by the HapMap Consortium release 22 (International HapMap Consortium 2007). Genotype phases were inferred using the fastPhase program (Scheet and Stephens 2006). The FST index (Weir and Hill 2002) summarizes genetic differentiation among populations by estimating the proportion of interindividual variance that can be explained by the intergroup variance. For each SNP, we computed the FST index between each geographic region and the remaining ones using the PopGen module from BioPerl (Stajich et al. 2002). DDAF (Grossman et al. 2010) describes the difference in derived allele frequencies (DAF) between MBE two sets of individuals. This score was computed as DS DNS, where DS and DNS are the DAF in the studied geographic region and in the remaining ones, respectively. DDAF scores range from 1 to 1, with negative and positive values indicating whether the derived allele is less or more frequent in the given geographic area, respectively. Large FST and jDDAFj values can be the result of the action of positive selection on the studied SNP or in a linked one in a specific geographic area (Lewontin and Krakauer 1973; Grossman et al. 2010). Computation of the iHS and DDAF score requires ancestral allele information. Ancestral states inferred from comparison with orthologous sequences in the chimpanzee and rhesus macaque genomes were obtained from the UCSC Genome Bioinformatics Site (http://genome.ucsc.edu/; table ‘‘snp128OrthoPanTro2RheMac2’’; Karolchik et al. 2008). Extreme values for the used statistics can also be the result of nonselective events such as demographic changes and genetic drift. As these events act on the genome randomly, in contrast to positive selection, which targets specific genes, we adopted an outlier approach to infer the action of positive selection (Kelley et al. 2006). Once scores from the three methods were calculated for all SNPs in the genome, we computed empirical P values for the SNPs in the studied genomic regions as in Kelley et al. (2006). Under this approach, the P value corresponds to the probability that a value randomly drawn from the whole genome is more extreme than the value observed at the SNP of interest. If the locus of interest shows an outlier value (i.e., less than 5% of the SNPs from the background present a greater value than the SNP of interest), positive selection is invoked. Because differentiation indexes (Beaumont and Nichols 1996; Barreiro et al. 2008; Gardner et al. 2008; Jost 2008) strongly correlate with heterozygosity, we compared these indexes with the ones observed at loci with similar MAF values. For that purpose, we divided the background SNP set into bins of 10,000 SNPs of similar MAF values, and P values for each SNP were computed as the proportion of SNPs in the same bin with a greater score. Having calculated P values separately for each SNP of interest, we integrated results for all SNPs within each of the studied genomic regions. For that purpose, we performed the Fisher’s combination test (Zaykin et al. 2008) for each genomic region, geographic area, and K P method. The statistic is computed as ZF 5 2 logPi , i51 where K is the number of SNPs in the studied genomic region and Pi is the empirical P value for the SNP i. This statistic follows a v2 distribution with 2K degrees of freedom. This method shows a good performance for combining information from SNPs even if they are not completely independent (Peng et al. 2009). Moreover, the HGDP-CEPH data set was generated using Illumina arrays that mostly contain tag SNPs, which capture most of the information from all SNPs in the region by linkage disequilibrium, and hence present a low level of linkage disequilibrium among them. After obtaining the P values from this Fisher’s combination test, we evaluated their 1381 Luisi et al. · doi:10.1093/molbev/msr298 significance taking into account the whole-genome context. For that purpose, we used a genomic region-level background containing the 65 genomic regions encoding the insulin/TOR pathway plus a set of 6,329 nonoverlapping regions centered on autosomal genes distributed across the genome. The complete background gene set obtained thus includes 6,394 genomic regions and 386,401 SNPs. To avoid overlap among background genomic regions, they were selected in each autosomal chromosome as follows: beginning from the gene at the lowest physical position, genes located at a minimum distance of 200 Kb from the previous one were included until reaching the end of the chromosome. From this set of genes, the genomic regions including the genes plus 200 flanking kilobases were considered for the analysis, as genomic regions were constructed for insulin/TOR pathway genes. For each of these background genomic regions, we also computed a P value using the Fisher’s combination test. P values were corrected for multiple testing using the false discovery rate approach (Benjamini and Hochberg 1995). Genomic regions were considered to have evolved under positive selection if they had a corrected P value lower than a given threshold in any of the studied regions for iHS and, at least, one of the genetic differentiation-based methods used (i.e., either FST or DDAF). Since we are considering three methods and seven geographic regions, we used a highly conservative threshold of 2.381 103 (i.e., 0.050/[3 7]). Network-Level Analysis We evaluated whether the structure of the insulin/TOR pathway has an effect on the molecular evolution of its components. For that purpose, we considered whether 1) there is a polarity in the impact of positive selection along the upstream/downstream axis of the pathway; 2) the incidence of positive selection is different for central and peripheral genes in the pathway; and 3) genes encoding physically interacting proteins evolve under similar selective pressures. For these analyses, we used the graph constructed by Alvarez-Ponce, Aguade et al. (2011), which we expanded by adding an additional node representing insulin and the insulin-like growth factors I and II (IGF1 and IGF2), and an arch representing the activation of the insulin and IGF1 receptors (InR and IGF1R) by these hormones. As in Alvarez-Ponce, Aguade et al. (2011), we eliminated from network-level analyses: 1) genes EIF4E2 and EIF4E3, as their encoded products likely act as negative regulators of insulin signaling (Joshi et al. 2004; Hernandez et al. 2005); 2) PTEN, as its encoded product does not interact with any other protein in our data set (for a review, see Vinciguerra and Foti 2006); and 3) genes CYTH1–4, whose encoded products occupy an unclear position in the pathway (Fuss et al. 2006; Hafner et al. 2006). Hence, network-level analyses were conducted on a total of 60 genes located at 58 genomic regions; and the final graph consists of 22 nodes (representing groups of paralogous genes) connected by 40 arcs (interactions), of which 33 are physical (fig. 1). 1382 MBE We first tested for a polarity in the action of positive selection along the pathway by comparing pathway positions of genes evolving under positive selection with positions of genes without signals of adaptive evolution. For each gene, pathway position, defined as the number of steps required for the transduction of the signal from InR and IGF1R (position 0) to each of the components of the pathway (positions 1–10), was computed as in Alvarez-Ponce, Aguade et al. (2011). Genes INS, IGF1, and IGF2 were assigned the position 1, as their encoded products act immediately upstream of these receptors (see fig. 1). Second, we evaluated whether genes evolving under positive selection differ from those without signals of adaptive evolution in terms of a number of metrics of centrality within the network: connectivity, betweenness and closeness (table 1). For each node, connectivity was computed as the number of nodes to which it is connected, betweenness was calculated as the number of shortest paths passing through it, and closeness as the inverse of the average length of the shortest paths to all the other nodes in the network. These metrics were computed using the Network Analysis plug-in for Cytoscape (Smoot et al. 2011), taking into account physical interactions among the studied proteins only. Additionally, these centrality measures were also computed taking into account the entire human interactome constructed by Bossi and Lehner (2009) and removing self-interactions (data not shown). Finally, we investigated whether genes evolving under positive selection encode proteins that tend to interact to each other. To that end, we performed a Monte Carlo test using as statistic the number of interactions involving two paralogous groups (nodes) with at least one gene evolving under positive selection (X). The statistical significance of X was evaluated by comparing the observed values with those calculated on a set of 10,000 randomizations of the network, each with the same nodes and the same number of interactions as the original one. We used two different randomization techniques. In method 1, each interaction was assigned by randomly choosing two different nodes (method 1). Additionally, we used a switching algorithm, which preserves the connectivity of each particular node (method 2). Each random network was generated from the original one by repeatedly choosing two interactions at random (e.g., A–B and C–D) and exchanging the edges (yielding A–D and C–B or A–C and B–D). In each random network, this process was iterated 100 m times, where m is the number of edges. P values were computed as the proportion of randomized networks with an X value equal to or higher than the observed one. We also performed these analyses using as statistic the number of interactions involving the encoded products of two particular genes, rather than paralogous groups, evolving under positive selection (X#). For these analyses, we assumed that, if two groups of paralogs interact, all proteins encoded by genes in one group interact with all proteins encoded by genes in the other group. Network-Level and Population Genetics Analysis · doi:10.1093/molbev/msr298 MBE FIG. 1. Structure of the insulin/TOR signal transduction pathway. Each node represents a paralogous group. Blue nodes contain genes that activate the pathway, whereas red nodes contain pathway inhibitors. Within each paralogous group, genes are represented as circles with the number corresponding to genes as listed below (for more details, see supplementary table S1, Supplementary Material online). White circles represent genes included in our network-level analysis, whereas circles in dark gray represent genes that were excluded. The nine genes for which we observed the signature of positive selection are represented by diamonds. Black, blue, and red edges represent protein–protein, metabolic, and transcriptional interactions, respectively. Gray gradations of the background represent the different cellular compartments where the proteins act in the activated pathway (from lighter to darker gray: extracellular action, plasma membrane, cytosol, and nucleus). Numbers on the lower part refers to pathway position (i.e., the number of steps required for signals transduction from the insulin receptor). Gene code: 1, IGF2; 1#, INS; 2, IGF1; 3, IGF1R; 4, INSR; 5, DOK1; 6, IRS1; 7, DOK3; 8, FRS3; 9, DOK2; 10, FRS2; 11, IRS2; 12, DOK4; 13, DOK6; 14, DOK5; 15, PIK3R3; 16, PIK3R1; 17, PIK3R2; 18, PIK3CD; 19, PIK3CB; 20, PIK3CA; 21, VEPH1; 22, PDPK1; 22#, AC141586.2; 23, AKT3; 24, AKT1; 25, AKT2; 26, PRKCZ; 27, PRKCE; 28, PRKCD; 29, PRKCI; 30, PRKCQ; 31, PRKCH; 32, PRKCB; 33, PRKCA; 34, PRKCG; 35, TSC1; 36, AC093151.2; 37, FOXO3; 38, FOXO1; 39, GSK3B; 40, GSK3A; 41, TSC2; 42, EIF2B5; 43, GYS2; 44, GYS1; 45, MYCL1; 46, MYCN; 47, MYC; 48, RHEB; 49, RHEBL1; 50, MTOR; 51, EIF4EBP3; 52, EIF4EBP1; 53, EIF4EBP2; 54, RPS6KB2; 55, RPS6KB1; 56, EIF4E2; 57, EIF4E3; 58, EIF4E; 59, EIF4E1B; 60, RPS6; 61, CYTH3; 62, CYTH1; 63, CYTH2; 64, CYTH4; 65, PTEN. Action of Positive Selection on Paralogous Genes To contrast whether genes belonging to the same paralogous group are similarly targeted by positive selection, we used a Monte Carlo method using as statistic the number of pairs of paralogs in which both exhibit signals of positive selection (Y). We tested whether the observed Y value is higher than expected at random from 10,000 random sets of paralogous groups. Each randomization had the same 65 genes and the same number of pairs of paralogs (112 pairs). Each pair of paralogs was assigned by randomly choosing two different genes. Results High Incidence of Positive Selection in the Insulin/ TOR Pathway We used population genetics data from seven human worldwide geographic areas to study the patterns of molecular evolution of the genomic regions containing 67 autosomal genes involved in the insulin/TOR signal transduction pathway. In total, we analyzed 4,442 SNPs spanning 19.3 Mb of the human genome (supplementary table S1, Supplementary Material online). After removing SNPs with MAF lower than 5% in the entire data set, 4,005 SNPs were used in the study, providing an average density of approximately 1 SNP/5 Kb. We combined three complementary methods (iHS, FST, and DDAF) to infer the action of positive selection in these genomic regions. Supplementary fig. S1–S3, Supplementary Material online show, for each continental group, the iHS, FST, and DDAF scores, respectively, for the SNPs located in large genomic regions centered on the genes of interest. These plots allow comparing observations for SNPs within the gene and nearby with SNPs in the surrounding region, thus, they provide a better visualization of potential signals of positive selection. Using a stringent criterion (see Materials and Methods), we identified footprints of potential positive selection in nine genes belonging to five different groups of paralogs (table 2, supplementary tables S4–S6, Supplementary Material online). For six of these genes, the signature of positive selection was observed in a single geographic area, whereas for the three other genes positive selection has acted in two distinct geographic regions, always from Middle East and North Africa, Europe, or Central South Asia regions. First, we observed the footprint of positive selection for three genes encoding docking proteins (DOK1, DOK2, and DOK5), belonging to the IRS paralogous group. DOK1 fulfills our criterion in Middle East and North Africa 1383 MBE Luisi et al. · doi:10.1093/molbev/msr298 Table 1. Metrics of Network Position Taking Into Account Protein–Protein Interactions. Paralogous Group INS InR IRS p85 p110 MELT PDK1 PKB PKC TSC1 FOXO GSK3 TSC2 EIF2BE GYS MYC RHEB TOR 4EBP S6K EIF4E RPS6 Pathway Positiona 21 0 1 2 3 4 4 5 5 5 6 6 6 7 7 7 7 8 9 9 10 10 Connectivity 1 4 6 3 1 2 3 7 7 2 2 5 3 1 1 2 2 5 2 4 2 1 Closeness 0.304 0.429 0.525 0.375 0.276 0.296 0.429 0.553 0.525 0.300 0.382 0.438 0.396 0.309 0.309 0.368 0.356 0.477 0.350 0.396 0.375 0.288 Betweenness 0.000 0.121 0.247 0.095 0.000 0.005 0.038 0.400 0.265 0.010 0.081 0.201 0.108 0.000 0.000 0.000 0.014 0.155 0.013 0.105 0.023 0.000 a Position across the upstream/downstream axis of the pathway, defined as the number of steps required for signal transduction from the insulin receptor (InR). and in Europe and presents also a significantly high genetic differentiation between East Asia and the rest of the regions according to both the FST and the DDAF scores. DOK2 exhibits the signature of positive selection in Central South Asia, as well as significantly high FST and DDAF scores between Sub-Saharan Africa and the other regions. We also observed significant iHS at DOK2 in Middle East and North Africa and in Europe. DOK5 meets the criterion in Middle East and North Africa and in Central South Asia and shows a significant degree of genetic differentiation in Sub-Saharan Africa (for both the FST and the DDAF scores), Europe (DDAF), and East Asia (FST). From the PKB group, AKT1 and AKT3 satisfy our criterion in Europe and Middle East and in North Africa, respectively. Both genes also exhibit a significantly high FST score (AKT1 for East Asia and AKT3 for Sub-Saharan Africa). Moreover, we detected signatures of positive selection for two genes that belong to the PKC paralogous group: PRKCI (in Oceania) and PRKCH (in Middle East and North Africa and in Europe). For the latter gene, we also identified an important degree of genetic differentiation between Sub-Saharan Africa and the other regions and a significantly high iHS in Central South Asia and in East Asia. RPS6KB2, from the S6K group, exhibits signatures of adaptive evolution in Central South Asia and a significantly high relevant genetic differentiation of the SubSaharan Africa area, as well as significant iHS in Middle East and North Africa and in East Asia. Finally, we observed the signature of positive selection in East Asia for EIF4E. It is interesting to note that for the seven genes for which we observed signals of positive selection in Middle 1384 East and North Africa, Europe, or Central South Asia (DOK1, DOK2, DOK5, AKT1, AKT3, PRKCH, and RPS6KB2), the selected haplotypes seem to have spread through all three areas. Indeed, in most cases, these genes meet our criterion in two geographic regions or at least exhibit a statistically significant iHS in two or three areas. Moreover, distributions along the genomic regions of the iHS at the SNP level are very similar in these geographic areas (e.g., see DOK1 and DOK2; supplementary fig. S1, Supplementary Material online). We carried out a complementary analysis using the software Sweep 1.1 (Sabeti et al. 2002), which indicates that the haplotypes for which we detect a selective sweep are shared between the three areas (see supplementary results, Supplementary Material online), which is consistent with a genome-wide scan performed by Pickrell et al. (2009). We considered whether there is an enrichment in genes with signals of positive selection among the insulin/TOR signal transduction pathway genes as compared with the set of 6,394 genomic regions that compose our genomic background. To that end, we performed a hypergeometric test in each geographic area separately (table 3). We observed that there is an overrepresentation of genes with the signature of positive selection in the insulin/TOR pathway genes in Middle East and North Africa (four genes under positive selection; P 5 0.0033), Europe (three genes; P 5 0.0167), and Central South Asia (three genes; P 5 0.0184). Influence of Network Structure on Positive Selection Having identified the genes involved in the insulin/TOR signal transduction pathway that likely evolved under positive selection (table 2), we investigated the relationship between the patterns of evolution of the studied genes and the position that their encoded products occupy in the network. For that purpose, we first evaluated whether the nine genomic regions showing footprints of positive selection differed from the 49 remaining ones in terms of centrality within the network and position along the upstream/downstream pathway axis. The position of genes along the pathway (defined as the number of steps required to transduce the signal from the insulin/IGF1 receptor—position 0—to the remaining elements in the pathway) does not significantly differ between both gene groups (Mann–Whitney’s U test, P 5 0.816; fig. 2A). Remarkably, genes evolving under positive selection encode proteins that have a significantly higher connectivity (P 5 0.0102; fig. 2B), closeness (P 5 0.0042; fig. 2C), and betweenness (P 5 0.0057; fig. 2D) within the insulin/TOR network. These results remain significant when all kinds of interactions (i.e., not only protein–protein interactions but also metabolic and transcriptional activation interactions) are considered (table 4). Thus, the impact of positive selection on the insulin/TOR pathway genes is related to the centralities of their encoded products in the interaction network, with genes acting at the center of the network being more likely to evolve under positive selection. MBE Network-Level and Population Genetics Analysis · doi:10.1093/molbev/msr298 Table 2. List of Genomic Regions with Footprints of Positive Selection. Paralogous group IRS Gene DOK1 DOK2 #SNPs 11 13 ZFa 96.2 68.1 Pb 2.914 3 10211 1.244 3 1025 q-valueb 2.245 3 1029*** 2.733 3 1024** iHS FST DDAF 11 13 10 101.0 93.9 63.1 4.342 3 10212 1.283 3 1029 2.310 3 1026 3.985 3 10210*** 8.428 3 1028*** 7.510 3 1025** EASIA FST DDAF 13 11 99.9 79.4 1.330 3 10210 2.009 3 1028 1.318 3 1028*** 1.016 3 1026*** SSAFR FST DDAF 49 41 162.7 146.1 4.375 3 1025 1.715 3 1025 7.544 3 1024* 4.80 3 1024* MENA iHS 45 213.5 4.815 3 10212 4.554 3 10210*** Geographic area MENAc Method iHS FST EURc EUR 3.677 3 10 8.294 3 1026 2.295 3 10231*** 2.271 3 1024** SSAFR FST DDAF 89 80 409.0 285.4 3.186 3 10220 4.178 3 1029 3.977 3 10217*** 6.361 3 1027*** MENAc iHS DDAF 88 83 419.3 241.1 6.75 3 10222 1.255 3 1024 1.831 3 10219*** 2.244 3 1023* DDAF 79 237.4 4.478 3 1025 8.471 3 1024* iHS FST DDAF 72 89 71 310.6 326.8 270.2 2.863 3 10 7.289 3 10211 5.026 3 10210 2.792 3 10212*** 1.011 3 1028*** 4.753 3 1028*** EASIA FST 89 276.9 2.886 3 1026 7.665 3 1025*** MENAc iHS FST DDAF 25 25 24 95.0 98.7 89.4 1.273 3 1024 4.874 3 1025 2.671 3 1024 2.089 3 1023 * 8.717 3 1024* 4.194 3 1023* EASIA FST 25 100.1 3.391 3 1025 6.12 3 1024* 25 S6K c FST 58 185.4 4.542 3 10 7.75 3 1024* EURc iHS FST 57 58 230.1 192.1 7.439 3 10210 1.136 3 1025 4.552 3 1028*** 2.445 3 1024** SSAFR FST DDAF 132 116 508.3 365.2 1.190 3 10217 5.331 3 1028 1.661 3 10214*** 4.497 3 1026*** MENAc iHS DDAF 117 116 749.4 369.6 2.803 3 10255 2.307 3 1028 5.832 3 10252*** 1.398 3 1026** EURc iHS DDAF 115 112 808.6 369.7 2.063 3 10265 3.029 3 1029 1.287 3 10261*** 2.224 3 1027*** CSASIA iHS 125 430.8 9.120 3 10212 6.779 3 10210*** PRKCI OCEc iHS FST DDAF 25 29 24 117.7 135.4 103.5 2.179 3 1027 3.956 3 1028 5.947 3 1026 8.343 3 1026*** 1.614 3 1026*** 1.360 3 1024** RPS6KB2 SSAFR FST DDAF 20 17 112.5 91.6 8.095 3 1029 3.471 3 1027 5.319 3 1027*** 2.167 3 1025*** iHS 16 83.6 1.730 3 1026 5.043 3 1025** PRKCH MENA CSASIA EASIA eIF-4E 214 SSAFR AKT3 PKC 2.879 3 1024** 234 1.382 3 10 357.4 170.3 CSASIA AKT1 160.2 44 49 c EUR PKB 46 iHS FST CSASIA DOK5 iHS 25 EIF4E EASIAc U 3 3 3 3 212 iHS FST DDAF iHS 15 20 15 17 116.8 100.8 68.8 118.5 3.546 3.746 7.002 2.780 10 1027 1025 10211 iHS FST DDAF 33 42 33 121.9 163.3 125.5 3.452 3 1025 5.059 3 1027 1.400 3 1025 2.766 1.780 1.388 1.059 3 3 3 3 10210*** 1025*** 1023* 1029*** 6.883 3 1024* 1.755 3 1025*** 3.055 3 1024** NOTE. N-Genes were considered to have evolved under positive selection in a given geographic area if q-value , 2.381 103 for the iHS test and for either the FST or DDAF tests (see Materials and Methods). Geographic areas for which any of these scores is significant are also displayed. SSAFR, Sub-Saharan Africa; MENA, Middle East and North Africa; EUR, Europe; CSASIA, Central South Asia; EASIA, East Asia; OCE, Oceania. a Fisher’s combination test statistic computed from the empirical P-values obtained for SNPs within each genomic region. b Significance associated to ZF. c Geographic areas for which the gene is considered to have evolved under positive selection. *q-value , 2.381 103 (significance at 5% with Bonferroni correction for seven regions studied with three methods). **q-value , 4.762 104 (significance at 1% with Bonferroni correction). ***q-value , 4.762 105 (significance at 1% with Bonferroni correction). 1385 MBE Luisi et al. · doi:10.1093/molbev/msr298 Table 3. Enrichment Analysis for Signals of Positive Selection among the Insulin/TOR Pathway Genes. Geographic Area SSAFR MENA EUR CSASIA EASIA AME OCE Allf Na 6389 6394 6384 6393 6365 6274 6308 6242 Sb 18 100 97 100 151 129 136 631 mc 65 65 65 65 65 65 65 65 Kd 0 4 3 3 1 0 1 9 Pe 0.168 0.003** 0.017* 0.018* 0.459 0.743 0.411 0.116 NOTE.—SSAFR, Sub-Saharan Africa; MENA, Middle East and North Africa; EUR, Europe; CSASIA, Central South Asia; EASIA, East Asia; AME, America; OCE, Oceania. a Genes in the genomic region-level background distribution for which a score has been computed. b Genes evolving under positive selection in the genomic region-level background distribution. c Genes in the pathway for which a score has been computed. d Genes evolving under positive selection in the pathway. e Statistical significance of the Hypergeometric test. f Union of all geographic regions. *P , 0.05; **P , 0.01. Some gene features such as expression level, expression breadth (the number of different tissues in which a gene is expressed), and the length of the encoded proteins can also affect the action of natural selection on genes (Duret and Mouchiroud 2000; Pal et al. 2001; Subramanian and Kumar 2004; Kosiol et al. 2008). Therefore, a putative correlation between these features and connectivity measures could potentially account for the observed trend. In order to rule out this possibility, we used partial correlation analysis to test for the association between positive selection (which we encoded as a binary variable) and centrality measures while controlling for the aforementioned variables simultaneously (see Hardy 1993; Cohen et al. 2003). Gene expression levels and breadths and the length of the encoded proteins were obtained from Alvarez-Ponce, Aguade et al. (2011). This analysis shows that the relationship between the patterns of positive selection and the centrality measures remains significant after controlling for these parameters (Spearman’s partial rank correlation coefficient, q 5 0.397, P 5 0.005 for connectivity; q 5 0.417, P 5 0.003 for closeness; and q 5 0.412, P 5 0.004 for betweenness). Therefore, the observed relationship between the pathway structure and the impact of positive selection is not attributable to confounding factors that influence gene evolution. 7.5 7.5 7.0 7.0 6.5 6.5 6.0 6.0 Connectivity Position 5.5 5.0 4.5 4.0 5.5 5.0 4.5 3.5 4.0 3.0 3.5 2.5 2.0 Mean Mean±SE Mean±1.96*SE NS 3.0 2.5 S 0.56 NS S Mean Mean±SE Mean±1.96*SE 0.34 0.32 0.54 0.30 0.52 0.28 0.26 0.24 0.48 Betweenness Closeness 0.50 0.46 0.44 0.18 0.16 0.42 0.14 0.40 0.12 0.38 0.36 0.22 0.20 Mean Mean±SE Mean±1.96*SE NS S 0.10 0.08 0.06 NS S Mean Mean±SE Mean±1.96*SE FIG. 2. Comparison between genes evolving under positive selection and the remaining ones. Mean and standard error (SE) of topological measures for the nine genes under positive selection (S) and the 49 genes without signals of adaptive evolution (NS): (A) for pathway position; (B) for connectivity; (C) for closeness; and (D) for betweenness. Centrality measures were calculated taking into account the physical interactions within the insulin/TOR pathway. 1386 MBE Network-Level and Population Genetics Analysis · doi:10.1093/molbev/msr298 Table 4. Relationship between the Structure of the Insulin/TOR Pathway and the Impact of Positive Selection. Mann–Whitney test Networkb — (i) (ii) (iii) Parameter Position Connectivity Closeness Betweenness Connectivity Closeness Betweenness Connectivity Closeness Betweenness Mean for genes under positive selection 4.667 5.778 0.500 0.244 6.889 0.536 0.188 35.444 0.291 0.0115 Mean for genes without signal of positive selection 4.673 3.612 0.410 0.116 4.571 0.455 0.098 38.087 0.294 0.006 U 209.5 103 91 95.5 116 101 101.5 186 179 153 P 0.816 0.010* 0.004** 0.006** 0.024* 0.009** 0.009** 0.646 0.538 0.228 Partial correlationa r 20.203 0.397 0.417 0.412 0.432 0.450 0.419 20.132 20.170 20.074 P 0.167 0.005** 0.003** 0.004** 0.002** 0.001** 0.003** 0.372 0.250 0.618 a Spearman’s partial correlation between the impact of positive selection and network parameters controlling for gene expression level and breadth and length of the encoded proteins (see Materials and Methods). b Three different sets of interactions were used for centrality calculations: (i) protein-protein interactions within the insulin/TOR pathway; (ii) all kinds of interactions within the pathway (i.e., protein-protein, metabolic, and transcriptional activation interactions); and (iii) protein-protein interactions from the whole interactome. *P, 0.05; **P, 0.01. Likewise, partial correlation analysis confirms the absence of association between the pathway position and the action of positive selection (q 5 0.203, P 5 0.167). We also considered the effect on our observations of the threshold used to contrast whether genes evolved under positive selection (supplementary table S7, Supplementary Material online). The connectivity, closeness, and betweenness within the insulin/TOR network remain significantly higher for genes that meet the criterion to invoke positive selection when using a significance level of 0.01. If we reduce this level to 0.001, the differences of centrality measures are still higher for the genes that exhibit the footprint of positive selection, even if most of these differences do not reach significance. This can be attributed to a lack of power since we detected signatures of selection for only five of the former nine genes while using such a low significance level. Nevertheless, these results strongly support the association between the impact of positive selection on genes and the central position they occupy in the insulin/TOR pathway. We additionally studied the differences in centrality between the nine genes evolving under positive selection and the nonselected ones taking into account the entire interactome (Bossi and Lehner 2009) rather than interactions among the insulin/TOR pathway components only. None of the three centrality measures presents significant differences between both gene groups (table 4). Equivalent results were obtained using partial correlation analysis to account for expression breadth, expression level, and protein length (table 4). Finally, we considered whether the insulin/TOR pathway proteins encoded by genes evolving under positive selection tend to interact to each other. Of 33 physical interactions (fig. 1), five connect two nodes containing one or more genes evolving under positive selection (X 5 5). Of 10,000 randomizations of the network, only 128 exhibit an X value higher or equal to the observed one using method 1 (P 5 0.0128). Therefore, genes evolving under positive selection in the pathway are more likely to encode proteins that interact to each other than expected in a random network. However, this tendency vanishes when a randomization technique that preserves the degree distribution of each node is applied (method 2; P 5 0.370), suggesting that the observed excess of interactions between proteins encoded by genes evolving under positive selection is the result of the higher degree of these proteins. We additionally performed a similar analysis considering a network in which nodes represent particular proteins, rather than groups of proteins encoded by paralogous genes. This network contains 58 nodes connected by 420 interactions, of which 21 connect two proteins encoded by genes with signals of positive selection. This value is again higher than expected in a random network using method 1 (X# 5 21; P 5 0.0001), but not using method 2 (P 5 0.325). Influence of Paralogy on Positive Selection We considered whether the probability that a gene evolved under positive selection is related to the number of paralogous copies. The number of paralogs involved in the insulin/TOR pathway (supplementary table S1, Supplementary Material online) does not significantly differ between the nine genomic regions with footprints of positive selection and the 49 remaining ones (Mann–Whitney’s U test, P 5 0.061). Consequently, the impact of positive selection on the insulin/TOR pathway genes seems to be independent of the size of the paralogous group. We contrasted whether genes in the same paralogous group are similarly targeted by natural selection by using a Monte Carlo method to test whether the number of pairs of paralogs where both genes show the signature of positive selection (Y 5 5) is higher than expected at random. We observed a greater or equal Y value for only 466 permutations of 10,000 (P 5 0.0466). Therefore, when a gene evolves under positive selection, the probability that its paralogs also show signatures of adaptation increases. 1387 Luisi et al. · doi:10.1093/molbev/msr298 Discussion High Incidence of Positive Selection in the Insulin/ TOR Signal Transduction Pathway Genes The present analysis aims at detecting recent adaptive events (within the last 5,000–45,000 years) in human populations worldwide at genes involved in the insulin/TOR signal transduction pathway and relating these events to the structure of the pathway. To detect footprints of positive selection, we used methods devised to identify long high-frequency haplotypes (iHS) and high levels of genetic differentiation among populations (FST and DDAF). However, processes such as genetic drift and demographic history can also affect differentiation among populations and linkage disequilibrium patterns, making difficult to identify signals of natural selection. In particular, dramatic changes in population size such as bottlenecks and population expansions occurred during the recent human history, affecting allele frequencies (Teshima et al. 2006). Moreover, populations from America and Oceania were isolated and had small effective population sizes, and consequently must have faced dramatic genetic drift, resulting in important allele frequency fluctuations that affected the entire genome. Since nonselective processes affect the whole genome, in opposition to positive selection, which acts on specific loci, the putative effect of these factors can be ruled out by comparing the locus of interest with an empirical distribution built from all SNPs in the genome. However, it should be noted that genomic regions potentially evolving under positive selection detected using this approach might also represent false positives (Teshima et al. 2006; Pickrell et al. 2009). Therefore, we used a stringent criterion to reduce the amount of false positives raised by alternative explanations and to maximize the detection of specific footprints of positive selection. For that purpose, we combined the results from different methods based on extended haplotype homozygosity and genetic differentiation, which are expected to be differentially affected by nonselective processes, thus providing nonoverlapping false negatives. However, despite these precautions, demographic factors can increase the variance among genes, and thus, the possibility of false positives cannot be completely discarded. The polymorphism data that we used is based on SNPs that were previously ascertained on a reduced number of HapMap samples from Africa, Europe, and East Asia. Therefore, the allele frequency spectrum observed is skewed toward high frequencies, to a different extent in the seven geographic regions. To reduce this variation among geographic regions, we removed rare variants by applying a filter according to the MAF. Moreover, the methods that we used, which are based on genetic differentiation and long-range haplotypes, reduce the effect of ascertainment bias on our results as they do not require low-frequency variants. Indeed, Romero et al. (2009) showed that this bias has a moderate effect to study population differentiation using SNP data from HGDP-CEPH; and Voight et al. (2006) claimed that 1388 MBE the iHS method is robust to regional variation in SNP ascertainment. Among the 65 studied genomic regions, we identified a total of nine that have potentially evolved under positive selection in human populations (table 2). The nine insulin/ TOR pathway genes that fall within these genomic regions belong to five paralogous groups: IRS (insulin receptor substrate), PKB (protein kinase B), PKC (protein kinase C), S6K (S6 kinase), and eIF-4E (eukaryotic translation initiation factor 4E). This relatively high incidence of adaptive evolution contrasts with a previous comparative genomics analysis showing little evidence of positive selection in the vertebrate insulin/TOR pathway (Alvarez-Ponce, Aguade et al. 2011). On the other hand, genes detected as evolving under positive selection in both analyses remarkably overlap. In vertebrates, footprints of positive selection were detected at AKT3, PRKCD, and IRS4 (the latter of which was excluded from the present analysis), which belong to the PKB, PKC, and IRS paralogous groups, respectively (Alvarez-Ponce, Aguade et al. 2011). Interestingly, our analyses revealed that AKT3 also shows a pattern of positive selection and that the PKC and IRS paralogous groups include two and three genes, respectively, with signatures of positive selection. Consistently, in the present study, we found that, in the insulin/TOR pathway, if a gene is targeted by recent positive selection, the frequency at which genes in the same paralogous group show signals of positive selection is higher than expected at random. This observation could potentially be explained by paralogous genes encoding proteins with similar structures and functions and occupying similar positions in the pathway. Taken together, these results suggest that positive selection repeatedly targets specific groups of paralogous genes within the insulin/TOR pathway. A close examination of the gene content of the regions that evolved under positive selection, together with an analysis of the levels of linkage disequilibrium among SNPs, show that some of them also contain other genes linked to the genes of interest that might be responsible for the observed signal of positive selection (supplementary fig. S4, Supplementary Material online). Two of the studied genomic regions (DOK1 and RPS6KB2) contain a high number of additional genes with a high level of linkage disequilibrium, and thus, the signals of selection detected in these regions cannot be clearly ascribed to the insulin/TOR pathway genes. However, excluding these two genomic regions from all our analyses does not change the observed relationship between the pathway structure and the impact of positive selection (supplementary table S8, Supplementary Material online). The distribution of footprints of positive selection among the different geographic areas shows that positive selection in the insulin/TOR pathway genes occurred mainly in Middle East and North Africa, Europe, and Central South Asia (table 3). In these regions, the proportion of genes evolving under positive selection in the insulin/TOR pathway is significantly higher than that observed in the genomic background, whereas in other geographic Network-Level and Population Genetics Analysis · doi:10.1093/molbev/msr298 regions, we observe a very low incidence of positive selection (table 3). The same tendency is observed if DOK1 and RPS6KB2 are excluded: the test remains significant in Middle East and North Africa, and it is marginally significant in Europe and in Central South Asia (supplementary table S9, Supplementary Material online). It is worth noting that East Asia, America, and Oceania are the regions in which we detected a highest incidence of positive selection across our genomic background containing 6,394 genomic regions. This suggests that the absence of signals of positive selection in the insulin/TOR pathway genes in these three geographic areas cannot be attributed to a difficulty of detecting selective events in these areas. More problematic is the case of Sub-Saharan Africa, where we detected very few genomic regions that evolved under positive selection across the genome (18 regions of 6,394). This could be due to the general low linkage disequilibrium observed in these populations. Indeed, in these populations 1) hot spots are more uniformly distributed along the genome (1000 Genomes Project Consortium 2010); and 2) historically, recombination has impacted the genome to a greater extent because they have a higher effective population size, and they did not face any strong bottleneck like other populations. In a previous study of the nucleotide diversity of the insulin minisatellite (INS VNTR) region in another set of populations from Sub-Saharan Africa, Europe, and Asia, a pattern similar to the one observed for the other insulin/TOR pathway genes was observed, that is, potential selective events in non-African populations (Stead et al. 2003). Moreover, we found that the haplotypes that have undergone a selective sweep are shared among populations from Middle East and North Africa, Europe, and Central South Asia (see supplementary results, Supplementary Material online). Interestingly, most of those potentially shared sweeps also show significant FST scores between Sub-Saharan Africa and the other regions (table 2), which can be the result of the high proportion of lower DAF values observed in Sub-Saharan Africa compared with the other areas (negative DDAF; results not shown). Therefore, those signals of genetic differentiation could also be a consequence of the shared signatures in non-African populations, rather than evidence for selection in Africans. This suggests that positive selection acted early during the migration out of Africa (Stead et al. 2003) and that shared evolutionary history (Gutenkunst et al. 2009; Moorjani et al. 2011) accounts for the similar patterns observed in these three areas. It has been observed that, in the United States, the prevalence of obesity and related diseases such as insulin resistance and type 2 diabetes is much lower in individuals with European and Asian ancestry (compared with Native American, Hispanic, Afro-American, and Pacific Islander groups) (Wang and Beydoun 2007; Anderson and Whitaker 2009). This observation underlines the genetic basis of these diseases, also supported by a number of heritability studies (Rankinen et al. 2006; Mathias et al. 2009; O’Rahilly 2009; Wang et al. 2009; Hinney et al. 2010). The implication of the insulin/TOR signal MBE transduction pathway is clear since dysfunction of its components often lead to insulin resistance, diabetes, obesity, and cancer (Oldham and Hafen 2003; LeRoith et al. 2004; Taguchi and White 2008). The enrichment of adaptation events at genes involved in the insulin/TOR pathway in European, Central South Asian, and North African and Middle Eastern populations could potentially be related to the differences among populations in the prevalence of these metabolic disorders, although a direct link to specific phenotypes cannot be established yet. Thus, further analysis of the insulin/TOR pathway genes evolving under positive selection might provide insight into the etiology of these metabolic disorders. Distribution of Positive Selection Across the Insulin/ TOR Pathway In the last decade, the availability of large-scale interaction data has allowed us to investigate the distribution of evolutionary forces across functional networks. A number of analyses have shown that there is a relationship between the structure of metabolic, regulatory, and protein–protein interaction networks and the patterns of molecular evolution of their components (for a review, see Cork and Purugganan 2004; Eanes 2011; Zera 2011). Most of these studies have focused on the levels of selective constraint acting on genes, which can be inferred from the nonsynonymous to synonymous divergence ratios, dN/dS, obtained from betweenspecies comparisons. Values of dN/dS near to one are expected for genes evolving neutrally, whereas values lower than one are indicative of purifying selection maintaining the sequence of the encoded proteins. Here, we considered whether the incidence of positive selection in human populations is related to the structure of the insulin/TOR pathway. It has been observed that the dN/dS ratios correlate with the position of genes along the upstream/downstream axis of the insulin/TOR signal transduction pathway, with downstream genes exhibiting lower dN/dS values both in Drosophila (Alvarez-Ponce et al. 2009; Alvarez-Ponce, Guirao-Rico et al. 2011) and vertebrates (Alvarez-Ponce, Aguade et al. 2011). In addition to purifying selection, the dN/dS ratios are also affected by the action of positive selection, and therefore, this pattern could potentially result from a higher incidence of positive selection in the upstream part of the pathway. However, in the present intraspecific analysis, we found no evidence for a polarity in the incidence of positive selection along the pathway (fig. 2A). These results contrast with a number of analyses showing that upstream genes in metabolic and transcription regulatory networks are generally subject to stronger natural selection (Rausher et al. 1999; Sharkey et al. 2005; Livingstone and Anderson 2009; Ramsay et al. 2009; Wright and Rausher 2010; Rhone et al. 2011) (but see Montanucci et al. 2011). Whereas genes whose encoded products occupy more central positions in a protein–protein interaction network tend to evolve under stronger purifying selection (Fraser et al. 2002; Hahn and Kern 2005; Lemos et al. 2005; Montanucci et al. 2011), positive selection seems to preferentially target genes acting at the periphery of the human 1389 MBE Luisi et al. · doi:10.1093/molbev/msr298 protein interaction network (Kim et al. 2007). In contrast, we found that adaptive events are more frequent at genes acting at the most central positions of the insulin/TOR pathway. Indeed, proteins encoded by genes evolving under positive selection show higher connectivity, betweenness, and closeness centralities (fig. 2B–D). These most highly connected and central proteins in the pathway could potentially exert a higher influence on its function. These contrasting results indicate that the relationship between positive selection and measures of centrality deserves further consideration. Moreover, the three centrality measures calculated for the insulin/TOR pathway are highly correlated (0.90 q 0.98). Therefore, the observed associations between the impact of positive selection and either connectivity, closeness, or betweenness are not independent and may reflect the same underlying factors. When centrality measures were calculated from the entire interactome (Bossi and Lehner 2009) rather than hand-curated protein–protein interaction data among insulin/TOR pathway proteins (fig. 1), no difference was detected among both gene groups. Therefore, intrapathway interactions seem to have a more relevant effect on the impact of positive selection than interactions between the insulin/TOR pathway genes and the other genes. However, it should be noted that high-throughput interaction data are subject to a high proportion of false positives and false negatives (Bader et al. 2004; Deeds et al. 2006), which might obscure interactome-level analyses. Finally, we found that genes evolving under positive selection tend to encode proteins that physically interact to each other. However, this trend vanishes when network randomization techniques that preserve the degree of each node are used. Therefore, the tendency is probably the result of genes with signatures of positive selection being involved in more interactions and hence being more likely to interact to each other. These contrasting results highlight the relevance of choosing an adequate null model for testing network properties. Further analyses of other pathways both at the interspecific and the intraspecific levels are warranted to establish whether the patterns observed in the insulin/TOR pathway are particular to this pathway or represent a general trend across the whole interactome. Conclusion We found that positive selection acted on nine of the studied genes involved in the human insulin/TOR signal transduction pathway, mostly in European, Central South Asian, and North African and Middle Eastern populations. The impact of positive selection on gene evolution depends on the position that the encoded products occupy in the pathway, with the most highly connected and central genes of the network being preferentially targeted by positive selection. These observations broaden current knowledge on how natural selection distributes across pathways within an intraspecific framework, more particularly considering short periods of population-specific adaptation. 1390 Supplementary Material Supplementary results and figures S1–S4 and tables S1–S9 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/). Acknowledgments Bioinformatics services were kindly provided by the Genomic Diversity node, Spanish Bioinformatic Institute. This work was funded by grants BFU2010-19443 (subprogram BMC) awarded by Ministerio de Educación y Ciencia (Spain) and the Direcció General de Recerca, Generalitat de Catalunya (Grup de Recerca Consolidat 2009 SGR 1101). P.L. is supported by a PhD fellowship from ‘‘Acción Estratégica de Salud, en el marco del Plan Nacional de Investigación Cientı́fica, Desarrollo e Innovación Tecnológica 2008–2011’’ from Instituto de Salud Carlos III. References Alvarez-Ponce D, Aguadé M, Rozas J. 2009. Network-level molecular evolutionary analysis of the insulin/TOR signal transduction pathway across 12 Drosophila genomes. Genome Res. 19:234–242. Alvarez-Ponce D, Aguadé M, Rozas J. 2011. Comparative genomics of the vertebrate insulin/TOR signal transduction pathway: a network-level analysis of selective pressures. Genome Biol Evol. 3:87–101. Alvarez-Ponce D, Guirao-Rico S, Orengo DJ, Segarra C, Rozas J, Aguadé M. 2011. Molecular population genetics of the insulin/ TOR signal transduction pathway: a network-level analysis in Drosophila melanogaster. Mol Biol Evol. 29:123–132. Anderson SE, Whitaker RC. 2009. Prevalence of obesity among US preschool children in different racial and ethnic groups. Arch Pediatr Adolesc Med. 163:344–348. Bader JS, Chaudhuri A, Rothberg JM, Chant J. 2004. Gaining confidence in high-throughput protein interaction networks. Nat Biotechnol. 22:78–85. Barreiro LB, Laval G, Quach H, Patin E, Quintana-Murci L. 2008. Natural selection has driven population differentiation in modern humans. Nat Genet. 40:340–345. Barton MD, Delneri D, Oliver SG, Rattray M, Bergman CM. 2010. Evolutionary systems biology of amino acid biosynthetic cost in yeast. PLoS One 5:e11935. Beaumont M, Nichols R. 1996. Evaluating loci for use in the genetic analysis of population structure. Proc R Soc Lond B Biol Sci. 263:1619–1626. Benjamini Y, Hochberg Y. 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc. 57:289–300. Bossi A, Lehner B. 2009. Tissue specificity and the human protein interaction network. Mol Syst Biol. 5:260. Cann HM, de Toma C, Cazes L, et al. 2002. (41 co-authors).A human genome diversity cell line panel. Science 296:261–262. Codoñer FM, Fares MA. 2008. Why should we care about molecular evolution? Evol Bioinform Online. 14:29–38. Cohen J, Cohen P, West S, Aiken L. 2003. Applied multiple regression/correlation analysis for the behavioral sciences. Mahwah (NJ): Lawrence Erlbaum Associates. Cork JM, Purugganan MD. 2004. The evolution of molecular genetic pathways and networks. Bioessays 26:479–484. Deeds EJ, Ashenberg O, Shakhnovich EI. 2006. A simple physical model for scaling in protein-protein interaction networks. Proc Natl Acad Sci U S A. 103:311–316. Network-Level and Population Genetics Analysis · doi:10.1093/molbev/msr298 Diamond J. 2003. The double puzzle of diabetes. Nature 423: 599–602. Diamond J. 2011. Medicine: diabetes in India. Nature 469:478–479. Duret L, Mouchiroud D. 2000. Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol Biol Evol. 17:68–74. Eanes WF. 2011. Molecular population genetics and selection in the glycolytic pathway. J Exp Biol. 214:165–171. Fares MA, Ruiz-González MX, Labrador JP. 2011. Protein coadaptation and the design of novel approaches to identify protein-protein interactions. IUBMB Life. 63:264–471. Flowers JM, Sezgin E, Kumagai S, Duvernell DD, Matzkin LM, Schmidt PS, Eanes WF. 2007. Adaptive evolution of metabolic pathways in Drosophila. Mol Biol Evol. 24:1347–1354. Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW. 2002. Evolutionary rate in the protein interaction network. Science 296:750–752. Fuss B, Becker T, Zinke I, Hoch M. 2006. The cytohesin Steppke is essential for insulin signalling in Drosophila. Nature 444:945–948. Gardner M, Bertranpetit J, Comas D. 2008. Worldwide genetic variation in dopamine and serotonin pathway genes: implications for association studies. Am J Med Genet B Neuropsychiatr Genet. 147B:1070–1075. Gonzalez-Neira A, Ke X, Lao O, et al. (14 co-authors). 2006. The portability of tagSNPs across populations: a worldwide survey. Genome Res. 16:323–330. Grossman SR, Shylakhter I, Karlsson EK, et al. (13 co-authors). 2010. A composite of multiple signals distinguishes causal variants in regions of positive selection. Science 327:883–886. Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD. 2009. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 5:e1000695. Hafner M, Schmitz A, Grune I, Srivatsan SG, Paul B, Kolanus W, Quast T, Kremmer E, Bauer I, Famulok M. 2006. Inhibition of cytohesins by SecinH3 leads to hepatic insulin resistance. Nature 444:941–944. Hahn MW, Kern AD. 2005. Comparative genomics of centrality and essentiality in three eukaryotic protein-interaction networks. Mol Biol Evol. 22:803–806. Hardy M. 1993. Regression with dummy variables. London: Sage Publications. Hernandez G, Altmann M, Sierra JM, Urlaub H, Diez del Corral R, Schwartz P, Rivera-Pomar R. 2005. Functional analysis of seven genes encoding eight translation initiation factor 4E (eIF4E) isoforms in Drosophila. Mech Dev. 122:529–543. Hinney A, Vogel CI, Hebebrand J. 2010. From monogenic to polygenic obesity: recent advances. Eur Child Adolesc Psychiatry. 19:297–310. International HapMap Consortium. 2007. A second generation human haplotype map of over 3.1 million SNPs. Nature 449:851–861. Jakobsson M, Scholz SW, Scheet P, et al. (24 co-authors). 2008. Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451:998–1003. Joshi B, Cameron A, Jagus R. 2004. Characterization of mammalian eIF4E-family members. Eur J Biochem. 271: 2189–2203. Jost L. 2008. G(ST) and its relatives do not measure differentiation. Mol Ecol. 17:4015–4026. Karolchik D, Kuhn RM, Baertsch R, et al. (25 co-authors). 2008. The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res. 36:D773–D779. Kelley JL, Madeoy J, Calhoun JC, Swanson W, Akey JM. 2006. Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res. 16:980–989. MBE Kim PM, Korbel JO, Gerstein MB. 2007. Positive selection at the protein network periphery: evaluation in terms of structural constraints and cellular context. Proc Natl Acad Sci U S A. 104:20274–20279. King H, Aubert RE, Herman WH. 1998. Global burden of diabetes, 1995-2025: prevalence, numerical estimates, and projections. Diabetes Care 21:1414–1431. Kirkness EF, Haas BJ, Sun W, et al. (71 co-authors). 2010. Genome sequences of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle. Proc Natl Acad Sci U S A. 107:12168–12173. Kosiol C, Vinar T, da Fonseca RR, Hubisz MJ, Bustamante CD, Nielsen R, Siepel A. 2008. Patterns of positive selection in six Mammalian genomes. PLoS Genet. 4:e1000144. Lemos B, Bettencourt BR, Meiklejohn CD, Hartl DL. 2005. Evolution of proteins and gene expression levels are coupled in Drosophila and are independently associated with mRNA abundance, protein length, and number of protein-protein interactions. Mol Biol Evol. 22:1345–1354. Leopoldt D, Hanck T, Exner T, Maier U, Wetzker R, Nurnberg B. 1998. Gbetagamma stimulates phosphoinositide 3-kinasegamma by direct interaction with two domains of the catalytic p110 subunit. J Biol Chem. 273:7024–7029. LeRoith D, Taylor S, Olefsky J. 2004. Diabetes mellitus: a fundamental and clinical text. Philadelphia (PA): Lippincott. Lewontin R, Krakauer J. 1973. Distribution of gene frequency as a test of the theory of the selective neutrality of polymorphisms. Genetics 74:175–195. Li JZ, Absher DM, Tang H, et al. (11 co-authors). 2008. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319:1100–1104. Livingstone K, Anderson S. 2009. Patterns of variation in the evolution of carotenoid biosynthetic pathway enzymes of higher plants. J Hered. 100:754–761. Loewe L. 2009. A framework for evolutionary systems biology. BMC Syst Biol. 3:27. Mathias RA, Deepa M, Deepa R, Wilson AF, Mohan V. 2009. Heritability of quantitative traits associated with type 2 diabetes mellitus in large multiplex families from South India. Metabolism 58:1439–1445. Montanucci L, Laayouni H, Dall’Olio GM, Bertranpetit J. 2011. Molecular evolution and network-level analysis of the Nglycosylation metabolic pathway across primates. Mol Biol Evol. 28:813–823. Moorjani P, Patterson N, Hirschhorn JN, Keinan A, Hao L, Atzmon G, Burns E, Ostrer H, Price AL, Reich D. 2011. The history of African gene flow into Southern Europeans, Levantines, and Jews. PLoS Genet. 7:e1001373. Oldham S, Hafen E. 2003. Insulin/IGF and target of rapamycin signaling: a TOR de force in growth control. Trends Cell Biol. 13:79–85. 1000 Genomes Project Consortium. 2010. A map of human genome variation from population-scale sequencing. Nature 467: 1061–1073. O’Rahilly S. 2009. Human genetics illuminates the paths to metabolic disease. Nature 462:307–314. Pal C, Papp B, Hurst LD. 2001. Highly expressed genes in yeast evolve slowly. Genetics 158:927–931. Peng G, Luo L, Siu H, et al. (12 co-authors). 2009. Gene and pathwaybased second-wave analysis of genome-wide association studies. Eur J Hum Genet. 18:111–117. Pickrell JK, Coop G, Novembre J, et al. (11 co-authors). 2009. Signals of recent positive selection in a worldwide sample of human populations. Genome Res. 19:826–837. Ramsay H, Rieseberg LH, Ritland K. 2009. The correlation of evolutionary rate with pathway position in plant terpenoid biosynthesis. Mol Biol Evol. 26:1045–1053. 1391 Luisi et al. · doi:10.1093/molbev/msr298 Rankinen T, Zuberi A, Chagnon YC, Weisnagel SJ, Argyropoulos G, Walts B, Perusse L, Bouchard C. 2006. The human obesity gene map: the 2005 update. Obesity (Silver Spring) 14:529–644. Rausher MD, Miller RE, Tiffin P. 1999. Patterns of evolutionary rate variation among genes of the anthocyanin biosynthetic pathway. Mol Biol Evol. 16:266–274. Reiner AP, Carlson CS, Ziv E, Iribarren C, Jaquish CE, Nickerson DA. 2007. Genetic ancestry, population substructure, and cardiovascular disease-related traits among African-American participants in the CARDIA Study. Hum Genet. 121:565–575. Rhone B, Brandenburg JT, Austerlitz F. 2011. Impact of selection on genes involved in regulatory network: a modelling study. J Evol Biol. 24:2087–2098. Riley RM, Jin W, Gibson G. 2003. Contrasting selection pressures on components of the Ras-mediated signal transduction pathway in Drosophila. Mol Ecol. 12:1315–1323. Romero IG, Manica A, Goudet J, Handley LL, Balloux F. 2009. How accurate is the current picture of human genetic variation? Heredity 102:120–126. Rosenberg NA. 2006. Standardized subsets of the HGDP-CEPH Human Genome Diversity Cell Line Panel, accounting for atypical and duplicated samples and pairs of close relatives. Ann Hum Genet. 70:841–847. Sabeti PC, Reich DE, Higgins JM, et al. (17 co-authors). 2002. Detecting recent positive selection in the human genome from haplotype structure. Nature 419:832–837. Scheet P, Stephens M. 2006. A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet. 78:629–644. Sharkey TD, Yeh S, Wiberley AE, Falbel TG, Gong D, Fernandez DE. 2005. Evolution of the isoprene biosynthetic pathway in kudzu. Plant Physiol. 137:700–712. Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T. 2011. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27:431–432. Stajich JE, Block D, Boulez K, et al. (12 co-authors). 2002. The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 12:1611–1618. Stead JD, Hurles ME, Jeffreys AJ. 2003. Global haplotype diversity in the human insulin gene region. Genome Res. 13:2101–2111. Stephens LR, Eguinoa A, Erdjument-Bromage H, et al. (11 co-authors). 1997. The G beta gamma sensitivity of a PI3K is dependent upon a tightly associated adaptor, p101. Cell 89:105–114. 1392 MBE Subramanian S, Kumar S. 2004. Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome. Genetics 168:373–381. Taguchi A, White MF. 2008. Insulin-like signaling, nutrient homeostasis, and life span. Annu Rev Physiol. 70:191–212. Teshima KM, Coop G, Przeworski M. 2006. How reliable are empirical genomic scans for selective sweeps? Genome Res. 16:702–712. van Vliet-Ostaptchouk JV, Hofker MH, van der Schouw YT, Wijmenga C, Onland-Moret NC. 2009. Genetic variation in the hypothalamic pathways and its role on obesity. Obes Rev. 10:593–609. Vinciguerra M, Foti M. 2006. PTEN and SHIP2 phosphoinositide phosphatases as negative regulators of insulin signalling. Arch Physiol Biochem. 112:89–104. Voight BF, Kudaravalli S, Wen X, Pritchard JK. 2006. A map of recent positive selection in the human genome. PLoS Biol. 4:e72. Wang X, Ding X, Su S, Spector TD, Mangino M, Iliadou A, Snieder H. 2009. Heritability of insulin sensitivity and lipid profile depend on BMI: evidence for gene-obesity interaction. Diabetologia 52:2578–2584. Wang Y, Beydoun MA. 2007. The obesity epidemic in the United States—gender, age, socioeconomic, racial/ethnic, and geographic characteristics: a systematic review and meta-regression analysis. Epidemiol Rev. 29:6–28. Weir BS, Hill WG. 2002. Estimating F-statistics. Annu Rev Genet. 36:721–750. Wolf YI, Gopich IV, Lipman DJ, Koonin EV. 2010. Relative contributions of intrinsic structural-functional constraints and translation rate to the evolution of protein-coding genes. Genome Biol Evol. 2:190–199. Wright KM, Rausher MD. 2010. The evolution of control and distribution of adaptive mutations in a metabolic pathway. Genetics 184:483–502. Yamada T, Bork P. 2009. Evolution of biomolecular networks: lessons from metabolic and protein interactions. Nat Rev Mol Cell Biol. 10:791–803. Zaykin DV, Zhivotovsky LA, Czika W, Shao S, Russell D. 2008. Combining p-values in large scale genomics experiments. Pharm Stat. 6:217–226. Zera AJ. 2011. Microevolution of intermediary metabolism: evolutionary genetics meets metabolic biochemistry. J Exp Biol. 214:179–190. Zhang B, Roth RA. 1992. The insulin receptor-related receptor. Tissue expression, ligand binding specificity, and signaling capabilities. J Biol Chem. 267:18320–18328.
© Copyright 2026 Paperzz