Amodelforthegeneticarchitectureofquantitativetraitsunder stabilizingselection YuvalB.Simonsa,1,KevinBullaugheyb,RichardR.Hudsonb,andGuySellaa,1 aDepartmentofBiologicalSciences,ColumbiaUniversity,NewYork,NY10027; bDepartmentofEcology&Evolution,UniversityofChicago,Chicago,Illinois60637; 1Towhomcorrespondenceshouldbeaddressed:[email protected], [email protected] Abstract Genome-wideassociationstudies(GWAS)inhumansarerevealingthegeneticarchitecture ofbiomedical,lifehistoryandanthropomorphictraits,i.e.,thefrequenciesandeffectsizes of variants contributing to heritable variation in a trait. To interpret these findings, we need to understand how genetic architecture is shaped by basic population genetics processes—notably, by mutation, natural selection and genetic drift. Because many quantitativetraitsaresubjecttostabilizingselectionandgeneticvariationthataffectsone traitoftenaffectsmanyothers,wemodelthegeneticarchitectureofafocaltraitthatarises under stabilizing selection in a multi-dimensional trait space. We solve the model for the phenotypicdistributionandallelicdynamicsatsteadystateandderiverobust,closedform solutionsforsummariesofgeneticarchitecture.Ourresultssuggestthatthedistributionof genetic variance among the loci discovered in GWAS take a simple form that depends on one evolutionary parameter, and provide a simple interpretation for missing heritability and why it varies among traits. We test our predictions against the results of GWAS for heightandbodymassindex(BMI)andfindthattheyfitthedatawell,allowingustomake inferencesaboutthedegreeofpleiotropyandthemutationaltargetsize.Ourfindingshelp tounderstandwhyGWASforheightexplainmoreoftheheritablevariancethansimilarlysizedGWASforBMI,andtopredicthowfutureincreasesinsamplesizewilltranslateinto explainedheritability. 1 Much of the phenotypic variation in human populations, including variation in morphological, life history and biomedical traits, is “quantitative”, meaning that heritable variation in the trait is largely due to small contributions from many genetic variants segregatinginthepopulation(1,2).Quantitativetraitshavebeenstudiedsincethebirthof biometrics over a century ago (1-3), but only in the past decades have technological advancesmadeitpossibletosystematicallydissecttheirgeneticbasis(4-6).Notably,since 2007,genome-wideassociationstudies(GWAS)inhumanshaveledtotheidentificationof thousands of variants reproducibly associated with hundreds of quantitative traits, includingsusceptibilitytoawidevarietyofdiseases(4).Whilestillongoing,thesestudies alreadyprovideimportantinsightsintothegeneticarchitectureofquantitativetraits,i.e., the number of variants that contribute to heritable variation and their frequencies and effectsizes. Perhaps the most striking observation to emerge from these studies is that, despite the large sample size of many GWAS, all variants significantly associated with any given trait typicallyaccountforless(oftenmuchless)than25%ofthenarrowsenseheritability(4,7, 8) (but see (9)). (Henceforth, we use “heritability” to refer to narrow sense heritability). Whilemanyfactorshavebeenhypothesizedtocontributetothe“missingheritability”(7,8, 10-14),themoststraightforwardexplanationandtheemergingconsensusisthatmuchof the heritable variation derives from variants with frequencies that are too low or effect sizesthataretoosmallforcurrentstudiestodetect.Comparisonsamongtraitsalsosuggest that there are substantial differences in architectures. For example, recent meta-analyses GWASuncoveredseventimesasmanyvariantsforheight(697)thanforbodymassindex (97), and together the variants for height account for more than four times the heritable varianceforbodymassindex(~20%vs.~3-5%,respectively),despitethesmallersample sizeoftheGWAS(250kcomparedto340k,respectively)(15,16). These first glimpses underscore the need for theory that explains how genetic architecturesareshapedinthecourseofevolutionandwhytheymightdifferamongtraits. Such theory should allow us to use GWAS findings to make inferences about underlying evolutionary parameters, helping to answer enduring questions about the processes that maintainphenotypicvariationinquantitativetraits(5,17).Fromapracticalperspective,it 2 willhelptointerpretGWASfindingsandspecificallyitmayinformsolutionstothe“missing heritability” problem (18-21). In so doing, it will inform the design of future mapping studiesandphenotypicpredictions(18,20-22). Development of such theory can be guided by empirical observations and first-principles considerations. First, we know that the architecture of a trait arises from genetic and populationgeneticprocesses.Newmutationsaffectingatraitariseataratethatdepends onits“mutationaltargetsize”(i.e.,thenumberofsitesatwhichamutationwouldaffectthe trait).Oncetheyarise,thetrajectoriesofvariantsthroughthepopulationaredetermined bytheinterplaybetweengeneticdrift,demographicprocesses,andnaturalselectionacting on them. These processes determine the number and frequencies of segregating variants underlying variation in the trait. The genetic architecture further depends on the relationship between the selection on variants and their effects on the trait. Notably, selection on variants depends not only on their effect on the focal trait but also on their pleiotropic effects on other traits. We would therefore expect both direct and pleiotropic selectiontoshapethejointdistributionofallelefrequenciesandeffectsizes. Multiple lines of evidence suggest that many quantitative traits are subject to stabilizing selection, i.e., selection favoring an intermediate trait value (5, 23-27). For instance, a declineinfitnesscomponents(e.g.,viabilityandfecundity)isobservedwithdisplacement frommeanvaluesforavarietyoftraitsinhumanpopulations(28-30),inotherspeciesin the wild (31, 32) and in experimental manipulations (31, 33). While less is known about theselectionactingoncomplexdiseases,somelikelyreflectlargedeviationsfromoptimal valuesofunderlyingtraitsunderstabilizingselection.Whatremainsunclearistheextent towhichstabilizingselectionisactingdirectlyonvariationinagiventraitorresultsfrom pleiotropiceffectsofthisvariationonothertraits. Yet other lines of evidence suggest that pleiotropy is pervasive. For one, theoretical considerations about the variance in fitness in natural populations and its accompanying geneticloadsuggestthatonlyamoderatenumberofindependenttraitscanbeeffectively selected on at the same time (34). Thus, the aforementioned relationships between the value of a focal trait and fitness are likely heavily affected by the pleiotropic effects of genetic variation on other traits (26, 34-36). Second, many of the variants detected in 3 human GWAS have been found to be associated with more than one trait (37-41). For example,arecentanalysisofGWASrevealedthatvariantsthatdelaytheageofmenarchein women tend to delay the age of voice drop in men, decrease body mass index, increase adultheight,anddecreaseriskofmalepatternbaldness(37).Moregenerally,theextentof pleiotropy revealed by GWAS appears to be increasing rapidly with improvements in the powerandmethodology(37,42-44).Suchconsiderationspointtothepotentialimportance ofpleiotropicselectiononquantitativegeneticvariationformanytraits. Moreover, the discoveries emerging from human GWAS suggest that genetic variance is predominated by additive contributions from numerous variants with small effect sizes. Specifically, most or all of the heritability explained in GWAS of many traits derives from variants with squared effect sizes that are substantially smaller than the total genetic variance(e.g.(15,16,45,46)).Moreover,statisticalquantificationsofthegeneticvariance tagged by genotyping suggest that such variants may account for the bulk of heritable variance in many traits (9, 47, 48). Other analyses of GWAS results suggest that nonadditive interactions, e.g., dominance and epistasis, have only minor contributions to heritable variance, consistent with theory and other lines of evidence (49-55). Notably, considerable efforts to detect epistatic interactions in human GWAS have, by and large, cameupempty-handed(9,56,57),withfewcounter-examplesmostlyinvolvingvariantsin theMHCregion(52,56,58,59).Motivatedbytheseconsiderations,wemodelhowdirect and pleiotropic stabilizing selection shape the genetic architecture of continuous, quantitative traits by considering additive variants of small effects and assuming that togethertheyaccountformostoftheheritablevariance. There has been relatively little theoretical work aimed at relating population genetics processeswiththeresultsemergingfromGWAS.Moreover,thefewexistingmodelshave reached divergent predictions about genetic architecture, largely because of making differentassumptionsabouttheeffectsofpleiotropy.Pritchard(20)consideredthe“purely pleiotropic” extreme, in which selection on variants is independent of their effect on the traitbeingconsidered(hefocusedondiseasesusceptibility).Inthiscase,wewouldexpect the largest contribution to genetic variance in a trait to come from mutations that have largeeffectsizesbutarealsoweaklyselectedorneutral,allowingthemtoascendtohigher 4 frequenciesthanothermutations.Otherstudieshaveconsideredtheoppositeextreme,in which selection on variants stems entirely from their effect on the trait under consideration (27, 60-64), and have shown that the greatest contribution to genetic variancearisesfromstronglyselectedmutations(61,62)(wereturntothiscasebelow). Inpractice,wewouldexpectmosttraitstofallsomewhereinbetweenthesetwoextremes. Whiletherearecompellingreasonstobelievethatquantitativegeneticvariationishighly pleiotropic,theeffectsofvariantsondifferenttraitsarelikelytobecorrelated.Ifso,evenif agiventraitisnotsubjecttoselection,variantsthathavealargeeffectonitwillalsotend tohavelargereffectsontraitsthatareunderselection(e.g.,bycausingalargeperturbation to pathways that affect multiple traits) (36). Motivated by such considerations, EyreWalker (2010) and Caballero et al. (2015) considered models in which the correlation between the strength of selection on an allele and its effect size can vary between the purelypleiotropicanddirectselectionextremes(alsocf.65).Thesemodelsdivergeintheir predictions about architecture, however. Assuming, as seems plausible, an intermediate correlationbetweenthestrengthofselectionandeffectsize,Eyre-Walkerfindsthatgenetic variance should be dominated by strongly selected mutations, whereas Caballero et al. conclude the greatest contribution should arise from weakly selected ones (19). Their conclusionsdifferbecauseofhowtheychosetomodeltherelationshipbetweenselection and effect size, a choice based largely on mathematical convenience. We approach this problem by explicitly modeling stabilizing selection on multiple traits, thereby learning, ratherthanassuming,therelationshipbetweenselectionandeffectsizes. TheModel We model an individual’s phenotype as a vector in an n-dimensional Euclidian space, in whicheachdimensioncorrespondstoanadditive,continuous,quantitativetrait.Wefocus onthearchitectureofoneofthesetraits(say,the1stdimension),wherethetotalnumberof traits parameterizes pleiotropy. Fitness is assumed to decline with distance from the optimal phenotype positioned at the origin, thereby introducing stabilizing selection into themodel.Specifically,weassumethatabsolutefitnesstakestheform 5 W 𝑟 = exp − /0 12 0 , (1) where𝑟is the (n-dimensional) phenotype,𝑟 = 𝑟 is the distance from the origin and w parameterizes the strength of stabilizing selection. However, we later show that the specificformofthefitnessfunctiondoesn’tmatter.Moreover,theadditiveenvironmental contribution to the phenotype can be absorbed into w (see (66) and SI Section 1.1); we thereforeconsideronlythegeneticcontribution. Wefurtherassumethestandardpopulationdynamicforadiploid,panmicticpopulationof constantsizeN,includingWright-Fishersamplingwithviabilityselection(asdescribedby Eq.(1)),mutation,freerecombinationandMendeliansegregation.Mutationsaffectingthe phenotypeareintroducedaccordingtoaninfinitesitesmodel.Thenumberpergamete,per generation follows a Poisson distribution with mean U, reflecting both the mutational targetsizeandthemutationratepersite.Thesizeofmutationsinthen-dimensionaltrait space, 𝑎1 = 𝑎 1 ,isdrawnfromsomedistribution,assumingonlythat𝑎1 ≪ 𝑤 1 .Welater show that this requirement is equivalent to the standard assumption about selection coefficients satisfying𝑠 ≪ 1(also cf. Supplementary Section 4.3). The directions of mutationsareassumedtobeuniformly(isotropically)distributed,althoughwelatershow thatourresultsarerobusttorelaxingthisassumption.Anindividual’sphenotypecanthen beobtainedaddingupthephenotypiceffectsofhisorheralleles,i.e., 𝑟= 8 1 ; :<8 𝑔: 𝑎: , (2) where L is the number of sites with fixed or segregating mutations,𝑎: is the phenotypic differencebetweenhomozygotesfortheancestralandderivedallelesatsitel(becauseof theinfinitesitesassumption,thereareatmosttwoalleles)and𝑔: = 0, 1or2isthenumber ofderivedallelesatsitel. Results Thephenotypicdistribution.Inthefirstthreesections,wedevelopthetoolsthatwelater use to study genetic architecture. We start by considering the equilibrium distribution of phenotypesinthepopulationandgeneralizepreviousresultsforthecasewithasingletrait 6 (27, 60, 61, 64). Under biologically sensible conditions on the rate and size of mutations (i.e., when 1 ≫ 𝑈 ≫ 1/2𝑁 and 𝑎1 ≪ 𝑤 1 ; SI Section 4), this distribution is well approximated by a tight multivariate normal centered at the optimum, namely by the probabilitydensity: f 𝑟 = 8 F 1DE 0 0 exp − /0 1E 0 (3) wherethevarianceofthedistributionsatisfies𝜎 1 ≪ 𝑤 1 (seeSISection4).Intuitively,the phenotypic distribution is normal because it derives from additive and (approximately) i.i.d.contributionsfrommanysegregatingsites.Thedistributionstaystightlyconcentrated around the optimum because stabilizing selection becomes stronger with increasing displacement from it, and because the population responds to such selection by minor changestoallelefrequenciesatmanysegregatingsites,rapidlyoffsettingthedisplacement. With phenotypes close to the optimum, only the curvature of the fitness function at the optimum (i.e., the multi-dimensional second derivative) affects the selection acting on individuals.Inaddition,itisalwayspossibletochooseanorthonormalcoordinatesystem centered at the optimum, in which the trait under consideration varies along the first coordinateandaunitchangeinothertraits(inothercoordinates)neartheoptimumhave the same effect on fitness. These considerations suggest that the equilibrium behavior is insensitivetoourchoiceoffitnessfunction.Moreover,intheSI(Section5),weshowthat perturbations of the population mean from the optimum are rapidly offset by minor changestoallelefrequenciesatnumeroussites,andthatthisbehaviorlendsrobustnessto theequilibriumdynamicswithrespecttothepresenceofmajorloci,changesintheoptimal phenotypeovertime,andmoderateasymmetriesinthemutationaldistribution. Allelic dynamic. Next, we consider the dynamics at a segregating site, and generalize previousresultsforthecasewithasingletrait(62-64).Thedynamicscanbedescribedin terms of the first two moments of change in allele frequency in a single generation (cf. (67)). For an allele with phenotypic effect𝑎and frequency q, the distribution of contributions to the phenotype from the genetic background,𝑅, follows from the 7 equilibrium phenotypic distribution (Eq. 3) and is well approximated by probability density 8 f 𝑅 𝑎, 𝑞 = 1DE 0 exp − F/0 KLMN 1E 0 0 . (4) By averaging the fitness of the three genotypes over the distribution of genetic backgrounds,wefindthatthefirstmomentiswellapproximatedby E Δ𝑞 ≈ N0 S2 0 𝑝𝑞 𝑞 − 8 1 , (5) assumingthat𝑎1 and𝜎 1 ≪ 𝑤 1 (cf.SISection4).Bythesametoken,wefindthat V Δ𝑞 ≈ VM 1W , (6) whichisthestandardsecondmomentwithgeneticdrift. The functional form of the first moment is equivalent to that of the standard viability selectionmodelwithunder-dominance.Thisresultisahallmarkofstabilizingselectionon (additive) quantitative traits: with the population mean at the optimum, the dynamics at differentsitesaredecoupledandselectionatagivensitereducesthephenotypicvariance (i.e., ½a2pq), thereby pushing rare alleles to loss. Comparison with the standard viability selection model shows that the selection coefficient in our model is s=a2/4w2, or S=2Ns=Na2/2w2 in scaled units. In other words, the selection acting on an allele is proportionaltoitssize-squaredinthen-dimensionaltraitspace(wherewtranslateseffect sizeintounitsoffitness). Therelationshipbetweenselectionandeffectsize.Thestatisticalrelationshipbetween thestrengthofselectionactingonmutationsandtheireffectonagiventraitfollowsfrom theaforementionedgeometricinterpretationofselection.Specifically,allmutationswitha given selection coefficient, s, lie on a(𝑛 − 1)– dimensional hypersphere with radius𝑎 = 2𝑤 𝑠,andanygivenmutationsatisfies 𝑠= 8 S2 0 𝑎1 = 8 S2 0 \ 1 [<8 𝑎[ , (7) 8 where𝑎[ is the allele’s effect on the i-th trait (Fig. 1A). Our assumption that mutation is isotropic then implies that the probability density of mutations on the hypersphere is uniform. (b) Trait2 (n-1)-dimensional hypersphere ! a (n-2)-dimensional cross-section Probability density (a) a1 Trait1 (focaltrait) a Tr 3 it 0.4 n=1 n=4 n=10 n→∞ 0.3 0.2 0.1 Effect size Hin units of 0.0 -4 -2 0 2 a ën 2 L 4 Fig. 1. The distribution of effect sizes corresponding to a given selection coefficient. (a) Mutations with selection coefficient, s, lie on a(𝑛 − 1)– dimensional hypersphere with radius𝑎 = 2𝑤 𝑠;andtheprobabilityofsuchmutationswitheffectsizea1isproportional to the volume of the(𝑛 − 2)– dimensional cross section of that hyper-sphere with projection𝑎8 . (b) The distribution of effect sizes corresponding to a given selection coefficient,measuredinunitsofthedistribution’sstandarddeviation. The distribution of effect sizes on a focal trait, a1, corresponding to a given selection coefficient,s,follows.Giventhatmutationissymmetricinanygiventrait,𝐸 𝑎8 𝑠 = 0,and giventhatitissymmetricamongtraits, E 𝑎81 𝑠 = 𝑎1 𝑛 = 2𝑤 1 𝑛 𝑠. (8) Moregenerally,theprobabilitydensitycorrespondingtoaneffectsizea1isproportionalto thevolumeofthe(𝑛 − 2)–dimensionalcrosssectionofthathyper-spherewithprojection a1(Fig.1A).Forasingletrait,thisimpliesthat𝑎8 = ±𝑎withprobability½,andfor𝑛 > 1,it impliestheprobabilitydensity φ\ 𝑎8 𝑎 = b \ 1 /b (\c8) 1 \/1 8 1D N 0 \ 1− 8 Nd0 \ N0 \ Fef 0 (9) (cf. SI Section 1.2). Intriguingly, when the number of traits n increases, this density approachesanormaldistribution,i.e., 9 Nd N 0 /\ ~N(0,1). (10) Thislimitisalreadywellapproximatedforamoderatenumberoftraits(e.g.,n=10;Fig.1B). The limit behavior also holds when we relax the assumption of isotropic mutation. This generalizationisanimportantone,becausehavingchosenaparameterizationoftraitsin whichthefitnessfunctionneartheoptimumisisotropic,wecannolongerassumethatthe distributionofmutationsisalsoisotropic(68).Specifically,mutationsmighttendtohave larger effects on some traits than on others and their effects on different traits might be correlated.InSISection5,weshow,however,thatexcludingpathologicalcases,thelimit distribution (Eq. 10) also holds for anisotropic mutation. To this end, we introduce the conceptofaneffectivenumberoftraits,whichcantakeanyrealvalue≥1andisdefinedas thenumberoftraits,withexpectedfitnesseffectequaltothatofthefocaltrait,requiredto satisfy Eq. 8. In the limit of high pleiotropy, the selection strength on a mutation and the magnitude of its effect on a focal trait are correlated, implying that the kind of “purely pleiotropic” extreme postulated in previous work cannot arise under our model (19-21). Mountingevidencethatgeneticvariationishighlypleiotropic(seeIntroduction)alongside the robustness of our model raise the intriguing possibility that this limit form applies quitegenerally. Genetic architecture. We can now derive closed forms for summaries of genetic architecture (cf. SI Section 2). For mutations with a given selection coefficient, the frequency distribution follows from the diffusion approximation based on the first two moments of change in allele frequency (Eq. 5 and 6; (67)), and the distribution of effect sizesfollowsfromthegeometricconsiderationsoftheprevioussection.Conditionalonthe selection coefficient, these distributions are independent and therefore the joint distribution of frequency and effect size equals their product. Summaries of architecture canbeexpressedasexpectationsoverthejointdistributionoffrequenciesandeffectsizes for a given selection coefficient, and then weighted according to the distribution of selection coefficients. Given that we know little about the distribution of selection 10 coefficientsofmutationsaffectingquantitativetraits,weexaminehowsummariesdepend onthestrengthofselection. Expected variance per site. We focus on the distribution of additive genetic variance among sites, a central feature of architecture that is key to connecting our model with GWASresults.Westartbyconsideringhowselectionaffectstheexpectedcontributionofa site to additive genetic variance in a focal trait. We include monomorphic sites in the expectation,suchthattheexpectedtotalvarianceisgivenbytheproductoftheexpectation per-site and the population mutation rate (i.e., 2NU). Under the infinite sites assumption, sitescanbemonomorphicorbi-allelicandtheirexpectedcontributionis E d0𝑎81 𝑝𝑞 𝑆 = E 𝑎81 𝑆 E d0𝑝𝑞 𝑆 = 12 0 \W 𝑆E d0𝑝𝑞 𝑆 (11) (expressed in terms of the scaled selection coefficient S). Thus, the degree of pleiotropy only affects the expectation through a multiplicative constant. This effect of pleiotropy is not identifiable from data, because even if we could measure genetic variance in units of fitness(e.g.,asopposedtounitsofthetotalphenotypicvariance),westillwouldn’tbeable todistinguishbetweenwandn.Weinsteadfocusontheproportionaleffectofselectionon thecontributiontovariance,whichisinsensitivetothedegreeofpleiotropy. Theproportionalcontributiontogeneticvarianceasafunctionoftheselectioncoefficient wasdescribedbyKeighleyandHill(intheonedimensionalcase;(62))andisshowninFig. 2A.Whenselectionisstrong(S>>1),itseffectonallelefrequency(whichscaleswith1/S)is canceled out by its relationship with the effect size, yielding a constant contribution to geneticvariancepersite,vs=2w2/(nN),regardlessoftheselectioncoefficient(SISection3; Fig. 2A). Henceforth, we measure genetic variance relative to vs. When selection is effectively neutral (𝑆 ≪ 1) and thus too weak to affect the allele frequency, the expected contribution of a site to genetic variance scales with the effect size and is ½S, which is considerably lower than under strong selection (SI Section 3; Fig. 2A). In between these selection regimes, allele frequency is affected by under-dominance, resulting in a more complexdependencyontheselectioncoefficient(SISection3).Themaximalcontribution togeneticvariancepersiteoccurswithinthisrange(at𝑆 ≈ 10)andisslightlyhigher(by ~30%)thanunderstrongselection(Fig.2A).Henceforth,werefertothisselectionregime 11 as intermediate, to distinguish it from the considerably weaker selection in the nearly neutralregime.Theseresultssuggestthateffectivelyneutralsitesshouldcontributemuch lesstogeneticvariancethanintermediateandstronglyselectedones(61,62). withpleiotropy (a) Strong 1 0.75 0.5 0.25 0 Effectively neutral Scaled selection coefficient HSL 0.1 1 10 100 1000 100 (b) 80 60 Strongselection(S=100) Intermediateselection(S=10) Effectivelyneutral(S=0.5) 40 20 0 0 Threshold variance Hrelative to vs L 1 2 3 4 5 % variance from loci > v 1.25 Intermediate % variance from loci > v Expected variance Hin units of vs L withoutpleiotropy 1.5 100 80 (c) 6 4 2 60 0 5 10 15 40 20 0 0 Threshold variance Hrelative to vs L 1 2 3 4 5 Fig. 2. The distribution of additive genetic variance among sites. In (a), we plot the expectedcontributionasafunctionofthescaledselectioncoefficient.In(b)&(c),weshow theproportionofgeneticvariancestemmingfromsitesthatcontributemorethanthevalue onthex-axis,forasingletrait(b)andinthepleiotropiclimit(c). Distribution of variance among sites. Next, we consider how genetic variance is distributed among sites with a given selection coefficient. We focus on the distribution amongsegregatingsites(includingmonomorphiceffectswouldjustaddapointmassat0). This distribution is especially relevant to interpreting the results of GWAS, because, to a firstapproximation,astudywilldetectonlysiteswithcontributionstovarianceexceeding a certain threshold (which decreases as the study size increases; see Discussion). We therefore represent the distribution in terms of the proportion of genetic variance, G(v), arisingfromsiteswhosecontributiontogeneticvarianceexceedsathresholdv. Webeginwiththecaseofasingletrait(n=1),inwhichselectiononanalleledeterminesits effect size (Fig 2B). When selection is strong (S>>1), the proportion of genetic variance exceeding a threshold v is also insensitive to the selection coefficient and takes a simple form,with G 𝑣 = exp(−2𝑣) (12) (SISection3).Incontrast,intheeffectivelyneutralrange(𝑆 ≪ 1), G 𝑣 = 1 − 𝑣 𝑣lNm , (13) 12 8 wherethedependencyontheselectioncoefficiententersthrough𝑣lNm = 𝑆,whichisthe n maximalcontributiontovarianceforasitewithselectioncoefficientSthatisobtainedwith an allele frequency of ½ (SI Section 3). In the intermediate selection regime,G 𝑣 is also intermediate and takes a more elaborate functional form (SI Section 3). These results suggest how genetic variance would be distributed among sites given any distribution of selectioncoefficients(Fig2B):startingfromsitesthatcontributethemost,thedistribution would at first be dominated by contributions from strongly selected sites, then the intermediate selected sites would begin to contribute, whereas effectively neutral sites 8 wouldenteronlyfor𝑣 < 𝑆 ≪ 1. n The effect of pleiotropy is to cause sites with a given selection coefficient to have a distributionofeffectsizesonthefocaltrait,therebyincreasingthecontributiontogenetic varianceofsomesitesanddecreasingitforothers.InSISection3,weshowthatincreasing the degree of pleiotropy, n, increases the proportion of genetic variance,𝐺 𝑣 , for any threshold,v,regardlessofthedistributionofselectioncoefficients(Fig.S2).Whenvariation inatraitissufficientlypleiotropicforthedistributionofeffectsizestoattainthelimitform (Eq.10): G 𝑣 = 1 + 2 𝑣 exp −2 𝑣 (14) (15) forstronglyselectedsitesand G 𝑣 = exp −4 𝑣 𝑆 foreffectivelyneutralones(Fig.2C).Forintermediatelyselectedsites,𝐺 𝑣 issimilartothe caseofstrongselection,withmeasurabledifferencesonlywhen𝑣 ≫ 𝑣r (seeinsetinFig.2C andSISection3).WewouldthereforeexpectthatasthesamplesizeofGWASincreaseand thethresholdcontributiontovariancedecreases,intermediateandstronglyselectedsites willbediscoveredfirst,andeffectivelyneutralsiteswillbediscoveredmuchlater.IntheSI (SISection3andFig.S13),wealsoderivecorollariesoftheseresultsforthedistributionof numbersofsegregatingsitesthatmakeagivencontributiontogeneticvariance. 13 Discussion Inhumans,GWASformanytraitsdisplayasimilarbehavior:whensamplesizesaresmall, the studies discovered almost nothing, but once they exceeded a threshold sample size, boththenumberofassociationsdiscoveredandtheheritabilityexplainedsuddenlybegan to increase rapidly (4, 69). Intriguingly though, both the threshold study size and rate of increase vary among traits. These observations raiseseveral questions, including: How is the threshold study size determined? How should the number of associations and explained heritability increase with study size once this threshold is exceeded? With an orderofmagnitudeincreaseinstudysizesintothemillionsimminent,howmuchmoreof the genetic variance in traits should we expect to explain? The theory that we developed offerstentativeanswerstothesequestions. To relate our theory to GWAS, we must first account for the power to detect loci that contributetoquantitativegeneticvariation.Instudiesofcontinuoustraits,thepowercan beapproximatedbyastepfunctioninthecontributionoflocitoadditivegeneticvariance (ignoringtheeffectsofgenotyping,whichareconsideredbelow,andsomeotherpotential complications,e.g.(70)).Inotherwords,locithatcontributemorethanathresholdvalue 𝑣 ∗ shouldbedetectedandthosethatcontributelessshouldnot.Thethresholdisaffected bythestudysize,𝑚,andbythetotalphenotypicvarianceinthetrait,𝑉v ,where𝑣 ∗ 𝑚, 𝑉v ∝ 𝑉v /𝑚(SI Section 6) (69). Given a trait and study size, the number of associations discoveredandheritabilityexplainedshouldthenfollowfromthetailsofthedistributions thatwehaveconsidered,correspondingtothethreshold𝑣 ∗ 𝑚, 𝑉v . Our results therefore suggest that when genetic variation in a trait is sufficiently pleiotropic, the first loci to be discovered in GWAS will be intermediate or strongly selected,withcorrespondinglylargeeffectsizes(i.e.,𝑆 ≈ W\ 12 0 𝑎81 ≫ 1).Thethresholdstudy size for their discovery is proportional to𝑉v /𝑣r , i.e., to the total phenotypic variance measuredinunitsoftheexpectedcontributionofastronglyselectedsitetovariance.This result makes intuitive sense, as the total phenotypic variance can be viewed as the backgroundnoiseforthediscoveryofindividualstrongly(andintermediate)selectedsites. Oncethethresholdstudysizeisexceeded,theincreaseinthenumberofassociationsand 14 explained heritability with study size directly follow from the functional forms that we derivedforintermediateandstronglyselectedsites(Eq.14andS41andFigs.2C,S13B,3A andS15A).Thetraitspecificbehaviorboilsdowntothesameparameter,𝑉v /𝑣r ,relatingthe studysizewiththethreshold𝑣 ∗ .Someresultsaremodifiedwhenvariationinatraitisonly weakly pleiotropic, which is probably less common: notably, intermediate selected sites wouldbegintobediscoveredonlyafterstronglyselectedones(cf.Fig.2B,S13A,andS14) andthethresholdstudysizeforstronglyselectedsiteswouldbehigher(cf.Eq.12andFig. S14A). 30 30 (a) 80 Strongselection(S=100) Intermediateselection(S=10) 60 Effectivelyneutral(S=0.5) 40 20 20 10 10 20 0 % explained % heritability heritability explained 100 1 00 50 100 200 11 Study size Hin units of VP êvs L 2 5 10 20 (b) Re-sequencing g pin oty Gen % heritability explained 10 30 100 33 10 30 100 300 300 Scaled selection selection coefficient Scaled coefficientHSL HSL Fig.3.PredictionsforGWASofcontinuoustraits,inthepleiotropiclimit(cf.SISection3 forderivations).(a)Theproportionofheritabilityexplainedasafunctionofstudysize.(b) The heritability explained in re-sequencing and genotyping studies as a function of the scaled selection coefficient. The study size was chosen such that a re-sequencing study would capture 25% of the strongly selected variance (corresponding to a study size of ~16𝑉v /𝑣r ).Forthecasewithoutpleiotropy,seeFig.S14A-B. Our results further suggest that for missing heritability to be mainly due to the limited powerofgenotypingtodetectrarevariants(14,71),selectiononthegeneticvarianceina trait would have to be very strong. GWAS impute allele frequencies at loci that are not includedinthegenotypingarray,andtheaccuracyofthisimputationdeclinesrapidlywhen minorallelefrequenciesbecometoosmalltobewellrepresentedintheimputationpanel (72, 73). This decline in power is dealt with in several ways, all of which are similar in effect to considering alleles above a threshold frequency that allows them to be imputed accurately (74). For example, a study using an Illumina 1M SNP array and the 1000 15 genomes phase III as an imputation panel would include only loci with MAF above ~1% (75).Eveniflocibelowthatfrequencywereimputedwithperfectaccuracy,however,they would only be detected if their contribution to variance exceeded the study’s threshold, implying that they would also have to have extremely large effect sizes; and under our model,extremelylargeeffectsizesimplyverystrongselection(Fig.3B).Asanillustration, if a re-sequencing study identified 25% of the strongly selected (S>>1) variance, a genotyping study with the same sample size would suffer a 50% decrease in explained heritabilityonlyif𝑆 ≈ 200(assumingthesamegenotypingplatformandimputationpanel as above). If we further assuming a constant effective population size of 104 this would imply a (heterozygote) selection disadvantage of 1%. We note, however, that recent populationgrowthofthekindthathasbeeninferredforEuropeanpopulations(76,77)is expectedtoreducethefrequencyofallelesunderverystrongselection(77,78),leadingto agreaterlossofpowerduetogenotyping(seebelow). While some heritable variation may also arise from effectively neutral mutations, our results indicate that their expected contribution to genetic variance per site is much smallerthanforstronglyselectedandintermediateones(Fig.2;(61,62)).Oneimplication is that for the contribution of effectively neutral variation to be comparable to that of intermediate and strongly selected variation, they would need to have a much larger mutationaltargetsize.Itisdifficulttoevaluatewhetherthisisplausible,becausecurrent GWASarelikelyseverelyunderpoweredtodetectthem(butsee(79)).Forexample,sites withS=0.1areexpectedtocontributeonly1/20thofthegeneticvariancethatarisesfrom strongly selected ones; in the high pleiotropy limit, only 1% of that variance would be mappedwhenstudysizesarelargeenoughtomap95%ofthestronglyselectedvariance. Importantly,thesetheoreticalpredictionscanbeusedtomakeinferencesbasedonGWAS results.Asanillustration,weconsiderheightandbodymassindex(BMI)inEuropeans,two traits for which GWAS have discovered a sufficiently large number of genome-wide significantassociationstoallowforwellpoweredtestsandinferences(697forheight(16) and 97 for BMI (15)). We fit our theoretical predictions for the distribution of variance among loci to the distributions observed for the genome-wide significant associations reported for each of these traits (cf. Supplement Section 7 for details). In so doing, we assume that the mapped loci are under intermediate or strong selection, because our 16 theory suggests that they should be the first to be mapped. We also make the plausible assumptionthatgeneticvarianceforthesetraitsishighlypleiotropic(seeIntroduction;37, 42) and therefore assume the limit form for the distribution of effect sizes. Under these assumptions,weexpectthedistributionofvarianceamonglocitobewellapproximatedby asimpleform(Eq.S89),whichdependsonasingleparameter(vs).Whenweestimatethis parameter,wefindthatthetheoreticaldistributionfitsthedatawell(Fig.4A).Specifically, wecannotrejectourmodelbasedonthedata,suggestingthatwithonlyoneparameter,we obtain good fits to the data for both traits (by a Kolmogorov-Smirnov test,𝑝 = 0.14for heightand𝑝 = 0.54forBMI;cf.SupplementSection7).Bycomparison,withoutpleiotropy (n=1),ourpredictionsprovideapoorfittothesedata(byaKolmogorov-Smirnovtest,𝑝 < 10cx forheightand𝑝 = 0.05forBMI;Fig.S11). (a) 25 Model +95%CI 20 15 Height 10 BMI 5 0 Data 0 Explainedheritability(%) Heritability (%) %heritabilityfromloci>v % heritability 2 4 6 Threshold variance Thresholdvariance(relativetov s) (b) 40 Height 30 20 10 0 BMI 0 200 400 600 800 Study size (in thousands) 1000 Studysize(inthousands) Fig.4.ModelfitandpredictionsforheightandBMI.In(a),weshowthefitforassociated loci. In (b), we present our predictions for future increases in the heritability explained with GWAS size. The predicted number of associations is shown in Fig. S16. 95% CIs are basedonbootstrap;seeSupplementSectionS7fordetails. By fitting the model to GWAS results, we can also make inferences about evolutionary parametersunderlyingquantitativegeneticvariation.Estimatingthedegreeofpleiotropy (n) as an additional parameter in the model, we find that for both height and BMI, n is sufficientlylargeforittobeindistinguishablefromthehighpleiotropylimit.Basedonthe shape of the fitted distributions in this limit and the number of loci that fall above the 17 threshold contribution to variance of the studies, we can also estimate the mutational target size and proportion of the heritable variance arising from mutations within the rangeofselectioncoefficientsvisibleintheseGWAS.Theseestimatessuggestthat,within this range, height has a target size of ~5Mb, which account for ~50% of the heritable variance,whereasBMIhasatargetsizeof~1Mb,whichexplains~15%ofthevariance(see SITableS2). TheseparameterestimatescanhelptointerpretGWASresultsandmakepredictionsabout futurestudies.TheysuggestthattheGWASforheightsucceededinmappingasubstantially greater proportion of the heritable variance (~20% compared to ~3-5%) despite their smaller sample size (~250K compared to ~340K), because the proportion of variance arisingfrommutationswithintherangeofdetectableselectioneffectsismuchgreaterfor heightthanforBMI(~50%comparedto~15%).Furthermore,theestimatesoftargetsizes and the relationship between sample size and threshold contribution to variance can be usedtopredicthowtheexplainedheritabilityandnumberofassociationsshouldincrease with sample size (Figs. 4B and S16). These predictions are likely under-estimates as the rangeofselectioneffectsitselfshouldalsoincreasewithsamplesize. Tomaketheseinferencesandpredictionsmorereliable,animportantnextstepwillbeto account for the effects of recent human demography on the distributions of genetic variance(78,80-82).Notably,therecentgrowthofEuropeanpopulations(76,77)would have affected the distribution of genetic variance arising from very strongly selected mutations. Specifically, it will have reduced its expectation, because many of these mutationswouldhaveenteredthepopulationsincetheonsetofgrowthandthuswillbeat lowerfrequencies(77,78,82).Thisconsiderationalongsidetheeffectofgenotypingwould suggestthattherangeofselectioneffectsrevealedbycurrentGWASisinfactboundfrom above, and includes only moderately selected loci. However, moving beyond qualitative considerationswouldrequireincorporatingtheeffectsofdemographyintothemodel(77, 78,82).DoingsomayallowustopredicttheoutcomeoffutureGWASaswellastolearn about the relative importance of different evolutionary and genetic forces in shaping quantitativegeneticvariationfordifferenttraits. 18 Acknowledgements.WehavebenefitedhugelyfromdiscussionsandcommentsfromGuy Amster, Nick Barton, Jeremy Berg, Graham Coop, Laura Hayward, David Murphy, Joe Pickrell, Jonathan Pritchard and Molly Przeworski. The work was funded by NIHgrantGM115889toGS. References: 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. FalconerDS&MackayTFC(1996) Introductiontoquantitativegenetics(Benjamin Cummings,Essex,England)4thEd. Lynch M & Walsh B (1998) Genetics and analysis of quantitative traits (Sinauer Sunderland,MA). Provine WB (2001) The origins of theoretical population genetics: With a new afterword(UniversityofChicagoPress). VisscherPM,BrownMA,McCarthyMI,&YangJ(2012)Fiveyearsofgwasdiscovery. AmJHumGenet90:7-24. Barton NH & Keightley PD (2002) Understanding quantitative genetic variation. NatureReviewsGenetics3:11-21. ManolioTA(2010)Genomewideassociationstudiesandassessmentoftheriskof disease.NewEnglJMed363(2):166-176. Bloom JS, Ehrenreich IM, Loo WT, Lite T-LV, & Kruglyak L (2013) Finding the sourcesofmissingheritabilityinayeastcross.Nature494(7436):234-237. Manolio TA, et al. (2009) Finding the missing heritability of complex diseases. Nature461:747-753. Zaitlen N, et al. (2013) Using extended genealogy to estimate components of heritabilityfor23quantitativeanddichotomoustraits.PLoSGenet9(5):e1003520. Gibson G (2012) Rare and common variants: Twenty arguments. Nat Rev Genet 13(2):135-145. Eichler EE, et al. (2010) Missing heritability and strategies for finding the underlyingcausesofcomplexdisease.NatRevGenet11(6):446-450. Zuk O, Hechter E, Sunyaev SR, & Lander ES (2012) The mystery of missing heritability:Geneticinteractionscreatephantomheritability.ProcNatlAcadSciUSA 109(4):1193-1198. Lee SH, Wray NR, Goddard ME, & Visscher PM (2011) Estimating missing heritability for disease from genome-wide association studies. Am J Hum Genet 88(3):294-305. Zuk O, et al. (2014) Searching for missing heritability: Designing rare variant associationstudies.ProcNatlAcadSciUSA111:E455-E464. Locke AE, et al. (2015) Genetic studies of body mass index yield new insights for obesitybiology.Nature518:197-206. Wood AR, et al. (2014) Defining the role of common variation in the genomic and biologicalarchitectureofadulthumanheight.NatGenet46(11):1173–1186. de Vladar HP & Barton N (2014) Stability and response of polygenic traits to stabilizingselectionandmutation.Genetics197(2):749-767. 19 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 36. 37. 38. 39. AgarwalaV,FlannickJ,SunyaevS,ConsortiumGD,&AltshulerD(2013)Evaluating empiricalboundsoncomplexdiseasegeneticarchitecture.NatGenet45:1418-1427. Caballero A, Tenesa A, & Keightley PD (2015) The nature of genetic variation for complex traits revealed by gwas and regional heritability mapping analyses. Genetics201(4):1601-1613. Pritchard JK (2001) Are rare variants responsible for susceptibility to complex diseases?AmJHumGenet69:124–137. Eyre-Walker A (2010) Genetic architecture of a complex trait and its implications forfitnessandgenome-wideassociationstudies. ProcNatlAcadSciUSA107:1752– 1756. SpeedD&BaldingDJ(2015)Relatednessinthepost-genomicera:Isitstilluseful? NatRevGenet16(1):33-44. TurelliM(1984)Heritablegeneticvariationviamutation-selectionbalance:Lerch's zetameetstheabdominalbristle.TheorPopulBiol25(2):138-193. BartonNH&TurelliM(1989)Evolutionaryquantitativegenetics:Howlittledowe know?AnnuRevGenet23:337–370. Hill WG & Kirkpatrick M (2010) What animal breeding has taught us about evolution.AnnuRevEcolEvolSyst41:1-19. Johnson T & Barton N (2005) Theoretical models of selection and mutation on quantitativetraits.PhilosTransRSocLondBBiolSci360(1459):1411-1425. LandeR(1976)Naturalselectionandrandomgeneticdriftinphenotypicevolution. Evolution30(2):314-334. Byars SG, Ewbank D, Govindaraju DR, & Stearns SC (2010) Natural selection in a contemporaryhumanpopulation.ProcNatlAcadSciUSA107(suppl1):1787-1792. StulpG,PolletTV,VerhulstS,&BuunkAP(2012)Acurvilineareffectofheighton reproductivesuccessinhumanmales.BehavEcolSociobiol66(3):375-384. Frederick DA & Haselton MG (2007) Why is muscularity sexy? Tests of the fitness indicatorhypothesis.PersonalityandSocialPsychologyBulletin33(8):1167-1183. EndlerJA(1986)Naturalselectioninthewild(PrincetonUniversityPress). Kingsolver JG, et al. (2001) The strength of phenotypic selection in natural populations.AmNat157(3):245–261. Charlesworth B, Lande R, & Slatkin M (1982) A neo-darwinian commentary on macroevolution.Evolution36(3):474-498. Barton NH (1990) Pleiotropic models of quantitative variation. Genetics 124(3):773-782. Kondrashov AS & Turelli M (1992) Deleterious mutations, apparent stabilizing selectionandthemaintenanceofquantitativevariation.Genetics132(2):603-618. Wagner GP (1996) Apparent stabilizing selection and the maintenance of neutral geneticvariation.Genetics143(1):617-619. PickrellJK,etal.(2016)Detectionandinterpretationofsharedgeneticinfluenceson 42humantraits.NatGenet48(7):709-717. Sivakumaran S,etal. (2011) Abundant pleiotropy in human complex diseases and traits.AmJHumGenet89(5):607-618. Solovieff N, Cotsapas C, Lee PH, Purcell SM, & Smoller JW (2013) Pleiotropy in complextraits:Challengesandstrategies.NatRevGenet14(7):483-495. 20 40. 41. 42. 43. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53. 54. 55. 56. 57. 58. 59. CotsapasC,etal.(2011)Pervasivesharingofgeneticeffectsinautoimmunedisease. PLoSGenet7(8):e1002254. Andreassen OA, et al. (2013) Improved detection of common variants associated withschizophreniaandbipolardisorderusingpleiotropy-informedconditionalfalse discoveryrate.PLoSGenet9(4):e1003455. Bulik-SullivanB,etal.(2015)Anatlasofgeneticcorrelationsacrosshumandiseases andtraits.NatGenet47(11):1236-1241. Lee SH, Yang J, Goddard ME, Visscher PM, & Wray NR (2012) Estimation of pleiotropy between complex diseases using single-nucleotide polymorphismderived genomic relationships and restricted maximum likelihood. Bioinformatics 28(19):2540-2542. Cross-Disorder Group of the Psychiatric Genomics Consortium (2013) Genetic relationshipbetweenfivepsychiatricdisordersestimatedfromgenome-widesnps. NatGenet45(9):984-994. Perry JRB, et al. (2014) Parent-of-origin-specific allelic associations among 106 genomiclociforageatmenarche.Nature514(7520):92-97. ScottRA,etal.(2012)Large-scaleassociationanalysesidentifynewlociinfluencing glycemic traits and provide insight into the underlying biological pathways. Nat Genet44(9):991-1005. Shi H, Kichaev G, & Pasaniuc B (2016) Contrasting the genetic architecture of 30 complextraitsfromsummaryassociationdata.AmJHumGenet99(1):139-153. YangJ,etal.(2010)Commonsnpsexplainalargeproportionoftheheritabilityfor humanheight.NatGenet42(7):565-569. TeslovichTM,etal.(2010)Biological,clinicalandpopulationrelevanceof95locifor bloodlipids.Nature466(7307):707-713. Charlesworth B (1979) Evidence against fisher's theory of dominance. Nature 278(5707):848-849. Crow JF (2010) On epistasis: Why it is unimportant in polygenic directional selection.PhilosTransRSocLondBBiolSci365(1544):1241-1244. Clayton DG (2009) Prediction and interaction in complex disease genetics: Experienceintype1diabetes.PLoSGenet5(7):e1000540. Hill WG, Goddard ME, & Visscher PM (2008) Data and theory point to mainly additivegeneticvarianceforcomplextraits.PLoSGenet4(2):e1000008. Ávila V, et al. (2014) The action of stabilizing selection, mutation, and drift on epistaticquantitativetraits.Evolution68(7):1974-1987. WrightS(1929)Theevolutionofdominance.AmNat63(689):556-561. WeiW-H,HemaniG,&HaleyCS(2014)Detectingepistasisinhumancomplextraits. NatRevGenet15(11):722-733. Wood AR, et al. (2014) Another explanation for apparent epistasis. Nature 514(7520):E3-E5. Evans DM, et al. (2011) Interaction between erap1 and hla-b27 in ankylosing spondylitis implicates peptide handling in the mechanism for hla-b27 in disease susceptibility.NatGenet43(8):761-767. The International Multiple Sclerosis Genetics Consortium (2015) Class ii hla interactions modulate genetic risk for multiple sclerosis. Nat Genet 47(10):11071113. 21 60. 61. 62. 63. 64. 65. 66. 67. 68. 69. 70. 71. 72. 73. 74. 75. 76. 77. 78. 79. 80. Barton N & Turelli M (1987) Adaptive landscapes, genetic distance and the evolutionofquantitativecharacters.GenetRes49(02):157-173. BulmerMG(1972)Thegeneticvariabilityofpolygeniccharactersunderoptimizing selection,mutationanddrift.GenetRes19(01):17-25. Keightley PD & Hill WG (1988) Quantitative genetic variability maintained by mutation-stabilizingselectionbalanceinfinitepopulations.GenetRes52(01):33-43. Robertson A (1956) The effect of selection against extreme deviants based on deviationoronhomozygosis.JGenet54(2):236–248. Wright S (1935) The analysis of variance and the correlations between relatives withrespecttodeviationsfromanoptimum.JGenet30(2):243-256. Keightley PD & Hill WG (1990) Variation maintained in quantitative traits with mutation-selection balance: Pleiotropic side-effects on fitness traits. Proc R Soc LondBBiolSci242(1304):95-100. Lande R (1975) The maintenance of geneticvariability by mutation in a polygenic characterwithlinkedloci.GenetRes26(3):221-235. Ewens WJ (2004) Mathematical population genetics 1: I. Theoretical introduction (SpringerScience&BusinessMedia). Martin G & Lenormand T (2006) A general multivariate extension of fisher's geometrical model and the distribution of mutation fitness effects across species. Evolution60(5):893-907. Sham PC & Purcell SM (2014) Statistical power and significance testing in largescalegeneticstudies.NatRevGenet15(5):335-346. DicksonSP,WangK,KrantzI,HakonarsonH,&GoldsteinDB(2010)Rarevariants createsyntheticgenome-wideassociations.PLoSBiol8(1):e1000294. CirulliET&GoldsteinDB(2010)Uncoveringtherolesofrarevariantsincommon diseasethroughwhole-genomesequencing.NatRevGenet11(6):415-425. Li Y, Willer C, Sanna S, & Abecasis G (2009) Genotype imputation. Annu Rev GenomicsHumGenet10:387-406. PritchardJK&PrzeworskiM(2001)Linkagedisequilibriuminhumans:Modelsand data.AmJHumGenet69(1):1-14. Evangelou E & Ioannidis JPA (2013) Meta-analysis methods for genome-wide associationstudiesandbeyond.NatRevGenet14(6):379-389. The1000GenomesProjectConsortium(2015)Aglobalreferenceforhumangenetic variation.Nature526(7571):68-74. TennessenJA,etal.(2012)Evolutionandfunctionalimpactofrarecodingvariation fromdeepsequencingofhumanexomes.Science337(6090):64-69. Keinan A & Clark AG (2012) Recent explosive human population growth has resultedinanexcessofraregeneticvariants.Science336(6082):740-743. Simons YB, Turchin MC, Pritchard JK, & Sella G (2014) The deleterious mutation loadisinsensitivetorecentpopulationhistory.NatGenet46(3):220-224. LohP-R,etal.(2015)Contrastinggeneticarchitecturesofschizophreniaandother complexdiseasesusingfastvariance-componentsanalysis. NatGenet47(12):13851392. GazaveE,ChangD,ClarkAG,&KeinanA(2013)Populationgrowthinflatestheperindividualnumberofdeleteriousmutationsandreducestheirmeaneffect. Genetics 195(3):969-978. 22 81. 82. Uricchio LH, Torres R, Witte JS, & Hernandez RD (2015) Population genetic simulations of complex phenotypes with implications for rare variant association tests.GenetEpidemiol39(1):35-44. LohmuellerKE(2014)Theimpactofpopulationdemographyandselectiononthe geneticarchitectureofcomplextraits.PLoSGenet10(5):e1004379. 23 Amodelforthegeneticarchitectureofquantitativetraitsunderstabilizing selection SupplementaryInformation Simons,Y.B.,Bullaughey,K.,Hudson,R.R.andG.Sella TableofContents 1. Themodel...................................................................................................................................................2 1.1. Absorbingtheenvironmentalcontributionintothefitnessfunction..........................2 1.2. Thedistributionofmutationaleffectsizesonagiventrait..............................................2 2. Solvingforsummariesofgeneticarchitecture..........................................................................4 2.1. Thefirsttwomomentsofchangeinallelefrequency.........................................................5 2.2. Sojourntime..........................................................................................................................................6 2.3. Calculatingexpectationsofsummariesofarchitecture.....................................................6 3. Additivegeneticvarianceandnumberofsegregatingsites................................................8 3.1. Expectations..........................................................................................................................................8 3.2. Densities...............................................................................................................................................10 3.3. Comparingpredictionsagainstsimulations.........................................................................14 4. Justificationforassumptions..........................................................................................................15 4.1. Normalandisotropicphenotypicdistributionaroundtheoptimum.......................16 4.2. Thephenotypicvariancesatisfies𝜎 " ≪ 𝑤 " .........................................................................16 4.3. Mutationaleffectsizessatisfy𝑎 " ≪ 𝑤 " ................................................................................17 4.4. Deviationsofthemeanphenotypefromtheoptimumcanbeneglected................17 5. Modelrobustness.................................................................................................................................19 5.1. Changestotheoptimalphenotype...........................................................................................19 5.2. Asymmetricmutationalinput.....................................................................................................22 5.3. Majoreffectloci.................................................................................................................................27 5.4. Anisotropicmutation......................................................................................................................28 6. ThepowertodetectlociinGWAS................................................................................................33 6.1. Re-sequencingstudies...................................................................................................................34 6.2. Genotyping..........................................................................................................................................36 7. Inference..................................................................................................................................................38 7.1. Thecompositelikelihood.............................................................................................................38 7.2. Determining𝑣 * andremovingoutliers...................................................................................40 7.3. Estimatingtargetsizeandexplainedvariance....................................................................41 7.4. Estimatingconfidenceintervals................................................................................................42 7.5. Testinggoodnessoffit...................................................................................................................44 8. Glossaryofnotation............................................................................................................................46 9. Bibliography...........................................................................................................................................47 10. Additionalfigures..............................................................................................................................49 1 1. Themodel 1.1. Absorbingtheenvironmentalcontributionintothefitnessfunction Here,weshowthattheadditiveenvironmentalcontributiontothephenotypecan beabsorbedintothefitnessfunction,whichjustifiesourconsideringonlythe additivegeneticcontributioninouranalysis.Thisresulthasbeenderivedmultiple timesfortheonedimensionalcase(e.g.,(1)).Theargumentinthemultidimensionalcaseissimilarandincludedforcompleteness. First,assumethattheadditiveenvironmentalcontributiontothephenotype,𝑟* ,is distributedasamulti-normalwithmean0andisotropicvariance𝑉* .Theexpected absolutefitnessofanindividualwithadditivegeneticcontributiontothe phenotype,𝑟- ,isgivenbyaveragingfitnessoverthedistributionofenvironmental contributions.Namely, W 𝑟- = = 0 𝑒 63 "123 4 5 0 𝑒 0?23 /A 5 4/5 8 9 5 8 3 5:3 W 𝑟- + 𝑟* = 5 9< 5(>5 =:3 ) . 0 𝑒 63 "123 4 5 9 5 8 3 5:3 𝑒 8 9< =93 5>5 5 (S1) Giventhatabsolutefitnessisdefineduptoamultiplicativeconstant,wecan thereforeabsorbtheadditiveenvironmentalcontributionbyusingtheGaussian fitnessfunction W 𝑟- = exp − 6< 5 "A 5 , (S2) where𝑤 " = 𝑤 " + 𝑉* .Evenwhentheenvironmentalcontributionisanisotropic,we canalwayschooseacoordinatesysteminwhichtheeffectivefitnessfunctiontakes anisotropicformaroundthefitnesspeak(Eq.1). 1.2. Thedistributionofmutationaleffectsizesonagiventrait Inthemaintext,wedefinethedistributionofphenotypiceffectsofnewlyarising mutationsinthen-dimensionaltraitspace,𝑎.Here,weconsidertheprojectionof 2 theseeffectsonagiventrait,𝑎0 ,takenwithoutlossofgeneralitytobeonthe1st dimension.Thedistributionofeffectsizesonafocaltraitwilldependonthedegree ofpleiotropy,n,andtheformofthisdependencybecomesimportantwhenwe considerhowpleiotropyaffectsgeneticarchitecture. Wewanttocalculatethedistributionofeffectsizesonthefocaltrait,𝑎0 ,conditional ontheiroveralleffect,𝑎 = 𝑎 .Weassumethatthedistributionofeffectsofdenovo mutationsisisotropicintraitspace.Theeffectofamutation,𝑎,thereforehasequal probabilitytooccupyanypointonann-dimensionalspherewithradius𝑎.LetSI 𝑥 denotethesurfaceareaofanm-dimensionalsphereofradius𝑥and𝜃denotethe anglebetweenthevector𝑎anditsprojection𝑎0 ,i.e.,𝑎0 = 𝑎 cos 𝜃.Intheseterms, thesurfaceareaelementcorrespondingtoangle𝜃is SO80 𝑎 sin 𝜃 𝑎𝑑𝜃, (S3) andbya changeofvariables,thesurfaceareaelementcorrespondingtoprojection 𝑎0 onthefocaltraitis since𝑑𝑎0 = UST UV S 𝑎" − 𝑎0" SO80 𝑎 sin 𝜃 𝑎𝑑𝜃 = SO80 𝑑𝜃 = 𝑎 sin 𝜃 𝑑𝜃 = S 5 8ST5 𝑑𝑎0 , (S4) 𝑎" − 𝑎0" 𝑑𝜃.Thisresultimpliesthatthe probabilitydensityof𝑎0 is φO 𝑎0 𝑎 = S 5 8ST5 X4YT X4 S S S 5 8ST5 Z = [Z 4 5 4YT 5 1− ST5 S5 4Y] 5 0 S (S5) (forasimilarderivation,see(2)). Next,weconsiderthehighpleiotropylimitformofthisdistribution.Foranydegree ofpleiotropy,thesymmetryofthemutationaldistributionimpliesthat E 𝑎0 𝑎 = 0 (S6) andtheequivalenceamongtraitsimpliesthat 3 V 𝑎0 𝑎 = 𝑎" 𝑛 (S7) (cf.maintextformoredetails).Itfollowsthatwhennbecomessufficientlylarge, 𝑎0 𝑎 ≪ 1andtherefore 1− Inaddition,Γ ST5 4Y] 5 S5 O " /Γ O ST5 " S5 ≈ exp − O80 " ≈ . (S8) 𝑛/2.SubstitutingtheseexpressionsintoEq.S5,we findthatforsufficientlylargenthedistributionofeffectsizesapproachesthe normaldistribution φO 𝑎0 𝑎 ≈ 0 "1 S5 0 O exp − ST5 " S5 O . (S9) Asweelaborateinthemaintext,importantimplicationsaboutquantitativegenetic variationfollowfromthishighpleiotropylimit.Thelimitalsoholdsquitegenerally whenthedistributionofeffectsizesisanisotropic(seeSectionS5.4). 2. Solvingforsummariesofgeneticarchitecture Here,wederiveclosedformsforsummariesofgeneticarchitectureunderour model.Webeginbyderivingthefirsttwomomentsofchangeinallelefrequencyina singlegeneration.Withthesemomentsathand,weusethediffusionapproximation tocalculatethesojourntimeforallelesthatcontributetoquantitativegenetic variation(3).Togetherwiththedistributionofeffectsizesderivedintheprevious section,thesojourntimeallowsustoobtainclosedformsforsummariesofgenetic architecture.Specifically,wecanobtainaclosedformforanysummarythatcanbe describedasafunctionofallelefrequenciesandeffectsizesatsitescontributingto quantitativegeneticvariation.Weusetheseexpressionstocalculatethesummaries usedinthemaintext,forexampletheexpectedadditivegeneticvarianceandits distributionacrosssites. 4 2.1. Thefirsttwomomentsofchangeinallelefrequency Weassumethat: • Thephenotypicdistributionatsteadystateiswellapproximatedbyan isotropicmultivariatenormaldistributioncenteredattheoptimum,namely bytheprobabilitydensityis f 𝑟 = • 0 "1e 5 4 𝑒𝑥𝑝 − 5 65 . "e 5 (S10) Thatboth𝑎" and𝜎 " ≪ 𝑤 " . TheseassumptionsarejustifiedinSectionS4. Werelyontheseassumptionstocalculatethefirsttwomomentsofchangein frequencyinasinglegenerationforanallelewithphenotypiceffect𝑎andfrequency q.Thefitnessesofthethreegenotypesatthesitedependonitsdistributionof geneticbackgrounds,i.e.,onthetotalphenotypiccontributionofsitesotherthanthe focalone𝑅.FollowingEq.S10andassumingeveryallelecontributesonlyasmall proportionofthegeneticvariance,thedistributionof𝑅iswellapproximatedby f 𝑅 𝑎, 𝑞 = 5 0 exp "1e 5 4/5 − j?kS "e 5 . (S11) Theexpectedfitnessesofthethreegenotypesthenfollowfromintegratingover backgrounds: O A 𝑊nn = f 𝑅 𝑎, 𝑞 W 𝑅 = j 𝑊n0 = f 𝑅 𝑎, 𝑞 W 𝑅 + 𝑎 2 = j 𝑊00 = j A 5 ?e 5 exp − O A A 5 ?e 5 S5 k5 exp − (S12) (S13) . (S14) − 𝑞 , (S15) , " A 5 ?e 5 S 5 (k80 ")5 "(A 5 ?e 5 ) and f 𝑅 𝑎, 𝑞 W 𝑅 + 𝑎 = A A 5 ?e 5 O exp − S 5 (k80)5 "(A 5 ?e 5 ) Thefirstmomentofchangeinallelefrequencyisthen E Δ𝑞 = −𝑝𝑞 p qrr 8qrT ?k qrT 8qTT q ≈− S5 sA 5 𝑝𝑞 0 " 5 relyingonourassumptionsthat𝑎" and𝜎 " ≪ 𝑤 " .Thefunctionalformofthefirst momentisequivalenttothatofthestandardviabilityselectionmodelwithunderdominanceandselectioncoefficient𝑠 = 𝑆= vS 5 "A 5 . S5 sA 5 orscaledselectioncoefficient (S16) , (S17) Similarly,wefindthat V Δ𝑞 ≈ pk "v whichisthestandardsecondmomentwithgeneticdrift. 2.2. Sojourntime Basedonthefirsttwomoments,wecanusethediffusionapproximationtocalculate thesojourntimeasafunctionofallelefrequency,i.e.,thedensityofthetimethatan allelespendsatagivenfrequency𝑞beforeitfixesorislost(3).Foramutantallele withinitialfrequency1/2𝑁andscaledselectioncoefficientS,thesojourntimeis v 1 y τ 𝑞𝑆 = z{| y " v 1 y z{| y " 5 * } TY5~ /• €(08€) 5 * } TY5~ /• €(08€) f8 𝑆, 𝑞 f? 𝑆, 1 2𝑁 0 ≤ 𝑞 ≤ 1/2𝑁 f? 𝑆, 𝑞 f8 𝑆, 1 2𝑁 1 2𝑁 ≤ 𝑞 ≤ 1 whereerfistheerrorfunctionandf± 𝑆, 𝑦 ≡ erf 𝑆 2 ± erf (S18) 𝑆 1 − 2𝑦 2 . Thesojourntimetakessimplelimitingformswhenselectioniseffectivelyneutral (𝑆 ≪ 1)orstrong(𝑆 ≫ 1).Intheeffectivelyneutralrange,itiswellapproximated " byτ 𝑞 𝑆 = ,andinthestronglyselectedrange,itiswellapproximatedby k " τ 𝑞 𝑆 = exp −𝑆𝑞 . k 2.3. Calculatingexpectationsofsummariesofarchitecture Manysummariesofinterestcanbeexpressedassumsoversegregatingsitesof somefunction𝑐(𝑞, 𝑎1),where𝑞isthederivedallelefrequencyanda1istheeffect sizeonthetrait.Forexample,theadditivegeneticvarianceinatraitisgivenbythe 6 sumof𝑣(𝑞, 𝑎0 ) = ½𝑎0" 𝑞(1 − 𝑞)oversites.Theexpectationoversuchsummaries canbeexpressedas E 𝐶 = 2𝑁𝑈 𝑞 𝑎1 𝑐(𝑞, 𝑎0 )𝜌(𝑞, 𝑎0 ), (S19) where𝐶isthesummerysummedoverallsites,2NUisthepopulationmutationrate pergenerationand𝜌(𝑞, 𝑎0 )isthedensityofsiteswiththecorrespondingfrequency andeffectsizeperunitmutationalinput. Thedensity𝜌(𝑞, 𝑎0 )canbebrokendownintocontributionsfromsiteswithdifferent selectioncoefficients,i.e., 𝜌 𝑞, 𝑎0 = 𝑆 f(𝑆) τ 𝑞 𝑆 η 𝑎0 𝑆 , (S20) wheref(𝑆)isthedistributionofselectioncoefficientsand𝜏 𝑞 𝑆 isthesojourntime ofamutationwithselectioncoefficient𝑆(Eq.S18).Theprobabilitydensityη 𝑎0 𝑆 ofeffectsizesgivenselectioncoefficient𝑆followsfromEqs.S5andS16 η 𝑎0 𝑆 = φŽ 𝑎0 𝑎(𝑆) = φŽ 𝑎0 2𝑤 " 𝑁 𝑆 . (S21) Thisallowsustobreakdownoursummariesintocontributionsfromsiteswith differentselectioncoefficients E 𝐶 = 2𝑁𝑈 𝑆 f(𝑆)E 𝐶 𝑆 (S22) (S23) with E 𝐶𝑆 = 𝑞 𝑎1 𝑐(𝑞, 𝑎0 )τ 𝑞 𝑆 η 𝑎0 𝑆 . WeuseEq.S23tostudyhowsummariesofarchitecturedependonthestrengthsof selection,andhowthesesummarieswilldependondifferentdistributionsof selectioncoefficients.Thisallowsustodrawgeneralimplicationsaboutgenetic architecturedespiteourlimitedknowledgeaboutthisdistribution. 7 3. Additivegeneticvarianceandnumberofsegregatingsites Thedistributionsofadditivegeneticvarianceandofthenumberofsegregatingsites arecriticaltounderstandinggeneticarchitectureandspecificallytointerpreting resultsofGWAS.Herewederiveclosedformsforbothdistributionsaswellas simpleapproximationsunderstrongandeffectivelyneutralselection. 3.1. Expectations Webeginbyconsideringtheexpectedcontributionofasitetoadditivegenetic variance.Substitutingthecontributiontovariancefromasinglesite𝑣(𝑞, 𝑎0 ) = 0 " 𝑎0" 𝑞(1 − 𝑞)intoEq.S23,wefindthat E 𝑉𝑆 = 1 = = k2 1 k2 𝑞 𝑎1 𝑣 𝑞, 𝑎0 τ 𝑞 𝑆 η 𝑎0 𝑆 𝑞 1−𝑞 τ 𝑞 𝑆 𝑞 1−𝑞 τ 𝑞 𝑆 ST 2𝑤2 𝑆 𝑛𝑁 𝑎21 η 𝑎1 𝑆 = 2𝑤2 1 𝑆𝑞 𝑛𝑁 k 2 1 − 𝑞 τ 𝑞 𝑆 (S24). (S25) Thetotaladditivegeneticvarianceis 𝜎 " = 2𝑁𝑈 y f(𝑆)E 𝑉 𝑆 . TheclosedformforE 𝑉 𝑆 (Eq.S24)wasintegratednumericallytoobtainFig.2ain themaintext.Wecanusetheresultsof(4)toobtainananalyticapproximationfor E 𝑉 𝑆 : E 𝑉𝑆 = 2𝑤2 𝑛𝑁 𝜋𝑆 erfi 4 𝑆 4 𝑆 exp − 4 + O 1 2𝑁 , (S26) witherfibeingtheimaginaryerrorfunction(erfi 𝑥 = erf 𝑖𝑥 /𝑖). Intheeffectivelyneutralandstrongselectionlimits,wecanusesimplelimiting formsforthesojourntimetoderivesimpleformsfor𝐸 𝑉 𝑆 .Intheeffectively neutrallimit,τ 𝑞 𝑆 ≈ 2/𝑞andso E 𝑉𝑆 y≪0 ≈ "A 5 y Ov " . (S27) 8 Inthestrongselectionlimit,τ 𝑞 𝑆 ≈ 2 exp(−𝑆𝑞) /𝑞andso E 𝑉𝑆 y≫0 "A 5 Theconstant Ov ≈ "A 5 Ov . (S28) ,whichrecursinourderivations(e.g.,Eq.S24),thushasasimple interpretation:itistheexpectedcontributionofstronglyselectedsitestoadditive geneticvariance(perunitmutationalinput).Wethereforedenoteitby 𝑣” = "A 5 Ov (S29) andhenceforthmeasurevarianceintheseunits. Wenextconsiderhowtheexpectednumberofsegregatingsitesdependsonthe strengthofselection.Thisexpectation(perunitmutationalinput)issimplythe meansojourntimeofanewlyarisingmutation.Formally,itfollowsfrom substitutingc 𝑞, 𝑎0 = 1intoEq.S23,i.e., E 𝐾𝑆 = 𝑞 𝑎1 τ 𝑞 𝑆 η 𝑎0 𝑆 = 𝑞 τ 𝑞𝑆 𝑎1 η 𝑎0 𝑆 = 𝑞 τ 𝑞 𝑆 . (S30) InFig.S1,wehavecalculatedthisintegralnumericallyfordifferentvaluesofSto showthatthenumberofsegregatingsitesdependsonlyweaklyon𝑆. segregating sites Number of 30 FigureS1.Thenumberofsegregatingsites 20 perunitmutationalinput(orequivalently theexpectedsojourntimeofanewly 10 0 0 arisingmutation),Eq.S30,isonlyweakly 10 20 30 40 dependentonthestrengthofselection. Scaled selection coefficient Calculatedforapopulationsizeof20,000. 9 3.2. Densities Here,weconsiderhowtheadditivegeneticvarianceisdistributedamongsites.The densityofsegregatingsiteswithagivencontributiontovariancevcanbecalculated bysubstitutingDirac’sdeltafunctionδ 𝑣 − ½𝑎0" 𝑞 1 − 𝑞 intoEq.S23,yielding 𝜌 𝑣|𝑆 = E 𝛿 𝑣 − ½𝑎0" 𝑞 1 − 𝑞 = 𝑎1 = τ q? 𝑣, 𝑎0 𝑆 𝑎1 " šœ 0 " 𝑞 𝑎1 δ 𝑣 − ½𝑎0" 𝑞 1 − 𝑞 τ 𝑞 𝑆 η 𝑎0 𝑆 = + τ q8 𝑣, 𝑎0 𝑆 τ q? 𝑣, 𝑎0 𝑆 + τ q8 𝑣, 𝑎0 𝑆 whereq± 𝑣, 𝑎0 = 0 š›= œ,ST 𝑆 = š›Y œ,ST " ST5 08•œ/ST5 η 𝑎0 𝑆 šœ η 𝑎0 𝑆 , (S31) 1 ± 1 − 8𝑣/𝑎0" arethetwofrequenciesthatyield𝑣 = 𝑎0" 𝑞 1 − 𝑞 .ThisintegralcanbecalculatednumericallyforanySandη 𝑎0 𝑆 (e.g., fordifferentdegreesofpleiotropy). AswediscussinthemaintextandinSectionS6,thetailsofthisdistributionare especiallyimportantfortheinterpretationofGWAS.Notably,toafirst approximation,thelocicapturedinagivenGWASwouldbethosewhose contributiontoadditivevarianceexceedssomethresholdcontribution𝑣 ∗ . Weareparticularlyinterestedintheproportionofadditivegeneticvariance capturedinGWAS,whichweapproximateintermsoftheadditivevariancecarried bysitesinthe𝑣 > 𝑣 ∗ tail.Theexpectedvarianceinsuchatailis E 𝑉 𝑣∗, 𝑆 = œ¡œ ∗ 𝑣𝜌 𝑣 𝑆 , (S32) (S33) anditsproportionofthetotaladditivegeneticvarianceis G 𝑣∗ 𝑆 ≡ £(2|œ * ,y) £(2|y) = ¥¦¥∗ œ¤(œ|y) œ¤(œ|y) ¥ . Foradistributionofselectioncoefficients,theproportionofvariancebecomes G 𝑣∗ = } § œ ∗ |y £(2|y)|(y) £(2|y)|(y) } = y G 𝑣 ∗ |𝑆 £(2|y)|(y) } £(2|y)|(y) . (S34) 10 Forcaseswithoutpleiotropyorwithextensivepleiotropy,wecangreatlysimplify theseexpressionsinthelimitsofeffectivelyneutralandstrongselection.We illustratehowtheseapproximationscanbeobtainedfortheproportionofadditive varianceinatail,G 𝑣 ∗ 𝑆 (TableS1).Whenselectioniseffectivelyneutral,then τ 𝑞 𝑆 = 2/𝑞andthus ¥¦¥∗ G 𝑣 ∗ |𝑆 = ¥ = = " y œ¡œ ∗ 𝑎1 " y œ¡œ ∗ 𝑎1 " = y 𝑣 𝑎1 ›= œ,ST " 𝑣 0? " 𝑣>𝑣∗ " 08•œ/ST5 08•œ/ST5 + œ¤(œ|y) œ¤(œ|y) " " ›Y œ,ST ST5 08•œ/ST5 " + η 𝑎0 𝑆 = 08 " 08•œ/ST5 𝑎1 > •œ ∗ η 𝑎0 𝑆 ST5 y ST5 08•œ/ST5 η 𝑎0 𝑆 1 − 8𝑣 ∗ /𝑎0" η 𝑎0 𝑆 , (S35) withvariancemeasuredinunitsof𝑣” andeffectsizemeasuredinunitsof 𝑣” . Withoutpleiotropy(𝑛 ≈ 1),weknowthat𝑎0 = ± 𝑆andtheexpressionforthe proportionofvariancesimplifiesto G 𝑣 ∗ |𝑆 = 1 − 8𝑣 ∗ /𝑆, (S36) where𝑆/8isthemaximalcontributiontovarianceforamutationwithselection coefficient𝑆.Whenthereisahighdegreeofpleiotropy(𝑛 ≫ 1),𝑎0 isapproximately normallydistributedwithmean0andvariance𝑆andtheexpressionforthe proportionofvariancesimplifiesto G 𝑣 ∗ |𝑆 = ST5 𝑎1 > •œ ∗ y 1 − 8𝑣 ∗ /𝑎0" 0 "1y exp(−𝑎0" /2𝑆) = exp − 4𝑣∗ 𝑆 .(S37) Whenselectionisstrong,derivedallelesarerare(𝑞 ≪ 1),implyingthat𝑣 ≪ 𝑎0" and 𝑞 ≈ 2𝑣/𝑎0" ,andthesojourntimeiswellapproximatedbyτ 𝑞 𝑆 ≈ 2 exp(−𝑆𝑞)/𝑞. Theproportionofvariancethensimplifiesto 11 G 𝑣 ∗ |𝑆 ≈ ≈ œ¡œ ∗ 𝑎1 𝑣>𝑣∗ ST = 𝑣>𝑣∗ ST = ST5 𝑎1 y 𝑣 " 𝑣τ 𝑞 𝑣, 𝑎0 𝑆 𝑎21 exp 𝑣 −2𝑆𝑣/𝑎21 ST5 2 η 𝑎21 η 𝑎0 𝑆 𝑎1 𝑆 2 exp −2𝑆𝑣/𝑎21 η 𝑎1 𝑆 exp −2𝑆𝑣 ∗ /𝑎0" η 𝑎0 𝑆 . (S38) (S39) Withoutpleiotropy,thisexpressionfurthersimplifiesto G 𝑣 ∗ |𝑆 ≈ exp(−2𝑣∗ ), andwhenthereisahighdegreeofpleiotropy(𝑛 ≫ 1),then G 𝑣 ∗ |𝑆 ≈ ST5 𝑎1 y exp − "yœ ∗ ST5 0 "1y exp − = 1 + 2 𝑣∗ exp −2 𝑣∗ . ST5 "y (S40) Selection Effectivelyneutral(𝑆 ≪ 1) Stronglyselected(𝑆 ≫ 1) #oftraits 𝑛 ≈ 1 𝑛 ≫ 1 𝑛 ≈ 1 𝑛 ≫ 1 E 𝑉𝑆 𝑆/2 𝑆/2 1 1 exp −4𝑣 ∗ /𝑆 exp(−2𝑣 ∗ ) 1 + 2 𝑣 ∗ exp(−2 𝑣 ∗ ) G 𝑣∗ 𝑆 1 − 8𝑣 ∗ /𝑆 TableS1.Simplelimitsfortheexpectedproportionofvariancearisingfromsites withadditivecontributiongreaterthan𝑣 ∗ . Anoteworthypropertyoftheproportionofvariancefromsiteswithcontributions 𝑣 > 𝑣 ∗ ,G 𝑣 ∗ |𝑆 ,isthatitalwaysappearstoincreasewiththedegreeofpleiotropy,n (FigureS2).Wedonothaveaproofforthispropertybutcansuggestanintuitive explanation.Withoutpleiotropy(n=1),theselectioncoefficientdeterminesthe effectsize,suchthatanycontribution𝑣 ∗ togeneticvariancecorrespondstoa specificminorallelefrequency𝑞∗ .Thesiteswithcontributions𝑣 > 𝑣 ∗ aretherefore 12 thosewithminorallelefrequencies𝑞 > 𝑞∗ .Pleiotropycausessiteswithagiven selectioncoefficienttohaveadistributionofeffectsizesonthetraitunder consideration.Asaresult,somesiteswithfrequenciesabove𝑞∗ endupwith contributionstovariancebelow𝑣 ∗ whileothersexceed𝑣 ∗ .Tounderstandhowthis affectsG 𝑣 ∗ |𝑆 ,recallthatforanyselectioncoefficient,thedensityofvariants alwaysrapidlyincreasesas𝑞∗ decreases.Aslongasthecontribution𝑣 ∗ andthe correspondingfrequencywithoutpleiotropy𝑞∗ arenotcloseto0,wemaytherefore expectthatintroducingpleiotropywouldresultinpushingmoresitesabove𝑣 ∗ than thosepushedbelow𝑣 ∗ ,resultinginanetincreasetotheproportionG 𝑣 ∗ |𝑆 . S=1 0.6 0.4 0.8 0.6 0.4 0.5 1 v 1.5 * 0.0 2 xvs 0 S=100 1.0 n=1 n=2 n=3 n=10 n ∞ 0.8 0.6 0.4 0.2 0.2 0.2 0.0 0 n=1 n=2 n=3 n=10 n ∞ GHv* L 0.8 S=10 1.0 n=1 n=2 n=3 n=10 n ∞ GHv* L G(v*|S) GHv* L 1.0 2 4 6 0.0 10 xvs 0 8 2 4 Thresholdcontributiontovariance(v*) v 6 8 v* * 10 xvs FigureS2.Theeffectofpleiotropyontheproportionofthevarianceexplainedby sitescontributingmorethen𝑣 ∗ tothevariance,𝐺(𝑣 ∗ |𝑆)(seeEq.S33).Forall selectioncoefficients,theproportionofvarianceexplainedincreasesasthenumber oftraits,i.e.thedegreeofpleiotropy,increases. Lastly,weconsiderhowthenumberofsegregatingsitesdependsontheir contributiontogeneticvariancevaryasafunctionofthiscontribution(Fig.S13). ThenumberofsegregatingsitesKwith𝑣 > 𝑣 ∗ perunitmutationalinputis E 𝐾 𝑣∗, 𝑆 = œ¡œ ∗ 𝜌 𝑣 𝑆 , (S41) (S42) andthereforethedistributionofvarianceatthesesitesis f(𝑣) = 𝜌 𝑣 𝑆 / 𝑣>𝑣∗ 𝜌 𝑣 𝑆 . Weusetheseresultsbelow,whenweestimatemodelparametersfromGWAS results(SectionS7).Theyalsoallowustotranslatethepropertiesofthedensityof 13 geneticvarianceintopropertiesofthedensityofsites(Fig.S13).Notably,by applyingthesamereasoningonecanshowthatthedensityofsitesisinsensitiveto thedistributionofstrongselectioncoefficientsandpleiotropyalwaysincreasesthe numberofsitesabove𝑣 ∗ (i.e.,forany𝑣 ∗ >0). 3.3. Comparingpredictionsagainstsimulations Wetestedourtheoreticalderivationsforthetotalgeneticvarianceandits distributionamongsitesagainstcomputersimulations.Tothisend,weranforward simulationofthefullmodel(thecodeisavailableathttps://github.com/sellalab) witharangeofparametervalueschosentobalancebetweenbiologicallyplausibility andmaintainingmanageablerunningtimes.Notably,weusedapopulationsizeof 𝑁 = 1000,withaburn-intimeof10,000generations.Wevariedthenumberof traitstoincluden=1,3,and10,andvariedthemutationrateperhaploidgenome pergenerationwithintherange1/2N ≤ 𝑈 ≤ 1.Selectioncoefficientswerechosen fromanexponentialdistributionwithmeanE(S)=0.1,10,and50.Forsimplicity,we took𝑤 " = 1,whichistantamounttochoosingtheunitsthatweusetomeasure effectsizes.Forthesakeofcomputationalefficiency,thesimulationswere performedwithfecundityselection,butweransimulationswithviabilityselection forasubsetoftheparameterstoensurethatthischoicedoesnotaffectourresults (notshown). Ourtheoreticalderivationsforboththetotalgeneticvarianceanditsdistribution amongsitesagreewellwithresultsfromsimulations(Fig.S3).Notably,wefound thataslongas1 2𝑁 ≪ 𝜎 " /𝑤 " ≪ 1(cf.SectionS3.3),ourtheoreticalpredictionsfor thetotalgeneticvariance(Eq.S25)areindistinguishablefromthesimulationresults (Fig.S3A).Theoreticalandsimulationresultsseemtoagreeevenwhen𝜎 " /𝑤 " ≤ 1/2N,althoughweconsiderthisrangetobeoflesserbiologicalrelevance(cf. SectionS4.4).Wetestedourpredictionsforthedistributionofvarianceacrosssites intermsofG 𝑣 ∗ (Eq.S34),i.e.,theproportionofthevariancearisingfromsitesthat contributemorethan𝑣 ∗ (Fig.S3B),andalsofoundthemtobehighlyaccurate. 14 10-5 1 0.5 10 10-2 10-3 10-2 10-1 Mutation rate (U) Mutationrate(U) E(S)=0.1 100 10-4 10-3 1 10-2 10-1 100 Mutation rate (U) Mutationrate(U) E(S)=10 0.5 0 2 Totalgeneticvariance(! Total genetic variance (<2)) 10-3 E(S)=10 E(S)=10 0 E(S)=50 E(S)=50 10 0 0 0.05 0.1 0.15 0.2 Xvs 0 1 2 3 4 5 xvs Threshold contribution to variance (v*) Threshold contribution to variance (v*) Thresholdcontributiontovariance(v*) Thresholdcontributiontovariance(v*) 0 10-2 10 -4 10 1 G(v*) G(v*) E(S)=0.1 E(S)=0.1 -1 G(v*) G(v*) G(v*) G(v*) (b) 10 2 Total genetic variance (<2) ) Totalgeneticvariance(! 2 Total genetic variance (<2)) Totalgeneticvariance(! (a) -3 -2 -1 10 10 Mutation rate (U) Mutationrate(U) E(S)=50 10 0 0.5 0 0 1 2 3 4 5 xvs Threshold contribution to variance (v*) Thresholdcontributiontovariance(v*) n=1 n=3 n=10Data Theory FigureS3.Testingtheoreticalpredictionsforgeneticvarianceagainstsimulations. (a)Totalgeneticvariance(inunitsof𝑤 " )asafunctionofthemutationrate.The rangeofrelevantmutationrates,1/2𝑁 ≪ 𝑈 ≪ 1,andtheexpectedrangeforthe validityofourderivation,𝑤 " /2𝑁 ≪ 𝜎 " ≪ 𝑤 " ,aremarkedbyagreybox.(b)The distributionofvarianceamongsites,for𝑈 = 0.01;G(𝑣 ∗ )istheproportionof variancefromsitescontributingmorethan𝑣 ∗ (Eq.S34).Errorbarsrepresentone standarddeviation.Foreachsetofparameters,thenumberofsimulationswas chosentoobtainstandarddeviationsbelow10%.Inpractice,weoftenreached standarddeviations≪ 10%,whichiswhyerrorbarsaremostlytoosmalltobe visible. 4. Justificationforassumptions Here,wejustifytheassumptionsthatwerelieduponinderivingthefirsttwo momentsofchangeinallelefrequency(cf.SectionS2.1;modelingassumptionsare motivatedintheintroductiontothemaintext).Werelyinpartonself-consistency arguments,whichshouldnotbemistakenforbeingcircular:specifically,wemake assumptionsaboutthebehaviorofthesystemandshowthatthesolutiontowhich wearrivesatisfiestheseassumptions. 15 4.1. Normalandisotropicphenotypicdistributionaroundtheoptimum Theassumptionthatthephenotypicdistributioniswellapproximatedbyanormal distributionstemsfromanadditivemodelofquantitativetraits.Byassumingthat thephenotypearisesfrommanyadditivecontributionsandthattheseadditive contributionsarisefromsomeunderlyingdistribution,normalityfollowsfromthe lawoflargenumbers.Intermsofmodelparameters,wewouldexpectnormalityto holdiftherateofmutationsaffectingthetraitissufficientlylarge,i.e.,when2𝑁𝑈 ≫ 1. Wefurtherassumethatthephenotypicdistributionisisotropicanditsmeanisat theoptimum.Isotropyofthephenotypicdistributionfollowsfromassuming isotropyinthemutationalinput.InSectionS5.4,weexploretheconsequencesof anisotropyinthemutationalinput.InSectionS4.4,wefurthershowthatthe fluctuationsofthemeanphenotypearoundtheoptimumovertimehavenegligible effectsonallelicdynamics;asimilarargumentappliestofluctuationsinthe variance. 4.2. Thephenotypicvariancesatisfies𝝈𝟐 ≪ 𝒘𝟐 Withthemeanphenotypecenteredattheoptimum,requiringthat𝜎 " ≪ 𝑤 " is equivalenttoassumingthatmovingastandarddeviationawayfromthemean phenotypeentailsonlyaminorreductioninfitness.Thisseemsplausibleformany phenotypes:if,forexample,thisassumptiondidnotholdforhumanheight,then individualswhoseheightisastandarddeviationormoreawayfromthepopulation meanwouldsufferfromasubstantialreductioninfitness.Arguably,deviationsfrom themeanheightwouldthenberecognizedasaverycommonandseveredisease. Anotherlineofargumentthatitislikelythat𝜎 " ≪ 𝑤 " isbasedonourresults.Ifwe assumethatmutationsarestronglyselected,thenourresultssuggestthat 𝜎 " = 2𝑁𝑈 ∙ 𝑣” = s¯ O 𝑤 " . (S43) 16 Itfollowsthatiftherateofmutationsaffectingthephenotypeunderconsideration satisfies𝑈 ≪ 1then𝜎 " ≪ 𝑤 " .Thenumberofmutationsperdiploidhumangenome pergenerationisestimatedtobe~60(5),andlessthan10%ofthegenomeis assumedtobefunctional(6),suggestingthatthenumberofdenovomutationswith anyeffectonfunctionislessthan3perhaploidpergeneration.Itthenseems plausiblethatthe(haploid)mutationrateaffectingaspecifictraitsatisfies𝑈 ≪ 1. AssumingthatmutationsareweaklyselectedincreasesthevarianceinEq.S43only moderatelyandassumingthemutationsareeffectivelyneutralwouldsuggestitis muchsmaller,leavingtheaboveargumentintact. 4.3. Mutationaleffectsizessatisfy𝒂𝟐 ≪ 𝒘𝟐 Aswearguedintheintroductionofthemaintext,variantsforwhichthestronger condition𝑎" ≪ 𝜎 " holdsaccountformostoralloftheheritabilityexplainedin GWASformanytraits(e.g.(7-10)).Moreover,evidenceformanytraitssuggeststhat thesameistrueforthevariantsthatunderlietheheritabilitythatremainstobe explained(11-14).Indeed,forthisassumptiontobeviolated,muchofthegenetic variancewouldhavetoarisefrommutationsthathaveaverylargeimpactonfitness (i.e.,withsontheorderof1).Whilethismaybethecaseforsomediseases(e.g., autism(15)),itdoesnotappeartobethecaseformostphenotypesthathavebeen examined. 4.4. Deviationsofthemeanphenotypefromtheoptimumcanbeneglected Inreality,themeanphenotypeofthepopulationfluctuatesaroundtheoptimum. Here,wederiveequationsforthedynamicofthemeanphenotypeinorderto estimatethemagnitudeandtimescaleofthesefluctuations.Wethenshowthatthese fluctuationshaveanegligibleeffectonthefirsttwomomentsofchangeinallele frequencyandthusontheresultsthatfollowfromthesemoments. Webeginbyderivingthefirstandsecondmomentofchangeinmeanphenotype.To thisend,weassumethedistributionofphenotypesisamultivariatenormal centeredaroundameanphenotype,𝑟,i.e.that 17 𝑓 𝑟 = 0 "1e 5 4 5 𝑒𝑥𝑝 − 686 5 "e 5 . (S44) Theexpectedchangeinmeanphenotypeduetoselectioninonegenerationis therefore E Δ𝑟 = 9 | 6 ² 6 6 9 | 6 ² 6 −𝑟 =− e5 A 5 ?e 5 𝑟≈− e5 A5 𝑟. (S45) Bythesametoken,thevarianceinΔ𝑟issimplythesamplingvariance V Δ𝑟 ≈ e5 v , (S46) whereinbothcaseswereliedontheassumptionthat𝜎 " ≪ 𝑤 " . ThesetwomomentsdefineanOrnstein-Uhlenbeckprocessin𝑟,allowingustorely onwell-knownresults(16).Notably,whenthemeanphenotype𝑟startsfarfromthe optimum,itdecaysexponentiallytotheoptimumwithexponent𝜎 " /𝑤 " (seeSection S5.1below).Atsteadystate,𝑟willfluctuatewithmeanzeroand𝐸 𝑟 " = 𝑛 𝑤 " 2𝑁 overatimescaleof𝑤 " 𝜎 " generations.Thetypicaldisplacementof𝑟inanygiven directionwillbe 𝑤 " 2𝑁,reflectingabalancebetweendriftandthepullof selectiontowardtheoptimum. Nextweshowthatthesefluctuationsofthemeanhavenegligibleeffectsonallelic trajectories.Tothisend,wederivethefirsttwomomentsofchangeinallele frequency,butthistime,weincludetheeffectofthedisplacementof𝑟fromthe optimum.Whilethesecondmomentremainsthesame,thefirstmomentbecomes E Δ𝑞 ≈ − 6⋅S "A 5 𝑝𝑞 − S5 sA 5 𝑝𝑞 0 " −𝑞 =− 6´ y A 5 /"v "v 𝑝𝑞 − y "v 𝑝𝑞 0 " − 𝑞 ,(S47) where𝑟S is𝑟’scomponentinthedirectionof𝑎.However,ouranalysisestablishes that 6´ A 5 /"v isascalaroftheorderof1,whichfluctuatesaroundzeroonatimescale of𝑤 " 𝜎 " .Wecanthereforecomparethefirsttermintheaboveequation,which representsdirectionalselection,andthesecondterm,whichrepresentsstabilizing selection.Whenstabilizingselectionisstrong,𝑆 ≫ 1,thestabilizingselectionterm 18 dominatesoverthedirectionalselectionterm.Incontrast,whenselectionisweak, i.e.,𝑆 ≈ 1orsmaller,theninanygivengeneration,thedirectionaltermisnot necessarilynegligible.However,inthiscase,bothtermsaffectsubstantialchangein allelefrequencyonlyoveratimescaleof2𝑁generations;onthistimescale,if2𝑁 ≫ 𝑤 " 𝜎 " ,thedirectionaleffectwouldaveragetozero.Thedirectionaltermwill becomeimportantonlywhen2𝑁 ≤ 𝑤 " 𝜎 " ,thatis𝜎 " ≤ 𝑤 " 2𝑁.For𝜎 " tobethat small,virtuallyallallelesmusthave𝑆 ≪ 1,suchthattheirtrajectorieswillbe determinedbydrift,notselection.Insummary,regardlessoftheselectionactingon anallele,fluctuationsofthemeanphenotypearoundtheoptimumwillhavea negligibleeffectonitstrajectory. 5. Modelrobustness Inthissection,weconsiderthesensitivityofourresultstorelaxingsomeofthe simplifyingmodelingassumptionsaboutselectionandmutation.Specifically,we showourresultstoberobusttomoderatechangestotheoptimalphenotype;small asymmetryinthemutationalinput;thepresenceofmajorlocimaintainedathigh frequencybyselectionontraitsthatarenotincludedinthemodel;aswellasto mostformsofanisotropicmutation. 5.1. Changestotheoptimalphenotype Wefirstconsiderhowchangestotheoptimalphenotypeovertimewouldaffectour results.ItiseasytoimaginehoweventssuchasmigrationfromAfricatoEuropeor theonsetofagriculturemayhaveintroducedrapidchangesinbeneficial phenotypes.Inordertoevaluatethepotentialimpactofsuchevents,weconsider howaninstantaneouschangetotheoptimalphenotypewouldaffecttheallelic dynamics. Webeginbyconsideringhowsuchaninstantaneouschangetotheoptimumwould affectthemeanphenotype.Iftheshifttotheoptimumissmall,ontheorderofthe fluctuationsinthemeanphenotypeatsteadystateorsmaller,thenthearguments providedinSectionS4.4willstillholdandtheshiftwouldhaveanegligiblysmall effectonourresults.Wethereforeassumethattheshiftinoptimum,𝑧,islarge 19 comparedtothescaleoffluctuations(𝑧 " ≫ 𝑤 " /2𝑁).Thisassumptionmeansthat wecanuseadeterministicapproximation(basedonEq.S45)anddescribethe changeinmeanphenotypeinasinglegenerationby Δ𝑟 ≈ E Δ𝑟 = − e5 𝑟−𝑧 A5 (S48) (neglectinghighermoments).Furtherassumingthatthemeanphenotypewasatthe optimum,0,beforetheoptimumshifted(attime𝑡 = 0)andneglectingchangesto thegeneticvariance𝜎,wefindthat 𝑟 𝑡 = 𝑧 1 − exp − e5 A5 𝑡 (S49). Thus,themean𝑟adaptstothenewoptimumonatimescaleof𝑤 " /𝜎 " generations (cf.(17)forasimilarderivation). Wecanrelyonthisapproximationtolearnwhenashiftinoptimumwillhave negligibleeffectsonalleletrajectories.RecallingEq.S47,thefirstmomentofchange inallelefrequencyisgivenby E Δ𝑞 ≈ − { · 8¸ ⋅S "A 5 𝑝𝑞 − S5 sA 5 𝑝𝑞 0 " − 𝑞 , (S50) where,basedonourapproximation(Eq.S49),thedirectionalselectionterm introducedbytheshiftinoptimumtakesthetime-dependentform E Δ¹ 𝑞 = − { · 8¸ ⋅S "A 5 𝑝𝑞 ≈ ¸⋅S e5 "A A5 exp − 5 𝑡 𝑝𝑞. (S51) Theeffectofthisdirectionaltermovertheentireadaptivetrajectorycanbe quantifiedbycomparingtheexpectedallelefrequencyafteradaptationtotheshift, 𝑞¹ ,withinitialfrequencybeforetheshift,𝑞n ,i.e., ln 𝑞¹ 𝑞n = · £ »¼ k k(·) = ¸⋅S "A 5 · 𝑝 𝑡 exp − e5 A5 𝑡 < ¸⋅S "A 5 · exp − e5 A5 𝑡 = ¸⋅S "e 5 .(S52) Thisresultsuggeststhattherelativechangeinallelefrequencywillbenegligibleso longas 20 ¸⋅S "e 5 ≪ 1. (S53) Thisconditionsuggeststhatmutationswithsmallereffectswouldbelessaffectedby theshiftintheoptimum.Itfurthersuggeststhatallelesthatsatisfy𝑎" ≪ 𝜎 " ,as appearstobethecaseformostlocidiscoveredinGWAS(e.g.,(7-10)),willbe negligiblyaffectedbyshiftsinoptimumontheorderofthetotalgeneticvariation (i.e.,𝑧 ≤ 𝜎).Theseanalyticpredictionsareconfirmedbysimulations(Fig.S4). G(v*) G(v*) 1 z E(a 2 ) = 0 2σ 2 0.5 1 2.5 0.5 0 0 5 10 15 xvs Thresholdcontributiontovariance(v*) Threshold contibution to variance (v*)) FigureS4.Distributionofthecontributionsofsitestovarianceafterashiftinthe optimum.They-axisistheproportionofthevarianceexplainedbysitesthat contributemorethan𝑣 ∗ tothevariance.Thedashedblackisthetheoretical predictionwithoutadaptationandincolorsimulationresultswithadaptation. Whentherootmeansquareof ¸⋅S "e 5 becomeslargerthan1,directionalselection substantiallyaffectsallelefrequenciesandthereforethecontributionsofsitesto variance,aspredictedbyEq.S53.(Sincemutationissymmetricthemeanof zeroandwequantifyitscharacteristicvaluebyitsrootmeansquare ¸ ¾ S5 "e 5 ¸⋅S "e 5 is .) Simulationswererunwithanexponentialdistributionofselectioncoefficientswith E(𝑆) = 25,𝑁 = 1,000,𝑛 = 1,𝑈 = 0.01,andaburn-intimeof10,000generations. Resultsweretaken50generationsaftertheshiftinoptimum,which,forthese parameters,isjustafterthepopulationmeanhasreachedthenewoptimum. 21 5.2. Asymmetricmutationalinput Inthissection,weconsiderthesensitivityofourresultstoasymmetriesinthe mutationalinput,i.e.,tothecaseinwhichmutationsinagivendirectionintrait space,𝑏,aremorelikelytoarisethanmutationsintheoppositedirection,−𝑏(see (18)fortreatmentofthisprobleminthelimitofhighper-sitemutationrate). Anasymmetricmutationalinputintroducesashiftinthemeanphenotypeevery generation.Withnewmutationsarisingatfrequency1/2𝑁,theexpectedshiftis ΔÁ 𝑟 = 2𝑁𝑈EÁ 𝑎 ∙ 1 2𝑁 = 𝑈EÁ 𝑎 , (S54) whereEÁ istheexpectationovernewlyarisingmutations.Foreachtrait,effects haveacharacteristicsize E 𝑎" 𝑛= 𝑣” E(𝑆).Thecharacteristiceffectsizesets thescaleforthemaximalshiftinanydirection,thatis Δ 𝑟 isoftheorderof 𝑈 E 𝑎" 𝑛orsmaller.Wethereforeparameterizetheshiftinmeanphenotypedue tonewmutationsby ΔÁ 𝑟 = 𝑈EÁ 𝑎 = 𝑈 E 𝑎2 𝑛 𝑏, (S55) wherethevector𝑏parameterizesthestrengthanddirectionofthebiasand𝑏 = 𝑏 isassumedtobe<<1. Atsteadystate,themutationalshiftmustbeoffsetbyselection,suchthat ΔÁ 𝑟 + Δà 𝑟 + ΔX 𝑟 = 0, (S56) whereΔà 𝑟andΔX 𝑟aretheexpectedshiftsduetodirectionalandstabilizing selection,respectively.Wepreviouslyfoundthattheexpecteddirectionalshiftis Δ¹ 𝑟 = − e5 A5 𝑟, (S57) where𝑟denotesthemeanphenotype(cf.Eq.S45).Asweshownext,when mutationsarestronglyselected,stabilizingselectionoffsetsthemutationalshiftto maintainthemeanphenotypeattheoptimum,implyingthatdirectionalselectionis negligible.Incontrast,whenmutationsareeffectivelyneutral,stabilizingselectionis 22 negligibleandadirectionaltermmightnotbenegligiblebycomparison.However, aslongasasymmetryissmall,𝑏 ≪ 1,weshowthatthisdirectionaltermisnotlarge enoughtochangethealleledynamics,bothwhenallmutationsareeffectively neutralandwhensomemutationsarestronglyselected. First,weconsidertheshiftinmeanphenotypeduetostabilizingselection.Thisshift arisesbecause,withasymmetricmutationalinput,thedistributionofphenotypes becomesskewed.Therefore,evenifthemeanphenotypeisattheoptimum, individualswithagivenfitnessmayhaveanasymmetricdistributionofphenotypes aroundtheoptimum,leadingstabilizingselectiontochangethemeanphenotype. Wehavealreadyshown(Eq.S15)thattheexpectedchangeinallelefrequenciesper generationduetostabilizingselectionatanygivensiteiis 𝐸 Δ𝑞Ä = − SÅ5 sA 5 𝑝Ä 𝑞Ä 0 " − 𝑞Ä . (S58) Theexpectedchangeinmeanphenotypecanthenbecalculatedbyaddingupthe contributionsoversites SÅ5 Ä 𝑎Ä sA 5 𝑝Ä 𝑞Ä ΔX 𝑟 = −E 0 " − 𝑞Ä . (S59) Theright-handsideofthisequationreflectstheskewnessofthephenotypic distribution.Indeed,inonedimension,itcanbeshownthat Δy 𝑟 = − Æ] 6 "A 5 , (S60) withµÈ (𝑟)beingthethirdcentralmomentofthephenotypicdistribution.Inndimensions,foreverydirection𝑥, Δy 𝑟€ = − withµÈ 𝑟 ÊÉË 0 "A 5 𝐸 𝑟−𝑟 =E 𝑟−𝑟 Ê € 𝑟−𝑟 𝑟−𝑟 É " =− 𝑟−𝑟 0 "A 5 Ë É µÈ 𝑟 €ÉÉ , (S61) . Whensitesareunderstrongselection,ΔX 𝑟takesasimpleform.Assumingthe asymmetryissmall,theshiftduetostabilizingselectioncanbeexpandedinorders of𝑏.Theleadingterminthefrequencydistributiontakesthesameformasitdoes 23 withoutthebias.Forstronglyselectedalleleswithnobias,𝑞 ≪ 1andthereforethe frequencydependenceinthistermcanbeapproximatedby𝑝𝑞 0 " 0 − 𝑞 ≈ 𝑞. " Moreover,qscaleswith1/a2,implyingthatthedistributionof𝑎" 𝑞isindependentof 𝑎andthatE 𝑎 " 𝑞 = 4𝑤 " /𝑁(cf.SectionS3.1).Therefore,whenallsitesarestrongly selected,theleadingtermintheshiftduetostabilizingselectionis SÅ5 kÅ Ä 𝑎Ä sA 5 " Δny 𝑟 = −E = −Δ 𝑟. =− ¾ S5 k •A 5 E Ä 𝑎Ä =− 0 "v 2𝑁𝑈E 𝑎 = −𝑈 𝑣” E 𝑆 𝑏 (S62) Thus,toafirstorderin𝑏,theshiftofthemeanphenotypeduetostabilizing selectionoffsetsthemutationalshift,implyingthattherewillbenodirectionalterm andthatthealleledynamicswillnotbeaffectedbyasymmetry. Whenallelesareinsteadeffectivelyneutral,then𝑎" /4𝑤 " ≪ 1/2𝑁(cf.SectionS2.2) andallelefrequenciesarewellapproximatedbytheneutralsojourntime,𝜏 𝑞 ≈ 2/𝑞.Theshiftduetostabilizingselectionthensatisfies Δy 𝑟 = −E Ä 𝑎Ä SÅ5 0 sA " 𝑝𝑞 5 Ä Ä SÅ5 Ä 𝑎Ä sA 5 0 − E Ì ≪− − 𝑞Ä 0 "v E ≈ −E 𝑝𝑞 Ä 𝑎Ä 0 " −𝑞 = −Δ 𝑟, E Ä 𝑎Ä SÅ5 = sA 5 (S63) implyingthatitmakesanegligiblecontributiontooffsettingthemutationalshift.In thiscase,themutationaleffectonthemeanphenotypeisthereforeoffsetby directionalselection,where Δ¹ 𝑟 = − e5 A5 𝑟 ≈ −Δ 𝑟, (S64) indicatingadisplacementofthemeanphenotypefromtheoptimum 𝑟= A5 e5 Δ𝑀 𝑟. (S65) Thisdisplacementintroducesadirectionalselectiontermintothefirstmomentof changeinallelefrequencythat,iflargeenough,couldalteralleledynamics(cf. SectionS4.4). However,whenallallelesareeffectivelyneutral,wehave 24 A5 𝑟= A5 "A 5 1 Δ𝑀 𝑟 = "v¯œ ¾(y) " 𝑈 𝑣𝑠 E(𝑆)𝑏 = 2𝑁 e5 𝑣𝑠 E(𝑆) Î 𝑏, (S66) andthereforethescaleddirectionalselectioncoefficient,foranallelewitheffectsize 𝑎andscaledstabilizingselectioncoefficient𝑆 = 2𝑁 6⋅S "A 5 = Ï´ 𝑎 œÎ £(y) ~ (Ï / 𝑛)𝑎 œÎ £(y) = Ï œÎ y œÎ £(y) vS 5 "A 5 y = £ y ,isoftheorderof 𝑏, (S67) with𝑏S ~𝑏/ 𝑛beingtheprojectionof𝑏inthedirectionof𝑎.Since𝑏 ≪ 1,forall allelesotherthanthosewithunusuallylargeselectioncoefficients,thescaled directionalselectioncoefficientwillbemuchsmallerthan1andthetrajectorieswill stillbedeterminedbydriftandnotselection.Eveninthiscase,therefore,wedonot expectasymmetrytoaffectalleledynamics. Next,weconsiderthecasewherethereisamixofeffectivelyneutralandstrongly selectedmutations.Theexistenceofstronglyselectedmutationsinadditionto effectivelyneutralonesreducesthedeviationofthemeanphenotypefromthe optimum.Denotingtheproportionofstronglyselectedmutationsby𝑝” ,wehave 𝑟= 𝑤2 𝜎2 𝑈 1 − 𝑝” 𝑣” E 𝑆 *.O. 𝑏, (S68) whereE 𝑆 z.Ž. ≤ 1isthemeanscaledstabilizingselectioncoefficientforeffectively neutralmutations.Since𝜎 " > 2𝑁𝑈𝑝” 𝑣” ,wecanthenobtainanupperboundtothe magnitudeofscaleddirectionalselectioncoefficientforanallelewitheffectsize𝑎 andscaledstabilizingselectioncoefficient𝑆 = 2𝑁 < 0 1 " 𝑈𝑝𝑠 𝑣𝑠 6⋅S "A 5 = 2𝑁 𝑈 1 − 𝑝” 0 1 " 𝜎2 𝑈 1 − 𝑝” vS 5 "A 5 : 𝑣” E 𝑆 *.O. 𝑏 ⋅ 𝑎 𝑣” 𝐸 𝑆 *.O. 𝑏 ⋅ 𝑎~ 1−pÎ 2𝑝𝑠 E 𝑆 *.O. 𝑆𝑏. Withasubstantialproportionofstronglyselectedsites, 08pÑ "pÎ (S69) isoftheorderof1,and 08𝑝 therefore "p 𝑠 E 𝑆𝑒.𝑛. 𝑏 ≪ 1.Thisconditionimpliesthatforeffectivelyneutral Î alleles(i.e.,𝑆 ≤ 1),thescaleddirectionalselectioncoefficientis≪ 1andallele trajectorieswillbedeterminedbygeneticdrift,whereasforstronglyselectedalleles 25 (i.e.,when𝑆 ≫ 1),thescaleddirectionalselectioncoefficientis≪ 𝑆andtherefore negligiblecomparedtothescaledstabilizingselectioncoefficient. Weaklyselectedalleles(with𝑆~10)behavelargelylikestronglyselectedalleles exceptthatstabilizingselectiononthemonlypartiallycancelsoutthemutational bias(forexample,for𝑆 = 10only85%ofthebiasiscanceled).Therestofthebiasis canceledbydirectionalselectionandthereforeinducesasmallshiftinthemean phenotype.Itisstraightforwardtorepeattheargumentsgivenaboveandshowthat theshiftinthemeanphenotypeforatraitwithonlyweaklyselectedallelesora mixturethatincludesweaklyselectedallelesnegligiblyaffectsalleletrajectories. Thus,weconcludethatsmallasymmetryinmutationwillnotaffecttheallelic dynamic(seeFig.S5). * G(v*) % heritability from loci v>v * 1 0.8 0.6 0.4 0.2 0 0.6 0.4 0.4 0.6 0.80.8 Asymmetry Asymmetry(b) Bias (b) b= b= 0 0.05 0.1 0.2 0.35 0.5 0.7 0.9 1 1 0.5 0.5 00 00 00 00 0.2 0.20.2 0.4 0.6 0.8 0.8 0.8 11 0.2 0.4 0.4 0.4 0.6 0.6 0.8 Asymmetry Asymmetry (b) Bias Bias (b) Asymmetry(b) (b) 11 0 0.02 0.04 0.06 0.08 0.1 xvs Threshold contribution to variance (v* ) Thresholdcontributiontovariance(v*) 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 Meanphenotype phenotype Mean 0.5 0.5 0.5 0.5 0.5 0.5 b= b= b= 00 0.05 0.05 0.1 0.1 0.2 0.2 0.35 0.35 0.5 0.5 0.7 0.7 0.9 0.9 11 Meanphenotype phenotype Mean 11 vvss 11 1.5 1.5 11 0.5 0.5 0.5 0.5 00 0 00 0 000 0 0 0.2 0.2 0.4 0.4 0.6 0.8 0.4 0.8 0.6 0.6 0.8 0.8 0.8 111 11 1 00 00 0.2 0.20.2 000 0.20.2 0.2 0.2 0.4 0.4 0.6 0.6 0.8 0.2 Asymmetry (b) Bias Asymmetry Asymmetry Asymmetry(b) Bias Bias (b)(b) 1 111 (b)0 0 0.8 0.8 0.8 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 b= b= b= b= 000 0.05 0.05 0.05 0.1 0.1 0.1 0.2 0.2 0.2 0.35 0.35 0.35 0.5 0.5 0.5 0.7 0.7 0.7 0.9 0.9 0.9 xvs 00 000 0.02 0.04 0.06 0.08 5 00 0.02 1 0.04 2 0.06 3 0.08 4 0.1 5 xvs 000 11 22 33 44 50.1 ** * **)) Threshold contribution to variance (v Thresholdcontributiontovariance(v*) Threshold contribution to variance (v Thresholdcontributiontovariance(v*) Threshold contribution to variance (v ) Threshold contribution to variance (v ) Threshold contribution to variance (v ) 0.8 0.6 0.4 0.4 0.6 0.4 0.6 0.4 0.6 0.4 0.6 Asymmetry (b) Bias Asymmetry Bias (b) 0.8 0.8 0.80. b= 0 0. 0. 0. 0. 0. 0. 0. 0.2 0 FigureS5.Theeffectofasymmetricmutationalinputonmeanphenotypeandthe contributionofsitestovariance.(a)Themeanphenotype𝑟,inunitsof 𝑣” ,asa functionofmutationalbias𝑏.(b)Proportionofgeneticvarianceasafunctionofthe thresholdcontributiontovariance𝑣 ∗ ,i.e.,G(𝑣 ∗ ),fordifferentbiasstrengths. Simulationswererunwith𝑁 = 1,000,𝑛 = 1andwithdifferentmixturesof effectivelyneutral(distributedaround𝑆 = 0.1)andstrong(distributedaround𝑆 = 50)selectioncoefficients.Asymmetrywassimulatedbyhavingmoretraitincreasing 10% strong 100% strong 22 1.5 1.5 1.5 1.5 1.5 11 1 111 1.5 1.5 XX 22 2.5 2.5 * (b) 0 0.2 0.2 222 1.5 1.5 1.5 11 0.5 0.5 0.5 0 2.5 22 22.5 2.5 2.5 2.5 0%strong strong 100% strong 100%strong 10% % heritability from loci v>v* 1 0 s ss Mean Meanphenotype phenotype Mean phenotype Mean phenotype Mean phenotype Meanphenotype Mean phenotype Mean phenotype Mean phenotype 1.5 * 0 2.5 2.5 2.5 (a) XXX vvv 10%strong 10% 0% strong strong 22 1.5 1.5 1 0.5 vvs s 22 2.5 2.5 2 1.5 XX * % from G(v*) %heritability heritability fromloci lociv>v v>v*** % heritability from loci v>v 2.5 2.5 (a) 0%strong 0% strong 2.5 Meanphenotype Mean phenotype Mean phenotype 2 vs * G(v*) % from % heritability heritability from loci loci v>v v>v * X Meanphenotype Mean Mean phenotype phenotype 2.5 (a) 26 0 1 2 3 4 Threshold contribution to variance thantraitdecreasingmutations;if𝛽istheproportionoftraitincreasingmutations thentheasymmetrycoefficientis𝑏 = 2𝛽 − 1.Asexpected,onlyforlargebiases (when𝑏~1),aretheresubstantialchangesinthedistributionofthecontributionof sitestovariance.Simulationswererunwitha10,000generationsburn-inperiod withoutasymmetryandthen10,000generationswithasymmetryandaveraged overmanyruns(>300),withthenumberofrunsvariedacrossplotskeeperrorsin (b)below1%. 5.3. Majoreffectloci Inthissection,weshowthatourresultsareinsensitivetothepresenceofmajorloci, i.e.,individuallocithatcontributesubstantiallytoquantitativegeneticvariation.We haveinmind,forexample,lociwhoseallelesaremaintainedathighfrequenciesby balancingselectiononaMendeliantraitbuthavepleiotropiceffectsonthe quantitativetraitsunderconsideration(e.g.,MHCloci;(19,20)).Whilesuchloci violateourassumptions,weshowthattheydonotaffectthedynamicsatotherloci thatfulfillthem. Tothisend,wecalculatethefirsttwomomentsofchangeinallelefrequencyinthe presenceofamajorlocus.Wedenotethefrequencyandeffectsizeofthefocalallele by𝑞and𝑎,andthefrequencyandeffectsizeofthemajoralleleby𝑞 and𝑎 , respectively.Asinourpreviousderivations(SectionS2),thedistributionof backgroundphenotypiccontributionfromallotherloci,𝑅,iswellapproximatedby thenormaldistribution f 𝑅 𝑎 , 𝑞 , 𝑎, 𝑞 = 0 5) "1(e 5 8eÓ 4/5 exp − j?kS?kÓ SÓ 5) "(e 5 8eÓ 5 , (S70) where𝜎Â" isthecontributiontogeneticvariancefromthemajorlocus.The populationmeanremainsclosetotheoptimumbecauseanyshiftcausedbythe majorlocusisquicklycompensatedforbytheotherloci(seeSectionS4.4).Wethen averageoverboththisdistributionandthethreegenotypesatthemajorlocusto calculatethemeanfitnessassociatedwitheachgenotypeatthefocallocus.Namely, 27 𝑊nn = 1 − 𝑞 +2𝑞 1 − 𝑞 " +𝑞 j " j f 𝑅 𝑎 , 𝑞 , 𝑎, 𝑞 W 𝑅 + 𝑎 2 j f 𝑅 𝑎 , 𝑞 , 𝑎, 𝑞 W 𝑅 f 𝑅 𝑎 , 𝑞 , 𝑎, 𝑞 W 𝑅 + 𝑎 , (S71) andsimilarlyfortheothergenotypes.Inthisway,weobtainthefirstmomentofthe changeinallelefrequency E Δ𝑞 = −𝑝𝑞 p qrr 8qrT ?k qrT 8qTT q ≈− S5 sA 5 𝑝𝑞 𝑞 − 0 " , (S72) whichisthesameaswederivedintheabsenceofamajorlocus(Eq.S15).Similarly, wefindthesecondmomenttobeunaffected. 5.4. Anisotropicmutation Inthissection,weconsiderhowrelaxingtheassumptionthatthedistributionof newlyarisingmutationsisisotropicintraitspacewouldaffectourresults.Asnoted, wecanalwayschooseanorthonormalcoordinatesystemcenteredattheoptimum, inwhichthetraitunderconsiderationvariesalongthefirstcoordinateandaunit changeinothertraits(i.e.,inothercoordinates)neartheoptimumhavethesame effectonfitness.Thereis,however,noobviousreasonforthedistributionofnewly arisingmutationstobeisotropicinthiscoordinatesystem(see(21)for generalizationsofFisher’sGeometricModelalongsimilarlines). Anisotropyinmutationdoesnotaffectthemomentsofchangeinallelefrequency,as thesedependonlyontheselectiononanalleleorequivalentlyonitseffectsizebut notonitsdirectionintraitspace.Anisotropycouldaffectthedistributionofallelic effectsizesonthefocaltraitconditionalontheselectionactingonthem.Here,we provideheuristicargumentssuggestingthat,barringextremecases,wecandefine aneffectivenumberoftraits𝑛* andaneffectivestrengthofselectionwe2forwhich therelationshipbetweenselectionandeffectsizeinanisotropicmodelsiswell approximatedbytherelationshipfoundforisotropicones(SectionS1.2). 28 Wefocusonafamilyofanisotropicmutationaldistributionsthatcanbedescribed asaprojectionofamultivariatenormaldistributionontheunitsphereintrait space.Namely,wedrawthesizeofamutation𝑎 = 𝑎 fromsomedistributionand toobtainitsdirection,wedrawavector𝛼fromamulti-variatenormaldistribution MVN(0, 𝜮)andnormalizeit,i.e., × 𝑎 = 𝑎 , × (S73) (S74) andtherefore 𝑎0 = 𝑎 ×T × . Thisfamilyofmutationaldistributionsgivesusamathematicallytractable frameworkwithwhichtoexaminethebehaviorofourmodelwithanisotropy. Withanisotropy,thebehaviorofourmodelgreatlydependsontherelative contributionofthefocaltraittoselection,whichweparameterizeby 𝛾0 ≡ £ ×T5 £ ×5 = 𝜮TT Ù{ 𝜮 . (S75) Whenselectionactsmainlyonourfocaltrait,i.e.when𝛾0 ≈ 1,then|𝛼0 | ≈ 𝛼and therefore𝑎0 ≈ ±𝑎.Sucharelationshipbetweenthestrengthofselectionandeffect sizeiswellapproximatedbyanisotropicmodelwith𝑛* = 1.Wethereforefocuson casesinwhichthereisasignificantpleiotropiccontributiontoselection,i.e.,𝛾0 is substantiallylessthan1.Anisotropythenhastwoeffects:thefirstistointroduce heterogeneityinthestrengthofselectionondifferenttraitsandthesecondisto introducecorrelationsintheeffectsofamutationondifferenttraits,notably betweenthefocaltraitandothers. Wefirstconsiderthecaseinwhichthestrengthofselectiondiffersamongtraits,but traitsareuncorrelated,correspondingtoadiagonalcovariancematrix,𝜮.When manytraitshaveanon-negligiblecontributiontoselection,𝛼 " = 𝛼0" + 𝛼"" + ⋯ 𝛼O" wouldhaveasmallcoefficientofvariation,i.e.,CÜ" 𝛼 " = V 𝛼 " /E 𝛼 " " ≪ 1, becauseofthelawoflargenumbers.Inthiscase, 29 𝑎0 = 𝑎 = ×T × =𝑎 S 0/ÝT ×T ¾ ×5 ×T £ ×T5 , 1 + O CÜ" 𝛼 " ≈𝑎 Since𝛼0 / E 𝛼0" ~N(0,1)and𝑆 = ×T £ ×5 v "A 5 (S76) 𝑎" thisimpliesthat,conditionalonthe selectioncoefficient𝑆,theeffectsizeonthefocaltraitwillbedistributedas 𝑎0 ~N 0, "A 5 O3 v 𝑆 (S77) with𝑛* = 1/𝛾0 .Thisisthesamerelationshipbetweenselectionandeffectsizeas thehighpleiotropyisotropicmodelwith𝑛 = 𝑛* (Eq.8&10ofmaintext).Thisresult suggeststheconceptofaneffectivenumberoftraits,whichcanbethoughtofasthe numberoftraitsthathavethesameeffectonfitnessasthefocaloneandare requiredtoproducethesamestrengthofselectiononalleles.Theeffectivenumber oftraitsdescribesthedistributionofeffectsizesbothinthelimitofhighpleiotropy 𝑛* ≫ 1andlowpleiotropy𝑛* ≈ 1andsimulationsshowthatitdescribesthe distribution,atleastqualitatively,alsoforintermediatevaluesof𝑛* (Fig.S6). However,thereisanextremescenarioinwhichaneffectivenumberoftraitscannot describethedistributionofeffectsizes.ThishappenswhenCÜ" 𝛼 " ≥ 1,thatiswhen selectionactsmainlyonasmallnumberoftraitsbutourfocaltraitcontributesvery littletoselection(𝛾0 ≪ 1).Inthiscase,wemightbetemptedtouse𝑛* = 1/𝛾0 ≫ 1 but,asEq.S76suggests,thehighpleiotropylimitwouldbeinadequate.Infact,the varianceinselectiononnewly-arisingmutations(duetothecontributionofthe selectedtraits)willresultinalong-taileddistributionofeffectsizesonthefocal trait,whichisnotwell-approximatedbyanyisotropicmodel.Insummary,except fortheseextremecases,isotropicmodelsprovideagoodapproximationforthe relationshipbetweenselectionandeffectsize,evenwhenthereisheterogeneityin thestrengthofselectionondifferenttraits. Toillustratetheeffectofheterogeneityinthestrengthofselectionamongtraits,we considerasimpleexampleinwhichallnon-focaltraitsmakethesamecontribution toselectionandthereforecanbemodeledby 30 1 0 " 𝜮= 0 𝑐 0 0 ⋮ 0 0 𝑐" ⋯ , (S78) ⋱ with𝑐 " beingtheratiobetweentheexpectedcontributionofanon-focaltraitto selectionandtheexpectedcontributionofthefocaltraittoselection.Itiseasyto seethatinthismodel𝑛* = 1/𝛾0 = 1 + (𝑛 − 1)𝑐 " .Numericalresultsforthismodel areshowninFig.S6. Largecontribution offocaltrait c2=0.4 CV2(!2)≈0.2 "1=0.2 Manytraits n=11 Density 0.4 0.3 (b) 0.5 0.4 ne=5 Density (a) 0.5 0.2 0 -4 -2 0 Effectsize 2 4× 0.2 (d) 0.5 CV2(!2)≈1 "1≈0.5 0.4 0.6 ne≈2 0.4 0.2 0 -4 0.3 0 -4 a2 n Density 0.8 Density Fewtraits n=3 1 CV2(!2)≈0.2 "1≈0.02 ne≈50 0.1 0.1 (c) Smallcontribution offocaltrait c2=5 -2 0 2 Effectsize CV2(!2)≈1 "1≈0.1 4 × a2 n × a2 n ne≈10 0.3 0.2 0.1 -2 0 Effectsize 2 4 × a2 n 0 -4 -2 0 2 Effectsize 4 Anisotropicmodel Effectiveisotropicmodel FigureS6.Theeffectsofheterogeneityinthestrengthofselectionondifferenttraits onthedistributionofeffectsizesinthefocaltrait.Numericalresultsformodelswith thecorrelationmatrixdefinedinEq.S78areshowninblueandthecorresponding isotropicmodelinblackdashes.Whentherearemanyselectedtraits,anisotropic modelwith𝑛* = 1/𝛾0 = 1 + (𝑛 − 1)𝑐 " providesagoodapproximationofthe distributionofeffectsizes,bothwhenthefocaltraitcontributessubstantiallyto selection(a)andwhenitdoesnot(b).Whentherearefewtraits,anisotropicmodel 31 with𝑛* = 1/𝛾0 = 1 + (𝑛 − 1)𝑐 " providesagoodapproximationonlywhenthefocal traitcontributessubstantiallytoselection(c&d). Next,weconsiderthecaseinwhichtheeffectsizesondifferenttraitsarecorrelated, i.e.,whenthecovariancematrix𝜮hasoff-diagonalterms.𝑎0" ∝ 𝛼0" /𝛼 " andtherefore weparameterizetheeffectofthesetermsusingthecorrelationbetween𝛼0" and𝛼 " , 𝜌" ≡ corr α" , 𝛼0" .Ifthecorrelationissmall,𝜌" ≪ 1,thenourpreviousreasoning holds.Intheotherextreme,whenallselectedtraitsarehighlycorrelatedwiththe focaltrait,i.e.𝜌" ≈ 1,thentheproportionalcontributionofthefocaltraitto ×T5 selectionisconstant, ×5 ≈ £(×T5 ) £(× 5 ) = 𝛾0 ,andtheeffectsizeis𝑎0 = ± 𝛾0 𝑎.Thismodel isthereforeequivalenttoanisotropiconewith𝑛* = 1and𝑤*" = 𝛾0 𝑤 " ;thelatter changecorrespondstoincreasingthestrengthofselectiononthefocaltraitto accountforselectionontheothertraitswhicharehighlycorrelatedwiththefocal trait.Intermediatecasesaremorecomplex:whileeffectsizesarestilloftheorderof 𝛾0 𝑎,theshapeofthedistributionofeffectsizesisintermediatebetweenthesingle traitandhighpleiotropylimits.Isotropicmodelswithaneffectivenumberoftraits, 𝑛* < 1/𝛾0 ,andincreasedselection𝑤*" = 𝛾0 𝑛* 𝑤 " canqualitativelydescribethese casesbutmaynotcompletelycapturethedistributionofeffectsizes.Thevalueof𝑛* wouldchangefrom1when𝜌" → 1to1/𝛾0 when𝜌" → 0.Notethatwithalarge numberoftraits,verystrongcorrelationsamongmanyofthetraitswillbe necessaryinordertocreatealargeenough𝜌" tohaveasignificanteffecton𝑛* (see Fig.S7). Toillustratetheeffectofcorrelationsamongtraits,weconsiderthefollowingsimple example(FigureS7).Weassumethecorrelationmatrix𝜮takestheform 1 " 𝜮= 𝑟 𝑟" 𝑟" 1 𝑟" ⋮ 𝑟" 𝑟" 1 ⋯ ,(S79) ⋱ meaningthatthatalltraitscontributeequallytothefitnessandeverypairoftraits hasthesamecorrelationcoefficient𝑟 " .When𝑟 " = 0thisbecomesanisotropic 32 model.When𝑟 " = 1,effectsizesarealwaysidenticalforeverytrait;thus,thiscaseis equivalenttohavingonlyonetraitwithselectionthatisincreased𝑛-fold. Intermediatecasescanbeapproximatedbyfindinganeffectivenumberoftraits 𝑛* < 1/𝛾0 = 𝑛,suchthatanisotropicmodelwith𝑛* and𝑤*" = 𝛾0 𝑛* 𝑤 " = 𝑤 " 𝑛* /𝑛 qualitativelydescribesthedistributionofeffectsizes.Numericalresultsofthis modelareshowninFig.S7. (a) 0.5 0.4 ne=n=50 0.3 Density Density 0.4 (b)0.5 Mediumcorrelations r2=0.5,!2≈0.5 0.2 0.1 0.2 -2 0 Effectsize 2 4 × a2 n 0 -4 ne=1 1 0.5 0.1 0 -4 Highcorrelations r2=0.99,!2≈0.99 1.5 ne=5 0.3 (c) 2 Density Lowcorrelations r2=0.2,!2≈0.25 -2 Anisotropicmodel 0 2 Effectsize 4× a2 n 0 -4 -2 0 Effectsize 2 4× a2 n Effectiveisotropicmodel FigureS7.Effectsofcorrelationsamongtraitsonthedistributionofeffectsizes. NumericalresultsforourmodelwiththecorrelationmatrixdefinedinEq.S79and 𝑛 = 50traitsareshowninblueandthecorrespondingisotropicmodelinblack dashes.(a)Whencorrelationsarelow,theisotropicmodelapproximatesthe distributionofeffectsizeswell.(b)Withlargecorrelations,weneedtousean effectivenumberoftraits,inthisexample𝑛* = 5,andrescaleselection,inthiscase to𝑤*" = 𝑤 " 𝑛* /𝑛 = 𝑤 " /10,inordertoapproximatethedistributionofeffectsizes. (c)Whenthecorrelationsapproach1,thedistributionofeffectsizesbecomes singularandapproachesthedistributionforanisotropicmodelwith𝑛* = 1and 𝑤*" = 𝑤 " /𝑛 = 𝑤 " /50. 6. ThepowertodetectlociinGWAS Inthissection,wesummarizetheresultsthatwerelyoninconnectingour theoreticalresultswiththeobservationsinGWAS(seeDiscussioninmaintext). TheseresultsprovideafirstapproximationtothepowertodetectlociinGWASin re-sequencingandgenotypingstudies.Theyneglectsomepotentialcomplications, whichliebeyondthescopeofthisstudy(e.g.,syntheticassociations(22,23)). 33 6.1. Re-sequencingstudies First,weconsiderhowthepowertodetectalocusinaGWASdependsonits contributiontogeneticvariance.Tothisend,wefollowShamandPurcell(22)in assumingasimplifiedmodelforaGWASinwhichlociaredetectedusingalinear regressionofthephenotypeagainstthegenotypeatindividualloci,andthe dependenceofphenotypeongenotypefollowsanadditivemodel.Theslopeofthe regression(theregressioncoefficient),whichisalsoourestimateoftheeffectsize, 𝑎0 isthenapproximatelynormallydistributedas 𝑎0 ~N 𝑎0 , 2ä , å€ 08€ /" (S80) where𝑎0 isthetrueeffectsizeandxistheminorallelefrequencyatthelocus (which,duetothelargestudysizes,weassumeisestimatedwithouterror),𝑉æ isthe totalphenotypicvariance,and𝑚thestudysize(whichinrealitymaybeaneffective sizereflectingstudydesign,e.g.,whenthesamplewassplitintodiscoveryand validationpanels)(22). Underthenullhypothesis,theeffectsizeis0,meaningthat 𝑎0 Žèéé ~N 0, 2ä å€ 08€ /" (S81) andthereforetheestimatedcontributiontovariancehasachi-squareddistribution œêëìì 2ä /å = T 5 S € 5 T êëìì 08€ 2ä /å ~𝜒0" . (S82) Thepowertoidentifyalocusassignificantwithp-value𝑝∗ istheprobabilitythatthe estimatedcontributionofthelocustovariance,𝑣,islargeenoughthat Pr 𝑣Žèéé > 𝑣 < 𝑝∗ . (S83) Thisconditioncanbetranslatedintoathresholdcontributiontovariance𝑣 ∗ for whichlociwith𝑣 > 𝑣 ∗ areconsideredsignificant,i.e.Pr 𝑣Žèéé > 𝑣 ∗ = 𝑝∗ ,with𝑣 ∗ givenby 34 œ∗ 2ä /å = 2 erf 80 1 − 𝑝∗ " , (S84) anderfdenotingtheerrorfunction.Thepowertoidentifyalocusassignificant wouldthenbePr 𝑣 > 𝑣 ∗ ,andthedistribution𝑣givenby œ 2ä /å T 5 S € =5 08€ 2ä /å ~χ0" œ 2ä /å , (S85) with𝜒0" denotinganon-centralchi-squareddistributionwithonedegreeoffreedom. Therefore,powerisgivenby œ H 𝑣, 𝑝∗ = Pr 𝑣 > 𝑣 ∗ = h? withh± 𝑦, 𝑝∗ = 0 " 1 ± erf 2ä /å , 𝑝∗ + h8 œ 2ä /å , 𝑝 ∗ , (S86) 𝑦/2 ∓ erf 80 1 − 𝑝∗ andthetwotermscorrespond totheestimatedandtrueeffectsizeshavingthesameoroppositesign. Theformofthepowerfunctioncarriesimportantimplications(Eq.S86andFig.S8). Notably,itshowsthat(inthisapproximation)powerdependsonlyonthe contributionofalocustovariance,andthiscontributionshouldbemeasured relativeto,orinunitsof,VP/m.Thisscalemakesintuitivesense,becausethetotal phenotypicvariancegeneratesthebackgroundnoisefordetectinganyindividual locus,andthebackgroundnoisedecreasesisinverseproportionaltothestudysize. Inparticular,thethresholdcontributiontovariancev*,asdefinedabove,is proportionalto𝑉æ 𝑚andisalsothecontributiontovarianceatwhichpoweris 50%,i.e., H 𝑣 ∗ , 𝑝∗ = 1 2. (S87) Thepowerfunctioncanthenbeapproximatedbyastepfunction(seeFig.S8) H 𝑣 ≈ Θ 𝑣 − 𝑣∗ = 1 𝑣 > 𝑣∗ . 0 𝑣 < 𝑣∗ (S88) 35 Thiswillbeagoodapproximationwhenthenumberoflocithatfallatintermediate range(e.g.,withpowerbetween0.1and0.9)isnegligiblecomparedtothenumber thatfallsoutsidethisrange. Power Power 1.0 0.8 Exact 0.6 0.4 Stepfunction approximation 0.2 0.0 1 Contributiontovariance Contribution to variance HvL (inunitsofVP/m) 3 10 30 100 300 1000 FigureS8.Thepowertodetectlociasafunctionoftheircontributiontogenetic variance(giveninunitsof𝑉æ /𝑚).Shownaretheexactpowerfunction(Eq.S86)and itsstepfunctionapproximation(Eq.S88)for𝑝 = 5 ⋅ 108• . Furtherinsightscomefromconsideringthispowerfunctioninconjunctionwithour theoreticalresults(SectionS3).Notably,ourresultssuggestthatthefirstlocitobe detected,thosethatcontributethemosttovariance,areweaklyandstrongly selected,andthattheircontributionstovarianceareonthescaleofvs.Wetherefore expectGWAStobegintoidentifyloci(andaccountforgeneticvariance)whenthe studysizeissuchthat𝑣 ∗ ∝ 𝑉æ 𝑚isontheorderof𝑣” ,i.e.,when𝑚~ 𝑉æ 𝑣” .We wouldfurtherexpecttherateofincreaseinidentifyingnewloci(andaccountingfor thevariance)tobesimilarfordifferenttraitswhenthevarianceismeasuredin unitsof𝑣” . 6.2. Genotyping MostcurrentGWASrelyongenotypinginsteadofre-sequencing,resultinginan additionallossofpower(24).Specifically,thesestudiesimputetheallelesatloci 36 thatarenotincludedinthegenotypingplatform(25),andtheimputationbecomes imprecisewhentheimputedallelesarerare(Fig.S9).Ifcausallociwithrarealleles areincludedinGWAS,thisimprecisionleadstoanunder-estimationoftheireffect size,resultinginalossofpower(24).ForlociwithMAFxandeffectsizea,the expectedestimateoftheeffectsizewouldbereducedbyafactorof𝑟(𝑥),where 𝑟 " (𝑥)isthemeancorrelationbetweentheimputedandrealalleles(26),andthe distributionofestimatescanbeapproximatedby 𝑎~N 𝑟𝑎, 2ä å€ 08€ /" . (S89) Employingthereasoningoftheprevioussubsection,wecanthereforeapproximate thepowertodetectalocusbyH 𝑟 " 𝑣, 𝑝∗ ,whereHisthepowerfunctiondefinedin Eq.S86. 1 0.9 r2 0.8 0.7 Cutoff frequency 0.6 0.5 0.2 Minor allele frequency H%L 0.5 1 2 5 10 20 50 FigureS9.TheprecisionofimputationdecreaseswithMAF.Specificallyweshowthe meancorrelationbetweenimputedandrealgenotypesasfunctionofminor allele frequency,forastudyusinganIllumina1MSNParrayandthe1000genomesphase IIIasanimputationpanel(basedonExtendedFig.9Ain(27)).Weapproximatethe effectonpowerbyexcludinglociwithMAF<1%andassumingthatlociwith greaterMAFsareimputedcorrectly. Inpractice,GWASoftenincludeonlylociwithMAFaboveathreshold,whichis chosentoensurepreciseimputation.Wethereforeapproximatetheeffectof 37 genotypingonpowerbyexcludinglocibelowathresholdMAFandassumethatloci thatexceedthisthresholdareimputedcorrectly. 7. Inference Inthissection,wedescribehowweusedourmodeltomakeinferencesbasedon GWASresultsforheightandbodymassindex(BMI).AswenoteintheDiscussion, theseinferencesaremeantasanillustrationanddonotincorporatetheeffectsof demographyandafewotherfactors(e.g.,genotypinganderrorsintheestimationof effectsizes(22,24)),whichliebeyondthescopeofthisstudy. 7.1. Thecompositelikelihood Ourinferencesarebasedonacomposite-likelihoodapproach.Webeginby describingthecomposite-likelihoodfunctionanditsmaximization,whentheloci detectedbyGWASarestronglyselectedandcanbedescribedbythehigh-pleiotropy limit.Inthiscase,wehaveshownthatthedistributionofvarianceamonglociis insensitivetothedistributionofselectioncoefficients,dependsonasingle parameter𝑣” ,andiswellapproximatedbytheprobabilitydensity 𝜌 𝑣 = " zôõ 8" œ/œÎ œ (S90) (SectionS3.2).FurtherapproximatingthepowerinGWASasastepfunction(cf. SectionS6),wefindthattheprobabilitydensityofsitesthatexceedathreshold𝑣 ∗ canbeapproximatedby f 𝑣 𝑣” , 𝑣 ∗ = whereI 𝑥 ≡ ·¡€ ¤ œ ¥¦¥∗ ¤ œ = zôõ(8" œ/œÎ ) "œö(" œ ∗ /œÎ ) , (S91) exp(−𝑡)/𝑡(cf.Eq.S42).Wethereforeapproximatethelog- composite-likelihoodof𝑣” giventhecontributionstovarianceoftheKlocidetected inaGWAS, 𝑣Ä ù Äø0 ,by LCL 𝑣” 𝑣Ä =− 2 𝑣” ù Äø0 ù ∗ Äø0 , 𝑣 = ù Äø0 log f 𝑣Ä 𝑣” 𝑣Ä − 𝐾log I 2 𝑣 ∗ 𝑣” − = ù Äø0 log 𝑣Ä . (S92) 38 Itfollowsthatthecomposite-likelihoodismaximizedwhen 𝑣” + log I 2 𝑣 ∗ 𝑣” 𝑣” = argminœÎ 2 𝑣 where 𝑣 ≡ 0 ù Äø0 ù , (S93) 𝑣Ä . Wealsoconsiderthemodelswithoutpleiotropyandinwhichthedegreeof pleiotropyisaparameter.Inthecasewithoutpleiotropy, 𝜌 𝑣 = " zôõ 8"œ/œÎ œ (S94) (cf.SectionS3.2).Byfollowingthesamesteps,wefindthatthecomposite-likelihood isthenmaximizedwhen 𝑣” = argminœÎ 2𝑣 𝑣” + log I 2 𝑣 ∗ 𝑣” where𝑣 ≡ 0 , (S95) ù Äø0 𝑣Ä .Whenthedegreeofpleiotropy𝑛isaparameterofthemodel, ù wefoundthat 𝜌O 𝑣 = " exp ST œ − "œ/œÎ ST5 /(S 5 /O) φO 𝑎0 𝑎 (S96) (cf.SectionS6).Again,followingthesamesteps,wefindthattheprobabilitydensity ofsitesthatexceedathreshold𝑣 ∗ is fŽ 𝑣 𝑣” = ¤4 œ ¤ ¥¦¥∗ 4 œ (S97) andthelog-composite-likelihoodis LCL 𝑣” , 𝑛 𝑣Ä ù ∗ Äø0 , 𝑣 = ù Äø0 log 𝜌O 𝑣Ä − 𝐾 log œ¡œ ∗ 𝜌O 𝑣 . (S98) Inthelattercase,weusednumericalmaximizationtoshowthatthecompositelikelihoodestimatesforheightandBMIconvergetothehighpleiotropylimit. Specifically,wemaximizedthecomposite-likelihoodspecifyinganintervalof [1,1000]forn,whereforbothtraitstheestimatesconvergedtotheupperlimitof 1000.Whilenumericaloptimizationdoesnotallowustospecifyaninfiniteinterval, thelikelihoodfunctionandmaximalvalueforn=1000areindistinguishablefrom thoseinthehigh-pleiotropylimit. 39 7.2. Determining𝒗∗ andremovingoutliers Ourlikelihoodmaximizationrequiresustospecifythevalueofthethreshold𝑣 ∗ .We choosethisthresholdbasedontheempiricaldistributionsofthecontributionsto varianceamonggenome-widesignificantassociations(Fig.S10AandB).Specifically, whenthecontributionstovarianceapproachthelowerboundaryfordiscovery,we observeadeclineinthedensityofloci.Thisislikelyduetoagradualreductionin powerandsuggeststhatourapproximationforpower(asastepfunction)breaks downforthesevaluesofvariance.Wethereforechoosethresholdsthatappeartobe abovethisdecline(𝑣 ∗ = 1.4 ⋅ 108s 𝑉æ forheightand𝑣 ∗ = 1.35 ⋅ 108s 𝑉æ forBMI;Fig. S10AandB),resultingintheremovalof53lociforheightand11forBMI.Wealso examinehowourestimatesof𝑣” dependonthechoiceof𝑣 ∗ ,andfindthattheyare muchmoresensitivetoreducingthethresholdthantoincreasingit;infact,the estimatesweobtainbyincreasingthethresholdarewithintheconfidenceintervals oftheestimatewiththechosenthresholds(Fig.S10CandD).Thisanalysisfurther supportsourchoicetoexcludethelociwiththelowestcontributiontovariance.For BMI,wealsodroppedthelocuswiththelargestcontributiontovariance(nearFTO), whichappearstobeanoutlier(Fig.S10B)andhasbeensuggestedtobeunder balancingselection(28). 40 10 (a) 21 20 19 18 17 1 5 1.4 1.6 1.8 2 0 10 20 30 Contribution to variance (v) 4 3 2 2 1 Estimatedvs+95%CI 1 1.2 1.4 1.6 1.8 2 Cutoff contribution to variance ( ) Thresholdcontributiontovariance 3 (b) 2.5 2 0.8 1 1.2 1.4 1.6 1.8 Contribution to variance (v) nearFTO 0 0 10 20 30 Contribution to variance (v) (d) di Estim ffe re ated nt th vs re fo sh r old s 3 1 0 3 4 (c) E di stim ffe re ated nt th vs re for sh old s Estimated 0 1.2 Contribution to variance (v) % variance from loci > v 15 BMI 22 % variance from loci > v % variance from loci > v 20 Estimated Estimatedvs (inunitsof10-4×VP) % variance from loci > v %variancefromloci>v Height 2 Estimatedvs+95%CI 1 0 0.8 1 1.2 1.4 1.6 1.8 Cutoff contribution to variance ( ) Thresholdcontributiontovariance (inunitsof10-4×VP) (inunitsof10-4×VP) FigureS10.Determining𝑣 ∗ andremovingoutliers.Thetotalvariancefrom significantassociationsasafunctionofthethresholdcontributiontovariance,for height(a)andBMI(b).Theinsetsshowacloseupofthelowerrangeof contributionstovariance,highlightingthedeclineinthedensityofdiscoveredloci. Ourchosenthresholdsareshownbythedashedverticalline(inallgraphs).Our estimatesof𝑣” asafunctionofthechosenthreshold,forheight(c)andBMI(d). Whenweincreasethethreshold,theestimatesremainwithinthe95%CIofthe estimatewithourchosenthreshold. 7.3. Estimatingtargetsizeandexplainedvariance Weestimatethetargetsizeandthevarianceexplained,bothforvaryingstudysize andtotal,basedonourestimatesof𝑣” .Thepopulation-scaledmutationalinputper generationfromstronglyselectedloci,2𝑁𝑈” ,isestimatedby 2𝑁𝑈” = 𝐾/ œ¡œ ∗ 𝜌 𝑣 𝑣” , (S99) (cf.Eq.S41)andthecorrespondingestimateforthetargetsizeis 41 𝐿” = 2𝑁𝑈” 2𝑁𝑢, (S100) wheretheestimateforthepopulationscaledmutationratepersitepergeneration 2𝑁𝑢 ≈ 0.5isbasedonheterozygosity(27).Theexplainedvariancecorrespondingto GWASwithstudysize𝑚isestimatedby 𝜎”" 𝑚 = 2𝑁𝑈” œ¡œ ∗ å 𝑣𝜌 𝑣 𝑣” = 𝐾 œ¡œ ∗ å 𝑣𝜌 𝑣 𝑣” œ¡œ ∗ å 𝜌 𝑣 𝑣” ,(S101) whereweapproximatethethresholdcorrespondingtostudysize𝑚basedonthe studysize,𝑚n ,andthreshold,𝑣 ∗ ,incurrentGWAS,by 𝑣 ∗ 𝑚 = 𝑣 ∗ ⋅ 𝑚n 𝑚 . (S102) Toestimatethetotalvariancearisingfromstronglyselectedloci,wesimplysetthe thresholdinEq.(S101)to0. 7.4. Estimatingconfidenceintervals Weuseacombinationofnon-parametricandparametricbootstraptoestimate confidenceintervals(CI).Weusenon-parametricbootstraptoestimatetheCIfor themodelparameters𝑣” and𝐿” :specifically,weperform10,000iterations,inwhich weresamplethelociidentifiedbyGWASandrepeattheestimationof𝑣” .Weuse parametricbootstraptoestimatetheconfidenceintervalsinFig.5A,describingthe explainedvarianceasafunctionofthresholdbasedonourmodel.Tothatend,we relyonourmodelwiththepointestimatesfor𝑣” and𝐿y ,togenerate10,000samples fromGWASwiththespecifiedthreshold,andthencalculatethetotalvariance explainedbythesesamples.Weuseacombinationofnon-parametricand parametricbootstraptocalculatetheCIformodelpredictions,includingthetotal variance,𝜎y" ,andtheexplainedvariance,𝜎y" 𝑚 ,andnumberoflociasafunctionof studysize(Figs.5BandS16).Inthiscase,wegenerate10,000samplesby:i) estimating𝑣” basedonaresampledsetofGWASloci(similartothenon-parametric procedure),andii)usingtheestimated𝑣” andcorresponding𝐿y togenerateaGWAS samplebasedonourmodel(similartotheparametricprocedure);wethencalculate theappropriatesummarybasedonthelattersamples.Thistwostageprocedureis intendedtocapturetheuncertaintygeneratedbyboththeerrorsinestimatingour 42 basicmodelparametersandthenoisegeneratedbythestochasticprocesses underlyingthenumberandvarianceatsegregatinglocithatareyettobe discovered.TheresultingestimatesandCIaresummarizedinTableS2. Contributionto varianceperstrongly selectedlocus(inunits ofthetotalphenotypic variance) Expectedstudysize requiredtodescribe 50%ofthestrongly selectedvariance 𝑣” /𝑉æ 𝑚%n% ≈ 43𝑉æ /𝑣” Height BMI 1.8[1.5,2.3]´10-4 1[0.6,1.7]´10-4 230[190,290]K 420[250,770]K Numberofnewly arisingstrongly selectedmutationsper generationinthe population 2𝑁𝑈” 2300[1800, 3000] 600[300,1900] Mutationaltargetsize forstronglyselected mutations 𝐿” 4.6[3.6,6.0]Mbp 1.3[0.6,3.8]Mbp %contributionto phenotypicvariance fromstronglyselected loci 𝜎”" /𝑉æ 42[39,45]% 7[5,10]% Proportionof heritabilityfrom stronglyselectedloci 𝜎”" /𝑉& = 𝜎”" /ℎ" 𝑉æ 53[49,57]% 13[10,21]% TableS2.ParameterestimatesandtheirconfidenceintervalsforheightandBMI basedonGWASresults;theheritabilitywasassumedtobe0.8forheightand0.5for BMI(8,10). 43 7.5. Testinggoodnessoffit WeusetheKolmogorov-SmirnovD-statistic(29,30)totestthegoodnessoffitof ourmodelswithoutpleiotropyandinthehighpleiotropylimit.Sinceourparameter estimatesareinferredfromthedatathatwearetestingagainst,wecannotrelyon thestandardtablesforthep-values.Wethereforegeneratenulldistributionsforthe D-statisticusingparametricbootstrapbasedonourmodels.Specifically:i)we generate10% samplesofKsignificantlocibasedonthemodelunderconsideration, withthecorrespondingestimateof𝑣” ,ii)weinfer𝑣” basedoneachsample,andiii) calculatetheKolmogorov-SmirnovD-statisticbetweenthedistributionofvariance fortheKlociineachsampleandthecorrespondingtheoreticaldistributionbased onthe𝑣” inferredfromthatsample.TheresultingdistributionofD-statistics correspondstoournullhypothesis,i.e.,thatthelocidetectedinGWASarose accordingtoourmodel,andspecificallytothewaywecalculatetheD-statistic betweentheobserveddistributionofvariancefortheKdetectedlociandthe theoreticaldistributionthatweinferredbasedontheseobservations.Wethen calculatetheD-statistic,𝐷6 ,basedontherealdataandcorrespondingtheoretical distribution,andestimatetheone-sidedp-valueby 𝑝ù8y = #*+Ièé,Ùz--,Ù,*zÙ*.+Ù/¹¡¹9 #*+Ièé,Ùz--,Ù,*zÙ* . (S103) Notethatunlikethecommoncase,heretheinabilitytorejectthenullindicatesthat thedataisconsistentwithourmodel. 44 Withoutpleiotropy No pleiotropy Contributiontovariance (inunitsof10-4VP) Height 15 p̂K−S < 10 −5 p̂K−S =0.14 10 10 5 5 Contributiontovariance (inunitsof10-4VP) 0 0 15 BMI Highpleiotropy High pleiotropy 15 5 10 No pleiotropy 15 0 0 15 10 10 5 5 0 5 15 p̂K−S =0.54 p̂K−S =0.05 0 5 10 High pleiotropy 10 Contributiontovariance (inunitsof10-4×VP) 15 0 0 5 10 Contributiontovariance (inunitsof10-4×VP) 15 FigureS11.Q-Qplotscomparingthedistributionofvarianceamongsignificantloci takenfromtheGWASofheight(10)andBMI(8)withthetheoreticaldistributions inferredfromthesedata,basedonthemodelswithoutpleiotropy(a)andinthehigh pleiotropylimit(b).Theseplotsshowthatthemodelassuminghighpleiotropy cannotberejectedforeithertraitandfitsthesedatamuchbetterthanthemodel assumingno-pleiotropy. 45 8. Glossaryofnotation 𝑟 W(𝑟) Scaleofselection 𝑛 Numberoftraits(dimensions) 𝑎 n-dimensionaleffectsize 𝑎0 Effectsizeonfocaltrait 𝑈 Haploidmutationratepergeneration 𝜎 " Phenotypicvarianceinatrait 𝐾 Numberofsegregatingsites η(𝑎0 |𝑆) Distributionoffocaltraiteffectsizesconditionalonoveralleffectsize Distributionoffocaltraiteffectsizesconditionalonthescaledselection coefficient 𝑁a" 2w " Scaledselectioncoefficient 𝑞 Derivedallelefrequency 𝑝 Ancestralallelefrequency,𝑝 = 1 − 𝑞 𝑆= τ(𝑞|𝑆) Fitness 𝑤 φŽ (𝑎0 |𝑎) Phenotype Thesojourntimeforamutationwithscaledselectioncoefficient𝑆 0 𝑣 Contributiontovariancefromasite 𝑣 = 𝑎0" 𝑞 1 − 𝑞 𝑣” Expectedcontributionofastronglyselectedsitetovariance(𝑣” = " "A 5 Ov ). E(𝑉|𝑆) Expectedcontributiontogeneticvariancefromsitesunderselection𝑆 E(𝐾|𝑆) Expectedcontributiontothenumberofsegregatingsitesfromsites underselection𝑆 𝜌(𝑣) Densityofsegregatingsitescontributingvariance𝑣 G(𝑣 ∗ ) Fractionofvarianceexplainedbysitescontributingmorethan𝑣 ∗ tothe variance 𝑚 GWASstudysize H PowertoidentifyalocusinGWAS 46 9. Bibliography 1. LandeR(1975)Themaintenanceofgeneticvariabilitybymutationinapolygenic characterwithlinkedloci.GenetRes26(3):221-235. PoonA&OttoSP(2000)Compensatingforourloadofmutations:Freezingthe meltdownofsmallpopulations.Evolution54(5):1467-1479. EwensWJ(2004)Mathematicalpopulationgenetics1:I.Theoreticalintroduction (SpringerScience&BusinessMedia). KeightleyPD&HillWG(1988)Quantitativegeneticvariabilitymaintainedby mutation-stabilizingselectionbalanceinfinitepopulations.GenetRes52(01):33-43. KongA,etal.(2012)Rateofdenovomutationsandtheimportanceoffather'sageto diseaserisk.Nature488(7412):471-475. WardLD&KellisM(2012)Evidenceofabundantpurifyingselectioninhumansfor recentlyacquiredregulatoryfunctions.Science337(6102):1675-1678. PerryJRB,etal.(2014)Parent-of-origin-specificallelicassociationsamong106 genomiclociforageatmenarche.Nature514(7520):92-97. LockeAE,etal.(2015)Geneticstudiesofbodymassindexyieldnewinsightsfor obesitybiology.Nature518:197-206. ScottRA,etal.(2012)Large-scaleassociationanalysesidentifynewlociinfluencing glycemictraitsandprovideinsightintotheunderlyingbiologicalpathways.Nat Genet44(9):991-1005. WoodAR,etal.(2014)Definingtheroleofcommonvariationinthegenomicand biologicalarchitectureofadulthumanheight.NatGenet46(11):1173–1186. SchorkNJ,MurraySS,FrazerKA,&TopolEJ(2009)Commonvs.rareallele hypothesesforcomplexdiseases.CurrOpinGenetDev19(3):212-219. HuntKA,etal.(2013)Negligibleimpactofrareautoimmune-locuscoding-region variantsonmissingheritability.Nature498(7453):232-235. DoR,etal.(2015)Exomesequencingidentifiesrareldlrandapoa5alleles conferringriskformyocardialinfarction.Nature518(7537):102-106. PurcellSM,etal.(2014)Apolygenicburdenofraredisruptivemutationsin schizophrenia.Nature506(7487):185-190. ChenJA,PeñagarikanoO,BelgardTG,SwarupV,&GeschwindDH(2015)The emergingpictureofautismspectrumdisorder:Geneticsandpathology.AnnuRev PatholMechDis10:111-144. UhlenbeckGE&OrnsteinLS(1930)Onthetheoryofthebrownianmotion.PhysRev 36(5):823. LandeR(1976)Naturalselectionandrandomgeneticdriftinphenotypicevolution. Evolution30(2):314-334. CharlesworthB(2013)Stabilizingselection,purifyingselection,andmutationalbias infinitepopulations.Genetics194(4):955-971. LefflerEM,etal.(2013)Multipleinstancesofancientbalancingselectionshared betweenhumansandchimpanzees.Science339(6127):1578-1582. DennyJC,etal.(2013)Systematiccomparisonofphenome-wideassociationstudyof electronicmedicalrecorddataandgenome-wideassociationstudydata.NatBiotech 31(12):1102-1111. MartinG&LenormandT(2006)Ageneralmultivariateextensionoffisher's geometricalmodelandthedistributionofmutationfitnesseffectsacrossspecies. Evolution60(5):893-907. ShamPC&PurcellSM(2014)Statisticalpowerandsignificancetestinginlargescalegeneticstudies.NatRevGenet15(5):335-346. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 47 23. 24. 25. 26. 27. 28. 29. 30. DicksonSP,WangK,KrantzI,HakonarsonH,&GoldsteinDB(2010)Rarevariants createsyntheticgenome-wideassociations.PLoSBiol8(1):e1000294. EvangelouE&IoannidisJPA(2013)Meta-analysismethodsforgenome-wide associationstudiesandbeyond.NatRevGenet14(6):379-389. VisscherPM,BrownMA,McCarthyMI,&YangJ(2012)Fiveyearsofgwasdiscovery. AmJHumGenet90:7-24. PritchardJK&PrzeworskiM(2001)Linkagedisequilibriuminhumans:Modelsand data.AmJHumGenet69(1):1-14. The1000GenomesProjectConsortium(2015)Aglobalreferenceforhumangenetic variation.Nature526(7571):68-74. LiuX,etal.(2015)Signaturesofnaturalselectionatthefto(fatmassandobesity associated)locusinhumanpopulations.PLoSOne10(2):e0117093. GenestC&RemillardB(2008)Validityoftheparametricbootstrapforgoodness-offittestinginsemiparametricmodels.AnnInstHPoincareProbabStatist44(6):10961127. StuteW,ManteigaWG,&QuindimilMP(1993)Bootstrapbasedgoodness-of-fittests.Metrika40(1):243-256. 48 10. Additionalfigures Density of variance 8 4 2 0 0.0 Strongselection(S=100) Weakselection(S=10) Effectivelyneutral(S=0.5) 6 0.1 0.2 0.3 0.4 Minor allele frequency 0.5 FigureS12.Thedistributionofvarianceamongsitesasafunctionofminorallele frequency.Thisdensityisindependentoftheextentofpleiotropybecause,the expectedvariancecorrespondingtoagivenselectioncoefficientandfrequency dependsonlyontheexpectedsquaredeffectsize,whichisnotaffectedbythe degreeofpleiotropy. 49 400 withoutpleiotropy (a) 300 Number of loci > v Number of loci > v 500 Strongselection(S=100) Weakselection(S=10) Effectivelyneutral(S=0.5) 200 100 0 0 Contribution to genetic variation HvL 1 2 3 4 5 X vs 500 400 withpleiotropy (b) 300 Strongselection(S=100) Weakselection(S=10) Effectivelyneutral(S=0.5) 200 100 0 0 Contribution to genetic variation HvL 1 2 3 4 5 X vs FigureS13.ThenumberoflociperMbcontributingmorethan𝑣tothevariance,asa functionof𝑣.Calculatedforaone-dimensionalmodelwithoutpleiotropy(a)andfor thehighpleiotropylimit(b).ThisfigurecorrespondstoFig.2butshowsthenumber oflocidiscoveredperMbinsteadoftheproportionoftheheritablevariance explained.Calculatedforaconstantpopulationsizeof20,000,withamutationrate of1.2×108• (5). 50 Effectivelyneutral(S=0.5) 60 40 20 0 Study size Hin units of VP êvs L 10 20 50 100 200 500 1000 % heritability explained 80 (a)Strongselection(S=100) Intermediateselection(S=10) 30 (b) Re-sequencing 20 g pin oty Gen % heritability explained 100 10 0 1 Scaled selection coefficient HSL 3 10 30 100 300 FigureS14.TheoreticalpredictionsforGWASofcontinuoustraits,inthecase withoutpleiotropy(𝑛 = 1).ThisfigureisequivalenttoFig.3fromthemaintext, whichdescribesthecasewithpleiotropy(𝑛 ≫ 1).(a)Proportionofheritability explainedasafunctionofstudysize.(b)Proportionofheritabilityexplainedasa functionofscaledselectioncoefficientforstudiesbasedonre-sequencingand genotyping.Calculatedforastudysizeof~43𝑉æ /𝑣” ,correspondingtohaving25% ofstronglyselected(S>>1)varianceexplainedwithre-sequencing.Note,thatthis studysizeis~2.5timeslargerthantheonerequiredtoexplain25%ofstrongly selectedvariancewithpleiotropy(seeFig.3). 51 Number of loci 200 Strongselection(S=100) Weakselection(S=10) Effectivelyneutral(S=0.5) 150 100 50 0 1 Study size Hin units of VPêvs L 2 5 10 20 50 100 200 FigureS15.PredictionofthenumberperMboflociassociatedwithacontinuous traitinGWASasafunctionofstudysize,inthelimitofhighpleiotropy.Calculated foraconstantpopulationsizeof20,000,withamutationrateof1.2×108• (5). 52 Number of variants 104 Height 103 10 BMI 2 Data 101 10 Model +95%CI 0 0 200 400 600 800 Study size (in thousands) 1000 FigureS16.ModelpredictionsforthenumberofdiscoveredvariantsinfutureGWAS forheightandBMI.Thecorrespondingpredictionsforthe%heritabilityexplained appearinFig.4. 53
© Copyright 2026 Paperzz